Comp 104: Operating Systems Concepts. Introduction to Compilers.


Today
- Compilers
  - Definition
  - Structure
  - Passes
- Lexical Analysis
  - Symbol table
  - Access methods

Compilers
- Definition: a compiler is a program which translates a high-level source program into a lower-level object (target) program
- SOURCE PROG. -> analysis -> ANALYSED PROG. -> synthesis -> OBJECT PROG.

History
- Late 1940s (post-von Neumann)
  - Programs were written in machine code (e.g. an instruction meaning "move the number 2 to location 0000 (hex)")
  - Highly complex, tedious and prone to error
- Assemblers appeared
  - Machine instructions given as mnemonics: MOV X,2 (assuming X has the value 0000 (hex))
  - Greatly improved the speed and accuracy of writing code
  - But still non-trivial, and not portable to new processors
- A mathematical notation was needed
  - Fortran appeared: X = 2
  - Exploited context-free grammars (Chomsky) and finite state automata

Compiler
- Responsible for converting source code into executable code
  - Analyses the code to determine its functionality
  - Synthesises executable code for a given processor
  - Optimises code to improve performance, or to exploit specific processor instructions
- Uses various data structures:
  - Tokens: variables, language keywords, syntactic constructs, etc.
  - Symbol table: relates user-defined entities (variables, methods, classes, etc.) to their associated values or internal structures
  - Literal table: stores constants, strings, etc.; used to reduce the size of the resulting code
  - Syntax/parse tree: the structure formed through the analysis of the code
  - Intermediate code: intermediate representation between different phases of the compilation

Phases and other tools
- Interpreters: unlike compilers, code is executed immediately; execution is slow; used more for scripting or functional languages
- Assemblers: construct final machine code from processor-specific assembly code; often used as the last phase of a compilation process to produce a binary executable
- Linkers: collate separately compiled objects into a single file, including shared library objects or system calls
- Preprocessors: called prior to the compilation process to perform macro substitutions, e.g. the RATFOR preprocessor, or cpp for C code
- Profilers: collect statistics about the behaviour of a program, which can be used to improve the performance of the code

Analysis and Synthesis
- Analysis:
  - checks that program constructs are legal and meaningful
  - builds up information about objects declared
- Synthesis:
  - takes the analysed program and generates the code necessary for its execution
- Compilation is based on the language definition, which comprises:
  - syntax
  - semantics

Compiler Structure
- source program (character stream) -> scanner -> tokens -> parser -> IR (parse tree) -> semantic routines -> IR (tuples) -> optimiser -> code generator -> target code
- SYMBOL TABLE
- IR = Intermediate Representation

Compiler Organisation
- Each of the compiler tasks described previously (in Compiler Structure) is a phase
- Phases can be organised into a number of passes
  - a pass consists of one or more phases acting on some representation of the complete program
  - representations produced between source and target are Intermediate Representations (IRs)

Single Pass Compilers
- One-pass compilers are very common because of their simplicity
- No IRs: all phases of the compiler are interleaved
- Compilation is driven by the parser
  - the scanner acts as a subroutine of the parser, returning a token on each call
  - as each phrase is recognised by the parser, it calls semantic routines to process declarations, check for semantic errors and generate code
- The code is not as efficient as that of a multi-pass compiler

Multi-Pass Compilers
- The number of passes depends on the number of IRs and on any optimisations
- Multi-pass allows complete separation of phases
  - more modular
  - easier to develop
  - more portable
- Main forms of IR:
  - Abstract Syntax Tree (AST)
  - Intermediate Code (IC)
    - Postfix
    - Tuples
    - Virtual Machine Code

Compiler Implementation
- Compilers are often written in HLLs for ease of maintenance, portability, etc.
  - e.g. a Pascal compiler written in C, which runs on machine X
- Problem: always need both compilers available
- To alter the compiler:
  - make the necessary changes
  - re-compile using the C compiler
- To move to machine Y:
  - re-write the code generator to produce code for Y
  - compile the compiler on machine Y (using Y's C compiler)

Bootstrapping
- Suppose our compiler is written in the language it compiles
  - e.g. a C compiler written in the C language
- We can then run the compiler through itself: bootstrapping
- To alter the compiler:
  - make the necessary changes
  - run the compiler through itself
- To move to machine Y:
  - re-write the code generator to produce code for Y
  - run the compiler through itself to generate a version of the compiler that will run directly on Y

The Scanner (Lexical Analyser)
- Converts groups of characters into tokens (lexemes)
  - tokens are usually represented as integers
  - white space and comments are skipped
- Each token may be accompanied by a value
  - could be a pointer to further information
- As identifiers are encountered, they are entered into a symbol table
  - used to collect information about declared objects
- Scanners are often hand-coded for efficiency, but may be automatically generated (e.g. Lex)

Example
    begin
      int a;
      float b;
      a = 1;
      b = 1.2;
      a = b + 1;
      print (a * 2);
    end

  Tokens produced for "begin int a; float b; a = 1; b = 1.2;":

    TOKEN        VALUE
    beginsymb
    intsymb
    semisymb
    floatsymb
    semisymb
    assignsymb
    integer      1
    semisymb
    assignsymb
    float        1.2

  symbol table: a, b

Symbol Table Access
- The symbol table is used by most compiler phases
  - even used post-compilation (debugging)
- The structure of the table and the algorithms used can make the difference between a slow and a fast compiler
- Methods:
  - Sequential lookup
  - Binary chop and binary tree
  - Hash addressing
  - Hash chaining

Sequential Lookup
- The table is just a vector of names
- Search sequentially from the beginning
- If the name is not found, add it to the end
- Advantages:
  - very simple to implement
- Disadvantages:
  - inefficient
  - for a table with N names, requires N/2 comparisons on average
  - can slow down a compiler by a factor of 10 or more

Binary Chop
- Keep names in alphabetical order
- To find a name:
  - compare with the middle element to determine which half
  - compare with the middle element of that half to narrow down to a quarter, etc.
- Advantage:
  - much more efficient than sequential lookup
  - log2(N) - 1 comparisons on average
- Disadvantage:
  - adding a new name means shifting up every name above it
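The comparison counts quoted for sequential lookup and binary chop can be checked with a small experiment. The following is a minimal sketch, not taken from the slides: the function names and the generated identifier names (`v0000`, `v0001`, ...) are illustrative assumptions, and each probe of the table is counted as one comparison.

```python
def sequential_lookup(names, target):
    """Scan from the start; return (index, comparisons made)."""
    comparisons = 0
    for index, name in enumerate(names):
        comparisons += 1
        if name == target:
            return index, comparisons
    return -1, comparisons


def binary_chop(sorted_names, target):
    """Repeatedly halve the search range; return (index, comparisons made)."""
    low, high = 0, len(sorted_names) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1  # one probe of the middle element
        if sorted_names[mid] == target:
            return mid, comparisons
        if sorted_names[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, comparisons


# Hypothetical table of 4096 names, already in alphabetical order.
names = sorted(f"v{i:04d}" for i in range(4096))

# Average number of probes over every successful lookup.
average = sum(binary_chop(names, n)[1] for n in names) / len(names)
print(round(average, 2))
```

For a table of 4096 names the average comes out very close to log2(4096) - 1 = 11, while a sequential search for the last name needs all 4096 comparisons.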

Question
- If the symbol table for a compiler is of size 4096, how many comparisons on average need to be made when performing a lookup using the binary chop method?
  a) 2
  b) 11
  c) 12
  d) 16
  e) 31
- Answer: b) 11, as there are log2(N) - 1 comparisons on average (log2(4096) = 12)

Binary Tree
- Each node contains pointers to 2 sub-trees
  - the left sub-tree contains all names < current
  - the right sub-tree has all names >= current
- Advantages:
  - in the best case, search time can be as good as binary chop
  - adding a new name is simple and efficient
- Disadvantages:
  - efficiency depends on how balanced the tree is
  - the tree can easily become unbalanced
  - in the worst case, the method is as bad as sequential lookup
  - may need to do costly re-balancing occasionally

Hash Addressing
- To determine the position in the table, apply a hash function, returning a hash key
  - example function: sum of character codes modulo N, where N is the table size (prime)
- Advantages:
  - can be highly efficient
  - even similar names can generate totally different hash keys
- Disadvantages:
  - requires a hash function producing a good distribution
  - possibility of collisions
  - may require a re-hashing mechanism, possibly multiple times

Hash Chaining
- As before, but link together names having the same hash key
  - hash("fred") -> array of pointers -> fred -> jim
- The number of comparisons needed is very small

Question
- Concerning compilation, which of the following is NOT a method for symbol table access?
  a) Sequential lookup
  b) Direct lookup
  c) Binary chop
  d) Hash addressing
  e) Hash chaining
- Answer: b) Direct lookup

Reserved Words
- Words like for, while, if, etc. are reserved words
- Could use binary chop on a table of reserved words first; if not there, search the symbol table
- Simpler to pre-hash all reserved words into the symbol table and use one lookup mechanism
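Hash chaining and the pre-hashing of reserved words can be sketched together as follows. This is an illustrative sketch only: the class name, the entry layout, the table size of 13 and the "add on a miss" behaviour are assumptions, while the hash function (sum of character codes modulo a prime table size) is the example given in the slides.

```python
TABLE_SIZE = 13  # assumed small prime, chosen only for illustration


def hash_key(name, size=TABLE_SIZE):
    """Sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in name) % size


class SymbolTable:
    """Array of chains; names with the same hash key share a chain."""

    def __init__(self, size=TABLE_SIZE):
        self.size = size
        self.chains = [[] for _ in range(size)]

    def lookup(self, name, token="ident"):
        """Return the entry for name, adding it to its chain if absent."""
        chain = self.chains[hash_key(name, self.size)]
        for entry in chain:
            if entry["name"] == name:
                return entry
        entry = {"name": name, "token": token}
        chain.append(entry)
        return entry


table = SymbolTable()
# Pre-hash the reserved words so one lookup mechanism serves both cases.
for word in ("for", "while", "if"):
    table.lookup(word, token=word + "_symb")

print(table.lookup("while")["token"])  # reserved word
print(table.lookup("fred")["token"])   # ordinary identifier
```

Because this hash function ignores character order, names such as `ab` and `ba` always collide; the chain simply holds both entries, which is why no re-hashing is needed.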

Today
- Parsing
  - Context-free grammar & BNF
  - Example: the Micro language
  - Parse tree
- Abstract syntax tree

Parser (Syntax Analyser)
- Reads tokens and groups them into units as specified by the language grammar, i.e. it recognises syntactic phrases
- The parser must produce good error messages and be able to recover from errors

Scanning and Parsing
- Regular expressions define tokens
- BNF rules define grammar elements
- source file: sum = x1 + x2;
- input stream -> Scanner -> tokens (sum, =, x1, +, x2, ;) -> Parser -> parse tree
- parse tree: "=" at the root with children "sum" and "+"; "+" has children "x1" and "x2"

Syntax
- Defines the structure of legal statements in the language
- Usually specified formally using a context-free grammar (CFG)
- The notation most widely used is Backus-Naur Form (BNF), or extended BNF
- A CFG is written as a set of rules (productions)
- In extended BNF:
  - {...} means zero or many
  - [...] means zero or one

Backus Naur Form
- Backus Naur Form (BNF) is a standard notation for expressing syntax as a set of grammar rules.
  - BNF was developed by Noam Chomsky, John Backus, and Peter Naur.
  - First used to describe Algol.
- BNF can describe any context-free grammar.
  - Fortunately, computer languages are mostly context-free.
- Computer languages remove non-context-free meaning by either (a) defining more grammar rules or (b) pushing the problem off to the semantic analysis phase.

A Context-Free Grammar
- A grammar is context-free if all the syntax rules apply regardless of the symbols before or after (the context).
- Example:
    (1) sentence => noun-phrase verb-phrase .
    (2) noun-phrase => article noun
    (3) article => a | the
    (4) noun => boy | girl | cat | dog
    (5) verb-phrase => verb noun-phrase
    (6) verb => sees | pets | bites
- Terminal symbols: 'a' 'the' 'boy' 'girl' 'sees' 'pets' 'bites'

A Context-Free Grammar
- A sentence that matches the productions (1) - (6) is valid.
    a girl sees a boy
    a girl sees a girl
    a girl sees the dog
    the dog pets the girl
    a boy bites the dog
    a dog pets the boy
    ...
- To eliminate unwanted sentences without imposing a context-sensitive grammar, specify semantic rules:
    "a boy may not bite a dog"

Backus Naur Form
- Grammar rules or productions define symbols.
    assignment_stmt ::= id = expression ;
  - the left side (assignment_stmt) is the nonterminal symbol being defined
  - the right side is the definition (production)
- Nonterminal symbols: anything that is defined on the left side of some production.
- Terminal symbols: things that are not defined by productions. They can be literals, symbols, and other lexemes of the language defined by lexical rules.
  - Identifiers: id ::= [A-Za-z_]\w*
  - Delimiters: ;
  - Operators: = + - * / %

Backus Naur Form (2)
- Different notations (same meaning):
    assignment_stmt ::= id = expression + term
    <assignment-stmt> => <id> = <expr> + <term>
    AssignmentStmt -> id = expression + term
- ::=, =>, -> mean "consists of" or "defined as"
- Alternatives ("|"):
    expression => expression + term | expression - term | term
- Concatenation:
    number => DIGIT number | DIGIT

Alternative Example
- The following BNF syntax is an example of how an arithmetic expression might be constructed in a simple language
- Note the recursive nature of the rules
    <expression> ::= <term> | <addop> <term> | <expression> <addop> <term>
    <term>       ::= <primary> | <term> <multop> <primary>
    <primary>    ::= <digit> | <letter> | ( <expression> )
    <digit>      ::= 0 | 1 | 2 | ... | 9
    <letter>     ::= a | b | c | ... | y | z
    <multop>     ::= * | /
    <addop>      ::= + | -
- Are the following expressions legal, according to this syntax?
    i)   -a
    ii)  b+c^(3/d)
    iii) a*(c-(4+b))
    iv)  5(9-e)/d

Syntax for Arithmetic Expr.
- BNF rules can be recursive
    expr   => expr + term | expr - term | term
    term   => term * factor | term / factor | factor
    factor => ( expr ) | ID | NUMBER
- where the tokens are:
    NUMBER := [0-9]+
    ID     := [A-Za-z_][A-Za-z_0-9]*

Uses of Recursion
- Repetition:
    expr => expr + term
         => expr + term + term
         => expr + term + term + term
         => term + ... + term + term
- The parser can recursively expand expr each time one is found
  - could lead to arbitrary depth of analysis
  - greatly simplifies implementation

Example: The Micro Language
- To illustrate BNF parsing, consider an example imaginary language: the Micro language
  1) A program is of the form: begin <sequence of statements> end
  2) The only statements allowed are:
     - assignment
     - read (list of variables)
     - write (list of expressions)
  3) Variables are declared implicitly; their type is integer
  4) Each statement ends in a semi-colon
  5) The only operators are +, -; parentheses may be used

Micro CFG
     1. <program>   ::= begin <stat-list> end
     2. <stat-list> ::= <statement> { <statement> }
     3. <statement> ::= id := <expr> ;
     4. <statement> ::= read ( <id-list> ) ;
     5. <statement> ::= write ( <expr-list> ) ;
     6. <id-list>   ::= id { , id }
     7. <expr-list> ::= <expr> { , <expr> }
     8. <expr>      ::= <primary> { <addop> <primary> }
     9. <primary>   ::= ( <expr> )
    10. <primary>   ::= id
    11. <primary>   ::= intliteral
    12. <addop>     ::= +
    13. <addop>     ::= -

BNF
- Items such as <program> are non-terminals: they require further expansion
- Items such as begin are terminals: they correspond to language tokens
- It is usual to combine productions using | (or), e.g.
    <primary> ::= ( <expr> ) | id | intliteral

Parsing
- Bottom-up
  - Look for patterns in the input which correspond to phrases in the grammar
  - Replace patterns of items by phrases, then combine these into higher-level phrases, and so on
  - Stop when the input has been converted to a single <program>
- Top-down
  - Assume the input is a <program>
  - Search for each of the sub-phrases forming a <program>, then for each of the sub-sub-phrases, and so on
  - Stop when we reach terminals
- A program is syntactically correct iff it can be derived from the CFG

Question
- Consider the following grammar, where S, A and B are nonterminals, and a and b are terminals:
    S ::= AB
    A ::= a
    A ::= BaB
    B ::= bba
- Which of the following is FALSE?
  a) The length of every string derived from S is even.
  b) No string derived from S has an odd number of consecutive b's.
  c) No string derived from S has three consecutive a's.
  d) No string derived from S has four consecutive b's.
  e) Every string derived from S has at least as many b's as a's.
- Answer: c) No string derived from S has three consecutive a's

Example
- Parse: begin A := B + (10 - C); end
    <program>
    begin <stat-list> end                              (apply rule 1)
    begin <statement> end                              (2)
    begin id := <expr> ; end                           (3)
    begin id := <primary> <addop> <primary> ; end      (8)
    begin id := <primary> + <primary> ; end            (12)
    ...

Exercise
- Complete the previous parse
- Clue: this is the final line of the parse:
    begin id := id + (intliteral - id); end

Answer
- Parse: begin A := B + (10 - C); end
    <program>
    begin <stat-list> end                                    (apply rule 1)
    begin <statement> end                                    (2)
    begin id := <expr> ; end                                 (3)
    begin id := <primary> <addop> <primary> ; end            (8)
    begin id := <primary> + <primary> ; end                  (12)
    begin id := id + <primary> ; end                         (10)
    begin id := id + ( <expr> ) ; end                        (9)
    begin id := id + ( <primary> <addop> <primary> ) ; end   (8)
    begin id := id + ( <primary> - <primary> ) ; end         (13)
    begin id := id + ( intliteral - <primary> ) ; end        (11)
    begin id := id + ( intliteral - id ) ; end               (10)

Parse Tree
- The parser creates a data structure representing how the input is matched to grammar rules
- Usually as a tree
- Also called a syntax tree or derivation tree
    <program>
      begin
      <stat-list>
        <statement>
          id
          :=
          <expr>
            <primary>: id
            <addop>: +
            <primary>
              (
              <expr>
                <primary>: intliteral
                <addop>: -
                <primary>: id
              )
          ;
      end

Expression Grammars
- For expressions, a CFG can indicate associativity and operator precedence, e.g.
    <expr>    ::= <factor> { <addop> <factor> }
    <factor>  ::= <primary> { <multop> <primary> }
    <primary> ::= ( <expr> ) | id | literal
- Parse tree for A+B*C:
    <expr>
      <factor>
        <primary>: id (A)
      <addop>: +
      <factor>
        <primary>: id (B)
        <multop>: *
        <primary>: id (C)
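The top-down approach can be illustrated with a small recursive-descent recogniser for the Micro grammar, with one method per non-terminal. This is a sketch under assumptions rather than the course's own parser: the class and function names, the regular expression used for scanning, and the use of Python's `SyntaxError` to signal a failed parse are all illustrative choices.

```python
import re

# Keywords, identifiers, integer literals and punctuation of Micro.
TOKEN_RE = re.compile(
    r"\s*(?:(begin|end|read|write)\b|([A-Za-z_]\w*)|(\d+)|(:=|[;,()+\-]))")


def scan(text):
    """Turn source text into a list of token names."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        keyword, ident, number, punct = match.groups()
        tokens.append(keyword or (ident and "id")
                      or (number and "intliteral") or punct)
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def program(self):                      # rule 1
        self.expect("begin")
        self.stat_list()
        self.expect("end")
        if self.peek() is not None:
            raise SyntaxError("input continues after 'end'")

    def stat_list(self):                    # rule 2
        self.statement()
        while self.peek() in ("id", "read", "write"):
            self.statement()

    def statement(self):                    # rules 3, 4, 5
        if self.peek() == "id":
            self.expect("id"); self.expect(":="); self.expr()
        elif self.peek() in ("read", "write"):
            is_read = self.peek() == "read"
            self.pos += 1
            self.expect("(")
            self.item_list(lambda: self.expect("id") if is_read else self.expr())
            self.expect(")")
        else:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        self.expect(";")

    def item_list(self, item):              # rules 6, 7
        item()
        while self.peek() == ",":
            self.pos += 1
            item()

    def expr(self):                         # rules 8, 12, 13
        self.primary()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.primary()

    def primary(self):                      # rules 9, 10, 11
        if self.peek() == "(":
            self.expect("("); self.expr(); self.expect(")")
        elif self.peek() in ("id", "intliteral"):
            self.pos += 1
        else:
            raise SyntaxError(f"unexpected {self.peek()!r}")


def is_valid(text):
    """True if text can be derived from the Micro CFG."""
    try:
        Parser(scan(text)).program()
        return True
    except SyntaxError:
        return False


print(is_valid("begin A := B + (10 - C); end"))
```

The repetition braces of rules 2, 6, 7 and 8 become `while` loops, which is one reason the extended-BNF form is convenient for hand-written top-down parsers.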

Ambiguity
- A grammar is ambiguous if there is more than one parse tree for a valid sentence.
- Example:
    expr => expr + expr | expr * expr | id | number
- How would you parse x + y * z using this rule?

Example of Ambiguity
- Grammar rules:
    expr => expr + expr | expr * expr | ( expr ) | NUMBER
- Expression: 2 + 3 * 4
- Two possible parse trees:
  - "*" at the root: (2 + 3) * 4
  - "+" at the root: 2 + (3 * 4)

Another Example of Ambiguity
- Grammar rules:
    expr => expr + expr | expr - expr | ( expr ) | NUMBER
- Expression: 2 - 3 - 4
- Parse trees:
  - (2 - 3) - 4
  - 2 - (3 - 4)

Ambiguity
- Ambiguity can lead to inconsistent implementations of a language.
  - Ambiguity can cause infinite loops in some parsers.
  - The specification of a grammar should be unambiguous!
- How to resolve ambiguity:
  - rewrite grammar rules to remove ambiguity
  - add some additional requirement for the parser, such as "always use the left-most match first"
- EBNF (later) helps remove ambiguity

Abstract Syntax Tree (AST)
- A more compact form of derivation tree
  - contains just enough information to drive later phases
- e.g. Y := 3*X + I
    :=
      id (Y)
      +
        *
          const 3
          id (X)
        id (I)
  - each node holds a tag (e.g. id, const) and an attribute (e.g. the constant 3)
  - the attribute of an id node points to the symbol table

Semantics
- Specify the meaning of language constructs
  - usually defined informally
- A statement may be syntactically legal but semantically meaningless
  - "colourless green ideas sleep furiously"
- Semantic errors may be
  - static (detected at compile time), e.g. a := x + true;
  - dynamic (detected at run time), e.g. array subscript out of bounds

Question
- If the array x contains 20 ints, as defined by the following declaration:
    int x[] = new int[20];
  what kind of message would be generated by the following lines of code?
    a := 22;
    val := x[a];
  a) A Syntax Error.
  b) A Static Semantic Error.
  c) A Dynamic Semantic Error.
  d) A Warning, rather than an error.
  e) None of the above.
- Answer: c) A dynamic semantic error; the value of a would cause an array out of bounds error

Semantics
- Also needed to generate appropriate code
  - e.g. a = b
  - in Java and C, this means assign b to a
  - in Pascal and Ada, this means compare a and b for equality
  - hence, generate different code in each case

Semantic Routines
  1) Semantic analysis
     - completes the analysis phase of compilation
     - object descriptors are associated with identifiers in the symbol table
     - static semantic error checking is performed
  2) Semantic synthesis
     - code generation

Object Descriptors
- Symbol table entry: name | token | descriptor list | link (to next entry in chain)
- The token tells us what the name is
  - e.g. while-token, if-token, identifier, etc.
- A descriptor contains things like type, address, array bounds, etc.
- Need a list of descriptors because of identifier re-use

Identifier Re-use
- Can have code such as:
    int x;        // level 1
    main() {
      float x;    // level 2

Descriptor Lists
- For efficiency, the most local descriptors are kept at the front of the list
  - symbol table entry for x -> (level 2, float) -> (level 1, integer)
- At the end of a block, all descriptors declared in that block must be deleted
- To aid in this, all descriptors within the same block may be linked together

Attribute Propagation
- Before code can be generated, semantic attributes may need to be propagated through the tree
- Top-down (inherited attributes)
  - declarations are processed to build the symbol table
  - identifiers are looked up in the table to attach attribute information to nodes
- Bottom-up (synthesised attributes)
  - determine the types of expressions based on operators and the types of identifiers
- Propagation can be done at the same time as static semantic error checking, and often forms the next pass
  - may also be combined with code generation

Example: a := b*c + b*d
    float a, d;
    int b, c;
- The type attribute is recorded in an extra field of each node
- After propagation, the tree is said to be decorated
    :=
      a (float)              inherited from the SYMBOL TABLE
      + (float)              synthesised
        * (int)
          b (int)
          c (int)
        * (float)
          b (int)
          d (float)

Static Semantic Error Checking
- With the information from attribute propagation, static checking is often trivial, e.g.
  - type mismatch (compare type attributes)
  - identifier not declared (null descriptor field in symbol table)
  - identifier already declared (descriptor with current level number already present)

Question
- A BNF grammar includes the following statement:
    <statement> ::= <iden> := ( <expr> );
  What kind of message would be produced by the following line of code?
    a := (2 + b;
  a) A Syntax Error.
  b) A Static Semantic Error.
  c) A Dynamic Semantic Error.
  d) A Warning, rather than an error.
  e) None of the above.
- Answer: a) A syntax error; all the tokens are valid, but the close parenthesis is missing, resulting in an error in the grammar

Abstract Syntax Tree (AST) Again
- A more compact form of derivation tree
  - contains just enough information to drive later phases
- e.g. Y := 3*X + I
    :=
      id (Y)
      +
        *
          const 3
          id (X)
        id (I)
  - each node holds a tag and an attribute; the attribute of an id node points to the symbol table

Code Generation
- Often performed by tree-walking the AST
    GenAssign(node) {
      // Gen code for RHS, leaving result in R1
      GenExpr(node.rhs, R1);
      // Calculate addr for LHS
      GenAddr(node.lhs, Addr);
      Gen(STORE, R1, Addr)
    }

    GenExpr(node, reg) {
      if (node.type == op) {
        GenExpr(node.lhs, reg);
        GenExpr(node.rhs, reg+1);
        Gen(node.opcode, reg, reg+1);
      }
      ...
    }
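The GenAssign/GenExpr pseudocode can be turned into a short runnable sketch. Several details here are assumptions made for illustration: the tuple-based node layout, the way leaf nodes are loaded into registers (the part the slides elide with "..."), and the SUB and DIV opcode names, which do not appear in the slides.

```python
# Assumed opcode names; only MULT and ADD appear in the slides.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MULT", "/": "DIV"}


def gen_expr(node, reg, code):
    """Emit code that leaves the value of node in register R<reg>."""
    kind = node[0]
    if kind == "op":
        _, operator, lhs, rhs = node
        gen_expr(lhs, reg, code)        # left operand into R<reg>
        gen_expr(rhs, reg + 1, code)    # right operand into R<reg+1>
        code.append(f"{OPCODES[operator]} R{reg}, R{reg + 1}")
    elif kind == "const":
        code.append(f"LOAD R{reg}, #{node[1]}")
    else:                               # "id": load the variable
        code.append(f"LOAD R{reg}, {node[1]}")


def gen_assign(target, rhs):
    """Generate code for 'target := rhs', leaving the RHS in R1 first."""
    code = []
    gen_expr(rhs, 1, code)
    code.append(f"STORE R1, {target}")
    return code


# AST for Y := 3*X + I
tree = ("op", "+",
        ("op", "*", ("const", 3), ("id", "X")),
        ("id", "I"))

for line in gen_assign("Y", tree):
    print(line)
```

Walking the tree left to right in this fixed order produces LOAD R1, #3 / LOAD R2, X / MULT R1, R2 / LOAD R2, I / ADD R1, R2 / STORE R1, Y for this example.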

Tree Walking
- Decorated tree for Y := 3*X + I:
    :=
      Y (int)
      + (int)
        * (int)
          3
          X (int)
        I (int)
- Generated code:
    LOAD  R1, #3
    LOAD  R2, X
    MULT  R1, R2
    LOAD  R2, I
    ADD   R1, R2
    STORE R1, Y
- An advantage of the AST is that the order of traversal can be chosen
  - code generated by a one-pass compiler corresponds to a strictly fixed traversal of the tree (hence, the code is not as good)

Intermediate Code (IC)
- Instead of generating target machine code, semantic routines may generate IC
  - this can form the input to a separate code generator (CG)
  - the advantage is that all target machine dependencies can be limited to the CG
- Postfix
  - e.g. a := b*c + b*d becomes a b c * b d * + :=
  - concise and simple, but not very good for generating code unless a stack-based architecture is used

Postfix
- In normal algebraic notation the arithmetic operator appears between the two operands to which it is being applied
  - this is called infix notation
  - example: a / b + c
- It may require parentheses to specify the desired order of operations
  - example: a / (b + c)
- In postfix (or Reverse Polish) notation the operator is placed directly after the two operands to which it applies
  - therefore, in postfix notation the need for parentheses is eliminated

Operator Precedence
- To do the conversion from infix to postfix, we need to prioritise operators as follows:
    ^                 highest priority
    *, /
    +, -
    <, >, =, ...
    & (and)
    | (or)            lowest priority

Postfix
- Example 1: the infix expression a ^ b + c becomes in postfix: a b ^ c +
- Example 2: the infix expression a ^ (b + c) becomes in postfix: a b c + ^
- Example 3: the infix expression b * c + 5 ^ ( / a ) becomes in postfix: b c * a / + ^

Exercise
- Convert the following infix expressions into postfix:
    a+b/c
    a*c+(b-d)
    a*c+b-d
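The infix-to-postfix conversion driven by the precedence table can be sketched with an operator stack. This is one common way to do it (the stack-based "shunting-yard" approach), named here as a swapped-in technique since the slides do not say which algorithm they intend. The sketch assumes single-character operands, well-formed input, only the arithmetic operators from the table, and right-associativity for `^`.

```python
PRECEDENCE = {"^": 4, "*": 3, "/": 3, "+": 2, "-": 2}
RIGHT_ASSOCIATIVE = {"^"}   # assumption; the slides do not state this


def to_postfix(infix):
    """Convert an infix expression with single-character operands."""
    output, stack = [], []
    for token in infix.replace(" ", ""):
        if token.isalnum():
            output.append(token)            # operands go straight out
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":         # unwind to the matching "("
                output.append(stack.pop())
            stack.pop()
        else:
            # Pop operators that bind at least as tightly as this one.
            while stack and stack[-1] != "(" and (
                    PRECEDENCE[stack[-1]] > PRECEDENCE[token]
                    or (PRECEDENCE[stack[-1]] == PRECEDENCE[token]
                        and token not in RIGHT_ASSOCIATIVE)):
                output.append(stack.pop())
            stack.append(token)
    while stack:
        output.append(stack.pop())
    return " ".join(output)


print(to_postfix("a ^ b + c"))      # Example 1 from the slides
print(to_postfix("a ^ (b + c)"))    # Example 2 from the slides
```

Parentheses never reach the output: they only control when operators are popped from the stack, which is why postfix needs none.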

Question
- Which of the following postfix expressions is equivalent to the following expression?
    a*b - c/d
  a) a b c d * - /
  b) a b * - c d /
  c) a b c d / - *
  d) a b * c d / -
  e) a b c * - d /
- Answer: d) a b * c d / -

Today
- Code generation
  - Three address code
- Code optimisation
  - Techniques
  - Classification of optimisations
    - Time of application
    - Area of application

Intermediate Code
- Code can be generated from the syntax tree
- However, this doesn't represent target code very well
  - the tree represents constructs such as conditionals (if then else) or loops (while do)
  - target code includes jumps to memory addresses
- Intermediate code represents a linearisation of the syntax tree
  - postfix is an example of a stack-based linearisation
- Typically related in some way to the target architecture
  - good for efficient code
  - can be exploited by code optimisation routines

Three Address Code
- Reflects the notion of simple operations of the form: x = y op z
  - many instructions are of this form
- Introduces the notion of temporary variables
  - these represent interior nodes in the tree
  - usually assigned to registers
- Represents a left-to-right linearisation of the code
- Other variants exist, e.g. for unary operations: x = -y

Three Address Code
- Example: factorial function
- [Figure: expression tree with nodes +, *, -, 2, a, b]
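The idea that temporaries stand for the interior nodes of the tree can be shown with a small generator of three address code. This is a minimal sketch: the tuple representation of the tree and the temporary names `t1`, `t2`, ... are assumptions, and the expression used is the a := b*c + b*d example from the earlier slides.

```python
import itertools


def three_address(target, tree):
    """Linearise an expression tree into 'x = y op z' instructions."""
    code = []
    temps = itertools.count(1)

    def walk(node):
        if isinstance(node, str):       # leaf: a variable or constant name
            return node
        op, lhs, rhs = node
        left = walk(lhs)                # left-to-right traversal
        right = walk(rhs)
        temp = f"t{next(temps)}"        # one temporary per interior node
        code.append(f"{temp} = {left} {op} {right}")
        return temp

    result = walk(tree)
    code.append(f"{target} = {result}")
    return code


# a := b*c + b*d
tree = ("+", ("*", "b", "c"), ("*", "b", "d"))
for line in three_address("a", tree):
    print(line)
```

Each instruction has at most one operator, so the three interior nodes of this tree become t1, t2 and t3, followed by the final copy into a.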

P-Code
- Was initially a target assembly generated by Pascal compilers in the early 1970s
  - the format is very similar to assembly
  - designed to work on a hypothetical stack machine called a P-machine
  - the aim was to aid portability
  - P-code instructions could then be mapped to assembly for the target platform
- A simple, abstract version is given on the next slide

Question
- Which of the following is NOT a form of intermediate representation used by compilers?
  a) Postfix
  b) Tuples
  c) Context-free grammar
  d) Abstract syntax tree
  e) Virtual machine code
- Answer: c) A context-free grammar defines the language used by the compiler; the rest are intermediate representations

Code Optimisation
- The aim is to improve the quality of the target code
- Disadvantages:
  - the compiler is more difficult to write
  - compilation time may double or triple
  - target code often bears little resemblance to unoptimised code
    - greater chance of translation errors
    - more difficult to debug programs

Optimisation Techniques
- Constant folding
  - can evaluate expressions involving constants at compile-time
  - the aim is for the compiler to pre-compute (or remove) as many operations as possible
      a := 3*16-2;
    becomes
      LOAD 1, #46
      STORE 1, a
- Global register allocation
  - analyse the program to determine which variables are likely to be used most and allocate these to registers
  - good use of registers is a very important feature of efficient code
  - aided by architectures that provide an increased number of registers

Techniques
- Code deletion
  - identify and delete unreachable or dead code
      boolean debug = false;
      ...
      if (debug) {
        ...      // no need to generate code for this
      }
- Common sub-expression elimination
  - avoid generating code for unnecessary operations by identifying expressions that are repeated
      a := (b*c/5 + x) - (b*c/5 + y)
  - generate code for b*c/5 only once

Exercise
- Optimise the following:
    a = 100 - 3*22;
    b = (a-30)*5;
    if (a<b) {
      screen.println(a);
    }

Techniques
- Code motion out of loops
  - before:
      for (int i=0; i <= n; i++) {
        x = a + 5;     // loop-invariant code
        Screen.println(x*i);
      }
  - after:
      x = a + 5;
      for (int i=0; i <= n; i++) {
        Screen.println(x*i);
      }
- Strength reduction
  - replace operations by others which are equivalent but more efficient, e.g. a*2
      LOAD 1, a                LOAD 1, a
      MULT 1, #2      ->       ADD 1, 1

Question
- What optimisation technique could be applied in the following examples?
    a = b^2
    a = a / 2
  a) Constant Folding
  b) Code Deletion
  c) Common Sub-Expression Elimination
  d) Strength Reduction
  e) Global Register Allocation
- Answer: d) Both expressions can be reduced by changing the operator:
  - a = b ^ 2 can be reduced to a = b * b
  - a = a / 2 is a right shift operation: a = a >> 1
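Constant folding and strength reduction can both be expressed as a bottom-up rewrite of the expression tree. The sketch below is illustrative only: the tuple-based tree, the function name `optimise`, and the restriction of folding to +, - and * (so that division by zero and integer-division semantics do not need handling) are assumptions made here.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def optimise(node):
    """Return an equivalent, cheaper tree for (op, lhs, rhs) tuples."""
    if not isinstance(node, tuple):
        return node                         # leaf: variable name or constant
    op, lhs, rhs = node
    lhs, rhs = optimise(lhs), optimise(rhs)
    # Constant folding: both operands are known at compile time.
    if isinstance(lhs, int) and isinstance(rhs, int) and op in FOLDABLE:
        return FOLDABLE[op](lhs, rhs)
    # Strength reduction: a * 2 becomes a + a (simple variables only).
    if op == "*" and rhs == 2 and isinstance(lhs, str):
        return ("+", lhs, lhs)
    return (op, lhs, rhs)


print(optimise(("-", ("*", 3, 16), 2)))     # the a := 3*16-2 example
print(optimise(("*", "a", 2)))              # the a*2 example
```

Folding happens after the children have been optimised, so a constant sub-expression buried inside a larger expression is still reduced to a single value.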

Classification of Optimisations
- Optimisations can be classified according to their different characteristics
- Two useful classifications:
  - the period of the compilation process during which an optimisation can be applied
  - the area of the program to which the optimisation applies

Time of Application
- Optimisations can be performed at virtually every stage of the compilation process
  - e.g. constant folding can be performed during parsing
  - other optimisations might be applied to target code
- The majority of optimisations are performed either during or just after intermediate code generation, or during target code generation
  - source-level optimisations do not depend upon characteristics of the target machine and can be performed earlier
  - target-level optimisations depend upon the target architecture
  - sometimes an optimisation can consist of both

Target Code Optimisations
- Optimisations performed on target code are known as peephole optimisations
  - scan the target code, searching for sequences of target code that can be replaced by more efficient ones, e.g.
      LOAD 1, a
      ADD 1, #1       ->       INC a
      STORE 1, a
  - replacements may introduce further possibilities
  - effective and simple
  - sometimes tacked onto the end of a one-pass compiler

Area of Application
- Optimisations can be applied to different areas of a program
- Local optimisations: those that are applied to straight-line segments of code (basic blocks), i.e. with no jumps into or out of the sequence
  - the easiest optimisations to perform
- Global optimisations: those that extend beyond basic blocks but are confined to an individual procedure
  - more difficult to perform
- Inter-procedural optimisations: those that extend beyond the boundaries of procedures to the entire program
  - the most difficult optimisations to perform

Today
- Compiler-writing tools
  - Regular expressions
  - Lex
  - Yacc
  - Code generator generators

Compiler-Writing Tools
- Various software tools exist which aid in the construction of compilers
  - Scanner generators, e.g. lex
    - the input to lex consists of a definition of each token as a regular expression
  - Parser generators, e.g. yacc
  - Code generator generators

Regular Expressions (REs)
- Used in many UNIX tools, e.g. awk, grep, sed, lex, vi
- REs specify patterns to be matched against input text
- An RE may be just a string: cat matches the string "cat"
- A full stop matches any single character: c.t matches cat, cut, cot, etc.

REs
- The beginning of a line is specified by ^
- The end of a line is specified by $
- An asterisk means zero or more occurrences of the immediately preceding item
  - xy*z matches xz, xyz, xyyz, xyyyz, etc.
- A plus sign means one or more
  - xy+z matches xyz, xyyz, etc.
- A vertical bar means "or"
  - e.g. x(a|b)y matches xay or xby

Exercise
- What will be matched by the pattern a.d in the following line of characters?
    add a dog and aardvark
- Using the same line of characters, what will match ^a.d?
- What do we get if we search a file for all occurrences of the following patterns?
    ^hello$
    ^$

Exercise
- Using the same line of characters as before, what will be matched by the following?
    an*d
    an+d
- What will be matched by:
    10*1.*

Character Classes
- Square brackets denote a character class:
  - [abc] matches the character a, b, or c
- Can also abbreviate:
  - [1-6] is equivalent to [123456]
- Asterisk and plus may be applied to character classes
  - e.g. to define hex numbers in a Java or C program: 0x[0-9a-fA-F]+
- Can negate a character class:
  - [^abc] matches any character except a, b, c
  - note the different use of ^

Exercise
- Which of the following will match [Kk]a[itl]e?
    Kate kite kale kit
- What matches the following?
    [ \t\n]+
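The operators described above can be tried out directly. The sketch below uses Python's standard `re` module, whose syntax for these particular operators (`.`, `^`, `*`, `+`, `|` and character classes) agrees with the UNIX tools mentioned; the sample strings other than the exercise line are illustrative assumptions.

```python
import re

line = "add a dog and aardvark"

# "." matches any single character, including a space.
print(re.findall(r"a.d", line))      # ['add', 'a d', 'and', 'ard']

# "^" anchors the match to the beginning of the line.
print(re.findall(r"^a.d", line))     # ['add']

# "*" allows zero occurrences of the preceding item, "+" needs at least one.
print(re.findall(r"an*d", line))     # ['ad', 'and']
print(re.findall(r"an+d", line))     # ['and']

# Character classes, here applied to hypothetical sample words.
print(re.findall(r"[Kk]a[itl]e", "Kate kite kale kit"))   # ['Kate', 'kale']

# A hex literal as defined in the slides: 0x[0-9a-fA-F]+
print(bool(re.fullmatch(r"0x[0-9a-fA-F]+", "0x1F")))      # True
print(bool(re.fullmatch(r"0x[0-9a-fA-F]+", "0xZZ")))      # False
```

Note that `findall` reports non-overlapping matches from left to right, so once `add` has been consumed the search resumes at the following space.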

Lex
- Input to lex consists of pairs of REs and actions
  - each RE defines a particular language token
  - each action is a fragment of C code, to be executed if the token is encountered
- Lex transforms this input into a C function called yylex(), which returns a token each time it is called
- The string that matches an RE is placed in an array called yytext
- Extra information about a token can be passed back to the calling program via a global variable called yylval

Lex Example
    [ \t\n]+                ;
    while                   return(while_symb);
    for                     return(for_symb);
    ..
    [0-9]+                  { /* convert to int */
                              yylval = atoi(yytext);
                              return();
                            }
    [a-zA-Z][a-zA-Z0-9]*    { /* find in sym tab */
                              yylval = lookup(yytext);
                              return(ident);
                            }

Yacc
- Stands for Yet Another Compiler-Compiler
- It is a parser generator
  - parser generators are programs that take as input the grammar defining a language and produce as output a parser for that language
- A Yacc parser matches sequences of input tokens to the rules of the given grammar

Yacc
- The specification file that Yacc takes as input consists of three sections
  - Definitions: contains information about the tokens, data types and grammar rules required to build the parser
  - Rules: contains the rules (in a form of BNF) of the grammar, along with actions in C code to be executed whenever a given rule is recognised
  - Auxiliary routines: contains any auxiliary procedure and function declarations required to complete the parser

Yacc Example
- Example format of rules:
    assign : IDENT BECOMES expr SEMI
             { /* action for assignment */ }
    while  : WHILE expr DO statement
             { /* action for while stat */ }
- The parsing procedure produced by Yacc is called yyparse()
  - returns an int value: 0 if the parse is successful, 1 otherwise

Error Recovery in Yacc
- Errors need to be recognised and recovered from: Yacc provides error productions as the principal way to achieve this
- Error productions have an error pseudotoken on their right-hand side
- These productions identify a context in which erroneous tokens can be deleted until tokens are encountered that enable the parse to be re-started
- When errors are encountered, appropriate syntax error messages will be generated

Code Generator Generators
- CGGs remove the burden of deciding what code to generate for each construct
- The implementer produces a formal description of what each target machine instruction does
- The CG automatically searches the machine description to find the instruction(s) that produce the desired computation
- The code is almost as good as that of a conventional compiler, but generation speed is much slower

Question
- Lex is a software tool that can be used to aid compiler construction. It is an example of which of the following?
  a) A scanner generator
  b) A parser generator
  c) A code generator generator
  d) A semantic analyser
  e) A code debugger
- Answer: a) Lex is responsible for identifying tokens using regular expressions. It is therefore a scanner generator


Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation Comp 204: Computer Systems and Their Implementation Lecture 22: Code Generation and Optimisation 1 Today Code generation Three address code Code optimisation Techniques Classification of optimisations

More information

A Simple Syntax-Directed Translator

A Simple Syntax-Directed Translator Chapter 2 A Simple Syntax-Directed Translator 1-1 Introduction The analysis phase of a compiler breaks up a source program into constituent pieces and produces an internal representation for it, called

More information

CSE 3302 Programming Languages Lecture 2: Syntax

CSE 3302 Programming Languages Lecture 2: Syntax CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:

More information

Language Processors Chapter 1. By: Bhargavi H Goswami Assistant Professor Sunshine Group of Institutes Rajkot, Gujarat, India

Language Processors Chapter 1. By: Bhargavi H Goswami Assistant Professor Sunshine Group of Institutes Rajkot, Gujarat, India Language Processors Chapter 1. By: Bhargavi H Goswami Assistant Professor Sunshine Group of Institutes Rajkot, Gujarat, India Is it ok if Analysis of Source Statement is followed by synthesis of equivalent

More information

CPS 506 Comparative Programming Languages. Syntax Specification

CPS 506 Comparative Programming Languages. Syntax Specification CPS 506 Comparative Programming Languages Syntax Specification Compiling Process Steps Program Lexical Analysis Convert characters into a stream of tokens Lexical Analysis Syntactic Analysis Send tokens

More information

CST-402(T): Language Processors

CST-402(T): Language Processors CST-402(T): Language Processors Course Outcomes: On successful completion of the course, students will be able to: 1. Exhibit role of various phases of compilation, with understanding of types of grammars

More information

Principles of Programming Languages COMP251: Syntax and Grammars

Principles of Programming Languages COMP251: Syntax and Grammars Principles of Programming Languages COMP251: Syntax and Grammars Prof. Dekai Wu Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong, China Fall 2007

More information

This book is licensed under a Creative Commons Attribution 3.0 License

This book is licensed under a Creative Commons Attribution 3.0 License 6. Syntax Learning objectives: syntax and semantics syntax diagrams and EBNF describe context-free grammars terminal and nonterminal symbols productions definition of EBNF by itself parse tree grammars

More information

Chapter 3. Describing Syntax and Semantics ISBN

Chapter 3. Describing Syntax and Semantics ISBN Chapter 3 Describing Syntax and Semantics ISBN 0-321-49362-1 Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Copyright 2009 Addison-Wesley. All

More information

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below. UNIT I Translator: It is a program that translates one language to another Language. Examples of translator are compiler, assembler, interpreter, linker, loader and preprocessor. Source Code Translator

More information

Compiler Code Generation COMP360

Compiler Code Generation COMP360 Compiler Code Generation COMP360 Students who acquire large debts putting themselves through school are unlikely to think about changing society. When you trap people in a system of debt, they can t afford

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back

More information

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! Any questions about the syllabus?! Course Material available at www.cs.unic.ac.cy/ioanna! Next time reading assignment [ALSU07]

More information

Notes on the Exam. Question 1. Today. Comp 104:Operating Systems Concepts 11/05/2015. Revision Lectures (separate questions and answers)

Notes on the Exam. Question 1. Today. Comp 104:Operating Systems Concepts 11/05/2015. Revision Lectures (separate questions and answers) Comp 104:Operating Systems Concepts Revision Lectures (separate questions and answers) Today Here are a sample of questions that could appear in the exam Please LET ME KNOW if there are particular subjects

More information

COP 3402 Systems Software Syntax Analysis (Parser)

COP 3402 Systems Software Syntax Analysis (Parser) COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1 Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2 Lexical and Syntax Analysis

More information

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language. UNIT I LEXICAL ANALYSIS Translator: It is a program that translates one language to another Language. Source Code Translator Target Code 1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System

More information

Comp 204: Computer Systems and Their Implementation. Lecture 25a: Revision Lectures (separate questions and answers)

Comp 204: Computer Systems and Their Implementation. Lecture 25a: Revision Lectures (separate questions and answers) Comp 204: Computer Systems and Their Implementation Lecture 25a: Revision Lectures (separate questions and answers) 1 Today Here are a sample of questions that could appear in the exam Please LET ME KNOW

More information

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3 Programming Language Specification and Translation ICOM 4036 Fall 2009 Lecture 3 Some parts are Copyright 2004 Pearson Addison-Wesley. All rights reserved. 3-1 Language Specification and Translation Topics

More information

Question 1. Notes on the Exam. Today. Comp 104: Operating Systems Concepts 11/05/2015. Revision Lectures

Question 1. Notes on the Exam. Today. Comp 104: Operating Systems Concepts 11/05/2015. Revision Lectures Comp 104: Operating Systems Concepts Revision Lectures Today Here are a sample of questions that could appear in the exam Please LET ME KNOW if there are particular subjects you want to know about??? 1

More information

Part 5 Program Analysis Principles and Techniques

Part 5 Program Analysis Principles and Techniques 1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape

More information

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1 Defining Program Syntax Chapter Two Modern Programming Languages, 2nd ed. 1 Syntax And Semantics Programming language syntax: how programs look, their form and structure Syntax is defined using a kind

More information

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview COMP 181 Compilers Lecture 2 Overview September 7, 2006 Administrative Book? Hopefully: Compilers by Aho, Lam, Sethi, Ullman Mailing list Handouts? Programming assignments For next time, write a hello,

More information

Stating the obvious, people and computers do not speak the same language.

Stating the obvious, people and computers do not speak the same language. 3.4 SYSTEM SOFTWARE 3.4.3 TRANSLATION SOFTWARE INTRODUCTION Stating the obvious, people and computers do not speak the same language. People have to write programs in order to instruct a computer what

More information

COMPILER DESIGN LECTURE NOTES

COMPILER DESIGN LECTURE NOTES COMPILER DESIGN LECTURE NOTES UNIT -1 1.1 OVERVIEW OF LANGUAGE PROCESSING SYSTEM 1.2 Preprocessor A preprocessor produce input to compilers. They may perform the following functions. 1. Macro processing:

More information

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis? Semantic analysis and intermediate representations Which methods / formalisms are used in the various phases during the analysis? The task of this phase is to check the "static semantics" and generate

More information

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. More often than not, though, you ll want to use flex to generate a scanner that divides

More information

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Crafting a Compiler with C (II) Compiler V. S. Interpreter Crafting a Compiler with C (II) 資科系 林偉川 Compiler V S Interpreter Compilation - Translate high-level program to machine code Lexical Analyzer, Syntax Analyzer, Intermediate code generator(semantics Analyzer),

More information

The Structure of a Syntax-Directed Compiler

The Structure of a Syntax-Directed Compiler Source Program (Character Stream) Scanner Tokens Parser Abstract Syntax Tree Type Checker (AST) Decorated AST Translator Intermediate Representation Symbol Tables Optimizer (IR) IR Code Generator Target

More information

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target

More information

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1 Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1 1. Introduction Parsing is the task of Syntax Analysis Determining the syntax, or structure, of a program. The syntax is defined by the grammar rules

More information

Syntax. A. Bellaachia Page: 1

Syntax. A. Bellaachia Page: 1 Syntax 1. Objectives & Definitions... 2 2. Definitions... 3 3. Lexical Rules... 4 4. BNF: Formal Syntactic rules... 6 5. Syntax Diagrams... 9 6. EBNF: Extended BNF... 10 7. Example:... 11 8. BNF Statement

More information

Intermediate Code Generation

Intermediate Code Generation Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target

More information

Compilers. Lecture 2 Overview. (original slides by Sam

Compilers. Lecture 2 Overview. (original slides by Sam Compilers Lecture 2 Overview Yannis Smaragdakis, U. Athens Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Last time The compilation problem Source language High-level abstractions Easy

More information

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR Pune Vidyarthi Griha s COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR By Prof. Anand N. Gharu (Assistant Professor) PVGCOE Computer Dept.. 22nd Jan 2018 CONTENTS :- 1. Role of lexical analysis 2.

More information

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer: Theoretical Part Chapter one:- - What are the Phases of compiler? Six phases Scanner Parser Semantic Analyzer Source code optimizer Code generator Target Code Optimizer Three auxiliary components Literal

More information

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters : Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Scanner Parser Static Analyzer Intermediate Representation Front End Back End Compiler / Interpreter

More information

Syntax and Semantics

Syntax and Semantics Syntax and Semantics Syntax - The form or structure of the expressions, statements, and program units Semantics - The meaning of the expressions, statements, and program units Syntax Example: simple C

More information

UNIT I Programming Language Syntax and semantics. Kainjan Sanghavi

UNIT I Programming Language Syntax and semantics. Kainjan Sanghavi UNIT I Programming Language Syntax and semantics B y Kainjan Sanghavi Contents Language Definition Syntax Abstract and Concrete Syntax Concept of binding Language Definition Should enable a person or computer

More information

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis CSE450 Translation of Programming Languages Lecture 4: Syntax Analysis http://xkcd.com/859 Structure of a Today! Compiler Source Language Lexical Analyzer Syntax Analyzer Semantic Analyzer Int. Code Generator

More information

Syntax. In Text: Chapter 3

Syntax. In Text: Chapter 3 Syntax In Text: Chapter 3 1 Outline Syntax: Recognizer vs. generator BNF EBNF Chapter 3: Syntax and Semantics 2 Basic Definitions Syntax the form or structure of the expressions, statements, and program

More information

Describing Syntax and Semantics

Describing Syntax and Semantics Describing Syntax and Semantics Introduction Syntax: the form or structure of the expressions, statements, and program units Semantics: the meaning of the expressions, statements, and program units Syntax

More information

Theory and Compiling COMP360

Theory and Compiling COMP360 Theory and Compiling COMP360 It has been said that man is a rational animal. All my life I have been searching for evidence which could support this. Bertrand Russell Reading Read sections 2.1 3.2 in the

More information

Using an LALR(1) Parser Generator

Using an LALR(1) Parser Generator Using an LALR(1) Parser Generator Yacc is an LALR(1) parser generator Developed by S.C. Johnson and others at AT&T Bell Labs Yacc is an acronym for Yet another compiler compiler Yacc generates an integrated

More information

A simple syntax-directed

A simple syntax-directed Syntax-directed is a grammaroriented compiling technique Programming languages: Syntax: what its programs look like? Semantic: what its programs mean? 1 A simple syntax-directed Lexical Syntax Character

More information

ICOM 4036 Spring 2004

ICOM 4036 Spring 2004 Language Specification and Translation ICOM 4036 Spring 2004 Lecture 3 Copyright 2004 Pearson Addison-Wesley. All rights reserved. 3-1 Language Specification and Translation Topics Structure of a Compiler

More information

The Structure of a Syntax-Directed Compiler

The Structure of a Syntax-Directed Compiler Source Program (Character Stream) Scanner Tokens Parser Abstract Syntax Tree (AST) Type Checker Decorated AST Translator Intermediate Representation Symbol Tables Optimizer (IR) IR Code Generator Target

More information

CS 4201 Compilers 2014/2015 Handout: Lab 1

CS 4201 Compilers 2014/2015 Handout: Lab 1 CS 4201 Compilers 2014/2015 Handout: Lab 1 Lab Content: - What is compiler? - What is compilation? - Features of compiler - Compiler structure - Phases of compiler - Programs related to compilers - Some

More information

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc. Syntax Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal and exact or there will

More information

Programming Language Syntax and Analysis

Programming Language Syntax and Analysis Programming Language Syntax and Analysis 2017 Kwangman Ko (http://compiler.sangji.ac.kr, kkman@sangji.ac.kr) Dept. of Computer Engineering, Sangji University Introduction Syntax the form or structure of

More information

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1 Table of Contents About the Authors... iii Introduction... xvii Chapter 1: System Software... 1 1.1 Concept of System Software... 2 Types of Software Programs... 2 Software Programs and the Computing Machine...

More information

Languages and Compilers

Languages and Compilers Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 2012-13 3. Formal Languages, Grammars and Automata Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office:

More information

Chapter 3. Describing Syntax and Semantics

Chapter 3. Describing Syntax and Semantics Chapter 3 Describing Syntax and Semantics Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings of Programs:

More information

Chapter 4. Lexical and Syntax Analysis

Chapter 4. Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Chapter 4 Topics Introduction Lexical Analysis The Parsing Problem Recursive-Descent Parsing Bottom-Up Parsing Copyright 2012 Addison-Wesley. All rights reserved.

More information

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning

More information

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built) Programming languages must be precise Remember instructions This is unlike natural languages CS 315 Programming Languages Syntax Precision is required for syntax think of this as the format of the language

More information

Question Bank. 10CS63:Compiler Design

Question Bank. 10CS63:Compiler Design Question Bank 10CS63:Compiler Design 1.Determine whether the following regular expressions define the same language? (ab)* and a*b* 2.List the properties of an operator grammar 3. Is macro processing a

More information

A programming language requires two major definitions A simple one pass compiler

A programming language requires two major definitions A simple one pass compiler A programming language requires two major definitions A simple one pass compiler [Syntax: what the language looks like A context-free grammar written in BNF (Backus-Naur Form) usually suffices. [Semantics:

More information

Time : 1 Hour Max Marks : 30

Time : 1 Hour Max Marks : 30 Total No. of Questions : 6 P4890 B.E/ Insem.- 74 B.E ( Computer Engg) PRINCIPLES OF MODERN COMPILER DESIGN (2012 Pattern) (Semester I) Time : 1 Hour Max Marks : 30 Q.1 a) Explain need of symbol table with

More information

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End Architecture of Compilers, Interpreters : Organization of Programming Languages ource Analyzer Optimizer Code Generator Context Free Grammars Intermediate Representation Front End Back End Compiler / Interpreter

More information

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars CMSC 330: Organization of Programming Languages Context Free Grammars Where We Are Programming languages Ruby OCaml Implementing programming languages Scanner Uses regular expressions Finite automata Parser

More information

Reading Assignment. Scanner. Read Chapter 3 of Crafting a Compiler.

Reading Assignment. Scanner. Read Chapter 3 of Crafting a Compiler. Reading Assignment Source Program (Character Stream) Scanner Tokens Parser Abstract Syntax Tree (AST) Type Checker Decorated AST Read Chapter 3 of Crafting a Compiler. Translator Intermediate Representation

More information

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units Syntax - the form or structure of the expressions, statements, and program units Semantics - the meaning of the expressions, statements, and program units Who must use language definitions? 1. Other language

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

VIVA QUESTIONS WITH ANSWERS

VIVA QUESTIONS WITH ANSWERS VIVA QUESTIONS WITH ANSWERS 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the

More information

LANGUAGE PROCESSORS. Presented By: Prof. S.J. Soni, SPCE Visnagar.

LANGUAGE PROCESSORS. Presented By: Prof. S.J. Soni, SPCE Visnagar. LANGUAGE PROCESSORS Presented By: Prof. S.J. Soni, SPCE Visnagar. Introduction Language Processing activities arise due to the differences between the manner in which a software designer describes the

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

When do We Run a Compiler?

When do We Run a Compiler? When do We Run a Compiler? Prior to execution This is standard. We compile a program once, then use it repeatedly. At the start of each execution We can incorporate values known at the start of the run

More information

Lexical Scanning COMP360

Lexical Scanning COMP360 Lexical Scanning COMP360 Captain, we re being scanned. Spock Reading Read sections 2.1 3.2 in the textbook Regular Expression and FSA Assignment A new assignment has been posted on Blackboard It is due

More information

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

Examples of attributes: values of evaluated subtrees, type information, source file coordinates, 1 2 3 Attributes can be added to the grammar symbols, and program fragments can be added as semantic actions to the grammar, to form a syntax-directed translation scheme. Some attributes may be set by

More information

Compilers and Interpreters

Compilers and Interpreters Overview Roadmap Language Translators: Interpreters & Compilers Context of a compiler Phases of a compiler Compiler Construction tools Terminology How related to other CS Goals of a good compiler 1 Compilers

More information

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream

More information

Group A Assignment 3(2)

Group A Assignment 3(2) Group A Assignment 3(2) Att (2) Perm(3) Oral(5) Total(10) Sign Title of Assignment: Lexical analyzer using LEX. 3.1.1 Problem Definition: Lexical analyzer for sample language using LEX. 3.1.2 Perquisite:

More information

CSCE 314 Programming Languages

CSCE 314 Programming Languages CSCE 314 Programming Languages Syntactic Analysis Dr. Hyunyoung Lee 1 What Is a Programming Language? Language = syntax + semantics The syntax of a language is concerned with the form of a program: how

More information

Working of the Compilers

Working of the Compilers Working of the Compilers Manisha Yadav Nisha Thakran IT DEPARTMENT IT DEPARTMENT DCE,GURGAON DCE,GURGAON Abstract- The objective of the paper is to depict the working of the compilers that were designed

More information

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design PSD3A Principles of Compiler Design Unit : I-V 1 UNIT I - SYLLABUS Compiler Assembler Language Processing System Phases of Compiler Lexical Analyser Finite Automata NFA DFA Compiler Tools 2 Compiler -

More information

Syntax Intro and Overview. Syntax

Syntax Intro and Overview. Syntax Syntax Intro and Overview CS331 Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal

More information

Compiler Design (40-414)

Compiler Design (40-414) Compiler Design (40-414) Main Text Book: Compilers: Principles, Techniques & Tools, 2 nd ed., Aho, Lam, Sethi, and Ullman, 2007 Evaluation: Midterm Exam 35% Final Exam 35% Assignments and Quizzes 10% Project

More information

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; } Ex: The difference between Compiler and Interpreter The interpreter actually carries out the computations specified in the source program. In other words, the output of a compiler is a program, whereas

More information

2.2 Syntax Definition

2.2 Syntax Definition 42 CHAPTER 2. A SIMPLE SYNTAX-DIRECTED TRANSLATOR sequence of "three-address" instructions; a more complete example appears in Fig. 2.2. This form of intermediate code takes its name from instructions

More information

Chapter 3. Describing Syntax and Semantics ISBN

Chapter 3. Describing Syntax and Semantics ISBN Chapter 3 Describing Syntax and Semantics ISBN 0-321-49362-1 Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the

More information

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler. More detailed overview of compiler front end Structure of a compiler Today we ll take a quick look at typical parts of a compiler. This is to give a feeling for the overall structure. source program lexical

More information

Appendix A The DL Language

Appendix A The DL Language Appendix A The DL Language This appendix gives a description of the DL language used for many of the compiler examples in the book. DL is a simple high-level language, only operating on integer data, with

More information

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2 Formal Languages and Grammars Chapter 2: Sections 2.1 and 2.2 Formal Languages Basis for the design and implementation of programming languages Alphabet: finite set Σ of symbols String: finite sequence

More information

Habanero Extreme Scale Software Research Project

Habanero Extreme Scale Software Research Project Habanero Extreme Scale Software Research Project Comp215: Grammars Zoran Budimlić (Rice University) Grammar, which knows how to control even kings - Moliere So you know everything about regular expressions

More information

Undergraduate Compilers in a Day

Undergraduate Compilers in a Day Question of the Day Backpatching o.foo(); In Java, the address of foo() is often not known until runtime (due to dynamic class loading), so the method call requires a table lookup. After the first execution

More information

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table COMPILER CONSTRUCTION Lab 2 Symbol table LABS Lab 3 LR parsing and abstract syntax tree construction using ''bison' Lab 4 Semantic analysis (type checking) PHASES OF A COMPILER Source Program Lab 2 Symtab

More information

Chapter 3. Describing Syntax and Semantics

Chapter 3. Describing Syntax and Semantics Chapter 3 Describing Syntax and Semantics Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings of Programs:

More information

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens Concepts Introduced in Chapter 3 Lexical Analysis Regular Expressions (REs) Nondeterministic Finite Automata (NFA) Converting an RE to an NFA Deterministic Finite Automatic (DFA) Lexical Analysis Why separate

More information









