Compiler I. (dt. Übersetzer I) Prof. Dr. Uwe Kastens. Winter 2001/2002. Objectives

Size: px
Start display at page:

Download "Compiler I. (dt. Übersetzer I) Prof. Dr. Uwe Kastens. Winter 2001/2002. Objectives"

Transcription

1 CI-1 Compiler I (dt. Übersetzer I) Prof. Dr. Uwe Kastens Winter 2001/2002 Objectives CI-2 The participants are taught to understand fundamental techniques of language implementation, use generating tools and standard solutions, understand compiler construction as a systematic combination of algorithms, theories and software engineering methods for the solution of a precisely specified task, apply compiler techniques for languages other than programming languages. Lectures Forms of teaching: Tutorials Homeworks Exercises Running project

2 Lectures in English CI-3 Some agreements about giving lectures in English: I ll speak English unless someone asks me to explain something in German. Stop me or slow me down whenever you get lost. I don t speak as well as a native speaker; but I ll do my best... You may ask questions and give answers in English or in German. I ll prepare the slides in English. A German version is available. You ll have to learn to speak about the material in at least one of the two languages. You may vote which language to be used in the tutorials. You may chose German or English for the oral exam. Syllabus CI-4 Week Chapter Topic 1 Introduction Compiler tasks 2 Compiler structure 3 Lexical analysis Scanning, token representation 4 Syntactic analysis Recursive decent parsing 5 LR Parsing 6 Parser generators 7 Grammar design 8 Semantic analysis Attribute grammars 9 Attribute grammar specifications 10 Name analysis 11 Type analysis 12 Transformation Intermediate language, target trees 13 Target texts 14 Synthesis Overview 15 Summary

3 Prerequisites CI-5 from Lecture Topic here needed for Foundations of Programming Languages: 4 levels of language properties Compiler tasks, compiler structure Context-free grammars Scope rules Data types Lifetime, runtime stack Syntactic analysis Name analysis Type analysis Storage model, code generation Modeling: Finite automata Context-free grammars Lexical analysis Syntactic analysis References CI-6 Material for this course Compiler I: in German Übersetzer I (1999/2000): in English Compiler II: Modellierung: Grundlagen der Programmiersprachen: U. Kastens: Übersetzerbau, Handbuch der Informatik 3.3, Oldenbourg, 1990 (not available on the market anymore, available in the library of the University) W. M. Waite, L. R. Carter: An Introduction to Compiler Construction, Harper Collins, New York, 1993 W. M. Waite, G. Goos: Compiler Construction, Springer-Verlag, 1983 R. Wilhelm, D. Maurer: Übersetzerbau - Theorie, Konstruktion, Generierung, Springer-Verlag, 1992 A. Aho, R. Sethi, J. D. Ullman: Compilers - Principles, Techniques and Tools, Addison-Wesley, 1986 A. W. Appel: Modern Compiler Implementation in C, Cambridge University Press, 1997 (available for Java and for ML, too)

4 Course material in the Web CI-7 Commented slide in the course material CI-7a

5 What does a compiler compile? CI-8 A compiler transforms correct sentences of its source language into sentences of its target language such that their meaning is unchanged. Examples: Source language: Programming language C++ Programming language Java Programming language C++ Application language LaTeX Data base language (SQL) Target language: Machine language Sparc code Abstract machine Java Bytecode Programming language (source-to-source) C Application language HTML Data base system calls class Average { private: int sum, count; public: Average (void) { sum = 0; count = 0; } void Enter (int val) { sum = sum + val; count++; } float GetAverage (void) { return sum / count; } }; _Enter 7Averagei: pushl %ebp movl %esp,%ebp movl 8(%ebp),%edx movl 12(%ebp),%eax addl %eax,(%edx) incl 4(%edx) L6: movl %ebp,%esp popl %ebp ret What is compiled here? CI-9 class Average { private int sum, count; public Average () { sum = 0; count = 0; } void Enter (int val) { sum = sum + val; count++; } float GetAverage () { return sum / count; } }; : Enter: (int) --> void Access: [] Attribute Code (Length 49) Code: 21 Bytes Stackdepth: 3 Locals: 2 0: aload_0 1: aload_0 2: getfield cp4 5: iload_1 6: iadd 7: putfield cp4 10: aload_0 11: dup 12: getfield cp3 15: iconst_1 16: iadd

6 What is compiled here? CI-10 program Average; var sum, count: integer; aver: integer; procedure Enter (val: integer); begin sum := sum + val; count := count + 1; end; begin sum := 0; count := 0; Enter (5); Enter (7); aver := sum div count; end void ENTER_5 (char *slnk, int VAL_4) { {/* data definitions: */ /* executable code: */ { SUM_1 = (SUM_1)+(VAL_4); COUNT_2 = (COUNT_2)+(1); ; } }}/* ENTER_5 */ \documentstyle[12pt]{article} \begin{document} \section{introduction} This is a very short document. It just shows \begin{itemize} \item an item, and \item another item. \end{itemize} \end{document} %%Page: bop a Fc(1)81 b(in)n(tro)r(duction) y Fb(This)16 b(is)g(a)h(v)o(ery)e(short) i(do)q(cumen)o(t.)j(it)c(just)g (sho)o(ws) y Fa(\017)24 b Fb(an)17 b(item,) c(and) y Fa(\017)24 b Fb(another)17 b(item.) y(1)p eop Languages for specification and modeling CI-11 SDL (CCITT) Specification and Description Language: UML Unified Modeling Language:

7 Domain Specific Languages (DSL) CI-12 A language designed for a specific application domain. Application Generator: Implementation of a DSL by a program generator Examples: Simulation of mechatronic feedback systems Robot control Collecting data from instruments Testing car instruments Report generator for bibliographies: string name = int since = int cnt = 0; InString "Which author?"; InInt "Since which year?"; "\npapers of ", name, " since ", since, ":\n"; [ SELECT name <= Author && since <= Year; cnt = cnt + 1; Year, "\t", Title, "\n"; ] "\n", name, " published ", cnt, "papers.\n"; U. Kastens: Construction of Application Generators Using Eli, Workshop on Compiler Techniques for Application Domain Languages..., Linköping, April 1996 Programming languages as source or target languages Programming languages as source languages: Program analysis call graphs, control-flow graph, data dependencies, e. g. for the year 2000 problem Recognition of structures and patterns e. g. for Reengineering CI-13 Program languages as target languages: Specifications (SDL, OMT, UML) graphic modeling of structures DSL, Application generator => Compiler task: Source-to-source compilation

8 Semester project as running example CI-14 A Structure Generator We are going to develop a tool that implements record structures. In particular, the structure generator takes a set of record descriptions. Each specifies a set of named and typed fields. For each record a Java class declaration is to be generated. It contains a constructor method and access methods for the specified record fields. The tool will be used in an environment where field description are created by other tools, which for example analyze texts for the occurrence of certain phrases. Hence, the descriptions of fields may occur in arbitrary order, and the same field may be described more than once. The structure generator accumulates the field descriptions such that for each record a single class declaration is generated which has all fields of that record. Design a domain specific language. Implement an application generator for it. Apply all techniques of the course that are useful for the task. Meaning preserving transformation CI-15 A compiler transforms correct sentences of its source language into sentences of its target language such that their meaning is unchanged. Source language Language definition Meaning described for abstract machine Compilation same results Target language Machine description Execution A meaning is defined only for correct programs. Compiler task: Error handling The compiler analyses static properties of the program at compile time, e. g. definitions of Variables, types of expressions. Decides: Is the program compilable? Dynamic properties of the program are checked at runtime, e. g. indexing of arrays. Decides: Is the program executable? But in Java: Compilation of bytecode at runtime, just in time compilation (JIT)

9 Example: Tokens and structure CI-16 Character sequence int count = 0; double sum = 0.0; while (count<maxvect) { sum = sum+vect[count]; count++;} Tokens int count = 0; double sum = 0.0; while (count<maxvect) { sum = sum+vect[count]; count++;} Expressions Structure Declarations Statements Tokens Example: Names, types, generated code CI-17 int count = 0; double sum = 0.0; while (count<maxvect) { sum = sum+vect[count]; count++;} int k1: (count, local variable, int) k2: (sum, local variable, double) Names and types double int int boolean k3: (maxvect, member variable, int) k4: (vect, member variable, double array)... generated Bytecode 0 iconst_0 1 istore_1 2 dconst_0 3 dstore_2 4 goto 19 7 dload_2 8 getstatic #5 <vect[]> 11 iload_1 12 faload 13 f2d 14 dadd 15 dstore_2 16 iinc iload_1 20 getstatic #4 <maxvect> 23 if_icmplt 7

10 Language definition - Compiler task CI-18 Notation of tokens keywords, identifiers, literals formal definition: regular expressions lexical analysis Syntactic structure formal definition: context-free grammar syntactic analysis Static semantics binding names to program objects, typing rules usually defined by informal texts semantic analysis, transformation Dynamic semantics semantics, effect of the execution of constructs usually defined by informal texts in terms of an abstract machine Definition of the target language (machine) transformation, code generation transformation, code generation assembly Compiler tasks CI-19 Structuring Lexical analysis Syntactic analysis Scanning Conversion Parsing Tree construction Translation Semantic analysis Transformation Name analysis Type analysis Data mapping Action mapping Execution-order Encoding Code generation Assembly Register allocation Instruction selection Instruction encoding Internal Addressing External Addressing

11 Compiler structure and interfaces CI-20 Source program Token sequence Abstract program tree Lexical analysis Syntactic analysis Semantic analysis Transformation Analysis (frontend) Intermediate language Optimization Synthesis (backend) Code generation Abstract machine program Target program Assembly Peephole optimization Software qualities of the compiler CI-21 Correctness Translate correct programs correctly. Reject wrong programs and give error messages Efficiency Storage and time used by the compiler Code efficiency Storage and time used by the generated code Compiler task: Optimization User support Compiler task: Error handling (recognition, message, recovery) Robustness Give a reasonable reaction on every input

12 Strategies for compiler construction CI-22 Obey exactly to the language definition Use generating tools Use standard components Apply standard methods Validate the compiler against a test suite Verify components of the compiler Generators CI-23 Pattern: Environment Specification Generator Implemented algorithm Interfaces Typical compiler tasks solved by generators: Regular expressions Scanner generator Finite automaton Context-free grammar Parser generator Stack automaton Attribute grammar Attribute evaluator Tree walking algorithm generator Code patterns Code selection Pattern matching generator integrated system Eli: Specifications Cooperating generators Compiler

13 Environment of compilers CI-24 Compilation units Libraries Source programs Preprocessor Executable program Source program Interactive commands Core dump Code files Compiler Input Debugger Output Linker Executable program Source program Input Interpreter Analysis part abstract machine Output Compilation and interpretation of Java programs CI-25 Source modules load needed class files dynamically - local or via Internet Java Compiler Class files in Java Bytecode (intermediate language) Class loader Bytecode prozessor in software Just-In-Time Compiler (JIT) Interpreter Java Virtual Machine JVM Input Machine code Output

14 Lexical Analysis Ci-26 Input: Program represented by a sequence of characters Tasks: Compiler modul: Input reader Recognize and classify tokens Scanner (central phase, finite state machine) Skip irrelevant characters Encode tokens: Store token information Conversion Identifier modul Literal modules String storage Output: Program represented by a sequence of encoded tokens Uniform encoding of tokens by triples: Representation of tokens CI-27 Syntax code attribute source position terminal code of value or reference to locate error messages the concrete syntax into data module of later compiler phases Examples: double sum = 5.6e-5; while (count < maxvect) { sum = sum + vect[count]; DoubleToken 12, 1 Ident , 8 Assign 12, 12 FloatNumber 16 12, 14 Semicolon 12, 20 WhileToken 13, 1 OpenParen 13, 7 Ident , 8 LessOpr 13, 14 Ident , 16 CloseParen 13, 23 OpenBracket 14, 1 Ident , 3

15 Specification of token notations Example: identifiers Ident ::= Letter X X ::= Letter X X ::= Digit X X ::= regular grammar syntax diagram Ident = Letter (Letter Digit)* regular expression Ident: Letter Digit CI-28 acceptor: finite state machine Letter Letter 1 2 Letter Digit Regular expressions mapped to syntax diagrams CI-29 Transformation rules: regular expression A syntax diagram for A empty a a empty single character BC B C sequence B B C alternative C B* repetition, may be empty B + B B repetition, non-empty

16 Construction of deterministic finite state machines CI-30 Syntax diagram nodes, arcs set of nodes m q sets of nodes m q and m r connected with the same character a deterministic finite state machine transitions, states state q transitions q ---> r with character a Construction: 1. enumerate nodes; exit of the diagram gets the number 0 2. initial set of nodes m 1 contains all nodes initial state 1 that are reachable from the begin of the diagram 3. construct new sets of nodes (states) and transitions: For a character a and a set m q containing node k create set m r with all nodes n, according to the following schema: k m q n m r k m q for a create a 4. repeat step 3 until no new sets of nodes can be created 5. a state q is a final state iff 0 is in m q. n m r Example: Floating point numbers in Pascal CI-31 Syntax diagram z. z E z {1} {1, 2, 4} {3} {3, 4, 0} {5, 6, 7} {7} {7, 0} z z. E z z E + - z z z + z. z E z 7 - z z E z deterministic finite state machine z

17 Composition of token automata CI-32 Construct one finite state machine for each token. Compose them forming a single one: Identify the initial states of the single automata and identical structures evolving from there (transitions with the same character and states). Keep the final states of single automata distinct, they classify the tokens. Add automata for comments and irrelevant characters (white space) a character classes: a all but * c all but * or ) d digits l all letters but E s + - * < > ;, ) [ ] ^ b blank tab newline l, E, d _ * * * c 17 / 18 ( eof s / ) = : 0 13 = 15 l, E, d l, E 6. d = d b d d Example: tokens of Lax [Waite, Goos: Compiler Construction] E E d d d , - d Rule of the longest match CI-33 An automaton may contain transitions from final states: When does the automaton stop? Rule of the longest match: The automaton continues as long as there is a transition with the next character. After having stopped it sets back to the most recently passed final state. If no final state has been passed an error message is issued. Consequence: Some kinds of tokens have to be separated explicitly. Check the concrete grammar for tokens that may occur adjacent!

18 Scanner: Aspects of implementation CI-34 Runtime is proportional to the number of characters in the program Operations per character must be fast - otherwise the Scanner dominates compilation time Table driven automata are too slow: Loop interprets table, 2-dimensional array access, branches Directly programmed automata is faster; transform transitions into control flow: sequence repeat loop branch 2006 bei Prof. Dr. Uwe Kastens Fast loops for sequences of irrelevant blanks. Implementation of character classes: bit pattern or indexing - avoid slow operations with sets of characters. Do not copy characters from input buffer - maintain a pointer into the buffer, instead. Identifier module and literal modules CI-35 Uniform interface for all scanner support modules: Input parameters: pointer to token text and its length; Output parameters: syntax code, attribute Identifier module encodes identifier occurrences bijective (1:1), and recognizes keywords Implementation: hash vector, extensible table, collision lists Literal modules for floating point numbers, integral numbers, strings Variants for representation in memory: token text; value converted into compiler data; value converted into target data Caution: Avoid overflow on conversion! Cross compiler: compiler representation may differ from target representation 2006 bei Prof. Dr. Uwe Kastens Character string memory: stores strings without limits on their lengths, used by the identifier module and the literal modules

19 Scanner generators CI-36 generate the central function of lexical analysis GLA University of Colorado, Boulder; component of the Eli system Lex Unix standard tool Flex Successor of Lex Rex GMD Karlsruhe Token specification: regular expressions GLA library of precoined specifications; recognizers for some tokens may be programmed Lex, Flex, Rex transitions may be made conditional Interface: GLA as described in this chapter; cooperates with other Eli components Lex, Flex, Rex actions may be associated with tokens (statement sequences) interface to parser generator Yacc Implementation: GLA Lex, Flex, Rex Rex Flex, Rex directly programmed automaton in C table-driven automaton in C table-driven automaton in C or in Modula-2 faster, smaller implementations than generated by Lex Syntactic analysis CI-37 Input: token sequence Tasks: Parsing: construct derivation according to concrete syntax, Tree construction according to abstract syntax, Error handling (detection, message, recovery) Result: abstract program tree Compiler module parser: deterministic stack automaton, augmented by actions for tree construction top-down parsers: leftmost derivation; tree construction top-down or bottom-up bottom-up parsers: rightmost derivation backwards; tree construction bottom-up Abstract program tree (condensed derivation tree): represented by a data structure in memory for the translation phase to operate on, linear sequence of nodes on a file (costly in runtime), sequence of calls of functions of the translation phase.

20 Concrete and abstract syntax CI-38 concrete syntax context-free grammar defines the structure of source programs unambigous specifies derivation and parser abstract syntax context-free grammar defines abstract program trees usually ambiguous translation phase is based on it parser actions specify the ---> tree construction some chain productions only for syntactic purposekeep only semantically relevant ones Expr ::= Fact have no action no node created symbols of syntactic chain productions comprised in symbol classes Exp={Expr,Fact} same action at structural equivalent productions: Expr ::= Expr AddOpr Fact &BinEx Fact ::= Fact MulOpr Opd &BinEx terminal symbols given the concrete syntax and the actions and keep only semantically relevant ones as tree nodes the symbol classes the abstract syntax can be generated Example: concrete expression grammar CI-39 name production action p1: Expr ::= Expr AddOpr Fact BinEx p2: Expr ::= Fact p3: Fact ::= Fact MulOpr Opd BinEx p4: Fact ::= Opd p5: Opd ::= '(' Expr ')' Expr p6: Opd ::= Ident IdEx p2 p7: AddOpr ::= '+' PlusOpr p8: AddOpr ::= '-' MinusOpr p3 Fact p9: MulOpr ::= '*' TimesOpr Fact MulOpr p10: MulOpr ::= '/' DivOpr p4 p9 derivation tree for a * (b + c) Opd p5 Opd * ( Expr ) p6 a Expr AddOpr p2 p7 Fact + p4 Opd p6 b p1 Fact p4 Opd p6 c

21 Example: abstract expression grammar CI-40 name production BinEx: Exp ::= Exp BinOpr Exp IdEx: Exp ::= Ident PlusOpr: BinOpr ::= '+' MinusOpr: BinOpr ::= '-' TimesOpr: BinOpr ::= '*' DivOpr: BinOpr ::= '/' abstract program tree for a * (b + c) BinEx Exp IdEx a Exp BinOpr Exp TimesOpr BinEx Exp BinOpr Exp * IdEx PlusOpr IdEx b + c symbol classes: Exp = { Expr, Fact, Opd }, BinOpr = { AddOpr, MulOpr } Actions of the concrete syntax: productions of the abstract syntax to create tree nodes for no action at a concrete chain production: no tree node is created Recursive descent parser CI-41 top-down (construction of the derivation tree), predictive method Sytematic transformation of a context-free grammar into a set of functions: non-terminal symbol X function X alternative productions for X branches in the function body decision set of production pi decision for branch pi non-terminal occurrence X ::=... Y... function call Y() terminal occurrence X ::=... t... accept a token t an read the next token Example: p1: Stmt ::= Variable ':=' Expr p2: Stmt ::= 'while' Expr 'do' Stmt Function: void Stmt () { switch (CurrSymbol) { case decision set for p1: Variable(); accept(assignsym); Expr(); break; default: Fehlerbehandlung(); } } case decision set for p2: accept(whilesym); Expr(); accept(dosym); Stmt(); break;

22 Grammar conditions for recursive descent CI-42 A context-free grammar is strong LL(1), if for any pair of productions that have the same symbol on their left-hand sides, the decision sets are disjoint: productions: A ::= u A ::= v decision sets: First (u Follow(A)) First (v Follow(A)) = First set and follow set: First (u) := { t T v V* exists and a derivation u * tv} and ε First (u) if u * ε exists Follow (A) := { t T u,v V* exist, A N and a derivation S * uavsuch that t First (v) } Example: production decision set p1: Prog ::= Block # begin p2: Block ::= begin Decls Stmts end begin p3: Decls ::= Decl ; Decls new p4: Decls ::= Ident begin p5: Decls ::= new Ident new p6: Stmts ::= Stmts ; Stmt begin Ident p7: Stmts ::= Stmt begin Ident p8: Stmt ::= Block begin p9: Stmt ::= Ident := Ident Ident non-terminal X First(X) Follow(X) Prog begin Block begin # ; end Decls ε new Ident begin Decl new ; Stmts begin Ident ; end Stmt begin Ident ; end Grammar transformations for LL(1) CI-43 Consequences of strong LL(1) condition: A strong LL(1) grammar can not have alternative productions that begin with the same symbols productions that are directly or indirectly left-recursive. Simple grammar transformations that keep the defined language invariant: left-factorization: non-ll(1) productions u, v, w V* X N does not occur in the A ::= v u original grammar A ::= v w transformed A ::= v X X ::= u X ::= w elimination of direct recursion : A ::= A u A ::= v A ::= v X X ::= u X X ::= 2006 bei Prof. Dr. Uwe Kastens EBNF constructs can avoid violation of strong LL(1) condition: for example repetition of u: A ::= v ( u )* w additional condition: First(u) First(w Follow(A)) = branch in the function body: v while (CurrToken in First(u)) { u } w correspondingly for EBNF constructs u +, [u]

23 Comparison: top-down vs. bottom-up CI-44 Information a stack automata has when it decides to apply production A ::= x : top-down, predictive bottom-up leftmost derivation rightmost derivation backwards contents of the stack u A x v direction of tree construction u A x input accepted k lookahead input accepted A bottom-up parser has seen more of the input when it decides to apply a production. Consequence: bottom-up parsers and their grammar classes are more powerful. k lookahead LR(1) automata CI-45 LR(k) grammars introduced 1965 by Donald Knuth; non-practical until subclasses were defined. LR parsers construct the derivation tree bottom-up, a right-derivation backwards. LR(k) grammar condition can not be checked directly, but a context-free grammar is LR(k), iff the (canonical) LR(k) automaton is deterministic. We consider only 1 token lookahead: LR(1). The stacks of LR(k) (and LL(k)) automata contain states. The construction of LR and LL states is based on the notion of items (also called situations): An item represents the progress of analysis with respect to one production: [ A ::= u. v R ] z. B. [ B ::= (. D ; S ) {#}]. position of analysis R expected right context, i. e. a set of terminals which may follow after the application of the complete production. (for general k: R contains terminal sequences not longer than k) Reduce item: [ A ::= u v. R ] z. B. [ B ::= ( D ; S ). {#}] characterizes a reduction using this production if the next input token is in R. Each state of an automaton represents LL: one item LR: a set of items

24 LR(1) states and operations CI-46 A state of an LR automaton represents a set of items Each item represents a way in which analysis may proceed from that state. A shift transition is made under a token read from input or a non-terminal symbol obtained from a preceding reduction. The state is pushed. A reduction is made according to a reduce item. n states are popped for a production of length n. 3 a 2 4 B ::= (. D ; S ) {#} D ::=. D ; a {;} D ::=. a {;} D B ::= ( D. ; S ) {#} D ::= D. ; a {;} D ::= a. {;} red. p3 Operations: shift read and push the next state on the stack reduce reduce with a certain production, pop n states from the stack error error recognized, report it, recover stop input accepted Example for a LR(1) automaton CI bei Prof. Dr. Uwe Kastens Grammar: p1 B ::= ( D ; S ) p2 D ::= D ; a p3 D ::= a p4 S ::= b ; S p5 S ::= b B ::=. ( D ; S ) {#} ( B ::= (. D ; S ) {#} D ::=. D ; a {;} D ::=. a {;} D B ::= ( D. ; S ) {#} D ::= D. ; a {;} B ::= ( D ;. S ) {#} D ::= D ;. a {;} S ::=. b ; S {)} S ::=. b {)} B ::= ( D ; S. ) {#} ; S ) B ::= ( D ; S ). {#} red. p1, stop a b a b 3 D ::= a. {;} D ::= D ; a. {;} S ::= b. ; S {)} S ::= b. {)} S ::= b ;. S {)} S ::=. b ; S {)} S ::=. b {)} S ::= b ; S. {)} ; S red. p3 red. p2 red. p5 red. p4

25 Construction of LR(1) automata CI-48 Create the start state; create transitions and states as long as new ones can be created. Transitive closure is to be applied to each state: If [ A ::= u. B v R ] is in state q, with the analysis position before a non-terminal B, then for each production B ::= w [ B ::=. w First (v R) ] has to be added to state q. before: after: 2 B ::= (. D ; S ) {#} B ::= (. D ; S ) {#} D ::=. D ; a {;} D ::=. a {;} Start state: Closure of [ S ::=. u {#} ] S ::= u is the unique start production, # is an artificial end symbol (eof) 1 B ::=. ( D ; S ) {#} Successor states: For each symbol x (terminal or non-terminal), which occurs in some items after the analysis position, a transition is created to a successor state. That contains a corresponding item with the analysis position advanced behind the x occurrence. 4 B ::= ( D. ; S ) {#} D ::= D. ; a {;} 2 B ::= (. D ; S ) {#} D ::=. D ; a {;} D ::=. a {;} D a 3 D ::= a. {;} shift x (terminal or non-terminal): from current state q under x into thesuccessor state q, push q reduce p: apply production p B ::= u, pop as many states, as there are symbols in u, from the new current state make a shift with B error: the current state has no transition under the next input token, issue a message and recover stop: recuce start production, see # in the input Operations of the LR(1) automaton Example: stack CI-49 input reduction 1 ( a ; a ; b ; b ) # 1 2 a ; a ; b ; b ) # ; a ; b ; b ) # p3 1 2 ; a ; b ; b ) # ; a ; b ; b ) # a ; b ; b ) # ; b ; b ) # p2 1 2 ; b ; b ) # ; b ; b ) # b ; b ) # ; b ) # b ) # ) # p ) # ) # p ) # ) # # p1 1 #

26 LR conflicts CI-50 An LR(1) automaton that has conflicts is not deterministic. Its grammar is not LR(1); correspondingly defined for any other LR class. 2 kinds of conflicts: reduce-reduce conflict: A state contains two reduce items, the right context sets of which are not disjoint: shift-reduce conflict: A state contains a shift item with the analysis position in front of a t and a reduce item with t in its right context set.... A ::= u. R1 B ::= v. R A ::= u.t v R1 B ::= w. R2... R1, R2 not disjoint t R2 Shift-reduce conflict for dangling else ambiguity CI bei Prof. Dr. Uwe Kastens S ::=. Stmt {#} Stmt ::=. if... then Stmt {#} Stmt ::=. if... then Stmt else Stmt {#} Stmt ::=. a {#} if... then Stmt ::= if... then. Stmt {#} Stmt ::= if... then. Stmt else Stmt {#} Stmt ::=. if... then Stmt {# else} Stmt ::=. if... then Stmt else Stmt {# else} Stmt ::=. a {# else} if... then Stmt ::= if... then. Stmt Stmt ::= if... then. Stmt else Stmt Stmt ::=. if... then Stmt Stmt ::=. if... then Stmt else Stmt Stmt ::=. a Stmt Stmt ::= if... then Stmt. Stmt ::= if... then Stmt. else Stmt {# else} {# else} {# else} {# else} {# else} {# else} {# else} Stmt a Stmt if a a else shift-reduce conflict

27 Simplified LR grammar classes CI-52 LR(1): too many states for practical use Reason: right-contexts distinguish many states Strategy: simplify right-contexts sets, fewer states, grammar classes are less powerful Grammar hierarchy: (strict inclusions) context-free LR(0): all items without right-context Consequence: reduce items only in singleton sets SLR(1): LR(0) states; in reduce items use larger right-context sets for decision: [ A ::= u. Follow (A) ] LALR(1): identify LR(1) states if their items differ only in their right-context sets, unite the sets for those items; yields the states of the LR(0) automaton augmented by the "exact LR(1) right-context. State-of-the-art parser generators accept LALR(1) LL(k) strong LL(1) = LL(1) unambiguous LR(k) LR(1) LALR(1) SLR(1) LR(0) Implementation of LR automata CI-53 Table-driven: states terminals sq rp e ~ nonterminals sq ~ sq: shift into state q rp: reduce production p e: error ~: never reached Compress tables: merge rows or columns that differ only in irrelevant entries; method: graph coloring extract a separate error matrix (bit matrix); increases the chances for merging normalize the values of rows or columns; yields smaller domain; supports merging eliminate LR(0) reduce states; new operation in predecessor state: shift-reduce eliminates about 30% of the states in practical cases About 10-20% of the original table sizes can be achieved! Directly programmed LR-automata are possible - but usually too large.

28 Error handling: general criteria CI-54 recognize error as early as possible LL and LR can do that report the symptom in terms of the source text continue parsing short after the error position avoid avalanche errors build a tree that has a correct structure do not backtrack, do not undo actions no runtime penalty for correct programs Error position CI-55 Error recovery: Means that are taken by the parser after recognition of a syntactic error in order to continue parsing Correct prefix: The token sequence w T* is a correct prefix in the language L(G), if there is an u T* such that w u L(G); i. e. w can be extended to a sentence in L(G). Error position: t is the (first) error position in the input w t x, where t T and w, x T*, if w is a correct prefix in L(G) and w t is not a correct prefix. Example: int compute (int i) { a = i * / c; return i;} w t LL and LR parsers recognize an error at the error position; they can not accept t in the current state.

29 Error recovery CI-56 Continuation point: The token d at or behind the error position t such that parsing of the input continues at d. Error repair with respect to a consistent derivation - regardless the intension of the programmer! Let the input be w t x with the error position at t and let w t x = w y d z, then the recovery (conceptually) deletes y and inserts v, such that w v d is a correct prefix in L(G), with d T and w, y, v, z T*. Examples: w y d z w yd z w y d z a = i * / c;... a = i * c;... a = i * / c;... a = i *e/ c;... a = i * / c;... a = i * e ;... delete / insert error id. e delete / c and insert error id. e Recovery method: simulated continuation CI-57 Problem: Determine a continuation point close to the error position and reach it. Idea: Use parse stack to determine a set of tokens as potential continuation points. Steps of the method: 1. Save the contents of the parse stack when an error is recognized. Skip the error token. 2. Compute a set D T of tokens that may be used as continuation point (anchor set) Let a modified parser run to completion: Instead of reading a token from input it is inserted into D; (modification given below) 3. Find a continuation point d: Skip input tokens until a token of D is found. 4. Reach the continuation point d: Restore the saved parser stack as the current stack. Perform dedicated transitions until d is acceptable. Instead of reading tokens (conceptually) insert tokens. Thus a correct prefix is constructed. 5. Continue normal parsing. Augment parser construction for steps 2 and 4: For each parser state select a transition and its token, such that the parser empties its stack and terminates as fast as possible. This selection can be generated automatically. The quality of the recovery can be improved by influence on the computation of D.

30 Parser generators PGS Univ. Karlsruhe; in Eli LALR(1), table-driven Cola Univ. Paderborn; in Eli LALR(1), optional: table-driven or directly programmed Lalr Univ. / GMD Karlsruhe LALR(1), table-driven Yacc Unix tool LALR(1), table-driven Bison Gnu LALR(1), table-driven Llgen Amsterdam Compiler Kit LL(1), recursive descent Deer Univ. Colorado, Bouder LL(1), recursive descent Form of grammar specification: EBNF: Cola, PGS, Lalr; BNF: Yacc, Bison Error recovery: simulated continuation, automatically generated: Cola, PGS, Lalr error productions, hand-specified: Yacc, Bison Actions: statements in the implementation language at the end of productions: Yacc, Bison anywhere in productions: Cola, PGS, Lalr Conflict resolution: modification of states (reduce if...) Cola, PGS, Lalr order of productions: Yacc, Bison rules for precedence and associativity: Yacc, Bison Implementation languages: C: Cola, Yacc, Bison C, Pascal, Modula-2, Ada: PGS, Lalr CI-58 Design of concrete grammars CI-59 Objectives The concrete grammars for parsing is parsable - fulfills the grammar condition of the chosen parser generator; specifies the intended language - or a small super set of it; is provable related to the documented grammar; can be mapped to a suitable abstract grammar.

31 Grammar design for an existing language CI-60 Take the grammar of the language specification literally. Only conservative modifications for parsability or for mapping to abstract syntax. Describe any modification. (see ANSI C Specification in the Eli system description Java language specification (1996): Specification grammar is not LALR(1). 5 problems are described and how to solve them. Ada language specification (1983): Specification grammar is LALR(1) - requirement of the language competition ANSI C, C++: several ambiguities and LALR(1) conflicts, e.g. dangling else, typedef problem : A (*B); is a declaration of variable B, if A is a type name, otherwise it is a call of function A Grammar design together with language design CI-61 Read grammars before writing a new grammar. Apply grammar patterns systematically (cf. GdP-2.5, GdP-2.8) repetitions optional constructs precedence, associativity of operators Syntactic structure should reflect semantic structure: E. g. a range in the sense of scope rules should be represented by a single subtree of the derivation tree (of the abstract tree). Violated in Pascal: functiondeclaration ::= functionheading block functionheading ::= function identifier formalparameters : resulttype ; formalparameters together with block form a range, but identifier does not belong to it

32 Syntactic restrictions versus semantic conditions CI-62 Express a restriction syntactically only if it can be completely covered with reasonable complexity: Restriction can not be decided syntactically: e.g. type check in expressions: BoolExpression ::= IntExpression < IntExpression Restriction can not always be decided syntactically: e. g. disallow array type to be used as function result Type ::= ArrayType NonArrayType Identifier ResultType ::= NonArrayType If a type identifier may specify an array type, a semantic condition is needed, anyhow Syntactic restriction is unreasonable complex: e. g. distinction of compile-time expressions from ordinary expressions requires duplication of the expression syntax. Reasons of LALR(1) conflicts CI-63 context-free unambiguous LR(k) LR(1) LALR(1) Grammar condition does not hold: ambiguous unbounded lookahead needed fixed length lookahead > 1 needed merge of LR(1) states introduces conflicts most cases rare cases LALR(1) parser generator can not distinguish these cases.

33 Eliminate ambiguities unite syntactic constructs - distinguish them semantically CI-64 Examples: Java: ClassOrInterfaceType ::= ClassType InterfaceType InterfaceType ::= TypeName ClassType ::= TypeName replace first production by ClassOrInterfaceType ::= TypeName semantic analysis distinguishes between class type and interface type Pascal: factor ::= variable... functiondesignator variable ::= entirevariable... entirevariable ::= variableidentifier variableidentifier ::= identifier (**) functiondesignator ::= functionidentifier (*) functionidentifer ( actualparameters ) functionidentifier ::= identifier eliminate marked (*) alternative semantic analysis checks whether (**) is a function identifier Unbounded lookahead The decision for a reduction is determined by a distinguishing token that may be arbitrarily far to the right: Example, forward declarations as could have been defined in Pascal: functiondeclaration ::= function forwardident formalparameters : resulttype ; forward function functionident formalparameters : resulttype ; block The distinction between forwardident and functionident would require to see the forward or the begin token. Replace forwardident and functionident by the same nonterminal; distinguish semantically. CI bei Prof. Dr. Uwe Kastens

34 LR(1) but not LALR(1) CI-66 Identification of LR(1) states causes non-disjoint right-context sets. Artificial example: 2006 bei Prof. Dr. Uwe Kastens Grammar: Z ::= S S ::= A a S ::= B c S ::= b A c S ::= b B a A ::= d. B ::= d. LR(1) states Z ::=. S {#} S ::=. A a {#} S ::=. B c {#} S ::=. b A c {#} S ::=. b B a {#} A ::=. d {a} B ::=. d {c} b S ::= b. A c {#} S ::= b. B a {#} A ::=. d {c} B ::=. d {a} d d A ::= d. B ::= d. identified states A ::= d. B ::= d. {a} {c} {c} {a} Avoid the distinction between A and B - at least in one of the contexts. LALR(1) state A ::= d. {a, c} B ::= d. {a, c} 4. Semantic analysis and transformation CI-67 Input: abstract program tree Tasks: name analysis properties of program entities type analysis, operator identification transformation Compiler module: environment module definition module signature module tree generator Output: target tree, intermediate code, target program in case of source-to-source Standard implementations and generators for compiler modules Operations of the compiler modules are called at nodes of the abstract program tree Model: Specification: generated: dependent computations in trees attribute grammars tree walking algorithm that calls operations in specified contexts and in an admissable order

35 4.1 Attribute grammars CI-68 Attribute grammar (AG) specifies dependent computations in the abstract program tree declarative: explicit dependencies only; a suitable order of execution is computed Computations solve the tasks of semantic analysis and transformation Generator produces a plan for tree walks that execute calls of the computations, such that the specified dependencies are obeyed, computed values are propagated through the tree Result: attribute evaluator; applicable for any tree specified by the AG Example: attribute grammar tree with dependent attributes evaluated Decls size 16 RULE Decls ::= Decls Decl COMPUTE Decls[1].size = Add (Decls[2].size, Decl.size); RULE Decls ::= Decl COMPUTE Decls.size = Decl.size; RULE Decl ::= Type Name COMPUTE Decl.size =...; Decls size 4 Decl size 4 Decls size 12 Decl size 8 Decl size 4 Basic concepts of attribute grammars CI-69 An AG specifies computations in tree: expressed by computations associated to productions of the abstract syntax RULE p: Y ::= u COMPUTE f(...); g(...); computations f(...) and g(...) are executed in every tree context of type p An AG specifies dependencies between computations: expressed by attributes associated to grammar symbols RULE p: X ::= u Y v COMPUTE X.b = f(y.a); Y.a = g(...); post-condition pre-condition f(y.a) uses the result of g(...); hence Y.a=g(...) will be executed before f(y.a) dependent computations in adjacent contexts: RULE r: X ::= v Y w COMPUTE X.b = f(y.a); RULE p: Y ::= u COMPUTE Y.a = g(...); attributes may specify dependencies without propagating any value: X.GotType = ResetTypeOf(...); Y.Type = GetTypeOf(...) <- X.GotType; ResetTypeOf will be called before GetTypeOf

36 Definition of attribute grammars CI-69a An attribute grammar is defined by a context-free grammar G, (abstract syntax, tree grammar) for each symbol X of G a set of attributes A(X), written X.a if a A(X) for each production (rule) p of G a set of computations of one of the forms X.a = f (... Y.b... ) or g (... Y.b... ) where X and Y occur in p Consistency and completeness of an AG: Each A(X) is partitioned into two disjoint subsets: AI(X) and AS(X) AI(X): inherited attributes are computed in rules p where X is on the right-hand side of p AS(X): synthesized attributes are computed in rules p where X is on the left-hand side of p Each rule p: X ::=... Y... has exactly one computation for all attributes of AS(X), and for all attributes of AI(Y), for all symbol occurrences on the right-hand side of p AG Example: Compute expression values CI-69b The AG specifies: The value of an expression is computed and printed: ATTR value: int; RULE: Root ::= Expr COMPUTE printf ("value is %d\n", Expr.value); TERM Number: int; RULE: Expr ::= Number COMPUTE Expr.value = Number; SYMBOL Opr: left, right: int; RULE: Opr ::= '+' COMPUTE Opr.value = ADD (Opr.left, Opr.right); RULE: Opr ::= '*' COMPUTE Opr.value = MUL (Opr.left, Opr.right); RULE: Expr ::= Expr Opr Expr COMPUTE Expr[1].value = Opr.value; Opr.left = Expr[2].value; Opr.right = Expr[3].value;

37 AG Binary numbers CI-70 Attributes: L.v, B.v value L.lg number of digits in the sequence L L.s, B.s scaling of B or the least significant digit of L RULE p1: D ::= L '.' L COMPUTE D.v = ADD (L[1].v, L[2].v); L[1].s = 0; L[2].s = NEG (L[2].lg); RULE p2: L ::= L B COMPUTE L[1].v = ADD (L[2].v, B.v); B.s = L[1].s; L[2].s = ADD (L[1].s, 1); L[1].lg = ADD (L[2].lg, 1); RULE p3: L ::= B COMPUTE L.v = B.v; B.s = L.s; L.lg = 1; RULE p4: B ::= '0' COMPUTE B.v = 0; RULE p5: B ::= '1' COMPUTE B.v = Power2 (B.s); An attributed tree for AG Binary numbers CI-71 p1 D 5.25 p2 L L p dependency 1 L 2 4 p2 2 L 1 4 p3 B p B B 0 0 p4 p5 L 1 p3 B p B p attributes: D v s L lg v s B v

38 Dependency analysis for AGs CI-72 2 disjoint sets of attributes for each symbol X: AI (X) : inherited (dt. erworben), computed in upper contexts of X AS (X): synthesized (dt. abgeleitet), computed in lower contexts of X. y upper context of X p: Y ::= u X v dependencies between attributes AI (X,1) AI (X,2) Objective: Partition of attribute sets, such that u lower context of X q : X ::= w AS (X,1) AS (X,2) v context switch on tree walk AI (X, i) is computed before the i-th visit of X AS (X, i) is computed during the i-th visit of X w Necessary precondition for the existence of such a partition: No node in any tree has direct or indirect dependencies that contradict the evaluation order of the sequence of sets: AI (X, 1), AS (X, 1),..., AI (X, k), AS (X, k) Dependency graphs for AG Binary numbers CI-73 p1 D v L lg s v B s v s s p3 p4 L lg v L lg v B s v p2 L lg v s B s v L lg s v indirect dependency B s v p5

39 Construction of attribute evaluators CI-74 For a given attribute grammar an attribute evaluator is constructed: It is applicable to any tree that obeys the abstract syntax specified in the rules of the AG. It performs a tree walk and executes computations when visiting a context for which they are specified. The execution order obeys the attribute dependencies. Pass-oriented strategies for the tree walk: k times depth-first left-to-right k times depth-first alternatingly left-to-right / right-to left once bottom-up AG class LAG (k) AAG (k) SAG The attribute dependencies of the AG are checked whether the desired pass-oriented strategy is applicable; see LAG(k) algorithm bei Prof. Dr. Uwe Kastens non-pass-oriented strategies: visit-sequences: an individual plan for each rule of the abstract syntax Generator fits the plans to the dependencies. OAG Visit-sequences CI-75 A visit-sequence (dt. Besuchssequenz) vs p for each production of the tree grammar: p: X o ::= X 1... X i... X n A visit-sequence is a sequence of operations: i, j j-th visit of the i-th subtree j j-th return to the ancestor node eval c execution of a computation c associated to p Example in the tree: visit-sequences A vs p :... C,1... B,1... C, p: A::= BC q: C::= DE B C D E vs q :... D, E, Implementation: one procedure for each section of a visit-sequence upto a call with a switch over applicable productions for attribute partitions guaranty correct interleaving: AI (X,1) AS (X,1) AI (X,2) AS (X,2)

40 Visit-sequences for the AG Binary numbers CI-76 vs p1 : D ::= L '.' L L[1],1; L[1].s=0; L[1],2; L[2],1; L[2].s=NEG(L[2].lg); L[2],2; D.v=ADD(L[1].v, L[2].v); 1 vs p2 : L ::= L B L[2],1; L[1].lg=ADD(L[2].lg,1); 1 L[2].s=ADD(L[1].s,1); L[2],2; B.s=L[1].s; B,1; L[1].v=ADD(L[2].v, B.v); 2 vs p3 : L ::= B L.lg=1; 1; B.s=L.s; B,1; L.v=B.v; 2 vs p4 : B ::= '0' B.v=0; 1 vs p5 : B ::= '1' B.v=Power2(B.s); 1 Implementation: Procedure vs<i><p> for each section of a vs p to a i a call with a switch over alternative rules for X,i Tree walk for AG Binary numbers CI-76a p1 D 5.25 p2 L L p tree walk 1 L 2 4 p2 2 L 1 4 p3 B p B B 0 0 p4 p5 L 1 p3 B p B p attributes: D v s L lg v s B v

41 LAG (k) condition and algorithm CI-77 An AG is a LAG(k), if: For each symbol X there is an attribute partition A (X,1),..., A (X, k), such that the attributes in A (X, i) can be computed in the i-th depth-first left-to-right pass. Necessary and sufficient condition over dependency graphs - expressed graphically: A dependency from right to left X b A(X,j) j > i Y a A(Y,i) X a b A(X,i) A(X,j) i < j A dependency at one symbol on the right-hand side Algorithm: computes A (1),..., A (k), if the AG is LAG(k), for i = 1, 2,... A (i) := all attributes that are not yet assigned remove attributes from A(i) as long as the following rules are applicable: remove X.b, if there is a context where it depends on an attribute of A (i) according to the pattern given above, remove Z.c, if it depends on a removed attribute Finally: all attributes are assigned to a passes i = 1,..., k the AG is LAG(k) all attributes are removed from A(i) the AG is not LAG(k) for any k Generators for attribute grammars CI-78 LIGA University of Paderborn OAG FNC-2 INRIA ANCAG (Oberklasse von OAG) Synthesizer Generator Cornell University OAG, inkrementell CoCo Universität Linz LAG(1) Properties of the generator LIGA integrated in the Eli system, cooperates with other Eli tools high level specification language Lido modular and reusable AG components object-oriented constructs usable for abstraction of computational patterns computations are calls of functions implemented outside the AG side-effect computations can be controlled by dependencies notations for remote attribute access visit-sequence controlled attribute evaluators, implemented in C attribute storage optimization

42 State attributes without values CI-78a RULE: Root ::= Expr COMPUTE Expr.print = "yes"; printf ("\n") <- Expr.printed; RULE: Expr ::= Number COMPUTE Expr.printed = printf ("%d ", Number) <- Expr.print; RULE: Opr ::= '+' COMPUTE Opr.printed = printf ("+ ") <- Opr.print; RULE: Opr ::= '*' COMPUTE Opr.printed = printf ("* ") <- Opr.print; RULE: Expr ::= Expr Opr Expr COMPUTE Expr[2].print = Expr[1].print; Expr[3].print = Expr[2].printed; Opr.print = Expr[3].printed; Expr[1].printed = Opr.printed; The attributes print and printed do not have a value They just describe preand post-conditions of computations: Expr.print: postfix output has been done up to not including this node Expr.printed: postfix output has been done up to including this node Dependency pattern CHAIN CI-78b CHAIN print: VOID; RULE: Root ::= Expr COMPUTE CHAINSTART HEAD.print = "yes"; printf ("\n ") <- TAIL.print; RULE: Expr ::= Number COMPUTE Expr.print = printf ("%d ", Number) <- Expr.print; RULE: Opr ::= '+' COMPUTE Opr.post = printf ("+") <- Opr.pre; RULE: Expr ::= Expr Opr Expr COMPUTE Opr.pre = Expr[3].print; Expr[1].print = Opr.post; A CHAIN specifies a left-to-right depth-first dependency through a subtree. Trivial computations of the form X.a = Y.b in the CHAIN order can be omitted. They are added as needed.

43 Dependency pattern INCLUDING CI-78c ATTR depth: int; RULE: Root ::= Block COMPUTE Block.depth = 0; RULE: Statement ::= Block COMPUTE Block.depth = ADD (INCLUDING Block.depth, 1); TERM Ident: int; RULE: Definition ::= define' Ident COMPUTE printf ("%s defined on depth %d\n ", StringTable (Ident), INCLUDING Block.depth); An attribute at the root of a subtree is used from within the subtree. Propagation through the contexts in between is omitted. INCLUDING Block.depth accesses the depth attribut of the next upper node of type Block. Dependency pattern CONSTITUENTS CI-78d RULE: Block ::= '{' Sequence '}' COMPUTE Block.DefDone = CONSTITUENTS Definition.DefDone; RULE: Definition ::= 'Define' Ident COMPUTE Definition.DefDone = printf ("%s defined in line %d\n", StringTable(Ident), LINE); RULE: Usage ::= 'use' Ident COMPUTE printf ("%s used in line %d\n ", StringTable(Ident), LINE), <- INCLUDING BLOCK.DefDone; A computation accesses attributes from the subtree below its context. Propagation through the contexts in between is omitted. The shown combination with INCLUDING is a common dependency pattern. CONSTITUENTS Definition.DefDone accesses the DefDone attributes of all Definition nodes in the subtree below this context

Compiler I (dt. Übersetzer I) Prof. Dr. Uwe Kastens Winter 2001/2002

Compiler I (dt. Übersetzer I) Prof. Dr. Uwe Kastens Winter 2001/2002 CI-1 Compiler I (dt. Übersetzer I) Prof. Dr. Uwe Kastens Winter 2001/2002 Objectives CI-2 The participants are taught to understand fundamental techniques of language implementation, use generating tools

More information

2001 bei Prof. Dr. Uwe Kastens

2001 bei Prof. Dr. Uwe Kastens CI-1 Objectives CI-2 I (dt. Übersetzer I) Prof. Dr. Uwe Kastens Winter 2001/2002 The participants are taught to understand fundamental techniques of language implementation, use generating tools and standard

More information

Compiler I. (dt. Übersetzer I) Prof. Dr. Uwe Kastens. Winter 2001/2002. Lecture Compiler I WS 2001/2002 / Slide 01

Compiler I. (dt. Übersetzer I) Prof. Dr. Uwe Kastens. Winter 2001/2002. Lecture Compiler I WS 2001/2002 / Slide 01 CI-1 Compiler I (dt. Übersetzer I) Prof. Dr. Uwe Kastens Winter 2001/2002 Lecture Compiler I WS 2001/2002 / Slide 01 Welcome to the lecture! Objectives CI-2 The participants are taught to understand fundamental

More information

3. Context-free Grammars and Syntactic Analysis

3. Context-free Grammars and Syntactic Analysis 3. Context-free Grammars and Syntactic Analysis PLaC-3.1 Input: Tasks: Result: token sequence Parsing: construct a derivation according to the concrete syntax, Tree construction: build a structure tree

More information

Programming Languages and Compilers

Programming Languages and Compilers PLaC-0.1 Programming Languages and Compilers Prof. Dr. Uwe Kastens WS 2013 / 2014 2013 bei Prof. Dr. Uwe Kastens 0. Introduction PLaC-0.2 The participants are taught to Objectives understand properties

More information

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised: EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)

More information

Wednesday, September 9, 15. Parsers

Wednesday, September 9, 15. Parsers Parsers What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

More information

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs: What is a parser Parsers A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

More information

Chapter 4. Lexical and Syntax Analysis

Chapter 4. Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Chapter 4 Topics Introduction Lexical Analysis The Parsing Problem Recursive-Descent Parsing Bottom-Up Parsing Copyright 2012 Addison-Wesley. All rights reserved.

More information

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing Roadmap > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing The role of the parser > performs context-free syntax analysis > guides

More information

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468 Parsers Xiaokang Qiu Purdue University ECE 468 August 31, 2018 What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure

More information

Wednesday, August 31, Parsers

Wednesday, August 31, Parsers Parsers How do we combine tokens? Combine tokens ( words in a language) to form programs ( sentences in a language) Not all combinations of tokens are correct programs (not all sentences are grammatically

More information

Monday, September 13, Parsers

Monday, September 13, Parsers Parsers Agenda Terminology LL(1) Parsers Overview of LR Parsing Terminology Grammar G = (Vt, Vn, S, P) Vt is the set of terminals Vn is the set of non-terminals S is the start symbol P is the set of productions

More information

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design PSD3A Principles of Compiler Design Unit : I-V 1 UNIT I - SYLLABUS Compiler Assembler Language Processing System Phases of Compiler Lexical Analyser Finite Automata NFA DFA Compiler Tools 2 Compiler -

More information

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target

More information

CS606- compiler instruction Solved MCQS From Midterm Papers

CS606- compiler instruction Solved MCQS From Midterm Papers CS606- compiler instruction Solved MCQS From Midterm Papers March 06,2014 MC100401285 Moaaz.pk@gmail.com Mc100401285@gmail.com PSMD01 Final Term MCQ s and Quizzes CS606- compiler instruction If X is a

More information

Syntax Analysis, VII One more LR(1) example, plus some more stuff. Comp 412 COMP 412 FALL Chapter 3 in EaC2e. target code.

Syntax Analysis, VII One more LR(1) example, plus some more stuff. Comp 412 COMP 412 FALL Chapter 3 in EaC2e. target code. COMP 412 FALL 2017 Syntax Analysis, VII One more LR(1) example, plus some more stuff Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2017, Keith D. Cooper & Linda Torczon,

More information

3. Parsing. Oscar Nierstrasz

3. Parsing. Oscar Nierstrasz 3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/

More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up

More information

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1 CSE P 501 Compilers Parsing & Context-Free Grammars Hal Perkins Winter 2008 1/15/2008 2002-08 Hal Perkins & UW CSE C-1 Agenda for Today Parsing overview Context free grammars Ambiguous grammars Reading:

More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and

More information

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers. Part III : Parsing From Regular to Context-Free Grammars Deriving a Parser from a Context-Free Grammar Scanners and Parsers A Parser for EBNF Left-Parsable Grammars Martin Odersky, LAMP/DI 1 From Regular

More information

Question Bank. 10CS63:Compiler Design

Question Bank. 10CS63:Compiler Design Question Bank 10CS63:Compiler Design 1.Determine whether the following regular expressions define the same language? (ab)* and a*b* 2.List the properties of an operator grammar 3. Is macro processing a

More information

CJT^jL rafting Cm ompiler

CJT^jL rafting Cm ompiler CJT^jL rafting Cm ompiler ij CHARLES N. FISCHER Computer Sciences University of Wisconsin Madison RON K. CYTRON Computer Science and Engineering Washington University RICHARD J. LeBLANC, Jr. Computer Science

More information

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. COMPILER DESIGN 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the target

More information

VIVA QUESTIONS WITH ANSWERS

VIVA QUESTIONS WITH ANSWERS VIVA QUESTIONS WITH ANSWERS 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the

More information

Context-free grammars

Context-free grammars Context-free grammars Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals - tokens non-terminals - represent higher-level structures of a program start symbol,

More information

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis 4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal

More information

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis 4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal

More information

CST-402(T): Language Processors

CST-402(T): Language Processors CST-402(T): Language Processors Course Outcomes: On successful completion of the course, students will be able to: 1. Exhibit role of various phases of compilation, with understanding of types of grammars

More information

Compiler Construction: Parsing

Compiler Construction: Parsing Compiler Construction: Parsing Mandar Mitra Indian Statistical Institute M. Mitra (ISI) Parsing 1 / 33 Context-free grammars. Reference: Section 4.2 Formal way of specifying rules about the structure/syntax

More information

Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701)

Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701) Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov 2014 Compiler Design (170701) Question Bank / Assignment Unit 1: INTRODUCTION TO COMPILING

More information

Appendix A The DL Language

Appendix A The DL Language Appendix A The DL Language This appendix gives a description of the DL language used for many of the compiler examples in the book. DL is a simple high-level language, only operating on integer data, with

More information

Software II: Principles of Programming Languages

Software II: Principles of Programming Languages Software II: Principles of Programming Languages Lecture 4 Language Translation: Lexical and Syntactic Analysis Translation A translator transforms source code (a program written in one language) into

More information

Generators for High-Speed Front-Ends. J. Grosch DR. JOSEF GROSCH COCOLAB - DATENVERARBEITUNG GERMANY

Generators for High-Speed Front-Ends. J. Grosch DR. JOSEF GROSCH COCOLAB - DATENVERARBEITUNG GERMANY Generators for High-Speed Front-Ends J. Grosch DR. JOSEF GROSCH COCOLAB - DATENVERARBEITUNG GERMANY Cocktail Toolbox for Compiler Construction Generators for High-Speed Front-Ends Josef Grosch Sept. 28,

More information

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table COMPILER CONSTRUCTION Lab 2 Symbol table LABS Lab 3 LR parsing and abstract syntax tree construction using ''bison' Lab 4 Semantic analysis (type checking) PHASES OF A COMPILER Source Program Lab 2 Symtab

More information

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1 CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions

More information

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a EDA180: Compiler Construc6on Top- down parsing Görel Hedin Revised: 2013-01- 30a Compiler phases and program representa6ons source code Lexical analysis (scanning) Intermediate code genera6on tokens intermediate

More information

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis. Topics Chapter 4 Lexical and Syntax Analysis Introduction Lexical Analysis Syntax Analysis Recursive -Descent Parsing Bottom-Up parsing 2 Language Implementation Compilation There are three possible approaches

More information

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing 8 Parsing Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string

More information

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Introduction - Language implementation systems must analyze source code, regardless of the specific implementation approach - Nearly all syntax analysis is based on

More information

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing Parsing Wrapup Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing LR(1) items Computing closure Computing goto LR(1) canonical collection This lecture LR(1) parsing Building ACTION

More information

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1 Table of Contents About the Authors... iii Introduction... xvii Chapter 1: System Software... 1 1.1 Concept of System Software... 2 Types of Software Programs... 2 Software Programs and the Computing Machine...

More information

Chapter 3: Lexing and Parsing

Chapter 3: Lexing and Parsing Chapter 3: Lexing and Parsing Aarne Ranta Slides for the book Implementing Programming Languages. An Introduction to Compilers and Interpreters, College Publications, 2012. Lexing and Parsing* Deeper understanding

More information

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous. Section A 1. What do you meant by parser and its types? A parser for grammar G is a program that takes as input a string w and produces as output either a parse tree for w, if w is a sentence of G, or

More information

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised: EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing Görel Hedin Revised: 2017-09-04 This lecture Regular expressions Context-free grammar Attribute grammar

More information

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! Any questions about the syllabus?! Course Material available at www.cs.unic.ac.cy/ioanna! Next time reading assignment [ALSU07]

More information

SYLLABUS UNIT - I UNIT - II UNIT - III UNIT - IV CHAPTER - 1 : INTRODUCTION CHAPTER - 4 : SYNTAX AX-DIRECTED TRANSLATION TION CHAPTER - 7 : STORA

SYLLABUS UNIT - I UNIT - II UNIT - III UNIT - IV CHAPTER - 1 : INTRODUCTION CHAPTER - 4 : SYNTAX AX-DIRECTED TRANSLATION TION CHAPTER - 7 : STORA Contents i SYLLABUS UNIT - I CHAPTER - 1 : INTRODUCTION Programs Related to Compilers. Translation Process, Major Data Structures, Other Issues in Compiler Structure, Boot Strapping and Porting. CHAPTER

More information

Top down vs. bottom up parsing

Top down vs. bottom up parsing Parsing A grammar describes the strings that are syntactically legal A recogniser simply accepts or rejects strings A generator produces sentences in the language described by the grammar A parser constructs

More information

GUJARAT TECHNOLOGICAL UNIVERSITY

GUJARAT TECHNOLOGICAL UNIVERSITY Type of course: Core GUJARAT TECHNOLOGICAL UNIVERSITY SUBJECT NAME: Compiler Design SUBJECT CODE: 21701 B.E. 7 th SEMESTER Prerequisite: Data Structures and Algorithms, Theory of Computation, Rationale:

More information

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc. Syntax Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal and exact or there will

More information

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Crafting a Compiler with C (II) Compiler V. S. Interpreter Crafting a Compiler with C (II) 資科系 林偉川 Compiler V S Interpreter Compilation - Translate high-level program to machine code Lexical Analyzer, Syntax Analyzer, Intermediate code generator(semantics Analyzer),

More information

Construction of Application Generators Using Eli. Uwe Kastens, University of Paderborn, FRG. Abstract

Construction of Application Generators Using Eli. Uwe Kastens, University of Paderborn, FRG. Abstract Construction of Application Generators Using Eli Uwe Kastens, University of Paderborn, FRG Abstract Application generators are a powerful means for reuse of software design. They produce special purpose

More information

Principles of Programming Languages [PLP-2015] Detailed Syllabus

Principles of Programming Languages [PLP-2015] Detailed Syllabus Principles of Programming Languages [PLP-2015] Detailed Syllabus This document lists the topics presented along the course. The PDF slides published on the course web page (http://www.di.unipi.it/~andrea/didattica/plp-15/)

More information

Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23

Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23 Parser Larissa von Witte Institut für oftwaretechnik und Programmiersprachen 11. Januar 2016 L. v. Witte 11. Januar 2016 1/23 Contents Introduction Taxonomy Recursive Descent Parser hift Reduce Parser

More information

Lecture 8: Deterministic Bottom-Up Parsing

Lecture 8: Deterministic Bottom-Up Parsing Lecture 8: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Fri Feb 12 13:02:57 2010 CS164: Lecture #8 1 Avoiding nondeterministic choice: LR We ve been looking at general

More information

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12 Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 9/22/06 Prof. Hilfinger CS164 Lecture 11 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient

More information

SYNTAX ANALYSIS 1. Define parser. Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with collective meaning. Also termed as Parsing. 2. Mention the basic

More information

Building Compilers with Phoenix

Building Compilers with Phoenix Building Compilers with Phoenix Syntax-Directed Translation Structure of a Compiler Character Stream Intermediate Representation Lexical Analyzer Machine-Independent Optimizer token stream Intermediate

More information

CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution

More information

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1 CSE P 501 Compilers Parsing & Context-Free Grammars Hal Perkins Spring 2018 UW CSE P 501 Spring 2018 C-1 Administrivia Project partner signup: please find a partner and fill out the signup form by noon

More information

LECTURE NOTES ON COMPILER DESIGN P a g e 2

LECTURE NOTES ON COMPILER DESIGN P a g e 2 LECTURE NOTES ON COMPILER DESIGN P a g e 1 (PCCS4305) COMPILER DESIGN KISHORE KUMAR SAHU SR. LECTURER, DEPARTMENT OF INFORMATION TECHNOLOGY ROLAND INSTITUTE OF TECHNOLOGY, BERHAMPUR LECTURE NOTES ON COMPILER

More information

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators Jeremy R. Johnson 1 Theme We have now seen how to describe syntax using regular expressions and grammars and how to create

More information

CS5363 Final Review. cs5363 1

CS5363 Final Review. cs5363 1 CS5363 Final Review cs5363 1 Programming language implementation Programming languages Tools for describing data and algorithms Instructing machines what to do Communicate between computers and programmers

More information

CSCI312 Principles of Programming Languages!

CSCI312 Principles of Programming Languages! CSCI312 Principles of Programming Languages!! Chapter 3 Regular Expression and Lexer Xu Liu Recap! Copyright 2006 The McGraw-Hill Companies, Inc. Clite: Lexical Syntax! Input: a stream of characters from

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309 PART 3 - SYNTAX ANALYSIS F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 64 / 309 Goals Definition of the syntax of a programming language using context free grammars Methods for parsing

More information

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI UNIT I - LEXICAL ANALYSIS 1. What is the role of Lexical Analyzer? [NOV 2014] 2. Write

More information

Chapter 3. Parsing #1

Chapter 3. Parsing #1 Chapter 3 Parsing #1 Parser source file get next character scanner get token parser AST token A parser recognizes sequences of tokens according to some grammar and generates Abstract Syntax Trees (ASTs)

More information

LR Parsing LALR Parser Generators

LR Parsing LALR Parser Generators LR Parsing LALR Parser Generators Outline Review of bottom-up parsing Computing the parsing DFA Using parser generators 2 Bottom-up Parsing (Review) A bottom-up parser rewrites the input string to the

More information

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl Compilerconstructie najaar 2013 http://www.liacs.nl/home/rvvliet/coco/ Rudy van Vliet kamer 124 Snellius, tel. 071-527 5777 rvvliet(at)liacs(dot)nl college 1, dinsdag 3 september 2013 Overview 1 Why this

More information

Lecture 7: Deterministic Bottom-Up Parsing

Lecture 7: Deterministic Bottom-Up Parsing Lecture 7: Deterministic Bottom-Up Parsing (From slides by G. Necula & R. Bodik) Last modified: Tue Sep 20 12:50:42 2011 CS164: Lecture #7 1 Avoiding nondeterministic choice: LR We ve been looking at general

More information

Syntax Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Syntax Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Syntax Analysis (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay September 2007 College of Engineering, Pune Syntax Analysis: 2/124 Syntax

More information

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution Exam Facts Format Wednesday, July 20 th from 11:00 a.m. 1:00 p.m. in Gates B01 The exam is designed to take roughly 90

More information

Syntax Analysis Part I

Syntax Analysis Part I Syntax Analysis Part I Chapter 4: Context-Free Grammars Slides adapted from : Robert van Engelen, Florida State University Position of a Parser in the Compiler Model Source Program Lexical Analyzer Token,

More information

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3 Programming Language Specification and Translation ICOM 4036 Fall 2009 Lecture 3 Some parts are Copyright 2004 Pearson Addison-Wesley. All rights reserved. 3-1 Language Specification and Translation Topics

More information

UNIT III & IV. Bottom up parsing

UNIT III & IV. Bottom up parsing UNIT III & IV Bottom up parsing 5.0 Introduction Given a grammar and a sentence belonging to that grammar, if we have to show that the given sentence belongs to the given grammar, there are two methods.

More information

LR Parsing Techniques

LR Parsing Techniques LR Parsing Techniques Introduction Bottom-Up Parsing LR Parsing as Handle Pruning Shift-Reduce Parser LR(k) Parsing Model Parsing Table Construction: SLR, LR, LALR 1 Bottom-UP Parsing A bottom-up parser

More information

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino 3. Syntax Analysis Andrea Polini Formal Languages and Compilers Master in Computer Science University of Camerino (Formal Languages and Compilers) 3. Syntax Analysis CS@UNICAM 1 / 54 Syntax Analysis: the

More information

Programming Language Syntax and Analysis

Programming Language Syntax and Analysis Programming Language Syntax and Analysis 2017 Kwangman Ko (http://compiler.sangji.ac.kr, kkman@sangji.ac.kr) Dept. of Computer Engineering, Sangji University Introduction Syntax the form or structure of

More information

Compiler Construction

Compiler Construction Compiler Construction Lecture 1: Introduction Thomas Noll Lehrstuhl für Informatik 2 (Software Modeling and Verification) noll@cs.rwth-aachen.de http://moves.rwth-aachen.de/teaching/ss-14/cc14/ Summer

More information

CSE 3302 Programming Languages Lecture 2: Syntax

CSE 3302 Programming Languages Lecture 2: Syntax CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:

More information

LIDO - Reference Manual

LIDO - Reference Manual LIDO - Reference Manual $Revision: 4.31 $ Uwe Kastens Compiler and Programming Language Group Faculty of Electrical Engineering, Computer Science and Mathematics University of Paderborn, Germany Copyright,

More information

Syntax. A. Bellaachia Page: 1

Syntax. A. Bellaachia Page: 1 Syntax 1. Objectives & Definitions... 2 2. Definitions... 3 3. Lexical Rules... 4 4. BNF: Formal Syntactic rules... 6 5. Syntax Diagrams... 9 6. EBNF: Extended BNF... 10 7. Example:... 11 8. BNF Statement

More information

LIDO - Reference Manual

LIDO - Reference Manual LIDO - Reference Manual $Revision: 4.34 $ Uwe Kastens Compiler and Programming Language Group Faculty of Electrical Engineering, Computer Science and Mathematics University of Paderborn, Germany Copyright,

More information

ICOM 4036 Spring 2004

ICOM 4036 Spring 2004 Language Specification and Translation ICOM 4036 Spring 2004 Lecture 3 Copyright 2004 Pearson Addison-Wesley. All rights reserved. 3-1 Language Specification and Translation Topics Structure of a Compiler

More information

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology exam Compiler Construction in4020 July 5, 2007 14.00-15.30 This exam (8 pages) consists of 60 True/False

More information

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Formal Languages and Compilers Lecture VII Part 3: Syntactic A Formal Languages and Compilers Lecture VII Part 3: Syntactic Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/

More information

Context-free grammars (CFG s)

Context-free grammars (CFG s) Syntax Analysis/Parsing Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax tree (AST) AST: captures hierarchical structure of

More information

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram CS6660 COMPILER DESIGN Question Bank UNIT I-INTRODUCTION TO COMPILERS 1. Define compiler. 2. Differentiate compiler and interpreter. 3. What is a language processing system? 4. List four software tools

More information

More Assigned Reading and Exercises on Syntax (for Exam 2)

More Assigned Reading and Exercises on Syntax (for Exam 2) More Assigned Reading and Exercises on Syntax (for Exam 2) 1. Read sections 2.3 (Lexical Syntax) and 2.4 (Context-Free Grammars) on pp. 33 41 of Sethi. 2. Read section 2.6 (Variants of Grammars) on pp.

More information

VALLIAMMAI ENGINEERING COLLEGE

VALLIAMMAI ENGINEERING COLLEGE VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur 60 20 DEPARTMENT OF INFORMATION TECHNOLOGY QUESTION BANK VI SEMESTER CS6660 COMPILER DESIGN Regulation 20 Academic Year 207 8 Prepared by Ms. S.

More information

Compiler Design Aug 1996

Compiler Design Aug 1996 Aug 1996 Part A 1 a) What are the different phases of a compiler? Explain briefly with the help of a neat diagram. b) For the following Pascal keywords write the state diagram and also write program segments

More information

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2 Formal Languages and Grammars Chapter 2: Sections 2.1 and 2.2 Formal Languages Basis for the design and implementation of programming languages Alphabet: finite set Σ of symbols String: finite sequence

More information

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1 CSE P 501 Compilers LR Parsing Hal Perkins Spring 2018 UW CSE P 501 Spring 2018 D-1 Agenda LR Parsing Table-driven Parsers Parser States Shift-Reduce and Reduce-Reduce conflicts UW CSE P 501 Spring 2018

More information

Grammars and Parsing. Paul Klint. Grammars and Parsing

Grammars and Parsing. Paul Klint. Grammars and Parsing Paul Klint Grammars and Languages are one of the most established areas of Natural Language Processing and Computer Science 2 N. Chomsky, Aspects of the theory of syntax, 1965 3 A Language...... is a (possibly

More information

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1) TD parsing - LL(1) Parsing First and Follow sets Parse table construction BU Parsing Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1) Problems with SLR Aho, Sethi, Ullman, Compilers

More information

Optimizing Finite Automata

Optimizing Finite Automata Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states

More information

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7 Top-Down Parsing and Intro to Bottom-Up Parsing Lecture 7 1 Predictive Parsers Like recursive-descent but parser can predict which production to use Predictive parsers are never wrong Always able to guess

More information

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F). CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language

More information