MIDTERM REVIEW. Lectures 1-15


1 MIDTERM REVIEW Lectures 1-15

2 LECTURE 1: OVERVIEW AND HISTORY Evolution Design considerations: what makes a programming construct good or bad? Early 70s: structured programming, in which goto-based control flow was replaced by high-level constructs (e.g. while loops and case statements). Late 80s: nested block structure gave way to object-oriented structures. Special Purposes Many languages were designed for a specific problem domain (e.g. scientific applications, business applications, artificial intelligence, systems programming, Internet programming). Personal Preference The strength and variety of personal preferences make it unlikely that anyone will ever develop a universally accepted programming language.

3 LECTURE 1: OVERVIEW AND HISTORY Expressive Power Theoretically, all languages are equally powerful (Turing complete), but language features have a huge impact on the programmer's ability to read, write, maintain, and analyze programs. Ease of Use for Novices Low learning curve and often interpreted, e.g. Basic and Logo. Ease of Implementation Runs on virtually everything, e.g. Basic, Pascal, and Java. Open Source Freely available, e.g. Java. Excellent Compilers and Tools Supporting tools help the programmer manage very large projects. Economics, Patronage, and Inertia Powerful sponsors: Cobol, PL/I, Ada. Some languages remain widely used long after "better" alternatives have appeared.

4 LECTURE 1: OVERVIEW AND HISTORY Classification of Programming Languages Declarative (implicit solution: what should the computer do?): Functional Lisp, Scheme, ML, Haskell. Logic Prolog. Dataflow Simulink, LabVIEW. Imperative (explicit solution: how should the computer do it?): Procedural Fortran, C. Object-Oriented Smalltalk, C++, Java.

5 LECTURE 2: COMPILATION AND INTERPRETATION Programs written in high-level languages can be run in two ways. Compiled into an executable program written in machine language for the target machine. Directly interpreted and the execution is simulated by the interpreter. In general, which approach is more efficient?

6 LECTURE 2: COMPILATION AND INTERPRETATION Programs written in high-level languages can be run in two ways. Compiled into an executable program written in machine language for the target machine. Directly interpreted, with the execution simulated by the interpreter. In general, which approach is more efficient? Compilation generally yields faster execution, while interpretation offers more flexibility.

7 LECTURE 2: COMPILATION AND INTERPRETATION How do you choose? Typically, most languages are implemented using a mixture of both approaches. Practically speaking, there are two aspects that distinguish what we consider compilation from interpretation. Thorough Analysis Compilation requires a thorough analysis of the code. Non-trivial Transformation Compilation generates intermediate representations that typically do not resemble the source code.

8 LECTURE 2: COMPILATION AND INTERPRETATION Preprocessing Initial translation step. Slightly modifies source code so it can be interpreted more efficiently: removing comments and whitespace, grouping characters into tokens, etc. Linking Linkers merge in the necessary library routines to create the final executable.

9 LECTURE 2: COMPILATION AND INTERPRETATION Post-Compilation Assembly Many compilers translate the source code into assembly rather than machine language. Changes in machine language won't affect the source code. Assembly is easier to read (for debugging purposes). Source-to-Source Translation Compiling source code into another high-level language. Early C++ programs were compiled into C, which was compiled into assembly.

10 LECTURE 3: COMPILER PHASES
Front-End (Analysis):
Source Program → Scanner (Lexical Analysis) → Tokens
Tokens → Parser (Syntax Analysis) → Parse Tree
Parse Tree → Semantic Analysis & Intermediate Code Generation → Abstract Syntax Tree
Back-End (Synthesis):
Abstract Syntax Tree → Machine-Independent Code Improvement → Modified Intermediate Form
Modified Intermediate Form → Target Code Generation → Assembly or Object Code
Assembly or Object Code → Machine-Specific Code Improvement → Modified Assembly or Object Code

11 LECTURE 3: COMPILER PHASES Lexical analysis is the process of tokenizing the characters that appear in a program. A scanner (or lexer) groups characters together into meaningful tokens, which are then sent to the parser. Tokens are typically defined using regular expressions, which are understood by a lexical analyzer generator such as lex.
What the scanner picks up: i, n, t, ' ', m, a, i, n, (, ), {, ...
The resulting tokens: int, main, (, ), {, int, i, =, getint, (, ), ...

12 LECTURE 3: COMPILER PHASES Syntax analysis is performed by a parser which takes the tokens generated by the scanner and creates a parse tree which shows how tokens fit together within a valid program. The structure of the parse tree is dictated by the grammar of the programming language.

13 LECTURE 3: COMPILER PHASES Semantic analysis is the process of attempting to discover whether a valid pattern of tokens is actually meaningful. Even if we know that the sequence of tokens is valid, it may still be an incorrect program. For example: a = b; What if a is an int and b is a character array? To protect against these kinds of errors, the semantic analyzer will keep track of the types of identifiers and expressions in order to ensure they are used consistently.

14 LECTURE 3: COMPILER PHASES What kinds of errors can be caught in the lexical analysis phase? Invalid tokens. What kinds of errors are caught in the syntax analysis phase? Syntax errors: invalid sequences of tokens.

15 LECTURE 3: COMPILER PHASES Static Semantic Checks: semantic rules that can be checked at compile time. Type checking. Every variable is declared before it is used. Identifiers are used in appropriate contexts. Checking function call arguments. Dynamic Semantic Checks: semantic rules that are checked at run time. Array subscript values are within bounds. Arithmetic errors, e.g. division by zero. Pointers are not dereferenced unless they point to a valid object. When a check fails at run time, an exception is raised.

16 LECTURE 3: COMPILER PHASES Assuming C++, what kinds of errors are these?
    int
    int =?3;
    int y = 3; x = y;
    Hello, World!
    int x; double y = 2.5; x = y;
    void sum(int, int); sum(1,2,3);
    myint++
    z = y/x // y is 1, x is 0

17 LECTURE 3: COMPILER PHASES Assuming C++, what kinds of errors are these?
    int                              // Lexical
    int =?3;                         // Syntax
    int y = 3; x = y;                // Static semantic
    Hello, World!                    // Syntax
    int x; double y = 2.5; x = y;    // Static semantic
    void sum(int, int); sum(1,2,3);  // Static semantic
    myint++                          // Syntax
    z = y/x // y is 1, x is 0        // Dynamic semantic

18 LECTURE 3: COMPILER PHASES Code Optimization Once the AST (or alternative intermediate form) has been generated, the compiler can perform machine-independent code optimization. The goal is to modify the code so that it is quicker and uses resources more efficiently. There is an additional optimization step performed after the creation of the object code.

19 LECTURE 3: COMPILER PHASES Target Code Generation Goal: translate the intermediate form of the code (typically, the AST) into object code. In the case of languages that translate into assembly language, the code generator will first pass through the symbol table, creating space for the variables. Next, the code generator passes through the intermediate code form, generating the appropriate assembly code. As stated before, the compiler makes one more pass through the object code to perform further optimization.

20 LECTURE 4: SYNTAX We know from the previous lecture that the front end of the compiler has three main phases: scanning, parsing, and semantic analysis. The first two perform syntax verification: scanning identifies the valid tokens, the basic building blocks, within a program; parsing identifies the valid patterns of tokens, or constructs. So how do we specify what a valid token is? Or what constitutes a valid construct?

21 LECTURE 4: SYNTAX Tokens can be constructed from regular characters using just three rules: 1. Concatenation. 2. Alternation (choice among a finite set of alternatives). 3. Kleene Closure (arbitrary repetition). Any set of strings that can be defined by these three rules is a regular set. Regular sets are generated by regular expressions.

22 LECTURE 4: SYNTAX Formally, all of the following are valid regular expressions (let R and S be regular expressions and let Σ be a finite set of symbols): The empty set. The set containing the empty string ε. The set containing a single literal character α from the alphabet Σ. Concatenation: RS is the set of strings obtained by concatenating one string from R with one string from S. Alternation: R|S describes the union of R and S. Kleene Closure: R* is the set of strings that can be obtained by concatenating any number of strings from R.

23 LECTURE 4: SYNTAX You can either use parentheses to avoid ambiguity or assume that Kleene star has the highest priority, followed by concatenation, then alternation. Examples:
    a* = {ε, a, aa, aaa, aaaa, aaaaa, ...}
    a|b* = {ε, a, b, bb, bbb, bbbb, ...}
    (ab)* = {ε, ab, abab, ababab, abababab, ...}
    (a|b)* = {ε, a, b, aa, ab, ba, bb, aaa, aab, ...}

24 LECTURE 4: SYNTAX Create regular expressions for the following examples:
Zero or more c's followed by a single a or a single b.
Binary strings starting and ending with 1.
Binary strings containing at least three 1's.

25 LECTURE 4: SYNTAX Create regular expressions for the following examples:
Zero or more c's followed by a single a or a single b: c*(a|b)
Binary strings starting and ending with 1: 1(0|1)*1
Binary strings containing at least three 1's: 0*10*10*1(0|1)*
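These answers are easy to sanity-check mechanically. A minimal sketch, assuming C++11's <regex> (the test strings are illustrations, not part of the lecture):

    #include <iostream>
    #include <regex>

    int main() {
        std::regex start_end_1("1(0|1)*1");         // starts and ends with 1
        std::regex three_1s("0*10*10*1(0|1)*");     // at least three 1's
        std::cout << std::regex_match("1001", start_end_1) << "\n";   // 1 (accepted)
        std::cout << std::regex_match("0110", start_end_1) << "\n";   // 0 (rejected)
        std::cout << std::regex_match("0101", three_1s) << "\n";      // 0 (only two 1's)
        std::cout << std::regex_match("10101", three_1s) << "\n";     // 1 (accepted)
    }

std::regex_match tests the whole string, which matches the recognizer view used in Lecture 5.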

26 LECTURE 4: SYNTAX We can completely define our tokens in terms of regular expressions, but more complicated constructs necessitate recursion. The set of strings that can be defined by adding recursion to regular expressions is known as a Context-Free Language. Context-Free Languages are generated by Context-Free Grammars.

27 LECTURE 4: SYNTAX Context-free grammars are composed of rules known as productions. Each production has a left-hand side symbol known as a non-terminal, or variable. On the right-hand side, a production may contain terminals (tokens) or other non-terminals. One of the non-terminals is named the start symbol.
    expr → id | number | - expr | ( expr ) | expr op expr
    op → + | - | * | /

28 LECTURE 4: SYNTAX So, how do we use the context-free grammar to generate syntactically valid strings of terminals (or tokens)?
1. Begin with the start symbol.
2. Choose a production with the start symbol on the left side.
3. Replace the start symbol with the right side of the chosen production.
4. Choose a non-terminal A in the resulting string.
5. Replace A with the right side of a production whose left side is A.
6. Repeat 4 and 5 until no non-terminals remain.

29 LECTURE 7: PARSING
    program → expr
    expr → term expr_tail
    expr_tail → + term expr_tail | ε
    term → factor term_tail
    term_tail → * factor term_tail | ε
    factor → ( expr ) | int
How can we derive the following strings from this grammar? (3 + 1) * 5 and (1 + 5) * 7. (One derivation is worked below.)
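For the first string, one leftmost derivation (the second string is analogous):

    program
    ⇒ expr
    ⇒ term expr_tail
    ⇒ factor term_tail expr_tail
    ⇒ ( expr ) term_tail expr_tail
    ⇒ ( term expr_tail ) term_tail expr_tail
    ⇒ ( factor term_tail expr_tail ) term_tail expr_tail
    ⇒ ( 3 term_tail expr_tail ) term_tail expr_tail
    ⇒ ( 3 expr_tail ) term_tail expr_tail
    ⇒ ( 3 + term expr_tail ) term_tail expr_tail
    ⇒ ( 3 + factor term_tail expr_tail ) term_tail expr_tail
    ⇒ ( 3 + 1 term_tail expr_tail ) term_tail expr_tail
    ⇒ ( 3 + 1 expr_tail ) term_tail expr_tail
    ⇒ ( 3 + 1 ) term_tail expr_tail
    ⇒ ( 3 + 1 ) * factor term_tail expr_tail
    ⇒ ( 3 + 1 ) * 5 term_tail expr_tail
    ⇒ ( 3 + 1 ) * 5 expr_tail
    ⇒ ( 3 + 1 ) * 5

Each ε-production simply erases the non-terminal it replaces.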

30 LECTURE 4: SYNTAX Write a grammar which recognizes if-statements of the form: if expression statements else statements, where expressions are of the form id > num or id < num. Statements can be any number of statements of the form id = num or print id.

31 LECTURE 4: SYNTAX
    program → if expr stmts else stmts
    expr → id > num | id < num
    stmts → stmt stmts | stmt
    stmt → id = num | print id

32 LECTURE 5: SCANNING A recognizer for a language is a program that takes a string x as input and answers yes if x is a sentence of the language and no otherwise. In the context of lexical analysis, given a string and a regular expression, a recognizer of the language specified by the regular expression answers yes if the string is in the language. How can we recognize the regular expression int? What about int|for? We could, for example, write an ad hoc scanner that contains simple conditions to test, the ability to peek ahead at the next character, and loops for runs of characters of the same type.

33 LECTURE 5: SCANNING A set of regular expressions can be compiled into a recognizer automatically by constructing a finite automaton using scanner generator tools (lex, for example). A finite automaton is a simple idealized machine that is used to recognize patterns within some input. A finite automaton will accept or reject an input depending on whether the pattern defined by the finite automaton occurs in the input. The elements of a finite automaton, given a set of input characters, are A finite set of states (or nodes). A specially-denoted start state. A set of final (accepting) states. A set of labeled transitions (or arcs) from one state to another.

34 LECTURE 5: SCANNING Finite automata come in two flavors. Deterministic: never any ambiguity; for any given state and any given input, there is only one possible transition. Non-deterministic: there may be more than one transition from any given state for any given character, and there may be epsilon transitions: transitions labeled by the empty string. There is no obvious algorithm for converting regular expressions directly to DFAs.

35 LECTURE 5: SCANNING Typically scanner generators create DFAs from regular expressions in the following way: Create NFA equivalent to regular expression. Construct DFA equivalent to NFA. Minimize the number of states in the DFA.

36 LECTURE 5: SCANNING The standard construction builds an NFA fragment for each regular-expression operator (s is the start state, f the final state):
Concatenation ab: s --a--> q --b--> f.
Alternation a|b: from s, ε-transitions branch to an a-fragment and a b-fragment, and both rejoin f via ε-transitions.
Kleene closure a*: ε-transitions allow the a-fragment to be skipped entirely or repeated any number of times.

37 LECTURE 5: SCANNING Create NFAs for the regular expressions we created before:
Zero or more c's followed by a single a or a single b: c*(a|b)
Binary strings starting and ending with 1: 1(0|1)*1
Binary strings containing at least three 1's: 0*10*10*1(0|1)*

38 LECTURE 6: SCANNING PART 2 How do we take our minimized DFA and practically implement a scanner? After all, finite automata are idealized machines; we didn't actually build a physical recognizer yet! We have two options. Represent the DFA using goto and case (switch) statements: handwritten scanners. Use a table to represent states and transitions, with a driver program that simply indexes the table: auto-generated scanners. The scanner generator lex creates a table and driver in C. Some other scanner generators create only the table, for use by a handwritten driver.

39 LECTURE 6: SCANNING PART 2 Handwritten scanner for the two-state DFA S1 --c--> S2, S2 --a|b--> S1 (accepting in S2):
    state = s1
    token = ""
    loop
        case state of
            s1: case in_char of
                    'c': state = s2
                    else: error
            s2: case in_char of
                    'a': state = s1
                    'b': state = s1
                    ' ': return token
                    else: error
        token = token + in_char
        read new in_char

40 LECTURE 6: SCANNING PART 2 Longest Possible Token Rule So, why do we need to peek ahead? Why not just accept as soon as we pick up c or cac? Scanners need to consume as many characters as they can to form a valid token. For example, 4159 should be one literal token, not two (e.g. 4 and 159). So when we pick up 4, we peek ahead at 1 to see if we can keep going or should return the token as is. If we peeked ahead after 4 and saw whitespace, we could return the token in its current form. A single peek means we have a look-ahead of one character.

41 LECTURE 6: SCANNING PART 2 Table-driven scanning approach, for the same DFA (S1 --c--> S2, S2 --a|b--> S1):
    State | a  | b  | c  | Return
    S1    | -  | -  | S2 | -
    S2    | S1 | S1 | -  | token
A driver program uses the current state and input character to index into the table. We can either: move to a new state; return a token (and save its image); or raise an error (and recover gracefully).
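As a concrete illustration, here is a minimal C++ sketch of such a driver for exactly this two-state table; the state encoding and the re-scan policy are assumptions, not part of the slides:

    #include <iostream>
    #include <string>

    int main() {
        // trans[state][ch]: 0 = error, 1 = S1, 2 = S2; all entries start as error
        int trans[3][256] = {};
        trans[1]['c'] = 2;                          // S1 --c--> S2
        trans[2]['a'] = 1;                          // S2 --a--> S1
        trans[2]['b'] = 1;                          // S2 --b--> S1
        bool accepting[3] = {false, false, true};   // only S2 may return a token

        std::string input = "cac c", token;
        int state = 1;
        for (std::size_t i = 0; i <= input.size(); ++i) {
            unsigned char ch = (i < input.size()) ? input[i] : ' ';  // ' ' as sentinel
            int next = trans[state][ch];
            if (next == 0) {                        // no transition: accept or fail
                if (!accepting[state]) { std::cerr << "error\n"; return 1; }
                std::cout << "token: " << token << "\n";
                token.clear();
                state = 1;
                if (ch != ' ') --i;                 // re-scan the char that ended the token
            } else {
                token += ch;                        // extend the current token image
                state = next;
            }
        }
    }

On the input cac c it prints the tokens cac and c. A driver like this never changes when the token definitions change; only the table does.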

42 LECTURE 7: PARSING So now that we know the ins-and-outs of how compilers determine the valid tokens of a program, we can talk about how they determine valid patterns of tokens. A parser is the part of the compiler which is responsible for serving as the recognizer of the programming language, in the same way that the scanner is the recognizer for the tokens.

43 LECTURE 7: PARSING Even though we typically picture parsing as the stage that comes after scanning, this isn't really the case. In a real scenario, the parser generally calls the scanner as needed to obtain input tokens. It creates a parse tree out of the tokens and passes it to the later stages of the compiler. This style of compilation is known as syntax-directed translation.

44 LECTURE 7: PARSING Let's review context-free grammars. Each context-free grammar has four components: a finite set of tokens (terminal symbols) T; a finite set of non-terminals N; a finite set of productions of the form N → (T ∪ N)*; and a special non-terminal called the start symbol. The idea is similar to regular expressions, except that we can create recursive definitions. Therefore, context-free grammars are more expressive.

45 LECTURE 7: PARSING Given a context-free grammar, parsing is the process of determining whether the start symbol can derive the program. If the derivation succeeds, the program is valid; if it fails, the program is invalid.

46 LECTURE 7: PARSING There are two classes of grammars for which linear-time parsers can be constructed: LL Left-to-right, leftmost derivation Input is read from left to right. Derivation is left-most. Can be hand-written or generated by a parser generator. LR Left-to-right, rightmost derivation Input is read from left to right. Derivation is right-most. More common, larger class of grammars. Almost always automatically generated.

47 LECTURE 7: PARSING LL parsers are Top-Down ("Predictive") parsers: they construct the parse tree from the root down, predicting the production used based on some lookahead. LR parsers are Bottom-Up parsers: they construct the parse tree from the leaves up, joining nodes together under single parents.

48 LECTURE 8: PARSING There are two types of LL parsers: recursive descent parsers and table-driven top-down parsers. A recursive descent parser is an LL parser in which every non-terminal in the grammar corresponds to a subroutine of the parser. Typically hand-written, but can be automatically generated. Used when a language is relatively simple.
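To make this concrete, here is a minimal hand-written recursive descent sketch for the expression grammar from Lecture 7; treating raw characters as tokens and the crude error handling are simplifying assumptions:

    #include <cctype>
    #include <cstdio>
    #include <cstdlib>

    // Grammar: expr -> term expr_tail        expr_tail -> + term expr_tail | ε
    //          term -> factor term_tail      term_tail -> * factor term_tail | ε
    //          factor -> ( expr ) | int
    const char* input;   // next unread character (characters stand in for tokens)

    void error() { std::fprintf(stderr, "parse error\n"); std::exit(1); }
    void expect(char c) { if (*input++ != c) error(); }
    void expr();         // forward declaration: factor is mutually recursive with expr

    void factor() {      // factor -> ( expr ) | int
        if (*input == '(') { expect('('); expr(); expect(')'); }
        else if (std::isdigit((unsigned char)*input)) {
            while (std::isdigit((unsigned char)*input)) ++input;
        } else error();
    }
    void term_tail() { if (*input == '*') { expect('*'); factor(); term_tail(); } }  // else ε
    void term() { factor(); term_tail(); }
    void expr_tail() { if (*input == '+') { expect('+'); term(); expr_tail(); } }    // else ε
    void expr() { term(); expr_tail(); }

    int main() {
        input = "(3+1)*5";
        expr();
        if (*input != '\0') error();   // require that all input was consumed
        std::puts("accepted");
    }

Each subroutine consumes exactly the tokens its non-terminal derives; one character of lookahead decides between a production and ε.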

49 LECTURE 8: PARSING In a table-driven parser, we have two elements: A driver program, which maintains a stack of symbols. (language independent) A parsing table, typically automatically generated. (language dependent)

50 LECTURE 8: PARSING Here's the general method for performing table-driven parsing. We have a stack of grammar symbols (initially, we just push the start symbol), a string of input tokens ending with $, and a parsing table M[N, T], which we index using the non-terminal at the top of the stack and the current input token.
1. If top == input == $: accept.
2. If top == input: pop the top of the stack, read a new input token, go to 1.
3. If top is a non-terminal: if M[top, input] is a production, pop the top of the stack, replace it with the production's right-hand side, and go to 1; else error.
4. Else error.
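The loop above is short enough to sketch directly. A minimal C++ driver, assuming a tiny hypothetical grammar (expr → int expr_tail, expr_tail → + int expr_tail | ε) just to keep the table small:

    #include <iostream>
    #include <map>
    #include <stack>
    #include <string>
    #include <vector>

    int main() {
        using RHS = std::vector<std::string>;
        // M[(non-terminal, token)] = right-hand side of the predicted production
        std::map<std::pair<std::string, std::string>, RHS> M = {
            {{"expr", "int"}, {"int", "expr_tail"}},          // expr -> int expr_tail
            {{"expr_tail", "+"}, {"+", "int", "expr_tail"}},  // expr_tail -> + int expr_tail
            {{"expr_tail", "$"}, {}},                         // expr_tail -> ε
        };
        std::vector<std::string> input = {"int", "+", "int", "$"};
        std::stack<std::string> st;
        st.push("$");
        st.push("expr");                                      // start symbol
        std::size_t i = 0;
        while (true) {
            const std::string top = st.top();
            if (top == "$" && input[i] == "$") { std::cout << "accept\n"; break; }
            if (top == input[i]) { st.pop(); ++i; }           // step 2: match a terminal
            else if (M.count({top, input[i]})) {              // step 3: predict
                st.pop();
                const RHS& rhs = M[{top, input[i]}];
                for (auto it = rhs.rbegin(); it != rhs.rend(); ++it) st.push(*it);
            } else { std::cout << "error\n"; break; }         // step 4
        }
    }

Pushing the right-hand side in reverse keeps its leftmost symbol on top of the stack.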

51 LECTURE 8: PARSING Calculating an LL(1) parsing table includes calculating the first and follow sets. This is how we make decisions about which production to take based on the input.

52 LECTURE 8: PARSING First Sets Case 1: let's say N → ω. To figure out which input tokens will allow us to replace N with ω, we calculate First(ω): the set of tokens which could start a string derived from ω.
If X is a terminal symbol, First(X) = {X}.
If X is ε, add ε to First(X).
If X is a non-terminal, look at all productions where X is on the left-hand side. Each production has the form X → Y1 Y2 ... Yk, where each Yi is a non-terminal or terminal. Then:
Put First(Y1) - ε in First(X).
If ε is in First(Y1), then put First(Y2) - ε in First(X).
If ε is in First(Y2), then put First(Y3) - ε in First(X), and so on.
If ε is in First(Yi) for every one of Y1, Y2, ..., Yk, then add ε to First(X).
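For example, for the expression grammar of Lecture 7 (expr → term expr_tail, expr_tail → + term expr_tail | ε, term → factor term_tail, term_tail → * factor term_tail | ε, factor → ( expr ) | int):

    First(factor)    = { (, int }
    First(term_tail) = { *, ε }
    First(term)      = First(factor) = { (, int }
    First(expr_tail) = { +, ε }
    First(expr)      = First(term) = { (, int }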

53 LECTURE 8: PARSING If we compute First(X) for every terminal and non-terminal X in a grammar, then we can compute First(ω), the tokens which can start any string derived from ω. Why do we care about the First(ω) sets? During parsing, suppose the top-of-stack symbol is non-terminal A and there are two productions A → α and A → β. Suppose also that the current token is a. If First(α) includes a, then we can predict that this is the production taken.

54 LECTURE 8: PARSING Follow Sets Follow(N) gives us the set of terminal symbols that could follow the non-terminal symbol N. To calculate Follow(N), do the following:
If N is the starting non-terminal, put EOF (or another program-ending symbol such as $) in Follow(N).
If X → αN, where α is some string of non-terminals and/or terminals, put Follow(X) in Follow(N).
If X → αNβ, where α and β are some strings of non-terminals and/or terminals, put First(β) - ε in Follow(N). If First(β) includes ε, then also put Follow(X) in Follow(N).
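Continuing the same expression grammar, the Follow sets work out to:

    Follow(program)   = { $ }
    Follow(expr)      = { $, ) }        (from program → expr and factor → ( expr ))
    Follow(expr_tail) = Follow(expr) = { $, ) }
    Follow(term)      = (First(expr_tail) - ε) ∪ Follow(expr) = { +, $, ) }
    Follow(term_tail) = Follow(term) = { +, $, ) }
    Follow(factor)    = (First(term_tail) - ε) ∪ Follow(term) = { *, +, $, ) }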

55 LECTURE 8: PARSING Why do we care about the Follow(N) sets? During parsing, suppose the top-of-stack symbol is non-terminal A and there are two productions A → α and A → β. Suppose also that the current token is a. What if neither First(α) nor First(β) contains a, but one of them contains ε? We use the Follow sets to determine whether to take the ε-deriving production.

56 LECTURE 9: COMPUTING AN LL(1) PARSING TABLE The basic outline for creating a parsing table from an LL(1) grammar is the following: Compute the First sets of the non-terminals. Compute the Follow sets of the non-terminals. For each production N → ω: add N → ω to M[N, t] for each t in First(ω); if First(ω) contains ε, add N → ω to M[N, t] for each t in Follow(N). All undefined entries represent a parsing error.

57 LECTURE 9: COMPUTING AN LL(1) PARSING TABLE
    stmt → if expr then stmt else stmt
    stmt → while expr do stmt
    stmt → begin stmts end
    stmts → stmt ; stmts
    stmts → ε
    expr → id
Let's compute the LL(1) parsing table for this grammar and parse the string: while id do begin begin end ; end $ (worked below).
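Working it out:

    First(stmt)  = { if, while, begin }      Follow(stmt)  = { $, else, ; }
    First(stmts) = { if, while, begin, ε }   Follow(stmts) = { end }
    First(expr)  = { id }                    Follow(expr)  = { then, do }

The table therefore has: M[stmt, if] = if expr then stmt else stmt; M[stmt, while] = while expr do stmt; M[stmt, begin] = begin stmts end; M[stmts, if] = M[stmts, while] = M[stmts, begin] = stmt ; stmts; M[stmts, end] = ε (since ε is in First(stmts) and end is in Follow(stmts)); M[expr, id] = id. Parsing while id do begin begin end ; end $ then predicts, in order: stmt → while expr do stmt, expr → id, stmt → begin stmts end, stmts → stmt ; stmts, stmt → begin stmts end, stmts → ε, stmts → ε, and accepts on $.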

58 LECTURE 10: SEMANTIC ANALYSIS We've discussed in previous lectures how the syntax analysis phase of compilation results in the creation of a parse tree. Semantic analysis is performed by annotating, or decorating, the parse tree. These annotations are known as attributes. An attribute grammar connects syntax with semantics.

59 LECTURE 10: SEMANTIC ANALYSIS Attribute Grammars Each grammar production has a semantic rule with actions (e.g. assignments) to modify the values of attributes of (non)terminals. A (non)terminal may have any number of attributes. Attributes have values that hold information related to the (non)terminal. General form:
    production: <A> → <B> <C>
    semantic rule: A.a := ...; B.a := ...; C.a := ...

60 LECTURE 10: SEMANTIC ANALYSIS Some points to remember: A (non)terminal may have any number of attributes. The val attribute of a (non)terminal holds the subtotal value of the subexpression. Non-terminals are indexed in the attribute grammar to distinguish multiple occurrences of the same non-terminal in a production; this has no bearing on the grammar itself. Strictly speaking, attribute grammars only contain copy rules and semantic functions, and semantic functions may only refer to attributes in the current production.

61 LECTURE 10: SEMANTIC ANALYSIS Strictly speaking, attribute grammars only consist of copy rules and calls to semantic functions. But in practice, we can use well-defined notation to make the semantic rules look more code-like.
    E1 → E2 + T    E1.val := E2.val + T.val
    E1 → E2 - T    E1.val := E2.val - T.val
    E → T          E.val := T.val
    T1 → T2 * F    T1.val := T2.val * F.val
    T1 → T2 / F    T1.val := T2.val / F.val
    T → F          T.val := F.val
    F1 → - F2      F1.val := -F2.val
    F → ( E )      F.val := E.val
    F → const      F.val := const.val

62 LECTURE 10: SEMANTIC ANALYSIS Evaluation of the attributes is called the decoration of the parse tree. Imagine we have the string (1+3)*2. In its parse tree, the val attribute of each symbol is shown beside it: const 1 and const 3 give F.val = 1 and F.val = 3, which combine into E.val = 4 inside the parentheses; that F.val = 4 together with const 2 gives T.val = 8, and E.val = 8 at the root. Attribute flow is upward in this case: the val of the overall expression is the val of the root.

63 LECTURE 10: SEMANTIC ANALYSIS Each grammar production A → ω is associated with a set of semantic rules of the form b := f(c1, c2, ..., ck). If b is an attribute associated with A, it is called a synthesized attribute. If b is an attribute associated with a grammar symbol on the right side of the production (that is, in ω), then b is called an inherited attribute.

64 LECTURE 10: SEMANTIC ANALYSIS Synthesized attributes of a node hold values that are computed from attribute values of the child nodes in the parse tree, so information flows upward.
    production: E1 → E2 + T
    semantic rule: E1.val := E2.val + T.val
(e.g. E.val = 4 is computed from the children's E.val = 1 and T.val = 3)
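In code form, a bottom-up walk computes synthesized attributes naturally. A minimal C++ sketch over a hand-built tree for (1+3)*2, the example decorated above (the Node layout is an assumption):

    #include <iostream>

    struct Node {
        char op;      // '+', '*', or 'c' for a constant leaf
        int val;      // the synthesized attribute
        Node* left;
        Node* right;
    };

    int eval(Node& n) {
        if (n.op == 'c') return n.val;              // leaf: val is given
        int l = eval(*n.left);                      // children are evaluated first...
        int r = eval(*n.right);
        n.val = (n.op == '+') ? l + r : l * r;      // ...then the parent: upward flow
        return n.val;
    }

    int main() {
        Node one{'c', 1, nullptr, nullptr}, three{'c', 3, nullptr, nullptr};
        Node two{'c', 2, nullptr, nullptr};
        Node plus{'+', 0, &one, &three};            // (1 + 3)
        Node times{'*', 0, &plus, &two};            // (1 + 3) * 2
        std::cout << eval(times) << "\n";           // prints 8
    }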

65 LECTURE 10: SEMANTIC ANALYSIS Inherited attributes of child nodes are set by the parent node or sibling nodes, so information flows downward. Consider the following attribute grammar:
    D → T L        L.in := T.type
    T → int        T.type := integer
    T → real       T.type := real
    L → L1 , id    L1.in := L.in; addtype(id.entry, L.in)
    L → id         addtype(id.entry, L.in)
Example declaration: real id1, id2, id3

66 LECTURE 10: SEMANTIC ANALYSIS In the same way that a context-free grammar does not indicate how a string should be parsed, an attribute grammar does not specify how the attribute rules should be applied. It merely defines the set of valid decorated parse trees, not how they are constructed. An attribute flow algorithm propagates attribute values through the parse tree by traversing the tree according to the set (write) and use (read) dependencies (an attribute must be set before it is used).

67 LECTURE 10: SEMANTIC ANALYSIS A grammar is called S-attributed if all attributes are synthesized. A grammar is called L-attributed if the parse tree traversal to update attribute values is always left-to-right and depth-first. For a production A → X1 X2 ... Xn, the attributes of Xj (1 ≤ j ≤ n) only depend on: the attributes of X1 X2 ... Xj-1, and the inherited attributes of A. Values of inherited attributes must be passed down to children from left to right. Semantic rules can be applied immediately during parsing, and parse trees do not need to be kept in memory. This is an essential grammar property for a one-pass compiler. An S-attributed grammar is a special case of an L-attributed grammar.

68 NAMES A name is a mnemonic character string used to represent something else. Names typically consist of alphanumeric characters (e.g. myint) but can also be other symbols (e.g. +). Names enable programmers to refer to variables, constants, operations, and types instead of low-level concepts such as memory addresses. Names are essential in high-level languages for supporting abstraction. In this context, abstraction refers to the ability to hide a program fragment behind a name. By hiding the details, we can use the name as a black box: we only need to consider the object's purpose, rather than its implementation.

69 NAMES Names enable control abstractions and data abstractions in high-level languages. Control Abstraction Subroutines (procedures and functions) allow programmers to focus on a manageable subset of the program text; the subroutine interface hides implementation details. Control flow constructs (if-then, while, for, return) hide low-level machine operations. Data Abstraction Object-oriented classes hide data representation details behind a set of operations.

70 BINDING A binding is an association between a name and an entity. The binding time is the time at which a binding is created or, in other words, when an implementation decision is made. There are many different times when binding can occur. Language design time: the design of specific language constructs.
Syntax (names → grammar): if (a>0) b:=a; (C syntax style)
Keywords (names → builtins): class (C++ and Java), extern
Reserved words (names → special constructs): main (C)
Meaning of operators (operator → operation): + (add), % (mod), ** (power)
Built-in primitive types (type name → type): float, short, int, long, string

71 BINDING Language implementation time: fixation of implementation constants. Examples: precision of types, organization and maximum sizes of stack and heap, etc. Program writing time: the programmer's choice of algorithms and data structures. Examples: A function may be called sum_grades(), a variable may be called x. Compile time: the time of translation of high-level constructs to machine code and choice of memory layout for data objects. Example: translate for(i=0; i<100; i++) a[i] = 1.0;? Link time: the time at which multiple object codes (machine code files) and libraries are combined into one executable. Example: which cout routine to use? /usr/lib/libc.a or /usr/lib/libc.so?

72 BINDING Load time: when the operating system loads the executable into memory. Example: in an older OS, the binding between a global variable and its physical memory location is determined at load time. Run time: when a program executes. Example: the binding between a variable and its value.

73 OBJECT LIFETIME Key events in an object's lifetime: Object creation. Creation of bindings. The object is manipulated via its bindings. Deactivation and reactivation of (temporarily invisible) bindings (in and out of scope). Destruction of bindings. Destruction of the object. The time between binding creation and binding destruction is the binding's lifetime. The time between object creation and object destruction is the object's lifetime.

74 DANGLING REFERENCE When a binding's lifetime exceeds the object's lifetime, we have a dangling reference. Typically, this is a sign of a bug.
    SomeClass* myobject = new SomeClass;
    foo(myobject);

    void foo(SomeClass* a) {
        delete myobject;   // myobject is a global variable
        a->action();       // dangling: a points to the object just deleted
    }

75 MEMORY LEAKS When all bindings are destroyed but the object still exists, we have a memory leak.
    {
        SomeClass* myobject = new SomeClass;
        myobject->action();
        return;   // myobject goes out of scope; the object is never deleted
    }

76 STORAGE MANAGEMENT Obviously, objects need to be stored somewhere during the execution of the program. The lifetime of the object, however, generally decides the storage mechanism used. We can divide them up into three categories. The objects that are alive throughout the execution of a program (e.g. global variables). The objects that are alive within a routine (e.g. local variables). The objects whose lifetime can be dynamically changed (the objects that are managed by the new/delete constructs).

77 STORAGE MANAGEMENT The three types of objects correspond to three principal storage allocation mechanisms. Static objects have an absolute storage address that is retained throughout the execution of the program. Global variables and data. Subroutine code and class method code. Stack objects are allocated in last-in first-out order, usually in conjunction with subroutine calls and returns. Actual arguments passed by value to a subroutine. Local variables of a subroutine. Heap objects may be allocated and deallocated at arbitrary times, but require an expensive storage management algorithm. Dynamically allocated data in C++. Java class instances are always stored on the heap.
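A minimal C++ illustration of all three mechanisms side by side (the variable names are just examples):

    #include <iostream>

    int global = 0;                 // static: one copy at an absolute address,
                                    // alive for the whole program

    void f() {
        int local = 42;             // stack: allocated in f's frame, gone on return
        static int calls = 0;       // static: keeps its state across invocations
        ++calls;
        int* p = new int(7);        // heap: lives until explicitly deleted
        std::cout << local + *p + calls << "\n";
        delete p;                   // without this, each call would leak an int
    }

    int main() {
        f();                        // prints 50
        f();                        // prints 51: 'calls' survived the first call
    }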

78 TYPICAL PROGRAM/DATA LAYOUT IN MEMORY From higher addresses to lower: stack, heap, static data, code. Program code is at the bottom of the memory region (code section). The code section is protected from runtime modification by the OS. Static data objects are stored in the static region. The stack grows downward; the heap grows upward.

79 STATIC ALLOCATION Program code is statically allocated in most implementations of imperative languages. Statically allocated variables are history sensitive: global variables keep state during the entire program lifetime; static local variables in C/C++ functions keep state across function invocations; static data members are shared by objects and keep state during the program lifetime. The advantage of statically allocated objects is fast access, due to absolute addressing of the object. Can static allocation be used for local variables? No: static allocation provides only one copy of each variable, so it cannot deal with cases where multiple copies of a local variable are alive. When does this happen? With recursion: each active call needs its own copy of the locals.

80 STACK ALLOCATION Each instance of a subroutine that is active has a subroutine frame (sometimes called activation record) on the run-time stack. Compiler generates subroutine calling sequence to setup frame, call the routine, and to destroy the frame afterwards. Subroutine frame layouts vary between languages, implementations, and machine platforms.

81 TYPICAL STACK-ALLOCATED SUBROUTINE FRAME Typical subroutine frame layout, from lower addresses (at the fp) to higher: temporary storage (e.g. for expression evaluation), local variables, bookkeeping (e.g. saved CPU registers), return address, subroutine arguments and returns. Most modern processors have two registers, fp (frame pointer) and sp (stack pointer), to support efficient execution of subroutines in high-level languages. The frame pointer (fp) points to the frame of the currently active subroutine at run time. Subroutine arguments, local variables, and return values are accessed by constant address offsets from the fp.

82 SUBROUTINE FRAMES ON THE STACK Subroutine frames are pushed and popped onto/from the runtime stack. The stack pointer (sp) points to the next available free space on the stack, where a new frame is pushed when a subroutine is called. The frame pointer (fp) points to the frame of the currently active subroutine, which is always the topmost frame on the stack. The fp of the previous active frame is saved in the current frame and restored after the call. In this example, M called A, A called B, and B called A, so the stack holds frames for M, A, B, and A in that order, each containing arguments, return address, bookkeeping, local variables, and temporaries, with fp pointing at the topmost A frame and sp just above it.

83 HEAP ALLOCATION The heap is used to store objects whose lifetime is dynamic. Implicit heap allocation: done automatically. Java class instances are placed on the heap. Scripting languages and functional languages make extensive use of the heap for storing objects. Some procedural languages allow array declarations with run-time dependent array size. Resizable character strings. Explicit heap allocation: statements and/or functions for allocation and deallocation, e.g. malloc/free, new/delete.

84 HEAP ALLOCATION PROBLEMS The heap is a large block of memory (say N bytes). Requests for memory of various sizes may arrive randomly, e.g. when a program executes new. Each request may ask for 1 to N bytes. If a request of X bytes is granted, a contiguous block of X bytes in the heap is allocated for the request. The memory is used for a while and then returned to the system (e.g. when the program executes delete). The problem: how can we make sure memory is allocated such that as many requests as possible are satisfied?

85 HEAP ALLOCATION EXAMPLE Example: 10 KB of memory to be managed. r1 = req(1K); r2 = req(2K); r3 = req(4K); free(r2); free(r1); r4 = req(4K); How we assign memory makes a difference! Internal fragmentation: unused memory within a block, e.g. asking for 100 bytes and getting a 512-byte block. External fragmentation: unused memory between blocks. Even when the total available memory is more than a request, the request cannot be satisfied, as in the example (traced below).
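Tracing it, assuming first-fit allocation: r1 takes [0, 1K), r2 takes [1K, 3K), r3 takes [3K, 7K). After free(r2) and free(r1), the free memory is [0, 3K) plus [7K, 10K): 6 KB in total, but the largest contiguous block is only 3 KB, so r4 = req(4K) fails even though enough total memory is free. That is exactly the external fragmentation described above.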

86 GARBAGE COLLECTION Explicit manual deallocation errors are among the most expensive and hard to detect problems in real-world applications. If an object is deallocated too soon, a reference to the object becomes a dangling reference. If an object is never deallocated, the program leaks memory. Automatic garbage collection removes all objects from the heap that are not accessible, i.e. are not referenced. Used in Lisp, Scheme, Prolog, Ada, Java, Haskell. Disadvantage is GC overhead, but GC algorithm efficiency has been improved. Not always suitable for real-time processing.

87 GARBAGE COLLECTION How does it work roughly? The language defines the lifetime of objects. The runtime keeps track of the number of references (bindings) to each object. Increment when a new reference is made, decrement when the reference is destroyed. Can delete when the reference count is 0. Need to determine when a variable is alive or dead based on language specification.
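The counting scheme sketched above is reference counting. C++'s std::shared_ptr is a library facility rather than a garbage collector, but it demonstrates the same mechanics:

    #include <iostream>
    #include <memory>

    struct Widget {
        ~Widget() { std::cout << "widget destroyed\n"; }
    };

    int main() {
        auto a = std::make_shared<Widget>();     // reference count = 1
        {
            auto b = a;                          // new reference: count = 2
            std::cout << a.use_count() << "\n";  // prints 2
        }                                        // b destroyed: count back to 1
        std::cout << a.use_count() << "\n";      // prints 1
    }                                            // count hits 0: Widget deleted

One known limitation of pure reference counting is that it cannot reclaim cyclic structures, which is one reason tracing collectors are common.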

88 SCOPE Statically scoped language: the scope of bindings is determined at compile time. Used by almost all but a few programming languages. More intuitive than dynamic scoping. We can take a C program and know exactly which names refer to which objects at which points in the program solely by looking at the code. Dynamically scoped language: the scope of bindings is determined at run time. Used in Lisp (early versions), APL, Snobol, and Perl (selectively). Bindings depend on the flow of execution at runtime.

89 SCOPE The set of active bindings at any point in time is known as the referencing environment. Determined by scope rules. May also be determined by binding rules. There are two options for determining the reference environment: Deep binding: choice is made when the reference is first created. Shallow binding: choice is made when the reference is first used. Relevant for dynamically-scoped languages.

90 STATIC SCOPING The bindings between names and objects can be determined by examination of the program text. The scope rules of a programming language define the scope of variables and subroutines: the region of program text in which a name-to-object binding is usable. Early Basic: all variables are global and visible everywhere. Fortran 77: the scope of a local variable is limited to a subroutine; the scope of a global variable is the whole program text unless it is hidden by a local variable declaration with the same variable name. Algol 60, Pascal, and Ada: these languages allow nested subroutine definitions and adopt the closest nested scope rule: bindings introduced in some scope are valid in all internally nested scopes unless hidden by another binding to the same name.

91 CLOSEST NESTED SCOPE RULE To find the object referenced by a given name: look for a declaration in the current innermost scope; if there is none, look for a declaration in the immediately surrounding scope, and so on.
    def f1(a1):
        x = 1
        def f2(a2):
            def f3(a3):
                print "x in f3: ", x
                # body of f3: f3, a3, f2, a2, x in f1, f1, a1 visible
            # body of f2: f3, f2, a2, x in f1, f1, a1 visible
        def f4(a4):
            def f5(a5):
                x = 2
                # body of f5: x in f5, f5, a5, f4, a4, f2, f1, a1 visible
            # body of f4: f5, f4, a4, f2, x in f1, f1, a1 visible
        # body of f1: x in f1, f1, a1, f2, f4 visible

92 STATIC LINKS In the previous lecture, we saw how we can use offsets from the current frame pointer to access local objects in the current subroutine. What if I'm referencing a local variable of an enclosing subroutine? How can I find the frame that holds this variable? The order of stack frames will not necessarily correspond to the lexical nesting. But the enclosing subroutine must appear somewhere on the stack, as I couldn't have called the current subroutine without first calling the enclosing subroutine.

93 STATIC LINKS We maintain information about the lexically surrounding subroutine by creating a static link between a frame and the frame of its lexical parent.
    def f1():
        x = 1
        def f2():
            print x
            def f3():
                print x
            def f4():
                print x
                f3()
            def f5():
                print x
                f4()
            f5()
        f2()

    if __name__ == "__main__":
        f1()   # executes first!
With the call chain f1 → f2 → f5 → f4 → f3, the stack holds (from fp down) f3, f4, f5, f2, f1; the static links of f3, f4, and f5 point to f2's frame, and f2's points to f1's frame, where x is found.

94 DYNAMIC SCOPING Scope rule: the "current" binding for a given name is the one encountered most recently during execution. Typically adopted in (early) functional languages that are interpreted. With dynamic scope: Name-to-object bindings cannot be determined by a compiler in general. Easy for interpreter to look up name-to-object binding in a stack of declarations. Generally considered to be a bad programming language feature. Hard to keep track of active bindings when reading a program text. Most languages are now compiled, or a compiler/interpreter mix.

95 DYNAMIC SCOPING IMPLEMENTATION Each time a subroutine is called, its local variables are pushed onto the stack with their name-to-object binding. When a reference to a variable is made, the stack is searched top-down for the variable's name-to-object binding. After the subroutine returns, the bindings of the local variables are popped. Different implementations of a binding stack are used in programming languages with dynamic scope, each with advantages and disadvantages.
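One common realization of this is a central reference table: each name maps to a stack of its active bindings. A minimal C++ sketch of that idea, reusing the thres/older/show names from the next two slides (the int-only table is an assumption):

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // Central table: each name maps to a stack of active bindings.
    std::map<std::string, std::vector<int>> bindings;

    void bind(const std::string& n, int v) { bindings[n].push_back(v); }  // on entry
    void unbind(const std::string& n) { bindings[n].pop_back(); }         // on return
    int lookup(const std::string& n) { return bindings[n].back(); }       // newest wins

    void older() {
        std::cout << "older sees thres = " << lookup("thres") << "\n";
    }

    void show() {
        bind("thres", 20);   // show's local thres dynamically shadows main's
        older();             // dynamic scoping: prints 20
        unbind("thres");
    }

    int main() {
        bind("thres", 35);   // main's thres
        show();
        unbind("thres");
    }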

96 DYNAMIC SCOPING Deep binding: the reference environment of older is established with the first reference to older, which is when it is passed as an argument to show.
    thres : integer
    function older(p : person) : boolean
        return p.age > thres
    procedure show(p : person, c : function)
        thres : integer
        thres := 20
        if c(p) then write(p)
    procedure main(p)
        thres := 35
        show(p, older)
Execution: main(p) sets the global thres to 35 and calls show(p, older); show sets its local thres to 20 and calls older(p), which returns p.age > thres; if the return value is true, show writes p. With deep binding, older sees the global thres (35), the binding in effect when older was passed to show.

97 DYNAMIC SCOPING Shallow binding: the reference environment of older is established with the call to older inside show. Using the same program as on the previous slide, older now evaluates p.age > thres against show's local thres (20), the most recent binding at the time of the call, rather than the global thres (35).


COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table COMPILER CONSTRUCTION Lab 2 Symbol table LABS Lab 3 LR parsing and abstract syntax tree construction using ''bison' Lab 4 Semantic analysis (type checking) PHASES OF A COMPILER Source Program Lab 2 Symtab

More information

Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 5 Introduction to Parsing Lecture 5 1 Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2 Languages and Automata Formal languages are very important

More information

Semantic Analysis. Lecture 9. February 7, 2018

Semantic Analysis. Lecture 9. February 7, 2018 Semantic Analysis Lecture 9 February 7, 2018 Midterm 1 Compiler Stages 12 / 14 COOL Programming 10 / 12 Regular Languages 26 / 30 Context-free Languages 17 / 21 Parsing 20 / 23 Extra Credit 4 / 6 Average

More information

Names, Scopes, and Bindings. CSE 307 Principles of Programming Languages Stony Brook University

Names, Scopes, and Bindings. CSE 307 Principles of Programming Languages Stony Brook University Names, Scopes, and Bindings CSE 307 Principles of Programming Languages Stony Brook University http://www.cs.stonybrook.edu/~cse307 1 Names, Scopes, and Bindings Names are identifiers (mnemonic character

More information

Programming Languages

Programming Languages Programming Languages Tevfik Koşar Lecture - VIII February 9 th, 2006 1 Roadmap Allocation techniques Static Allocation Stack-based Allocation Heap-based Allocation Scope Rules Static Scopes Dynamic Scopes

More information

CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution

More information

CS 415 Midterm Exam Spring SOLUTION

CS 415 Midterm Exam Spring SOLUTION CS 415 Midterm Exam Spring 2005 - SOLUTION Name Email Address Student ID # Pledge: This exam is closed note, closed book. Questions will be graded on quality of answer. Please supply the best answer you

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem How to precisely

More information

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata

More information

VIVA QUESTIONS WITH ANSWERS

VIVA QUESTIONS WITH ANSWERS VIVA QUESTIONS WITH ANSWERS 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the

More information

3. Parsing. Oscar Nierstrasz

3. Parsing. Oscar Nierstrasz 3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/

More information

The role of semantic analysis in a compiler

The role of semantic analysis in a compiler Semantic Analysis Outline The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors

More information

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08 CS412/413 Introduction to Compilers Tim Teitelbaum Lecture 2: Lexical Analysis 23 Jan 08 Outline Review compiler structure What is lexical analysis? Writing a lexer Specifying tokens: regular expressions

More information

Lexical and Syntax Analysis. Top-Down Parsing

Lexical and Syntax Analysis. Top-Down Parsing Lexical and Syntax Analysis Top-Down Parsing Easy for humans to write and understand String of characters Lexemes identified String of tokens Easy for programs to transform Data structure Syntax A syntax

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. COMPILER DESIGN 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the target

More information

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 20 Intermediate code generation Part-4 Run-time environments

More information

Compiler Theory. (Semantic Analysis and Run-Time Environments)

Compiler Theory. (Semantic Analysis and Run-Time Environments) Compiler Theory (Semantic Analysis and Run-Time Environments) 005 Semantic Actions A compiler must do more than recognise whether a sentence belongs to the language of a grammar it must do something useful

More information

A programming language requires two major definitions A simple one pass compiler

A programming language requires two major definitions A simple one pass compiler A programming language requires two major definitions A simple one pass compiler [Syntax: what the language looks like A context-free grammar written in BNF (Backus-Naur Form) usually suffices. [Semantics:

More information

Wednesday, September 9, 15. Parsers

Wednesday, September 9, 15. Parsers Parsers What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

More information

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs: What is a parser Parsers A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure of a program (think: diagramming a sentence) Agenda

More information

Chapter 3: Lexing and Parsing

Chapter 3: Lexing and Parsing Chapter 3: Lexing and Parsing Aarne Ranta Slides for the book Implementing Programming Languages. An Introduction to Compilers and Interpreters, College Publications, 2012. Lexing and Parsing* Deeper understanding

More information

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream

More information

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2 Formal Languages and Grammars Chapter 2: Sections 2.1 and 2.2 Formal Languages Basis for the design and implementation of programming languages Alphabet: finite set Σ of symbols String: finite sequence

More information

Wednesday, August 31, Parsers

Wednesday, August 31, Parsers Parsers How do we combine tokens? Combine tokens ( words in a language) to form programs ( sentences in a language) Not all combinations of tokens are correct programs (not all sentences are grammatically

More information

Time : 1 Hour Max Marks : 30

Time : 1 Hour Max Marks : 30 Total No. of Questions : 6 P4890 B.E/ Insem.- 74 B.E ( Computer Engg) PRINCIPLES OF MODERN COMPILER DESIGN (2012 Pattern) (Semester I) Time : 1 Hour Max Marks : 30 Q.1 a) Explain need of symbol table with

More information

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below. UNIT I Translator: It is a program that translates one language to another Language. Examples of translator are compiler, assembler, interpreter, linker, loader and preprocessor. Source Code Translator

More information

Lexical and Syntax Analysis

Lexical and Syntax Analysis Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing Easy for humans to write and understand String of characters

More information

Introduction to Lexical Analysis

Introduction to Lexical Analysis Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexical analyzers (lexers) Regular

More information

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation Language Implementation Methods The Design and Implementation of Programming Languages Compilation Interpretation Hybrid In Text: Chapter 1 2 Compilation Interpretation Translate high-level programs to

More information

Binding and Storage. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Binding and Storage. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill Binding and Storage Björn B. Brandenburg The University of North Carolina at Chapel Hill Based in part on slides and notes by S. Olivier, A. Block, N. Fisher, F. Hernandez-Campos, and D. Stotts. What s

More information

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Syntax Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Limits of Regular Languages Advantages of Regular Expressions

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2016 Lecture 3a Andrew Tolmach Portland State University 1994-2016 Formal Semantics Goal: rigorous and unambiguous definition in terms of a wellunderstood formalism (e.g.

More information

Week 2: Syntax Specification, Grammars

Week 2: Syntax Specification, Grammars CS320 Principles of Programming Languages Week 2: Syntax Specification, Grammars Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Week 2: Syntax Specification, Grammars 1/ 62 Words and Sentences

More information

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! [ALSU03] Chapter 3 - Lexical Analysis Sections 3.1-3.4, 3.6-3.7! Reading for next time [ALSU03] Chapter 3 Copyright (c) 2010 Ioanna

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Recap: Static Data Structures Outline of Lecture 18 Recap:

More information

Chapter 5. Names, Bindings, and Scopes

Chapter 5. Names, Bindings, and Scopes Chapter 5 Names, Bindings, and Scopes Chapter 5 Topics Introduction Names Variables The Concept of Binding Scope Scope and Lifetime Referencing Environments Named Constants 1-2 Introduction Imperative

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Massachusetts Institute of Technology Language Definition Problem How to precisely define language Layered structure

More information

G Programming Languages - Fall 2012

G Programming Languages - Fall 2012 G22.2110-003 Programming Languages - Fall 2012 Lecture 3 Thomas Wies New York University Review Last week Names and Bindings Lifetimes and Allocation Garbage Collection Scope Outline Control Flow Sequencing

More information

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468 Parsers Xiaokang Qiu Purdue University ECE 468 August 31, 2018 What is a parser A parser has two jobs: 1) Determine whether a string (program) is valid (think: grammatically correct) 2) Determine the structure

More information

NOTE: Answer ANY FOUR of the following 6 sections:

NOTE: Answer ANY FOUR of the following 6 sections: A-PDF MERGER DEMO Philadelphia University Lecturer: Dr. Nadia Y. Yousif Coordinator: Dr. Nadia Y. Yousif Internal Examiner: Dr. Raad Fadhel Examination Paper... Programming Languages Paradigms (750321)

More information

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine.

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine. 1. true / false By a compiler we mean a program that translates to code that will run natively on some machine. 2. true / false ML can be compiled. 3. true / false FORTRAN can reasonably be considered

More information

COP4020 Programming Languages. Semantics Prof. Robert van Engelen

COP4020 Programming Languages. Semantics Prof. Robert van Engelen COP4020 Programming Languages Semantics Prof. Robert van Engelen Overview Static semantics Dynamic semantics Attribute grammars Abstract syntax trees COP4020 Spring 2011 2 Static Semantics Syntax concerns

More information

CS 330 Lecture 18. Symbol table. C scope rules. Declarations. Chapter 5 Louden Outline

CS 330 Lecture 18. Symbol table. C scope rules. Declarations. Chapter 5 Louden Outline CS 0 Lecture 8 Chapter 5 Louden Outline The symbol table Static scoping vs dynamic scoping Symbol table Dictionary associates names to attributes In general: hash tables, tree and lists (assignment ) can

More information

When do We Run a Compiler?

When do We Run a Compiler? When do We Run a Compiler? Prior to execution This is standard. We compile a program once, then use it repeatedly. At the start of each execution We can incorporate values known at the start of the run

More information