Course Overview

PART I: overview material
  1 Introduction
  2 Language processors (tombstone diagrams, bootstrapping)
  3 Architecture of a compiler
PART II: inside a compiler
  4 Syntax analysis
  5 Contextual analysis
  6 Runtime organization
  7 Code generation
PART III: conclusion
  8 Interpretation
  9 Review

Arrays

An array is a composite data type; an array value consists of multiple values of the same type. Arrays are in some sense like records, except that their elements all have the same type.

The elements of arrays are typically indexed using an integer value (in some languages, for example Pascal, other ordinal types can also be used for indexing arrays).

Two kinds of arrays (with different runtime representation schemas):
- static arrays: their size (number of elements) is known at compile time.
- dynamic arrays: their size cannot be known at compile time, because the number of elements may vary at run time.

Q: Which are the cheapest arrays? Why?

Example: static arrays

  type Name = array 6 of Char;
  var me: Name;
  var names: array 2 of Name

[Figure: me stored as six consecutive Char cells me[0] ... me[5]; names stored as names[0][0] ... names[0][5] immediately followed by names[1][0] ... names[1][5].]

  type Code = record c: Char, n: Integer end;
  var code: array 3 of Code

[Figure: code stored as code[0].c, code[0].n, code[1].c, code[1].n, code[2].c, code[2].n.]

In general:

  type T = array n of TE;
  var a: T;

  size[T] = n * size[TE]
  address[a[0]] = address[a]
  address[a[1]] = address[a] + size[TE]
  address[a[2]] = address[a] + 2 * size[TE]
  ...
  address[a[k]] = address[a] + k * size[TE]

With a different lower bound (Pascal):

  type T = array [1..10] of TE;
  var a: T;

  size[T] = 10 * size[TE]
  address[a[1]] = address[a]
  address[a[2]] = address[a] + size[TE]
  address[a[6]] = address[a] + 5 * size[TE]
  ...
  address[a[k]] = address[a] + (k - 1) * size[TE]
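The address formulas above are plain arithmetic, so they are easy to check mechanically. Below is a minimal Python sketch, assuming sizes are counted in abstract memory units with size[Char] = 1 and assuming hypothetical base addresses (100 for names, 200 for the Pascal array, whose size[TE] = 4 is also made up); the function names are illustrative. Note how names[1][3] is found by applying the formula twice: once for the outer array of Name values, then once inside the selected Name.

```python
def array_size(n: int, elem_size: int) -> int:
    """size[T] for T = array n of TE, where elem_size = size[TE]."""
    return n * elem_size

def element_address(base: int, k: int, elem_size: int, lower: int = 0) -> int:
    """address[a[k]] = address[a] + (k - lower) * size[TE]; lower = 0 gives the basic formula."""
    return base + (k - lower) * elem_size

CHAR_SIZE = 1                              # assumption: one memory unit per Char
NAME_SIZE = array_size(6, CHAR_SIZE)       # type Name = array 6 of Char

names_addr = 100                                      # hypothetical address[names]
row = element_address(names_addr, 1, NAME_SIZE)       # address[names[1]]
print(element_address(row, 3, CHAR_SIZE))             # address[names[1][3]] -> 109

# Pascal: array [1..10] of TE, hypothetical size[TE] = 4 and address[a] = 200
print(element_address(200, 6, 4, lower=1))            # address[a[6]] = 200 + 5*4 -> 220
```

Because n, size[TE] and l are all compile-time constants for static arrays, a compiler can fold most of this arithmetic away.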
With a different lower bound, in general (Pascal):

  type T = array [l..u] of TE;
  var a: T;

  size[T] = (u - l + 1) * size[TE]

  address[a[k]] = address[a] + (k - l) * size[TE]
                = address[a] + k * size[TE] - l * size[TE]
                = (address[a] - l * size[TE]) + k * size[TE]
                = origin[a] + k * size[TE]

  where origin[a] = address[a] - l * size[TE] = address[a[0]]

The origin of the array corresponds to a[0]. Note: the origin is an address which may lie outside the array!

[Figure: array cells with indices below and above the declared array bounds; the cell a[0], i.e. the origin, need not lie within the bounds.]

Dynamic arrays

Dynamic arrays are arrays whose size is not known until run time.

Example: Java arrays (all arrays in Java are dynamic)

  char[] buffer;                     // dynamic array: no size given in the declaration
  buffer = new char[buffersize];     // array creation at run time determines the size
  ...
  for (int i = 0; i < buffer.length; i++)   // the size of an array can be queried at run time
      buffer[i] = ' ';

Q: How could we represent Java arrays?

A possible representation for Java arrays (after buffer = new char[len]): the variable buffer holds two fields, buffer.length and buffer.origin, and buffer.origin points to the elements buffer[0], buffer[1], ..., which are stored elsewhere.

[Figure: buffer.length and buffer.origin, with buffer.origin pointing to the elements buffer[0] ... buffer[6].]

Another possible representation for Java arrays: buffer is a single pointer to a heap block holding buffer.length followed by the elements buffer[0], buffer[1], ...

Note: in reality Java also stores a type in its representation for arrays, because Java arrays are objects (instances of classes).

Example: Dynamic arrays in Ada

Ada arrays differ from Java arrays because the lower bound can also be determined dynamically.

  type String is array (Integer range <>) of Character;
  d: String (1 .. k);
  s: String (m .. n - 1);
  -- k, m, n are Integer variables => their values are not known at compile time

Variables d and s are both of type String. Concatenation and lexicographic comparison are allowed on these arrays even if they have different ranges. Assignment copies the contents of one array into the other, but only works if both have the same number of elements; otherwise => runtime error.
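When the bounds are only known at run time, the origin and the bounds have to be stored in memory next to the array. Below is a minimal Python sketch of an Ada-style descriptor. It assumes the layout origin, lower, upper; word-addressed memory modelled as a dict; and address-size = size[Integer] = 1 word. make_descriptor and checked_element_address are hypothetical helper names, and the addresses in the example are made up.

```python
ADDRESS_SIZE = 1   # assumption: an address occupies one word
INTEGER_SIZE = 1   # assumption: an Integer occupies one word

def make_descriptor(memory: dict, desc_addr: int, first_elem_addr: int,
                    lower: int, upper: int, elem_size: int) -> None:
    """Store origin, lower and upper for a: T (lower .. upper) at desc_addr."""
    # origin = address of the first element - lower * size[TE]; may lie outside the array
    memory[desc_addr] = first_elem_addr - lower * elem_size
    memory[desc_addr + ADDRESS_SIZE] = lower
    memory[desc_addr + ADDRESS_SIZE + INTEGER_SIZE] = upper

def checked_element_address(memory: dict, desc_addr: int, k: int, elem_size: int) -> int:
    """address[a[k]] = content[address[a]] + k * size[TE], after a runtime index check."""
    lower = memory[desc_addr + ADDRESS_SIZE]
    upper = memory[desc_addr + ADDRESS_SIZE + INTEGER_SIZE]
    if not lower <= k <= upper:                  # runtime index check
        raise IndexError(f"index {k} not in {lower} .. {upper}")
    return memory[desc_addr] + k * elem_size     # origin + k * size[TE]

memory = {}
# d: String (1 .. k) with k = 5 at run time; elements assumed to start at address 10
make_descriptor(memory, desc_addr=0, first_elem_addr=10, lower=1, upper=5, elem_size=1)
print(memory[0])                                  # 9: the origin, just before the array
print(checked_element_address(memory, 0, 1, 1))   # 10
print(checked_element_address(memory, 0, 5, 1))   # 14
```

Calling checked_element_address(memory, 0, 6, 1) raises IndexError. Java reports the analogous failure as an ArrayIndexOutOfBoundsException. Storing the origin instead of the address of the first element saves subtracting l on every access.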
A possible representation for Ada arrays, for d: String (1 .. k): d is a descriptor with three fields, origin, lower (= 1) and upper (= k). The origin points to where d[0] would be, and the elements d[1] ... d[k] follow it.
Note: remember that the origin can be outside the actual array.

[Figure: descriptor d with fields origin, lower, upper; origin points at the position of d[0], followed by the elements d[1] ... d[k].]

In general:

  type T is array (Integer range <>) of TE;
  a: T (l .. u);

  address[a[k]] = content[address[a]] + k * size[TE]

A runtime index check is needed: l <= k <= u, where

  l = content[address[a] + address-size]
  u = content[address[a] + address-size + size[Integer]]

(Note: similar runtime index checks are also performed in Java.)

Recursive types

A recursive type is a type which is defined in terms of itself. In other words, a component type of the recursive type is the type itself.

Example: In Pascal

  type List = ^ListNode;            { ListNode: illustrative name for the node type }
       ListNode = record
                    head: Integer;
                    tail: List      { recursive component }
                  end;
  var primes: List;

[Figure: primes points to a chain of ListNode records linked through their tail fields.]

In Pascal, recursive types are only allowed when the type is a pointer type (denoted by ^). Why only for pointer types? All pointers have the same size, so the size of a ListNode record (size[Integer] plus the size of one pointer) is known at compile time, however long the list is. Without the pointer, a record would have to contain a complete copy of itself, which no finite size can accommodate.

Example: In Haskell

  data Tree = Leaf | Node Int Tree Tree     -- constructor names are illustrative

In Haskell, there are no explicit pointer types, but recursive types are allowed in disjoint unions (Haskell data types). The representation of the above Tree type of course uses a pointer structure, similar to the Pascal list above.

[Figure: an example tree t of type Tree, stored as heap nodes that each begin with a tag (Leaf or Node).]
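To make the tagged node representation of Tree concrete, here is a minimal Python sketch. It assumes the heap is a Python list of records, a "pointer" is an index into that list, and the tag values 0 and 1 are illustrative choices; the example tree and its values are made up. Code that works on a Tree first inspects the tag to find out which variant it is looking at.

```python
LEAF_TAG, NODE_TAG = 0, 1   # illustrative tag values
heap: list = []              # hypothetical heap: a pointer is an index into this list

def alloc(record: tuple) -> int:
    """Store a record on the heap and return a pointer (its index)."""
    heap.append(record)
    return len(heap) - 1

def leaf() -> int:
    return alloc((LEAF_TAG,))

def node(value: int, left: int, right: int) -> int:
    # Every Node record has the same shape: a tag, an Int and two pointers,
    # however large the subtrees are (the same argument as for Pascal pointers).
    return alloc((NODE_TAG, value, left, right))

def total(p: int) -> int:
    """Sum of all Int fields; dispatches on the tag of the record p points to."""
    record = heap[p]
    if record[0] == LEAF_TAG:
        return 0
    _, value, left, right = record
    return value + total(left) + total(right)

t = node(10, node(3, leaf(), leaf()), node(5, leaf(), leaf()))
print(total(t))   # 18
```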
Another possible representation of t keeps the tag in the pointer to a node rather than in the node itself. This representation may be more memory-efficient, if we can steal a few bits from a pointer to represent a small tag.

Runtime organization covers:
- Data Representation: how to represent values of the source language on the target machine
- Expression Evaluation: how to organize computing the values of expressions (taking care of intermediate results)
- Stack Storage Allocation: how to organize storage for variables (considering lifetimes of global and local variables)
- Routines: how to implement procedures and functions (and how to pass their parameters and return values)
- Heap Storage Allocation: how to organize storage for variables (considering lifetimes of heap variables)
- Object Orientation: runtime organization for OO languages (how to handle classes, objects, methods)

Expression evaluation

What is the problem? Computing the value of something like this:

  (a * b) + (1 - (c * 2))

on a low-level machine. A low-level machine has instructions for multiplication, addition, subtraction, etc. Each instruction operates on two values at a time.

Problem: how to use these simple instructions to compute complex expressions? More specifically: how to manage intermediate results?

Intermediate results

A sequence of machine instructions that computes the value of an expression E1 op E2 has this shape:

  instructions that compute the value of E1   -- produces the value of E1
  instructions that compute the value of E2   -- produces the value of E2 (while this is executing, the value of E1 must be saved somewhere)
  do operation op                             -- needs the values of E1 and E2

Expression evaluation on a Register Machine (RM)

[Figure: expression tree of (a * b) + (1 - (c * 2)) with the intermediate results a*b, c*2 and 1-c*2 marked.]

Register machine: a register machine has a number of registers R1, R2, R3, ..., which can be used to store intermediate values. Typical instructions:

  STORE Ri a
  LOAD Ri x
  MULT Ri x
  SUB Ri x
  ADD Ri x

where x is a register (Ri), a constant (#number) or a memory address.

RM code is efficient, but compiling to an RM is rather complex:
- Must assign a specific register to each intermediate result.
- Must manage the allocation of registers (try to reuse registers and minimize how many are needed).
- The machine only has a fixed number of registers, so what if they are not enough?
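The simplest way to meet the first requirement is to hand out registers in stack order: the value of E1 stays in Ri while E2 is computed in Ri+1 (and higher registers, if E2 itself needs them). Below is a minimal Python sketch of this scheme. It assumes expressions are nested tuples (op, left, right) whose leaves are variable names or integer constants, and it assumes an unlimited supply of registers, so it sidesteps the last question by never spilling. gen_rm and operand are hypothetical names.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MULT"}

def operand(leaf) -> str:
    """Render a leaf as an operand x: #number for a constant, the name for a variable."""
    return f"#{leaf}" if isinstance(leaf, int) else leaf

def gen_rm(expr, r: int, code: list) -> None:
    """Append instructions that leave the value of expr in register R<r>."""
    if not isinstance(expr, tuple):
        code.append(f"LOAD R{r} {operand(expr)}")
        return
    op, left, right = expr
    gen_rm(left, r, code)                                   # value of E1 ends up in Rr
    if isinstance(right, tuple):
        gen_rm(right, r + 1, code)                          # E2 uses Rr+1, ... so Rr keeps E1
        code.append(f"{OPCODES[op]} R{r} R{r + 1}")
    else:
        code.append(f"{OPCODES[op]} R{r} {operand(right)}")  # leaf: use it directly as x

code = []
gen_rm(("+", ("*", "a", "b"), ("-", 1, ("*", "c", 2))), 1, code)
print("\n".join(code))
```

For (a * b) + (1 - (c * 2)) this emits LOAD R1 a, MULT R1 b, LOAD R2 #1, LOAD R3 c, MULT R3 #2, SUB R2 R3, ADD R1 R2. The number of registers it uses grows with the nesting depth of right operands, which is exactly why a fixed register set eventually forces spilling to memory.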
Example: computing (a * b) + (1 - (c * 2)) on a register machine:

  LOAD R1 a     // R1: a
  MULT R1 b     // R1: a * b
  LOAD R2 #1    // R2: 1
  LOAD R3 c     // R3: c
  MULT R3 #2    // R3: c * 2
  SUB R2 R3     // R2: 1 - (c * 2)
  ADD R1 R2     // R1: (a * b) + (1 - (c * 2))

Expression evaluation on a Stack Machine (SM)

Stack machine: on a stack machine, the intermediate results are stored on a stack. Operations take their arguments from the top of the stack and put the result back on the stack. Typical instructions:

  STORE a
  LOAD x
  MULT
  SUB
  ADD

A stack machine is very natural for expression evaluation (see the examples below). It requires more instructions for the same expression, but the instructions are simpler.

Example: computing (a * b) + (1 - (c * 2)) on a stack machine:

  LOAD a      // stack: a
  LOAD b      // stack: a b
  MULT        // stack: (a*b)
  LOAD #1     // stack: (a*b) 1
  LOAD c      // stack: (a*b) 1 c
  LOAD #2     // stack: (a*b) 1 c 2
  MULT        // stack: (a*b) 1 (c*2)
  SUB         // stack: (a*b) (1-(c*2))
  ADD         // stack: (a*b)+(1-(c*2))

Note the correspondence between the instructions and the expression written in postfix notation: a b * 1 c 2 * - +

Example: computing (0 < n) && odd(n) on a stack machine:

  LOAD #0     // stack: 0
  LOAD n      // stack: 0 n
  LT          // stack: (0<n)
  LOAD n      // stack: (0<n) n
  CALL odd    // stack: (0<n) odd(n)
  AND         // stack: (0<n)&&odd(n)

This example illustrates that calling functions/procedures fits in just as naturally with the stack machine evaluation model as operations that correspond to machine instructions. In register machines this is much more complicated, because a stack must be created in memory for managing subroutine calls/returns.
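The correspondence with postfix notation means stack code can be generated by a simple postorder walk of the expression tree. Below is a minimal Python sketch of such a generator plus a tiny interpreter. It assumes the same kind of nested-tuple expressions, ("call", f, arg) for one-argument calls, a dict of variable values, and a dict of Python functions standing in for routines such as odd. gen_stack and run are hypothetical names. Like the slide code, AND here evaluates both operands (no short-circuiting).

```python
OPS = {"+": "ADD", "-": "SUB", "*": "MULT", "<": "LT", "&&": "AND"}

BINARY = {
    "ADD": lambda x, y: x + y,
    "SUB": lambda x, y: x - y,
    "MULT": lambda x, y: x * y,
    "LT": lambda x, y: x < y,
    "AND": lambda x, y: x and y,
}

def gen_stack(expr, code: list) -> None:
    """Append stack-machine instructions for expr (postorder walk = postfix order)."""
    if isinstance(expr, int):
        code.append(f"LOAD #{expr}")
    elif isinstance(expr, str):
        code.append(f"LOAD {expr}")
    elif expr[0] == "call":
        _, fname, arg = expr
        gen_stack(arg, code)              # push the argument
        code.append(f"CALL {fname}")      # the call replaces it with the result
    else:
        op, left, right = expr
        gen_stack(left, code)
        gen_stack(right, code)
        code.append(OPS[op])

def run(code: list, env: dict, funcs: dict):
    """Execute stack-machine code and return the value left on top of the stack."""
    stack = []
    for instr in code:
        name, *args = instr.split()
        if name == "LOAD":
            x = args[0]
            stack.append(int(x[1:]) if x.startswith("#") else env[x])
        elif name == "CALL":
            stack.append(funcs[args[0]](stack.pop()))
        else:
            right = stack.pop()           # the right operand is on top
            left = stack.pop()
            stack.append(BINARY[name](left, right))
    return stack.pop()

code = []
gen_stack(("+", ("*", "a", "b"), ("-", 1, ("*", "c", 2))), code)
print(" ".join(code))   # LOAD a LOAD b MULT LOAD #1 LOAD c LOAD #2 MULT SUB ADD
print(run(code, {"a": 3, "b": 4, "c": 5}, {}))   # 3

code2 = []
gen_stack(("&&", ("<", 0, "n"), ("call", "odd", "n")), code2)
print(" ".join(code2))  # LOAD #0 LOAD n LT LOAD n CALL odd AND
print(run(code2, {"n": 7}, {"odd": lambda v: v % 2 == 1}))   # True
```

Unlike the register-machine generator, this one needs no register bookkeeping at all: the stack discipline handles the intermediate results, including the result of the call.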