The Procedure Abstraction
Procedure Abstraction
Begins Chapter 6 in EaC
- The compiler must deal with the interface between compile time and run time
  - Most of the tricky issues arise in implementing procedures
- Issues
  - Compile-time versus run-time behavior
  - Finding storage for EVERYTHING and mapping names to addresses
  - Generating code to compute addresses
  - Interfaces with other programs, other languages, and the OS
  - Efficiency of implementation
The Procedure & Its Three Abstractions
The Procedure as a Name Space
There are strict constraints that each procedure must adhere to!
The Procedure: Three Abstractions
- Control Abstraction
  - Well-defined entries & exits
  - Mechanism to return control to caller
- Clean Name Space
  - Clean slate for writing locally visible names
  - Local names may obscure identical, non-local names
  - Local names cannot be seen outside
- External Interface
  - Access is by procedure name & parameters
  - Clear protection for both caller & callee
  - Invoked procedure can ignore calling context
Procedures permit a critical separation of concerns
The Procedure (Realist's View)
Procedures are the key to building large systems
- Requires system-wide contract
  - Conventions on memory layout, protection, resource allocation, calling sequences, & error handling
  - Must involve architecture (ISA), OS, & compiler
- Provides shared access to system-wide facilities
  - Storage management, flow of control, interrupts
  - Interface to input/output devices, protection facilities, timers, synchronization flags, counters, ...
- Establishes a private context
  - Create private storage for each procedure invocation
  - Encapsulate information about control flow & data abstractions
The Procedure (Realist's View)
Procedures allow us to use separate compilation
- Separate compilation allows us to build non-trivial programs
- Keeps compile times reasonable
- Lets multiple programmers collaborate
- Requires independent procedures
Without separate compilation, we would not build large systems
The Procedure (Realist's View)
The procedure linkage convention
- Ensures that each procedure inherits a valid run-time environment and that the caller's environment is restored on return
- The compiler must generate code to ensure this happens according to conventions established by the system
The Procedure (More Abstract View)
A procedure is an abstract structure constructed via software
Underlying hardware directly supports little of the abstraction: it understands bits, bytes, integers, reals, and addresses, but not:
- Entries and exits
- Interfaces
- Call and return mechanisms
  - may be a special instruction to save context at point of call
- Name space
- Nested scopes
All these are established by a carefully crafted system of mechanisms provided by the compiler, run-time system, linker and loader, and OS
Run Time versus Compile Time
These concepts are often confusing to the newcomer
- Linkages execute at run time
- Code for the linkage is emitted at compile time
- The linkage is designed long before either of these
Compile time versus run time can be confusing to CISC672 students. We will emphasize the distinction between them.
The Procedure as a Control Abstraction
Procedures have well-defined control flow
The Algol-60 procedure call
- Invoked at a call site, with some set of actual parameters
- Control returns to call site, immediately after invocation
- Most languages allow recursion

    s = p(10,t,u);

    int p(a,b,c)
      int a, b, c;
    {
      int d;
      d = q(c,b);
      ...
    }

    int q(x,y)
      int x,y;
    {
      return x + y;
    }
The Procedure as a Control Abstraction
Implementing procedures with this behavior
- Requires code to save and restore a return address
- Must map actual parameters to formal parameters (c -> x, b -> y)
- Must create storage for local variables (&, maybe, parameters)
  - p needs space for d (&, maybe, a, b, & c)
  - Where does this space go in recursive invocations?

    s = p(10,t,u);

    int p(a,b,c)
      int a, b, c;
    {
      int d;
      d = q(c,b);
      ...
    }

    int q(x,y)
      int x,y;
    {
      return x + y;
    }

Compiler emits code that causes all this to happen at run time
The Procedure as a Control Abstraction
Implementing procedures with this behavior
- Must preserve p's state while q executes
- Strategy: create a unique location for each procedure activation
  - Can use a stack of memory blocks to hold local storage and return addresses

    s = p(10,t,u);

    int p(a,b,c)
      int a, b, c;
    {
      int d;
      d = q(c,b);
      ...
    }

    int q(x,y)
      int x,y;
    {
      return x + y;
    }

Compiler emits code that causes all this to happen at run time
The Procedure as a Name Space
Why introduce lexical scoping?
- Provides a compile-time mechanism for binding variables
- Simplifies rules for naming & resolves conflicts
- Lets the programmer introduce local names
How can the compiler keep track of all those names?
The Problem
- At point p, which declaration of x is current?
- At run time, where is x found?
- As the parser goes in & out of scopes, how does it delete x?
The Answer
- The compiler must model the name space
- Lexically scoped symbol tables (see Section 5.7.3)
Lexically-scoped Symbol Tables
Section 5.7 in EaC
The problem
- The compiler needs a distinct record for each declaration
- Nested lexical scopes admit duplicate declarations
The interface
- insert(name, level) creates record for name at level
- lookup(name, level) returns pointer or index
- delete(level) removes all names declared at level
Many implementation schemes have been proposed
- We'll stay at the conceptual level
- Hash table implementation is tricky and detailed (see Section B.4)
Symbol tables are compile-time structures that the compiler uses to resolve references to names. We'll see the corresponding run-time structures that are used to establish addressability later.
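Example
The same name space written as nested procedures and as nested blocks:

    procedure p {
      int a, b, c
      procedure q {
        int v, b, x, w
        procedure r {
          int x, y, z
          ...
        }
        procedure s {
          int x, a, v
        }
        ... r ... s
      }
      ... q ...
    }

    B0: {
      int a, b, c
      B1: {
        int v, b, x, w
        B2: {
          int x, y, z
          ...
        }
        B3: {
          int x, a, v
        }
      }
    }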
Lexically-scoped Symbol Tables
High-level idea
- Create a new table for each scope
- Chain them together for lookup
"Sheaf of tables" implementation
- insert() may need to create a table
  - it always inserts at the current level
- lookup() walks the chain of tables & returns the first occurrence of name
- delete() throws away the table for level p, if it is the top table in the chain
If the compiler must preserve the table (for, say, the debugger), this idea is actually practical.
Individual tables can be hash tables.
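The sheaf-of-tables idea can be sketched in a few lines of C. This is a minimal illustration, not the EaC implementation: the names scope_enter, scope_leave, sym_insert, and sym_lookup are hypothetical, each table is a plain linked list rather than a hash table, and the level argument of the slide's interface is dropped because the sketch always works at the current level.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One record per declaration. */
typedef struct Entry {
    const char *name;
    int level;                 /* lexical level of the declaring scope */
    struct Entry *next;
} Entry;

/* One table per scope, chained to the enclosing scope's table. */
typedef struct Scope {
    Entry *entries;
    int level;
    struct Scope *parent;
} Scope;

static Scope *current = NULL;  /* top table in the chain */

/* Open a new scope: push an empty table on the chain. */
void scope_enter(void) {
    Scope *s = malloc(sizeof *s);
    assert(s != NULL);
    s->entries = NULL;
    s->level = current ? current->level + 1 : 0;
    s->parent = current;
    current = s;
}

/* insert(): always adds to the table for the current level. */
Entry *sym_insert(const char *name) {
    assert(current != NULL);
    Entry *e = malloc(sizeof *e);
    assert(e != NULL);
    e->name = name;
    e->level = current->level;
    e->next = current->entries;
    current->entries = e;
    return e;
}

/* lookup(): walk the chain outward, return the first occurrence. */
Entry *sym_lookup(const char *name) {
    for (Scope *s = current; s != NULL; s = s->parent)
        for (Entry *e = s->entries; e != NULL; e = e->next)
            if (strcmp(e->name, name) == 0)
                return e;
    return NULL;
}

/* delete(): throw away the top table when its scope closes. */
void scope_leave(void) {
    assert(current != NULL);
    Scope *s = current;
    current = s->parent;
    for (Entry *e = s->entries; e != NULL; ) {
        Entry *next = e->next;
        free(e);
        e = next;
    }
    free(s);
}
```

With the B0..B3 example, a lookup of x inside B2 finds the level-2 record, while a lookup of b finds the level-1 record from B1, because the walk stops at the first table that declares the name.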
The Procedure as an External Interface
(UNIX/Linux-specific discussion)
OS needs a way to start the program's execution
- Programmer needs a way to indicate where it begins
  - The main procedure in most languages
- When user invokes grep at a command line
  - OS finds the executable
  - OS creates a process and arranges for it to run grep
  - grep is code from the compiler, linked with the run-time system
    - Starts the run-time environment & calls main
    - After main, it shuts down the run-time environment & returns
- When grep needs system services
  - It makes a system call, typically through a library routine such as fopen()
The Procedure as an External Interface
OS needs a way to start the program's execution

    > grep foo hello.txt

[Figure: grep running in memory, addresses from 0 to high: Code | Static & Global | Heap | Stack. The main function is in the code area; foo and hello.txt are on the stack.]
The Procedure as an External Interface
grep may call fopen with hello.txt
[Figure: grep running in memory on a stack of ARs, addresses from 0 to high: Code | Static & Global | Heap | Stack. The code area holds main and fopen; the stack holds the AR of main and the AR of fopen; foo and hello.txt are in main's activation record.]
Where Do All These Variables Go?
- Local
  - Keep them in the procedure activation record or in a register
  - Automatic: lifetime matches procedure's lifetime
- Static
  - File scope: storage area affixed with file name
  - Lifetime is entire execution
- Global
  - One or more named global data areas
  - One per variable, or per file, or per program, ...
  - Lifetime is entire execution
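C exposes these storage classes directly, so a small sketch can make the lifetimes visible. The names below (global_total, file_total, count_call, file_total_value) are hypothetical, and the static local named calls is a procedure-scope static that goes slightly beyond the file-scope case listed on the slide.

```c
#include <assert.h>

int global_total = 0;        /* global: one named data area, lives for the whole run */
static int file_total = 0;   /* static, file scope: visible only in this file        */

/* Returns local + 10 * calls, so the result shows which values survive. */
int count_call(void) {
    int local = 0;           /* automatic: a fresh copy in every activation          */
    static int calls = 0;    /* static, procedure scope: survives between calls      */

    local += 1;
    calls += 1;
    file_total += 1;
    global_total += 1;

    assert(local == 1);      /* the automatic variable never remembers earlier calls */
    return local + 10 * calls;
}

/* Accessor, since file_total cannot be named from another file. */
int file_total_value(void) {
    return file_total;
}
```

The first call returns 11 and the second returns 21: the automatic variable restarts at 0 each time, while the static and global counters keep their values across activations.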
Placing Run-time Data Structures
Classic Organization
[Figure: single logical address space, from 0 to high: Code | Static & Global | Heap | Stack]
- Code, static, & global data have known size
- Heap & stack both grow & shrink over time
- This is a virtual address space
How Does This Really Work?
The Big Picture
[Figure: three views of memory. Compiler's view: one virtual address space, laid out from 0 to high as Code | Static & Global | Heap | Stack. OS's view: many such virtual address spaces, one per program. Hardware's view: a single physical address space.]
Where Do Local Variables Live?
A simplistic model
- Allocate a data area for each distinct scope
- Need a data area per invocation (or activation) of a scope
- We call this the scope's activation record
- The compiler can also store control information there!
[Figure: stack holding the activation records of First Function and Second Function, with Top of Stack marked. Each record contains: parameters, register save area, return value, return address, local variables.]
Variable-length Data

    B0: { int a, b
          int v(a), c, x
          int z, y(8)
          ...

Arrays
- If size is fixed at compile time, store in fixed-length data area
- If size is variable, store descriptor in fixed-length area, with pointer to variable-length area
- Variable-length data area is assigned at the end of the fixed-length area for the block in which it is allocated
[Figure: data area laid out as a | b | v | c | x | z | y(8) | v(a). Labels: "Includes variable length data for all blocks in the procedure"; "Variable-length data".]
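The descriptor scheme can be sketched in C as a fixed-length record followed by a variable-length tail. This is an illustration under assumptions: ArrayDescriptor, Block0, make_block0, and element are hypothetical names, and malloc stands in for extending an activation record on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fixed-size descriptor that lives in the fixed-length area. */
typedef struct {
    int *base;       /* points into the variable-length area */
    size_t length;   /* number of elements, known only at run time */
} ArrayDescriptor;

/* Fixed-length area for the example block: every field has a constant offset. */
typedef struct {
    int a, b;
    ArrayDescriptor v;   /* v(a): size depends on a, so only a descriptor here */
    int c, x;
    int z;
    int y[8];            /* y(8): size fixed at compile time, stored inline */
} Block0;

/* Allocate the fixed area plus a variable-length tail for v(a). */
Block0 *make_block0(int a) {
    assert(a >= 0);
    Block0 *ar = malloc(sizeof *ar + (size_t)a * sizeof(int));
    assert(ar != NULL);
    ar->a = a;
    ar->v.length = (size_t)a;
    ar->v.base = (int *)((char *)ar + sizeof *ar);  /* tail starts right after the fixed area */
    return ar;
}

/* Every access to v goes through the descriptor. */
int *element(ArrayDescriptor *d, size_t i) {
    assert(i < d->length);
    return &d->base[i];
}
```

Code that touches c, x, z, or y still uses compile-time offsets; only references to v pay for the extra load of the base pointer.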
Activation Record Basics
[Figure: one activation record, with TOS marked]
- parameters: space for parameters to the current routine
- register save area: saved register contents
- return value: if function, space for return value
- return address: address to resume caller
- local variables: space for local values & variables (including spills)
One AR for each invocation of a procedure
Activation Record Details
Where do activation records live?
- If lifetime of AR matches lifetime of invocation, AND if code normally executes a return
  - Keep ARs on a stack
- If a procedure can outlive its caller, OR if it can return an object that can reference its execution state
  - ARs must be kept in the heap
- If a procedure makes no calls
  - AR can be allocated statically
Efficiency prefers static, stack, then heap
[Figure: Code | Static & Global | Heap | Stack]
Communicating Between Procedures
Most languages provide a parameter-passing mechanism
- Expression used at call site becomes variable in callee
Two common binding mechanisms
- Call-by-reference passes a pointer to actual parameter
  - Requires slot in the AR (for address of parameter)
  - Multiple names with the same address?
      call fee(x,x,x);
- Call-by-value passes a copy of its value at time of call
  - Requires slot in the AR
  - Each name gets a unique location (may have same value)
  - Arrays are mostly passed by reference, not value
- Can always use global variables
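The aliasing question raised by fee(x,x,x) shows up clearly in a small C sketch. C itself passes every argument by value, so call-by-reference is simulated here with explicit pointers; the bodies of fee_by_value and fee_by_reference are invented for illustration, since the slide does not say what fee does.

```c
#include <assert.h>
#include <stddef.h>

/* Call-by-value: a, b, and c are three separate copies in the callee's AR. */
int fee_by_value(int a, int b, int c) {
    a = a + 1;            /* changes only the callee's copy */
    return a + b + c;
}

/* Call-by-reference, simulated: each slot holds the address of the actual. */
int fee_by_reference(int *a, int *b, int *c) {
    assert(a != NULL && b != NULL && c != NULL);
    *a = *a + 1;          /* if a, b, c alias, all three "change" at once */
    return *a + *b + *c;
}
```

Starting from x = 1, fee_by_value(x, x, x) returns 4 and leaves x at 1, while fee_by_reference(&x, &x, &x) returns 6 and leaves x at 2, because all three formals name the same location.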
Procedure Linkages
Standard procedure linkage
[Figure: procedure p with prolog, pre-call, post-return, epilog; procedure q with prolog, epilog]
Procedure has
- standard prolog
- standard epilog
Each call involves a
- pre-call sequence
- post-return sequence
These are completely predictable from the call site; they depend on the number & type of the actual parameters
Procedure Linkages
Pre-call Sequence
- Sets up callee's basic AR
- Helps preserve its own environment
The Details
- Allocate space for the callee's AR
  - except space for local variables
- Evaluate each parameter & store value or address
- Save return address, caller's ARP into callee's AR
- If access links are used
  - Find appropriate lexical ancestor & copy into callee's AR
- Save any caller-save registers
  - Save into space in caller's AR
- Jump to address of callee's prolog code
Procedure Linkages
Post-return Sequence
- Finish restoring caller's environment
- Place any value back where it belongs
The Details
- Copy return value from callee's AR, if necessary
- Free the callee's AR
- Restore any caller-save registers
- Restore any call-by-reference parameters to registers, if needed
  - Also copy back call-by-value/result parameters
- Continue execution after the call
Procedure Linkages
Prolog Code
- Finish setting up the callee's environment
- Preserve parts of the caller's environment that will be disturbed
The Details
- Preserve any callee-save registers
- If display is being used
  - Save display entry for current lexical level
  - Store current ARP into display for current lexical level
- Allocate space for local data
  - Easiest scenario is to extend the AR
- Find any static data areas referenced in the callee
- Handle any local variable initializations
With heap-allocated AR, may need to use a separate heap object for local variables
Procedure Linkages
Epilog Code
- Wind up the business of the callee
- Start restoring the caller's environment
The Details
- Store return value?
  - No, this happens on the return statement
- Restore callee-save registers
- Free space for local data, if necessary (on the heap)
- Load return address from AR
- Restore caller's ARP
- Jump to the return address
If ARs are stack allocated, freeing may not be necessary. (Caller can reset stack top to its pre-call value.)
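The four linkage sequences can be traced on a simulated stack. The sketch below is a teaching model under several assumptions: the AR layout (return value, return address, caller's ARP, parameters, locals), the word-addressed array mem, and every function name are hypothetical; register saving, access links, and the actual jumps are left out; and p is assumed to return d because the slides elide the rest of p.

```c
#include <assert.h>

enum { STACK_WORDS = 256 };
enum { RET_VAL = 0, RET_ADDR = 1, CALLER_ARP = 2, PARAMS = 3 };  /* assumed AR layout */

static int mem[STACK_WORDS];  /* simulated stack, grows toward higher indices   */
static int arp = 0;           /* base of the current AR                         */
static int tos = 4;           /* words 0..3 stand in for the outermost caller   */

/* Pre-call (caller): build the callee's AR, except its local variables. */
int pre_call(const int *args, int nargs, int return_addr) {
    int base = tos;
    assert(base + PARAMS + nargs <= STACK_WORDS);
    for (int i = 0; i < nargs; i++)
        mem[base + PARAMS + i] = args[i];   /* evaluate & store parameters */
    mem[base + RET_VAL] = 0;
    mem[base + RET_ADDR] = return_addr;     /* save return address         */
    mem[base + CALLER_ARP] = arp;           /* save caller's ARP           */
    tos = base + PARAMS + nargs;
    return base;
}

/* Prolog (callee): switch to the new AR and extend it for locals. */
void prolog(int base, int nlocals) {
    assert(tos + nlocals <= STACK_WORDS);
    arp = base;
    tos += nlocals;
}

/* Epilog (callee): load return address, restore caller's ARP. */
int epilog(void) {
    int return_addr = mem[arp + RET_ADDR];
    arp = mem[arp + CALLER_ARP];
    return return_addr;
}

/* Post-return (caller): copy the return value, free the callee's AR. */
int post_return(int base) {
    int value = mem[base + RET_VAL];
    tos = base;                             /* reset stack top to its pre-call value */
    return value;
}

/* q(x,y) { return x + y; } */
int call_q(int x, int y) {
    int args[2] = { x, y };
    int base = pre_call(args, 2, 100);
    prolog(base, 0);
    mem[arp + RET_VAL] = mem[arp + PARAMS] + mem[arp + PARAMS + 1];  /* the return statement */
    int ra = epilog();
    assert(ra == 100);
    return post_return(base);
}

/* p(a,b,c) { int d; d = q(c,b); ... } with p assumed to return d. */
int call_p(int a, int b, int c) {
    int args[3] = { a, b, c };
    int base = pre_call(args, 3, 200);
    prolog(base, 1);                        /* one local: d */
    int d_slot = arp + PARAMS + 3;
    mem[d_slot] = call_q(mem[arp + PARAMS + 2], mem[arp + PARAMS + 1]);
    mem[arp + RET_VAL] = mem[d_slot];
    int ra = epilog();
    assert(ra == 200);
    return post_return(base);
}
```

While q runs, p's AR stays untouched below q's AR, which is how p's state is preserved; after each call both arp and tos are back at their pre-call values.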
Storage for Blocks within a Single Procedure

    procedure A {
      int a, b, c
      B1: {
        int v, b, x, w
        B2: {
          int x, y, z
          ...
        }
        B3: {
          int x, a, v
        }
      }
    }

- Fixed-length data can always be at a constant offset from the beginning of a procedure
  - In our example, the a declared at level 0 will always be the first data element, stored at byte 0 in the fixed-length data area
  - The x declared at level 1 will always be the sixth data item, stored at byte 20 in the fixed data area
  - The x declared at level 2 will always be the eighth data item, stored at byte 28 in the fixed data area
  - But what about the a declared in the second block at level 2?
Storage for Blocks within a Single Procedure

    procedure A {
      int a, b, c
      B1: {
        int v, b, x, w
        B2: {
          int x, y, z
          ...
        }
        B3: {
          int x, a, v
        }
      }
    }

- Could implement an activation record for each block. Too expensive!
- Share storage for blocks that don't overlap
  - B2 and B3 do not overlap
- On entering a block, put its local variables at TOS (top of stack)
- Offsets are computed statically
  - Blocks B2 and B3 share the same offsets!
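The static offset computation can be sketched as a tiny compile-time allocator. The names block_enter, declare_int, block_exit, and frame_size are hypothetical, and the sketch assumes 4-byte integers, matching the byte offsets quoted for the example.

```c
#include <assert.h>

enum { INT_SIZE = 4, MAX_NESTING = 16 };

static int next_offset = 0;         /* next free byte in the fixed-length area */
static int high_water = 0;          /* largest extent ever reached             */
static int saved[MAX_NESTING];      /* next_offset at each open block's entry  */
static int depth = 0;

/* Entering a block: remember where its variables start. */
void block_enter(void) {
    assert(depth < MAX_NESTING);
    saved[depth++] = next_offset;
}

/* Declaring an int: hand out the next offset. */
int declare_int(void) {
    int offset = next_offset;
    next_offset += INT_SIZE;
    if (next_offset > high_water)
        high_water = next_offset;
    return offset;
}

/* Leaving a block: its space becomes reusable by the next sibling block. */
void block_exit(void) {
    assert(depth > 0);
    next_offset = saved[--depth];
}

/* Bytes the procedure needs for all of its blocks. */
int frame_size(void) {
    return high_water;
}
```

Run over the example, the allocator places the level-1 x at byte 20 and the level-2 x in B2 at byte 28; after B2 closes, B3's x, a, and v reuse bytes 28, 32, and 36, so the a in B3 lands at byte 32 and the whole procedure needs 40 bytes.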
Activation Record Details
How does the compiler find the variables?
- They are at known offsets from the AR pointer
- The static coordinate leads to a loadAI operation
  - Level specifies an ARP, offset is the constant
Variable-length data
- If AR can be extended, put it after local variables
- Leave a pointer at a known offset from ARP
- Otherwise, put variable-length data on the heap
Initializing local variables
- Must generate explicit code to store the values
- Among the procedure's first actions
Establishing Addressability
Must create base addresses
- Global & static variables
  - Construct a label by mangling names (e.g., &_fee)
- Local variables
  - Convert to static data coordinate and use ARP + offset
- Local variables of other procedures
  - Convert to static coordinates
  - Find appropriate ARP
  - Use that ARP + offset
  - Must find the right AR: need links to nameable ARs
Establishing Addressability
Using access links
- Each AR has a pointer to AR of lexical ancestor
- Lexical ancestor need not be the caller
- Reference to <p,16> runs up access link chain to p
- Cost of access is proportional to lexical distance
[Figure: three ARs, each with parameters, register save area, return value, return address, access link, caller's ARP, local variables. The ARP points to the current AR; each access link points to the AR of the lexical ancestor.]
Establishing Addressability
Using access links

    SC        Generated Code
    <2,8>     loadAI r0, 8  => r2
    <1,12>    loadAI r0, -4 => r1
              loadAI r1, 12 => r2
    <0,16>    loadAI r0, -4 => r1
              loadAI r1, -4 => r1
              loadAI r1, 16 => r2

- The code implies the current lexical level is 2 and the access link sits at offset -4 from the ARP
- Access & maintenance cost varies with level
- All accesses are relative to ARP (r0)
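The chain walk behind that generated code can be sketched in C. The AR struct and the resolve function are hypothetical: the access link is an ordinary struct field rather than a slot at ARP - 4, and offsets are byte offsets into a small array of 4-byte words.

```c
#include <assert.h>
#include <stddef.h>

enum { WORD = 4, AR_WORDS = 8 };

typedef struct AR {
    struct AR *access_link;   /* AR of the lexically enclosing procedure */
    int level;                /* lexical level of this AR's procedure    */
    int data[AR_WORDS];       /* local data area                         */
} AR;

/* Resolve static coordinate <level, offset> starting from the current AR.
   Stores the number of access links followed in *hops, if hops is non-NULL. */
int *resolve(AR *arp, int level, int offset, int *hops) {
    assert(arp != NULL && level <= arp->level);
    assert(offset >= 0 && offset % WORD == 0 && offset / WORD < AR_WORDS);

    AR *ar = arp;
    int followed = 0;
    while (ar->level > level) {       /* one extra load per level of distance */
        ar = ar->access_link;
        assert(ar != NULL);
        followed++;
    }
    if (hops != NULL)
        *hops = followed;
    return &ar->data[offset / WORD];
}
```

From a procedure at level 2, resolving <2,8> follows no links, <1,12> follows one, and <0,16> follows two, matching the one, two, and three loadAI operations in the table.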
Back to Activation Records
If activation records are stored on the stack
- Easy to extend: simply bump top-of-stack pointer
- Caller & callee share responsibility
  - Caller can push parameters, space for registers, return value slot, return address, addressability info, & its own ARP
  - Callee can push space for local variables (fixed & variable size)
If activation records are stored on the heap
- Hard to extend
- Caller passes everything it can in registers
- Callee allocates AR & stores register contents into it
  - Extra parameters stored in caller's AR!
Static is easy
What about Object-Oriented Languages?
What is an OOL?
- A language that supports object-oriented programming
How does an OOL differ from an Algol-like language?
- Data-centric name scopes for values & functions
- Dynamic resolution of names to their implementations
How do we compile OOLs?
- Need to define what we mean by an OOL
- Term is almost meaningless today: Smalltalk to C++ to Java
- We will focus on Java and C++
Object-Oriented Languages
An object is an abstract data type that encapsulates data, operations, and internal state behind a simple, consistent interface.
The Concept:
[Figure: three objects x, y, and z, each containing Data and Code]
Elaborating the concepts:
- Each object needs local storage for its attributes
  - Attributes are static (lifetime of object)
  - Access is through methods
- Some methods are public, others are private
  - More complex linkage convention; locating them is complex
- Object's internal state leads to complex behavior
OOLs & the Procedure Abstraction
What is the shape of an OOL's name space?
- Local storage in objects (beyond attributes)
- Some storage associated with methods
  - Local values inside a method
  - Static values with lifetimes beyond methods
- Methods shared among multiple objects
In some OOLs, everything is an object. In others, variables coexist with objects & inside objects.
Classes
- Objects with the same state are grouped into a class
  - Same attributes, instance variables, & methods
  - Class variables are static, shared among all objects of same class
- Allows programmer to write it down once
- Allows code reuse at both source & implementation level
Implementing Object-Oriented Languages
So, what can an executing method see?
- The object's own attributes & private variables
  - Smalltalk terminology: instance variables
- The attributes & private variables of the class that defines it
  - Smalltalk terminology: class variables
- Any object defined in the global name space (or scope)
  - Objects may contain other objects
An executing method might reference any of these
A final twist:
- Most OOLs support a hierarchical notion of inheritance
- Some OOLs support multiple inheritance
  - More than one path through the inheritance hierarchy
Java Name Space
Given a method M for an object O in a class C, M can see:
- Local variables declared within M (lexical scoping)
- All instance variables and class variables of the class C
- All public and protected variables of any superclass of C
- Any classes defined in the same package or in an imported package
  - public class variables and instance variables of imported classes
  - package class and instance variables in the package containing C
- Class declarations can be nested!
  - These member declarations hide outer class declarations of the same name (lexical scoping)
  - Accessibility: public, private, protected, package
Java Symbol Tables
To compile code in method M for an object O within a class C, the compiler needs:
- Lexically scoped symbol table for block and class nesting
  - Just like an Algol-like language: inner declarations hide outer declarations
- Chain of symbol tables for inheritance
  - Need mechanism to find the class and instance variables of all superclasses
- Symbol tables for all global classes (package scope)
  - Entries for all members with visibility
  - Need to construct symbol table for imported packages
Java Symbol Tables
To find the address associated with a variable reference in method M for an object O within a class C, the compiler must:
- For an unqualified use (e.g., x):
  - Search the scoped symbol table for the current method
  - Search the chain of symbol tables for the class hierarchy
  - Search the global symbol table (current package and imported)
  - In each case, check the visibility attribute of x
- For a qualified use (e.g., Q.x):
  - Search the stack-structured global symbol table for class nesting for Q
  - Check the visibility attribute of x
Implementing Object-Oriented Languages
Two critical issues in OOL implementation:
- Object representation
- Mapping a method invocation name to a method implementation
These both are intimately related to the OOL's name space
Object Representation
- Static, private storage for attributes & instance variables
  - Heap allocate object records or instances
- Need consistent, fast access
  - Known, constant offsets
- Provision for initialization in NEW
[Figure: object record x with fields fee(), fie(), foe(), count at offsets 0, 4, 8, 12]
OOL Storage Layout (Java)
Class variables
- Static class storage accessible by global name (class C)
  - Accessible via linkage symbol &_C
  - Nested classes are handled like blocks in Algol-like languages
- Method code put at fixed offset from start of class area
Object Representation
- Object storage is heap allocated
  - Fields at fixed offsets from start of object storage
Methods
- Code for methods is stored with the class
  - Methods accessed by offsets from code vector
  - Method local storage in object (no calls) or on stack
[Figure: object record x with fields fee(), fie(), foe(), count at offsets 0, 4, 8, 12]
Dealing with Single Inheritance
Use prefixing of storage

    Class Point {
      int x, y;
    }

    Class ColorPoint extends Point {
      Color c;
    }

[Figure: self points to a Point record holding x, y; self points to a ColorPoint record holding x, y, c]
Does casting work properly?
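Prefixing can be imitated in C by making the superclass record the first member of the subclass record. This is a sketch under assumptions: Color is modeled as a plain int, and sum_xy and as_point are hypothetical helpers. It suggests why the casting question has a favorable answer under single inheritance.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int x, y;
} Point;

typedef struct {
    Point base;   /* prefix: a ColorPoint record starts with a Point record */
    int c;        /* Color modeled as int for this sketch                   */
} ColorPoint;

/* Code compiled for Point only knows the offsets of x and y. */
int sum_xy(const Point *p) {
    return p->x + p->y;
}

/* Upcast: the same address, viewed through the superclass layout. */
const Point *as_point(const ColorPoint *cp) {
    assert(cp != NULL);
    return (const Point *)cp;
}
```

Because x and y sit at the same offsets in both records, sum_xy works unchanged on a ColorPoint: for a ColorPoint with x = 3, y = 4, and c = 7, sum_xy(as_point(&cp)) yields 7.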
Implementing Object-Oriented Languages
Mapping message names to methods
- Static mapping, known at compile time (Java, C++)
  - Fixed offsets & indirect calls
- Dynamic mapping, unknown until run time (Smalltalk)
  - Look up name in class table of methods
Want uniform placement of standard services (NEW, PRINT, ...)
This is really a data-structures problem
- Build a table of function pointers
- Use a standard invocation sequence
Implementing Object-Oriented Languages
With static, compile-time mapped classes
[Figure: Class A has a method table with fee(), fie(), foe(), fum() and objects x, y, z, each holding fields i, j, k. Class B has a method table with band(), bend(), bind(), bond() and objects m, n, p, each holding fields a, b, c.]
Message dispatch becomes an indirect call through a function table
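Single Inheritance and Dynamic Dispatch
Use prefixing of tables

    Class Point {
      int x, y;
      public void draw();
      public void d2o();
    }

    Class ColorPoint extends Point {
      Color c;
      public void draw();
      public void rev();
    }

[Figure: self points to a Point object holding table, x, y; its table has slots draw and d2o bound to Point:draw and Point:d2o. self points to a ColorPoint object holding table, x, y, c; its table has slots draw, d2o, rev, with draw bound to ColorPoint:draw and rev bound to ColorPoint:rev.]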
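Table prefixing can be sketched in C with structs of function pointers. Everything here is an illustrative assumption rather than how any particular Java or C++ compiler lays things out: the type and function names are hypothetical, Color is modeled as an int, and each method returns a tag naming the implementation that ran so the dispatch is observable.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { POINT_DRAW, POINT_D2O, COLORPOINT_DRAW, COLORPOINT_REV } Impl;

typedef struct Point Point;

/* Method table for Point: slot 0 is draw, slot 1 is d2o. */
typedef struct {
    Impl (*draw)(const Point *self);
    Impl (*d2o)(const Point *self);
} PointTable;

/* Method table for ColorPoint: Point's slots as a prefix, then rev. */
typedef struct {
    PointTable base;
    Impl (*rev)(const Point *self);
} ColorPointTable;

struct Point { const PointTable *table; int x, y; };
typedef struct { Point base; int c; } ColorPoint;   /* Color modeled as int */

Impl point_draw(const Point *self)      { (void)self; return POINT_DRAW; }
Impl point_d2o(const Point *self)       { (void)self; return POINT_D2O; }
Impl colorpoint_draw(const Point *self) { (void)self; return COLORPOINT_DRAW; }
Impl colorpoint_rev(const Point *self)  { (void)self; return COLORPOINT_REV; }

/* ColorPoint overrides draw, inherits d2o, and adds rev. */
const PointTable point_table = { point_draw, point_d2o };
const ColorPointTable colorpoint_table = {
    { colorpoint_draw, point_d2o },
    colorpoint_rev
};

/* Standard invocation sequence: load table, load slot, indirect call. */
Impl send_draw(const Point *obj) {
    assert(obj != NULL && obj->table != NULL);
    return obj->table->draw(obj);
}

Impl send_d2o(const Point *obj) {
    assert(obj != NULL && obj->table != NULL);
    return obj->table->d2o(obj);
}

/* rev exists only in ColorPoint's table, so view the table at its full type. */
Impl send_rev(const ColorPoint *obj) {
    assert(obj != NULL && obj->base.table != NULL);
    const ColorPointTable *t = (const ColorPointTable *)obj->base.table;
    return t->rev(&obj->base);
}
```

A caller holding only a Point pointer still reaches ColorPoint:draw when the object is a ColorPoint, because the draw slot has the same offset in both tables and only the table contents differ.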