Chapter 5 Basic Semantics
  Attributes, Bindings, and Semantic Functions
  Declarations, Blocks, Scope, and the Symbol Table
  Name Resolution and Overloading
  Allocation, Lifetimes, and the Environment
  Variables and Constants
Introduction
  Syntactic Specification: BNF, syntax diagrams, etc.
  Semantic Specification
    Reference Manual
      natural language is used
      hard to be precise but easy to read
    Language Translator
      the program must be executed to learn its semantics
      machine dependent
    Formal Definition
      precise but hard to read (see Chapter 13)
  We will take the reference manual approach.
Programming Languages 2
Attributes
  Attributes
    the properties of language entities, typically identifiers
    other language entities, such as operator symbols, often have predetermined attributes
  Examples of Attributes
    the location of a variable: the placeholder
    the value of an expression: the storable quantity
    the types of identifiers: determine the applicable operations
    the code of a procedure: the operation of the procedure
    the size of a certain type: determines the value range
Binding
  Binding
    the process of associating an attribute with a name
    declarations (including definitions) bind attributes to identifiers
  Example of Binding
    // C++ example
    int x, *y;
    x = 2;
    y = new int(3);

    x: Type: integer, Value: 2
    y: Type: integer pointer, Value: a reference to a heap integer holding 3
Binding Time
  The time when a binding occurs
  Large Categories of Binding Times
    Static binding: the binding occurs prior to execution
    Dynamic binding: the binding occurs during execution
  Further Refined Binding Time Categories
    Language definition time
    Language implementation time
    Program translation time
    Link time
    Load time
    Execution time (run time)
Binding Time Examples
  The value of an expression
    execution time if it contains variables
    translation time if it is a constant expression
  The type of an identifier
    translation time in a compilation system (say, Java)
    execution time in an interpreter (say, LISP)
  The maximum number of digits of an integer
    language definition time for some languages (say, Java)
    language implementation time for others (say, C)
  The location of a variable
    load time for static variables (static in C)
    execution time for dynamic variables (auto in C)
Symbol Table and Environment
  Symbol Table
    the data structure (of a language translator) that maintains the binding information
    mathematically, a function
      SymbolTable: Names -> Attributes
  In a Compilation System
    the attributes are maintained by multiple functions
      SymbolTable: Names -> (Some) StaticAttributes
      Environment: Names -> Locations
      Memory: Locations -> Values
  In an Interpretation System
    the symbol table and the environment are combined
      Environment: Names -> Attributes (including Locations and Values)
Declarations
  A principal method for establishing bindings
    explicit binding: explicitly specified by a declaration
    implicit binding: implicitly assumed by a declaration
    Example: in the C declaration
      int x;
    the type binding is explicit and the location binding is implicit
  Implicit Declaration
    the naming convention may determine the binding
    Examples:
      FORTRAN: a variable whose name starts with I, J, K, L, M, or N is implicitly an integer
      BASIC: a trailing % on a variable name means an integer, $ means a string
Different Terms for Declarations
  Some languages differentiate definitions from declarations
    Definition: a declaration that binds all potential attributes
    Declaration: a declaration that binds only some of the attributes
  C language example
    a function declaration (prototype) binds only the type (the interface) of the function
    an external variable declaration does not bind the location attribute
    an incomplete type specification (type declaration) may be used to resolve mutually recursive type definitions
Examples of C Declarations
    int x = 0;
  Explicitly specifies the data type and the initial value.
  Implicitly specifies the scope (explained later) and the location in memory.
    int f(double);
  Explicitly specifies the type (double -> int).
  Implicitly specifies nothing else: another declaration is needed to specify the code.
  The former is called a definition in C; the latter is simply a declaration.
Block and Locality
  Block
    a standard language construct that may contain declarations
    the unit of allocation
  Locality of Declarations and References
    Local: the declaration and the reference for a name are in the same block
    Non-local: the declaration of a name is not in the block that contains the reference
  Note that we need rules to locate the corresponding declarations for non-local references.
Block-Structured
  Block-Structured Program
    the program consists of blocks, which may be nested
    most Algol descendants exploit this structure
  Kinds of Blocks
    procedural block: Pascal
    non-procedural block: Ada, C
  -- Ada example
    declare
      x: integer;
    begin
      ...
    end
  (* Pascal example *)
    program ex;
    var x: integer;
    begin
      ...
    end.
Other Entities Containing Declarations
  Structured Data Type
    a type declaration that contains declarations
    examples: struct in C; class in C++, Java, Smalltalk
  Module or Package
    a collection of declarations
    the collection itself is not a declaration
    examples:
      Ada: package and task
      Java: package
      ML and Haskell: module
      C++: namespace
Scope
  Scope of a Binding
    the region of the program over which the binding is maintained
  Scope of a Name (an abuse of the term "scope")
    the same name may be declared more than once
    these different declarations of the same name must be differentiated
  Scope and Block
    Declaration-before-use rule: the scope typically extends from the declaration to the end of the block that contains it
    In some constructs, the scope may extend backwards to the beginning of the block (classes in Java and C++, top-level declarations in Scheme)
Scope Rules
  Lexical Scope (Static Scope) Rule
    the scope of a binding is the inside of the block that contains the corresponding declaration
    the standard scope rule of most block-structured languages
  Dynamic Scope Rule
    the scope of a binding is determined by the execution path
    the symbol table (or the environment) must be managed dynamically
Lexical Scope Example (C)
    int x;
    void p(void)
    { char y;
      ...
    } /* p */
    void q(void)
    { double z;
      ...
    } /* q */
    int main(void)
    { int w[10];
      ...
    }
  In C, the declaration-before-use rule applies.
  Scopes: x, p, q, and main each extend from their declaration to the end of the file; y is local to p, z to q, and w to main.
Scope Holes
  What is a scope hole?
    a local declaration of a name can mask a prior declaration of the same name
    in this case, the masked declaration has a scope hole over the corresponding block
  Visibility and Scope
    Visibility: the region where the declared name is visible (excluding scope holes)
    Scope: the region where the binding exists (including scope holes)
Scope Resolution Operator
    // C++ example
    int x;
    void p(void)
    { char x;
      x = 'a';     // local x
      ::x = 42;    // global x
      ...
    } /* p */
    int main()
    { x = 2;       // global x
      ...
    }
  The global integer variable x has a scope hole in the body of p.
  In C, the global x cannot be referenced in p.
  In C++, the global x can be referenced in p using the scope resolution operator ::.
  Ada also has a scope resolution operator: the dot (.).
File Scope in C
  File Scope
    In C, a global name becomes a file-scope name when it is declared with the keyword static.
    A file-scope name can be referenced only within that file.
  Example
    File 1: extern int x; ... x ...    (refers to the x defined in File 2)
    File 2: int x; ... x ...           (defines the global x)
    File 3: static int x; ... x ...    (a separate x, visible only in File 3)
Recursive Declaration
  Recursive functions are generally well-defined
    int factorial(int n)
    { ... factorial(n - 1) ... }
  How about recursive variables?
    int x = x + 1;
    Not allowed in Ada or Java
    Allowed in C/C++ for local variables, but meaningless
  Dealing with mutual recursion under the declaration-before-use rule
    forward declarations in Pascal
    prototype declarations in C/C++
Java Scope Example
    public class Scope
    { public static void f()
      { System.out.println(x); }
      public static void main(String[] args)
      { int x = 3;
        f();
      }
      public static int x = 2;
    }
  In Java classes, the declaration-before-use rule does not apply: in a class declaration, the scope of a declaration is the entire class. Note that the declaration of the static field x comes after its use in f.
  The result may differ according to the scope rules.
    The above code prints 2. (Java adopts the lexical scope rule.)
    Under the dynamic scope rule, the code would print 3.
Dynamic Scope Evaluated
  Disadvantages of Dynamic Scope
    The scope of a declaration cannot be determined statically. (Hand simulation is needed to find the applicable declaration.)
    The types of identifiers cannot be determined statically. (Static type checking is impossible.)
  Historical Note
    Originally used in Lisp. Scheme could still use it, but doesn't.
    Some languages still use it: VBScript, Javascript, Perl (older versions).
    The inventor of Lisp (McCarthy) now calls it a bug.
Symbol Table Maintenance
  In a lexically scoped language,
    the symbol table is maintained like a stack
    the symbol table is maintained statically, i.e. regardless of the execution path
  In a dynamically scoped language,
    all the bindings of the outermost names are constructed first
    the remaining bindings are maintained according to the execution path
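Symbol Table under Lexical Scope
    int x;
    char y;
    void p(void)
    { double x;
      ...
      { /* block b */
        int y[10];
        ...
      }
      ...
    }
    void q(void)
    { int y;
      ...
    }
    int main(void)
    { char x;
      ...
    }
  Bindings in the symbol table (innermost first) while each region is processed:
    in p:        x: double, local to p; int, global
                 y: char, global
    in block b:  x: double, local to p; int, global
                 y: int[10], local to b; char, global
    in q:        x: int, global
                 y: int, local to q; char, global
    in main:     x: char, local to main; int, global
                 y: char, global
  p and q are bound to "void function" and main to "int function" from their declarations onward.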
Symbol Table under Dynamic Scope
    #include <stdio.h>
    int x = 1;
    char y = 'a';
    void p(void)
    { double x = 2.5;
      printf("%c\n", y);
      { /* block b */
        int y[10];
      }
    }
    void q(void)
    { int y = 42;
      printf("%d\n", x);
      p();
    }
    int main(void)
    { char x = 'b';
      q();
      return 0;
    }
  Symbol table while p executes (most recent binding first):
    x: double = 2.5, local to p; char = 'b', local to main; int = 1, global
    y: int = 42, local to q; char = 'a', global
    p: void function    q: void function    main: int function
  Output under dynamic scope:
    98   (the ASCII value of 'b')
    *    (the character with ASCII code 42)
Nested Symbol Table
  Scoping Structures
    some language constructs are allowed to have their own name spaces
    Examples
      Pascal: functions, procedures
      C++: classes, functions, structures, namespaces
      Java: classes, methods, packages
  Stacks of Symbol Tables
    when a language has scoping structures, it may be convenient to construct stacks of symbol tables
Nested Symbol Table Example
    public class Scope
    { public static void f()
      { System.out.println(x); }
      public static void main(String[] args)
      { int x = 3;
        f();
      }
      public static int x = 2;
    }
  Under lexical scope, the symbol table for the code above can be structured as follows.
    Scope: public class, with its own symtab:
      f: public static void method, with its own symtab:
        (empty)
      main: public static void method, with its own symtab:
        args: String[] parameter
        x: int local
      x: public static int
Overloading
  What is overloading?
    reusing a name for different entities of the same kind within the same scope
      entity1/name1, entity2/name2  ->  (entity1, entity2)/name
    the name is overloaded in the above case
    operator overloading, function overloading
  Overload Resolution
    choosing the appropriate entity for the given usage of the overloaded name
    the calling context (the information contained in a call) is generally used for overload resolution
Overloading Example
  In most languages, the operator + is overloaded
    integer addition (say, ADDI)
    floating-point addition (say, ADDF)
  Disambiguating clue: the data types of the operands
  How about a mixed-type expression?
    2 + 3.2
    C/C++ adopts promotion (implicit type conversion).
    Ada treats the above expression as an error.
Function Overloading
    int max(int x, int y)             // max #1
    { return x > y ? x : y; }
    double max(double x, double y)    // max #2
    { return x > y ? x : y; }
    int max(int x, int y, int z)      // max #3
    { return x > y ? (x > z ? x : z) : (y > z ? y : z); }
  Name resolution
    max(2,3) calls max #1
    max(2.1,3.2) calls max #2
    max(1,3,2) calls max #3
Overload Resolution Issues
  Implicit conversions may cause ambiguous calls
    max(2.1, 3)
    C++: too many candidates (max #1 or max #2)
    Ada: no candidates
    Java: implicit conversions are used only when no information is lost
  Whether the return type is included in the calling context
    C++, Java: not included
    Ada: included
Function Overloading in Ada
    procedure overload is
      function max(x: integer; y: integer)   -- max #1
        return integer is
      begin
        ...
      end max;
      function max(x: integer; y: integer)   -- max #2
        return float is
      begin
        ...
      end max;
      a: integer;
      b: float;
    begin -- max_test
      a := max(2,3);   -- call to max #1
      b := max(2,3);   -- call to max #2
    end overload;
  In Ada, the return type is included in the calling context.
Operator Overloading in C++
    #include <iostream>
    using namespace std;
    typedef struct { int i; double d; } IntDouble;
    bool operator < (IntDouble x, IntDouble y)
    { return x.i < y.i && x.d < y.d; }
    IntDouble operator + (IntDouble x, IntDouble y)
    { IntDouble z;
      z.i = x.i + y.i;
      z.d = x.d + y.d;
      return z;
    }
    int main()
    { IntDouble x = {1,2.1}, y = {5,3.4};
      if (x < y) x = x + y;
      else y = x + y;
      cout << x.i << " " << x.d << endl;
      return 0;
    }
Operator Overloading in Ada
    procedure opover is
      type IntDouble is
        record
          i: Integer;
          d: Float;
        end record;
      function "<" (x,y: IntDouble) return Boolean is ...
      function "+" (x,y: IntDouble) return IntDouble is ...
      x, y: IntDouble;
    begin
      x := (1,2.1);
      y := (5,3.4);
      if (x < y) then x := x + y;
      else y := x + y;
      end if;
      put(x.i); put(" "); put(x.d); new_line;
    end opover;
Notes on Operator Overloading
  The syntactic properties (associativity and precedence) of the operators are not changed by operator overloading.
  Special notations are used for the prefix form of the operators.
    infix form: x + y
    prefix form in Ada: "+"(x, y)
    prefix form in C++: operator + (x, y)
Other Kinds of Reuse of Names
  Reusing names for different kinds of entities
    A separate name space is needed for each kind.
    This kind of reuse is not overloading.
  C example
    typedef struct A A;
    struct A
    { int data;
      A * next;
    };
    (in "struct A", A is the structure tag name; on its own, A is the type name)
  Java example
    class A
    { A A(A A)
      { A:
        for (;;)
        { if (A.A(A) == A) break A; }
        return A;
      }
    }
    Which is which? class name, method name, parameter name, label name
Environment
  Environment Construction Time
    static environment: FORTRAN
    dynamic environment: LISP
    mixture: most Algol-style languages
  Variable Allocation Time in an Algol-style Language
    global variables
      static allocation
      allocated at load time
    local variables
      mostly dynamic allocation
      allocated at declaration elaboration time (i.e. when the control flow passes the declaration)
Typical Environment
  Components of Typical Algol-style Languages
    static area for static allocation
    stack for LIFO-style dynamic allocation
    heap for on-demand dynamic allocation
Activation Record
  Activation
    an invocation of a subprogram
    the subprogram environment must be constructed for each activation
  Activation Record
    the region of memory allocated for an activation
    subprogram environment + bookkeeping information
  Run-Time Stack
    block entries and exits are LIFO-style
    procedure calls and returns are LIFO-style
    activation records are stored in the run-time stack
Run-Time Stack Manipulation
    A: { int x;
         char y;
         /* point 1 */
         B: { double x;
              int a;
              /* point 2 */
            } /* end B */
         C: { char y;
              /* point 3 */
              int b;
              /* point 4 */
              D: { int x;
                   double y;
                   /* point 5 */
                 } /* end D */
            } /* end C */
       } /* end A */
  Run-time stack at each point (bottom to top), with each variable allocated as its declaration is elaborated:
    point 1: A: x, y
    point 2: A: x, y | B: x, a
    point 3: A: x, y | C: y
    point 4: A: x, y | C: y, b
    point 5: A: x, y | C: y, b | D: x, y
Heap Manipulation
  Heap (Free Store)
    the memory pool for objects that are allocated manually
  Heap Deallocation
    manual deallocation: special functions or operators are used (free in C, delete in C++)
    automatic deallocation: a garbage collector is used (safer but somewhat slower; Java)
  Ada Approach
    Ada does not provide a delete operation but allows user-defined deallocation (Unchecked_Deallocation)
Pointer and Dereferencing
  Pointer
    an object whose value is a reference to an object
  Dereferencing
    referencing an object via a pointer value
  Pointers are mandatory (either implicitly or explicitly) for manipulating heap objects
    /* C example */
    int *x;                          // pointer declaration
    x = (int*)malloc(sizeof(int));   // memory allocation
    *x = 5;                          // dereferencing
    free(x);                         // deallocation
Lifetime
  Storable Object
    a chunk of memory cells
    an area of storage that is allocated in the environment
  The Lifetime (or Extent) of an Object
    the duration of its allocation in the environment
  Lifetime vs. Scope
    the lifetime and the scope of variables are closely related but not identical (cf. local static variables in C/C++)
    according to the scope: local, global
    according to the lifetime: static, dynamic
Local Static Variable Example (C)
    #include <stdio.h>
    int p(void)
    { static int p_count = 0;
      /* initialized only once - not each call! */
      p_count += 1;
      return p_count;
    }
    int main(void)
    { int i;
      for (i = 0; i < 10; i++)
      { if (p() % 3) printf("%d\n", p());
      }
      return 0;
    }
  The variable p_count counts the number of calls of the function p. Accordingly, p is history sensitive.
  Guess the output!
Variables and Constants
  Variable
    an object whose value may change during execution
  Constant
    an object whose value does not change during its lifetime
  Literal
    a language entity whose value is explicit from its name
    a kind of constant, but one that may never be allocated
Diagrams for Variables
  [Figures not reproduced: schematic representation and box-circle diagram of a variable]
L-value and R-value
  L-value and R-value of a Variable
    l-value (LHS value): the location
    r-value (RHS value): the value stored
  Language Examples
    ML has only references, and the r-value is explicit:
      x := !x + 1      (!x means the r-value of x)
    C has an address-of operator (&) and a dereferencing operator (*), but the distinction between l-values and r-values is normally implicit.
Assignment
  General Syntax
    infix notation: variable assignment-operator expression
  Semantics
    storage semantics
      assignment by value-copying
    pointer semantics
      assignment by sharing (shallow copying)
      assignment by cloning (deep copying)
Assignment by Value-Copying
  The value of the variable is copied.
    x = y
Assignment by Sharing
  The location of the variable is copied.
    x = y
Assignment by Cloning
  The location and the value of the variable are duplicated.
    x = y
Java Example
  Java supports all the kinds of assignment semantics
    assignment of object variables: assignment by sharing
    assignment of simple data: assignment by value-copying
    object cloning is supported by the method clone
  A closer view of object assignment in Java
    x = y
Constant Semantics
  [Figure not reproduced: schematic diagram for constants]
  A Constant has Value Semantics
    Once the value binding is constructed, the value cannot be changed.
    The location of a constant cannot be referred to.
Classification of Constants
  Literals and Named Constants
    literals: names denote the exact value
    named constants: names for the meaning of the value
  Classification of Named Constants
    static constants (manifest constants)
      compile-time static constants (may never be allocated)
      load-time static constants
    dynamic constants
Constant Example (Java)
  Compile-time constant in Java:
    static final int zero = 0;
  Load-time constant in Java:
    static final Date now = new Date();
  Dynamic constant in Java:
    any non-static final assigned in a constructor
  Java vs. C
    Java takes a very general view of constants, since it is not very worried about getting rid of them during compilation.
    C takes a much stricter view of constants, essentially forcing them to be capable of elimination during compilation.
Constant Initialization
  C vs. C++
    In C, the initial value of a static constant (or a static variable) must be computed from literals only.
    In C++, this restriction is removed.
    #include <stdio.h>
    #include <time.h>
    const int a = 2;
    const int b = 27+2*2;          /* legal in C */
    const int c = (int) time(0);   /* illegal C code! */
    int d = 27+a*a;                /* also illegal in C */
How about functions?
  Function Constants
    The function names in most languages are constants.
  Function Variables
    may be implemented by pointers
  Function Literals (anonymous functions)
    Most functional languages support anonymous functions.
  /* function pointer in C */
    #include <stdio.h>
    int gcd(int u, int v)
    { if (v == 0) return u;
      else return gcd(v, u % v);
    }
    int (*fv)(int,int) = gcd;
    int main(void)
    { printf("%d\n", fv(15,10));
      return 0;
    }
  Anonymous function in ML
    (fn(x:int) => x * x) 2;
    evaluates to
    val it = 4 : int
Aliases
  Aliases
    two or more different names for the same object at the same time
    bad for readability (may cause potentially harmful side effects; see the next slide)
  Side Effect
    any change that persists beyond the execution of a statement
  Potentially Harmful Side Effect
    the side effect cannot be determined from the written statement alone
    the preceding code must also be read
An Example of Harmful Aliases
    #include <stdio.h>
    #include <stdlib.h>
    int main(void)
    { int *x, *y;
      x = (int *) malloc(sizeof(int));
      *x = 1;
      y = x;     /* *x and *y are now aliases */
      *y = 2;
      printf("%d\n", *x);
      return 0;
    }
What makes aliases?
  What makes aliases?
    pointer assignment
    call-by-reference parameters
    assignment by sharing
    explicit mechanisms for aliasing: EQUIVALENCE and COMMON in FORTRAN
    variant records
  Why an explicit mechanism for aliasing in FORTRAN?
    in order to save memory
    memory was a valuable resource at that time
Dangling References
  Dangling References
    locations that are accessible but deallocated
    the locations are deallocated too early
    dangerous!
  What makes dangling references?
    pointer assignment and explicit deallocation
    pointer assignment and implicit deallocation
      by block exit
      by function exit
An Example of Dangling References
    #include <stdio.h>
    #include <stdlib.h>
    int main(void)
    { int *x, *y;
      x = (int *) malloc(sizeof(int));
      *x = 1;
      y = x;                 /* *x and *y are now aliases */
      free(x);               /* *y is now a dangling reference */
      printf("%d\n", *y);    /* illegal reference */
      return 0;
    }
Garbage
  Garbage (Dangling Objects)
    memory locations that are allocated but inaccessible
    the locations are deallocated too late
    a waste of memory, but not harmful
  What makes garbage?
    explicit allocation, after which the access point is lost due to
      assignment
      deallocation of the access point
An Example of Garbage
    int main(void)
    { int *x;
      x = (int *) malloc(sizeof(int));
      x = 1;    /* OOPS! */
      ...
      return 0;
    }
Garbage Collection
  A language subsystem that automatically reclaims garbage.
  Most functional language implementations and some object-oriented language implementations use garbage collectors. (See Section 8.5)