CSC 467 Lecture 13-14: Semantic Analysis


Recall
    Parsing translates the token stream into a parse tree.

Today
    How to build trees: syntax-directed translation
    How to add information to trees: semantic analysis

On Tree Traversals

Trees are classic data structures. Trees have nodes and edges, so they are a special case of graphs. Tree edges are directional, with roles "parent" and "child" attributed to the source and destination of the edge. A tree has the property that every node has zero or one parent. A node with no parent is called a root. A node with no children is called a leaf. A node that is neither a root nor a leaf is an "internal node". Trees have a size (total # of nodes), a height (maximum count of nodes from root to a leaf), and an "arity" (maximum number of children in any one node). Parse trees are k-ary, where there is a variable number of children bounded by a value k determined by the grammar. You may wish to consult your old data structures book, or look at some books from the library, to learn more about trees if you are not totally comfortable with them.

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct tree {
        short label;            /* what production rule this came from */
        short nkids;            /* how many children it really has */
        struct tree *child[1];  /* array of children, size varies 0..k */
    };

    struct tree *alctree(int label, int nkids, ...)
    {
        int i;
        va_list ap;
        /* child[1] already holds one pointer; add room only for kids beyond the first */
        size_t extra = (nkids > 1) ? (nkids - 1) * sizeof(struct tree *) : 0;
        struct tree *ptr = malloc(sizeof(struct tree) + extra);
        if (ptr == NULL) {
            fprintf(stderr, "alctree out of memory\n");
            exit(1);
        }
        ptr->label = label;
        ptr->nkids = nkids;
        va_start(ap, nkids);
        for (i = 0; i < nkids; i++)
            ptr->child[i] = va_arg(ap, struct tree *);
        va_end(ap);
        return ptr;
    }

Besides a function to allocate trees, you need to write one or more recursive functions to visit each node in the tree, either top to bottom (preorder), or bottom to top (postorder). You might do many different traversals on the tree in order to write a whole compiler: check types, generate machine-independent intermediate code, analyze the code to make it shorter, etc. You can write 4 or more different traversal functions, or you can write 1 traversal function that does different work at each node, determined by passing in a function pointer, to be called for each node.

    void postorder(struct tree *t, void (*f)(struct tree *))
    {
        /* postorder means visit each child, then do work at the parent */
        int i;
        if (t == NULL) return;
        /* visit each child */
        for (i = 0; i < t->nkids; i++)
            postorder(t->child[i], f);
        /* do work at parent */
        f(t);
    }

You would then be free to write as many little helper functions as you want, for different tree traversals, for example:

    void printer(struct tree *t)
    {
        if (t == NULL) return;
        printf("%p: %d, %d children\n", (void *)t, t->label, t->nkids);
    }

Semantic Analysis

Semantic ("meaning") analysis refers to a phase of compilation in which the input program is studied in order to determine what operations are to be carried out. The two primary components of a classic semantic analysis phase are variable reference analysis and type checking. These components both rely on an underlying symbol table.

What we have at the start of semantic analysis is a syntax tree that corresponds to the source program as parsed using the context free grammar. Semantic information is added by annotating grammar symbols with semantic attributes, which are defined by semantic rules. A semantic rule is a specification of how to calculate a semantic attribute that is to be added to the parse tree.

So the input is a syntax tree... and the output is the same tree, only "fatter" in the sense that nodes carry more information.
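As a small illustration of a tree getting "fatter", the sketch below runs a postorder traversal that stores a height attribute in every node, using the function-pointer style described above. The `height` field, the `MAXKIDS` bound, the fixed-size child array, and the `set_height` helper are assumptions made for this example; they are not part of the lecture's `struct tree`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

#define MAXKIDS 9   /* assumed upper bound k on children per node */

struct tree {
    int label;                     /* production rule or token code */
    int nkids;                     /* number of children in use */
    int height;                    /* added attribute, filled in by a traversal */
    struct tree *child[MAXKIDS];
};

/* Allocate a node whose nkids children are passed as extra arguments. */
struct tree *alctree(int label, int nkids, ...)
{
    va_list ap;
    struct tree *t;
    assert(nkids >= 0 && nkids <= MAXKIDS);
    t = calloc(1, sizeof *t);      /* zeroes height and unused child slots */
    if (t == NULL) exit(1);
    t->label = label;
    t->nkids = nkids;
    va_start(ap, nkids);
    for (int i = 0; i < nkids; i++)
        t->child[i] = va_arg(ap, struct tree *);
    va_end(ap);
    return t;
}

/* Visit every child first, then apply f to the parent. */
void postorder(struct tree *t, void (*f)(struct tree *))
{
    if (t == NULL) return;
    for (int i = 0; i < t->nkids; i++)
        postorder(t->child[i], f);
    f(t);
}

/* Work function: a node's height is 1 plus the height of its tallest child. */
void set_height(struct tree *t)
{
    int tallest = 0;
    for (int i = 0; i < t->nkids; i++)
        if (t->child[i] != NULL && t->child[i]->height > tallest)
            tallest = t->child[i]->height;
    t->height = tallest + 1;
}
```

Calling `postorder(root, set_height)` works because postorder guarantees that every child's `height` is already filled in when the parent is visited. For a root whose children are a leaf and a one-child internal node, the root ends up with height 3, counting nodes as in the definition of height given earlier.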
Another output of semantic analysis is a set of error messages reporting the many kinds of semantic errors it detects. Two typical examples of semantic analysis include:

variable reference analysis
    The compiler must determine, for each use of a variable, which variable declaration corresponds to that use. This depends on the semantics of the source language being translated.

type checking
    The compiler must determine, for each operation in the source code, the types of the operands and resulting value, if any.

Notations used in semantic analysis:

syntax-directed definitions
    high-level (declarative) specifications of semantic rules

translation schemes
    semantic rules and the order in which they get evaluated

In practice, attributes get stored in parse tree nodes, and the semantic rules are evaluated either (a) during parsing (for easy rules) or (b) during one or more (sub)tree traversals.

Two Types of Attributes:

synthesized attributes
    computed from information contained within one's children. These are generally easy to compute, even on-the-fly during parsing.

inherited attributes
    computed from information obtained from one's parent or siblings. These are generally harder to compute. Compilers may be able to jump through hoops to compute some inherited attributes during parsing, but depending on the semantic rules this may not be possible in general. Compilers resort to tree traversals to move semantic information around the tree to where it will be used.

Attribute Examples

Isconst and Value

Not all expressions have constant values; the ones that do may allow various optimizations.

    CFG              Semantic Rule
    E1 : E2 + T      E1.isconst = E2.isconst && T.isconst
                     if (E1.isconst) E1.value = E2.value + T.value
    E : T            E.isconst = T.isconst
                     if (E.isconst) E.value = T.value
    T1 : T2 * F      T1.isconst = T2.isconst && F.isconst
                     if (T1.isconst) T1.value = T2.value * F.value
    T : F            T.isconst = F.isconst
                     if (T.isconst) T.value = F.value
    F : ( E )        F.isconst = E.isconst
                     if (F.isconst) F.value = E.value
    F : ident        F.isconst = FALSE
    F : intlit       F.isconst = TRUE
                     F.value = intlit.ival

Symbol Table Module

Symbol tables are used to resolve names within name spaces. Symbol tables are generally organized hierarchically according to the scope rules of the language. Although initially concerned with simply storing the names of the various symbols that are visible in each scope, symbol tables take on additional roles in the remaining phases of the compiler. In semantic analysis, they store type information. For code generation, they store memory addresses and sizes of variables.

mktable(parent)
    creates a new symbol table, whose scope is local to (or inside) parent

enter(table, symbolname, type, offset)
    inserts a symbol into a table

lookup(table, symbolname)
    looks up a symbol in a table; returns a structure pointer including type and offset. Lookup operations are often chained together progressively from the most local scope on out to the global scope.

addwidth(table)
    sums the widths of all entries in the table ("widths" = #bytes, sum of widths = #bytes needed for an "activation record" or "global data section"). Do not worry about this method until you implement code generation.

enterproc(table, name, newtable)
    enters the local scope of the named procedure

Variable Reference Analysis

The simplest use of a symbol table would check:

    for each variable, has it been declared? (undeclared error)
    for each declaration, is it already declared? (redeclared error)
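The two checks above can be sketched with a minimal scoped symbol table. This is one possible layout, not the lecture's implementation: the linked-list storage, the `int` type code, the `lookup_local` helper, and the 0 / -1 return value of `enter` are all assumptions made for illustration (the notes give only the names and arguments of `mktable`, `enter`, and `lookup`).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct symbol {
    const char *name;
    int type;                /* placeholder type code (assumption) */
    int offset;
    struct symbol *next;
};

struct symtab {
    struct symtab *parent;   /* enclosing scope, NULL for the global scope */
    struct symbol *syms;     /* linked list here; a hash table in practice */
};

struct symtab *mktable(struct symtab *parent)
{
    struct symtab *t = calloc(1, sizeof *t);
    if (t == NULL) exit(1);
    t->parent = parent;
    return t;
}

/* Hypothetical helper: search a single scope only. */
static struct symbol *lookup_local(struct symtab *t, const char *name)
{
    for (struct symbol *s = t->syms; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Chained lookup from the most local scope outward; NULL means undeclared. */
struct symbol *lookup(struct symtab *t, const char *name)
{
    for (; t != NULL; t = t->parent) {
        struct symbol *s = lookup_local(t, name);
        if (s != NULL)
            return s;
    }
    return NULL;
}

/* Returns 0 on success, -1 if name is already declared in this same scope. */
int enter(struct symtab *t, const char *name, int type, int offset)
{
    struct symbol *s;
    assert(t != NULL && name != NULL);
    if (lookup_local(t, name) != NULL)
        return -1;           /* redeclared error */
    s = malloc(sizeof *s);
    if (s == NULL) exit(1);
    s->name = name;
    s->type = type;
    s->offset = offset;
    s->next = t->syms;
    t->syms = s;
    return 0;
}
```

In this sketch a `NULL` result from `lookup` corresponds to the undeclared error and a -1 from `enter` to the redeclared error. Because `enter` consults only the current scope, a local declaration may shadow a global one of the same name; whether that is legal depends on the source language.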

Reading Tree Leaves

In order to work with your tree, you must be able to tell, preferably trivially easily, which nodes are tree leaves and which are internal nodes, and for the leaves, how to access the lexical attributes. Options:

    1. encode in the parent what the types of children are
    2. encode in each child what its own type is (better)

How do you do option #2 here? Perhaps the best approach to all this is to unify the tokens and parse tree nodes with something like the following, where perhaps an nkids value of -1 is treated as a flag that tells the reader to use lexical information instead of pointers to children:

    struct node {
        int code;       /* terminal or nonterminal symbol */
        int nkids;
        union {
            struct token { ... } leaf;
            struct node *kids[9];
        } u;
    };

There are actually nonterminal symbols with 0 children (a nonterminal with a righthand side with 0 symbols), so you don't necessarily want to use an nkids of 0 as your flag to say that you are a leaf.

Type Checking

Perhaps the primary component of semantic analysis in many traditional compilers consists of the type checker. In order to check types, one first must have a representation of those types (a type system) and then one must implement comparison and composition operators on those types using the semantic rules of the source language being compiled. Lastly, type checking will involve adding (mostly) synthesized attributes through those parts of the language grammar that involve expressions and values.

Type Systems

Types are defined recursively according to rules defined by the source language being compiled. A type system might start with rules like:

    - Base types (int, char, etc.) are types
    - Named types (via typedef, etc.) are types
    - Types composed using other types are types, for example:
        o array(T, indices) is a type. In some languages indices always start with 0, so array(T, size) works.
        o T1 x T2 is a type (specifying, more or less, the tuple or sequence T1 followed by T2; x is a so-called cross-product operator).
        o record((f1 x T1) x (f2 x T2) x ... x (fn x Tn)) is a type
        o in languages with pointers, pointer(T) is a type
        o (T1 x ... x Tn) -> Tn+1 is a type denoting a function mapping parameter types to a return type
    - In some languages, type expressions may contain variables whose values are types.

In addition, a type system includes rules for assigning these types to the various parts of the program; usually this will be performed using attributes assigned to grammar symbols.

Representing C (C++, Java, etc.) Types

The type system is represented using data structures in the compiler's implementation language. In the symbol table and in the parse tree attributes used in type checking, there is a need to represent and compare source language types. You might start by trying to assign a numeric code to each type, kind of like the integers used to denote each terminal symbol and each production rule of the grammar. But what about arrays? What about structs? There are an infinite number of types; any attempt to enumerate them will fail. Instead, you should create a new data type to explicitly represent type information. This might look something like the following:

    struct c_type {
        int base_type;      /* 1 = int, 2 = float, ... */
        union {
            struct array {
                int size;
                struct c_type *elemtype;
            } a;
            struct c_type *p;
            struct struc {
                char *label;
                struct field **f;
            } s;
        } u;
    };

    struct field {
        char *name;
        struct c_type *elemtype;
    };

Given this representation, how would you initialize a variable to represent each of the following types:

    int [10][20]
    struct foo { int x; char *s; }

Example Semantic Rules for Type Checking

    grammar rule         semantic rule
    E1 : E2 PLUS E3      E1.type = check_types(PLUS, E2.type, E3.type)
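One way a function like check_types() could be organized is sketched below. It is deliberately reduced to base types so that it stays short: the `enum base` and `enum op` codes, the shared static type objects, the `is_numeric` helper, and the rule that int + float yields float are all assumptions for this example, and the `struct c_type` here is a cut-down stand-in for the fuller representation shown above.

```c
#include <assert.h>
#include <stddef.h>

enum base { T_ERROR, T_INT, T_FLOAT, T_BOOL };   /* hypothetical type codes */
enum op   { OP_PLUS, OP_AND };                   /* hypothetical operator codes */

struct c_type { enum base base_type; };          /* base types only in this sketch */

/* One shared object per base type, so types can be compared by pointer. */
static struct c_type error_type = { T_ERROR };
static struct c_type int_type   = { T_INT };
static struct c_type float_type = { T_FLOAT };
static struct c_type bool_type  = { T_BOOL };

static int is_numeric(const struct c_type *t)
{
    return t->base_type == T_INT || t->base_type == T_FLOAT;
}

/* Result type of "left op right", or &error_type if the operands do not fit. */
struct c_type *check_types(enum op op, struct c_type *left, struct c_type *right)
{
    assert(left != NULL && right != NULL);

    /* An erroneous operand makes the whole expression erroneous. */
    if (left->base_type == T_ERROR || right->base_type == T_ERROR)
        return &error_type;

    switch (op) {
    case OP_PLUS:
        if (!is_numeric(left) || !is_numeric(right))
            return &error_type;
        /* Assumed rule: mixing int and float gives float. */
        if (left->base_type == T_FLOAT || right->base_type == T_FLOAT)
            return &float_type;
        return &int_type;
    case OP_AND:
        if (left->base_type == T_BOOL && right->base_type == T_BOOL)
            return &bool_type;
        return &error_type;
    }
    return &error_type;
}
```

Returning a distinguished error type, rather than NULL, lets the parent rule call check_types() on the result without a special case; in this sketch `check_types(OP_PLUS, &int_type, &bool_type)` yields `&error_type`, and so does any later operation that takes that result as an operand.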

The check_types() function in the rule E1 : E2 PLUS E3 returns a (struct c_type *) value. One of the values it should be able to return is Error. The operator (PLUS) is passed to check_types() because behavior may depend on the operator: the result type for array subscripting works differently than the result type for the arithmetic operators, which may work differently (in some languages) than the result type for logical operators that return booleans.

Type Promotion and Type Equivalence

When is it legal to perform an assignment x = y? When x and y are identical types, sure. Many languages such as C have automatic promotion rules for scalar types such as shorts and longs. The results of type checking may include not just a type attribute; they may also include a type conversion, which is best represented by inserting a new node in the tree to denote the promoted value. Example:

    int x;
    long y;
    y = y + x;

For records/structures, some languages use name equivalence, while others use structure equivalence. Features like typedef complicate matters. If you have a new type name MY_INT that is defined to be an int, is it compatible to pass as a parameter to a function that expects regular ints? Object-oriented languages also get interesting during type checking, since subclasses usually are allowed anyplace their superclass would be allowed.

Implementing Structs

    1. Storing and retrieving structs by their label. The struct label is how structs are identified. You do not have to do typedefs and such. The labels can be keys in a separate hash table, similar to the global symbol table. You can put them in the global symbol table so long as you can tell the difference between them and variable names.

    2. You have to store field names and their types, from where the struct is declared. You could use a hash table for each struct, but a linked list is OK as an alternative.

    3. You have to use the struct information to check the validity of each dot operator, as in rec.foo. To do this you will have to look up rec in the symbol table, where you store rec's type. rec's type must be a struct type for the dot to be legal, and that struct type should include a hash table or linked list that gives the names and types of the fields, where you can look up the name foo to find its type.
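Step 3 can be sketched as follows, using the linked-list option for fields. The flattened `struct c_type` (no union), the `enum kind` codes, the `check_dot` name, and the hand-built type objects are assumptions made to keep the example short; the symbol-table lookup of rec itself is left out, so the function starts from rec's type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { K_INT, K_CHAR, K_POINTER, K_STRUCT };   /* hypothetical kind codes */

struct c_type;

struct field {
    const char *name;
    struct c_type *type;
    struct field *next;        /* linked list of fields, in declaration order */
};

struct c_type {
    enum kind kind;
    struct c_type *pointee;    /* used when kind == K_POINTER */
    const char *label;         /* used when kind == K_STRUCT */
    struct field *fields;      /* used when kind == K_STRUCT */
};

/* Type of the expression rec.fieldname, or NULL if the dot is illegal. */
struct c_type *check_dot(struct c_type *rec_type, const char *fieldname)
{
    assert(fieldname != NULL);
    if (rec_type == NULL || rec_type->kind != K_STRUCT)
        return NULL;           /* left operand is not a struct */
    for (struct field *f = rec_type->fields; f != NULL; f = f->next)
        if (strcmp(f->name, fieldname) == 0)
            return f->type;
    return NULL;               /* struct has no field with that name */
}

/* struct foo { int x; char *s; } built by hand for the example. */
static struct c_type int_type     = { K_INT,     NULL,       NULL,  NULL };
static struct c_type char_type    = { K_CHAR,    NULL,       NULL,  NULL };
static struct c_type charptr_type = { K_POINTER, &char_type, NULL,  NULL };
static struct field  foo_s        = { "s", &charptr_type, NULL };
static struct field  foo_x        = { "x", &int_type,     &foo_s };
static struct c_type foo_type     = { K_STRUCT,  NULL,       "foo", &foo_x };
```

With these objects, `check_dot(&foo_type, "x")` returns `&int_type`, while `check_dot(&foo_type, "y")` and `check_dot(&int_type, "x")` both return NULL, covering the "no such field" and "not a struct" errors. A real checker would report the two cases with different messages.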