EI326 ENGINEERING PRACTICE & TECHNICAL INNOVATION (III-G) Kenny Q. Zhu Dept. of Computer Science Shanghai Jiao Tong University
KENNY ZHU Research Interests: Programming Languages Data processing Coordination Probabilistic Programming Program Analysis Degrees: Postdoc: Experiences: Data & Knowledge Engineering Unstructured Data Graph Data Information extraction Knowledge discovery NLP Recent Publications: POPL 08, SIGMOD 12, TACL 14, ICDE 13, 15 National University of Singapore (NUS) Princeton University Microsoft Redmond, USA Microsoft Research Asia Research Professor & PhD Advisor at SJTU since 2009 Director of ADAPT Lab 2
OBJECTIVE OF THIS COURSE Understand the principles of programming languages (this week) Case study: high level programming languages, such as C/C++, Java, Python, etc. (2 nd week) Review some of the basics in compilers lexical and syntactical analysis (3 rd week) Study the principles and tools of source-to-source compilation (4 th week) Implement a source-to-source compiler (any two languages) 3
PRACTICE VS. INNOVATION Practice: Learn to use multiple high level programming language Implementation of source-to-source compiler Innovation: Reduce the human effort in the source-to-source compilation Try to compile into languages of different paradigm (functional or logic) or niche languages (GPU cuda). 4
ADMINISTRATIVE INFO (I) All-English Course: everything in English! Lecturer: Kenny Zhu, SEIEE #03-524, kzhu@cs.sjtu.edu.cn Office hours: by appointment or after class Teaching Assistant: Yu Gong, SEIEE #03-341, gy910210@163.com Office hours: Thursday 16:00-17:00 Course Web Page (definitive source!): http://www.cs.sjtu.edu.cn/~kzhu/ei326/ 3
ADMINISTRATIVE INFO (II) Format: Lectures on Mon & Wed (Week #1-#4) Textbook: Programming Languages Principles and Paradigms, 2 nd Edition, by Tucker & Noonan, McGraw Hill / Tsinghua University Press Compilers Principles, Techniques and Tools, 1 st Edition, 2011, By Aho, Lam and Sethi, Mechanical Industry Press ROSE Tutorial Tutorials and Wiki pages on the web Lecture materials on course web page 4
ADMINISTRATIVE INFO (III) 2-credit course (easy!) Modes of Assessment: Attendance 10% Home Reading (code/articles) 20% Project: 70% Implement a source-to-source compiler between two major high level programming language Due 14 th week. 5
PRINCIPLES OF PROGRAMMING LANGUAGES 8
BASIC PROPERTIES Programming languages have four properties: Syntax Names Types Semantics For any language: Its designers must define these properties Its programmers must master these properties 9
PARADIGMS A programming paradigm is a pattern of problemsolving thought that underlies a particular genre of programs and languages. There are four main programming paradigms: Imperative Object-oriented Functional Logic (declarative) 10
ABSTRACTION IN PROGRAMMING Data Programmer-defined types/classes Class libraries Procedural Programmer-defined functions Standard function libraries 11
ORTHOGONALITY A language is orthogonal if its features are built upon a small, mutually independent set of primitive operations. Fewer exceptional rules = conceptual simplicity E.g., restricting types of arguments to a function Tradeoffs with efficiency 12
SYNTAX 13
THINKING ABOUT SYNTAX The syntax of a programming language is a precise description of all its grammatically correct programs. Precise syntax was first used with Algol 60, and has been used ever since. Three levels: Lexical syntax Concrete syntax Abstract syntax 14
LEVELS OF SYNTAX Lexical syntax = all the basic symbols of the language (names, values, operators, etc.) Concrete syntax = rules for writing expressions, statements and programs. Abstract syntax = internal representation of the program, favoring content over form. E.g., C: if ( expr )... discard ( ) Ada: if ( expr ) then discard then 15
GRAMMARS A metalanguage is a language used to define other languages. A grammar is a metalanguage used to define the syntax of a language. Our interest: using grammars to define the syntax of a programming language. 16
BNF GRAMMAR Set of productions: P terminal symbols: T nonterminal symbols: N start symbol: S Î N A production has the form where A Î N and w Î (N ÈT)* 17
EXAMPLE: BINARY DIGITS Consider the grammar: binarydigit 0 binarydigit 1 or equivalently: binarydigit 0 1 Here, is a metacharacter that separates alternatives.
DERIVATIONS Consider the grammar: Integer Digit Integer Digit Digit 0 1 2 3 4 5 6 7 8 9 We can derive any unsigned integer, like 352, from this grammar.
DERIVATION OF 352 AS AN INTEGER Use a grammar rule to enable each step: Integer Integer Digit Integer 2 Integer Digit 2 Integer 5 2 Digit 5 2 3 5 2
NOTATION FOR DERIVATIONS Integer * 352 Means that 352 can be derived in a finite number of steps using the grammar for Integer. 352 L(G) Means that 352 is a member of the language defined by grammar G. L(G) = { T* Integer * } Means that the language defined by grammar G is the set of all symbol strings that can be derived as an Integer. 21
PARSE TREES A parse tree is a graphical representation of a derivation. Each internal node of the tree corresponds to a step in the derivation. Each child of a node represents a right-hand side of a production. Each leaf node represents a symbol of the derived string, reading from left to right. 22
THE STEP INTEGER INTEGER DIGIT APPEARS IN THE PARSE TREE AS: Integer Integer Digit
PARSE TREE FOR 352 AS AN INTEGER Do different derivations give the same parse tree? 24
ARITHMETIC EXPRESSION GRAMMAR The following grammar defines the language of arithmetic expressions with 1-digit integers, addition, and subtraction. Expr Expr + Term Expr Term Term Term 0... 9 ( Expr )
PARSE OF THE STRING 5-4+3 26
LINKING SYNTAX AND SEMANTICS Output: parse tree is inefficient Example: Parse tree for z = x + 2*y; 27
FINDING A MORE EFFICIENT TREE The shape of the parse tree reveals the meaning of the program. So we want a tree that removes its inefficiency and keeps its shape. Remove separator/punctuation terminal symbols Remove all trivial root nonterminals Replace remaining nonterminals with leaf terminals Example: next slide 28
ABSTRACT SYNTAX TREE FOR Z = X + 2*Y; 29
ABSTRACT SYNTAX Removes syntactic sugar and keeps essential elements of a language. E.g., consider the following two equivalent loops: Pascal while i < n do begin end; i := i + 1; C/C++ while (i < n) { } i = i + 1; The only essential information in each of these is 1) that it is a loop, 2) that its terminating condition is i < n, and 3) that its body increments the current value of i. 30
EXAMPLE ABSTRACT SYNTAX TREE Binary node op term1 term2 Abstract Syntax Tree for x+2*y Binary Operator + Variable x Binary Operator * Value Variable 2 y 31
ABSTRACT SYNTAX AS JAVA CLASSES abstract class Expression { } abstract class VariableRef extends Expression { } class Variable extends VariableRef { String id; } class Value extends Expression { } class Binary extends Expression { Operator op; Expression term1, term2; } class Unary extends Expression { UnaryOp op; Expression term; }
NAMES 33
TERMINOLOGY Binding is an association between an entity (such as a variable) and a property (such as its value). A binding is static if THE association occurs before run-time. A binding is dynamic if the association occurs at runtime. Name bindings play a fundamental role. The lifetime of a variable name refers to the time interval during which memory is allocated. 34
SYNTACTIC ISSUES Lexical rules for names. Collection of reserved words or keywords. Case sensitivity C-like: yes Early languages: no PHP: partly yes, partly no 35
RESERVED WORDS Cannot be used as Identifiers Usually identify major constructs: if while switch Predefined identifiers: e.g., library routines 36
VARIABLES Basic bindings Name Address Type Value Lifetime L-value - use of a variable name to denote its address. Ex: x = // Pointer example: R-value - use of a variable name to int x,y; denote its value. int *p; Ex: = x Some languages support/require x = *p; explicit dereferencing. *p = y; Ex: x :=!y + 1 37
SCOPE The scope of a name is the collection of statements which can access the name binding. In static scoping, a name is bound to a collection of statements according to its position in the source program. Most modern languages use static (or lexical) scoping. 38
WHAT CONSTITUTES A SCOPE? (A BIG PICTURE) Algol C Java Ada Package n/a n/a yes yes Class n/a n/a nested yes Function nested yes yes nested Block nested nested nested nested For Loop no no yes automatic Two different scopes are either nested or disjoint. In disjoint scopes, same name can be bound to different entities without interference. The scope in which a name is defined or declared is called its defining scope. A reference to a name is nonlocal if it occurs in a nested scope of the defining scope; otherwise, it is local. 39
EXAMPLE SCOPES IN C 1 void sort (float a[ ], int size) { 2 int i, j; 3 for (i = 0; i < size; i++) // i, size local 4 for (j = i + 1; j < size; j++) 5 if (a[j] < a[i]) { // a, i, j local 6 float t; 7 t = a[i]; // t local; a, i nonlocal 8 a[i] = a[j]; 9 a[j] = t; 10 } Forward Reference Only 11 } 40
OUT-OF-SCOPE VARIABLE REFERENCE for (int i = 0; i < 10; i++) { System.out.println(i); // i is local to for body... }... i... // invalid reference to i 41
SYMBOL TABLE A symbol table is a data structure kept by a translator that allows it to keep track of each declared name and its binding. Assume for now that each name is unique within its local scope. The data structure can be any implementation of a dictionary, where the name is the key. 42