10 High Level Languages
This course: Java (object oriented), Jython in Java, Relation, ASP, RDF (Horn clause deduction, Semantic Web)
Dr. Philip Cannata
Programming Languages: Lexical and Syntactic Analysis
- Chomsky grammar hierarchy
- Lexical analysis (tokenizing)
- Syntactic analysis (parsing)
- Hmm concrete syntax
- Hmm abstract syntax
[figure: Noam Chomsky]
Chomsky Hierarchy
- Regular grammar: used for tokenizing.
- Context-free grammar (BNF): used for parsing.
- Context-sensitive grammar: not really used for programming languages.
Regular Grammar
The simplest and least powerful grammar class. Equivalent to regular expressions (think of Perl) and to finite-state automata.
Right regular grammar: with $\omega \in \mathit{Terminal}^*$ and $A, B \in \mathit{Nonterminal}$, every production has the form $A \to \omega B$ or $A \to \omega$.
Example: Integer -> 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9
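The right regular grammar for Integer above and the regular expression [0-9]+ describe the same language. The sketch below, in Python (chosen because Jython appears in this course), checks a string both ways: once with Python's re module and once by following the productions directly, consuming one digit per step. The function names are illustrative, not part of any course code.

```python
import re

# Regular expression for the language of the right regular grammar
#   Integer -> 0 Integer | ... | 9 Integer | 0 | ... | 9
INTEGER_RE = re.compile(r"[0-9]+")
DIGITS = "0123456789"

def is_integer_re(s: str) -> bool:
    """Whole-string match against the regular expression."""
    return INTEGER_RE.fullmatch(s) is not None

def is_integer_grammar(s: str) -> bool:
    """Follow the grammar directly: each production consumes one terminal
    (a digit) and either stops (Integer -> d) or continues with the single
    nonterminal on its right-hand side (Integer -> d Integer)."""
    if not s or s[0] not in DIGITS:
        return False
    if len(s) == 1:                    # Integer -> d
        return True
    return is_integer_grammar(s[1:])   # Integer -> d Integer

for word in ["352", "0", "", "3a2"]:
    print(repr(word), is_integer_re(word), is_integer_grammar(word))
```

Because each right regular production has at most one nonterminal, at the far right, the recursion is just a loop over the input, which is why such grammars are equivalent to finite-state automata.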
Regular Grammar
Less powerful than context-free grammars. The language $\{a^n b^n \mid n \ge 1\}$ is not regular; that is, a regular grammar cannot balance ( ), { }, or begin ... end.
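A finite automaton has a fixed number of states, so it cannot count arbitrarily many a's and then check for the same number of b's. A context-free grammar can, because a production may nest a nonterminal between terminals: S -> a S b | a b. The sketch below (is_anbn is a hypothetical name) recognizes the language by following that production, and contrasts it with the regular expression a+b+, which also accepts strings with unequal counts.

```python
import re

def is_anbn(s: str) -> bool:
    """Recognize { a^n b^n | n >= 1 } with the context-free grammar
         S -> a S b | a b
    Each call strips one outer 'a' ... 'b' pair, mirroring the nesting."""
    if len(s) < 2 or s[0] != "a" or s[-1] != "b":
        return False
    inner = s[1:-1]
    if inner == "":          # S -> a b
        return True
    return is_anbn(inner)    # S -> a S b

# The closest regular expression, a+b+, cannot enforce equal counts.
print(re.fullmatch(r"a+b+", "aab") is not None, is_anbn("aab"))  # prints True False
```

Replacing a and b with ( and ) gives the balanced-parentheses case mentioned above.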
Regular Expressions
x           a character x
\x          an escaped character, e.g., \n
{ name }    a reference to a name
M | N       M or N
M N         M followed by N
M*          zero or more occurrences of M
M+          one or more occurrences of M
M?          zero or one occurrence of M
[aeiou]     the set of vowels
[0-9]       the set of digits
.           any single character
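Most constructs in this table carry over directly to Python's re module. The { name } reference is a feature of lexer generators such as lex and has no direct equivalent in re, where braces are repetition counts such as a{2,3}. The sketch below checks one example per remaining construct and then combines them into an identifier pattern (the pattern and variable names are illustrative).

```python
import re

# One (pattern, matching text) pair per construct, in Python's re syntax.
examples = [
    (r"x", "x"),            # a single character
    (r"\n", "\n"),          # an escaped character (newline)
    (r"cat|dog", "dog"),    # M | N: M or N
    (r"ab", "ab"),          # M N: M followed by N
    (r"a*", ""),            # M*: zero or more occurrences
    (r"a+", "aaa"),         # M+: one or more occurrences
    (r"colou?r", "color"),  # M?: zero or one occurrence
    (r"[aeiou]", "e"),      # the set of vowels
    (r"[0-9]", "7"),        # the set of digits
    (r".", "?"),            # any single character (except newline by default)
]

for pattern, text in examples:
    assert re.fullmatch(pattern, text), (pattern, text)

# Constructs combine: an identifier is a letter followed by letters or digits.
identifier = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")
print([w for w in ["a2i", "2ai", "x", "x_1"] if identifier.fullmatch(w)])
```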
Finite State Automaton for Identifiers
Configuration trace on input a2i ($ marks the end of input):
(S, a2i$) ⊢ (I, 2i$) ⊢ (I, i$) ⊢ (I, $) ⊢ (F, )
Thus: (S, a2i$) ⊢* (F, )
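The configuration trace above can be reproduced by simulating the automaton. This is a minimal sketch, assuming the states S (start), I (inside an identifier), and F (final) from the trace, ASCII letters and digits, and $ as the end-of-input marker; trace_identifier and accepts are hypothetical names.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

def trace_identifier(word):
    """Run the identifier automaton on word + '$' and return the list of
    configurations (state, remaining input). Raises ValueError on reject."""
    rest = word + "$"
    state = "S"
    configs = [(state, rest)]
    while state != "F":
        ch = rest[0]                 # rest is never empty before F is reached
        if state == "S" and ch in LETTERS:
            state = "I"              # an identifier must start with a letter
        elif state == "I" and (ch in LETTERS or ch in DIGITS):
            state = "I"              # then any letters or digits
        elif state == "I" and rest == "$":
            state = "F"              # only the end marker is left: accept
        else:
            raise ValueError(f"rejected in state {state} at {rest!r}")
        rest = rest[1:]
        configs.append((state, rest))
    return configs

def accepts(word):
    """True if the automaton reaches F on this word."""
    try:
        trace_identifier(word)
        return True
    except ValueError:
        return False

print(" |- ".join(f"({s}, {r})" for s, r in trace_identifier("a2i")))
```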
Deterministic Finite State Automaton Examples [figure]
Context-Free Grammar
Production: $\alpha \to \beta$, where $\alpha \in \mathit{Nonterminal}$ and $\beta \in (\mathit{Nonterminal} \cup \mathit{Terminal})^*$.
That is, the left-hand side is a single nonterminal, and the right-hand side is a string of nonterminals and/or terminals (possibly empty).
Context-Sensitive Grammar
Production: $\alpha \to \beta$, where $\alpha, \beta \in (\mathit{Nonterminal} \cup \mathit{Terminal})^*$ and $|\alpha| \le |\beta|$.
That is, the left-hand side can be a string of terminals and nonterminals, but it must not contain more symbols than the right-hand side.
Syntax
The syntax of a programming language is a precise description of all its grammatically correct programs. Precise syntax was first used with Algol 60 and has been used ever since.
Three levels:
- Lexical syntax: all the basic symbols of the language (names, values, operators, etc.).
- Concrete syntax: rules for writing expressions, statements, and programs.
- Abstract syntax: the internal representation of the program, favoring content over form.
Grammars
Grammars are metalanguages used to define the concrete syntax of a language.
Backus Normal Form / Backus-Naur Form (BNF):
- a stylized version of a context-free grammar (cf. Chomsky hierarchy)
- first used to define the syntax of Algol 60
- now used to define the syntax of most major languages
Production: $\alpha \to \beta$, where $\alpha \in \mathit{Nonterminal}$ and $\beta \in (\mathit{Nonterminal} \cup \mathit{Terminal})^*$; that is, the left-hand side is a single nonterminal, and $\beta$ is a string of nonterminals and/or terminals (possibly empty).
Example:
Integer -> Digit | Integer Digit
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Extended BNF (EBNF)
Additional metacharacters:
{ }   a series of zero or more
( )   must pick one from a list
[ ]   pick none or one from a list
Example:
Expression -> Term { ( + | - ) Term }
IfStatement -> if ( Expression ) Statement [ else Statement ]
EBNF is no more powerful than BNF, but its production rules are often simpler and clearer.
JavaCC EBNF:
( )*  a series of zero or more
( )+  a series of one or more
[ ]   optional
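In a hand-written recursive-descent parser, EBNF's { } maps directly onto a loop, which is one reason EBNF rules are easier to implement than left-recursive BNF rules. A minimal sketch of Expression -> Term { ( + | - ) Term }, assuming the input is already a list of token strings and simplifying Term to an integer token (both assumptions for illustration; parse_expression is a hypothetical name):

```python
def parse_expression(tokens):
    """Expression -> Term { ( + | - ) Term }, with Term simplified to an
    integer token. Returns nested (op, left, right) tuples."""
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdecimal():
            raise SyntaxError(f"expected an integer at position {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    tree = term()
    # The EBNF repetition { ( + | - ) Term } becomes a while loop.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        tree = (op, tree, term())    # fold the new Term onto the tree so far
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_expression(["5", "-", "4", "+", "3"]))
```

Building each new node on top of the tree so far makes + and - left-associative: 5 - 4 + 3 groups as (5 - 4) + 3.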
For more details, see Chapter 2 of Programming Language Pragmatics, Third Edition, by Michael L. Scott.
Abstract Syntax
Instance of a programming language: int main () { return 0 ; }
Internal parse tree [figure]
Program (abstract syntax):
  Function = main; Return type = int
  params =
  Block:
    Return:
      Variable: return#main, LOCAL addr=0
      IntValue: 0
Now we'll focus on the internal parse tree.
Parse Trees
Integer -> Digit | Integer Digit
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse tree for 352 as an Integer [figure]
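The parse tree for 352 can be built by applying Integer -> Integer Digit from the right: the last digit hangs off the root, and the remaining digits form the left Integer subtree. In this sketch, trees are (label, children...) tuples, an encoding chosen for illustration only; integer_tree and show are hypothetical names.

```python
def integer_tree(digits: str):
    """Parse tree for a digit string under the left-recursive BNF
         Integer -> Digit | Integer Digit
         Digit   -> 0 | 1 | ... | 9
    Interior nodes are (label, child, ...) tuples; leaves are digit strings."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("not an Integer")
    if len(digits) == 1:                            # Integer -> Digit
        return ("Integer", ("Digit", digits))
    # Integer -> Integer Digit: recurse on everything but the last digit.
    return ("Integer", integer_tree(digits[:-1]), ("Digit", digits[-1]))

def show(tree, indent=0):
    """Print one node per line, indenting children under their parent."""
    label, *children = tree
    print("  " * indent + label)
    for child in children:
        if isinstance(child, tuple):
            show(child, indent + 1)
        else:
            print("  " * (indent + 1) + child)

show(integer_tree("352"))
```

The tree leans to the left because the recursive nonterminal is the leftmost symbol of Integer -> Integer Digit; the same shape is what makes left-recursive operator rules left-associative.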
Arithmetic Expression Grammar
Expr -> Expr + Term | Expr - Term | Term
Term -> 0 | ... | 9 | ( Expr )
Parse of 5 - 4 + 3 [figure]
Associativity and Precedence
A grammar can be used to define associativity and precedence among the operators in an expression. For example, + and - are left-associative operators in mathematics, and * and / have higher precedence than + and -.
Consider the following grammar:
Expr -> Expr + Term | Expr - Term | Term
Term -> Term * Factor | Term / Factor | Term % Factor | Factor
Factor -> Primary ** Factor | Primary
Primary -> 0 | ... | 9 | ( Expr )
Associativity and Precedence: parse of 4**2**3 + 5 * 6 + 7 [figure]
Associativity and Precedence
Precedence   Associativity   Operators
3            right           **
2            left            * / %
1            left            + -
Note: These relationships are shown by the structure of the parse tree: the highest precedence is at the bottom, and left associativity is on the left at each level.
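One way to see how the grammar encodes this table is to write a recursive-descent parser for it. Left-recursive rules such as Expr -> Expr + Term cannot be coded directly this way (the function would call itself before consuming any input), so this sketch uses the equivalent EBNF loops for the left-associative levels and keeps the right recursion Factor -> Primary ** Factor for **. The tokenizer, the function names, and the fully parenthesized string output are assumptions made for illustration.

```python
import re

def tokenize(src):
    """Split into digits, '**', and one-character operators/parentheses.
    '**' comes first in the pattern so it is not read as two '*' tokens."""
    tokens = re.findall(r"\*\*|[0-9]|[-+*/%()]", src)
    if "".join(tokens) != "".join(src.split()):
        raise SyntaxError(f"unexpected character in {src!r}")
    return tokens

def parse(src):
    """Return a fully parenthesized string showing the parse tree's grouping."""
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def expr():      # Expr -> Term { ( + | - ) Term }        (left-assoc)
        left = term()
        while peek() in ("+", "-"):
            op = advance()
            left = f"({left} {op} {term()})"
        return left

    def term():      # Term -> Factor { ( * | / | % ) Factor } (left-assoc)
        left = factor()
        while peek() in ("*", "/", "%"):
            op = advance()
            left = f"({left} {op} {factor()})"
        return left

    def factor():    # Factor -> Primary ** Factor | Primary  (right-assoc)
        base = primary()
        if peek() == "**":
            advance()
            return f"({base} ** {factor()})"
        return base

    def primary():   # Primary -> 0 | ... | 9 | ( Expr )
        tok = advance()
        if tok.isdigit():
            return tok
        if tok == "(":
            inner = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(parse("4**2**3 + 5 * 6 + 7"))
```

Each level calls the next-higher-precedence level, so higher-precedence operators end up deeper in the tree. Python's own ** is also right-associative, so the printed string evaluates to the same value as the original expression.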
Ambiguous Grammars
A grammar is ambiguous if one of its strings has two or more different parse trees.
Example:
Expr -> Expr Op Expr | ( Expr ) | Integer
Op -> + | - | * | / | % | **
This grammar generates the same language as the previous grammar, but it is ambiguous.
Ambiguous Grammars: ambiguous parse of 5 - 4 + 3 [figure]
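Because Expr -> Expr Op Expr lets any operator be the root, an ambiguous grammar can give one string several parse trees, and here the trees disagree on the value. This sketch enumerates all trees for 5 - 4 + 3 by trying each operator as the root (all_trees is a hypothetical helper; the ( Expr ) alternative is omitted because the input has no parentheses, and eval is used only to evaluate the generated strings).

```python
def all_trees(operands, operators):
    """Every parse tree that Expr -> Expr Op Expr | Integer allows for
    operands[0] operators[0] operands[1] ..., as parenthesized strings."""
    if len(operands) == 1:
        return [str(operands[0])]
    trees = []
    for i, op in enumerate(operators):      # try each operator as the root
        for left in all_trees(operands[:i + 1], operators[:i]):
            for right in all_trees(operands[i + 1:], operators[i + 1:]):
                trees.append(f"({left} {op} {right})")
    return trees

for t in all_trees([5, 4, 3], ["-", "+"]):
    print(t, "=", eval(t))
```

Only ((5 - 4) + 3) matches the left associativity enforced by the unambiguous grammar; the other tree gives a different value.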
Ambiguous Grammars: Dangling Else
IfStatement -> if ( Expression ) Statement | if ( Expression ) Statement else Statement
Statement -> Assignment | IfStatement | Block
Block -> { Statements }
Statements -> Statements Statement | Statement
With which if does the else associate in the following code?
if (x < 0) if (y < 0) y = y - 1; else y = 0;
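C and Java resolve the dangling else by associating else with the nearest unmatched if; the grammar itself stays ambiguous, and the parser makes the choice. A minimal recursive-descent sketch, assuming a simplified token list in which each condition and each simple statement is a single token (parse_statement is a hypothetical name):

```python
def parse_statement(tokens, pos=0):
    """Statement -> if Cond Statement [ else Statement ] | Simple
    Returns (tree, next_position). Trees are ("if", cond, then[, else])
    tuples; a simple statement is its token string."""
    tok = tokens[pos]
    if tok != "if":
        return tok, pos + 1
    cond = tokens[pos + 1]
    then_branch, pos = parse_statement(tokens, pos + 2)
    # Greedy choice: the innermost if still being parsed takes the else,
    # so else binds to the nearest preceding unmatched if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_statement(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return ("if", cond, then_branch), pos

tokens = ["if", "x<0", "if", "y<0", "y=y-1", "else", "y=0"]
tree, _ = parse_statement(tokens)
print(tree)
```

The inner if claims the else because it is still being parsed when else appears, so the example above assigns y = 0 only when x < 0 and y >= 0.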
Hmm BNF (i.e., Concrete Syntax)
Program : { [ Declaration ] | rettype Identifier Function | MyClass | MyObject }
Function : ( ) Block
MyClass : Class Identifier { { rettype Identifier Function } Constructor Function } }
MyObject : Identifier Identifier = create Identifier callargs
Constructor : Identifier ( [ { Parameter } ] ) Block
Declaration : Type Identifier [ [ Literal ] ] { , Identifier [ [ Literal ] ] }
Type : int | bool | float | list | tuple | object | string | void
Statements : { Statement } { rettype Identifier
Statement : ; | Declaration | Block | ForEach | Assignment | IfStatement | WhileStatement | CallStatement | ReturnStatement
Block : { Statements }
ForEach : for ( Expression <- Expression ) Block
Assignment : Identifier [ [ Expression ] ] = Expression ;
Parameter : Type Identifier
IfStatement : if ( Expression ) Block [ elseifstatement Block ]
WhileStatement : while ( Expression ) Block
Hmm BNF (i.e., Concrete Syntax)
Expression : Conjunction { || Conjunction }
Conjunction : Equality { && Equality }
Equality : Relation [ EquOp Relation ]
EquOp : == | !=
Relation : Addition [ RelOp Addition ]
RelOp : < | <= | > | >=
Addition : Term { AddOp Term }
AddOp : + | -
Term : Factor { MulOp Factor }
MulOp : * | / | %
Factor : [ UnaryOp ] Primary
UnaryOp : - | !
Primary : callorlambda | IdentifierOrArrayRef | Literal | subexpressionortuple | ListOrListComprehension | ObjFunction
callorlambda : Identifier callargs | LambdaDef
callargs : ( [ ( Expression | passfunc ) { , ( Expression | passfunc ) } ] )
passfunc : Identifier ( Type Identifier { Type Identifier } )
LambdaDef : ( \\ Identifier { , Identifier } -> Expression )
Hmm BNF (i.e., Concrete Syntax)
IdentifierOrArrayRef : Identifier [ [ Expression ] ]
subexpressionortuple : ( [ Expression [ , [ Expression { , Expression } ] ] ] )
ListOrListComprehension : [ Expression { , Expression } ] Expression [ <- Expression ] { , Expression [ <- Expression ] } ]
ObjFunction : Identifier . Identifier . Identifier callargs
Identifier : ( a | b | ... | z | A | B | ... | Z ) { ( a | b | ... | z | A | B | ... | Z ) | ( 0 | 1 | ... | 9 ) }
Literal : Integer | True | False | ClFloat | ClString
Integer : Digit { Digit }
ClFloat : ( 0 | 1 | ... | 9 ) { 0 | 1 | ... | 9 } . { 0 | 1 | ... | 9 }
ClString : { ~[ ] }
Associativity and Precedence for Hmm Clite (listed from highest to lowest precedence)
Operator         Associativity
Unary - !        none
* /              left
+ -              left
< <= > >=        none
== !=            none
&&               left
||               left
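In this table, none means the operator cannot be chained without parentheses. This follows from the Hmm BNF: Relation : Addition [ RelOp Addition ] uses [ ] (at most once), while Addition : Term { AddOp Term } uses { } (any number of times). A sketch of the Relation rule, assuming Addition is simplified to a single operand token (parse_relation and REL_OPS are hypothetical names):

```python
REL_OPS = {"<", "<=", ">", ">="}

def parse_relation(tokens):
    """Relation : Addition [ RelOp Addition ], with Addition simplified to a
    single operand token. Returns an operand or an (op, left, right) tuple."""
    if not tokens:
        raise SyntaxError("empty input")
    tree, pos = tokens[0], 1                      # Addition
    # [ RelOp Addition ] is optional and NOT repeated: an if, not a while loop.
    if pos < len(tokens) and tokens[pos] in REL_OPS:
        if pos + 1 >= len(tokens):
            raise SyntaxError("missing right operand")
        tree = (tokens[pos], tree, tokens[pos + 1])
        pos += 2
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

for src in ["a < b", "a < b < c"]:
    try:
        print(src, "->", parse_relation(src.split()))
    except SyntaxError as err:
        print(src, "->", "SyntaxError:", err)
```

Replacing the if with a while loop would make < left-associative, the way AddOp is.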
Hmm Parse Tree Example: z = x + 2 * y; [figure]
Now we'll focus on the abstract syntax.
Hmm Parse Tree: z = x + 2 * y; [figure]
Very Approximate Hmm Abstract Syntax
Assignment = Variable target; Expression source
Expression = VariableRef | Value | Binary | Unary
VariableRef = Variable | ArrayRef
Variable = String id
ArrayRef = String id; Expression index
Value = IntValue | BoolValue | FloatValue | CharValue
Binary = Operator op; Expression term1, term2
Unary = UnaryOp op; Expression term
Operator = ArithmeticOp | RelationalOp | BooleanOp
IntValue = Integer intValue
Hmm Abstract Syntax: Binary Example
z = x + 2 * y
Assignment (=)
  Variable z
  Binary
    Operator +
    Variable x
    Binary
      Operator *
      Value 2
      Variable y
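The abstract syntax above can be modeled with one class per node kind. This sketch uses Python dataclasses and covers only the node kinds needed for z = x + 2 * y; the field names follow the abstract syntax definitions, but the Python encoding and the show helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Variable:
    id: str

@dataclass
class IntValue:
    intValue: int

@dataclass
class Binary:
    op: str                  # Operator, e.g. "+" or "*"
    term1: "Expression"
    term2: "Expression"

# Only the Expression alternatives used in this example.
Expression = Union[Variable, IntValue, Binary]

@dataclass
class Assignment:
    target: Variable
    source: Expression

def show(node, indent=0):
    """Print the tree in the same layout as the example above."""
    pad = "  " * indent
    if isinstance(node, Assignment):
        print(pad + "Assignment (=)")
        show(node.target, indent + 1)
        show(node.source, indent + 1)
    elif isinstance(node, Binary):
        print(pad + "Binary")
        print(pad + "  Operator " + node.op)
        show(node.term1, indent + 1)
        show(node.term2, indent + 1)
    elif isinstance(node, Variable):
        print(pad + "Variable " + node.id)
    elif isinstance(node, IntValue):
        print(pad + "Value " + str(node.intValue))

# z = x + 2 * y
ast = Assignment(Variable("z"),
                 Binary("+", Variable("x"),
                        Binary("*", IntValue(2), Variable("y"))))
show(ast)
```

Precedence is carried by the nesting alone, so the abstract syntax needs no Term, Factor, or parenthesis nodes.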