G Programming Languages Spring 2010 Lecture 1. Robert Grimm, New York University


G22.2110-001 Programming Languages Spring 2010 Lecture 1 Robert Grimm, New York University

Course Information
Online at http://cs.nyu.edu/rgrimm/teaching/sp10-pl/, including the grading policy and collaboration policy.
Acknowledgments
I will draw on a number of sources throughout the semester, notably on previous versions of this course by Clark Barrett, Allan Gottlieb, Martin Hirzel, and (indirectly) Ed Osinski. Sources for today's lecture: PLP, chapters 1-2. Clark Barrett. Lecture notes, Fall 2008.

What is a Program?

Programs
What is a Program?
- A fulfillment of a customer's requirements
- An idea in your head
- A sequence of characters
- A mathematical interpretation of a sequence of characters
- A file on a disk
- Ones and zeroes
- Electromagnetic states of a machine
In this class, we will typically take the view that a program is a sequence of characters, though we recognize its close association with the other possible meanings. A programming language describes which sequences are allowed (the syntax) and what they mean (the semantics).

A Brief History of Programming Languages
The first computer programs were written in machine language. Machine language is just a sequence of ones and zeroes. The computer interprets sequences of ones and zeroes as instructions that control the central processing unit (CPU) of the computer. The length and meaning of the sequences depend on the CPU.
Example: On the 6502, an 8-bit microprocessor used in the Apple II computer, the following bits add 1 plus 1: 10101001 00000001 01101001 00000001. Or, using base 16, a common shorthand: A9 01 69 01.
Programming in machine language requires an extensive understanding of the low-level details of the computer and is extremely tedious if you want to do anything non-trivial. But it is the most straightforward way to give instructions to the computer: no extra work is required before the computer can run the program.
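The correspondence between the bit string and its base-16 shorthand can be checked mechanically. A quick sketch in Python (not from the lecture):

```python
# The 6502 instruction stream above: LDA #$01 (A9 01), ADC #$01 (69 01).
bits = "10101001" "00000001" "01101001" "00000001"

# Interpreting the 32 bits as a number and printing it in base 16
# recovers the shorthand.
print(format(int(bits, 2), "08X"))  # → A9016901
```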

A Brief History of Programming Languages
Before long, programmers started looking for ways to make their jobs easier. The first step was assembly language, which assigns meaningful names to the sequences of bits that make up instructions for the CPU. A program called an assembler translates assembly language into machine language.
Example: The assembly code for the previous example is:
LDA #$01
ADC #$01
Question: How do you write an assembler? Answer: in machine language!

A Brief History of Programming Languages
As computers became more powerful and software more ambitious, programmers needed more efficient ways to write programs. This led to the development of high-level languages, the first being FORTRAN. High-level languages have features designed to make things much easier for the programmer. In addition, they are largely machine-independent: the same program can run on different machines without being rewritten. But high-level languages require an interpreter or compiler. An interpreter implements a (virtual) machine whose language is the high-level language. In contrast, a compiler converts high-level programs into machine language. More on this later...
Question: How do you write an interpreter or compiler? Answer: in assembly language (at least the first time).

Programming Languages
There are now thousands of programming languages. Why are there so many?
- Evolution: old languages evolve into new ones as we discover better and easier ways to do things: ALGOL → BCPL → C → C++
- Special Purposes: some languages are designed specifically to make a particular task easier. For example, ML was originally designed to write proof tactics for a theorem prover.
- Personal Preference: programmers are opinionated and creative. If you don't like any existing programming language, why not create your own?

Programming Languages
Though there are many languages, only a few are widely used. What makes a language successful?

Programming Languages
What makes a language successful?
- Expressive Power: Though most languages share the same theoretical power (i.e., they are all Turing complete), it is often much easier to accomplish a task in one language than in another.
- Ease of Use for Novices: BASIC, LOGO, PASCAL, and even JAVA owe much of their popularity to the fact that they are easy to learn.
- Ease of Implementation: Languages like BASIC, PASCAL, and JAVA were designed to be easily portable from one platform to another.
- Standardization: C, C++, and C# all have well-defined international standards, ensuring source code portability.
- Open Source: C owes much of its popularity to being closely associated with open source projects.
- Excellent Compilers: A lot of resources were invested into writing good compilers for FORTRAN, one reason why it is still used today for many scientific applications.
- Economics, Patronage, Inertia: Some languages are supported by large and powerful organizations: ADA by the Department of Defense; C# by Microsoft. Some languages live on because of a large amount of legacy code.

Language Design
Language design is influenced by various viewpoints:
- Programmers: desire expressive features, predictable performance, and a supportive development environment
- Implementors: prefer languages to be simple and semantics to be precise
- Verifiers/testers: languages should have rigorous semantics and discourage unsafe constructs
The interplay of design and implementation in particular is emphasized in the text, and we will return to this theme periodically.

Classifying Programming Languages
Programming Paradigms
- Imperative (von Neumann): FORTRAN, PASCAL, C, ADA. Programs have mutable storage (state) modified by assignments. The most common and familiar paradigm.
- Object-Oriented: SIMULA 67, SMALLTALK, ADA, JAVA, C#. Data structures and their operations are bundled together; inheritance, information hiding.
- Functional (applicative): SCHEME, LISP, ML, HASKELL. Based on the lambda calculus; functions are first-class objects; side effects (e.g., assignments) are discouraged.
- Logical (declarative): PROLOG, MERCURY. Programs are sets of assertions and rules.
- Scripting: Unix shells, PERL, PYTHON, TCL, PHP, JAVASCRIPT. Often used to glue other programs together.

Classifying Programming Languages
Hybrids
- Imperative + Object-Oriented: C++
- Functional + Logical: CURRY
- Functional + Object-Oriented: OCAML, O'HASKELL
Concurrent Programming
- Not really a category of programming languages
- Usually implemented with extensions within existing languages

Classifying Programming Languages
Compared to machine or assembly language, all others are high-level. But within high-level languages, there are different levels as well. Somewhat confusingly, these are also referred to as low-level and high-level.
- Low-level languages give the programmer more control (at the cost of requiring more effort) over how the program is translated into machine code: C, FORTRAN
- High-level languages hide many implementation details, often with some performance cost: BASIC, LISP, SCHEME, ML, PROLOG
- Wide-spectrum languages try to do both: ADA, C++, (JAVA)
High-level languages typically have garbage collection and are often interpreted. The higher the level, the harder it is to predict performance (bad for real-time or performance-critical applications).

Programming Idioms
All general-purpose languages have essentially the same capabilities. But different languages can make the same task difficult or easy. Try multiplying two Roman numerals. Idioms in language A may be useful inspiration when writing in language B.

Programming Idioms
Copying a string q to p in C:
while (*p++ = *q++) ;
Removing duplicates from the list @xs in PERL:
my %seen = (); @xs = grep { !$seen{$_}++ } @xs;
Computing the sum of the numbers in list xs in HASKELL:
foldr (+) 0 xs
Some of these may seem natural to you; others may seem counterintuitive. One goal of this class is for you to become comfortable with many different idioms.
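For comparison, here is a sketch of the Perl and Haskell idioms above in Python (not from the slides; the helper name dedup is mine):

```python
from functools import reduce

def dedup(xs):
    """Remove duplicates, keeping the first occurrence of each element
    (the Perl grep-with-%seen idiom)."""
    seen = set()
    out = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

print(dedup([3, 1, 3, 2, 1]))  # → [3, 1, 2]

# The Haskell fold idiom; since + is associative, a left fold works too.
print(reduce(lambda a, b: a + b, [1, 2, 3], 0))  # → 6
```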

Characteristics of Modern Languages
Modern general-purpose languages (e.g., ADA, C++, JAVA) have similar characteristics:
- a large number of features (a grammar with several hundred productions, 500-page reference manuals, ...)
- a complex type system
- procedural mechanisms
- object-oriented facilities
- abstraction mechanisms, with information hiding
- several storage-allocation mechanisms
- facilities for concurrent programming (not C++)
- facilities for generic programming (added after a while for JAVA)
- development support, including editors, libraries, and compilers
We will discuss many of these in detail this semester.

Compilation Overview
Major phases of a compiler:
1. Lexer: Text → Tokens
2. Parser: Tokens → Parse Tree
3. Type checking: Parse Tree → Parse Tree + Types
4. Intermediate code generation: Parse Tree → Intermediate Representation (IR)
5. Optimization I: IR → IR
6. Target code generation: IR → assembly/machine language
7. Optimization II: target language → target language
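The first phase, Text → Tokens, can be sketched in a few lines. This toy lexer is illustrative only; its token classes and patterns are my own, not from the lecture:

```python
import re

# One alternative per token class; named groups record which class matched.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[+*=<;(){}]))")

def lex(text):
    """Turn a string into a list of (token-class, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

print(lex("k = k + 1"))
# → [('ID', 'k'), ('OP', '='), ('ID', 'k'), ('OP', '+'), ('NUM', '1')]
```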

Syntax and Semantics
Syntax refers to the structure of the language, i.e., which sequences of characters are well-formed programs. A formal specification of syntax requires a set of rules; these are often specified using grammars. Semantics denotes meaning: given a well-formed program, what does it mean? The meaning may depend on context. We will not be covering semantic analysis (it is covered in the compilers course), though you can read about it in chapter 4 if you are interested. We now look at grammars in more detail.

Grammars
A grammar G is a tuple (Σ, N, S, δ), where:
- N is a set of non-terminal symbols
- S ∈ N is a distinguished non-terminal: the root or start symbol
- Σ is a set of terminal symbols, also called the alphabet. We require Σ to be disjoint from N (i.e., Σ ∩ N = ∅).
- δ is a set of rewrite rules (productions) of the form
ABC... → XYZ...
where A, B, C, X, Y, Z are terminals and non-terminals.
Any sequence of terminals and non-terminals is called a string. The language defined by a grammar is the set of strings containing only terminal symbols that can be generated by applying the rewrite rules starting from S.

Grammars
Consider the following grammar G:
N = {S, X, Y}
S = S
Σ = {a, b, c}
δ consists of the following rules:
S → b
S → XbY
X → a
X → aX
Y → c
Y → Yc
Some sample derivations:
S ⇒ b
S ⇒ XbY ⇒ abY ⇒ abc
S ⇒ XbY ⇒ aXbY ⇒ aaXbY ⇒ aaabY ⇒ aaabc
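Rewriting is mechanical enough to script. A minimal sketch in Python that replays the longer derivation above (the representation, with uppercase letters as non-terminals, is my own):

```python
# The grammar's rules: each non-terminal maps to its right-hand sides.
RULES = {"S": ["b", "XbY"], "X": ["a", "aX"], "Y": ["c", "Yc"]}

def step(string, pos, rhs):
    """Rewrite the non-terminal at index pos using one of its productions."""
    assert rhs in RULES[string[pos]], "not a production of this grammar"
    return string[:pos] + rhs + string[pos + 1:]

# S => XbY => aXbY => aaXbY => aaabY => aaabc
s = "S"
for pos, rhs in [(0, "XbY"), (0, "aX"), (1, "aX"), (2, "a"), (4, "c")]:
    s = step(s, pos, rhs)
print(s)  # → aaabc
```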

The Chomsky Hierarchy
- Regular grammars (Type 3): All productions have a single non-terminal on the left and a terminal and optionally a non-terminal on the right. Non-terminals on the right side of rules must either always precede terminals or always follow terminals. Recognizable by a finite state automaton.
- Context-free grammars (Type 2): All productions have a single non-terminal on the left; the right side of a production can be any string. Recognizable by a non-deterministic pushdown automaton.
- Context-sensitive grammars (Type 1): Each production is of the form αAβ → αγβ, where A is a non-terminal and α, β, γ are arbitrary strings (α and β may be empty, but not γ). Recognizable by a linear bounded automaton.
- Unrestricted grammars (Type 0): No restrictions. Recognizable by a Turing machine.

Regular Expressions
An alternate way of describing a regular language over an alphabet Σ is with regular expressions. We say that a regular expression R denotes the language [R] (recall that a language is a set of strings). Regular expressions over alphabet Σ:
- ε denotes {ε}
- a character x, where x ∈ Σ, denotes {x}
- (sequencing) a sequence of two regular expressions RS denotes {αβ | α ∈ [R], β ∈ [S]}
- (alternation) R|S denotes [R] ∪ [S]
- (Kleene star) R* denotes the set of strings that are concatenations of zero or more strings from [R]
- parentheses are used for grouping
Common abbreviations: R? ≡ ε | R and R+ ≡ RR*.

Regular Grammar Example
A grammar for floating point numbers:
Float → Digits | Digits . Digits
Digits → Digit | Digit Digits
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
A regular expression for floating point numbers:
(0|1|2|3|4|5|6|7|8|9)+ (.(0|1|2|3|4|5|6|7|8|9)+)?
The same thing in PERL: [0-9]+(\.[0-9]+)? or \d+(\.\d+)?
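The Perl-style pattern above can be tried out directly with Python's re module (the test strings are my own):

```python
import re

# The floating point pattern from the example above.
FLOAT = re.compile(r"\d+(\.\d+)?")

for s in ["137", "3.14", ".5", "1."]:
    # fullmatch requires the entire string to match the pattern.
    print(s, bool(FLOAT.fullmatch(s)))
```

Note that ".5" and "1." are rejected: both Digits occurrences in the grammar require at least one digit on each side of the point.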

Tokens
Tokens are the basic building blocks of programs:
- keywords (begin, end, while)
- identifiers (myvariable, yourtype)
- numbers (137, 6.022e23)
- symbols (+, ...)
- string literals ("Hello world")
Tokens are described (mainly) by regular grammars.
Example: identifiers
Id → Letter IdRest
IdRest → ε | Letter IdRest | Digit IdRest
Other issues: international characters, case-sensitivity, limits on identifier length.
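The identifier grammar is regular, so it translates directly into a regular expression. A sketch, assuming ASCII letters and digits:

```python
import re

# Id → Letter IdRest, IdRest → ε | Letter IdRest | Digit IdRest:
# a letter followed by zero or more letters or digits.
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")

print(bool(ID.fullmatch("myvariable")))  # → True
print(bool(ID.fullmatch("2fast")))       # → False (must start with a letter)
```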

Backus-Naur Form
Backus-Naur Form (BNF) is a notation for context-free grammars:
- alternation: Symb ::= Letter | Digit
- repetition: Id ::= Letter {Symb}, or we can use a Kleene star: Id ::= Letter Symb*
- one or more repetitions: Int ::= Digit+
- option: Num ::= Digit+ [. Digit+]
Note that these abbreviations do not add to the expressive power of the grammar.

Parse Trees
A parse tree describes the way in which a string in the language of a grammar is derived:
- the root of the tree is the start symbol of the grammar
- leaf nodes are terminal symbols
- internal nodes are non-terminal symbols
- an internal node and its descendants correspond to some production for that non-terminal
- a top-down tree traversal represents the process of generating the given string from the grammar
- construction of the tree from the string is parsing

Ambiguity
If the parse tree for a string is not unique, the grammar is ambiguous:
E ::= E + E | E * E | Id
Two possible parse trees for A + B * C:
((A + B) * C)
(A + (B * C))
One solution: rearrange the grammar:
E ::= E + T | T
T ::= T * Id | Id
Why is ambiguity bad?
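A sketch of how the rearranged grammar forces one tree: a hand-written parser for E ::= E + T | T and T ::= T * Id | Id, with the left recursion turned into loops (the token-list interface and tuple trees are my own choices, not from the lecture):

```python
def parse(tokens):
    """Parse a list of identifier and operator tokens into a nested tuple."""
    pos = 0

    def term():  # T ::= T * Id | Id, as a loop
        nonlocal pos
        node = tokens[pos]; pos += 1
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("*", node, tokens[pos]); pos += 1
        return node

    node = term()  # E ::= E + T | T, as a loop
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        node = ("+", node, term())
    return node

# Because * is parsed inside term(), it binds tighter than +:
print(parse(["A", "+", "B", "*", "C"]))  # → ('+', 'A', ('*', 'B', 'C'))
```

With the original ambiguous grammar, both groupings were derivable; here, A + B * C can only parse as A + (B * C).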

Dangling Else Problem
Consider:
S ::= if E then S
S ::= if E then S else S
The string
if E1 then if E2 then S1 else S2
is ambiguous (which if does the else S2 match?). Solutions:
- PASCAL rule: else matches the most recent if
- grammatical solution: different productions for balanced and unbalanced if-statements
- syntactical solution: introduce an explicit end-marker

Some Practical Considerations

Abstract Syntax Trees
Many non-terminals inside a parse tree are artifacts of the grammar. Remember:
E ::= E + T | T
T ::= T * Id | Id
The parse tree for B * C can be written as
E(T(Id(B), Id(C)))
In contrast, an abstract syntax tree (AST) captures only those tree nodes that are necessary for representing the program. In the example:
T(Id(B), Id(C))
Consequently, many parsers really generate abstract syntax trees.

Abstract Syntax Trees
Another explanation of an abstract syntax tree: it is a tree capturing only the semantically relevant information for a program, i.e., omitting all formatting and comments.
Question 1: What is a concrete syntax tree?
Question 2: When do I need a concrete syntax tree?

Scannerless Parsing
Separating syntactic analysis into lexing and parsing helps performance: after all, regular expressions can be made very fast. But it also limits language design choices. For example, it is very hard to compose different languages with separate lexers and parsers (think embedding SQL in JAVA). Scannerless parsing integrates lexical analysis into the parser, making this problem more tractable. Well, to embed SQL in JAVA, we still need support for composing grammars... (see Robert Grimm, Better Extensibility through Modular Syntax, in Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, pp. 38-51, June 2006).