Program Analysis (Software Source Code Analysis Techniques) ZHENG LI (李征)


Lexical and Syntax Analysis

Topics Covered Today: Compilation, Lexical Analysis, Semantic Analysis

Compilation
Translating from a high-level language to machine code is organized into several phases or passes. In the early days, passes communicated through files, but this is no longer necessary.

Language Specification
We must first describe the language in question by giving its specification.
Syntax: defines symbols (vocabulary) and programs (sentences).
Semantics: gives meaning to sentences.
The formal specifications are often the input to tools that build translators automatically.

Compiler passes
[Figure: source program -> lexical scanner -> parser -> semantic analyzer (front end) -> translator -> optimizer -> final assembly (back end) -> target program. The symbol table manager and the error handler are connected to the passes.]

Symbol Table Management
The symbol table is a data structure used by all phases of the compiler to keep track of user-defined symbols and keywords. During early phases (lexical and syntax analysis), symbols are discovered and put into the symbol table. During later phases, symbols are looked up to validate their usage.

Symbol Tables
Regular Expression   Token   Attribute-Value
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      pointer to table entry
num                  num     pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE
Note: Each token has a unique token identifier that defines its category of lexemes.
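The token and attribute-value table above can be sketched as a small scanner. This is only an illustration under assumptions: the function name scan, the use of a Python list index as the "pointer to table entry", and the RELOP_CODES mapping are hypothetical choices, not part of the slides.

```python
import re

# Attribute codes for relational operators, following the table above.
RELOP_CODES = {"<": "LT", "<=": "LE", "=": "EQ", "<>": "NE", ">": "GT", ">=": "GE"}
KEYWORDS = {"if", "then", "else"}

# Longer operators are listed before their one-character prefixes.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<id>[A-Za-z][A-Za-z0-9]*)
  | (?P<num>\d+)
  | (?P<relop><=|<>|>=|<|=|>)
""", re.VERBOSE)

def scan(text):
    """Return (tokens, symbol_table); each token is a (token, attribute) pair."""
    symbol_table = []   # the list index stands in for a pointer to the entry
    index_of = {}       # lexeme -> index, so each lexeme gets a single entry
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"invalid character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                      # white space produces no token
        if kind == "id" and lexeme in KEYWORDS:
            tokens.append((lexeme, None)) # keywords carry no attribute
        elif kind in ("id", "num"):
            if lexeme not in index_of:
                index_of[lexeme] = len(symbol_table)
                symbol_table.append(lexeme)
            tokens.append((kind, index_of[lexeme]))
        else:
            tokens.append(("relop", RELOP_CODES[lexeme]))
    return tokens, symbol_table

tokens, table = scan("if a <= 10 then b <> a")
```

Here both occurrences of a share one table entry, which is the point of giving id tokens a pointer rather than the lexeme itself.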

Error Management
Errors can occur at all phases in the compiler: invalid input characters, syntax errors, semantic errors, etc. Good compilers will attempt to recover from errors and continue.

Lexical analyzer
Also called a scanner or tokenizer, it converts a stream of characters into a stream of tokens. Tokens are:
  keywords such as for, while, and class;
  special characters such as +, -, (, and <;
  variable name occurrences;
  constant occurrences such as 1, 0, and true.

The lexical analyzer is usually a subroutine of the parser. Each token is a single entity. A numerical code is usually assigned to each type of token.

Lexical analyzers perform:
  Line reconstruction: delete comments, delete white space, perform text substitution.
  Lexical translation: translation of lexemes -> tokens.
Often additional information is associated with a token.

Token Definitions
letter -> A | B | C | ... | Z | a | b | ... | z
digit -> 0 | 1 | 2 | ... | 9
id -> letter ( letter | digit )*
Shorthand notation:
  + : one or more (r* = r+ | ε and r+ = r r*)
  ? : zero or one (r? = r | ε)
  [range] : a range of characters (replaces |), e.g. [A-Z] = A | B | C | ... | Z
id -> [A-Za-z][A-Za-z0-9]*
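As a quick check of the shorthand form of id, the definition can be written with Python's re module. The helper name is_identifier is an assumption made for this sketch.

```python
import re

# id -> [A-Za-z][A-Za-z0-9]*  (one letter, then zero or more letters or digits)
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(text):
    """True when the whole string matches the id definition."""
    return ID.fullmatch(text) is not None

# The identity r+ = r r* can be spot-checked on a few sample strings.
samples = ["", "a", "aa", "aaa", "ab"]
plus_matches = [bool(re.fullmatch(r"a+", s)) for s in samples]
concat_matches = [bool(re.fullmatch(r"aa*", s)) for s in samples]
```

fullmatch is used instead of match so that a string such as x-1 is rejected rather than accepted on its prefix x.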

Example of extracting lexemes and producing the corresponding tokens:
sum = oldsum - value / 100;
Token         Lexeme
IDENT         sum
ASSIGN_OP     =
IDENT         oldsum
SUBTRACT_OP   -
IDENT         value
DIVISION_OP   /
INT_LIT       100
SEMICOLON     ;
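The example above can be reproduced with a table-driven tokenizer. This is a minimal sketch: the token names come from the slide, while TOKEN_SPEC, MASTER, and tokenize are names chosen here for illustration.

```python
import re

# One (token name, pattern) pair per token class; SKIP covers white space.
TOKEN_SPEC = [
    ("INT_LIT", r"\d+"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN_OP", r"="),
    ("SUBTRACT_OP", r"-"),
    ("DIVISION_OP", r"/"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Yield (token, lexeme) pairs, skipping white space."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"invalid character {source[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()
        pos = m.end()

pairs = list(tokenize("sum = oldsum - value / 100;"))
```

Each pair in the result corresponds to one row of the token and lexeme listing.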

Parser
Performs syntax analysis: it imposes syntactic structure on a sentence. Parse trees are used to expose the structure. These trees are often not explicitly built; simpler representations of them are often used. A parser accepts a string of tokens and builds a parse tree representing the program.

The collection of all the programs in a given language is usually specified using a list of rules known as a context-free grammar.

A grammar has four components:
  a set of tokens, known as terminal symbols;
  a set of variables, or non-terminals;
  a set of productions, where each production consists of a non-terminal, an arrow, and a sequence of tokens and/or non-terminals;
  a designation of one of the non-terminals as the start symbol.
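The four components map directly onto a small data structure. The sketch below is illustrative only: the Grammar class, its is_well_formed check, and the toy expression grammar are assumptions, not material from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # tokens
    nonterminals: frozenset   # variables
    productions: tuple        # (lhs, rhs) pairs; rhs is a tuple of symbols
    start: str                # designated start symbol

    def is_well_formed(self):
        """Check that the four components are consistent with each other."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )

# Toy grammar:  expr -> expr + term | term      term -> id | num
EXPR_GRAMMAR = Grammar(
    terminals=frozenset({"id", "num", "+"}),
    nonterminals=frozenset({"expr", "term"}),
    productions=(
        ("expr", ("expr", "+", "term")),
        ("expr", ("term",)),
        ("term", ("id",)),
        ("term", ("num",)),
    ),
    start="expr",
)
```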

Abstract Syntax Tree
The parse tree is used to recognize the components of the program and to check that the syntax is correct. As the parser applies productions, it usually generates the components of a simpler tree, known as the abstract syntax tree (AST). The meaning of a component is derived from the way the statement is organized in a subtree.
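One common way to build an AST while applying productions is recursive descent. The sketch below assumes a hypothetical grammar (stmt -> id = expr ; with the usual expr, term, and factor levels) and represents AST nodes as plain tuples; none of these choices come from the slides.

```python
import re

IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")

class Parser:
    def __init__(self, text):
        # Lexemes: integers, identifiers, or any single non-space character.
        self.tokens = re.findall(r"\d+|[A-Za-z][A-Za-z0-9]*|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, lexeme):
        if self.peek() != lexeme:
            raise SyntaxError(f"expected {lexeme!r}, found {self.peek()!r}")
        self.pos += 1

    def statement(self):              # stmt -> id = expr ;
        target = self.peek()
        if target is None or not IDENT.fullmatch(target):
            raise SyntaxError(f"expected identifier, found {target!r}")
        self.pos += 1
        self.expect("=")
        value = self.expression()
        self.expect(";")
        return ("assign", target, value)   # '=' and ';' leave no node behind

    def expression(self):             # expr -> term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.tokens[self.pos]
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):                   # term -> factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.tokens[self.pos]
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):                 # factor -> num | id | ( expr )
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        if tok.isdigit():
            return ("num", int(tok))
        if IDENT.fullmatch(tok):
            return ("id", tok)
        if tok == "(":
            node = self.expression()
            self.expect(")")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

ast = Parser("sum = oldsum - value / 100;").statement()
```

The resulting tree keeps only operators and operands. Punctuation and the intermediate expr, term, and factor levels of the parse tree do not appear, which is what makes it abstract.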

Comparison with Lexical Analysis
Phase    Input                    Output
Lexer    Sequence of characters   Sequence of tokens
Parser   Sequence of tokens       Parse tree

Semantic Analyzer
The semantic analyzer completes the symbol table with information on the characteristics of each identifier. The symbol table is usually initialized during parsing. One entry is created for each identifier and constant. Scope is taken into account: two different variables with the same name will have different entries in the symbol table.
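Scope handling is often modeled as a stack of dictionaries, one per open scope. The following is a minimal sketch under that assumption; the class and method names are hypothetical.

```python
class SymbolTable:
    """A stack of scopes; the innermost scope is the last list element."""

    def __init__(self):
        self.scopes = [{}]            # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **info):
        """Create an entry in the innermost scope and return it."""
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = info
        return info

    def lookup(self, name):
        """Search from the innermost scope outward."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared identifier {name!r}")

table = SymbolTable()
outer_x = table.declare("x", type="int")
table.enter_scope()
inner_x = table.declare("x", type="float")   # same name, separate entry
inner_lookup = table.lookup("x")
table.exit_scope()
outer_lookup = table.lookup("x")
```

After exit_scope the inner entry is no longer reachable, so the same name resolves to the outer variable again.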

Translator
The lexical scanner, parser, and semantic analyzer are collectively known as the front end of the compiler. The second part, or back end, starts by generating low-level code from the (possibly optimized) AST.

Rather than generate code for a specific architecture, most compilers generate an intermediate language. Three-address code is popular. It is really a flattened tree representation: simple, flexible (it captures the essence of many target architectures), and it can be interpreted.
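To see why three-address code is a flattened tree, consider a post-order walk that gives every interior node its own temporary. The tuple-based AST shape, the t1, t2 temporary names, and the function name below are assumptions made for this sketch.

```python
from itertools import count

def to_three_address(ast):
    """Flatten ("assign", target, expr) into a list of three-address lines."""
    code = []
    temps = count(1)

    def emit(node):
        if node[0] in ("id", "num"):
            return str(node[1])            # leaves are used directly as operands
        op, left, right = node
        left_name = emit(left)             # operands are emitted first (post-order)
        right_name = emit(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    _, target, value = ast
    code.append(f"{target} := {emit(value)}")
    return code

# AST for: sum = oldsum - value / 100;
ast = ("assign", "sum", ("-", ("id", "oldsum"), ("/", ("id", "value"), ("num", 100))))
lines = to_three_address(ast)
# lines == ["t1 := value / 100", "t2 := oldsum - t1", "sum := t2"]
```

Each line has at most one operator and three names, so nested structure is expressed only through the order of the lines and the temporaries.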

Optimizers
Intermediate code is examined and improved. Optimizations can be simple, such as changing a:=a+1 to an increment of a or changing 3*5 to 15, or complicated, such as reorganizing data and data accesses for cache efficiency. Optimization can improve running time by orders of magnitude, often also decreasing program size.
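The 3*5 to 15 rewrite is usually called constant folding. A minimal sketch on a tuple-based expression tree follows; the tree shape and the function name fold are assumptions, and division is deliberately left unfolded to sidestep division by zero.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if node[0] in ("id", "num"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if op in FOLDABLE and left[0] == "num" and right[0] == "num":
        return ("num", FOLDABLE[op](left[1], right[1]))   # e.g. 3*5 -> 15
    if op == "*" and left == ("num", 1):
        return right                                      # 1*x -> x
    if op == "*" and right == ("num", 1):
        return left                                       # x*1 -> x
    return (op, left, right)

folded = fold(("+", ("id", "a"), ("*", ("num", 3), ("num", 5))))
# folded == ("+", ("id", "a"), ("num", 15))
```

The 1*x rule is an algebraic simplification rather than folding proper; it is included because it removes a multiplication without needing both operands to be constants.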

Code Generation
Generation of real executable code for a particular target machine, completed by the Final Assembly phase. The final output can be either assembly language for the target machine or object code ready for linking. The target machine can be a virtual machine (such as the Java Virtual Machine, JVM), in which case the executable code is virtual code (such as Java bytecode).

Compiler Overview
Source program: IF (a<b) THEN c=1*d;
Lexical analyzer -> token sequence: IF ( ID a < ID b ) THEN ID c = CONST 1 * ID d
Syntax analyzer -> syntax tree: IF_stmt( cond_expr: < (a, b); list: assign_stmt( lhs: c; rhs: * (1, d) ) )
Semantic analyzer -> 3-address code: GE a, b, L1; MULT 1, d, c; L1:
Code optimizer -> optimized 3-address code: GE a, b, L1; MOV d, c; L1:
Code generation -> assembly code: loadi R1,a; cmpi R1,b; jge L1; loadi R1,d; storei R1,c; L1:

Exercise: Abstract Syntax Tree
x := a + b; y := a * b; while (y > a) { a := a + 1; x := a + b; }

Email: lizheng@mail.buct.edu.cn Web: http://cist.buct.edu.cn/staff/zheng/ Office: 科 510