Crafting a Compiler

CHARLES N. FISCHER
Computer Sciences
University of Wisconsin Madison

RON K. CYTRON
Computer Science and Engineering
Washington University

RICHARD J. LeBLANC, Jr.
Computer Science and Software Engineering
Seattle University

Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo
Contents

1 Overview 29
  1.1 History of Compilation 30
  1.2 What Compilers Do 32
    1.2.1 Machine Code Generated by Compilers 32
    1.2.2 Target Code Formats 35
  1.3 Interpreters 37
  1.4 Syntax and Semantics 38
    1.4.1 Static Semantics 39
    1.4.2 Runtime Semantics 40
  1.5 Organization of a Compiler 42
    1.5.1 The Scanner 44
    1.5.2 The Parser 44
    1.5.3 The Type Checker (Semantic Analysis) 45
    1.5.4 Translator (Program Synthesis) 45
    1.5.5 Symbol Tables 46
    1.5.6 The Optimizer 46
    1.5.7 The Code Generator 47
    1.5.8 Compiler Writing Tools 47
  1.6 Programming Language and Compiler Design 48
  1.7 Computer Architecture and Compiler Design 49
  1.8 Compiler Design Considerations 50
    1.8.1 Debugging (Development) Compilers 50
    1.8.2 Optimizing Compilers 51
    1.8.3 Retargetable Compilers 51
  1.9 Integrated Development Environments 52
  Exercises 54
2 Design of a Simple Compiler 59
  2.1 An Informal Definition of the ac Language 60
  2.2 Formal Definition of ac 61
    2.2.1 Syntax Specification 61
    2.2.2 Token Specification 64
  2.3 Phases of a Simple Compiler 65
  2.4 Scanning 66
  2.5 Parsing 67
    2.5.1 Predicting a Parsing Procedure 69
    2.5.2 Implementing the Production 71
  2.6 Abstract Syntax Trees 73
  2.7 Semantic Analysis 74
    2.7.1 Symbol Tables 75
    2.7.2 Type Checking 76
  2.8 Code Generation 79
  Exercises 82
3 Theory and Practice of Scanning 85
  3.1 Overview of a Scanner 86
  3.2 Regular Expressions 88
  3.3 Examples 91
  3.4 Finite Automata and Scanners 92
    3.4.1 Deterministic Finite Automata 93
  3.5 The Lex Scanner Generator 97
    3.5.1 Defining Tokens in Lex 98
    3.5.2 The Character Class 100
    3.5.3 Using Regular Expressions to Define Tokens 100
    3.5.4 Character Processing Using Lex 104
  3.6 Other Scanner Generators 105
  3.7 Practical Considerations of Building Scanners 107
    3.7.1 Processing Identifiers and Literals 108
    3.7.2 Using Compiler Directives and Listing Source Lines 111
    3.7.3 Terminating the Scanner 113
    3.7.4 Multicharacter Lookahead 114
    3.7.5 Performance Considerations 115
    3.7.6 Lexical Error Recovery 117
  3.8 Regular Expressions and Finite Automata 120
    3.8.1 Transforming a Regular Expression into an NFA 121
    3.8.2 Creating the DFA 122
    3.8.3 Optimizing Finite Automata 125
    3.8.4 Translating Finite Automata into Regular Expressions 128
  3.9 Summary 131
  Exercises 134
4 Formal Grammars and Parsing 143
  4.1 Context-Free Grammars 144
    4.1.1 Leftmost Derivations 146
    4.1.2 Rightmost Derivations 147
    4.1.3 Parse Trees 147
    4.1.4 Other Types of Grammars 148
  4.2 Properties of CFGs 150
    4.2.1 Reduced Grammars 150
    4.2.2 Ambiguity 151
    4.2.3 Faulty Language Definition 152
  4.3 Transforming Extended Grammars 152
  4.4 Parsers and Recognizers 153
  4.5 Grammar Analysis Algorithms 157
    4.5.1 Grammar Representation 157
    4.5.2 Deriving the Empty String 158
    4.5.3 First Sets 160
    4.5.4 Follow Sets 164
  Exercises 168
5 Top-Down Parsing 175
  5.1 Overview 176
  5.2 LL(k) Grammars 177
  5.3 Recursive-Descent LL(1) Parsers 181
  5.4 Table-Driven LL(1) Parsers 182
  5.5 Obtaining LL(1) Grammars 186
    5.5.1 Common Prefixes 188
    5.5.2 Left Recursion 189
  5.6 A Non-LL(1) Language 191
  5.7 Properties of LL(1) Parsers 193
  5.8 Parse Table Representation 195
    5.8.1 Compaction 196
    5.8.2 Compression 197
  5.9 Syntactic Error Recovery and Repair 200
    5.9.1 Error Recovery 201
    5.9.2 Error Repair 201
    5.9.3 Error Detection in LL(1) Parsers 203
    5.9.4 Error Recovery in LL(1) Parsers 203
  Exercises 205
6 Bottom-Up Parsing 211
  6.1 Overview 212
  6.2 Shift-Reduce Parsers 213
    6.2.1 LR Parsers and Rightmost Derivations 214
    6.2.2 LR Parsing as Knitting 214
    6.2.3 LR Parsing Engine 216
    6.2.4 The LR Parse Table 217
    6.2.5 LR(k) Parsing 219
  6.3 LR(0) Table Construction 223
  6.4 Conflict Diagnosis 229
    6.4.1 Ambiguous Grammars 231
    6.4.2 Grammars that are not LR(k) 234
  6.5 Conflict Resolution and Table Construction 236
    6.5.1 SLR(k) Table Construction 236
    6.5.2 LALR(k) Table Construction 241
    6.5.3 LALR Propagation Graph 243
    6.5.4 LR(k) Table Construction 251
  Exercises 256
7 Syntax-Directed Compilation 267
  7.1 Overview 267
    7.1.1 Semantic Actions and Values 268
    7.1.2 Synthesized and Inherited Attributes 269
  7.2 Bottom-Up Syntax-Directed Translation 271
    7.2.1 Example 271
    7.2.2 Rule Cloning 275
    7.2.3 Forcing Semantic Actions 276
    7.2.4 Aggressive Grammar Restructuring 278
  7.3 Top-Down Syntax-Directed Translation 279
  7.4 Abstract Syntax Trees 282
    7.4.1 Concrete and Abstract Trees 282
    7.4.2 An Efficient AST Data Structure 283
    7.4.3 Infrastructure for Creating ASTs 284
  7.5 AST Design and Construction 286
    7.5.1 Design 288
    7.5.2 Construction 290
  7.6 AST Structures for Left and Right Values 293
  7.7 Design Patterns for ASTs 296
    7.7.1 Node Class Hierarchy 296
    7.7.2 Visitor Pattern 297
    7.7.3 Reflective Visitor Pattern 300
  Exercises 304
8 Declaration Processing and Symbol Tables 311
  8.1 Constructing a Symbol Table 312
    8.1.1 Static Scoping 314
    8.1.2 A Symbol Table Interface 314
  8.2 Block-Structured Languages and Scopes 316
    8.2.1 Handling Scopes 316
    8.2.2 One Symbol Table or Many? 317
  8.3 Basic Implementation Techniques 318
    8.3.1 Entering and Finding Names 318
    8.3.2 The Name Space 321
    8.3.3 An Efficient Symbol Table Implementation 322
  8.4 Advanced Features 325
    8.4.1 Records and Typenames 326
    8.4.2 Overloading and Type Hierarchies 326
    8.4.3 Implicit Declarations 328
    8.4.4 Export and Import Directives 328
    8.4.5 Altered Search Rules 329
  8.5 Declaration Processing Fundamentals 330
    8.5.1 Attributes in the Symbol Table 330
    8.5.2 Type Descriptor Structures 331
    8.5.3 Type Checking Using an Abstract Syntax Tree 332
  8.6 Variable and Type Declarations 335
    8.6.1 Simple Variable Declarations 335
    8.6.2 Handling Type Names 336
    8.6.3 Type Declarations 337
    8.6.4 Variable Declarations Revisited 340
    8.6.5 Static Array Types 343
    8.6.6 Struct and Record Types 344
    8.6.7 Enumeration Types 345
  8.7 Class and Method Declarations 348
    8.7.1 Processing Class Declarations 349
    8.7.2 Processing Method Declarations 353
  8.8 An Introduction to Type Checking 355
    8.8.1 Simple Identifiers and Literals 359
    8.8.2 Assignment Statements 360
    8.8.3 Checking Expressions 360
    8.8.4 Checking Complex Names 361
  8.9 Summary 366
  Exercises 368
9 Semantic Analysis 375
  9.1 Semantic Analysis for Control Structures 375
    9.1.1 Reachability and Termination Analysis 377
    9.1.2 If Statements 380
    9.1.3 While, Do, and Repeat Loops 382
    9.1.4 For Loops 385
    9.1.5 Break, Continue, Return, and Goto Statements 388
    9.1.6 Switch and Case Statements 396
    9.1.7 Exception Handling 401
  9.2 Semantic Analysis of Calls 408
  9.3 Summary 416
  Exercises 417
10 Intermediate Representations 423
  10.1 Overview 424
    10.1.1 Examples 425
    10.1.2 The Middle-End 427
  10.2 Java Virtual Machine 429
    10.2.1 Introduction and Design Principles 430
    10.2.2 Contents of a Class File 431
    10.2.3 JVM Instructions 433
  10.3 Static Single Assignment Form 442
    10.3.1 Renaming and φ-functions 443
  Exercises 446
11 Code Synthesis for Virtual Machines 449
  11.1 Visitors for Code Generation 450
  11.2 Class and Method Declarations 452
    11.2.1 Class Declarations 454
    11.2.2 Method Declarations 456
  11.3 The MethodBodyVisitor 457
    11.3.1 Constants 457
    11.3.2 References to Local Storage 458
    11.3.3 Static References 459
    11.3.4 Expressions 459
    11.3.5 Assignment 461
    11.3.6 Method Calls 462
    11.3.7 Field References 464
    11.3.8 Array References 465
    11.3.9 Conditional Execution 467
    11.3.10 Loops 468
  11.4 The LHSVisitor 469
    11.4.1 Local References 469
    11.4.2 Static References 470
    11.4.3 Field References 471
    11.4.4 Array References 471
  Exercises 473
12 Runtime Support 477
  12.1 Static Allocation 478
  12.2 Stack Allocation 479
    12.2.1 Field Access in Classes and Structs 481
    12.2.2 Accessing Frames at Runtime 482
    12.2.3 Handling Classes and Objects 483
    12.2.4 Handling Multiple Scopes 485
    12.2.5 Block-Level Allocation 487
    12.2.6 More About Frames 489
  12.3 Arrays 492
    12.3.1 Static One-Dimensional Arrays 492
    12.3.2 Multidimensional Arrays 497
  12.4 Heap Management 500
    12.4.1 Allocation Mechanisms 500
    12.4.2 Deallocation Mechanisms 503
    12.4.3 Automatic Garbage Collection 504
  12.5 Region-Based Memory Management 511
  Exercises 514
13 Target Code Generation 521
  13.1 Translating Bytecodes 522
    13.1.1 Allocating memory addresses 525
    13.1.2 Allocating Arrays and Objects 525
    13.1.3 Method Calls 528
    13.1.4 Example of Bytecode Translation 530
  13.2 Translating Expression Trees 533
  13.3 Register Allocation 537
    13.3.1 On-the-Fly Register Allocation 538
    13.3.2 Register Allocation Using Graph Coloring 540
    13.3.3 Priority-Based Register Allocation 548
    13.3.4 Interprocedural Register Allocation 549
  13.4 Code Scheduling 551
    13.4.1 Improving Code Scheduling 555
    13.4.2 Global and Dynamic Code Scheduling 556
  13.5 Automatic Instruction Selection 558
    13.5.1 Instruction Selection Using BURS 561
    13.5.2 Instruction Selection Using Twig 563
    13.5.3 Other Approaches 564
  13.6 Peephole Optimization 564
    13.6.1 Levels of Peephole Optimization 565
    13.6.2 Automatic Generation of Peephole Optimizers 568
  Exercises 570
14 Program Analysis and Optimization 579
  14.1 Overview 580
    14.1.1 Why Optimize? 581
  14.2 Control Flow Analysis 587
    14.2.1 Control Flow Graphs 588
    14.2.2 Program and Control Flow Structure 591
    14.2.3 Direct Procedure Call Graphs 592
    14.2.4 Depth-First Spanning Tree 592
    14.2.5 Dominance 597
    14.2.6 Simple Dominance Algorithm 599
    14.2.7 Fast Dominance Algorithm 603
    14.2.8 Dominance Frontiers 613
    14.2.9 Intervals 617
  14.3 Introduction to Data Flow Analysis 630
    14.3.1 Available Expressions 630
    14.3.2 Live Variables 633
  14.4 Data Flow Frameworks 636
    14.4.1 Data Flow Evaluation Graph 636
    14.4.2 Meet Lattice 638
    14.4.3 Transfer Functions 640
  14.5 Evaluation 643
    14.5.1 Iteration 643
    14.5.2 Initialization 647
    14.5.3 Termination and Rapid Frameworks 648
    14.5.4 Distributive Frameworks 652
  14.6 Constant Propagation 655
  14.7 SSA Form 659
    14.7.1 Placing φ-Functions 661
    14.7.2 Renaming 663
  Exercises 668
Bibliography 683
Abbreviations 693
Pseudocode Guide 695
Index 699