Arrays and Functions


COMP 506, Rice University, Spring 2018
Arrays and Functions
source code → Front End → (IR) → Optimizer → (IR) → Back End → target code
Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 506 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. Chapters 6 & 7 in EaC2e.

Expanding the Expression Grammar

The classic, left-recursive expression grammar:

     1  Goal   → Expr
     2  Expr   → Expr + Term
     3         | Expr - Term
     4         | Term
     5  Term   → Term * Factor
     6         | Term / Factor
     7         | Factor
     8  Factor → ( Expr )
     9         | number
    10         | ident

The grammar is great for x - 2 * y. But not all programs operate over the domain of ident and number; more complex programs require more complex expressions. The grammar makes no mention of array references, structure references, or function calls. (COMP 412, Fall 2018)

Expanding the Expression Grammar

The same grammar, extended with references to aggregates and functions:

     1  Goal   → Expr
     2  Expr   → Expr + Term
     3         | Expr - Term
     4         | Term
     5  Term   → Term * Factor
     6         | Term / Factor
     7         | Factor
     8  Factor → ( Expr )
     9         | number
    10         | ident
    11         | ident [ Exprs ]      (array reference)
    12         | ident ( Exprs )      (function invocation)

First, the compiler must recognize expressions that refer to arrays, structures, and function calls. Second, the compiler must have schemes to generate code to implement those abstractions.

COMP 506, Lab 2: What Code Should the Compiler Generate for A[i,j]?

The compiler must generate code to compute the runtime address of A[i,j].

The compiler needs to know where A begins. A was assigned a data area and an offset during storage layout (Lecture 9); compile-time knowledge is tied to runtime behavior. Assume that @A is the address of A's first element.

The compiler must have a plan for A's internal layout. The programming language usually dictates array element layout; the three common choices are row-major order, column-major order, and indirection vectors. The compiler needs a formula to calculate the address of an array element.

General scheme: compute the element address, then issue a load (rvalue) or a store (lvalue).

(Figure, "The Concept": a 2 x 4 array A with elements 1,1 1,2 1,3 1,4 / 2,1 2,2 2,3 2,4.)

Note: the demo documents do not specify how arrays are laid out. It is your choice.

Array Layout: Row-Major Order

Lay the array out as a sequence of consecutive rows; the rightmost subscript varies fastest: A[1,1], A[1,2], A[1,3], A[2,1], A[2,2], A[2,3].

Storage layout: A = 1,1 1,2 1,3 1,4 | 2,1 2,2 2,3 2,4

Stride-one access: successive references in the loop are to adjacent locations in virtual memory. Stride-one access maximizes spatial reuse and the effectiveness of hardware prefetch units. In row-major order, stride one is along the rows:

    for (i = 0; i < n; i++)
      for (j = 0; j < n; j++)
        A[i][j] = 0;

Declared arrays in C (and in most languages) use row-major order.

Array Layout: Column-Major Order

Lay the array out as a sequence of columns; the leftmost subscript varies fastest: A[1,1], A[2,1], A[1,2], A[2,2], A[1,3], A[2,3]. In column-major order, stride-one access is along the columns:

    for (j = 0; j < n; j++)
      for (i = 0; i < n; i++)
        A[i][j] = 0;

Storage layout: A = 1,1 2,1 | 1,2 2,2 | 1,3 2,3 | 1,4 2,4

Declared arrays in FORTRAN use column-major order.

Array Layout: Indirection Vectors

Build a vector of pointers to pointers to ... to values. This takes much more space, and it trades indirection for arithmetic. It is not amenable to analysis.

Stride-one access: no reference pattern guarantees stride-one access in cache, unless the rows happen to be contiguous.

Arrays in Java use this scheme; so does int **array; in C.

Computing an Address for an Array Element (welcome to high-school algebra)

The general scheme for an array reference: compute an address, then issue a load.

    A[i]:        @A + (i - low) x sizeof(A[1])
    In general:  base(A) + (i - low) x sizeof(A[1])

@A is the base address of A. Depending on how A is declared, @A may be an offset from the pointer to the procedure's data area, an offset from some global label, or an arbitrary address (e.g., a pointer). The first two are compile-time constants. low is the lower bound on the index.

Computing an Address for an Array Element

    A[i]:        @A + (i - low) x sizeof(A[1])
    In general:  base(A) + (i - low) x sizeof(A[1])

int A[1:10] ⇒ low is 1. Making low be 0 gives faster access; it saves a subtraction. sizeof(A[1]) is often a power of 2, known at compile time ⇒ use a shift for speed.

Take-away point: vector access costs a couple of operations.

Computing an Address for an Array Element

Assume sizeof(A[1]) is 1, for simplicity. What about A[i1,i2] in A[1:5,1:4]?

    low1  = 1   lower bound on the row index
    high1 = 5   upper bound on the row index
    low2  = 1   lower bound on the column index
    high2 = 4   upper bound on the column index

Row-major order, two dimensions:

    @A + ((i1 - low1) x (high2 - low2 + 1) + i2 - low2) x sizeof(A[1])

For A[4,3]: (i1 - low1) is 3; (high2 - low2 + 1) is 4; 3 x 4 is 12; (i2 - low2) is 2. Combined, the two terms take us from @A to the start of A[4,3]. The address computation took 3 adds, 3 subtracts, 1 multiply, and 1 shift.

Array Layout: Row-Major Order

All three schemes generalize to higher dimensions.

Lay the array out as a sequence of consecutive rows; the rightmost subscript varies fastest: A[1,1], A[1,2], A[1,3], A[2,1], A[2,2], A[2,3].

Address polynomial for A[i,j]:

    @A + ((i1 - low1) x (high2 - low2 + 1) + i2 - low2) x sizeof(A[1])

Why does C start array indices at zero? To avoid those subtractions. This polynomial follows Horner's rule for evaluation.

Array Layout: Column-Major Order

Lay the array out as a sequence of columns; the leftmost subscript varies fastest: A[1,1], A[2,1], A[1,2], A[2,2], A[1,3], A[2,3].

Address polynomial for A[i,j]:

    @A + ((i2 - low2) x (high1 - low1 + 1) + i1 - low1) x sizeof(A[1])

Array Layout: Indirection Vectors

Build a vector of pointers to pointers to ... to values. This takes much more space, trades indirection for arithmetic, and is not amenable to analysis.

The reference A[i1,i2] becomes *(A[i1])[i2], where each of the [i]'s is, itself, a 1-d array reference. A sketch of the code:

    A[i1]:   sub  i1, low => r1
             add  @A, r1  => r2
    *[i2]:   load r2      => r3
             sub  i2, low => r4
             add  r3, r4  => r5
             load r5      => r6

Back when systems supported 1- or 2-cycle indirect load operations, this scheme was efficient: it replaces a multiply and an add with an indirect load.

Array Address Calculations

In scientific codes, array address calculations are a major cost. Each additional dimension adds more arithmetic, so efficiency in address calculation is a critical issue for performance. A[i+1,j], A[i,j], and A[i,j+1] should all have some common terms; Horner's-rule evaluation hides those common terms. Improving these calculations is a matter of algebra.

Improving the efficiency of array address calculations has been a major focus of code optimization (and of hardware design) for the last 40 years: generate better code and transform it to fit the context; design memory systems that support common array access patterns; optimize hardware for stride-one access; include sophisticated prefetch engines that recognize access patterns.

Array Address Calculations: Better Code for A[i,j]

Distribute x over + and -. In row-major order,

    @A + ((i - low1) x (high2 - low2 + 1) + j - low2) x w

can be re-distributed to

    @A + (i - low1) x (high2 - low2 + 1) x w + (j - low2) x w

which can be re-distributed to

    @A + i x (high2 - low2 + 1) x w + j x w - (low1 x (high2 - low2 + 1) x w) - (low2 x w)

If the lows, highs, and w are known, the last terms are constants. Define @A0 as @A - (low1 x (high2 - low2 + 1) x w) - (low2 x w), and len2 as (high2 - low2 + 1). Then the address expression becomes

    @A0 + (i x len2 + j) x w,   where w = sizeof(A[1,1])

If @A is known, @A0 is a known compile-time constant. The calculation now takes just a couple of operations: 2 adds, 1 multiply, and 1 shift.

Does Layout Matter? Which loop is faster?

    for (x = 0; x < n; x++)
      for (y = 0; y < n; y++)
        A[x][y] = 0;
    (0.51 seconds on a 10,000 x 10,000 array)

    for (y = 0; y < n; y++)
      for (x = 0; x < n; x++)
        A[x][y] = 0;
    (1.65 seconds on a 10,000 x 10,000 array)

    p = &A[0][0];
    t = n * n;
    for (x = 0; x < t; x++)
      *p++ = 0;
    (0.11 seconds on a 10,000 x 10,000 array)

All three loops have distinct performance; the pointer loop is ~5x faster than the first loop and ~15x faster than the second. All data were collected on a quiescent, multiuser Intel T9600 @ 2.8 GHz, using code compiled with gcc 4.1 -O3.

Conventional wisdom suggests using bzero((void *) &A[0][0], (size_t) n * n * sizeof(int)); it ran in 0.52 seconds on a 10,000 x 10,000 array.


Function Calls

Another possible leaf in an expression tree is a function call:

    r ← f(x) + c

What code does the compiler generate for f(x)? The compiler may not have access to the code for f(); it may not even be written yet. Another compilation, or another compiler, may have translated f(). And f(x) might be a system call, such as mmap() or malloc(). Programmers expect such calls to just work.

Functions and procedures are a fundamental unit of abstraction in programming languages. A call creates a new, known, and controlled environment (names, access). Compilers and the system need a convention for how procedures interact; the linkage convention is the social contract that makes it all work.

Linkage Convention

The linkage convention plays a critical role in the construction of systems.

It makes separate compilation possible. The compiler can generate code to the convention's specifications, and adherence to the convention means that things just work.

It divides responsibility for the runtime environment and behavior among the caller, the callee, and the system: naming and address-space management, and control flow among functions and procedures (and, to some extent, processes).

The linkage convention is typically a system-wide convention. It allows interoperability among languages and compilers; code compiled with gcc can link with code from clang. It admits only limited opportunities for customization and optimization.

Linkage Convention: High-Level View

The linkage must:

Preserve and protect the caller's environment from the callee (and its callees). For example, it must preserve values in registers that are live across the call.

Create a new environment for the callee (a new name space). At runtime, it creates a local data area for the callee and ties it to the context.

Map names and resources from the caller into the callee, as needed.

To accomplish these goals, the convention divides responsibility between the caller and the callee. Caller and callee must use the same set of conventions, which implies limited opportunities for customization and optimization. We expect compiled code to work even when created with different compilers, perhaps in different languages.

(Figure: a call graph over fee, fie, and foe; a procedure such as foe must operate in multiple distinct calling contexts.)

Linkage Convention

A procedure or function has two distinct representations.

Code that implements the procedure: code for all of the statements, plus code to implement the linkage convention, with labels on entry points and return points. Name mangling creates consistent labels.

Data that represents an invocation: the activation record (AR) holds the local data plus the saved context of the caller, control information, and storage to preserve the caller's environment. ARs are chained together to create the context. The static data area is also found via name mangling.

(Figure: code for fee and foe with labels such as @fee_, @foe_, @fee_r1, #fee_, and #foe_, marking prologs, pre-call and post-return sequences, and epilogs; the ARP points into the chain of ARs for fee and foe.)

The Meta Issue

One of the most difficult aspects of understanding how compilers work is keeping track of what happens at compiler-design time, compiler-build time, compile time, and run time.

At compile time, the compiler emits code for the application program, including all of its procedures (and their prolog, epilog, pre-call, and post-return sequences). The compiler uses symbol tables to keep track of names; those tables are compile-time data structures.

At runtime, those sequences (prolog, epilog, pre-call, and post-return) create, populate, use, and destroy ARs for invocations of procedures. The running code stores data (both program data and control information) in ARs; those ARs are runtime data structures.

To confuse matters further, the compiler may preserve the symbol tables so that the runtime debugging environment can locate variables and values, and can locate the various data areas and the ARs.

Linkage Convention

(Annotations on this slide: the linkage code is generated at compile time and executed at runtime; the ARs are created and used at runtime, managed by the linkage code.)

A procedure or function has two distinct representations.

Code that implements the procedure: code for all of the statements, plus code to implement the linkage convention, with labels on entry points and return points. Name mangling creates consistent labels.

Data that represents an invocation: the activation record (AR) holds the local data plus the saved context of the caller, control information, and storage to preserve the caller's environment. ARs are chained together to create the context. The static data area is also found via name mangling.

(Figure: code for fee and foe with labels such as @fee_, @foe_, @fee_r1, #fee_, and #foe_, marking prologs, pre-call and post-return sequences, and epilogs; the ARP points into the chain of ARs for fee and foe.)

The Code: Procedure Linkages

The standard procedure linkage, for a caller main and a callee sqrt:

    procedure main (caller)      function sqrt (callee)
      prolog                       prolog
      z = a*a + b*b                ...
      pre-call                     epilog
      x = sqrt(y, z)
    L: post-return
      epilog

Each procedure has a standard prolog and a standard epilog. Each call uses a pre-call sequence and a post-return sequence. The code for the pre-call and post-return sequences is completely predictable from the call site; it depends on the number and the types of the actual parameters.

Note: the gap between the pre-call and post-return code is artistic license.

Digression: Name Mangling

Compilers use assembly labels to bridge the gap across separate compile steps, and from compile time to runtime. How can this possibly work? It is a social contract among the compilers on a given system: an agreement on how to form a label from a procedure or variable name. The label is a direct function of the name and its purpose.

For a procedure foo: the code entry point might be #_foo, the static data area might be #_foo_, and profile data might be stored at #_@foo_.

For a global variable baz: the start of the variable might be at #_%baz_.

Details: the labels must use characters that are not valid in identifiers, and they must be consistent across all implementations that are linked together. System-wide agreement is typical.

The Code: Procedure Linkages

These four code sequences (prolog, epilog, pre-call, and post-return):

Allocate and initialize a new AR for the callee, and populate that AR with the needed values and context.

Preserve the parts of the caller's environment that might be modified by the callee: the contents of registers, the pointer to its activation record, and so on.

Handle parameter passing: evaluate the parameters in the caller and make them available in the callee.

Transfer control from caller to callee and back (LIFO behavior).

The Data: Activation Records

A procedure's activation record (AR) combines the control information needed to manage procedure calls (return address, register save area, addressability, and return value) with the procedure's local data area(s).

Each time a procedure is invoked, or activated, it needs an AR, allocated on a runtime stack or in the runtime heap.

The pre-call and post-return sequences manage the ARs: the pre-call creates the AR and initializes it; the post-return cleans up the AR and deallocates it.

The prolog and epilog sequences manipulate environment data in the AR: they create and destroy parts of the new environment, and preserve and restore parts of the old environment (depending on needs).

ARs are runtime structures, laid out at compile time. The code cannot refer to ARs with mangled labels, because each invocation of a procedure creates a unique AR. Thus, the runtime environment has an AR pointer, the ARP.

Meta-issue: activation records are created, used, and destroyed at runtime. The code to maintain them is emitted at compile time.