Referencing Examples

Similar documents
Machine-Level Prog. IV - Structured Data

Systems I. Machine-Level Programming VII: Structured Data

Basic Data Types. Lecture 6A Machine-Level Programming IV: Structured Data. Array Allocation. Array Access Basic Principle

Basic Data Types The course that gives CMU its Zip! Machine-Level Programming IV: Data Feb. 5, 2008

Integral. ! Stored & operated on in general registers. ! Signed vs. unsigned depends on instructions used

Basic Data Types The course that gives CMU its Zip! Machine-Level Programming IV: Structured Data Sept. 18, 2003.

Page 1. Basic Data Types CISC 360. Machine-Level Programming IV: Structured Data Sept. 24, Array Access. Array Allocation.

Assembly IV: Complex Data Types. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Introduction to Computer Systems , fall th Lecture, Sep. 14 th

Machine-Level Programming IV: Structured Data

Giving credit where credit is due

Giving credit where credit is due

CISC 360. Machine-Level Programming IV: Structured Data Sept. 24, 2008

Floating Point Stored & operated on in floating point registers Intel GAS Bytes C Single s 4 float Double l 8 double Extended t 10/12/16 long double

Data Structures in Memory!

Machine Representa/on of Programs: Arrays, Structs and Unions. This lecture. Arrays Structures Alignment Unions CS Instructor: Sanjeev Se(a

The Hardware/So=ware Interface CSE351 Winter 2013

2 Systems Group Department of Computer Science ETH Zürich. Argument Build 3. %r8d %rax. %r9d %rbx. Argument #6 %rcx. %r10d %rcx. Static chain ptr %rdx

Lecture 7: More procedures; array access Computer Architecture and Systems Programming ( )

Machine- level Programming IV: Data Structures. Topics Arrays Structs Unions

h"p://news.illinois.edu/news/12/0927transient_electronics_johnrogers.html

1.'(*2#3%* 2+6.)* !!"##$%$&'()&*$! +',-.$+'/-$! #0!"#$1"&2"/#',$! &'!)&,21'$3)*!40*,$ ! 5*'6728'*,20*"#$! 9)#46728'*,20*"#$:*',('7;$!

Assembly IV: Complex Data Types. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Assembly IV: Complex Data Types

Machine-Level Programming IV: Structured Data

Machine Programming 4: Structured Data

Computer Organization: A Programmer's Perspective

The course that gives CMU its Zip! Floating Point Arithmetic Feb 17, 2000

Computer Systems CEN591(502) Fall 2011

CS241 Computer Organization Spring Loops & Arrays

Machine-level Programs Data

u Arrays One-dimensional Multi-dimensional (nested) Multi-level u Structures Allocation Access Alignment u Floating Point

Carnegie Mellon. Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition

Prac*ce problem on webpage

Last time. Last Time. Last time. Dynamic Array Multiplication. Dynamic Nested Arrays

Machine- Level Programming IV: x86-64 Procedures, Data

Memory Organization and Addressing

Machine Level Programming: Arrays, Structures and More

Arrays and Structs. CSE 351 Summer Instructor: Justin Hsia. Teaching Assistants: Josie Lee Natalie Andreeva Teagan Horkan.

Arrays. CSE 351 Autumn Instructor: Justin Hsia

Basic Data Types. CS429: Computer Organization and Architecture. Array Allocation. Array Access

Machine-Level Programming V: Miscellaneous Topics Sept. 24, 2002

CS429: Computer Organization and Architecture

Arrays. CSE 351 Autumn Instructor: Justin Hsia

Linux Memory Layout The course that gives CMU its Zip! Machine-Level Programming IV: Miscellaneous Topics Sept. 24, Text & Stack Example

Machine- Level Representation: Data

Giving credit where credit is due

More in-class activities / questions

Assembly I: Basic Operations. Jo, Heeseung

CS , Fall 2001 Exam 1

Assembly I: Basic Operations. Computer Systems Laboratory Sungkyunkwan University

ASSEMBLY I: BASIC OPERATIONS. Jo, Heeseung

CMSC 313 Fall2009 Midterm Exam 2 Section 01 Nov 11, 2009

CAS CS Computer Systems Spring 2015 Solutions to Problem Set #2 (Intel Instructions) Due: Friday, March 20, 1:00 pm

Arrays. Young W. Lim Wed. Young W. Lim Arrays Wed 1 / 19

How to Write Fast Numerical Code Spring 2013 Lecture: Architecture/Microarchitecture and Intel Core

Arrays. Young W. Lim Mon. Young W. Lim Arrays Mon 1 / 17

EECS 213 Fall 2007 Midterm Exam

MACHINE-LEVEL PROGRAMMING IV: Computer Organization and Architecture

Assembly III: Procedures. Jo, Heeseung

ASSEMBLY III: PROCEDURES. Jo, Heeseung

CS241 Computer Organization Spring Data Alignment

1 /* file cpuid2.s */ 4.asciz "The processor Vendor ID is %s \n" 5.section.bss. 6.lcomm buffer, section.text. 8.globl _start.

Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p

Assembly III: Procedures. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Homework. In-line Assembly Code Machine Language Program Efficiency Tricks Reading PAL, pp 3-6, Practice Exam 1

CSC 252: Computer Organization Spring 2018: Lecture 9

CS213. Machine-Level Programming III: Procedures

The course that gives CMU its Zip! Machine-Level Programming III: Procedures Sept. 17, 2002

Machine-Level Programming I: Introduction Jan. 30, 2001

Machine Programming 1: Introduction

Introduction to Computer Systems. Exam 1. February 22, This is an open-book exam. Notes are permitted, but not computers.

Instruction Set Architecture

Instruction Set Architecture

Questions about last homework? (Would more feedback be useful?) New reading assignment up: due next Monday

Machine-Level Programming III: Procedures

CS241 Computer Organization Spring Addresses & Pointers

Introduction to Computer Systems. Exam 1. February 22, Model Solution fp

Assembly Programmer s View Lecture 4A Machine-Level Programming I: Introduction

Y86 Processor State. Instruction Example. Encoding Registers. Lecture 7A. Computer Architecture I Instruction Set Architecture Assembly Language View

Machine-level Programming (3)

CISC 360. Machine-Level Programming I: Introduction Sept. 18, 2008

Question 4.2 2: (Solution, p 5) Suppose that the HYMN CPU begins with the following in memory. addr data (translation) LOAD 11110

Instruction Set Architecture

CISC 360 Instruction Set Architecture

Process Layout and Function Calls

Executables & Arrays. CSE 351 Autumn Instructor: Justin Hsia

Sungkyunkwan University

CS241 Computer Organization Spring 2015 IA

The course that gives CMU its Zip! Machine-Level Programming I: Introduction Sept. 10, 2002

AS08-C++ and Assembly Calling and Returning. CS220 Logic Design AS08-C++ and Assembly. AS08-C++ and Assembly Calling Conventions

CISC 360 Midterm Exam - Review Alignment

Page # CISC 360. Machine-Level Programming I: Introduction Sept. 18, IA32 Processors. X86 Evolution: Programmerʼs View.

CS429: Computer Organization and Architecture

Process Layout, Function Calls, and the Heap

Machine-Level Programming Introduction

Procedure Calls. Young W. Lim Sat. Young W. Lim Procedure Calls Sat 1 / 27

CS , Fall 2002 Exam 1

IA32 Processors The course that gives CMU its Zip! Machine-Level Programming I: Introduction Sept. 10, X86 Evolution: Programmer s View

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

Transcription:

Referencing Examples zip_dig cmu; 1 5 2 1 3 16 20 24 28 32 36 zip_dig mit; 0 2 1 3 9 36 40 44 48 52 56 zip_dig nwu; 6 0 2 0 1 56 60 64 68 72 76 Code Does Not Do ny Bounds Checking! Reference ddress Value Guaranteed? mit[3] 36 + 4* 3 = 48 3 Yes mit[5] 36 + 4* 5 = 56 6 No mit[-1] 36 + 4*-1 = 32 3 No cmu[15] 16 + 4*15 = 76?? No Out of range behavior implementation-dependent No guranteed relative allocation of different arrays 1

rray Loop Example int zd2int(zip_dig z) { int i; Original Source int zi = 0; for (i = 0; i < 5; i++) { zi = 10 * zi + z[i]; } return zi; } Transformed Version Eliminate loop variable i Convert array code to pointer code Express in do-while form No need to test at entrance int zd2int(zip_dig z) { int zi = 0; int *zend = z + 4; do { zi = 10 * zi + *z; z++; } while(z <= zend); return zi; } 2

rray Loop Implementation Registers %ecx z %eax zi %ebx zend Computations 10*zi + *z implemented as *z + 2*(zi+4*zi) z++ increments by 4 int zd2int(zip_dig z) { int zi = 0; int *zend = z + 4; do { zi = 10 * zi + *z; z++; } while(z <= zend); return zi; } # %ecx = z xorl %eax,%eax # zi = 0 leal 16(%ecx),%ebx # zend = z+4.l59: leal (%eax,%eax,4),%edx # 5*zi movl (%ecx),%eax # *z addl $4,%ecx # z++ leal (%eax,%edx,2),%eax # zi = *z + 2*(5*zi) cmpl %ebx,%ecx # z : zend jle.l59 # if <= goto loop 3

Nested rray Example #define PCOUNT 4 zip_dig pgh[pcount] = {{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3 }, {1, 5, 2, 1, 7 }, {1, 5, 2, 2, 1 }}; zip_dig pgh[4]; 1 5 2 0 6 1 5 2 1 3 1 5 2 1 7 1 5 2 2 1 76 96 116 136 156 Declaration zip_dig pgh[4] equivalent to int pgh[4][5] Variable pgh denotes array of 4 elements» llocated contiguously Each element is an array of 5 int s» llocated contiguously Row-Major ordering of all elements guaranteed 4

Nested rray llocation Declaration T [R][C]; rray of data type T R rows C columns Type T element requires K bytes rray Size R * C * K bytes rrangement Row-Major Ordering a[0][0] a[r-1][0] a[0][c-1] a[r-1][c-1] int [R][C]; [0] [0] [0] [C-1] [1] [0] [1] [C-1] [R-1] [0] [R-1] [C-1] 4*R*C Bytes 5

Nested rray Row ccess Row Vectors [i] is array of C elements Each element of type T Starting address + i * C * K int [R][C]; [0] [i] [R-1] [0] [0] [0] [C-1] [i] [0] [i] [C-1] [R-1] [0] [R-1] [C-1] +i*c*4 +(R-1)*C*4 6

Nested rray Row ccess Code int *get_pgh_zip(int index) { return pgh[index]; } Row Vector pgh[index] is array of 5 int s Starting address pgh+20*index Code Computes and returns address Compute as pgh + 4*(index+4*index) # %eax = index leal (%eax,%eax,4),%eax leal pgh(,%eax,4),%eax # 5 * index # pgh + (20 * index) 7

Nested rray Element ccess rray Elements [i][j] is element of type T ddress + (i * C + j) * K [i] [j] int [R][C]; [0] [i] [R-1] [0] [0] [0] [C-1] [i] [j] [R-1] [0] [R-1] [C-1] +i*c*4 +(i*c+j)*4 +(R-1)*C*4 8

Nested rray Element ccess Code rray Elements pgh[index][dig] is int ddress: pgh + 20*index + 4*dig Code Computes address pgh + 4*dig + 4*(index+4*index) movl performs memory reference int get_pgh_digit (int index, int dig) { return pgh[index][dig]; } # %ecx = dig # %eax = index leal 0(,%ecx,4),%edx leal (%eax,%eax,4),%eax movl pgh(%edx,%eax,4),%eax # 4*dig # 5*index # *(pgh + 4*dig + 20*index) Note: One Memory Fetch 9

Strange Referencing Examples zip_dig pgh[4]; 1 5 2 0 6 1 5 2 1 3 1 5 2 1 7 1 5 2 2 1 76 96 116 136 156 Reference ddress Value Guaranteed? pgh[3][3] 76+20*3+4*3 = 148 2 Yes pgh[2][5] 76+20*2+4*5 = 136 1 Yes pgh[2][-1] 76+20*2+4*-1 = 112 3 Yes pgh[4][-1] 76+20*4+4*-1 = 152 1 Yes pgh[0][19] 76+20*0+4*19 = 152 1 Yes pgh[0][-1] 76+20*0+4*-1 = 72?? No Code does not do any bounds checking Ordering of elements within array guaranteed 10

Variable univ denotes array of 3 elements Each element is a pointer 4 bytes Each pointer points to array of int s Multi-Level rray Example zip_dig cmu = { 1, 5, 2, 1, 3 }; zip_dig mit = { 0, 2, 1, 3, 9 }; zip_dig nwu = { 6, 0, 2, 0, 1 }; #define UCOUNT 3 int *univ[ucount] = {mit, cmu, nwu}; 160 164 168 univ 36 16 56 cmu 1 5 2 1 3 16 20 24 28 32 36 mit 0 2 1 3 9 nwu 36 40 44 48 52 56 6 0 2 0 1 56 60 64 68 72 76 11

Referencing Row in Multi-Level rray Row Vector univ[index] is pointer to array of int s Starting address Mem[univ+4*index] Code Computes address within univ Reads pointer from memory and returns it # %edx = index leal 0(,%edx,4),%eax movl univ(%eax),%eax int* get_univ_zip(int index) { return univ[index]; } # 4*index # *(univ+4*index) 12

ccessing Element in Multi-Level rray Computation Element access Mem[Mem[univ+4*index]+4*dig] Must do two memory reads First get pointer to row array Then access element within array int get_univ_digit (int index, int dig) { return univ[index][dig]; } # %ecx = index # %eax = dig leal 0(,%ecx,4),%edx movl univ(%edx),%edx movl (%edx,%eax,4),%eax # 4*index # Mem[univ+4*index] # Mem[...+4*dig] Note: Two Memory Fetches 13

160 164 168 Strange Referencing Examples cmu 1 5 2 1 3 univ 16 20 24 28 32 36 36 mit 0 2 1 3 9 16 36 40 44 48 52 56 56 nwu 6 0 2 0 1 56 60 64 68 72 76 Reference ddress Value Guaranteed? univ[2][3] 56+4*3 = 68 0 Yes univ[1][5] 16+4*5 = 36 0 No univ[2][-1] 56+4*-1 = 52 9 No univ[3][-1]???? No univ[1][12] 16+4*12 = 64 2 No Code does not do any bounds checking Ordering of elements in different arrays not guaranteed 14

Using Nested rrays Strengths C compiler handles doubly subscripted arrays Generates very efficient code voids multiply in index computation Limitation Only works if have fixed array size (i,*) (*,k) #define N 16 typedef int fix_matrix[n][n]; /* Compute element i,k of fixed matrix product */ int fix_prod_ele (fix_matrix a, fix_matrix b, int i, int k) { int j; int result = 0; for (j = 0; j < N; j++) result += a[i][j]*b[j][k]; return result; } Row-wise B Column-wise 15

Dynamic Nested rrays Strength Can create matrix of arbitrary size Programming Must do index computation explicitly Performance ccessing single element costly Must do multiplication movl 12(%ebp),%eax movl 8(%ebp),%edx imull 20(%ebp),%eax addl 16(%ebp),%eax movl (%edx,%eax,4),%eax int * new_var_matrix(int n) { return (int *) calloc(sizeof(int), n*n); } int var_ele (int *a, int i, int j, int n) { return a[i*n+j]; } # i # a # n*i # n*i+j # Mem[a+4*(i*n+j)] 16

Concept Structures Contiguously-allocated region of memory Refer to members within structure by names Members may be of different types Hidden C++ fields vtable pointer typeinfo field struct rec { int i; int a[3]; int *p; }; Memory Layout i a p 0 4 16 20 ccessing Structure Member void set_i(struct rec *r, int val) { r->i = val; } ssembly # %eax = val # %edx = r movl %eax,(%edx) # Mem[r] = val 17

Generating Pointer to Structure Member struct rec { int i; int a[3]; int *p; }; Generating Pointer to rray Element Offset of each structure member determined at compile time r i a p 0 4 16 r + 4 + 4*idx int * find_a (struct rec *r, int idx) { return &r->a[idx]; } # %ecx = idx # %edx = r leal 0(,%ecx,4),%eax # 4*idx leal 4(%eax,%edx),%eax # r+4*idx+4 18

Structure Referencing (Cont.) C Code struct rec { int i; int a[3]; int *p; }; void set_p(struct rec *r) { r->p = &r->a[r->i]; } i a p 0 4 16 i a 0 4 16 Element i # %edx = r movl (%edx),%ecx # r->i leal 0(,%ecx,4),%eax # 4*(r->i) leal 4(%edx,%eax),%eax # r+4+4*(r->i) movl %eax,16(%edx) # Update r->p 19

ligned Data lignment Primitive data type requires K bytes ddress must be multiple of K Required on some machines; advised on I32 treated differently by Linux and Windows! Motivation for ligning Data Memory accessed by (aligned) double or quad-words Inefficient to load or store datum that spans quad word boundaries Virtual memory very tricky when datum spans 2 pages Compiler Inserts gaps in structure to ensure correct alignment of fields 20

Specific Cases of lignment Size of Primitive Data Type: 1 byte (e.g., char) no restrictions on address 2 bytes (e.g., short) lowest 1 bit of address must be 0 2 4 bytes (e.g., int, float, char *, etc.) lowest 2 bits of address must be 00 2 8 bytes (e.g., double) Windows (and most other OS s & instruction sets):» lowest 3 bits of address must be 000 2 Linux:» lowest 2 bits of address must be 00 2» i.e. treated the same as a 4 byte primitive data type 12 bytes (long double) Linux:» lowest 2 bits of address must be 00 2» i.e. treated the same as a 4 byte primitive data type 21

Satisfying lignment with Structures Offsets Within Structure Must satisfy element s alignment requirement Overall Structure Placement Each structure has alignment requirement K Largest alignment of any element Initial address & structure length must be multiples of K Example (under Windows): K = 8, due to double element struct S1 { char c; int i[2]; double v; } *p; c i[0] i[1] v p+0 p+4 p+8 p+16 p+24 Multiple of 4 Multiple of 8 Multiple of 8 Multiple of 8 22

Linux vs. Windows Windows (including Cygwin): K = 8, due to double element struct S1 { char c; int i[2]; double v; } *p; c i[0] i[1] v p+0 p+4 p+8 p+16 p+24 Multiple of 4 Multiple of 8 Multiple of 8 Multiple of 8 Linux: K = 4; double treated like a 4-byte data type c i[0] i[1] v p+0 p+4 p+8 p+12 p+20 Multiple of 4 Multiple of 4 Multiple of 4 Multiple of 4 23

Effect of Overall lignment Requirement struct S2 { double x; int i[2]; char c; } *p; p must be multiple of: 8 for Windows 4 for Linux x i[0] i[1] c p+0 p+8 p+12 p+16 Windows: p+24 Linux: p+20 struct S3 { float x[2]; int i[2]; char c; p must be multiple of 4 (in either OS) } *p; x[0] x[1] i[0] i[1] c p+0 p+4 p+8 p+12 p+16 p+20 24

Ordering Elements Within Structure struct S4 { char c1; double v; char c2; int i; } *p; 10 bytes wasted space in Windows c1 v c2 i p+0 p+8 p+16 p+20 p+24 struct S5 { double v; char c1; char c2; int i; } *p; 2 bytes wasted space v c1 c2 i p+0 p+8 p+12 p+16 25

rrays of Structures Principle llocated by repeating allocation for array type In general, may nest arrays & structures to arbitrary depth struct S6 { short i; float v; short j; } a[10]; a[1].i a[1].v a[1].j a+12 a+16 a+20 a+24 a[0] a[1] a[2] a+0 a+12 a+24 a+36 26

ccessing Element within rray Compute offset to start of structure Compute 12*i as 4*(i+2i) ccess element according to its offset within structure Offset by 8 ssembler gives displacement as a + 8» Linker must set actual value struct S6 { short i; float v; short j; } a[10]; short get_j(int idx) { return a[idx].j; } # %eax = idx leal (%eax,%eax,2),%eax # 3*idx movswl a+8(,%eax,4),%eax a+0 a[0] a[i] a+12i a[i].i a+12i a[i].v 27 a+12i+8 a[i].j

Principles Union llocation Overlay union elements llocate according to largest element Can only use one field at a time union U1 { char c; int i[2]; double v; } *up; struct S1 { char c; int i[2]; double v; } *sp; c i[0] i[1] v up+0 up+4 up+8 (Windows alignment) c i[0] i[1] v sp+0 sp+4 sp+8 sp+16 sp+24 28

Implementing Tagged Union Structure can hold 3 kinds of data Only one form at any given time Identify particular kind with flag type typedef enum { CHR, INT, DBL } utype; typedef struct { utype type; union { char c; int i[2]; double v; } e; } store_ele, *store_ptr; store_ele k; k.e.c k.type k.e.i[0] k.e.i[1] k.e.v k.e 29

I32 Floating Point History 8086: first computer to implement IEEE FP separate 8087 FPU (floating point unit) 486: merged FPU and Integer Unit onto one chip Summary Hardware to add, multiply, and divide Floating point data registers Various control & status registers Floating Point Formats single precision (C float): 32 bits double precision (C double): 64 bits extended precision (C long double): 80 bits Integer Unit Instruction decoder and sequencer Data Bus FPU 30

FPU Data Register Stack FPU register format (extended precision) 79 78 64 63 s exp frac 0 FPU register stack stack grows down wraps around from R0 -> R7 FPU registers are typically referenced relative to top of stack st(0) is top of stack (Top) followed by st(1), st(2), push: increment Top, load pop: store, decrement Top Run out of stack? Overwrite! absolute view R7 R6 R5 R4 R3 R2 R1 R0 stack view st(5) st(4) st(3) st(2) st(1) st(0) st(7) st(6) Top stack grows down 31

FPU instructions Large number of floating point instructions and formats ~50 basic instruction types load, store, add, multiply sin, cos, tan, arctan, and log! Sampling of instructions: Instruction Effect Description fldz push 0.0 Load zero flds S push S Load single precision real fmuls S st(0) <- st(0)*s Multiply faddp st(1) <- st(0)+st(1); pop dd and pop 32

Floating Point Code Example Compute Inner Product of Two Vectors Single precision arithmetic Scientific computing and signal processing workhorse pushl %ebp movl %esp,%ebp pushl %ebx # setup float ipf (float x[], float y[], int n) { int i; float result = 0.0; } for (i = 0; i < n; i++) { result += x[i] * y[i]; } return result; movl 8(%ebp),%ebx # %ebx=&x movl 12(%ebp),%ecx # %ecx=&y movl 16(%ebp),%edx # %edx=n fldz # push +0.0 xorl %eax,%eax # i=0 cmpl %edx,%eax # if i>=n done jge.l3.l5: flds (%ebx,%eax,4) # push x[i] fmuls (%ecx,%eax,4) # st(0)*=y[i] faddp # st(1)+=st(0); pop incl %eax # i++ cmpl %edx,%eax # if i<n repeat jl.l5.l3: movl -4(%ebp),%ebx # finish leave ret # st(0) = result 33

Inner product stack trace 1. fldz 2. flds (%ebx,%eax,4) 3. fmuls (%ecx,%eax,4) st(0) 0 st(1) 0 st(1) 0 st(0) x[0] st(0) x[0]*y[0] 4. faddp %st,%st(1) 5. flds (%ebx,%eax,4) 6. fmuls (%ecx,%eax,4) st(0) 0 + x[0]*y[0] st(1) 0 + x[0]*y[0] st(1) 0 + x[0]*y[0] st(0) x[1] st(0) x[1]*y[1] 7. faddp %st,%st(1) st(0) 0 + x[0]*y[0] + x[1]*y[1] 34