CS 61C: Great Ideas in Computer Architecture: Compilers and Floating Point


CS 61C: Great Ideas in Computer Architecture: Compilers and Floating Point
Instructors: Krste Asanovic, Randy H. Katz
http://inst.eecs.berkeley.edu/~cs61c/fa12
Fall 2012, Lecture #13

New-School Machine Structures (It's a bit more complicated!)
Software:
- Parallel Requests, assigned to computer (e.g., search "Katz")
- Parallel Threads, assigned to core (e.g., lookup, ads)
- Parallel Instructions: >1 instruction @ one time (e.g., 5 pipelined instructions)
- Parallel Data: >1 data item @ one time (e.g., add of 4 pairs of words: A0+B0, A1+B1, A2+B2, A3+B3)
- Hardware descriptions: all gates @ one time
- Programming languages harness parallelism & achieve high performance
Hardware (slide diagram): Warehouse Scale Computer > Computer > Core (Instruction Unit(s), Functional Unit(s), Cache Memory) > Logic Gates; plus Memory, Input/Output; Smart Phone

Big Idea #1: Levels of Representation/Interpretation (Today's Lecture)
High Level Language Program (e.g., C):

    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

-> Compiler ->
Assembly Language Program (e.g., MIPS):

    lw  $t0, 0($2)
    lw  $t1, 4($2)
    sw  $t1, 0($2)
    sw  $t0, 4($2)

-> Assembler ->
Machine Language Program (MIPS), since anything can be represented as a number, i.e., data or instructions:

    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111

-> Machine Interpretation ->
Hardware Architecture Description (e.g., block diagrams)
-> Architecture Implementation ->
Logic Circuit Description (circuit schematic diagrams)

Agenda
- Review
- Compilers
- Administrivia
- Floating Point Revisited
- And in Conclusion, ...

Translation and Startup (§2.12, Translating and Starting a Program)
- Many compilers produce object modules directly
- Static linking

What's a Compiler?
- Compiler: a program that accepts as input a program text in a certain language and produces as output a program text in another language, while preserving the meaning of that text
- The text must comply with the syntax rules of whichever programming language it is written in
- A compiler's complexity depends on the syntax of the language and how much abstraction the programming language provides; a C compiler is much simpler than a C++ compiler
- The compiler executes before the compiled program runs
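To make the Compiler step above concrete, here is a self-contained C version of the slides' swap example. The swap function wrapper and the driver in main are illustrative additions, not from the slides; the comments sketch how a MIPS compiler could translate each statement, assuming $2 holds the address of v[k].

    #include <stdio.h>

    /* The slides' three-statement swap, wrapped in a function. */
    void swap(int v[], int k) {
        int temp = v[k];   /* roughly: lw $t0, 0($2)                 */
        v[k] = v[k+1];     /*          lw $t1, 4($2) ; sw $t1, 0($2) */
        v[k+1] = temp;     /*          sw $t0, 4($2)                 */
    }

    int main(void) {
        int v[2] = {1, 2};
        swap(v, 0);
        printf("%d %d\n", v[0], v[1]);   /* prints: 2 1 */
        return 0;
    }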

Compiled Languages: Edit-Compile-Link-Run
Source files -> Compiler -> Object files; Object files -> Linker -> Executable program

Compiler Optimization: gcc compiler options
- -O1: the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time
- -O2: optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code
- -O3: optimize yet more. All -O2 optimizations, and also turns on -finline-functions, ...

gcc Optimization Experiment

                        BubbleSort.c    Dhrystone.c
    MacAir 3.1  No Opt  3.2 s           7.4 s          3,125,000 dhrys/s
    MacAir 3.1  -O2     1.5 s           2.7 s          8,333,333 dhrys/s
    MacAir 5.1  No Opt  1.9 s (1.7x)    3.0 s (2.4x)   8,333,333 dhrys/s (2.7x)
    MacAir 5.1  -O2     0.8 s (1.9x)    0.7 s (3.8x)   25,000,000 dhrys/s (3x)

Performance Equation
Time = Seconds/Program = (Instructions/Program) x (Clock cycles/Instruction) x (Seconds/Clock cycle)
The compiler affects this, chiefly the Instructions/Program term! For example, 10^9 instructions x 2 cycles/instruction x 1 ns/cycle = 2 seconds.

What is the Typical Benefit of Optimization?
What is a typical program? For now, try a toy program: BubbleSort.c

    #define ARRAY_SIZE 20000
    int main() {
        int iarray[ARRAY_SIZE], x, y, holder;
        for (x = 0; x < ARRAY_SIZE; x++)
            for (y = 0; y < ARRAY_SIZE - 1; y++)
                if (iarray[y] > iarray[y+1]) {
                    holder = iarray[y+1];
                    iarray[y+1] = iarray[y];
                    iarray[y] = holder;
                }
    }

Unoptimized MIPS Code
[The slide's roughly 40-instruction unoptimized listing was garbled in transcription: its multi-column layout was interleaved. Its point survives: without optimization, every reference to x, y, and iarray goes through the stack (offsets such as 80016($sp) and 80020($sp)), with redundant loads, stores, and address arithmetic on every loop iteration.]
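A hedged sketch of how one might rerun the experiment above on a current machine; the slides don't show their harness, so the array initialization and the clock() timing here are illustrative choices.

    /* Build and run at different optimization levels, e.g.:
         gcc -O0 bubble.c -o bubble && ./bubble
         gcc -O2 bubble.c -o bubble && ./bubble              */
    #include <stdio.h>
    #include <time.h>
    #define ARRAY_SIZE 20000

    static int iarray[ARRAY_SIZE];   /* static: avoids a ~80 KB stack frame */

    int main(void) {
        for (int x = 0; x < ARRAY_SIZE; x++)
            iarray[x] = ARRAY_SIZE - x;            /* reversed: worst case */

        clock_t start = clock();
        for (int x = 0; x < ARRAY_SIZE; x++)
            for (int y = 0; y < ARRAY_SIZE - 1; y++)
                if (iarray[y] > iarray[y+1]) {
                    int holder = iarray[y+1];
                    iarray[y+1] = iarray[y];
                    iarray[y] = holder;
                }
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        /* Print a result so the optimizer cannot discard the sort. */
        printf("iarray[0] = %d, time = %.2f s\n", iarray[0], secs);
        return 0;
    }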

-O2 Optimized MIPS Code (the same loop nest after gcc -O2; the indices stay in registers, so the per-iteration stack traffic is gone):

    li    $13,65536
    ori   $13,$13,0x3890
    addu  $13,$13,$sp
    sw    $28,0($13)
    move  $4,$0
    addu  $8,$sp,16
    $L6:
    move  $3,$0
    addu  $9,$4,1
    .p2align 3
    $L10:
    sll   $2,$3,2
    addu  $6,$8,$2
    addu  $7,$3,1
    sll   $2,$7,2
    addu  $5,$8,$2
    lw    $3,0($6)
    lw    $4,0($5)
    slt   $2,$4,$3
    beq   $2,$0,$L9
    sw    $3,0($5)
    sw    $4,0($6)
    $L9:
    move  $3,$7
    slt   $2,$3,19999
    bne   $2,$0,$L10
    move  $4,$9
    slt   $2,$4,20000
    bne   $2,$0,$L6
    li    $12,65536
    ori   $12,$12,0x38a0
    addu  $13,$12,$sp
    addu  $sp,$sp,$12
    j     $31

What's an Interpreter?
- Reads and executes source statements, one at a time
- No linking
- No machine code generation, so more portable
- Starts executing quicker, but runs much more slowly than compiled code
- Performing the actions straight from the text allows better error checking and reporting to be done
- The interpreter stays around during execution; unlike a compiler, some work is done after the program starts
- Writing an interpreter is much less work than writing a compiler

Interpreted Languages: Edit-Run
Source files -> Interpreter

Compiler vs. Interpreter Advantages
Compilation:
- Faster execution
- Single file to execute
- The compiler can do better diagnosis of syntax and semantic errors, since it has more info than an interpreter (an interpreter only sees one line at a time)
- Can find syntax errors before running the program
- The compiler can optimize code
Interpreter:
- Easier to debug a program
- Faster development time

Compiler vs. Interpreter Disadvantages
Compilation:
- Harder to debug a program
- Takes longer to change source code, recompile, and relink
Interpreter:
- Slower execution times
- No optimization
- Need all of the source code available
- Source code is larger than executable code for large systems
- The interpreter must remain installed while the program is interpreted

Java's Hybrid Approach: Compiler + Interpreter
- A Java compiler converts Java source code into instructions for the Java Virtual Machine (JVM)
- These instructions, called bytecodes, are the same for any computer / OS
- A CPU-specific Java interpreter interprets bytecodes on a particular computer

Java's Compiler + Interpreter
Hello.java -> Compiler -> Hello.class (bytecodes) -> Interpreter -> "Hello, World!"

Why Bytecodes?
- Platform-independent
- Load from the Internet faster than source code
- The interpreter is faster and smaller than it would be for Java source
- Source code is not revealed to end users
- The interpreter performs additional security checks, screens out malicious code

JVM uses Stack vs. Registers: Java Bytecodes (Stack) vs. MIPS (Reg.)

    a = b + c;  =>
        iload b     ; push b onto Top Of Stack (TOS)
        iload c     ; push c onto Top Of Stack (TOS)
        iadd        ; Next to top Of Stack (NOS) = Top Of Stack (TOS) + NOS
        istore a    ; store TOS into a and pop stack

Starting Java Applications
- A simple, portable instruction set for the JVM
- The interpreter interprets bytecodes
- A Just In Time (JIT) compiler translates bytecodes into machine language just before execution: it compiles the bytecodes of "hot" methods into native code for the host machine

Agenda
- Review
- Compilers
- Administrivia
- Floating Point Revisited
- And in Conclusion, ...
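To show how the stack discipline above actually executes, here is a toy interpreter in C for just these three bytecodes. The opcode names, encoding, and variable table are invented for illustration; real JVM bytecode differs in detail.

    #include <stdio.h>

    enum { ILOAD, IADD, ISTORE };              /* illustrative opcode set */

    int main(void) {
        int vars[3] = {0, 7, 35};              /* a = vars[0], b = vars[1], c = vars[2] */
        int program[] = {
            ILOAD, 1,                          /* iload b  : push b onto TOS       */
            ILOAD, 2,                          /* iload c  : push c onto TOS       */
            IADD,                              /* iadd     : NOS = TOS + NOS, pop  */
            ISTORE, 0                          /* istore a : pop TOS into a        */
        };
        int n = sizeof program / sizeof program[0];
        int stack[16], sp = 0;                 /* operand stack; sp = next free slot */

        for (int pc = 0; pc < n; ) {
            switch (program[pc++]) {
            case ILOAD:  stack[sp++] = vars[program[pc++]];  break;
            case IADD:   sp--; stack[sp-1] += stack[sp];     break;
            case ISTORE: vars[program[pc++]] = stack[--sp];  break;
            }
        }
        printf("a = %d\n", vars[0]);           /* prints: a = 42 */
        return 0;
    }

Note how, unlike the MIPS register version, the operands are named implicitly by their stack position, which is part of why bytecodes can be so compact.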

Administrivia
- Lab #5: MIPS Assembly
- HW #4 (of six), due Sunday
- Project 2a: MIPS Emulator, due Sunday
- Midterm, two weeks from Tuesday

CS61c in the News: "Data Barns"
"Most data centers, by design, consume vast amounts of energy in an incongruously wasteful manner, interviews and documents show. Online companies typically run their facilities at maximum capacity around the clock, whatever the demand. As a result, data centers can waste 90 percent or more of the electricity they pull off the grid, The Times found."
"Worldwide, the digital warehouses use about 30 billion watts of electricity, roughly equivalent to the output of 30 nuclear power plants, according to estimates industry experts compiled for The Times."
"The consulting firm McKinsey & Company analyzed energy use by data centers and found that, on average, they were using only 6 percent to 12 percent of the electricity powering their servers to perform computations. The rest was essentially used to keep servers idling and ready in case of a surge in activity that could slow or crash their operations."

Get To Know Your Professor

Agenda
- Review
- Compilers
- Administrivia
- Floating Point Revisited
- And in Conclusion, ...

Goals for Floating Point
- Standard arithmetic for reals for all computers, like two's complement
- Keep as much precision as possible in the formats
- Help the programmer with errors in real arithmetic: +infinity, -infinity, Not-A-Number (NaN), exponent overflow, exponent underflow
- Keep an encoding that is somewhat compatible with two's complement: e.g., 0 in floating point is 0 in two's complement; make it possible to sort without needing to do a floating point comparison

Floating Point: Representing Very Small Numbers
- Zero: the bit pattern of all 0s is the encoding for 0.000...
- But 0 in the exponent field should mean the most negative exponent (we want 0 to be next to the smallest real), so we can't use two's complement (1000 0000_two)
- Bias notation: subtract a bias from the exponent field. Single precision uses a bias of 127; double precision uses 1023
- 0 uses 0000 0000_two => 0 - 127 = -127; infinity and NaN use 1111 1111_two => 255 - 127 = +128
- Smallest SP real that can be represented: 1.00...00 x 2^-126
- Largest SP real that can be represented: 1.11...11 x 2^+127

What If an Operation's Result Doesn't Fit in 32 Bits?
- Overflow: calculating too big a number to represent within a word
  - Unsigned numbers: 1 + 4,294,967,295 (2^32 - 1)
  - Signed numbers: 1 + 2,147,483,647 (2^31 - 1)
- How to handle it depends on the programming language: C signed number arithmetic ignores overflow; other languages want an overflow signal on signed numbers (e.g., Fortran)
- What's a computer architect to do?

MIPS Solution: Offer Both
- Instructions that can trigger overflow: add, sub, mult, div, addi, multi, divi
- Instructions that don't overflow are called "unsigned" (really means "no overflow"): addu, subu, multu, divu, addiu, multiu, diviu
- Given the semantics of C, always use the unsigned versions
- Note: slt and slti do signed comparisons, while sltu and sltiu do unsigned comparisons; nothing to do with overflow. When would you get a different answer for slt vs. sltu? (Student Roulette?)

What About Real Numbers in Base 2?
- r x E^i, where E is the exponent base (2), i is a positive or negative integer, and r is a real number with 1.0 <= r < 2
- The computer's version of normalized scientific notation is called Floating Point notation

Floating Point Numbers
- A 32-bit word has 2^32 patterns, so this must be an approximation of the real numbers >= 1.0 and < 2
- IEEE 754 Floating Point Standard:
  - 1 bit for the sign (s) of the floating point number
  - 8 bits for the exponent (E)
  - 23 bits for the fraction (F) (we get 1 extra bit of precision because the leading 1 is implicit)
  - (-1)^s x (1 + F) x 2^E
- Can represent numbers from about 2.0 x 10^-38 to 2.0 x 10^38
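A short C sketch that pulls apart the three single-precision fields just described and confirms the bias-127 encoding. The example value and the memcpy approach are illustrative choices; the field widths and bias come from IEEE 754 itself.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void) {
        float f = -0.75f;                  /* -0.75 = (-1)^1 x 1.5 x 2^-1 */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);    /* view the 32 raw bits        */

        unsigned s  = bits >> 31;          /*  1 sign bit                 */
        unsigned e  = (bits >> 23) & 0xFF; /*  8 exponent bits, biased    */
        unsigned fr = bits & 0x7FFFFF;     /* 23 fraction bits            */

        printf("s=%u  E=%u (true exponent %d)  F=0x%06X\n",
               s, e, (int)e - 127, fr);
        /* prints: s=1  E=126 (true exponent -1)  F=0x400000 */
        return 0;
    }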

The Number Line
With bias notation (+127), encodings sort in the same order as the values they represent: from -1 x 2^127 up through -1 x 2^-126 (getting closer to zero), zero, 1 x 2^-126, and on up to 1 x 2^127, with the all-ones exponent reserved for infinity and NaN. Using (-1)^s x (1 + F) x 2^E, with S in bit 31, E in bits 30-23, and F in bits 22-0:

    Value         S   E (stored)           F
    +infinity     0   1111 1111            00000000000000000000000
    1 x 2^127     0   1111 1110 (= 254)    00000000000000000000000   (127 + 127 = 254)
    1 x 2^-126    0   0000 0001 (= 1)      00000000000000000000000   (-126 + 127 = 1)
    0             0   0000 0000            00000000000000000000000
    -1 x 2^-126   1   0000 0001 (= 1)      00000000000000000000000
    -1 x 2^127    1   1111 1110 (= 254)    00000000000000000000000
    -infinity     1   1111 1111            00000000000000000000000

Floating Point Numbers: What about bigger or smaller numbers?
- IEEE 754 Floating Point Standard, Double Precision (64 bits):
  - 1 bit for the sign (s) of the floating point number
  - 11 bits for the exponent (E)
  - 52 bits for the fraction (F) (1 extra bit of precision because the leading 1 is implicit)
  - (-1)^s x (1 + F) x 2^E
- Can represent numbers from about 2.0 x 10^-308 to 2.0 x 10^308
- The 32-bit format is called Single Precision

More Floating Point
- What about 0? The bit pattern of all 0s means 0, so there is no implicit leading 1
- What if you divide 1 by 0? You get the infinity symbols +infinity and -infinity: sign bit 0 or 1, largest exponent, 0 in the fraction
- What if you do something ill-defined (infinity - infinity, 0 / 0)? You get the special symbol NaN, for Not-a-Number: sign bit 0 or 1, largest exponent, nonzero fraction
- What if the result is too big (2 x 10^308 x 2 x 10^2)? You get overflow in the exponent: alert the programmer!
- What if the result is too small (2 x 10^-308 / 2 x 10^2)? You get underflow in the exponent: alert the programmer!

Floating Point Add Associativity?
- A = (1000000.0 + 0.000001) - 1000000.0
- B = (1000000.0 - 1000000.0) + 0.000001
- In single precision floating point arithmetic, A does not equal B: A = 0.000000, B = 0.000001
- Floating point addition is not associative! (Integer addition is associative.) When does this matter? (See the C check after this slide.)

MIPS Floating Point Instructions
- C and Java have single precision (float) and double precision (double) types
- MIPS instructions: .s for single, .d for double
  - Fl. Pt. addition: add.s (single), add.d (double)
  - Fl. Pt. subtraction: sub.s (single), sub.d (double)
  - Fl. Pt. multiplication: mul.s (single), mul.d (double)
  - Fl. Pt. divide: div.s (single), div.d (double)
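The A/B associativity example above can be checked directly; a minimal C sketch (it assumes strict single-precision evaluation, as on typical SSE-based hardware; x87 excess precision could change the outcome):

    #include <stdio.h>

    int main(void) {
        float a = (1000000.0f + 0.000001f) - 1000000.0f;   /* tiny value absorbed */
        float b = (1000000.0f - 1000000.0f) + 0.000001f;   /* tiny value survives */
        printf("A = %f\nB = %f\n", a, b);
        /* prints: A = 0.000000, B = 0.000001 */
        return 0;
    }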

MIPS Floating Point Instructions (continued)
- C and Java have single precision (float) and double precision (double) types
- MIPS instructions: .s for single, .d for double
  - Fl. Pt. comparison, single precision: c.x.s; double precision: c.x.d (x is a condition such as eq, lt, or le)
  - Fl. Pt. branch: bc1t, bc1f (branch on FP condition true/false)
- Since we rarely mix integers and floating point, MIPS has separate registers for floating-point operations: $f0, $f1, ..., $f31
- Double precision uses adjacent even-odd pairs of registers: $f0 and $f1, $f2 and $f3, $f4 and $f5, ..., $f30 and $f31
- We need data transfer instructions for these new registers: lwc1 (load word), swc1 (store word); double precision uses two lwc1 instructions and two swc1 instructions

Who is Bigger than Whom? What's the Representation? (Ones Complement? Twos Complement? Sign and Magnitude?)
- 0000 0000 0000 0000 0000 0000 0000 0000_two > 1111 1111 1111 1111 1111 1111 1111 1111_two
- 1111 1111 1111 1111 1111 1111 1111 1110_two > 1111 1111 1111 1111 1111 1111 1111 1111_two
- 1000 0000 0000 0000 0000 0000 0000 0000_two > 1111 1111 0111 1111 1111 1111 1111 1111_two

Peer Instruction Question
Suppose Big, Tiny, and BigNegative are floats in C, with Big initialized to a big number (e.g., the age of the universe in seconds, 4.32 x 10^17), Tiny to a small number (e.g., seconds per femtosecond, 1.0 x 10^-15), and BigNegative = -Big. Here are two conditionals:
I. (Big * Tiny) * BigNegative == (Big * BigNegative) * Tiny
II. (Big + Tiny) + BigNegative == (Big + BigNegative) + Tiny
Which statement about these is correct?
- Orange: I. is false and II. is false
- Green: I. is false and II. is true
- Pink: I. is true and II. is false

Peer Instruction Answer
Pink: I. is true and II. is false (if we don't consider overflow; there are cases where one side overflows while the other does not!)
- I. works OK if there is no overflow, but because multiplication adds exponents, if Big * BigNegative overflows then that grouping's result is an overflow while the other grouping's is not
- II. The left-hand side is 0; the right-hand side is Tiny

Pitfalls
- Floating point addition is NOT associative
- Some optimizations can change the order of floating point computations, which can change results
- We need to ensure that a floating point algorithm is correct even with optimizations
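A hedged C sketch of the peer instruction experiment; the variable names mirror the slide. With these particular values neither product overflows, so on typical IEEE 754 hardware this prints the slide's answer (I. true, II. false), though rounding is not guaranteed to make I. hold for every choice of values.

    #include <stdio.h>

    int main(void) {
        float Big = 4.32e17f;       /* ~age of the universe in seconds */
        float Tiny = 1.0e-15f;      /* seconds per femtosecond         */
        float BigNegative = -Big;

        int i  = (Big * Tiny) * BigNegative == (Big * BigNegative) * Tiny;
        int ii = (Big + Tiny) + BigNegative == (Big + BigNegative) + Tiny;

        printf("I.  %s\n", i  ? "true" : "false");   /* expected: true  */
        printf("II. %s\n", ii ? "true" : "false");   /* expected: false */
        return 0;
    }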

And, in Conclusion, ...
- Compilers for fast execution; interpreters for debugging, but slow execution
- Hybrid (Java): compiler + interpreter, to try to get the best of both
- Compiler optimization to relieve the programmer
- Floating point is an approximation of the reals