Wednesday, October 4, 2017: Optimizing compilers - source modification; Optimizing compilers - code generation; Your program - miscellaneous


Wednesday, October 4, 2017

Topics for today

    Code improvement
        Optimizing compilers - source modification
        Optimizing compilers - code generation
        Your program - miscellaneous

Optimization

Michael Jackson's rules of optimization:
    Rule 1: Don't do it.
    Rule 2 (experts only): Don't do it yet.

Donald Knuth: "Premature optimization is the root of all evil (or at least most of it) in programming."

Optimizing compilers (p. 297)

There are many possible translations of a particular high-level language program into assembly code. Two measures of the assembly code are its size (in bytes) and the time it takes to run. The "default" translation (the one that is least work for the compiler?) is not likely to have the shortest run time, nor to use the least space. Typically there is a trade-off between the time it takes an object program to run and its size.

[Diagram: a plot with run time on one axis and program size on the other. Each dot represents a possible translation of the source program; D represents the program the compiler produces by default.]

An optimizing compiler is one that does extra work during translation to arrive at a translation that is better than the default (by some measure).

Many compilers permit users to specify the types of optimization that they wish the compiler to perform. [From the manual for the gcc compiler, here is the section listing the optimization options. The default is no optimization.]

Optimization Options

    -falign-functions=n -falign-jumps=n -falign-labels=n -falign-loops=n
    -fbranch-probabilities -fcaller-saves -fcprop-registers -fcse-follow-jumps
    -fcse-skip-blocks -fdata-sections -fdelayed-branch
    -fdelete-null-pointer-checks -fexpensive-optimizations -ffast-math
    -ffloat-store -fforce-addr -fforce-mem -ffunction-sections -fgcse -fgcse-lm
    -fgcse-sm -floop-optimize -fcrossjumping -fif-conversion -fif-conversion2
    -finline-functions -finline-limit=n -fkeep-inline-functions
    -fkeep-static-consts -fmerge-constants -fmerge-all-constants
    -fmove-all-movables -fnew-ra -fno-branch-count-reg -fno-default-inline
    -fno-defer-pop -fno-function-cse -fno-guess-branch-probability -fno-inline
    -fno-math-errno -fno-peephole -fno-peephole2 -funsafe-math-optimizations
    -ffinite-math-only -fno-trapping-math -fno-zero-initialized-in-bss
    -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls
    -fprefetch-loop-arrays -freduce-all-givs -fregmove -frename-registers
    -freorder-blocks -freorder-functions -frerun-cse-after-loop -frerun-loop-opt
    -fschedule-insns -fschedule-insns2 -fno-sched-interblock -fno-sched-spec
    -fsched-spec-load -fsched-spec-load-dangerous -fsignaling-nans
    -fsingle-precision-constant -fssa -fssa-ccp -fssa-dce -fvrp
    -fstrength-reduce -fstrict-aliasing -ftracer -fthread-jumps -ftsp-ordering
    -funroll-all-loops -funroll-loops --param name=value
    -O -O0 -O1 -O2 -O3 -Os

A compilation that uses optimization techniques takes more time because the compiler is doing more work. It is thus most appropriate to select optimization options when producing the final, production version of a program rather than while still at the debugging stage.

Here are five examples of the kinds of optimizations a compiler might perform. Think of them as transformations on the source program, performed before translation into assembly code, even though that is probably not how the compiler implements them.

    1. Detecting common sub-expressions / performing arithmetic at compile time
    2. Detecting common code in branches
    3. Loop unrolling with constant
    4. Strength reduction
    5. Loop unrolling with variable
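As a concrete illustration (the file name opt_demo.c and the function name below are invented for this sketch), here is a small C program containing a common sub-expression, some compile-time arithmetic and a constant-count loop. Compiling it once with -O0 and once with -O2 (both options appear in the list above) and comparing the generated assembly is an easy way to see what the compiler does with such opportunities.

    /* opt_demo.c -- a made-up test case with several optimization opportunities */
    #include <stdio.h>

    int opt_demo(int a, int b)
    {
        int secondsInAWeek = 7 * 24 * 60 * 60;   /* compile-time arithmetic */
        int t = (a + b - 49) * (b + a - 49);     /* common sub-expression   */
        int sum = 0;
        int i;
        for (i = 0; i < 3; i++)                  /* constant-count loop     */
            sum += t;
        return sum + secondsInAWeek;
    }

    int main()
    {
        printf("%d\n", opt_demo(10, 20));
        return 0;
    }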

Example 1: detection of common sub-expressions, or performing compile-time arithmetic

In each of the following assignment statements there is a duplicated expression. A compiler can detect that and generate a single instance of the code that evaluates the expression:

    t = (a + b - 49) * (b + a - 49);

    AR[y * t + 19] = AR[y * t + 19] + 1;

For instance, the second example is transformed to

    temp = y * t + 19;
    AR[temp] = AR[temp] + 1;

probably saving both time and space.

If there is arithmetic that can be carried out at compile time, it makes sense for the compiler to do it. Here is an (extreme) example:

    W = T + ( * 7) * (43 - 6);

The whole of the second term is a constant expression, so the compiler can evaluate it once at compile time and replace the statement by

    W = T + constant;

A more reasonable example is where you have included an expression for readability:

    int secondsInAWeek = 7 * 24 * 60 * 60;
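A small, self-contained sketch of the second transformation done by hand (the array size and the values of y and t are invented for the example); it checks that the original and rewritten forms update the array identically.

    #include <assert.h>
    #include <stdio.h>

    int main()
    {
        int AR1[100] = {0}, AR2[100] = {0};
        int y = 3, t = 7;
        int temp;

        /* Original form: the index expression appears twice. */
        AR1[y * t + 19] = AR1[y * t + 19] + 1;

        /* Transformed form: the common sub-expression is evaluated once. */
        temp = y * t + 19;
        AR2[temp] = AR2[temp] + 1;

        assert(AR1[y * t + 19] == AR2[temp]);
        printf("element %d is %d in both versions\n", temp, AR1[temp]);
        return 0;
    }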

Example 2: common code in branches

Sometimes, typically after multiple edits, a programmer may not realize that both branches of a conditional contain the same action. A compiler can check for this:

    if (t < 0)
    {
        b = 99;
        a = 0;
    }
    else
    {
        a = 0;
        b = 40;
    }

The compiler can in effect transform this into

    if (t < 0)
        b = 99;
    else
        b = 40;
    a = 0;

saving space in the object program.

Example 3: Loop unrolling with constant

Depending on the size of the loop action, the loop overhead (initializing a loop variable, testing it, incrementing or decrementing it) may account for a significant fraction of the loop's run time. A compiler can unroll a loop to reduce or eliminate this overhead. If the compiler can determine the number of iterations, the loop overhead can be eliminated completely; for example,

    for (i = 0; i < 3; i++)
    {
        read(n);
        sum += n;
    }

can be treated as if the user had written

    read(n); sum += n;
    read(n); sum += n;
    read(n); sum += n;

We get rid of the overhead of the loop counter. The new code will run faster but it may take up more space. This is why the user is given so many choices about which optimizations to perform. See Example 5 later for cases where the number of iterations is not a constant.

Example 4: strength reduction

In strength reduction we try to replace an operation by a faster one. In the following example we replace multiplication (a slow operation on most systems) by addition. The compiler treats

    for (i = 1; i < n; i++)
    {
        T = 6 * i;
        output(T);
    }

as if the programmer had written

    T = 0;
    for (i = 1; i < n; i++)
    {
        T = T + 6;
        output(T);
    }
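Here is a minimal, runnable sketch of the two forms in Example 4 (the function names strength_before and strength_after are made up, and printf stands in for the pseudocode output); both print the same sequence, which is easy to confirm by running the program.

    #include <stdio.h>

    /* Original form of Example 4: one multiplication per iteration. */
    void strength_before(int n)
    {
        int i;
        for (i = 1; i < n; i++)
            printf("%d ", 6 * i);
        printf("\n");
    }

    /* Reduced form: the multiplication is replaced by a running addition. */
    void strength_after(int n)
    {
        int i, t = 0;
        for (i = 1; i < n; i++)
        {
            t = t + 6;
            printf("%d ", t);
        }
        printf("\n");
    }

    int main()
    {
        strength_before(10);     /* prints 6 12 18 ... 54    */
        strength_after(10);      /* prints the same sequence */
        return 0;
    }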

If we know the number of iterations of the Example 4 loop, as in

    for (i = 1; i < 10; i++)
    {
        T = 6 * i;
        output(T);
    }

the compiler might be able to reduce the program to

    output(6); output(12); output(18); etc.

Example 5: Loop unrolling with variable

What if the loop count is not a constant, for example for (i = 0; i < N; i++)? The following transformation reduces the time spent on loop overhead by approximately 50%, but the space requirement is increased because we now have three instances of the action:

    for (i = 0; i < N/2; i++)
    {
        action;
        action;
    }
    for (i = 0; i < N%2; i++)
    {
        action;
    }

For example, if N contains 37 then the first loop will iterate 18 times (performing 2 actions each time) and the second loop once. Note that the compiler will typically generate code that avoids calculating N/2 or N%2 more than once.
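A runnable sketch of this two-way unrolling, with the abstract action replaced by a counter increment so that the total number of actions can be checked against the plain loop (the variable names are invented for the example):

    #include <assert.h>
    #include <stdio.h>

    int main()
    {
        int N = 37;
        int i;
        long plain = 0, unrolled = 0;

        /* Plain loop: one action per iteration. */
        for (i = 0; i < N; i++)
            plain++;

        /* Unrolled by 2: N/2 iterations doing two actions, plus the remainder. */
        for (i = 0; i < N / 2; i++)
        {
            unrolled++;
            unrolled++;
        }
        for (i = 0; i < N % 2; i++)
            unrolled++;

        assert(plain == unrolled);
        printf("both loops performed %ld actions\n", plain);
        return 0;
    }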

We can take this idea further with the following, which uses even more space but decreases the run time still further:

    M = N/8;
    for (i = 0; i < M; i++)
    {
        action; action; action; action;
        action; action; action; action;
    }
    for (i = 0; i < N%8; i++)
    {
        action;
    }

If N is 101, for example, then the main loop iterates 12 times (accounting for 96 actions) and then the second loop iterates 5 times, accounting for the remainder.

Duff's Device

Tom Duff proposed combining the two loops into a single loop. Surprisingly, the following is legal C, even though jumping into the middle of a loop is normally frowned on. The first time through the loop only part of the body may be executed (the remainder part of the example above); thereafter come the full passes.

    int t = (N + 7) / 8;
    switch (N % 8)
    {
        case 0: do { action;
        case 7:      action;
        case 6:      action;
        case 5:      action;
        case 4:      action;
        case 3:      action;
        case 2:      action;
        case 1:      action;
                } while (--t > 0);
    }

For example, if N is 85, then t is 11 and N%8 is 5, so we jump to case 5 and do 5 actions as we fall through the cases. Thereafter we perform full passes. There are a total of t (11) passes through the loop: one of 5 actions and 10 of 8 actions, giving us a total of 85.
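This arithmetic can be checked with a self-contained sketch that instantiates the device with a counter as the action (the variable names are invented); running it for N = 85 confirms one partial pass of 5 actions plus 10 full passes of 8.

    #include <stdio.h>

    int main()
    {
        int N = 85;
        long count = 0;              /* the "action" is counting */
        int t = (N + 7) / 8;

        switch (N % 8)
        {
            case 0: do { count++;
            case 7:      count++;
            case 6:      count++;
            case 5:      count++;
            case 4:      count++;
            case 3:      count++;
            case 2:      count++;
            case 1:      count++;
                    } while (--t > 0);
        }

        printf("performed %ld actions for N = %d\n", count, N);  /* prints 85 */
        return 0;
    }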

An article in the August 2005 issue of Dr. Dobb's Journal shows how this code can be wrapped in a macro with (1) a dummy outer loop to provide a local namespace for variables, and (2) logical operations to speed up the divide and mod. Here is a macro definition that you can use in a C program:

    #define DUFF_DEVICE_8(aCount, aAction)       \
    do                                           \
    {                                            \
        int count_ = (aCount);                   \
        int times_ = (count_ + 7) >> 3;          \
        switch (count_ & 7)                      \
        {                                        \
            case 0: do { aAction;                \
            case 7:      aAction;                \
            case 6:      aAction;                \
            case 5:      aAction;                \
            case 4:      aAction;                \
            case 3:      aAction;                \
            case 2:      aAction;                \
            case 1:      aAction;                \
                    } while (--times_ > 0);      \
        }                                        \
    } while (0)

Now the user can simply write something like

    DUFF_DEVICE_8(N, printf("\n"));

Here is the log of a test of Duff's Device (the file duff.h contains the text of the macro above):

    sh-3.00$ cat test2b.c
    #include <stdio.h>
    #include "duff.h"

    int main()
    {
        int N = 13;
        DUFF_DEVICE_8(N, printf("\n"));
    }
    sh-3.00$ gcc test2b.c
    sh-3.00$ a.out

Tweaking source code

In an embedded-system environment there is often tweaking of the source code to cause the compiler to generate the output you want (small and/or fast). This may result in source programs that look inelegant (from the point of view of an instructor in a high-level language programming course).

Example 1: writing code inline instead of using function calls, with their overheads. The good code defines a function (call it f) whose body is the statements A; B; C; and calls it where needed:

    void f()
    {
        A; B; C;
    }
    ...
    f();
    f();

Code that might give you the object program you want repeats the body inline:

    A; B; C;
    A; B; C;

Example 2: avoiding parameter passing.

Good code:

    void pstars(int N)
    {
        for (int i = 0; i < N; i++)
            output('*');
    }
    ...
    pstars(5);
    pstars(17);
    pstars(5);

Code that should give you a faster object program (no parameter passing):

    void p5stars()
    {
        output("*****");
    }
    void p17stars()
    {
        output("*****************");
    }
    ...
    p5stars();
    p17stars();
    p5stars();

Even better (no calls to user functions at all):

    output("*****");
    output("*****************");
    output("*****");
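Here is a compilable sketch of Example 2, with printf/putchar standing in for the pseudocode output and the function names taken from the notes; all three versions print the same three lines of stars.

    #include <stdio.h>

    /* General version: the star count is passed as a parameter. */
    void pstars(int N)
    {
        int i;
        for (i = 0; i < N; i++)
            putchar('*');
        putchar('\n');
    }

    /* Specialised versions: no parameter passing. */
    void p5stars()  { printf("*****\n"); }
    void p17stars() { printf("*****************\n"); }

    int main()
    {
        pstars(5);  pstars(17);  pstars(5);               /* general      */
        p5stars();  p17stars();  p5stars();               /* specialised  */
        printf("*****\n*****************\n*****\n");      /* fully inline */
        return 0;
    }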

Optimizing Compilers: Code generation

The optimizations we have seen so far can be thought of as being applied to the source program, transforming it in some way before translation. What choices might the compiler make when it comes to generating the assembly code? We look at a couple.

(1) Memory vs. registers

It takes longer for the CPU to access memory than to access registers. An optimization is to have the compiler use registers instead of memory where possible. In general, the algorithm for determining which variable is mapped to which register at any time might be complex. However, Pep/9 has only two general-purpose registers, which simplifies the issue. Here is an example of how a C program might be translated into Pep/9 assembly code, maximizing the use of registers:

    sum = 0;
    for (i = 0; i < n; i++)
    {
        read(m);
        sum += m;
    }
    output(sum);

We can use register A to hold sum and register X to hold i, leading to

             ldwa    0,i         ; sum = 0
             ldwx    0,i         ; i = 0
    top:     cpwx    N,d
             brge    done
             deci    M,d
             adda    M,d         ; sum is updated
             addx    1,i         ; i is updated
             br      top
    done:    stwa    sum,d
             deco    sum,d

This is faster than using memory variables i and sum, but it needs comments to help the reader follow the mapping.
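In C the programmer can hint at such a mapping with the register storage class, as in the sketch below (scanf and printf stand in for the pseudocode read and output). This is only a hint; a modern compiler normally makes the allocation decision itself, so treat the example as illustrative rather than as a guaranteed optimization.

    #include <stdio.h>

    int main()
    {
        register int sum = 0;   /* hint: keep sum in a register */
        register int i;         /* hint: keep i in a register   */
        int m, N = 5;

        for (i = 0; i < N; i++)
        {
            if (scanf("%d", &m) != 1)    /* read(m) */
                break;
            sum += m;
        }
        printf("%d\n", sum);             /* output(sum) */
        return 0;
    }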

In general, a compiler determines the scope of variables in a program and determines whether any of them can map to the same register. Consider the following, where a table on the left of the code (it marks, for each of a, b, c and d, the lines where that variable is in use) accompanies the listing:

    int complexfunction()
    {
        int a, b = 0, c, d;
        for (a = 0; a < 1000; a++)
        {
            read(vector[a]);
            b += vector[a];
        }
        for (c = 1; c < 1000; c++)
        {
            vector[c] = vector[c] / b;
            vector[c] = vector[c-1] + vector[c];
        }
        d = 0;
        for (c = 0; c < 1000; c++)
        {
            d += vector[c];
            vector[c] /= d;
        }
        for (a = 0; a < 1000; a++)
            output(vector[a]);
    }

Because they are never in use at the same time, variables a and c can be mapped to the same register. Similarly, variables b and d can be mapped to the same register.

(2) Basic blocks

A basic block is a sequence of statements with only one way in (at the top) and one way out (at the end). In other words, there is no way to jump into the middle of the block and no way to leave it from the middle.
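As a rough companion illustration, the function below (invented for this purpose) has its basic blocks marked by hand in comments: a block ends at each branch and a new one starts at each branch target.

    /* Illustrative only: block boundaries marked by hand. */
    int classify(int x)
    {
        int r;                 /* ---- block 1: entry, runs straight ---- */
        int twice = 2 * x;

        if (twice > 100)       /* block 1 ends at the conditional branch  */
        {
            r = 1;             /* ---- block 2: the "then" branch -------- */
        }
        else
        {
            r = -1;            /* ---- block 3: the "else" branch -------- */
        }

        return r + twice;      /* ---- block 4: the join point ----------- */
    }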

Consider the C fragment

    total  = sum + 5;
    result = sum + 2 * total;
    sum    = 2 * sum + 2 * total;

If the compiler looks at each statement in isolation, it will produce something like the following 12-instruction sequence:

             ldwa    sum,d
             adda    5,i
             stwa    total,d
             ;
             ldwa    sum,d
             adda    total,d
             adda    total,d
             stwa    result,d
             ;
             ldwa    sum,d
             adda    sum,d
             adda    total,d
             adda    total,d
             stwa    sum,d

However, if the compiler takes into account the fact that the three high-level statements constitute a basic block and must be executed in the sequence given, it can take advantage of previous calculations and save 4 instructions, as in

             ldwa    sum,d
             adda    5,i
             stwa    total,d
             ;
             asla
             adda    sum,d
             stwa    result,d
             ;
             adda    sum,d
             stwa    sum,d

Miscellaneous ideas for improving your programs

(1) In looking at the programs you write, note that cpwa 0,i is almost always redundant (the load or arithmetic instruction that precedes it has already set the N and Z status bits).

(2) In the sequence

             stwa    t,d
             ldwa    t,d

the load is redundant because the value of t is still in register A.

(3) Branches to the next line are redundant:

             brge    label
    label:   ...

because the program goes to the labeled line whether or not the condition is met.

Reading

Warford has some remarks on optimization on pages 297 and 298.

We will begin section 6.3 next, looking at subroutines in Pep/9 and how they can be used to implement functions in a high-level language.

Review Questions

1. There are redundancies in the following program, which inputs a number and outputs one of two messages. Identify the instructions that could be removed.

             deci    N,d
             ldwa    N,d
             cpwa    40,i
             brlt    Y
             stro    ONE,d
             br      end
    end:     stop
    Y:       ldwa    N,d
             cpwa    40,i
             brlt
             stro    TWO,d
             br      end
             stop
    N:       .block  2
    ONE:     .ascii  "low\n\x00"
    TWO:     .ascii  "high\n\x00"
             .end

2. Consider the for loop

    for (i = 1; i < limit; i++)
        action

A translation is

             ldwa    1,i
             stwa    i,d
    top:     action
             ldwa    i,d
             adda    1,i
             stwa    i,d
             cpwa    limit,d
             brlt    top

(a) If limit has the value 3, what is the largest number of bytes that action can be so that the unrolled loop is no bigger than the original code?

(b) Suppose that limit has the value 5; what is the largest number of bytes now?

3. The following code inputs N and assigns 15 * N to M.

             deci    N,d
             ldwa    N,d
             ldwx    14,i
    L:       adda    N,d
             subx    1,i
             brgt    L
             stwa    M,d

(a) How much space does it occupy (in bytes), and how many instruction executions are there when it runs?

(b) What are the space and run-time figures if we unroll the loop? Is the unrolled version smaller or larger? Is it faster or slower?

(c) Is there an implementation of the calculation that is both faster and smaller than the original? If so, show how it can be done.

Review Answers

1. The instructions marked *** can be removed:

             deci    N,d
             ldwa    N,d
             cpwa    40,i
             brlt    Y
             stro    ONE,d
             br      end       ***
    end:     stop
    Y:       ldwa    N,d       ***
             cpwa    40,i      ***
             brlt              ***
             stro    TWO,d
             br      end       ***
             stop
    N:       .block  2
    ONE:     .ascii  "low\n\x00"
    TWO:     .ascii  "high\n\x00"
             .end

2. (a) The loop is 21 bytes plus the action. Unrolled, there are 3 copies of the action, so the action can be no more than 10 bytes.

   (b) The action can be no more than 5 bytes.

3. (a) 21 bytes, and 46 instruction executions at run time.

   (b) 51 bytes and 17 instruction executions, so the unrolled version is larger but faster.

   (c) Yes. The following is 16 bytes and 8 instructions:

             deci    N,d
             ldwa    N,d
             asla              ; 2N
             asla              ; 4N
             asla              ; 8N
             asla              ; 16N
             suba    N,d       ; 15N
             stwa    M,d
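The shift-and-subtract idea in answer 3(c) can be written directly in C; here is a minimal sketch (the function name times15 and the check against plain multiplication are just for illustration).

    #include <assert.h>
    #include <stdio.h>

    /* Compute 15 * n as (n << 4) - n: four left shifts give 16n, then subtract n. */
    int times15(int n)
    {
        return (n << 4) - n;
    }

    int main()
    {
        int n = 7;
        assert(times15(n) == 15 * n);
        printf("15 * %d = %d\n", n, times15(n));
        return 0;
    }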
