A QUICK INTRO TO PRACTICAL OPTIMIZATION TECHNIQUES

SW OPTIMIZATION (Courtesy M. Romdhane, Conexant)

0. NO SILVER BULLETS HERE.

1. Set Compiler Options Appropriately:

Select processor architecture:
- Enables the compiler to make full use of the instructions supported by the processor.
- The compiler performs processor-specific optimizations.
- E.g., use -proc ARM7.
- Also use the most appropriate architecture (-arch N); 4 and 4T enable halfword instructions.

Debugging Options:
- Debugging options affect both code size and performance significantly.
- To allow efficient debugging, a varying level of optimizations is disabled. (Some optimizations produce code that cannot be described in debugging tables.)
- Switch off debugging when code size and/or performance is important.
- Debug options: -g, -gr, -go
- -go increases code size by 7-15%.

Optimization Options:
- For time/speed: -Otime
- For code size: -Ospace

Use appropriate ARM Procedure Call Standard (APCS) options:
    -apcs /wide
    -apcs /fp
    -apcs /swst/fp

Recommendation: experiment with these options for the project:
    -Ospace
    -Otime
    -Otime -apcs /wide
    -Otime -go
    -Otime -g
    -Otime -gr
    -Otime -apcs /fp
    -Otime -apcs /swst/fp

2. Division & Remainder

- Divisions are typically implemented by calling a C library function: __rt_sdiv and __rt_udiv.
- Divisions are very expensive in cycles (rule of thumb: N cycles).
- Avoid division when possible; use algebraic substitutions: (x/y) > z can be replaced by x > (z*y). This holds as written only for positive y when the division is exact and z*y does not overflow; with truncating integer division, non-negative x and positive y, the equivalent test is x >= (z+1)*y.
- Combine both division and remainder when needed:

    int combined_div_mod(int a, int b)
    {
        return (a / b) + (a % b);
    }

- Use power-of-2 division when possible (e.g., use a/16 instead of a/15, 128 instead of 100):

    typedef unsigned int uint;

    uint div16u(uint a)
    {
        return a / 16;
    }

- Avoid modulo arithmetic (it uses the remainder) when possible:

    uint counter1(uint count)
    {
        return (++count % 60);
    }

Instead use this:

    uint counter2(uint count)
    {
        if (++count >= 60)
            count = 0;
        return (count);
    }

- Division by a constant? Use lookup tables.

3. Conditional Execution

- All ARM instructions are conditional. Each instruction contains a 4-bit field which is a condition code. The instruction is only executed if the ARM flag bits indicate that the specified condition is true.
- Conditional execution is applied mostly in the body of if statements and while evaluating complex expressions with relational (<, ==, >) and Boolean operators (&&, !, ||).
- Typically it starts with a compare instruction followed by a few conditional instructions.
- It reduces the number of branch instructions and, therefore, improves code size and performance. One branch instruction takes about 2.5 ARM7 cycles.

Recommendation: to enable the compiler to use conditional instructions, keep the bodies of if/else statements as simple as possible. Relational expressions should be grouped into blocks of similar conditions: generate all flags and stream through the code without branch instructions.

4. Compare with zero

A compare with zero can be avoided if the code can directly test the N and Z flags (Z: result is zero, N: result is negative).

    ADD  R0, R0, R1
    CMP  R0, #0

produces identical N and Z flags as

    ADDS R0, R0, R1

However, the C language has no concept of a carry flag or overflow flag, so it is not possible to test the C or V flag bits directly without inline assembler. The compiler supports the carry flag.

5. LOOPS

The loop termination condition can cause significant overhead if written without caution. Always use count-down-to-zero loops and use simple termination conditions.

Example: n!

    int fact1(int n)
    {
        int i, fact = 1;
        for (i = 1; i <= n; i++)
            fact *= i;
        return (fact);
    }

    int fact2(int n)
    {
        int i, fact = 1;
        for (i = n; i != 0; i--)
            fact *= i;
        return (fact);
    }

fact2 can use SUBS instead of ADD/CMP, because the compare with zero can be optimized away. This saves one instruction in the loop.
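The same count-down idiom can be sketched on a different loop. This is only an illustration: sum_up and sum_down are hypothetical helper names, not part of the original notes, and whether a particular compiler really emits a flag-setting subtract for the second form depends on the compiler and the options chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Count-up form: i is compared against n on every iteration. */
int sum_up(const int *data, unsigned n)
{
    unsigned i;
    int sum = 0;

    for (i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Count-down form: n itself is the counter and the loop ends at zero,
   so the decrement can provide the termination test. */
int sum_down(const int *data, unsigned n)
{
    int sum = 0;

    assert(data != NULL || n == 0);   /* illustrative precondition */
    for (; n != 0; n--)
        sum += *data++;
    return sum;
}
```

Both functions return the same result for the same input; only the shape of the loop control differs.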

Also, in fact2 the variable n does not need to be saved across the loop, so a register is also saved. This eases register allocation and leads to more efficient code elsewhere in the function (2 more instructions saved).

This observation (initializing the loop counter to the number of iterations required and then decrementing down to zero) also applies to while and do statements.

6. LOOP UNROLLING

Small loops can be unrolled for higher performance at the expense of increased code size. When a loop is unrolled, the loop counter needs to be updated less often and fewer branches are executed. If the iteration count is small, a loop can be fully unrolled and the loop overhead disappears completely. Loop unrolling is not supported by the compiler and should be done manually.

    int countbit1(uint n)
    {
        int bits = 0;
        while (n != 0) {
            if (n & 1) bits++;
            n >>= 1;
        }
        return bits;
    }

Here, if we assume ARM7, checking a single bit takes 6 cycles, and the code size is only 9 instructions.

    int countbit2(uint n)
    {
        int bits = 0;
        while (n != 0) {
            if (n & 1) bits++;
            if (n & 2) bits++;
            if (n & 4) bits++;
            if (n & 8) bits++;
            n >>= 4;    /* shift right by 4 */
        }
        return bits;
    }

The above code checks 4 bits at a time, taking on average 3 cycles per bit. However, the code size is 15 instructions.

7. REGISTER ALLOCATION

This is the process where the compiler allocates variables to ARM registers rather than to memory. It has a dramatic effect on both speed and memory, as these variables can be accessed quickly without needing instructions to transfer them to/from memory.

You can write code which enables the compiler to achieve a more optimal register allocation. All basic integer, pointer and floating-point types, fields of structures and complete structures can be allocated to registers.

A variable may be allocated to a register if:
- it is a local variable or a function parameter, and
- its address is never taken, or its address is taken but not assigned to another variable.
A field in a structure may be allocated to a register if:
- it is declared locally or is a function parameter, and
- the structure is not assigned directly with the result of a function call, and
- neither the address of the structure nor the address of any of its fields is taken, or if any of these addresses is taken, it is not assigned to another variable.

8. ALIASING/POINTERS

Pointers must be used carefully or poor code can be produced.

If the address of a variable is taken, the compiler must assume that the variable can be changed by any assignment through a pointer or by any function call. That makes it impossible to put it into a register. This is also true for global variables, as they might have their address taken in some other function.

This is the pointer aliasing problem. Some other compilers ignore it, but the ARM compiler does not, because this rule is part of the ANSI/ISO standard (ignoring it can produce untraceable bugs).

Rules of thumb on local and global variables:
- Avoid taking the address of local variables.
- Avoid global variables.
- Avoid pointer chains.

9. LOCAL VARIABLES

Sometimes it is necessary to take the address of variables, for example if they are passed as a reference parameter to a function. This means that these variables cannot be allocated to registers.

SOLUTION: Make a copy of the variable and pass the address of that copy instead.

EXAMPLE:

    void f(int *a);
    int g(int a);

    int test1(int i)
    {
        f(&i);      /* address of i is taken: cannot allocate a register */
        /* now use 'i' extensively */
        i += g(i);
        i += g(i);
        return i;
    }

    int test2(int i)
    {
        int dummy = i;
        f(&dummy);
        i = dummy;
        i += g(i);
        i += g(i);
        return i;
    }

10. GLOBAL VARIABLES

Global variables are never allocated to registers (unless the __global_reg feature is used). Global variables can be changed by assigning to them indirectly through a pointer, or by a function call. Hence the compiler cannot cache the variable in a register, which results in extra (often unnecessary) loads and stores when globals are used.

Rule of thumb: NEVER USE GLOBAL VARIABLES IN CRITICAL LOOPS.
If a function uses global variables heavily, then when possible and when it makes sense, copy those global variables into local variables so that they can be assigned to registers.

Of course, this is possible only if those globals are not used by any of the functions which are called.

EXAMPLE:

    int f(void);
    int g(void);
    int errs;

    void test1(void)
    {
        errs += f();
        errs += g();
    }

    void test2(void)
    {
        int localerrs = errs;
        localerrs += f();
        localerrs += g();
        errs = localerrs;
    }

Here, test1 must load and store the global errs value each time it is incremented, whereas test2 stores localerrs in a register and needs only a single instruction.

11. POINTER CHAINS

Example:

    typedef struct { int x, y, z; } Point3;
    typedef struct { Point3 *pos, *direction; } Object;

    void InitPos1(Object *p)
    {
        p->pos->x = 0;
        p->pos->y = 0;
        p->pos->z = 0;
    }

This code must reload p->pos for each assignment. Instead, cache p->pos in a local variable:

    void InitPos2(Object *p)
    {
        Point3 *pos = p->pos;
        pos->x = 0;
        pos->y = 0;
        pos->z = 0;
    }

An alternative is to avoid pointers in the first place by including the Point3 structure in the Object structure.

12. LIVE VARIABLES & SPILLING

These have an effect on the quality of register allocation. ARM has 14 integer registers available. The ARM compiler supports live-range spilling.

LIVE RANGE OF A VARIABLE: all statements between an assignment to the variable and its last usage before the next assignment.

In this range the value of the variable is valid, thus alive. In between live ranges the value of a variable is not needed: it is dead. So its register can be reused by other variables, resulting in the allocation of more variables to registers.

The number of registers needed for register-allocatable variables is at least the number of overlapping live ranges at each point in a function. If this exceeds the number of registers available, some variables must be stored to memory temporarily. This process is called SPILLING. The compiler spills the least frequently used variables first.

SPILLING CAN BE AVOIDED BY:
1. Limiting the maximum number of variables.
2. Keeping expressions SIMPLE and SMALL.
3. Minimizing the number of variables in a function.
4. Subdividing large functions into SMALLER, SIMPLER ones.
5. Using register for frequently used variables, etc.

13. DECLARING VARIABLE TYPES

Use the most appropriate variable types: char, short, int, long, signed and unsigned, float, double.

When possible, avoid char and short as local variables. The compiler needs to reduce the size of such a local variable to 8 bits (shift right by 24) or 16 bits (shift right by 16). These operations can be avoided if int is used, thus optimizing both speed and code size.

14. FUNCTION DESIGN

General rule: keep functions small and simple. This enables the compiler to perform optimizations, such as register allocation, more efficiently.

Function call overhead: relatively small.
- The minimal call-return sequence is BL ... MOV pc, lr (~6 cycles).
- The multiple load and store instructions (LDM, STM; PUSH, POP in the Thumb instruction set) reduce the cost of function entry and exit when some registers need to be saved.
- Under the APCS, up to 4 words of arguments can be passed to a function in registers. If more are needed (e.g., the 5th and 6th are passed on the stack), there is the additional cost of storing these words in the calling function and reloading them in the called function.
    int f1(int a, int b, int c, int d)
    {
        return a + b + c + d;
    }

    int g1(void)
    {
        return f1(1, 2, 3, 4);
    }

    int f2(int a, int b, int c, int d, int e, int f)
    {
        return a + b + c + d + e + f;
    }

    int g2(void)
    {
        return f2(1, 2, 3, 4, 5, 6);
    }

Here, the 5th and 6th parameters are stored on the stack in g2 and reloaded in f2, costing 2 memory accesses per parameter.

Therefore:
- Use four or fewer arguments for small functions.
- If more arguments are needed, make sure that the function does a significant amount of work.
- Pass pointers to structures instead of passing the structure itself.
- Use the __value_in_regs specifier, which can be used to return structures of up to 4 words in registers. Otherwise, structures are normally returned on the stack.

Example:

    typedef struct {
        int hi;
        uint lo;    /* low word unsigned */
    } int64;

    __value_in_regs int64 add64(int64 x, int64 y)    /* see specifier */
    {
        int64 res;
        res.lo = x.lo + y.lo;
        res.hi = x.hi + y.hi;
        if (res.lo < y.lo)
            res.hi++;    /* carry from low word */
        return res;
    }

    void test(void)
    {
        int64 a, b, c, sum;
        a.hi = 0x ; a.lo = 0xF ;
        b.hi = 0x ; b.lo = 0x ;
        sum = add64(a, b);
        c.hi = 0x ; c.lo = 0xffffffff;
        sum = add64(sum, c);
    }

Here, by using __value_in_regs, the code size is 52 bytes. Otherwise it would have been 160 bytes!

LEAF FUNCTIONS

These are functions that do not call any other function. They can be compiled efficiently by the ARM compiler, since the usual saving and restoring of registers is not needed. Maximize the use of leaf functions.

TAIL CONTINUED FUNCTION

When a function ends with a call to another function, the call can be converted to a branch to that function. This is called tail continuation. It usually saves stack space and a branch.

PURE FUNCTION

The result a pure function returns depends only on its arguments (e.g., math functions). Therefore, there are no side effects: a pure function cannot read or write global state by using global variables or indirecting through pointers. Use __pure:

    __pure int square(int x)
    {
        return x * x;
    }

    int f(int n)
    {
        return square(n) + square(n);
    }

INLINE FUNCTIONS

Inline functions do not have call overhead and have a lower argument evaluation overhead. Therefore, more compiler optimizations are possible (e.g., combining ADD + MUL into MLA, the multiply-accumulate instruction).

__inline: each call to an inline function is substituted by its body, instead of a normal call. Faster code, larger code size.

FUNCTION DEFINITION

Placing function definitions before their use can produce better code. It allows the compiler to analyze the register usage of the called function. This is a simple form of interprocedural optimization (where optimizations are carried out between functions).

Finally, where possible, use table lookup approximations rather than function calls.

15. USE MACROS FOR PORTABILITY:

    #ifdef __arm
    #  define INLINE        __inline
    #  define VALUE_IN_REGS __value_in_regs
    #  define PURE          __pure
    #else
    #  define INLINE
    #  define VALUE_IN_REGS
    #  define PURE
    #endif
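A minimal sketch of how such portability macros might be used in ordinary code. It assumes the ARM toolchain spellings __arm, __inline, __value_in_regs and __pure. The DivMod type and the functions square, sum_of_squares and div_mod are hypothetical names chosen for illustration. On any compiler that does not define __arm, the three macros expand to nothing and the code is plain C.

```c
#include <assert.h>

#ifdef __arm            /* assumed to be predefined by the ARM compiler */
#  define INLINE        __inline
#  define VALUE_IN_REGS __value_in_regs
#  define PURE          __pure
#else                   /* other compilers: the macros expand to nothing */
#  define INLINE
#  define VALUE_IN_REGS
#  define PURE
#endif

/* Hypothetical two-word result type, small enough to return in registers. */
typedef struct {
    int quot;
    int rem;
} DivMod;

/* No side effects: the result depends only on x. */
PURE int square(int x)
{
    return x * x;
}

/* Small body, a natural candidate for inlining. */
static INLINE int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* Quotient and remainder computed together and returned as one struct. */
VALUE_IN_REGS DivMod div_mod(int a, int b)
{
    DivMod r;

    assert(b != 0);     /* illustrative precondition */
    r.quot = a / b;
    r.rem  = a % b;
    return r;
}
```

Keeping the compiler-specific keywords behind macros means the same source builds on a host compiler for testing, where the qualifiers simply disappear.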


More information

ECE 2035 A Programming Hw/Sw Systems Spring problems, 8 pages Final Exam 29 April 2015

ECE 2035 A Programming Hw/Sw Systems Spring problems, 8 pages Final Exam 29 April 2015 Instructions: This is a closed book, closed note exam. Calculators are not permitted. If you have a question, raise your hand and I will come to you. Please work the exam in pencil and do not separate

More information

We will begin our study of computer architecture From this perspective. Machine Language Control Unit

We will begin our study of computer architecture From this perspective. Machine Language Control Unit An Instruction Set View Introduction Have examined computer from several different views Observed programmer s view Focuses on instructions computer executes Collection of specific set of instructions

More information

Computer Organization CS 206 T Lec# 2: Instruction Sets

Computer Organization CS 206 T Lec# 2: Instruction Sets Computer Organization CS 206 T Lec# 2: Instruction Sets Topics What is an instruction set Elements of instruction Instruction Format Instruction types Types of operations Types of operand Addressing mode

More information

NET3001. Advanced Assembly

NET3001. Advanced Assembly NET3001 Advanced Assembly Arrays and Indexing supposed we have an array of 16 bytes at 0x0800.0100 write a program that determines if the array contains the byte '0x12' set r0=1 if the byte is found plan:

More information

Instruction-set Design Issues: what is the ML instruction format(s) ML instruction Opcode Dest. Operand Source Operand 1...

Instruction-set Design Issues: what is the ML instruction format(s) ML instruction Opcode Dest. Operand Source Operand 1... Instruction-set Design Issues: what is the format(s) Opcode Dest. Operand Source Operand 1... 1) Which instructions to include: How many? Complexity - simple ADD R1, R2, R3 complex e.g., VAX MATCHC substrlength,

More information

COE608: Computer Organization and Architecture

COE608: Computer Organization and Architecture Add on Instruction Set Architecture COE608: Computer Organization and Architecture Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University Overview More

More information

Quiz for Chapter 2 Instructions: Language of the Computer3.10

Quiz for Chapter 2 Instructions: Language of the Computer3.10 Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: 1. [5 points] Prior to the early 1980s, machines were built

More information

September, Saeid Nooshabadi. Review: Floating Point Representation COMP Single Precision and Double Precision

September, Saeid Nooshabadi. Review: Floating Point Representation COMP Single Precision and Double Precision Review: Floating Point Representation COMP3211 lec22-fraction1 COMP 3221 Microprocessors and Embedded Systems Lectures 22 : Fractions http://wwwcseunsweduau/~cs3221 September, 2003 Saeid@unsweduau Single

More information

Emulation. Michael Jantz

Emulation. Michael Jantz Emulation Michael Jantz Acknowledgements Slides adapted from Chapter 2 in Virtual Machines: Versatile Platforms for Systems and Processes by James E. Smith and Ravi Nair Credit to Prasad A. Kulkarni some

More information

MIPS Programming. A basic rule is: try to be mechanical (that is, don't be "tricky") when you translate high-level code into assembler code.

MIPS Programming. A basic rule is: try to be mechanical (that is, don't be tricky) when you translate high-level code into assembler code. MIPS Programming This is your crash course in assembler programming; you will teach yourself how to program in assembler for the MIPS processor. You will learn how to use the instruction set summary to

More information

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites: C Programming Code: MBD101 Duration: 10 Hours Prerequisites: You are a computer science Professional/ graduate student You can execute Linux/UNIX commands You know how to use a text-editing tool You should

More information

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009 What Compilers Can and Cannot Do Saman Amarasinghe Fall 009 Optimization Continuum Many examples across the compilation pipeline Static Dynamic Program Compiler Linker Loader Runtime System Optimization

More information

Procedure Calling. Procedure Calling. Register Usage. 25 September CSE2021 Computer Organization

Procedure Calling. Procedure Calling. Register Usage. 25 September CSE2021 Computer Organization CSE2021 Computer Organization Chapter 2: Part 2 Procedure Calling Procedure (function) performs a specific task and return results to caller. Supporting Procedures Procedure Calling Calling program place

More information

Week 4 Lecture 1. Expressions and Functions

Week 4 Lecture 1. Expressions and Functions Lecture 1 Expressions and Functions Expressions A representation of a value Expressions have a type Expressions have a value Examples 1 + 2: type int; value 3 1.2 + 3: type float; value 4.2 2 More expression

More information

COMP322 - Introduction to C++ Lecture 02 - Basics of C++

COMP322 - Introduction to C++ Lecture 02 - Basics of C++ COMP322 - Introduction to C++ Lecture 02 - Basics of C++ School of Computer Science 16 January 2012 C++ basics - Arithmetic operators Where possible, C++ will automatically convert among the basic types.

More information

Control Instructions. Computer Organization Architectures for Embedded Computing. Thursday, 26 September Summary

Control Instructions. Computer Organization Architectures for Embedded Computing. Thursday, 26 September Summary Control Instructions Computer Organization Architectures for Embedded Computing Thursday, 26 September 2013 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition,

More information

Control Instructions

Control Instructions Control Instructions Tuesday 22 September 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary Previous Class Instruction Set

More information

ECE 2035 Programming HW/SW Systems Spring problems, 6 pages Exam Two 11 March Your Name (please print) total

ECE 2035 Programming HW/SW Systems Spring problems, 6 pages Exam Two 11 March Your Name (please print) total Instructions: This is a closed book, closed note exam. Calculators are not permitted. If you have a question, raise your hand and I will come to you. Please work the exam in pencil and do not separate

More information

CSCI 171 Chapter Outlines

CSCI 171 Chapter Outlines Contents CSCI 171 Chapter 1 Overview... 2 CSCI 171 Chapter 2 Programming Components... 3 CSCI 171 Chapter 3 (Sections 1 4) Selection Structures... 5 CSCI 171 Chapter 3 (Sections 5 & 6) Iteration Structures

More information

The CPU and Memory. How does a computer work? How does a computer interact with data? How are instructions performed? Recall schematic diagram:

The CPU and Memory. How does a computer work? How does a computer interact with data? How are instructions performed? Recall schematic diagram: The CPU and Memory How does a computer work? How does a computer interact with data? How are instructions performed? Recall schematic diagram: 1 Registers A register is a permanent storage location within

More information

EECS 213 Introduction to Computer Systems Dinda, Spring Homework 3. Memory and Cache

EECS 213 Introduction to Computer Systems Dinda, Spring Homework 3. Memory and Cache Homework 3 Memory and Cache 1. Reorder the fields in this structure so that the structure will (a) consume the most space and (b) consume the least space on an IA32 machine on Linux. struct foo { double

More information

SOURCE LANGUAGE DESCRIPTION

SOURCE LANGUAGE DESCRIPTION 1. Simple Integer Language (SIL) SOURCE LANGUAGE DESCRIPTION The language specification given here is informal and gives a lot of flexibility for the designer to write the grammatical specifications to

More information

CNIT 127: Exploit Development. Ch 1: Before you begin. Updated

CNIT 127: Exploit Development. Ch 1: Before you begin. Updated CNIT 127: Exploit Development Ch 1: Before you begin Updated 1-14-16 Basic Concepts Vulnerability A flaw in a system that allows an attacker to do something the designer did not intend, such as Denial

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Arithmetic Unit 10032011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Recap Chapter 3 Number Systems Fixed Point

More information

Computer Architecture and System Programming Laboratory. TA Session 3

Computer Architecture and System Programming Laboratory. TA Session 3 Computer Architecture and System Programming Laboratory TA Session 3 Stack - LIFO word-size data structure STACK is temporary storage memory area register points on top of stack (by default, it is highest

More information

button.c The little button that wouldn t

button.c The little button that wouldn t Goals for today The little button that wouldn't :( the volatile keyword Pointer operations => ARM addressing modes Implementation of C function calls Management of runtime stack, register use button.c

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

ARM Instruction Set Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ARM Instruction Set Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University ARM Instruction Set Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Condition Field (1) Most ARM instructions can be conditionally

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2016 Lecture 4a Andrew Tolmach Portland State University 1994-2016 Pragmatics of Large Values Real machines are very efficient at handling word-size chunks of data (e.g.

More information

Summary: Direct Code Generation

Summary: Direct Code Generation Summary: Direct Code Generation 1 Direct Code Generation Code generation involves the generation of the target representation (object code) from the annotated parse tree (or Abstract Syntactic Tree, AST)

More information

Chapter 2. Computer Abstractions and Technology. Lesson 4: MIPS (cont )

Chapter 2. Computer Abstractions and Technology. Lesson 4: MIPS (cont ) Chapter 2 Computer Abstractions and Technology Lesson 4: MIPS (cont ) Logical Operations Instructions for bitwise manipulation Operation C Java MIPS Shift left >>> srl Bitwise

More information

Overview. Introduction to the MIPS ISA. MIPS ISA Overview. Overview (2)

Overview. Introduction to the MIPS ISA. MIPS ISA Overview. Overview (2) Introduction to the MIPS ISA Overview Remember that the machine only understands very basic instructions (machine instructions) It is the compiler s job to translate your high-level (e.g. C program) into

More information

Optimization Prof. James L. Frankel Harvard University

Optimization Prof. James L. Frankel Harvard University Optimization Prof. James L. Frankel Harvard University Version of 4:24 PM 1-May-2018 Copyright 2018, 2016, 2015 James L. Frankel. All rights reserved. Reasons to Optimize Reduce execution time Reduce memory

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Programming Language Implementation

Programming Language Implementation A Practical Introduction to Programming Language Implementation 2014: Week 12 Optimisation College of Information Science and Engineering Ritsumeikan University 1 review of last week s topics why primitives

More information

EEM870 Embedded System and Experiment Lecture 4: ARM Instruction Sets

EEM870 Embedded System and Experiment Lecture 4: ARM Instruction Sets EEM870 Embedded System and Experiment Lecture 4 ARM Instruction Sets Wen-Yen Lin, Ph.D. Department of Electrical Engineering Chang Gung University Email wylin@mail.cgu.edu.tw March 2014 Introduction Embedded

More information

Lecture 4: Instruction Set Architecture

Lecture 4: Instruction Set Architecture Lecture 4: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation Reading: Textbook (5 th edition) Appendix A Appendix B (4 th edition)

More information

Review of the C Programming Language for Principles of Operating Systems

Review of the C Programming Language for Principles of Operating Systems Review of the C Programming Language for Principles of Operating Systems Prof. James L. Frankel Harvard University Version of 7:26 PM 4-Sep-2018 Copyright 2018, 2016, 2015 James L. Frankel. All rights

More information

Exercise Session 2 Simon Gerber

Exercise Session 2 Simon Gerber Exercise Session 2 Simon Gerber CASP 2014 Exercise 2: Binary search tree Implement and test a binary search tree in C: Implement key insert() and lookup() functions Implement as C module: bst.c, bst.h

More information

Programming Fundamentals - A Modular Structured Approach using C++ By: Kenneth Leroy Busbee

Programming Fundamentals - A Modular Structured Approach using C++ By: Kenneth Leroy Busbee 1 0 1 0 Foundation Topics 1 0 Chapter 1 - Introduction to Programming 1 1 Systems Development Life Cycle N/A N/A N/A N/A N/A N/A 1-8 12-13 1 2 Bloodshed Dev-C++ 5 Compiler/IDE N/A N/A N/A N/A N/A N/A N/A

More information

Programming in C - Part 2

Programming in C - Part 2 Programming in C - Part 2 CPSC 457 Mohammad Reza Zakerinasab May 11, 2016 These slides are forked from slides created by Mike Clark Where to find these slides and related source code? http://goo.gl/k1qixb

More information

Computer Architecture. Chapter 2-2. Instructions: Language of the Computer

Computer Architecture. Chapter 2-2. Instructions: Language of the Computer Computer Architecture Chapter 2-2 Instructions: Language of the Computer 1 Procedures A major program structuring mechanism Calling & returning from a procedure requires a protocol. The protocol is a sequence

More information

Utilizing Tools to Effectively Code for the Architectural Features of an ARM Platform. Chris Shore Training Manager

Utilizing Tools to Effectively Code for the Architectural Features of an ARM Platform. Chris Shore Training Manager Utilizing Tools to Effectively Code for the Architectural Features of an ARM Platform Chris Shore Training Manager Have the right tools... Many tool sets are available This presentation assumes that you

More information