Program Optimization and Analysis. Chenyang Lu CSE 467S

1 Program Optimization and Analysis

2 Program Transformation
[Figure: toolchain flow from HLL through compile, assembly, assemble, object, link, executable, and load. Labels: analyze, optimize; relative address, absolute address, physical address.]

3 What do we need to do?
- Understand optimization levels (-O1, -O2, etc.)
- Optimize HLL code.
- Analyze and optimize assembly code.
- Modifying compiler output requires care:
  - correctness;
  - loss of hand-tweaked code.

4 Goals
- Optimizing for execution time.
- Optimizing for energy/power.
- Optimizing for program size.
- They may conflict with each other!

5 Expression Simplification
- Constant folding:
  - 8+1 = 9
- Algebraic:
  - a*b + a*c = a*(b+c)
- Strength reduction:
  - a*2 = a<<1
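The three rewrites above can be checked directly in C. This is only an illustrative sketch: the function names are invented here, and unsigned 32-bit operands are assumed so that overflow wraps and a << 1 agrees with a * 2 for every input.

```c
#include <stdint.h>

/* Constant folding: 8 + 1 is evaluated at compile time. */
enum { FOLDED = 8 + 1 };

/* Algebraic simplification: two multiplies and an add ... */
uint32_t distributed(uint32_t a, uint32_t b, uint32_t c) {
    return a * b + a * c;
}

/* ... become one multiply and an add. */
uint32_t factored(uint32_t a, uint32_t b, uint32_t c) {
    return a * (b + c);
}

/* Strength reduction: a multiply by 2 ... */
uint32_t times_two_mul(uint32_t a) {
    return a * 2u;
}

/* ... becomes a cheaper left shift. */
uint32_t times_two_shift(uint32_t a) {
    return a << 1;
}
```

With signed operands the shift form is only safe for non-negative values that do not overflow, which is one reason these rewrites are usually left to the compiler.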

6 Dead Code Elimination
- Dead code:
      #define DEBUG 0
      if (DEBUG) dbg(p1);
- Eliminate by control flow analysis, constant folding
[Figure: control flow graph containing the block dbg(p1);]

7 Function Call

8 Instructions (ARM7)
- Branch and link instruction: BL foo is equivalent to
      MOV r14, r15
      B foo
  - r15 contains the current PC
  - Copies current PC to r14.
- To return from subroutine:
      MOV r15, r14

9 Stack
- Use a stack to keep track of
  - parameters,
  - return value,
  - return address.
- Caller and callee access the stack in a consistent order.
  - Different compilers/programmers may follow different orders.
- Access the stack (ARM7)
  - r13 always points to the top of stack
  - Push: STR r0, [r13, #4]!
  - Pop: SUB r13, #4

10 Stack Operations
- Caller: call a function
  - Push parameters to stack
  - BL (r15 -> r14; jump)
- Callee: receive a call
  - Read parameters from stack
  - Overwrite top of stack with return address (r14)
- Callee: return
  - Load PC with return address (on top of stack)
- Caller: receive a return
  - Pop callee's return address from stack

11 Nested function calls (ARM7)
    main() { f1(x); }
    void f1(int a) { f2(a); }

    ; f1 is called by main()
    LDR r0, [r13]         ; load parameter into r0 from stack
    STR r14, [r13]        ; store f1's return addr.
    ; f1 calls f2()
    STR r0, [r13, #4]!    ; push parameter for f2 to stack
    BL f2                 ; branch and link to f2
    ; return from f2()
    SUB r13, #4           ; pop f2's parameter off stack
    ; f1 returns to main()
    LDR r15, [r13]        ; restore register and return

12 Function Inlining
    int foo(int a, int b, int c) { return a + b - c; }
    z = foo(w,x,y);   =>   z = w + x - y;
- Improves performance by eliminating function call overhead.
- May increase code size, but not always.
- Affects instruction cache behavior.
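A small C sketch of the same transformation, for illustration. The names with_call and inlined are invented here, and static inline is only a hint: the compiler decides whether the call is actually expanded.

```c
/* A small function: a typical inlining candidate. */
static inline int foo(int a, int b, int c) {
    return a + b - c;
}

/* Call site as written in the source. */
int with_call(int w, int x, int y) {
    return foo(w, x, y);
}

/* The same call site after inlining by hand: no call, no return. */
int inlined(int w, int x, int y) {
    return w + x - y;
}
```

Both functions compute the same value; the inlined form simply avoids the parameter passing, branch, and return of the call.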

13 Optimization: Inlining
[Table: columns are App, code size (inlined, noninlined), code reduction (%), data size, and CPU reduction (%); rows are Surge, Maté, and TinyDB. The numeric values are missing from this transcription.]
Inlining improves performance and reduces code size. Why?

14 Loops

15 Loop Unrolling
- Reduces loop overhead
    for (i=0; i<4; i++)
        a[i] = b[i] * c[i];
  =>
    for (i=0; i<4; i+=2) {
        a[i] = b[i] * c[i];
        a[i+1] = b[i+1] * c[i+1];
    }
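The slide's loop has a fixed, even trip count. As a hedged sketch of the general case, the function below (mul_unrolled is a name invented for this example) unrolls by two and handles an odd element count with a single cleanup step.

```c
#include <stddef.h>

/* a[i] = b[i] * c[i] for i in [0, n), unrolled by a factor of two. */
void mul_unrolled(int *a, const int *b, const int *c, size_t n) {
    size_t i = 0;

    /* Main loop: two elements per iteration, so half as many tests and branches. */
    for (; i + 1 < n; i += 2) {
        a[i]     = b[i]     * c[i];
        a[i + 1] = b[i + 1] * c[i + 1];
    }

    /* Cleanup: at most one leftover element when n is odd. */
    if (i < n) {
        a[i] = b[i] * c[i];
    }
}
```

The cleanup step is part of the code-size cost of unrolling: the loop body is duplicated, and any remainder needs its own copy.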

16 Loop Overhead on ARM7
    ; loop initiation code
    MOV r0, #0          ; use r0 for loop counter
    MOV r8, #0          ; use separate index for arrays
    MOV r1, #4          ; buffer size
    MOV r2, #0          ; use r2 for f
    ADR r3, c           ; load r3 with base of c[ ]
    ADR r5, x           ; load r5 with base of x[ ]
    ; loop body
L:  LDR r4, [r3, r8]    ; get c[i]
    LDR r6, [r5, r8]    ; get x[i]
    MUL r4, r4, r6      ; compute c[i]*x[i]
    ADD r2, r2, r4      ; add into sum
    ADD r8, r8, #4      ; add one word to array index
    ADD r0, r0, #1      ; add 1 to i
    CMP r0, r1          ; exit?
    BLT L               ; if i < 4, continue

17 Loop Fusion
Combines multiple loops:
    for (i=0; i<n; i++)
        a[i] = b[i] * 5;
    for (j=0; j<n; j++)
        w[j] = c[j] * d[j];
  =>
    for (i=0; i<n; i++) {
        a[i] = b[i] * 5;
        w[i] = c[i] * d[i];
    }
Necessary conditions
- The loops iterate over the same index range
- No dependencies between the two loops
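A runnable sketch of the fusion above, for comparison. The function names are invented for this example, and the arrays are assumed not to overlap, which is what makes the two loop bodies independent.

```c
#include <stddef.h>

/* Original form: two passes, two sets of loop overhead. */
void two_loops(int *a, const int *b, int *w, const int *c, const int *d, size_t n) {
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] * 5;
    }
    for (size_t j = 0; j < n; j++) {
        w[j] = c[j] * d[j];
    }
}

/* Fused form: one pass over the shared index range. */
void fused_loop(int *a, const int *b, int *w, const int *c, const int *d, size_t n) {
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] * 5;
        w[i] = c[i] * d[i];
    }
}
```

If w aliased b, for instance, the fused loop could read values the first loop had not finished with, and the two versions would no longer agree.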

18 Code Motion
    for (i=0; i<n*m; i++)
        z[i] = a[i] + b[i];
[Figure: flowchart of the loop. Before: i=0; test i<N*M; body z[i] = a[i] + b[i]; i = i+1. After code motion: i=0; X = N*M; test i<X.]
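In C, the code motion shown in the flowchart amounts to hoisting the loop-invariant product out of the loop test. A minimal sketch, with function names invented for this example:

```c
/* Before: n * m appears in the loop test and may be evaluated every iteration. */
void add_naive(int *z, const int *a, const int *b, int n, int m) {
    for (int i = 0; i < n * m; i++) {
        z[i] = a[i] + b[i];
    }
}

/* After: the invariant n * m is computed once, before the loop. */
void add_hoisted(int *z, const int *a, const int *b, int n, int m) {
    const int x = n * m;
    for (int i = 0; i < x; i++) {
        z[i] = a[i] + b[i];
    }
}
```

The move is legal because neither n nor m changes inside the loop. An optimizing compiler will often do this itself, but only when it can prove that invariance.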

19 Array

20 One-Dimensional Array
- C array name points to 0th element:
[Figure: a pointing to consecutive memory locations a[0], a[1], a[2].]
    a[i] = *(a + i)

21 Two-Dimensional Array
- Row-major layout:
[Figure: an N-row, M-column array stored row by row: a[0,0], a[0,1], ..., then a[1,0], a[1,1], ...]
    a[i][j] = *(a + i*M + j), with a taken as the address of element [0][0]
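The row-major formula can be written as a small helper. This is a sketch only: row_major_index is a name invented here, and m is the number of columns.

```c
#include <stddef.h>

/* Element (i, j) of an array with m columns sits i*m + j elements
   past the start of the array when rows are stored one after another. */
size_t row_major_index(size_t i, size_t j, size_t m) {
    return i * m + j;
}
```

For a 3-by-4 array, element [1][2] is at index 1*4 + 2 = 6, which is 6 * sizeof(int) bytes from the start of an int array.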

22
Original loop nest:
    for (i=0; i<n; i++)
        for (j=0; j<m; j++)
            z[i][j] = b[i][j];
With explicit pointer arithmetic:
    zptr = z; bptr = b;
    for (i=0; i<n; i++)
        for (j=0; j<m; j++) {
            zind = i*m+j;
            bind = i*m+j;
            *(zptr+zind) = *(bptr+bind);
        }
Induction variable elimination (zind and bind are always equal, so one variable replaces both):
    zptr = z; bptr = b;
    for (i=0; i<n; i++)
        for (j=0; j<m; j++) {
            zbind = i*m+j;
            *(zptr+zbind) = *(bptr+zbind);
        }
Strength reduction (the multiplication becomes an increment):
    zptr = z; bptr = b; zbind = 0;
    for (i=0; i<n; i++)
        for (j=0; j<m; j++) {
            *(zptr+zbind) = *(bptr+zbind);
            zbind++;
        }

23 Cache Analysis
- Loops use large quantities of data (arrays) -> cache conflicts

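24 Direct-Mapped Cache
[Figure: direct-mapped cache. The address is split into tag, index, and offset. The index selects a cache block holding a valid bit (1), a tag (0xabcd), and data bytes. The stored tag is compared (=) with the address tag to produce hit, and the offset selects the value byte.]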
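The tag/index/offset split in the figure can be sketched in C. The geometry below (16-byte blocks, 256 blocks) is an assumption made for this example, not something the slide specifies, and the type and function names are invented.

```c
#include <stdint.h>

/* Hypothetical geometry: 16-byte blocks (4 offset bits), 256 blocks (8 index bits). */
enum { OFFSET_BITS = 4, INDEX_BITS = 8 };

typedef struct {
    uint32_t tag;     /* compared against the tag stored in the block */
    uint32_t index;   /* selects the cache block */
    uint32_t offset;  /* selects the byte within the block */
} cache_fields;

/* Split a 32-bit address into its three fields. */
cache_fields split_address(uint32_t addr) {
    cache_fields f;
    f.offset = addr & ((1u << OFFSET_BITS) - 1u);
    f.index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
    f.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return f;
}

/* A lookup hits only if the indexed block is valid and its stored tag matches. */
int is_hit(int valid, uint32_t stored_tag, uint32_t addr) {
    return valid && stored_tag == split_address(addr).tag;
}
```

Under this assumed geometry, the sample address 0xABCD splits into tag 0xA, index 0xBC, and offset 0xD.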

25 Array Conflicts in Cache
    for (i=0; i<n; i++)
        for (j=0; j<m; j++)
            a[i][j] = a[i][j] + b[i][j];
[Figure: a[0,0] and b[0,0] in main memory mapping to the same location in the cache.]

26 Array Conflicts
- Array elements conflict because they map to the same cache line.
- Solution: move one array.
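A hedged sketch of why moving one array helps, assuming a hypothetical direct-mapped cache with 16-byte blocks and 256 lines (4096 bytes in total). The names and base addresses are illustrative.

```c
#include <stdint.h>

/* Hypothetical direct-mapped cache: 16-byte blocks, 256 lines. */
enum { BLOCK_BYTES = 16, NUM_LINES = 256 };

/* Cache line that a byte address maps to. */
uint32_t cache_line(uint32_t addr) {
    return (addr / BLOCK_BYTES) % NUM_LINES;
}

/* Do a[...] and b[...] at the same byte offset land in the same line? */
int same_line(uint32_t a_base, uint32_t b_base, uint32_t byte_offset) {
    return cache_line(a_base + byte_offset) == cache_line(b_base + byte_offset);
}
```

If the two bases differ by a multiple of the cache size (here 0x0000 and 0x1000), every matching pair of elements collides, so a loop that touches a[i][j] and b[i][j] together keeps evicting its own data. Shifting one base by a single block (to 0x1010) is enough to separate them.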

27 Static Cache Locking
- Lock instructions in cache before execution.
- Predictable execution time.
- Similarly, lock code and data in memory to avoid paging.

28 Register Allocation
- Fit current variables in registers.
- Load once, use many times.
  - Reduce number of cache/memory accesses.
  - Improve performance.
  - Reduce energy consumption.

29 Register Lifetime Graph
    1. w = a + b;
    2. x = c + w;
    3. y = c + d;
    4. z = a - b;
[Figure: lifetime graph for variables a, b, c, d, w, x, y, z.]
No. of needed registers =

30 After Rescheduling
    1. w = a + b;
    2. z = a - b;
    3. x = c + w;
    4. y = c + d;
[Figure: lifetime graph for variables a, b, c, d, w, x, y, z.]
No. of needed registers =
Cannot change dependencies between instructions!

31 Summary: Performance Optimization
- Use registers efficiently.
- Optimize loops.
- Optimize function calls.
- Optimize cache behavior:
  - Avoid instruction conflicts by rewriting code, rescheduling;
  - Move conflicting scalar/array data.

32 Execution Time Analysis
- Real-time systems must meet deadlines.
- Need to analyze execution time.

33 Execution Time
- Affected by program path and instruction timing
- Program path depends on input data.
  - Sensor readings
  - User input
- Instruction timing depends on
  - pipelining
  - cache behavior: memory can be 10x slower than cache!

34 Program Path
    for (i=0, f=0; i<n; i++)
        f = f + c[i]*x[i];
[Figure: flowchart. i=0; f=0; then test i<N; on Y, f = f + c[i]*x[i]; i = i+1; and back to the test; on N, exit.]
- Loop initiation executed once.
- Loop test executed N+1 times.
- Loop body and index update executed N times.
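The execution counts above can be confirmed by instrumenting the loop. This is an illustrative sketch; path_counts and dot_product_counted are names invented for it.

```c
typedef struct {
    unsigned init;  /* times the loop initiation ran */
    unsigned test;  /* times the loop test ran */
    unsigned body;  /* times the body and index update ran */
} path_counts;

/* Computes f = sum of c[i]*x[i] and counts how often each part of the loop runs. */
path_counts dot_product_counted(const int *c, const int *x, int n, int *f_out) {
    path_counts k = {0, 0, 0};
    int i = 0;
    int f = 0;
    k.init++;                      /* i=0; f=0; */

    for (;;) {
        k.test++;                  /* i<n is evaluated here */
        if (!(i < n)) {
            break;
        }
        f = f + c[i] * x[i];       /* loop body */
        i = i + 1;                 /* index update */
        k.body++;
    }

    *f_out = f;
    return k;
}
```

For n = 4 this records one initiation, five tests, and four body executions, matching the 1, N+1, N pattern.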

35 Execution Time Metrics
- Difficult to predict execution time accurately.
- Average case
  - For typical data values
  - Soft real-time
- Worst case
  - For any possible input set
  - Hard real-time
  - Longest program path may NOT lead to worst-case execution time

36 Approaches
- Compile-time analysis: pessimistic
- Measurement: optimistic

37 Analysis
- Analyze optimized assembly/binary code, not high-level language (HLL) code
  - HLL statement -> many assembly/binary instructions
  - Example: function calls
- Challenges
  - Program path depends on input data
  - Pipelining, cache effects are hard to predict
  - Analysis tends to be pessimistic

38 Measurement
- CPU simulator
  - I/O may be hard to measure.
  - May not be totally accurate.
- Time stamping
  - Requires instrumenting program.
  - Timer granularity
    - gettimeofday on Linux: ms
    - gethrtime on Intel processors: reads the 64-bit clock cycle counter and returns the number of clock cycles since the CPU was powered up or reset.
- Logic analyzer: limited logic analyzer memory depth.
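A minimal time-stamping sketch, assuming a POSIX system. It uses clock_gettime with CLOCK_MONOTONIC rather than the calls named on the slide; busy_work is a placeholder workload and time_call_ns is a helper name invented here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Placeholder workload; volatile keeps the loop from being optimized away. */
static volatile unsigned sink;
void busy_work(void) {
    for (unsigned i = 0; i < 100000u; i++) {
        sink += i;
    }
}

/* Time stamp before and after one call and return the elapsed nanoseconds. */
int64_t time_call_ns(void (*fn)(void)) {
    int64_t start = now_ns();
    fn();
    return now_ns() - start;
}
```

A single measurement like this reflects only one program path and one cache state. Repeating it over many inputs and keeping the maximum gives an observed worst case, which may still be optimistic.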

39 Output from a Logic Analyzer
[Figure: timing diagram of event propagation on Mote.]
Granularity: 50 microseconds

40 Trace-driven Analysis
- A trace is a record of the path a program takes during execution.
- Helps study cache behavior and power management policies.
- A useful trace
  - requires proper input values;
  - is large.

41 Trace Generation
- Hardware capture
  - Logic analyzer
    - Limited buffer space
    - Cannot observe on-chip cache
  - Hardware assist in CPU
    - Pentium supports automatic tracing of branches
- Software
  - PC sampling
  - Instrumentation instructions
  - Simulation

42 Goals
- Optimizing for execution time.
- Optimizing for energy/power.
- Optimizing for program size.

43 Optimizing for Program Size
- Goals
  - Reduce memory cost;
  - Reduce power consumption.
- Two opportunities:
  - Data;
  - Instructions.

44 Reduce Data Size
- Reuse constants, variables, buffers in different parts of code.
  - Single-buffer in TinyOS.
  - Pack multiple flags in one byte.
  - Use shortest data type needed.
  - Requires careful verification of correctness.
        uint8_t i;
        for(i = 0; i < 1000; i++) {... }
        // This loop will never terminate
- Generate data using instructions.
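A short sketch of flag packing, together with the wraparound behind the non-terminating loop. The flag names and helper functions are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Three boolean flags share one byte instead of three separate variables. */
enum {
    FLAG_READY = 1u << 0,
    FLAG_ERROR = 1u << 1,
    FLAG_BUSY  = 1u << 2
};

uint8_t flag_set(uint8_t flags, uint8_t mask) {
    return (uint8_t)(flags | mask);
}

uint8_t flag_clear(uint8_t flags, uint8_t mask) {
    return (uint8_t)(flags & (uint8_t)~mask);
}

bool flag_test(uint8_t flags, uint8_t mask) {
    return (flags & mask) != 0;
}

/* A uint8_t counter wraps from 255 back to 0, so a test such as
   i < 1000 can never become false. */
uint8_t next_u8(uint8_t i) {
    return (uint8_t)(i + 1u);
}
```

The saving in data size is paid for with extra mask instructions on every access, and the wraparound shows why choosing the shortest type needs the careful verification mentioned above.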

45 Reduce Code Size
- Avoid loop unrolling.
- Inlining?
  - Size of function
  - Number of calls
- Choose CPU with compact instructions.
  - Digital Signal Processors (DSP) tend to have smaller code.
- Some CPUs support a dense instruction set
  - ARM Thumb, MIPS-16

46 Code Compression
- Use statistical compression to reduce code size.
- Decompress on-the-fly.
- Need to handle jump addresses
[Figure: main memory feeding a decompressor with a table; decompressed instructions such as LDR r0,[r4] pass to the cache and then the CPU.]

47 Reading
- Textbook 5.5, 5.6, 5.7, 5.8, 5.9.


More information

ARM Architecture and Assembly Programming Intro

ARM Architecture and Assembly Programming Intro ARM Architecture and Assembly Programming Intro Instructors: Dr. Phillip Jones http://class.ece.iastate.edu/cpre288 1 Announcements HW9: Due Sunday 11/5 (midnight) Lab 9: object detection lab Give TAs

More information

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1 Table of Contents About the Authors... iii Introduction... xvii Chapter 1: System Software... 1 1.1 Concept of System Software... 2 Types of Software Programs... 2 Software Programs and the Computing Machine...

More information

Machine Language Instructions Introduction. Instructions Words of a language understood by machine. Instruction set Vocabulary of the machine

Machine Language Instructions Introduction. Instructions Words of a language understood by machine. Instruction set Vocabulary of the machine Machine Language Instructions Introduction Instructions Words of a language understood by machine Instruction set Vocabulary of the machine Current goal: to relate a high level language to instruction

More information

Exam 1. Date: February 23, 2016

Exam 1. Date: February 23, 2016 Exam 1 Date: February 23, 2016 UT EID: Printed Name: Last, First Your signature is your promise that you have not cheated and will not cheat on this exam, nor will you help others to cheat on this exam:

More information

CMPSCI 201 Fall 2006 Midterm #2 November 20, 2006 SOLUTION KEY

CMPSCI 201 Fall 2006 Midterm #2 November 20, 2006 SOLUTION KEY CMPSCI 201 Fall 2006 Midterm #2 November 20, 2006 SOLUTION KEY Professor William T. Verts 10 Points Trace the following circuit, called a demultiplexer, and show its outputs for all possible inputs.

More information

Optimisation p.1/22. Optimisation

Optimisation p.1/22. Optimisation Performance Tuning Optimisation p.1/22 Optimisation Optimisation p.2/22 Constant Elimination do i=1,n a(i) = 2*b*c(i) enddo What is wrong with this loop? Compilers can move simple instances of constant

More information

55:132/22C:160, HPCA Spring 2011

55:132/22C:160, HPCA Spring 2011 55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is

More information

Bentley Rules for Optimizing Work

Bentley Rules for Optimizing Work 6.172 Performance Engineering of Software Systems SPEED LIMIT PER ORDER OF 6.172 LECTURE 2 Bentley Rules for Optimizing Work Charles E. Leiserson September 11, 2012 2012 Charles E. Leiserson and I-Ting

More information

Objectives. ICT106 Fundamentals of Computer Systems Topic 8. Procedures, Calling and Exit conventions, Run-time Stack Ref: Irvine, Ch 5 & 8

Objectives. ICT106 Fundamentals of Computer Systems Topic 8. Procedures, Calling and Exit conventions, Run-time Stack Ref: Irvine, Ch 5 & 8 Objectives ICT106 Fundamentals of Computer Systems Topic 8 Procedures, Calling and Exit conventions, Run-time Stack Ref: Irvine, Ch 5 & 8 To understand how HLL procedures/functions are actually implemented

More information

Lecture 4: Instruction Set Design/Pipelining

Lecture 4: Instruction Set Design/Pipelining Lecture 4: Instruction Set Design/Pipelining Instruction set design (Sections 2.9-2.12) control instructions instruction encoding Basic pipelining implementation (Section A.1) 1 Control Transfer Instructions

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 7

ECE 571 Advanced Microprocessor-Based Design Lecture 7 ECE 571 Advanced Microprocessor-Based Design Lecture 7 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 9 February 2016 HW2 Grades Ready Announcements HW3 Posted be careful when

More information

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás CACHE MEMORIES Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann,

More information

CSCE 5610: Computer Architecture

CSCE 5610: Computer Architecture HW #1 1.3, 1.5, 1.9, 1.12 Due: Sept 12, 2018 Review: Execution time of a program Arithmetic Average, Weighted Arithmetic Average Geometric Mean Benchmarks, kernels and synthetic benchmarks Computing CPI

More information

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion Outline Loop Optimizations Induction Variables Recognition Induction Variables Combination of Analyses Copyright 2010, Pedro C Diniz, all rights reserved Students enrolled in the Compilers class at the

More information

G Programming Languages - Fall 2012

G Programming Languages - Fall 2012 G22.2110-003 Programming Languages - Fall 2012 Lecture 4 Thomas Wies New York University Review Last week Control Structures Selection Loops Adding Invariants Outline Subprograms Calling Sequences Parameter

More information

Multitasking on Cortex-M(0) class MCU A deepdive into the Chromium-EC scheduler

Multitasking on Cortex-M(0) class MCU A deepdive into the Chromium-EC scheduler Multitasking on Cortex-M(0) class MCU A deepdive into the Chromium-EC scheduler $whoami Embedded Software Engineer at National Instruments We just finished our first product using Chromium-EC and future

More information

Procedure Calling. Procedure Calling. Register Usage. 25 September CSE2021 Computer Organization

Procedure Calling. Procedure Calling. Register Usage. 25 September CSE2021 Computer Organization CSE2021 Computer Organization Chapter 2: Part 2 Procedure Calling Procedure (function) performs a specific task and return results to caller. Supporting Procedures Procedure Calling Calling program place

More information

Architecture II. Computer Systems Laboratory Sungkyunkwan University

Architecture II. Computer Systems Laboratory Sungkyunkwan University MIPS Instruction ti Set Architecture II Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Making Decisions (1) Conditional operations Branch to a

More information

Computer Organization & Assembly Language Programming (CSE 2312)

Computer Organization & Assembly Language Programming (CSE 2312) Computer Organization & Assembly Language Programming (CSE 2312) Lecture 15: Running ARM Programs in QEMU and Debugging with gdb Taylor Johnson Announcements and Outline Homework 5 due Thursday Midterm

More information

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer

More information

CS 61C: Great Ideas in Computer Architecture Func%ons and Numbers

CS 61C: Great Ideas in Computer Architecture Func%ons and Numbers CS 61C: Great Ideas in Computer Architecture Func%ons and Numbers 9/11/12 Instructor: Krste Asanovic, Randy H. Katz hcp://inst.eecs.berkeley.edu/~cs61c/sp12 Fall 2012 - - Lecture #8 1 New- School Machine

More information

Supplement for MIPS (Section 4.14 of the textbook)

Supplement for MIPS (Section 4.14 of the textbook) Supplement for MIPS (Section 44 of the textbook) Section 44 does a good job emphasizing that MARIE is a toy architecture that lacks key feature of real-world computer architectures Most noticable, MARIE

More information

On the Design of the Local Variable Cache in a Hardware Translation-Based Java Virtual Machine

On the Design of the Local Variable Cache in a Hardware Translation-Based Java Virtual Machine On the Design of the Local Variable Cache in a Hardware Translation-Based Java Virtual Machine Hitoshi Oi The University of Aizu June 16, 2005 Languages, Compilers, and Tools for Embedded Systems (LCTES

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

Lecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections )

Lecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections ) Lecture 10: Static ILP Basics Topics: loop unrolling, static branch prediction, VLIW (Sections 4.1 4.4) 1 Static vs Dynamic Scheduling Arguments against dynamic scheduling: requires complex structures

More information

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,

More information

Support for high-level languages

Support for high-level languages Outline: Support for high-level languages memory organization ARM data types conditional statements & loop structures the ARM Procedure Call Standard hands-on: writing & debugging C programs 2005 PEVE

More information

Cache Memory: Instruction Cache, HW/SW Interaction. Admin

Cache Memory: Instruction Cache, HW/SW Interaction. Admin Cache Memory Instruction Cache, HW/SW Interaction Computer Science 104 Admin Project Due Dec 7 Homework #5 Due November 19, in class What s Ahead Finish Caches Virtual Memory Input/Output (1 homework)

More information

ECE 598 Advanced Operating Systems Lecture 4

ECE 598 Advanced Operating Systems Lecture 4 ECE 598 Advanced Operating Systems Lecture 4 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 January 2016 Announcements HW#1 was due HW#2 was posted, will be tricky Let me know

More information

Cache Memories. Topics. Next time. Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance

Cache Memories. Topics. Next time. Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance Cache Memories Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance Next time Dynamic memory allocation and memory bugs Fabián E. Bustamante,

More information

Instructor: Randy H. Katz hap://inst.eecs.berkeley.edu/~cs61c/fa13. Fall Lecture #7. Warehouse Scale Computer

Instructor: Randy H. Katz hap://inst.eecs.berkeley.edu/~cs61c/fa13. Fall Lecture #7. Warehouse Scale Computer CS 61C: Great Ideas in Computer Architecture Everything is a Number Instructor: Randy H. Katz hap://inst.eecs.berkeley.edu/~cs61c/fa13 9/19/13 Fall 2013 - - Lecture #7 1 New- School Machine Structures

More information

Supercomputing in Plain English Part IV: Henry Neeman, Director

Supercomputing in Plain English Part IV: Henry Neeman, Director Supercomputing in Plain English Part IV: Henry Neeman, Director OU Supercomputing Center for Education & Research University of Oklahoma Wednesday September 19 2007 Outline! Dependency Analysis! What is

More information