Assembly Language Programming Optimization

Size: px
Start display at page:

Download "Assembly Language Programming Optimization"

Transcription

1 Assembly Language Programming Optimization December 9, 2017

2 Conditional transfer Sometimes we make comparison only to execute a single assignment depending on the result. Then we can use conditional move instruction, where assignment is performed only if the indicated condition was satisfied, e.g. the instruction cmove eax,1 sets register eax to 1 only if recently compared elements were equal. The main advantage is avoidance of the necessity of cleaning the pipeline or speculative execution. lub wykonania spekulacyjnego. Conditional assignment SET. Conditional transfer CMOV.

3 Conditional transfer: an example Find maximum of two numbers (arguments in EAX and EBX, result in ECX): mov ecx,eax cmp ebx,ecx cmova ecx,ebx

4 Conditional transfers: errors Assume we are compiling in C the expression int *xp;... return (xp? *xp : 0); If xp is in rdi, we could try xor eax,eax ;Maybe we will return zero test rdi,rdi ;xp == 0? cmovne eax,[rdi] ;Maybe we will return *xp But then the dereference of xp will occurs always (even for the NULL pointer), and this we want to avoid.

5 Jump avoidance Avoiding jumps ia a larger problem. Let us look at the computation of absolute value of number skip: test eax,eax jns omiń neg eax ;We set flags ;Positive sign

6 Jump avoidance There is a different way: mov ecx,eax sar ecx,31 xor eax,ecx sub eax,ecx ;sign bit everywhere ;bit reverse ;we subtract -1 and have 2-complement

7 Power of 2 Another trick: how to check, whether a number in EAX is a power of two? mov ebx,eax ;or lea ebx,[eax - 1] dec ebx test eax,ebx jnz isnot

8 Hints The processor tries to guess, whether the conditional jump will be performed. With static guess it is assumed, that the jump backwards will be peformed. We can help it using hints: prefixes HT(0x3e) and HNT(0x2e), for example test ecx,ecx db 3eh jz L9... L9: ;HT = we will jump

9 Hints Sometimes holding the data in cache memory is not useful, if it is only used once Direct write instructions (non-temporal store) MOVNTI, MOVNTPD, etc. in write phase omit the cache.

10 Conservativity of compiler The C compiler must be conservative and generate code in such a way, that all possible cases are covered. Example: void memclr (char *data, int n) { for (; n > 0; n--) *data++ = 0; } If the compiler knew something about the alignment of data, it could generate a code to zero 2, 4 or ever 8 bajtów in one step. However, it must assume the worst case.

11 Conservativity of compiler There a few elements in C/C++, which are classic examples of slowing down programs. The group is lead by the conversion (cast) from real number to integer, for example int i; float f;... i = (int)f; Such conversion takes processor cycles. Reason: the C/C++ defines a different way of rounding than implemented in FPU, so we have to toggle coprocessor mode.

12 Conservativity of compiler Other nomination to Oscara prize is pointer aliasing. In the code below a compiler will not pull the evaluation of *p + 2 befor the loop void Func1 (int a[], int *p) { int i; for (i = 0; i < 100; i++) a[i] = *p + 2; } And it is right, because (hooray for C and C++ :-) void Func2() { int list[100]; Func1(list, &list[8]); }

13 Conservativity of compiler Sometimes the recipes are simple. The code below twice fetches arg1->p1 from the memory: struct S1 int p1; struct S2 int p2, p3; void f1 (struct S1 *arg1, struct S2 *arg2) arg2->p2 += arg1->p1; arg2->p3 += arg1->p1; It must work this way, because arg2->p2 and arg1->p1 may be the same memory cell. But it is enough to introduce local variable bound to S1->p1.

14 Assembler Asembler allows us to take advantage from low-level services: Registers and direct input/output. Violating the compiler conventions: different passing of parameters, violating the memory allocation rules, iterative call of procedures. Linking incompatible code fragments, e.g. built by different compilers. Code optimization by hand to adapt it to a very particular hardware configuration.

15 Extreme example Appetizer The following code in C float a[4], b[4], c[4]; for (int i = 0; i < 4; i++) { c[i] = a[i] > b[i]? a[i] : b[i]; } can be optimally coded as follows movaps xmm0,[a] maxps xmm0,[b] movaps [c],xmm0 ;Load a vector ;max(a,b) ;c = a > b? a : b

16 Not enough registers or two in one We have two variables index and increment, both 16-bit (short). On ARM they can pe put into one register, index at the top. Then the C code elem = tab[index]; index += increment; could be written in assembler as LDRB Relem, [Rtab, Rindincr, LSR#16] ADD Rindincr, Rindincr, Rindincr, LSL#16

17 Intel/AMD The instruction set of CISC processors (x86) is not optimal confirmed by several changes of architecture philosophy. It must be preserved because of back compatibility with systems from years 1980s, when RAM and disc memory were small and costly. But CISC also has some advantages. The compactness of code fits well to requirements of cache memories with restricted sizes. The main problem of x86 processors is lack of enough registers, alleviated a little when designing x86-64.

18 Graphics accelerators Demading graphic applications need platforms with graphics coprocessor or accelerator card. The computational power contained in them can be used also to other tasks, but this is another story (and it depends much on hardware).

19 64-bit code Advantages: More registers: usually no need to store variables and intermediate result in RAM memory. The efficient procedure call: passing parameters in registers. 64-bitowe registers for integers. Better management of large memory blocks. Built-in restricted SIMD (SSE). Relative addressing of data, efficient relocatable code.

20 64-bit code Disadvantages: Twice larger addresses and stack positions: troubles with cache memory. The access to static and global arrays requires more instructions for large memory images. Mostly for Windows and Mac. More complicated computation of effective memory address when the size greater than 2GB. Some instructions are longer.

21 Intrinsic functions in C++ New approach for joining code from different levels. Intrinsic functions represent known to the compiler processor instructions. Example: addition of floating-point vectors ADDPS may be written in C++ as the function _mm_add_ps. We can also define the appropriate class of vectors and overlod the + operator in it. Intrinsic functions exist in Microsoft, Intela and GNU compilers.

22 Examining compiled code Various reasons: Checking for evident places for rewriting by hand in assembly language (or for switching compiler flag, e.g. -O3 ;-) Use compiler as an intelligent typist, and the resulting code as more comfortable base than staring form nothing. This code at least has correct interfaces with environment, and they give us usually most troubles. And sometimes we will discover an error in compiler.

23 Examining compiled code Let us look at the loop for (int i = 0; i <= 15; i++) T[i] := i; The compiler should logically replace it by for (int i = 15; i >= 0; i--) T[i] := i; Reason: we save at a comparison instruction (with 15), because subtraction already set zero flag.

24 Examining compiled code But when loop body is much more complicated, it could be difficult for the compiler to decide, whether it may change the order of passing. Then we have to do it ourselves!

25 Intel C++ compiler (parallel composer) Intrinsics for vectors, automatic vectorization. OpenMP and automatic parallelization of threads. CPU dispatch: different versions for different processors. The best optimized mathematical libraries (but once they could not divide correctly). Drawback: the code may execute slower on AMD and VIA processors, then you should bypass dispatch.

26 GNU compiler Intrinsics for vectors, automatic vectorization. OpenMP and automatic parallelization of threads. Library optimization waits for its turn. But it accepts mathematical vector libraries of AMD and Intela.

27 Hardware restrictions On classic ARM registers are 32-bit wide. You should avoid types char and short for loop counters, because then one has to check ranges by hand, e.g. for instruction short i;... i++; must generate code to check each time, whether there is no overflow, and possibly roll to zero. As registers are 32-bit wide, so there is no signalling of overflow/carry for 16-bit numbers. Here also the compiler defenceless. Of course on x86 processor we do not have these problems (AL, AX).

28 Dependent instructions The total time of execution of a sequence of dependent instructions (same arguments and/or results) is equal to the sum of their latency necessary number of cycles. If instructions are independent, then next instruction starts earlier and the total time is decreased, for example code double list[100], sum = 0.0; for (int i = 0; i < 100; i++) sum += list[i]; should be replaced by double list[100], sum1 = 0.0, sum2 = 0.0, sum3 = 0.0, sum4 = 0. for (int i = 0; i < 100; i += 4) { sum1 += list[i]; sum2 += list[i+1]; sum3 += list[i+2]; sum4 += list[i+3]; } sum1 = (sum1 + sum2) + (sum3 + sum4);

29 Dependencies Sometimes it looks strange, for example the assignment instruction y = a + b + c + d; is better replaced by y = (a + b) + (c + d); The specification of many programming language forces the compiler to always compute the experssions form left to right (e.g. to have always the same rounding orders) and the compiler may not do anything.

30 Partial registers Some CPUs implement out of order execution, but are not able to rename partial registers (ax, ah, al). This causes the delay in the code below, because the third instruction has to wait for higher 16 bits from multiplication imul eax,6 mov [mem2],eax mov ax,[mem3] add ax,2 mov [mem4],ax If we replace this instruction by movzx eax,[mem3] the dependency is removed. ;16-bit operands It could be one of reasons for doing it automatically for 32-bit transfers in 64-bit mode.

31 Changing order of execution Mostly on strongly pipelined RISCs (e.g. ARM), forced by specific of a processor On ARM9TDMI after the memory load instruction (e.g. LDR) the loaded value should not be used for two cycles. Multiplication takes the same time as multiplication with accumulation (MLA). Conclusion obvious. On ARM10E instructions of multiple load from memory and store to it work in the background. Superficially they take uone cycle, unless we try to use one of these register in the following instruction. On Intel XScale the instruction LDRD loads two words at once (in one cycle. But the first register should not be used for two following cycles, and the second one for three cycles.

32 Jumps and procedures Fetching code after (unexpected) jump generates delays on the order of 1 3 cycles. The delay is largest when the destination address falls on the end of 16-byte block (frame). Paradox: it is sometimes worthy to replace in the code earlier the shorter form of a instruction with the longer one to get the alignment. To predict the returns from procedures (ret) processor uses so called return stack buffer, usually with 16 elements. Do not fool the mechanism by jumping out of procedures or secretly removing return addresses from the stack (or using ret as indirect jump). Reduction calls (tail calls) are implemented with jumps!

33 Metaprogramming Instead of writing twisted assembler macros or overabuse m4 it is better to write programs, which generate other programs or their parts: Table generators for sinus, cosinus or leap years Converters of bitmaps into fast display procedures Gettting different aspect from the same code (aspect-oriented programming) Specialized code in assembler based on script written in Scheme or other language and on additional constraints.

34 Tuning: tools AMD Code Analyst Intel VTune New-Jersey Machine-Code Toolkit (w ML) nr/toolkit/

Practical Malware Analysis

Practical Malware Analysis Practical Malware Analysis Ch 4: A Crash Course in x86 Disassembly Revised 1-16-7 Basic Techniques Basic static analysis Looks at malware from the outside Basic dynamic analysis Only shows you how the

More information

Kampala August, Agner Fog

Kampala August, Agner Fog Advanced microprocessor optimization Kampala August, 2007 Agner Fog www.agner.org Agenda Intel and AMD microprocessors Out Of Order execution Branch prediction Platform, 32 or 64 bits Choice of compiler

More information

Binghamton University. CS-220 Spring x86 Assembler. Computer Systems: Sections

Binghamton University. CS-220 Spring x86 Assembler. Computer Systems: Sections x86 Assembler Computer Systems: Sections 3.1-3.5 Disclaimer I am not an x86 assembler expert. I have never written an x86 assembler program. (I am proficient in IBM S/360 Assembler and LC3 Assembler.)

More information

CNIT 127: Exploit Development. Ch 1: Before you begin. Updated

CNIT 127: Exploit Development. Ch 1: Before you begin. Updated CNIT 127: Exploit Development Ch 1: Before you begin Updated 1-14-16 Basic Concepts Vulnerability A flaw in a system that allows an attacker to do something the designer did not intend, such as Denial

More information

The Instruction Set. Chapter 5

The Instruction Set. Chapter 5 The Instruction Set Architecture Level(ISA) Chapter 5 1 ISA Level The ISA level l is the interface between the compilers and the hardware. (ISA level code is what a compiler outputs) 2 Memory Models An

More information

Selected Machine Language. Instructions Introduction Inc and dec Instructions

Selected Machine Language. Instructions Introduction Inc and dec Instructions Selected Machine Language 10 Instructions 10.1 Introduction As may have been learned from a computer organization text, there are many considerations that need to be taken into account and many different

More information

CPU. Fall 2003 CSE 207 Digital Design Project #4 R0 R1 R2 R3 R4 R5 R6 R7 PC STATUS IR. Control Logic RAM MAR MDR. Internal Processor Bus

CPU. Fall 2003 CSE 207 Digital Design Project #4 R0 R1 R2 R3 R4 R5 R6 R7 PC STATUS IR. Control Logic RAM MAR MDR. Internal Processor Bus http://www.engr.uconn.edu/~barry/cse207/fa03/project4.pdf Page 1 of 16 Fall 2003 CSE 207 Digital Design Project #4 Background Microprocessors are increasingly common in every day devices. Desktop computers

More information

Machine and Assembly Language Principles

Machine and Assembly Language Principles Machine and Assembly Language Principles Assembly language instruction is synonymous with a machine instruction. Therefore, need to understand machine instructions and on what they operate - the architecture.

More information

Computer Organization CS 206 T Lec# 2: Instruction Sets

Computer Organization CS 206 T Lec# 2: Instruction Sets Computer Organization CS 206 T Lec# 2: Instruction Sets Topics What is an instruction set Elements of instruction Instruction Format Instruction types Types of operations Types of operand Addressing mode

More information

x86 assembly CS449 Fall 2017

x86 assembly CS449 Fall 2017 x86 assembly CS449 Fall 2017 x86 is a CISC CISC (Complex Instruction Set Computer) e.g. x86 Hundreds of (complex) instructions Only a handful of registers RISC (Reduced Instruction Set Computer) e.g. MIPS

More information

Reverse Engineering II: Basics. Gergely Erdélyi Senior Antivirus Researcher

Reverse Engineering II: Basics. Gergely Erdélyi Senior Antivirus Researcher Reverse Engineering II: Basics Gergely Erdélyi Senior Antivirus Researcher Agenda Very basics Intel x86 crash course Basics of C Binary Numbers Binary Numbers 1 Binary Numbers 1 0 1 1 Binary Numbers 1

More information

CSE351 Spring 2018, Midterm Exam April 27, 2018

CSE351 Spring 2018, Midterm Exam April 27, 2018 CSE351 Spring 2018, Midterm Exam April 27, 2018 Please do not turn the page until 11:30. Last Name: First Name: Student ID Number: Name of person to your left: Name of person to your right: Signature indicating:

More information

Instruction Sets: Characteristics and Functions Addressing Modes

Instruction Sets: Characteristics and Functions Addressing Modes Instruction Sets: Characteristics and Functions Addressing Modes Chapters 10 and 11, William Stallings Computer Organization and Architecture 7 th Edition What is an Instruction Set? The complete collection

More information

Memory Models. Registers

Memory Models. Registers Memory Models Most machines have a single linear address space at the ISA level, extending from address 0 up to some maximum, often 2 32 1 bytes or 2 64 1 bytes. Some machines have separate address spaces

More information

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions?

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions? administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions? exam on Wednesday today s material not on the exam 1 Assembly Assembly is programming

More information

Martin Kruliš, v

Martin Kruliš, v Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal

More information

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08 CS412/CS413 Introduction to Compilers Tim Teitelbaum Lecture 21: Generating Pentium Code 10 March 08 CS 412/413 Spring 2008 Introduction to Compilers 1 Simple Code Generation Three-address code makes it

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 4

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 4 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2018 Lecture 4 LAST TIME Enhanced our processor design in several ways Added branching support Allows programs where work is proportional to the input values

More information

Reverse Engineering II: The Basics

Reverse Engineering II: The Basics Reverse Engineering II: The Basics Gergely Erdélyi Senior Manager, Anti-malware Research Protecting the irreplaceable f-secure.com Binary Numbers 1 0 1 1 - Nibble B 1 0 1 1 1 1 0 1 - Byte B D 1 0 1 1 1

More information

CSE P 501 Compilers. x86 Lite for Compiler Writers Hal Perkins Autumn /25/ Hal Perkins & UW CSE J-1

CSE P 501 Compilers. x86 Lite for Compiler Writers Hal Perkins Autumn /25/ Hal Perkins & UW CSE J-1 CSE P 501 Compilers x86 Lite for Compiler Writers Hal Perkins Autumn 2011 10/25/2011 2002-11 Hal Perkins & UW CSE J-1 Agenda Learn/review x86 architecture Core 32-bit part only for now Ignore crufty, backward-compatible

More information

Summary: Direct Code Generation

Summary: Direct Code Generation Summary: Direct Code Generation 1 Direct Code Generation Code generation involves the generation of the target representation (object code) from the annotated parse tree (or Abstract Syntactic Tree, AST)

More information

Winter Compiler Construction T11 Activation records + Introduction to x86 assembly. Today. Tips for PA4. Today:

Winter Compiler Construction T11 Activation records + Introduction to x86 assembly. Today. Tips for PA4. Today: Winter 2006-2007 Compiler Construction T11 Activation records + Introduction to x86 assembly Mooly Sagiv and Roman Manevich School of Computer Science Tel-Aviv University Today ic IC Language Lexical Analysis

More information

Reverse Engineering II: The Basics

Reverse Engineering II: The Basics Reverse Engineering II: The Basics This document is only to be distributed to teachers and students of the Malware Analysis and Antivirus Technologies course and should only be used in accordance with

More information

Optimizing Memory Bandwidth

Optimizing Memory Bandwidth Optimizing Memory Bandwidth Don t settle for just a byte or two. Grab a whole fistful of cache. Mike Wall Member of Technical Staff Developer Performance Team Advanced Micro Devices, Inc. make PC performance

More information

CSE 351 Midterm - Winter 2015 Solutions

CSE 351 Midterm - Winter 2015 Solutions CSE 351 Midterm - Winter 2015 Solutions February 09, 2015 Please read through the entire examination first! We designed this exam so that it can be completed in 50 minutes and, hopefully, this estimate

More information

Lecture 2 Assembly Language

Lecture 2 Assembly Language Lecture 2 Assembly Language Computer and Network Security 9th of October 2017 Computer Science and Engineering Department CSE Dep, ACS, UPB Lecture 2, Assembly Language 1/37 Recap: Explorations Tools assembly

More information

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

22 Assembly Language for Intel-Based Computers, 4th Edition. 3. Each edge is a transition from one state to another, caused by some input.

22 Assembly Language for Intel-Based Computers, 4th Edition. 3. Each edge is a transition from one state to another, caused by some input. 22 Assembly Language for Intel-Based Computers, 4th Edition 6.6 Application: Finite-State Machines 1. A directed graph (also known as a diagraph). 2. Each node is a state. 3. Each edge is a transition

More information

CSC 8400: Computer Systems. Machine-Level Representation of Programs

CSC 8400: Computer Systems. Machine-Level Representation of Programs CSC 8400: Computer Systems Machine-Level Representation of Programs Towards the Hardware High-level language (Java) High-level language (C) assembly language machine language (IA-32) 1 Compilation Stages

More information

CSC 2400: Computer Systems. Towards the Hardware: Machine-Level Representation of Programs

CSC 2400: Computer Systems. Towards the Hardware: Machine-Level Representation of Programs CSC 2400: Computer Systems Towards the Hardware: Machine-Level Representation of Programs Towards the Hardware High-level language (Java) High-level language (C) assembly language machine language (IA-32)

More information

X86 Addressing Modes Chapter 3" Review: Instructions to Recognize"

X86 Addressing Modes Chapter 3 Review: Instructions to Recognize X86 Addressing Modes Chapter 3" Review: Instructions to Recognize" 1 Arithmetic Instructions (1)! Two Operand Instructions" ADD Dest, Src Dest = Dest + Src SUB Dest, Src Dest = Dest - Src MUL Dest, Src

More information

EECE.3170: Microprocessor Systems Design I Summer 2017 Homework 4 Solution

EECE.3170: Microprocessor Systems Design I Summer 2017 Homework 4 Solution 1. (40 points) Write the following subroutine in x86 assembly: Recall that: int f(int v1, int v2, int v3) { int x = v1 + v2; urn (x + v3) * (x v3); Subroutine arguments are passed on the stack, and can

More information

Computer Systems Lecture 9

Computer Systems Lecture 9 Computer Systems Lecture 9 CPU Registers in x86 CPU status flags EFLAG: The Flag register holds the CPU status flags The status flags are separate bits in EFLAG where information on important conditions

More information

Computer System Architecture

Computer System Architecture CSC 203 1.5 Computer System Architecture Department of Statistics and Computer Science University of Sri Jayewardenepura Addressing 2 Addressing Subject of specifying where the operands (addresses) are

More information

Reverse Engineering Low Level Software. CS5375 Software Reverse Engineering Dr. Jaime C. Acosta

Reverse Engineering Low Level Software. CS5375 Software Reverse Engineering Dr. Jaime C. Acosta 1 Reverse Engineering Low Level Software CS5375 Software Reverse Engineering Dr. Jaime C. Acosta Machine code 2 3 Machine code Assembly compile Machine Code disassemble 4 Machine code Assembly compile

More information

Representation of Information

Representation of Information Representation of Information CS61, Lecture 2 Prof. Stephen Chong September 6, 2011 Announcements Assignment 1 released Posted on http://cs61.seas.harvard.edu/ Due one week from today, Tuesday 13 Sept

More information

Assembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit

Assembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit Assembly Language for Intel-Based Computers, 4 th Edition Kip R. Irvine Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit Slides prepared by Kip R. Irvine Revision date: 09/25/2002

More information

CS241 Computer Organization Spring 2015 IA

CS241 Computer Organization Spring 2015 IA CS241 Computer Organization Spring 2015 IA-32 2-10 2015 Outline! Review HW#3 and Quiz#1! More on Assembly (IA32) move instruction (mov) memory address computation arithmetic & logic instructions (add,

More information

Computer Organization and Technology Processor and System Structures

Computer Organization and Technology Processor and System Structures Computer Organization and Technology Processor and System Structures Assoc. Prof. Dr. Wattanapong Kurdthongmee Division of Computer Engineering, School of Engineering and Resources, Walailak University

More information

Important From Last Time

Important From Last Time Important From Last Time Embedded C Pros and cons Macros and how to avoid them Intrinsics Interrupt syntax Inline assembly Today Advanced C What C programs mean How to create C programs that mean nothing

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 8: Introduction to code generation Viktor Leijon Slides largely by Johan Nordlander with material generously provided by Mark P. Jones. 1 What is a Compiler? Compilers

More information

3.1 DATA MOVEMENT INSTRUCTIONS 45

3.1 DATA MOVEMENT INSTRUCTIONS 45 3.1.1 General-Purpose Data Movement s 45 3.1.2 Stack Manipulation... 46 3.1.3 Type Conversion... 48 3.2.1 Addition and Subtraction... 51 3.1 DATA MOVEMENT INSTRUCTIONS 45 MOV (Move) transfers a byte, word,

More information

Branching and Looping

Branching and Looping Branching and Looping Ray Seyfarth August 10, 2011 Branching and looping So far we have only written straight line code Conditional moves helped spice things up In addition conditional moves kept the pipeline

More information

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam Assembly Language Lecture 2 - x86 Processor Architecture Ahmed Sallam Introduction to the course Outcomes of Lecture 1 Always check the course website Don t forget the deadline rule!! Motivations for studying

More information

Static Analysis I PAOLO PALUMBO, F-SECURE CORPORATION

Static Analysis I PAOLO PALUMBO, F-SECURE CORPORATION Static Analysis I PAOLO PALUMBO, F-SECURE CORPORATION Representing Data Binary numbers 1 0 1 1 NIBBLE 0xB 1 0 1 1 1 1 0 1 0xBD 1 0 1 1 1 1 0 1 0 0 1 1 1 0 0 1 BYTE WORD 0xBD 0x39 Endianness c9 33 41 03

More information

Computer Systems Laboratory Sungkyunkwan University

Computer Systems Laboratory Sungkyunkwan University ARM & IA-32 Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ARM (1) ARM & MIPS similarities ARM: the most popular embedded core Similar basic set

More information

Islamic University Gaza Engineering Faculty Department of Computer Engineering ECOM 2125: Assembly Language LAB

Islamic University Gaza Engineering Faculty Department of Computer Engineering ECOM 2125: Assembly Language LAB Islamic University Gaza Engineering Faculty Department of Computer Engineering ECOM 2125: Assembly Language LAB Lab # 9 Integer Arithmetic and Bit Manipulation April, 2014 1 Assembly Language LAB Bitwise

More information

Important From Last Time

Important From Last Time Important From Last Time Embedded C Ø Pros and cons Macros and how to avoid them Intrinsics Interrupt syntax Inline assembly Today Advanced C What C programs mean How to create C programs that mean nothing

More information

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth Registers Ray Seyfarth September 8, 2011 Outline 1 Register basics 2 Moving a constant into a register 3 Moving a value from memory into a register 4 Moving values from a register into memory 5 Moving

More information

Name: CMSC 313 Fall 2001 Computer Organization & Assembly Language Programming Exam 1. Question Points I. /34 II. /30 III.

Name: CMSC 313 Fall 2001 Computer Organization & Assembly Language Programming Exam 1. Question Points I. /34 II. /30 III. CMSC 313 Fall 2001 Computer Organization & Assembly Language Programming Exam 1 Name: Question Points I. /34 II. /30 III. /36 TOTAL: /100 Instructions: 1. This is a closed-book, closed-notes exam. 2. You

More information

T Jarkko Turkulainen, F-Secure Corporation

T Jarkko Turkulainen, F-Secure Corporation T-110.6220 2010 Emulators and disassemblers Jarkko Turkulainen, F-Secure Corporation Agenda Disassemblers What is disassembly? What makes up an instruction? How disassemblers work Use of disassembly In

More information

Review addressing modes

Review addressing modes Review addressing modes Op Src Dst Comments movl $0, %rax Register movl $0, 0x605428 Direct address movl $0, (%rcx) Indirect address movl $0, 20(%rsp) Indirect with displacement movl $0, -8(%rdi, %rax,

More information

CSE2421 FINAL EXAM SPRING Name KEY. Instructions: Signature

CSE2421 FINAL EXAM SPRING Name KEY. Instructions: Signature CSE2421 FINAL EXAM SPRING 2013 Name KEY Instructions: This is a closed-book, closed-notes, closed-neighbor exam. Only a writing utensil is needed for this exam. No calculators allowed. If you need to go

More information

Process Layout and Function Calls

Process Layout and Function Calls Process Layout and Function Calls CS 6 Spring 07 / 8 Process Layout in Memory Stack grows towards decreasing addresses. is initialized at run-time. Heap grow towards increasing addresses. is initialized

More information

3.0 Instruction Set. 3.1 Overview

3.0 Instruction Set. 3.1 Overview 3.0 Instruction Set 3.1 Overview There are 16 different P8 instructions. Research on instruction set usage was the basis for instruction selection. Each instruction has at least two addressing modes, with

More information

Introduction to C. Why C? Difference between Python and C C compiler stages Basic syntax in C

Introduction to C. Why C? Difference between Python and C C compiler stages Basic syntax in C Final Review CS304 Introduction to C Why C? Difference between Python and C C compiler stages Basic syntax in C Pointers What is a pointer? declaration, &, dereference... Pointer & dynamic memory allocation

More information

Real instruction set architectures. Part 2: a representative sample

Real instruction set architectures. Part 2: a representative sample Real instruction set architectures Part 2: a representative sample Some historical architectures VAX: Digital s line of midsize computers, dominant in academia in the 70s and 80s Characteristics: Variable-length

More information

Assembly Language. Lecture 2 x86 Processor Architecture

Assembly Language. Lecture 2 x86 Processor Architecture Assembly Language Lecture 2 x86 Processor Architecture Ahmed Sallam Slides based on original lecture slides by Dr. Mahmoud Elgayyar Introduction to the course Outcomes of Lecture 1 Always check the course

More information

SOEN228, Winter Revision 1.2 Date: October 25,

SOEN228, Winter Revision 1.2 Date: October 25, SOEN228, Winter 2003 Revision 1.2 Date: October 25, 2003 1 Contents Flags Mnemonics Basic I/O Exercises Overview of sample programs 2 Flag Register The flag register stores the condition flags that retain

More information

Lab 4: Basic Instructions and Addressing Modes

Lab 4: Basic Instructions and Addressing Modes COE 205 Lab Manual Lab 4: Basic Instructions and Addressing Modes - page 36 Lab 4: Basic Instructions and Addressing Modes Contents 4.1. Data Transfer Instructions 4.2. Addition and Subtraction 4.3. Data

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU 1-6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU Product Overview Introduction 1. ARCHITECTURE OVERVIEW The Cyrix 6x86 CPU is a leader in the sixth generation of high

More information

CS , Spring 2004 Exam 1

CS , Spring 2004 Exam 1 Andrew login ID: Full Name: CS 15-213, Spring 2004 Exam 1 February 26, 2004 Instructions: Make sure that your exam is not missing any sheets (there should be 15), then write your full name and Andrew login

More information

Program Exploitation Intro

Program Exploitation Intro Program Exploitation Intro x86 Assembly 04//2018 Security 1 Univeristà Ca Foscari, Venezia What is Program Exploitation "Making a program do something unexpected and not planned" The right bugs can be

More information

Page 1. Today. Important From Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right?

Page 1. Today. Important From Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right? Important From Last Time Today Embedded C Pros and cons Macros and how to avoid them Intrinsics Interrupt syntax Inline assembly Advanced C What C programs mean How to create C programs that mean nothing

More information

Defining and Using Simple Data Types

Defining and Using Simple Data Types 85 CHAPTER 4 Defining and Using Simple Data Types This chapter covers the concepts essential for working with simple data types in assembly-language programs The first section shows how to declare integer

More information

Digital Forensics Lecture 3 - Reverse Engineering

Digital Forensics Lecture 3 - Reverse Engineering Digital Forensics Lecture 3 - Reverse Engineering Low-Level Software Akbar S. Namin Texas Tech University Spring 2017 Reverse Engineering High-Level Software Low-level aspects of software are often the

More information

CSIS1120A. 10. Instruction Set & Addressing Mode. CSIS1120A 10. Instruction Set & Addressing Mode 1

CSIS1120A. 10. Instruction Set & Addressing Mode. CSIS1120A 10. Instruction Set & Addressing Mode 1 CSIS1120A 10. Instruction Set & Addressing Mode CSIS1120A 10. Instruction Set & Addressing Mode 1 Elements of a Machine Instruction Operation Code specifies the operation to be performed, e.g. ADD, SUB

More information

Assembly Language Programming 64-bit environments

Assembly Language Programming 64-bit environments Assembly Language Programming 64-bit environments October 17, 2017 Some recent history Intel together with HP start to work on 64-bit processor using VLIW technology. Itanium processor is born with the

More information

Functions. Ray Seyfarth. August 4, Bit Intel Assembly Language c 2011 Ray Seyfarth

Functions. Ray Seyfarth. August 4, Bit Intel Assembly Language c 2011 Ray Seyfarth Functions Ray Seyfarth August 4, 2011 Functions We will write C compatible function C++ can also call C functions using extern "C" {...} It is generally not sensible to write complete assembly programs

More information

High Performance Computing: Tools and Applications

High Performance Computing: Tools and Applications High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 8 Processor-level SIMD SIMD instructions can perform

More information

Registers. Registers

Registers. Registers All computers have some registers visible at the ISA level. They are there to control execution of the program hold temporary results visible at the microarchitecture level, such as the Top Of Stack (TOS)

More information

HPC VT Machine-dependent Optimization

HPC VT Machine-dependent Optimization HPC VT 2013 Machine-dependent Optimization Last time Choose good data structures Reduce number of operations Use cheap operations strength reduction Avoid too many small function calls inlining Use compiler

More information

Machine/Assembler Language Putting It All Together

Machine/Assembler Language Putting It All Together COMP 40: Machine Structure and Assembly Language Programming Fall 2015 Machine/Assembler Language Putting It All Together Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah

More information

2.7 Supporting Procedures in hardware. Why procedures or functions? Procedure calls

2.7 Supporting Procedures in hardware. Why procedures or functions? Procedure calls 2.7 Supporting Procedures in hardware Why procedures or functions? Procedure calls Caller: Callee: Proc save registers save more registers set up parameters do function call procedure set up results get

More information

EJEMPLOS DE ARQUITECTURAS

EJEMPLOS DE ARQUITECTURAS Maestría en Electrónica Arquitectura de Computadoras Unidad 4 EJEMPLOS DE ARQUITECTURAS M. C. Felipe Santiago Espinosa Marzo/2017 ARM & MIPS Similarities ARM: the most popular embedded core Similar basic

More information

reply db y prompt db Enter your favourite colour:, 0 colour db 80 dup(?) i db 20 k db? num dw 4000 large dd 50000

reply db y prompt db Enter your favourite colour:, 0 colour db 80 dup(?) i db 20 k db? num dw 4000 large dd 50000 Declaring Variables in Assembly Language As in Java, variables must be declared before they can be used Unlike Java, we do not specify a variable type in the declaration in assembly language Instead we

More information

Objectives. ICT106 Fundamentals of Computer Systems Topic 8. Procedures, Calling and Exit conventions, Run-time Stack Ref: Irvine, Ch 5 & 8

Objectives. ICT106 Fundamentals of Computer Systems Topic 8. Procedures, Calling and Exit conventions, Run-time Stack Ref: Irvine, Ch 5 & 8 Objectives ICT106 Fundamentals of Computer Systems Topic 8 Procedures, Calling and Exit conventions, Run-time Stack Ref: Irvine, Ch 5 & 8 To understand how HLL procedures/functions are actually implemented

More information

CSCE 212H, Spring 2008 Lab Assignment 3: Assembly Language Assigned: Feb. 7, Due: Feb. 14, 11:59PM

CSCE 212H, Spring 2008 Lab Assignment 3: Assembly Language Assigned: Feb. 7, Due: Feb. 14, 11:59PM CSCE 212H, Spring 2008 Lab Assignment 3: Assembly Language Assigned: Feb. 7, Due: Feb. 14, 11:59PM February 7, 2008 1 Overview The purpose of this assignment is to introduce you to the assembly language

More information

Instruction Set Architectures

Instruction Set Architectures Instruction Set Architectures! ISAs! Brief history of processors and architectures! C, assembly, machine code! Assembly basics: registers, operands, move instructions 1 What should the HW/SW interface

More information

Module 3 Instruction Set Architecture (ISA)

Module 3 Instruction Set Architecture (ISA) Module 3 Instruction Set Architecture (ISA) I S A L E V E L E L E M E N T S O F I N S T R U C T I O N S I N S T R U C T I O N S T Y P E S N U M B E R O F A D D R E S S E S R E G I S T E R S T Y P E S O

More information

Compiler construction. x86 architecture. This lecture. Lecture 6: Code generation for x86. x86: assembly for a real machine.

Compiler construction. x86 architecture. This lecture. Lecture 6: Code generation for x86. x86: assembly for a real machine. This lecture Compiler construction Lecture 6: Code generation for x86 Magnus Myreen Spring 2018 Chalmers University of Technology Gothenburg University x86 architecture s Some x86 instructions From LLVM

More information

UNIT- 5. Chapter 12 Processor Structure and Function

UNIT- 5. Chapter 12 Processor Structure and Function UNIT- 5 Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data CPU With Systems Bus CPU Internal Structure Registers

More information

CS , Fall 2004 Exam 1

CS , Fall 2004 Exam 1 Andrew login ID: Full Name: CS 15-213, Fall 2004 Exam 1 Tuesday October 12, 2004 Instructions: Make sure that your exam is not missing any sheets, then write your full name and Andrew login ID on the front.

More information

What is a Compiler? Compiler Construction SMD163. Why Translation is Needed: Know your Target: Lecture 8: Introduction to code generation

What is a Compiler? Compiler Construction SMD163. Why Translation is Needed: Know your Target: Lecture 8: Introduction to code generation Compiler Construction SMD163 Lecture 8: Introduction to code generation Viktor Leijon & Peter Jonsson with slides by Johan Nordlander Contains material generously provided by Mark P. Jones What is a Compiler?

More information

Machine Programming 3: Procedures

Machine Programming 3: Procedures Machine Programming 3: Procedures CS61, Lecture 5 Prof. Stephen Chong September 15, 2011 Announcements Assignment 2 (Binary bomb) due next week If you haven t yet please create a VM to make sure the infrastructure

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function William Stallings Computer Organization and Architecture 8 th Edition Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data

More information

Interfacing Compiler and Hardware. Computer Systems Architecture. Processor Types And Instruction Sets. What Instructions Should A Processor Offer?

Interfacing Compiler and Hardware. Computer Systems Architecture. Processor Types And Instruction Sets. What Instructions Should A Processor Offer? Interfacing Compiler and Hardware Computer Systems Architecture FORTRAN 90 program C++ program Processor Types And Sets FORTRAN 90 Compiler C++ Compiler set level Hardware 1 2 What s Should A Processor

More information

RAID 0 (non-redundant) RAID Types 4/25/2011

RAID 0 (non-redundant) RAID Types 4/25/2011 Exam 3 Review COMP375 Topics I/O controllers chapter 7 Disk performance section 6.3-6.4 RAID section 6.2 Pipelining section 12.4 Superscalar chapter 14 RISC chapter 13 Parallel Processors chapter 18 Security

More information

Computer Architecture Prof. Smruti Ranjan Sarangi Department of Computer Science and Engineering Indian Institute of Technology, Delhi

Computer Architecture Prof. Smruti Ranjan Sarangi Department of Computer Science and Engineering Indian Institute of Technology, Delhi Computer Architecture Prof. Smruti Ranjan Sarangi Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture - 11 X86 Assembly Language Part-II (Refer Slide Time: 00:25)

More information

Machine Language, Assemblers and Linkers"

Machine Language, Assemblers and Linkers Machine Language, Assemblers and Linkers 1 Goals for this Lecture Help you to learn about: IA-32 machine language The assembly and linking processes 2 1 Why Learn Machine Language Last stop on the language

More information

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown

More information

How Software Executes

How Software Executes How Software Executes CS-576 Systems Security Instructor: Georgios Portokalidis Overview Introduction Anatomy of a program Basic assembly Anatomy of function calls (and returns) Memory Safety Intel x86

More information

MARIE: An Introduction to a Simple Computer

MARIE: An Introduction to a Simple Computer MARIE: An Introduction to a Simple Computer 4.2 CPU Basics The computer s CPU fetches, decodes, and executes program instructions. The two principal parts of the CPU are the datapath and the control unit.

More information

EECS 213 Introduction to Computer Systems Dinda, Spring Homework 3. Memory and Cache

EECS 213 Introduction to Computer Systems Dinda, Spring Homework 3. Memory and Cache Homework 3 Memory and Cache 1. Reorder the fields in this structure so that the structure will (a) consume the most space and (b) consume the least space on an IA32 machine on Linux. struct foo { double

More information

Spectre and Meltdown. Clifford Wolf q/talk

Spectre and Meltdown. Clifford Wolf q/talk Spectre and Meltdown Clifford Wolf q/talk 2018-01-30 Spectre and Meltdown Spectre (CVE-2017-5753 and CVE-2017-5715) Is an architectural security bug that effects most modern processors with speculative

More information

Microprocessor and Assembly Language Week-5. System Programming, BCS 6th, IBMS (2017)

Microprocessor and Assembly Language Week-5. System Programming, BCS 6th, IBMS (2017) Microprocessor and Assembly Language Week-5 System Programming, BCS 6th, IBMS (2017) High Speed Memory Registers CPU store data temporarily in these location CPU process, store and transfer data from one

More information

Inline Assembler. Willi-Hans Steeb and Yorick Hardy. International School for Scientific Computing

Inline Assembler. Willi-Hans Steeb and Yorick Hardy. International School for Scientific Computing Inline Assembler Willi-Hans Steeb and Yorick Hardy International School for Scientific Computing e-mail: steebwilli@gmail.com Abstract We provide a collection of inline assembler programs. 1 Using the

More information

17. Instruction Sets: Characteristics and Functions

17. Instruction Sets: Characteristics and Functions 17. Instruction Sets: Characteristics and Functions Chapter 12 Spring 2016 CS430 - Computer Architecture 1 Introduction Section 12.1, 12.2, and 12.3 pp. 406-418 Computer Designer: Machine instruction set

More information

Assembly III: Procedures. Jo, Heeseung

Assembly III: Procedures. Jo, Heeseung Assembly III: Procedures Jo, Heeseung IA-32 Stack (1) Characteristics Region of memory managed with stack discipline Grows toward lower addresses Register indicates lowest stack address - address of top

More information