Assembly Language Programming Optimization
- Norah Wilkerson
- 5 years ago
Transcription
1 Assembly Language Programming Optimization December 9, 2017
2 Conditional transfer Sometimes we make a comparison only to execute a single assignment depending on its result. In that case we can use a conditional move instruction, which performs the assignment only if the indicated condition is satisfied; e.g. the instruction cmove eax,ebx copies ebx to eax only if the recently compared elements were equal (the source of CMOVcc must be a register or memory operand, not an immediate). The main advantage is that we avoid the need to flush the pipeline or to rely on speculative execution. Conditional assignment: SETcc. Conditional transfer: CMOVcc.
3 Conditional transfer: an example Find the maximum of two numbers (arguments in EAX and EBX, result in ECX). Note that cmova compares the values as unsigned; for signed numbers use cmovg.
    mov ecx,eax
    cmp ebx,ecx
    cmova ecx,ebx
4 Conditional transfers: errors Assume we are compiling the C expression
    int *xp; ... return (xp ? *xp : 0);
If xp is in rdi, we could try
    xor eax,eax        ;Maybe we will return zero
    test rdi,rdi       ;xp == 0?
    cmovne eax,[rdi]   ;Maybe we will return *xp
But then the dereference of xp always occurs (even for the NULL pointer), and this is what we want to avoid.
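One common way around the problem at the C level is to select the pointer instead of the value, so that the load always hits a valid address. The sketch below is only an illustration: the names zero_cell and deref_or_zero are made up here, and whether a given compiler actually turns the selection into CMOV depends on the compiler and its flags.

```c
#include <stddef.h>

/* Hypothetical helper cell: a readable location that stands in for NULL. */
static const int zero_cell = 0;

/* Returns *xp, or 0 when xp is NULL. The condition selects an address,
   so the single load that follows is always from valid memory. */
int deref_or_zero(const int *xp)
{
    const int *safe = xp ? xp : &zero_cell;
    return *safe;
}
```

For example, deref_or_zero(NULL) yields 0, and deref_or_zero(&v) yields the value of v.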
5 Jump avoidance Avoiding jumps is a larger problem. Let us look at the computation of the absolute value of a number:
    test eax,eax   ;We set flags
    jns skip       ;Positive sign
    neg eax
skip:
6 Jump avoidance There is a different way:
    mov ecx,eax
    sar ecx,31     ;sign bit everywhere
    xor eax,ecx    ;bit reverse
    sub eax,ecx    ;we subtract -1 and have 2-complement
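The same mask trick can be written in C. This is a minimal sketch under the assumption of 32-bit two's-complement integers; the function name abs_branchless is invented for the illustration, and the arithmetic is done on unsigned values so that the most negative input does not overflow.

```c
#include <stdint.h>

/* Branch-free absolute value, mirroring the SAR/XOR/SUB sequence.
   mask is 0 for non-negative x and 0xFFFFFFFF for negative x. */
uint32_t abs_branchless(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t mask = (uint32_t)0 - (ux >> 31); /* sign bit everywhere */
    return (ux ^ mask) - mask;                /* invert bits, then subtract -1 */
}
```

The result is returned as an unsigned value, so the input -2147483648 maps to 2147483648 instead of overflowing.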
7 Power of 2 Another trick: how to check whether a number in EAX is a power of two? (Note that zero also passes this test, so it has to be excluded separately if necessary.)
    mov ebx,eax    ;or lea ebx,[eax - 1]
    dec ebx
    test eax,ebx
    jnz isnot
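In C the same test is a one-liner. The sketch below (the function name is chosen here for illustration) adds an explicit check for zero, because x & (x - 1) is also 0 when x is 0.

```c
#include <stdint.h>

/* A power of two has exactly one bit set, so clearing the lowest set bit
   with x & (x - 1) must leave zero. Zero itself is rejected explicitly. */
int is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```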
8 Hints The processor tries to guess whether a conditional jump will be taken. With a static guess it is assumed that a jump backwards will be taken. We can help it using hints: the prefixes HT (0x3e) and HNT (0x2e), for example
    test ecx,ecx
    db 3eh         ;HT = we will jump
    jz L9
    ...
L9:
9 Hints Sometimes holding data in cache memory is not useful, for example if it is used only once. Direct write instructions (non-temporal stores) such as MOVNTI, MOVNTPD, etc. bypass the cache in the write phase.
10 Conservativity of compiler The C compiler must be conservative and generate code in such a way that all possible cases are covered. Example:
    void memclr (char *data, int n) {
        for (; n > 0; n--) *data++ = 0;
    }
If the compiler knew something about the alignment of data, it could generate code that zeroes 2, 4 or even 8 bytes in one step. However, it must assume the worst case.
11 Conservativity of compiler There are a few elements in C/C++ which are classic examples of slowing down programs. The group is led by the conversion (cast) from a real number to an integer, for example
    int i; float f; ... i = (int)f;
Such a conversion takes processor cycles. Reason: C/C++ defines a different way of rounding (truncation) than the default one implemented in the FPU, so we have to toggle the coprocessor mode.
12 Conservativity of compiler Another Oscar nominee is pointer aliasing. In the code below the compiler will not pull the evaluation of *p + 2 out before the loop:
    void Func1 (int a[], int *p) {
        int i;
        for (i = 0; i < 100; i++) a[i] = *p + 2;
    }
And it is right, because (hooray for C and C++ :-)
    void Func2() {
        int list[100];
        Func1(list, &list[8]);
    }
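The effect of aliasing is easy to observe. The sketch below keeps the loop from the slide (renamed func1) and adds a hand-hoisted variant, func1_hoisted, which is an invented name for this illustration. With p pointing into a, the two versions produce different arrays, which is exactly why the compiler may not do the hoisting on its own.

```c
/* The loop from the slide: *p is re-read on every iteration. */
void func1(int a[], int *p)
{
    for (int i = 0; i < 100; i++)
        a[i] = *p + 2;
}

/* Hand-hoisted variant: correct only if p does not point into a[]. */
void func1_hoisted(int a[], const int *p)
{
    const int v = *p + 2;   /* evaluated once, before the loop */
    for (int i = 0; i < 100; i++)
        a[i] = v;
}
```

If list[8] starts as 1 and p is &list[8], func1 stores 3 into elements 0 to 8 and then 5 into the rest, because a[8] is overwritten halfway through; func1_hoisted stores 3 everywhere.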
13 Conservativity of compiler Sometimes the recipes are simple. The code below fetches arg1->p1 from memory twice:
    struct S1 { int p1; };
    struct S2 { int p2, p3; };
    void f1 (struct S1 *arg1, struct S2 *arg2) {
        arg2->p2 += arg1->p1;
        arg2->p3 += arg1->p1;
    }
It must work this way, because arg2->p2 and arg1->p1 may be the same memory cell. But it is enough to introduce a local variable holding arg1->p1.
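A minimal sketch of that recipe follows; the function name f1_local and the variable name v are made up for the illustration. Copying the field into a local tells the compiler that the value cannot change between the two additions, provided the programmer knows the two structures do not overlap.

```c
struct S1 { int p1; };
struct S2 { int p2, p3; };

/* arg1->p1 is read once into a local, so both additions use the same value. */
void f1_local(struct S1 *arg1, struct S2 *arg2)
{
    const int v = arg1->p1;
    arg2->p2 += v;
    arg2->p3 += v;
}
```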
14 Assembler Assembler allows us to take advantage of low-level services:
- Registers and direct input/output.
- Violating the compiler conventions: different passing of parameters, violating the memory allocation rules, iterative call of procedures.
- Linking incompatible code fragments, e.g. built by different compilers.
- Code optimization by hand to adapt it to a very particular hardware configuration.
15 Extreme example (appetizer) The following code in C
    float a[4], b[4], c[4];
    for (int i = 0; i < 4; i++) {
        c[i] = a[i] > b[i] ? a[i] : b[i];
    }
can be optimally coded as follows
    movaps xmm0,[a]   ;Load a vector
    maxps xmm0,[b]    ;max(a,b)
    movaps [c],xmm0   ;c = a > b? a : b
16 Not enough registers or two in one We have two variables, index and increment, both 16-bit (short). On ARM they can be put into one register, with index in the top half. Then the C code
    elem = tab[index];
    index += increment;
could be written in assembler as
    LDRB Relem, [Rtab, Rindincr, LSR#16]
    ADD Rindincr, Rindincr, Rindincr, LSL#16
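The packing can be modelled in C to check the arithmetic. In this sketch the function names are invented, the table holds bytes (matching LDRB), index lives in the top 16 bits and increment in the bottom 16 bits; overflow of index simply wraps, as it does in the register.

```c
#include <stdint.h>

/* Pack index (top half) and increment (bottom half) into one 32-bit word. */
uint32_t pack_index_increment(uint16_t index, uint16_t increment)
{
    return ((uint32_t)index << 16) | increment;
}

/* Models LDRB Relem,[Rtab,Rindincr,LSR#16] followed by
   ADD Rindincr,Rindincr,Rindincr,LSL#16. */
uint8_t fetch_and_advance(const uint8_t *tab, uint32_t *indincr)
{
    uint8_t elem = tab[*indincr >> 16];  /* index = top 16 bits */
    *indincr += *indincr << 16;          /* adds increment to the top half */
    return elem;
}
```

Starting from index 1 and increment 3, two calls read tab[1] and tab[4], and the packed index ends at 7 while the increment stays 3.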
17 Intel/AMD The instruction set of CISC processors (x86) is not optimal, as confirmed by several changes of architecture philosophy. It must be preserved because of backward compatibility with systems from the 1980s, when RAM and disk memory were small and costly. But CISC also has some advantages. The compactness of code fits well with the requirements of cache memories with restricted sizes. The main problem of x86 processors is the lack of enough registers, alleviated a little when designing x86-64.
18 Graphics accelerators Demanding graphics applications need platforms with a graphics coprocessor or accelerator card. The computational power contained in them can also be used for other tasks, but this is another story (and it depends much on hardware).
19 64-bit code Advantages:
- More registers: usually no need to store variables and intermediate results in RAM.
- Efficient procedure calls: passing parameters in registers.
- 64-bit registers for integers.
- Better management of large memory blocks.
- Built-in restricted SIMD (SSE).
- Relative addressing of data, efficient relocatable code.
20 64-bit code Disadvantages:
- Twice larger addresses and stack positions: troubles with cache memory.
- The access to static and global arrays requires more instructions for large memory images. Mostly for Windows and Mac.
- More complicated computation of the effective memory address when the size is greater than 2GB.
- Some instructions are longer.
21 Intrinsic functions in C++ A new approach to joining code from different levels. Intrinsic functions represent processor instructions known to the compiler. Example: the addition of floating-point vectors, ADDPS, may be written in C++ as the function _mm_add_ps. We can also define an appropriate class of vectors and overload the + operator in it. Intrinsic functions exist in the Microsoft, Intel and GNU compilers.
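A small sketch of the intrinsic in use, written in C (the same header works from C++). The function name add4 and the macro HAVE_SSE_SKETCH are inventions of this example; unaligned loads and stores are used so that the arrays need no special alignment, and a plain loop is kept as a fallback for targets without SSE.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_SKETCH 1
#endif

/* c[0..3] = a[0..3] + b[0..3], using ADDPS where it is available. */
void add4(const float *a, const float *b, float *c)
{
#ifdef HAVE_SSE_SKETCH
    __m128 va = _mm_loadu_ps(a);            /* load four floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(c, _mm_add_ps(va, vb));   /* one vector addition */
#else
    for (int i = 0; i < 4; i++)             /* scalar fallback */
        c[i] = a[i] + b[i];
#endif
}
```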
22 Examining compiled code Various reasons:
- Checking for evident places for rewriting by hand in assembly language (or for switching a compiler flag, e.g. -O3 ;-)
- Using the compiler as an intelligent typist, and the resulting code as a more comfortable base than starting from nothing. This code at least has correct interfaces with the environment, and those usually give us the most trouble.
- And sometimes we will discover an error in the compiler.
23 Examining compiled code Let us look at the loop
    for (int i = 0; i <= 15; i++) T[i] = i;
The compiler should logically replace it by
    for (int i = 15; i >= 0; i--) T[i] = i;
Reason: we save a comparison instruction (with 15), because the subtraction already sets the flags.
24 Examining compiled code But when the loop body is much more complicated, it can be difficult for the compiler to decide whether it may change the direction of iteration. Then we have to do it ourselves!
25 Intel C++ compiler (parallel composer)
- Intrinsics for vectors, automatic vectorization.
- OpenMP and automatic parallelization of threads.
- CPU dispatch: different versions for different processors.
- The best optimized mathematical libraries (but once they could not divide correctly).
- Drawback: the code may execute slower on AMD and VIA processors; in that case you should bypass the dispatch.
26 GNU compiler
- Intrinsics for vectors, automatic vectorization.
- OpenMP and automatic parallelization of threads.
- Library optimization is still waiting for its turn. But it accepts the mathematical vector libraries of AMD and Intel.
27 Hardware restrictions On classic ARM, registers are 32 bits wide. You should avoid the types char and short for loop counters, because then ranges have to be checked by hand; e.g. for the instruction
    short i; ... i++;
the compiler must generate code that checks each time whether there is an overflow, and possibly wraps the value around. As registers are 32 bits wide, there is no signalling of overflow/carry for 16-bit numbers. Here, too, the compiler is defenceless. Of course on x86 processors we do not have these problems (AL, AX).
28 Dependent instructions The total execution time of a sequence of dependent instructions (same arguments and/or results) is equal to the sum of their latencies (the number of cycles each one needs). If instructions are independent, then the next instruction starts earlier and the total time is decreased. For example the code
    double list[100], sum = 0.0;
    for (int i = 0; i < 100; i++) sum += list[i];
should be replaced by
    double list[100], sum1 = 0.0, sum2 = 0.0, sum3 = 0.0, sum4 = 0.0;
    for (int i = 0; i < 100; i += 4) {
        sum1 += list[i];
        sum2 += list[i+1];
        sum3 += list[i+2];
        sum4 += list[i+3];
    }
    sum1 = (sum1 + sum2) + (sum3 + sum4);
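The slide's loop assumes a length divisible by four. A minimal generalisation, with a function name chosen here for illustration, adds a short tail loop for the leftover elements. Note that floating-point results may differ in the last bits from the single-accumulator version, because the order of additions changes.

```c
#include <stddef.h>

/* Sums list[0..n-1] with four independent accumulators so that
   consecutive additions do not wait for each other. */
double sum_four_accumulators(const double *list, size_t n)
{
    double s1 = 0.0, s2 = 0.0, s3 = 0.0, s4 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s1 += list[i];
        s2 += list[i + 1];
        s3 += list[i + 2];
        s4 += list[i + 3];
    }
    for (; i < n; i++)      /* tail: fewer than four elements left */
        s1 += list[i];
    return (s1 + s2) + (s3 + s4);
}
```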
29 Dependencies Sometimes it looks strange; for example, the assignment
    y = a + b + c + d;
is better replaced by
    y = (a + b) + (c + d);
The specification of many programming languages forces the compiler to always compute expressions from left to right (e.g. to always have the same order of rounding), so the compiler may not do anything about it.
30 Partial registers Some CPUs implement out-of-order execution, but are not able to rename partial registers (ax, ah, al). This causes a delay in the code below, because the third instruction has to wait for the higher 16 bits from the multiplication:
    imul eax,6
    mov [mem2],eax
    mov ax,[mem3]     ;16-bit operands
    add ax,2
    mov [mem4],ax
If we replace the third instruction by movzx eax,word [mem3], the dependency is removed. This could be one of the reasons why the upper half is zeroed automatically for 32-bit transfers in 64-bit mode.
31 Changing order of execution Mostly on strongly pipelined RISCs (e.g. ARM), forced by the specifics of a processor:
- On ARM9TDMI, after a memory load instruction (e.g. LDR) the loaded value should not be used for two cycles. Multiplication takes the same time as multiplication with accumulation (MLA). The conclusion is obvious.
- On ARM10E, the instructions for multiple load from memory and multiple store to it work in the background. Superficially they take one cycle, unless we try to use one of these registers in the following instruction.
- On Intel XScale, the instruction LDRD loads two words at once (in one cycle). But the first register should not be used for the two following cycles, and the second one for three cycles.
32 Jumps and procedures Fetching code after an (unexpected) jump generates delays on the order of 1 to 3 cycles. The delay is largest when the destination address falls at the end of a 16-byte block (frame). Paradox: it is sometimes worthwhile to replace an earlier instruction in the code with its longer form to get the alignment. To predict returns from procedures (ret) the processor uses a so-called return stack buffer, usually with 16 elements. Do not fool the mechanism by jumping out of procedures or secretly removing return addresses from the stack (or using ret as an indirect jump). Reduction calls (tail calls) are implemented with jumps!
33 Metaprogramming Instead of writing twisted assembler macros or abusing m4, it is better to write programs which generate other programs or their parts:
- Table generators for sine, cosine or leap years.
- Converters of bitmaps into fast display procedures.
- Getting different aspects from the same code (aspect-oriented programming).
- Specialized code in assembler based on a script written in Scheme or another language and on additional constraints.
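As an illustration of the first kind of generator, here is a minimal sketch in C that emits the source text of a leap-year lookup table. Everything about it is an assumption made for the example: the function names, the emitted array name leap_table, and the choice of writing into a caller-supplied buffer.

```c
#include <stdio.h>
#include <stddef.h>

/* Gregorian leap-year rule. */
int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Writes C source for a 0/1 table covering first..last into out.
   Returns the number of characters written, or -1 if out is too small. */
int generate_leap_table(char *out, size_t cap, int first, int last)
{
    int n = snprintf(out, cap,
                     "static const unsigned char leap_table[%d] = {",
                     last - first + 1);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t used = (size_t)n;
    for (int y = first; y <= last; y++) {
        n = snprintf(out + used, cap - used, "%s%d",
                     y == first ? "" : ",", is_leap_year(y));
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(out + used, cap - used, "};\n");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

The generated text can be written to a file and compiled into the program, so the table lookup replaces the divisions at run time.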
34 Tuning: tools
- AMD CodeAnalyst
- Intel VTune
- New-Jersey Machine-Code Toolkit (in ML) nr/toolkit/
More informationInstruction Set Architectures
Instruction Set Architectures! ISAs! Brief history of processors and architectures! C, assembly, machine code! Assembly basics: registers, operands, move instructions 1 What should the HW/SW interface
More informationModule 3 Instruction Set Architecture (ISA)
Module 3 Instruction Set Architecture (ISA) I S A L E V E L E L E M E N T S O F I N S T R U C T I O N S I N S T R U C T I O N S T Y P E S N U M B E R O F A D D R E S S E S R E G I S T E R S T Y P E S O
More informationCompiler construction. x86 architecture. This lecture. Lecture 6: Code generation for x86. x86: assembly for a real machine.
This lecture Compiler construction Lecture 6: Code generation for x86 Magnus Myreen Spring 2018 Chalmers University of Technology Gothenburg University x86 architecture s Some x86 instructions From LLVM
More informationUNIT- 5. Chapter 12 Processor Structure and Function
UNIT- 5 Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data CPU With Systems Bus CPU Internal Structure Registers
More informationCS , Fall 2004 Exam 1
Andrew login ID: Full Name: CS 15-213, Fall 2004 Exam 1 Tuesday October 12, 2004 Instructions: Make sure that your exam is not missing any sheets, then write your full name and Andrew login ID on the front.
More informationWhat is a Compiler? Compiler Construction SMD163. Why Translation is Needed: Know your Target: Lecture 8: Introduction to code generation
Compiler Construction SMD163 Lecture 8: Introduction to code generation Viktor Leijon & Peter Jonsson with slides by Johan Nordlander Contains material generously provided by Mark P. Jones What is a Compiler?
More informationMachine Programming 3: Procedures
Machine Programming 3: Procedures CS61, Lecture 5 Prof. Stephen Chong September 15, 2011 Announcements Assignment 2 (Binary bomb) due next week If you haven t yet please create a VM to make sure the infrastructure
More informationWilliam Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function
William Stallings Computer Organization and Architecture 8 th Edition Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data
More informationInterfacing Compiler and Hardware. Computer Systems Architecture. Processor Types And Instruction Sets. What Instructions Should A Processor Offer?
Interfacing Compiler and Hardware Computer Systems Architecture FORTRAN 90 program C++ program Processor Types And Sets FORTRAN 90 Compiler C++ Compiler set level Hardware 1 2 What s Should A Processor
More informationRAID 0 (non-redundant) RAID Types 4/25/2011
Exam 3 Review COMP375 Topics I/O controllers chapter 7 Disk performance section 6.3-6.4 RAID section 6.2 Pipelining section 12.4 Superscalar chapter 14 RISC chapter 13 Parallel Processors chapter 18 Security
More informationComputer Architecture Prof. Smruti Ranjan Sarangi Department of Computer Science and Engineering Indian Institute of Technology, Delhi
Computer Architecture Prof. Smruti Ranjan Sarangi Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture - 11 X86 Assembly Language Part-II (Refer Slide Time: 00:25)
More informationMachine Language, Assemblers and Linkers"
Machine Language, Assemblers and Linkers 1 Goals for this Lecture Help you to learn about: IA-32 machine language The assembly and linking processes 2 1 Why Learn Machine Language Last stop on the language
More informationEN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design
EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown
More informationHow Software Executes
How Software Executes CS-576 Systems Security Instructor: Georgios Portokalidis Overview Introduction Anatomy of a program Basic assembly Anatomy of function calls (and returns) Memory Safety Intel x86
More informationMARIE: An Introduction to a Simple Computer
MARIE: An Introduction to a Simple Computer 4.2 CPU Basics The computer s CPU fetches, decodes, and executes program instructions. The two principal parts of the CPU are the datapath and the control unit.
More informationEECS 213 Introduction to Computer Systems Dinda, Spring Homework 3. Memory and Cache
Homework 3 Memory and Cache 1. Reorder the fields in this structure so that the structure will (a) consume the most space and (b) consume the least space on an IA32 machine on Linux. struct foo { double
More informationSpectre and Meltdown. Clifford Wolf q/talk
Spectre and Meltdown Clifford Wolf q/talk 2018-01-30 Spectre and Meltdown Spectre (CVE-2017-5753 and CVE-2017-5715) Is an architectural security bug that effects most modern processors with speculative
More informationMicroprocessor and Assembly Language Week-5. System Programming, BCS 6th, IBMS (2017)
Microprocessor and Assembly Language Week-5 System Programming, BCS 6th, IBMS (2017) High Speed Memory Registers CPU store data temporarily in these location CPU process, store and transfer data from one
More informationInline Assembler. Willi-Hans Steeb and Yorick Hardy. International School for Scientific Computing
Inline Assembler Willi-Hans Steeb and Yorick Hardy International School for Scientific Computing e-mail: steebwilli@gmail.com Abstract We provide a collection of inline assembler programs. 1 Using the
More information17. Instruction Sets: Characteristics and Functions
17. Instruction Sets: Characteristics and Functions Chapter 12 Spring 2016 CS430 - Computer Architecture 1 Introduction Section 12.1, 12.2, and 12.3 pp. 406-418 Computer Designer: Machine instruction set
More informationAssembly III: Procedures. Jo, Heeseung
Assembly III: Procedures Jo, Heeseung IA-32 Stack (1) Characteristics Region of memory managed with stack discipline Grows toward lower addresses Register indicates lowest stack address - address of top
More information