CS220. April 25, 2007


1 CS220 April 25, 2007

2 AT&T syntax for MMX
Most MMX documents use Intel syntax: OPERATION DEST, SRC
We use AT&T syntax: OPERATION SRC, DEST
Always remember: DEST = DEST OPERATION SRC
(Please note: the odd operand direction of some FP subtraction and division instructions in AT&T syntax is a historical mistake that the GNU toolchain keeps for compatibility.)

3 Multiplication
Apart from multiplication, conversion, and comparison, the MMX instructions are straightforward.
PMADDWD mm/m64, mm
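
The slide names PMADDWD without describing it. As a rough scalar model (the function name `pmaddwd_model` and the array layout, with index 0 as the least significant word, are assumptions made for illustration), the instruction multiplies the four pairs of signed 16-bit words and adds adjacent 32-bit products, leaving two doublewords:

```c
#include <stdint.h>

/* Scalar sketch of PMADDWD: src and dst each hold four signed 16-bit words
 * (index 0 = least significant). out receives the two 32-bit sums. */
void pmaddwd_model(const int16_t src[4], const int16_t dst[4], int32_t out[2])
{
    for (int i = 0; i < 2; i++) {
        int32_t lo = (int32_t)src[2 * i]     * dst[2 * i];
        int32_t hi = (int32_t)src[2 * i + 1] * dst[2 * i + 1];
        /* Add as unsigned so the one overflowing case wraps instead of
         * invoking signed-overflow undefined behaviour. */
        out[i] = (int32_t)((uint32_t)lo + (uint32_t)hi);
    }
}
```

This pairwise multiply-and-add is why a single PMADDWD followed by PADDD is enough for the inner step of a dot product.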

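4 PMULHW mm/m64, mm
Signed word x word gives a doubleword product; keep the high 16 bits of each product.
PMULLW mm/m64, mm
Signed word x word gives a doubleword product; keep the low 16 bits of each product.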
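
A per-element sketch of the two multiplies may help; the function names below are hypothetical, and each call stands for one of the four word lanes. Both compute the same 32-bit signed product and differ only in which half they keep:

```c
#include <stdint.h>

/* High 16 bits of the signed 32-bit product (what PMULHW keeps per word). */
int16_t pmulhw_model(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    return (int16_t)((uint32_t)p >> 16);
}

/* Low 16 bits of the same product (what PMULLW keeps per word). */
int16_t pmullw_model(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    return (int16_t)((uint32_t)p & 0xFFFFu);
}
```

For example, 300 * 400 = 120000 = 0x0001D4C0, so the high half is 0x0001 and the low half is 0xD4C0. Interleaving the two results with PUNPCKLWD/PUNPCKHWD rebuilds the full 32-bit products.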

5 Conversion
PACKSSDW mm/m64, mm   doubleword->word (signed saturation)
PACKSSWB mm/m64, mm   word->byte (signed saturation)
PACKUSWB mm/m64, mm   word->byte (unsigned saturation)
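
The pack instructions narrow each element with saturation rather than truncation. A minimal per-element sketch, with hypothetical helper names, of the signed doubleword-to-word case (PACKSSDW) and the unsigned word-to-byte case (PACKUSWB):

```c
#include <stdint.h>

/* Signed saturation of a 32-bit value to 16 bits (one PACKSSDW lane). */
int16_t sat_s32_to_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Unsigned saturation of a signed 16-bit value to 8 bits (one PACKUSWB lane). */
uint8_t sat_s16_to_u8(int16_t v)
{
    if (v > 255) return 255;
    if (v < 0)   return 0;
    return (uint8_t)v;
}
```

Out-of-range inputs clamp to the nearest representable value, which is usually what pixel and audio code wants.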

6 How to do an interleaved pack?
PACKSSDW %mm0, %mm0
PACKSSDW %mm1, %mm1
PUNPCKLWD %mm1, %mm0 (interleave the low-end 16-bit values of the operands)
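
The three-instruction sequence can be traced with a small scalar model. This is only a sketch: `interleave_pack_model` and `sat16` are hypothetical names, `a` plays the role of %mm0, `b` plays %mm1, and index 0 is the least significant element.

```c
#include <stdint.h>

static int16_t sat16(int32_t v)
{
    return v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : (int16_t)v;
}

void interleave_pack_model(const int32_t a[2], const int32_t b[2], int16_t out[4])
{
    /* PACKSSDW x, x: the low two words become the saturated dwords of x
     * (the high two words repeat them and are not used afterwards). */
    int16_t a16[2] = { sat16(a[0]), sat16(a[1]) };
    int16_t b16[2] = { sat16(b[0]), sat16(b[1]) };

    /* PUNPCKLWD %mm1, %mm0: alternate the low words, destination word first. */
    out[0] = a16[0]; out[1] = b16[0];
    out[2] = a16[1]; out[3] = b16[1];
}
```

The result in %mm0 is therefore a0, b0, a1, b1 from the low end upward.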

7 PUNPCKHBW mm/m64, mm
The low parts of the original 64 bits are ignored.
byte_src + byte_dst = word_dst
PUNPCKLBW mm/m64/m32, mm
The high parts of the original 64 bits are ignored.
byte_src + byte_dst = word_dst
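
The "byte_src + byte_dst = word_dst" rule can be read as: each result word has the destination byte in its low half and the source byte in its high half. A scalar sketch for PUNPCKLBW (the function name is an assumption; index 0 is the least significant element):

```c
#include <stdint.h>

/* PUNPCKLBW sketch: the low four bytes of dst and src are interleaved,
 * with the destination byte in the low half of each result word. */
void punpcklbw_model(const uint8_t src[8], const uint8_t dst[8], uint16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)((src[i] << 8) | dst[i]);
}
```

With an all-zero source register this zero-extends four unsigned bytes to four words, a common first step before word arithmetic.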

8 How to do a non-interleaved unpack?
MOVQ %mm0, %mm2
PUNPCKLDQ %mm1, %mm0
(replace the two high-end words of mm0 with the two low-end words of mm1; leave the two low-end words of mm0 in place)
PUNPCKHDQ %mm1, %mm2
(move the two high-end words of mm2 to the two low-end words of mm2; place the two high-end words of mm1 in the two high-end words of mm2)
The results are in mm0 and mm2.
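
Treating each register as two doublewords (a pair of words each) makes the data movement easy to follow. In this sketch, which uses a hypothetical function name, `lo` is the final %mm0 and `hi` is the final %mm2, with index 0 as the low doubleword:

```c
#include <stdint.h>

void unpack_dq_model(const uint32_t mm0[2], const uint32_t mm1[2],
                     uint32_t lo[2], uint32_t hi[2])
{
    /* PUNPCKLDQ %mm1, %mm0: low dword of mm0 stays, low dword of mm1 goes on top. */
    lo[0] = mm0[0];
    lo[1] = mm1[0];

    /* PUNPCKHDQ %mm1, %mm2 (mm2 is a copy of mm0): high dword of mm0 moves down,
     * high dword of mm1 goes on top. */
    hi[0] = mm0[1];
    hi[1] = mm1[1];
}
```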

9 PCMPEQW mm/m64, mm PCMPGTW mm/m64, mm
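
The packed compares do not set flags; each word lane of the destination becomes all ones when the test holds and all zeros otherwise. A per-lane sketch with hypothetical function names, using AT&T operand order (source first), where PCMPGTW tests "destination greater than source" as signed values:

```c
#include <stdint.h>

/* One lane of PCMPEQW src, dst. */
uint16_t pcmpeqw_model(int16_t src, int16_t dst)
{
    return dst == src ? 0xFFFFu : 0;
}

/* One lane of PCMPGTW src, dst (signed comparison). */
uint16_t pcmpgtw_model(int16_t src, int16_t dst)
{
    return dst > src ? 0xFFFFu : 0;
}
```

The resulting masks are meant to be combined with PAND, PANDN, and POR to select between values without branching.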

10 Rule of Thumb
Only shift instructions can take an immediate operand.
Only the movd instruction can take a 32-bit register.
Punpckl can take a 32-bit memory source.
All other instructions deal with 64-bit registers or memory. No immediate operands!

11 Constant numbers
Generate a zero in mm0:
PXOR %mm0, %mm0
(PANDN %mm0, %mm0 also works)
Generate all 1's in register mm1, which is -1 in each of the packed data type fields:
PCMPEQB %mm1, %mm1 (any element size gives the same result)
Generate the constant 1 in every packed-byte [or packed-word] (or packed-dword) field:
PXOR %mm0, %mm0
PCMPEQB %mm1, %mm1
PSUBB %mm1, %mm0 [PSUBW %mm1, %mm0] (PSUBD %mm1, %mm0)
Generate the signed constant 2^n - 1 in every packed-word (or packed-dword) field:
PCMPEQB %mm1, %mm1
PSRLW $(16-n), %mm1 (PSRLD $(32-n), %mm1)
Generate the signed constant -2^n in every packed-word (or packed-dword) field:
PCMPEQB %mm1, %mm1
PSLLW $n, %mm1 (PSLLD $n, %mm1)
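
The constant recipes are easy to check with ordinary integer arithmetic on one lane. The helper names below are hypothetical, and the allowed range of n (1 to 15 for word lanes) is an assumption chosen so the results stay valid signed 16-bit values:

```c
#include <assert.h>
#include <stdint.h>

/* All ones shifted right logically by (16 - n) leaves n low one bits: 2^n - 1. */
uint16_t const_pow2_minus1_w(unsigned n)
{
    assert(n >= 1 && n <= 15);
    return (uint16_t)(0xFFFFu >> (16 - n));
}

/* All ones shifted left by n clears the n low bits: -2^n in two's complement. */
uint16_t const_neg_pow2_w(unsigned n)
{
    assert(n >= 1 && n <= 15);
    return (uint16_t)(0xFFFFu << n);
}

/* Zero minus all ones (that is, 0 - (-1)) gives 1 in a byte lane. */
uint8_t const_one_b(void)
{
    uint8_t zero = 0, ones = 0xFF;
    return (uint8_t)(zero - ones);
}
```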

12 Examples
Absolute value of a vector of signed words:
movq %mm0, %mm1     # make a copy of the source data
psraw $15, %mm0     # replicate the sign bit
pxor %mm0, %mm1     # take the one's complement of just the negative fields
psubsw %mm0, %mm1   # add 1 to just the negative fields
PXOR/XOR a number with all 0s: get the number itself.
PXOR/XOR a number with all 1s: get NOT(the number).
After the shift, each field of %mm0 is either all 0s or all 1s.
For a positive number, it subtracts all 0s (0).
For a negative number, it subtracts all 1s (-1).
The result is left in %mm1.
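
The same mask-and-subtract trick can be written for a single word in C. This is a sketch under the assumption that the final subtraction is the signed saturating form (PSUBSW); `abs_word_model` is a hypothetical name.

```c
#include <stdint.h>

int16_t abs_word_model(int16_t x)
{
    int16_t mask = (int16_t)(x < 0 ? -1 : 0);   /* psraw $15: all 0s or all 1s */
    int16_t flipped = (int16_t)(x ^ mask);      /* pxor: NOT(x) only when x < 0 */
    int32_t r = (int32_t)flipped - mask;        /* subtract 0, or subtract -1 */
    /* Signed saturation, as PSUBSW would apply it. */
    return r > INT16_MAX ? INT16_MAX : (int16_t)r;
}
```

With saturation, the one awkward input, -32768, maps to 32767; a wrapping subtract (PSUBW) would leave it at -32768.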

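13 Dot Product

    #include <stdio.h>

    int main(void)
    {
        int i;
        int result;
        /* pmaddwd treats the words as signed; these values all fit. */
        unsigned short a[] = {1, 2, 3, 4, 5, 6, 7, 8};
        unsigned short b[] = {2, 4, 6, 8, 10, 12, 14, 16};

        asm ("pxor %mm7,%mm7");                 /* clear the accumulator */

        for (i = 0; i < sizeof(a)/sizeof(short); i += 4) {
            asm ("movq %0,%%mm0\n\t"            /* load four words of a */
                 "movq %1,%%mm1\n\t"            /* load four words of b */
                 "pmaddwd %%mm1,%%mm0\n\t"      /* two 32-bit partial sums */
                 "paddd %%mm0,%%mm7"            /* accumulate */
                 :
                 : "m" (a[i]), "m" (b[i]));
        }

        asm ("movq %%mm7,%%mm0\n\t"
             "psrlq $32,%%mm0\n\t"              /* bring the high dword down */
             "paddd %%mm7,%%mm0\n\t"            /* add the two halves */
             "movd %%mm0,%0\n\t"                /* movd moves the lower 32 bits of mm0 */
             "emms"
             : "=m" (result));

        printf("dot product: %d\n", result);
        return 0;
    }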
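
A plain scalar version is useful as a reference when checking the MMX result. The function name `dot_reference` is hypothetical; it assumes signed 16-bit inputs and a sum that fits in an int, as in the slide's data.

```c
/* Scalar reference: sum of element-wise products of two word vectors. */
int dot_reference(const short *a, const short *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int)a[i] * (int)b[i];
    return sum;
}
```

For the slide's vectors {1, ..., 8} and {2, 4, ..., 16}, the sum is 2 * (1 + 4 + 9 + ... + 64) = 408.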

14 PCMPEQ (packed compare for equality) is performed on the weathercaster and blue-screen images, yielding a bitmask that traces the outline of the weathercaster. This bitmask image is PANDNed (packed and not) with the weathercaster image, yielding the first intermediate image: now the weathercaster has no background behind her. The same bitmask image is PANDed (packed and) with the weather map image, yielding the second intermediate image. The two intermediate images are PORed (packed or) together, resulting in the final composite of the weathercaster over the weather map.
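
For one 8-bit pixel, the compare/and-not/and/or pipeline reduces to a few bitwise operations. This is a simplified sketch: `chroma_key_pixel` is a hypothetical name, pixels are assumed to be single bytes, and the background is assumed to match the blue-screen value exactly.

```c
#include <stdint.h>

uint8_t chroma_key_pixel(uint8_t caster, uint8_t blue, uint8_t map)
{
    uint8_t mask = (caster == blue) ? 0xFF : 0x00;   /* PCMPEQB: 1s on background */
    uint8_t fg = (uint8_t)(~mask & caster);          /* PANDN: keep the weathercaster */
    uint8_t bg = (uint8_t)(mask & map);              /* PAND: keep the map behind her */
    return (uint8_t)(fg | bg);                       /* POR: composite */
}
```

The MMX version does the same thing on eight byte pixels per instruction, with no branches.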

15 Address or Content?

            .section .rodata
    mybytes:
            .byte 'a','b','c','d','e','f','g','h'
    mystr:
            .ascii "abcdefghijklmnopqrstuvwxyz"
            .text
            .globl main
            .type main, @function
    main:
            pushl %ebp
            movl %esp, %ebp
            movl mybytes, %eax            # content
            movl $mybytes, %ebx           # address
            movl (mybytes), %edx          # content
            movl (%ebx), %edx             # content
            xorl %ecx, %ecx
            movl $mystr, %ebx             # address
            movq (%ebx,%ecx,8), %mm0
            leal mystr, %ebx              # address
            movq (%ebx,%ecx,8), %mm1
            leal (mystr), %ebx            # address
            movq (%ebx,%ecx,8), %mm2
            movq mystr(,%ecx,8), %mm3
            movq mystr, %mm4
            movq (mystr), %mm5
            subl $8, %esp
            movq %mm0, (%esp)
            leave
            ret
            .size main, .-main

Content in %eax and %edx: 0x64636261 == "abcd" (%ecx is cleared to 0 and used as the index)
Content in %ebx: address
Content in %mm0-%mm5: 0x6867666564636261 == "abcdefgh"
The byte at the lowest address ('a', 0x61 == 97) is the least significant byte of the register; the byte at the highest address is the most significant.
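
The register contents follow from x86 being little-endian: the byte at the lowest address becomes the least significant byte. A portable sketch of what a 32-bit load such as `movl mybytes, %eax` produces (the function name `load_le32` is an assumption for illustration):

```c
#include <stdint.h>

/* Assemble four bytes into a 32-bit value the way a little-endian load does. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Applied to the bytes 'a','b','c','d' this gives 0x64636261, which prints as "dcba" when read from the most significant byte down.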

16 Misc
Context switching
FP mode to MMX mode: 28 cycles
MMX mode to FP mode: 53 cycles
FP_code: ...
MMX_code: ...
EMMS (mark the FP tag word as empty)
FP_code: ...
See also FNSAVE and FRSTOR.

17 MMX Instruction Set

Category       Mnemonic             Opcodes  Description
Arithmetic     PADD[B,W,D]          3        Add with wrap-around on [byte, word, doubleword]
               PADDS[B,W]           2        Add signed with saturation on [byte, word]
               PADDUS[B,W]          2        Add unsigned with saturation on [byte, word]
               PSUB[B,W,D]          3        Subtract with wrap-around on [byte, word, doubleword]
               PSUBS[B,W]           2        Subtract signed with saturation on [byte, word]
               PSUBUS[B,W]          2        Subtract unsigned with saturation on [byte, word]
               PMULHW               1        Packed multiply high on words
               PMULLW               1        Packed multiply low on words
               PMADDWD              1        Packed multiply on words and add resulting pairs
Comparison     PCMPEQ[B,W,D]        3        Packed compare for equality [byte, word, doubleword]
               PCMPGT[B,W,D]        3        Packed compare greater than [byte, word, doubleword]
Conversion     PACKUSWB             1        Pack words into bytes (unsigned with saturation)
               PACKSS[WB,DW]        2        Pack [words into bytes, doublewords into words] (signed with saturation)
               PUNPCKH[BW,WD,DQ]    3        Unpack (interleave) high-order [bytes, words, doublewords] from MMX register
               PUNPCKL[BW,WD,DQ]    3        Unpack (interleave) low-order [bytes, words, doublewords] from MMX register
Logical        PAND                 1        Bitwise AND
               PANDN                1        Bitwise AND NOT
               POR                  1        Bitwise OR
               PXOR                 1        Bitwise XOR
Shift          PSLL[W,D,Q]          6        Packed shift left logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRL[W,D,Q]          6        Packed shift right logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRA[W,D]            4        Packed shift right arithmetic [word, doubleword] by amount specified in MMX register or by immediate value
Data Transfer  MOV[D,Q]             4        Move [doubleword, quadword] to MMX register or from MMX register
State Mgmt     EMMS                 1        Empty MMX state


More information

CMSC 313 Lecture 12. Project 3 Questions. How C functions pass parameters. UMBC, CMSC313, Richard Chang

CMSC 313 Lecture 12. Project 3 Questions. How C functions pass parameters. UMBC, CMSC313, Richard Chang Project 3 Questions CMSC 313 Lecture 12 How C functions pass parameters UMBC, CMSC313, Richard Chang Last Time Stack Instructions: PUSH, POP PUSH adds an item to the top of the stack POP

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 8: Introduction to code generation Viktor Leijon Slides largely by Johan Nordlander with material generously provided by Mark P. Jones. 1 What is a Compiler? Compilers

More information

Questions about last homework? (Would more feedback be useful?) New reading assignment up: due next Monday

Questions about last homework? (Would more feedback be useful?) New reading assignment up: due next Monday Questions about last homework? (Would more feedback be useful?) New reading assignment up: due next Monday addl: bitwise for signed (& unsigned) 4 bits: 1000 = -8, 0111 = 7-8 + -8 = -16 = 0 1000 + 1000

More information

Instruction Set Architecture

Instruction Set Architecture CS:APP Chapter 4 Computer Architecture Instruction Set Architecture Randal E. Bryant adapted by Jason Fritts http://csapp.cs.cmu.edu CS:APP2e Hardware Architecture - using Y86 ISA For learning aspects

More information

Assembly I: Basic Operations. Computer Systems Laboratory Sungkyunkwan University

Assembly I: Basic Operations. Computer Systems Laboratory Sungkyunkwan University Assembly I: Basic Operations Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moving Data (1) Moving data: movl source, dest Move 4-byte ( long )

More information

Computer Systems Organization V Fall 2009

Computer Systems Organization V Fall 2009 Computer Systems Organization V22.0201 Fall 2009 Sample Midterm Exam ANSWERS 1. True/False. Circle the appropriate choice. (a) T (b) F At most one operand of an x86 assembly instruction can be an memory

More information

1 /* file cpuid2.s */ 4.asciz "The processor Vendor ID is %s \n" 5.section.bss. 6.lcomm buffer, section.text. 8.globl _start.

1 /* file cpuid2.s */ 4.asciz The processor Vendor ID is %s \n 5.section.bss. 6.lcomm buffer, section.text. 8.globl _start. 1 /* file cpuid2.s */ 2.section.data 3 output: 4.asciz "The processor Vendor ID is %s \n" 5.section.bss 6.lcomm buffer, 12 7.section.text 8.globl _start 9 _start: 10 movl $0, %eax 11 cpuid 12 movl $buffer,

More information

EECE.3170: Microprocessor Systems Design I Summer 2017 Homework 4 Solution

EECE.3170: Microprocessor Systems Design I Summer 2017 Homework 4 Solution 1. (40 points) Write the following subroutine in x86 assembly: Recall that: int f(int v1, int v2, int v3) { int x = v1 + v2; urn (x + v3) * (x v3); Subroutine arguments are passed on the stack, and can

More information

CS 31: Intro to Systems Functions and the Stack. Martin Gagne Swarthmore College February 23, 2016

CS 31: Intro to Systems Functions and the Stack. Martin Gagne Swarthmore College February 23, 2016 CS 31: Intro to Systems Functions and the Stack Martin Gagne Swarthmore College February 23, 2016 Reminders Late policy: you do not have to send me an email to inform me of a late submission before the

More information

CPSC W Term 2 Problem Set #3 - Solution

CPSC W Term 2 Problem Set #3 - Solution 1. (a) int gcd(int a, int b) { if (a == b) urn a; else if (a > b) urn gcd(a - b, b); else urn gcd(a, b - a); CPSC 313 06W Term 2 Problem Set #3 - Solution.file "gcdrec.c".globl gcd.type gcd, @function

More information

Assembly Language: IA-32 Instructions

Assembly Language: IA-32 Instructions Assembly Language: IA-32 Instructions 1 Goals of this Lecture Help you learn how to: Manipulate data of various sizes Leverage more sophisticated addressing modes Use condition codes and jumps to change

More information

Computer System Architecture

Computer System Architecture CSC 203 1.5 Computer System Architecture Department of Statistics and Computer Science University of Sri Jayewardenepura Instruction Set Architecture (ISA) Level 2 Introduction 3 Instruction Set Architecture

More information

Homework. In-line Assembly Code Machine Language Program Efficiency Tricks Reading PAL, pp 3-6, Practice Exam 1

Homework. In-line Assembly Code Machine Language Program Efficiency Tricks Reading PAL, pp 3-6, Practice Exam 1 Homework In-line Assembly Code Machine Language Program Efficiency Tricks Reading PAL, pp 3-6, 361-367 Practice Exam 1 1 In-line Assembly Code The gcc compiler allows you to put assembly instructions in-line

More information

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth Registers Ray Seyfarth September 8, 2011 Outline 1 Register basics 2 Moving a constant into a register 3 Moving a value from memory into a register 4 Moving values from a register into memory 5 Moving

More information

Machine-Level Programming Introduction

Machine-Level Programming Introduction Machine-Level Programming Introduction Today Assembly programmer s exec model Accessing information Arithmetic operations Next time More of the same Fabián E. Bustamante, Spring 2007 IA32 Processors Totally

More information

CSE2421 FINAL EXAM SPRING Name KEY. Instructions: Signature

CSE2421 FINAL EXAM SPRING Name KEY. Instructions: Signature CSE2421 FINAL EXAM SPRING 2013 Name KEY Instructions: This is a closed-book, closed-notes, closed-neighbor exam. Only a writing utensil is needed for this exam. No calculators allowed. If you need to go

More information

Assembly I: Basic Operations. Jo, Heeseung

Assembly I: Basic Operations. Jo, Heeseung Assembly I: Basic Operations Jo, Heeseung Moving Data (1) Moving data: movl source, dest Move 4-byte ("long") word Lots of these in typical code Operand types Immediate: constant integer data - Like C

More information

CSC201, SECTION 002, Fall 2000: Homework Assignment #3

CSC201, SECTION 002, Fall 2000: Homework Assignment #3 1 of 7 11/8/2003 7:34 PM CSC201, SECTION 002, Fall 2000: Homework Assignment #3 DUE DATE October 25 for the homework problems, in class. October 27 for the programs, in class. INSTRUCTIONS FOR HOMEWORK

More information

Machine-Level Programming Introduction

Machine-Level Programming Introduction Machine-Level Programming Introduction Today! Assembly programmer s exec model! Accessing information! Arithmetic operations Next time! More of the same Fabián E. Bustamante, 2007 X86 Evolution: Programmer

More information

Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p

Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p text C program (p1.c p2.c) Compiler (gcc -S) text Asm

More information