CS802 Parallel Processing Class Notes
- Sabrina McBride
MMX Technology
Instructor: Dr. Chang N. Zhang
Winter Semester, 2006
Intel MMX™ Technology

Chapter 1: Introduction to MMX technology

1.1 Features of the MMX Technology
- MMX technology accelerates multimedia and communication applications by adding new instructions and defining new 64-bit data types.
- MMX technology introduces new general-purpose instructions. These instructions operate in parallel on multiple data elements packed into 64-bit quantities. They accelerate applications with compute-intensive algorithms that perform localized, recurring operations on small native data. These applications include motion video, combined graphics with video, image processing, audio synthesis, speech synthesis and compression, telephony, video conferencing, 2D graphics, and 3D graphics.
- Single Instruction, Multiple Data (SIMD) technique. The MMX technology uses the SIMD technique to speed up software by processing multiple data elements in parallel with a single instruction. It supports parallel operations on byte, word, and doubleword data elements, and on the new quadword (64-bit) integer data type.
- 57 new instructions
- Eight 64-bit wide MMX registers (MM0~MM7)
- Four new data types

1.2 Advantages of the MMX Technology
- SIMD provides parallelism, greatly increasing performance on the PC platform.
- MMX technology is integrated into Intel Architecture (IA) processors and is fully compatible with existing operating systems.
- IA software will run on MMX technology-enabled systems.
- MMX technology can be used in applications, algorithms, and drivers.
Chapter 2: MMX New Data Types & MMX Registers

2.1 MMX New Data Types
The principal data type of the IA MMX technology is the packed fixed-point integer. The decimal point of the fixed-point values is implicit and is left for the user to control for maximum flexibility. The IA MMX technology defines the following four new 64-bit quantities:
(1) Packed byte: eight bytes packed into one 64-bit quantity
(2) Packed word: four words packed into one 64-bit quantity
(3) Packed doubleword: two doublewords packed into one 64-bit quantity
(4) Quadword: one 64-bit quantity

2.2 MMX Registers
The IA MMX technology provides eight 64-bit, general-purpose registers. The registers are aliased on the floating-point registers, so the operating system handles the MMX technology as it would handle floating-point. The MMX registers can hold packed 64-bit data types. The MMX instructions access the MMX registers directly using the register names MM0 to MM7. The MMX registers can be used to perform calculations on data. They cannot be used to address memory; addressing is accomplished by using the integer registers and standard IA addressing modes.
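The four packed views of a 64-bit quantity described in 2.1 can be sketched in portable C. This is only an illustration: the helper names get_byte, get_word, and get_dword are hypothetical and not part of any Intel interface, and lane 0 is assumed to be the least significant element.

```c
#include <stdint.h>

/* Hypothetical helpers: read one element of a 64-bit quantity viewed as
   packed bytes, packed words, or packed doublewords.
   Lane 0 is the least significant element. */
uint8_t get_byte(uint64_t q, unsigned lane)   /* lane 0..7 */
{
    return (uint8_t)(q >> (8 * lane));
}

uint16_t get_word(uint64_t q, unsigned lane)  /* lane 0..3 */
{
    return (uint16_t)(q >> (16 * lane));
}

uint32_t get_dword(uint64_t q, unsigned lane) /* lane 0..1 */
{
    return (uint32_t)(q >> (32 * lane));
}
```

For example, with the quadword 0123456789ABCDEFH, byte lane 0 is EFH, word lane 1 is 89ABH, and doubleword lane 1 is 01234567H.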
Chapter 3: MMX Instructions (Total 57) Overview

3.1 Types of Instructions
- arithmetic: add, subtract, multiply, arithmetic shift, and multiply add
- comparison
- logic: AND, AND NOT, OR, and XOR
- shift
- conversion
- data transfer
- EMMS: empty MMX state

3.2 MMX Instructions: Syntax
Typical MMX instruction:
-- Prefix: P for Packed
-- Instruction operation: for example, ADD, CMP, XOR
-- Suffix: US for Unsigned Saturation
           S for Signed Saturation
           B, W, D, Q for the data type
Example: PADDUSW, Packed Add Unsigned with Saturation for word

3.3 MMX Instructions: Format
For data transfer instructions:
-- destination and source operands can reside in memory, integer registers, or MMX registers
For all other IA MMX instructions:
-- destination operand: MMX register
-- source operand: MMX register, memory, or immediate operand
3.4 MMX Instructions: Conventions
- source operand: written on the right
- destination operand: written on the left
  e.g. PSLLW mm, mm/m64
- memory address: refers to the least significant byte of the data

3.5 MMX Instructions: Conventions
Wrap Around: on overflow or underflow the result is truncated; only the lower (least significant) bits are returned. The carry is ignored.
Saturation: on overflow or underflow the result is clipped (saturated) to a data-range limit for the data type.

                 lower limit   upper limit
signed byte      80H           7FH
signed word      8000H         7FFFH
unsigned byte    00H           FFH
unsigned word    0000H         FFFFH

e.g. for unsigned byte:
E5H + 62H = FFH (saturation)
E5H + 62H = 47H (wrap around)
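The difference between the two overflow conventions can be checked with a small scalar sketch in C. The function names below are made up for illustration; they model a single byte element, not a whole MMX register.

```c
#include <stdint.h>

/* Wrap-around: keep only the low 8 bits; the carry is dropped. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned saturation: clip to the upper limit FFH on overflow. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 0xFF ? 0xFF : sum);
}

/* Signed saturation: clip to the range 80H..7FH (-128..127). */
int8_t add_sat_s8(int8_t a, int8_t b)
{
    int sum = (int)a + b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}
```

With the operands from the example above, add_wrap_u8(0xE5, 0x62) gives 47H and add_sat_u8(0xE5, 0x62) gives FFH, since the full sum 147H does not fit in a byte.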
Chapter 4: MMX Instructions

4.1 Arithmetic (PADD, Wrap around)
PADDB mm, mm/m64
Operation:
mm(7..0)   <- mm(7..0)   + mm/m64(7..0)
mm(15..8)  <- mm(15..8)  + mm/m64(15..8)
...
mm(63..56) <- mm(63..56) + mm/m64(63..56)

PADDW mm, mm/m64
Operation:
mm(15..0)  <- mm(15..0)  + mm/m64(15..0)
mm(31..16) <- mm(31..16) + mm/m64(31..16)
...
mm(63..48) <- mm(63..48) + mm/m64(63..48)
PADDD mm, mm/m64
Operation:
mm(31..0)  <- mm(31..0)  + mm/m64(31..0)
mm(63..32) <- mm(63..32) + mm/m64(63..32)

4.2 Arithmetic (PADD, saturation)
PADDSB mm, mm/m64
Operation:
mm(7..0)   <- SaturateToSignedByte(mm(7..0)   + mm/m64(7..0))
mm(15..8)  <- SaturateToSignedByte(mm(15..8)  + mm/m64(15..8))
...
mm(63..56) <- SaturateToSignedByte(mm(63..56) + mm/m64(63..56))

PADDSW mm, mm/m64
Operation:
mm(15..0)  <- SaturateToSignedWord(mm(15..0)  + mm/m64(15..0))
mm(31..16) <- SaturateToSignedWord(mm(31..16) + mm/m64(31..16))
...
mm(63..48) <- SaturateToSignedWord(mm(63..48) + mm/m64(63..48))
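The lane-by-lane operations listed for PADDB and PADDSW can be emulated in portable C, which may help when checking results by hand. This is a scalar sketch only: the function names paddb and paddsw are chosen here to mirror the mnemonics, and a uint64_t stands in for an MMX register. Converting a value above 7FFFH to int16_t is assumed to wrap to a negative number, as it does on common compilers.

```c
#include <stdint.h>

/* Emulates PADDB: eight independent byte additions with wrap-around. */
uint64_t paddb(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (unsigned lane = 0; lane < 8; ++lane) {
        uint8_t a = (uint8_t)(dst >> (8 * lane));
        uint8_t b = (uint8_t)(src >> (8 * lane));
        uint8_t sum = (uint8_t)(a + b);        /* carry never crosses lanes */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}

/* Emulates PADDSW: four signed word additions with saturation. */
uint64_t paddsw(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        int32_t a = (int16_t)(dst >> (16 * lane));
        int32_t b = (int16_t)(src >> (16 * lane));
        int32_t sum = a + b;
        if (sum > INT16_MAX) sum = INT16_MAX;  /* clip to 7FFFH */
        if (sum < INT16_MIN) sum = INT16_MIN;  /* clip to 8000H */
        result |= (uint64_t)(uint16_t)sum << (16 * lane);
    }
    return result;
}
```

Note that paddb(0xFF, 0x01) yields 0: the byte wraps to 00H and nothing is carried into the neighbouring byte, which is the key difference from an ordinary 64-bit add.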
4.3 Arithmetic
Packed Add Unsigned with Saturation
--- PADDUSB mm, mm/m64
--- PADDUSW mm, mm/m64
Subtraction:
--- PSUB[B,W,D] mm, mm/m64 (Wrap Around)
--- PSUBS[B,W] mm, mm/m64 (Signed Saturation)
--- PSUBUS[B,W] mm, mm/m64 (Unsigned Saturation)

4.4 Arithmetic
Packed Multiply and Add
--- PMADDWD mm, mm/m64
Multiply the packed words in the MMX register by the packed words in MMX reg/memory. Add the 32-bit results pairwise and store them in the MMX register as doublewords.

Packed Multiply High
--- PMULHW mm, mm/m64
Multiply the signed packed words in the MMX register by the signed packed words in MMX reg/memory, then store the high-order 16 bits of each result in the MMX register.
mm(15..0)  <- (mm(15..0)  * mm/m64(15..0))  (31..16);
mm(31..16) <- (mm(31..16) * mm/m64(31..16)) (31..16);
mm(47..32) <- (mm(47..32) * mm/m64(47..32)) (31..16);
mm(63..48) <- (mm(63..48) * mm/m64(63..48)) (31..16);

Packed Multiply Low
--- PMULLW mm, mm/m64
Multiply the signed packed words in the MMX register by the signed packed words in MMX reg/memory, then store the low-order 16 bits of each result in the MMX register.
mm(15..0)  <- (mm(15..0)  * mm/m64(15..0))  (15..0);
mm(31..16) <- (mm(31..16) * mm/m64(31..16)) (15..0);
mm(47..32) <- (mm(47..32) * mm/m64(47..32)) (15..0);
mm(63..48) <- (mm(63..48) * mm/m64(63..48)) (15..0);

4.5 Comparison
Packed Compare for Equality [byte, word, doubleword]
--- PCMPEQB mm, mm/m64   Return (0xff, or 0)
--- PCMPEQW mm, mm/m64   Return (0xffff, or 0)
--- PCMPEQD mm, mm/m64   Return (0xffffffff, or 0)
Packed Compare for Greater Than
--- PCMPGT[B, W, D] mm, mm/m64

4.6 Logic
Bit-wise Logical Exclusive OR
--- PXOR mm, mm/m64    mm <- mm XOR mm/m64
Bit-wise Logical AND
--- PAND mm, mm/m64    mm <- mm AND mm/m64
Bit-wise Logical AND NOT
--- PANDN mm, mm/m64   mm <- (NOT mm) AND mm/m64
Bit-wise Logical OR
--- POR mm, mm/m64     mm <- mm OR mm/m64

4.7 Shift
Packed shift left logical (shifting in zeros)
--- PSLL[W, D, Q] mm, mm/m64
Packed shift right logical (shifting in zeros)
--- PSRL[W, D, Q] mm, mm/m64
Packed shift right arithmetic (shifting in sign bits)
--- PSRA[W, D] mm, mm/m64
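Because the compare instructions return an all-ones or all-zeros mask per element, they are typically combined with the logic instructions to select values without branching. The following C sketch emulates that idea for signed words; pcmpgtw mirrors the mnemonic, while max_words is a hypothetical helper name, and a uint64_t stands in for an MMX register. Converting a value above 7FFFH to int16_t is assumed to wrap to a negative number.

```c
#include <stdint.h>

/* Emulates PCMPGTW: each word lane becomes FFFFH if dst > src (signed),
   otherwise 0000H. */
uint64_t pcmpgtw(uint64_t dst, uint64_t src)
{
    uint64_t mask = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        int16_t a = (int16_t)(dst >> (16 * lane));
        int16_t b = (int16_t)(src >> (16 * lane));
        if (a > b)
            mask |= (uint64_t)0xFFFF << (16 * lane);
    }
    return mask;
}

/* Branch-free per-lane maximum built from the compare mask:
   (a AND mask) OR ((NOT mask) AND b), i.e. PAND, PANDN, POR. */
uint64_t max_words(uint64_t a, uint64_t b)
{
    uint64_t mask = pcmpgtw(a, b);
    return (a & mask) | (~mask & b);
}
```

The expression in max_words matches the PANDN definition above, where the destination operand is the one that gets inverted.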
4.8 Conversion
Pack with unsigned saturation
--- PACKUSWB mm, mm/m64
Pack and saturate signed words from the MMX register and MMX register/memory into unsigned bytes in the MMX register.
mm(7..0)   <- SaturateSignedWordToUnsignedByte mm(15..0);
mm(15..8)  <- SaturateSignedWordToUnsignedByte mm(31..16);
mm(23..16) <- SaturateSignedWordToUnsignedByte mm(47..32);
mm(31..24) <- SaturateSignedWordToUnsignedByte mm(63..48);
mm(39..32) <- SaturateSignedWordToUnsignedByte mm/m64(15..0);
mm(47..40) <- SaturateSignedWordToUnsignedByte mm/m64(31..16);
mm(55..48) <- SaturateSignedWordToUnsignedByte mm/m64(47..32);
mm(63..56) <- SaturateSignedWordToUnsignedByte mm/m64(63..48);

Pack with signed saturation
--- PACKSSWB mm, mm/m64
Pack and saturate signed words from the MMX register and MMX register/memory into signed bytes in the MMX register.
mm(7..0)   <- SaturateSignedWordToSignedByte mm(15..0);
mm(15..8)  <- SaturateSignedWordToSignedByte mm(31..16);
mm(23..16) <- SaturateSignedWordToSignedByte mm(47..32);
mm(31..24) <- SaturateSignedWordToSignedByte mm(63..48);
mm(39..32) <- SaturateSignedWordToSignedByte mm/m64(15..0);
mm(47..40) <- SaturateSignedWordToSignedByte mm/m64(31..16);
mm(55..48) <- SaturateSignedWordToSignedByte mm/m64(47..32);
mm(63..56) <- SaturateSignedWordToSignedByte mm/m64(63..48);

Pack with signed saturation
--- PACKSSDW mm, mm/m64
Pack and saturate signed doublewords from the MMX register and MMX register/memory into signed words in the MMX register.
mm(15..0)  <- SaturateSignedDwordToSignedWord mm(31..0);
mm(31..16) <- SaturateSignedDwordToSignedWord mm(63..32);
mm(47..32) <- SaturateSignedDwordToSignedWord mm/m64(31..0);
mm(63..48) <- SaturateSignedDwordToSignedWord mm/m64(63..32);

Unpack High Packed Data
--- PUNPCKH[BW, WD, DQ] mm, mm/m64
Unpack and interleave the high-order data elements of the destination and source operands into the destination operand. The low-order elements are ignored.
E.g. PUNPCKHWD
mm(63..48) <- mm/m64(63..48);
mm(47..32) <- mm(63..48);
mm(31..16) <- mm/m64(47..32);
mm(15..0)  <- mm(47..32);

Unpack Low Packed Data
--- PUNPCKL[BW, WD, DQ] mm, mm/m64
Unpack and interleave the low-order data elements of the destination and source operands into the destination operand. The high-order elements are ignored.
E.g. PUNPCKLWD
mm(63..48) <- mm/m64(31..16);
mm(47..32) <- mm(31..16);
mm(31..16) <- mm/m64(15..0);
mm(15..0)  <- mm(15..0);
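To make the element ordering of the pack and unpack operations concrete, here is a scalar C sketch of PACKUSWB and PUNPCKHWD that follows the operation lists above. The function names mirror the mnemonics, sat_s16_to_u8 is a hypothetical helper, and a uint64_t stands in for an MMX register. Converting a value above 7FFFH to int16_t is assumed to wrap to a negative number.

```c
#include <stdint.h>

/* Saturate one signed word to an unsigned byte (00H..FFH). */
static uint8_t sat_s16_to_u8(int16_t w)
{
    if (w < 0)    return 0x00;
    if (w > 0xFF) return 0xFF;
    return (uint8_t)w;
}

/* Emulates PACKUSWB: the four dst words fill the low four bytes,
   the four src words fill the high four bytes. */
uint64_t packuswb(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        int16_t d = (int16_t)(dst >> (16 * lane));
        int16_t s = (int16_t)(src >> (16 * lane));
        result |= (uint64_t)sat_s16_to_u8(d) << (8 * lane);
        result |= (uint64_t)sat_s16_to_u8(s) << (8 * (lane + 4));
    }
    return result;
}

/* Emulates PUNPCKHWD: interleave the two high words of dst and src;
   the low words of both operands are ignored. */
uint64_t punpckhwd(uint64_t dst, uint64_t src)
{
    uint64_t d2 = (dst >> 32) & 0xFFFF, d3 = (dst >> 48) & 0xFFFF;
    uint64_t s2 = (src >> 32) & 0xFFFF, s3 = (src >> 48) & 0xFFFF;
    return (s3 << 48) | (d3 << 32) | (s2 << 16) | d2;
}
```

For instance, punpckhwd(0xAAAABBBBCCCCDDDD, 0x1111222233334444) gives 1111AAAA2222BBBBH: source and destination words alternate, starting with the destination word in the lowest position.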
4.9 Data Transfer
Move 32 bits
--- MOVD mm, r/m32
Move 32 bits from integer register/memory to MMX register.
mm(63..0) <- ZeroExtend(r/m32);
--- MOVD r/m32, mm
Move 32 bits from MMX register to integer register/memory.
r/m32 <- mm(31..0);

Move 64 bits
--- MOVQ mm, mm/m64
Move 64 bits from MMX register/memory to MMX register.
mm <- mm/m64;
--- MOVQ mm/m64, mm
Move 64 bits from MMX register to MMX register/memory.
mm/m64 <- mm;

4.10 Instruction Samples
e.g.
MOVD MM0, EAX;
PSLLQ MM0, 32;
MOVD MM1, EBX;
POR MM0, MM1;

MOVQ MM2, MM3;
PSLLQ MM3, 1;
PXOR MM3, MM2;
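The two instruction samples in 4.10 can be read as the following C equivalents, with each statement annotated by the instruction it stands for. This is an interpretive sketch: the function names pack_eax_ebx and xor_with_shifted are invented here, and uint64_t variables stand in for the MMX registers.

```c
#include <stdint.h>

/* First sample: build the 64-bit value EAX:EBX in MM0. */
uint64_t pack_eax_ebx(uint32_t eax, uint32_t ebx)
{
    uint64_t mm0 = eax;   /* MOVD MM0, EAX  (zero-extends to 64 bits) */
    mm0 <<= 32;           /* PSLLQ MM0, 32                            */
    uint64_t mm1 = ebx;   /* MOVD MM1, EBX                            */
    return mm0 | mm1;     /* POR MM0, MM1                             */
}

/* Second sample: XOR a quadword with itself shifted left by one bit. */
uint64_t xor_with_shifted(uint64_t mm3)
{
    uint64_t mm2 = mm3;   /* MOVQ MM2, MM3  */
    mm3 <<= 1;            /* PSLLQ MM3, 1   */
    return mm3 ^ mm2;     /* PXOR MM3, MM2  */
}
```

In the first sample the copy through MM1 is needed because MOVD always clears the upper 32 bits of its MMX destination, so EBX cannot be moved directly into the low half of MM0 without losing EAX.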
Chapter 5. MMX Code Optimization

5.1 Code Optimization Guidelines
- Use the current compiler.
- Do not intermix MMX instructions and FP instructions.
- Use the opcode reg, mem instruction format whenever possible.
- Put an EMMS instruction at the end of all MMX code sections that will transition to FP code.
- Optimize data cache bandwidth to the MMX registers.

5.2 Accessing Memory
Pentium II and III:
-- opcode reg, mem (2 micro-ops)
-- opcode reg, reg (1 micro-op)
Recommend: merging loads whenever the same address is used more than once (memory-bound).
Recommend: merging loads whenever the same address is used more than twice (not memory-bound).
change MOVQ reg, reg and opcode reg, mem to
MOVQ reg, mem and opcode reg, reg to save one micro-op.

Chapter 6 Programming Tools and Examples

6.1 Programming Tools
- MASM 6.11 or above, with the 6.14 patch (install ML614.exe).
- VC can compile MMX instructions.
- Key functions are written in assembly language, including MMX instructions.
- Some C/C++ compilers also include the MASM tool.

6.2 Programming Examples

// Name: cpu_test.c
// Purpose: to test some MMX instructions
#include <stdio.h>

// to test if the CPU is MMX compatible
int cpu_test( );
// to left shift x by 16 bits and append the low 8 bits of y; returns the result
unsigned int MMX_test(unsigned int x, unsigned int y);

// Main function for the program
void main( void )
{
    int found_mmx = cpu_test();
    if (found_mmx == 1)
        printf("This CPU supports MMX technology\n");
    else
        printf("This CPU does NOT support MMX technology\n");

    // test the MMX instructions
    unsigned int x = 0x ;
    unsigned int y = 0x ;
    printf("The original value of x is 0x%x\n", x);
    printf("The original value of y is 0x%x\n", y);
    x = MMX_test(x, y);
    printf("After left shifting 16 bit of x and appending the \n");
    printf("low 8 bit of y, value of x is 0x%x\n", x);
}

// Function Name: cpu_test
// Return: 1 if the CPU supports MMX, otherwise 2
int cpu_test()
{
    __asm {
        // test if the CPU supports MMX
        mov eax, 1;
        cpuid;
        test edx, h;        // MMX support is reported in bit 23 of EDX
        jnz found;
        mov eax, 2;
        jmp end;
    found:
        mov eax, 1;
        emms;               // executed only when MMX is present
    end:
    }
    /* Return with result in EAX */
}

// Function Name: MMX_test
// Parameters: two unsigned integers x and y
// Purpose: to test some MMX instructions
// Return: x shifted left by 16 bits, with the low 8 bits of y appended
unsigned int MMX_test(unsigned int x, unsigned int y)
{
    __asm {
        mov eax, x;
        mov ebx, y;
        movd mm0, eax;      // mm0 = x
        mov eax, 0xff;
        movd mm2, eax;      // mm2 = mask for the low 8 bits
        psllq mm0, 16;      // mm0 = x << 16
        movd mm1, ebx;      // mm1 = y
        pand mm1, mm2;      // mm1 = y & 0xff
        por mm0, mm1;       // mm0 = (x << 16) | (y & 0xff)
        movd eax, mm0;      // low 32 bits of the result
        emms;               // empty the MMX state before returning (see 5.1)
    }
    /* Return with result in EAX */
}

Results:
This CPU supports MMX technology
The original value of x is 0x
The original value of y is 0x
After left shifting 16 bit of x and appending the low 8 bit of y, value of x is 0x

Reference
1. MMX Technology Programmer's Reference Manual
2. MMX Technology Technical Overview
3. Intel Architecture Optimization Reference Manual
4. Intel Architecture Software Developer's Manual 1
5. Intel Architecture Software Developer's Manual 2
6. Intel Architecture Software Developer's Manual

Appendix: MMX Instructions Sheet
1 Reverse Engineering Low Level Software CS5375 Software Reverse Engineering Dr. Jaime C. Acosta Machine code 2 3 Machine code Assembly compile Machine Code disassemble 4 Machine code Assembly compile
More information3.1 DATA MOVEMENT INSTRUCTIONS 45
3.1.1 General-Purpose Data Movement s 45 3.1.2 Stack Manipulation... 46 3.1.3 Type Conversion... 48 3.2.1 Addition and Subtraction... 51 3.1 DATA MOVEMENT INSTRUCTIONS 45 MOV (Move) transfers a byte, word,
More informationIN5050: Programming heterogeneous multi-core processors SIMD (and SIMT)
: Programming heterogeneous multi-core processors SIMD (and SIMT) single scull: one is fast quad scull: many are faster Types of Parallel Processing/Computing? Bit-level parallelism 4-bit à 8-bit à 16-bit
More informationInstruction Set Architecture
C Fortran Ada etc. Basic Java Instruction Set Architecture Compiler Assembly Language Compiler Byte Code Nizamettin AYDIN naydin@yildiz.edu.tr http://www.yildiz.edu.tr/~naydin http://akademik.bahcesehir.edu.tr/~naydin
More informationDr. Ramesh K. Karne Department of Computer and Information Sciences, Towson University, Towson, MD /12/2014 Slide 1
Dr. Ramesh K. Karne Department of Computer and Information Sciences, Towson University, Towson, MD 21252 rkarne@towson.edu 11/12/2014 Slide 1 Intel x86 Aseembly Language Assembly Language Assembly Language
More informationChapter 2. lw $s1,100($s2) $s1 = Memory[$s2+100] sw $s1,100($s2) Memory[$s2+100] = $s1
Chapter 2 1 MIPS Instructions Instruction Meaning add $s1,$s2,$s3 $s1 = $s2 + $s3 sub $s1,$s2,$s3 $s1 = $s2 $s3 addi $s1,$s2,4 $s1 = $s2 + 4 ori $s1,$s2,4 $s2 = $s2 4 lw $s1,100($s2) $s1 = Memory[$s2+100]
More informationHomework 2. Lecture 6: Machine Code. Instruction Formats for HW2. Two parts: How to do Homework 2!!!!
Lecture 6: Machine How to do Homework 2!!!! Homework 2 Two parts: Part 1: Use Debug to enter and run a simple machine code program convert input data into 2 s complement hex enter data at the correct address
More informationComputer Organization CS 206 T Lec# 2: Instruction Sets
Computer Organization CS 206 T Lec# 2: Instruction Sets Topics What is an instruction set Elements of instruction Instruction Format Instruction types Types of operations Types of operand Addressing mode
More informationWhen an instruction is initially read from memory it goes to the Instruction register.
CS 320 Ch. 12 Instruction Sets Computer instructions are written in mnemonics. Mnemonics typically have a 1 to 1 correspondence between a mnemonic and the machine code. Mnemonics are the assembly language
More informationCSE P 501 Compilers. x86 Lite for Compiler Writers Hal Perkins Autumn /25/ Hal Perkins & UW CSE J-1
CSE P 501 Compilers x86 Lite for Compiler Writers Hal Perkins Autumn 2011 10/25/2011 2002-11 Hal Perkins & UW CSE J-1 Agenda Learn/review x86 architecture Core 32-bit part only for now Ignore crufty, backward-compatible
More informationUsing MMX Instructions to Perform 3D Geometry Transformations
Using MMX Instructions to Perform 3D Geometry Transformations Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided in connection
More informationLab 3. The Art of Assembly Language (II)
Lab. The Art of Assembly Language (II) Dan Bruce, David Clark and Héctor D. Menéndez Department of Computer Science University College London October 2, 2017 License Creative Commons Share Alike Modified
More informationRegisters. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth
Registers Ray Seyfarth September 8, 2011 Outline 1 Register basics 2 Moving a constant into a register 3 Moving a value from memory into a register 4 Moving values from a register into memory 5 Moving
More informationALT-Assembly Language Tutorial
ALT-Assembly Language Tutorial ASSEMBLY LANGUAGE TUTORIAL Let s Learn in New Look SHAIK BILAL AHMED i A B O U T T H E T U TO R I A L Assembly Programming Tutorial Assembly language is a low-level programming
More informationPaul Cockshott and Kenneth Renfrew. SIMD Programming. Manual for Linux. and Windows. Springer
Paul Cockshott and Kenneth Renfrew SIMD Programming Manual for Linux and Windows Springer List of Tables List of Figures List of Algorithms Introduction xvii xix xxiii xxv I SIMD Programming 1 Paul Cockshott
More informationCO Computer Architecture and Programming Languages CAPL. Lecture 13 & 14
CO20-320241 Computer Architecture and Programming Languages CAPL Lecture 13 & 14 Dr. Kinga Lipskoch Fall 2017 Frame Pointer (1) The stack is also used to store variables that are local to function, but
More informationMillions of instructions per second [MIPS] executed by a single chip microprocessor
Microprocessor Design Trends Joy's Law [Bill Joy of BSD4.x and Sun fame] MIPS = 2 year-1984 Millions of instructions per second [MIPS] executed by a single chip microprocessor More realistic rate is a
More informationPreface. Intel Technology Journal Q3, Lin Chao Editor Intel Technology Journal
Intel Technology Journal Q3, 1997 Preface Lin Chao Editor Intel Technology Journal Welcome to the Intel Technology Journal. After a decade as an internal R&D journal, we're broadening our audience and
More informationFigure 8-1. x87 FPU Execution Environment
Sign 79 78 64 63 R7 Exponent R6 R5 R4 R3 R2 R1 R0 Data Registers Significand 0 15 Control Register 0 47 Last Instruction Pointer 0 Status Register Last Data (Operand) Pointer Tag Register 10 Opcode 0 Figure
More informationECOM Computer Organization and Assembly Language. Computer Engineering Department CHAPTER 7. Integer Arithmetic
ECOM 2325 Computer Organization and Assembly Language Computer Engineering Department CHAPTER 7 Integer Arithmetic Presentation Outline Shift and Rotate Instructions Shift and Rotate Applications Multiplication
More informationCS241 Computer Organization Spring 2015 IA
CS241 Computer Organization Spring 2015 IA-32 2-10 2015 Outline! Review HW#3 and Quiz#1! More on Assembly (IA32) move instruction (mov) memory address computation arithmetic & logic instructions (add,
More informationAssembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit
Assembly Language for Intel-Based Computers, 4 th Edition Kip R. Irvine Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit Slides prepared by Kip R. Irvine Revision date: 09/25/2002
More informationELEG3924 Microprocessor
Department of Electrical Engineering University of Arkansas ELEG3924 Microprocessor Ch.2 Assembly Language Programming Dr. Jing Yang jingyang@uark.edu 1 OUTLINE Inside 8051 Introduction to assembly programming
More informationIA-32 Architecture. Computer Organization and Assembly Languages Yung-Yu Chuang 2005/10/6. with slides by Kip Irvine and Keith Van Rhein
IA-32 Architecture Computer Organization and Assembly Languages Yung-Yu Chuang 2005/10/6 with slides by Kip Irvine and Keith Van Rhein Virtual machines Abstractions for computers High-Level Language Level
More informationAssembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture. Chapter Overview.
Assembly Language for Intel-Based Computers, 4 th Edition Kip R. Irvine Chapter 2: IA-32 Processor Architecture Slides prepared by Kip R. Irvine Revision date: 09/25/2002 Chapter corrections (Web) Printing
More informationMemory Models. Registers
Memory Models Most machines have a single linear address space at the ISA level, extending from address 0 up to some maximum, often 2 32 1 bytes or 2 64 1 bytes. Some machines have separate address spaces
More informationComputer Organization (II) IA-32 Processor Architecture. Pu-Jen Cheng
Computer Organization & Assembly Languages Computer Organization (II) IA-32 Processor Architecture Pu-Jen Cheng Materials Some materials used in this course are adapted from The slides prepared by Kip
More information