Intel s MMX. Why MMX?

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Intel s MMX. Why MMX?"

Transcription

1 Intel s MMX Dr. Richard Enbody CSE 820 Why MMX? Make the Common Case Fast Multimedia and Communication consume significant computing resources. Providing specific hardware support makes sense. 1

2 Goals accelerate multimedia and communications applications. maintain full compatibility with existing operating systems and applications. exploit inherent parallelism in multimedia and communication algorithms includes new instructions and data types to improve performance. First Step: examine code Examined a wide range of applications: graphics, MPEG video, music synthesis, speech compression, speech recognition, image processing, games, video conferencing. Identified and analyzed the most compute-intensive routines 2

3 Common Characteristics Small integer data types: e.g. 8-bit pixels, 16-bit audio samples Small, highly repetitive loops Frequent multiply-and-accumulate Compute-intensive algorithms Highly parallel operations MMX Technology A set of basic, general purpose integer instructions: Single Instruction, Multiple Data (SIMD) 57 new instructions Eight 64-bit wide MMX registers Four new data types 3

4 Data Types Data Types 4

5 Example Pixels are generally 8-bit integers. Pack eight pixels into a 64-bit MMX register. An MMX instruction takes all eight of the pixels at once from the MMX register, performs the arithmetic or logical operation on all eight elements in parallel, and writes the result into an MMX register. Compatibility No new exceptions or states are added. Aliases to existing FP registers: The exponent field of the corresponding floating-point register (bits 64-78) and the sign bit (bit 79) are set to ones (1's), making the value in the register a NaN (Not a Number) or infinity when viewed as a floating-point value. 5

6 57 Instructions Basic arithmetic: add, subtract, multiply, arithmetic shift and multiply-add Comparison Conversion: pack & unpack Logical Shift Move: register-to-register Load/Store: 64-bit and 32-bit 6

7 Packed Add Word with wrap around Each Addition is independent Rightmost overflows and wraps around Saturation Saturation: if addition results in overflow or underflow, the result is clamped to the largest or smallest value representable. This is important for pixel calculations where this would prevent a wrap-around add from causing a black pixel to suddenly turn white 7

8 No Mode There is no "saturation mode bit : a new mode bit would require a change to the operating system. Separate instructions are used to generate wrap-around and saturating results. Packed Add Word with unsigned saturation Each Addition is independent Rightmost saturates 8

9 Multiply-Accumulate multiply-accumulate operations are fundamental to many signal processing algorithms like vector-dot-products, matrix multiplies, FIR and IIR Filters, FFTs, DCTs etc Packed Multiply-Add Multiply bytes generating four 32-bit results. Add the 2 products on the left for one result and the 2 products on the right for the other result. 9

10 Packed Parallel Compare No new condition code flags No existing IA condition code flags are affected by this instruction. Result can be used as a mask to select elements from different inputs using a logical operation, eliminating branchs. Packed Parallel Compare 10

11 Pack/Unpack Important when an algorithm needs higher precision in its intermediate calculations, as in image filtering. For example, image filtering involves a set of intermediate multiply operations between filter coefficients and a set of adjacent image pixels, accumulating all the values together. Pack 11

12 Conditional Select The Chroma Keying example demonstrates how conditional selection using the MMX instruction set removes branch mis-predictions, in addition to performing multiple selection operations in parallel. Text overlay on a pix/video background, and sprite overlays in games are some of the other operations that would benefit from this technique. Chroma Keying 12

13 Chroma Keying (con t) Take pixels from the picture with the woman on a green background. A compare instruction builds a mask for that data. That mask is a sequence of bytes that are all ones or all zeros. We now know what is the unwanted background and what we want to keep. Create Mask Assume pixels alternate green/not_green 13

14 Combine:!AND, AND, OR Branch Removal Without MMX technology, each pixel is processed separately and requires a conditional branch. Using MMX instructions, eight 8-bit pixels can be processed in parallel and no conditional branches are involved. 14

15 Vector Dot Product The vector dot product is one of the most basic algorithms used in signalprocessing of natural data such as images, audio, video and sound. PMADD does 4 multiplies and 2 adds at a time. Coupled with PADD, eight multiply-accumulate operations can be performed: 2 PMADD and 2 PADD Vector Dot Product 15

16 Vector Dot Product Vector Dot Product Assuming precision is sufficient, a dotproduct on an 8-element vector can be completed using 8 MMX instructions: 2 PMADDs, 2 PADDs, two shifts (if needed to fix the precision after the multiply), and 2 loads for one of the vectors (the other vector is loaded by the PMADD instruction which can have one of its operands come from memory). 16

17 Compare Compare With MMX technology, one third of the number of instructions is needed. Most MMX instructions can be executed in one clock cycle, so the performance improvement will be more dramatic than the simple ratio of instruction counts. 17

18 Matrix Multiply 3D games: computations that manipulate 3D objects use 4-by-4 matrices that are multiplied with 4-element vectors many times. Each vector has the X,Y, Z and perspective corrective information for each pixel. The 4-by-4 matrix is used to rotate, scale, translate and update the perspective corrective information for each pixel. 18

19 Compare Matrix Multiply MMX required half the instructions. 19

20 Image Dissolve Using Alpha Blending Dissolve a Swan into a Flower Result_pixel = Flower_pixel * (alpha/255) + Swan_pixel * [1 - (alpha/255)] Assume 640x480 resolution Dissolve: Millions of Inst. 20

21 Dissolve 1 billion fewer instructions for the 640x480 dissolve 21

22 Conclusion MMX appeared in 1997 in Pentium processors (with bigger cache). According to Intel, an MMX microprocessor runs a multimedia application up to 60% faster. In addition, it runs other applications about 10% faster 22

Intel MMX Technology Overview

Intel MMX Technology Overview Intel MMX Technology Overview March 996 Order Number: 24308-002 E Information in this document is provided in connection with Intel products. No license under any patent or copyright is granted expressly

More information

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal

More information

Cannot increase performance by multiple issuing. -limitation of Instruction Fetch and decode rate (memory bottelneck) -Not enough ILP

Cannot increase performance by multiple issuing. -limitation of Instruction Fetch and decode rate (memory bottelneck) -Not enough ILP Vector Processors Motivations: Cannot increase performance with deeper pipeline because: -clock cycle time limitation (latch delay) -increase dependences with deeper pipeline Cannot increase performance

More information

Intel 64 and IA-32 Architectures Software Developer s Manual

Intel 64 and IA-32 Architectures Software Developer s Manual Intel 64 and IA-32 Architectures Software Developer s Manual Volume 1: Basic Architecture NOTE: The Intel 64 and IA-32 Architectures Software Developer's Manual consists of five volumes: Basic Architecture,

More information

Media Signal Processing

Media Signal Processing 39 Media Signal Processing Ruby Lee Princeton University Gerald G. Pechanek BOPS, Inc. Thomas C. Savell Creative Advanced Technology Center Sadiq M. Sait King Fahd University Habib Youssef King Fahd University

More information

The CPU and Memory. How does a computer work? How does a computer interact with data? How are instructions performed? Recall schematic diagram:

The CPU and Memory. How does a computer work? How does a computer interact with data? How are instructions performed? Recall schematic diagram: The CPU and Memory How does a computer work? How does a computer interact with data? How are instructions performed? Recall schematic diagram: 1 Registers A register is a permanent storage location within

More information

Using Intel Streaming SIMD Extensions for 3D Geometry Processing

Using Intel Streaming SIMD Extensions for 3D Geometry Processing Using Intel Streaming SIMD Extensions for 3D Geometry Processing Wan-Chun Ma, Chia-Lin Yang Dept. of Computer Science and Information Engineering National Taiwan University firebird@cmlab.csie.ntu.edu.tw,

More information

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0.

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0. C NUMERIC FORMATS Figure C-. Table C-. Listing C-. Overview The DSP supports the 32-bit single-precision floating-point data format defined in the IEEE Standard 754/854. In addition, the DSP supports an

More information

SWAR: MMX, SSE, SSE 2 Multiplatform Programming

SWAR: MMX, SSE, SSE 2 Multiplatform Programming SWAR: MMX, SSE, SSE 2 Multiplatform Programming Relatore: dott. Matteo Roffilli roffilli@csr.unibo.it 1 What s SWAR? SWAR = SIMD Within A Register SIMD = Single Instruction Multiple Data MMX,SSE,SSE2,Power3DNow

More information

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009 Digital Signal Processing 8 December 24, 2009 VIII. DSP Processors 2007 Syllabus: Introduction to programmable DSPs: Multiplier and Multiplier-Accumulator (MAC), Modified bus structures and memory access

More information

CO Computer Architecture and Programming Languages CAPL. Lecture 15

CO Computer Architecture and Programming Languages CAPL. Lecture 15 CO20-320241 Computer Architecture and Programming Languages CAPL Lecture 15 Dr. Kinga Lipskoch Fall 2017 How to Compute a Binary Float Decimal fraction: 8.703125 Integral part: 8 1000 Fraction part: 0.703125

More information

VICP Signal Processing Library. Further extending the performance and ease of use for VICP enabled devices

VICP Signal Processing Library. Further extending the performance and ease of use for VICP enabled devices Signal Processing Library Further extending the performance and ease of use for enabled devices Why is library effective for customer application? Get to market faster with ready-to-use signal processing

More information

Intel Architecture MMX Technology

Intel Architecture MMX Technology D Intel Architecture MMX Technology Programmer s Reference Manual March 1996 Order No. 243007-002 Subject to the terms and conditions set forth below, Intel hereby grants you a nonexclusive, nontransferable

More information

Floating Point Arithmetic

Floating Point Arithmetic Floating Point Arithmetic CS 365 Floating-Point What can be represented in N bits? Unsigned 0 to 2 N 2s Complement -2 N-1 to 2 N-1-1 But, what about? very large numbers? 9,349,398,989,787,762,244,859,087,678

More information

Module 2: Computer Arithmetic

Module 2: Computer Arithmetic Module 2: Computer Arithmetic 1 B O O K : C O M P U T E R O R G A N I Z A T I O N A N D D E S I G N, 3 E D, D A V I D L. P A T T E R S O N A N D J O H N L. H A N N E S S Y, M O R G A N K A U F M A N N

More information

Representing and Manipulating Floating Points

Representing and Manipulating Floating Points Representing and Manipulating Floating Points Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu The Problem How to represent fractional values with

More information

Number Systems and Computer Arithmetic

Number Systems and Computer Arithmetic Number Systems and Computer Arithmetic Counting to four billion two fingers at a time What do all those bits mean now? bits (011011011100010...01) instruction R-format I-format... integer data number text

More information

DSP Platforms Lab (AD-SHARC) Session 05

DSP Platforms Lab (AD-SHARC) Session 05 University of Miami - Frost School of Music DSP Platforms Lab (AD-SHARC) Session 05 Description This session will be dedicated to give an introduction to the hardware architecture and assembly programming

More information

CS429: Computer Organization and Architecture

CS429: Computer Organization and Architecture CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: September 18, 2017 at 12:48 CS429 Slideset 4: 1 Topics of this Slideset

More information

Dan Stafford, Justine Bonnot

Dan Stafford, Justine Bonnot Dan Stafford, Justine Bonnot Background Applications Timeline MMX 3DNow! Streaming SIMD Extension SSE SSE2 SSE3 and SSSE3 SSE4 Advanced Vector Extension AVX AVX2 AVX-512 Compiling with x86 Vector Processing

More information

Inf2C - Computer Systems Lecture 2 Data Representation

Inf2C - Computer Systems Lecture 2 Data Representation Inf2C - Computer Systems Lecture 2 Data Representation Boris Grot School of Informatics University of Edinburgh Last lecture Moore s law Types of computer systems Computer components Computer system stack

More information

The ALU consists of combinational logic. Processes all data in the CPU. ALL von Neuman machines have an ALU loop.

The ALU consists of combinational logic. Processes all data in the CPU. ALL von Neuman machines have an ALU loop. CS 320 Ch 10 Computer Arithmetic The ALU consists of combinational logic. Processes all data in the CPU. ALL von Neuman machines have an ALU loop. Signed integers are typically represented in sign-magnitude

More information

Intel Architecture Software Developer s Manual

Intel Architecture Software Developer s Manual Intel Architecture Software Developer s Manual Volume 1: Basic Architecture NOTE: The Intel Architecture Software Developer s Manual consists of three books: Basic Architecture, Order Number 243190; Instruction

More information

Processing Unit CS206T

Processing Unit CS206T Processing Unit CS206T Microprocessors The density of elements on processor chips continued to rise More and more elements were placed on each chip so that fewer and fewer chips were needed to construct

More information

IA-32 Intel Architecture Software Developer s Manual

IA-32 Intel Architecture Software Developer s Manual IA-32 Intel Architecture Software Developer s Manual Volume 1: Basic Architecture NOTE: The IA-32 Intel Architecture Software Developer s Manual consists of three volumes: Basic Architecture, Order Number

More information

Edge Detection Using Streaming SIMD Extensions On Low Cost Robotic Platforms

Edge Detection Using Streaming SIMD Extensions On Low Cost Robotic Platforms Edge Detection Using Streaming SIMD Extensions On Low Cost Robotic Platforms Matthias Hofmann, Fabian Rensen, Ingmar Schwarz and Oliver Urbann Abstract Edge detection is a popular technique for extracting

More information

Preface. Intel Technology Journal Q3, Lin Chao Editor Intel Technology Journal

Preface. Intel Technology Journal Q3, Lin Chao Editor Intel Technology Journal Intel Technology Journal Q3, 1997 Preface Lin Chao Editor Intel Technology Journal Welcome to the Intel Technology Journal. After a decade as an internal R&D journal, we're broadening our audience and

More information

Systems I. Floating Point. Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties

Systems I. Floating Point. Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Systems I Floating Point Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties IEEE Floating Point IEEE Standard 754 Established in 1985 as uniform standard for

More information

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. ! Floating point Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties Next time! The machine model Chris Riesbeck, Fall 2011 Checkpoint IEEE Floating point Floating

More information

Chapter 13 Reduced Instruction Set Computers

Chapter 13 Reduced Instruction Set Computers Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining

More information

Giving credit where credit is due

Giving credit where credit is due CSCE 230J Computer Organization Floating Point Dr. Steve Goddard goddard@cse.unl.edu http://cse.unl.edu/~goddard/courses/csce230j Giving credit where credit is due Most of slides for this lecture are based

More information

Number Systems. Both numbers are positive

Number Systems. Both numbers are positive Number Systems Range of Numbers and Overflow When arithmetic operation such as Addition, Subtraction, Multiplication and Division are performed on numbers the results generated may exceed the range of

More information

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction 1 Floating Point The World is Not Just Integers Programming languages support numbers with fraction Called floating-point numbers Examples: 3.14159265 (π) 2.71828 (e) 0.000000001 or 1.0 10 9 (seconds in

More information

System Programming CISC 360. Floating Point September 16, 2008

System Programming CISC 360. Floating Point September 16, 2008 System Programming CISC 360 Floating Point September 16, 2008 Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Powerpoint Lecture Notes for Computer Systems:

More information

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation Chapter 2 Float Point Arithmetic Topics IEEE Floating Point Standard Fractional Binary Numbers Rounding Floating Point Operations Mathematical properties Real Numbers in Decimal Notation Representation

More information

Preliminary Decimal-Floating-Point Architecture

Preliminary Decimal-Floating-Point Architecture z/architecture Preliminary Decimal-Floating-Point Architecture November, 2006 SA23-2232-00 ii Preliminary Decimal-Floating-Point Architecture Note: Before using this information and the product it supports,

More information

Topics Power tends to corrupt; absolute power corrupts absolutely. Computer Organization CS Data Representation

Topics Power tends to corrupt; absolute power corrupts absolutely. Computer Organization CS Data Representation Computer Organization CS 231-01 Data Representation Dr. William H. Robinson November 12, 2004 Topics Power tends to corrupt; absolute power corrupts absolutely. Lord Acton British historian, late 19 th

More information

Logic, Words, and Integers

Logic, Words, and Integers Computer Science 52 Logic, Words, and Integers 1 Words and Data The basic unit of information in a computer is the bit; it is simply a quantity that takes one of two values, 0 or 1. A sequence of k bits

More information

Floating Point Numbers

Floating Point Numbers Floating Point Numbers Summer 8 Fractional numbers Fractional numbers fixed point Floating point numbers the IEEE 7 floating point standard Floating point operations Rounding modes CMPE Summer 8 Slides

More information

Representation of Numbers and Arithmetic in Signal Processors

Representation of Numbers and Arithmetic in Signal Processors Representation of Numbers and Arithmetic in Signal Processors 1. General facts Without having any information regarding the used consensus for representing binary numbers in a computer, no exact value

More information

applications with SIMD and Hyper-Threading Technology by Chew Yean Yam Intel Corporation

applications with SIMD and Hyper-Threading Technology by Chew Yean Yam Intel Corporation White Paper Intel Digital Security Surveillance Optimizing Video ompression for Intel Digital Security Surveillance applications with SIMD and Hyper-Threading Technology by hew Yean Yam Intel orporation

More information

Instruction Set Progression. from MMX Technology through Streaming SIMD Extensions 2

Instruction Set Progression. from MMX Technology through Streaming SIMD Extensions 2 Instruction Set Progression from MMX Technology through Streaming SIMD Extensions 2 This article summarizes the progression of change to the instruction set in the Intel IA-32 architecture, from MMX technology

More information

Floating Point Puzzles The course that gives CMU its Zip! Floating Point Jan 22, IEEE Floating Point. Fractional Binary Numbers.

Floating Point Puzzles The course that gives CMU its Zip! Floating Point Jan 22, IEEE Floating Point. Fractional Binary Numbers. class04.ppt 15-213 The course that gives CMU its Zip! Topics Floating Point Jan 22, 2004 IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Floating Point Puzzles For

More information

Floating Point Numbers

Floating Point Numbers Floating Point Floating Point Numbers Mathematical background: tional binary numbers Representation on computers: IEEE floating point standard Rounding, addition, multiplication Kai Shen 1 2 Fractional

More information

Floating Point Arithmetic

Floating Point Arithmetic Floating Point Arithmetic Clark N. Taylor Department of Electrical and Computer Engineering Brigham Young University clark.taylor@byu.edu 1 Introduction Numerical operations are something at which digital

More information

Real-time Signal Processing on the Ultrasparc

Real-time Signal Processing on the Ultrasparc Technical Memorandum M97/4, Electronics Research Labs, 1/17/97 February 21, 1997 U N T H E I V E R S I T Y A O F LET TH E R E B E 1 8 6 8 LI G H T C A L I A I F O R N Real-time Signal Processing on the

More information

Graphics Processing Unit Architecture (GPU Arch)

Graphics Processing Unit Architecture (GPU Arch) Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics

More information

Computer System Architecture

Computer System Architecture CSC 203 1.5 Computer System Architecture Budditha Hettige Department of Statistics and Computer Science University of Sri Jayewardenepura Microprocessors 2011 Budditha Hettige 2 Processor Instructions

More information

SEN361 Computer Organization. Prof. Dr. Hasan Hüseyin BALIK (8 th Week)

SEN361 Computer Organization. Prof. Dr. Hasan Hüseyin BALIK (8 th Week) + SEN361 Computer Organization Prof. Dr. Hasan Hüseyin BALIK (8 th Week) + Outline 3. The Central Processing Unit 3.1 Instruction Sets: Characteristics and Functions 3.2 Instruction Sets: Addressing Modes

More information

Data Representation Type of Data Representation Integers Bits Unsigned 2 s Comp Excess 7 Excess 8

Data Representation Type of Data Representation Integers Bits Unsigned 2 s Comp Excess 7 Excess 8 Data Representation At its most basic level, all digital information must reduce to 0s and 1s, which can be discussed as binary, octal, or hex data. There s no practical limit on how it can be interpreted

More information

Basic Arithmetic and the ALU. Basic Arithmetic and the ALU. Background. Background. Integer multiplication, division

Basic Arithmetic and the ALU. Basic Arithmetic and the ALU. Background. Background. Integer multiplication, division Basic Arithmetic and the ALU Forecast Representing numbs, 2 s Complement, unsigned ition and subtraction /Sub ALU full add, ripple carry, subtraction, togeth Carry-Lookahead addition, etc. Logical opations

More information

Chapter 10 - Computer Arithmetic

Chapter 10 - Computer Arithmetic Chapter 10 - Computer Arithmetic Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 10 - Computer Arithmetic 1 / 126 1 Motivation 2 Arithmetic and Logic Unit 3 Integer representation

More information

Floating Point Considerations

Floating Point Considerations Chapter 6 Floating Point Considerations In the early days of computing, floating point arithmetic capability was found only in mainframes and supercomputers. Although many microprocessors designed in the

More information

Lecture 4 - Number Representations, DSK Hardware, Assembly Programming

Lecture 4 - Number Representations, DSK Hardware, Assembly Programming Lecture 4 - Number Representations, DSK Hardware, Assembly Programming James Barnes (James.Barnes@colostate.edu) Spring 2014 Colorado State University Dept of Electrical and Computer Engineering ECE423

More information

Next Generation Technology from Intel Intel Pentium 4 Processor

Next Generation Technology from Intel Intel Pentium 4 Processor Next Generation Technology from Intel Intel Pentium 4 Processor 1 The Intel Pentium 4 Processor Platform Intel s highest performance processor for desktop PCs Targeted at consumer enthusiasts and business

More information

EJEMPLOS DE ARQUITECTURAS

EJEMPLOS DE ARQUITECTURAS Maestría en Electrónica Arquitectura de Computadoras Unidad 4 EJEMPLOS DE ARQUITECTURAS M. C. Felipe Santiago Espinosa Marzo/2017 ARM & MIPS Similarities ARM: the most popular embedded core Similar basic

More information

Floating Point Representation in Computers

Floating Point Representation in Computers Floating Point Representation in Computers Floating Point Numbers - What are they? Floating Point Representation Floating Point Operations Where Things can go wrong What are Floating Point Numbers? Any

More information

ECE 154A Introduction to. Fall 2012

ECE 154A Introduction to. Fall 2012 ECE 154A Introduction to Computer Architecture Fall 2012 Dmitri Strukov Lecture 4: Arithmetic and Data Transfer Instructions Agenda Review of last lecture Logic and shift instructions Load/store instructionsi

More information

Intel SIMD architecture. Computer Organization and Assembly Languages Yung-Yu Chuang

Intel SIMD architecture. Computer Organization and Assembly Languages Yung-Yu Chuang Intel SIMD architecture Computer Organization and Assembly Languages g Yung-Yu Chuang Overview SIMD MMX architectures MMX instructions examples SSE/SSE2 SIMD instructions are probably the best place to

More information

Chapter 3 Arithmetic for Computers (Part 2)

Chapter 3 Arithmetic for Computers (Part 2) Department of Electr rical Eng ineering, Chapter 3 Arithmetic for Computers (Part 2) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Eng ineering, Feng-Chia Unive

More information

Instruction Set Architecture of MIPS Processor

Instruction Set Architecture of MIPS Processor CSE 3421/5421: Introduction to Computer Architecture Instruction Set Architecture of MIPS Processor Presentation B Study: 2.1 2.3, 2.4 2.7, 2.10 and Handout MIPS Instructions: 32-bit Core Subset Read:

More information

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning 4 Operations On Data 4.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List the three categories of operations performed on data.

More information

Chapter 06: Instruction Pipelining and Parallel Processing. Lesson 14: Example of the Pipelined CISC and RISC Processors

Chapter 06: Instruction Pipelining and Parallel Processing. Lesson 14: Example of the Pipelined CISC and RISC Processors Chapter 06: Instruction Pipelining and Parallel Processing Lesson 14: Example of the Pipelined CISC and RISC Processors 1 Objective To understand pipelines and parallel pipelines in CISC and RISC Processors

More information

Chapter 2. Data Representation in Computer Systems

Chapter 2. Data Representation in Computer Systems Chapter 2 Data Representation in Computer Systems Chapter 2 Objectives Understand the fundamentals of numerical data representation and manipulation in digital computers. Master the skill of converting

More information

Classes of Real Numbers 1/2. The Real Line

Classes of Real Numbers 1/2. The Real Line Classes of Real Numbers All real numbers can be represented by a line: 1/2 π 1 0 1 2 3 4 real numbers The Real Line { integers rational numbers non-integral fractions irrational numbers Rational numbers

More information

Computer Arithmetic Ch 8

Computer Arithmetic Ch 8 Computer Arithmetic Ch 8 ALU Integer Representation Integer Arithmetic Floating-Point Representation Floating-Point Arithmetic 1 Arithmetic Logical Unit (ALU) (2) (aritmeettis-looginen yksikkö) Does all

More information

CHAPTER 6 ARITHMETIC, LOGIC INSTRUCTIONS, AND PROGRAMS

CHAPTER 6 ARITHMETIC, LOGIC INSTRUCTIONS, AND PROGRAMS CHAPTER 6 ARITHMETIC, LOGIC INSTRUCTIONS, AND PROGRAMS Addition of Unsigned Numbers The instruction ADD is used to add two operands Destination operand is always in register A Source operand can be a register,

More information

Computer Arithmetic Ch 8

Computer Arithmetic Ch 8 Computer Arithmetic Ch 8 ALU Integer Representation Integer Arithmetic Floating-Point Representation Floating-Point Arithmetic 1 Arithmetic Logical Unit (ALU) (2) Does all work in CPU (aritmeettis-looginen

More information

1 Overview of the AMD64 Architecture

1 Overview of the AMD64 Architecture 24592 Rev. 3.1 March 25 1 Overview of the AMD64 Architecture 1.1 Introduction The AMD64 architecture is a simple yet powerful 64-bit, backward-compatible extension of the industry-standard (legacy) x86

More information

The Perils of Floating Point

The Perils of Floating Point The Perils of Floating Point by Bruce M. Bush Copyright (c) 1996 Lahey Computer Systems, Inc. Permission to copy is granted with acknowledgement of the source. Many great engineering and scientific advances

More information

Data Representation and Introduction to Visualization

Data Representation and Introduction to Visualization Data Representation and Introduction to Visualization Massimo Ricotti ricotti@astro.umd.edu University of Maryland Data Representation and Introduction to Visualization p.1/18 VISUALIZATION Visualization

More information

Integers and Floating Point

Integers and Floating Point CMPE12 More about Numbers Integers and Floating Point (Rest of Textbook Chapter 2 plus more)" Review: Unsigned Integer A string of 0s and 1s that represent a positive integer." String is X n-1, X n-2,

More information

Characters, Strings, and Floats

Characters, Strings, and Floats Characters, Strings, and Floats CS 350: Computer Organization & Assembler Language Programming 9/6: pp.8,9; 9/28: Activity Q.6 A. Why? We need to represent textual characters in addition to numbers. Floating-point

More information

Computer Architecture Review. Jo, Heeseung

Computer Architecture Review. Jo, Heeseung Computer Architecture Review Jo, Heeseung Computer Abstractions and Technology Jo, Heeseung Below Your Program Application software Written in high-level language System software Compiler: translates HLL

More information

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3 Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Instructor: Nicole Hynes nicole.hynes@rutgers.edu 1 Fixed Point Numbers Fixed point number: integer part

More information

Floating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time.

Floating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time. Floating point Today IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time The machine model Fabián E. Bustamante, Spring 2010 IEEE Floating point Floating point

More information

Arithmetic for Computers. Hwansoo Han

Arithmetic for Computers. Hwansoo Han Arithmetic for Computers Hwansoo Han Arithmetic for Computers Operations on integers Addition and subtraction Multiplication and division Dealing with overflow Floating-point real numbers Representation

More information

The von Neumann Architecture. IT 3123 Hardware and Software Concepts. The Instruction Cycle. Registers. LMC Executes a Store.

The von Neumann Architecture. IT 3123 Hardware and Software Concepts. The Instruction Cycle. Registers. LMC Executes a Store. IT 3123 Hardware and Software Concepts February 11 and Memory II Copyright 2005 by Bob Brown The von Neumann Architecture 00 01 02 03 PC IR Control Unit Command Memory ALU 96 97 98 99 Notice: This session

More information

Partitioned Branch Condition Resolution Logic

Partitioned Branch Condition Resolution Logic 1 Synopsys Inc. Synopsys Module Compiler Group 700 Middlefield Road, Mountain View CA 94043-4033 (650) 584-5689 (650) 584-1227 FAX aamirf@synopsys.com http://aamir.homepage.com Partitioned Branch Condition

More information

17. Instruction Sets: Characteristics and Functions

17. Instruction Sets: Characteristics and Functions 17. Instruction Sets: Characteristics and Functions Chapter 12 Spring 2016 CS430 - Computer Architecture 1 Introduction Section 12.1, 12.2, and 12.3 pp. 406-418 Computer Designer: Machine instruction set

More information

unused unused unused unused unused unused

unused unused unused unused unused unused BCD numbers. In some applications, such as in the financial industry, the errors that can creep in due to converting numbers back and forth between decimal and binary is unacceptable. For these applications

More information

Chapter 2 Data Representations

Chapter 2 Data Representations Computer Engineering Chapter 2 Data Representations Hiroaki Kobayashi 4/21/2008 4/21/2008 1 Agenda in Chapter 2 Translation between binary numbers and decimal numbers Data Representations for Integers

More information

Typical Processor Execution Cycle

Typical Processor Execution Cycle Typical Processor Execution Cycle Instruction Fetch Obtain instruction from program storage Instruction Decode Determine required actions and instruction size Operand Fetch Locate and obtain operand data

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function William Stallings Computer Organization and Architecture 8 th Edition Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data

More information

Floating Point Arithmetic

Floating Point Arithmetic Floating Point Arithmetic Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning 4 Operations On Data 4.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List the three categories of operations performed on data.

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN-FRBA 2011 www.electron.frba.utn.edu.ar/dplab Architecture Introduction to the Blackfin Processor Introduction to MSA Micro Signal Architecture (MSA) core was jointly

More information

A Vectorizing Compiler for Multimedia Extensions

A Vectorizing Compiler for Multimedia Extensions International Journal of Parallel Programming, Vol. 28, No. 4, 2000 A Vectorizing Compiler for Multimedia Extensions N. Sreraman 1, 2 and R. Govindarajan 3 Received August 18, 1999; revised March 12, 2000

More information

NUMBER SCALING FOR THE LGP-27

NUMBER SCALING FOR THE LGP-27 NUMBER SCALING FOR THE LGP-27 5 SCALING The LGP-21 considers all numbers to be within the range -l

More information

Implementation of Floating Point Multiplier Using Dadda Algorithm

Implementation of Floating Point Multiplier Using Dadda Algorithm Implementation of Floating Point Multiplier Using Dadda Algorithm Abstract: Floating point multiplication is the most usefull in all the computation application like in Arithematic operation, DSP application.

More information

CHAPTER 5 A Closer Look at Instruction Set Architectures

CHAPTER 5 A Closer Look at Instruction Set Architectures CHAPTER 5 A Closer Look at Instruction Set Architectures 5.1 Introduction 293 5.2 Instruction Formats 293 5.2.1 Design Decisions for Instruction Sets 294 5.2.2 Little versus Big Endian 295 5.2.3 Internal

More information

Data Representations & Arithmetic Operations

Data Representations & Arithmetic Operations Data Representations & Arithmetic Operations Hiroaki Kobayashi 7/13/2011 7/13/2011 Computer Science 1 Agenda Translation between binary numbers and decimal numbers Data Representations for Integers Negative

More information

LogiCORE IP Floating-Point Operator v6.2

LogiCORE IP Floating-Point Operator v6.2 LogiCORE IP Floating-Point Operator v6.2 Product Guide Table of Contents SECTION I: SUMMARY IP Facts Chapter 1: Overview Unsupported Features..............................................................

More information

Floating-Point Arithmetic

Floating-Point Arithmetic Floating-Point Arithmetic if ((A + A) - A == A) { SelfDestruct() } L11 Floating Point 1 What is the problem? Many numeric applications require numbers over a VERY large range. (e.g. nanoseconds to centuries)

More information

LECTURE 10. Pipelining: Advanced ILP

LECTURE 10. Pipelining: Advanced ILP LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction

More information

An introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures

An introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures An introduction to DSP s Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures DSP example: mobile phone DSP example: mobile phone with video camera DSP: applications Why a DSP?

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA)... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data

More information

Data Representation Floating Point

Data Representation Floating Point Data Representation Floating Point CSCI 2400 / ECE 3217: Computer Architecture Instructor: David Ferry Slides adapted from Bryant & O Hallaron s slides via Jason Fritts Today: Floating Point Background:

More information

Better sharc data such as vliw format, number of kind of functional units

Better sharc data such as vliw format, number of kind of functional units Better sharc data such as vliw format, number of kind of functional units Pictures of pipe would help Build up zero overhead loop example better FIR inner loop in coldfire Mine more material from bsdi.com

More information

Digital Arithmetic. Digital Arithmetic: Operations and Circuits Dr. Farahmand

Digital Arithmetic. Digital Arithmetic: Operations and Circuits Dr. Farahmand Digital Arithmetic Digital Arithmetic: Operations and Circuits Dr. Farahmand Binary Arithmetic Digital circuits are frequently used for arithmetic operations Fundamental arithmetic operations on binary

More information