Assembly Language - SSE and SSE2. Introduction to Scalar Floating-Point Operations via SSE



2 Floating Point Registers and x86_64
Two sets of registers, in addition to the General-Purpose Registers
Three additional groups of instructions use these registers
Floating-Point Registers, ST0-ST7
  - 80-bit registers, a superset of the IEEE-754 format
  - used by the original Floating-Point coprocessors: 8087, 80387, i486 FPU, etc.
  - Only addressable as a stack, i.e. all operations are applied to register ST0
Useful for trig, log functions, etc.

3 x86_64 Registers

4 MMX
MMX instructions provide SIMD (Single-Instruction, Multiple-Data) processing
  - Integer operations on multiple data values in parallel
MMX registers overlap the Floating-Point registers
  - MMX registers MM0-MM7 are only 64 bits wide
  - Each register overlaps the lower 64 bits of one of the 80-bit ST registers
  - MMX registers are directly addressable, unlike ST
Difficult to use MMX and Floating-Point at the same time
  - slow context switches

5 SSE, SSE2, and beyond
"Streaming SIMD Extensions"
SSE offers single-precision floating-point SIMD instructions
XMM0-XMM15 registers each hold multiple IEEE-754 operands
  - 128 bits wide, four single-precision operands per register
SSE2 adds integer and double-precision floating-point SIMD instructions
Many compilers use XMM registers and SSE/SSE2 instructions instead of F.P. or MMX
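
To make the "four single-precision operands per register" point concrete, here is a minimal sketch in C rather than assembly. It assumes an x86 compiler that provides the SSE intrinsics header <xmmintrin.h>; the function name add4 is made up for illustration and does not come from the slides. The intrinsic _mm_add_ps corresponds to the packed addps instruction.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper: add two groups of four floats with one packed add. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va  = _mm_loadu_ps(a);      /* load four floats into one 128-bit value */
    __m128 vb  = _mm_loadu_ps(b);      /* unaligned loads, so any float array works */
    __m128 sum = _mm_add_ps(va, vb);   /* four single-precision additions at once */
    _mm_storeu_ps(out, sum);           /* store the four results */
}
```

Calling add4 on {1, 2, 3, 4} and {10, 20, 30, 40} fills the output array with the four element-wise sums.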

6 Programming Usage
C function ABI (Application Binary Interface):
Floating-point arguments ("float", "double") are passed in XMM registers
  - For variadic functions such as printf(), the AL register passes the count of XMM registers used
  - Floating-point return values in XMM0, and XMM1 if needed
Floating-point operations use SSE, SSE2 instructions on XMM registers
  - Instead of the more awkward floating-point (ST register) instructions

7 first example
Just pass a couple of f.p. (double-type) arguments to a C function
  - printf()
Note that this scanf() doesn't take, or return, f.p. arguments
  - only pointers
The printf() does, however

8 Basic SSE Instructions
movsd: move a double-precision operand to/from an XMM register
addsd, mulsd, divsd, sqrtsd, etc.
  - operate on scalar, double-precision operands in XMM registers
movss: move a single-precision operand to/from an XMM register
addss, mulss, divss, sqrtss, etc.
  - operate on scalar, single-precision operands in XMM registers
many other instructions
  - also see F.P. instructions
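
The scalar double-precision instructions listed here can be tried out from C through the SSE2 intrinsics in <emmintrin.h>. This is only a sketch under that assumption: the function name hypot_sd is invented for the example, and a compiler is free to pick slightly different instructions than the ones named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_add_sd, ... */

/* Hypothetical helper: sqrt(a*a + b*b) using only scalar double operations. */
double hypot_sd(double a, double b)
{
    __m128d x = _mm_set_sd(a);   /* place a in the low 64 bits (movsd-style) */
    __m128d y = _mm_set_sd(b);   /* place b in the low 64 bits */
    x = _mm_mul_sd(x, x);        /* mulsd: a * a */
    y = _mm_mul_sd(y, y);        /* mulsd: b * b */
    x = _mm_add_sd(x, y);        /* addsd: a*a + b*b */
    x = _mm_sqrt_sd(x, x);       /* sqrtsd: square root of the low element */
    return _mm_cvtsd_f64(x);     /* extract the low double */
}
```

Each intrinsic touches only the low 64 bits of the 128-bit value, which is what "scalar" means for the sd instructions.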

9 Example 2 64-bit Division

10 SSE and Constants
No SSE instructions load immediate constants into XMM registers
Constants must be created in memory, using assembler (nasm) features
Load constants from memory locations into registers for use
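
The same load-from-memory pattern can be sketched in C with SSE2 intrinsics. The constant lives in static storage and is loaded with _mm_load_sd, the intrinsic counterpart of a movsd from memory. The names HALF and half_sum are illustrative assumptions, not taken from the slides.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* The constant is created in memory, as an assembler data definition would do. */
static const double HALF = 0.5;

/* Hypothetical helper: (a + b) * 0.5, with 0.5 loaded from memory. */
double half_sum(double a, double b)
{
    __m128d x = _mm_add_sd(_mm_set_sd(a), _mm_set_sd(b)); /* addsd: a + b */
    __m128d h = _mm_load_sd(&HALF);                       /* load the constant from memory */
    return _mm_cvtsd_f64(_mm_mul_sd(x, h));               /* mulsd, then extract the result */
}
```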

11 64-bit example product (a)
This function calculates the product of an array of double-type numbers.
It uses an optimized while loop.
It returns 1.0 for an empty array.
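
The slide's assembly listing is not part of this transcript. As a rough stand-in, the behaviour described (multiply all elements, return 1.0 for an empty array) can be sketched in C with SSE2 intrinsics. The signature below is an assumption, and the loop is a plain while loop rather than the optimized form the slide mentions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>     /* size_t */

/* Sketch of product(): multiplies n doubles; yields 1.0 when n is 0. */
double product(const double *a, size_t n)
{
    static const double ONE = 1.0;        /* constant kept in memory */
    __m128d acc = _mm_load_sd(&ONE);      /* accumulator starts at 1.0 */
    size_t i = 0;
    while (i < n) {
        acc = _mm_mul_sd(acc, _mm_load_sd(&a[i]));  /* mulsd with the next element */
        i++;
    }
    return _mm_cvtsd_f64(acc);            /* low double is the result */
}
```

Starting the accumulator at 1.0 is what makes the empty-array case fall out without a special branch.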

12 64-bit product (b)
This code sets up a main() routine to read in a set of doubles.
Easily adjusted for 32-bit floats.
(continued on the next slide)

13 64-bit product (c)
(continued from the previous slide)
The input loop; calling product() and producing the outputs

14 SSE and Converting Numeric Types
Convert integers to floats/doubles
  - cvtsi2ss: convert integer to 32-bit float in XMM register
  - cvtsi2sd: convert integer to 64-bit double in XMM register
Convert floats/doubles to integers
  - cvtss2si: convert 32-bit float to integer, rounding according to the current MXCSR rounding mode (round-to-nearest by default)
  - cvttss2si: convert 32-bit float to integer, truncating the result
  - cvtsd2si: convert 64-bit double to integer, rounding in the same way
  - cvttsd2si: convert 64-bit double to integer, truncating the result
Convert between floats and doubles
  - cvtss2sd: convert 32-bit float to 64-bit double
  - cvtsd2ss: convert 64-bit double to 32-bit float
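
The difference between the rounding and truncating conversions is easiest to see by running them. The sketch below uses the SSE2 intrinsics from <emmintrin.h> that correspond to cvtsd2si, cvttsd2si, cvtsi2sd and cvtsd2ss. The wrapper names are invented for this example, and the rounding results assume the default MXCSR mode (round to nearest, ties to even).

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */

/* cvtsd2si: double to int, using the current rounding mode. */
int round_sd(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* cvttsd2si: double to int, always truncating toward zero. */
int trunc_sd(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* cvtsi2sd: int to double. */
double int_to_sd(int n)
{
    return _mm_cvtsd_f64(_mm_cvtsi32_sd(_mm_setzero_pd(), n));
}

/* cvtsd2ss: double to float. */
float sd_to_ss(double x)
{
    return _mm_cvtss_f32(_mm_cvtsd_ss(_mm_setzero_ps(), _mm_set_sd(x)));
}
```

With the default rounding mode, round_sd(2.7) gives 3 while trunc_sd(2.7) gives 2, and round_sd(2.5) gives 2 because ties go to the even integer.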

15 32-bit example mean (a)
This function sums up the elements of an array, then divides the sum by the array length.
The divisor must be converted from an integer to a float.
The cvtsi2ss instruction does this.
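
Since the assembly listing for this slide is missing from the transcript, here is a hedged C sketch of the same steps using SSE intrinsics: a scalar single-precision sum, an integer-to-float conversion of the length (_mm_cvtsi32_ss, the intrinsic for cvtsi2ss), and a scalar division. The signature is an assumption, and the sketch expects n > 0, since an empty array would produce 0/0.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ss, ... */

/* Sketch of mean(): average of n floats, assuming n > 0. */
float mean(const float *a, int n)
{
    __m128 sum = _mm_setzero_ps();                    /* running sum starts at 0.0f */
    for (int i = 0; i < n; i++) {
        sum = _mm_add_ss(sum, _mm_load_ss(&a[i]));    /* addss with the next element */
    }
    __m128 len = _mm_cvtsi32_ss(_mm_setzero_ps(), n); /* cvtsi2ss: int length to float */
    return _mm_cvtss_f32(_mm_div_ss(sum, len));       /* divss, then extract the float */
}
```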

16 32-bit mean (b)
This code sets up a main() routine to read in a set of floats.
(continued on the next slide)

17 32-bit mean (c)
(continued from the previous slide)
The input loop; calling mean(); producing the outputs
The output must be promoted to a double, by cvtss2sd, for printf()


More information

Moving from 32 to 64 bits while maintaining compatibility. Orlando Ricardo Nunes Rocha

Moving from 32 to 64 bits while maintaining compatibility. Orlando Ricardo Nunes Rocha Moving from 32 to 64 bits while maintaining compatibility Orlando Ricardo Nunes Rocha Informatics Department, University of Minho 4710 Braga, Portugal orocha@deb.uminho.pt Abstract. The EM64T is a recent

More information

Using Intel Streaming SIMD Extensions for 3D Geometry Processing

Using Intel Streaming SIMD Extensions for 3D Geometry Processing Using Intel Streaming SIMD Extensions for 3D Geometry Processing Wan-Chun Ma, Chia-Lin Yang Dept. of Computer Science and Information Engineering National Taiwan University firebird@cmlab.csie.ntu.edu.tw,

More information

Arithmetic and IO. 25 August 2017

Arithmetic and IO. 25 August 2017 Arithmetic and IO 25 August 2017 Submissions you can submit multiple times to the homework dropbox file name: uppercase first letter, Yourlastname0829.java the system will use the last submission before

More information

Flynn Taxonomy Data-Level Parallelism

Flynn Taxonomy Data-Level Parallelism ecture 27 Computer Science 61C Spring 2017 March 22nd, 2017 Flynn Taxonomy Data-Level Parallelism 1 New-School Machine Structures (It s a bit more complicated!) Software Hardware Parallel Requests Assigned

More information

CS356: Discussion #6 Assembly Procedures and Arrays. Marco Paolieri

CS356: Discussion #6 Assembly Procedures and Arrays. Marco Paolieri CS356: Discussion #6 Assembly Procedures and Arrays Marco Paolieri (paolieri@usc.edu) Procedures Functions are a key abstraction in software They break down a problem into subproblems. Reusable functionality:

More information

C: How to Program. Week /Mar/05

C: How to Program. Week /Mar/05 1 C: How to Program Week 2 2007/Mar/05 Chapter 2 - Introduction to C Programming 2 Outline 2.1 Introduction 2.2 A Simple C Program: Printing a Line of Text 2.3 Another Simple C Program: Adding Two Integers

More information

Programming for Engineers Iteration

Programming for Engineers Iteration Programming for Engineers Iteration ICEN 200 Spring 2018 Prof. Dola Saha 1 Data type conversions Grade average example,-./0 class average = 23450-67 893/0298 Grade and number of students can be integers

More information

c) Comments do not cause any machine language object code to be generated. d) Lengthy comments can cause poor execution-time performance.

c) Comments do not cause any machine language object code to be generated. d) Lengthy comments can cause poor execution-time performance. 2.1 Introduction (No questions.) 2.2 A Simple Program: Printing a Line of Text 2.1 Which of the following must every C program have? (a) main (b) #include (c) /* (d) 2.2 Every statement in C

More information

Computer Architecture. Chapter 2-2. Instructions: Language of the Computer

Computer Architecture. Chapter 2-2. Instructions: Language of the Computer Computer Architecture Chapter 2-2 Instructions: Language of the Computer 1 Procedures A major program structuring mechanism Calling & returning from a procedure requires a protocol. The protocol is a sequence

More information

CSCI 402: Computer Architectures

CSCI 402: Computer Architectures CSCI 402: Computer Architectures Arithmetic for Computers (5) Fengguang Song Department of Computer & Information Science IUPUI What happens when the exact result is not any floating point number, too

More information

Assembly Language Programming 64-bit environments

Assembly Language Programming 64-bit environments Assembly Language Programming 64-bit environments October 17, 2017 Some recent history Intel together with HP start to work on 64-bit processor using VLIW technology. Itanium processor is born with the

More information

CS:APP3e Web Aside OPT:SIMD: Achieving Greater Parallelism with SIMD Instructions

CS:APP3e Web Aside OPT:SIMD: Achieving Greater Parallelism with SIMD Instructions CS:APP3e Web Aside OPT:SIMD: Achieving Greater Parallelism with SIMD Instructions Randal E. Bryant David R. O Hallaron January 14, 2016 Notice The material in this document is supplementary material to

More information

Lecture 18. Optimizing for the memory hierarchy

Lecture 18. Optimizing for the memory hierarchy Lecture 18 Optimizing for the memory hierarchy Today s lecture Motivation for using SSE intrinsics Managing Memory Locality 2 If we have simple data dependence patterns, GCC can generate good quality vectorized

More information

Module 2 - Part 2 DATA TYPES AND EXPRESSIONS 1/15/19 CSE 1321 MODULE 2 1

Module 2 - Part 2 DATA TYPES AND EXPRESSIONS 1/15/19 CSE 1321 MODULE 2 1 Module 2 - Part 2 DATA TYPES AND EXPRESSIONS 1/15/19 CSE 1321 MODULE 2 1 Topics 1. Expressions 2. Operator precedence 3. Shorthand operators 4. Data/Type Conversion 1/15/19 CSE 1321 MODULE 2 2 Expressions

More information

History of the Intel 80x86

History of the Intel 80x86 Intel s IA-32 Architecture Cptr280 Dr Curtis Nelson History of the Intel 80x86 1971 - Intel invents the microprocessor, the 4004 1975-8080 introduced 8-bit microprocessor 1978-8086 introduced 16 bit microprocessor

More information

Many of the following slides are taken with permission from. The book is used explicitly in CS 2505 and CS 3214 and as a reference in CS 2506.

Many of the following slides are taken with permission from. The book is used explicitly in CS 2505 and CS 3214 and as a reference in CS 2506. CS 3114 Many of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP) Randal E. Bryant and David R. O'Hallaron

More information

Intel 64 and IA-32 Architectures Software Developer s Manual

Intel 64 and IA-32 Architectures Software Developer s Manual Intel 64 and IA-32 Architectures Software Developer s Manual Documentation Changes January 2015 Notice: The Intel 64 and IA-32 architectures may contain design defects or errors known as errata that may

More information