Assembly Language - SSE and SSE2: Introduction to Scalar Floating-Point Operations via SSE
- Letitia Rose
2 Floating Point Registers and x86_64
Two sets of registers exist in addition to the General-Purpose Registers, and three additional groups of instructions (x87 floating point, MMX, and SSE) use them.
Floating-Point Registers, ST0-ST7
- 80-bit registers in an extended-precision format, a superset of IEEE-754 double precision
- used by the original floating-point coprocessors and FPUs
  » 8087, 80387, i486 FPU, etc.
- Addressable only as a stack: register names are relative to the top of the stack, and most operations use ST0 (the top) as an operand
- Useful for trig, log functions, etc.
3 x86_64 Registers
4 MMX
MMX instructions provide SIMD (Single-Instruction, Multiple-Data) processing
- Integer operations on multiple data values in parallel
MMX registers overlap the Floating-Point registers
- MMX registers MM0-MM7 are only 64 bits wide
- Each register overlaps the lower 64 bits of an 80-bit ST register
- MMX registers are directly addressable, unlike ST
Difficult to use MMX and Floating-Point at the same time
- switching between them requires an EMMS instruction and is slow
5 SSE, SSE2, and beyond
"Streaming SIMD Extensions"
SSE offers single-precision floating-point SIMD instructions
- XMM0-XMM15 registers (XMM0-XMM7 outside 64-bit mode) each hold multiple IEEE-754 operands
- 128 bits wide: four single-precision operands per register
SSE2 adds integer and double-precision floating-point SIMD instructions
- two double-precision operands per register
Many compilers use XMM registers and SSE/SSE2 instructions instead of the x87 floating-point or MMX instructions
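To make the register sizes concrete, here is a small C model of one XMM register seen through the operand layouts described above. The union name xmm_model and the helper lanes() are hypothetical illustrations; a real XMM register is not a C object, and this only shows how many operands of each size fit in 128 bits.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit XMM register, viewed as the operand
 * layouts that SSE and SSE2 instructions work on. Illustrates sizes only. */
typedef union {
    float   ps[4];   /* SSE: four single-precision values           */
    double  pd[2];   /* SSE2: two double-precision values           */
    int32_t dw[4];   /* SSE2 integer: four 32-bit integers          */
    uint8_t b[16];   /* SSE2 integer: sixteen bytes                 */
} xmm_model;

/* Number of elements of the given size that fit in a 128-bit register. */
static unsigned lanes(unsigned element_bytes) {
    return 16u / element_bytes;
}
```

Scalar instructions (the ss and sd forms used in the rest of these slides) operate only on the lowest lane: ps[0] or pd[0] in this model.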
6 Programming Usage
C function ABI (Application Binary Interface), x86_64 System V as used on Linux:
- Floating-point arguments ("float", "double") are passed in XMM registers, the first eight in XMM0-XMM7
- For variadic functions such as printf(), the AL register passes (an upper bound on) the number of XMM registers used
- Floating-point return values are in XMM0, and XMM1 if needed
Floating-point operations use SSE, SSE2 instructions on XMM registers
- instead of the more awkward stack-based x87 floating-point instructions
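A minimal C sketch of the calling convention, assuming the x86_64 System V ABI. The function names are hypothetical; the comments describe where a conforming compiler places each value, not code taken from the slides.

```c
/* a -> XMM0, b -> XMM1, scale -> XMM2; the result is returned in XMM0.
 * No general-purpose registers are used for these arguments. */
double scaled_sum(double a, double b, double scale) {
    return (a + b) * scale;   /* typically addsd followed by mulsd */
}

/* A float argument and a float result also travel in XMM0 (low 32 bits). */
float half(float x) {
    return x * 0.5f;          /* typically mulss with 0.5f loaded from memory */
}
```

Called from assembly, the caller would load the three doubles into XMM0-XMM2 before the call and read the answer from XMM0 afterwards; AL only matters if the callee is variadic.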
7 First Example
Just pass a couple of floating-point (double) arguments to a C function: printf()
- Note that scanf() here doesn't receive or return floating-point values, only pointers
- printf(), however, does
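The same contrast in C, as a sketch. It swaps scanf() for sscanf() so the input comes from a string and can be tested; the function name echo_two is hypothetical. The register notes assume the x86_64 System V ABI.

```c
#include <stdio.h>

/* sscanf receives only pointers (&x, &y), which travel in general-purpose
 * registers, so no floating-point values are passed in XMM registers.
 * printf receives the two doubles by value in XMM0 and XMM1, with AL
 * holding an upper bound on the vector registers used (here 2). */
int echo_two(const char *input) {
    double x, y;
    if (sscanf(input, "%lf %lf", &x, &y) != 2)   /* %lf stores a double */
        return -1;
    printf("x = %f, y = %f\n", x, y);
    return 2;
}
```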
8 Basic SSE Instructions
movsd
- move a double-precision operand to/from an XMM register
addsd, mulsd, divsd, sqrtsd, etc.
- operate on scalar, double-precision operands in XMM registers
movss
- move a single-precision operand to/from an XMM register
addss, mulss, divss, sqrtss, etc.
- operate on scalar, single-precision operands in XMM registers
Many other instructions
- also see F.P. instructions
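For orientation, these C functions each perform one scalar operation. The instruction named in each comment is what x86_64 compilers typically emit; the exact output depends on the compiler and its flags, and the function names are hypothetical.

```c
double add_d(double a, double b) { return a + b; }   /* addsd */
double mul_d(double a, double b) { return a * b; }   /* mulsd */
double div_d(double a, double b) { return a / b; }   /* divsd */

float  add_f(float a, float b)   { return a + b; }   /* addss */
float  mul_f(float a, float b)   { return a * b; }   /* mulss */
float  div_f(float a, float b)   { return a / b; }   /* divss */
```

Because the arguments already arrive in XMM registers, no movsd or movss is needed here; those moves appear when operands are loaded from or stored to memory.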
9 Example 2 64-bit Division
10 SSE and Constants
- No SSE instruction takes an immediate floating-point operand, so constants cannot be loaded directly into XMM registers
- Constants must be created in memory, using assembler (nasm) data directives
- Load constants from those memory locations into registers for use
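A C analogue, as a sketch: a constant stored in memory, much like a nasm definition such as `one: dq 1.0` that would then be loaded with movsd. The helper bits_of_one is hypothetical and simply reads back the eight bytes the assembler or compiler would place in the data section.

```c
#include <stdint.h>
#include <string.h>

/* The constant lives in read-only memory; code would load it with movsd. */
static const double one = 1.0;

/* Return the raw IEEE-754 bit pattern stored for the constant. */
uint64_t bits_of_one(void) {
    uint64_t bits;
    memcpy(&bits, &one, sizeof bits);   /* copy bytes without type-punning */
    return bits;
}
```

For nonzero literals, compilers typically do the same thing: they place the constant in a read-only section and load it from there.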
11 64-bit Example: product (a)
This function calculates the product of an array of double-type numbers. It uses an optimized while loop and returns 1.0 for an empty array.
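A C reference version of product(), useful for checking the assembly routine's results. It matches the described behavior (while loop, 1.0 for an empty array); the exact loop optimization used in the slides' assembly is not reproduced here.

```c
#include <stddef.h>

/* Multiply n doubles; an empty array yields 1.0, the multiplicative identity. */
double product(const double *a, size_t n) {
    double p = 1.0;      /* in assembly, 1.0 is loaded from a memory constant */
    while (n > 0) {
        p *= *a++;       /* one mulsd per element */
        --n;
    }
    return p;            /* returned in XMM0 */
}
```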
12 64-bit product (b)
This code sets up a main() routine to read in a set of doubles. It is easily adjusted for 32-bit floats.
(continued on the next slide)
13 64-bit product (c)
(continued from the previous slide)
The input loop, the call to product(), and the output
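The input loop can be sketched in C as follows. This is an assumption-laden stand-in, not the slides' code: the hypothetical helper read_doubles parses numbers from a string with strtod() instead of reading stdin with scanf(), so it can be tested.

```c
#include <stddef.h>
#include <stdlib.h>

/* Parse up to max doubles from text into out; return how many were stored. */
size_t read_doubles(const char *text, double *out, size_t max) {
    size_t count = 0;
    char *end;
    while (count < max) {
        double v = strtod(text, &end);   /* skips leading whitespace */
        if (end == text)                 /* no further number to parse */
            break;
        out[count++] = v;
        text = end;
    }
    return count;
}
```

The resulting array and count would then be passed to product() in RDI and RSI, with the result coming back in XMM0 for printing.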
14 SSE and Converting Numeric Types
Convert integers to floats/doubles
- cvtsi2ss: convert integer to 32-bit float in XMM register
- cvtsi2sd: convert integer to 64-bit double in XMM register
Convert floats/doubles to integers
- cvtss2si: convert 32-bit float to integer, rounded according to the current rounding mode (round-to-nearest by default)
- cvttss2si: convert 32-bit float to integer, truncated
- cvtsd2si: convert 64-bit double to integer, rounded
- cvttsd2si: convert 64-bit double to integer, truncated
Convert between floats and doubles
- cvtss2sd: convert 32-bit float to 64-bit double
- cvtsd2ss: convert 64-bit double to 32-bit float
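These C conversions correspond to the instructions listed above. The instruction in each comment is what x86_64 compilers typically emit; the function names are hypothetical. A C cast from floating point to integer always truncates toward zero, so it maps to the truncating forms. The rounding forms follow the current rounding mode, which in C is reached through lrint() rather than a cast.

```c
double long_to_double(long n)    { return (double)n; }  /* cvtsi2sd  */
float  int_to_float(int n)       { return (float)n; }   /* cvtsi2ss  */
int    double_to_int(double x)   { return (int)x; }     /* cvttsd2si */
int    float_to_int(float x)     { return (int)x; }     /* cvttss2si */
double float_to_double(float x)  { return (double)x; }  /* cvtss2sd  */
float  double_to_float(double x) { return (float)x; }   /* cvtsd2ss  */
```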
15 32-bit Example: mean (a)
This function sums the elements of an array, then divides the sum by the array length. The divisor must be converted from an integer to a float; instruction cvtsi2ss does this.
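A C reference version of mean() for floats, as a sketch. The cast of the integer count to float is where cvtsi2ss appears. Returning 0.0f for an empty array is an assumption made here to avoid 0.0f / 0.0f; the slides do not say how the assembly version handles that case.

```c
/* Average of n floats. */
float mean(const float *a, int n) {
    if (n <= 0)
        return 0.0f;               /* assumption: empty array gives 0.0f */
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i];               /* addss */
    return sum / (float)n;         /* cvtsi2ss, then divss */
}
```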
16 32-bit mean (b)
This code sets up a main() routine to read in a set of floats.
(continued on the next slide)
17 32-bit mean (c)
(continued from the previous slide)
The input loop, the call to mean(), and the output
The result must be promoted to a double, with cvtss2sd, before it is passed to printf()
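Why the promotion is needed, sketched in C: printf's %f always expects a double. In C, a float passed to a variadic function is promoted to double automatically (the default argument promotions), but an assembly caller must do it explicitly with cvtss2sd before placing the value in XMM0. The helper format_mean is hypothetical and uses snprintf() so the output can be checked.

```c
#include <stdio.h>

/* Format a float mean; returns the number of characters written. */
int format_mean(char *buf, size_t size, float m) {
    double promoted = (double)m;   /* explicit here; cvtss2sd in assembly */
    return snprintf(buf, size, "mean = %f", promoted);
}
```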