Paul Cockshott and Kenneth Renfrew. SIMD Programming Manual for Linux and Windows. Springer

Transcription

Paul Cockshott and Kenneth Renfrew SIMD Programming Manual for Linux and Windows Springer

List of Tables List of Figures List of Algorithms Introduction

I SIMD Programming Paul Cockshott

1 Computer Speed, Program Speed
Clocks Width Instruction Speed Overhead Instructions Algorithm Complexity

2 SIMD Instruction-sets
The SIMD Model The MMX Register Architecture MMX Data-types 3DNow! Cache Handling Cache Line Length and Prefetching Streaming SIMD Cache Optimisation The Motorola Altivec Architecture

3 SIMD Programming in Assembler and C
Vectorising C Compilers Dead for Loop Elimination Loop Unrolling Direct Use of Assembler Code The Example Program Use of Assembler Intrinsics 3.4 Use of C++ Classes Use of the Nasm Assembler General Instruction Syntax Operand Forms Directives Linking and Object File Formats Summing a Vector Coordinate Transformations Using 3DNow! Coordinate Transformations Using SSE Instructions

4 Intel SIMD Instructions
v Types shrl saturate Instructions ADDPS ADDSS ANDNPS ANDPS CMPPS CMPSS COMISS CVTPI2PS CVTPS2PI CVTTPS2PI CVTSI2SS CVTSS2SI CVTTSS2SI DIVPD DIVPS DIVSD DIVSS EMMS FXRSTOR FXSAVE MASKMOVQ MAXPD MAXPS MAXSD MAXSS MINPD MINPS MINSD MINSS MOVAPS_load MOVAPS_store MOVD_load MOVD_store MOVD_load_sse MOVD_store_sse MOVHLPS MOVHPS_load MOVHPS_store MOVLHPS MOVLPS_load MOVLPS_store MOVMSKPS MOVNTPS MOVNTQ MOVQ_load MOVQ_store MOVSS_load MOVSS_store MOVUPS_load MOVUPS_store MULPD MULPS MULSD MULSS ORPS PACKSSDW PACKSSWB PACKUSWB PADDB PADDB_sse PADDW PADDW_sse PADDD PADDD_sse PADDQ PADDQ_sse PADDSB PADDSB_sse PADDUSB PADDUSB_sse PAND PAND_sse PANDN PANDN_sse PAVGB PAVGB_sse PAVGW PAVGW_sse PCMPEQB PCMPEQB_sse PCMPEQW PCMPEQW_sse PCMPEQD PCMPEQD_sse PCMPGTB PCMPGTB_sse PCMPGTW PCMPGTW_sse PCMPGTD PCMPGTD_sse PEXTRW PEXTRW_sse PINSRW PMADDWD PMAXSW PMAXUB PMINSW PMINUB PMOVMSKB PMULHUW PMULHW PMULLW POR PREFETCHNTA PREFETCHT PREFETCHT0 PSADBW PSHUFD PSHUFW PSxxf PSUBx PSUBSx PSUBUSx PSWAPD PUNPCKHBW PUNPCKLBW PUNPCKHWD PUNPCKLWD PUNPCKHDQ PUNPCKLDQ PXOR RCPPS RCPSS RSQRTPS RSQRTSS SFENCE SQRTPS SQRTSS SUBPS SUBSS UNPCKHPS UNPCKLPS XORPS

5 3DNOW Instructions
FEMMS PF2ID PFACC PFADD PFCMPEQ PFCMPGT PFCMPGE PFMAX PFMIN PFMUL PFNACC PFPNACC PFRCP PFRCPIT PFSUB PFSUBR PI2FD PI2FW PREFETCH

II SIMD Programming Languages Paul Cockshott

6 Another Approach: Data Parallel Languages
Operations on Whole Arrays Array Slicing Conditional Operations Reduction Operations Data Reorganisation Design Goals Target Machines Backward Compatibility Expressive Power Run-time Efficiency

7 Basics of Vector Pascal
Formatting Rules Alphabet Reserved Words and Identifiers Character Case Spaces and Comments Semicolons Base Types Booleans Integer Numbers Real Numbers Characters and Strings Variables and Constants Declaration Order Constant Declarations Variable Declarations Assignment Predefined Types Expressions and Operators Arithmetic Operations on Boolean Values Equality Operators Ordered Comparison Matrix and Vector Operations Array Declarations Matrix and Vector Arithmetic Array Input/Output Array Slices Vector and Matrix Products Inner Product of Vectors Dot Product of Non-real Typed Vectors Matrix to Vector Product Data-flow Hazards Matrix to Matrix Multiplication Typography of Vector Pascal Programs

8 Algorithmic Features of Vector Pascal
Conditional Evaluation Functions User-defined Functions Procedures Procedure ReadAndValidate Function H Function Log Branching Two-way Branches Multi-way Branches Unbounded Iteration While Repeat 8.5 Bounded Iteration For to For Downto Goto

9 User-defined Types
Scalar Types SUCC and PRED ORD Input/Output of Scalars Representation Sub-range Types Representation Dimensioned Numbers Arithmetic on Dimensioned Numbers Handling Different Units of Measurement Records Pointers Pointer Idioms Freeing Storage Set Types Set Literals Operations on Sets String Types

10 Input and Output
File Types Binary Files Text Files Operating System Files Output Binary File Output Text File Output Generic Array Output Input Generic Array Input Binary File Input Text File Input File Predicates Random Access to Files Seek filepos Untyped i/o Error Conditions

11 Permutations and Polymorphism
Array Reorganisation An Example Array Shifts Element Permutation Efficiency Considerations Dynamic Arrays Schematic Arrays Polymorphic Functions Multiple Uses of Parametric Units Function dategt

III Programming Examples Paul Cockshott

12 Advanced Set Programming
Use of Sets to Find Prime Numbers Set Implementation Ordered Sets openfiles loadset Sets of Records Retrieval Operations Use of Sets in Text Indexing Constructing an Indexing Program dirlist: A Program for Traversing a Directory Tree intodir bloomfilter hashword setfilter testfilter The Main Program to Index Files processfile A Retrieval Program

13 Parallel Image Processing
Declaring an Image Data Type Brightness and Contrast Adjustment Efficiency in Image Code Image Filtering Blurring Sharpening Comparing Implementations genconv dup prev pm doedges freestore 13.5 Digital Half-toning Parallel Half-tone errordifuse Image Resizing Horizontal Resize Horizontal Interpolation Interpolate Vertically Displaying Images demoimg An Example Image Display Program The Unit BMP Procedure initbmpheader Procedure storebmpfile Function loadbmpfile Procedure adjustcontrast Procedure pconv Procedure convp

14 Pattern Recognition and Image Compression
Principles of Image Compression Data Compression in General Image Compression Vector Quantisation of Images Data Structures encode The K Means Algorithm Vector Quantisation of Colour Images

15 3D Graphics
Mesh Representation linedemo: An Illustration of 3D Projection demo3d: Main Procedure of linedemo Viewing Matrices SDL Initialisation Create a Rotation Matrix Calculate x mod D Projection Entry Point to Line Drawing Bresenham Line Drawing Procedure Performance

IV VIPER Ken Renfrew

16 Introduction to VIPER
Rationale The Literate Programming Tool The Mathematical Syntax Converter 16.2 A System Overview Which VIPER to Download? System Dependencies Installing Files Setting Up the Compiler Setting Up the System Setting System Dependencies Personal Set-up Dynamic Compiler Options VIPER Option Buttons Moving VIPER Programming with VIPER Single Files Projects Embedding LaTeX in Vector Pascal Compiling Files in VIPER Compiling Single Files Compiling Projects Running Programs in VIPER Making VPTeX VPTeX Options VPMath LaTeX in VIPER HTML in VIPER Writing Code to Generate Good VPTeX Use of Special Comments Use of Margin Comments Use of Ordinary Pascal Comments Levels of Detail Within Documentation Mathematical Translation: Motivation and Guidelines LaTeX Packages

Appendix A Compiler Porting Tools
A.1 Dependencies A.2 Compiler Structure A.2.1 Vectorisation A.2.2 Porting Strategy A.3 ILCG A.4 Supported Types A.4.1 Data Formats A.4.2 Typed Formats A.4.3 ref Types A.5 Supported Operations A.5.1 TypeCasts A.5.2 Arithmetic A.5.3 Memory A.5.4 Assignment A.5.5 Dereferencing A.6 Machine Description A.6.1 Registers A.6.2 Register Sets A.6.3 Register Arrays A.6.4 Register Stacks A.6.5 Instruction Formats A.7 Grammar of ILCG A.8 ILCG Grammar A.8.1 Helpers A.8.2 Tokens A.8.3 Non-terminal Symbols

Appendix B Software Download

Appendix C Using the Command Line Compiler
C.1 Invoking the Compiler C.1.1 Environment Variable C.1.2 Compiler Options C.1.3 Dependencies C.2 Calling Conventions C.3 Array Representation C.3.1 Range Checking

References

Index

Instruction Set Progression. from MMX Technology through Streaming SIMD Extensions 2

Instruction Set Progression. from MMX Technology through Streaming SIMD Extensions 2 Instruction Set Progression from MMX Technology through Streaming SIMD Extensions 2 This article summarizes the progression of change to the instruction set in the Intel IA-32 architecture, from MMX technology

Enabling a Superior 3D Visual Computing Experience for Next-Generation x86 Computing Platforms One AMD Place Sunnyvale, CA 94088

Enabling a Superior 3D Visual Computing Experience for Next-Generation x86 Computing Platforms One AMD Place Sunnyvale, CA 94088 Enhanced 3DNow! Technology for the AMD Athlon Processor Enabling a Superior 3D Visual Computing Experience for Next-Generation x86 Computing Platforms ADVANCED MICRO DEVICES, INC. One AMD Place Sunnyvale,

AMD Extensions to the 3DNow! and MMX Instruction Sets Manual

AMD Extensions to the 3DNow! and MMX Instruction Sets Manual 2000 Advanced Micro Devices, Inc. All rights reserved. The contents of this document are provided in connection with Advanced Micro Devices,

CS802 Parallel Processing Class Notes

CS802 Parallel Processing Class Notes CS802 Parallel Processing Class Notes MMX Technology Instructor: Dr. Chang N. Zhang Winter Semester, 2006 Intel MMX TM Technology Chapter 1: Introduction to MMX technology 1.1 Features of the MMX Technology

IA-32 Intel Architecture Software Developer's Manual

IA-32 Intel Architecture Software Developer's Manual Volume 1: Basic Architecture NOTE: The IA-32 Intel Architecture Software Developer's Manual consists of three volumes: Basic Architecture, Order Number

Assembly Language - SSE and SSE2. Introduction to Scalar Floating- Point Operations via SSE

Assembly Language - SSE and SSE2. Introduction to Scalar Floating- Point Operations via SSE Assembly Language - SSE and SSE2 Introduction to Scalar Floating- Point Operations via SSE Floating Point Registers and x86_64 Two sets of registers, in addition to the General-Purpose Registers Three

Intel Xeon Scalable Processor

Intel Xeon Scalable Processor Intel Xeon Scalable Processor Instruction Throughput and Latency August 2017 Revision 1.1 336289-002 Document ID: 336289-002 Revision Number: 1.1 Revision History Document ID Description Date 336289-001

Assembly Language - SSE and SSE2. Floating Point Registers and x86_64

Assembly Language - SSE and SSE2. Floating Point Registers and x86_64 Assembly Language - SSE and SSE2 Introduction to Scalar Floating- Point Operations via SSE Floating Point Registers and x86_64 Two sets of, in addition to the General-Purpose Registers Three additional

Optimizing Graphics Drivers with Streaming SIMD Extensions. Copyright 1999, Intel Corporation. All rights reserved.

Optimizing Graphics Drivers with Streaming SIMD Extensions. Copyright 1999, Intel Corporation. All rights reserved. Optimizing Graphics Drivers with Streaming SIMD Extensions 1 Agenda Features of Pentium III processor Graphics Stack Overview Driver Architectures Streaming SIMD Impact on D3D Streaming SIMD Impact on

Intel SIMD. Chris Phillips LBA Lead Scientist November 2014 ASTRONOMY AND SPACE SCIENCE

Intel SIMD Chris Phillips LBA Lead Scientist November 2014 ASTRONOMY AND SPACE SCIENCE SIMD Single Instruction Multiple Data Vector extensions for x86 processors Parallel operations More registers than regular

CS220. April 25, 2007

CS220. April 25, 2007 CS220 April 25, 2007 AT&T syntax MMX Most MMX documents are in Intel Syntax OPERATION DEST, SRC We use AT&T Syntax OPERATION SRC, DEST Always remember: DEST = DEST OPERATION SRC (Please note the weird

3DNow! Instruction Porting Guide. Application Note

3DNow! Instruction Porting Guide. Application Note 3DNow! Instruction Porting Guide Application Note Publication # 22621 Rev: B Issue Date: August 1999 1999 Advanced Micro Devices, Inc. All rights reserved. The contents of this document are provided in

NASM The Netwide Assembler

NASM The Netwide Assembler version 0.98.34

Intel SIMD architecture. Computer Organization and Assembly Languages Yung-Yu Chuang

Intel SIMD architecture Computer Organization and Assembly Languages Yung-Yu Chuang Overview SIMD MMX architectures MMX instructions examples SSE/SSE2 SIMD instructions are probably the best place to

3DNow! Technology Manual

3DNow! Technology Manual 3DNow! TM Technology Manual 1998 Advanced Micro Devices, Inc. All rights reserved. AMD-K6 3D Advanced Micro Devices, Inc. ( AMD ) reserves the right to make changes in its products without notice in order

Intel SIMD architecture. Computer Organization and Assembly Languages Yung-Yu Chuang 2006/12/25

Intel SIMD architecture. Computer Organization and Assembly Languages Yung-Yu Chuang 2006/12/25 Intel SIMD architecture Computer Organization and Assembly Languages Yung-Yu Chuang 2006/12/25 Reference Intel MMX for Multimedia PCs, CACM, Jan. 1997 Chapter 11 The MMX Instruction Set, The Art of Assembly

Applications Tuning for Streaming SIMD Extensions

Applications Tuning for Streaming SIMD Extensions Applications Tuning for Streaming SIMD Extensions James Abel, Kumar Balasubramanian, Mike Bargeron, Tom Craver, Mike Phlipot, Microprocessor Products Group, Intel Corp. Index words: SIMD, streaming, MMX

An Efficient Vector/Matrix Multiply Routine using MMX Technology

An Efficient Vector/Matrix Multiply Routine using MMX Technology An Efficient Vector/Matrix Multiply Routine using MMX Technology Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided in connection

Using MMX Instructions to Implement a 1/3T Equalizer

Using MMX Instructions to Implement a 1/3T Equalizer Using MMX Instructions to Implement a 1/3T Equalizer Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided in connection with Intel

Pointers in C. A Hands on Approach. Naveen Toppo. Hrishikesh Dewan

Pointers in C A Hands on Approach Naveen Toppo Hrishikesh Dewan Contents About the Authors Acknowledgments Introduction xiii xv xvii Chapter 1: Memory, Runtime Memory Organization, and Virtual Memory

Introduction to Parallel Computing

Introduction to Parallel Computing W. P. Petersen Seminar for Applied Mathematics Department of Mathematics, ETHZ, Zurich wpp@math.ethz.ch P. Arbenz Institute for Scientific Computing Department Informatik,

SOME ASSEMBLY REQUIRED

SOME ASSEMBLY REQUIRED Assembly Language Programming with the AVR Microcontroller TIMOTHY S. MARGUSH CRC Press Taylor & Francis Group CRC Press is an imprint of the Taylor & Francis Group an Informa business

The Internet Streaming SIMD Extensions

The Internet Streaming SIMD Extensions The Internet Streaming SIMD Extensions Shreekant (Ticky) Thakkar, Microprocessor Products Group, Intel Corp. Tom Huff, Microprocessor Products Group, Intel Corp. ABSTRACT The paper describes the development

How to Write Fast Numerical Code Spring 2012 Lecture 15. Instructor: Markus Püschel TAs: Georg Ofenbeck & Daniele Spampinato

How to Write Fast Numerical Code Spring 2012 Lecture 15 Instructor: Markus Püschel TAs: Georg Ofenbeck & Daniele Spampinato Flynn's Taxonomy Single data Multiple data Single instruction SISD Uniprocessor

Design of Parallel and High-Performance Computing Fall 2018 Lecture: SIMD vector extensions MISD. Uniprocessor

Design of Parallel and High-Performance Computing Fall 2018 Lecture: SIMD vector extensions Instructor: Torsten Hoefler & Markus Püschel TA: Salvatore Di Girolamo Flynn's Taxonomy Single data Multiple

Vector Processors. Kavitha Chandrasekar Sreesudhan Ramkumar

Vector Processors. Kavitha Chandrasekar Sreesudhan Ramkumar Vector Processors Kavitha Chandrasekar Sreesudhan Ramkumar Agenda Why Vector processors Basic Vector Architecture Vector Execution time Vector load - store units and Vector memory systems Vector length

Intel 64 and IA-32 Architectures Software Developer's Manual

Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture NOTE: The Intel 64 and IA-32 Architectures Software Developer's Manual consists of five volumes: Basic Architecture,

IA-32 Intel Architecture Software Developer's Manual

IA-32 Intel Architecture Software Developer's Manual Volume 1: Basic Architecture NOTE: The IA-32 Intel Architecture Software Developer's Manual consists of four volumes: Basic Architecture, Order Number

How to Write Fast Numerical Code

How to Write Fast Numerical Code How to Write Fast Numerical Code Lecture: SIMD extensions, SSE, compiler vectorization Instructor: Markus Püschel TA: Daniele Spampinato & Alen Stojanov Planning Next week exam in HG D7.1 Otherwise: work

Intel Architecture MMX Technology

Intel Architecture MMX Technology Programmer's Reference Manual March 1996 Order No. 243007-002 Subject to the terms and conditions set forth below, Intel hereby grants you a nonexclusive, nontransferable

P. Specht's list of the 8-byte floating-point instructions of the masm32 assembler. COMPACTED INTEL PENTIUM-4 PRESCOTT (April 2004) DPFP COMMAND SET

P. Specht's list of the 8-byte floating-point instructions of the masm32 assembler COMPACTED INTEL PENTIUM-4 PRESCOTT (April 2004) DPFP COMMAND SET ADDPD ADDSD ADDSUBPD ANDPD ANDNPD CMPPD Add Packed Double-precision

Name :. Roll No. :... Invigilator s Signature : INTRODUCTION TO PROGRAMMING. Time Allotted : 3 Hours Full Marks : 70

Name :. Roll No. :... Invigilator s Signature : INTRODUCTION TO PROGRAMMING. Time Allotted : 3 Hours Full Marks : 70 Name :. Roll No. :..... Invigilator s Signature :.. 2011 INTRODUCTION TO PROGRAMMING Time Allotted : 3 Hours Full Marks : 70 The figures in the margin indicate full marks. Candidates are required to give

Advanced R. Taylor & Francis Group. Hadley Wickham. CRC Press

CRC Press Taylor & Francis Group Advanced R Hadley Wickham 1 Introduction 1 1.1 Who should read this book 3 1.2 What you will get out of this book 3 1.3 Meta-techniques... 4 1.4 Recommended

Preface... (vii) CHAPTER 1 INTRODUCTION TO COMPUTERS

Preface... (vii) CHAPTER 1 INTRODUCTION TO COMPUTERS Contents Preface... (vii) CHAPTER 1 INTRODUCTION TO COMPUTERS 1.1. INTRODUCTION TO COMPUTERS... 1 1.2. HISTORY OF C & C++... 3 1.3. DESIGN, DEVELOPMENT AND EXECUTION OF A PROGRAM... 3 1.4 TESTING OF PROGRAMS...

Using MMX Instructions to Implement the G.728 Codebook Search

Using MMX Instructions to Implement the G.728 Codebook Search Using MMX Instructions to Implement the G.728 Codebook Search Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided in connection

Intel Architecture Optimization

Intel Architecture Optimization Intel Architecture Optimization Reference Manual Copyright 1998, 1999 Intel Corporation All Rights Reserved Issued in U.S.A. Order Number: 245127-001 Intel Architecture Optimization Reference Manual Order

Computer Programming C++ (wg) CCOs

Computer Programming C++ (wg) CCOs Computer Programming C++ (wg) CCOs I. The student will analyze the different systems, and languages of the computer. (SM 1.4, 3.1, 3.4, 3.6) II. The student will write, compile, link and run a simple C++

IA-32 Intel Architecture Software Developer's Manual

IA-32 Intel Architecture Software Developer's Manual Volume 1: Basic Architecture NOTE: The IA-32 Intel Architecture Software Developer's Manual consists of five volumes: Basic Architecture, Order Number

SECTION A. (i) The Boolean function in sum of products form where K-map is given below (figure) is:

SECTION A. (i) The Boolean function in sum of products form where K-map is given below (figure) is: SECTION A 1. Fill in the blanks: (i) The Boolean function in sum of products form where K-map is given below (figure) is: C B 0 1 0 1 0 1 A A (ii) Consider a 3-bit error detection and 1-bit error correction

High Performance Computing. Classes of computing SISD. Computation Consists of :

High Performance Computing: Introduction to classes of computing, SISD, MISD, SIMD, Conclusion. Classes of computing. Computation consists of: Sequential Instructions (operation), Sequential dataset. We

Figure 1: 128-bit registers introduced by SSE. 128 bits. xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7

Figure 1: 128-bit registers introduced by SSE. 128 bits. xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7 SE205 - TD1 Énoncé General Instructions You can download all source files from: https://se205.wp.mines-telecom.fr/td1/ SIMD-like Data-Level Parallelism Modern processors often come with instruction set

Mathematics Shape and Space: Polygon Angles

a place of mind FACULTY OF EDUCATION Department of Curriculum and Pedagogy Mathematics Shape and Space: Polygon Angles Science and Mathematics Education Research Group Supported by UBC Teaching

Programming. Principles and Practice Using C++ Bjarne Stroustrup. / Addison-Wesley. Second Edition

Programming. Principles and Practice Using C++ Bjarne Stroustrup. / Addison-Wesley. Second Edition Programming Principles and Practice Using C++ Second Edition Bjarne Stroustrup / Addison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris Madrid

C# Programming: From Problem Analysis to Program Design. Fourth Edition

C# Programming: From Problem Analysis to Program Design. Fourth Edition C# Programming: From Problem Analysis to Program Design Fourth Edition Preface xxi INTRODUCTION TO COMPUTING AND PROGRAMMING 1 History of Computers 2 System and Application Software 4 System Software 4

Using MMX Instructions to Implement a Synthesis Sub-Band Filter for MPEG Audio Decoding

Using MMX Instructions to Implement a Synthesis Sub-Band Filter for MPEG Audio Decoding Using MMX Instructions to Implement a Synthesis Sub-Band Filter for MPEG Audio Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided

Performance Optimization of an MPEG-2 to MPEG-4 Video Transcoder

Performance Optimization of an MPEG-2 to MPEG-4 Video Transcoder MERL A MITSUBISHI ELECTRIC RESEARCH LABORATORY http://www.merl.com Performance Optimization of an MPEG-2 to MPEG-4 Video Transcoder Hari Kalva Anthony Vetro Huifang Sun TR-2003-57 May 2003 Abstract The

How to Write Fast Numerical Code

How to Write Fast Numerical Code Lecture: SIMD extensions, SSE, compiler vectorization Instructor: Markus Püschel TA: Gagandeep Singh, Daniele Spampinato, Alen Stojanov Flynn's Taxonomy Single data Multiple

CROSS-REFERENCE TABLE ASME A Including A17.1a-1997 Through A17.1d 2000 vs. ASME A

CROSS-REFERENCE TABLE ASME A Including A17.1a-1997 Through A17.1d 2000 vs. ASME A CROSS-REFERENCE TABLE ASME Including A17.1a-1997 Through A17.1d 2000 vs. ASME 1 1.1 1.1 1.1.1 1.2 1.1.2 1.3 1.1.3 1.4 1.1.4 2 1.2 3 1.3 4 Part 9 100 2.1 100.1 2.1.1 100.1a 2.1.1.1 100.1b 2.1.1.2 100.1c

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal

SWAR: MMX, SSE, SSE 2 Multiplatform Programming

SWAR: MMX, SSE, SSE 2 Multiplatform Programming Presenter: Dr. Matteo Roffilli roffilli@csr.unibo.it 1 What's SWAR? SWAR = SIMD Within A Register SIMD = Single Instruction Multiple Data MMX,SSE,SSE2,Power3DNow

CONTENTS. PART 1 Structured Programming 1. 1 Getting started 3. 2 Basic programming elements 17

CONTENTS. PART 1 Structured Programming 1. 1 Getting started 3. 2 Basic programming elements 17 List of Programs xxv List of Figures xxix List of Tables xxxiii Preface to second version xxxv PART 1 Structured Programming 1 1 Getting started 3 1.1 Programming 3 1.2 Editing source code 5 Source code

Vector Pascal. Paul Cockshott, University of Glasgow September 17, 2001

Vector Pascal. Paul Cockshott, University of Glasgow September 17, 2001 Vector Pascal Paul Cockshott, University of Glasgow September 17, 2001 Abstract Vector Pascal is a language designed to support elegant and efficient expression of algorithms using the SIMD model of computation.

JAVA CONCEPTS Early Objects

JAVA CONCEPTS Early Objects INTERNATIONAL STUDENT VERSION JAVA CONCEPTS Early Objects Seventh Edition CAY HORSTMANN San Jose State University Wiley CONTENTS PREFACE v chapter i INTRODUCTION 1 1.1 Computer Programs 2 1.2 The Anatomy

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1 Table of Contents About the Authors... iii Introduction... xvii Chapter 1: System Software... 1 1.1 Concept of System Software... 2 Types of Software Programs... 2 Software Programs and the Computing Machine...

Roll No. : Invigilator's Signature :.. GRAPHICS AND MULTIMEDIA. Time Allotted : 3 Hours Full Marks : 70

Roll No. : Invigilator's Signature :.. GRAPHICS AND MULTIMEDIA. Time Allotted : 3 Hours Full Marks : 70 Name : Roll No. : Invigilator's Signature :.. CS/MCA/SEM-4/MCA-402/2012 2012 GRAPHICS AND MULTIMEDIA Time Allotted : 3 Hours Full Marks : 70 The figures in the margin indicate full marks. Candidates are

CS 261 Fall Mike Lam, Professor. x86-64 Data Structures and Misc. Topics

CS 261 Fall Mike Lam, Professor. x86-64 Data Structures and Misc. Topics CS 261 Fall 2017 Mike Lam, Professor x86-64 Data Structures and Misc. Topics Topics Homogeneous data structures Arrays Nested / multidimensional arrays Heterogeneous data structures Structs / records Unions

Dan Stafford, Justine Bonnot

Dan Stafford, Justine Bonnot Dan Stafford, Justine Bonnot Background Applications Timeline MMX 3DNow! Streaming SIMD Extension SSE SSE2 SSE3 and SSSE3 SSE4 Advanced Vector Extension AVX AVX2 AVX-512 Compiling with x86 Vector Processing

Using MMX Instructions for 3D Bilinear Texture Mapping

Using MMX Instructions for 3D Bilinear Texture Mapping Using MMX Instructions for 3D Bilinear Texture Mapping Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided in connection with Intel

Porting and Optimizing Multimedia Codecs for AMD64 architecture on Microsoft Windows

Porting and Optimizing Multimedia Codecs for AMD64 architecture on Microsoft Windows Porting and Optimizing Multimedia Codecs for AMD64 architecture on Microsoft Windows July 21, 2004 Abstract This paper provides information about the significant optimization techniques used for multimedia

The Skeleton Assembly Line

The Skeleton Assembly Line The Skeleton Assembly Line February 27th 2005 J.M.P. van Waveren 2005, Id Software, Inc. Abstract Optimized routines to transform the skeletons and joints for a skeletal animation system are presented.

Real-Time DXT Compression

Real-Time DXT Compression Real-Time DXT Compression May 20th 2006 J.M.P. van Waveren 2006, Id Software, Inc. Abstract S3TC also known as DXT is a lossy texture compression format with a fixed compression ratio. The DXT format is

Intel Architecture Software Developer's Manual

Intel Architecture Software Developer's Manual Volume 1: Basic Architecture NOTE: The Intel Architecture Software Developer's Manual consists of three books: Basic Architecture, Order Number 243190; Instruction

Intel MMX Technology Overview

Intel MMX Technology Overview March 1996 Order Number: 24308-002 E Information in this document is provided in connection with Intel products. No license under any patent or copyright is granted expressly

Contents. Chapter 1 SPECIFYING SYNTAX 1

Contents. Chapter 1 SPECIFYING SYNTAX 1 Contents Chapter 1 SPECIFYING SYNTAX 1 1.1 GRAMMARS AND BNF 2 Context-Free Grammars 4 Context-Sensitive Grammars 8 Exercises 8 1.2 THE PROGRAMMING LANGUAGE WREN 10 Ambiguity 12 Context Constraints in Wren

Computer Organization and Design

Computer Organization and Design THE HARDWARE/SOFTWARE INTERFACE John L. Hennessy Stanford University David A. Patterson University of California at Berkeley With a contribution

BASIC INTERFACING CONCEPTS

BASIC INTERFACING CONCEPTS Contents i SYLLABUS UNIT - I 8085 ARCHITECTURE Introduction to Microprocessors and Microcontrollers, 8085 Processor Architecture, Internal Operations, Instructions and Timings, Programming the 8085-Introduction

16.10 Exercises. 372 Chapter 16 Code Improvement. be translated as

16.10 Exercises. 372 Chapter 16 Code Improvement. be translated as 372 Chapter 16 Code Improvement 16.10 Exercises 16.1 In Section 16.2 we suggested replacing the instruction r1 := r2 / 2 with the instruction r1 := r2 >> 1, and noted that the replacement may not be correct

Charting the Course... MOC Programming in C# with Microsoft Visual Studio Course Summary

Course Summary NOTE - The course delivery has been updated to Visual Studio 2013 and .NET Framework 4.5! Description The course focuses on C# program structure, language syntax, and implementation details

SYLLABUS UNIT - I UNIT - II UNIT - III UNIT - IV CHAPTER - 1 : INTRODUCTION CHAPTER - 4 : SYNTAX-DIRECTED TRANSLATION CHAPTER - 7 : STORA

Contents i SYLLABUS UNIT - I CHAPTER - 1 : INTRODUCTION Programs Related to Compilers. Translation Process, Major Data Structures, Other Issues in Compiler Structure, Boot Strapping and Porting. CHAPTER

List of Figures. About the Authors. Acknowledgments

List of Figures. About the Authors. Acknowledgments List of Figures Preface About the Authors Acknowledgments xiii xvii xxiii xxv 1 Compilation 1 1.1 Compilers..................................... 1 1.1.1 Programming Languages......................... 1

More information

Contents. Figures. Tables. Examples. Foreword. Preface. 1 Basics of Java Programming 1. xix. xxi. xxiii. xxvii. xxix

Contents. Figures. Tables. Examples. Foreword. Preface. 1 Basics of Java Programming 1. xix. xxi. xxiii. xxvii. xxix PGJC4_JSE8_OCA.book Page ix Monday, June 20, 2016 2:31 PM Contents Figures Tables Examples Foreword Preface xix xxi xxiii xxvii xxix 1 Basics of Java Programming 1 1.1 Introduction 2 1.2 Classes 2 Declaring

More information

WITH C+ + William Ford University of the Pacific. William Topp University of the Pacific. Prentice Hall, Englewood Cliffs, New Jersey 07632

WITH C+ + William Ford University of the Pacific. William Topp University of the Pacific. Prentice Hall, Englewood Cliffs, New Jersey 07632 DATA STRUCTURES WITH C+ + William Ford University of the Pacific William Topp University of the Pacific Prentice Hall, Englewood Cliffs, New Jersey 07632 CONTENTS Preface xvii CHAPTER 1 INTRODUCTION 1

More information
