SIMD Instructions outside and inside Oracle 12c. Laurent Léturgez 2016

1 SIMD Instructions outside and inside Oracle 12c. Laurent Léturgez, 2016

2 Whoami. Oracle Consultant since 200. Former developer (C, Java, Perl, PL/SQL). Data Management on Premise and in the Cloud. Blogger since (in French and discontinued). Twitter

3 Agenda. SIMD instructions, outside Oracle 12c: What is a SIMD instruction? Will my application use SIMD? Raw performance. SIMD instructions, inside Oracle 12c: How SIMD instructions are used inside Oracle 12c. Tracing SIMD in Oracle 12c.

4 Caveats. Most of the topics come from my own research and from my past life as a developer. Some of the topics are about internals, so the analysis and conclusions may be incomplete, and future versions of Oracle may change the features. Tests have been done with Oracle , Oracle Enterprise Linux 7. (UEKR3), VMware Fusion 8 (and VirtualBox).

5 Before we start. Some fundamentals (from Dennis Yurichev's book). CPU register: "[...] The easiest way to understand a register is to think of it as an untyped temporary variable. Imagine if you were working with high-level PL and could only use eight 32-bit (or 64-bit) variables. Yet a lot can be done using just these!" Instruction: a primitive CPU command. The simplest examples include moving data between registers, working with memory, and arithmetic primitives. As a rule, each CPU has its own instruction set architecture (ISA). Assembly language: mnemonic code and some extensions, like macros, which are intended to make a programmer's life easier.

7 SIMD instructions outside Oracle 12c. SIMD stands for Single Instruction, Multiple Data: process multiple data items in one CPU instruction. It is based on specific registers and on specific CPU instructions and sets of instructions. It is not Oracle specific but CPU architecture specific: Intel, IBM (AltiVec), SPARC (VIS). This presentation is mainly about the Intel architecture.

8 SIMD instructions outside Oracle 12c. What is a SIMD register? It's a CPU register, wider than traditional registers (RDI, RSI, R8, R9 etc.): 128 up to 512 bits wide. It contains multiple data items.

9 SIMD instructions outside Oracle 12c. Scalar operation on an array of 4 integers {1,2,3,4}: add 1 to each value. [Figure: the CPU handles one element at a time through its registers, with a LOAD, an ADD and a SAVE per element; in total 4 LOAD, 4 ADD and 4 SAVE.]

10 SIMD instructions outside Oracle 12c. SIMD operation on an array of 4 integers {1,2,3,4}: add 1 to each value. [Figure: the four integers are held together in SIMD registers; a single LOAD, ADD and SAVE processes the whole array.]

11 SIMD instructions outside Oracle 12c

Instruction set | Register size | # Registers | Register names | Processors
MMX | 64 bits | 8 | MM0 to MM7 | Pentium II
SSE | 128 bits | 8 | XMM0 to XMM7 | Pentium III
SSE2/SSE3/SSSE3/SSE4 | 128 bits | 16 | XMM0 to XMM15 | Pentium IV to Nehalem
AVX/AVX2 | 256 bits | 16 | YMM0 to YMM15 | Sandy Bridge to Haswell
AVX3 or AVX-512 | 512 bits | 32 | ZMM0 to ZMM31 | Skylake (initially announced but not available yet)

Other: SSE handles only four 32-bit single-precision floating-point numbers. SSE2 and later expand the usage (two 64-bit double-precision numbers, four 32-bit integers and up to sixteen 8-bit bytes). AVX/AVX2 bring three-operand (non-destructive) instructions, A+B=C rather than A=A+B, and relaxed alignment requirements.

12 SIMD instructions outside Oracle 12c. Intel API (C/C++): Intel Intrinsics Guide. Sample code: [slide image, not transcribed]

14 Will my application use SIMD registers and instructions? It depends on the hardware: consult the processor datasheets to see which instruction set extensions are supported (if any). It depends on the hypervisor: some (old) hypervisors do not support modern extensions. VirtualBox versions < 5.0 don't support SSE4, AVX and AVX2. Hyper-V on W2008R2-SP1 needs a patch for specific processors to support AVX.

15 Will my application use SIMD registers and instructions? It depends on the operating system. AVX (256 bits) is supported from Linux Kernel >= Redhat EL5 : Oracle EL5 w/uek : AVX needs the xsave kernel parameter. Solaris 10 upd 10 and Solaris 11. Windows 2008 R2 SP1.

16 Will my application use SIMD registers and instructions? It depends on the compiler. GCC > 4.6 for AVX support, with specific switches (-msse2, -msse4.1, -msse4.2, -mavx, -mavx2). Intel C/C++ Compiler (ICC) > 11.1 for AVX support and > 13.0 for AVX2 support, with specific switches (-xsse4.2, -xavx, -xcore-avx2). Beware of optimization switches (-O1, -O2, -O3). More: disassemble (if you are allowed to :-) ) and look at the registers and assembler instructions used.

18 Raw Performance. Based on a C program. CPU used: Haswell microarchitecture (Core i7-4960HQ), AVX/AVX2 enabled. 3 tests: no SIMD, SSE4, AVX. Input: one array containing 1 million values. Goal: add 1 to each value, with each pass over the million values repeated 4k, 8k, 16k and 32k times. CPU Time(s) = f(#rows). Quick and dirty. Sample code available here:

19 Raw Performance. [Chart: "RAW Performance (CPU) for SIMD Instructions". y-axis: CPU Time (Sec); x-axis: number of rows processed (M. rows); series: NO SIMD, SSE4 (XMM Registers), AVX (YMM Registers). The individual values are not legible in the transcript.]

21 SIMD instructions inside Oracle 12c. In-Memory data structure: the In-Memory Compression Unit (IMCU). The IMCU is the unit of column store allocation. Its target size is 1M rows (controlled by _inmemory_imcu_target_rows). One IMCU can contain more than one column. Each column in one IMCU is a column unit (CU).

22 SIMD instructions inside Oracle 12c. In-Memory column store storage indexes: for each column unit, min and max values are maintained in a storage index. Storage indexes provide CU pruning and IMCU pruning. Information about CUs is available in GV$IM_COL_CU (undocumented. See Bug ID ).

23 SIMD instructions inside Oracle 12c. SIMD extensions are used with In-Memory storage indexes for efficient filtering: 1. IM storage indexes do IMCU pruning. 2. SIMD instructions apply filter predicates efficiently. [Figure: a Prod-id column, with IMCU pruning followed by filtering with SIMD.]

24 SIMD instructions inside Oracle 12c. Oracle 12c uses specific libraries for SIMD (and compression), located in $ORACLE_HOME/lib. libshpksse4212.so for SSE4.2 extensions: compiled with ICC v12 with the specific -xsse4.2 switch. libshpkavx12.so for AVX extensions: compiled with ICC v12 with the specific -xavx switch. libshpkavx212.so for AVX2 extensions: not yet implemented (8 functions implemented); no ICC AVX2 switch used because ICC v12 doesn't support AVX2. Thanks to Tanel Pöder for this :-)

25 SIMD instructions inside Oracle 12c. Oracle SIMD related functions are located in the kdzk kernel module (HPK), part of the Advanced Compression library (ADVCMP). They are easily tracked with systemtap.

26 SIMD instructions inside Oracle 12c. How does Oracle use SIMD extensions? It depends on many parameters. OS level: /proc/cpuinfo. [Screenshots: /proc/cpuinfo output with AVX and AVX2 support, and with SSE4 support only.]

27 SIMD instructions inside Oracle 12c. Which library am I using? Use pmap. [Screenshots: pmap output on a server with AVX support, and on a server with SSE4 support.]

28 SIMD instructions inside Oracle 12c. Which compiler options have been used? Read the comment section in the ELF file:
    $ readelf -p .comment $ORACLE_HOME/lib/libshpkavx12.so | egrep -i 'intel|gcc' | egrep 'xavx|mavx'
    [ 2c] -?comment:intel(r) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0 Build / -DNTEV_USE_EPOLL -DNET_USE_LDAP -xavx
Then read the corresponding compiler documentation.

29 SIMD instructions inside Oracle 12c. How are SIMD registers used by Oracle? Use GDB to get the call stack (backtrace), to set breakpoints on interesting functions, and to view register contents (traditional and SIMD): "info registers" for traditional registers, "info all-registers" for all registers (SIMD registers included). (gdb) print $ymmX.<format>, where format can be v8_float, v4_double, v32_int8, v16_int16, v8_int32, v4_int64, or v2_int128.

30 SIMD instructions inside Oracle 12c. [Screenshot: GDB register dump.] In red, the register content has been modified. In blue, the second part of the SIMD registers (128 bits) is empty.

31 SIMD instructions inside Oracle 12c. Oracle IM can use AVX or SSE4 extensions for SIMD operations. When AVX is used, it uses only 128 bits out of the 256-bit wide registers. AVX adds new register state through the 256-bit wide YMM register file. Explicit operating system support is required to properly save and restore AVX's expanded registers between context switches. Without this, only AVX 128-bit is supported.

32 SIMD instructions inside Oracle 12c. The culprit: Oracle is supported from EL5 onwards. EL5 Redhat Kernel is and this flag (xsave) is supported from kernels For compatibility reasons, Oracle has to compile its code on kernels

34 Tracing SIMD in Oracle 12c. Interesting components to trace for SIMD and/or IMCU pruning are ADVCMP_DECOMP.* and IM_optimizer. ADVCMP_DECOMP_HPK: SIMD functions. ADVCMP_DECOMP_PCODE: Portable Code Machine (usually not related to specific CPU instructions). IM_optimizer: gives information about CBO calculations related to IM.

35 Tracing SIMD in Oracle 12c. ADVCMP_DECOMP / ADVCMP_DECOMP_HPK: information is available in the trace file (for each IMCU processed): the library and function used, the number of rows and the counting algorithm, the processing rate (comparison, and decompression if relevant). But nothing on the results of the processing :-(

36 Tracing SIMD in Oracle 12c. ADVCMP_DECOMP / ADVCMP_DECOMP_HPK gives information about SIMD function usage and filtering (after IMCU pruning). Example: in-memory table with NO MEMCOMPRESS or DML compression.

37 Tracing SIMD in Oracle 12c. ADVCMP_DECOMP / ADVCMP_DECOMP_HPK. Example: in-memory compressed table. SIMD instructions are used only in the kdzk_eq_dict functions.

38 Tracing SIMD in Oracle 12c. My thoughts about compression/decompression. NO MEMCOMPRESS / COMPRESS FOR DML: kdzk*dynp* functions (ex: kdzk_eq_dynp_6bit, kdzk_le_dynp_32bit etc.). FOR QUERY LOW / QUERY HIGH: dictionary encoding (LZW?): kdzk_*dict* functions (ex: kdzk_eq_dict_7bit, kdzk_le_dict_4bit etc.); run length encoding: kdzk_burst_rle* functions (ex: kdzk_burst_rle_8bit, kdzk_burst_rle_6bit); bit packing compression: kdzk*fixed* functions (ex: kdzk_ge_lt_fixed_32bit, kdzk_lt_fixed_8bit).

39 Tracing SIMD in Oracle 12c. My thoughts about compression/decompression. FOR CAPACITY LOW: FOR QUERY LOW + additional proprietary compression (OZIP). Functions: ozip_decode_dict*, kdzk_ozip_decode* (ex: kdzk_ozip_decode_dydi, ozip_decode_dict_9_bit etc.). FOR CAPACITY HIGH: FOR QUERY HIGH + heavy-weight compression algorithm. The compression/decompression method depends on the datatype, the column compression unit size and the column contents.


More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 4

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 4 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2018 Lecture 4 LAST TIME Enhanced our processor design in several ways Added branching support Allows programs where work is proportional to the input values

More information

Recent Innovations in Data Storage Technologies Dr Roger MacNicol Software Architect

Recent Innovations in Data Storage Technologies Dr Roger MacNicol Software Architect Recent Innovations in Data Storage Technologies Dr Roger MacNicol Software Architect Copyright 2017, Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement The following is intended to

More information

x86 Programming I CSE 351 Winter

x86 Programming I CSE 351 Winter x86 Programming I CSE 351 Winter 2017 http://xkcd.com/409/ Administrivia Lab 2 released! Da bomb! Go to section! No Luis OH Later this week 2 Roadmap C: car *c = malloc(sizeof(car)); c->miles = 100; c->gals

More information

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown

More information

Instruction Set Principles and Examples. Appendix B

Instruction Set Principles and Examples. Appendix B Instruction Set Principles and Examples Appendix B Outline What is Instruction Set Architecture? Classifying ISA Elements of ISA Programming Registers Type and Size of Operands Addressing Modes Types of

More information

Masterpraktikum Scientific Computing

Masterpraktikum Scientific Computing Masterpraktikum Scientific Computing High-Performance Computing Michael Bader Alexander Heinecke Technische Universität München, Germany Outline Logins Levels of Parallelism Single Processor Systems Von-Neumann-Principle

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 2: Hardware/Software Interface Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Basic computer components How does a microprocessor

More information

Compiling for Scalable Computing Systems the Merit of SIMD. Ayal Zaks Intel Corporation Acknowledgements: too many to list

Compiling for Scalable Computing Systems the Merit of SIMD. Ayal Zaks Intel Corporation Acknowledgements: too many to list Compiling for Scalable Computing Systems the Merit of SIMD Ayal Zaks Intel Corporation Acknowledgements: too many to list Takeaways 1. SIMD is mainstream and ubiquitous in HW 2. Compiler support for SIMD

More information

CMSC 611: Advanced Computer Architecture

CMSC 611: Advanced Computer Architecture CMSC 611: Advanced Computer Architecture Compilers Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science

More information

CS/COE 0449 term 2174 Lab 5: gdb

CS/COE 0449 term 2174 Lab 5: gdb CS/COE 0449 term 2174 Lab 5: gdb What is a debugger? A debugger is a program that helps you find logical mistakes in your programs by running them in a controlled way. Undoubtedly by this point in your

More information

EJEMPLOS DE ARQUITECTURAS

EJEMPLOS DE ARQUITECTURAS Maestría en Electrónica Arquitectura de Computadoras Unidad 4 EJEMPLOS DE ARQUITECTURAS M. C. Felipe Santiago Espinosa Marzo/2017 ARM & MIPS Similarities ARM: the most popular embedded core Similar basic

More information

55:132/22C:160, HPCA Spring 2011

55:132/22C:160, HPCA Spring 2011 55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is

More information

COMPUTER ORGANIZATION & ARCHITECTURE

COMPUTER ORGANIZATION & ARCHITECTURE COMPUTER ORGANIZATION & ARCHITECTURE Instructions Sets Architecture Lesson 5a 1 What are Instruction Sets The complete collection of instructions that are understood by a CPU Can be considered as a functional

More information

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University 1. Introduction 2. System Structures 3. Process Concept 4. Multithreaded Programming

More information

Computer Basics 1/6/16. Computer Organization. Computer systems consist of hardware and software.

Computer Basics 1/6/16. Computer Organization. Computer systems consist of hardware and software. Hardware and Software Computer Basics TOPICS Computer Organization Data Representation Program Execution Computer Languages Computer systems consist of hardware and software. Hardware includes the tangible

More information

Computer Processors. Part 2. Components of a Processor. Execution Unit The ALU. Execution Unit. The Brains of the Box. Processors. Execution Unit (EU)

Computer Processors. Part 2. Components of a Processor. Execution Unit The ALU. Execution Unit. The Brains of the Box. Processors. Execution Unit (EU) Part 2 Computer Processors Processors The Brains of the Box Computer Processors Components of a Processor The Central Processing Unit (CPU) is the most complex part of a computer In fact, it is the computer

More information

MIPS ISA-II: Procedure Calls & Program Assembly

MIPS ISA-II: Procedure Calls & Program Assembly MIPS ISA-II: Procedure Calls & Program Assembly Module Outline Reiew ISA and understand instruction encodings Arithmetic and Logical Instructions Reiew memory organization Memory (data moement) instructions

More information

Machine-level Representation of Programs

Machine-level Representation of Programs Machine-level Representation of Programs Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu

More information

Technology in Action. Chapter 5 System Software: The Operating System, Utility Programs, and File Management

Technology in Action. Chapter 5 System Software: The Operating System, Utility Programs, and File Management Technology in Action Chapter 5 System Software: The Operating System, Utility Programs, and File Management Chapter Topics Operating System Fundamentals What the Operating System Does The Boot Process:

More information

CS Bootcamp x86-64 Autumn 2015

CS Bootcamp x86-64 Autumn 2015 The x86-64 instruction set architecture (ISA) is used by most laptop and desktop processors. We will be embedding assembly into some of our C++ code to explore programming in assembly language. Depending

More information

ECE 486/586. Computer Architecture. Lecture # 8

ECE 486/586. Computer Architecture. Lecture # 8 ECE 486/586 Computer Architecture Lecture # 8 Spring 2015 Portland State University Lecture Topics Instruction Set Principles MIPS Control flow instructions Dealing with constants IA-32 Fallacies and Pitfalls

More information

Instruction Set Architectures

Instruction Set Architectures Instruction Set Architectures! ISAs! Brief history of processors and architectures! C, assembly, machine code! Assembly basics: registers, operands, move instructions 1 What should the HW/SW interface

More information

Assembly Language for x86 Processors 7 th Edition. Chapter 2: x86 Processor Architecture

Assembly Language for x86 Processors 7 th Edition. Chapter 2: x86 Processor Architecture Assembly Language for x86 Processors 7 th Edition Kip Irvine Chapter 2: x86 Processor Architecture Slides prepared by the author Revision date: 1/15/2014 (c) Pearson Education, 2015. All rights reserved.

More information

Growth in Cores - A well rehearsed story

Growth in Cores - A well rehearsed story Intel CPUs Growth in Cores - A well rehearsed story 2 1. Multicore is just a fad! Copyright 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

More information

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth Registers Ray Seyfarth September 8, 2011 Outline 1 Register basics 2 Moving a constant into a register 3 Moving a value from memory into a register 4 Moving values from a register into memory 5 Moving

More information

Machine-level Representation of Programs. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Machine-level Representation of Programs. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Machine-level Representation of Programs Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Program? 짬뽕라면 준비시간 :10 분, 조리시간 :10 분 재료라면 1개, 스프 1봉지, 오징어

More information

Using Intel AVX without Writing AVX

Using Intel AVX without Writing AVX 1 White Paper Using Intel AVX without Writing AVX Introduction and Tools Intel Advanced Vector Extensions (Intel AVX) is a new 256-bit instruction set extension to Intel Streaming SIMD Extensions (Intel

More information

Exercise Session 6. Data Processing on Modern Hardware L Fall Semester Cagri Balkesen

Exercise Session 6. Data Processing on Modern Hardware L Fall Semester Cagri Balkesen Cagri Balkesen Data Processing on Modern Hardware Exercises Fall 2012 1 Exercise Session 6 Data Processing on Modern Hardware 263-3502-00L Fall Semester 2012 Cagri Balkesen cagri.balkesen@inf.ethz.ch Department

More information

EPYC Offers x86 Compatibility

EPYC Offers x86 Compatibility EPYC Offers x86 Compatibility By Jag Bolaria Principal Analyst June 2017 www.linleygroup.com EPYC Offer x86 Compatibility By Jag Bolaria, Principal Analyst, The Linley Group A strong processor is worthless

More information

Introduction to Machine/Assembler Language

Introduction to Machine/Assembler Language COMP 40: Machine Structure and Assembly Language Programming Fall 2017 Introduction to Machine/Assembler Language Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah

More information

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal

More information

Vectorization on KNL

Vectorization on KNL Vectorization on KNL Steve Lantz Senior Research Associate Cornell University Center for Advanced Computing (CAC) steve.lantz@cornell.edu High Performance Computing on Stampede 2, with KNL, Jan. 23, 2017

More information

Lecture Topics. Branch Condition Options. Branch Conditions ECE 486/586. Computer Architecture. Lecture # 8. Instruction Set Principles.

Lecture Topics. Branch Condition Options. Branch Conditions ECE 486/586. Computer Architecture. Lecture # 8. Instruction Set Principles. ECE 486/586 Computer Architecture Lecture # 8 Spring 2015 Portland State University Instruction Set Principles MIPS Control flow instructions Dealing with constants IA-32 Fallacies and Pitfalls Reference:

More information

Introduction to the Tegra SoC Family and the ARM Architecture. Kristoffer Robin Stokke, PhD FLIR UAS

Introduction to the Tegra SoC Family and the ARM Architecture. Kristoffer Robin Stokke, PhD FLIR UAS Introduction to the Tegra SoC Family and the ARM Architecture Kristoffer Robin Stokke, PhD FLIR UAS Goals of Lecture To give you something concrete to start on Simple introduction to ARMv8 NEON programming

More information

Lecture 3 CIS 341: COMPILERS

Lecture 3 CIS 341: COMPILERS Lecture 3 CIS 341: COMPILERS HW01: Hellocaml! Announcements is due tomorrow tonight at 11:59:59pm. HW02: X86lite Will be available soon look for an announcement on Piazza Pair-programming project Simulator

More information

Intel Advisor XE. Vectorization Optimization. Optimization Notice

Intel Advisor XE. Vectorization Optimization. Optimization Notice Intel Advisor XE Vectorization Optimization 1 Performance is a Proven Game Changer It is driving disruptive change in multiple industries Protecting buildings from extreme events Sophisticated mechanics

More information