SIMD Instructions outside and inside Oracle 12c. Laurent Léturgez 2016

Whoami
- Oracle Consultant since 200
- Former developer (C, Java, Perl, PL/SQL)
- Owner @Premiseo: Data Management on Premise and in the Cloud
- Blogger since 2004
  - http://laurent.leturgez.free.fr (in French and discontinued)
  - http://laurent-leturgez.com
- Twitter: @lleturgez

Agenda
- SIMD instructions, outside Oracle 12c
  - What is a SIMD instruction?
  - Will my application use SIMD?
  - Raw performance
- SIMD instructions, inside Oracle 12c
  - How SIMD instructions are used inside Oracle 12c
  - Tracing SIMD in Oracle 12c

Caveats
- Most of the topics come from:
  - My own research
  - My past life as a developer
- Some of the topics are about internals, so:
  - Analysis and conclusions may be incomplete
  - Future versions of Oracle may change the features
- Tests were done with Oracle 12.1.0.2, Oracle Enterprise Linux 7.1 (UEKR3), VMware Fusion 8 (and VirtualBox)

Before we start
Some fundamentals (from Dennis Yurichev's book):
- CPU register: "[...] The easiest way to understand a register is to think of it as an untyped temporary variable. Imagine if you were working with high-level PL and could only use eight 32-bit (or 64-bit) variables. Yet a lot can be done using just these!"
- Instruction: "A primitive CPU command. The simplest examples include: moving data between registers, working with memory and arithmetic primitives. As a rule, each CPU has its own instruction set architecture (ISA)."
- Assembly language: "Mnemonic code and some extensions like macros which are intended to make a programmer's life easier."
Source: http://beginners.re/reverse_engineering_for_beginners-en.pdf

SIMD instructions outside Oracle 12c
- SIMD stands for Single Instruction Multiple Data
  - Process multiple data in one CPU instruction
- Based on
  - Specific registers
  - Specific CPU instructions and sets of instructions
- Not Oracle specific
- CPU architecture specific
  - Intel
  - IBM (AltiVec)
  - SPARC (VIS)
- This presentation is mainly about Intel architecture

SIMD instructions outside Oracle 12c
What is a SIMD register?
- It's a CPU register
- Wider than traditional registers (RDI, RSI, R8, R9 etc.)
  - 128 up to 512 bits wide
- Holds several data elements at once

SIMD instructions outside Oracle 12c
Scalar operation: an array of 4 integers {1,2,3,4}, add 1 to each value.
[Figure: successive CPU register and RAM states. Each element is loaded into a register, incremented and saved back to RAM in turn, so that In = {1,2,3,4} becomes Out = {2,3,4,5}.]
Total: 4 LOAD, 4 ADD, 4 SAVE

SIMD instructions outside Oracle 12c
SIMD operation: an array of 4 integers {1,2,3,4}, add 1 to each value.
[Figure: successive SIMD register and RAM states. The four values are loaded together into one SIMD register, incremented by a single instruction and saved back to RAM together, so that In = {1,2,3,4} becomes Out = {2,3,4,5}.]
Total: 1 LOAD, 1 ADD, 1 SAVE
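
The two diagrams above can be sketched in C. This is only an illustration, not the sample code linked from the slides: the function names add_one_scalar and add_one_simd are made up for the example, and the SIMD path assumes an x86 compiler that defines __SSE2__ (it falls back to the scalar loop otherwise).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar version: one load, one add and one store per element. */
void add_one_scalar(const int32_t *in, int32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + 1;
}

/* SIMD version: four 32-bit integers per load, add and store (one XMM register). */
void add_one_simd(const int32_t *in, int32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i ones = _mm_set1_epi32(1);          /* {1,1,1,1} */
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + i)); /* 1 LOAD */
        v = _mm_add_epi32(v, ones);                              /* 1 ADD  */
        _mm_storeu_si128((__m128i *)(out + i), v);               /* 1 SAVE */
    }
#endif
    /* Remaining elements (or everything, when SSE2 is not available). */
    for (; i < n; i++)
        out[i] = in[i] + 1;
}
```

Calling either function on {1,2,3,4} gives {2,3,4,5}; the unaligned load/store intrinsics are used so the sketch does not depend on how the arrays are allocated.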

SIMD instructions outside Oracle 12c

Instruction set | Register size | # Registers | Register names | Processors | Other
MMX | 64 bits | 8 | MM0 to MM7 | Pentium II |
SSE | 128 bits | 8 | XMM0 to XMM7 | Pentium III | Only four 32-bit single-precision floating point numbers
SSE2/SSE3/SSSE3/SSE4 | 128 bits | 16 | XMM0 to XMM15 | Pentium IV to Nehalem | Usage expansion (two 64-bit double-precision numbers, four 32-bit integers and up to sixteen 8-bit bytes)
AVX/AVX2 | 256 bits | 16 | YMM0 to YMM15 | Sandy Bridge - Haswell | Three-operand instructions (non destructive): A+B=C rather than A=A+B. Alignment requirements relaxed
AVX3 or AVX512 | 512 bits | 32 | ZMM0 to ZMM31 | Skylake (initially announced but not available yet) |

SIMD instructions outside Oracle 12c
- Intel API (C/C++): Intel Intrinsics Guide
  https://software.intel.com/sites/landingpage/intrinsicsguide/
- Sample code: https://app.box.com/simdsamplec-205

Will my application use SIMD registers and instructions?
It depends on:
- Hardware
  - Consult processor datasheets to see which instruction set extensions are supported (if any): http://ark.intel.com/#@processors
- Hypervisor
  - Some (old) hypervisors do not support modern extensions
  - VirtualBox versions < 5.0 don't support SSE4, AVX and AVX2
  - Hyper-V on W2008R2-SP1 needs a patch for specific processors to support AVX

Will my application use SIMD registers and instructions?
It depends on the operating system:
- AVX (256 bits) is supported from Linux kernel >= 2.6.30
  - Redhat EL5: 2.6.18
  - Oracle EL5 w/UEK: 2.6.32
  - AVX needs the xsave kernel parameter
- Solaris 10 upd 10 and Solaris 11
- Windows 2008 R2 SP1

Will my application use SIMD registers and instructions?
It depends on the compiler:
- GCC > 4.6 for AVX support
  - Use of specific switches (-msse2, -msse4.1, -msse4.2, -mavx, -mavx2 ...)
- Intel C/C++ Compiler (ICC) > 11.1 for AVX support and > 13.0 for AVX2 support
  - Use of specific switches (-xsse4.2, -xavx, -xcore-avx2 ...)
- Beware of optimization switches (-O, -O2, -O3)
- To find out more: disassemble (if you are allowed to :) ) and look at
  - Registers
  - Assembler instructions

Raw Performance
- Based on a C program
- CPU used: Haswell microarchitecture (Core i7-4960HQ), AVX/AVX2 enabled
- 3 tests: no SIMD, SSE4, AVX
- Input: one array containing 1 million values
- Goal: add 1 to each value, with each pass over the million values repeated 4k, 8k, 16k and 32k times
- CPU Time (s) = f(#rows)
- Quick and dirty
- Sample code available here: https://app.box.com/s/ibmnbblpho4xtbeq2x8ir60nrk37208v
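
A measurement of this kind can be sketched as a small timing harness. This is a guess at the general shape of such a test, not the author's program: the names kernel_fn, add_one_inplace and cpu_seconds are hypothetical, and CPU time is taken with the standard clock() function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* A kernel processes n values in place (scalar, SSE4 or AVX variant). */
typedef void (*kernel_fn)(int32_t *data, size_t n);

/* Scalar reference kernel: add 1 to each value. */
void add_one_inplace(int32_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] += 1;
}

/* Run `passes` passes of `kernel` over an array of n zero-initialised values.
 * Returns the CPU time in seconds, or -1.0 if the allocation fails.
 * The last array element is copied to *last_value so the work stays observable. */
double cpu_seconds(kernel_fn kernel, size_t n, unsigned passes, int32_t *last_value)
{
    assert(kernel != NULL && n > 0);
    int32_t *data = calloc(n, sizeof *data);
    if (data == NULL)
        return -1.0;

    clock_t start = clock();
    for (unsigned p = 0; p < passes; p++)
        kernel(data, n);
    clock_t end = clock();

    if (last_value != NULL)
        *last_value = data[n - 1];
    free(data);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

With n set to one million and passes set to 4096, 8192, 16384 and 32768, one call per kernel variant would give the points of a CPU Time = f(#rows) curve.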

Raw Performance
[Chart: "RAW Performance (CPU) for SIMD Instructions". CPU time (sec) as a function of the number of rows processed (4096 M, 8192 M, 16384 M and 32768 M rows), with three series: NO SIMD, SSE4 (XMM registers), AVX (YMM registers).]

SIMD instructions inside Oracle 12c
In-Memory data structure
- In-Memory Compression Unit: IMCU
- The IMCU is the unit of column store allocation
- Target size is 1M rows (controlled by _inmemory_imcu_target_rows)
- One IMCU can contain more than one column
- Each column in one IMCU is a column unit (CU)

SIMD instructions inside Oracle 12c
In-Memory column store storage indexes
- For each column unit, min and max values are maintained in a storage index
- Storage indexes provide CU pruning (IMCU pruning)
- Information about CUs is available in GV$IM_COL_CU (undocumented, see Bug ID 936690)

SIMD instructions inside Oracle 12c
SIMD extensions are used with In-Memory storage indexes for efficient filtering:
1. IM storage indexes do IMCU pruning
2. SIMD instructions apply filter predicates efficiently
[Figure: a Prod-id column going through IMCU pruning, then filtering with SIMD.]
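
The two steps can be illustrated with a toy model in C. It is not Oracle's implementation: the column_unit structure and the make_cu and count_equal functions are invented for the example, and the inner comparison loop is only written in a branch-free style that a compiler can vectorize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical column unit: the values plus a min/max summary ("storage index"). */
typedef struct {
    const int32_t *values;
    size_t count;
    int32_t min;
    int32_t max;
} column_unit;

/* Build a column unit and compute its min/max summary. */
column_unit make_cu(const int32_t *values, size_t count)
{
    assert(values != NULL && count > 0);
    column_unit cu = { values, count, values[0], values[0] };
    for (size_t i = 1; i < count; i++) {
        if (values[i] < cu.min) cu.min = values[i];
        if (values[i] > cu.max) cu.max = values[i];
    }
    return cu;
}

/* Count rows equal to `key`.
 * Step 1: skip every unit whose [min, max] range cannot contain the key.
 * Step 2: scan the remaining units with a simple comparison loop. */
size_t count_equal(const column_unit *units, size_t n_units, int32_t key,
                   size_t *scanned_units)
{
    assert(units != NULL);
    size_t matches = 0, scanned = 0;
    for (size_t u = 0; u < n_units; u++) {
        if (key < units[u].min || key > units[u].max)
            continue;                      /* pruned: no row can match */
        scanned++;
        for (size_t i = 0; i < units[u].count; i++)
            matches += (units[u].values[i] == key);
    }
    if (scanned_units != NULL)
        *scanned_units = scanned;
    return matches;
}
```

With three units holding {1,2,3,4}, {10,14,14,20} and {30,31,32,33}, a search for 14 scans only the second unit.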

SIMD instructions inside Oracle 12c
Oracle 12c uses specific libraries for SIMD (and compression), located in $ORACLE_HOME/lib:
- libshpksse4212.so for SSE4.2 extensions
  - Compiled with ICC v12 with the specific -xsse4.2 switch
- libshpkavx12.so for AVX extensions
  - Compiled with ICC v12 with the specific -xavx switch
- libshpkavx212.so for AVX2 extensions
  - Not yet implemented (8 functions implemented)
  - No ICC avx2 switch used, because ICC v12 doesn't support AVX2
Thanks to Tanel Pöder for this :)

SIMD instructions inside Oracle 12c
Oracle SIMD related functions
- Located in the kdzk kernel module (HPK)
- Part of the Advanced Compression library (ADVCMP)
- Easily tracked with systemtap

SIMD instructions inside Oracle 12c
How does Oracle use SIMD extensions? It depends on many parameters.
- OS level: /proc/cpuinfo
  - Case 1: AVX and AVX2 support
  - Case 2: SSE4 support only
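
Checking the CPU flags can also be done from a program. The following is a minimal sketch for Linux, assuming the usual /proc/cpuinfo layout where a line starting with "flags" lists space-separated feature names such as sse4_2, avx and avx2; the helper names has_cpu_flag and read_cpu_flags are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` appears as a whole word in a space-separated flags line. */
bool has_cpu_flag(const char *flags_line, const char *flag)
{
    assert(flags_line != NULL && flag != NULL && flag[0] != '\0');
    size_t len = strlen(flag);
    const char *p = flags_line;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return true;
        p++; /* partial match (for example "avx" inside "avx2"): keep searching */
    }
    return false;
}

/* Copy the first "flags" line of /proc/cpuinfo into buf (Linux only).
 * The flags line is long, so the buffer should be a few kilobytes.
 * Returns false if the file or the line is not available. */
bool read_cpu_flags(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return false;
    bool found = false;
    while (fgets(buf, (int)size, f) != NULL) {
        if (strncmp(buf, "flags", 5) == 0) {
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A caller would read the line into a buffer of, say, 8192 bytes and then test has_cpu_flag(buf, "avx2"), has_cpu_flag(buf, "avx") and has_cpu_flag(buf, "sse4_2") in turn.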

SIMD instructions inside Oracle 12c
Which library am I using?
- Use pmap on the process
  - Case 1: AVX support
  - Case 2: SSE4 support

SIMD instructions inside Oracle 12c
Which compiler options have been used?
- Read the comment section in the ELF file:

    [oracle@oel7 conf]$ readelf -p .comment $ORACLE_HOME/lib/libshpkavx12.so | egrep -i 'intel|gcc' | egrep 'xavx|mavx'
      [ 2c] -?comment:intel(r) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0 Build 202073 / -DNTEV_USE_EPOLL -DNET_USE_LDAP -xavx

- Read the corresponding compiler documentation

SIMD instructions inside Oracle 12c
How are SIMD registers used by Oracle?
- GDB
  - To get the call stack (backtrace)
  - To set breakpoints on interesting functions
  - To view register contents (traditional and SIMD)
    - info registers: traditional registers
    - info all-registers: all registers (SIMD registers included)
    - (gdb) print $ymmX.<format>, where X is the register number
      Format can be v8_float, v4_double, v32_int8, v16_int16, v8_int32, v4_int64, or v2_int128

SIMD instructions inside Oracle 12c
- In red, register content has been modified
- In blue, the second half of the SIMD registers (128 bits) is empty

SIMD instructions inside Oracle 12c
- Oracle IM can use AVX or SSE4 extensions for SIMD operations
- When AVX is used
  - It uses only 128 bits out of the 256-bit wide registers
  - AVX adds new register state through the 256-bit wide YMM register file
  - Explicit operating system support is required to properly save and restore AVX's expanded registers between context switches
  - Without this, only 128-bit AVX is supported

SIMD instructions inside Oracle 12c
The culprit
- Oracle 12.1.0.2 is supported from EL5 onwards
- The EL5 Redhat kernel is 2.6.18, and this flag (xsave) is supported from 2.6.30 kernels
- For compatibility reasons, Oracle has to compile its code on 2.6.18 kernels

Tracing SIMD in Oracle 12c
Interesting components to trace for SIMD and/or IMCU pruning are:
- ADVCMP_DECOMP.*
  - ADVCMP_DECOMP_HPK: SIMD functions
  - ADVCMP_DECOMP_PCODE: Portable Code Machine (usually not related to specific CPU instructions)
- IM_optimizer
  - Gives information about CBO calculations related to IM

Tracing SIMD in Oracle 12c
ADVCMP_DECOMP / ADVCMP_DECOMP_HPK
- Information is available in the trace file (for each IMCU processed)
  - Library and function used
  - Number of rows and counting algorithm
  - Processing rate (comparison, and decompression if relevant)
- But nothing on the results of the processing :(

Tracing SIMD in Oracle 12c
ADVCMP_DECOMP / ADVCMP_DECOMP_HPK
- Gives information about SIMD function usage and filtering (after IMCU pruning)
- Example: in-memory table with NO MEMCOMPRESS or DML compression

Tracing SIMD in Oracle 12c
ADVCMP_DECOMP / ADVCMP_DECOMP_HPK
- Example: in-memory compressed table
- SIMD instructions are used only in the kdzk_eq_dict functions

Tracing SIMD in Oracle 12c
My thoughts about compression/decompression
- NO MEMCOMPRESS / COMPRESS FOR DML
  - kdzk*dynp* functions (ex: kdzk_eq_dynp_16bit, kdzk_le_dynp_32bit etc.)
- FOR QUERY LOW / QUERY HIGH
  - Dictionary encoding (LZW?): kdzk_*dict* functions (ex: kdzk_eq_dict_7bit, kdzk_le_dict_4bit etc.)
  - Run length encoding: kdzk_burst_rle* functions (ex: kdzk_burst_rle_8bit, kdzk_burst_rle_16bit ...)
  - Bit packing compression: kdzk*fixed* functions (ex: kdzk_ge_lt_fixed_32bit, kdzk_lt_fixed_8bit ...)
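
To show why an equality filter works well on dictionary-encoded data, here is a toy sketch in C. It says nothing about how the kdzk_*dict* functions are actually written: the dict_column layout (a sorted dictionary of distinct values plus one small code per row) and the function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dictionary-encoded column. */
typedef struct {
    const int32_t *dict;   /* sorted distinct values             */
    size_t dict_size;
    const uint8_t *codes;  /* one dictionary position per row    */
    size_t rows;
} dict_column;

/* Binary search of the dictionary; returns the code for `key`, or -1 if absent. */
static long dict_lookup(const dict_column *c, int32_t key)
{
    size_t lo = 0, hi = c->dict_size;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c->dict[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < c->dict_size && c->dict[lo] == key) ? (long)lo : -1;
}

/* Count rows equal to `key`: translate the key to its code once, then
 * compare the narrow codes, never the decoded values. */
size_t count_eq_dict(const dict_column *c, int32_t key)
{
    assert(c != NULL && c->dict != NULL && c->codes != NULL);
    assert(c->dict_size <= 256); /* codes are stored on 8 bits in this sketch */
    long code = dict_lookup(c, key);
    if (code < 0)
        return 0; /* value not in the dictionary: no row can match */
    size_t n = 0;
    for (size_t i = 0; i < c->rows; i++)
        n += (c->codes[i] == (uint8_t)code);
    return n;
}
```

Because the codes are narrow, many of them fit in one SIMD register (32 one-byte codes in a 256-bit register), which is presumably why the function names carry a bit width.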

Tracing SIMD in Oracle 12c
My thoughts about compression/decompression
- FOR CAPACITY LOW
  - FOR QUERY LOW + additional proprietary compression (OZIP)
  - Functions: ozip_decode_dict*, kdzk_ozip_decode* (ex: kdzk_ozip_decode_dydi, ozip_decode_dict_9_bit etc.)
- FOR CAPACITY HIGH
  - FOR QUERY HIGH + heavyweight compression algorithm
- Compression/decompression method depends on:
  - Datatype
  - Column Compression Unit size
  - Column contents

laurent.leturgez@premiseo.com http://laurent-leturgez.com @lleturgez www.premiseo.com