SIMD Instructions outside and inside Oracle 12c. Laurent Léturgez 2016
1 SIMD Instructions outside and inside Oracle 12c Laurent Léturgez 2016
2 Whoami
- Oracle Consultant since 200
- Former developer (C, Java, perl, PL/SQL)
- Data Management on Premise and in the Cloud
- Blogger since (in French and discontinued)
- Twitter
3 Agenda
- SIMD Instructions, outside Oracle 12c
  - What is a SIMD instruction?
  - Will my application use SIMD?
  - Raw Performance
- SIMD Instructions, inside Oracle 12c
  - How SIMD instructions are used inside Oracle 12c
  - Tracing SIMD in Oracle 12c
4 Caveats
- Most of the topics come from
  - My own research
  - My past life as a developer
- Some of the topics are about internals, so:
  - Analysis and conclusions may be incomplete
  - Future versions of Oracle may change the features
- Tests have been done with Oracle , Oracle Enterprise Linux 7. (UEKR3), VMware Fusion 8 (and VirtualBox)
5 Before we start
Some fundamentals (from Dennis Yurichev's book):
- CPU register: [...] The easiest way to understand a register is to think of it as an untyped temporary variable. Imagine if you were working with a high-level PL and could only use eight 32-bit (or 64-bit) variables. Yet a lot can be done using just these!
- Instruction: A primitive CPU command. The simplest examples include: moving data between registers, working with memory, and arithmetic primitives. As a rule, each CPU has its own instruction set architecture (ISA).
- Assembly language: Mnemonic code and some extensions like macros which are intended to make a programmer's life easier.
7 SIMD instructions outside Oracle 12c
- SIMD stands for Single Instruction Multiple Data
  - Process multiple data
  - In one CPU instruction
- Based on
  - Specific registers
  - Specific CPU instructions and sets of instructions
- Not Oracle specific
- CPU architecture specific
  - Intel
  - IBM (Altivec)
  - Sparc (VIS)
- This presentation is mainly about Intel architecture
8 SIMD instructions outside Oracle 12c
- What is a SIMD register?
  - It's a CPU register
  - Wider than traditional registers (RDI, RSI, R8, R9 etc.)
  - 128 up to 512 bits wide
  - Holds several data elements at once
9 SIMD instructions outside Oracle 12c
- Scalar operation: an array of 4 integers {1,2,3,4}, add 1 to each value
[Figure: CPU registers (Reg1, Reg2, Reg3) and RAM (In, Out) shown step by step. Each element goes through its own LOAD, ADD and SAVE, so the whole array needs 4 LOAD, 4 ADD and 4 SAVE.]
10 SIMD instructions outside Oracle 12c
- SIMD operation: an array of 4 integers {1,2,3,4}, add 1 to each value
[Figure: SIMD registers (SIMD Reg1, SIMD Reg2, SIMD Reg3) and RAM (In, Out) shown step by step. The four elements are processed together with a single LOAD, a single ADD and a single SAVE.]
11 SIMD instructions outside Oracle 12c
Instruction sets:
- MMX
  - Register size: 64 bits
  - Registers: 8 (MM0 to MM7)
  - Processors: Pentium II
- SSE
  - Register size: 128 bits
  - Registers: 8 (XMM0 to XMM7)
  - Processors: Pentium III
  - Other: only four 32-bit single-precision floating-point numbers
- SSE2/SSE3/SSSE3/SSE4
  - Register size: 128 bits
  - Registers: 16 (XMM0 to XMM15)
  - Processors: Pentium IV to Nehalem
  - Other: usage expansion (two 64-bit double-precision numbers, four 32-bit integers and up to sixteen 8-bit bytes)
- AVX/AVX2
  - Register size: 256 bits
  - Registers: 16 (YMM0 to YMM15)
  - Processors: Sandy Bridge to Haswell
  - Other: three-operand instructions (non-destructive): A+B=C rather than A=A+B; alignment requirements relaxed
- AVX3 or AVX-512
  - Register size: 512 bits
  - Registers: 32 (ZMM0 to ZMM31)
  - Processors: Skylake (initially announced but not available yet)
12 SIMD instructions outside Oracle 12c
- Intel API (C/C++): Intel Intrinsics Guide
- Sample code:
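The following is an independent sketch, not the code shown on the slide. It illustrates the "add 1 to each value" example from the previous slides with SSE2 intrinsics from `<emmintrin.h>`. The function names `add_one_scalar` and `add_one_simd` are hypothetical, and the `__SSE2__` guard is only there so the file still compiles on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics (128-bit XMM registers) */
#endif

/* Scalar version: one load, one add and one store per element. */
void add_one_scalar(int32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        v[i] += 1;
}

/* SIMD version: four 32-bit integers per load/add/store. */
void add_one_simd(int32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i ones = _mm_set1_epi32(1);          /* {1,1,1,1} */
    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)(v + i)); /* LOAD */
        x = _mm_add_epi32(x, ones);                            /* ADD  */
        _mm_storeu_si128((__m128i *)(v + i), x);               /* SAVE */
    }
#endif
    /* Tail elements (and the whole array when SSE2 is unavailable). */
    for (; i < n; i++)
        v[i] += 1;
}
```

The unaligned load/store variants are used so the caller does not have to guarantee 16-byte alignment. With AVX2, the same pattern would process eight integers per iteration in a YMM register.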
14 Will my application use SIMD registers and instructions?
- It depends on:
  - Hardware
    - Consult processor datasheets to see which instruction set extensions are supported (if any)
  - Hypervisor
    - Some (old) hypervisors do not support modern extensions
    - VirtualBox versions < 5.0 don't support SSE4, AVX and AVX2
    - Hyper-V on W2008R2-SP1 needs a patch for specific processors to support AVX
15 Will my application use SIMD registers and instructions?
- It depends on the Operating System
  - AVX (256 bits) is supported from Linux Kernel >=
    - Redhat EL5 :
    - Oracle EL5 w/uek :
    - AVX needs xsave kernel parameter
  - Solaris 10 upd 10 and Solaris
  - Windows 2008 R2 SP1
16 Will my application use SIMD registers and instructions?
- It depends on the compiler
  - GCC > 4.6 for AVX support
    - Use of specific switches (-msse2, -msse4.1, -msse4.2, -mavx, -mavx2)
  - Intel C/C++ Compiler (ICC) > 11.1 for AVX support and > 13.0 for AVX2 support
    - Use of specific switches (-xsse4.2, -xavx, -xcore-avx2)
  - Beware of optimization switches (-O1, -O2, -O3)
- More: disassemble (if you are allowed to :) ) and look at
  - Registers
  - Assembler instructions
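One way to see the effect of these switches is to look at the macros the compiler predefines. The sketch below assumes GCC or Clang conventions (`__SSE2__`, `__SSE4_2__`, `__AVX__`, `__AVX2__`); the function name `compiled_simd_level` is hypothetical. It reports what the code was compiled for, which is not necessarily what the CPU running it supports.

```c
/* Returns the widest SIMD extension this translation unit was compiled for,
 * based on the macros predefined by switches such as -msse4.2, -mavx, -mavx2. */
const char *compiled_simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none";
#endif
}
```

Compiling the same file with `gcc -mavx2` and then with `gcc -msse4.2` should change the returned string, which is a quick sanity check before disassembling the binary.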
18 Raw Performance
- Based on a C program
- Used CPU: Haswell microarchitecture (Core i7-4960hq), AVX/AVX2 enabled
- 3 tests: no SIMD, SSE4, AVX
- Input: one array containing 1 million values
- Goal: add 1 to each value, with each pass over the million values repeated 4k, 8k, 16k and 32k times
- CPU Time(s) = f(#rows)
- Quick and dirty sample code available here:
19 Raw Performance
[Chart: Raw performance (CPU) for SIMD instructions. CPU time (sec) against the number of rows processed (4096 M, 8192 M, 16384 M and 32768 M rows) for three series: no SIMD, SSE4 (XMM registers) and AVX (YMM registers).]
21 SIMD instructions inside Oracle 12c
- In-Memory data structure
  - In-Memory Compression Unit: IMCU
  - The IMCU is the unit of column store allocation
  - Target size is 1M rows (controlled by _inmemory_imcu_target_rows)
  - One IMCU can contain more than one column
  - Each column in one IMCU is a column unit (CU)
22 SIMD instructions inside Oracle 12c
- In-Memory column store storage indexes
  - For each column unit, min and max values are maintained in a storage index
  - Storage indexes provide CU pruning
    - IMCU pruning
  - Information about CUs is available in GV$IM_COL_CU (Undocumented. See Bug ID )
23 SIMD instructions inside Oracle 12c
- SIMD extensions are used with In-Memory storage indexes for efficient filtering
  1. IM storage indexes do IMCU pruning
  2. SIMD instructions apply filter predicates efficiently
[Figure: a filter on Prod-id, first IMCU pruning, then filtering with SIMD]
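The two steps can be sketched in C as follows. This is an illustration of the general technique only, not Oracle's implementation: the `column_unit` structure and the functions `count_eq` and `filter_count` are hypothetical, and the SIMD part uses plain SSE2 intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical column unit: raw values plus the min/max of a storage index. */
typedef struct {
    const int32_t *values;
    size_t n;
    int32_t min, max;
} column_unit;

/* Step 2: count rows equal to key, four comparisons per instruction. */
static size_t count_eq(const int32_t *v, size_t n, int32_t key)
{
    size_t hits = 0, i = 0;
#if defined(__SSE2__)
    const __m128i k = _mm_set1_epi32(key);
    for (; i + 4 <= n; i += 4) {
        __m128i x  = _mm_loadu_si128((const __m128i *)(v + i));
        __m128i eq = _mm_cmpeq_epi32(x, k);               /* all-ones where equal */
        int m = _mm_movemask_ps(_mm_castsi128_ps(eq));    /* one bit per lane */
        hits += (size_t)((m & 1) + ((m >> 1) & 1) + ((m >> 2) & 1) + ((m >> 3) & 1));
    }
#endif
    for (; i < n; i++)            /* tail, or everything without SSE2 */
        hits += (v[i] == key);
    return hits;
}

/* Step 1 then step 2: skip units whose [min, max] cannot contain key. */
size_t filter_count(const column_unit *cus, size_t ncu, int32_t key, size_t *pruned)
{
    assert(cus != NULL && pruned != NULL);
    size_t hits = 0;
    *pruned = 0;
    for (size_t c = 0; c < ncu; c++) {
        if (key < cus[c].min || key > cus[c].max) {
            (*pruned)++;          /* pruned: values are never touched */
            continue;
        }
        hits += count_eq(cus[c].values, cus[c].n, key);
    }
    return hits;
}
```

Pruning costs two comparisons per unit, so it pays off whenever the data is clustered enough for the min/max ranges to exclude the key.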
24 SIMD instructions inside Oracle 12c
- Oracle 12c uses specific libraries for SIMD (and compression)
  - Located in $ORACLE_HOME/lib
  - libshpksse4212.so for SSE4.2 extensions
    - Compiled with ICC v12 with the specific -xsse4.2 switch
  - libshpkavx12.so for AVX extensions
    - Compiled with ICC v12 with the specific -xavx switch
  - libshpkavx212.so for AVX2 extensions
    - Not yet implemented (8 functions implemented)
    - No ICC avx2 switch used because ICC v12 doesn't support AVX2
- Thanks to Tanel Pöder for this :)
25 SIMD instructions inside Oracle 12c
- Oracle SIMD related functions
  - Located in the kdzk kernel module (HPK)
  - Part of the Advanced Compression library (ADVCMP)
  - Easily tracked with systemtap
26 SIMD instructions inside Oracle 12c
- How does Oracle use SIMD extensions?
- It depends on many parameters
  - OS level: /proc/cpuinfo
[Screenshots: /proc/cpuinfo on a host with AVX and AVX2 support, and on a host with SSE4 support only]
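A minimal sketch of the same check done programmatically on Linux is shown below. It assumes the x86 layout of /proc/cpuinfo, where a line starting with `flags` lists names such as `sse4_2`, `avx` and `avx2`; the helper names `flags_line_has` and `cpu_has_flag` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if flag appears as a whole word in a "flags" line, so that
 * looking for "avx" does not match "avx2". */
int flags_line_has(const char *line, const char *flag)
{
    assert(line != NULL && flag != NULL);
    size_t len = strlen(flag);
    if (len == 0)
        return 0;
    for (const char *p = strstr(line, flag); p != NULL; p = strstr(p + 1, flag)) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        int ends   = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts && ends)
            return 1;
    }
    return 0;
}

/* Scans /proc/cpuinfo for the first "flags" line.
 * Returns 1 if the flag is present, 0 if not, -1 if the file is unreadable. */
int cpu_has_flag(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;
    char line[8192];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            found = flags_line_has(line, flag);
            break;
        }
    }
    fclose(f);
    return found;
}
```

For example, `cpu_has_flag("avx2")` returning 0 while `cpu_has_flag("sse4_2")` returns 1 would correspond to the "SSE4 support only" case.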
27 SIMD instructions inside Oracle 12c
- Which library am I using?
  - pmap
[Screenshots: pmap output on a host with AVX support, and on a host with SSE4 support]
28 SIMD instructions inside Oracle 12c
- Which compiler options have been used?
  - Read the comment section in the ELF file
    conf]$ readelf -p .comment $ORACLE_HOME/lib/libshpkavx12.so | egrep -i 'intel|gcc' | egrep 'xavx|mavx'
    [ 2c] -?comment:intel(r) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0 Build / -DNTEV_USE_EPOLL -DNET_USE_LDAP -xavx
  - Read the corresponding compiler documentation
29 SIMD instructions inside Oracle 12c
- How are SIMD registers used by Oracle?
- GDB
  - To get the call stack (backtrace)
  - To set breakpoints on interesting functions
  - To view register contents (traditional and SIMD)
    - info registers for traditional registers
    - info all-registers for all registers (SIMD registers included)
    - (gdb) print $ymmx.<format>
    - Format can be v8_float, v4_double, v32_int8, v16_int16, v8_int32, v4_int64, or v2_int128
30 SIMD instructions inside Oracle 12c
[Screenshot: GDB register dump]
- In red, register content has been modified
- In blue, the second part of the SIMD registers (128 bits) is empty
31 SIMD instructions inside Oracle 12c
- Oracle IM can use AVX or SSE4 extensions for SIMD operations
- When AVX is used
  - It uses only 128 bits out of the 256-bit wide registers
  - AVX adds new register state through the 256-bit wide YMM register file
  - Explicit operating system support is required to properly save and restore AVX's expanded registers between context switches
  - Without this, only AVX 128-bit is supported
32 SIMD instructions inside Oracle 12c
- The culprit
  - Oracle is supported from EL5 onwards
  - EL5 Redhat Kernel is and this flag (xsave) is supported from kernels
  - For compatibility reasons, Oracle has to compile its code on kernels
34 Tracing SIMD in Oracle 12c
- Interesting components to trace for SIMD and/or IMCU pruning are:
  - ADVCMP_DECOMP.*
    - ADVCMP_DECOMP_HPK: SIMD functions
    - ADVCMP_DECOMP_PCODE: Portable Code Machine (usually not related to specific CPU instructions)
  - IM_optimizer
    - Gives information about CBO calculation related to IM
35 Tracing SIMD in Oracle 12c
- ADVCMP_DECOMP / ADVCMP_DECOMP_HPK
  - Information is available in the trace file (for each IMCU processed)
    - Used library and function
    - Number of rows and counting algorithm
    - Processing rate (comparison and decompression if relevant)
  - But nothing on the results of the processing :(
36 Tracing SIMD in Oracle 12c
- ADVCMP_DECOMP / ADVCMP_DECOMP_HPK
  - Gives information about SIMD function usage and filtering (after IMCU pruning)
  - Example: in-memory table with NO MEMCOMPRESS or DML compression
37 Tracing SIMD in Oracle 12c
- ADVCMP_DECOMP / ADVCMP_DECOMP_HPK
  - Example: in-memory compressed table
  - SIMD instructions are used only in the kdzk_eq_dict functions
38 Tracing SIMD in Oracle 12c
- My thoughts about compression/decompression
  - NO MEMCOMPRESS / COMPRESS FOR DML
    - kdzk*dynp* functions (ex: kdzk_eq_dynp_6bit, kdzk_le_dynp_32bit etc.)
  - FOR QUERY LOW / QUERY HIGH
    - Dictionary encoding (LZW?): kdzk_*dict* functions (ex: kdzk_eq_dict_7bit, kdzk_le_dict_4bit etc.)
    - Run-length encoding: kdzk_burst_rle* functions (ex: kdzk_burst_rle_8bit, kdzk_burst_rle_6bit)
    - Bit-packing compression: kdzk*fixed* functions (ex: kdzk_ge_lt_fixed_32bit, kdzk_lt_fixed_8bit)
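To make the dictionary-encoding idea concrete, here is a generic sketch of the technique. It says nothing about how the kdzk functions are actually implemented: the `dict_column` structure, the `MAX_DICT` limit and the functions `dict_encode` and `dict_count_eq` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DICT 256   /* hypothetical limit so that a code fits in one byte */

/* Hypothetical dictionary-encoded column: distinct values plus one code per row. */
typedef struct {
    int32_t dict[MAX_DICT];
    size_t ndict;
    uint8_t *codes;    /* caller-provided buffer, one code per row */
    size_t nrows;
} dict_column;

/* Encodes values into codes. Returns 0 on success, -1 if there are
 * more than MAX_DICT distinct values. */
int dict_encode(dict_column *col, const int32_t *values, size_t n, uint8_t *codes)
{
    assert(col != NULL && values != NULL && codes != NULL);
    col->ndict = 0;
    col->codes = codes;
    col->nrows = n;
    for (size_t i = 0; i < n; i++) {
        size_t c = 0;
        while (c < col->ndict && col->dict[c] != values[i])
            c++;                              /* linear dictionary lookup */
        if (c == col->ndict) {
            if (col->ndict == MAX_DICT)
                return -1;
            col->dict[col->ndict++] = values[i];
        }
        codes[i] = (uint8_t)c;
    }
    return 0;
}

/* Equality filter evaluated on the codes: one dictionary lookup for the
 * key, then a scan over bytes instead of 32-bit values. */
size_t dict_count_eq(const dict_column *col, int32_t key)
{
    assert(col != NULL);
    size_t c = 0;
    while (c < col->ndict && col->dict[c] != key)
        c++;
    if (c == col->ndict)
        return 0;                             /* key absent: no row can match */
    size_t hits = 0;
    for (size_t i = 0; i < col->nrows; i++)
        hits += (col->codes[i] == (uint8_t)c);
    return hits;
}
```

Because the codes are narrower than the original values, a 128-bit register holds sixteen 8-bit codes instead of four 32-bit integers, which is one reason a predicate on encoded data lends itself to SIMD.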
39 Tracing SIMD in Oracle 12c
- My thoughts about compression/decompression
  - FOR CAPACITY LOW
    - FOR QUERY LOW + additional proprietary compression (OZIP)
    - Functions: ozip_decode_dict*, kdzk_ozip_decode* (ex: kdzk_ozip_decode_dydi, ozip_decode_dict_9_bit etc.)
  - FOR CAPACITY HIGH
    - FOR QUERY HIGH + heavyweight compression algorithm
  - The compression/decompression method depends on:
    - Datatype
    - Column Compression Unit size
    - Column contents
More informationMIPS ISA-II: Procedure Calls & Program Assembly
MIPS ISA-II: Procedure Calls & Program Assembly Module Outline Reiew ISA and understand instruction encodings Arithmetic and Logical Instructions Reiew memory organization Memory (data moement) instructions
More informationMachine-level Representation of Programs
Machine-level Representation of Programs Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu
More informationTechnology in Action. Chapter 5 System Software: The Operating System, Utility Programs, and File Management
Technology in Action Chapter 5 System Software: The Operating System, Utility Programs, and File Management Chapter Topics Operating System Fundamentals What the Operating System Does The Boot Process:
More informationCS Bootcamp x86-64 Autumn 2015
The x86-64 instruction set architecture (ISA) is used by most laptop and desktop processors. We will be embedding assembly into some of our C++ code to explore programming in assembly language. Depending
More informationECE 486/586. Computer Architecture. Lecture # 8
ECE 486/586 Computer Architecture Lecture # 8 Spring 2015 Portland State University Lecture Topics Instruction Set Principles MIPS Control flow instructions Dealing with constants IA-32 Fallacies and Pitfalls
More informationInstruction Set Architectures
Instruction Set Architectures! ISAs! Brief history of processors and architectures! C, assembly, machine code! Assembly basics: registers, operands, move instructions 1 What should the HW/SW interface
More informationAssembly Language for x86 Processors 7 th Edition. Chapter 2: x86 Processor Architecture
Assembly Language for x86 Processors 7 th Edition Kip Irvine Chapter 2: x86 Processor Architecture Slides prepared by the author Revision date: 1/15/2014 (c) Pearson Education, 2015. All rights reserved.
More informationGrowth in Cores - A well rehearsed story
Intel CPUs Growth in Cores - A well rehearsed story 2 1. Multicore is just a fad! Copyright 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
More informationRegisters. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth
Registers Ray Seyfarth September 8, 2011 Outline 1 Register basics 2 Moving a constant into a register 3 Moving a value from memory into a register 4 Moving values from a register into memory 5 Moving
More informationMachine-level Representation of Programs. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Machine-level Representation of Programs Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Program? 짬뽕라면 준비시간 :10 분, 조리시간 :10 분 재료라면 1개, 스프 1봉지, 오징어
More informationUsing Intel AVX without Writing AVX
1 White Paper Using Intel AVX without Writing AVX Introduction and Tools Intel Advanced Vector Extensions (Intel AVX) is a new 256-bit instruction set extension to Intel Streaming SIMD Extensions (Intel
More informationExercise Session 6. Data Processing on Modern Hardware L Fall Semester Cagri Balkesen
Cagri Balkesen Data Processing on Modern Hardware Exercises Fall 2012 1 Exercise Session 6 Data Processing on Modern Hardware 263-3502-00L Fall Semester 2012 Cagri Balkesen cagri.balkesen@inf.ethz.ch Department
More informationEPYC Offers x86 Compatibility
EPYC Offers x86 Compatibility By Jag Bolaria Principal Analyst June 2017 www.linleygroup.com EPYC Offer x86 Compatibility By Jag Bolaria, Principal Analyst, The Linley Group A strong processor is worthless
More informationIntroduction to Machine/Assembler Language
COMP 40: Machine Structure and Assembly Language Programming Fall 2017 Introduction to Machine/Assembler Language Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah
More informationStorage I/O Summary. Lecture 16: Multimedia and DSP Architectures
Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal
More informationVectorization on KNL
Vectorization on KNL Steve Lantz Senior Research Associate Cornell University Center for Advanced Computing (CAC) steve.lantz@cornell.edu High Performance Computing on Stampede 2, with KNL, Jan. 23, 2017
More informationLecture Topics. Branch Condition Options. Branch Conditions ECE 486/586. Computer Architecture. Lecture # 8. Instruction Set Principles.
ECE 486/586 Computer Architecture Lecture # 8 Spring 2015 Portland State University Instruction Set Principles MIPS Control flow instructions Dealing with constants IA-32 Fallacies and Pitfalls Reference:
More informationIntroduction to the Tegra SoC Family and the ARM Architecture. Kristoffer Robin Stokke, PhD FLIR UAS
Introduction to the Tegra SoC Family and the ARM Architecture Kristoffer Robin Stokke, PhD FLIR UAS Goals of Lecture To give you something concrete to start on Simple introduction to ARMv8 NEON programming
More informationLecture 3 CIS 341: COMPILERS
Lecture 3 CIS 341: COMPILERS HW01: Hellocaml! Announcements is due tomorrow tonight at 11:59:59pm. HW02: X86lite Will be available soon look for an announcement on Piazza Pair-programming project Simulator
More informationIntel Advisor XE. Vectorization Optimization. Optimization Notice
Intel Advisor XE Vectorization Optimization 1 Performance is a Proven Game Changer It is driving disruptive change in multiple industries Protecting buildings from extreme events Sophisticated mechanics
More information