SOLUTION TO SHADER RECOMPILES IN RADEONSI SEPTEMBER 2015

Similar documents

AMD IOMMU VERSION 2 How KVM will use it. Jörg Rödel August 16th, 2011

OPENCL TM APPLICATION ANALYSIS AND OPTIMIZATION MADE EASY WITH AMD APP PROFILER AND KERNELANALYZER

AMD Graphics Team Last Updated February 11, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview February 2013 Approved for public distribution

AMD RYZEN PROCESSOR WITH RADEON VEGA GRAPHICS CORPORATE BRAND GUIDELINES

SIMULATOR AMD RESEARCH JUNE 14, 2015

AMD Graphics Team Last Updated April 29, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview April 2013 Approved for public distribution

AMD APU and Processor Comparisons. AMD Client Desktop Feb 2013 AMD

ADVANCED RENDERING EFFECTS USING OPENCL TM AND APU Session Olivier Zegdoun AMD Sr. Software Engineer

AMD CORPORATE TEMPLATE AMD Radeon Open Compute Platform Felix Kuehling

Vulkan (including Vulkan Fast Paths)

D3D12 & Vulkan: Lessons learned. Dr. Matthäus G. Chajdas Developer Technology Engineer, AMD

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD

Panel Discussion: The Future of I/O From a CPU Architecture Perspective

Multi-core processors are here, but how do you resolve data bottlenecks in native code?

Designing Natural Interfaces

MEASURING AND MODELING ON-CHIP INTERCONNECT POWER ON REAL HARDWARE

Understanding GPGPU Vector Register File Usage

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016

STREAMING VIDEO DATA INTO 3D APPLICATIONS Session Christopher Mayer AMD Sr. Software Engineer

ROCm: An open platform for GPU computing exploration

Automatic Intra-Application Load Balancing for Heterogeneous Systems

Run Anywhere. The Hardware Platform Perspective. Ben Pollan, AMD Java Labs October 28, 2008

HPG 2011 HIGH PERFORMANCE GRAPHICS HOT 3D

KVM CPU MODEL IN SYSCALL EMULATION MODE ALEXANDRU DUTU, JOHN SLICE JUNE 14, 2015

HyperTransport Technology

EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT

FLASH MEMORY SUMMIT Adoption of Caching & Hybrid Solutions

LIQUIDVR TODAY AND TOMORROW GUENNADI RIGUER, SOFTWARE ARCHITECT

3D Numerical Analysis of Two-Phase Immersion Cooling for Electronic Components

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

The Rise of Open Programming Frameworks. JC BARATAULT IWOCL May 2015

CAUTIONARY STATEMENT 1 AMD NEXT HORIZON NOVEMBER 6, 2018

Gestural and Cinematic Interfaces - DX11. David Brebner Unlimited Realities CTO

RegMutex: Inter-Warp GPU Register Time-Sharing

AMD HD3D Technology. Setup Guide. 1 AMD HD3D TECHNOLOGY: Setup Guide

viewdle! - machine vision experts

Sequential Consistency for Heterogeneous-Race-Free

HIGHLY PARALLEL COMPUTING IN PHYSICS-BASED RENDERING OpenCL Raytracing Based. Thibaut PRADOS OPTIS Real-Time & Virtual Reality Manager

The Road to the AMD. Fiji GPU. Featuring Die Stacking and HBM Technology 1 THE ROAD TO THE AMD FIJI GPU ECTC 2016 MAY 2015

Generic System Calls for GPUs

Pattern-based analytics to estimate and track yield risk of designs down to 7nm

AMD Radeon ProRender plug-in for Unreal Engine. Installation Guide

EXPLOITING ACCELERATOR-BASED HPC FOR ARMY APPLICATIONS

AMD AIB Partner Guidelines. Version February, 2015

AMD EPYC CORPORATE BRAND GUIDELINES

Maximizing Six-Core AMD Opteron Processor Performance with RHEL

Anatomy of AMD s TeraScale Graphics Engine

AMD RYZEN CORPORATE BRAND GUIDELINES

BIOMEDICAL DATA ANALYSIS ON HETEROGENEOUS PLATFORM. Dong Ping Zhang Heterogeneous System Architecture AMD

Desktop Telepresence Arrived! Sudha Valluru ViVu CEO

PROTECTING VM REGISTER STATE WITH AMD SEV-ES DAVID KAPLAN LSS 2017

SCALING DGEMM TO MULTIPLE CAYMAN GPUS AND INTERLAGOS MANY-CORE CPUS FOR HPL

INTERFERENCE FROM GPU SYSTEM SERVICE REQUESTS

AMD Radeon HD 2900 Highlights

ACCELERATING MATRIX PROCESSING WITH GPUs. Nicholas Malaya, Shuai Che, Joseph Greathouse, Rene van Oostrum, and Michael Schulte AMD Research

Changing your Driver Options with Radeon Pro Settings. Quick Start User Guide v2.1

GPGPU COMPUTE ON AMD. Udeepta Bordoloi April 6, 2011

Accelerating Applications. the art of maximum performance computing James Spooner Maxeler VP of Acceleration

Changing your Driver Options with Radeon Pro Settings. Quick Start User Guide v3.0

Driver Options in AMD Radeon Pro Settings. User Guide

Fusion Enabled Image Processing

DR. LISA SU

THE PROGRAMMER S GUIDE TO THE APU GALAXY. Phil Rogers, Corporate Fellow AMD

Heterogeneous Computing

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

AMD SEV Update Linux Security Summit David Kaplan, Security Architect

HPCA 18. Reliability-aware Data Placement for Heterogeneous memory Architecture

FUSION PROCESSORS AND HPC

High Performance Graphics 2010

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

Cilk Plus: Multicore extensions for C and C++

Fan Control in AMD Radeon Pro Settings. User Guide. This document is a quick user guide on how to configure GPU fan speed in AMD Radeon Pro Settings.

clarmor: A DYNAMIC BUFFER OVERFLOW DETECTOR FOR OPENCL KERNELS CHRIS ERB, JOE GREATHOUSE, MAY 16, 2018

1 HiPEAC January, 2012 Public TASKS, FUTURES AND ASYNCHRONOUS PROGRAMMING

Introducing NVDIMM-X: Designed to be the World s Fastest NAND-Based SSD Architecture and a Platform for the Next Generation of New Media SSDs

Solid State Graphics (SSG) SDK Setup and Raw Video Player Guide

NEXT-GENERATION MATRIX 3D IMMERSIVE USER INTERFACE [ M3D-IUI ] H Raghavendra Swamy AMD Senior Software Engineer

OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data

AMD Radeon ProRender plug-in for Universal Scene Description. Installation Guide

MIGRATION OF LEGACY APPLICATIONS TO HETEROGENEOUS ARCHITECTURES Francois Bodin, CTO, CAPS Entreprise. June 2011

INTRODUCING RYZEN MARCH

NUMA Topology for AMD EPYC Naples Family Processors

MULTIMEDIA PROCESSING Real-time H.264 video enhancement by using AMD APP SDK

Global Media Highlights. February 2014

Graphics Hardware 2008

1 Presentation Title Month ##, 2012

Microsoft Windows 2016 Mellanox 100GbE NIC Tuning Guide

Thermal Design Guide for Socket SP3 Processors

オープンソ プンソース技術者のための AMD 最新テクノロジーアップデート 日本 AMD 株式会社 マーケティング ビジネス開発本部 エンタープライズプロダクトマーケティング部 山野 洋幸

Forza Horizon 4 Benchmark Guide

Grafica Computazionale: Lezione 30. Grafica Computazionale. Hiding complexity... ;) Introduction to OpenGL. lezione30 Introduction to OpenGL

Mali Offline Compiler User Guide

AMD 780G. Niles Burbank AMD. an x86 chipset with advanced integrated GPU. Hot Chips 2008

User Manual. Nvidia Jetson Series Carrier board Aetina ACE-N622

Order Independent Transparency with Dual Depth Peeling. Louis Bavoil, Kevin Myers

Soft Particles. Tristan Lorach

Transcription:

SOLUTION TO SHADER RECOMPILES IN RADEONSI SEPTEMBER 2015

PROBLEM Shaders are compiled in draw calls Emulating certain features in shaders Drivers keep shaders in some intermediate representation And insert additional code based on the states While compiling, everything stops Number of state combinations is exponential 2 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

EMULATED STATES Fragment shader: Conversion to colorbuffer formats (RGBA32, RGBA FP16, ) Alpha-test Selecting between front and back colors gl_fragcolor GL_ALPHA_TO_ONE Polygon stippling Line & polygon smoothing Point smoothing Fragment color clamping 3 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

EMULATED STATES, CONT. Vertex shader: Loading inputs from vertex buffers manually Vertex color clamping 4 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

IDEA Observation: All states can be applied at the beginning or end of shaders At link time, compile application shaders At draw time, append any shader bytecode needed 3 shader sections: Prolog section Main section (application shader) Epilog section Concatenate them 5 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

FRAGMENT SHADER EPILOG Color outputs are expected in r0, r1, out0 = r0; out1 = r1; 6 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

FRAGMENT SHADER EPILOG If we need alpha-test: if (!alphafunc(r0.w, alpharef)) discard; out0 = r0; out1 = r1; 7 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

FRAGMENT SHADER EPILOG If we need color clamping: r0 = clamp(r0, 0, 1); r1 = clamp(r1, 0, 1); if (!alphafunc(r0.w, alpharef)) discard; out0 = r0; out1 = r1; 8 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

FRAGMENT SHADER EPILOG If we need polygon stippling: r0 = clamp(r0, 0, 1); r1 = clamp(r1, 0, 1); if (!alphafunc(r0.w, alpharef)) discard; if (texture2d(stipple, gl_fragcoord.xy / 32).x < 0.5) discard; out0 = r0; out1 = r1; 9 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

FRAGMENT SHADER EPILOG If we need smoothing: r0 = clamp(r0, 0, 1); r1 = clamp(r1, 0, 1); if (!alphafunc(r0.w, alpharef)) discard; if (texture2d(stipple, gl_fragcoord.xy / 32).x < 0.5) discard; r0.w *= coveragemask; // popcount(gl_samplemaskin) / gl_numsamples r1.w *= coveragemask; out0 = r0; out1 = r1; 10 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

FRAGMENT SHADER EPILOG If color conversion is required: r0 = clamp(r0, 0, 1); r1 = clamp(r1, 0, 1); if (!alphafunc(r0.w, alpharef)) discard; if (texture2d(stipple, gl_fragcoord.xy / 32).x < 0.5) discard; r0.w *= coveragemask; // popcount(gl_samplemaskin) / gl_numsamples r1.w *= coveragemask; r0.xy = vec2(packhalf2x16(r0.xy), packhalf2x16(r0.zw)); r1.xy = vec2(packhalf2x16(r1.xy), packhalf2x16(r1.zw)); out0 = r0; out1 = r1; 11 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

FRAGMENT SHADER EPILOG If GL_ALPHA_TO_ONE is enabled: r0 = clamp(r0, 0, 1); r1 = clamp(r1, 0, 1); if (!alphafunc(r0.w, alpharef)) discard; if (texture2d(stipple, gl_fragcoord.xy / 32).x < 0.5) discard; r0.w *= coveragemask; // popcount(gl_samplemaskin) / gl_numsamples r1.w *= coveragemask; r0.w = 1; r0.xy = vec2(packhalf2x16(r0.xy), packhalf2x16(r0.zw)); r1.xy = vec2(packhalf2x16(r1.xy), packhalf2x16(r1.zw)); out0 = r0; out1 = r1; 12 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

FRAGMENT SHADER PROLOG Only contains two-side color selection Decreases performance if done always 3 scenarios: Two-side colors are enabled: Select colors based on gl_frontfacing Store them into registers r0, r1 Two-side colors are disabled: Just copy front colors into r0, r1 No color inputs => prolog is empty Application shader should read colors from r0, r1 13 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

COMPILING PROLOGS/EPILOGS Still have to be compiled in draw calls Can be slow Use an assembler instead of the compiler Our LLVM backend has an assembler too 14 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

VERTEX SHADER INPUTS R600 had fetch shader Removed since GCN Current implementation: One buffer per input Instance divisor == 0: Fetch BaseVertex + VertexID Instance divisor!= 0: Fetch StartInstance + (InstanceID / instance divisor) 15 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

VERTEX SHADER PROLOG Emulate fetch shader with prolog section Drawback: can t move loads to hide latencies, register usage Instead, only calculate load addresses: Prolog writes the addresses to r0,r1, Main shader section executes the loads 16 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

VERTEX SHADER EPILOG? Radeon has 3 ways to write VS outputs: For rasterizer For geometry shader For tessellation control shader Don t use an epilog OpenGL sometimes knows which shader follows If not, compile all 3 variants with 3 threads in parallel Piglit only: Compile on demand in draw calls Vertex color clamping: use conditional assignment 17 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

MESA STATE TRACKER Middle-end, translates shaders from GLSL IR into TGSI Does that in draw calls State dependencies for draw calls: Center vs sample interpolation Instead, select coordinates with conditional assignment Vertex and fragment color clamping GL rendering context Any dependencies should be dealt with in drivers Other drivers will benefit too GLSL->TGSI always done at link time 18 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

IF GAMES COMPILE TOO LATE Compiling at link time doesn t help Use shader cache 1 shader variant => shader cache in core Mesa If games compile early => don t need it 19 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

SKIP MESA OPTIMIZATIONS? Our LLVM backend can do most optimizations No need to do them in Mesa Mesa/GLSL passes we do need: Demoting inputs/outputs to local variables (dead code elimination?) Function inlining Breaking built-in input/output arrays into variables 20 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015

21 SOLUTION TO SHADER RECOMPILES IN IN RADEONSI SEPTEMBER 17, 2015 Questions?

THANK YOU

DISCLAIMER & ATTRIBUTION The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION 2015 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners. 23 SOLUTION TO SHADER RECOMPILES IN RADEONSI SEPTEMBER 17, 2015