Understanding Shaders and WebGL. Chris Dalton & Olli Etuaho

Similar documents
X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

Portland State University ECE 588/688. Graphics Processors

Graphics Hardware. Instructor Stephen J. Guy

CS427 Multicore Architecture and Parallel Computing

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

Graphics Processing Unit Architecture (GPU Arch)

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11

Shaders. Slide credit to Prof. Zwicker

Multi-Processors and GPU

WebGL and GLSL Basics. CS559 Fall 2015 Lecture 10 October 6, 2015

Antonio R. Miele Marco D. Santambrogio

Pipeline Operations. CS 4620 Lecture 14

The Graphics Pipeline and OpenGL III: OpenGL Shading Language (GLSL 1.10)!

Windowing System on a 3D Pipeline. February 2005

The NVIDIA GeForce 8800 GPU

The Graphics Pipeline and OpenGL III: OpenGL Shading Language (GLSL 1.10)!

WebGL and GLSL Basics. CS559 Fall 2016 Lecture 14 October

Working with Metal Overview

Mobile graphics API Overview

Tutorial on GPU Programming #2. Joong-Youn Lee Supercomputing Center, KISTI

12.2 Programmable Graphics Hardware

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

CS179 GPU Programming Introduction to CUDA. Lecture originally by Luke Durant and Tamas Szalay

Lecture 13: OpenGL Shading Language (GLSL)

CS195V Week 9. GPU Architecture and Other Shading Languages

Spring 2011 Prof. Hyesoon Kim

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Optimisation. CS7GV3 Real-time Rendering

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

GeForce4. John Montrym Henry Moreton

GPU Quality and Application Portability

Shaders (some slides taken from David M. course)

Programmable Graphics Hardware

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

Lecture 2. Shaders, GLSL and GPGPU

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Persistent Background Effects for Realtime Applications Greg Lane, April 2009

CS GPU and GPGPU Programming Lecture 7: Shading and Compute APIs 1. Markus Hadwiger, KAUST

CS451Real-time Rendering Pipeline

GPUs and GPGPUs. Greg Blanton John T. Lubia

Ciril Bohak. - INTRODUCTION TO WEBGL

Lecture 5 Vertex and Fragment Shaders-1. CITS3003 Graphics & Animation

Programmable GPUS. Last Time? Reading for Today. Homework 4. Planar Shadows Projective Texture Shadows Shadow Maps Shadow Volumes

Spring 2009 Prof. Hyesoon Kim

GLSL Introduction. Fu-Chung Huang. Thanks for materials from many other people

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Could you make the XNA functions yourself?

frame buffer depth buffer stencil buffer

Programmable GPUs. Real Time Graphics 11/13/2013. Nalu 2004 (NVIDIA Corporation) GeForce 6. Virtua Fighter 1995 (SEGA Corporation) NV1

Threading Hardware in G80

GLSL Introduction. Fu-Chung Huang. Thanks for materials from many other people

PowerVR Hardware. Architecture Overview for Developers

Mattan Erez. The University of Texas at Austin

CSE 167: Introduction to Computer Graphics Lecture #7: Lights. Jürgen P. Schulze, Ph.D. University of California, San Diego Spring Quarter 2015

CS4621/5621 Fall Computer Graphics Practicum Intro to OpenGL/GLSL

Rendering Objects. Need to transform all geometry then

Game Technology. Lecture Physically Based Rendering. Dipl-Inform. Robert Konrad Polona Caserman, M.Sc.

Introduction to CUDA

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

CSE 167: Lecture #8: Lighting. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2011

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27

Real-Time Rendering Architectures

C P S C 314 S H A D E R S, O P E N G L, & J S RENDERING PIPELINE. Mikhail Bessmeltsev

From Shader Code to a Teraflop: How Shader Cores Work

Comparison of High-Speed Ray Casting on GPU

Lecture 7: The Programmable GPU Core. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

TEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems

Tesla Architecture, CUDA and Optimization Strategies

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

Programming shaders & GPUs Christian Miller CS Fall 2011

Blis: Better Language for Image Stuff Project Proposal Programming Languages and Translators, Spring 2017

CS 179: GPU Programming

CS 354R: Computer Game Technology

Dave Shreiner, ARM March 2009

CSE 167: Introduction to Computer Graphics Lecture #6: Lights. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2014

Rasterization-based pipeline

Introduction to OpenGL ES 3.0

GRAPHICS PROCESSING UNITS

CS559 Computer Graphics Fall 2015

From Shader Code to a Teraflop: How GPU Shader Cores Work. Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian)

GPU! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room, Bld 20! 11 December, 2017!

ECE 574 Cluster Computing Lecture 16

Cg: A system for programming graphics hardware in a C-like language

COMP371 COMPUTER GRAPHICS

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

PowerVR Series5. Architecture Guide for Developers

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

WebGL A quick introduction. J. Madeira V. 0.2 September 2017

Direct Rendering of Trimmed NURBS Surfaces

Tutorial on GPU Programming. Joong-Youn Lee Supercomputing Center, KISTI

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Day: Thursday, 03/19 Time: 16:00-16:50 Location: Room 212A Level: Intermediate Type: Talk Tags: Developer - Tools & Libraries; Game Development

TUNING CUDA APPLICATIONS FOR MAXWELL

Introduction to Computer Graphics. Hardware Acceleration Review

Transcription:

Understanding Shaders and WebGL Chris Dalton & Olli Etuaho

Agenda Introduction: Accessible shader development with WebGL Understanding WebGL shader execution: from JS to GPU Common shader bugs

Accessible Shader Development with WebGL 3

Accessible Shader Development with WebGL WebGL Shaders WebGL shading language is a restricted variant of The OpenGL ES Shading Language 1.00 Enables flexible shading and some general purpose usage of GPUs on the web

Accessible Shader Development with WebGL Web-based tools Start experimenting immediately ShaderToy http://shadertoy.com/ Shdr http://shdr.bkcore.com/ http://glslsandbox.com/

Accessible Shader Development with WebGL Browser tools Shader Editor in Firefox For existing projects Highlighting errors Highlighting objects based on shader

Power of WebGL Shaders Demos http://labs.gooengine.com/examples/painter http://david.li/flow Will only get better with WebGL 2.0, with a more flexible version of GLSL!

Understanding WebGL Shader Execution 8

Understanding WebGL Shader Execution Path from JS to GPU Browser validates shader source in compliance with the GLSL ES spec Browser translates the shader code for the platform drivers, exact process depends on the browser Platform driver translates the shader program to GPU bytecode GPU bytecode runs

Understanding WebGL Shader Execution Shader Translation ANGLE is used for shader validation and translation in Chrome and Firefox Translates WebGL shaders to HLSL or to GLSL that's appropriate to the platform Workarounds and emulation are used where needed https://www.khronos.org/registry/webgl/sdk/tests/extra/webgl-translate-shader.html

Understanding WebGL Shader Execution Quirks of Shader Translation Dynamic indexing is clamped some additional overhead Some built-ins are emulated on HLSL, can have slight overhead: mod, faceforward, atan(y, x) Short-circuiting and && operators sometimes need to be converted to if statements on HLSL

Understanding WebGL Shader Execution Shader Translation on WebGL 2 WebGL 2 on HLSL will add emulated built-ins: hyperbolics, roundeven (!), pack/unpack, matrix inverse switch statements with fall-through need complex workarounds on HLSL

Understanding WebGL Shader Execution Modern GPU pipeline (simplified) Shader cores execute compiled GLSL Unified means vertex/fragment shaders run on same unit Rasterizer converts triangles to pixels ROP (Render/Raster Output Unit) does perpixel operations Depth/stencil test Blending Front end VS (Vertex Shader) Rasterizer ROP (Render/Raster Output Unit) Shader cores FS (Fragment Shader) Texture units do the math for sampling Derivative/LOD computation Clamping Filtering Framebuffer (main memory) Texture units

Understanding WebGL Shader Execution GPU multiprocessing model Basis: SIMD (Single Instruction Multiple Data) The same set of instructions executes for every pixel/vertex Conceptually: Each pixel has its own dedicated processor, with a centralized controller As controller decodes instructions, processors execute in sync Results of one pixel/vertex shader can't affect other shaders Can run simultaneously

Understanding WebGL Shader Execution The Maxwell SMM Tegra X1 has 2 SMMs, GTX 980 has 16 SMM == Streaming Multiprocessor An SMM has 4 partitioned blocks A partitioned block executes warps A warp is a group of 32 threads Can't swap threads in or out Can't context-switch to a different block A thread is a single execution of your vertex/fragment shader for a specific vertex/pixel

Understanding WebGL Shader Execution Partitioned Blocks 16,834 registers (32-bit) Every warp has its own registers (zero context switch cost) Partition switches warps on a per-instruction basis Amount of warps limited by register requirement 32 basic math units (aka cuda cores) mad, mul, add, mov, min, max, abs, sub, neg, slt, sge, etc. Cost: 1 cycle per instruction 8 special function units sin, cos, 1/x, sqrt, pow, log, exp, etc. Cost: 4 cycles per instruction 8 load/store units

Understanding WebGL Shader Execution Demo http://swankybird.com/play/ Water, waves, pipes all ray traced with special functions (sin, cos, 1/x, sqrt, pow, log) Special functions are cheap!

Understanding WebGL Shader Execution Optimization tips Avoid multiple or dependent memory accesses (300-600 cycles!) A few can be hidden by running other warps Flops/fillrate is around 32 on modern GPUs Memory bandwidth is shared with CPU on mobile Maximize the number of warps that can be run on a partition Less registers => more threads Avoid divergent branching If one thread takes a branch, they all wait

Understanding WebGL Shader Execution Divergent branching example attribute float a; void main() { float v = 0.0; if (a > 0.0) v = 1.0; // relatively lightweight dosomething(v); }

Understanding WebGL Shader Execution Divergent branching example attribute float a; void main() { // this can get expensive! if (a > 0.0) dosomethingcomplex(); else dosomethingelse(); }

Common shader bugs 21

Common Shader Bugs Most frequent issue in the wild: incorrect use of shader precision Two variations #1: use of mediump or lowp when the shader requires 32-bit precision for correct results #2: use of highp when 16-bit precision would be enough

Common Shader Bugs Precision bug example Flawed code: http://glslsandbox.com/e#20238.2

Common Shader Bugs Solving precision issues Easy way out: use highp everywhere Shaders will act the same on mobile and desktop Cost: loss of performance on some mobile devices, only compatible with 85% of WebGL-capable mobile clients (data: webglstats.com late 2014)

Common Shader Bugs Solving precision issues Alternative solution: test with precision emulation and fix Emulate mobile-like shader hardware on desktop chrome --emulate-shader-precision Add --use-gl=desktop on Windows (requires Chrome version 41) Note the large performance cost of emulation

Common Shader Bugs Precision bug example Precision emulation: http://glslsandbox.com/e#20238.6 Fixed code: http://glslsandbox.com/e#20238.4