Introducing Metal 2. Graphics and Games #WWDC17. Michal Valient, GPU Software Engineer Richard Schreyer, GPU Software Engineer

Session 601. Graphics and Games #WWDC17. Introducing Metal 2. Michal Valient, GPU Software Engineer. Richard Schreyer, GPU Software Engineer. © 2017 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission from Apple.

Make expensive things happen once. GPU in the driving seat. New experiences.

VR with Metal 2. Hall 3. Wednesday 10:00AM

Using Metal 2 for Compute. Grand Ballroom A. Thursday 4:10PM

Metal 2 Optimization and Debugging. Executive Ballroom. Thursday 3:10PM

Introducing Metal 2. Agenda. Argument Buffers. Raster Order Groups. ProMotion Displays. Direct to Display. Everything Else.

Argument Buffers

Material Example. roughness: 0.6, intensity: 0.3, surfaceTexture, specularTexture, sampler.

Traditional Argument Model. MTLBuffer (roughness, intensity), surfaceTexture, specularTexture, sampler. API calls per object: setBuffer, setTexture, setTexture, setSampler, draw. CPU time: the same calls repeat for Object 1, Object 2, Object 3, ...

Argument Buffers. MTLBuffer (roughness, intensity, surfaceTexture, specularTexture, sampler). API calls per object: setBuffer, draw. CPU time: Object 1 through Object 7, ...
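
The difference between the two call sequences above can be put into numbers. The following is only a counting sketch in C++, not Metal code: `Bindings`, `traditionalCallsPerObject`, `argumentBufferCallsPerObject`, and `kSlideMaterial` are hypothetical names introduced for illustration, and the model assumes exactly one API call per bound resource plus one draw call.

```cpp
// Per-object binding counts for the traditional argument model.
struct Bindings {
    int buffers;
    int textures;
    int samplers;
};

// Traditional model: one set* call per bound resource, plus the draw call.
constexpr int traditionalCallsPerObject(Bindings b) {
    return b.buffers + b.textures + b.samplers + 1;
}

// Argument buffer model: one setBuffer for the argument buffer, plus the draw call.
constexpr int argumentBufferCallsPerObject() {
    return 2;
}

// The material from the slides: one constants buffer, two textures, one sampler.
constexpr Bindings kSlideMaterial{1, 2, 1};
```

For the slide's material this gives five calls per object in the traditional model and two with an argument buffer. The count deliberately ignores any resource-usage declarations made once per encoder.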

Reduced CPU Overhead. iPhone 7. Time in µs (lower is better).

Resources      Traditional model    Argument Buffers
2              1.64                 0.23
8              4.53                 0.24
16             7.97                 0.25

Argument Buffers. Benefits. Improve performance. Enable new use cases. Easy to use.

Argument Buffers. Shader example

    struct Material {
        float roughness;
        float intensity;
        texture2d<float> surfaceTexture;
        texture2d<float> specularTexture;
        sampler textureSampler;
    };

    kernel void my_kernel(constant Material &material [[buffer(0)]]) {
        ...
    }

Dynamic Indexing. Crowd rendering. MTLBuffer: character[0], character[1], character[2], ..., each holding height, specularTexture, pantsTexture, position, ... CPU: setBuffer, then drawIndexedPrimitives:... instanceCount:1000. GPU: the vertex shader uses the per-character position; the fragment shader uses the per-character textures.

Dynamic Indexing. Vertex shader

    vertex VertexOutput instancedShader(uint id [[instance_id]],
                                        constant Character *crowd [[buffer(0)]])
    {
        // Dynamically pick the character for the given instance index
        constant Character &obj = crowd[id];
        return transformCharacter(obj);
    }

Set Resources on GPU. Particle simulation. An MTLBuffer holds particle[0], particle[1], particle[2], ..., each with an orientation, ..., and a material. The simulation kernel runs thread[0], thread[1], thread[2], ..., one per particle. A second MTLBuffer holds material[0] rocks, material[1] moss, material[2] grass, ... The kernel assigns the materials on the GPU: particle[0] gets moss, particle[1] gets rocks, particle[2] gets grass.

Set Resources on GPU. Shader example

    struct Data {
        texture2d<float> tex;
        float value;
    };

    kernel void copy(constant Data &src [[buffer(0)]],
                     device Data &dst [[buffer(1)]])
    {
        dst.value = 1.0f;    //< Assign constants
        dst.tex = src.tex;   //< Copy just a texture
        dst = src;           //< Copy entire structure
    }

Multiple Indirections. Argument Buffers can reference other Argument Buffers. Create and reuse complex object hierarchies.

    struct Object {
        float4 position;
        device Material *material;   ///< Many objects can point to the same material
    };

    struct Tree {
        device Tree *children[2];
        device Object *objects;      ///< Array of objects in the node
    };
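
To make the shared-pointer hierarchy concrete, here is a CPU-side analog in plain C++ rather than Metal Shading Language. It is a sketch under assumptions: the `Material` fields are borrowed from the earlier material example, `objectCount` is a hypothetical field (the slide does not show how the array length is stored), and `countUsers` is an illustrative helper, not part of any Metal API.

```cpp
#include <cstddef>

struct Material {
    float roughness;
    float intensity;
};

struct Object {
    float position[4];
    const Material* material;   // many objects can point to the same material
};

struct Tree {
    const Tree* children[2];    // nullptr marks a missing child
    const Object* objects;      // array of objects in the node
    std::size_t objectCount;    // hypothetical: length of the objects array
};

// Walk the hierarchy and count the objects that reference a given material.
std::size_t countUsers(const Tree* node, const Material* material) {
    if (node == nullptr) {
        return 0;
    }
    std::size_t users = 0;
    for (std::size_t i = 0; i < node->objectCount; ++i) {
        if (node->objects[i].material == material) {
            ++users;
        }
    }
    return users + countUsers(node->children[0], material)
                 + countUsers(node->children[1], material);
}
```

Because objects hold a pointer rather than a copy, changing one `Material` is visible to every object that references it, which is the reuse the slide describes.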

Support Tiers

                              Tier 1       Tier 2
CPU overhead reduction        Yes          Yes
Set resources on GPU          No           Yes
Multiple indirections         No           Yes
Arrays of Argument Buffers    No           Yes
Resources per draw call       Unchanged    500,000 textures and buffers

Argument Buffers. Terrain rendering. Material changes with terrain. Trees placed by GPU. Particle materials generated by GPU. Available as sample code.

API Highlights. Argument Buffers are stored in plain MTLBuffer. Use MTLArgumentEncoder to fill Indirect Argument Buffers. Abstracts platform differences behind simple interface. Up to eight Argument Buffers per stage.

    // Shader syntax
    struct Particle {
        texture2d<float> surface;
        float4 position;
    };

    kernel void simulate(constant Particle &particle [[buffer(0)]]) {
        ...
    }

    // Create encoder for first indirect argument buffer in Metal function 'simulate'
    let simulateFunction = library.makeFunction(name: "simulate")!
    let particleEncoder = simulateFunction.makeArgumentEncoder(bufferIndex: 0)

    // API calls to fill the indirect argument buffer
    particleEncoder.setTexture(mySurfaceTexture, at: 0)
    particleEncoder.constantData(at: 1).storeBytes(of: myPosition, as: float4.self)

Managing Resource Usage. Tell Metal what resources you plan to use. Use Metal Heaps for best performance.

    // Use for textures with sample access or buffers
    commandEncoder.use(myTextureHeap)

    // Used for all render targets, views or read/write access to texture
    commandEncoder.use(myRenderTarget, usage: .write)

Best Practices. Organize based on usage pattern: per-view vs. per-object vs. per-material; dynamically changing vs. static. Favor data locality. Use traditional model where appropriate.

Raster Order Groups. Richard Schreyer

Raster Order Groups. Ordered memory access from fragment shaders. Enables new rendering techniques: order-independent transparency, dual-layer GBuffers, voxelization, custom blending.

Fragment Shaders with Blending. Timeline. Fragment Shader Thread 1: Blend. Fragment Shader Thread 2: Wait, then Blend.

Mid-Shader Memory Access. Timeline. Fragment Shader Thread 1: Write Memory. Fragment Shader Thread 2: Read Memory.

With Raster Order Groups. Timeline. Fragment Shader Thread 1: Write Memory. Fragment Shader Thread 2: Wait, then Read Memory.

    // Blending manually to a pointer in memory
    fragment void BlendSomething(
        texture2d<float, access::read_write> framebuffer [[texture(0)]])
    {
        float4 newColor = ...

        // Non-atomic access to memory without synchronization
        float4 priorColor = framebuffer.read(framebufferPosition);
        float4 blended = custom_blend(newColor, priorColor);
        framebuffer.write(blended, framebufferPosition);
    }

    // Blending manually to a pointer in memory
    fragment void BlendSomething(
        texture2d<float, access::read_write> framebuffer [[texture(0), raster_order_group(0)]])
    {
        float4 newColor = ...

        // Hardware waits on first access to raster ordered memory
        float4 priorColor = framebuffer.read(framebufferPosition);
        float4 blended = custom_blend(newColor, priorColor);
        framebuffer.write(blended, framebufferPosition);
    }

Raster Order Groups. Summary. Synchronization between overlapping fragment shader threads. Check for support with MTLDevice.rasterOrderGroupsSupported.

ProMotion Displays

Without ProMotion. 60 FPS. GPU: frames 1, 2, 3. Display: frames 1, 2, 3, each shown for 16.6 ms.

With ProMotion. 120 FPS. GPU: frames 1, 2, 3. Display: frames 1, 2, 3, each shown for 8.3 ms.

Without ProMotion. 48 FPS. GPU: frames 1, 2, 3 take 20.83 ms each. Display: frames 1, 2, 3 are shown for 16.6 ms, 33.3 ms, 16.6 ms.

With ProMotion. 48 FPS. GPU: frames 1, 2, 3 take 20.83 ms each. Display: frames 1, 2, 3 are shown for 20.83 ms each.

Without ProMotion. Dropped frame. GPU: frames 1, 2, 3 take 16.6 ms, 21 ms, 16.6 ms. Display: frames 1, 2, 3 are shown for 33.3 ms, 16.6 ms, 16.6 ms.

With ProMotion. Dropped frame. GPU: frames 1, 2, 3 take 16.6 ms, 21 ms, 16.6 ms. Display: frames 1, 2, 3 are shown for 21 ms, 16.6 ms, 16.6 ms.
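
The uneven frame durations in the 48 FPS case follow from simple arithmetic, which the C++ sketch below tries to reproduce. It is a simplified model, not how the display hardware is specified: a fixed-refresh display is assumed to show a finished frame at the next vsync boundary, and a ProMotion display is assumed to show it as soon as it is ready, subject to a minimum interval. Time is counted in hypothetical ticks of 1/240 s so that 60 Hz (4 ticks), 120 Hz (2 ticks), and a 48 FPS frame (5 ticks) are exact integers. All function names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One tick is 1/240 s: 60 Hz = 4 ticks, 120 Hz = 2 ticks, 48 FPS = 5 ticks.
using Ticks = std::int64_t;

// Fixed-refresh model: a finished frame waits for the next vsync boundary,
// and at most one new frame is shown per refresh.
std::vector<Ticks> presentTimesFixed(const std::vector<Ticks>& ready, Ticks period) {
    std::vector<Ticks> out;
    for (Ticks r : ready) {
        Ticks t = ((r + period - 1) / period) * period;  // round up to a vsync
        if (!out.empty() && t <= out.back()) {
            t = out.back() + period;
        }
        out.push_back(t);
    }
    return out;
}

// Variable-refresh model: show a frame when it is ready, but never sooner
// than minPeriod after the previous frame.
std::vector<Ticks> presentTimesVariable(const std::vector<Ticks>& ready, Ticks minPeriod) {
    std::vector<Ticks> out;
    for (Ticks r : ready) {
        Ticks t = r;
        if (!out.empty() && t < out.back() + minPeriod) {
            t = out.back() + minPeriod;
        }
        out.push_back(t);
    }
    return out;
}

// How long each frame (except the last) stays on screen.
std::vector<Ticks> onScreenDurations(const std::vector<Ticks>& present) {
    std::vector<Ticks> durations;
    for (std::size_t i = 1; i < present.size(); ++i) {
        durations.push_back(present[i] - present[i - 1]);
    }
    return durations;
}
```

With frames ready every 5 ticks, the fixed 60 Hz model yields on-screen durations of 4, 4, 4, 8 ticks (16.7 ms three times, then 33.3 ms), while the variable model yields 5 ticks (20.83 ms) for every frame. Where the long frame falls in the fixed-refresh sequence depends on the phase between rendering and vsync, so it need not match the slide's ordering exactly.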

Opting in to ProMotion. UIKit animations use ProMotion automatically. Metal views require opt-in with Info.plist key:

    <key>CADisableMinimumFrameDuration</key>
    <true/>

Metal Presentation APIs. Present immediately. GPU: frame 1. Display: frame 1.

    present(drawable)

Metal Presentation APIs. Present after minimum duration. GPU: frames 1 (t1) and 2 (t2). Display: frames 1 and 2 are shown for 33 ms each.

    present(drawable, afterMinimumDuration: 1.0 / 30.0)

Metal Presentation APIs. Present at specific time. GPU: frames 1 (t1) and 2 (t2). Display: frame 1 is shown at t1 and frame 2 at t2.

    let time: CFTimeInterval = projectNextDisplayTime()
    present(drawable, atTime: time)

    let targetTime = // project when you intend to display this drawable

    // render your scene into a command buffer for targetTime

    let drawable = metalLayer.nextDrawable()
    commandBuffer.present(drawable, atTime: targetTime)

    // after a frame or two
    let presentationDelay = drawable.presentedTime - targetTime
    // Examine presentationDelay and adjust future frame timing
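
The slide leaves "adjust future frame timing" open. One possible approach, sketched here in plain C++ and not taken from the talk, is to smooth the measured delay and push later targets out by that amount. `FramePacer`, `alpha`, `recordPresentation`, and `nextTargetTime` are all hypothetical names, times are in seconds, and the smoothing factor is an arbitrary choice.

```cpp
#include <algorithm>

// Hypothetical helper: tracks how late frames reach the display and
// schedules later targets accordingly.
struct FramePacer {
    double smoothedDelay = 0.0;  // running estimate of presentedTime - targetTime
    double alpha = 0.5;          // smoothing factor, chosen arbitrarily here

    // Feed back one measurement once the presented time is known.
    void recordPresentation(double targetTime, double presentedTime) {
        const double delay = presentedTime - targetTime;
        smoothedDelay += alpha * (delay - smoothedDelay);
    }

    // Next target: one frame interval after the previous target, pushed
    // later when frames have been arriving late. Early arrivals are ignored.
    double nextTargetTime(double previousTarget, double frameInterval) const {
        return previousTarget + frameInterval + std::max(0.0, smoothedDelay);
    }
};
```

Smoothing avoids reacting to a single late frame; a real application would likely also cap the adjustment and consider lowering its frame rate when the delay stays high.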

ProMotion Displays. Summary. 120 FPS. Reduced latency. Improved framerate consistency. Reduced stuttering from missed display deadlines.

Direct to Display

Displaying Metal Content. Two paths: GPU Composition and Direct to Display.

GPU Composition. Inputs: Metal view, other UIKit/AppKit views, other applications' windows. System compositor: scale, colorspace conversion, composition. Output: display.

Direct to Display. Inputs: full-screen Metal view, other UIKit/AppKit views (occluded), other applications' windows (occluded). System compositor: scale, colorspace conversion, composition. Output: display.

Direct to Display Requirements. Opaque layer. No masking, rounded corners, and so on. Full screen (or with black bars via an opaque black background color). Dimensions matching the display or smaller. Color space and pixel format compatible with the display.

Colorspace Requirements

Color Space                            Metal Pixel Format    P3 Display        sRGB Display
sRGB                                   bgra8Unorm_srgb       Direct            Direct
Display P3 (macOS)                     bgr10a2Unorm          Direct            GPU Composited
Extended sRGB (iOS)                    bgr10_xr_srgb         Direct            GPU Composited
Linear Extended sRGB, or Display P3    rgba16Float           GPU Composited    GPU Composited

Detecting P3 Display Gamut. UIKit: UITraitCollection.displayGamut == .P3. AppKit: NSScreen.canRepresent(.p3).

Direct to Display. Summary. Eliminate compositor usage of GPU. Useful for full-screen scenes. Supported on iOS, tvOS, and macOS. Use Metal System Trace to verify.

Everything Else

Memory Usage Queries. New APIs to query memory usage per allocation: MTLResource.allocatedSize, MTLHeap.currentAllocatedSize. Query total GPU memory allocated by the device: MTLDevice.currentAllocatedSize.

SIMDGroup-scoped Data Sharing. Share data across a SIMDGroup:

    simd_broadcast(data, simd_lane_id)
    simd_shuffle(data, simd_lane_id)
    simd_shuffle_up(data, simd_lane_id)
    simd_shuffle_down(data, simd_lane_id)
    simd_shuffle_xor(data, simd_lane_id)

Diagram: Thread 0 through Thread 15 of a SIMDGroup exchanging data with one another.
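
A common use of lane-to-lane sharing is a reduction without threadgroup memory. The sketch below emulates that on the CPU in plain C++; it is not shader code. `shuffleXor` and `broadcast` are stand-ins written for this example that follow the usual reading of `simd_shuffle_xor` (lane i reads from lane i XOR mask) and `simd_broadcast` (every lane reads from one lane); the 16-lane width is an assumption taken from the 16 threads drawn on the slide.

```cpp
#include <array>
#include <cstddef>

// Assumption: 16 lanes, matching the threads drawn on the slide.
constexpr std::size_t kLanes = 16;
using Lanes = std::array<int, kLanes>;

// CPU stand-in for simd_shuffle_xor: lane i reads the value held by lane (i ^ mask).
Lanes shuffleXor(const Lanes& data, std::size_t mask) {
    Lanes out{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        out[lane] = data[lane ^ mask];
    }
    return out;
}

// CPU stand-in for simd_broadcast: every lane reads the value held by sourceLane.
Lanes broadcast(const Lanes& data, std::size_t sourceLane) {
    Lanes out{};
    out.fill(data[sourceLane]);
    return out;
}

// Butterfly reduction: after log2(kLanes) exchange steps every lane holds the total.
Lanes butterflySum(Lanes data) {
    for (std::size_t mask = kLanes / 2; mask > 0; mask /= 2) {
        const Lanes other = shuffleXor(data, mask);
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            data[lane] += other[lane];
        }
    }
    return data;
}
```

With lane values 0 through 15, four exchange steps leave the sum 120 in every lane. On a GPU each step would be a single shuffle instruction executed by all lanes at once rather than a loop.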

Non-Uniform Threadgroup Sizes. Diagram: a grid 11 threads wide and 7 threads high, covered by 4x4 threadgroups. With uniform sizes: two rows of three 4x4 threadgroups, extending past the grid edge. With non-uniform sizes: the first row holds 4x4, 4x4, 3x4 threadgroups and the second row holds 4x3, 4x3, 3x3 threadgroups.
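
The edge sizes in the diagram can be derived mechanically. The following C++ sketch computes them for an arbitrary grid; `Size2D`, `uniformThreadCount`, and `nonUniformGroups` are hypothetical names for this illustration and do not correspond to Metal API calls.

```cpp
#include <algorithm>
#include <vector>

struct Size2D {
    int width;
    int height;
};

// Threads launched when every threadgroup must be full size: the grid is
// rounded up to a whole number of threadgroups in each dimension.
int uniformThreadCount(Size2D grid, Size2D group) {
    const int groupsX = (grid.width + group.width - 1) / group.width;
    const int groupsY = (grid.height + group.height - 1) / group.height;
    return groupsX * group.width * groupsY * group.height;
}

// Sizes of the threadgroups, row by row, when edge groups may shrink to fit.
std::vector<Size2D> nonUniformGroups(Size2D grid, Size2D group) {
    std::vector<Size2D> out;
    for (int y = 0; y < grid.height; y += group.height) {
        for (int x = 0; x < grid.width; x += group.width) {
            out.push_back({std::min(group.width, grid.width - x),
                           std::min(group.height, grid.height - y)});
        }
    }
    return out;
}
```

For an 11 x 7 grid with 4x4 threadgroups, the uniform launch covers 96 threads, 19 of which fall outside the grid and would need a bounds check in the shader. The non-uniform launch covers exactly 77 threads in six groups.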

Viewport Arrays. Vertex function selects which viewport. Useful for VR when combined with instancing. Diagram: a render target of size 2 x Width by Height.

Multisample Pattern Control. Select where within a pixel the MSAA sample patterns are located. Toggle sample locations. Useful for custom anti-aliasing. Diagram: four numbered sample positions within 1 pixel.

Resource Heaps. Now available on macOS. Control time of memory allocation. Fast reallocation and aliasing of resources. Group related resources for faster binding.

Resource Heaps. Without a heap: Texture A, Texture B, and Texture C each have their own memory allocation. With a heap: Texture A, Texture B, and Texture C are placed inside one heap. When Texture C is replaced by Texture D, the separate allocation for C goes away and a new memory allocation for D is made, while in the heap Texture D takes the place of Texture C.

Other Features

Feature. Summary
Linear Textures. Create textures from a MTLBuffer without copying
Function Constant for Argument Indexes. Specialize bytecodes to change the binding index for shader arguments
Additional Vertex Array Formats. Add some additional 1 component and 2 component vertex formats, and a BGRA8 vertex format
IOSurface Textures. Create MTLTextures from IOSurfaces on iOS
Dual Source Blending. Additional blending modes with two source parameters

Summary. Argument Buffers. Raster Order Groups. ProMotion Displays. Direct to Display. Everything Else.

More Information https://developer.apple.com/wwdc17/601

Related Sessions
VR with Metal 2. Hall 3. Wednesday 10:00AM
Metal 2 Optimization and Debugging. Executive Ballroom. Thursday 3:10PM
Using Metal 2 for Compute. Grand Ballroom A. Thursday 4:10PM

Previous Sessions
What's New in Metal, Part 1. WWDC 2016
Working with Wide Color. WWDC 2016

Labs
Metal 2 Lab. Technology Lab A. Tues 3:10-6:10PM
VR with Metal 2 Lab. Technology Lab A. Wed 3:10-6:00PM
Metal 2 Lab. Technology Lab F. Fri 9:00AM-12:00PM