Embree Ray Tracing Kernels: Overview and New Features

Similar documents
Embree Ray Tracing Kernels

Exploiting Local Orientation Similarity for Efficient Ray Traversal of Hair and Fur

Embree Ray Tracing Kernels for the Intel Xeon and Intel Xeon Phi Architectures. Sven Woop, Carsten Benthin, Ingo Wald Intel

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

IXPUG 16. Dmitry Durnov, Intel MPI team

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature. Intel Software Developer Conference London, 2017

Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant

Graphics Performance Analyzer for Android

H.J. Lu, Sunil K Pandey. Intel. November, 2018

Vectorization Advisor: getting started

Installation Guide and Release Notes

OpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel

Agenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor

Software Occlusion Culling

Mikhail Dvorskiy, Jim Cownie, Alexey Kukanov

Contributors: Surabhi Jain, Gengbin Zheng, Maria Garzaran, Jim Cownie, Taru Doodi, and Terry L. Wilmarth

INTEL HPC DEVELOPER CONFERENCE FUEL YOUR INSIGHT

Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes

LIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015

Getting Started with Intel SDK for OpenCL Applications

Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python

Bitonic Sorting Intel OpenCL SDK Sample Documentation

Intel Software Development Products Licensing & Programs Channel EMEA

Bei Wang, Dmitry Prohorov and Carlos Rosales

Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved

Kevin O Leary, Intel Technical Consulting Engineer

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Achieving High Performance. Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013

Case Study. Optimizing an Illegal Image Filter System. Software. Intel Integrated Performance Primitives. High-Performance Computing

Повышение энергоэффективности мобильных приложений путем их распараллеливания. Примеры. Владимир Полин

Memory & Thread Debugger

A Simple Path to Parallelism with Intel Cilk Plus

Expressing and Analyzing Dependencies in your C++ Application

Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor

Sarah Knepper. Intel Math Kernel Library (Intel MKL) 25 May 2018, iwapt 2018

Becca Paren Cluster Systems Engineer Software and Services Group. May 2017

HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing. Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

Michael Kinsner, Dirk Seynhaeve IWOCL 2018

More performance options

Intel Cluster Checker 3.0 webinar

Sergey Maidanov. Software Engineering Manager for Intel Distribution for Python*

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager

Alexei Katranov. IWOCL '16, April 21, 2016, Vienna, Austria

Jomar Silva Technical Evangelist

INTEL MKL Vectorized Compact routines

MICHAL MROZEK ZBIGNIEW ZDANOWICZ

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes

Carsten Benthin, Sven Woop, Ingo Wald, Attila Áfra HPG 2017

12th ANNUAL WORKSHOP 2016 NVME OVER FABRICS. Presented by Phil Cayton Intel Corporation. April 6th, 2016

Jackson Marusarz Software Technical Consulting Engineer

What s New August 2015

Johannes Günther, Senior Graphics Software Engineer. Intel Data Center Group, HPC Visualization

Real World Development examples of systems / iot

What s P. Thierry

Overview of Data Fitting Component in Intel Math Kernel Library (Intel MKL) Intel Corporation

Intel Many Integrated Core (MIC) Architecture

Crosstalk between VMs. Alexander Komarov, Application Engineer Software and Services Group Developer Relations Division EMEA

Intel Advisor XE. Vectorization Optimization. Optimization Notice

Row Tracing with Hierarchical Occlusion Maps

Code modernization and optimization for improved performance using the OpenMP* programming model for threading and SIMD parallelism.

Visualizing and Finding Optimization Opportunities with Intel Advisor Roofline feature

Optimizing Film, Media with OpenCL & Intel Quick Sync Video

Jim Cownie, Johnny Peyton with help from Nitya Hariharan and Doug Jacobsen

Efficiently Introduce Threading using Intel TBB

Installation Guide and Release Notes

Intel Math Kernel Library (Intel MKL) Team - Presenter: Murat Efe Guney Workshop on Batched, Reproducible, and Reduced Precision BLAS Georgia Tech,

Debugging and Analyzing Programs using the Intercept Layer for OpenCL Applications

3D ray tracing simple scalability case study

NVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit

Intel Architecture 2S Server Tioga Pass Performance and Power Optimization

Obtaining the Last Values of Conditionally Assigned Privates

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016

Growth in Cores - A well rehearsed story

Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory

Tuning Python Applications Can Dramatically Increase Performance

Intel SDK for OpenCL* - Sample for OpenCL* and Intel Media SDK Interoperability

Fast BVH Construction on GPUs

Intel Stereo 3D SDK Developer s Guide. Alpha Release

Using Intel VTune Amplifier XE for High Performance Computing

Software Optimization Case Study. Yu-Ping Zhao

Using Intel VTune Amplifier XE and Inspector XE in.net environment

Intel Atom Processor Based Platform Technologies. Intelligent Systems Group Intel Corporation

Intel Architecture for Software Developers

Intel Math Kernel Library (Intel MKL) Sparse Solvers. Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager

Part IV. Review of hardware-trends for real-time ray tracing

Gil Rapaport and Ayal Zaks. Intel Corporation, Israel Development Center. March 27-28, 2017 European LLVM Developers Meeting

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers

SAH guided spatial split partitioning for fast BVH construction. Per Ganestam and Michael Doggett Lund University

pymic: A Python* Offload Module for the Intel Xeon Phi Coprocessor

High Performance Computing The Essential Tool for a Knowledge Economy

Compiling for Scalable Computing Systems the Merit of SIMD. Ayal Zaks Intel Corporation Acknowledgements: too many to list

Intel Parallel Studio XE 2011 SP1 for Linux* Installation Guide and Release Notes

Real Time Ray Tracing

Ernesto Su, Hideki Saito, Xinmin Tian Intel Corporation. OpenMPCon 2017 September 18, 2017

Achieving Peak Performance on Intel Hardware. Intel Software Developer Conference London, 2017

Transcription:

Embree Ray Tracing Kernels: Overview and New Features Attila Áfra, Ingo Wald, Carsten Benthin, Sven Woop Intel Corporation Intel, the Intel logo, Intel Xeon Phi, Intel Xeon Processor are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. See Trademarks on intel.com for full list of Intel trademarks.

Legal Disclaimer and Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright 2016, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel s compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 2

Writing a fast ray tracer is difficult Need to multi-thread: easy for rendering but difficult for hierarchy construction Need to vectorize: efficient use of SIMD units, different ISAs (SSE, AVX, AVX2, AVX-512) Need deep domain knowledge: many different data structures (kd-trees, octrees, grids, BVH2, BVH4,..., hybrid structures) and algorithms (single rays, packets, large packets, stream tracing,...) to choose Need to support different CPUs: different ISAs/CPU types favor different data structures, data layouts, and algorithms 3

Embree ray tracing kernels Provides highly optimized and scalable ray tracing kernels Acceleration structure build and ray traversal Targets professional rendering applications Highest ray tracing performance on CPUs 1.5 6 speedup reported by users Support for latest CPUs Intel Xeon Phi Processor (codenamed Knights Landing) API for easy integration into applications Free and open source under Apache 2.0 license http://embree.github.com 4

Embree features Find closest hit, any hit rtcintersect, rtcoccluded Single rays, ray packets (4, 8, 16), ray streams (N) High-quality and high-performance BVH builders Intel SPMD Program Compiler (ISPC) support Triangles, quads, subdivs, instances, hair, linear motion blur Extensible User defined geometry, intersection filter functions, open source Support for Intel Threading Building Blocks (TBB) 5

Embree system overview Embree API (C++ and ISPC) Ray Tracing Kernel Selection Acceleration Structures bvh4.triangle4 bvh8.triangle4 bvh4.quad4v Builders SAH Builder Spatial Split Builder Morton Builder BVH Refitter Subdiv Engine B-Spline Patch Gregory Patch Tessellation Cache Displ. Mapping Traversal Single Ray Packet/Hybrid Ray Stream Intersection Möller-Trumbore Plücker Bézier Curve Line Segment Triangle Grid Common Vector and SIMD Library (Vec3f, Vec3fa, vfloat4, vfloat8, vfloat16,, SSE2, SSE4.1, AVX, AVX2, AVX-512) 6

Embree API How to use Embree? 7

Scene Scene is a container for set of geometries Scene flags passed at creation time Static scene Dynamic scene etc. Scene geometry changes have to get commited (rtccommit), which triggers BVH build // include Embree headers #include <embree2/rtcore.h> int main() { // initialize at application startup } rtcinit(); // create scene RTCScene scene = rtcnewscene (RTC_SCENE_STATIC, RTC_INTERSECT1); // add geometries... later slide... // commit changes rtccommit(scene); // trace rays... later slide... // cleanup at application exit rtcexit(); 8

Triangle mesh Contains vertex and index buffers Number of triangles and vertices set at creation time Linear motion blur supported 2 vertex buffers // application vertex and index layout struct Vertex { float x, y, z, s, t; }; struct Triangle { int materialid, v0, v1, v2; }; // add mesh to scene unsigned int geomid = rtcnewtrianglemesh (scene, numtriangles, numvertices, 1); // set data buffers rtcsetbuffer(scene, geomid, RTC_VERTEX_BUFFER, vertexptr, 0, sizeof(vertex)); rtcsetbuffer(scene, geomid, RTC_INDEX_BUFFER, indexptr, 4, sizeof(triangle)); // add more geometries... // commit changes rtccommit(scene); 9

Rendering (ISPC) // loop over all screen pixels foreach (y=0... screenheight-1, x=0... screenwidth-1) { // create and trace primary ray RTCRay ray = make_ray(p, normalize(x*vx + y*vy + vz), eps, inf); rtcintersect(scene, ray); // environment shading if (ray.geomid == RTC_INVALID_GEOMETRY_ID) { pixels[y*screenwidth+x] = make_vec3f(0.0f); continue; } // calculate hard shadows RTCRay shadow = make_ray(ray.org+ray.tfar*ray.dir, neg(lightdir), eps, inf); rtcoccluded(scene, shadow); } if (shadow.geomid == RTC_INVALID_GEOMETRY_ID) pixels[y*width+x] = colors[ray.primid]*(0.5f + clamp(-dot(lightdir, normalize(ray.ng)), 0.0f, 1.0f)); else pixels[y*width+x] = colors[ray.primid]*0.5f; 10

New/Advanced Features Since the initial publication of Embree [Wald et al. 2014] 11

Quad meshes (Embree 2.8) Most 3D modeling packages produce quad meshes No need to convert them to triangles anymore! rtcnewquadmesh Up to 2 lower memory usage Faster BVH building Higher ray intersection throughput 12

Subdivision surfaces (Embree 2.4) Catmull-Clark subdivision surfaces Compatible with OpenSubdiv 3.0 Displacement mapping Tessellation cache [Benthin et al. 2015] Low memory usage Real-time rendering performance 13

Hair Three hair geometry types: p0/r0 Cubic Bézier hair (Embree 2.3) Line segment (Embree 2.8) Cubic Bézier curve (Embree 2.10) Sweep surface of a circle along a Bézier curve p1/r1 p2/r2 p3/r3 High performance through use of oriented bounding boxes [Woop et al. 2014] Low memory consumption through direct ray/curve intersection 14

User defined geometries Enables implementing custom primitives not provided by Embree e.g., point, disk User provides: Bounding function Intersect and occluded functions Linear motion blur support (Embree 2.8) 15

Ray streams (Embree 2.9) Intersect many rays together e.g., 1K-4K rtcintersect1m, rtcintersectnm, rtcintersectnp Enables better coherence extraction than packets Improves both traversal and shading [Áfra et al. 2016] performance Novel stream traversal algorithm Based on hiding memory access latency Improves performance by 10-30% depending on coherence 16

Embree 2.10.0 Performance Ray tracing performance 17

Test setup Path tracer with complex materials and shaders (including procedural) Ray stream tracing with local shading coherence extraction [Áfra et al. 2016] Hardware: Dual-socket Intel Xeon E5-2699 v3 (Haswell, 2 18 cores, 2.3 GHz, AVX2), 64 GB DDR4 Intel Xeon Phi 7210 (Knights Landing, 64 cores, 1.3 GHz, AVX-512), 96 GB DDR4 Mazda / 5.7M triangles Villa / 37.7M triangles Conference / 0.3M triangles 18

Million rays / second Ray tracing performance (including shading) 100 90 80 70 60 50 40 30 20 10 0 Mazda Villa Conference 2 Intel Xeon E5-2699 v3 2 18 cores, 2.3 GHz, AVX2 Intel Xeon Phi 7210 64 cores, 1.3 GHz, AVX-512 19

Q&A Visit the Intel booth for a live Embree demo running on Intel Xeon Phi! 20