Speeding Up Daylighting Design and Glare Prediction Workflows with Accelerad Nathaniel Jones DIVA Day 2016 Massachusetts Institute of Technology Sustainable Design Lab
138,844,405 rays 49 minutes 41,010,721 rays 1.5 minutes 10 3 10 2 cd/m 2 10 4
10000 1000 Events 100 10 1 Time Command GH Add Delete Draw Save
10,000,000 1,000,000 100,000 10,000 Transistors per Chip (thousands) 1,000 100 Clock Speed (MHz) 10 1 0 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 Data up to the year 2010 collected by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten Data for 2010 2015 collected by K. Rupp
10,000,000 1,000,000 100,000 10,000 Transistors per Chip (thousands) 1,000 100 Clock Speed (MHz) 10 1 0 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 Data up to the year 2010 collected by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten Data for 2010 2015 collected by K. Rupp
10,000,000 1,000,000 100,000 10,000 Transistors per Chip (thousands) 1,000 100 Clock Speed (MHz) 10 Cores 1 0 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 Data up to the year 2010 collected by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten Data for 2010 2015 collected by K. Rupp
CPU GPU 8 cores 2880 cores http://www.maximumpc.com/article/features/sandy_bridge e_benchmarked_intel_retains_performance_crown http://www.nvidia.com/content/pdf/kepler/nvidia Kepler GK110 Architecture Whitepaper.pdf
Introducing Accelerad Under the Hood Annual Daylighting Real Time Glare Analysis Setup and Installation
DIVA Radiance DAYSIM gensky radfiles2daysim oconv evalglare gen_dc vwrays gen_reindl gendaylit xform pmapdump gencumulativesky rtrace released ds_illum rpict rtrace_dc rlam rcontrib mkpmap rvu pcomb rcalc dctimestep lookamb Accelerad gen_dgp_profile scale_dc ds_el_lighting rotate_scene gen_directsunlight
Camera Sensors Ambient rpict rtrace preprocess rvu rcontrib
Media Lab 28 x Gund Hall 54 x Small Office 33 x Accelerad Radiance 0 100 200 300 Time (minutes)
HDR Photograph Path Tracing 6 days Radiance 8 hours Accelerad 17 minutes 10 10 2 10 3 10 cd/m 4 2
HDR Photograph Path Tracing 6 days Radiance 8 hours Accelerad 17 minutes 10 10 2 10 3 10 cd/m 4 2
Introducing Accelerad Under the Hood Annual Daylighting Real Time Glare Analysis Setup and Installation
CPU
GPU
CPU (with irradiance caching)
GPU (with irradiance caching)?
Direct 1 st Bounce 2 nd Bounce 3 rd Bounce n th Bounce Final Gather OptiX 3080 found 2 GPU devices: Device 0: Tesla K40c with 15 multiprocessors, 1024 threads per block, 745000 khz, 3489464320 bytes global memory, 128 hardware textures, compute capability 3.5, timeout disabled, Tesla compute cluster driver enabled, cuda device 0. Geometry build time: 1014 milliseconds. OptiX compile time: 765 milliseconds. OptiX kernel time: 2215 milliseconds (2 seconds). Adaptive sampling: 16 milliseconds. Retrieved 262144 of 262144 potential seeds at level 0. K-means performed 6 loop iterations in 655 milliseconds. K-means produced 4090 of 4096 clusters at level 0. OptiX kernel time: 1076 milliseconds (1 seconds). Retrieved 2012131 of 2166784 potential seeds at level 1. K-means performed 6 loop iterations in 4773 milliseconds. K-means produced 4074 of 4096 clusters at level 1. OptiX kernel time: 515 milliseconds (0 seconds). Retrieved 1014954 of 1048576 potential seeds at level 2. K-means performed 6 loop iterations in 2449 milliseconds. K-means produced 4075 of 4096 clusters at level 2. Using 3944 of 3944 ambient records OptiX kernel time: 780 milliseconds (1 seconds). OptiX kernel time: 1295 milliseconds (1 seconds). Retrieved 3950 ambient records from 4096 queries at level 2. Using 3950 of 7894 ambient records OptiX kernel time: 1513 milliseconds (1 seconds). OptiX kernel time: 1420 milliseconds (2 seconds). Retrieved 3943 ambient records from 4096 queries at level 1. Using 3943 of 11837 ambient records OptiX kernel time: 2855 milliseconds (3 seconds). OptiX kernel time: 2730 milliseconds (2 seconds). Retrieved 3831 ambient records from 4096 queries at level 0. Geometry Sampling Ambient Sampling Using 3831 of 15668 ambient records OptiX kernel time: 18018 milliseconds (18 seconds). rpict: ray tracing time: 46769 milliseconds (47 seconds).
Direct 1 st Bounce 2 nd Bounce 3 rd Bounce n th Bounce Final Gather OptiX 3080 found 2 GPU devices: Device 0: Tesla K40c with 15 multiprocessors, 1024 threads per block, 745000 khz, 3489464320 bytes global memory, 128 hardware textures, compute capability 3.5, timeout disabled, Tesla compute cluster driver enabled, cuda device 0. Geometry build time: 1014 milliseconds. OptiX compile time: 765 milliseconds. OptiX kernel time: 2215 milliseconds (2 seconds). Adaptive sampling: 16 milliseconds. Retrieved 262144 of 262144 potential seeds at level 0. K-means performed 6 loop iterations in 655 milliseconds. K-means produced 4090 of 4096 clusters at level 0. OptiX kernel time: 1076 milliseconds (1 seconds). Retrieved 2012131 of 2166784 potential seeds at level 1. K-means performed 6 loop iterations in 4773 milliseconds. K-means produced 4074 of 4096 clusters at level 1. OptiX kernel time: 515 milliseconds (0 seconds). Retrieved 1014954 of 1048576 potential seeds at level 2. K-means performed 6 loop iterations in 2449 milliseconds. K-means produced 4075 of 4096 clusters at level 2. Using 3944 of 3944 ambient records OptiX kernel time: 780 milliseconds (1 seconds). OptiX kernel time: 1295 milliseconds (1 seconds). Retrieved 3950 ambient records from 4096 queries at level 2. Using 3950 of 7894 ambient records OptiX kernel time: 1513 milliseconds (1 seconds). OptiX kernel time: 1420 milliseconds (2 seconds). Retrieved 3943 ambient records from 4096 queries at level 1. Using 3943 of 11837 ambient records OptiX kernel time: 2855 milliseconds (3 seconds). OptiX kernel time: 2730 milliseconds (2 seconds). Retrieved 3831 ambient records from 4096 queries at level 0. Geometry Sampling Ambient Sampling Using 3831 of 15668 ambient records OptiX kernel time: 18018 milliseconds (18 seconds). rpict: ray tracing time: 46769 milliseconds (47 seconds).
Direct 1 st Bounce 2 nd Bounce 3 rd Bounce n th Bounce Final Gather OptiX 3080 found 2 GPU devices: Device 0: Tesla K40c with 15 multiprocessors, 1024 threads per block, 745000 khz, 3489464320 bytes global memory, 128 hardware textures, compute capability 3.5, timeout disabled, Tesla compute cluster driver enabled, cuda device 0. Geometry build time: 1014 milliseconds. OptiX compile time: 765 milliseconds. OptiX kernel time: 2215 milliseconds (2 seconds). Adaptive sampling: 16 milliseconds. Retrieved 262144 of 262144 potential seeds at level 0. K-means performed 6 loop iterations in 655 milliseconds. K-means produced 4090 of 4096 clusters at level 0. OptiX kernel time: 1076 milliseconds (1 seconds). Retrieved 2012131 of 2166784 potential seeds at level 1. K-means performed 6 loop iterations in 4773 milliseconds. K-means produced 4074 of 4096 clusters at level 1. OptiX kernel time: 515 milliseconds (0 seconds). Retrieved 1014954 of 1048576 potential seeds at level 2. K-means performed 6 loop iterations in 2449 milliseconds. K-means produced 4075 of 4096 clusters at level 2. Using 3944 of 3944 ambient records OptiX kernel time: 780 milliseconds (1 seconds). OptiX kernel time: 1295 milliseconds (1 seconds). Retrieved 3950 ambient records from 4096 queries at level 2. Using 3950 of 7894 ambient records OptiX kernel time: 1513 milliseconds (1 seconds). OptiX kernel time: 1420 milliseconds (2 seconds). Retrieved 3943 ambient records from 4096 queries at level 1. Using 3943 of 11837 ambient records OptiX kernel time: 2855 milliseconds (3 seconds). OptiX kernel time: 2730 milliseconds (2 seconds). Retrieved 3831 ambient records from 4096 queries at level 0. Geometry Sampling Ambient Sampling Using 3831 of 15668 ambient records OptiX kernel time: 18018 milliseconds (18 seconds). rpict: ray tracing time: 46769 milliseconds (47 seconds).
Accelerad Radiance 10 minutes 198 minutes Jones and Reinhart, 2014. Irradiance caching for global illumination calculation on graphics hardware. 2014 ASHRAE/IBPSA USA Building Simulation Conference, 111 120. 10 10 2 cd/m 2 10 3 10 4
Ambient Accuracy ( aa, ar)
Ambient Accuracy ( aa, ar) Radiance Smoother Shading Accelerad Better Coverage aa 0.05 aa 0.1 aa 0.2 10 3 10 2 cd/m 2 10 4
Irradiance Cache Size ( ac) Time (seconds) 200 0 512 1024 2048 4096 8192 Long Final Gather Irradiance Cache Size Long IC Build 10 3 10 2 cd/m 2 10 4
Irradiance Infill ( ag) ad 256 22 seconds ad 1024 54 seconds ad 1024 ag 0 30 seconds ad 1024 ag 256 36 seconds 10 3 10 2 cd/m 2 10 4
Introducing Accelerad Under the Hood Annual Daylighting Real Time Glare Analysis Setup and Installation
Daylight Autonomy DA 300lux [50%] Fraction of space in which daylighting achieves 300 Lux in at least 50% of occupied hours Reinhart, Rakha, and Weissman, 2014. Predicting the daylit area A comparison of students assessments and simulations at eleven schools of architecture. LEUKOS: The Journal of the Illuminating Engineering Society of North America, 10(4), 193 206.
Direct Daylight Coefficients Sun Vector Diffuse Daylight Coefficients Sky Vector Sensor Values m sensor points + m sensor points = n suns 1 n n sky patches 1 n 1 m 2 Phase Method (DAYSIM)
View Matrix Transmission Matrix Daylight Matrix Sky Vector Sensor Values m sensor points n angles n angles = n angles n angles p sky patches 1 p 1 m 3 Phase Method (rcontrib)
+ = 3 Phase 3 Phase Direct Sun Direct Sun 5 Phase 5 Phase Method (rcontrib) Andy McNeil, 2013. The Five Phase Method for Simulating Complex Fenestration with Radiance.
DAYSIM rtrace_dc Add daylight coefficients Radiance rtrace Add contribution coefficients 3 Phase Method rcontrib Port to CUDA new program accelerad_rtrace_dc Accelerad accelerad_rtrace new program accelerad_rcontrib
3.6 m 8.2 m 2.8 m 1400 irradiance sensors Reinhart, Jakubiec, and Ibarra, 2013. Definition of a reference office for standardized evaluations of dynamic façade and lighting technologies. Proceedings of BS2013: 13th Conference of International Building Performance Simulation Association, Chambéry, France, August 26 28, pp. 3645 3652.
DAYSIM 3 Phase Method 5 Phase Method CPU GPU CPU GPU CPU GPU September 21, 9 AM 12.5 minutes 2.5 minutes 9.8 minutes 3.4 minutes 421 minutes 51 minutes 5x 3x 8x 0 20,000 lux
DAYSIM 3 Phase Method 5 Phase Method CPU GPU CPU GPU CPU GPU December 21, 12 PM 12.5 minutes 2.5 minutes 9.8 minutes 3.4 minutes 421 minutes 51 minutes 5x 3x 8x 0 20,000 lux
Time (minutes) 300 200 100 3 Phase CPU 3 Phase GPU DAYSIM CPU DAYSIM GPU 3 Phase Speedup DAYSIM Speedup 8 4 Speedup 0 1400 Points 32400 Points 0 1 room 1400 sensors 20 rooms 32400 sensors
RGB Color 12 bytes Contribution Coefficient 24 bytes Daylight Coefficient Array 592 bytes
Introducing Accelerad Under the Hood Annual Daylighting Real Time Glare Analysis Setup and Installation
Ray Tracing
Ray Tracing Path Tracing Direct Path Tracing Diffuse
Frame 0 Frame 1 Frame 10 Frame 100 Frame 1000 Frame 10,000 Radiance 10 10 2 10 3 10 cd/m 4 2
Frame 0 Frame 10 Frame 1 Frame 1000 Frame 10,000 Frame 100 Radiance 10 102 cd/m2 103 104
10 AM 11 AM 12 PM 1 PM 2 PM Time of Day Overcast 10 10 2 10 3 10 cd/m 4 2
Vertical Eye Illuminance (E v ) Pixel angle from view direction Pixel solid angle Pixel luminance Task Area Luminance (TAL R ) Pixel solid angle Pixel luminance Contrast Ratio (CR) Low state High state
Veiling Glare 2 10 5 pixels 5646 pixels 32 pixels
Daylight Glare Probability (DGP) Source luminance Source solid angle Guth position index Vertical eye illuminance
Intolerable Glare Perceptible Glare
Progressive Rendering Preview image Intermediate visual comfort results Command Line Access Simplified workflow Eliminates aa ab ad af ar as av aw Speedup 10 20x rvu 1000x rpict
Image Credit: J. Alstan Jakubiec Introducing Accelerad Under the Hood Annual Daylighting Real Time Glare Analysis Setup and Installation
2009 2012 2014 2016 Fermi compute ability 2.X retiring Kepler compute ability 3.X Maxwell compute ability 5.X Pascal compute ability 6.X
TCC: Tesla Compute Cluster Gaming Professional Compute GeForce Quadro Tesla WDDM: Windows Display Driver Model Cost http://www.nvidia.com/
1. 2. 3. 4. 5. Choose a GPU Install hardware Update graphics driver Install Accelerad Run as usual
Basic Installation accelerad rpict.exe accelerad rtrace.exe optix.1.dll cudart64_xx.dll Accelerad/bin Advanced Installation Accelerad/bin *.ptx rpict.exe rtrace.exe optix.1.dll cudart64_xx.dll Accelerad/lib Radiance/bin Accelerad/lib *.ptx Radiance/lib
? Questions Discussions Google Group: Accelerad Users Bug Reports
Download http://mit.edu/sustainabledesignlab/projects/accelerad/ Questions? Nathaniel Jones <nljones@mit.edu>