Scalable Ambient Effects

Philippa Leonard
6 years ago


Introduction

Imagine playing a video game where the player guides a character through a marsh in the pitch-black dead of night, and the only guiding light is a swarm of fireflies that follow the player. Or imagine a game where the player guides a character through a desert, kicking up dust clouds with each step. Effects like these can be computationally expensive. With a multithreaded implementation, however, they can be added to a game and scaled to the processing power of the system it runs on.

Fireflies is a code sample demonstrating scalable ambient effects. In this sample, thousands of fireflies scatter, flock, and then return to settle and form a walking character. The ambient effect uses simple AI that includes flocking and collision avoidance with the terrain and the surrounding trees. The AI calculations are divided into tasks that can run in parallel, so the sample's task-based threading scales to use all available CPU cores on the target machine. The task scheduler is written with Intel Threading Building Blocks (Intel TBB).

Figure 1: Fireflies in action

Sample Functionality

To get a feel for this code, download and run it. While it runs, switch between multithreaded and serial mode to see the performance difference that multithreading brings. In task-based threading mode, you can also change the number of tasks. Experimenting with these options on a multi-core machine shows that the number of tasks affects the performance of the sample. A low number of tasks, such as 1 or 2, yields lower performance, while a higher number of tasks increases performance. Changing the number of particles also affects the sample's performance. The user interface features on the right-hand side let users experiment to find the settings that work best on a given machine.

When integrating an ambient effect like Fireflies, the goal is to add the best possible ambient effect without slowing down the overall application. The Fireflies sample can auto-scale the ambient effect. In the upper right-hand corner of the UI is a button labeled "Auto-Calibrate Optimal Number Particles". This button makes the sample estimate the maximum number of particles it can simulate while maintaining a baseline frame rate on the target machine. During auto-calibration, the fireflies continuously flock close together to approximate the highest CPU workload the sample experiences.

To maximize throughput, more tasks are created than there are logical hardware threads. This works well because Intel TBB automatically distributes the total workload across its worker threads, and fine-grained tasks can be balanced more evenly. After setting the number of tasks, the sample tries different particle counts. It searches for the highest count that can be simulated while still maintaining at least 30 frames per second.

To drill down and visualize how the sample works, run the Profile build of the executable. The Profile version contains macros that capture frame activity and performance information for display in the Platform View of Intel Graphics Performance Analyzers (Intel GPA).
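The calibration step can be sketched as follows. This is a minimal C++ sketch under stated assumptions, not the sample's code. The helper names chooseTaskCount and calibrateMaxParticles are hypothetical. The default of four tasks per hardware thread is an assumption. The binary search is also an assumption, since the text says only that the sample tries different particle counts. The measureFps callback is a hypothetical stand-in for running the worst-case flocking scene for a few frames and reporting its frame rate.

```cpp
#include <cstddef>
#include <functional>
#include <thread>

// Hypothetical helper: create several tasks per hardware thread so the
// scheduler has spare work to balance. The default factor of 4 is an assumption.
std::size_t chooseTaskCount(std::size_t tasksPerThread = 4) {
    std::size_t threads = std::thread::hardware_concurrency();  // 0 if unknown
    if (threads == 0) threads = 1;
    if (tasksPerThread == 0) tasksPerThread = 1;
    return threads * tasksPerThread;
}

// Hypothetical calibration: binary-search the largest particle count in
// [minParticles, maxParticles] whose measured frame rate is at least targetFps.
// Assumes the frame rate never increases as particles are added.
// Returns 0 if even minParticles misses the target.
int calibrateMaxParticles(const std::function<double(int)>& measureFps,
                          int minParticles, int maxParticles,
                          double targetFps = 30.0) {
    int best = 0;
    int lo = minParticles;
    int hi = maxParticles;
    while (lo <= hi) {
        const int mid = lo + (hi - lo) / 2;
        if (measureFps(mid) >= targetFps) {
            best = mid;    // fast enough: look for a larger count
            lo = mid + 1;
        } else {
            hi = mid - 1;  // too slow: look for a smaller count
        }
    }
    return best;
}
```

A binary search over 100 to 100,000 particles needs about 17 measurements. It relies on the frame rate falling steadily as particles are added. Real measurements are noisy, so each probe would typically average several frames.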
Divide Work Into Tasks

In the multithreaded version of the application, the per-frame computations for the fireflies are split among multiple tasks instead of running serially. When the fireflies scatter from the model and later return, they perform the calculations needed to flock together. They also avoid obstacles, both the terrain and vertical obstructions such as pillars and trees. In serial mode, each firefly performs its flocking and collision detection tests in turn, one after another. In multithreaded mode, the firefly flight calculations are broken up into tasks. Here a task is simply a fixed-size batch of firefly flight calculations, and the scheduler runs different tasks on separate threads. The flight calculations are independent of each other, so they can easily be done in parallel.

In theory, the more tasks there are, the more of the flight calculations can run in parallel. In practice, parallelism is limited by the actual number of CPU cores. Moreover, scheduling a task incurs overhead, so the work assigned to each task should outweigh that overhead. Breaking up the work efficiently means finding the number of tasks that gives peak parallel performance without too much overhead.
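The division of work can be sketched in standard C++ rather than Intel TBB. Several names here are hypothetical: the Firefly struct, updateFirefly, taskRange, updateSerial and updateParallel. updateFirefly is a simple steer-toward-a-target rule standing in for the real flocking and collision tests. Launching tasks with std::async is also a stand-in. Each task updates one contiguous range of fireflies, and the ranges together cover every firefly exactly once.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Simplified firefly state (hypothetical; not the sample's data layout).
struct Firefly {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
};

// Hypothetical per-firefly update standing in for the real flocking and
// collision tests: steer toward a shared, read-only target and integrate.
// It writes only its own firefly, so fireflies can be updated independently.
void updateFirefly(Firefly& f, const Firefly& target, float dt) {
    const float steer = 0.5f;
    f.vx += (target.x - f.x) * steer * dt;
    f.vy += (target.y - f.y) * steer * dt;
    f.vz += (target.z - f.z) * steer * dt;
    f.x += f.vx * dt;
    f.y += f.vy * dt;
    f.z += f.vz * dt;
}

// Half-open range [first, second) of fireflies handled by task t of numTasks.
// Ranges are contiguous, cover all n fireflies, and differ in size by at most one.
std::pair<std::size_t, std::size_t> taskRange(std::size_t n, std::size_t numTasks,
                                              std::size_t t) {
    return {n * t / numTasks, n * (t + 1) / numTasks};
}

// Serial mode: update every firefly in turn.
void updateSerial(std::vector<Firefly>& flies, const Firefly& target, float dt) {
    for (Firefly& f : flies) updateFirefly(f, target, dt);
}

// Multithreaded mode: one task per range. std::async may start a thread per
// task; Intel TBB would instead run the tasks on its worker-thread pool.
void updateParallel(std::vector<Firefly>& flies, const Firefly& target, float dt,
                    std::size_t numTasks) {
    if (numTasks == 0) numTasks = 1;
    std::vector<std::future<void>> tasks;
    tasks.reserve(numTasks);
    for (std::size_t t = 0; t < numTasks; ++t) {
        const std::pair<std::size_t, std::size_t> range =
            taskRange(flies.size(), numTasks, t);
        const std::size_t begin = range.first;
        const std::size_t end = range.second;
        tasks.push_back(std::async(std::launch::async, [&flies, &target, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i) updateFirefly(flies[i], target, dt);
        }));
    }
    for (std::future<void>& task : tasks) task.get();  // wait for every range
}
```

updateFirefly reads only its own firefly and a read-only target, which keeps the tasks independent. A flocking rule that reads neighboring fireflies would need to read from the previous frame's copy to stay race-free. With Intel TBB, the same split is usually written as tbb::parallel_for over a tbb::blocked_range, with TBB's worker pool executing the ranges.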

Figure 2 shows a graph of the lowest frames per second (fps) recorded for various sizes of task sets.[1] From Figure 2, it is apparent that the benefit of a given number of tasks depends on the number of particles. With too few particles, increasing the number of tasks does not have a great impact. Compute time is spent spawning extra tasks without any performance benefit, and without more cores to run them there is no increase in the parallel work being done. For a high number of particles, however, distributing the particle calculations across multiple tasks gave a significant performance increase compared with running the sample serially. As shown in Figure 2, splitting the particle calculations across as few as 4 tasks increased performance by as much as 2x. Increasing the number of tasks to 12 yielded as much as a 4x increase.

Overall, from a performance standpoint it is advantageous to multithread an ambient effect so that it can take advantage of a multi-core processor. Intel TBB task-based threading allows the calculations to be distributed across all the available cores. As Figure 3 shows, the parallel portion of the simulation scales fairly linearly with the number of cores available to it. That portion consists of the firefly flight trajectory calculations. The graph was obtained by measuring the estimated average time this purely parallel portion of the code took per frame, while varying the number of cores assigned to the sample.[2] The graph shows the estimated average number of firefly flight trajectory update calculations that can be done per frame for a given number of cores.

[1] Testing was completed on an Intel Core i7-980X processor-based machine running at 3.33 GHz with 6 GB of RAM using an NVIDIA GeForce* GTX 285 graphics card.
[2] Data obtained on an Intel Core i7-980X processor-based machine running at 3.33 GHz with 6 GB of RAM using an NVIDIA GeForce* GTX 285 graphics card. Cores were assigned to the sample by setting processor affinity in Task Manager.

Figure 2: Number of tasks versus fps on a 6-core CPU (chart "Number of Tasks Versus Lowest FPS"; x-axis: number of tasks; y-axis: lowest frames per second; one series per particle count)

Figure 3: Average time taken to perform the update code, where the firefly flight trajectories are calculated, while varying the number of cores assigned to the sample (chart "Assigned Cores Versus Number of Updates"; x-axis: number of assigned cores; y-axis: average particle updates per second)
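The lowest-fps and speedup figures come from per-frame timing. Below is a minimal C++ sketch of that bookkeeping. The FrameStats class and the speedup helper are hypothetical and not part of the sample. The lowest frame rate is the reciprocal of the slowest frame time. The speedup is the serial frame time divided by the multithreaded frame time.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical frame-time recorder (not part of the sample).
class FrameStats {
public:
    void addFrame(double seconds) { frameSeconds_.push_back(seconds); }

    // Time one frame's work with a monotonic clock and record it.
    template <typename Work>
    void timeFrame(Work&& work) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        addFrame(elapsed.count());
    }

    // Lowest fps = 1 / slowest recorded frame time.
    double lowestFps() const {
        if (frameSeconds_.empty()) return 0.0;
        return 1.0 / *std::max_element(frameSeconds_.begin(), frameSeconds_.end());
    }

    // Average fps computed as frames divided by total time.
    double averageFps() const {
        double total = 0.0;
        for (double s : frameSeconds_) total += s;
        return total > 0.0 ? frameSeconds_.size() / total : 0.0;
    }

private:
    std::vector<double> frameSeconds_;
};

// Speedup of the multithreaded run over the serial run, from frame times.
double speedup(double serialSeconds, double parallelSeconds) {
    return serialSeconds / parallelSeconds;
}
```

For example, frames of 20 ms, 25 ms and 40 ms give a lowest frame rate of 25 fps. The average is 3 / 0.085 s, or about 35.3 fps. Reporting the lowest value, as Figure 2 does, exposes the worst-case frames that an average would hide.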

One may notice that even with a small number of fireflies, the multithreaded mode still runs faster than the serial mode. Besides the firefly flight trajectory calculations, another important part of the code is how task-based threading overlaps the work of preparing and rendering each frame. Running serially, the sample performs the usual frame setup, processes the data for that frame, and then renders it. In multithreaded mode, however, the fireflies for frame N+1 are processed during frame N, in parallel with the render of frame N. This effectively shortens the total time needed per frame. Figure 4 compares the sequence of steps in multithreaded and serial frame activity. Figure 5 shows a screenshot of all the sample's thread activity in one frame in multithreaded mode, as captured in Intel GPA Platform View.

Fireflies Multithreaded Frame Activity
  Frame N: pre-draw setup for frame N; render frame N while performing the calculations for frame N+1, distributed across multiple tasks (Update 0-A, Update A-B, ..., Update Y-Z)
  Frame N+1: pre-draw setup for frame N+1; render frame N+1 while performing the calculations for frame N+2

Fireflies Serial Frame Activity
  Frame N: pre-draw setup for frame N; perform the calculations for frame N; render frame N

Figure 4: Diagrams of parallel versus serial frame activity showing how multithreaded mode breaks up per-frame calculations and rendering among threads
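The multithreaded frame loop can be sketched in standard C++. The FrameState type, computeNextFrame and runPipelined are hypothetical. std::async stands in for the sample's Intel TBB tasks, and the next frame's update is shown as a single task rather than split across many. The render of frame N reads only the current state, while the update writes a separate state for frame N+1. Because of that separation, the two can overlap without a data race.

```cpp
#include <functional>
#include <future>
#include <vector>

// Hypothetical per-frame particle state (simplified to one float per particle).
struct FrameState {
    int frameIndex = 0;
    std::vector<float> positions;
};

// Hypothetical update: produce the state for the next frame from the current one.
FrameState computeNextFrame(const FrameState& current) {
    FrameState next{current.frameIndex + 1, current.positions};
    for (float& p : next.positions) p += 1.0f;  // stand-in for flight calculations
    return next;
}

// Pipelined loop: while frame N is rendered on this thread, the state for
// frame N+1 is computed on another thread.
void runPipelined(FrameState current, int numFrames,
                  const std::function<void(const FrameState&)>& render) {
    for (int i = 0; i < numFrames; ++i) {
        // Pre-draw setup for frame N would go here.
        std::future<FrameState> next =
            std::async(std::launch::async, computeNextFrame, std::cref(current));
        render(current);       // runs concurrently with computeNextFrame
        current = next.get();  // wait for frame N+1; it becomes the current state
    }
}
```

Each iteration computes one frame ahead, so the state produced in the final iteration is never rendered. A real game loop would simply keep running. The cost of the overlap is one extra copy of the particle state.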

Figure 5: Sample multithreaded frame activity in Intel GPA Platform View (annotations: pre-render calculations distributed across multiple threads; frame render executing asynchronously from the pre-render calculations)

Conclusion

This sample shows an ambient effect that can enhance a game. It also demonstrates that distributing the computation across multiple tasks brings several benefits. Multithreading not only increases performance but also lets the ambient effect scale easily across platforms with different CPU power. Developers can adjust both the number of tasks used for the calculations and the number of objects that need calculating. They can therefore write a scalable ambient effect, such as the one in the Fireflies sample, once and not have to worry about the processing power of each target platform. With a task-based threading methodology, developers can write code for ambient effects and have it run on a variety of processors, from Intel Atom processors in netbooks all the way up to high-end desktop systems.

About the Author

Eliezer Payzer is an intern with Intel's Visual Computing Software Division, where he works on samples that demonstrate the power of Intel architecture. He is finishing his master's degree in Computer Science at the University of Southern California.
