Scalable Ambient Effects


Introduction

Imagine playing a video game in which you guide a character through a marsh in the pitch-black dead of night, and the only light is a swarm of fireflies that follows the character. Or imagine guiding a character through a desert, kicking up clouds of dust with each step. Effects like these can be computationally expensive, but with a multithreaded implementation they can be added to a game and scaled to the processing power of the system it runs on.

Fireflies is a code sample that demonstrates scalable ambient effects. In the sample, thousands of fireflies scatter, flock, and then return to settle and form a walking character. The effect uses simple AI that includes flocking and collision avoidance with the terrain and surrounding trees. Because it uses task-based threading, the sample scales to use all available CPU cores on the target machine: all of the AI calculations are divided into tasks that can run in parallel. The task scheduler is written with Intel Threading Building Blocks (Intel TBB).

Figure 1: Fireflies in action

Sample Functionality

To get a feel for the code, download and run the sample. While it runs, switch between multithreaded and serial mode to see the performance difference that multithreading brings. In task-based threading mode, you can also change the number of tasks. Experimenting with these options on a multi-core machine makes it apparent that the number of tasks affects performance: a low number of tasks, such as 1 or 2, yields lower performance, while a higher number of tasks yields higher performance. Changing the number of particles also affects performance, of course. The user interface controls on the right-hand side let you experiment to find the settings that work best on a given machine.

When integrating an ambient effect like Fireflies, the goal is to add the best possible effect without slowing down the application as a whole. For this purpose, the Fireflies sample can auto-scale the effect. In the upper right-hand corner of the UI, a button labeled "Auto-Calibrate Optimal Number Particles" makes the sample estimate the maximum number of particles it can simulate while maintaining a baseline frame rate on the target machine. During auto-calibration, the fireflies continuously flock close together to reproduce the highest CPU workload the sample experiences. To maximize throughput, the calibration creates more tasks than there are logical hardware threads. This works well because Intel TBB automatically distributes the total workload across its worker threads, and finer-grained tasks keep the cores busy more consistently. After setting the number of tasks, the sample tries different particle counts to find the highest one it can simulate while still maintaining at least 30 frames per second.

To drill down and visualize how the sample works, run the Profile build of the executable. The Profile build contains macros that capture frame activity and performance information for display in the Platform View of Intel Graphics Performance Analyzers (Intel GPA).
Divide Work Into Tasks

In the multithreaded version of the application, the per-frame computations for the fireflies are split among multiple tasks instead of running one after another as in the serial version. When the fireflies scatter from the model and later return, they perform the calculations needed to flock together and to avoid obstacles: the terrain, as well as vertical obstructions such as pillars and trees. In serial mode, each firefly performs its flocking and collision-detection tests in turn. In multithreaded mode, the firefly flight calculations are broken up into tasks. Here, a task is simply a fixed-size batch of firefly flight calculations; different tasks can execute at the same time on separate threads. Because the flight calculations are independent of one another, they are easy to run in parallel.

In theory, the more tasks there are, the more of the flight calculations can run in parallel. In practice, parallelism is limited by the number of CPU cores actually available. Moreover, scheduling a task incurs overhead, so the work assigned to each task should be large compared with that overhead. Breaking up the work efficiently therefore means finding the number of tasks that gives peak parallel performance without excessive overhead.

Figure 2 shows the lowest frame rate, in frames per second (fps), recorded for various numbers of tasks.¹ From Figure 2, it is apparent that a given number of tasks sustains good frame rates only up to a certain number of particles. With too few particles, increasing the number of tasks has little effect: compute time is spent spawning extra tasks that bring no benefit, and without more cores to run them, no additional work is done in parallel. With a high number of particles, however, distributing the particle calculations across multiple tasks yielded a significant performance increase compared with running the sample serially. As Figure 2 shows, splitting the particle calculations across as few as 4 tasks improved performance by as much as 2x, and increasing the number of tasks to 12 improved it by as much as 4x.

Overall, from a performance standpoint, it is advantageous to multithread an ambient effect so that it can take advantage of a multi-core processor, and Intel TBB task-based threading distributes the calculations across all available cores. As Figure 3 shows, the performance of the parallel portion of the simulation increases fairly linearly with the number of cores available to it. The graph was obtained by measuring, with varying numbers of cores assigned to the sample, the estimated average time per frame spent in the purely parallel portion of the code, namely the firefly flight trajectory calculations.² It shows the estimated average number of firefly flight trajectory updates that can be performed per frame for a given number of cores.

¹ Testing was completed on an Intel Core i7-980X processor-based machine running at 3.33 GHz with 6 GB of RAM and an NVIDIA GeForce* GTX 285 graphics card.
² Data obtained on an Intel Core i7-980X processor-based machine running at 3.33 GHz with 6 GB of RAM and an NVIDIA GeForce* GTX 285 graphics card. Cores were assigned to the sample by setting processor affinity in Task Manager.
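The trade-off between task count and scheduling overhead can be made concrete with a deliberately idealized cost model. This is an assumption for illustration, not something measured in the sample. Let N be the number of particles, c the cost of one firefly update, P the number of hardware threads, s the overhead of scheduling one task, and n the number of tasks; the per-frame update time is then roughly:

```latex
T(n) \approx \frac{N c}{\min(n, P)} + n s,
\qquad
\frac{dT}{dn} = -\frac{N c}{n^{2}} + s = 0 \quad (n \le P)
\;\Rightarrow\;
n^{*} \approx \min\!\left(P,\ \max\!\left(1,\ \sqrt{\frac{N c}{s}}\right)\right)
```

Within this model the best task count grows only with the square root of the particle count and is capped at P, so with few particles the optimum is a handful of tasks and extra tasks only add overhead. The model ignores load imbalance between tasks, which is why creating more tasks than hardware threads can still pay off in practice.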

Figure 2: Number of tasks versus fps on a 6-core CPU (chart of lowest frames per second against number of tasks, one series per particle count)

Figure 3: Average time taken to perform the update code, where the firefly flight trajectories are calculated, while varying the number of cores assigned to the sample (chart titled "Assigned Cores Versus Number of Updates": average particle updates per second against number of assigned cores)

You may notice that even with a small number of fireflies, multithreaded mode still runs faster than serial mode. Besides the firefly flight trajectory calculations, another important part of the code is how task-based threading parallelizes the work of setting up and rendering each frame. Running serially, the sample performs the usual frame setup, processes the data for that frame, and then renders. In multithreaded mode, however, the fireflies for the next frame are processed during the current frame, in parallel with the frame render, which effectively shortens the total time needed per frame. Figure 4 compares the sequence of steps in multithreaded and serial frame activity, and Figure 5 shows a screenshot of all of the sample's thread activity in one frame of multithreaded mode, as captured in the Intel GPA Platform View.

Fireflies Multithreaded Frame Activity
  Start of frame N: pre-draw setup for frame N
  Render frame N, and in parallel perform the calculations for frame N+1,
    distributing the particle calculations across multiple tasks (Update 0-A, Update A-B, ..., Update Y-Z)
  End of frame N, start of frame N+1: pre-draw setup for frame N+1
  Render frame N+1, and in parallel perform the calculations for frame N+2

Fireflies Serial Frame Activity
  Start of frame N: pre-draw setup for frame N
  Perform the calculations for frame N
  Render frame N, then end of frame N

Figure 4: Diagrams of parallel versus serial frame activity, showing how multithreaded mode breaks up per-frame calculations and rendering among threads

Figure 5: Sample multithreaded frame activity in Intel GPA Platform View (annotations: pre-render calculations distributed across multiple threads; frame render executing asynchronously from the pre-render calculations)

Conclusion

This sample shows an ambient effect that can enhance a game, and it demonstrates that distributing the computation across multiple tasks yields several benefits. Multithreading not only increases performance but also lets the ambient effect scale easily across platforms with different CPU power. Because both the number of tasks used for the calculations and the number of objects being simulated can be adjusted, developers can write a scalable ambient effect, such as the one in the Fireflies sample, once, without tailoring it to the processing power of each target platform. With a task-based threading methodology, the same ambient-effect code can run on a wide range of processors, from Intel Atom processors in netbooks all the way up to high-end desktop systems.

About the Author

Eliezer Payzer is an intern with Intel's Visual Computing Software Division, where he has worked on samples that demonstrate the power of Intel architecture. He is finishing his master's degree in Computer Science at the University of Southern California.


More information

Lightroom System February 2018 Updates

Lightroom System February 2018 Updates Lightroom System February 2018 Updates This February Adobe have updated Lightroom Classic CC to bring you better performance (especially for high-end system users) and the ability to filter folders via

More information

Specifying Storage Servers for IP security applications

Specifying Storage Servers for IP security applications Specifying Storage Servers for IP security applications The migration of security systems from analogue to digital IP based solutions has created a large demand for storage servers high performance PCs

More information

A Simulated Annealing algorithm for GPU clusters

A Simulated Annealing algorithm for GPU clusters A Simulated Annealing algorithm for GPU clusters Institute of Computer Science Warsaw University of Technology Parallel Processing and Applied Mathematics 2011 1 Introduction 2 3 The lower level The upper

More information

David R. Mackay, Ph.D. Libraries play an important role in threading software to run faster on Intel multi-core platforms.

David R. Mackay, Ph.D. Libraries play an important role in threading software to run faster on Intel multi-core platforms. Whitepaper Introduction A Library Based Approach to Threading for Performance David R. Mackay, Ph.D. Libraries play an important role in threading software to run faster on Intel multi-core platforms.

More information

Map Abstraction with Adjustable Time Bounds

Map Abstraction with Adjustable Time Bounds Map Abstraction with Adjustable Time Bounds Sourodeep Bhattacharjee and Scott D. Goodwin School of Computer Science, University of Windsor Windsor, N9B 3P4, Canada sourodeepbhattacharjee@gmail.com, sgoodwin@uwindsor.ca

More information

Parallelism in Hardware

Parallelism in Hardware Parallelism in Hardware Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3 Moore s Law

More information

REAL PERFORMANCE RESULTS WITH VMWARE HORIZON AND VIEWPLANNER

REAL PERFORMANCE RESULTS WITH VMWARE HORIZON AND VIEWPLANNER April 4-7, 2016 Silicon Valley REAL PERFORMANCE RESULTS WITH VMWARE HORIZON AND VIEWPLANNER Manvender Rawat, NVIDIA Jason K. Lee, NVIDIA Uday Kurkure, VMware Inc. Overview of VMware Horizon 7 and NVIDIA

More information

Distributed Virtual Reality Computation

Distributed Virtual Reality Computation Jeff Russell 4/15/05 Distributed Virtual Reality Computation Introduction Virtual Reality is generally understood today to mean the combination of digitally generated graphics, sound, and input. The goal

More information

Creating Loopable Animations By Ryan Bird

Creating Loopable Animations By Ryan Bird Creating Loopable Animations By Ryan Bird A loopable animation is any-length animation that starts the same way it ends. If done correctly, when the animation is set on a loop cycle (repeating itself continually),

More information

Parallel LZ77 Decoding with a GPU. Emmanuel Morfiadakis Supervisor: Dr Eric McCreath College of Engineering and Computer Science, ANU

Parallel LZ77 Decoding with a GPU. Emmanuel Morfiadakis Supervisor: Dr Eric McCreath College of Engineering and Computer Science, ANU Parallel LZ77 Decoding with a GPU Emmanuel Morfiadakis Supervisor: Dr Eric McCreath College of Engineering and Computer Science, ANU Outline Background (What?) Problem definition and motivation (Why?)

More information

Journal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.

Journal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J. Journal of Universal Computer Science, vol. 14, no. 14 (2008), 2416-2427 submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.UCS Tabu Search on GPU Adam Janiak (Institute of Computer Engineering

More information

INSIDE EXPLORER HARDWARE CATALOGUE MAKING THE INSIDE VISIBLE - AN INTUITIVE AND INTERACTIVE 3D LEARNING EXPERIENCE V 1.

INSIDE EXPLORER HARDWARE CATALOGUE MAKING THE INSIDE VISIBLE - AN INTUITIVE AND INTERACTIVE 3D LEARNING EXPERIENCE V 1. INSIDE EXPLORER HARDWARE CATALOGUE V 1.0 APRIL 2017 MAKING THE INSIDE VISIBLE - AN INTUITIVE AND INTERACTIVE 3D LEARNING EXPERIENCE www.interspectral.com INSIDE EXPLORER HARDWARE Inside explorer is delivered

More information

Installing Acumen Fuse in a Citrix XenApp Environment

Installing Acumen Fuse in a Citrix XenApp Environment Installing Acumen Fuse in a Citrix XenApp Environment Requirements The XenApp servers should run Windows Server 2003 Service Pack 2, 2003 R2 or 2008 or 2008 R2 or greater. The servers must also have the

More information

Predictive Runtime Code Scheduling for Heterogeneous Architectures

Predictive Runtime Code Scheduling for Heterogeneous Architectures Predictive Runtime Code Scheduling for Heterogeneous Architectures Víctor Jiménez, Lluís Vilanova, Isaac Gelado Marisa Gil, Grigori Fursin, Nacho Navarro HiPEAC 2009 January, 26th, 2009 1 Outline Motivation

More information

Spring 2009 Prof. Hyesoon Kim

Spring 2009 Prof. Hyesoon Kim Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

GaaS Workload Characterization under NUMA Architecture for Virtualized GPU

GaaS Workload Characterization under NUMA Architecture for Virtualized GPU GaaS Workload Characterization under NUMA Architecture for Virtualized GPU Huixiang Chen, Meng Wang, Yang Hu, Mingcong Song, Tao Li Presented by Huixiang Chen ISPASS 2017 April 24, 2017, Santa Rosa, California

More information

FlightViz Simulation Tool 2006 User & Administrator s Manual

FlightViz Simulation Tool 2006 User & Administrator s Manual FlightViz Simulation Tool 2006 User & Administrator s Manual Michigan State University In Collaboration with The Boeing Company Boeing Representatives Jayson Vincent Lanya da Silva Kim Monteith Michigan

More information

MAXIS-mizing Darkspore*: A Case Study of Graphic Analysis and Optimizations in Maxis Deferred Renderer

MAXIS-mizing Darkspore*: A Case Study of Graphic Analysis and Optimizations in Maxis Deferred Renderer MAXIS-mizing Darkspore*: A Case Study of Graphic Analysis and Optimizations in Maxis Deferred Renderer A New Gaming Experience Made Possible With Processor Graphics Released in early 2011, the 2nd Generation

More information

Assignment 4: Flight Simulator

Assignment 4: Flight Simulator VR Assignment 4: Flight Simulator Released : Feb 19 Due : March 26th @ 4:00 PM Please start early as this is long assignment with a lot of details. We simply want to make sure that you have started the

More information

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins Scientific Computing and Imaging Institute & University of Utah I. Uintah Overview

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Appsense Environment Manager. User Personalization Performance & Scalability (version ) Technical Overview

Appsense Environment Manager. User Personalization Performance & Scalability (version ) Technical Overview Appsense Environment Manager Technical Overview APPSENSE ENVIRONMENT MANAGER - Technical Overview This report details the results of the internal performance and scalability testing performed by AppSense

More information

Dynamic 3D representation of information using low cost Cloud ready Technologies

Dynamic 3D representation of information using low cost Cloud ready Technologies National Technical University Of Athens School of Rural and Surveying Engineering Laboratory of Photogrammetry Dynamic 3D representation of information using low cost Cloud ready Technologies George MOURAFETIS,

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

GPU 101. Mike Bailey. Oregon State University. Oregon State University. Computer Graphics gpu101.pptx. mjb April 23, 2017

GPU 101. Mike Bailey. Oregon State University. Oregon State University. Computer Graphics gpu101.pptx. mjb April 23, 2017 1 GPU 101 Mike Bailey mjb@cs.oregonstate.edu gpu101.pptx Why do we care about GPU Programming? A History of GPU Performance vs. CPU Performance 2 Source: NVIDIA How Can You Gain Access to GPU Power? 3

More information

GPU 101. Mike Bailey. Oregon State University

GPU 101. Mike Bailey. Oregon State University 1 GPU 101 Mike Bailey mjb@cs.oregonstate.edu gpu101.pptx Why do we care about GPU Programming? A History of GPU Performance vs. CPU Performance 2 Source: NVIDIA 1 How Can You Gain Access to GPU Power?

More information

Blazer Pro V2.1 Client Requirements & Hardware Performance

Blazer Pro V2.1 Client Requirements & Hardware Performance Blazer Pro V2.1 Client Requirements & Hardware Performance Table of Contents Chapter 1 Client Requirements... 2 Chapter 2 Control Client Performance... 3 2.1 Local Control Client on Blazer Pro Server...

More information

AcuSolve Performance Benchmark and Profiling. October 2011

AcuSolve Performance Benchmark and Profiling. October 2011 AcuSolve Performance Benchmark and Profiling October 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox, Altair Compute

More information

What You Need to Know When Buying a New Computer JackaboutComputers.com

What You Need to Know When Buying a New Computer JackaboutComputers.com If it s been a while since you bought your last computer, you could probably use a quick refresher on what you need to know to make a good purchase. Computers today are a much larger part of our life than

More information

Parallel Computer Architecture and Programming Written Assignment 3

Parallel Computer Architecture and Programming Written Assignment 3 Parallel Computer Architecture and Programming Written Assignment 3 50 points total. Due Monday, July 17 at the start of class. Problem 1: Message Passing (6 pts) A. (3 pts) You and your friend liked the

More information

Consulting Solutions WHITE PAPER Citrix XenDesktop XenApp 6.x Planning Guide: Virtualization Best Practices

Consulting Solutions WHITE PAPER Citrix XenDesktop XenApp 6.x Planning Guide: Virtualization Best Practices Consulting Solutions WHITE PAPER Citrix XenDesktop XenApp 6.x Planning Guide: Virtualization Best Practices www.citrix.com Table of Contents Overview... 3 Scalability... 3 Guidelines... 4 Operations...

More information

An Adaptive Control Scheme for Multi-threaded Graphics Programs

An Adaptive Control Scheme for Multi-threaded Graphics Programs Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, 2007 498 An Adaptive Control Scheme for Multi-threaded Graphics Programs

More information

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Module 18: TLP on Chip: HT/SMT and CMP Lecture 39: Simultaneous Multithreading and Chip-multiprocessing TLP on Chip: HT/SMT and CMP SMT TLP on Chip: HT/SMT and CMP SMT Multi-threading Problems of SMT CMP Why CMP? Moore s law Power consumption? Clustered arch. ABCs of CMP Shared cache design Hierarchical MP file:///e /parallel_com_arch/lecture39/39_1.htm[6/13/2012

More information