Using CUDA for Solar Thermal Plant Computation

Instructor: Dr. Kwok-Bun Yue
Mentors: Dr. Michel Izygon, Peter Armstrong
Team: Sahithi Chalasani, Pranav Mantini, Claus Nilsson, Arunkumar Subramanian
Spring /4/2009

1.0 Abstract

Solar thermal power plants consist of a central tower surrounded by heliostats (mirrors). The heliostats focus sunlight on the tower, where the thermal energy is used to generate electricity. Each heliostat may be shaded by its neighbors, or blocked by them so that the light it reflects does not reach the tower. The Solar Thermal Plant Computation application calculates the effective area of each heliostat. While the calculations needed to determine the shaded and blocked areas are relatively simple, the number of calculations needed to determine the interactions between heliostats is immense even for relatively small fields.

The area of the representative heliostat that is shaded or blocked does not contribute to the power generated, so it is subtracted using a polygon clipper. The original program, designed by Tietronix Software, Inc., calls the General Polygon Clipper (GPC) library, developed at The University of Manchester. GPC is a large library of about 2500 lines of code, and it accounts for most of the processing time spent calculating the coordinates. The GPC library could not be used for our purpose because it resides on the host, and CUDA device code cannot call host-side functions. A polygon clipping algorithm better suited to the computation therefore had to be implemented.

The polygon clipping algorithm used in our design comes from the paper "Efficient clipping of arbitrary polygons" by Gunther Greiner and Kai Hormann. This algorithm was chosen because it is relatively more efficient than the more commonly used Sutherland-Hodgman algorithm. Its data structures are very simple: each polygon is represented by a doubly linked list. The algorithm calculates all the intersection points and then chooses among these points to create the desired polygon.

The current version of the application is single threaded, which means that one run can take several hours. Since multiple runs are needed to judge the efficiency of a field layout throughout the year, testing design changes can be very time consuming. The purpose of this project is to demonstrate the feasibility of decreasing the application's run time by using Nvidia's CUDA (Compute Unified Device Architecture). CUDA allows an application to take advantage of the many cores of an Nvidia graphics processor to parallelize the calculations, thereby decreasing the time needed for each run. Several issues with CUDA influenced our design. In particular, CUDA does not allow function calls from the GPU to the CPU, so using the GPC clipping library in its current form is not possible.

Table of Contents

1.0 Abstract
2.0 Introduction & Background
2.1 Introduction
2.2 Background
2.3 Shading and blocking
3.0 Design and Implementation
3.1 Technologies
3.2 Architecture
3.3 Design
Clipping Algorithm
CUDA Implementation
3.4 Implementation Issues
4.0 Evaluation
5.0 Conclusion
5.1 Further Work
References
Appendices
A: Project Management and Team Information
B: Major tasks and contributions
C: Code comments
D: Schedule

2.0 Introduction & Background

2.1 Introduction

Tietronix, Inc. has a single-threaded application used to calculate the efficiency of a solar thermal power plant at a given position, date, and time. To find an optimal positioning of the heliostats in the field, the efficiency of the field must be calculated many times, for different days of the year and times of day. Currently the calculations for a single layout take so long that exploring multiple layouts is very time consuming, which hinders the usability of the application. Tietronix therefore suggested using Nvidia's CUDA technology to decrease the time needed to calculate the efficiency of a field at a given time and date, hoping that it could be reduced enough to make exploring multiple layouts practical rather than merely possible.

2.2 Background

Solar Thermal Plant: Concentrating solar power plants produce electricity by reflecting sunlight onto a central receiver, where the energy is used to heat a medium which ultimately drives electrical generators. Sunlight is reflected toward the receiver by mirrored devices called heliostats. Heliostats, sometimes numbering in the tens of thousands, are organized into fields around a tower that holds the receiver at the appropriate height above the ground. Each heliostat tracks the movement of the sun and orients its mirror to redirect the sunlight to the central receiver.

Figure 2 - View of Solucar PS10 near Seville, Spain [3]

To create the optimum layout for a solar thermal power plant, the individual contribution of each heliostat must be calculated to ensure that as little energy (sunlight) as possible is wasted. An application like this one is used to generate a preliminary design for the initial layout. The design is preliminary because effectiveness is calculated for only a minority of the heliostats.

Figure 1 - A cell with the representative heliostat (in red) surrounded by neighboring heliostats. Tietronix [1]

The field is expressed as a grid, with the receiver in one cell surrounded by heliostats in all the other cells. Each cell has one representative heliostat, which stands in for the actual heliostats that will be located in that section of the field, and a number (8 to 84) of neighboring heliostats. The effectiveness of the representative heliostat is limited by the shadows cast on it by its neighbors, as well as by the light it reflects towards the receiver being blocked by one or more of its neighbors. The cells are treated as independent and do not affect each other. Ultimately the total effective area is calculated for the entire grid, giving an effectiveness rating for a given time and day. It is this total effective area that the user wants to maximize in order to get the best value out of the power plant.

Factors which affect the calculations for a field include the size and shape of the mirrors, the spacing between heliostats, how the heliostats are placed, the time of day, the day of the year, and the geographical location of the plant. This application lets the user develop an overall layout before proceeding to another application which calculates the effectiveness of every heliostat in the field based on how each heliostat is influenced by its neighbors.

2.3 Shading and blocking

A field of heliostats suffers losses caused by shading and blocking by neighboring heliostats. Shading occurs, typically at low sun angles, when a heliostat casts its shadow on another heliostat located behind it. Blocking occurs when a heliostat in front of another heliostat intercepts the sunlight that the rear heliostat reflects on its way to the receiver. The amount of sunlight reflected onto the central receiver depends on the total area of the heliostats that is neither shaded nor blocked. To optimize the energy generated by a solar thermal plant, the total area of the heliostats that is shaded or blocked must therefore be calculated. Figure 3 illustrates the concept of shading and blocking losses.
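The relationship between the unshaded, unblocked area and the effectiveness of a heliostat can be sketched in C as follows. This is only an illustration under stated assumptions: the names `Point2`, `polygon_area`, and `effective_fraction` are hypothetical and do not come from the Tietronix code, and the polygon that remains after clipping is assumed to be simple (not self-intersecting) and already expressed in the 2D plane of the mirror.

```c
#include <assert.h>
#include <stddef.h>

/* A 2D point in the plane of the representative heliostat's mirror.
 * (Hypothetical type, introduced for this sketch only.) */
typedef struct {
    double x;
    double y;
} Point2;

/* Area of a simple polygon via the shoelace formula.
 * Returns 0 for fewer than three vertices (for example, a fully
 * shaded mirror). The sign is dropped so either winding order works. */
double polygon_area(const Point2 *v, size_t n)
{
    if (n < 3)
        return 0.0;
    double twice_area = 0.0;
    for (size_t i = 0; i < n; ++i) {
        size_t j = (i + 1) % n;   /* next vertex, wrapping around */
        twice_area += v[i].x * v[j].y - v[j].x * v[i].y;
    }
    if (twice_area < 0.0)
        twice_area = -twice_area;
    return twice_area / 2.0;
}

/* Fraction of the mirror that still contributes, given the mirror
 * outline and the polygon left after the shaded and blocked regions
 * have been clipped away. */
double effective_fraction(const Point2 *mirror, size_t n_mirror,
                          const Point2 *remaining, size_t n_remaining)
{
    double total = polygon_area(mirror, n_mirror);
    assert(total > 0.0);          /* the mirror itself must have area */
    return polygon_area(remaining, n_remaining) / total;
}
```

For a 2 by 2 mirror of which one half remains after clipping, `effective_fraction` yields 0.5. Summing the remaining areas over all representative heliostats would give the total effective area described above.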

Figure 3 - Losses due to shading and blocking. [5]

3.0 Design and Implementation

Tietronix was interested in determining whether using CUDA could improve the runtime over their current implementation. Peter Armstrong of Tietronix provided the team with the algorithm and equations needed to calculate the layout (positions of the heliostats) of a solar thermal plant. The resulting data is then used as input for the clipping algorithm, which calculates the percentage of each heliostat's mirror that actively contributes to the working of the power plant. Currently Tietronix uses a code library called GPC (General Polygon Clipper) to handle clipping for their application and then calculates the percentages from the clipping results. CUDA does not allow function calls from the device (the GPU) to host (CPU) side functions such as GPC, so the team had to either call the clipping function from the host side, which would result in an application much like the current one, or implement the clipping functionality on the device. We chose to implement clipping on the device and started looking for a suitable algorithm. Initially we looked at porting the GPC code to run on CUDA, but decided against it for two reasons: first, the library is fairly large, with about 2500 lines of code; second, it makes heavy use of dynamic memory allocation, which CUDA does not support.

3.1 Technologies

The project requirements specified the use of CUDA to decrease the runtime of the current application. Tietronix also requested a Windows application, which led to the initial selection of Microsoft's Visual Studio 2008 as the IDE. Due to complications integrating the CUDA API with VS 2008, Visual Studio 2005 was chosen instead, as it provided the needed functionality and did not have issues with CUDA. The team tested the CUDA code on two Nvidia GPUs: an 8600M running on Windows XP Pro and an 8800 GTS running on 64-bit Windows Vista Ultimate.

CUDA is an extension to the C programming language that allows programmers to take advantage of the floating point calculating power of a modern Nvidia graphics processing unit (GPU). CUDA applications divide into two distinct parts: code which runs on the host (the CPU) and code which runs on the device (the GPU). Host side code can use the full range of C functionality plus a CUDA specific extension. Device side code is a subset of C extended with some device specific commands. Using CUDA, a developer can convert parts (or all) of an application to execute in parallel on the GPU and achieve a performance gain with relatively little work. CUDA manages all creation, maintenance, and destruction of threads, leaving the developer to focus on optimizing the application to use the available resources in the most efficient manner.

3.2 Architecture

Modern graphics processors can have hundreds of thread processors and are capable of processing thousands of threads concurrently. While individually these processors are slower and less capable than a CPU core, their sheer number allows the GPU to churn through a large number of floating point calculations in short order. Nvidia GPUs are also designed around very low cost threads, which makes switching between threads very cheap and helps boost the efficiency of the GPU. Nvidia GPUs are divided into multiple thread processors, each of which can run multiple threads concurrently. The number of threads and the number of thread processors vary from product to product, making it vital to tailor one's program to the exact model of GPU in order to achieve maximum efficiency.

3.3 Design

The project has two major parts. The first is a single-threaded C application used to implement the algorithms in a familiar environment for debugging and testing. By not introducing CUDA into the code, this version cut down on the number of unknowns presented by the project. The second is the multithreaded version implemented on Nvidia's CUDA platform. Since CUDA is an extension to C, some of the code from the single-threaded application could be copied directly into the CUDA kernels, thereby reducing the amount of new, unproven code in the new environment.

Clipping Algorithm

Because a large part of the computation time is taken by the clipping algorithm, the team initially set out to design a clipping algorithm tailored to the requirements. However, since the number of times shading and blocking can occur in a cell is not known in advance, the number of vertices that the polygon clipping code takes as input is also unknown, so a more general algorithm had to be chosen. The clipping algorithm we selected has already been implemented in C, but that code contains many functions that are unnecessary for our computation, so the algorithm had to be implemented again. The algorithm has been compared to Vatti's algorithm, a very widely used algorithm, and the comparison shows improved performance. [4]

CUDA Implementation

The CUDA implementation consists of one main function which runs on the host (CPU), two kernels, and a number of device side functions. Kernels are functions which run on the device (the graphics card, containing the GPU and the device memory) and are callable from the host. Kernels run asynchronously: once called, they return control immediately to the calling code on the host, which can then choose to do something else or wait for the kernel to finish. Device side functions (non-kernels) can only be called from the device.

The first kernel sets up the field and calculates where all the heliostats are, as well as their orientation with regard to the sun and the receiver. This allows us to determine the three dimensional positions of the mirrors' vertices, which can then be projected into the 2D plane of the representative heliostat's mirror to determine how much of the representative mirror is shaded and/or blocked by the neighboring heliostats.

A kernel is executed by device side threads, so when calling a kernel the caller specifies how many threads are to be started. Kernel one runs once per heliostat in the grid: the call tells CUDA that we want a grid of size m by n (matching the grid of the power plant field) and how many threads are to be run per cell in the grid (nine in our case, one for each heliostat in the cell). While this may not be the most efficient use of the available resources, it is a logical representation of the field in question, which allows for a simpler design. Furthermore, each cell in the grid is run as its own thread group which shares a section of memory. This allows us to share information between the threads in a cell, which in our case means that certain calculated values of the representative heliostat can be shared with the neighboring heliostats. In order to share, the threads need to synchronize, which in CUDA means inserting a synchronization point into the code that tells each thread of the group (cell) to wait until every thread has reached that point. This may be inefficient (it might be faster to simply recalculate the values in question for every heliostat), as it introduces a break in the parallel execution of the cell's threads, but this remains to be tested.
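The mapping from the m by n grid and the nine threads per cell to individual heliostats can be emulated sequentially in plain C, as in the sketch below. It is not the project's kernel: `heliostat_index`, `run_setup`, and the subtraction performed in phase 2 are hypothetical placeholders, chosen only to show where a value written by the representative heliostat would be read by its neighbors. In real CUDA device code, the loop variables would come from `blockIdx` and `threadIdx`, and the boundary between the two phases is where `__syncthreads()` would be called.

```c
#include <assert.h>
#include <stddef.h>

#define THREADS_PER_CELL 9   /* one per heliostat in a cell, as in the report */

/* Flat index of heliostat t in cell (row, col) of a field with n_cols
 * columns. In a kernel, row and col would come from blockIdx and t
 * from threadIdx. */
size_t heliostat_index(size_t row, size_t col, size_t n_cols, size_t t)
{
    assert(col < n_cols && t < THREADS_PER_CELL);
    return (row * n_cols + col) * THREADS_PER_CELL + t;
}

/* Sequential stand-in for one kernel launch over an m-by-n grid of
 * cells. The outer loops play the role of the thread groups, the
 * inner loop the role of the threads within a group. */
void run_setup(size_t m, size_t n, const float *input, float *output)
{
    for (size_t row = 0; row < m; ++row) {
        for (size_t col = 0; col < n; ++col) {
            /* Phase 1: the representative heliostat (thread 0)
             * publishes a value to the cell's shared memory. */
            float shared = input[heliostat_index(row, col, n, 0)];

            /* A kernel would synchronize the group's threads here. */

            /* Phase 2: every heliostat in the cell uses the shared
             * value (placeholder arithmetic). */
            for (size_t t = 0; t < THREADS_PER_CELL; ++t) {
                size_t i = heliostat_index(row, col, n, t);
                output[i] = input[i] - shared;
            }
        }
    }
}
```

Because cells are independent, the two outer loops carry no dependencies between iterations, which is what makes one thread group per cell a natural fit.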

The second kernel implements the clipping functionality. This kernel processes the cells in parallel, but the heliostats within each cell are processed sequentially because the output of the clipping function for one pairing (representative and neighboring heliostat) may be needed for the next pairing (if any). Currently the application is configured to first calculate the non-shaded area of the representative heliostats and then the non-blocked area. Calculating these two areas in parallel and then taking the intersection between them may be more efficient, but that is left for a later time.

The algorithm the team decided to implement uses dynamic memory allocation to build the polygons as it progresses. Since CUDA does not support dynamic memory allocation from the device side, we had to use a fixed size array in lieu of the dynamic allocations. This leads to inefficient use of memory, since the array must be sized to hold the maximum number of vertices we foresee for any resulting polygon. This may be acceptable for our project, but for a production system it would be a major issue that must be addressed. The algorithm also uses doubly linked lists to hold the polygons, and a vertex in one polygon may point to a vertex in another polygon (its neighbor). In our implementation this pointer is replaced by a simple search function, which is not as efficient as following a pointer.

Currently both kernels, as well as the device functions, perform their calculations in intermediate steps using local variables. This leads to a large number of variable declarations and initializations. CUDA is sensitive to how memory is used, so removing these temporary variables is likely to yield improvements. However, since debugging is not finished, they are still in the code.
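The fixed size array and the search that replaces the neighbor pointer can be illustrated with the following C sketch. It is a minimal reconstruction from the description above, not the team's code: `MAX_VERTICES`, the `id` field used to match the two copies of an intersection point, and the function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VERTICES 32   /* hypothetical upper bound, chosen for illustration */

typedef struct {
    float x, y;
    int   id;   /* assumed label shared by both copies of an intersection point */
} Vertex;

typedef struct {
    Vertex v[MAX_VERTICES];   /* fixed storage instead of dynamic allocation */
    size_t count;
} Polygon;

/* Insert vert at position index, shifting later vertices up by one.
 * Returns false when the array is full or the index is out of range. */
bool polygon_insert(Polygon *p, size_t index, Vertex vert)
{
    assert(p != NULL);
    if (p->count >= MAX_VERTICES || index > p->count)
        return false;
    for (size_t i = p->count; i > index; --i)
        p->v[i] = p->v[i - 1];
    p->v[index] = vert;
    p->count++;
    return true;
}

/* Linear search standing in for the neighbor pointer: returns the
 * index of the vertex in other carrying the given id, or -1. */
int polygon_find_neighbor(const Polygon *other, int id)
{
    assert(other != NULL);
    for (size_t i = 0; i < other->count; ++i) {
        if (other->v[i].id == id)
            return (int)i;
    }
    return -1;
}
```

Both operations take time proportional to the number of vertices, whereas the linked list with neighbor pointers performs them in constant time. With the small polygons produced by nine heliostats per cell, that cost may well be tolerable, but it grows with the number of neighbors.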

3.4 Implementation Issues

At the beginning of the project the team encountered several problems regarding CUDA. Only one team member had access to an Nvidia GPU at home. One other team member was able to install the CUDA API and run it in emulation mode (in which the CPU simulates the GPU, allowing for easier debugging). There were also issues with making the CUDA compiler work well with Visual Studio 2008, which prompted a switch to Visual Studio 2005.

As programming on the CUDA side commenced, it became apparent that the single precision offered by older CUDA capable hardware (which includes the devices available to the team) could not handle some of the vectors used in the application without truncation, which led to calculation errors throughout the application. Unfortunately, due to the unfinished nature of the application, we have not yet been able to determine the severity of these errors.

The clipping algorithms evaluated by the team made heavy use of dynamic memory allocation at runtime, which is not supported by CUDA. The team therefore had to redesign the chosen algorithm to use a fixed amount of memory based on the estimated maximum needed by the application.

4.0 Evaluation

Since the application is not complete, we are unable to determine whether converting it to use CUDA is worthwhile. It is the authors' guess that using CUDA is worthwhile, but we can offer no data to support this opinion. During our informal testing (for debugging purposes) the application has seemed responsive, with the runtime being under 20 seconds for every run, but so far we have only run the application to calculate shading. Including blocking, the effective area calculations, and a total area calculation would obviously affect the runtime, but we estimate that a total run can be performed in less than 60 seconds for a 10 by 10 grid with 9 heliostats per cell. Whether this would represent a satisfactory or worthwhile improvement to Tietronix is unknown.

It is now obvious that the group needed far better project management and communication, and the group leader takes full responsibility for the shortcomings in this area. Furthermore, the leader should have been far more proactive in ensuring that the project followed the planned timeline, and should not have accepted the excessive delays in various aspects of the project.

5.0 Conclusion

This project has shown that creating a massively multithreaded application to calculate the efficiency of a solar thermal power plant is possible. However, as we are still debugging the clipping part of the project, we can draw no formal conclusions. Using CUDA was easier than anticipated, even though the lack of dynamic memory allocation affected how we implemented the application. The team never moved on to optimizing the code for performance on the device since the code is incomplete, but it is our impression that this grid based application will not achieve as much of an improvement as an application which calculates the effectiveness of every heliostat in a field.

5.1 Further Work

To make the CUDA technologies work in a production environment, the application developed by team 5 must be altered from a cell oriented approach to an approach focusing on individual heliostats. While the application design would be almost identical to the one created for this project, the number of threads (heliostats) grouped together would depend on the actual hardware on which the application is to run, in order to optimize the usage of the available resources. The number of threads must be high enough to fully utilize the individual processors of the GPU, yet low enough that the kernel (CUDA method) can run on one thread processor.

The current implementation of the clipping algorithm relies on a simple fixed size array (with insertion at a specific index) to handle the polygons. This implementation should be replaced with a doubly linked list designed to work within a fixed size array. The array can either be sized to fit one polygon (and included for every cell), or one array can be designed to function as regular memory and hold all the vertices for all the polygons of the field. The array per cell is the easiest to implement. The single array acting as dynamic memory would probably use memory more efficiently (since not all polygons will be of the maximum size), but may not be efficient in CUDA due to how CUDA accesses device memory.
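One way the proposed doubly linked list inside a fixed size array could look is sketched below in C, using array indices in place of pointers. This follows the array per polygon variant and is an illustration only: `POOL_SIZE`, `Node`, `VertexList`, and the function names are hypothetical, and the sketch omits deletion and reuse of slots.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 16   /* hypothetical per-polygon capacity */
#define NIL (-1)       /* marks "no node" */

typedef struct {
    float x, y;
    int   prev, next;  /* indices into the pool instead of pointers */
} Node;

typedef struct {
    Node nodes[POOL_SIZE];
    int  used;         /* number of slots handed out so far */
    int  head;         /* index of the first vertex, or NIL when empty */
} VertexList;

void list_init(VertexList *l)
{
    l->used = 0;
    l->head = NIL;
}

/* Insert a vertex after node `after` in the circular list, or as the
 * first vertex when the list is empty (`after` is then ignored).
 * Returns the index of the new node, or NIL when the pool is full. */
int list_insert_after(VertexList *l, int after, float x, float y)
{
    if (l->used >= POOL_SIZE)
        return NIL;
    int idx = l->used++;
    Node *n = &l->nodes[idx];
    n->x = x;
    n->y = y;
    if (l->head == NIL) {
        n->prev = idx;             /* a single node links to itself */
        n->next = idx;
        l->head = idx;
    } else {
        assert(after >= 0 && after < idx);
        int nxt = l->nodes[after].next;
        n->prev = after;
        n->next = nxt;
        l->nodes[after].next = idx;
        l->nodes[nxt].prev = idx;
    }
    return idx;
}
```

Compared with insertion at an index in a plain array, no vertices are shifted, so inserting an intersection point between two existing vertices takes constant time. A neighbor link to a vertex of another polygon could be stored as one more index field in the same way.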

References

1. Armstrong, P. An Algorithm for Shading and Blocking Computations of a Field of Heliostats Arranged in a Grid Layout. Available from Tietronix Software, Inc.; received February
2. Greiner, G. and Hormann, K. Efficient clipping of arbitrary polygons. ACM Trans. Graph. 17, 2 (Apr. 1998).
3. PS10 solar power tower 2.jpg. Retrieved from Wikipedia.org on April 23rd,
4. Greiner, G. and Hormann, K. Efficient Clipping of Arbitrary Polygons. Retrieved May 4th,
5. Thathireddy, K., Garre, S., Khorsand, S., Nandigam, T. Solar Thermal Plant Design and Operation Suite of Tools. UHCL Capstone Project. May 5th, Retrieved May 4th,

Appendices

A: Project Management and Team Information

Roles:

Application Design: Design efficient CUDA code for the solar thermal plant computation.
CUDA Programmer: Adapt the C code to use CUDA.
C Programmer: Implement the existing solar thermal plant computation algorithm and the chosen polygon clipping algorithm in C.
Research on clipping algorithms: Find or design a polygon clipping algorithm that is more effective. The polygon clipping algorithm used in our design comes from the paper "Efficient clipping of arbitrary polygons" by Gunther Greiner and Kai Hormann. This algorithm was chosen because it is relatively more efficient than the more commonly used Sutherland-Hodgman algorithm. The data structures used for the polygons are very simple.
Website Maintenance: Design and update the capstone website regularly.
Minutes and agendas: Write the minutes and agendas for all the team and mentor meetings.
Technical writing: Write the technical report.

B: Major tasks and contributions

Application Design: 50% Pranav and 50% Claus
CUDA Programmer: 100% Claus Nilsson
C Programmer: 100% Pranav Mantini
Research on clipping algorithms: 40% Arun, 20% Pranav, and 40% Sahithi
Website Maintenance: 50% Pranav, 20% Claus, 15% Sahithi, 15% Arun
Minutes and agendas: 100% Sahithi Chalasani
Technical writing: 50% Claus, 25% Pranav, 20% Sahithi, 5% Arun

C: Code comments

The code included on the accompanying disk comes as two Visual Studio 2005 projects, one for 64-bit systems and the other for 32-bit systems. The code is the same for both projects, but the configuration of each project requires that the matching 64-bit or 32-bit version of the CUDA API (both are available from Nvidia) be installed prior to compilation. A CUDA enabled driver must also be installed for the Nvidia graphics processor on the system. Please see for more information.

The project code currently contains a large number of printf statements used for debugging, and therefore the project may not compile in regular debug mode. Please use EmuDebug instead, which allows output directly from kernels and device side functions.

The code currently does not call the clipping kernel, because testing revealed a bug which made the clipping function go into an infinite loop. We currently believe the cause is a malformed polygon that results when too many neighboring heliostats shade the representative heliostat. More experiments are needed to determine the exact bug and fix it.

D: Schedule


Scientific discovery, analysis and prediction made possible through high performance computing. Scientific discovery, analysis and prediction made possible through high performance computing. An Introduction to GPGPU Programming Bob Torgerson Arctic Region Supercomputing Center November 21 st, 2013

More information

Operating- System Structures

Operating- System Structures Operating- System Structures 2 CHAPTER Practice Exercises 2.1 What is the purpose of system calls? Answer: System calls allow user-level processes to request services of the operating system. 2.2 What

More information

OpenACC Course. Office Hour #2 Q&A

OpenACC Course. Office Hour #2 Q&A OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle

More information

PROCESSES AND THREADS THREADING MODELS. CS124 Operating Systems Winter , Lecture 8

PROCESSES AND THREADS THREADING MODELS. CS124 Operating Systems Winter , Lecture 8 PROCESSES AND THREADS THREADING MODELS CS124 Operating Systems Winter 2016-2017, Lecture 8 2 Processes and Threads As previously described, processes have one sequential thread of execution Increasingly,

More information

ULTIMATE Grass & Meadows Worldbuilder

ULTIMATE Grass & Meadows Worldbuilder ULTIMATE Grass & Meadows Worldbuilder USER GUIDE Welcome! First, I want to say, this product is simpler than it may initially seem. Everything should make sense to you fairly readily if you use the Content

More information

CUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions.

CUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions. Schedule CUDA Digging further into the programming manual Application Programming Interface (API) text only part, sorry Image utilities (simple CUDA examples) Performace considerations Matrix multiplication

More information

Parallelism. Parallel Hardware. Introduction to Computer Systems

Parallelism. Parallel Hardware. Introduction to Computer Systems Parallelism We have been discussing the abstractions and implementations that make up an individual computer system in considerable detail up to this point. Our model has been a largely sequential one,

More information

COMP 605: Introduction to Parallel Computing Lecture : GPU Architecture

COMP 605: Introduction to Parallel Computing Lecture : GPU Architecture COMP 605: Introduction to Parallel Computing Lecture : GPU Architecture Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State University (SDSU) Posted:

More information

ArcGIS Runtime: Maximizing Performance of Your Apps. Will Jarvis and Ralf Gottschalk

ArcGIS Runtime: Maximizing Performance of Your Apps. Will Jarvis and Ralf Gottschalk ArcGIS Runtime: Maximizing Performance of Your Apps Will Jarvis and Ralf Gottschalk Agenda ArcGIS Runtime Version 100.0 Architecture How do we measure performance? We will use our internal Runtime Core

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog RSX Best Practices Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog RSX Best Practices About libgcm Using the SPUs with the RSX Brief overview of GCM Replay December 7 th, 2004

More information

Technical Briefing. The TAOS Operating System: An introduction. October 1994

Technical Briefing. The TAOS Operating System: An introduction. October 1994 Technical Briefing The TAOS Operating System: An introduction October 1994 Disclaimer: Provided for information only. This does not imply Acorn has any intention or contract to use or sell any products

More information

ECE 574 Cluster Computing Lecture 15

ECE 574 Cluster Computing Lecture 15 ECE 574 Cluster Computing Lecture 15 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 30 March 2017 HW#7 (MPI) posted. Project topics due. Update on the PAPI paper Announcements

More information

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc. Chapter 1 GETTING STARTED SYS-ED/ Computer Education Techniques, Inc. Objectives You will learn: Java platform. Applets and applications. Java programming language: facilities and foundation. Memory management

More information

Chapter 3: Towards the Simplex Method for Efficient Solution of Linear Programs

Chapter 3: Towards the Simplex Method for Efficient Solution of Linear Programs Chapter 3: Towards the Simplex Method for Efficient Solution of Linear Programs The simplex method, invented by George Dantzig in 1947, is the basic workhorse for solving linear programs, even today. While

More information

Parallel Execution of Kahn Process Networks in the GPU

Parallel Execution of Kahn Process Networks in the GPU Parallel Execution of Kahn Process Networks in the GPU Keith J. Winstein keithw@mit.edu Abstract Modern video cards perform data-parallel operations extremely quickly, but there has been less work toward

More information

Open Packet Processing Acceleration Nuzzo, Craig,

Open Packet Processing Acceleration Nuzzo, Craig, Open Packet Processing Acceleration Nuzzo, Craig, cnuzz2@uis.edu Summary The amount of data in our world is growing rapidly, this is obvious. However, the behind the scenes impacts of this growth may not

More information

Chapter 1. Numeric Artifacts. 1.1 Introduction

Chapter 1. Numeric Artifacts. 1.1 Introduction Chapter 1 Numeric Artifacts 1.1 Introduction Virtually all solutions to problems in electromagnetics require the use of a computer. Even when an analytic or closed form solution is available which is nominally

More information

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Executive Summary The NVIDIA Quadro2 line of workstation graphics solutions is the first of its kind to feature hardware support for

More information

Movie: For The Birds. Announcements. Ray Tracing 1. Programming 2 Recap. Programming 3 Info Test data for part 1 (Lines) is available

Movie: For The Birds. Announcements. Ray Tracing 1. Programming 2 Recap. Programming 3 Info Test data for part 1 (Lines) is available Now Playing: Movie: For The Birds Pixar, 2000 Liar Built To Spill from You In Reverse Released April 11, 2006 Ray Tracing 1 Rick Skarbez, Instructor COMP 575 November 1, 2007 Announcements Programming

More information

Six Billion Dollar Team

Six Billion Dollar Team Six Billion Dollar Team Dominick Condoleo Brennan Kimura Kris Macoskey June 15, 2012 1 Abstract The purpose of this project was to take already developed code written for Design Optimization Models and

More information

AES Cryptosystem Acceleration Using Graphics Processing Units. Ethan Willoner Supervisors: Dr. Ramon Lawrence, Scott Fazackerley

AES Cryptosystem Acceleration Using Graphics Processing Units. Ethan Willoner Supervisors: Dr. Ramon Lawrence, Scott Fazackerley AES Cryptosystem Acceleration Using Graphics Processing Units Ethan Willoner Supervisors: Dr. Ramon Lawrence, Scott Fazackerley Overview Introduction Compute Unified Device Architecture (CUDA) Advanced

More information

Visual Design Flows for Faster Debug and Time to Market FlowTracer White Paper

Visual Design Flows for Faster Debug and Time to Market FlowTracer White Paper Visual Design Flows for Faster Debug and Time to Market FlowTracer White Paper 2560 Mission College Blvd., Suite 130 Santa Clara, CA 95054 (408) 492-0940 Introduction As System-on-Chip (SoC) designs have

More information

The modularity requirement

The modularity requirement The modularity requirement The obvious complexity of an OS and the inherent difficulty of its design lead to quite a few problems: an OS is often not completed on time; It often comes with quite a few

More information

Chapter 4: Multithreaded Programming

Chapter 4: Multithreaded Programming Chapter 4: Multithreaded Programming Silberschatz, Galvin and Gagne 2013! Chapter 4: Multithreaded Programming Overview Multicore Programming Multithreading Models Threading Issues Operating System Examples

More information

NVIDIA s Compute Unified Device Architecture (CUDA)

NVIDIA s Compute Unified Device Architecture (CUDA) NVIDIA s Compute Unified Device Architecture (CUDA) Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability 1 History of GPU

More information

NVIDIA s Compute Unified Device Architecture (CUDA)

NVIDIA s Compute Unified Device Architecture (CUDA) NVIDIA s Compute Unified Device Architecture (CUDA) Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability History of GPU

More information

Understanding Geospatial Data Models

Understanding Geospatial Data Models Understanding Geospatial Data Models 1 A geospatial data model is a formal means of representing spatially referenced information. It is a simplified view of physical entities and a conceptualization of

More information

Principles of Architectural and Environmental Design EARC 2417 Lecture 2 Forms

Principles of Architectural and Environmental Design EARC 2417 Lecture 2 Forms Islamic University-Gaza Faculty of Engineering Architecture Department Principles of Architectural and Environmental Design EARC 2417 Lecture 2 Forms Instructor: Dr. Suheir Ammar 2016 1 FORMS ELEMENTS

More information

Up and Running Software The Development Process

Up and Running Software The Development Process Up and Running Software The Development Process Success Determination, Adaptative Processes, and a Baseline Approach About This Document: Thank you for requesting more information about Up and Running

More information

Operating Systems 2 nd semester 2016/2017. Chapter 4: Threads

Operating Systems 2 nd semester 2016/2017. Chapter 4: Threads Operating Systems 2 nd semester 2016/2017 Chapter 4: Threads Mohamed B. Abubaker Palestine Technical College Deir El-Balah Note: Adapted from the resources of textbox Operating System Concepts, 9 th edition

More information

Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010

Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 1 Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 Presentation by Henrik H. Knutsen for TDT24, fall 2012 Om du ønsker, kan du sette inn navn, tittel på foredraget, o.l.

More information

UBC: An Efficient Unified I/O and Memory Caching Subsystem for NetBSD

UBC: An Efficient Unified I/O and Memory Caching Subsystem for NetBSD UBC: An Efficient Unified I/O and Memory Caching Subsystem for NetBSD Chuck Silvers The NetBSD Project chuq@chuq.com, http://www.netbsd.org/ Abstract This paper introduces UBC ( Unified Buffer Cache ),

More information

Threads. Computer Systems. 5/12/2009 cse threads Perkins, DW Johnson and University of Washington 1

Threads. Computer Systems.   5/12/2009 cse threads Perkins, DW Johnson and University of Washington 1 Threads CSE 410, Spring 2009 Computer Systems http://www.cs.washington.edu/410 5/12/2009 cse410-20-threads 2006-09 Perkins, DW Johnson and University of Washington 1 Reading and References Reading» Read

More information

Popularity of Twitter Accounts: PageRank on a Social Network

Popularity of Twitter Accounts: PageRank on a Social Network Popularity of Twitter Accounts: PageRank on a Social Network A.D-A December 8, 2017 1 Problem Statement Twitter is a social networking service, where users can create and interact with 140 character messages,

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied

More information

CSC 2405: Computer Systems II

CSC 2405: Computer Systems II CSC 2405: Computer Systems II Dr. Mirela Damian http://www.csc.villanova.edu/~mdamian/csc2405/ Spring 2016 Course Goals: Look under the hood Help you learn what happens under the hood of computer systems

More information

What s in a process?

What s in a process? CSE 451: Operating Systems Winter 2015 Module 5 Threads Mark Zbikowski mzbik@cs.washington.edu Allen Center 476 2013 Gribble, Lazowska, Levy, Zahorjan What s in a process? A process consists of (at least):

More information

Chapter 17: The Truth about Normals

Chapter 17: The Truth about Normals Chapter 17: The Truth about Normals What are Normals? When I first started with Blender I read about normals everywhere, but all I knew about them was: If there are weird black spots on your object, go

More information

OpenACC 2.6 Proposed Features

OpenACC 2.6 Proposed Features OpenACC 2.6 Proposed Features OpenACC.org June, 2017 1 Introduction This document summarizes features and changes being proposed for the next version of the OpenACC Application Programming Interface, tentatively

More information

Optimizing Your Android Applications

Optimizing Your Android Applications Optimizing Your Android Applications Alexander Nelson November 27th, 2017 University of Arkansas - Department of Computer Science and Computer Engineering The Problem Reminder Immediacy and responsiveness

More information

High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore

High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No # 09 Lecture No # 40 This is lecture forty of the course on

More information

A PROPOSED METHOD FOR GENERATING,STORING AND MANAGING LARGE AMOUNTS OF MODELLING DATA USING SCRIPTS AND ON-LINE DATABASES

A PROPOSED METHOD FOR GENERATING,STORING AND MANAGING LARGE AMOUNTS OF MODELLING DATA USING SCRIPTS AND ON-LINE DATABASES Ninth International IBPSA Conference Montréal, Canada August 15-18, 2005 A PROPOSED METHOD FOR GENERATING,STORING AND MANAGING LARGE AMOUNTS OF MODELLING DATA USING SCRIPTS AND ON-LINE DATABASES Spyros

More information

CECOS University Department of Electrical Engineering. Wave Propagation and Antennas LAB # 1

CECOS University Department of Electrical Engineering. Wave Propagation and Antennas LAB # 1 CECOS University Department of Electrical Engineering Wave Propagation and Antennas LAB # 1 Introduction to HFSS 3D Modeling, Properties, Commands & Attributes Lab Instructor: Amjad Iqbal 1. What is HFSS?

More information

Introduction to Multicore Programming

Introduction to Multicore Programming Introduction to Multicore Programming Minsoo Ryu Department of Computer Science and Engineering 2 1 Multithreaded Programming 2 Automatic Parallelization and OpenMP 3 GPGPU 2 Multithreaded Programming

More information

OpenACC. Arthur Lei, Michelle Munteanu, Michael Papadopoulos, Philip Smith

OpenACC. Arthur Lei, Michelle Munteanu, Michael Papadopoulos, Philip Smith OpenACC Arthur Lei, Michelle Munteanu, Michael Papadopoulos, Philip Smith 1 Introduction For this introduction, we are assuming you are familiar with libraries that use a pragma directive based structure,

More information

CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS

CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS 1 Last time Each block is assigned to and executed on a single streaming multiprocessor (SM). Threads execute in groups of 32 called warps. Threads in

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming Pablo Brubeck Department of Physics Tecnologico de Monterrey October 14, 2016 Student Chapter Tecnológico de Monterrey Tecnológico de Monterrey Student Chapter Outline

More information

Opening Microsoft Visual Studio. On Microsoft Windows Vista and XP to open the visual studio do the following:

Opening Microsoft Visual Studio. On Microsoft Windows Vista and XP to open the visual studio do the following: If you are a beginner on Microsoft Visual Studio 2008 then you will at first find that this powerful program is not that easy to use for a beginner this is the aim of this tutorial. I hope that it helps

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2004 Lecture 18: Naming, Directories, and File Caching 18.0 Main Points How do users name files? What is a name? Lookup:

More information

CS179 GPU Programming Introduction to CUDA. Lecture originally by Luke Durant and Tamas Szalay

CS179 GPU Programming Introduction to CUDA. Lecture originally by Luke Durant and Tamas Szalay Introduction to CUDA Lecture originally by Luke Durant and Tamas Szalay Today CUDA - Why CUDA? - Overview of CUDA architecture - Dense matrix multiplication with CUDA 2 Shader GPGPU - Before current generation,

More information

Physically-Based Laser Simulation

Physically-Based Laser Simulation Physically-Based Laser Simulation Greg Reshko Carnegie Mellon University reshko@cs.cmu.edu Dave Mowatt Carnegie Mellon University dmowatt@andrew.cmu.edu Abstract In this paper, we describe our work on

More information

Flash Drive Emulation

Flash Drive Emulation Flash Drive Emulation Eric Aderhold & Blayne Field aderhold@cs.wisc.edu & bfield@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison Abstract Flash drives are becoming increasingly

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2002 Lecture 18: Naming, Directories, and File Caching 18.0 Main Points How do users name files? What is a name? Lookup:

More information

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function Chen-Ting Chang, Yu-Sheng Chen, I-Wei Wu, and Jyh-Jiun Shann Dept. of Computer Science, National Chiao

More information

Big Mathematical Ideas and Understandings

Big Mathematical Ideas and Understandings Big Mathematical Ideas and Understandings A Big Idea is a statement of an idea that is central to the learning of mathematics, one that links numerous mathematical understandings into a coherent whole.

More information

Using SYCL as an Implementation Framework for HPX.Compute

Using SYCL as an Implementation Framework for HPX.Compute Using SYCL as an Implementation Framework for HPX.Compute Marcin Copik 1 Hartmut Kaiser 2 1 RWTH Aachen University mcopik@gmail.com 2 Louisiana State University Center for Computation and Technology The

More information

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology Point Cloud Filtering using Ray Casting by Eric Jensen 01 The Basic Methodology Ray tracing in standard graphics study is a method of following the path of a photon from the light source to the camera,

More information

1 Hardware virtualization for shading languages Group Technical Proposal

1 Hardware virtualization for shading languages Group Technical Proposal 1 Hardware virtualization for shading languages Group Technical Proposal Executive Summary The fast processing speed and large memory bandwidth of the modern graphics processing unit (GPU) will make it

More information

Utilizing Linux Kernel Components in K42 K42 Team modified October 2001

Utilizing Linux Kernel Components in K42 K42 Team modified October 2001 K42 Team modified October 2001 This paper discusses how K42 uses Linux-kernel components to support a wide range of hardware, a full-featured TCP/IP stack and Linux file-systems. An examination of the

More information

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019 CS 31: Introduction to Computer Systems 22-23: Threads & Synchronization April 16-18, 2019 Making Programs Run Faster We all like how fast computers are In the old days (1980 s - 2005): Algorithm too slow?

More information

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation Why are threads useful? How does one use POSIX pthreads? Michael Swift 1 2 What s in a process? Organizing a Process A process

More information

Final Examination CS 111, Fall 2016 UCLA. Name:

Final Examination CS 111, Fall 2016 UCLA. Name: Final Examination CS 111, Fall 2016 UCLA Name: This is an open book, open note test. You may use electronic devices to take the test, but may not access the network during the test. You have three hours

More information

Faster Simulations of the National Airspace System

Faster Simulations of the National Airspace System Faster Simulations of the National Airspace System PK Menon Monish Tandale Sandy Wiraatmadja Optimal Synthesis Inc. Joseph Rios NASA Ames Research Center NVIDIA GPU Technology Conference 2010, San Jose,

More information

CUDA Programming Model

CUDA Programming Model CUDA Xing Zeng, Dongyue Mou Introduction Example Pro & Contra Trend Introduction Example Pro & Contra Trend Introduction What is CUDA? - Compute Unified Device Architecture. - A powerful parallel programming

More information

Shadows in the graphics pipeline

Shadows in the graphics pipeline Shadows in the graphics pipeline Steve Marschner Cornell University CS 569 Spring 2008, 19 February There are a number of visual cues that help let the viewer know about the 3D relationships between objects

More information

OpenCL. Matt Sellitto Dana Schaa Northeastern University NUCAR

OpenCL. Matt Sellitto Dana Schaa Northeastern University NUCAR OpenCL Matt Sellitto Dana Schaa Northeastern University NUCAR OpenCL Architecture Parallel computing for heterogenous devices CPUs, GPUs, other processors (Cell, DSPs, etc) Portable accelerated code Defined

More information

10/10/ Gribble, Lazowska, Levy, Zahorjan 2. 10/10/ Gribble, Lazowska, Levy, Zahorjan 4

10/10/ Gribble, Lazowska, Levy, Zahorjan 2. 10/10/ Gribble, Lazowska, Levy, Zahorjan 4 What s in a process? CSE 451: Operating Systems Autumn 2010 Module 5 Threads Ed Lazowska lazowska@cs.washington.edu Allen Center 570 A process consists of (at least): An, containing the code (instructions)

More information

Lecture 13: OpenGL Shading Language (GLSL)

Lecture 13: OpenGL Shading Language (GLSL) Lecture 13: OpenGL Shading Language (GLSL) COMP 175: Computer Graphics April 18, 2018 1/56 Motivation } Last week, we discussed the many of the new tricks in Graphics require low-level access to the Graphics

More information

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Michael Boyer, David Tarjan, Scott T. Acton, and Kevin Skadron University of Virginia IPDPS 2009 Outline Leukocyte

More information

Implementation of Parallel Path Finding in a Shared Memory Architecture

Implementation of Parallel Path Finding in a Shared Memory Architecture Implementation of Parallel Path Finding in a Shared Memory Architecture David Cohen and Matthew Dallas Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180 Email: {cohend4, dallam}

More information

THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS

THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS Computer Science 14 (4) 2013 http://dx.doi.org/10.7494/csci.2013.14.4.679 Dominik Żurek Marcin Pietroń Maciej Wielgosz Kazimierz Wiatr THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT

More information

Graphic Design & Digital Photography. Photoshop Basics: Working With Selection.

Graphic Design & Digital Photography. Photoshop Basics: Working With Selection. 1 Graphic Design & Digital Photography Photoshop Basics: Working With Selection. What You ll Learn: Make specific areas of an image active using selection tools, reposition a selection marquee, move and

More information