Dynamic Cuda with F# HPC GPU & F# Meetup. March 19. San Jose, California

Size: px

Start display at page:

Download "Dynamic Cuda with F# HPC GPU & F# Meetup. March 19. San Jose, California"

Gilbert Andrews
5 years ago
Views:

1 Dynamic Cuda with F# HPC GPU & F# Meetup March 19 San Jose, California Dr. Daniel Egloff

2 About Us! Software development and consulting company! Based in Zurich! Core competence! Quantitative finance and risk management! Derivative pricing and modeling! Numerical computing! High performance computing (clusters, grid, GPUs)! Software engineering (C++, F#, Scala, )! Early adopter of GPUs! First project with GPUs in finance back in 2007

3 Situation Compute intensive problems GPGPU Big data and information rich applications Cloud and distributed computing

4 Problems Recurring challenges in our HPC projects Slow progress because of tools and complexity Algorithms change often Critical algorithms are not designed with GPUs in mind Efficient GPU code is difficult to develop Implications? Availability of GPU hardware C++ adds unwanted level of complexity Interoperation and wrapper code Hard to test correctness of algorithms

Implications Negative impact on the usability and acceptance of new technology Unpredictable project duration and costs Moving project targets Rewrite of critical numerical code

5 Implications Negative impact on the usability and acceptance of new technology Unpredictable project duration and costs Moving project targets Rewrite of critical numerical code Static and inflexible solutions Implications Delays because of missing hardware Optimized code hard to maintain and extend Maintenance of wrapper code Maintenance of large body of test code

6 Needs What would be needed and how can we improve? Better and more flexible tools to make CUDA more accessible Using modern programming languages and concepts Solid framework compatible with CUDA programming model Support iterative development and rapid prototyping Building on top of modern technologies like.net or the JVM Generated CUDA code at par with compiled CUDA C/C++ CUDA programming model This is where comes into play

7 What is?

Alea.cuBase! Complete solution to develop CUDA accelerated GPU applications in.net! Relies on F# and in particular code quotation! Based on LLVM and CUDA 5 technology!

8 Alea.cuBase! Complete solution to develop CUDA accelerated GPU applications in.net! Relies on F# and in particular code quotation! Based on LLVM and CUDA 5 technology! Fully F# based, no additional changes or extensions to the F# language, no <<< >>>! No wrappers, no post build process to transform IL code Dynamic code generation GPU algorithm scripting Industry grade performance Rapid development Solid framework for reusability Advanced CUDA programming

9 Benefits Dynamic code generation! Generate GPU code programmatically at run-time! Use.NET generics and F# code quotation splicing for flexible kernels! Foundation to develop GPU aware domain specific languages

10 Benefits Rapid development! Easy and quick setup of development environment, no need to install NVIDIA nvcc compiler tools! Rapid prototyping in F# interactive! Iteratively improve CUDA kernel algorithms without time consuming build cycles

11 Benefits GPU algorithm scripting! Execute F# scripts with GPU algorithms on command line or in F# interactive! GPU scripting in Excel! Integrate Alea.cuBase directly with Python

12 Benefits Solid framework for reusability! Framework for type-safe definition of GPU resources! CUDA monad to specify GPU resources together with launch logic in unified manner! Reuse GPU kernel code and compose them to modular GPU kernel libraries cuda { kernel_a launch logic } cuda { kernel_b launch logic }

Benefits Industry grade performance! Generating performance optimized code which is on par with compiled CUDA C/C++ code! Low level device functions and special math functions!

13 Benefits Industry grade performance! Generating performance optimized code which is on par with compiled CUDA C/C++ code! Low level device functions and special math functions! Built in occupancy calculator to identify optimal thread block layout Segmented Scan by Key Alea.cuExtension against CUDA Thrust % % % % % % int32 float32 float % % 0.00%

14 Benefits Advanced CUDA programming! Support for texture, constant and shared memory! Pointer operations to partition array data! Special pointer types such as volatile pointers! Runtime compilation control e.g. fast math! Multiple streams! Thread safe use of multiple GPUs! Inline PTX assembly instructions

15 Alea Ecosystem CUDA C Ecosystem Alea Ecosystem User Applications Thrust CUDPP Alea.cuExtension CUDA Runtime API Alea.cuBase CUDA Driver API

16 Advantages over CUDA C/C++! Faster development! More productive tools, type inference, Intellisense support! Cleaner more expressive code results in fewer bugs, less testing and less debugging! Removing unnecessary complexity! No template meta programming or preprocessor tricks needed! More flexibility! Dynamic code generation and scripting is entirely missing in CUDA C/C++! Easier to reuse and compose GPU algorithm! More transparency! GPU resources are defined where needed! Launch logic defines thread layout and data requirement more transparent! Direct access to other valuable F# and.net technologies! Seamless integration into.net, without any interoperation layer! Monads aka computational expressions! Async workflows, agents, parallel programming, type providers! Uniform.NET type system

17 How does work?

18 Development Process! Four steps to a CUDA kernel with F# and Alea.cuBase cuda { constant array texture kernel... launch logic } PTemplate 1 NVVM IR module 3 CUDA module 4 2 PTX module Launch function CUDA device CUDA context CUDA cubin module PModule Device worker Comilation process

19 How to program with?

20 Live Coding! Basic kernel programming! Excel GPU scripting with Alea.cuBase and Alea.cuExtension, in Tsunami IDE and FCell! Excel based Monte Carlo simulation! PDE solver for 2d heat equation in GPU

21 How did F# pay off?! F# is used in multiple ways! Internally to build our CUDA compiler with quotations! As hosting language to define an internal DSL for CUDA, i.e. the CUDA monad! Use extensibility of F# to build more DSL such as the pcalc monad! Benefit of using F#! Rich ecosystem first class language in.net! Functional fewer and more descriptive code! Monads ideal to create DSLs, seq, async, F# linq! Type providers easier to integrate into information rich programming! Quotations lot of flexibility for DSL or compiler implementation! Extensibility pure solution for CUDA programming

22 More Information on F#! F# is a functional-first programming language! Strongly typed! First class.net language! Open source! Designed to solve complex computing problems with simple, maintainable robust code! Runs on Windows, Linux, Mac OS, HTML5 and also on GPUs!

23 Thank you

F# for Industrial Applications Worth a Try?

F# for Industrial Applications Worth a Try? Jazoon TechDays 2015 Dr. Daniel Egloff Managing Director Microsoft MVP daniel.egloff@quantalea.net October 23, 2015 Who are we Software and solution provider