Speeding up MATLAB Applications Sean de Wolski Application Engineer

Similar documents
Optimizing and Accelerating Your MATLAB Code

Mit MATLAB auf der Überholspur Methoden zur Beschleunigung von MATLAB Anwendungen

Speeding up MATLAB Applications The MathWorks, Inc.

Parallel and Distributed Computing with MATLAB The MathWorks, Inc. 1

Parallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer

Mit MATLAB auf der Überholspur Methoden zur Beschleunigung von MATLAB Anwendungen

Multicore Computer, GPU 및 Cluster 환경에서의 MATLAB Parallel Computing 기능

Accelerating System Simulations

Modeling a 4G LTE System in MATLAB

Getting Started with MATLAB Francesca Perino

Parallel Computing with MATLAB

개발과정에서의 MATLAB 과 C 의연동 ( 영상처리분야 )

High Performance and GPU Computing in MATLAB

MATLAB to iphone Made Easy

Large Data in MATLAB: A Seismic Data Processing Case Study U. M. Sundar Senior Application Engineer

2^48 - keine Angst vor großen Datensätzen in MATLAB

Scaling MATLAB. for Your Organisation and Beyond. Rory Adams The MathWorks, Inc. 1

Modeling a 4G LTE System in MATLAB

INTRODUCTION TO MATLAB PARALLEL COMPUTING TOOLBOX

컴퓨터비전의최신기술 : Deep Learning, 3D Vision and Embedded Vision

Integrate MATLAB Analytics into Enterprise Applications

MATLAB AND PARALLEL COMPUTING

Parallel Computing with MATLAB

Parallel Computing with MATLAB

Deep learning in MATLAB From Concept to CUDA Code

HPC with Multicore and GPUs

2015 The MathWorks, Inc. 1

Using Parallel Computing Toolbox to accelerate the Video and Image Processing Speed. Develop parallel code interactively

Moving MATLAB Algorithms into Complete Designs with Fixed-Point Simulation and Code Generation

Technical Computing with MATLAB

Scaling up MATLAB Analytics Marta Wilczkowiak, PhD Senior Applications Engineer MathWorks

Integrate MATLAB Analytics into Enterprise Applications

Daniel D. Warner. May 31, Introduction to Parallel Matlab. Daniel D. Warner. Introduction. Matlab s 5-fold way. Basic Matlab Example

Parallel Computing with Matlab and R

2015 The MathWorks, Inc. 1

MATLAB Parallel Computing Toolbox Benchmark for an Embarrassingly Parallel Application

Implementation of the finite-difference method for solving Maxwell`s equations in MATLAB language on a GPU

Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA

Introduction to C and HDL Code Generation from MATLAB

Model-Based Design: Design with Simulation in Simulink

Using Intel Math Kernel Library with MathWorks* MATLAB* on Intel Xeon Phi Coprocessor System

Technical Report TR

MATLAB Based Optimization Techniques and Parallel Computing

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G

What s New in MATLAB and Simulink The MathWorks, Inc. 1

General Purpose GPU Computing in Partial Wave Analysis

MATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2

System Requirements & Platform Availability by Product for R2016b

Introduction to MATLAB

Parallel Processing Tool-box

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI

Introduction to Matlab GPU Acceleration for. Computational Finance. Chuan- Hsiang Han 1. Section 1: Introduction

Integrate MATLAB Analytics into Enterprise Applications

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)

Parallel MATLAB at VT

Experiment 6 SIMULINK

Georgia Institute of Technology Center for Signal and Image Processing Steve Conover February 2009

Issues In Implementing The Primal-Dual Method for SDP. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM

Accelerated Machine Learning Algorithms in Python

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

High-Performance Data Loading and Augmentation for Deep Neural Network Training

Computer Caches. Lab 1. Caching

The Use of Computing Clusters and Automatic Code Generation to Speed Up Simulation Tasks

Optimized Scientific Computing:

Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations

A Multi-Tiered Optimization Framework for Heterogeneous Computing

Parallel Computing with MATLAB on Discovery Cluster

Parallel Computing with MATLAB

Sharing and Deploying MATLAB Programs Sundar Umamaheshwaran Amit Doshi Application Engineer-Technical Computing

Technology for a better society. hetcomp.com

Approaches to acceleration: GPUs vs Intel MIC. Fabio AFFINITO SCAI department

Speedup Altair RADIOSS Solvers Using NVIDIA GPU

Deploying Deep Learning Networks to Embedded GPUs and CPUs

Speeding up Simulink. Murali Yeddanapudi The MathWorks, Inc. 1

ACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTI GRID Chris Gottbrath, Nov 2016

IBM Power AC922 Server

Advanced Software Development with MATLAB

2015 The MathWorks, Inc. 1

What s New in MATLAB and Simulink

Accelerating Implicit LS-DYNA with GPU

Handling and Processing Big Data for Biomedical Discovery with MATLAB

Matlab for Engineers

INTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017

Introduction to GPU hardware and to CUDA

On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters

MATLAB Distributed Computing Server (MDCS) Training

Big Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid Architectures

CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging

A MATLAB Interface to the GPU

Calcul intensif et Stockage de Masse. CÉCI/CISM HPC training sessions Use of Matlab on the clusters

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

Lecture x: MATLAB - advanced use cases

Evaluating the MATLAB Parallel Computing Toolbox Kashif Hussain Computer Science 2012/2013

MATLAB Distributed Computing Server Release Notes

Calcul intensif et Stockage de Masse. CÉCI/CISM HPC training sessions

TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications

Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards

Chapter 1. Introduction: Part I. Jens Saak Scientific Computing II 7/348

Data Analytics with MATLAB. Tackling the Challenges of Big Data

Study and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou

Transcription:

Speeding up MATLAB Applications Sean de Wolski Application Engineer 2014 The MathWorks, Inc. 1

Non-rigid Displacement Vector Fields 2

Agenda Leveraging the power of vector and matrix operations Addressing bottlenecks Generating and incorporating C code Utilizing additional processing power Summary 3

Example: Block Processing Images Evaluate function at grid points Reevaluate function over larger blocks Compare the results Evaluate code performance 4

Effect of Not Preallocating Memory x = 4 x(2) = 7 x(3) = 12 4 4 0x0000 0x0008 0x0010 0x0018 0x0020 0x0028 X(2) = 7 0x0000 4 7 0x0008 0x0010 0x0018 0x0020 0x0028 X(3) = 12 0x0000 4 7 0x0008 0x0010 4 7 12 0x0018 0x0020 0x0028 0x0030 0x0038 0x0030 0x0038 0x0030 0x0038 5

Benefit of Preallocation x = zeros(3,1) x(1) = 4 x(2) = 7 x(3) = 12 0x0000 0 0 0 0x0008 0x0010 04 0 0 0x0000 0x0008 0x0010 4 07 0 0x0000 0x0008 0x0010 4 7 12 0 0x0000 0x0008 0x0010 0x0018 0x0018 0x0018 0x0018 0x0020 0x0020 0x0020 0x0020 0x0028 0x0028 0x0028 0x0028 0x0030 0x0030 0x0030 0x0030 0x0038 0x0038 0x0038 0x0038 6

Benefit of Preallocation x = zeros(3,1) x(1) = 4 x(2) = 7 x(3) = 12 0x0000 0 0 0 0x0008 0x0010 0x0000 4 0 0 0x0008 0x0010 4 7 0 0x0000 0x0008 0x0010 0x0000 4 7 12 0x0008 0x0010 0x0018 0x0018 0x0018 0x0018 0x0020 0x0020 0x0020 0x0020 0x0028 0x0028 0x0028 0x0028 0x0030 0x0030 0x0030 0x0030 0x0038 0x0038 0x0038 0x0038 7

MATLAB Underlying Technologies Commercial libraries BLAS:Basic Linear Algebra Subroutines (multithreaded) LAPACK: Linear Algebra Package etc. 8

MATLAB Underlying Technologies JIT/Accelerator Improves looping Generates on-the-fly multithreaded code Continually improving 9

Summary of Example: Tools Used built-in timing functions: tic, toc 10

Summary of Example: Tools Used built-in timing functions: timeit 11

Summary of Example: Tools Used Code Analyzer to find suboptimal code 12

Summary of Example: Techniques Preallocated arrays >> x = zeros(3,1) 0x0000 0 0 0 0x0008 0x0010 0x0018 0x0020 0x0028 0x0030 0x0038 13

Summary of Example: Techniques Vectorized code 14

Other Best Practices Minimize dynamically changing path >> cd( ) 15

Other Best Practices Minimize dynamically changing path >> cd( ) instead use: >> addpath( ) >> fullfile( ) 16

Other Best Practices Use the functional load syntax >> load('myvars.mat') 17

Other Best Practices Use the functional load syntax >> load('myvars.mat') instead use: >> x = load('myvars.mat') x = a: 5 b: 'hello' 18

Other Best Practices Minimize changing variable class >> x = 1; >> x = 'hello'; 19

Other Best Practices Minimize changing variable class >> x = 1; >> x = 'hello'; instead use: >> x = 1; >> xnew = 'hello'; 20

Agenda Leveraging the power of vector and matrix operations Addressing bottlenecks Generating and incorporating C code Utilizing additional processing power Summary 21

Example: Fitting Data Load data from multiple files Extract a specific test Fit a spline to the data Write results to Microsoft Excel 22

Summary of Example: Tools Profiler Total number of function calls Time per function call 23

Summary of Example: Techniques Target significant bottlenecks Reduce file I/O Disk is slow compared to RAM When possible, use load and save commands 24

Summary of Example: Techniques Target significant bottlenecks Reuse figure Avoid printing to command window 25

Steps for Improving Performance First focus on getting your code working Then speed up the code within core MATLAB Consider other languages (i.e. C or Fortran MEX files) and additional processing power 26

Agenda Leveraging the power of vector and matrix operations Addressing bottlenecks Generating and incorporating C code Utilizing additional processing power Summary 27

Why engineers and scientists translate MATLAB to C today? Integrate MATLAB algorithms w/ existing C environment using source code and static/dynamic libraries Prototype MATLAB algorithms on desktops as standalone executables Accelerate user-written MATLAB algorithms Implement C code on processors or hand-off to software engineers 28

Challenges with Manual Translation from MATLAB to C Re-code in C/C++ Separate functional and implementation specification Leads to multiple implementations that are inconsistent Hard to modify requirements during development Difficult to keep reference MATLAB code and C code in-sync 29

Challenges with Manual Translation from MATLAB to C Re-code in C/C++ Manual coding errors 30

Challenges with Manual Translation from MATLAB to C Re-code in C/C++ Time consuming and expensive 31

Automatic Translation of MATLAB to C iterate Algorithm Design and Code Generation in MATLAB verify / accelerate With MATLAB Coder, design engineers can Maintain one design in MATLAB Design faster and get to C quickly Test more systematically and frequently Spend more time improving algorithms in MATLAB 32

Acceleration using MEX Speed-up factor will vary When you may see a speedup Often for Communications and Signal Processing Always for Fixed-point Likely for loops with states or when vectorization isn t possible When you may not see a speedup MATLAB implicitly multithreads computation Built-functions call IPP or BLAS libraries 33

Supported MATLAB Language Features and Functions Matrices and Arrays Data Types Programming Constructs Functions Matrix operations N-dimensional arrays Subscripting Frames Persistent variables Global variables Complex numbers Integer math Double/singleprecision Fixed-point arithmetic Characters Structures Numeric class Variable-sized data MATLAB Classes System objects Arithmetic, relational, and logical operators Program control (if, for, while, switch ) MATLAB functions and subfunctions Variable length argument lists Function handles Supported algorithms > 800 MATLAB operators and functions > 200 System objects for Signal processing Communications Computer vision Supported Functions 34

More Information To learn more visit the product page www.mathworks.com/products/matlab-coder On-Demand Webinar: MATLAB to C Made Easy Search at http://www.mathworks.com/company/events/webinars/index.html 35

Agenda Leveraging the power of vector and matrix operations Addressing bottlenecks Generating and incorporating C code Utilizing additional processing power Summary 36

Going Beyond Serial MATLAB Applications Worker Worker Worker TOOLBOXES BLOCKSETS Worker Worker Worker Worker Worker 37

Parallel Computing enables you to Larger Compute Pool Speed up Computations Larger Memory Pool Work with Large Data 11 26 41 12 27 42 13 28 43 14 29 44 15 30 45 16 31 46 17 32 47 17 33 48 19 34 49 20 35 50 21 36 51 22 37 52 38

Parallel Computing on the Desktop Desktop Computer Parallel Computing Toolbox Speed up parallel applications on local computer Take full advantage of desktop power by using CPUs and GPUs Separate computer cluster not required 39

Ease of Use Using Additional Cores/Processors (CPUs) Support built into Toolboxes Greater Control 40

Tools Providing Parallel Computing Support Optimization Toolbox Global Optimization Toolbox Statistics Toolbox Communications System Toolbox Simulink Design Optimization Bioinformatics Toolbox Image Processing Toolbox TOOLBOXES BLOCKSETS Worker Worker Worker Worker Worker Worker Worker Directly leverage functions in Parallel Computing Toolbox http://www.mathworks.com/products/parallel-computing/builtin-parallel-support.html 41

Ease of Use Using Additional Cores/Processors (CPUs) Support built into Toolboxes Simple programming constructs: parfor, batch, distributed Greater Control 42

Running Independent Tasks or Iterations Ideal problem for parallel computing No dependencies or communications between tasks Examples include parameter sweeps and Monte Carlo simulations Time Time 43

The Mechanics of parfor Loops 1 12 23 34 4 55 66 7 88 9 910 10 Worker a = zeros(10, 1) parfor i = 1:10 a(i) = i; end a a(i) = i; Worker a(i) = i; Worker a(i) = i; Worker a(i) = i; Pool of MATLAB Workers 44

Example: Parameter Sweep of ODEs Parallel for-loops Parameter sweep of ODEs Deflection of a truss under a dynamic load Area, A N = 4 Load Displacement, d Length, L Height, H M x + C x + Kx = F 45

Example: Parameter Sweep of ODEs Parallel for-loops Parameter sweep of ODEs Deflection of a truss under a dynamic load Sweeping two parameters: Number of truss elements Cross sectional area of truss elements Area, A N = 4 Load Displacement, d Length, L Height, H M x + C x + Kx = F 46

Ease of Use Using Additional Cores/Processors (CPUs) Support built into Toolboxes Simple programming constructs: parfor, batch, distributed Full control of parallelization: jobs and tasks, spmd Greater Control 47

Scale Up to Clusters, Grids and Clouds Desktop Computer Parallel Computing Toolbox Computer Cluster MATLAB Distributed Computing Server Scheduler 48

Scheduling Work Work Worker TOOLBOXES BLOCKSETS Result Scheduler Worker Worker Worker 49

Offload Computations with batch MATLAB Desktop (Client) Work Result Worker Worker Worker Worker batch( ) 50

Offload and Scale Computations with batch MATLAB Desktop (Client) Work Result Worker Worker Worker Worker batch(,'pool', ) 51

What is a Graphics Processing Unit (GPU) Originally for graphics acceleration, now also used for scientific calculations Massively parallel array of integer and floating point processors Typically hundreds of processors per card GPU cores complement CPU cores Dedicated high-speed memory 52

Performance Gain with More Hardware Using More Cores (CPUs) Core 1 Using GPUs Core 2 GPU cores Core 3 Core 4 Cache Device Device Memory Memory 53

GPU Requirements Parallel Computing Toolbox requires NVIDIA GPUs This includes the Tesla 20-series products MATLAB Release MATLAB R2014b MATLAB R2014a and earlier releases Required Compute Capability 2.0 or greater 1.3 or greater See a complete listing at www.nvidia.com/object/cuda_gpus.html 54

Ease of Use Programming Parallel Applications (GPU) Built-in support with toolboxes Greater Control 55

Ease of Use Programming Parallel Applications (GPU) Built-in support with toolboxes Simple programming constructs: gpuarray, gather Greater Control 56

Example: Solving 2D Wave Equation Solve 2 nd order wave equation using spectral methods: 2 u t 2 = 2 u x 2 + 2 u y 2 57

Benchmark: Solving 2D Wave Equation CPU v. GPU Intel Xeon Processor W3550 (3.07GHz), NVIDIA Tesla K20c GPU 58

Ease of Use Programming Parallel Applications (GPU) Built-in support with toolboxes Simple programming constructs: gpuarray, gather Advanced programming constructs: arrayfun, spmd Interface for experts: CUDAKernel, MEX support Greater Control www.mathworks.com/help/distcomp/run-cuda-or-ptx-code-on-gpu www.mathworks.com/help/distcomp/run-mex-functions-containing-cuda-code 59

Agenda Leveraging the power of vector and matrix operations Addressing bottlenecks Generating and incorporating C code Utilizing additional processing power Summary 60

Key Takeaways Consider performance benefit of vector and matrix operations in MATLAB Analyze your code for bottlenecks and address most critical items Leverage MATLAB Coder to speed up applications through generated C/C++ code Leverage parallel computing tools to take advantage of additional computing resources 61

Sample of Other Performance Resources MATLAB documentation MATLAB Advanced Software Development Performance and Memory Accelerating MATLAB Algorithms and Applications http://www.mathworks.com/company/newsletters/articles/accelerating-matlabalgorithms-and-applications.html The Art of MATLAB, Loren Shure s blog blogs.mathworks.com/loren/ MATLAB Answers http://www.mathworks.com/matlabcentral/answers/ 62

2014 The MathWorks, Inc. 63