Speeding up MATLAB Applications
© 2009 The MathWorks, Inc.

Agenda
  - Leveraging the power of vector & matrix operations
  - Addressing bottlenecks
  - Utilizing additional processing power
  - Summary

Example: Block Processing Images
  - Evaluate function at grid points
  - Reevaluate function over larger blocks
  - Compare the results
  - Evaluate code performance

Summary of Example
  - Used built-in timing functions
      >> tic
      >> toc
  - Used M-Lint to find suboptimal code
  - Preallocated arrays
  - Vectorized code

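The timing and vectorization steps can be sketched as follows. This is a minimal illustration, not the block-processing code from the demo: the workload (squaring random numbers) and all variable names are assumptions chosen to keep the example short.

```matlab
% Hypothetical workload: square n random numbers, first in a loop,
% then with a single vectorized operation, timing each with tic/toc.
n = 1e5;
v = rand(n, 1);

tic
y_loop = zeros(n, 1);          % preallocate the result before the loop
for k = 1:n
    y_loop(k) = v(k)^2;
end
t_loop = toc;

tic
y_vec = v.^2;                  % one element-wise operation replaces the loop
t_vec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', t_loop, t_vec);
```

Timings vary between machines and releases, so it is worth measuring both versions rather than assuming the vectorized form always wins.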
Effect of Not Preallocating Memory
      >> x = 4
      >> x(2) = 7
      >> x(3) = 12
  [Diagram: memory addresses 0x0000 to 0x0038 after each assignment. The array holds 4, then 4 7, then 4 7 12, and occupies a different, larger block of addresses each time it grows.]

Benefit of Preallocation
      >> x = zeros(3,1)
      >> x(1) = 4
      >> x(2) = 7
      >> x(3) = 12
  [Diagram: memory addresses 0x0000 to 0x0038 after each assignment. The same three-element block holds 0 0 0, then 4 0 0, then 4 7 0, then 4 7 12.]

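The difference between the two slides above can be measured directly. The sketch below fills one array that grows on every iteration and one that is preallocated with zeros; the array length and variable names are assumptions for illustration.

```matlab
n = 1e5;                       % hypothetical array length

% Growing the array: each assignment past the end may force a reallocation.
tic
x = [];
for k = 1:n
    x(k) = k;
end
t_grow = toc;

% Preallocating: the memory is reserved once, then filled in place.
tic
y = zeros(1, n);
for k = 1:n
    y(k) = k;
end
t_prealloc = toc;

fprintf('growing: %.4f s, preallocated: %.4f s\n', t_grow, t_prealloc);
```

Both loops produce the same values; only the memory behavior differs.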
Data Storage of MATLAB Arrays
      >> x = magic(3)
      x =
           8     1     6
           3     5     7
           4     9     2
  [Diagram: the elements are stored in memory (0x0000 onward) in column order: 8 3 4 1 5 9 6 7 2.]
  See the June 2007 article in The MathWorks News and Notes:
  http://www.mathworks.com/company/newsletters/news_notes/june07/patterns.html

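The column-order layout can be seen by flattening the matrix, and it suggests putting the row index in the inner loop so that consecutive iterations touch neighboring memory. A short sketch (the loop and the variable names are illustrative assumptions):

```matlab
x = magic(3);

% x(:) lists the elements in the order they are stored: column by column.
stored = x(:).';
disp(stored)                   % 8 3 4 1 5 9 6 7 2

% Traversing with the row index innermost follows the storage order.
total = 0;
for col = 1:size(x, 2)
    for row = 1:size(x, 1)
        total = total + x(row, col);
    end
end
```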
Indexing into MATLAB Arrays
  - Subscripted indexing: access elements by rows and columns
        1,1  1,2  1,3
        2,1  2,2  2,3
        3,1  3,2  3,3
  - Linear indexing: access elements with a single number
        1  4  7
        2  5  8
        3  6  9
  - Convert between subscripted and linear indices with ind2sub and sub2ind
  - Logical indexing: access elements with logical operations or a mask

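The three indexing styles can be compared on the same 3-by-3 matrix. This is a small sketch; the variable names are assumptions for illustration.

```matlab
x = magic(3);                        % [8 1 6; 3 5 7; 4 9 2]

% Subscripted: row 2, column 3.
a = x(2, 3);                         % 7

% Linear: the same element addressed by a single number.
idx = sub2ind(size(x), 2, 3);        % 8
b = x(idx);                          % 7
[r, c] = ind2sub(size(x), idx);      % r = 2, c = 3

% Logical: a mask selects every element that satisfies a condition.
mask = x > 5;
big = x(mask);                       % [8; 9; 6; 7], in storage order
```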
MATLAB Underlying Technologies
  - Commercial libraries
      - BLAS: Basic Linear Algebra Subprograms (multithreaded)
      - LAPACK: Linear Algebra Package
      - etc.
  - JIT/Accelerator
      - Improves looping
      - Generates on-the-fly multithreaded code
      - Continually improving

Other Best Practices for Performance
  - Minimize dynamically changing path
      >> addpath( )
      >> fullfile( )
  - Use the functional load syntax
      >> x = load('myvars.mat')
      x =
          a: 5
          b: 'hello'
  - Minimize changing variable class
      >> x = 1;
      >> x = 'hello';

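The functional load syntax returns the file contents as a struct instead of creating variables in the workspace. The sketch below writes a temporary MAT-file first so that it is self-contained; the file location and the variables a and b mirror the slide but are otherwise assumptions.

```matlab
% Create a small MAT-file in a temporary location (hypothetical data).
matFile = [tempname '.mat'];
a = 5;
b = 'hello';
save(matFile, 'a', 'b');
clear a b

% Functional form: the variables arrive as fields of a struct.
s = load(matFile);
fprintf('a = %d, b = %s\n', s.a, s.b);

delete(matFile);                     % clean up the temporary file
```

Because the loaded names appear only as fields of s, it stays clear where each value came from.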
Summary
  - Techniques for addressing performance
      - Vectorization
      - Preallocation
  - Consider readability and maintainability
      - Looping vs. matrix operations
      - Subscripted vs. linear vs. logical
      - etc.

Agenda
  - Leveraging the power of vector & matrix operations
  - Addressing bottlenecks
  - Utilizing additional processing power
  - Summary

Example: Fitting Data
  - Load data from multiple files
  - Extract a specific test
  - Fit a spline to the data
  - Write results to Microsoft Excel

Summary of Example
  - Used profiler to analyze code
  - Targeted significant bottlenecks
  - Reduced file I/O
  - Reused figure

Interpreting Profiler Results
  - Focus on top bottleneck
      - Total number of function calls
      - Time per function call
  - Functions
      - All function calls have overhead
      - MATLAB functions often take vectors or matrices as inputs
      - Find the right function; performance may vary
          - Search MATLAB functions (e.g., textscan vs. textread)
          - Write a custom function (specific/dedicated functions may be faster)
          - Many shipping functions have viewable source code

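The call counts and times discussed above can also be read programmatically. The following sketch profiles a repeated spline fit and prints the most expensive entries; the workload and variable names are assumptions, and the field names (FunctionTable, FunctionName, NumCalls, TotalTime) are those I understand profile('info') to return.

```matlab
% Hypothetical workload: fit a spline to the same data many times.
xData = linspace(0, 10, 50);
yData = sin(xData);
xQuery = linspace(0, 10, 2000);

profile on
for k = 1:200
    yQuery = spline(xData, yData, xQuery);
end
profile off

% Rank the profiled functions by total time and show the top few.
stats = profile('info');
ft = stats.FunctionTable;
if ~isempty(ft)
    [~, order] = sort([ft.TotalTime], 'descend');
    for k = order(1:min(5, numel(order)))
        fprintf('%-40s calls: %6d  total: %.4f s\n', ...
            ft(k).FunctionName, ft(k).NumCalls, ft(k).TotalTime);
    end
end
```

In an interactive session, profile viewer shows the same information in a browsable report.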
Classes of Bottlenecks
  - File I/O
      - Disk is slow compared to RAM
      - When possible, use load and save commands
  - Displaying output
      - Creating new figures is expensive
      - Writing to command window is slow
  - Computationally intensive
      - Use what you've learned today
      - Trade off modularization, readability, and performance
      - Integrate other languages or additional hardware, e.g. emlmex, MEX, GPUs, FPGAs, clusters, etc.

Steps for Improving Performance
  - First focus on getting your code working
  - Then speed up the code within core MATLAB
  - Consider additional processing power

Agenda
  - Leveraging the power of vector & matrix operations
  - Addressing bottlenecks
  - Utilizing additional processing power
  - Summary

Going Beyond Serial MATLAB Applications
  [Diagram labels: Toolboxes, Blocksets]

Example: Optimizing Tower Placement
  - Determine location of cell towers
  - Maximize coverage
  - Minimize overlap

Summary of Example
  - Enabled built-in support for Parallel Computing Toolbox in Optimization Toolbox
  - Used a pool of MATLAB workers
  - Optimized in parallel using fmincon

Parallel Computing Support in Optimization Toolbox
  - Functions:
      - fmincon: finds a constrained minimum of a function of several variables
      - fminimax: finds a minimax solution of a function of several variables
      - fgoalattain: solves the multiobjective goal attainment optimization problem
  - Functions can take finite differences in parallel in order to speed the estimation of gradients

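Enabling parallel gradient estimation in fmincon comes down to one option. The sketch below uses the optimoptions syntax of current releases; to my knowledge, releases from around 2009 set the same option with optimset('UseParallel', 'always') after opening a pool with matlabpool. The objective, bounds, and start point are hypothetical stand-ins for the tower placement problem, and the code assumes Optimization Toolbox is installed (Parallel Computing Toolbox is needed for the parallel part to take effect).

```matlab
% Hypothetical smooth objective with its minimum at (1, -2).
objective = @(p) (p(1) - 1)^2 + (p(2) + 2)^2;

x0 = [0; 0];                   % start point (assumed)
lb = [-5; -5];                 % lower bounds (assumed)
ub = [5; 5];                   % upper bounds (assumed)

% Ask fmincon to estimate gradients by finite differences in parallel.
opts = optimoptions('fmincon', 'UseParallel', true, 'Display', 'off');

[pBest, fBest] = fmincon(objective, x0, [], [], [], [], lb, ub, [], opts);
fprintf('minimum %.3g at (%.3f, %.3f)\n', fBest, pBest(1), pBest(2));
```

Parallel finite differences pay off when each objective evaluation is expensive; for a cheap function like this one the communication overhead dominates.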
Other Tools Providing Parallel Computing Support
  - Optimization Toolbox
  - GADS Toolbox
  - SystemTest
  - Simulink Design Optimization
  - Bioinformatics Toolbox
  - Model-Based Calibration Toolbox
  - Directly leverage functions in Parallel Computing Toolbox
  [Diagram labels: Toolboxes, Blocksets]

Task Parallel Applications
  [Diagram: Task 1, Task 2, Task 3, and Task 4 laid out along two Time axes; labels: Toolboxes, Blocksets]

Example: Parameter Sweep of ODEs
  - Solve a 2nd order ODE
        $m\ddot{x} + b\dot{x} + kx = 0$, with $m = 5$, $b = 1, 2, \ldots$, $k = 1, 2, \ldots$
  - Simulate with different values for b and k
  - Record peak value for each run
  - Plot results
  [Plot: Displacement (x) vs. Time (s) for m = 5, b = 2, k = 2 and for m = 5, b = 5, k = 5]
  [Plot: Peak Displacement (x) vs. Damping (b) and Stiffness (k)]

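A serial version of this sweep might look like the sketch below. The ODE is rewritten as a first-order system with state y = [x; dx/dt]. The parameter ranges, the time span, and the initial condition (zero displacement, unit velocity) are assumptions for illustration; the slide does not state them.

```matlab
m = 5;                               % mass, as on the slide
bVals = 1:6;                         % damping values (assumed range)
kVals = 1:5;                         % stiffness values (assumed range)
peak = zeros(numel(bVals), numel(kVals));

for i = 1:numel(bVals)
    for j = 1:numel(kVals)
        b = bVals(i);
        k = kVals(j);
        % m*x'' + b*x' + k*x = 0 as a first-order system in y = [x; x'].
        rhs = @(t, y) [y(2); -(b*y(2) + k*y(1))/m];
        [~, y] = ode45(rhs, [0 25], [0; 1]);   % assumed x(0) = 0, x'(0) = 1
        peak(i, j) = max(y(:, 1));             % peak displacement for this run
    end
end

% surf(kVals, bVals, peak) would plot peak displacement over the grid.
```

Each (b, k) pair is solved independently of the others, which is what makes the sweep a candidate for task-parallel execution.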
Summary of Example
  - Mixed task-parallel and serial code in the same function
  - Ran loops on a pool of MATLAB resources
  - Used M-Lint analysis to help in converting existing for-loop into parfor-loop
  [Plot: Displacement (x) vs. Time (s) for m = 5, b = 2, k = 2 and for m = 5, b = 5, k = 5]
  [Plot: Peak Displacement (x) vs. Damping (b) and Stiffness (k)]

The Mechanics of parfor Loops
      a = zeros(10, 1)
      parfor i = 1:10
          a(i) = i;
      end
      a
  [Diagram: iterations 1 through 10 are distributed across a pool of MATLAB workers, each executing a(i) = i;]

Converting for to parfor
  - Requirements for parfor loops
      - Task independent
      - Order independent
  - Constraints on the loop body
      - Cannot introduce variables (e.g. eval, load, global, etc.)
      - Cannot contain break or return statements
      - Cannot contain another parfor loop

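The independence requirements can be illustrated with three small loops: one whose iterations write to separate elements, one that accumulates a sum (a reduction, which parfor supports), and one with a dependency between iterations that has to stay a plain for loop. The examples and variable names are assumptions for illustration; without an open pool, parfor simply runs the iterations on the client.

```matlab
n = 10;

% Independent iterations: each one writes only its own element.
a = zeros(n, 1);
parfor i = 1:n
    a(i) = i^2;
end

% Reduction: the order of the additions does not affect the result.
total = 0;
parfor i = 1:n
    total = total + i;
end

% Not convertible: iteration i needs the results of i-1 and i-2,
% so the loop is order dependent and remains a for loop.
f = ones(n, 1);
for i = 3:n
    f(i) = f(i-1) + f(i-2);
end
```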
Advice for Converting for to parfor
  - Use M-Lint to diagnose parfor issues
  - If your for loop cannot be converted to a parfor, consider wrapping a subset of the body in a function
  - Read the section in the documentation on classification of variables

Interactive to Scheduling
  - Interactive
      - Great for prototyping
      - Immediate access to MATLAB workers
  - Scheduling
      - Offloads work to other MATLAB workers (local or on a cluster)
      - Access to more computing resources for improved performance
      - Frees up local MATLAB session

Scheduling Work
  [Diagram labels: Work, Result, Scheduler, Toolboxes, Blocksets]

Example: Schedule Processing
  - Offload parameter sweep to local workers
  - Get peak value results when processing is complete
  - Plot results in local MATLAB
  [Plot: Displacement (x) vs. Time (s) for m = 5, b = 2, k = 2 and for m = 5, b = 5, k = 5]
  [Plot: Peak Displacement (x) vs. Damping (b) and Stiffness (k)]

Summary of Example
  - Used batch for off-loading work
  - Used matlabpool option to off-load and run in parallel
  - Used load to retrieve worker's workspace
  [Plot: Displacement (x) vs. Time (s) for m = 5, b = 2, k = 2 and for m = 5, b = 5, k = 5]
  [Plot: Peak Displacement (x) vs. Damping (b) and Stiffness (k)]

Task-Parallel Workflows
  - parfor
      - Multiple independent iterations
      - Easy to combine serial and parallel code
      - Workflow
          - Interactive using matlabpool
          - Scheduled using batch
  - jobs/tasks
      - Series of independent tasks; not necessarily iterations
      - Workflow
          - Always scheduled

Scheduling Jobs and Tasks
  [Diagram labels: Job, Task (four), Result (four), Results, Scheduler, Toolboxes, Blocksets]

Example: Scheduling Independent Simulations
  - Offload three independent approaches to solving our previous ODE example
  - Retrieve simulated displacement as a function of time for each simulation
  - Plot comparison of results in local MATLAB
  [Plot: Displacement (x) vs. Time (s) for m = 5, b = 2, k = 2 and for m = 5, b = 5, k = 5]

Summary of Example
  - Used findresource to find scheduler
  - Used createjob and createtask to set up the problem
  - Used submit to off-load and run in parallel
  - Used getalloutputarguments to retrieve all task outputs
  [Plot: Displacement (x) vs. Time (s) for m = 5, b = 2, k = 2 and for m = 5, b = 5, k = 5]

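The job and task workflow summarized above can be sketched with the function names used in current Parallel Computing Toolbox releases: parcluster and fetchOutputs take the place of findresource and getalloutputarguments from the 2009-era API. The choice of three solvers (ode45, ode23, ode15s) as the "three independent approaches", the initial condition, and the variable names are assumptions for illustration. The code needs Parallel Computing Toolbox and a working default cluster profile.

```matlab
% ODE from the earlier example with m = 5, b = 2, k = 2, state y = [x; x'].
rhs = @(t, y) [y(2); -(2*y(2) + 2*y(1))/5];
solvers = {@ode45, @ode23, @ode15s};       % assumed "three approaches"

cluster = parcluster();                    % default cluster profile
job = createJob(cluster);
for s = 1:numel(solvers)
    % Each task runs one solver and returns two outputs: t and y.
    createTask(job, solvers{s}, 2, {rhs, [0 25], [0; 1]});
end

submit(job);                               % off-load to the workers
wait(job);                                 % block until all tasks finish
results = fetchOutputs(job);               % one row per task: {t, y}
delete(job);                               % remove the job from the cluster

% plot(results{1, 1}, results{1, 2}(:, 1)) would show the first simulation.
```

Because every task is scheduled separately, this style suits a handful of substantial, independent computations better than many tiny ones.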
Factors to Consider for Scheduling
  - There is always an overhead to distribution
      - Combine small repetitive function calls
  - Share code and data with workers efficiently
      - Set job properties (FileDependencies, PathDependencies)
  - Minimize I/O
      - Enable Workspace option for batch
  - Capture command window output
      - Enable CaptureDiary option for batch

Parallel Computing Tools Address
  - Long computations: Task-Parallel
      - Multiple independent iterations
            parfor i = 1 : n
                % do something with i
            end
      - Series of tasks
        [Diagram: Task 1, Task 2, Task 3, Task 4]
  - Large data problems: Data-Parallel
        [Diagram: numeric array with columns running 11 to 22, 26 to 37, and 41 to 52]

Parallel Computing with MATLAB
  - Built-in parallel functionality within specific toolboxes (also requires Parallel Computing Toolbox)
      - Optimization Toolbox
      - GADS Toolbox
      - SystemTest
      - Simulink Design Optimization
      - Bioinformatics Toolbox
      - Model-Based Calibration Toolbox
  - High-level parallel functions
      - parfor
      - matlabpool
      - batch
  - Low-level parallel functions
      - jobs, tasks
  - Built on industry-standard libraries
      - Message Passing Interface (MPI)
      - ScaLAPACK
  [Diagram labels: MATLAB and Parallel Computing Tools, Industry Libraries]

Run 8 Local Workers on Desktop
  - Rapidly develop parallel applications on local computer
  - Take full advantage of desktop power
  - Separate computer cluster not required
  [Diagram labels: Desktop Computer, Parallel Computing Toolbox]

Scale Up to Clusters, Grids and Clouds
  [Diagram labels: Desktop Computer, Parallel Computing Toolbox, Computer Cluster, MATLAB Distributed Computing Server, Scheduler]

Agenda
  - Leveraging the power of vector & matrix operations
  - Addressing bottlenecks
  - Utilizing additional processing power
  - Summary

Summary
  - Consider performance benefit of vector and matrix operations in MATLAB
  - Analyze your code for bottlenecks and address most critical items
  - Leverage parallel computing tools to take advantage of additional computing resources

Sample of Other Performance Resources
  - MATLAB documentation
      MATLAB Programming Fundamentals Performance
  - Memory Management Guide
      www.mathworks.com/support/tech-notes/1100/1106.html?bb=1
  - The Art of MATLAB, Loren Shure's blog
      blogs.mathworks.com/loren/
  - MATLAB Central's Usenet portal
      www.mathworks.com/matlabcentral/newsreader.html
