Adaptive Algorithms and Parallel Computing


1 Adaptive Algorithms and Parallel Computing Multicore and Multimachine Computing with MATLAB (Part 1) Academic year Simone Scardapane

2 Content of the Lecture Main points: Parallelizing loops with the par-for construct. Understanding the single program, multiple data (SPMD) concept. Distributed arrays. Communication between workers. Interactive parallel programming. In the next lecture(s): MapReduce and Hadoop. Java threading. Introduction to CUDA. GPU acceleration in MATLAB.

3 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

4 Basic Ideas of Parallel Computing Types of parallelism Parallel computing refers to executing multiple portions of a program at the same time on different hardware components. Parallel behaviors can be categorized in several ways, such as by the granularity of parallelism: Bit-level: each processor does not operate on single bits, but on words of multiple bits (e.g. 32- or 64-bit architectures). This is the most basic form of parallelization. Instruction-level: the capability of executing multiple CPU instructions in a single cycle. This comes in various forms, including pipelining, superscalar processors, speculative execution, etc. Task-level: executing the same or different code on multiple threads, which can operate on the same or different data.

5 Basic Ideas of Parallel Computing Moore's Law In 1965 Gordon Moore predicted an exponential increase in the number of elements per integrated circuit. 40 years later, Intel switched away from frequency scaling (due to the breakdown of so-called Dennard scaling) in favor of multi-core architectures. Nowadays, we may be witnessing the end of the multi-core era, with a shift towards different paradigms (three-dimensional circuits, quantum computing, etc.) [Mar14]. At the same time, there is a renewed interest in (a) commodity clusters (e.g. Hadoop/MapReduce); (b) computing with dedicated architectural components; and (c) increasing parallelism with general-purpose GPU cards.

6 Basic Ideas of Parallel Computing Amdahl's Law A first problem of parallel computing (compared to classical CPU frequency scaling) is that the speedup of parallelization is not linear w.r.t. the number of threads. If we assume that only a portion $(1-\beta) \in [0, 1]$ of a program can be fully parallelized, and the rest has to be run serially, the time $T(n)$ required to run the program on $n$ threads is given by:

$$T(n) = \beta T(1) + \frac{1-\beta}{n} T(1) \qquad (1)$$

This is known as Amdahl's law. It is straightforward to compute the speedup provided by parallelizing the execution:

$$S(n) = \frac{T(1)}{T(n)} = \frac{T(1)}{\beta T(1) + \frac{1-\beta}{n} T(1)} = \frac{1}{\beta + \frac{1-\beta}{n}} \qquad (2)$$
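As a quick check of Eq. (2), the following MATLAB sketch evaluates the predicted speedup for a few serial fractions beta; note that S(n) is bounded above by 1/beta no matter how many threads are added.

n = 1:64;                                    % number of threads
for beta = [0.5 0.25 0.1 0.05]
    S = 1 ./ (beta + (1 - beta) ./ n);       % Eq. (2)
    fprintf('beta = %.2f: S(64) = %.1f (upper bound 1/beta = %.0f)\n', ...
        beta, S(end), 1/beta);
end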

7 Basic Ideas of Parallel Computing Amdahl's Law (2) Figure: Graphical depiction of Amdahl's law when considering programs with different levels of parallelization [SOURCE].

8 Basic Ideas of Parallel Computing Dependencies Another essential concept in parallel computing is the idea of dependency. One portion of the program depends on a previous one if one of the following conditions holds: 1. One output of the first portion is required by the second (flow dependency). 2. One output of the second portion is required by the first (anti-dependency). 3. Both segments output the same variable (output dependency). These are known as Bernstein's conditions. One way of circumventing them is to consider some form of shared memory between the portions, which brings the necessity of synchronizing read and write operations.
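As a minimal illustration of a flow dependency, in the following loop iteration ii reads the value produced by iteration ii-1, so the iterations cannot be reordered (and MATLAB would refuse to run it as a parfor):

a = zeros(1, 10);
for ii = 2:10
    a(ii) = a(ii-1) + 1;   % flow dependency on the previous iteration
end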

9 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

10 Introduction to the Parallel Computing Toolbox The MATLAB Ecosystem Figure: Products in the MATLAB family [SOURCE].

11 Introduction to the Parallel Computing Toolbox Implicit and explicit parallelization In MATLAB, several built-in functions performing numerical operations are multi-threaded by default (so-called implicit parallelism). Conversely, you can parallelize your own code using the capabilities of the Parallel Computing Toolbox (PCT). In this lecture, we focus on several of its features: par-for loops, spmd blocks, distributed arrays, etc. An important aspect of the PCT is that the same code can scale transparently to any cluster configuration (more on this later on). Although we consider only the PCT, other open-source possibilities are available, such as the Parallel Matlab Toolbox (pmatlab).

12 Introduction to the Parallel Computing Toolbox Under the hood of implicit multi-threading How effective is the built-in multi-threading of MATLAB? Let us try a simple benchmark:

tic;
randn(10000, 10000) \ randn(10000, 1);
fprintf('Elapsed time is %.2f secs.\n', toc);

On an Intel i7 with 16 GB RAM, running MATLAB R2013a, this takes on average 11 seconds. Disabling multi-threading, the time rises to 33 seconds. You can disable multi-threading with the startup flag -singleCompThread. Currently, you can also control it in MATLAB with the function maxNumCompThreads, but this feature will be removed in future releases.
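For instance, the following sketch (using the deprecated maxNumCompThreads mentioned above) reruns a smaller version of the benchmark with one thread and then with the default number of threads; the exact timings will of course depend on your machine.

n_old = maxNumCompThreads(1);                      % force one thread, keep the old setting
tic; randn(5000, 5000) \ randn(5000, 1); t1 = toc;
maxNumCompThreads(n_old);                          % restore the previous setting
tic; randn(5000, 5000) \ randn(5000, 1); tm = toc;
fprintf('Single thread: %.2f s, multi-threaded: %.2f s\n', t1, tm);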

13 Introduction to the Parallel Computing Toolbox Deprecated Functions A function such as maxNumCompThreads is called deprecated: it is kept for compatibility, but it will be removed in future releases. Starting in R2015a, the documentation of MATLAB provides an annotation detailing the release in which each feature was introduced. Additionally, MATLAB offers several functions to check the version:

version % MATLAB version number and release
ver     % Detailed information on the MathWorks products

You also have the possibility of defining code conditionally on the version:

if verLessThan('matlab', '8.1')   % '8.1' used as an example; the version string is missing in the source
    error('MATLAB version not supported.');
end

14 Introduction to the Parallel Computing Toolbox Workers A worker is the individual computational engine of the PCT. Typically, workers are started one per core, in single-threaded mode. They are independent of one another, each having its own workspace, although they possess the capability of communicating with each other. A set of workers is called a pool. To start it:

matlabpool open % Older versions
p = parpool     % Newer versions

Similarly, to stop it:

matlabpool close % Older versions
delete(p)        % Newer versions

The number of workers, along with their type, is specified in a cluster configuration. The opening command with no argument loads the local configuration.
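As a minimal sketch (for the newer releases), you can also request an explicit number of workers and inspect the resulting pool object:

p = parpool(2);                          % pool with two local workers
fprintf('Pool with %d workers\n', p.NumWorkers);
delete(p);                               % shut the pool down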

15 Introduction to the Parallel Computing Toolbox Local and remote workers Workers can be local or remote. Remote workers are handled by the MATLAB Distributed Computing Server (DCS). Within a network environment, a job scheduler is used to distribute computation between the workers (image copyright of MathWorks).

16 Introduction to the Parallel Computing Toolbox Job Schedulers You can have one head node, several worker nodes, and one or more client nodes. You can run jobs using multiple job schedulers. This lecture assumes the standard MATLAB one, called the MATLAB Job Scheduler. To install it, you can follow the instructions in the MATLAB documentation: Configure Cluster to Use a MATLAB Job Scheduler (MJS). On the client, you can select a specific cluster from Parallel >> Manage Cluster Profiles. You can have more than one profile, and switch between them manually or programmatically.

17 Introduction to the Parallel Computing Toolbox Cluster Configuration (screenshot)

18 Introduction to the Parallel Computing Toolbox Admin Center (screenshot)

19 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

20 Understanding par-for Introductory Example Consider this very simple code:

X = randn(50, 2000);
y = zeros(50, 1);
for ii = 1:50
    y(ii) = my_function(X(ii, :));
end

Each cycle of the for-loop is completely independent of the others. If we have a pool available, the previous computation can be distributed among the workers using a par-for construct:

X = randn(50, 2000);
y = zeros(50, 1);
parfor ii = 1:50
    y(ii) = my_function(X(ii, :));
end

21 Understanding par-for Parfor execution When designing a parallel for, two concepts are essential to understand: Iterations of the loop are executed in no particular order. MATLAB analyzes what data must be sent to the workers at the beginning of the par-for loop. This puts several constraints on your code (more on this later). As a rule of thumb, you should consider using a par-for whenever you need to perform a relatively simple computation several times, without dependencies between the computations. As a second rule, try to decide on the par-for before writing the code, instead of adapting it later on.
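Following the first rule of thumb, a quick way to check whether a par-for pays off is to time both versions of the introductory loop (my_function is the same placeholder used above):

X = randn(50, 2000); y = zeros(50, 1);
tic;
for ii = 1:50
    y(ii) = my_function(X(ii, :));
end
t_for = toc;
tic;
parfor ii = 1:50
    y(ii) = my_function(X(ii, :));
end
t_parfor = toc;
fprintf('for: %.2f s, parfor: %.2f s\n', t_for, t_parfor);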

22 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

23 Sliced, reduction, and temporary variables Sliced variables In a par-for, a variable is sliced if it can be decomposed into independent groups over the workers. A prerequisite is that the variable must be indexed consistently inside the par-for. In the following example, X is sliced, Y is not:

parfor ii = 1:3
    a = X(A(ii), A(ii));
    b = Y(A(ii), A(ii+1));
end

To be sliced, no elements can be inserted into or deleted from a matrix inside the loop. Here, X is not sliced:

parfor ii = 1:3
    X(ii, :) = [];
end

24 Sliced, reduction, and temporary variables Sliced variables /2 Sliced variables are important because they reduce the communication overhead of transmitting data to the workers. Note that a variable can be either an input sliced variable or an output sliced variable:

parfor ii = 1:3
    a = X(ii);    % X is an input sliced variable
    Y(ii) = a;    % Y is an output sliced variable
end

A variable which is neither sliced nor assigned in the loop is called a broadcast variable. You should strive to make as many variables as possible sliced.
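A minimal sketch of a broadcast variable: c below is neither sliced nor assigned inside the loop, so it is copied once to every worker and only read there.

c = 10;                      % broadcast variable
X = randn(1, 3); Y = zeros(1, 3);
parfor ii = 1:3
    Y(ii) = c * X(ii);       % X sliced input, Y sliced output, c broadcast
end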

25 Sliced, reduction, and temporary variables Reduction variables A reduction variable is the only exception to the rule that loop iterations must be independent. A reduction assignment iteratively accumulates the result of a computation, with the prerequisite that the operation must not depend on the ordering of the iterations. The easiest example:

a = 0;
parfor ii = 1:3
    a = a + X(ii);   % a is the reduction variable
end

MATLAB handles reduction assignments in a particular way, leading to some surprising results. Consider the following:

a = [];
parfor ii = 1:3
    a = [a; X(ii)];  % concatenation is a valid reduction assignment
end

Despite the fact that iterations are not executed in order, the resulting concatenated vector is ordered.

26 Sliced, reduction, and temporary variables Reduction variables /2 Other examples of reduction operators are * (matrix product), .* (pointwise product), & (logical and), min (minimum), union (set union), etc. You can also use your own reduction function:

f = @my_reduction;   % some associative function (placeholder; the original definition is missing from the source)
parfor ii = 1:3
    a = f(a, X(ii));
end

You must ensure that the function f is associative, i.e.:

$f(a, f(b, c)) = f(f(a, b), c)$

Commutativity is not required, because you are not allowed to change the order of the operands inside the loop.

27 Sliced, reduction, and temporary variables Temporary variables The last special class of variables to consider are temporary variables. A temporary variable is a variable that is assigned directly, without a reduction assignment:

parfor ii = 1:3
    d = X(ii);   % d is temporary
end

Temporary variables are effectively destroyed and recreated at every iteration. Moreover, they are not passed back to the client when the computation is over. The main consequence is the following:

d = 1;
parfor ii = 1:3
    d = 2;
end
display(d);   % d still has value 1
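If you do need the per-iteration values back on the client, a sliced output variable is the idiomatic replacement for a temporary, as in this minimal sketch:

d = zeros(1, 3);
parfor ii = 1:3
    d(ii) = 2;       % sliced output: values survive the loop
end
display(d);          % d is now [2 2 2]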

28 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

29 Parfor limitations Transparency All the variables inside a par-for loop must be transparent, i.e., they must be visible to the interpreter. As a consequence, the following actions are not allowed inside a par-for: Calling a function such as eval, because its input is a string, not a variable. This code will not execute:

my_variable = 3;
parfor ii = 1:4
    eval('my_variable');
end

Loading/saving a .mat file, or clearing any workspace variables. Calling an external script (although functions are allowed). Requesting any form of user input.

30 Parfor limitations Nested for-loops A par-for cannot be nested inside another par-for, but a for loop can. However, this has a set of limitations; let us see two of them. First of all, the range of a nested loop must be a constant. The following will not work:

parfor ii = 1:3
    for jj = my_range(ii)
        [ ... ]
    end
end

Additionally, the nested index variable cannot be used as part of an expression for indexing a sliced matrix. The following is invalid:

A = zeros(3, 3);
parfor ii = 1:3
    for jj = 1:3
        A(ii, jj+1) = ii + jj;
    end
end

31 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

32 Parfor Examples Example 1 - Batch Image Processing In this experiment, we perform some basic image processing on a batch of images. We can easily parallelize the code by splitting the images over the different workers. Source images for this example are available on the web: SOURCE.

33 Parfor Examples Example 1 - Serial version

d = 'data/GT';   % Directory of images (path separator assumed)
imgs = dir(d);   % Get a list of the files
% Remove '.' and '..' from the list
imgs = imgs(arrayfun(@(x) ~strcmp(x.name, '.') && ~strcmp(x.name, '..'), imgs));
imgs = {imgs.name}';   % Simply store the image names
N_imgs = length(imgs);
imgs_o = cell(N_imgs, 1);
for ii = 1:N_imgs
    im = imread(fullfile(d, imgs{ii}));   % Read the image
    imgs_o{ii} = rgb2gray(im);            % Convert to grayscale
    imgs_o{ii} = imadjust(imgs_o{ii});    % Enhance contrast
end

34 Parfor Examples Example 1 - Result (a) Before processing (b) After processing Figure: First image in the batch before and after processing.

35 Parfor Examples Example 1 - Parallel version

parfor ii = 1:N_imgs
    fprintf('Processing %s...\n', imgs{ii});
    im = imread(fullfile(d, imgs{ii}));
    imgs_o{ii} = rgb2gray(im);
    imgs_o{ii} = imadjust(imgs_o{ii});
end

It is the same code! Note that imgs is a sliced input variable, imgs_o is a sliced output variable, and im is temporary. Look at the order of the printed messages: the images are no longer processed in alphabetical order.

36 Parfor Examples Example 2 - Computing the Mandelbrot set The following example is adapted from [Kep09]. The Mandelbrot set is the set of complex numbers c such that the following sequence remains bounded:

$$z_{i+1} = z_i^2 + c \qquad (3)$$

with initial value $z_0 = 0$. In the following, we sweep the region defined by $\mathrm{Re}(c) \in [-1.4, 0.6]$ and $\mathrm{Im}(c) \in [-1, 1]$. Our method will take 3 parameters: N is the number of points along each dimension, max_iter is the number of iterations to perform, and epsilon is a tolerance.

37 Parfor Examples Example 2 - The code (serial version)

% N is the size of the image, max_iter the number of iterations,
% epsilon a tolerance
function W = mandelbrot_serial(N, max_iter, epsilon)
Z = zeros(N, N);
W = zeros(N, N);
indices = true(N, N);
steps = (1:N)';
[ReC, ImC] = meshgrid(steps'./(N/2) - 1.6, steps./(N/2) - 1);
C = ReC + 1i*ImC;
for i = 1:max_iter
    Z(indices) = Z(indices).*Z(indices) + C(indices);
    W(indices) = exp(-abs(Z(indices)));
    indices = W > epsilon;
end
end

38 Parfor Examples Example 2 - Output Figure: Weights of the Mandelbrot set, shown as an image with the gray colormap.

39 Parfor Examples Example 2 - The code (parallel version) We distribute the computation column-wise:

function W = mandelbrot_parallel_parfor(N, max_iter, epsilon)
W = zeros(N, N);
parfor i = 1:N
    Z = zeros(1, N);
    W_local = zeros(1, N);
    indices = true(1, N);
    steps = (1:N)';
    C_local = (steps'./(N/2) - 1.6) + 1i*(i./(N/2) - 1);
    for j = 1:max_iter
        Z(indices) = Z(indices).*Z(indices) + C_local(indices);
        W_local(indices) = exp(-abs(Z(indices)));
        indices = W_local > epsilon;
    end
    W(i, :) = W_local;
end
end

40 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

41 Single-program, multiple-data The SPMD construct The SPMD (Single-Program, Multiple-Data) construct is superficially similar to par-for, but it is largely different. In SPMD, a fixed number of workers (also called labs) execute the same program. The differences are that each lab knows its own id, and it can communicate with all the others. Moreover, data on the workers is kept between SPMD commands. A simple operation on all the available workers:

spmd
    my_fcn(2);
end

We can also request a given number of labs, e.g. 2:

spmd(2)
    my_fcn(2);
end
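A minimal sketch of this persistence: a variable assigned in one SPMD block is still defined on each lab in the next one.

spmd
    a = labindex;    % each lab stores its own id
end
spmd
    b = a + 1;       % 'a' is still defined on every lab
end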

42 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

43 Example: parallel restarts in optimization Description of the example As an example of SPMD programming, consider the following. We want to minimize the function:

$$f(x) = \frac{\cos(3\pi x)}{x}$$

We will use a steepest-descent algorithm (SDA) [BV04]. This procedure, however, may converge to a local minimum, hence we want to run it starting from several different points. In particular, we will run it twice in parallel using SPMD: once starting in [0, 1], and a second time starting in [1, 2].

44 Example: parallel restarts in optimization Steepest-descent optimization (recall) We recall some concepts from unconstrained optimization. We want to minimize a function $f(x)$, $x \in \mathbb{R}^d$, over the whole domain, under the assumption that f is continuous and differentiable. Starting from a point $x_0$, SDA iteratively updates the estimate as:

$$x_{i+1} = x_i - \mu \nabla f(x_i) \qquad (4)$$

where $\nabla f(x_i)$ is the gradient of f evaluated at $x_i$, and $\mu$ is a parameter known as the step size. Supposing that $\mu$ satisfies the Wolfe conditions at every iteration [BV04], Eq. (4) will converge to a local minimum [BV04]. For simplicity, we will consider a small, fixed $\mu$ in our implementation.

45 Example: parallel restarts in optimization Steepest-descent code For computing the derivative, we use the Symbolic Toolbox of MATLAB. In the code, f is a symbolic expression, x0 is the starting point, step_size is the step size, and tol is a tolerance to use as stopping criterion:

function x_best = grad_descent(f, x0, step_size, tol)
f_prime = diff(f);   % Derivative of f
gap = Inf;           % We exit whenever gap <= tol
x_best = x0;         % Current best estimate
while gap > tol
    descent_dir = -double(f_prime(x_best));
    x_new = x_best + step_size*descent_dir;
    gap = abs(x_new - x_best);
    x_best = x_new;
end
fprintf('Minimum reached: %.2f\n', double(f(x_best)));
end

46 Example: parallel restarts in optimization SPMD code

syms x;
f = symfun(cos(3*pi*x)./x, x);
matlabpool open
spmd
    % labindex returns the id of the current worker
    x_best = grad_descent(f, (labindex - 1) + rand(), 0.01, 10^-4);
end
matlabpool close

The output on the console:

Starting matlabpool using the local profile ... connected to 2 workers.
Lab 1: Minimum reached:
Lab 2: Minimum reached:
Sending a stop signal to all the workers ... stopped.

47 Example: parallel restarts in optimization The result Figure: Result of running in parallel an SDA procedure from two different starting points.

48 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

49 Composites and distributed arrays Composites In the previous example, x_best is a Composite object. A Composite is similar to an N-dimensional cell array, where the i-th element is the result of the computation on the i-th lab. You can create a Composite outside an SPMD block:

a = Composite();

You can access and set each element using curly brackets:

a = Composite(2);
spmd(2)
    a = my_fcn();
end
b = a{1};   % access value from lab 1
a{2} = 1;   % set value on lab 2 (for successive spmd calls)

50 Composites and distributed arrays Creating a distributed array An important aspect of SPMD is the possibility of distributing a single array over the workers. This can be done in three different ways: Calling the distributed function with an existing array. Using an overloaded constructor with the distributed flag. Creating the distributed array directly on the workers (in this case, it is called codistributed). A large set of MATLAB functions are overloaded to work on distributed arrays: see using-matlab-functions-on-codistributed-arrays.html in the documentation.

51 Composites and distributed arrays Distributing an existing array Create a matrix, distribute it, and perform an inversion on the workers:

A = randn(500, 500);
A = distributed(A);
Ainv = inv(A);

Note that Ainv is implicitly distributed, which can be seen by analyzing the workspace:

>> whos
  Name      Size         Bytes  Class         Attributes
  A         500x500             distributed
  Ainv      500x500             distributed

We can collect the array on the client using the gather function:

Ainv = gather(Ainv);

52 Composites and distributed arrays Overloaded constructors In the previous example, the computation is parallelized implicitly. Here is an equivalent version, using the distributed constructor for randn, and performing the computation inside an SPMD block:

A = distributed.randn(500, 500);
spmd
    Ainv = inv(A);
end

In cluster configurations, this allows you to construct arrays that are bigger than the memory limit of a single machine. You can change the way in which the array is partitioned using a codistributor object, as sketched below.
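A minimal sketch, assuming the one-dimensional codistributor1d: the array below is partitioned along its rows (dimension 1) instead of the default column distribution.

spmd
    codist = codistributor1d(1);                 % distribute along dimension 1
    A = codistributed.randn(500, 500, codist);
end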

53 Composites and distributed arrays Codistributed arrays A distributed array can also be created inside an SPMD block. In this case, it is called a codistributed array. Let us reformulate the previous example:

spmd
    A = codistributed.randn(500, 500);
    Ainv = inv(A);
end

Inside a lab, you can obtain the local portion of the array using the getLocalPart method:

spmd
    A = codistributed.randn(500, 500);
    A_local = getLocalPart(A);
end

54 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

55 Communication functions Worker communication SPMD allows workers to communicate through a simple set of functions: labBarrier blocks the execution until all the workers have reached the given point. labBroadcast sends a value from one worker to all the others. labSend sends a value from one worker to another. labReceive receives a value from another worker. And a few others: see task-control-and-worker-communication.html in the documentation. We show the use of these functions by implementing a simple diffusion Least Mean-Square filter [LS08].
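Before the full example, here is a minimal point-to-point exchange between two labs, assuming a pool of at least two workers:

spmd(2)
    if labindex == 1
        labSend(rand(5, 1), 2);     % send a vector to lab 2
    elseif labindex == 2
        v = labReceive(1);          % blocks until the message arrives
    end
end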

56 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

57 Example: a simple diffusion LMS Least mean-square filter (recall) Given an input buffer $x_i$, $i = 1, \ldots, n$, and a desired output $d_i$, $i = 1, \ldots, n$, we want a linear model of the form $f(x) = w^T x$. Starting from an initial estimate $w_0$, at every iteration we apply a steepest-descent approach, approximating the overall error with the instantaneous error $e_i = d_i - f(x_i)$. The resulting filter is called the Least Mean-Square (LMS) adaptive filter:

$$w_{i+1} = w_i + \mu e_i x_i \qquad (5)$$
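A minimal serial sketch of Eq. (5), with a hypothetical system wo to identify (the values below are illustrative only):

wo = [0.5; -0.3; 0.8];                  % unknown system (assumed)
N = 5000; m = length(wo); mu = 0.01;
x = randn(N, m);                        % input buffers, one per row
d = x * wo + 0.01 * randn(N, 1);        % desired output with small noise
w = zeros(m, 1);
for i = 1:N
    e = d(i) - x(i, :) * w;             % instantaneous error
    w = w + mu * e * x(i, :)';          % LMS update, Eq. (5)
end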

58 Example: a simple diffusion LMS Diffusion LMS Consider a network of N agents. Every agent has its own version of the input signal, and its own estimate of the optimal weights. In particular, at time instant i, the k-th agent receives an input buffer $x_{k,i}$, it can communicate with a set of neighbors $\mathcal{N}_k$, and it has an estimate of the weights given by $w_{k,i}$. A diffused version of the LMS is obtained as follows: 1. Each agent constructs a weighted sum of the estimates of its neighbors: $\phi_{k,i} = \sum_{l \in \mathcal{N}_k \cup \{k\}} s_{kl} w_{l,i}$. The weights $s_{kl}$ are called trusting factors, and they are set at the beginning (for simplicity). 2. Then, it applies an LMS rule to the new weight vector $\phi_{k,i}$, using its own version of the input. See a full discussion in [LS08].

59 Example: a simple diffusion LMS The code For simplicity, we suppose that: All the N agents are connected to all the others. The step size at every node is extracted uniformly in [0, 0.01]. The trust weights are all equal to s = 1/N. The agents differ in the initialization of the weight vector and in their specific input signal. The input signal is white noise. There is random noise on the output, extracted from a Gaussian distribution with zero mean and variance $\sigma^2$. Removing the constraint on the connectivity is left as an exercise.

60 Example: a simple diffusion LMS The code /2 We start by initializing all the required values:

% Define the system to be identified (the first two coefficients
% are missing from the source)
wo = [ ; ; 0.96; 0.02; 0.13];
n_agents = 2;
N = 10000;
matlabpool('open', n_agents);
m = length(wo);
equal_trust = 1/n_agents;
spmd
    step_size = rand * 0.01;            % step size in [0, 0.01]
    x = randn(N, m);
    y = x * wo + randn(N, 1) * sigma;   % sigma: noise std (value missing from the source)
    wopt = rand(m, 1);
    err = zeros(N, 1);
end

61 Example: a simple diffusion LMS The code /3 We apply the diffused rule at every instant:

spmd
    for i = 1:N
        % Combine the weights
        phi = equal_trust * wopt;
        labSend(wopt, find(1:n_agents ~= labindex));
        for j = 1:n_agents
            if (j ~= labindex)
                while (~labProbe(j))
                end
                phi = phi + equal_trust * labReceive(j);
            end
        end
        % Adapt the filter
        err(i) = y(i) - x(i, :) * phi;
        wopt = phi + step_size * err(i) * x(i, :)';
    end
end
matlabpool('close');

62 Example: a simple diffusion LMS Testing the diffused LMS (6 agents, no diffusion) Figure: Evolution of the MSE in a network with no cooperation.

63 Example: a simple diffusion LMS Testing the diffused LMS (6 agents, with diffusion) Figure: Evolution of the MSE in a network with diffusion.

64 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES

65 Debugging and Profiling Interactive mode Debugging parallel applications can be complicated. MATLAB offers the possibility of running an interactive parallel mode, to see the output of a given set of commands on multiple workers. We can start it using:

>> pmode start

We enter the desired commands in the interactive window:

P>> wopt = ones(d, 1) * labindex;

66 Debugging and Profiling Interactive debugging

67 Debugging and Profiling Parallel profiler A second tool for debugging is the parallel profiler, which provides information on the functions called inside each SPMD block at the worker level. In an SPMD block, the profiler is started as:

mpiprofile on -detail builtin

Then, after the parallel code, but before exiting the SPMD block, you can save the information and close the profiler:

info = mpiprofile('info');
mpiprofile off

Finally, to view the information (e.g., for worker 1):

mpiprofile('viewer', info{1});

68 Debugging and Profiling Parallel Profiler (Example 1/5) We reformulate the main SPMD block as a function:

function err = dlms_local(x, y, wopt, step_size, equal_trust, n_agents)
N = length(y);
err = zeros(N, 1);
for i = 1:N
    % Combine the weights
    phi = equal_trust * wopt;
    labSend(wopt, find(1:n_agents ~= labindex));
    for j = 1:n_agents
        if (j ~= labindex)
            while (~labProbe(j))
            end
            phi = phi + equal_trust * labReceive(j);
        end
    end
    % Adapt the filter
    err(i) = y(i) - x(i, :) * phi;
    wopt = phi + step_size * err(i) * x(i, :)';
end
end

69 Debugging and Profiling Parallel Profiler (Example 2/5) Now we can call the profiler to monitor the function:

spmd
    mpiprofile on -detail builtin
    err = dlms_local(x, y, wopt, step_size, equal_trust, n_agents);
    info = mpiprofile('info');
    mpiprofile off
end
mpiprofile('viewer', info{1});   % Get information on worker 1

Note that we are also tracking the built-in functions of MATLAB.

70 Debugging and Profiling Parallel Profiler (Example 3/5)

71 Debugging and Profiling Parallel Profiler (Example 4/5)

72 Debugging and Profiling Parallel Profiler (Example 5/5) There is probably a better way to implement this... You can also combine the parallel profiler with the interactive mode to get information on the full cluster.

73 Bibliography

[BV04] Stephen P. Boyd and Lieven Vandenberghe, Convex optimization, Cambridge University Press, 2004.
[Kep09] Jeremy Kepner, Parallel MATLAB for multicore and multinode computers, vol. 21, SIAM, 2009.
[LS08] C. G. Lopes and A. H. Sayed, Diffusion least-mean squares over adaptive networks: Formulation and performance analysis, IEEE Transactions on Signal Processing 56 (2008), no. 7, 3122-3136.
[Mar14] Igor L. Markov, Limits on fundamental limits to computation, Nature 512 (2014), no. 7513, 147-154.


More information

Speeding up MATLAB Applications The MathWorks, Inc.

Speeding up MATLAB Applications The MathWorks, Inc. Speeding up MATLAB Applications 2009 The MathWorks, Inc. Agenda Leveraging the power of vector & matrix operations Addressing bottlenecks Utilizing additional processing power Summary 2 Example: Block

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 2: MapReduce Algorithm Design (2/2) January 14, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

Main Points of the Computer Organization and System Software Module

Main Points of the Computer Organization and System Software Module Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

More information

Parallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Parallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Spring 2 Parallelization Saman Amarasinghe Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Outline Why Parallelism Parallel Execution Parallelizing Compilers

More information

Accelerating Simulink Optimization, Code Generation & Test Automation Through Parallelization

Accelerating Simulink Optimization, Code Generation & Test Automation Through Parallelization Accelerating Simulink Optimization, Code Generation & Test Automation Through Parallelization Ryan Chladny Application Engineering May 13 th, 2014 2014 The MathWorks, Inc. 1 Design Challenge: Electric

More information

Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton. Matlab Tutorial. Lesson: Matlab Tutorial

Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton. Matlab Tutorial. Lesson: Matlab Tutorial 1 Matlab Tutorial 2 Lecture Learning Objectives Each student should be able to: Describe the Matlab desktop Explain the basic use of Matlab variables Explain the basic use of Matlab scripts Explain the

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Contents. I Basics 1. Copyright by SIAM. Unauthorized reproduction of this article is prohibited.

Contents. I Basics 1. Copyright by SIAM. Unauthorized reproduction of this article is prohibited. page v Preface xiii I Basics 1 1 Optimization Models 3 1.1 Introduction... 3 1.2 Optimization: An Informal Introduction... 4 1.3 Linear Equations... 7 1.4 Linear Optimization... 10 Exercises... 12 1.5

More information

CONLIN & MMA solvers. Pierre DUYSINX LTAS Automotive Engineering Academic year

CONLIN & MMA solvers. Pierre DUYSINX LTAS Automotive Engineering Academic year CONLIN & MMA solvers Pierre DUYSINX LTAS Automotive Engineering Academic year 2018-2019 1 CONLIN METHOD 2 LAY-OUT CONLIN SUBPROBLEMS DUAL METHOD APPROACH FOR CONLIN SUBPROBLEMS SEQUENTIAL QUADRATIC PROGRAMMING

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

A Comprehensive Study on the Performance of Implicit LS-DYNA

A Comprehensive Study on the Performance of Implicit LS-DYNA 12 th International LS-DYNA Users Conference Computing Technologies(4) A Comprehensive Study on the Performance of Implicit LS-DYNA Yih-Yih Lin Hewlett-Packard Company Abstract This work addresses four

More information

HPC with Multicore and GPUs

HPC with Multicore and GPUs HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville COSC 594 Lecture Notes March 22, 2017 1/20 Outline Introduction - Hardware

More information

Image Registration Lecture 4: First Examples

Image Registration Lecture 4: First Examples Image Registration Lecture 4: First Examples Prof. Charlene Tsai Outline Example Intensity-based registration SSD error function Image mapping Function minimization: Gradient descent Derivative calculation

More information

Scaling MATLAB. for Your Organisation and Beyond. Rory Adams The MathWorks, Inc. 1

Scaling MATLAB. for Your Organisation and Beyond. Rory Adams The MathWorks, Inc. 1 Scaling MATLAB for Your Organisation and Beyond Rory Adams 2015 The MathWorks, Inc. 1 MATLAB at Scale Front-end scaling Scale with increasing access requests Back-end scaling Scale with increasing computational

More information

Evaluating the MATLAB Parallel Computing Toolbox Kashif Hussain Computer Science 2012/2013

Evaluating the MATLAB Parallel Computing Toolbox Kashif Hussain Computer Science 2012/2013 Evaluating the MATLAB Parallel Computing Toolbox Kashif Hussain Computer Science 2012/2013 The candidate confirms that the work submitted is their own and the appropriate credit has been given where reference

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Eric Xing Lecture 14, February 29, 2016 Reading: W & J Book Chapters Eric Xing @

More information

High Performance Computing for BU Economists

High Performance Computing for BU Economists High Performance Computing for BU Economists Marc Rysman Boston University October 25, 2013 Introduction We have a fabulous super-computer facility at BU. It is free for you and easy to gain access. It

More information

Parallelization of Sequential Programs

Parallelization of Sequential Programs Parallelization of Sequential Programs Alecu Felician, Pre-Assistant Lecturer, Economic Informatics Department, A.S.E. Bucharest Abstract The main reason of parallelization a sequential program is to run

More information