Adaptive Algorithms and Parallel Computing
1 Adaptive Algorithms and Parallel Computing Multicore and Multimachine Computing with MATLAB (Part 1) Academic year 2016-2017 Simone Scardapane
2 Content of the Lecture Main points: Parallelizing loops with the par-for construct. Understanding the single program, multiple data (SPMD) concept. Distributed arrays. Communication between workers. Interactive parallel programming. In the next lecture(s): MapReduce and Hadoop. Java threading. Introduction to CUDA. GPU acceleration in MATLAB.
3 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
4 Basic Ideas of Parallel Computing Types of parallelism Parallel computing refers to executing multiple portions of a program at the same time on different hardware components. Parallel behaviors can be categorized in several ways, such as by the granularity of parallelism: Bit-level: each processor does not operate on single bits, but on words of multiple bits (e.g. 32- or 64-bit architectures). This is the most basic form of parallelization. Instruction-level: this is the capability of executing multiple CPU instructions in a single cycle. It comes in various forms, including pipelining, superscalar processors, speculative execution, etc. Task-level: executing the same or different code on multiple threads, which can operate on the same or different data.
5 Basic Ideas of Parallel Computing Moore's Law In 1965 Gordon Moore predicted an exponential increase in the number of elements per integrated circuit. 40 years later, Intel switched away from frequency scaling (due to the breakdown of the so-called Dennard scaling) in favor of multi-core architectures. Nowadays, we may be witnessing the end of the multi-core era in favor of different paradigms (three-dimensional circuits, quantum computing, etc.) [Mar14]. At the same time, there is a renewed interest in (a) commodity clusters (e.g. Hadoop/MapReduce); (b) computing with dedicated architectural components; and (c) increasing parallelism with general-purpose GPU cards.
6 Basic Ideas of Parallel Computing Amdahl's Law A first problem of parallel computing (compared to classical CPU frequency scaling) is that the speedup of parallelization is not linear w.r.t. the number of threads. If we assume that only a portion (1 − β) ∈ [0, 1] of a program can be fully parallelized, and the rest has to be run serially, the time T(n) required to run the program on n threads is given by:

T(n) = βT(1) + (1/n)(1 − β)T(1) (1)

This is known as Amdahl's law. It is straightforward to compute the speedup provided by parallelizing the execution:

S(n) = T(1)/T(n) = T(1) / [βT(1) + (1/n)(1 − β)T(1)] = 1 / [β + (1/n)(1 − β)] (2)
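As a quick sanity check, Eq. (2) can be evaluated numerically for a few serial fractions β (a minimal sketch; the chosen values of β and n are illustrative):

```matlab
% Speedup S(n) = 1 / (beta + (1 - beta)/n) for illustrative serial fractions
beta = [0.5 0.25 0.1 0.05];   % serial fraction of the program
n = 1:64;                     % number of threads
% Each row of S is the speedup curve for one value of beta
S = 1 ./ bsxfun(@plus, beta', (1 - beta') * (1 ./ n));
% Even with only a 5% serial fraction, speedup saturates well below n
fprintf('Speedup at n = 64 for beta = 0.05: %.2f\n', S(4, end));
```

Plotting the rows of S reproduces the saturation behavior depicted in the next slide.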
7 Basic Ideas of Parallel Computing Amdahl's Law (2) Figure: Graphical depiction of Amdahl's Law when considering programs with different levels of parallelization [SOURCE].
8 Basic Ideas of Parallel Computing Dependencies Another essential concept in parallel computing is the idea of dependency. One portion of the program is dependent on a previous one if one of the following conditions holds: 1 One output of the first portion is required by the second (flow dependency). 2 One output of the second portion is required by the first (anti-dependency). 3 Both segments output the same variable (output dependency). These are known as Bernstein's conditions. One way of circumventing them is to consider some form of shared memory between the portions, which brings the necessity of synchronizing reading and writing operations.
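As an illustration (not from the slides), a loop with a flow dependency cannot be parallelized naively, while an element-wise loop can:

```matlab
x = randn(10, 1);

% Flow dependency: iteration ii needs the output of iteration ii-1,
% so the iterations cannot be reordered or run concurrently.
y = zeros(10, 1);
y(1) = x(1);
for ii = 2:10
    y(ii) = y(ii-1) + x(ii);   % cumulative sum, inherently serial
end

% No dependency: each iteration touches only its own element,
% so this loop could safely become a parfor.
z = zeros(10, 1);
for ii = 1:10
    z(ii) = x(ii)^2;
end
```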
9 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
10 Introduction to the Parallel Computing Toolbox The MATLAB Ecosystem Figure: Products in the MATLAB family [SOURCE].
11 Introduction to the Parallel Computing Toolbox Implicit and explicit parallelization In MATLAB, several built-in functions performing numerical operations are multi-threaded by default (so-called implicit parallelism). In addition, you can parallelize your own code using the capabilities of the Parallel Computing Toolbox (PCT). In this lecture, we focus on several of its features: par-for loops, spmd blocks, distributed arrays, etc. An important aspect of the PCT is that the same code can scale transparently to any cluster configuration (more on this later on). Although we consider only the PCT, other open-source possibilities are available, such as the Parallel Matlab Toolbox (pmatlab).
12 Introduction to the Parallel Computing Toolbox Under the hood of implicit multi-threading How effective is the built-in multi-threading of MATLAB? Let us try a simple benchmark:

tic;
randn(10000, 10000)\randn(10000, 1);
fprintf('Elapsed time is %.2f secs.\n', toc);

On an Intel i7 with 16 GB RAM and MATLAB R2013a, this takes on average 11 seconds. Disabling multi-threading, the time rises to 33 seconds. You can disable multi-threading with the starting flag -singleCompThread. Currently, you can also control it in MATLAB with the function maxNumCompThreads, but this feature will be removed in future releases.
13 Introduction to the Parallel Computing Toolbox Deprecated Functions A function such as maxNumCompThreads is called deprecated: it is kept for compatibility, but it will be removed in future releases. Starting in R2015a, the documentation of MATLAB provides an annotation detailing the release of introduction for each feature. Additionally, MATLAB offers several functions to check the version:

version % MATLAB version number and release
ver     % Detailed information on the MathWorks products

You also have the possibility of defining code conditionally on the version:

if verLessThan('matlab', '8.1')
    error('MATLAB version not supported.');
end
14 Introduction to the Parallel Computing Toolbox Workers A worker is the individual computational engine of the PCT. Typically, workers are started one per core, in single-thread mode. They are independent of one another, each having its own workspace, although they possess the capability of communicating with each other. A set of workers is called a pool. To start it:

matlabpool open % Older versions
p = parpool     % Newer versions

Similarly, to stop it:

matlabpool close % Older versions
delete(p)        % Newer versions

The number of workers, along with their type, is specified in a cluster configuration. The opening command with no argument loads the local configuration.
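With the newer-style API you can also query the current pool programmatically (a minimal sketch; gcp returns the current pool, starting one on demand if none is open):

```matlab
p = gcp();                                   % get (or start) the current pool
fprintf('Pool with %d workers.\n', p.NumWorkers);
delete(p);                                   % shut the workers down
```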
15 Introduction to the Parallel Computing Toolbox Local and remote workers Workers can be local or remote. Remote workers are handled by the MATLAB Distributed Computing Server (DCS). Within a network environment, a job scheduler is used to distribute computation between the workers (image copyright of MathWorks):
16 Introduction to the Parallel Computing Toolbox Job Schedulers You can have one head node, several worker nodes, and one or more client nodes. You can run jobs using multiple job schedulers. This lecture assumes the standard MATLAB one, called MATLAB Job Scheduler. To install it, you can follow the instructions on the MATLAB documentation: Configure Cluster to Use a MATLAB Job Scheduler (MJS). On the client, you can select a specific cluster from Parallel >>Manage Cluster Profiles. You can have more than one profile, and switch them manually or programmatically.
17 Introduction to the Parallel Computing Toolbox Cluster Configuration (screenshot)
18 Introduction to the Parallel Computing Toolbox Admin Center (screenshot)
19 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
20 Understanding par-for Introductory Example Consider this very simple code:

X = randn(50, 2000);
y = zeros(50, 1);
for ii = 1:50
    y(ii) = my_function(X(ii, :));
end

Each cycle of the for-loop is completely independent of the others. If we have a pool available, the previous computation can be distributed among workers using a par-for construct:

X = randn(50, 2000);
y = zeros(50, 1);
parfor ii = 1:50
    y(ii) = my_function(X(ii, :));
end
21 Understanding par-for Parfor execution When designing a parallel for, two concepts are essential to understand: Iterations of the loop are executed in no particular order. MATLAB analyzes what data must be sent to the workers at the beginning of the par-for loop. This puts several constraints on your code (more on this later). As a rule of thumb, you should consider using a par-for whenever you need to perform a relatively simple computation several times, without dependencies between the computations. As a second rule, try to decide on the par-for before writing the code, instead of adapting it later on.
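A simple way to check whether a parfor actually pays off is to time both versions of the same loop (a sketch; my_function here is an illustrative placeholder for any per-row computation, deliberately made non-trivial so the parallel overhead can be amortized):

```matlab
% Compare serial and parallel execution time for the same loop
my_function = @(x) sum(svd(reshape(x, 40, 50)));   % illustrative workload
X = randn(50, 2000);
y = zeros(50, 1);

tic;
for ii = 1:50
    y(ii) = my_function(X(ii, :));
end
t_serial = toc;

tic;
parfor ii = 1:50
    y(ii) = my_function(X(ii, :));
end
t_parallel = toc;

fprintf('Serial: %.2f s, parallel: %.2f s\n', t_serial, t_parallel);
```

If the per-iteration work is too small, the communication overhead can make the parfor version slower than the serial one.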
22 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
23 Sliced, reduction, and temporary variables Sliced variables In a par-for, a variable is sliced if it can be decomposed into independent groups over the workers. A prerequisite is that a variable must be indexed consistently inside a par-for. In the following example, X is sliced, Y is not:

parfor ii = 1:3
    a = X(A(ii), A(ii));
    b = Y(A(ii), A(ii+1));
end

To be sliced, no elements can be inserted into or deleted from a matrix inside the loop. Here, X is not sliced:

parfor ii = 1:3
    X(ii, :) = [];
end
24 Sliced, reduction, and temporary variables Sliced variables /2 Sliced variables are important because they reduce the communication overhead of transmitting data to the workers. Note that a variable can be either an input sliced variable or an output sliced variable:

parfor ii = 1:3
    a = X(ii);  % X is an input sliced variable
    Y(ii) = a;  % Y is an output sliced variable
end

A variable which is neither sliced nor assigned in the loop is called a broadcast variable. You should strive to make as many variables sliced as possible.
25 Sliced, reduction, and temporary variables Reduction variables A reduction variable is the only exception to the rule that loop iterations must be independent. A reduction assignment accumulates iteratively the result of a computation, with the prerequisite that the operation must not depend on the ordering of the iterations. The easiest example:

a = 0;
parfor ii = 1:3
    a = a + X(ii);  % a is the reduction variable
end

MATLAB handles reduction assignments in a particular way, leading to some surprising results. Consider the following:

a = [];
parfor ii = 1:3
    a = [a; X(ii)];  % concatenation is a valid reduction assignment
end

Despite the fact that iterations are not executed in order, the resulting concatenated vector is ordered.
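A quick sketch to convince yourself that the concatenation reduction preserves order despite out-of-order execution (the data is chosen so the expected order is obvious):

```matlab
X = (1:8)';          % simple data: expected result is just X itself
a = [];
parfor ii = 1:8
    a = [a; X(ii)];  % MATLAB restores the loop order in the reduction
end
isequal(a, X)        % true, even though iterations ran out of order
```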
26 Sliced, reduction, and temporary variables Reduction variables /2 Other examples of reduction operators are * (matrix product), .* (pointwise product), & (logical and), min (minimum), union (set union), etc. You can use your own reduction function:

f = @my_reduction_fcn;  % handle to a user-defined reduction function
parfor ii = 1:3
    a = f(a, X(ii));
end

You must ensure that the function f is associative, i.e.:

f(a, f(b, c)) = f(f(a, b), c)

Commutativity is not required because you are not allowed to change the order of the operands inside the loop.
27 Sliced, reduction, and temporary variables Temporary variables The last special class of variables to consider are temporary variables. A temporary variable is a variable that we assign directly, without a reduction assignment:

parfor ii = 1:3
    d = X(ii);  % d is temporary
end

Temporary variables are effectively destroyed and recreated at every iteration. Moreover, they are not passed back to the client when the computation is over. The main consequence is the following:

d = 1;
parfor ii = 1:3
    d = 2;
end
display(d);  % d has value 1
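To keep a per-iteration result on the client, assign it to a sliced output instead of relying on a temporary (a minimal sketch):

```matlab
d_all = zeros(3, 1);
parfor ii = 1:3
    d = 2 * ii;      % d is temporary: destroyed after each iteration
    d_all(ii) = d;   % d_all is a sliced output, so the values survive
end
disp(d_all);         % prints [2; 4; 6]
```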
28 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
29 Parfor limitations Transparency All the variables inside a par-for loop must be transparent, i.e., they must be visible to the interpreter. As a consequence, the following set of actions is not allowed inside a par-for: Calling a function such as eval, because its input is a string, not a variable. This code will not execute:

my_variable = 3;
parfor ii = 1:4
    eval('my_variable');
end

Loading/saving a .mat file, or clearing any workspace variables. Calling an external script (although functions are allowed). Requesting any form of user input.
30 Parfor limitations Nested for-loops A par-for cannot be nested inside another par-for, but a for loop can. However, this has a set of limitations. Let us see two of them. First of all, the range of a nested loop must be a constant. The following will not work:

parfor ii = 1:3
    for jj = my_range(ii)
        [...]
    end
end

Additionally, the nested index variable cannot be used as part of an expression for indexing a sliced matrix. The following is invalid:

A = zeros(3, 3);
parfor ii = 1:3
    for jj = 1:3
        A(ii, jj+1) = ii + jj;
    end
end
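A version of the nested loop that respects both restrictions (illustrative): the inner range is a constant, and the sliced matrix is written one full row at a time, so the nested index never appears in an indexing expression for A:

```matlab
A = zeros(3, 3);
parfor ii = 1:3
    row = zeros(1, 3);   % build the row in a local temporary...
    for jj = 1:3
        row(jj) = ii + jj;
    end
    A(ii, :) = row;      % ...and assign it in a single sliced write
end
```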
31 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
32 Parfor Examples Example 1 - Batch Image Processing In this experiment, we perform some basic image processing on a batch of images. We can easily parallelize the code by splitting the images over the different workers. Source images for this example are available on the web: SOURCE.
33 Parfor Examples Example 1 - Serial version

d = 'data_GT';  % Directory of images
imgs = dir(d);  % Get a list of the files
% Remove '.' and '..' from the list
imgs = imgs(arrayfun(@(x) ~strcmp(x.name, '.') && ~strcmp(x.name, '..'), imgs));
imgs = {imgs.name}';  % Simply store the image names
N_imgs = length(imgs);
imgs_o = cell(N_imgs, 1);
for ii = 1:N_imgs
    im = imread(fullfile(d, imgs{ii}));  % Read the image
    imgs_o{ii} = rgb2gray(im);           % Convert to grayscale
    imgs_o{ii} = imadjust(imgs_o{ii});   % Enhance contrast
end
34 Parfor Examples Example 1 - Result (a) Before processing (b) After processing Figure: First image in the batch before and after processing.
35 Parfor Examples Example 1 - Parallel version

parfor ii = 1:N_imgs
    fprintf('Processing %s...\n', imgs{ii});
    im = imread(fullfile(d, imgs{ii}));
    imgs_o{ii} = rgb2gray(im);
    imgs_o{ii} = imadjust(imgs_o{ii});
end

It's the same! Note that imgs is a sliced input variable, imgs_o is a sliced output variable, and im is temporary. Look at the order of the printed messages: the images are no longer processed in alphabetical order.
36 Parfor Examples Example 2 - Computing the Mandelbrot set The following example is adapted from [Kep09]. The Mandelbrot set is the set of complex numbers c such that the following sequence remains bounded:

z_{i+1} = z_i^2 + c (3)

with initial value z_0 = 0. In the following, we sweep the region defined by Re(c) ∈ [−1.4, 0.6] and Im(c) ∈ [−1, 1]. Our method will take 3 parameters: N is the number of points on each dimension, max_iter is the number of iterations to perform, and epsilon is a tolerance.
37 Parfor Examples Example 2 - The code (serial version)

% N is the size of the image, max_iter the number of iterations,
% epsilon a tolerance
function W = mandelbrot_serial(N, max_iter, epsilon)
Z = zeros(N, N);
W = zeros(N, N);
indices = true(N, N);
steps = (1:N)';
[ReC, ImC] = meshgrid(steps'./(N/2) - 1.6, steps./(N/2) - 1);
C = ReC + 1i*ImC;
for i = 1:max_iter
    Z(indices) = Z(indices).*Z(indices) + C(indices);
    W(indices) = exp(-abs(Z(indices)));
    indices = W > epsilon;
end
end
38 Parfor Examples Example 2 - Output Figure: Weights of the Mandelbrot set, shown as an image with the gray colormap.
39 Parfor Examples Example 2 - The code (parallel version) We distribute the computation column-wise:

function W = mandelbrot_parallel_parfor(N, max_iter, epsilon)
W = zeros(N, N);
parfor i = 1:N
    Z = zeros(1, N);
    W_local = zeros(1, N);
    indices = true(1, N);
    steps = (1:N)';
    C_local = (steps'./(N/2) - 1.6) + 1i*(i./(N/2) - 1);
    for j = 1:max_iter
        Z(indices) = Z(indices).*Z(indices) + C_local(indices);
        W_local(indices) = exp(-abs(Z(indices)));
        indices = W_local > epsilon;
    end
    W(i, :) = W_local;
end
end
40 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
41 Single-program, multiple-data The SPMD construct The SPMD (Single-Program, Multiple-Data) construct is superficially similar to par-for, but it is largely different. In SPMD, there is a fixed number of workers (also called labs) that execute the same program. The differences are that each lab knows its id, and it can eventually communicate with all the others. Moreover, data on the workers is kept between SPMD commands. A simple operation on all the available workers:

spmd
    my_fcn(2);
end

We can also instantiate a given number of labs, e.g. 2:

spmd(2)
    my_fcn(2);
end
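Inside the block, labindex and numlabs let each worker behave differently (a minimal sketch, assuming a pool is open):

```matlab
spmd
    % Each lab reports its identity; numlabs is the total number of labs
    fprintf('I am lab %d of %d.\n', labindex, numlabs);
end
```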
42 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
43 Example: parallel restarts in optimization Description of the example As an example of SPMD programming, consider the following. We want to minimize the function:

f(x) = cos(3πx)/x

We will use a steepest-descent algorithm (SDA) [BV04]. This procedure, however, may converge to a local minimum, hence we want to run it starting from several different points. In particular, we will run it twice in parallel using SPMD: one time starting in [0, 1], the second time starting in [1, 2].
44 Example: parallel restarts in optimization Steepest-descent optimization (recall) We recall some concepts from unconstrained optimization. We want to minimize a function f(x), x ∈ R^d, on the whole domain, under the assumption that f is continuous and differentiable. Starting from a point x_0, SDA updates the estimate iteratively as:

x_{i+1} = x_i − µ∇f(x_i) (4)

where ∇f(x_i) is the gradient of f evaluated in x_i, and µ is a parameter known as the step size. Supposing that µ satisfies the Wolfe conditions at every iteration [BV04], Eq. (4) will converge to a local minimum [BV04]. For simplicity, we will consider a small, fixed µ in our implementation.
45 Example: parallel restarts in optimization Steepest-descent code For computing the derivative, we use the Symbolic Toolbox of MATLAB. In the code, f is a symbolic expression, x0 is the starting point, step_size is the step size, and tol is a tolerance to use as stopping criterion:

function x_best = grad_descent(f, x0, step_size, tol)
f_prime = diff(f);  % Derivative of f
gap = Inf;          % We exit whenever gap < tol
x_best = x0;        % Current best estimate
while gap > tol
    descent_dir = -double(f_prime(x_best));
    x_new = x_best + step_size * descent_dir;
    gap = abs(x_new - x_best);
    x_best = x_new;
end
fprintf('Minimum reached: %.2f\n', double(f(x_best)));
end
46 Example: parallel restarts in optimization SPMD code

syms x;
f = symfun(cos(3*pi*x)./x, x);
matlabpool open
spmd
    % labindex returns the id of the current worker
    x_best = grad_descent(f, (labindex - 1) + rand(), 0.01, 10^-4);
end
matlabpool close

The output on the console:

Starting matlabpool using the local profile... connected to 2 workers.
Lab 1: Minimum reached:
Lab 2: Minimum reached:
Sending a stop signal to all the workers... stopped.
47 Example: parallel restarts in optimization The result Figure: Result of running in parallel an SDA procedure from two different starting points.
48 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
49 Composites and distributed arrays Composites In the previous example, x_best is a Composite object. A composite is similar to an N-dimensional cell array, where the i-th element is the result of the computation on the i-th lab. You can create a composite outside an SPMD block:

a = Composite();

You can access and set each element using curly brackets:

a = Composite(2);
spmd(2)
    a = my_fcn();
end
b = a{1};  % access value from lab 1
a{2} = 1;  % set value on lab 2 (for successive spmd calls)
50 Composites and distributed arrays Creating a distributed array An important aspect of SPMD is the possibility of distributing a single array over the workers. This can be done in three different ways: Calling the distributed function with an existing array. Using an overloaded constructor with the distributed flag. Creating the distributed array directly on the workers (in this case, it is called codistributed). A large set of MATLAB functions are overloaded to work on distributed arrays: using-matlab-functions-on-codistributed-arrays.html
51 Composites and distributed arrays Distributing an existing array Create a matrix, distribute it, and perform an inversion on the workers:

A = randn(500, 500);
A = distributed(A);
Ainv = inv(A);

Note that Ainv is implicitly distributed, which can be seen by analyzing the workspace:

>> whos
Name    Size      Bytes  Class        Attributes
A       500x500          distributed
Ainv    500x500          distributed

We can collect the array on the client using the gather function:

Ainv = gather(Ainv);
52 Composites and distributed arrays Overloaded constructors In the previous example, the computation is parallelized implicitly. Here is an equivalent version, using the distributed constructor for randn, and performing the computation inside an SPMD block:

A = distributed.randn(500, 500);
spmd
    Ainv = inv(A);
end

In cluster configurations, this allows you to construct arrays that are bigger than the memory limit of a single machine. You can change the way in which the array is partitioned using a codistributor object:
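For instance, the array can be partitioned along the rows instead of the default last dimension (a sketch; codistributor1d takes the distribution dimension as its argument):

```matlab
spmd
    % Distribute the 500 rows across the labs instead of the columns
    codist = codistributor1d(1);                 % dimension 1 = rows
    A = codistributed.randn(500, 500, codist);
    size(getLocalPart(A))                        % each lab holds a block of rows
end
```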
53 Composites and distributed arrays Codistributed arrays A distributed array can also be created inside an SPMD block. In this case, it is called a codistributed array. Let us reformulate the previous example:

spmd
    A = codistributed.randn(500, 500);
    Ainv = inv(A);
end

Inside a lab, you can obtain the local portion of the array using the getLocalPart function:

spmd
    A = codistributed.randn(500, 500);
    A_local = getLocalPart(A);
end
54 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
55 Communication functions Worker communication SPMD allows workers to communicate through the use of a simple set of functions: labBarrier blocks the execution until all the workers have reached the given point. labBroadcast sends a value from a worker to all the others. labSend sends a value from a worker to another. labReceive receives a value from another worker. And a few others: task-control-and-worker-communication.html We show the use of these functions by implementing a simple Diffusion Least Mean-Square filter [LS08].
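A minimal sketch of the point-to-point primitives: lab 1 sends a value to lab 2, and a barrier keeps all the labs synchronized afterwards (assumes a pool of at least 2 workers):

```matlab
spmd
    if labindex == 1
        labSend(pi, 2);         % send the value pi to lab 2
    elseif labindex == 2
        v = labReceive(1);      % blocks until the message from lab 1 arrives
        fprintf('Lab 2 received %.4f\n', v);
    end
    labBarrier;                 % everyone waits here before continuing
end
```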
56 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
57 Example: a simple diffusion LMS Least mean-square filter (recall) Given an input buffer x_i, i = 1,...,n, and a desired output d_i, i = 1,...,n, we want a linear model of the form f(x) = w^T x. Starting from an initial estimate w_0, at every iteration we apply a steepest-descent approach, approximating the overall error with the instantaneous error e_i = d_i − f(x_i). The resulting filter is called the Least Mean-Square (LMS) adaptive filter:

w_{i+1} = w_i + µ e_i x_i (5)
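A serial sketch directly implementing Eq. (5) on synthetic data (the "true" system wo, the step size, and the noise level are illustrative choices, not from the slides):

```matlab
% Identify a linear system d = wo' * x with the LMS rule of Eq. (5)
wo = [0.5; -0.3; 0.8];             % illustrative "true" weights
N = 5000; mu = 0.01;
X = randn(N, 3);
d = X * wo + 0.01 * randn(N, 1);   % desired output with small noise
w = zeros(3, 1);
for i = 1:N
    e = d(i) - X(i, :) * w;        % instantaneous error e_i
    w = w + mu * e * X(i, :)';     % w_{i+1} = w_i + mu * e_i * x_i
end
% After N iterations, w should be close to wo
```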
58 Example: a simple diffusion LMS Diffusion LMS Consider a network of N agents. Every agent has its own version of the input signal, and its own estimate of the optimal weights. In particular, at time instant i, the k-th agent receives an input buffer x_{k,i}, it can communicate with a set of neighbors N_k, and it has an estimate of the weights given by w_{k,i}. A diffused version of the LMS is obtained as follows: 1 Each agent constructs a weighted sum of the estimates of its neighbors: φ_{k,i} = Σ_{l ∈ N_k} s_{kl} w_{l,i}. The weights s_{kl} are called trusting factors, and they are set in the beginning (for simplicity). 2 Then, it applies an LMS rule to the new weight vector φ_{k,i}, using its version of the input. See a full discussion in [LS08].
59 Example: a simple diffusion LMS The code For simplicity, we suppose that: All the N agents are connected to all the others. The step size at every node is drawn uniformly in [0, 0.01]. The trust weights are all equal to s = 1/N. The agents differ only in the initialization of the weight vector and in their specific input signal. The input signal is white noise. There is random noise on the output, extracted from a Gaussian distribution with zero mean and variance σ². Removing the constraint on the connectivity is left as an exercise.
60 Example: a simple diffusion LMS The code /2 We start by initializing all the required values:

% Define the system to be identified
wo = [ ; ; 0.96; 0.02; 0.13];
n_agents = 2;
N = 10000;
matlabpool('open', n_agents);
m = length(wo);
equal_trust = 1/n_agents;
spmd
    step_size = rand * 0.01;
    x = randn(N, m);
    y = x * wo + randn(N, 1) * ;
    wopt = rand(m, 1);
    err = zeros(N, 1);
end
61 Example: a simple diffusion LMS The code /3 We apply the diffused rule at every instant:

spmd
    for i = 1:N
        % Combine the weights
        phi = equal_trust * wopt;
        labSend(wopt, find(1:n_agents ~= labindex));
        for j = 1:n_agents
            if (j ~= labindex)
                phi = phi + equal_trust * labReceive(j);
            end
        end
        % Adapt the filter
        err(i) = y(i) - x(i, :) * phi;
        wopt = phi + step_size * err(i) * x(i, :)';
    end
end
matlabpool('close');
62 Example: a simple diffusion LMS Testing the diffused LMS (6 agents, no diffusion) Figure: Evolution of the MSE in a network with no cooperation.
63 Example: a simple diffusion LMS Testing the diffused LMS (6 agents, with diffusion) Figure: Evolution of the MSE in a network with diffusion.
64 Overview 1 BASIC CONCEPTS Basic Ideas of Parallel Computing Introduction to the Parallel Computing Toolbox 2 PARALLEL-FOR Understanding par-for Sliced, reduction, and temporary variables Parfor limitations Parfor Examples 3 SPMD Single-program, multiple-data Example: parallel restarts in optimization Composites and distributed arrays 4 COMMUNICATION Communication functions Example: a simple diffusion LMS Debugging and Profiling 5 REFERENCES
65 Debugging and Profiling Interactive mode Debugging parallel applications can be complicated. MATLAB offers the possibility of running an interactive parallel mode, to see the output of a given set of commands on multiple workers. We can start it using:

>> pmode start

We enter the desired command in the interactive window:

P>> wopt = ones(d, 1) * labindex;
66 Debugging and Profiling Interactive debugging
67 Debugging and Profiling Parallel profiler A second tool for debugging is the parallel profiler, which provides information on the functions called inside each SPMD block at the worker level. In an SPMD block, the profiler is started as:

mpiprofile on -detail builtin

Then, after the parallel code, but before exiting the SPMD block, you can save the information and close the profiler:

info = mpiprofile('info');
mpiprofile off

Finally, to view the information (e.g., for worker 1):

mpiprofile('viewer', info{1});
68 Debugging and Profiling Parallel Profiler (Example 1/5) We reformulate the main SPMD block as a function:

function err = dlms_local(x, y, wopt, step_size, equal_trust, n_agents)
N = length(y);
err = zeros(N, 1);
for i = 1:N
    % Combine the weights
    phi = equal_trust * wopt;
    labSend(wopt, find(1:n_agents ~= labindex));
    for j = 1:n_agents
        if (j ~= labindex)
            phi = phi + equal_trust * labReceive(j);
        end
    end
    % Adapt the filter
    err(i) = y(i) - x(i, :) * phi;
    wopt = phi + step_size * err(i) * x(i, :)';
end
end
69 Debugging and Profiling Parallel Profiler (Example 2/5) Now we can call the profiler to monitor the function:

spmd
    mpiprofile on -detail builtin
    err = dlms_local(x, y, wopt, step_size, equal_trust, n_agents);
    info = mpiprofile('info');
    mpiprofile off
end
mpiprofile('viewer', info{1});  % Get information on worker 1

Note that we are also tracking the built-in functions of MATLAB.
70 Debugging and Profiling Parallel Profiler (Example 3/5)
71 Debugging and Profiling Parallel Profiler (Example 4/5)
72 Debugging and Profiling Parallel Profiler (Example 5/5) There is probably a better way to implement this... You can also combine the parallel profiler with the interactive mode to get information on the full cluster.
73 Bibliography I

Stephen P. Boyd and Lieven Vandenberghe, Convex optimization, Cambridge University Press.

Jeremy Kepner, Parallel MATLAB for multicore and multinode computers, vol. 21, SIAM.

C. G. Lopes and A. H. Sayed, Diffusion least-mean squares over adaptive networks: formulation and performance analysis, IEEE Transactions on Signal Processing 56 (2008), no. 7.

Igor L. Markov, Limits on fundamental limits to computation, Nature 512 (2014), no. 7513.