Introduction to SWARM Software and Algorithms for Running on Multicore Processors


David A. Bader
Georgia Institute of Technology

Tutorial compiled by Rucheek H. Sangani
M.S. Student, College of Computing, Georgia Institute of Technology

SWARM is a portable open source library of basic primitives that provides a framework for designing algorithms on multicore systems. Using this framework, we have implemented efficient parallel algorithms for important primitive operations such as prefix-sums, pointer-jumping, symmetry breaking, and list ranking; combinatorial problems such as sorting and selection; parallel graph theoretic algorithms such as spanning tree, minimum spanning tree, graph decomposition, and tree contraction; and computational genomics applications such as maximum parsimony. This documentation describes the variables, functions and macros that can be used as APIs for developing parallel codes, and is supported by the explanation of an example code.

Index of Contents

1. Introduction
   1.1 Motivation for SWARM
   1.2 What is SWARM
2. SWARM Application Programming Interface Tutorial
   2.1 SWARM Program Structure
   2.2 SWARM Variables and Predefines
   2.3 SWARM Functions and Primitives
       2.3.1 Initialization, Execution and Termination Functions
       2.3.2 Barrier Functions
       2.3.3 Memory Management Functions
       2.3.4 Broadcast Functions
       2.3.5 Replicate Functions
       2.3.6 Reduce Functions
       2.3.7 Scan Functions
   2.4 SWARM Macros
3. Working with the Example Code
   3.1 Standard Deviation
   3.2 SWARM for Standard Deviation Calculation
   3.3 Example Code itself
   3.4 Step by Step Explanation of the Code
4. Conclusion

1. Introduction

1.1 Motivation for SWARM

Since the inception of the desktop computer, software performance has improved at an exponential rate, driven primarily by rapid growth in processing power. The performance of an algorithm kept improving simply through the arrival of new and faster processors. However, we can no longer rely solely on Moore's law for performance improvements. Fundamental physical limitations such as the size of the transistor and power constraints have now necessitated a radical change in commodity microprocessor architecture to multicore designs. Dual and quad-core processors are slowly and steadily finding their way into desktops and laptops. Software developers and programmers are now required to exploit this concurrency at the algorithmic level.

1.2 What is SWARM?

SWARM (SoftWare and Algorithms for Running on Multicore) has been introduced as an open source parallel programming framework. It is a library of primitives that fully exploit multicore processors. The SWARM programming framework is a descendant of the symmetric multiprocessor (SMP) node library component of SIMPLE (Journal of Parallel and Distributed Computing, 58(1):92-108, 1999).

SWARM is built on POSIX threads and allows the user to use either the already developed primitives or direct thread primitives. SWARM has constructs for parallelization, restricting control of threads, allocation and de-allocation of shared memory, and communication primitives for synchronization, replication and broadcasting. The framework has been successfully used to implement efficient parallel versions of primitive algorithms, viz. list ranking, prefix sums, symmetry breaking, etc.

In order to use the SWARM library, the programmer needs to make minimal modifications to existing sequential code. After identifying compute-intensive routines in the program, work can be assigned to each core using an efficient multicore algorithm. Independent operations such as those arising in functional parallelism or loop parallelism can typically be threaded. For functional parallelism, this means that each thread acts as a functional process for that task, and for loop parallelism, each thread computes its portion of the computation concurrently. Note that it might be necessary to apply loop transformations to reduce data dependencies between threads.

SWARM contains efficient implementations of commonly used primitives in parallel programming. These computation and communication primitives are discussed below, followed by their usage in an example code.

2. SWARM Application Programming Interface Tutorial

2.1 SWARM Program Structure

Before beginning with the SWARM API functions and variables, we first present the structure of a typical SWARM program. This will help in understanding the functions and their usage described later. The code executes the function routine in parallel, synchronizes all threads at a barrier, and then prints, from each thread, the id of that thread. 8 threads/cores are assumed for all the examples presented in this section.

    #include <stdio.h>
    #include <swarm.h>

    static void routine(THREADED)
    {
        SWARM_Barrier_sync(TH);
        printf("My thread id = %d, Total threads = %d\n", MYTHREAD, THREADS);
    }

    int main(int argc, char **argv)
    {
        SWARM_Init(&argc, &argv);
        /* sequential code */
        /* parallelize a routine using SWARM */
        SWARM_Run(routine);
        /* more sequential code */
        /* Terminate cleanly */
        SWARM_Finalize();
        return 0;
    }

Corresponding output (assuming 8 threads):

    My thread id = 2, Total threads = 8
    My thread id = 5, Total threads = 8
    My thread id = 0, Total threads = 8
    My thread id = 1, Total threads = 8
    My thread id = 7, Total threads = 8
    My thread id = 4, Total threads = 8
    My thread id = 3, Total threads = 8
    My thread id = 6, Total threads = 8

The order in which the threads print varies between executions.

2.2 SWARM Variables and Predefines

THREADED: THREADED is defined to be a structure that contains all the required information for a particular thread. The parallel routine must be invoked with THREADED as the argument.

TH: TH is an instance of THREADED. As seen in the above code, TH is passed as an argument to SWARM_Barrier_sync. The function that accepts TH is declared as SWARM_Barrier_sync(THREADED), which can be seen in the swarm.h header file. From the programming point of view, TH and THREADED need little attention, as they have been declared for internal implementation purposes.

MYTHREAD: Provides the thread id of the thread that evaluates it. MYTHREAD is printed by each thread in the test code above.

THREADS: Specifies the total number of threads executing in parallel.

Both THREADS and MYTHREAD are useful variables from the programmer's point of view. The figure below explains the above concepts.

[Figure: each thread (0, 1, ..., 7) holds its own instance TH of struct THREADED carrying its MYTHREAD value; the global extern int THREADS = 8 is common to all threads.]

2.3 SWARM Functions and Primitives

2.3.1 Initialization, Execution and Termination Functions

These functions are called from the main of the program. As replicated in the code below, most SWARM applications will have a main structured around these three functions.

    int main(int argc, char **argv)
    {
        SWARM_Init(&argc, &argv);
        /* sequential code */
        /* parallelize a routine using SWARM */
        SWARM_Run(routine);
        /* more sequential code */
        /* Terminate cleanly */
        SWARM_Finalize();
        return 0;
    }

Each of them is further described below.

SWARM_Init(&argc, &argv)
This function is responsible for initializing the parallel environment and allocating the requisite memory. It looks at the number of processors available on the machine and sets up the threads accordingly. The user can also specify the number of threads by using the -t option.

SWARM_Run((void *)routine)
Takes the routine that we want to parallelize across the threads. Thread creation and execution take place here.

SWARM_Finalize()
Performs the clean-up task by freeing the allocated memory.

2.3.2 Barrier Functions

Barrier functions are used to synchronize execution across the threads. The threads execute asynchronously in parallel until a barrier function is called. When a thread reaches the barrier, it waits until all the other threads have reached the same point, thus synchronizing all the threads. Once synchronized, all threads resume executing asynchronously in parallel.

[Figure: timeline of THREAD 0, THREAD 1, ..., THREAD 7, each alternating between parallel code and Barrier() calls; at every Barrier() a thread waits for the other threads to reach the same point.]

void SWARM_Barrier()
By default SWARM_Barrier uses SWARM_Barrier_sync, which is described next.

Usage:

    static void routine(THREADED)
    {
        /* parallel code */
        /* use the SWARM Barrier for synchronization */
        SWARM_Barrier();
        /* more parallel code */
    }

void SWARM_Barrier_sync(THREADED)
This function achieves thread synchronization using a conditional wait on a mutex lock until all threads have arrived.

Usage:

    static void routine(THREADED)
    {
        /* parallel code */
        /* use the SWARM Barrier for synchronization */
        SWARM_Barrier_sync(TH);
        /* more parallel code */
    }

void SWARM_Barrier_tree(THREADED)
This function achieves thread synchronization using shared buffers. Threads are viewed as a binary tree based on their thread id. Each thread sends a message to its parent thread when it invokes the barrier. This proceeds bottom-up until the root detects completion of the barrier. The root then releases the threads for further execution in top-down order.

Usage:

    static void routine(THREADED)
    {
        /* parallel code */
        /* use the SWARM Barrier for synchronization */
        SWARM_Barrier_tree(TH);
        /* more parallel code */
    }
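The mutex-plus-conditional-wait idea described for SWARM_Barrier_sync can be sketched directly on POSIX threads. This is a minimal illustration, not SWARM's implementation: the names cv_barrier, cv_barrier_wait, barrier_demo and DEMO_MAX are all hypothetical and introduced only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical reusable barrier built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_here;
    int threads;      /* number of participating threads */
    int waiting;      /* threads that have arrived in the current phase */
    unsigned phase;   /* incremented each time the barrier opens */
} cv_barrier;

void cv_barrier_init(cv_barrier *b, int threads)
{
    assert(threads > 0);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_here, NULL);
    b->threads = threads;
    b->waiting = 0;
    b->phase = 0;
}

void cv_barrier_destroy(cv_barrier *b)
{
    pthread_cond_destroy(&b->all_here);
    pthread_mutex_destroy(&b->lock);
}

void cv_barrier_wait(cv_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    unsigned my_phase = b->phase;
    if (++b->waiting == b->threads) {
        /* Last arrival: reset the count and release everyone. */
        b->waiting = 0;
        b->phase++;
        pthread_cond_broadcast(&b->all_here);
    } else {
        /* The loop guards against spurious wakeups. */
        while (b->phase == my_phase)
            pthread_cond_wait(&b->all_here, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

#define DEMO_MAX 16
static cv_barrier demo_barrier;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int arrived;             /* threads that finished the first phase */
static int saw_all[DEMO_MAX];   /* per-thread check made after the barrier */

static void *demo_worker(void *arg)
{
    int id = *(int *)arg;
    pthread_mutex_lock(&count_lock);
    arrived++;
    pthread_mutex_unlock(&count_lock);
    cv_barrier_wait(&demo_barrier);
    /* Past the barrier, every thread must already have incremented arrived. */
    pthread_mutex_lock(&count_lock);
    saw_all[id] = (arrived == demo_barrier.threads);
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Returns how many threads observed the full count after the barrier,
 * or -1 if the thread count is out of range. */
int barrier_demo(int threads)
{
    pthread_t tid[DEMO_MAX];
    int ids[DEMO_MAX], ok = 0;
    if (threads < 1 || threads > DEMO_MAX)
        return -1;
    arrived = 0;
    cv_barrier_init(&demo_barrier, threads);
    for (int t = 0; t < threads; t++) {
        ids[t] = t;
        pthread_create(&tid[t], NULL, demo_worker, &ids[t]);
    }
    for (int t = 0; t < threads; t++) {
        pthread_join(tid[t], NULL);
        ok += saw_all[t];
    }
    cv_barrier_destroy(&demo_barrier);
    return ok;
}
```

Calling barrier_demo(8) should report that all 8 threads saw the complete count, which is the guarantee a barrier provides. The phase counter is one common way to make such a barrier safely reusable across consecutive calls.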

2.3.3 Memory Management Functions

These are wrapper functions for dynamic memory allocation and release of shared structures. Allocation of memory is done globally, i.e. all threads will access the same memory location.

[Figure: THREAD 0, THREAD 1, ..., THREAD 7 each hold a local pointer A to the same global memory location.]

void *SWARM_malloc(int bytes, THREADED)
Dynamic memory allocation for the specified number of bytes. The underlying implementation ensures that the allocation is done on only one of the threads. The return type is a void pointer, which can be cast to the desired type.

Usage:

    static void routine(THREADED)
    {
        int *A;
        /* example: allocate a shared array of size n */
        A = (int *)SWARM_malloc(n * sizeof(int), TH);
    }

void *SWARM_malloc_l(long bytes, THREADED)
Similar to SWARM_malloc, except that it is useful for allocating amounts of memory too large to be specified by an int. Thus, if sizeof(int) is 4 bytes and we want to allocate more than 2^31 - 1 bytes (that is, 2 GB of memory or more), SWARM_malloc_l should be used.

Usage:

    static void routine(THREADED)
    {
        int *A;
        /* example: allocate a shared array of size n */
        A = (int *)SWARM_malloc_l(n * sizeof(int), TH);
    }

void SWARM_free(void *ptr, THREADED)
Used for de-allocating dynamically allocated memory. The underlying implementation ensures that the de-allocation is done on only one of the threads.

Usage:

    static void routine(THREADED)
    {
        int *A;
        A = (int *)SWARM_malloc(n * sizeof(int), TH);
        /* Free the memory allocated for A */
        SWARM_free(A, TH);
    }
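Whether a request still fits the int parameter of SWARM_malloc can be checked before choosing between the two allocators. The helper below is a small sketch under that assumption; fits_in_int is a hypothetical name and is not part of SWARM.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* True when count * elem_size bytes can be represented in a signed int.
 * The division avoids overflowing the multiplication itself. */
bool fits_in_int(size_t count, size_t elem_size)
{
    if (elem_size == 0)
        return true;
    return count <= (size_t)INT_MAX / elem_size;
}

/* Example policy: pick the long-sized allocator only when needed.
 * Returns 0 for "int-sized request", 1 for "long-sized request". */
int allocator_choice(size_t count, size_t elem_size)
{
    assert(elem_size > 0);
    return fits_in_int(count, elem_size) ? 0 : 1;
}
```

With a 4-byte int, INT_MAX is 2^31 - 1, so a request for INT_MAX elements of 2 bytes each falls on the long side of the check.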

2.3.4 Broadcast Functions

There are many situations where a value or a memory location on one of the threads or cores needs to be distributed to the other cores. Broadcast functions serve this purpose.

[Figure: broadcast copying pointer content (A) and broadcast copying a data value (i) across THREAD 0, THREAD 1, ..., THREAD 7.]

int SWARM_Bcast_i(int myval, THREADED)
Used to broadcast integer values to all the cores.

Usage:

    static void routine(THREADED)
    {
        int i = 5;
        on_one_thread {
            i = 10;
            printf("Before broadcast:\n\n");
        }
        printf("Value of i = %d on thread %d\n", i, MYTHREAD);
        SWARM_Barrier();
        /* Broadcasting value of i to all cores */
        i = SWARM_Bcast_i(i, TH);
        on_one_thread {
            printf("\n\n");
            printf("After broadcast:\n\n");
        }
        printf("Value of i = %d on thread %d\n", i, MYTHREAD);
    }

Output for the above code:

    Before broadcast:
    Value of i = 5 on thread 1
    Value of i = 10 on thread 0
    Value of i = 5 on thread 7
    Value of i = 5 on thread 2
    Value of i = 5 on thread 3
    Value of i = 5 on thread 4
    Value of i = 5 on thread 6
    Value of i = 5 on thread 5

    After broadcast:
    Value of i = 10 on thread 1
    Value of i = 10 on thread 0
    Value of i = 10 on thread 7
    Value of i = 10 on thread 2
    Value of i = 10 on thread 3
    Value of i = 10 on thread 4
    Value of i = 10 on thread 6
    Value of i = 10 on thread 5

int SWARM_Bcast_from_i(int myval, int source, THREADED)
The broadcast routine described previously broadcasts the value held on thread/core 0. When a value must be copied from some other specific thread, use the Bcast_from function. Functionally it is the same as Bcast, except that the source thread id is given as an argument.

long SWARM_Bcast_l(long myval, THREADED)
Broadcast values of type long to all cores.

long SWARM_Bcast_from_l(long myval, int source, THREADED)
Broadcast values of type long from a specific core to all cores.

double SWARM_Bcast_d(double myval, THREADED)
Broadcast values of type double to all cores.

double SWARM_Bcast_from_d(double myval, int source, THREADED)
Broadcast values of type double from a specific core to all cores.

char SWARM_Bcast_c(char myval, THREADED)
Broadcast char values to all cores.

char SWARM_Bcast_from_c(char myval, int source, THREADED)
Broadcast char values from a specific core to all cores.

int *SWARM_Bcast_ip(int *myval, THREADED)
Used to provide each processing core with the address of a shared buffer. Care should be taken to ensure that the buffer whose address is being broadcast is shared.

Usage:

    static void routine(THREADED)
    {
        int *a = (int *)SWARM_malloc(sizeof(int), TH);
        int *b = (int *)SWARM_malloc(sizeof(int), TH);
        int *c;
        *a = 5;
        *b = 10;
        c = a;
        SWARM_Barrier();
        on_one_thread {
            printf("Before broadcast:\n\n");
            c = b; /* b is shared */
        }
        printf("Value of *c = %d on thread %d\n", *c, MYTHREAD);
        /* Broadcasting address of b to all cores */
        c = SWARM_Bcast_ip(c, TH);
        on_one_thread {
            printf("\n\n");
            printf("After broadcast:\n\n");
        }
        SWARM_Barrier();
        printf("Value of *c = %d on thread %d\n", *c, MYTHREAD);
    }

Output for the above code:

    Before broadcast:
    Value of *c = 5 on thread 1
    Value of *c = 10 on thread 0
    Value of *c = 5 on thread 7
    Value of *c = 5 on thread 2
    Value of *c = 5 on thread 3
    Value of *c = 5 on thread 4
    Value of *c = 5 on thread 6
    Value of *c = 5 on thread 5

    After broadcast:
    Value of *c = 10 on thread 1
    Value of *c = 10 on thread 0
    Value of *c = 10 on thread 7
    Value of *c = 10 on thread 2
    Value of *c = 10 on thread 3
    Value of *c = 10 on thread 4
    Value of *c = 10 on thread 6
    Value of *c = 10 on thread 5

int *SWARM_Bcast_from_ip(int *myval, int source, THREADED)
Broadcast a pointer to an integer from a specific core to all cores.

long *SWARM_Bcast_lp(long *myval, THREADED)
Broadcast a pointer to a long integer.

long *SWARM_Bcast_from_lp(long *myval, int source, THREADED)
Broadcast a pointer to a long integer from a specific core to all cores.

double *SWARM_Bcast_dp(double *myval, THREADED)
Broadcast a pointer to a double.

double *SWARM_Bcast_from_dp(double *myval, int source, THREADED)
Broadcast a pointer to a double from a specific core to all cores.

char *SWARM_Bcast_cp(char *myval, THREADED)
Broadcast a pointer to a character.

char *SWARM_Bcast_from_cp(char *myval, int source, THREADED)
Broadcast a pointer to a character from a specific core to all cores.
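The semantics of a value broadcast (every thread calls the function, and every thread returns with the source thread's value) can be sketched with plain POSIX threads. This is an illustration under stated assumptions, not how SWARM implements it: shared_slot, bcast_from_int and run_broadcast_demo are hypothetical names, and the sketch relies on pthread_barrier_t, which is an optional POSIX feature (available on Linux).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 8

static pthread_barrier_t bar;
static int shared_slot;        /* hypothetical shared staging cell */
static int result[NTHREADS];   /* value each thread ends up holding */

/* Every thread passes its own value; all return the source thread's value. */
static int bcast_from_int(int myval, int source, int mythread)
{
    assert(source >= 0 && source < NTHREADS);
    if (mythread == source)
        shared_slot = myval;       /* publish the source value */
    pthread_barrier_wait(&bar);    /* the value is now visible to everyone */
    int v = shared_slot;
    pthread_barrier_wait(&bar);    /* keep the slot stable until all have read */
    return v;
}

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    int i = (id == 0) ? 10 : 5;    /* mirrors the tutorial's integer example */
    result[id] = bcast_from_int(i, 0, id);
    return NULL;
}

/* Returns 1 when every thread holds the value 10 after the broadcast. */
int run_broadcast_demo(void)
{
    pthread_t tid[NTHREADS];
    pthread_barrier_init(&bar, NULL, NTHREADS);
    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)(intptr_t)t);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    pthread_barrier_destroy(&bar);

    int same = 1;
    for (int t = 0; t < NTHREADS; t++)
        same &= (result[t] == 10);
    return same;
}
```

The second barrier is a design choice for reuse: it prevents a later broadcast from overwriting the slot while a slow thread is still reading the earlier value.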

2.3.5 Replicate Functions

The basic difference between replicate and broadcast is that, while broadcast copies the value into pre-existing memory locations on all cores, replicate allocates memory during the function call and creates a replica of the object passed by the calling thread.

[Figure: the contents of A are replicated from one thread to THREAD 0, THREAD 1, ..., THREAD 7, with memory allocated on each receiving thread.]

void *SWARM_Replicate(void *myval, int source, int bytes, THREADED)
The replicate function takes a pointer to the object to be replicated, the thread/core id of the source thread, and the size of the object to be replicated. The returned pointer must be cast to the type of the object that was passed.

Usage:

    static void routine(THREADED)
    {
        double *a = NULL;
        int size = 0;
        on_thread(1) {
            a = (double *)malloc(3 * sizeof(double));
            a[0] = 9.99;
            a[1] = 8.88;
            a[2] = 7.77;
            size = sizeof(double) * 3;
            printf("Before Replicate\n\n");
            printf("On thread %d values = %lf %lf %lf\n", MYTHREAD, *a, *(a+1), *(a+2));
        }
        /* Replicating the double object 'a' on to all cores */
        a = (double *)SWARM_Replicate(a, 1, size, TH);
        SWARM_Barrier();
        on_one_thread {
            printf("\n\n");
            printf("After Replicate\n\n");
        }
        SWARM_Barrier();
        printf("On thread %d values = %lf %lf %lf\n", MYTHREAD, *a, *(a+1), *(a+2));
    }

Output:

    Before Replicate
    On thread 1 values =

    After Replicate
    On thread 0 values =
    On thread 1 values =
    On thread 2 values =
    On thread 3 values =
    On thread 4 values =
    On thread 5 values =
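The allocate-then-copy behaviour that distinguishes replicate from broadcast can be sketched for a single receiver as follows. replicate_copy is a hypothetical helper written for this illustration; it shows only the per-thread step and none of the coordination between threads that a real replicate primitive needs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate fresh storage and copy `bytes` bytes from `src` into it.
 * Returns NULL when there is nothing to copy or allocation fails.
 * The caller owns the returned block and must free() it. */
void *replicate_copy(const void *src, size_t bytes)
{
    if (src == NULL || bytes == 0)
        return NULL;
    void *copy = malloc(bytes);
    if (copy != NULL)
        memcpy(copy, src, bytes);
    return copy;
}

/* Small self-check: a replica is equal in content but independent in storage. */
int replica_is_independent(void)
{
    double a[3] = { 9.99, 8.88, 7.77 };
    double *b = replicate_copy(a, sizeof a);
    assert(b != NULL);
    int equal_before = (memcmp(a, b, sizeof a) == 0);
    b[0] = 1.0;                       /* modify only the replica */
    int source_untouched = (a[0] == 9.99);
    free(b);
    return equal_before && source_untouched;
}
```

Because each receiver gets its own storage, later writes to a replica do not affect the source object, unlike a broadcast pointer to a shared buffer.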

2.3.6 Reduce Functions

These are a set of functions used to obtain the sum, maximum or minimum of values calculated across different threads. Each thread provides the value in its local copy to this primitive, and the operation specified by the op argument is performed on these values. op can take the values SUM, MAX or MIN, depending on the task to be performed.

int SWARM_Reduce_i(int myval, reduce_t op, THREADED)
Used for integer-valued operations. The value provided by each thread is of type int.

Usage:

    static void routine(THREADED)
    {
        int sum = 0;
        sum = SWARM_Reduce_i(MYTHREAD, SUM, TH);
        printf("Value of sum = %d on thread %d\n", sum, MYTHREAD);
    }

Output:

    Value of sum = 28 on thread 0
    Value of sum = 28 on thread 1
    Value of sum = 28 on thread 2
    Value of sum = 28 on thread 3
    Value of sum = 28 on thread 4
    Value of sum = 28 on thread 5
    Value of sum = 28 on thread 6
    Value of sum = 28 on thread 7

long SWARM_Reduce_l(long myval, reduce_t op, THREADED)
Perform the operation on long values.

double SWARM_Reduce_d(double myval, reduce_t op, THREADED)
Perform the operation on floating point values.
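What a reduce computes can be checked sequentially by treating an array as the per-thread contributions. The sketch below is not SWARM code: reduce_op, OP_SUM, OP_MAX, OP_MIN and reduce_int are hypothetical names standing in for reduce_t and its SUM, MAX and MIN values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the reduction operators. */
typedef enum { OP_SUM, OP_MAX, OP_MIN } reduce_op;

/* Combine one value per "thread" into a single result.
 * vals[t] plays the role of the myval argument supplied by thread t. */
int reduce_int(const int *vals, size_t threads, reduce_op op)
{
    assert(vals != NULL && threads > 0);
    int acc = vals[0];
    for (size_t t = 1; t < threads; t++) {
        switch (op) {
        case OP_SUM: acc += vals[t]; break;
        case OP_MAX: if (vals[t] > acc) acc = vals[t]; break;
        case OP_MIN: if (vals[t] < acc) acc = vals[t]; break;
        }
    }
    return acc;
}
```

With the thread ids 0 through 7 as contributions, the SUM reduction gives 0 + 1 + ... + 7 = 28, the value every thread prints in the usage example; in the real primitive each thread receives that same result.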

2.3.7 Scan Functions

Scan functions perform a task similar to Reduce functions. However, these are prefix operations, in the sense that the output on each thread is based only on its own value and the values of the threads before it (i.e. those threads having a smaller thread id).

int SWARM_Scan_i(int myval, reduce_t op, THREADED)
The value provided by each thread is of type int.

Usage:

    static void routine(THREADED)
    {
        int sum = 0;
        sum = SWARM_Scan_i(MYTHREAD, SUM, TH);
        printf("Value of sum = %d on thread %d\n", sum, MYTHREAD);
    }

Output:

    Value of sum = 0 on thread 0
    Value of sum = 1 on thread 1
    Value of sum = 3 on thread 2
    Value of sum = 6 on thread 3
    Value of sum = 10 on thread 4
    Value of sum = 15 on thread 5
    Value of sum = 21 on thread 6
    Value of sum = 28 on thread 7

long SWARM_Scan_l(long myval, reduce_t op, THREADED)
Perform the operation on long values.

double SWARM_Scan_d(double myval, reduce_t op, THREADED)
Perform the operation on double values.
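The prefix behaviour can be verified with a sequential sketch in which position t of an array stands for thread t. scan_sum_int is a hypothetical helper for illustration, and it assumes an inclusive scan, that is, each thread's result includes its own contribution.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive prefix sum: out[t] = vals[0] + vals[1] + ... + vals[t].
 * vals[t] plays the role of the myval argument supplied by thread t. */
void scan_sum_int(const int *vals, int *out, size_t threads)
{
    assert(vals != NULL && out != NULL);
    int running = 0;
    for (size_t t = 0; t < threads; t++) {
        running += vals[t];
        out[t] = running;
    }
}
```

Feeding in the thread ids 0 through 7 yields 0, 1, 3, 6, 10, 15, 21, 28: the last thread ends up with the same total a SUM reduce would return, while earlier threads hold partial totals.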

2.4 Macros for SWARM

on_thread, on_one_thread
Control can be given to any particular thread using these macros. on_one_thread gives control to thread 0.

Usage:

    static void routine(THREADED)
    {
        /* example: execute code on the last thread */
        on_thread(THREADS - 1) {
            printf("Reached here in only one of the threads\n");
            printf("In thread %d\n\n", MYTHREAD);
        }
        SWARM_Barrier();
        /* example: execute code on one thread */
        on_one_thread {
            printf("Reached here in only one of the threads\n");
            printf("In thread %d\n", MYTHREAD);
        }
    }

Output:

    Reached here in only one of the threads
    In thread 7

    Reached here in only one of the threads
    In thread 0

SWARM_pardo
The SWARM library contains several basic pardo directives for executing loops concurrently on one or more processing cores. Typically, this is useful when an independent operation is to be applied to every location in an array, for example element-wise addition of two arrays. Pardo implicitly partitions the loop among the cores without the need for coordinating overheads such as synchronization or communication between the cores.

[Figure: pardo implicitly ensures that each of THREAD 0, THREAD 1, ..., THREAD 7 works on its own part of the array A independently and in parallel.]

Usage:

    static void routine(THREADED)
    {
        /* example: partitioning a "for" loop among cores */
        SWARM_pardo(i, start, end, incr) {
            A[i] = A[i] * A[i];
        }
    }
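One way such a loop partition can be computed is a block split in which the last thread absorbs any remainder; this matches the split the worked example in section 3.4 describes (3, 3 and 4 elements for 10 elements on 3 threads). The sketch is an assumption about the scheme, not SWARM's actual macro: range, block_range and square_block are hypothetical names, and the increment is fixed at 1.

```c
#include <assert.h>

/* Half-open index range [begin, end) assigned to one thread. */
typedef struct {
    int begin;
    int end;
} range;

/* Block partition of [start, end) over `threads` threads.
 * Every thread gets floor(total / threads) iterations and the last
 * thread additionally takes whatever is left over. */
range block_range(int start, int end, int mythread, int threads)
{
    assert(threads > 0 && mythread >= 0 && mythread < threads);
    assert(end >= start);
    int chunk = (end - start) / threads;
    range r;
    r.begin = start + mythread * chunk;
    r.end = (mythread == threads - 1) ? end : r.begin + chunk;
    return r;
}

/* The loop body from the usage example, applied to one thread's block. */
void square_block(int *A, range r)
{
    for (int i = r.begin; i < r.end; i++)
        A[i] = A[i] * A[i];
}
```

Since the blocks are disjoint and together cover the whole range, each thread can run square_block on its own range with no locking.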

3. Working with the Example Code

The example code presented below is used to demonstrate the SWARM API. At least one function of each type (synchronization, replication, broadcast, etc.) has been incorporated into the example code. The code calculates, in a parallel environment, the standard deviation of a set of numbers stored in an array. The standard deviation measures how widely the data is spread in a distribution.

3.1 Standard Deviation (σ)

If x_1, x_2, x_3, ..., x_n represents a sequence of numbers, the standard deviation σ is given mathematically as:

    \[ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2} \]

where μ is the mean of the distribution, given by

    \[ \mu = \frac{1}{n} \sum_{i=1}^{n} x_i \]

3.2 SWARM for Standard Deviation Calculation

The basic idea is to distribute the elements of the array across the processors/threads. Each thread calculates the sum over its part of the array; these partial sums are combined into the total sum and hence the mean. The mean is then distributed to all threads, after which each thread computes the sum of squared differences from the mean over its part of the array; these are combined to calculate the final standard deviation. The example code is reproduced below, followed by a detailed explanation.
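As a quick check of the two formulas, here is a worked example on a small data set chosen for this illustration (it does not come from the tutorial): the eight values 2, 4, 4, 4, 5, 5, 7, 9.

```latex
\begin{align*}
n &= 8, \qquad \sum_{i=1}^{n} x_i = 2 + 4 + 4 + 4 + 5 + 5 + 7 + 9 = 40 \\
\mu &= \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{40}{8} = 5 \\
\sum_{i=1}^{n} (x_i - \mu)^2 &= 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32 \\
\sigma &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2} = \sqrt{\frac{32}{8}} = \sqrt{4} = 2
\end{align*}
```

The two sums are exactly the quantities that can be split across threads: each thread can accumulate its share of the first sum, and, once μ is known, its share of the second.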

3.3 The Code itself

    #include <math.h>
    #include <stdio.h>
    #include <swarm.h>
    #include <swarm_random.h>

    static void stddev_routine(THREADED)
    {
        int i, n = 10;
        int max_random = 10, partial_sum = 0, prefix_sum = 0, global_sum = 0;
        int *arr;
        double global_mean = 0, partial_squared_sum = 0, squared_sum = 0, std_dev = 0;

        arr = SWARM_malloc(n * sizeof(int), TH);
        on_one_thread
            printf("Array memory allocated on a single thread...\n\n");
        /***************** Figure 1 ************************/

        SWARM_random_init(TH);
        SWARM_srandom(MYTHREAD + 1, TH);
        pardo(i, 0, n, 1)
            arr[i] = SWARM_random(TH) % max_random;
        SWARM_Barrier();
        /***************** Figure 2 ************************/

        on_one_thread {
            printf("Randomly generated array is:\n");
            for (i = 0; i < n; i++)
                printf("[%d] = %d\n", i, arr[i]);
            printf("\n\n");
        }
        SWARM_Barrier();

        pardo(i, 0, n, 1)
            partial_sum += arr[i];

        on_one_thread
            printf("Partial Sum:\n");
        printf("Thread %d: %d\n", MYTHREAD, partial_sum);
        /***************** Figure 3 ************************/
        SWARM_Barrier();

        on_one_thread {
            printf("\n\n");
            printf("Global sum calculated using Reduce operation...\n\n");
            printf("Global Sum:\n");
        }
        SWARM_Barrier();
        global_sum = SWARM_Reduce_i(partial_sum, SUM, TH);
        printf("Thread %d: %d\n", MYTHREAD, global_sum);
        /***************** Figure 4 ************************/

        on_one_thread {
            global_mean = 1.0 * global_sum / n;
            printf("\n\n");
            printf("Mean calculated on a thread = %f\n", global_mean);
            printf("\n\n");
            printf("Broadcasting mean...\n\n");
            printf("Global Mean:\n");
        }
        /***************** Figure 5 ************************/
        global_mean = SWARM_Bcast_d(global_mean, TH);
        /***************** Figure 6 ************************/
        SWARM_Barrier();
        printf("Thread %d: %f\n", MYTHREAD, global_mean);
        SWARM_Barrier();

        pardo(i, 0, n, 1)
            partial_squared_sum += (arr[i] - global_mean) * (arr[i] - global_mean);
        /***************** Figure 7 ************************/
        SWARM_Barrier();

        on_one_thread {
            printf("\n\n");
            printf("Partial Squared Sum:\n");
        }
        SWARM_Barrier();
        printf("Thread %d: %f\n", MYTHREAD, partial_squared_sum);
        SWARM_Barrier();

        on_one_thread {
            printf("\n\nCalculating total squared sum from partial squared sums using Scan operation...\n\n");
            printf("Total Squared Sum:\n");
        }
        SWARM_Barrier();
        squared_sum = SWARM_Scan_d(partial_squared_sum, SUM, TH);
        printf("Thread %d: %f\n", MYTHREAD, squared_sum);
        /***************** Figure 8 ************************/
        SWARM_Barrier();

        on_thread(THREADS - 1) {
            printf("\n\n");
            printf("Calculating Standard Deviation on last thread...\n\n");
            std_dev = sqrt(squared_sum / n);
            printf("Standard Deviation = %f\n", std_dev);
        }
        SWARM_Barrier();
        /***************** Figure 9 ************************/

        on_one_thread {
            printf("\n\n");
            printf("Releasing array memory...\n\n");
        }
        SWARM_free(arr, TH);
    }

    int main(int argc, char **argv)
    {
        SWARM_Init(&argc, &argv);
        SWARM_Run((void *)stddev_routine);
        SWARM_Finalize();
        return 0;
    }

Output:

    THREADS: 3
    Array memory allocated on a single thread...

    Randomly generated array is:
    [0] = 3
    [1] = 4
    [2] = 7
    [3] = 8
    [4] = 5
    [5] = 9
    [6] = 0
    [7] = 6
    [8] = 2
    [9] = 2

    Partial Sum:
    Thread 1: 22
    Thread 2: 10
    Thread 0: 14

    Global sum calculated using Reduce operation...

    Global Sum:
    Thread 2: 46
    Thread 0: 46
    Thread 1: 46

    Mean calculated on a thread =

    Broadcasting mean...

    Global Mean:
    Thread 2:
    Thread 1:
    Thread 0:

    Partial Squared Sum:
    Thread 0:
    Thread 1:
    Thread 2:

    Calculating total squared sum from partial squared sums using Scan operation...

    Total Squared Sum:
    Thread 1:
    Thread 0:
    Thread 2:

    Calculating Standard Deviation on last thread...

    Standard Deviation =

    Releasing array memory...

3.4 Step by Step Explanation of the Code

Code execution begins with main. The main construct is much the same for all examples using SWARM. It begins with the initialization of the SWARM parallel environment and concludes with the clean-up of this environment. In between, there is a call to the routine that needs to be executed in this parallel environment. All this is achieved using the three functions SWARM_Init, SWARM_Run and SWARM_Finalize, which are detailed in the API explanation.

The routine defined for parallelizing the calculation of the standard deviation is stddev_routine. This routine is executed on each of the threads/cores. We now describe one instance of execution. For the purpose of explanation, we assume that there are 3 threads and that the size of the data set is 10 (THREADS = 3, n = 10). Within the code, comments have been placed specifying the figure to refer to, which provides a detailed view of code execution at that point in time.

The memory that will hold the data set of numbers is dynamically allocated. This ensures that only one copy of this list is maintained across all the threads. On the other hand, variables defined locally (viz. n, partial_sum, etc.) are replicated within all the threads. The figure below supports the explanation.

[Figure 1: one shared array; Thread 0, Thread 1 and Thread 2 each hold their own copy of partial_sum.]

Next, the array is populated randomly. Randomization functions that mimic the standard random function have also been incorporated in SWARM. A seed has to be specified for generating the random sequence; varying the seed varies the sequence created. The populating process is also done in parallel. Thus thread 0 fills the first 3 elements, thread 1 fills the next 3, and the last thread fills the remaining elements.

[Figure 2: the shared array filled in three blocks by Thread 0, Thread 1 and Thread 2; each thread's partial_sum is still unset.]

Each thread now works on its part of the array to calculate its partial sum.

[Figure 3: partial_sum computed on Thread 0, Thread 1 and Thread 2 from their respective blocks of the array.]

Once the partial sum has been calculated on each thread, the global sum is calculated using the reduce operation. The reduce function picks up a value from each thread and performs a binary operation on those values. Here we request the sum operation on the partial sum values to obtain the global sum.

[Figure 4: the partial_sum values on Thread 0, Thread 1 and Thread 2 are reduced into global_sum, which is available on every thread.]

The global average, or global mean, is now calculated on thread 0.

[Figure 5: global_mean computed on Thread 0 from global_sum = 46; global_mean on Thread 1 and Thread 2 is not yet set.]

This global mean on thread 0 has to be broadcast to all other threads for the further calculation of the standard deviation.

[Figure 6: global_mean broadcast from Thread 0 to Thread 1 and Thread 2; global_sum = 46 on every thread.]

Once each thread has received the value of the mean, it calculates the sum of squared differences from the mean over its part of the array. This step follows the same pattern as the earlier calculation of the partial sum on each thread.

[Figure 7: partial_squared_sum computed on Thread 0, Thread 1 and Thread 2 from their respective blocks of the array.]

The scan operation is now used to add up the partial squared sums. The basic difference between the scan and reduce operations is that scan performs a prefix operation. Thus thread 0 has its own value, thread 1 has the sum of the values of threads 0 and 1, while thread 2 has the sum of the values of threads 0, 1 and 2. The figure below describes this.

[Figure 8: squared_sum on Thread 0, Thread 1 and Thread 2 obtained as the prefix sums of their partial_squared_sum values.]

Finally, the last thread (thread 2 here) has the total squared sum of differences from the mean. We calculate the standard deviation on this thread by dividing the total by the number of elements (10) and then taking the square root.

[Figure 9: std_dev computed on Thread 2 from its squared_sum; std_dev on Thread 0 and Thread 1 is not computed.]

Once the standard deviation is calculated, the memory allocated for the array is released back to the operating system and we exit the parallel routine.

4. Conclusion

SWARM thus provides a sound interface for program developers to build parallel applications without worrying about the underlying thread-level aspects. We have already implemented several important parallel primitives and algorithms using this framework. In future, we intend to add to the functionality of the basic primitives in SWARM, as well as build more multicore applications using this library. For further details, see the paper: SWARM: A Parallel Programming Framework for Multicore Processors, David A. Bader, Varun Kanade and Kamesh Madduri.


CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19 CIS Data Structures ad Algorithms with Java Sprig 09 Stacks, Queues, ad Heaps Moday, February 8 / Tuesday, February 9 Stacks ad Queues Recall the stack ad queue ADTs (abstract data types from lecture.

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 6 Defiig Fuctios Pytho Programmig, 2/e 1 Objectives To uderstad why programmers divide programs up ito sets of cooperatig fuctios. To be able to

More information

Avid Interplay Bundle

Avid Interplay Bundle Avid Iterplay Budle Versio 2.5 Cofigurator ReadMe Overview This documet provides a overview of Iterplay Budle v2.5 ad describes how to ru the Iterplay Budle cofiguratio tool. Iterplay Budle v2.5 refers

More information

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000. 5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator

More information

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information
