10. Parallel Methods for Data Sorting


Contents:
10.1. Parallelizing Principles
10.2. Scaling Parallel Computations
10.3. Bubble Sort
  10.3.1. Sequential Algorithm
  10.3.2. Odd-Even Transposition Algorithm
  10.3.3. Computation Decomposition and Analysis of Information Dependencies
  10.3.4. Scaling and Distributing the Subtasks among the Processors
  10.3.5. Efficiency Analysis
  10.3.6. Computational Experiment Results
10.4. Shell Sort
  10.4.1. Sequential Algorithm
  10.4.2. Parallel Algorithm
  10.4.3. Efficiency Analysis
  10.4.4. Computational Experiment Results
10.5. Quick Sort
  10.5.1. Sequential Algorithm
  10.5.2. The Parallel Quick Sort Algorithm
    Parallel Computational Scheme
    Efficiency Analysis
    Computational Experiment Results
  10.5.3. The Parallel HyperQuickSort Algorithm
    Software Implementation
    Computational Experiment Results
10.6. The Parallel Sorting by Regular Sampling
    Parallel Computational Scheme
    Efficiency Analysis
    Computational Experiment Results
Summary
References
Discussions
Exercises

Data sorting is one of the typical problems of data processing and is usually considered as the problem of rearranging the elements of a given sequence of values S = {a1, a2, ..., an} into monotonically increasing or decreasing order,

  S ~ S' = (a'1, a'2, ..., a'n):  a'1 <= a'2 <= ... <= a'n

(hereinafter only the example of sorting the data in increasing order is discussed). The possible methods of solving this problem have been discussed widely: the work by Knuth (1997) gives a complete survey of data sorting algorithms, and among the more recent editions the work by Cormen et al. (2001) may be recommended. The computational complexity of sorting is considerable: for a number of well-known methods (bubble sort, insertion sort, etc.) the number of necessary operations grows as the square of the number of values being sorted, T1 ~ n^2.

For more efficient algorithms (merge sort, Shell sort, quick sort) the complexity is of the order T1 ~ n*log2(n). This relation also gives the lower estimate of the number of operations necessary for sorting a set of n values; algorithms of lower complexity may be obtained only for particular variants of the problem.

Data sorting may be speeded up by using several (p > 1) processors. In this case the data are distributed among the processors; in the course of the computations the data are transmitted between the processors and compared with one another. The resulting (sorted) data are, as a rule, also distributed among the processors. To regulate this distribution, a scheme of consecutive enumeration of the processors is introduced: it is usually required that, after the termination of sorting, the values located on the processors with smaller numbers do not exceed the values on the processors with greater numbers.

An extensive analysis of the data sorting problem is left for further consideration. In this Section the main attention is devoted to parallel versions of a number of well-known methods of internal sorting, i.e. to the case when the data being sorted can be placed completely in the main memory of each processor. This Section is based essentially on the teaching materials given in Kumar et al. (1994) and Quinn (2004).

10.1. Parallelizing Principles

A closer look at the operations applied in sorting algorithms shows that many methods are based on the same basic compare-exchange operation: a pair of values of the data set being sorted is compared and the values are exchanged if their order does not correspond to the sorting conditions.

// The basic compare-exchange operation
if (A[i] > A[j]) {
  temp = A[i];
  A[i] = A[j];
  A[j] = temp;
}

Example 10.1. The basic compare-exchange operation of many sorting procedures

The successive application of this operation makes it possible to sort the data; in many cases it is precisely the way of choosing the pairs of values for this operation that constitutes the main difference between the sorting algorithms.

Let us first consider the situation when the number of processors coincides with the number of values being sorted (i.e. p = n), so that each processor holds exactly one value of the initial data. Then the comparison of the values ai and aj located, correspondingly, on the processors Pi and Pj may be organized in the following way (a parallel generalization of the basic sorting operation):
- the processors Pi and Pj exchange their values (the initial elements are kept on the processors);
- each of the processors Pi and Pj compares the obtained identical pair of values (ai, aj); the results of the comparison are used to distribute the data between the processors: the smaller element remains on one of the processors (for instance, Pi), while the other processor (i.e. Pj) keeps the greater value of the pair for further processing,

  a'i = min(ai, aj),  a'j = max(ai, aj).

This pairwise operation is illustrated in terms of MPI operations below.
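The following sketch is only an illustration and is not part of the original text: the function name, its signature and the use of MPI_Sendrecv are assumptions (it is also assumed that mpi.h is included and MPI has been initialized). Each of the two partner processes sends its single value to the other and then keeps either the smaller or the greater value of the pair.

// Illustrative sketch: the parallel compare-exchange of single values
// held by two MPI processes
void ParallelCompareExchange(double *a, int PartnerRank, int KeepMin) {
  double b;
  MPI_Status status;
  // exchange the values (the initial elements are kept on the processes)
  MPI_Sendrecv(a, 1, MPI_DOUBLE, PartnerRank, 0,
               &b, 1, MPI_DOUBLE, PartnerRank, 0,
               MPI_COMM_WORLD, &status);
  // both processes now hold the pair (a, b); keep the required element
  if (KeepMin)
    *a = (*a < b) ? *a : b;   // keep the smaller value of the pair
  else
    *a = (*a > b) ? *a : b;   // keep the greater value of the pair
}

In the scheme above the processor Pi would call the function with KeepMin set to a nonzero value, while Pj would call it with KeepMin equal to zero.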
10.2. Scaling Parallel Computations

This parallel generalization of the basic sorting operation can also be adapted to the case p < n, i.e. when the number of processors is smaller than the number of values being sorted. In this situation each processor holds a part (a block of size n/p) of the data being sorted.

Let us define the result of the parallel sorting algorithm as the situation in which the data on each processor are sorted and the order of the block distribution among the processors corresponds to the order of the linear enumeration, i.e. the value of the last element on the processor Pi does not exceed the value of the first element on the processor Pi+1 for 0 <= i < p-1.

At the very beginning of sorting the blocks are usually sorted on each processor separately by means of some fast sequential algorithm (the initial stage of parallel sorting). Then, in accordance with the single-value comparison scheme described above, the interaction of the processors Pi and Pi+1 for sorting the pair of blocks Ai and Ai+1 can be implemented as follows:
- the processors Pi and Pi+1 exchange their blocks;
- on each processor the blocks Ai and Ai+1 are united into a sorted block of double size (since the blocks Ai and Ai+1 have been sorted beforehand, this procedure reduces to the fast merging of sorted data);
- the obtained double block is subdivided into two equal parts; one of the parts (for instance, the one with the smaller values) is left on the processor Pi, while the other part (with the greater values, correspondingly) is placed on the processor Pi+1:

  [Ai U Ai+1]sorted = A'i U A'i+1,  where a'i <= a'j for any a'i from A'i and any a'j from A'i+1.

This procedure is usually called the compare-split operation. The blocks formed on the processors Pi and Pi+1 as a result of the procedure have the same size as the initial blocks Ai and Ai+1, and all the values located on the processor Pi do not exceed the values on the processor Pi+1.

The compare-split operation may be taken as the basic computational subtask for organizing parallel computations. Since the number of such subtasks depends parametrically on the number of available processors, the problem of scaling the computations practically does not arise for parallel data sorting algorithms. It should be noted, however, that the data blocks of the subtasks change in the course of sorting. In simple cases the size of the data blocks remains the same; in more complicated situations (as, for instance, in the quick sort algorithms, see Subsection 10.5) the amounts of data located on the processors may differ, which may lead to a violation of the equal computational loading of the processors.

10.3. Bubble Sort

10.3.1. Sequential Algorithm

The sequential bubble sort algorithm (see, for instance, Knuth (1997), Cormen et al. (2001)) compares and exchanges the neighboring elements of the sequence being sorted. For the sequence (a1, a2, ..., an) the algorithm first executes n-1 basic compare-exchange operations for the successive pairs of elements (a1, a2), (a2, a3), ..., (an-1, an). As a result, the biggest element is moved to the end of the sequence after the first iteration of the algorithm. The last element of the transformed sequence may then be omitted, and the same procedure is applied to the remaining part of the sequence (a'1, a'2, ..., a'n-1). The sequence is thus sorted after n-1 iterations. The efficiency of bubble sorting may be improved if the algorithm is terminated as soon as a sorting iteration causes no changes of the data sequence being sorted.

// Algorithm 10.1
// Sequential bubble sort algorithm
void BubbleSort(double A[], int n) {
  for (int i = 0; i < n - 1; i++)
    for (int j = 0; j < n - i - 1; j++)
      compare_exchange(A[j], A[j+1]);
}

Algorithm 10.1. The sequential bubble sort algorithm

10.3.2. Odd-Even Transposition Algorithm

The bubble sort algorithm in its original form is rather difficult to parallelize: the comparison of the pairs of values is strictly sequential. For this reason a modification of the algorithm known as the odd-even transposition method is used in parallel applications (see, for instance, Kumar et al. (2003)). The essence of the modification is that two different rules of executing the iterations are introduced into the algorithm: depending on whether the number of the sorting iteration is odd or even, the elements with odd or even indices, correspondingly, are selected for processing, and these values are compared with their right neighbors.
Thus, at all odd iterations the pairs

  (a1, a2), (a3, a4), ..., (an-1, an)    (for even n)

are compared, while at even iterations the pairs

  (a2, a3), (a4, a5), ..., (an-2, an-1)

are processed. After n iterations of this kind the initial data turn out to be sorted.

// Algorithm 10.2
// Sequential odd-even transposition algorithm
void OddEvenSort(double A[], int n) {
  for (int i = 1; i <= n; i++) {
    if (i % 2 == 1) {                  // odd iteration: pairs (a1,a2), (a3,a4), ...
      for (int j = 0; j + 1 < n; j += 2)
        compare_exchange(A[j], A[j+1]);
    }
    else {                             // even iteration: pairs (a2,a3), (a4,a5), ...
      for (int j = 1; j + 1 < n; j += 2)
        compare_exchange(A[j], A[j+1]);
    }
  }
}

Algorithm 10.2. The sequential odd-even transposition algorithm

10.3.3. Computation Decomposition and Analysis of Information Dependencies

Obtaining a parallel variant of the odd-even transposition method does not cause any problems: the pairs of values may be compared at each sorting iteration independently and in parallel. In the case p < n, when the number of processors is smaller than the number of values being sorted, each processor holds a data block of size n/p, and the compare-split operation may be used as the basic computational subtask (see Subsection 10.2).

// Algorithm 10.3
// Parallel odd-even transposition algorithm
// (A is the block of the data being sorted on the process, n is the block size)
void ParallelOddEvenSort(double A[], int n) {
  int id = GetProcId();    // process rank
  int np = GetProcNum();   // number of processes
  for (int i = 1; i <= np; i++) {
    if (i % 2 == 1) {             // odd iteration
      if (id % 2 == 1) {          // process with an odd rank
        if (id < np - 1)          // compare-split with the right neighbor
          compare_split_min(id + 1);
      }
      else if (id > 0)            // compare-split with the left neighbor
        compare_split_max(id - 1);
    }
    else {                        // even iteration
      if (id % 2 == 0) {          // process with an even rank
        if (id < np - 1)          // compare-split with the right neighbor
          compare_split_min(id + 1);
      }
      else                        // compare-split with the left neighbor
        compare_split_max(id - 1);
    }
  }
}

Algorithm 10.3. The parallel odd-even transposition algorithm

The operations compare_split_min and compare_split_max retain on the calling process, correspondingly, the smaller and the greater half of the merged blocks; a possible form of these operations is sketched below.
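The following sketch is only an illustration and is not given in the original text; the signature is an assumption (Algorithm 10.3 calls compare_split_min and compare_split_max with only the partner rank, so the local block is evidently available to them in some other way), and it is assumed that mpi.h is included and MPI has been initialized. The partner processes exchange their already sorted blocks, merge them, and keep the lower or the upper half.

// Illustrative sketch of the compare-split operation on blocks of equal size
void compare_split(double Block[], int BlockSize, int PartnerRank, int KeepMin) {
  double *Received = new double[BlockSize];
  double *Merged   = new double[2 * BlockSize];
  MPI_Status status;

  // exchange the (already sorted) blocks with the partner process
  MPI_Sendrecv(Block, BlockSize, MPI_DOUBLE, PartnerRank, 0,
               Received, BlockSize, MPI_DOUBLE, PartnerRank, 0,
               MPI_COMM_WORLD, &status);

  // merge the two sorted blocks into one sorted block of double size
  int i = 0, j = 0;
  for (int k = 0; k < 2 * BlockSize; k++) {
    if (j == BlockSize || (i < BlockSize && Block[i] <= Received[j]))
      Merged[k] = Block[i++];
    else
      Merged[k] = Received[j++];
  }

  // keep the lower half (compare_split_min) or the upper half (compare_split_max)
  int offset = KeepMin ? BlockSize * 0 : BlockSize;
  for (int k = 0; k < BlockSize; k++)
    Block[k] = Merged[offset + k];

  delete [] Received;
  delete [] Merged;
}

In these terms compare_split_min corresponds to calling the function with KeepMin set to a nonzero value, and compare_split_max to calling it with KeepMin equal to zero.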

To illustrate this parallel sorting method, Table 10.1 presents an example of sorting for n = 16 and p = 4 (so that the block on each processor holds n/p = 4 values). The first column of the table gives the number and the type of each iteration together with the pairs of processors that execute the compare-split operation in parallel at this iteration; the full form of the table also shows the state of the data being sorted before and after every iteration.

Table 10.1. An example of sorting data by the parallel odd-even transposition method (n = 16, p = 4): starting from the initial data distribution, iteration 1 (odd) pairs the processors (1,2) and (3,4), iteration 2 (even) pairs the processors (2,3), iteration 3 (odd) again pairs (1,2) and (3,4), and iteration 4 (even) pairs (2,3); after these iterations the data are ordered.

In the general case the execution of the parallel method may be terminated as soon as the state of the data being sorted does not change during two successive sorting iterations; the total number of iterations may thereby be reduced. To implement such a modification, a control processor should be introduced that determines the state of the data after every sorting iteration. However, the complexity of this communication operation (gathering messages from all the processors) may be so significant that the overhead of the data communications exceeds the effect of the possible reduction of the number of iterations.

10.3.4. Scaling and Distributing the Subtasks among the Processors

As it has been mentioned previously, the number of subtasks corresponds to the number of available processors, so there is no need for computation scaling. The initial distribution of the blocks of the data being sorted among the processors may be chosen arbitrarily. For the discussed parallel sorting algorithm to be executed efficiently, it is only necessary that all the processors with neighboring numbers have direct communication lines.

10.3.5. Efficiency Analysis

Let us first estimate the general complexity of the discussed parallel sorting algorithm and then add the complexity characteristics of the performed communications to the obtained relations.

Let us determine the complexity of the sequential computations. The bubble sort algorithm makes it possible to demonstrate a very important aspect of this analysis. As it has already been mentioned, the sorting method taken for parallelizing has a quadratic complexity with respect to the amount of data being sorted, T1 ~ n^2. However, using this non-optimal complexity estimate of the sequential algorithm would distort the meaning of the quality criteria of parallel computations: the resulting characteristics would describe the parallel execution of the given sorting method rather than the effectiveness of using parallelism for solving the data sorting problem as a whole. The difference is that more efficient sequential sorting algorithms exist, and their complexity is of the order

  T1 = n*log2(n).                                            (10.1)

It is this complexity estimate that should be used in order to determine how much faster the data may be sorted by means of parallel computations. As a result, we can formulate the following rule: when the speedup and efficiency characteristics of parallel computations are determined, the complexity estimate of the sequential method of solving the problem under consideration should be that of the best sequential algorithm. Parallel methods of solving problems should be compared to the most efficient fast sequential computational methods!

Let us now determine the complexity of the described parallel sorting algorithm. As it has been mentioned previously, at the initial stage of the method each processor sorts its data block (in the case of an equal data distribution the block size is n/p). Assuming that this initial sorting is performed by means of the best sequential algorithms, the complexity of the initial stage of the computations is

  Tp(1) = (n/p)*log2(n/p).                                   (10.2)

Then, at each iteration of parallel sorting, the interacting pairs of processors exchange their blocks, and the pairs of blocks formed on each processor are united by means of the merge procedure. The total number of iterations does not exceed p, so the total number of operations of this part of the parallel computations is

  Tp(2) = 2*p*(n/p) = 2n.                                    (10.3)

With regard to the obtained relations, the speedup and efficiency characteristics of the parallel sorting method are as follows:

  Sp = n*log2(n) / ((n/p)*log2(n/p) + 2n),
  Ep = n*log2(n) / (p*((n/p)*log2(n/p) + 2n)).                (10.4)

Let us refine these expressions by taking into account the duration of the computational operations and estimating the complexity of the block exchanges between the processors. When the Hockney model is used, the total time of all the block exchanges performed in the course of sorting may be estimated as

  Tp(comm) = p*(alpha + w*(n/p)/beta),                        (10.5)

where alpha is the latency, beta is the network bandwidth, and w is the size of a data element in bytes. With regard to the complexity of the communication operations, the total execution time of the parallel data sorting algorithm is determined by the expression

  Tp = ((n/p)*log2(n/p) + 2n)*tau + p*(alpha + w*(n/p)/beta), (10.6)

where tau is the execution time of the basic sorting operation.

10.3.6. Computational Experiment Results

The computational experiments for estimating the efficiency of the parallel bubble sort algorithm were carried out under the conditions described previously. In brief, these conditions are the following. The experiments were carried out on a computational cluster built on Intel Xeon EM64T 3000 MHz processors and Gigabit Ethernet under OS Microsoft Windows Server 2003 Standard x64 Edition (see 1.2.3). To estimate the duration tau of the basic sorting operation, the sorting problem was solved by means of the sequential algorithm and the obtained computation time was divided by the total number of operations; the value tau = 10.1 nsec was obtained as a result of the experiments. The experiments carried out to determine the network parameters showed the latency alpha and the network bandwidth beta to be 130 usec and 53.9 Mbyte/sec, correspondingly. All the computations were performed over numerical values of the double type, so the value of w is equal to 8 bytes.

The results of the computational experiments are given in Table 10.2. The experiments were carried out with the use of two and four processors.

Table 10.2. The results of the computational experiments for the parallel bubble sort algorithm (for each data size from 10,000 to 50,000 elements the table gives the execution time of the sequential algorithm and the execution time and speedup of the parallel algorithm on 2 and on 4 processors)

Figure 10.1. Speedup of the parallel bubble sort algorithm (experiments with 10,000 to 50,000 elements)

According to the results of the computational experiments, the parallel bubble sort algorithm operates more slowly than the original sequential bubble sort method. The reason is that the amount of data transmitted between the processors is rather large and is comparable to the number of the executed computational operations (and this imbalance between the amount of computations and the complexity of the data communication operations grows as the number of processors increases).

The comparison of the experimental execution time Tp and the theoretical estimate obtained from (10.6) is given in Table 10.3 and Figure 10.2.

Table 10.3. The comparison of the experimental and theoretical execution times for the parallel bubble sort algorithm (for each data size the table gives the experimental time and the model estimate for 2 and for 4 processors)
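The theoretical ("model") values compared with the experiment in Table 10.3 and Figure 10.2 are obtained by evaluating the estimate (10.6) with the experimentally measured parameters quoted above. Purely as an illustration, and not as part of the original program, such an evaluation may be written as follows (the function name is an assumption):

#include <cmath>

// Evaluation of the theoretical estimate (10.6) for the parallel odd-even
// (bubble) sort; the parameter values are those quoted in Subsection 10.3.6
double BubbleSortModelTime(double n, double p) {
  const double tau   = 10.1e-9;   // execution time of the basic operation, sec
  const double alpha = 130e-6;    // latency, sec
  const double beta  = 53.9e6;    // network bandwidth, byte/sec
  const double w     = 8;         // size of a data element (double), bytes
  double calc = ((n / p) * (log(n / p) / log(2.0)) + 2 * n) * tau;
  double comm = p * (alpha + w * (n / p) / beta);
  return calc + comm;
}

For instance, BubbleSortModelTime(50000, 4) yields the estimate of the parallel execution time for 50,000 elements on four processors.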

Figure 10.2. Experimental and theoretical execution times of the parallel bubble sort algorithm

10.4. Shell Sort

10.4.1. Sequential Algorithm

In the Shell sort algorithm (see, for instance, Knuth (1997), Cormen et al. (2001)) the compared pairs of values are, from the very beginning, formed from elements that are located rather far from each other in the data being sorted. This modification of the sorting method makes it possible to permute incorrectly ordered pairs of widely separated values quickly enough (sorting such pairs would require a greater number of permutation operations if only neighboring elements were compared).

The general scheme of the method is the following. At the first step of the algorithm the elements of n/2 pairs (ai, an/2+i) for 1 <= i <= n/2 are sorted. At the second step the elements of n/4 groups of four elements each (ai, an/4+i, an/2+i, a3n/4+i) for 1 <= i <= n/4 are sorted. At the third step the elements of n/8 groups of eight elements each are sorted, and so on; at the last step all the elements of the array (a1, a2, ..., an) are sorted together. At each step the insertion sort method is used for sorting the elements within the groups. The total number of iterations of the Shell algorithm is equal to log2(n). The Shell sort algorithm may be written as follows:

// Algorithm 10.4
// Sequential Shell sort algorithm
void ShellSort(double A[], int n) {
  int incr = n / 2;
  while (incr > 0) {
    // insertion sort of the elements that are incr positions apart
    for (int i = incr; i < n; i++) {
      int j = i - incr;
      while (j >= 0 && A[j] > A[j+incr]) {
        swap(A[j], A[j+incr]);
        j = j - incr;
      }
    }
    incr = incr / 2;
  }
}

Algorithm 10.4. The sequential Shell sort algorithm

10.4.2. Parallel Algorithm

A parallel variant of the Shell sort method may be suggested (see, for instance, Kumar et al. (2003)) when the topology of the communication network can be represented as an N-dimensional hypercube (i.e. when the number of processors is equal to p = 2^N).
In this case the sorting may be subdivided into two successive stages. At the first stage (N iterations) the processors that are neighbors in the hypercube structure interact with each other; in the linear enumeration these processors may turn out to be located rather far from each other. The required mapping of the hypercube topology onto the linear array structure may be implemented by means of the Gray code (see Section 3). The pairs of processors that interact with each other during the compare-split operation are formed according to the following simple rule: at each iteration i, 0 <= i < N, the processors whose bit codes of their numbers differ only in position N-i are paired. At the second stage the usual iterations of the parallel odd-even transposition algorithm are performed; the iterations of this stage are executed until the data being sorted actually stop changing, so the total number L of such iterations may vary from 2 to p.

Figure 10.3 shows an example of sorting an array of 16 elements by means of the discussed method. It should be noted that in this example the data turn out to be sorted already after the completion of the first stage, and there is no need to execute the odd-even transposition iterations.

Figure 10.3. An example of the parallel Shell sort algorithm for four processors (the processors are marked by circles, the processor numbers are given in their binary representation)

With regard to the given description, the same decomposition approach can be applied here as well, with the compare-split operation defined as the basic computational subtask. The number of subtasks then coincides with the number of available processors (the size of the data block of each subtask is equal to n/p), so, again, scaling of the computations is not needed. The distribution of the data being sorted among the processors should be chosen with regard to the efficient implementation of the compare-split operations in the hypercube network topology.

10.4.3. Efficiency Analysis

The relations obtained for the parallel bubble sort method (see Subsection 10.3.5) may be used for estimating the efficiency of the parallel variant of the Shell sort algorithm; it is only necessary to take into account the two stages of the algorithm. With regard to this peculiarity, the total execution time of the new parallel method may be determined by means of the expression

  Tp = (n/p)*log2(n/p)*tau + (log2(p) + L)*(2*(n/p)*tau + (alpha + w*(n/p)/beta)).   (10.7)

As it can be seen, the efficiency of the parallel variant of Shell sorting depends considerably on the value of L: if L is small, the new parallel sorting method is executed faster than the previously described odd-even transposition algorithm.

10.4.4. Computational Experiment Results

The computational experiments for estimating the efficiency of the parallel Shell sort method were carried out under the same conditions as the experiments described previously (see 10.3.6). The results of the computational experiments are given in Table 10.4; the experiments were carried out with the use of 2 and 4 processors, and the time is given in seconds.

Table 10.4. The results of the computational experiments for the parallel Shell sort algorithm (for each data size from 10,000 to 50,000 elements the table gives the execution time of the sequential algorithm and the execution time and speedup of the parallel algorithm on 2 and on 4 processors)

Figure 10.4. Speedup of the parallel Shell sort algorithm

The comparison of the experimental execution time Tp and the theoretical estimate obtained from (10.7) is given in Table 10.5 and Figure 10.5.

Table 10.5. The comparison of the experimental and theoretical execution times for the parallel Shell sort algorithm (for each data size the table gives the experimental time and the model estimate for 2 and for 4 processors)

Figure 10.5. Experimental and theoretical execution times of the parallel Shell sort algorithm

10.5. Quick Sort

10.5.1. Sequential Algorithm

The quick sort algorithm, proposed by C.A.R. Hoare, is based on sequentially subdividing the data being sorted into blocks of smaller size in such a way that an ordering relation holds between the values of different blocks (for any pair of blocks, all the values of one of them do not exceed the values of the other). At the first iteration of the method the initial data set is divided into the first two parts: a certain pivot element is selected, all the values smaller than the pivot are transferred to the first block being formed, and all the remaining values form the second block of the data being sorted. At the second iteration these rules are applied recursively to both of the created blocks, and so on. If the choice of the pivot elements is adequate, the initial data array turns out to be sorted after log2(n) iterations. More detailed information about the method may be found, for instance, in Knuth (1997) or Cormen et al. (2001).

The efficiency of the quick sort method is determined to a great extent by the choice of the pivot elements used when the data are divided into blocks. In the worst case the complexity of the method has the same order as that of the bubble sort (i.e. T1 ~ n^2). With an optimal choice of the pivot elements, when every block is divided into two parts of equal size, the complexity of the algorithm coincides with that of the most efficient sorting methods (T1 ~ n*log2(n)). On average, the number of operations carried out by the quick sort algorithm is (see, for instance, Knuth (1997), Cormen et al. (2001))

  T1 = 1.4*n*log2(n).

The general scheme of the quick sort algorithm may be presented in the following form (the pivot element is taken to be the first element of the data being sorted):

// Algorithm 10.5
// Sequential quick sort algorithm
// (the whole array is sorted by the call QuickSort(A, 0, n))
void QuickSort(double A[], int i1, int i2) {
  if (i1 < i2) {
    double pivot = A[i1];
    int is = i1;
    for (int i = i1 + 1; i < i2; i++)
      if (A[i] < pivot) {
        is = is + 1;
        swap(A[is], A[i]);
      }
    swap(A[i1], A[is]);
    QuickSort(A, i1, is);
    QuickSort(A, is + 1, i2);
  }
}

Algorithm 10.5. The sequential quick sort algorithm

10.5.2. The Parallel Quick Sort Algorithm

Parallel Computational Scheme

A parallel generalization of the quick sort algorithm (see, for instance, Quinn (2004)) is obtained in the simplest way for a computer system whose topology is an N-dimensional hypercube (i.e. p = 2^N). Let the initial data, as before, be distributed among the processors in blocks of the same size n/p, and let the resulting location of the blocks correspond to the enumeration of the hypercube processors. Under these conditions the first iteration of the parallel method may be executed in the following way:
- select the pivot element and broadcast it to all the processors (for instance, the arithmetic mean of the elements located on some leading processor may be chosen as the pivot);
- on each processor, subdivide the available data block into two parts using the pivot element;
- form the pairs of processors whose bit representations of their numbers differ only in position N and exchange the data between these processors; as a result of these data transmissions, the parts of the blocks with values smaller than the pivot element must end up on the processors whose numbers contain 0 in bit position N, while the processors whose numbers contain 1 in bit position N collect, correspondingly, all the values exceeding the pivot.

As a result of this iteration the initial data become subdivided into two parts; one of them (with the values smaller than the pivot) is located on the processors whose numbers contain 0 in bit position N, and there are exactly p/2 such processors. Thus, the initial N-dimensional hypercube is also split into two sub-hypercubes of dimension N-1, to which the described procedure may be applied in turn. After N such iterations it only remains, in order to complete the method, to sort the data blocks formed on each individual processor.

To illustrate the parallel quick sort algorithm, Figure 10.6 shows an example of sorting data for n = 16 and p = 4 (each processor block holds four elements). In the figure the processors are shown as rectangles with the blocks of the data being sorted given inside them; the block values are shown at the beginning and at the completion of each sorting iteration, and the interacting pairs of processors are linked by double-headed arrows. For the data partitioning the optimal values of the pivot elements were chosen: at the first iteration the same pivot value was used for all the processors, while at the second iteration separate pivot values were selected for the pair of processors (0, 1) and for the pair (2, 3).

Figure 10.6. An example of sorting data by the parallel quick sort method (the results of the final local sorting of the blocks are not shown)

As before, the compare-split operation may be taken as the basic computational subtask; the number of subtasks coincides with the number of the processors being used. The distribution of the subtasks among the processors should be performed with regard to the efficient execution of the algorithm on the hypercube network topology. The way the communication partner of each processor and the retained part of its block can be derived from the processor number is sketched below.
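The following fragment is only an illustration and is not part of the original text; the function name and the iteration numbering (i = N, N-1, ..., 1, consistent with the program given in Subsection 10.5.3) are assumptions. It shows how the partner rank and the "keep the smaller part / keep the greater part" decision of one iteration follow from the bit representation of the process rank.

// Illustrative sketch: partner selection for one iteration of the parallel
// quick sort on an N-dimensional hypercube
void GetIterationPartner(int ProcRank, int i, int *PartnerRank, int *KeepsSmaller) {
  int Mask = 1 << (i - 1);                   // the bit in which the partners differ
  *PartnerRank  = ProcRank ^ Mask;           // ranks differ only in this bit position
  *KeepsSmaller = ((ProcRank & Mask) == 0);  // bit value 0: keep the values below the pivot
}

At the first iteration (i = N) the partners differ in the most significant bit, so after the exchange the hypercube splits into two sub-hypercubes of dimension N-1, exactly as described in the scheme above.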

Efficiency Analysis

Let us estimate the complexity of the described parallel method, assuming, as before, that the topology is an N-dimensional hypercube (i.e. p = 2^N) and that p < n. As in the sequential variant, the efficiency of the parallel quick sort method largely depends on how well the pivot elements are chosen. It is rather complicated to work out a general rule for selecting these values, but the choice becomes easier if the processor data blocks are sorted at the very beginning of the method execution; it is also useful to provide a more uniform distribution of the data among the processors.

Let us determine the computational complexity of the parallel algorithm. At each of the log2(p) sorting iterations every processor divides its block with respect to the pivot element, which requires n/p operations (here and below we consider the best possible case, when every block is divided into parts of equal size at every iteration). After the completion of these iterations the processors sort their blocks, which may be done in (n/p)*log2(n/p) operations by means of the quick sort algorithm. Thus, the total computation time of the parallel quick sort algorithm is

  Tp(calc) = ((n/p)*log2(p) + (n/p)*log2(n/p))*tau,            (10.8)

where tau is the execution time of the basic sorting operation.

Let us now consider the complexity of the communication operations. The total number of communication steps needed to broadcast the pivot elements in the N-dimensional hypercube may be estimated as

  sum(i=1..N) i = N*(N+1)/2 = log2(p)*(log2(p)+1)/2 ~ log2^2(p).   (10.9)

With regard to the assumption made above (the choice of the pivot elements is optimal), the number of the algorithm iterations is log2(p) and the amount of data transmitted at every iteration is always equal to half of a block, i.e. (n/p)/2 values. Under these conditions the communication complexity of the parallel quick sort algorithm is determined by the relation

  Tp(comm) = log2^2(p)*(alpha + w/beta) + log2(p)*(alpha + w*(n/(2p))/beta),   (10.10)

where alpha is the latency, beta is the network bandwidth, and w is the size of a data element in bytes. Finally, the total time complexity of the algorithm is

  Tp = ((n/p)*log2(p) + (n/p)*log2(n/p))*tau + log2^2(p)*(alpha + w/beta) + log2(p)*(alpha + w*(n/(2p))/beta).   (10.11)

Computational Experiment Results

The computational experiments for estimating the efficiency of the parallel quick sort method were carried out under the same conditions as the experiments described previously (see 10.3.6). The results of the computational experiments are given in Table 10.6; the experiments were carried out with the use of 2 and 4 processors, and the time is given in seconds.

Table 10.6. The results of the computational experiments for the parallel quick sort algorithm (for each data size from 10,000 to 50,000 elements the table gives the execution time of the sequential algorithm and the execution time and speedup of the parallel algorithm on 2 and on 4 processors)

Figure 10.7. Speedup of the parallel quick sort algorithm

According to the results of the computational experiments, the parallel quick sort algorithm provides a speedup in solving the data sorting problem.

The comparison of the experimental execution time Tp and the theoretical estimate obtained from (10.11) is given in Table 10.7 and Figure 10.8.

Table 10.7. The comparison of the experimental and theoretical execution times for the parallel quick sort algorithm (for each data size the table gives the experimental time and the model estimate for 2 and for 4 processors)

Figure 10.8. Experimental and theoretical execution times of the parallel quick sort algorithm

10.5.3. The Parallel HyperQuickSort Algorithm

In addition to the quick sort method described above, there exists a generalized technique, the HyperQuickSort algorithm, which suggests a specific scheme for choosing the pivot elements. According to this scheme, the data blocks located on the processors are sorted at the very beginning of the computations and, afterwards, the processors maintain the ordering of their data by merging the parts of the blocks obtained after every partitioning. As a result, due to the ordering of the blocks, it is reasonable to choose as the pivot element at each iteration the middle element of some block (for instance, of the block on the first processor of the corresponding sub-hypercube). A pivot element selected in this way may in many cases turn out to be much closer to the actual mean value of the data being sorted than a randomly chosen value. All the other operations of the algorithm are executed according to the original parallel quick sort method; in detail the HyperQuickSort algorithm is described, for instance, in Quinn (2004).

Relation (10.11) may be used for analyzing the efficiency of the HyperQuickSort algorithm as well. It should be taken into account that the operation of merging the block parts is carried out at every iteration of the method (as before, we assume that the block parts are of equal size, i.e. (n/p)/2 elements each) and that, due to the ordering of the blocks, the partitioning procedure may be simplified: it is sufficient to perform a binary search for the position of the pivot element in a block instead of an exhaustive linear search through all the block elements. With regard to this, the complexity of the HyperQuickSort algorithm may be expressed as

  Tp = ((n/p)*log2(n/p) + (log2(n/p) + (n/p))*log2(p))*tau + log2^2(p)*(alpha + w/beta) + log2(p)*(alpha + w*(n/(2p))/beta).   (10.12)

Software Implementation

Let us discuss a possible variant of the software implementation of the HyperQuickSort algorithm. It should be noted that the program code of several modules is not given, as its absence does not influence the understanding of the general scheme of the parallel computations.

1. The main function. The main function implements the computational scheme of the method by sequentially calling the necessary subprograms.

// The HyperQuickSort method
#include <mpi.h>
#include <cmath>
#include <cstdlib>

int ProcRank;   // rank of the current process
int ProcNum;    // number of processes

int main(int argc, char *argv[]) {
  double *ProcData;    // data block of the process
  int ProcDataSize;    // size of the data block

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &ProcRank);
  MPI_Comm_size(MPI_COMM_WORLD, &ProcNum);

  // data initialization and distribution among the processes
  ProcessInitialization(&ProcData, &ProcDataSize);

  // parallel sorting
  ParallelHyperQuickSort(ProcData, ProcDataSize);

  // output of the sorted data and termination of the computations
  ProcessTermination(ProcData, ProcDataSize);

  MPI_Finalize();
  return 0;
}

The function ProcessInitialization determines the initial data of the problem being solved (the size of the data to be sorted), allocates the memory for data storage, generates the data being sorted (for instance, by means of a random number generator) and distributes the data among the processes. The function ProcessTermination performs the necessary output of the sorted data and releases all the memory allocated previously for storing the data.
The implementation of all the functions mentioned above may be carried out by analogy with the examples discussed earlier and is left to the reader as a training exercise. A purely illustrative sketch of one of these functions is given below.
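The following sketch of ProcessInitialization is only an illustration and is not the implementation intended by the text (the data size, the random generation scheme and the use of MPI_Scatter are assumptions; an equal distribution of the data is assumed as well).

// Illustrative sketch of ProcessInitialization
void ProcessInitialization(double **pProcData, int *pProcDataSize) {
  int DataSize = 0;
  double *Data = NULL;
  if (ProcRank == 0) {
    DataSize = 100000;                      // size of the data to be sorted (illustrative)
    Data = new double[DataSize];
    for (int i = 0; i < DataSize; i++)      // random generation of the data
      Data[i] = double(rand()) / RAND_MAX;
  }
  MPI_Bcast(&DataSize, 1, MPI_INT, 0, MPI_COMM_WORLD);

  *pProcDataSize = DataSize / ProcNum;      // DataSize is assumed to be divisible by ProcNum
  *pProcData = new double[*pProcDataSize];
  MPI_Scatter(Data, *pProcDataSize, MPI_DOUBLE,
              *pProcData, *pProcDataSize, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  if (ProcRank == 0) delete [] Data;
}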

2. The function ParallelHyperQuickSort. This function performs the parallel quick sorting according to the algorithm described above.

// The parallel HyperQuickSort method
void ParallelHyperQuickSort(double* &ProcData, int &ProcDataSize) {
  MPI_Status status;
  int CommProcRank;      // rank of the process involved in the communications
  double *MergeData,     // block obtained after merging the block parts
         *Data,          // block part that remains on the process
         *SendData,      // block part sent to the process CommProcRank
         *RecvData;      // block part received from the process CommProcRank
  int DataSize, SendDataSize, RecvDataSize, MergeDataSize;
  int HypercubeDim = (int)ceil(log(ProcNum) / log(2));  // hypercube dimension
  int Mask = ProcNum;
  double Pivot;

  // local sorting of the data block
  LocalDataSort(ProcData, ProcDataSize);

  // HyperQuickSort iterations
  for (int i = HypercubeDim; i > 0; i--) {
    // determination of the pivot value and its broadcast to the processes
    PivotDistribution(ProcData, ProcDataSize, HypercubeDim, Mask, i, &Pivot);
    Mask = Mask >> 1;

    // determination of the position dividing the block
    int pos = GetProcDataDivisionPos(ProcData, ProcDataSize, Pivot);

    // division of the block
    if (((ProcRank & Mask) >> (i - 1)) == 0) {   // the considered bit of the rank is 0
      SendData = &ProcData[pos + 1];
      SendDataSize = ProcDataSize - pos - 1;
      if (SendDataSize < 0) SendDataSize = 0;
      CommProcRank = ProcRank + Mask;
      Data = &ProcData[0];
      DataSize = pos + 1;
    }
    else {                                        // the considered bit of the rank is 1
      SendData = &ProcData[0];
      SendDataSize = pos + 1;
      if (SendDataSize > ProcDataSize) SendDataSize = pos;
      CommProcRank = ProcRank - Mask;
      Data = &ProcData[pos + 1];
      DataSize = ProcDataSize - pos - 1;
      if (DataSize < 0) DataSize = 0;
    }

    // exchanging the sizes of the block parts
    MPI_Sendrecv(&SendDataSize, 1, MPI_INT, CommProcRank, 0,
                 &RecvDataSize, 1, MPI_INT, CommProcRank, 0,
                 MPI_COMM_WORLD, &status);

    // exchanging the block parts
    RecvData = new double[RecvDataSize];
    MPI_Sendrecv(SendData, SendDataSize, MPI_DOUBLE, CommProcRank, 0,
                 RecvData, RecvDataSize, MPI_DOUBLE, CommProcRank, 0,
                 MPI_COMM_WORLD, &status);

    // merging the retained and the received parts of the block
    MergeDataSize = DataSize + RecvDataSize;
    MergeData = new double[MergeDataSize];
    DataMerge(MergeData, Data, DataSize, RecvData, RecvDataSize);
    delete [] ProcData;
    delete [] RecvData;
    ProcData = MergeData;
    ProcDataSize = MergeDataSize;
  }
}

The function LocalDataSort sorts the data block of each process by means of the sequential quick sort algorithm. The function PivotDistribution determines the pivot element and sends its value to all the processes. The function GetProcDataDivisionPos computes the position at which the data block is partitioned with respect to the pivot element; the result of the function is an integer number determining the position of the element on the border of the two block parts (a possible implementation is sketched after the next listing). The function DataMerge merges the block parts into a single sorted data block.

3. The function PivotDistribution. This function selects the pivot element and sends it to all the processes of the corresponding sub-hypercube. Since the data located on the processes have already been sorted, the pivot element is selected as the middle element of the data block.

// Determination of the pivot value and its broadcast
// to all the processes of the sub-hypercube
void PivotDistribution(double *ProcData, int ProcDataSize, int Dim,
                       int Mask, int Iter, double *Pivot) {
  MPI_Group WorldGroup;
  MPI_Group SubcubeGroup;   // group of the processes of the sub-hypercube
  MPI_Comm  SubcubeComm;    // communicator of the sub-hypercube
  int j = 0;
  int GroupNum = ProcNum / (int)pow(2, Dim - Iter);
  int *ProcRanks = new int[GroupNum];

  // forming the list of ranks of the sub-hypercube processes
  int StartProc = ProcRank - GroupNum;
  if (StartProc < 0) StartProc = 0;
  int EndProc = ProcRank + GroupNum;
  if (EndProc > ProcNum) EndProc = ProcNum;
  for (int proc = StartProc; proc < EndProc; proc++) {
    if ((ProcRank & Mask) >> Iter == (proc & Mask) >> Iter)
      ProcRanks[j++] = proc;
  }

  // creating the communicator for the sub-hypercube processes
  MPI_Comm_group(MPI_COMM_WORLD, &WorldGroup);
  MPI_Group_incl(WorldGroup, GroupNum, ProcRanks, &SubcubeGroup);
  MPI_Comm_create(MPI_COMM_WORLD, SubcubeGroup, &SubcubeComm);

  // selecting the pivot element and sending it to the sub-hypercube processes
  if (ProcRank == ProcRanks[0])
    *Pivot = ProcData[ProcDataSize / 2];
  MPI_Bcast(Pivot, 1, MPI_DOUBLE, 0, SubcubeComm);

  MPI_Group_free(&SubcubeGroup);
  MPI_Comm_free(&SubcubeComm);
  delete [] ProcRanks;
}
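The function GetProcDataDivisionPos is described above but its listing is not given in the text. The following binary search is only one possible implementation (an illustration consistent with the way pos is used in ParallelHyperQuickSort and with the log2(n/p) search assumed in (10.12)); the exact form of the original function is an assumption.

// Illustrative sketch: position dividing a sorted block with respect to the pivot.
// The returned index is that of the last element not exceeding the pivot
// (-1 if all the elements are greater than the pivot).
int GetProcDataDivisionPos(double *ProcData, int ProcDataSize, double Pivot) {
  int Low = 0, High = ProcDataSize - 1, Result = -1;
  while (Low <= High) {
    int Mid = (Low + High) / 2;
    if (ProcData[Mid] <= Pivot) {   // the elements ProcData[0..Mid] do not exceed the pivot
      Result = Mid;
      Low = Mid + 1;
    }
    else
      High = Mid - 1;
  }
  return Result;
}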

Computational Experiment Results

The computational experiments for estimating the efficiency of the parallel HyperQuickSort method were carried out under the same conditions as the experiments described previously (see 10.3.6). The results are given in Table 10.8; the experiments were carried out with the use of 2 and 4 processors, and the time is given in seconds.

Table 10.8. The results of the computational experiments for the parallel HyperQuickSort algorithm (for each data size from 10,000 to 50,000 elements the table gives the execution time of the sequential algorithm and the execution time and speedup of the parallel algorithm on 2 and on 4 processors)

Figure 10.9. Speedup of the parallel HyperQuickSort algorithm

The comparison of the experimental execution time Tp and the theoretical estimate obtained from (10.12) is given in Table 10.9 and Figure 10.10.

Table 10.9. The comparison of the experimental and theoretical execution times for the parallel HyperQuickSort algorithm (for each data size the table gives the experimental time and the model estimate for 2 and for 4 processors)

Figure 10.10. Experimental and theoretical execution times of the parallel HyperQuickSort algorithm

10.6. The Parallel Sorting by Regular Sampling

Parallel Computational Scheme

The parallel sorting by regular sampling algorithm is yet another generalization of the quick sort method (see, for instance, Quinn (2004)). Sorting the data according to this variant of the quick sort algorithm involves four stages:
- at the first stage each processor sorts its own block independently of the other processors by means of the sequential quick sort algorithm, and then forms from its block the set of elements with the indices 0, m, 2m, ..., (p-1)*m, where m = n/p^2 (this set may be regarded as a regular sample of the processor data block);
- at the second stage the sample sets formed on all the processors are gathered on one of the processors and merged into a single sorted set; from this set of p^2 values, p-1 elements selected with the regular step p form the new set of pivot elements, which is then transmitted to all the processors being used; at the end of the stage each processor partitions its own block into p parts using the obtained pivot values;
- at the third stage every processor sends the selected parts of its block to all the other processors according to the enumeration order: the part j, 0 <= j < p, of each block is transmitted to the processor j;
- at the fourth stage each processor merges the p obtained parts into a single sorted block.

After the completion of the fourth stage the initial data are sorted. The first two stages of this scheme are sketched below.
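The following fragment is only an illustration and is not part of the original text: it reuses the names ProcRank, ProcNum and LocalDataSort from the program of Subsection 10.5.3, while the function name, the pivot indices and the use of MPI_Gather and MPI_Bcast are assumptions. It shows the selection of the regular samples (stage one) and of the p-1 pivots from the gathered sample set (stage two); the caller is assumed to provide a Pivots array of at least ProcNum-1 elements.

// Illustrative sketch of the first two stages of sorting by regular sampling
void SelectRegularSamplingPivots(double *Block, int BlockSize, double *Pivots) {
  double *Samples = new double[ProcNum];
  double *AllSamples = NULL;

  LocalDataSort(Block, BlockSize);                 // stage 1: local sorting
  for (int i = 0; i < ProcNum; i++)                // stage 1: regular samples of the block
    Samples[i] = Block[i * (BlockSize / ProcNum)];

  if (ProcRank == 0)
    AllSamples = new double[ProcNum * ProcNum];
  MPI_Gather(Samples, ProcNum, MPI_DOUBLE,         // stage 2: gather the sample sets
             AllSamples, ProcNum, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  if (ProcRank == 0) {
    LocalDataSort(AllSamples, ProcNum * ProcNum);  // stage 2: order the gathered set
    for (int i = 1; i < ProcNum; i++)              // stage 2: regularly spaced pivots
      Pivots[i - 1] = AllSamples[i * ProcNum];
    delete [] AllSamples;
  }
  MPI_Bcast(Pivots, ProcNum - 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  delete [] Samples;
}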

Figure 10.11 shows an example of sorting data by means of the described algorithm. It should be noted that the number of processors for this algorithm may be arbitrary; in the example it is equal to 3.

Figure 10.11. An example of executing the parallel sorting by regular sampling for 3 processors

Efficiency Analysis

Let us estimate the complexity of this parallel method. Let n be the amount of data being sorted, let p, p < n, denote the number of processors being used and, correspondingly, let n/p be the size of the data block on each processor.

During the first stage of the algorithm each processor sorts its data block by means of the quick sort method, so the duration of the performed operations is

  Tp(1) = (n/p)*log2(n/p)*tau,

where tau is the execution time of the basic sorting operation.

During the second stage of the algorithm one of the processors gathers the sample sets from all the other processors, merges the obtained data (the total number of elements is p^2), forms the set of p-1 pivot elements and transmits this set to the other processors. Taking into account these operations, the duration of the second stage is

  Tp(2) = (alpha*log2(p) + w*p*(p-1)/beta) + (p^2*log2(p))*tau + p*tau + log2(p)*(alpha + w*p/beta)

(the terms correspond, in order, to the four operations listed above); here, as before, alpha is the latency, beta is the network bandwidth and w is the size of a data element in bytes.

During the third stage of the algorithm each processor divides its block into p parts with respect to the pivot elements (the total number of operations needed for this may be bounded by n/p), and then all the processors exchange the formed parts of their blocks with one another. The complexity of this communication operation for the hypercube network topology was considered in Section 3, where it was shown that it may be carried out in log2(p) steps, with every processor transmitting and receiving a message of (n/p)/2 elements at each step. As a result, the complexity of the third stage may be estimated as

  Tp(3) = (n/p)*tau + log2(p)*(alpha + w*(n/(2p))/beta).

During the fourth stage every processor merges the p sorted parts into a single sorted block; the complexity estimate of such a merge was already discussed for the second stage, and the duration of the merge procedure is

  Tp(4) = (n/p)*log2(p)*tau.

Summing the durations of all the stages, the total execution time of the parallel sorting by regular sampling may be estimated as

  Tp = Tp(1) + Tp(2) + Tp(3) + Tp(4).


More information

521493S Computer Graphics Exercise 3 (Chapters 6-8)

521493S Computer Graphics Exercise 3 (Chapters 6-8) 521493S Comuter Grahics Exercise 3 (Chaters 6-8) 1 Most grahics systems and APIs use the simle lighting and reflection models that we introduced for olygon rendering Describe the ways in which each of

More information

Cross products. p 2 p. p p1 p2. p 1. Line segments The convex combination of two distinct points p1 ( x1, such that for some real number with 0 1,

Cross products. p 2 p. p p1 p2. p 1. Line segments The convex combination of two distinct points p1 ( x1, such that for some real number with 0 1, CHAPTER 33 Comutational Geometry Is the branch of comuter science that studies algorithms for solving geometric roblems. Has alications in many fields, including comuter grahics robotics, VLSI design comuter

More information

Patterned Wafer Segmentation

Patterned Wafer Segmentation atterned Wafer Segmentation ierrick Bourgeat ab, Fabrice Meriaudeau b, Kenneth W. Tobin a, atrick Gorria b a Oak Ridge National Laboratory,.O.Box 2008, Oak Ridge, TN 37831-6011, USA b Le2i Laboratory Univ.of

More information

Topics. Lecture 4. IT Group Cluster2 (1/2) What is a cluster? IT Group Cluster2 (2/2) Important Commands / Queuing.

Topics. Lecture 4. IT Group Cluster2 (1/2) What is a cluster? IT Group Cluster2 (2/2) Important Commands / Queuing. Toics Our Cluster Lecture 4 MPI Programming (I) MPI Introduction Information inquery Broadcast / Reduce 1 2 What is a cluster? A cluster is a dedicated resource for running comutational tasks. A collection

More information

Distributed Estimation from Relative Measurements in Sensor Networks

Distributed Estimation from Relative Measurements in Sensor Networks Distributed Estimation from Relative Measurements in Sensor Networks #Prabir Barooah and João P. Hesanha Abstract We consider the roblem of estimating vectorvalued variables from noisy relative measurements.

More information

Simulating Ocean Currents. Simulating Galaxy Evolution

Simulating Ocean Currents. Simulating Galaxy Evolution Simulating Ocean Currents (a) Cross sections (b) Satial discretization of a cross section Model as two-dimensional grids Discretize in sace and time finer satial and temoral resolution => greater accuracy

More information

Visualization, Estimation and User-Modeling for Interactive Browsing of Image Libraries

Visualization, Estimation and User-Modeling for Interactive Browsing of Image Libraries Visualization, Estimation and User-Modeling for Interactive Browsing of Image Libraries Qi Tian, Baback Moghaddam 2 and Thomas S. Huang Beckman Institute, University of Illinois, Urbana-Chamaign, IL 680,

More information

PREDICTING LINKS IN LARGE COAUTHORSHIP NETWORKS

PREDICTING LINKS IN LARGE COAUTHORSHIP NETWORKS PREDICTING LINKS IN LARGE COAUTHORSHIP NETWORKS Kevin Miller, Vivian Lin, and Rui Zhang Grou ID: 5 1. INTRODUCTION The roblem we are trying to solve is redicting future links or recovering missing links

More information

Randomized Selection on the Hypercube 1

Randomized Selection on the Hypercube 1 Randomized Selection on the Hyercube 1 Sanguthevar Rajasekaran Det. of Com. and Info. Science and Engg. University of Florida Gainesville, FL 32611 ABSTRACT In this aer we resent randomized algorithms

More information

Randomized algorithms: Two examples and Yao s Minimax Principle

Randomized algorithms: Two examples and Yao s Minimax Principle Randomized algorithms: Two examles and Yao s Minimax Princile Maximum Satisfiability Consider the roblem Maximum Satisfiability (MAX-SAT). Bring your knowledge u-to-date on the Satisfiability roblem. Maximum

More information

Building Better Nurse Scheduling Algorithms

Building Better Nurse Scheduling Algorithms Building Better Nurse Scheduling Algorithms Annals of Oerations Research, 128, 159-177, 2004. Dr Uwe Aickelin Dr Paul White School of Comuter Science University of the West of England University of Nottingham

More information

Efficient stereo vision for obstacle detection and AGV Navigation

Efficient stereo vision for obstacle detection and AGV Navigation Efficient stereo vision for obstacle detection and AGV Navigation Rita Cucchiara, Emanuele Perini, Giuliano Pistoni Diartimento di Ingegneria dell informazione, University of Modena and Reggio Emilia,

More information

Source Coding and express these numbers in a binary system using M log

Source Coding and express these numbers in a binary system using M log Source Coding 30.1 Source Coding Introduction We have studied how to transmit digital bits over a radio channel. We also saw ways that we could code those bits to achieve error correction. Bandwidth is

More information

Communication-Avoiding Parallel Algorithms for Solving Triangular Matrix Equations

Communication-Avoiding Parallel Algorithms for Solving Triangular Matrix Equations Research Collection Bachelor Thesis Communication-Avoiding Parallel Algorithms for Solving Triangular Matrix Equations Author(s): Wicky, Tobias Publication Date: 2015 Permanent Link: htts://doi.org/10.3929/ethz-a-010686133

More information

Chapter 8: Adaptive Networks

Chapter 8: Adaptive Networks Chater : Adative Networks Introduction (.1) Architecture (.2) Backroagation for Feedforward Networks (.3) Jyh-Shing Roger Jang et al., Neuro-Fuzzy and Soft Comuting: A Comutational Aroach to Learning and

More information

[9] J. J. Dongarra, R. Hempel, A. J. G. Hey, and D. W. Walker, \A Proposal for a User-Level,

[9] J. J. Dongarra, R. Hempel, A. J. G. Hey, and D. W. Walker, \A Proposal for a User-Level, [9] J. J. Dongarra, R. Hemel, A. J. G. Hey, and D. W. Walker, \A Proosal for a User-Level, Message Passing Interface in a Distributed-Memory Environment," Tech. Re. TM-3, Oak Ridge National Laboratory,

More information

Pivot Selection for Dimension Reduction Using Annealing by Increasing Resampling *

Pivot Selection for Dimension Reduction Using Annealing by Increasing Resampling * ivot Selection for Dimension Reduction Using Annealing by Increasing Resamling * Yasunobu Imamura 1, Naoya Higuchi 1, Tetsuji Kuboyama 2, Kouichi Hirata 1 and Takeshi Shinohara 1 1 Kyushu Institute of

More information

TOPP Probing of Network Links with Large Independent Latencies

TOPP Probing of Network Links with Large Independent Latencies TOPP Probing of Network Links with Large Indeendent Latencies M. Hosseinour, M. J. Tunnicliffe Faculty of Comuting, Information ystems and Mathematics, Kingston University, Kingston-on-Thames, urrey, KT1

More information

IMS Network Deployment Cost Optimization Based on Flow-Based Traffic Model

IMS Network Deployment Cost Optimization Based on Flow-Based Traffic Model IMS Network Deloyment Cost Otimization Based on Flow-Based Traffic Model Jie Xiao, Changcheng Huang and James Yan Deartment of Systems and Comuter Engineering, Carleton University, Ottawa, Canada {jiexiao,

More information

CASCH - a Scheduling Algorithm for "High Level"-Synthesis

CASCH - a Scheduling Algorithm for High Level-Synthesis CASCH a Scheduling Algorithm for "High Level"Synthesis P. Gutberlet H. Krämer W. Rosenstiel Comuter Science Research Center at the University of Karlsruhe (FZI) HaidundNeuStr. 1014, 7500 Karlsruhe, F.R.G.

More information

AN ANALYTICAL MODEL DESCRIBING THE RELATIONSHIPS BETWEEN LOGIC ARCHITECTURE AND FPGA DENSITY

AN ANALYTICAL MODEL DESCRIBING THE RELATIONSHIPS BETWEEN LOGIC ARCHITECTURE AND FPGA DENSITY AN ANALYTICAL MODEL DESCRIBING THE RELATIONSHIPS BETWEEN LOGIC ARCHITECTURE AND FPGA DENSITY Andrew Lam 1, Steven J.E. Wilton 1, Phili Leong 2, Wayne Luk 3 1 Elec. and Com. Engineering 2 Comuter Science

More information

Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data

Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data Efficient Processing of To-k Dominating Queries on Multi-Dimensional Data Man Lung Yiu Deartment of Comuter Science Aalborg University DK-922 Aalborg, Denmark mly@cs.aau.dk Nikos Mamoulis Deartment of

More information

CMSC 425: Lecture 16 Motion Planning: Basic Concepts

CMSC 425: Lecture 16 Motion Planning: Basic Concepts : Lecture 16 Motion lanning: Basic Concets eading: Today s material comes from various sources, including AI Game rogramming Wisdom 2 by S. abin and lanning Algorithms by S. M. LaValle (Chats. 4 and 5).

More information

A GPU Heterogeneous Cluster Scheduling Model for Preventing Temperature Heat Island

A GPU Heterogeneous Cluster Scheduling Model for Preventing Temperature Heat Island A GPU Heterogeneous Cluster Scheduling Model for Preventing Temerature Heat Island Yun-Peng CAO 1,2,a and Hai-Feng WANG 1,2 1 School of Information Science and Engineering, Linyi University, Linyi Shandong,

More information

Learning Motion Patterns in Crowded Scenes Using Motion Flow Field

Learning Motion Patterns in Crowded Scenes Using Motion Flow Field Learning Motion Patterns in Crowded Scenes Using Motion Flow Field Min Hu, Saad Ali and Mubarak Shah Comuter Vision Lab, University of Central Florida {mhu,sali,shah}@eecs.ucf.edu Abstract Learning tyical

More information

RESEARCH ARTICLE. Simple Memory Machine Models for GPUs

RESEARCH ARTICLE. Simple Memory Machine Models for GPUs The International Journal of Parallel, Emergent and Distributed Systems Vol. 00, No. 00, Month 2011, 1 22 RESEARCH ARTICLE Simle Memory Machine Models for GPUs Koji Nakano a a Deartment of Information

More information

MATHEMATICAL MODELING OF COMPLEX MULTI-COMPONENT MOVEMENTS AND OPTICAL METHOD OF MEASUREMENT

MATHEMATICAL MODELING OF COMPLEX MULTI-COMPONENT MOVEMENTS AND OPTICAL METHOD OF MEASUREMENT MATHEMATICAL MODELING OF COMPLE MULTI-COMPONENT MOVEMENTS AND OPTICAL METHOD OF MEASUREMENT V.N. Nesterov JSC Samara Electromechanical Plant, Samara, Russia Abstract. The rovisions of the concet of a multi-comonent

More information

A Method to Determine End-Points ofstraight Lines Detected Using the Hough Transform

A Method to Determine End-Points ofstraight Lines Detected Using the Hough Transform RESEARCH ARTICLE OPEN ACCESS A Method to Detere End-Points ofstraight Lines Detected Using the Hough Transform Gideon Kanji Damaryam Federal University, Lokoja, PMB 1154, Lokoja, Nigeria. Abstract The

More information

An Efficient VLSI Architecture for Adaptive Rank Order Filter for Image Noise Removal

An Efficient VLSI Architecture for Adaptive Rank Order Filter for Image Noise Removal International Journal of Information and Electronics Engineering, Vol. 1, No. 1, July 011 An Efficient VLSI Architecture for Adative Rank Order Filter for Image Noise Removal M. C Hanumantharaju, M. Ravishankar,

More information

Ad Hoc Networks. Latency-minimizing data aggregation in wireless sensor networks under physical interference model

Ad Hoc Networks. Latency-minimizing data aggregation in wireless sensor networks under physical interference model Ad Hoc Networks (4) 5 68 Contents lists available at SciVerse ScienceDirect Ad Hoc Networks journal homeage: www.elsevier.com/locate/adhoc Latency-minimizing data aggregation in wireless sensor networks

More information

A Morphological LiDAR Points Cloud Filtering Method based on GPGPU

A Morphological LiDAR Points Cloud Filtering Method based on GPGPU A Morhological LiDAR Points Cloud Filtering Method based on GPGPU Shuo Li 1, Hui Wang 1, Qiuhe Ma 1 and Xuan Zha 2 1 Zhengzhou Institute of Surveying & Maing, No.66, Longhai Middle Road, Zhengzhou, China

More information

Interactive Image Segmentation

Interactive Image Segmentation Interactive Image Segmentation Fahim Mannan (260 266 294) Abstract This reort resents the roject work done based on Boykov and Jolly s interactive grah cuts based N-D image segmentation algorithm([1]).

More information

Source-to-Source Code Generation Based on Pattern Matching and Dynamic Programming

Source-to-Source Code Generation Based on Pattern Matching and Dynamic Programming Source-to-Source Code Generation Based on Pattern Matching and Dynamic Programming Weimin Chen, Volker Turau TR-93-047 August, 1993 Abstract This aer introduces a new technique for source-to-source code

More information

A DEA-bases Approach for Multi-objective Design of Attribute Acceptance Sampling Plans

A DEA-bases Approach for Multi-objective Design of Attribute Acceptance Sampling Plans Available online at htt://ijdea.srbiau.ac.ir Int. J. Data Enveloment Analysis (ISSN 2345-458X) Vol.5, No.2, Year 2017 Article ID IJDEA-00422, 12 ages Research Article International Journal of Data Enveloment

More information

The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing

The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing Mikael Taveniku 2,3, Anders Åhlander 1,3, Magnus Jonsson 1 and Bertil Svensson 1,2

More information

Directed File Transfer Scheduling

Directed File Transfer Scheduling Directed File Transfer Scheduling Weizhen Mao Deartment of Comuter Science The College of William and Mary Williamsburg, Virginia 387-8795 wm@cs.wm.edu Abstract The file transfer scheduling roblem was

More information

Learning Lab 3: Parallel Methods of Solving the Linear Equation Systems

Learning Lab 3: Parallel Methods of Solving the Linear Equation Systems Learning Lab 3: Parallel Methods of Solving the Linear Equation Systems Lab Objective... Eercise State the Problem of Solving the Linear Equation Systems... 2 Eercise 2 - Studying the Gauss Algorithm for

More information

Support Vector Machines for Face Authentication

Support Vector Machines for Face Authentication Suort Vector Machines for Face Authentication K Jonsson 1 2, J Kittler 1,YPLi 1 and J Matas 1 2 1 CVSSP, University of Surrey Guildford, Surrey GU2 5XH, United Kingdom 2 CMP, Czech Technical University

More information

Convex Hulls. Helen Cameron. Helen Cameron Convex Hulls 1/101

Convex Hulls. Helen Cameron. Helen Cameron Convex Hulls 1/101 Convex Hulls Helen Cameron Helen Cameron Convex Hulls 1/101 What Is a Convex Hull? Starting Point: Points in 2D y x Helen Cameron Convex Hulls 3/101 Convex Hull: Informally Imagine that the x, y-lane is

More information

Phase Transitions in Interconnection Networks with Finite Buffers

Phase Transitions in Interconnection Networks with Finite Buffers Abstract Phase Transitions in Interconnection Networks with Finite Buffers Yelena Rykalova Boston University rykalova@bu.edu Lev Levitin Boston University levitin@bu.edu This aer resents theoretical models

More information

level 0 level 1 level 2 level 3

level 0 level 1 level 2 level 3 Communication-Ecient Deterministic Parallel Algorithms for Planar Point Location and 2d Voronoi Diagram? Mohamadou Diallo 1, Afonso Ferreira 2 and Andrew Rau-Chalin 3 1 LIMOS, IFMA, Camus des C zeaux,

More information

Assignment #3. Assignment #3. Assignment #3. What is a cluster? IT Group Cluster2 (1/2) IT Group Cluster2

Assignment #3. Assignment #3. Assignment #3. What is a cluster? IT Group Cluster2 (1/2) IT Group Cluster2 Assignment #3 Assignment #3 How to count FLOP? A = A + b * c 2 floating oint oerations for(int i=0;i

More information

Submission. Verifying Properties Using Sequential ATPG

Submission. Verifying Properties Using Sequential ATPG Verifying Proerties Using Sequential ATPG Jacob A. Abraham and Vivekananda M. Vedula Comuter Engineering Research Center The University of Texas at Austin Austin, TX 78712 jaa, vivek @cerc.utexas.edu Daniel

More information

A BICRITERION STEINER TREE PROBLEM ON GRAPH. Mirko VUJO[EVI], Milan STANOJEVI] 1. INTRODUCTION

A BICRITERION STEINER TREE PROBLEM ON GRAPH. Mirko VUJO[EVI], Milan STANOJEVI] 1. INTRODUCTION Yugoslav Journal of Oerations Research (00), umber, 5- A BICRITERIO STEIER TREE PROBLEM O GRAPH Mirko VUJO[EVI], Milan STAOJEVI] Laboratory for Oerational Research, Faculty of Organizational Sciences University

More information

Face Recognition Based on Wavelet Transform and Adaptive Local Binary Pattern

Face Recognition Based on Wavelet Transform and Adaptive Local Binary Pattern Face Recognition Based on Wavelet Transform and Adative Local Binary Pattern Abdallah Mohamed 1,2, and Roman Yamolskiy 1 1 Comuter Engineering and Comuter Science, University of Louisville, Louisville,

More information

Matlab Virtual Reality Simulations for optimizations and rapid prototyping of flexible lines systems

Matlab Virtual Reality Simulations for optimizations and rapid prototyping of flexible lines systems Matlab Virtual Reality Simulations for otimizations and raid rototying of flexible lines systems VAMVU PETRE, BARBU CAMELIA, POP MARIA Deartment of Automation, Comuters, Electrical Engineering and Energetics

More information

Tiling for Performance Tuning on Different Models of GPUs

Tiling for Performance Tuning on Different Models of GPUs Tiling for Performance Tuning on Different Models of GPUs Chang Xu Deartment of Information Engineering Zhejiang Business Technology Institute Ningbo, China colin.xu198@gmail.com Steven R. Kirk, Samantha

More information

split split (a) (b) split split (c) (d)

split split (a) (b) split split (c) (d) International Journal of Foundations of Comuter Science c World Scientic Publishing Comany ON COST-OPTIMAL MERGE OF TWO INTRANSITIVE SORTED SEQUENCES JIE WU Deartment of Comuter Science and Engineering

More information

CENTRAL AND PARALLEL PROJECTIONS OF REGULAR SURFACES: GEOMETRIC CONSTRUCTIONS USING 3D MODELING SOFTWARE

CENTRAL AND PARALLEL PROJECTIONS OF REGULAR SURFACES: GEOMETRIC CONSTRUCTIONS USING 3D MODELING SOFTWARE CENTRAL AND PARALLEL PROJECTIONS OF REGULAR SURFACES: GEOMETRIC CONSTRUCTIONS USING 3D MODELING SOFTWARE Petra Surynková Charles University in Prague, Faculty of Mathematics and Physics, Sokolovská 83,

More information

A CLASS OF STRUCTURED LDPC CODES WITH LARGE GIRTH

A CLASS OF STRUCTURED LDPC CODES WITH LARGE GIRTH A CLASS OF STRUCTURED LDPC CODES WITH LARGE GIRTH Jin Lu, José M. F. Moura, and Urs Niesen Deartment of Electrical and Comuter Engineering Carnegie Mellon University, Pittsburgh, PA 15213 jinlu, moura@ece.cmu.edu

More information

Using Rational Numbers and Parallel Computing to Efficiently Avoid Round-off Errors on Map Simplification

Using Rational Numbers and Parallel Computing to Efficiently Avoid Round-off Errors on Map Simplification Using Rational Numbers and Parallel Comuting to Efficiently Avoid Round-off Errors on Ma Simlification Maurício G. Grui 1, Salles V. G. de Magalhães 1,2, Marcus V. A. Andrade 1, W. Randolh Franklin 2,

More information

Skip List Based Authenticated Data Structure in DAS Paradigm

Skip List Based Authenticated Data Structure in DAS Paradigm 009 Eighth International Conference on Grid and Cooerative Comuting Ski List Based Authenticated Data Structure in DAS Paradigm Jieing Wang,, Xiaoyong Du,. Key Laboratory of Data Engineering and Knowledge

More information

Truth Trees. Truth Tree Fundamentals

Truth Trees. Truth Tree Fundamentals Truth Trees 1 True Tree Fundamentals 2 Testing Grous of Statements for Consistency 3 Testing Arguments in Proositional Logic 4 Proving Invalidity in Predicate Logic Answers to Selected Exercises Truth

More information

Face Recognition Using Legendre Moments

Face Recognition Using Legendre Moments Face Recognition Using Legendre Moments Dr.S.Annadurai 1 A.Saradha Professor & Head of CSE & IT Research scholar in CSE Government College of Technology, Government College of Technology, Coimbatore, Tamilnadu,

More information

Equality-Based Translation Validator for LLVM

Equality-Based Translation Validator for LLVM Equality-Based Translation Validator for LLVM Michael Ste, Ross Tate, and Sorin Lerner University of California, San Diego {mste,rtate,lerner@cs.ucsd.edu Abstract. We udated our Peggy tool, reviously resented

More information

Complexity analysis of matrix product on multicore architectures

Complexity analysis of matrix product on multicore architectures Comlexity analysis of matrix roduct on multicore architectures Mathias Jacquelin, Loris Marchal and Yves Robert École Normale Suérieure de Lyon, France {Mathias.Jacquelin Loris.Marchal Yves.Robert}@ens-lyon.fr

More information

GEOMETRIC CONSTRAINT SOLVING IN < 2 AND < 3. Department of Computer Sciences, Purdue University. and PAMELA J. VERMEER

GEOMETRIC CONSTRAINT SOLVING IN < 2 AND < 3. Department of Computer Sciences, Purdue University. and PAMELA J. VERMEER GEOMETRIC CONSTRAINT SOLVING IN < AND < 3 CHRISTOPH M. HOFFMANN Deartment of Comuter Sciences, Purdue University West Lafayette, Indiana 47907-1398, USA and PAMELA J. VERMEER Deartment of Comuter Sciences,

More information

12) United States Patent 10) Patent No.: US 6,321,328 B1

12) United States Patent 10) Patent No.: US 6,321,328 B1 USOO6321328B1 12) United States Patent 10) Patent No.: 9 9 Kar et al. (45) Date of Patent: Nov. 20, 2001 (54) PROCESSOR HAVING DATA FOR 5,961,615 10/1999 Zaid... 710/54 SPECULATIVE LOADS 6,006,317 * 12/1999

More information

Figure 8.1: Home age taken from the examle health education site (htt:// Setember 14, 2001). 201

Figure 8.1: Home age taken from the examle health education site (htt://  Setember 14, 2001). 201 200 Chater 8 Alying the Web Interface Profiles: Examle Web Site Assessment 8.1 Introduction This chater describes the use of the rofiles develoed in Chater 6 to assess and imrove the quality of an examle

More information

Auto-Tuning Distributed-Memory 3-Dimensional Fast Fourier Transforms on the Cray XT4

Auto-Tuning Distributed-Memory 3-Dimensional Fast Fourier Transforms on the Cray XT4 Auto-Tuning Distributed-Memory 3-Dimensional Fast Fourier Transforms on the Cray XT4 M. Gajbe a A. Canning, b L-W. Wang, b J. Shalf, b H. Wasserman, b and R. Vuduc, a a Georgia Institute of Technology,

More information

Fast Distributed Process Creation with the XMOS XS1 Architecture

Fast Distributed Process Creation with the XMOS XS1 Architecture Communicating Process Architectures 20 P.H. Welch et al. (Eds.) IOS Press, 20 c 20 The authors and IOS Press. All rights reserved. Fast Distributed Process Creation with the XMOS XS Architecture James

More information

Statistical Detection for Network Flooding Attacks

Statistical Detection for Network Flooding Attacks Statistical Detection for Network Flooding Attacks C. S. Chao, Y. S. Chen, and A.C. Liu Det. of Information Engineering, Feng Chia Univ., Taiwan 407, OC. Email: cschao@fcu.edu.tw Abstract In order to meet

More information

Contents 1 Introduction 2 2 Outline of the SAT Aroach Performance View Abstraction View

Contents 1 Introduction 2 2 Outline of the SAT Aroach Performance View Abstraction View Abstraction and Performance in the Design of Parallel Programs Der Fakultat fur Mathematik und Informatik der Universitat Passau vorgelegte Zusammenfassung der Veroentlichungen zur Erlangung der venia

More information

A Scalable Parallel Approach for Peptide Identification from Large-scale Mass Spectrometry Data

A Scalable Parallel Approach for Peptide Identification from Large-scale Mass Spectrometry Data 2009 International Conference on Parallel Processing Workshos A Scalable Parallel Aroach for Petide Identification from Large-scale Mass Sectrometry Data Gaurav Kulkarni, Ananth Kalyanaraman School of

More information

A NOVEL GEOMETRIC ALGORITHM FOR FAST WIRE-OPTIMIZED FLOORPLANNING

A NOVEL GEOMETRIC ALGORITHM FOR FAST WIRE-OPTIMIZED FLOORPLANNING A OVEL GEOMETRIC ALGORITHM FOR FAST WIRE-OPTIMIZED FLOORPLAIG Peter G. Sassone, Sung K. Lim School of Electrical and Comuter Engineering Georgia Institute of Technology Atlanta, Georgia 30332, U ABSTRACT

More information