An improved Thomas Algorithm for finite element matrix parallel computing


Qingfeng Du 1a), Zonglin Li 1b), Hongmei Zhang 2, Xilin Lu 2, Liu Zhang 1

1) School of Software Engineering, Tongji University, Shanghai, China
2) Research Institute of Structural Engineering and Disaster Reduction, Tongji University, Shanghai, China
zhanghongmei@tongji.edu.cn

* Corresponding author, Associate Professor, PhD, zhanghongmei@tongji.edu.cn; du_cloud@tongji.edu.cn
a) Professor, PhD
b) Master Student
Note: Copied from the manuscript submitted to "Computers and Concrete, An International Journal" for presentation at the ASEM13 Congress.

ABSTRACT

With the expansion of the scale of linear finite element analysis data, computational efficiency has become a main technical bottleneck. To address this bottleneck, parallel computation is playing an increasingly prominent role. At present, research on linear finite element parallel computation is mainly concentrated on the preprocessing phase, where the goal is to reduce the communication overhead and improve the degree of homogeneity. Research on the computation phase, which processes the linear finite element analysis data, is rare, even though most of the computation cost lies in this phase. In this paper, we study and analyze stiffness matrix decomposition and, after comparing different matrix decomposition algorithms, propose an improved algorithm for parallel computation based on the Thomas algorithm. Verification on a large amount of data shows that the improved algorithm greatly enhances the parallel performance of linear finite element computation.

1. INTRODUCTION

Nowadays, the linear finite element method is widely used in the analysis of large, complex composite structures. With the growth of data scale, linear finite element parallel computation (Ananth 2003) is playing an increasingly important role in engineering, especially in concrete structure simulation (Liu 2009, Lv 2011). At the moment, research on linear finite element parallel computation is mainly concentrated on the pre-processing phase, where the goal is to reduce the communication overhead and improve the degree of homogeneity (Maurer 2011, Paz 2005). Little research has been conducted on the computation phase that processes the finite element analysis data, although most of the computation cost lies in this phase, mainly in solving large systems of linear equations.

The structural stiffness matrix, which is the coefficient matrix of these equations, is singular, symmetric and sparse, with non-zero elements spread over a stripe region, and is usually a tridiagonal matrix (Turmo 2012). The most mature decomposition algorithm is the Gaussian elimination algorithm, which is also called the LU algorithm. For tridiagonal matrices, Thomas proposed the chasing algorithm (Thomas algorithm) based on the LU algorithm (Turmo 2012). This algorithm is very effective for solving tridiagonal linear equations, but it is not suitable for parallel computation (Paz 2005).

In order to solve the problem of parallel computation of tridiagonal linear equations, the original Thomas algorithm on a single processor is analyzed here. An improved algorithm suitable for parallel computation is then proposed for complex structural matrix analysis, together with a description of its idea and logic. To verify the efficiency of the improved algorithm, MPI (Message Passing Interface) (Pacheco 1997) was employed to test it on data of various scales. The test results indicate that the improved algorithm significantly enhances the parallel performance of linear finite element computation.

1. BACKGROUND AND RELEVANT KNOWLEDGE

1.1 The decomposition of the coefficient matrices of linear equations and the Thomas algorithm

Assume that $Ax = b$ is a large system of linear equations and $A$ is its coefficient matrix. The most mature decomposition algorithm is the Gaussian elimination algorithm. This algorithm is divided into two phases. In the first phase, algebraic operations reduce $Ax = b$ to upper triangular equations, so that $Ax = b$ can be written as $Ux = y$ ($U$ is a unit upper triangular coefficient matrix). In the second phase, backward substitution is used to solve the equations.

An improved form of this method is the LU algorithm, which decomposes the matrix $A$ into a lower triangular matrix $L$ and a unit upper triangular matrix $U$, that is, $A = LU$. $L$ is stored in the lower triangle of $A$ and $U$ is stored in the upper triangle of $A$ (the diagonal elements of $U$ are not stored, since their default value is 1). The Gaussian decomposition method is applicable to general dense matrices.
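The LU scheme described above, with a unit diagonal on $U$ and both factors packed into one array, can be sketched in Python as follows. This is a minimal illustration that assumes no pivoting is needed (for example, a diagonally dominant matrix); the function names `crout_lu` and `lu_solve` are hypothetical and do not come from the paper.

```python
def crout_lu(a):
    """Factor A = L U with unit-diagonal U (Crout form), no pivoting.

    Returns one packed matrix: L on and below the diagonal,
    U strictly above it (the unit diagonal of U is not stored).
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy
    for k in range(n):
        # Column k of L
        for i in range(k, n):
            lu[i][k] -= sum(lu[i][s] * lu[s][k] for s in range(k))
        if lu[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # Row k of U, right of the diagonal
        for j in range(k + 1, n):
            lu[k][j] = (lu[k][j] - sum(lu[k][s] * lu[s][j] for s in range(k))) / lu[k][k]
    return lu


def lu_solve(lu, b):
    """Solve A x = b from the packed factors: L y = b, then U x = y."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):  # forward substitution with L
        y[i] = (b[i] - sum(lu[i][s] * y[s] for s in range(i))) / lu[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # backward substitution with unit-diagonal U
        x[i] = y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))
    return x
```

For a dense matrix both loops cost on the order of $n^3$ operations, which is why the special structure of tridiagonal matrices is worth exploiting.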
The Cholesky decomposition algorithm is widely used for positive definite matrices. It can be seen as a special case of the LU algorithm that is better suited to symmetric positive definite matrices. In this algorithm, the matrix is decomposed into the product of a triangular matrix and its transpose, i.e., $A = U^T U$ ($U$ is an upper triangular matrix). The operation count of this decomposition is only about half that of the Gaussian decomposition method.

For tridiagonal matrices, Thomas proposed the chasing algorithm (Thomas algorithm) based on the LU algorithm. The algorithm is very simple and requires only $5n-4$ multiplication and division operations. It is numerically stable and is also a classical algorithm for solving tridiagonal linear equations.

1.2 Structural stiffness matrix

An element $k_{ij}$ of the structural stiffness matrix $K$ is the force that must be exerted on node $i$ when the displacement of node $j$ is one unit while the displacements of the other nodes are zero. The difference between the element stiffness matrix and the structural stiffness matrix is that the structure is the collection of units, and every unit affects the structure. As a unit stiffness matrix is symmetric and singular, the structural stiffness matrix assembled from these units is also symmetric and singular. That is, displacement constraint conditions have to be given in order to remove the singularity of $K$, so that the displacements of the elements can be obtained.

The structural stiffness matrix is the collection of unit stiffness matrices. Although the total number of elements is large and the order of the structural stiffness matrix is high, most of the elements are zero. So if the node numbering is reasonable, the non-zero elements will spread over a stripe region centered on the principal diagonal. In short, a structural stiffness matrix is singular, symmetric and sparse, with non-zero elements spread over a stripe region. With reasonable numbering, the matrix is a positive definite tridiagonal matrix.

1.3 Existing problems

Finite element analysis is a very important numerical analysis method that has been widely used in engineering and scientific computing. However, when it is applied to large or very large complex structures, the amount of calculation increases exponentially, and the usual strategy is to use a supercomputer. In recent years, finite element parallel computing has drawn researchers' attention. One of the concerns is to improve the finite element parallel computing algorithm so as to raise the efficiency of large-scale complex structural analysis in a common distributed parallel computing environment and make it accessible to ordinary users.

Finite element distributed parallel computing can be divided into three stages: pre-processing, computation and post-processing. In the pre-processing stage, the finite element model is built and the unit grid is divided. During post-processing, the results are analyzed to help users extract information and understand the calculated results. The computing cost lies mainly in the computation stage, and most of it is spent solving large-scale linear equations. The computation process involves the decomposition algorithm for the coefficient matrix of the linear equations, the Thomas algorithm and the structural stiffness matrix.
The key point of our research is the stiffness matrix decomposition. This paper analyzes the features of the coefficient matrix of large-scale linear equations and of the Thomas algorithm. We then put forward an effective matrix decomposition strategy and, based on it, an improved Thomas algorithm that is applicable to large complex structure analysis and suitable for parallel computation.

2. THE IMPROVEMENT OF THOMAS ALGORITHM

2.1 The existing Thomas algorithm on a single process

2.1.1 Algorithm introduction

First, the single-process Thomas algorithm is given. Assume the coefficient matrix $A$ is a positive definite tridiagonal matrix, that is:

\[
A = \begin{pmatrix}
b_1 & c_1 & & & \\
a_2 & b_2 & c_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & a_{n-1} & b_{n-1} & c_{n-1} \\
 & & & a_n & b_n
\end{pmatrix} \tag{1}
\]

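Decompose $A$ according to Crout's method, that is, $A = LU$ with

\[
L = \begin{pmatrix}
\beta_1 & & & \\
\alpha_2 & \beta_2 & & \\
 & \ddots & \ddots & \\
 & & \alpha_n & \beta_n
\end{pmatrix} \tag{2}
\]

\[
U = \begin{pmatrix}
1 & \gamma_1 & & \\
 & 1 & \ddots & \\
 & & \ddots & \gamma_{n-1} \\
 & & & 1
\end{pmatrix} \tag{3}
\]

where $\alpha_i$, $\beta_i$ and $\gamma_i$ are undetermined coefficients. By matrix multiplication, we get:

\[
\begin{aligned}
& b_1 = \beta_1, \quad c_1 = \beta_1 \gamma_1, \\
& a_i = \alpha_i, \quad b_i = \beta_i + \alpha_i \gamma_{i-1}, \quad i = 2, 3, \dots, n, \\
& c_i = \beta_i \gamma_i, \quad i = 2, 3, \dots, n-1,
\end{aligned} \tag{4}
\]

and therefore

\[
\begin{aligned}
& \beta_1 = b_1, \quad \gamma_1 = c_1 / b_1, \\
& \beta_i = b_i - a_i \gamma_{i-1}, \quad i = 2, 3, \dots, n, \\
& \gamma_i = c_i / \beta_i, \quad i = 2, 3, \dots, n-1.
\end{aligned} \tag{5}
\]

Therefore, the tridiagonal equations $Ax = f$ are equivalent to the following two systems of equations: $Ly = f$ and $Ux = y$.

2.1.2 The logic of the algorithm

Step 1: Input the data $a_i$, $b_i$, $c_i$, $f_i$.

Step 2: Compute

\[
\begin{aligned}
& \beta_1 = b_1, \quad \gamma_1 = c_1 / b_1, \\
& \beta_i = b_i - a_i \gamma_{i-1}, \quad i = 2, 3, \dots, n, \\
& \gamma_i = c_i / \beta_i, \quad i = 2, 3, \dots, n-1.
\end{aligned} \tag{6}
\]

Step 3: Solve the equations $Ly = f$:

\[
y_1 = f_1 / \beta_1, \qquad y_i = (f_i - a_i y_{i-1}) / \beta_i, \quad i = 2, 3, \dots, n. \tag{7}
\]

Step 4: Solve the equations $Ux = y$:

\[
x_n = y_n, \qquad x_i = y_i - \gamma_i x_{i+1}, \quad i = n-1, n-2, \dots, 1. \tag{8}
\]

Step 5: Output the solution of the equations.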
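Steps 1 to 5 above can be sketched in Python as follows. This is a minimal illustration of the serial Thomas algorithm, not the authors' implementation; the function name `thomas_solve` and the 0-based array layout (with `a[0]` and `c[n-1]` unused) are assumptions made for the example.

```python
def thomas_solve(a, b, c, f):
    """Solve a tridiagonal system with the Thomas (chasing) algorithm.

    a: sub-diagonal   (a[0] is unused)
    b: main diagonal
    c: super-diagonal (c[n-1] is unused)
    f: right-hand side
    All lists have length n. No pivoting is performed, so the matrix
    is assumed to be, for example, diagonally dominant.
    """
    n = len(b)
    beta = [0.0] * n   # diagonal of L
    gamma = [0.0] * n  # super-diagonal of U (gamma[n-1] stays unused)
    y = [0.0] * n

    # Forward sweep: decomposition coefficients and L y = f
    beta[0] = b[0]
    y[0] = f[0] / beta[0]
    for i in range(1, n):
        gamma[i - 1] = c[i - 1] / beta[i - 1]
        beta[i] = b[i] - a[i] * gamma[i - 1]
        y[i] = (f[i] - a[i] * y[i - 1]) / beta[i]

    # Backward substitution: U x = y
    x = [0.0] * n
    x[n - 1] = y[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = y[i] - gamma[i] * x[i + 1]
    return x
```

Each unknown depends on its neighbour in both sweeps, which is why the algorithm runs in linear time but cannot be split across processors as it stands.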

The process of calculating the decomposition coefficients and the intermediate vector $y$ is the forward sweep. The process of calculating $x$ is the backward substitution. As the formulas of the Thomas algorithm are very simple, it requires only $5n-4$ multiplications and divisions. The Thomas algorithm is numerically stable, so it is widely used for the serial solution of tridiagonal equations.

2.2 Improvement of Thomas algorithm for parallel computing

2.2.1 The existing storage strategy of large tridiagonal matrix

Block Storage Strategy: Assume $A$ is an $n$-order square matrix and $p$ is the number of node machines. For large finite element analysis, $n \gg p$ under normal circumstances. Assume $m = n/p$. In the block storage scheme, the first node machine stores the first $m$ lines, the second node machine stores the second $m$ lines, and so on, and the last node machine stores the last $m$ lines. If $n$ is not an integer multiple of $p$, the remaining rows are stored on the first node. As the front lines of the matrix complete the decomposition first, another machine becomes idle each time a further $m$ lines are decomposed, so this storage strategy suffers from serious load imbalance.

Single-line Shutter Storage Strategy: In single-line shutter storage, the lines are dealt out to the node machines one at a time in a cyclic fashion. For example, the first node stores line 1, line $p+1$, line $2p+1$, and so on. When the decomposition of its last line is completed, the first processor stops operation. This strategy minimizes the load imbalance. However, the communication overhead increases significantly.

Multi-line Shutter Storage Strategy: The multi-line shutter storage strategy combines the advantages of the two strategies above. The matrix is decomposed into blocks by rows, each block contains multiple lines, and the blocks are dealt out to the node machines cyclically. When the decomposition of its last block is completed, the first processor stops operation. Since each block contains multiple lines, the communication overhead is less than that of single-line shutter storage.

2.2.2 The existing decomposition strategy and mapping technology: Cholesky decomposition

The idea of the Cholesky decomposition:

\[
\begin{pmatrix}
b_1 & c_1 & & & \\
a_2 & b_2 & c_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & a_{n-1} & b_{n-1} & c_{n-1} \\
 & & & a_n & b_n
\end{pmatrix}
\;\longrightarrow\;
\begin{pmatrix}
B_1 & C_1 & & & \\
C_1^T & B_2 & C_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & C_{q-2}^T & B_{q-1} & C_{q-1} \\
 & & & C_{q-1}^T & B_q
\end{pmatrix}
\]

Fig. 1 The left is the matrix before decomposition, the right is the matrix after decomposition

Fig. 1 shows the matrix before and after decomposition. $C_i$ is an $m$-order square matrix in which only the lower-left corner element is non-zero. $B_i$ is an $m$-order positive definite tridiagonal matrix, and it is decomposed by the Cholesky decomposition method: $B_i = L_i D_i L_i^T$, where $D_i$ is a diagonal matrix and $L_i$ is a unit lower triangular matrix. The matrix transformations are as follows.

\[
\begin{pmatrix}
D_1 & \tilde{C}_1 & & & \\
\tilde{C}_1^T & D_2 & \tilde{C}_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & \tilde{C}_{p-2}^T & D_{p-1} & \tilde{C}_{p-1} \\
 & & & \tilde{C}_{p-1}^T & D_p
\end{pmatrix}
\]

Fig. 2 The matrix after the transformation (the transformed off-diagonal blocks are written $\tilde{C}_i$ here)

In Fig. 2, all elements of the transformed off-diagonal blocks, except for those in the last line, are 0. As $D_i$ is a diagonal matrix, the elements of the last line of each transformed block can be eliminated to zero one by one, except for the last element. Thus, the elements of the last lines of the blocks form a new, small tridiagonal system of equations. This small tridiagonal system can be solved on a single processor, and the solution is then sent to the other processors, so that the original problem can be solved.

Algorithm description:

Step 1: On each processor, decompose the diagonal block, $B_i = L_i D_i L_i^T$.
Step 2: Transform the system as shown in Fig. 2.
Step 3: Carry out the elimination and solve the small tridiagonal system on a single processor.
Step 4: Solve the original problem.

This algorithm has good parallelism, but its computational complexity is double that of the serial algorithm. Although parallelism increases, the higher computational complexity reduces the efficiency of the algorithm. Therefore, we would like to improve on the idea of this algorithm to reduce the computational complexity.

2.2.3 The improvement of storage strategy (Improved Multi-line Wrapped Interleaved Row Storage, IMWIRS)

A large tridiagonal matrix is a large sparse matrix: all elements, except for those near the diagonal, are zero. Storing those zero elements is a waste of memory resources. The new approach transforms the matrix and stores only the non-zero elements. This storage strategy is an improvement of the multi-line shutter storage method and is specifically applicable to tridiagonal matrices.

Assume a system of linear equations $Ax = b$, where $A$ is a tridiagonal matrix of order $n$. The original storage method stores $A$ in a two-dimensional $n \times n$ array. The improved method needs only $3n$ memory locations, which is only $3/n$ of the original requirement. As $n$ increases, the advantage becomes more obvious.
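The compact storage idea can be sketched in Python as follows: each row of the tridiagonal matrix is reduced to its three band entries, so an $n \times n$ array becomes an $n \times 3$ array. The helper names `dense_to_band` and `storage_ratio` are hypothetical, and the dense input is used only to make the example self-contained; a real program would build the band array directly.

```python
def dense_to_band(dense):
    """Convert a dense tridiagonal matrix (list of rows) to n x 3 band storage.

    Row i of the result holds (sub-diagonal, diagonal, super-diagonal)
    entries of row i; the two positions that fall outside the matrix
    (first sub-diagonal, last super-diagonal) are set to 0.
    """
    n = len(dense)
    band = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        if i > 0:
            band[i][0] = dense[i][i - 1]
        band[i][1] = dense[i][i]
        if i < n - 1:
            band[i][2] = dense[i][i + 1]
    return band


def storage_ratio(n):
    """Memory needed by band storage relative to dense storage: 3n / n^2."""
    return (3 * n) / (n * n)
```

For example, with $n = 1000$ the band array needs 3000 entries instead of 1,000,000, a ratio of $3/n = 0.003$.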

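\[
A = \begin{pmatrix}
b_1 & c_1 & & & \\
a_2 & b_2 & c_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & a_{n-1} & b_{n-1} & c_{n-1} \\
 & & & a_n & b_n
\end{pmatrix},
\qquad
A' = \begin{pmatrix}
0 & b_1 & c_1 \\
a_2 & b_2 & c_2 \\
\vdots & \vdots & \vdots \\
a_{n-1} & b_{n-1} & c_{n-1} \\
a_n & b_n & 0
\end{pmatrix}
\]

Fig. 3 $A$ is the original storage method, which needs an $n \times n$ array, and $A'$ is the transformed one with only $3n$ memory locations

In Fig. 3, for clarity, corresponding elements carry the same label. In actual storage, we assume that $A$ and $A'$ represent the arrays before and after the transformation, respectively, that $a_{i,j}$ is an element of $A$ and that $a'_{i,j}$ is an element of $A'$. Then $a'_{i,1} = a_{i,i-1}$, $a'_{i,2} = a_{i,i}$ and $a'_{i,3} = a_{i,i+1}$, with $a'_{1,1} = a'_{n,3} = 0$.

2.2.4 Decomposition strategy and mapping technology based on improved storage strategy

Step 1: Decompose the large tridiagonal matrix into blocks. Assume that the matrix is of order $n$ and decompose it into small square matrices of order $m$. Assume $q = n/m$ and that there are $p$ processors. According to the multi-line shutter storage strategy, the first processor stores the first $m$ rows, the second processor stores the second $m$ rows, and so on, so that the $i$-th processor stores the $i$-th block of $m$ rows.

\[
\begin{pmatrix}
b_1 & c_1 & & & \\
a_2 & b_2 & c_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & a_{n-1} & b_{n-1} & c_{n-1} \\
 & & & a_n & b_n
\end{pmatrix}
\;\longrightarrow\;
\begin{pmatrix}
B_1 & C_1 & & & \\
A_2 & B_2 & C_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & A_{q-1} & B_{q-1} & C_{q-1} \\
 & & & A_q & B_q
\end{pmatrix}
\]

Fig. 4 The left one is the original tridiagonal matrix, the right one is the matrix after decomposition

In Fig. 4, $B_i$ is a tridiagonal matrix; $C_i$ is a square matrix whose only non-zero element is in the lower-left corner; $A_i$ is a square matrix whose only non-zero element is in the top-right corner. $A_i$, $B_i$ and $C_i$ are all $m$-order square matrices.

Step 2: Store $A_i$, $B_i$ and $C_i$ on the $i$-th node according to the storage method introduced above.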

In Fig. 5, $a_{(i-1)m+1}$ is the only non-zero element of $A_i$; $c_{im}$ is the only non-zero element of $C_i$; the remaining elements constitute the tridiagonal matrix $B_i$.

\[
\begin{pmatrix} A_i & B_i & C_i \end{pmatrix}
\quad\text{with rows}\quad
\begin{array}{ccc}
a_{(i-1)m+1} & b_{(i-1)m+1} & c_{(i-1)m+1} \\
a_{(i-1)m+2} & b_{(i-1)m+2} & c_{(i-1)m+2} \\
\vdots & \vdots & \vdots \\
a_{im-1} & b_{im-1} & c_{im-1} \\
a_{im} & b_{im} & c_{im}
\end{array}
\]

Fig. 5 The elements and blocks of the i-th node

Fig. 6 Showing how data is stored on the i-th node: the triples $(a_k, b_k, c_k)$, $k = (i-1)m+1, \dots, im$, are stored row by row in an $m \times 3$ array

Step 3: On each node, solve the small tridiagonal matrix $B_i$ and the matrices $A_i$ and $C_i$, which have only one non-zero element each, using the Thomas algorithm. The first node does not have to deal with $A_1$, and the last node does not have to deal with $C_q$.

2.2.5 Description of improved algorithm

Based on the analysis above, we give a concrete description of the algorithm. The algorithm stores only the elements near the diagonal and therefore saves memory. In terms of decomposition strategy and mapping technology, we decompose the large-scale tridiagonal matrix into small square matrices; each processor needs to solve one small tridiagonal matrix and two square matrices that contain only one non-zero element. Each processor then sends its results to the master processor, which calculates the result of the original problem. The algorithm is described below.

Assume that the network topology of the parallel computers is a master-slave architecture and that the linear equations are written as $Ax = b$.

Step 1: Input the order $n$ of the large stiffness matrix into the master computer. Assuming the program decomposes the large matrix by lines into $q$ blocks (that is, $q$ slave nodes), the master computer calculates, according to the multi-line shutter storage strategy, how many lines each block contains, that is, $m = n/q$.

Step 2: Input the elements of the large matrix $A$ and of the vector $b$. While the elements are being input, the master computer assigns lines 1 to $m$ and the corresponding part of the vector to the first node, lines $m+1$ to $2m$ and the corresponding part of the vector to the second node, and so on; the remaining lines are assigned to the first node. It is worth noting that, when assigning elements, the master node processes only the tridiagonal elements.

Step 3: On each slave node, allocate an array; the elements assigned to the node are stored in the array according to the IMWIRS storage strategy.

Step 4: On each slave node, logically decompose the elements of the array into three $m$-order square matrices: the first element of the array is regarded as the top-right element of $A_i$, the last element is regarded as the lower-left element of $C_i$, and the remaining elements are regarded as $B_i$; for the mapping method, refer to Fig. 4.

Step 5: On each slave node, $B_i$ is calculated according to the single-process Thomas algorithm.

Step 6: On each node, calculate $A_i$ and $C_i$, which contain only one non-zero element each (the first node does not need to calculate $A_1$, and the last node does not have to calculate $C_q$).

Step 7: Each node returns its computation results to the master computer; the master computer calculates the final result and outputs the result vector.

2.2.6 Algorithm implementation using pseudo-code
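Before the pseudo-code, Steps 1 to 4 above (splitting the band-stored matrix into blocks of $m$ rows, dealing the blocks out to nodes, and separating the single coupling elements of $A_i$ and $C_i$ from $B_i$) can be sketched in Python. This is an illustrative sketch only: the function name `partition_band` and the dictionary layout are hypothetical, the distribution is simulated in one process rather than sent over MPI, and a short final block is kept as its own block instead of being appended to the first node as the text describes.

```python
def partition_band(band, m, p):
    """Split an n x 3 band array into blocks of m rows for p nodes.

    band[i] = [sub-diagonal, diagonal, super-diagonal] of row i.
    Returns a list of dicts, one per block:
      node      - index of the node that receives the block (round robin)
      first_row - global index of the block's first row
      a_couple  - the only non-zero element of A_i (0.0 for the first block)
      c_couple  - the only non-zero element of C_i (0.0 for the last block)
      B         - m x 3 band array of the local tridiagonal matrix B_i
    """
    n = len(band)
    blocks = []
    for start in range(0, n, m):
        rows = [row[:] for row in band[start:start + m]]
        a_couple = rows[0][0]    # couples this block to the previous one
        c_couple = rows[-1][2]   # couples this block to the next one
        rows[0][0] = 0.0         # what remains is the local matrix B_i
        rows[-1][2] = 0.0
        blocks.append({
            "node": (start // m) % p,
            "first_row": start,
            "a_couple": a_couple,
            "c_couple": c_couple,
            "B": rows,
        })
    return blocks
```

Zeroing the first and last band entries of each block mirrors lines 41 and 42 of the pseudo-code, which clear the first and last elements of the local array.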

1. Master
2. procedure distribution()
3. Input n: integer; // order of the coefficient matrix
4. q: integer; // number of blocks (sometimes equal to the number of slave computers)
5. resultvector: array; // result vector of Ax = b
6. Begin
7. m = n/q; // the number of lines in each block
8. Pcurrent = 0; // marks the block currently being processed
9. while (Pcurrent < q) { // not all blocks have been distributed yet
10. for (j = 0; j < m; ++j)
11. for (i = 0; i < n; ++i)
12. {
13. if (|i-j| <= 1) // only the tridiagonal elements are sent to slave computers
14. {
15. input a[i,j]; // a[i,j] is an element of the coefficient matrix
16. distribute a[i,j] to node computer Pcurrent % P; // P is the total number of slave computers
17. }
18. }
19. distribute resultarray to node Pcurrent % P;
20. Pcurrent++;
21. }
22. end
23. Slave p
24. procedure store()
25. Input a[i,j] // the coefficient elements that the master computer distributed to this node
26. new resultvector // result vector of Ax = b
27. Begin
28. new coefficientarray // m*3 array to store the elements distributed by the master computer

29. for (i = 1; i <= m; ++i)
30. {
31. coefficientarray[i,0] = a[i,i-1];
32. }
33. for (i = 0; i <= m; ++i)
34. {
35. coefficientarray[i,1] = a[i,i];
36. }
37. for (i = 0; i <= m-1; ++i)
38. {
39. coefficientarray[i,2] = a[i,i+1];
40. }
41. coefficientarray[0,0] = 0; // the first element of the array
42. coefficientarray[m-1,2] = 0; // the last element of the array
43. new resultarray: array // n*1 array to store the result
44. copy resultvector to resultarray;
45. End
46. Slave computer p
47. procedure calculate() // Thomas algorithm
48. Begin
49. beta[1] = coefficientarray[0,1] // beta: diagonal coefficients of L
50. gamma[1] = coefficientarray[0,2] / coefficientarray[0,1] // gamma: super-diagonal coefficients of U
51. beta[i] = coefficientarray[i,1] - coefficientarray[i,0] * gamma[i-1]
52. gamma[i] = coefficientarray[i,2] / beta[i]
53.-56. forward and backward substitution as in Eqs. (7) and (8): y[i] = (resultarray[i] - coefficientarray[i,0] * y[i-1]) / beta[i], x[i] = y[i] - gamma[i] * x[i+1]
57. send x to master
58. End
59. Master
60. print x

2.2.7 Time and space complexity analysis of improved algorithm

The original algorithm uses the Gaussian elimination method; its time complexity is $O(n)$ and its space complexity is $O(n^2)$. Here, based on the pseudo-code above, we analyze the time and space complexity of the improved algorithm.

Lines 10 to 21 of the pseudo-code (matrix partition) form a doubly nested for loop. When there are several loops, the time complexity of an algorithm is determined by the frequency $f(n)$ of the innermost statement in the deepest loop nest. In the pseudo-code, the most frequently executed statements are lines 15 and 16, with $n$ iterations of the inner loop and $m$ iterations of the outer loop. According to line 7, $m = n/q$ ($q$ is the number of computers), that is, $m$ and $n$ are linearly related, and therefore the time complexity of this part of the pseudo-code is $T(n) = O(mn) = O(n^2)$. Lines 29 to 40 (matrix assignment) contain three single-level for loops of scale $m$, $m+1$ and $m$, with corresponding time complexities $O(m)$, $O(m+1)$ and $O(m)$, that is, their complexities are all $O(n)$. In short, the time complexity is determined by the maximum number of statement executions in the whole code; therefore the final time complexity of the improved algorithm is $T(n) = O(n^2 + 3n) = O(n^2)$.

As to space complexity, the main space cost is the storage of matrix elements. According to the algorithm description, there are $q$ slave node computers and each of them needs to create an array of size $m \times 3$; therefore the total space needed is $q \times m \times 3 = 3n$, that is, the space complexity is $S(n) = O(3n) = O(n)$.

We can see that the time complexity of the improved algorithm does not decrease compared with the original algorithm (because matrix partition increases the complexity). However, the advantages of the improved algorithm lie mainly in the high performance of parallel computing and the great reduction of space cost, especially for large stiffness matrices, for example when the matrix data file is larger than 500 MB. The verification below shows that the improved algorithm can greatly enhance the computing speed for large stiffness matrix equations.

3. IMPROVED ALGORITHM VERIFICATION

Here, we use the improved algorithm to solve the tridiagonal linear equations $AX = b$ to verify the efficiency of the algorithm. We employ MPI for parallelization. MPI (a standard model of message passing interface with a variety of implementations, such as MPICH) is a tool for connecting multiple hosts through a network for parallel computing. It can also be used for multi-core or multi-CPU parallel computing on a single machine, but the efficiency is poor. It can coordinate several hosts for parallel computing, and therefore it has good scalability. However, communication among processes can also lead to large memory overhead, low parallel efficiency and complexity in programming. In addition, we also considered simulating parallel computing with OpenMP. OpenMP is designed for parallel computing on a single host with multiple CPUs or multiple cores.
In other words, OpenMP is more suitable for parallel computing on a single machine with shared memory; since the threads can share memory, it offers high efficiency and low memory overhead. Yet OpenMP is only available for parallel computing on a single host rather than on a cluster. In order to verify the efficiency of the improved algorithm and to use as many resources as possible during the verification, we chose MPI, that is, multiple hosts cooperating for parallel computing.

For the tridiagonal linear equations $AX = Y$, we assign different orders to the coefficient matrix by data scale: $1 \times 10^7$, $5 \times 10^7$, $1 \times 10^8$, $5 \times 10^8$ and $1 \times 10^9$. For each order, we run the verification on 1, 2, 4, 8 and 16 processors. When the number of processors is 1, the original serial algorithm is used. $T$ (in seconds) represents the computation time, $S$ the speedup ratio and $E$ the parallel efficiency. The results of the verification are shown in Table 1 to Table 5. In these tables, $S = T_1 / T_m$, where $T_1$ is the original serial computing time ($m = 1$) and $T_m$ is the parallel computing time on $m > 1$ processors. The parallel efficiency is $E = T_1 / (m \cdot T_m)$.
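The two metrics defined above can be computed as in the following short Python sketch. The function name `speedup_and_efficiency` is hypothetical, and the timings in the usage note are made-up illustrative numbers, not measurements from the paper.

```python
def speedup_and_efficiency(t1, tm, m):
    """Return (S, E) for serial time t1 and parallel time tm on m processors.

    S = T_1 / T_m        (speedup ratio)
    E = T_1 / (m * T_m)  (parallel efficiency, equal to S / m)
    """
    if t1 <= 0 or tm <= 0 or m < 1:
        raise ValueError("times must be positive and m must be at least 1")
    s = t1 / tm
    e = s / m
    return s, e
```

For instance, a hypothetical run that takes 100 s serially and 30 s on 4 processors has a speedup of about 3.33 and an efficiency of about 0.83. An efficiency of 1 would mean perfect linear scaling.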

Table 1 Verification result of a $1 \times 10^7$ order matrix
Table 2 Verification result of a $5 \times 10^7$ order matrix
Table 3 Verification result of a $1 \times 10^8$ order matrix
Table 4 Verification result of a $5 \times 10^8$ order matrix
Table 5 Verification result of a $1 \times 10^9$ order matrix

Columns of each table: $m$ (number of processors), $T_m$ (s), $S = T_1 / T_m$, $E = T_1 / (m \cdot T_m)$.

Based on the verification results shown in Table 1 to Table 5, we can see that, for a given order, the computing time decreases significantly as the number of processors increases; however, as the number of processors keeps increasing, the reduction in computing time slows down. The verification also shows that the computing efficiency of the improved algorithm improves greatly as the order of the stiffness matrix increases. Fig. 7 to Fig. 9 show the relationship among the order of the matrix, the number of processors $m$, the computing time $T$, the speedup ratio $S$ and the parallel efficiency $E$ as the order of the matrix and the number of processors change.

In Fig. 7, when the number of processors increases from one to two, the algorithm changes from serial to parallel computing, and the computing time drops sharply, especially for large-scale matrices. The larger the matrix, the greater the reduction in computing time. On the other hand, as the number of processors increases, the computing times for different matrix orders converge, because when the number of processors is large enough, such as $m = 16$, the processing performance is high enough to deal with matrices of different orders in a short time.

In Fig. 8, as the number of processors increases, the speedup $S$ for matrices of different scales tends to increase linearly. When the number of processors increases from 1 to 2, the speedup rises sharply; when $m$ is greater than 2, the speedup ratio increases gently. In addition, for the same number of processors, a larger matrix has a higher speedup than a smaller one. This is because, for the same communication cost, the advantage in processing performance is more obvious when the matrix is large.

Fig. 7 The trends of T(s) with the increasing of m

Fig. 8 The trends of S with the increasing of m

In Fig. 9, as the number of processors increases, the parallel efficiency $E$ for matrices of different scales tends to decrease linearly. This is because the more processors there are, the greater the communication cost, which leads to the decline of parallel efficiency. In addition, for the same number of processors, a larger matrix has a higher parallel efficiency than a smaller one. The reason is that the advantage of parallel performance offsets the communication cost when the scale is large enough. This also reflects the efficiency of the improved algorithm.

Fig. 9 The trends of E with the increasing of m

4. CONCLUSIONS

In this paper, we first analyze the original Thomas algorithm on a single processor and discuss the existing storage strategies for large tridiagonal matrices as well as the existing decomposition strategy and mapping technology. We then propose our improved Thomas algorithm based on this analysis. Finally, we present pseudo-code for the improved algorithm and verify its efficiency with MPI on data of various scales. The results of the verification show that the improved algorithm performs well for linear finite element parallel computing, in four respects:

(1) Saving storage space. The space complexity of the improved algorithm is $S(n) = O(3n) = O(n)$, while the space complexity of the original algorithm is $S(n) = O(n^2)$. That is to say, the space cost of the improved algorithm is only $3/n$ times that of the original algorithm (Thomas algorithm).

(2) Lower interaction overhead and computation complexity. The interaction overhead is small when the number of processors is within a specific range, and the computation complexity is far less than that of the Cholesky algorithm.

(3) Parallel efficiency decreases with too many processors. As the number of processors increases, the speedup ratio increases. However, when the number is over 16, the growth of the speedup ratio slows down and the parallel efficiency starts to decline because the communication overhead increases.

(4) The efficiency of parallel computation increases with the size of the matrix. The performance of the improved algorithm is very high, especially for large matrices, because the communication overhead among different computer nodes becomes negligible when the matrix is large enough.

In future, we will carry out more experiments using structural stiffness matrices from real scenarios to verify or further improve our algorithm, and we will also apply our research results in engineering, especially in concrete structure simulation.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the financial support provided by the Kwang-Hua Fund for College of Civil Engineering in Tongji University, the National Natural Science Foundation of China, the Hong Kong, Macao and Taiwan Science & Technology Cooperation Program of China (2012DFH70130), the Fundamental Research Funds of the Central Universities (2011QNA4016) and the Zhejiang Provincial Natural Science Foundation of China (LR13E080001). The authors also thank Xuefei Zhou and Xiaowei Zhou for their hard work in this research program.

REFERENCES

Ananth Grama, Anshul Gupta, George Karypis, et al. (2003), Introduction to Parallel Computing, Beijing, China, June.
Xavier, C., Iyengar, S. S. (2004), Introduction to Parallel Algorithms, Beijing, China, June.
Efendiev, Y., Hou, T. Y. (2009), Multiscale finite element methods, Applied and Computational Mathematics, 217, 50.
Ferziger, J. H., Perić, M. (1996), Computational methods for fluid dynamics (Vol. 3), Berlin: Springer.
Hendrickson, B., Kolda, T. G. (2000), Graph partitioning models for parallel computing, Parallel Computing, 26(12).
Kim, H.S., Wu, S.Z., Chang, L.W. (2011), A Scalable Tridiagonal Solver for GPUs, 2011 International Conference on Parallel Processing (ICPP), 2011(9).
Law, K. H. (1986), A parallel finite element solution method, Computers & Structures, 23(6).
Liu, W.J., Wang, R.Q. (2009), Parallel Computing Based Finite Element Analysis, 1st International Conference on Information Science and Engineering (ICISE 2009).
Lv, H., Di, R.H., Gong, H., Li, C.X. (2011), A MPI/OpenMP hybrid parallel nonlinear equation solver used in finite element analysis, Sixth China Grid Annual Conference (China Grid).
Maurer, D., Wieners, C. (2011), A parallel block LU decomposition method for distributed finite element matrices, Parallel Computing, 37(12).
Paz, C. N. M., Alves, J. L. D., Ebecken, N. F. F. (2005), Assessment of computational performance for a vector parallel implementation: 3D probabilistic model discrete cracking in concrete, Computers & Concrete, 2(5).
Pacheco, P. S. (1997), Parallel programming with MPI, Morgan Kaufmann Pub.
Sun, W.Y., Du, Q.K., Chen, J.R. (2007), Calculation Method, Beijing, China, May.
Takizawa, K., Tezduyar, T. E. (2012), Computational methods for parachute fluid structure interactions, Archives of Computational Methods in Engineering, 19(1).
Turmo, J., Ramos, G., Aparicio, A. C. (2012), Towards a model of dry shear keyed joints: modelling of panel tests, Computers & Concrete, 10(5).
Wang, X.B., Zhong, Z.H. (2004), Generalized Thomas algorithm for solving cyclic tridiagonal equations, Journal of Computer Mechanics, 104(2).


More information

Fuzzy Minimal Solution of Dual Fully Fuzzy Matrix Equations

Fuzzy Minimal Solution of Dual Fully Fuzzy Matrix Equations Iteratioal Coferece o Applied Mathematics, Simulatio ad Modellig (AMSM 2016) Fuzzy Miimal Solutio of Dual Fully Fuzzy Matrix Equatios Dequa Shag1 ad Xiaobi Guo2,* 1 Sciece Courses eachig Departmet, Gasu

More information

Algorithms for Disk Covering Problems with the Most Points

Algorithms for Disk Covering Problems with the Most Points Algorithms for Disk Coverig Problems with the Most Poits Bi Xiao Departmet of Computig Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog csbxiao@comp.polyu.edu.hk Qigfeg Zhuge, Yi He, Zili Shao, Edwi

More information

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve Advaces i Computer, Sigals ad Systems (2018) 2: 19-25 Clausius Scietific Press, Caada Aalysis of Server Resource Cosumptio of Meteorological Satellite Applicatio System Based o Cotour Curve Xiagag Zhao

More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

Adaptive Resource Allocation for Electric Environmental Pollution through the Control Network

Adaptive Resource Allocation for Electric Environmental Pollution through the Control Network Available olie at www.sciecedirect.com Eergy Procedia 6 (202) 60 64 202 Iteratioal Coferece o Future Eergy, Eviromet, ad Materials Adaptive Resource Allocatio for Electric Evirometal Pollutio through the

More information

CS 683: Advanced Design and Analysis of Algorithms

CS 683: Advanced Design and Analysis of Algorithms CS 683: Advaced Desig ad Aalysis of Algorithms Lecture 6, February 1, 2008 Lecturer: Joh Hopcroft Scribes: Shaomei Wu, Etha Feldma February 7, 2008 1 Threshold for k CNF Satisfiability I the previous lecture,

More information

Optimization for framework design of new product introduction management system Ma Ying, Wu Hongcui

Optimization for framework design of new product introduction management system Ma Ying, Wu Hongcui 2d Iteratioal Coferece o Electrical, Computer Egieerig ad Electroics (ICECEE 2015) Optimizatio for framework desig of ew product itroductio maagemet system Ma Yig, Wu Hogcui Tiaji Electroic Iformatio Vocatioal

More information

How do we evaluate algorithms?

How do we evaluate algorithms? F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

Improving Template Based Spike Detection

Improving Template Based Spike Detection Improvig Template Based Spike Detectio Kirk Smith, Member - IEEE Portlad State Uiversity petra@ee.pdx.edu Abstract Template matchig algorithms like SSE, Covolutio ad Maximum Likelihood are well kow for

More information

The identification of key quality characteristics based on FAHP

The identification of key quality characteristics based on FAHP Iteratioal Joural of Research i Egieerig ad Sciece (IJRES ISSN (Olie: 2320-9364, ISSN (Prit: 2320-9356 Volume 3 Issue 6 ǁ Jue 2015 ǁ PP.01-07 The idetificatio of ey quality characteristics based o FAHP

More information

Cubic Polynomial Curves with a Shape Parameter

Cubic Polynomial Curves with a Shape Parameter roceedigs of the th WSEAS Iteratioal Coferece o Robotics Cotrol ad Maufacturig Techology Hagzhou Chia April -8 00 (pp5-70) Cubic olyomial Curves with a Shape arameter MO GUOLIANG ZHAO YANAN Iformatio ad

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

The isoperimetric problem on the hypercube

The isoperimetric problem on the hypercube The isoperimetric problem o the hypercube Prepared by: Steve Butler November 2, 2005 1 The isoperimetric problem We will cosider the -dimesioal hypercube Q Recall that the hypercube Q is a graph whose

More information

A Note on Least-norm Solution of Global WireWarping

A Note on Least-norm Solution of Global WireWarping A Note o Least-orm Solutio of Global WireWarpig Charlie C. L. Wag Departmet of Mechaical ad Automatio Egieerig The Chiese Uiversity of Hog Kog Shati, N.T., Hog Kog E-mail: cwag@mae.cuhk.edu.hk Abstract

More information

ISSN (Print) Research Article. *Corresponding author Nengfa Hu

ISSN (Print) Research Article. *Corresponding author Nengfa Hu Scholars Joural of Egieerig ad Techology (SJET) Sch. J. Eg. Tech., 2016; 4(5):249-253 Scholars Academic ad Scietific Publisher (A Iteratioal Publisher for Academic ad Scietific Resources) www.saspublisher.com

More information

BOOLEAN DIFFERENTIATION EQUATIONS APPLICABLE IN RECONFIGURABLE COMPUTATIONAL MEDIUM

BOOLEAN DIFFERENTIATION EQUATIONS APPLICABLE IN RECONFIGURABLE COMPUTATIONAL MEDIUM MATEC Web of Cofereces 79, 01014 (016) DOI: 10.1051/ mateccof/0167901014 T 016 BOOLEAN DIFFERENTIATION EQUATIONS APPLICABLE IN RECONFIGURABLE COMPUTATIONAL MEDIUM Staislav Shidlovskiy 1, 1 Natioal Research

More information

Second-Order Domain Decomposition Method for Three-Dimensional Hyperbolic Problems

Second-Order Domain Decomposition Method for Three-Dimensional Hyperbolic Problems Iteratioal Mathematical Forum, Vol. 8, 013, o. 7, 311-317 Secod-Order Domai Decompositio Method for Three-Dimesioal Hyperbolic Problems Youbae Ju Departmet of Applied Mathematics Kumoh Natioal Istitute

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

THIN LAYER ORIENTED MAGNETOSTATIC CALCULATION MODULE FOR ELMER FEM, BASED ON THE METHOD OF THE MOMENTS. Roman Szewczyk

THIN LAYER ORIENTED MAGNETOSTATIC CALCULATION MODULE FOR ELMER FEM, BASED ON THE METHOD OF THE MOMENTS. Roman Szewczyk THIN LAYER ORIENTED MAGNETOSTATIC CALCULATION MODULE FOR ELMER FEM, BASED ON THE METHOD OF THE MOMENTS Roma Szewczyk Istitute of Metrology ad Biomedical Egieerig, Warsaw Uiversity of Techology E-mail:

More information

AN OPTIMIZATION NETWORK FOR MATRIX INVERSION

AN OPTIMIZATION NETWORK FOR MATRIX INVERSION 397 AN OPTIMIZATION NETWORK FOR MATRIX INVERSION Ju-Seog Jag, S~ Youg Lee, ad Sag-Yug Shi Korea Advaced Istitute of Sciece ad Techology, P.O. Box 150, Cheogryag, Seoul, Korea ABSTRACT Iverse matrix calculatio

More information

BOOLEAN MATHEMATICS: GENERAL THEORY

BOOLEAN MATHEMATICS: GENERAL THEORY CHAPTER 3 BOOLEAN MATHEMATICS: GENERAL THEORY 3.1 ISOMORPHIC PROPERTIES The ame Boolea Arithmetic was chose because it was discovered that literal Boolea Algebra could have a isomorphic umerical aspect.

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations Applied Mathematical Scieces, Vol. 1, 2007, o. 25, 1203-1215 A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045, Oe

More information

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation Improvemet of the Orthogoal Code Covolutio Capabilities Usig FPGA Implemetatio Naima Kaabouch, Member, IEEE, Apara Dhirde, Member, IEEE, Saleh Faruque, Member, IEEE Departmet of Electrical Egieerig, Uiversity

More information

Lecture 1: Introduction and Strassen s Algorithm

Lecture 1: Introduction and Strassen s Algorithm 5-750: Graduate Algorithms Jauary 7, 08 Lecture : Itroductio ad Strasse s Algorithm Lecturer: Gary Miller Scribe: Robert Parker Itroductio Machie models I this class, we will primarily use the Radom Access

More information

MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fitting)

MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fitting) MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fittig) I this chapter, we will eamie some methods of aalysis ad data processig; data obtaied as a result of a give

More information

The measurement of overhead conductor s sag with DLT method

The measurement of overhead conductor s sag with DLT method Advaces i Egieerig Research (AER), volume 7 2d Aual Iteratioal Coferece o Electroics, Electrical Egieerig ad Iformatio Sciece (EEEIS 206) he measuremet of overhead coductor s sag with DL method Fag Ye,

More information

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8)

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8) CIS 11 Data Structures ad Algorithms with Java Fall 017 Big-Oh Notatio Tuesday, September 5 (Make-up Friday, September 8) Learig Goals Review Big-Oh ad lear big/small omega/theta otatios Practice solvig

More information

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Pseudocode ( 1.1) High-level descriptio of a algorithm More structured

More information

Polynomial Functions and Models. Learning Objectives. Polynomials. P (x) = a n x n + a n 1 x n a 1 x + a 0, a n 0

Polynomial Functions and Models. Learning Objectives. Polynomials. P (x) = a n x n + a n 1 x n a 1 x + a 0, a n 0 Polyomial Fuctios ad Models 1 Learig Objectives 1. Idetify polyomial fuctios ad their degree 2. Graph polyomial fuctios usig trasformatios 3. Idetify the real zeros of a polyomial fuctio ad their multiplicity

More information

Redundancy Allocation for Series Parallel Systems with Multiple Constraints and Sensitivity Analysis

Redundancy Allocation for Series Parallel Systems with Multiple Constraints and Sensitivity Analysis IOSR Joural of Egieerig Redudacy Allocatio for Series Parallel Systems with Multiple Costraits ad Sesitivity Aalysis S. V. Suresh Babu, D.Maheswar 2, G. Ragaath 3 Y.Viaya Kumar d G.Sakaraiah e (Mechaical

More information

Solving Fuzzy Assignment Problem Using Fourier Elimination Method

Solving Fuzzy Assignment Problem Using Fourier Elimination Method Global Joural of Pure ad Applied Mathematics. ISSN 0973-768 Volume 3, Number 2 (207), pp. 453-462 Research Idia Publicatios http://www.ripublicatio.com Solvig Fuzzy Assigmet Problem Usig Fourier Elimiatio

More information

Dimensionality Reduction PCA

Dimensionality Reduction PCA Dimesioality Reductio PCA Machie Learig CSE446 David Wadde (slides provided by Carlos Guestri) Uiversity of Washigto Feb 22, 2017 Carlos Guestri 2005-2017 1 Dimesioality reductio Iput data may have thousads

More information

Improving Information Retrieval System Security via an Optimal Maximal Coding Scheme

Improving Information Retrieval System Security via an Optimal Maximal Coding Scheme Improvig Iformatio Retrieval System Security via a Optimal Maximal Codig Scheme Dogyag Log Departmet of Computer Sciece, City Uiversity of Hog Kog, 8 Tat Chee Aveue Kowloo, Hog Kog SAR, PRC dylog@cs.cityu.edu.hk

More information

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time ( 3.1) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step- by- step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Ruig Time Most algorithms trasform iput objects ito output objects. The

More information

Accuracy Improvement in Camera Calibration

Accuracy Improvement in Camera Calibration Accuracy Improvemet i Camera Calibratio FaJie L Qi Zag ad Reihard Klette CITR, Computer Sciece Departmet The Uiversity of Aucklad Tamaki Campus, Aucklad, New Zealad fli006, qza001@ec.aucklad.ac.z r.klette@aucklad.ac.z

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Ruig Time of a algorithm Ruig Time Upper Bouds Lower Bouds Examples Mathematical facts Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists

More information

Dynamic Programming and Curve Fitting Based Road Boundary Detection

Dynamic Programming and Curve Fitting Based Road Boundary Detection Dyamic Programmig ad Curve Fittig Based Road Boudary Detectio SHYAM PRASAD ADHIKARI, HYONGSUK KIM, Divisio of Electroics ad Iformatio Egieerig Chobuk Natioal Uiversity 664-4 Ga Deokji-Dog Jeoju-City Jeobuk

More information

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig

More information

Study on effective detection method for specific data of large database LI Jin-feng

Study on effective detection method for specific data of large database LI Jin-feng Iteratioal Coferece o Automatio, Mechaical Cotrol ad Computatioal Egieerig (AMCCE 205) Study o effective detectio method for specific data of large database LI Ji-feg (Vocatioal College of DogYig, Shadog

More information

Data Structures and Algorithms. Analysis of Algorithms

Data Structures and Algorithms. Analysis of Algorithms Data Structures ad Algorithms Aalysis of Algorithms Outlie Ruig time Pseudo-code Big-oh otatio Big-theta otatio Big-omega otatio Asymptotic algorithm aalysis Aalysis of Algorithms Iput Algorithm Output

More information

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem A Improved Shuffled Frog-Leapig Algorithm for Kapsack Problem Zhoufag Li, Ya Zhou, ad Peg Cheg School of Iformatio Sciece ad Egieerig Hea Uiversity of Techology ZhegZhou, Chia lzhf1978@126.com Abstract.

More information

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful

More information

Analysis of Class Design Coupling Based on Information Entropy Di Jiang 1,2, a, Hua Zhou 1,2,b and Xingping Sun 1,2,c

Analysis of Class Design Coupling Based on Information Entropy Di Jiang 1,2, a, Hua Zhou 1,2,b and Xingping Sun 1,2,c Advaced Materials Research Olie: 2013-01-25 IN: 1662-8985, Vol. 659, pp 196-201 doi:10.4028/www.scietific.et/amr.659.196 2013 Tras Tech Publicatios, witzerlad Aalysis of Class Desig Couplig Based o Iformatio

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

Alpha Individual Solutions MAΘ National Convention 2013

Alpha Individual Solutions MAΘ National Convention 2013 Alpha Idividual Solutios MAΘ Natioal Covetio 0 Aswers:. D. A. C 4. D 5. C 6. B 7. A 8. C 9. D 0. B. B. A. D 4. C 5. A 6. C 7. B 8. A 9. A 0. C. E. B. D 4. C 5. A 6. D 7. B 8. C 9. D 0. B TB. 570 TB. 5

More information

Optimal Mapped Mesh on the Circle

Optimal Mapped Mesh on the Circle Koferece ANSYS 009 Optimal Mapped Mesh o the Circle doc. Ig. Jaroslav Štigler, Ph.D. Bro Uiversity of Techology, aculty of Mechaical gieerig, ergy Istitut, Abstract: This paper brigs out some ideas ad

More information

Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c

Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c Advaces i Egieerig Research (AER), volume 131 3rd Aual Iteratioal Coferece o Electroics, Electrical Egieerig ad Iformatio Sciece (EEEIS 2017) Pruig ad Summarizig the Discovered Time Series Associatio Rules

More information

Analysis of Algorithms

Analysis of Algorithms Presetatio for use with the textbook, Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Aalysis of Algorithms Iput 2015 Goodrich ad Tamassia Algorithm Aalysis of Algorithms

More information

RESEARCH ON AUTOMATIC INSPECTION TECHNIQUE OF REAL-TIME RADIOGRAPHY FOR TURBINE-BLADE

RESEARCH ON AUTOMATIC INSPECTION TECHNIQUE OF REAL-TIME RADIOGRAPHY FOR TURBINE-BLADE RESEARCH ON AUTOMATIC INSPECTION TECHNIQUE OF REAL-TIME RADIOGRAPHY FOR TURBINE-BLADE Z.G. Zhou, S. Zhao, ad Z.G. A School of Mechaical Egieerig ad Automatio, Beijig Uiversity of Aeroautics ad Astroautics,

More information

x x 2 x Iput layer = quatity of classificatio mode X T = traspositio matrix The core of such coditioal probability estimatig method is calculatig the

x x 2 x Iput layer = quatity of classificatio mode X T = traspositio matrix The core of such coditioal probability estimatig method is calculatig the COMPARATIVE RESEARCHES ON PROBABILISTIC NEURAL NETWORKS AND MULTI-LAYER PERCEPTRON NETWORKS FOR REMOTE SENSING IMAGE SEGMENTATION Liu Gag a, b, * a School of Electroic Iformatio, Wuha Uiversity, 430079,

More information

PARALLEL AND DISTRIBUTED MULTI-ALGORITHM CIRCUIT SIMULATION. A Thesis RUICHENG DAI

PARALLEL AND DISTRIBUTED MULTI-ALGORITHM CIRCUIT SIMULATION. A Thesis RUICHENG DAI PARALLEL AND DISTRIBUTED MULTI-ALGORITHM CIRCUIT SIMULATION A Thesis by RUICHENG DAI Submitted to the Office of Graduate Studies of Texas A&M Uiversity i partial fulfillmet of the requiremets for the degree

More information

Introduction to Computing Systems: From Bits and Gates to C and Beyond 2 nd Edition

Introduction to Computing Systems: From Bits and Gates to C and Beyond 2 nd Edition Lecture Goals Itroductio to Computig Systems: From Bits ad Gates to C ad Beyod 2 d Editio Yale N. Patt Sajay J. Patel Origial slides from Gregory Byrd, North Carolia State Uiversity Modified slides by

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 6 Defiig Fuctios Pytho Programmig, 2/e 1 Objectives To uderstad why programmers divide programs up ito sets of cooperatig fuctios. To be able to

More information

Image Segmentation EEE 508

Image Segmentation EEE 508 Image Segmetatio Objective: to determie (etract) object boudaries. It is a process of partitioig a image ito distict regios by groupig together eighborig piels based o some predefied similarity criterio.

More information

A New Bit Wise Technique for 3-Partitioning Algorithm

A New Bit Wise Technique for 3-Partitioning Algorithm Special Issue of Iteratioal Joural of Computer Applicatios (0975 8887) o Optimizatio ad O-chip Commuicatio, No.1. Feb.2012, ww.ijcaolie.org A New Bit Wise Techique for 3-Partitioig Algorithm Rajumar Jai

More information

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers *

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers * Load balaced Parallel Prime umber Geerator with Sieve of Eratosthees o luster omputers * Soowook Hwag*, Kyusik hug**, ad Dogseug Kim* *Departmet of Electrical Egieerig Korea Uiversity Seoul, -, Rep. of

More information

An Algorithm to Solve Multi-Objective Assignment. Problem Using Interactive Fuzzy. Goal Programming Approach

An Algorithm to Solve Multi-Objective Assignment. Problem Using Interactive Fuzzy. Goal Programming Approach It. J. Cotemp. Math. Scieces, Vol. 6, 0, o. 34, 65-66 A Algorm to Solve Multi-Objective Assigmet Problem Usig Iteractive Fuzzy Goal Programmig Approach P. K. De ad Bharti Yadav Departmet of Mathematics

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:

More information

An Estimation of Distribution Algorithm for solving the Knapsack problem

An Estimation of Distribution Algorithm for solving the Knapsack problem Vol.4,No.5, 214 Published olie: May 25, 214 DOI: 1.7321/jscse.v4.5.1 A Estimatio of Distributio Algorithm for solvig the Kapsack problem 1 Ricardo Pérez, 2 S. Jös, 3 Arturo Herádez, 4 Carlos A. Ochoa *1,

More information

Stone Images Retrieval Based on Color Histogram

Stone Images Retrieval Based on Color Histogram Stoe Images Retrieval Based o Color Histogram Qiag Zhao, Jie Yag, Jigyi Yag, Hogxig Liu School of Iformatio Egieerig, Wuha Uiversity of Techology Wuha, Chia Abstract Stoe images color features are chose

More information

Reversible Realization of Quaternary Decoder, Multiplexer, and Demultiplexer Circuits

Reversible Realization of Quaternary Decoder, Multiplexer, and Demultiplexer Circuits Egieerig Letters, :, EL Reversible Realizatio of Quaterary Decoder, Multiplexer, ad Demultiplexer Circuits Mozammel H.. Kha, Member, ENG bstract quaterary reversible circuit is more compact tha the correspodig

More information

GE FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III

GE FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III GE2112 - FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III PROBLEM SOLVING AND OFFICE APPLICATION SOFTWARE Plaig the Computer Program Purpose Algorithm Flow Charts Pseudocode -Applicatio Software Packages-

More information

GPUMP: a Multiple-Precision Integer Library for GPUs

GPUMP: a Multiple-Precision Integer Library for GPUs GPUMP: a Multiple-Precisio Iteger Library for GPUs Kaiyog Zhao ad Xiaowe Chu Departmet of Computer Sciece, Hog Kog Baptist Uiversity Hog Kog, P. R. Chia Email: {kyzhao, chxw}@comp.hkbu.edu.hk Abstract

More information

condition w i B i S maximum u i

condition w i B i S maximum u i ecture 10 Dyamic Programmig 10.1 Kapsack Problem November 1, 2004 ecturer: Kamal Jai Notes: Tobias Holgers We are give a set of items U = {a 1, a 2,..., a }. Each item has a weight w i Z + ad a utility

More information

Journal of Chemical and Pharmaceutical Research, 2013, 5(12): Research Article

Journal of Chemical and Pharmaceutical Research, 2013, 5(12): Research Article Available olie www.jocpr.com Joural of Chemical ad Pharmaceutical Research, 2013, 5(12):745-749 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 K-meas algorithm i the optimal iitial cetroids based

More information

arxiv: v2 [cs.ds] 24 Mar 2018

arxiv: v2 [cs.ds] 24 Mar 2018 Similar Elemets ad Metric Labelig o Complete Graphs arxiv:1803.08037v [cs.ds] 4 Mar 018 Pedro F. Felzeszwalb Brow Uiversity Providece, RI, USA pff@brow.edu March 8, 018 We cosider a problem that ivolves

More information

A Reinforced Hungarian Algorithm for Task Allocation in Global Software Development

A Reinforced Hungarian Algorithm for Task Allocation in Global Software Development A Reiforced Hugaria Algorithm for Task Allocatio i Global Software Developmet Xiao Yu State Key Lab. of Software Egieerig, Computer School, Wuha Uiversity, Wuha, Chia xiaoyu_whu@yahoo.com Ma Wu School

More information

HADOOP: A NEW APPROACH FOR DOCUMENT CLUSTERING

HADOOP: A NEW APPROACH FOR DOCUMENT CLUSTERING Y.K. Patil* Iteratioal Joural of Advaced Research i ISSN: 2278-6244 IT ad Egieerig Impact Factor: 4.54 HADOOP: A NEW APPROACH FOR DOCUMENT CLUSTERING Prof. V.S. Nadedkar** Abstract: Documet clusterig is

More information

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems

More information

IMP: Superposer Integrated Morphometrics Package Superposition Tool

IMP: Superposer Integrated Morphometrics Package Superposition Tool IMP: Superposer Itegrated Morphometrics Package Superpositio Tool Programmig by: David Lieber ( 03) Caisius College 200 Mai St. Buffalo, NY 4208 Cocept by: H. David Sheets, Dept. of Physics, Caisius College

More information

A Novel Feature Extraction Algorithm for Haar Local Binary Pattern Texture Based on Human Vision System
