Using SIMD Registers and Instructions to Enable Instruction-Level Parallelism in Sorting Algorithms


Timothy Furtak (furtak@cs.ualberta.ca), José Nelson Amaral (amaral@cs.ualberta.ca), Robert Niewiadomski (niewiado@cs.ualberta.ca)
Department of Computing Science, University of Alberta, Edmonton, AB, Canada

ABSTRACT
Most contemporary processors offer some version of Single Instruction Multiple Data (SIMD) machinery: vector registers and instructions to manipulate data stored in such registers. The central idea of this paper is to use these SIMD resources to improve the performance of the tail of recursive sorting algorithms. When the number of elements to be sorted reaches a set threshold, data is loaded into the vector registers, manipulated in-register, and the result stored back to memory. Three implementations of sorting with two different SIMD machineries, x86-64's SSE2 and G5's AltiVec, demonstrate that this idea delivers significant speed improvements. The improvements provided are orthogonal to the gains obtained through empirical search for a suitable sorting algorithm [11]. When integrated with the Dynamically Tuned Sorting Library (DTSL) this new code generation strategy reduces the time spent by DTSL up to 22% for moderately-sized arrays, with greater relative reductions for small arrays. Wall-clock performance of d-heaps is improved by up to 39% using a similar technique.

Categories and Subject Descriptors
C.1.2 [Processor Architectures]: Multiple Data Stream Architectures (Multiprocessors): Single-instruction-stream, multiple-data-stream processors (SIMD)

General Terms
Algorithms, Performance

Keywords
Quicksort, Sorting, Sorting Networks, SIMD, Instruction-Level Parallelism, Vectorization.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SPAA'07, June 9-11, 2007, San Diego, California, USA. Copyright 2007 ACM /07/ $5.00.

1. INTRODUCTION
This paper addresses the automatic generation of efficient code to sort short sequences of values. The idea is that an ahead-of-time optimizer searches for fast code for several sequence lengths and machine configurations. Then the compiler can simply instantiate such code when generating an optimized library.

While algorithm-specific optimizations and empirical search have long been used both for scientific computation and for large parallel machines [4, 5, 19, 21], only recently were these techniques applied to integer-intensive, symbolic computation. Li et al. developed the Dynamically Tuned Sorting Library that adapts to the characteristics of the input to be sorted [11]. The main contribution of this paper is the insight that the resources implemented in contemporary processors to enable SIMD computations can be put to good use to improve the performance of sorting short sequences. As demonstrated in this work the effective use of these SIMD resources improves performance through the reduction of memory references and an increase in instruction-level parallelism.

The initial inspiration for this work was the need for fast sorting of short sequences in the implementation of graphics rendering in interactive video-game applications. In such applications it is often necessary to decide, for each pixel of the image, what is the order of the elements that should be displayed [2]. Even though Z-buffer pixel-ordering computations are typically handled by a specialized Graphics Processing Unit (GPU), there are plenty of similar ordering computations that are done by the Central Processing Unit (CPU) in computer games. For instance, sorting is used to characterize the intensity of the various light sources that illuminate a character. Moreover, contemporary video-game applications have at their disposal a rich supply of SIMD registers and instructions. For example, the PowerPC-based Xbox 360 hardware features 128 AltiVec registers on each of its three cores along with an expanded set of AltiVec instructions. In addition to interactive video-game applications, sorting of short sequences is also present in particle-physics simulation applications. Thus, using SIMD registers and instructions to sort small sequences is natural.
Once a solution was created, applying it to the sequences that must be sorted at the tail-end of standard recursive sorting algorithms was the next logical step. The experimental evaluation of the new vector-register-based sorting algorithms presented in this paper uses commodity processors (x86-64 and G5) and extensions to the DTSL library because these machines and algorithms are more readily available and exploitable than proprietary video-game hardware and software. The algorithms presented are effective for sorting short sequences of floating-point or integer values (keys), and pairs comprised of a key and a memory address, i.e. key-pointer pairs, as well as computing the index of a minimum (maximum) element.

Three new SIMD-based algorithms use the concept of sorting networks that are effective to sort small sets of numbers. Section 2 describes: (1) the operation of standard sorting networks; (2) how the SIMD vectors can be used to implement sorting networks; and (3) how a code generator can instantiate optimized vector code for sorting networks operating in sequences of any length. The main contributions in this paper are:

- three algorithms that use the SIMD machinery of contemporary processors for efficient in-register sorting of short sequences and their integration into an optimized general-purpose sorting library;
- a method to use iterative-deepening search to find fast instruction sequences to move data within the SIMD registers;
- a method to compute the minimum element in an array, with applications to d-heaps; and
- an extensive experimental study in three different processors that demonstrates up to 22% improvement in the performance of DTSL for moderately-sized arrays, and up to 39% in d-heaps. This study also indicates that the elimination of loads, stores, branches, and branch mispredictions correlates well with the improved performance.

Section 3 describes two algorithms that combine first-pass sorting in the SIMD registers with second-pass sorting in memory. Section 4 describes an algorithm that sorts shorter sequences completely within the SIMD registers, thus eliminating branch instructions altogether. Section 5 describes how to extend these key-sorting algorithms to sort key-pointer pairs, and Section 6 uses similar techniques to speed up heapify-down operations in d-heaps. The experimental evaluation is presented in Section 7.

2. SORTING NETWORKS
The inputs to an in-place comparator, COMP(a, b), are two storage units (memory locations, registers, or vector-register elements) a and b, each containing a numerical input. After the comparator executes, the lower numerical value is stored in a and the higher numerical value is stored in b. Knuth describes a comparator network as a device that applies a fixed sequence of comparator operators to an input vector of a given size [8].
When a comparator network produces a sorted output for any possible input sequence, it is called a sorting network. The size of a sorting network is the total number of comparators in the network. The depth of a sorting network is the length of the critical path in its dependence graph. Therefore the depth provides a bound for the parallel execution of the sorting network, while the size provides a bound for sequential execution.

An example of a sorting network with size 5 and depth 3 is shown in Fig. 1. The network is depicted as a set of value-carrying vertical rails and comparators. Values flow from top to bottom. A heavy dot at a line crossing indicates that the value at the vertical rail is an input to the comparator represented by the horizontal line. A comparator moves the larger value to the left, and the smaller value to the right.

[Figure 1: A 4-element sorting network on rails a, b, c, d with comparators COMP(a, c), COMP(b, d), COMP(a, b), COMP(c, d), COMP(b, c).]

For instance, if the inputs are a = 7, b = 2, c = 5, d = 9, then the sorted output at the bottom of the sorting network is a = 9, b = 7, c = 5, and d = 2. The value 9 moves from rail d to rail b at COMP(b, d), and then moves from rail b to rail a at COMP(a, b).

Although several algorithms are available to generate code for sorting networks, Batcher's odd-even mergesort algorithm is often chosen for its efficiency [1]. Batcher's algorithm uses $O(n \log^2 n)$ comparators and has a depth of $O(\log^2 n)$. Sorting networks can be efficiently implemented in processors that provide min and max instructions. Sorting networks implemented with these instructions avoid the performance penalties of branch mispredictions incurred by traditional branch-based sorting implementations. The experimental results in Section 7 indicate that eliminating branches in the code of sorting networks is a significant win in contemporary processors.

2.1 Supporting Hardware
Consider a machine that has the following min and max instructions:

\[
\min(a, b) = \begin{cases} a & : a \le b \\ b & : \text{otherwise} \end{cases}
\qquad
\max(a, b) = \begin{cases} a & : a \ge b \\ b & : \text{otherwise} \end{cases}
\]

The comparator required by a sorting network is easily constructed using these two operations, a copy instruction, and a temporary variable.
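The comparator construction just described (one copy, one min, one max) can be sketched in scalar C. This is a minimal illustration rather than the paper's generated code: the helper names `min_f`, `max_f`, `comp`, and `sort4` are hypothetical, and the comparator order in `sort4` is an assumption (the usual five-comparator, depth-3 network for four inputs). It follows the textual definition of COMP, so the lower value lands in the first operand and the result is ascending.

```c
#include <assert.h>

/* Scalar min and max written as conditional expressions. Compilers commonly
   lower these to min/max or conditional-move instructions, but that is not
   guaranteed by the C language. */
static float min_f(float a, float b) { return a <= b ? a : b; }
static float max_f(float a, float b) { return a >= b ? a : b; }

/* COMP(x, y): afterwards *x holds the lower value and *y the higher one.
   Built from a copy into a temporary, one min, and one max. */
static void comp(float *x, float *y) {
    float t = *x;           /* temporary copy of the first input */
    *x = min_f(*x, *y);
    *y = max_f(*y, t);
}

/* Assumed 4-input network: size 5, depth 3. */
void sort4(float v[4]) {
    comp(&v[0], &v[2]);  comp(&v[1], &v[3]);   /* layer 1 */
    comp(&v[0], &v[1]);  comp(&v[2], &v[3]);   /* layer 2 */
    comp(&v[1], &v[2]);                        /* layer 3 */
}
```

Calling `sort4` on the inputs 7, 2, 5, 9 leaves the array as 2, 5, 7, 9. The sequence of operations is the same for every input, which is the property the vectorized versions rely on.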
For instance, such instructions are available in the x86-64 architectures supporting the SSE2 min and max operations that return the minimum (maximum) packed single-precision floating-point values [6] (footnote 1). The extension of sorting networks to operate on vector instructions requires the definition of vectorized min and max instructions (footnote 2). For input vectors A and B, $|A| = |B| = n$, let C = min(A, B) be the element-wise minimum vector, such that $C_i = \min(A_i, B_i)$, $1 \le i \le n$. The vectorized max instruction is defined similarly.

The width of a (vectorized) sorting network refers to the number of vectors being sorted. Given an ordered list of vectors $X^1, X^2, \ldots, X^n$, a stream of data is formed by selecting the i-th element from each vector in order, thus the i-th stream is $X^1_i, X^2_i, \ldots, X^n_i$. For instance, the x86-64 architecture has 16 XMM vector registers, and each register can hold 4 floating-point values. Therefore, sorting the values in n XMM registers using a sorting network produces 4 sorted streams of data of length n. Up to 15 XMM registers can be used, i.e. $1 \le n < 16$, because one register must be reserved as temporary storage for the swap of values in the comparator.

This compare-and-swap machinery offers several advantages to sort a small set of values that fits within the SIMD registers: (1) its operation is unconditional and data independent; (2) it is inherently branch-free, and thus free of branch-prediction performance penalties; (3) it increases the bandwidth of sorting by enabling the SIMD instruction-level parallelism; and (4) each compare-and-swap requires the execution of only 3 instructions.

Footnote 1: SSE stands for Streaming SIMD Extensions. SSE2 improves upon the original SSE.
Footnote 2: These vector instructions are called SIMD extensions.

A code generator must be able to generate code to sort sequences of any length in a machine with n+1 SIMD registers. The solution is to define size-optimal sorting networks that use 1, 2, ..., n registers. The optimal code for the implementation of each of these sorting networks is pre-generated and stored in a small codebase available to the code generator for deployment. Once data has been loaded into the SIMD registers the code generator instantiates the code to perform the comparator operations specified by the sorting network, and integrates the resulting streams.

3. STREAM-BASED TWO-PASS SORTING
The first two SIMD-based sorting algorithms discussed in this paper operate in two phases. In the first phase the SIMD registers and instructions are used to generate a partially-sorted output. In the second phase a standard sorting algorithm (insertion sort and mergesort are investigated in this paper) finishes the sorting. The choice of algorithm for the second phase dictates the best data organization for the first one.

For the first phase, consider the use of the SIMD sorting machinery described in Section 2 for the task of sorting a sequence of $k \times n$ values using n SIMD registers, each register capable of storing k values. Each group of k values is loaded from memory into a separate SIMD register. For a moment, assume that the start of the sequence is aligned for such a load operation. The sorting machinery is then applied to produce k sorted streams of length n, and the sorted streams are written back in-place to memory in an interleaved form. The organization of the data in memory for k = 4 is shown in Fig. 2. After sorting, $A_1 \le A_2 \le \ldots \le A_n$, $B_1 \le B_2 \le \ldots \le B_n$, etc. After this initial sorting the ordering relationship between elements from separate streams, $A_i$, $B_i$, $C_i$, and $D_i$, is still unknown. Now the output from the vectorized sorting network must undergo an additional sorting pass.
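The first phase can be sketched with SSE intrinsics for the case k = 4, n = 4. This is an illustrative sketch under several assumptions: it targets x86-64 with `<emmintrin.h>` available, it uses unaligned loads so that no alignment is required, the function names are hypothetical, the comparator order is the usual five-comparator 4-input network, and a plain insertion sort stands in for whichever second pass is chosen.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE/SSE2 intrinsics; assumes an x86-64 target */

/* Vector comparator: element-wise minimum into *lo, maximum into *hi.
   This corresponds to a register copy followed by minps and maxps. */
static void vcomp(__m128 *lo, __m128 *hi) {
    __m128 t = *lo;
    *lo = _mm_min_ps(*lo, *hi);
    *hi = _mm_max_ps(*hi, t);
}

/* First pass: apply a 4-input network to 4 whole registers and store the
   result in place. Afterwards each of the 4 interleaved streams
   a[i], a[i+4], a[i+8], a[i+12] is sorted, for i = 0..3. */
void sort_streams_4x4(float *a) {
    __m128 x0 = _mm_loadu_ps(a),     x1 = _mm_loadu_ps(a + 4);
    __m128 x2 = _mm_loadu_ps(a + 8), x3 = _mm_loadu_ps(a + 12);
    vcomp(&x0, &x2);  vcomp(&x1, &x3);
    vcomp(&x0, &x1);  vcomp(&x2, &x3);
    vcomp(&x1, &x2);
    _mm_storeu_ps(a, x0);      _mm_storeu_ps(a + 4, x1);
    _mm_storeu_ps(a + 8, x2);  _mm_storeu_ps(a + 12, x3);
}

/* Second-pass stand-in: ordinary insertion sort over the whole array. */
void insertion_sort(float *a, int n) {
    for (int i = 1; i < n; i++) {
        float key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
        a[j + 1] = key;
    }
}
```

After `sort_streams_4x4` the array is only partially ordered: elements four positions apart are in order, but nothing is known about neighbours from different streams, which is why the second pass is still needed.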
Let us examine the use of insertion sort and mergesort to finish sorting this partially sorted output. If the start of the sequence is not aligned, the technique used in this paper will be to sort the aligned vector blocks that overlap the target region. The extra fringe elements will be saved to a temporary array, replaced by positive/negative infinity as appropriate, and restored upon completion.

[Figure 2 layout: A_1 B_1 C_1 D_1 | A_2 B_2 C_2 D_2 | ... | A_n B_n C_n D_n]
Figure 2: Interleaved sorted streams from n 4-element SIMD registers. The first register contains elements $A_1$, $B_1$, $C_1$, and $D_1$. $A_1 \le A_2 \le \ldots \le A_n$, etc.

3.1 Second Pass with Insertion Sort
A standard insertion-sort algorithm may be used to sort the output of the SIMD-based sorting network. Insertion sort delivers the best performance when its input is mostly sorted because the algorithm does not have to move elements very far. Thus a potential issue with using insertion sort as a second pass is how the data should be loaded into the SIMD vectors in the first phase to produce the most favorable input for insertion sort.

Consider an input sequence of S values, and a machine with n + 1 SIMD vectors. Each vector can store up to k values. Let m = S/k. If $m \le n$ the entire array can be loaded into the SIMD registers, sorted, and written back in-place. Then a call to insertion sort will finish sorting the entire sequence. If m > n, an in-place algorithm divides the array into subsets small enough to fit in the vector registers, sorts them with a sorting network, and writes each sorted subset back to the same locations.

A naive approach would simply divide the array into m/n almost equal-sized blocks. However, if the data is uniformly distributed this partition results in m/n similar blocks, one after the other. The problem is that small elements from the last block would have similar values to the small elements from the first block, and would require insertion sort to move many elements to far positions to combine these blocks. A better approach is to load the blocks into the SIMD registers in a strided fashion. Consider for example n = 4 and m = 12 which requires three sorting network calls.
Instead of the first call acting on elements $A_1$, $A_2$, $A_3$, and $A_4$, it acts on $A_1$, $A_4$, $A_7$, and $A_{10}$. The second call acts on elements $A_2$, $A_5$, $A_8$, and $A_{11}$, and the third on $A_3$, $A_6$, $A_9$, and $A_{12}$. In this way the small values in the array are likely to end up in $A_1$, $A_2$, and $A_3$. A stride width greater than one improves insertion sort performance in cases of uniform or mostly-sorted distributions. In this paper, this strided version of the vectorized sorting network followed by an insertion sort pass is called ISort.

3.2 Second Pass with Mergesort
The mergesort algorithm, called MSort, uses a fixed-size block of temporary storage T that is large enough to hold the entire array A. Because the SIMD-based sorting is applied to small sequences this array will not be large in practice. MSort proceeds as follows. Compute the number of blocks of data to be sorted, m/n, and allocate temporary space T. Call the sorting network on each block from A and store the sorted streams to T. The Q-MERGE algorithm described by Wickremesinghe et al. [20], based on work by [14], is now used to store the sorted data into A: (1) Build a heap containing the first element in each stream, and associate with each element a pointer to the next element in its stream; (2) Repeatedly extract the minimum element from the heap. During the extraction, replace the removed element with the next element in its stream, and rebuild the heap.

With a small number of streams, sufficient registers may be available to contain the entire heap. Heapify operations are then efficient and the only flow of data to/from memory is to fetch the next item from a stream or to store the next value to A. For heaps that are too large to fit within the available registers, in-memory heap code may be used. Maintenance operations on small heaps may be written using the known register locations of elements, avoiding potentially costly memory accesses and pointer indirections.

MSort uses one merge heap, with the number of inputs being a multiple of v. That is, each heap completely handles the output from one or more vectorized sorting network calls. Further, only heaps which may be contained within the available registers are considered. Additional optimizations include placing a sentinel value of infinity at the end of each stream to avoid checking if streams are empty [20]. Once the sentinel is loaded into the heap it will sink to the bottom. When any sentinel is extracted from the heap the sorting is complete. Each sorting network call places elements from the same stream a constant distance away from each other. Thus the next element on a stream can be found by adding a constant offset to the address of the current element, which makes the maintenance of the next element pointer in the heap straightforward.

4. ONE-PASS VECTOR SORTING
The third SIMD-based sorting algorithm accomplishes the sorting in a single pass. Intuitively this is possible by loading all of the n elements to be sorted into the vector registers, applying the comparators for an n-element (scalar) sorting network, and writing the elements back to memory in-place.

Table 1: SSE2 instructions used in the example of Fig. 4

  Instruction          Description
  movaps Ra, Rb        copy the contents of Rb to Ra.
  shufps Ra, Rb, i     copy 2 elements of Ra to the 2 low-order words of Ra, and 2 elements of Rb to the 2 high-order words of Ra. The elements to be copied are specified by i.
  movhlps Ra, Rb       copy the 2 high-order words from Rb to the 2 low-order words of Ra.
  movlhps Ra, Rb       copy the 2 low-order words from Rb to the 2 high-order words of Ra.

The difficulty with this approach lies in repositioning elements within the vector registers such that the vector comparator operations do not corrupt the values of elements not involved in the comparison. Moreover, simply aligning comparator inputs may be challenging, depending on the fragmentation of free locations within the vector registers. Since the cost of applying a vector comparator remains the same regardless of the number of values in use in each input vector, a natural optimization is to execute more than one (scalar) sorting-network comparator at a time.
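To make the data movement in Table 1 concrete, the four instructions can be modelled in scalar C on 4-word arrays. This is only a model for reasoning about where elements end up, not SIMD code: the `Vec4` type and the `model_*` names are hypothetical, and word 0 is taken to be the lowest-order word.

```c
#include <assert.h>

typedef struct { float w[4]; } Vec4;   /* w[0] is the lowest-order word */

/* movaps dst, src: copy all four words of src to dst. */
void model_movaps(Vec4 *dst, const Vec4 *src) { *dst = *src; }

/* movhlps dst, src: the 2 high-order words of src go to the
   2 low-order words of dst; the high words of dst are unchanged. */
void model_movhlps(Vec4 *dst, const Vec4 *src) {
    dst->w[0] = src->w[2];
    dst->w[1] = src->w[3];
}

/* movlhps dst, src: the 2 low-order words of src go to the
   2 high-order words of dst; the low words of dst are unchanged. */
void model_movlhps(Vec4 *dst, const Vec4 *src) {
    dst->w[2] = src->w[0];
    dst->w[3] = src->w[1];
}

/* shufps dst, src, imm: each 2-bit field of imm selects a word. The two
   low-order result words come from dst, the two high-order ones from src. */
void model_shufps(Vec4 *dst, const Vec4 *src, unsigned imm) {
    assert(imm < 256);
    Vec4 r;
    r.w[0] = dst->w[imm & 3];
    r.w[1] = dst->w[(imm >> 2) & 3];
    r.w[2] = src->w[(imm >> 4) & 3];
    r.w[3] = src->w[(imm >> 6) & 3];
    *dst = r;
}
```

For example, with four elements in one register, a single `model_movhlps` into a second register places the two high-order elements opposite the two low-order ones, so two comparators whose inputs are two positions apart can be executed by one vector comparator.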
However, the cost of additional data-movement instructions to properly position multiple comparator inputs in each vector register may outweigh the benefit of parallelization. In practice, for the sorting networks considered, it did not appear to be the case that aligning as many elements as possible (footnote 3) was ever detrimental to the resulting sequence of operations. However the algorithm we present does provide the ability to balance such alignment costs for the target architecture.

Footnote 3: With the optimization that the sorting networks corresponding to Batcher's Merge Exchange are explicitly separated into layers.

4.1 Searching to Align Vector Elements
We will first describe the algorithm used for finding a sequence of alignment instructions, and then show how this applies to a small 4-element sorting network.

[Figure 3: An 8-element sorting network produced by Batcher's Merge Exchange, with breaks indicated between layers.]

Algorithm Input
The input to our algorithm is a sequence of comparators corresponding to a sorting network. In our case the sorting networks were produced by Batcher's Merge Exchange, not to be confused with Batcher's Bitonic Sort. Merge Exchange has the property of producing an initial sequence of comparators connecting elements that are separated by powers of 2. This allows for executing a large number of parallel comparators at the start without any need for alignment instructions.

The data dependencies in the sorting network define a partial ordering for the execution of the comparisons. The comparators can thus be partitioned into sets in such a way that all the comparators in each set can be executed in parallel. This partition corresponds to the computation of the maximal anti-chains in a data-dependency graph [18]. One natural optimization, considering multiple legal orderings of the comparator sequence, was not implemented due to the combinatorial increase in the search space. While we present no formal approximation bounds, we feel that the resulting suboptimality of the instruction sequences produced is not significant.

One important optimization which reduces both the number of assembly instructions and the time needed to search for a sequence is to insert explicit breaks between levels of the Merge Exchange sorting network.
That is, to disallow executing scalar comparators from different levels within one parallel comparator. For this purpose we consider levels to be the results of the innermost loop in Batcher's Merge Exchange algorithm as described in [8]. An 8-element Merge Exchange network is shown in Fig. 3 with such layer breaks indicated. The sequence of alignment instructions within a layer is often repeated for subsequent blocks of scalar comparators. Forcing breaks between levels may be thought of as helping to maintain this repeating pattern of element positions within vectors. This repetition is not exploited directly, but it does seem to introduce less noise which may propagate when rearranging elements.

Initial State
For convenience we will assume that we have an unbounded number of vector registers. The resulting sequence of assembly instructions may be restricted to a small number of physical vector registers as a post-processing step by spilling and loading values to and from memory as appropriate.
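The comparator sequence and its level breaks, as described under Algorithm Input, can be generated with a short routine. The sketch below follows the merge-exchange formulation commonly attributed to Knuth's presentation (Algorithm M), written from memory, and tags each comparator with the pass of the innermost loop that produced it. The `Comparator` type, the function name, and the 0-based rail numbering are assumptions made for illustration.

```c
#include <assert.h>

typedef struct { int lo, hi, level; } Comparator;

/* Emit the merge-exchange comparators for n >= 2 inputs into out[0..cap-1].
   Returns the number of comparators and stores the number of innermost-loop
   passes (levels) in *levels. For n that is not a power of two, a pass may
   emit no comparators. */
int merge_exchange(int n, Comparator *out, int cap, int *levels) {
    assert(n >= 2);
    int t = 0;
    while ((1 << t) < n) t++;                 /* t = ceil(lg n) */
    int count = 0, level = 0;
    for (int p = 1 << (t - 1); p > 0; p >>= 1) {
        int q = 1 << (t - 1), r = 0, d = p;
        for (;;) {
            /* One pass: all comparators here touch distinct rails. */
            for (int i = 0; i < n - d; i++)
                if ((i & p) == r) {
                    assert(count < cap);
                    out[count++] = (Comparator){ i, i + d, level };
                }
            level++;
            if (q == p) break;
            d = q - p;  q >>= 1;  r = p;
        }
    }
    *levels = level;
    return count;
}
```

For n = 4 this yields five comparators in three levels, and for n = 8 it yields nineteen comparators in six levels. The first level for each value of p connects rails that are p apart, which matches the observation that the early comparators link elements separated by powers of 2.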

We will also assume that the elements are located in a continuous region of memory, are appropriately aligned, and that the number of elements is a multiple of the size of a vector. These restrictions are for simplification only and may be lifted by making small changes to the algorithm. Note that the process of searching for a sequence of alignment instructions is only concerned with keeping track of the labels of the elements contained within the vector registers; we will refer to manipulations of elements only for convenience.

The first step is to load all of the elements from memory into the vector registers. It is natural and convenient to assume a sequential labeling, such that the first memory location is labeled 0 and the last location n − 1. Given realistic constraints on the capabilities of the vector manipulation instructions, a number of empty vector registers are required as swap space for rearranging elements. In our experiments having 5 empty vector registers in addition to those registers holding the initial values was seen to be sufficient.

Aligning a Set of Comparators
While all of the comparators in the sorting network have not yet been executed, select the next k comparators that do not cross a layer and such that k is no larger than the number of elements in a vector register. The task is then to rearrange elements such that all the low elements from the k comparators are in one vector, and all the high elements in another (footnote 4), and aligned element-wise with their partner. Such an alignment is only valid if applying a vector comparator will not erase the last copy of any element. An erasure must necessarily occur when comparing an element with either an empty (garbage) value or another element with an unknown ordering relation. Note that applying a vector comparator will also invalidate copies of compared elements that are located in other registers.

Finding a sequence of assembly instructions to accomplish this alignment is performed using a standard iterative-deepening search. The legal actions in a state are all vector assembly instructions which do not completely eliminate an element from the set of vector registers.
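A much-reduced sketch of such an iterative-deepening search over element labels is given below. It is not the implementation evaluated in the paper: it models only the four instructions of Table 1 on a hypothetical three-register machine, treats an action as legal when every label still has at least one copy, and declares success as soon as two distinct registers hold every (low, high) pair in matching lanes. The check that a vector comparator would not erase a last copy, the heuristics, and the phase subdivision are all omitted, and every name in the code is an assumption.

```c
#include <assert.h>
#include <string.h>

#define NREG  3      /* hypothetical number of modelled registers */
#define LANES 4
#define EMPTY (-1)

typedef struct { int lane[NREG][LANES]; } State;   /* element labels only */
typedef struct { int lo, hi; } Cmp;                /* one scalar comparator */

/* Labels 0..n-1 loaded sequentially; remaining lanes are empty. */
State initial_state(int n) {
    State s;
    assert(n <= NREG * LANES);
    for (int i = 0; i < NREG * LANES; i++)
        s.lane[i / LANES][i % LANES] = i < n ? i : EMPTY;
    return s;
}

/* Action op on destination d and source r:
   0 = movaps, 1 = movhlps, 2 = movlhps, 3..258 = shufps with imm = op - 3. */
static void apply(State *s, int op, int d, int r) {
    int t[LANES];
    memcpy(t, s->lane[d], sizeof t);
    if (op == 0) memcpy(t, s->lane[r], sizeof t);
    else if (op == 1) { t[0] = s->lane[r][2]; t[1] = s->lane[r][3]; }
    else if (op == 2) { t[2] = s->lane[r][0]; t[3] = s->lane[r][1]; }
    else {
        int imm = op - 3;
        t[0] = s->lane[d][imm & 3];        t[1] = s->lane[d][(imm >> 2) & 3];
        t[2] = s->lane[r][(imm >> 4) & 3]; t[3] = s->lane[r][(imm >> 6) & 3];
    }
    memcpy(s->lane[d], t, sizeof t);
}

/* Legal state: no label 0..n-1 has been completely eliminated. */
static int all_present(const State *s, int n) {
    for (int e = 0; e < n; e++) {
        int found = 0;
        for (int r = 0; r < NREG; r++)
            for (int l = 0; l < LANES; l++)
                if (s->lane[r][l] == e) found = 1;
        if (!found) return 0;
    }
    return 1;
}

/* Goal: two distinct registers hold every (lo, hi) pair in matching lanes. */
static int aligned(const State *s, const Cmp *c, int k) {
    for (int x = 0; x < NREG; x++)
        for (int y = 0; y < NREG; y++) {
            int matched = 0;
            for (int j = 0; x != y && j < k; j++)
                for (int l = 0; l < LANES; l++)
                    if (s->lane[x][l] == c[j].lo && s->lane[y][l] == c[j].hi) {
                        matched++;
                        break;
                    }
            if (matched == k) return 1;
        }
    return 0;
}

/* Depth-limited search over all legal actions. */
static int dls(const State *s, const Cmp *c, int k, int n, int depth) {
    if (aligned(s, c, k)) return 1;
    if (depth == 0) return 0;
    for (int d = 0; d < NREG; d++)
        for (int r = 0; r < NREG; r++)
            for (int op = 0; op < 259; op++) {
                State t = *s;
                apply(&t, op, d, r);
                if (all_present(&t, n) && dls(&t, c, k, n, depth - 1)) return 1;
            }
    return 0;
}

/* Iterative deepening: smallest number of instructions that aligns the
   k comparators, or -1 if none is found within max_depth. */
int min_alignment_cost(const State *start, const Cmp *c, int k, int n, int max_depth) {
    for (int depth = 0; depth <= max_depth; depth++)
        if (dls(start, c, k, n, depth)) return depth;
    return -1;
}
```

In this model, aligning labels (0, 2) and (1, 3) from a freshly loaded register takes one instruction, while aligning (0, 1) and (2, 3) takes two. Even at depth 2 the search visits millions of states, which illustrates why the branching factor makes pruning necessary for larger networks.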
Due to feasibility concerns, each iterative-deepening search is divided into two phases: moving the low half of each comparator into one vector, and then moving the high half into alignment. If the maximum search depth in any one phase reaches 3, then that task is further subdivided into moving the first 2 elements into a vector, the next 2 elements into another, and finally combining them. Even with these incremental stages, due to the massive branching factor a naive implementation of this search would take a significant amount of time for even moderately large networks. Our implementation makes use of several admissible heuristics to prune portions of the search space.

To address the tradeoff between the cost of executing a vector comparator and the cost of alignment instructions, the above search is repeated for smaller values of k, and the final cost becomes a combination of the number of alignment instructions and a penalty for including fewer scalar comparators than is possible. Intuitively this attempts to select the sequence with the best ratio of number of alignments produced versus instructions required, with an additional bias towards producing more alignments since more alignments will reduce the total number of comparison steps. When all appropriate values of k have been searched, the choice of how many comparators to include is made greedily and is not revisited. The vector comparator is then applied and the search continues using the remaining comparators.

Footnote 4: The partitioning of low and high elements may be dropped if relabelling is performed when applying the vector comparator, based on whether the scalar comparator is inverted.

Writing Values Back to Memory
After the final comparator has been applied the elements are sorted but are not located within the vector registers in an order in which they can be written back to memory. A similar iterative-deepening search now finds an instruction sequence to obtain the correct alignment.

4.2 Example Search
The sorting network shown in Fig. 1 will be used to illustrate the sequence of events in the alignment algorithm for single-pass in-register sorting. This network has four elements and requires the execution of five comparison instructions.
An in-register sorting instance of this network using the x86-64 SSE(2) SIMD machinery is shown in Fig. 4. The instructions used in this instance are described in Table 1 (footnote 5). The sorting network of Fig. 1 produces the following partitions: $P_1$ = {COMP(a, c), COMP(b, d)}; $P_2$ = {COMP(a, b), COMP(c, d)}; and $P_3$ = {COMP(b, c)}.

First the elements of XMM0 are assigned the four elements to be sorted (a, b, c, and d). Then a low-cost sequence of vector instructions is searched for to align a with c and b with d. Here this may be done with a single movlhps instruction in step 1. This allows for executing the COMP(a, c) and COMP(b, d) comparators in parallel (step 2) (footnote 6). After this comparison the value stored in element c is smaller than the value stored in element a, and the value stored in element d is smaller than the value stored in element b.

In Fig. 4 a blank square represents a vector element that contains an unknown value that is not relevant to the sorting process. For instance, after the comparison in step 2 the values that were in elements c and d in the low-order words of XMM0 may have moved. As they are not part of the sorting process they are now represented by blank squares. If the inputs to the sorting network are a = 7, b = 2, c = 5, and d = 9, this comparison would leave the highest-order words of XMM0 and XMM1 intact and would swap the contents of the second highest-order words. It may also swap the values in the two low-order words of these registers, but the contents of those words are irrelevant.

Now the two comparators in partition $P_2$ are candidates for the next vector alignment. The initial state for this search is the position of the elements in the vectors at the end of step 2. In the example in Fig. 4 a sequence of two instructions, movhlps and shufps, is selected to align elements a with b and c with d. Thus both comparators of $P_2$ can be executed in parallel in step 5. A penultimate search is performed to execute the last comparator, resulting in steps 6 and 7, at which point the element values are sorted. Finally, the elements must be properly positioned within one register (in this case XMM1) before the sorted sequence can be written back to memory with a movaps instruction.

Footnote 5: Other SSE2 instructions frequently used for data movement but not included in this example are: pshufd, unpckhps, and unpcklps.
Footnote 6: For SSE2, a comparator between the contents of two registers Ra and Rb requires a temporary register T and the execution of three instructions: movaps T, Ra; minps Ra, Rb; and maxps Rb, T.

[Figure 4 instruction sequence; the register-state diagrams for XMM0 to XMM3 are not reproduced:
  Step 1:  movhlps xmm1, xmm0
  Step 2:  COMP(0, 1)
  Step 3:  movhlps xmm0, xmm1
  Step 4:  shufps xmm1, xmm0, 0x88
  Step 5:  COMP(0, 1)
  Step 6:  movhlps xmm2, xmm1
  Step 7:  movaps xmm3, xmm0
  Step 8:  COMP(2, 3)
  Step 9:  shufps xmm0, xmm2, 0x13
  Step 10: movlhps xmm1, xmm3
  Step 11: shufps xmm1, xmm0, 0x2
  Step 12: movaps [rsi+(0)], xmm1   (store to main memory)]
Figure 4: Instruction sequence to apply an in-register 4-element sorting network in an x86-64 architecture. The associated sorting network is shown in Fig. 1.

The vectorization of a sorting network only needs to be done once for each sorting network and for each architecture's set of vector instructions. Thus all the searches described above should be performed once and offline. The resulting schedule can then be used whenever a sequence of the corresponding size needs to be sorted.

5. SORTING KEY-POINTER PAIRS
So far this paper addresses the problem of sorting an array of floating-point values. A more general problem is that of sorting an array of data structures. Consider the case where each structure has a well-defined floating-point key value. Efficient algorithms sort an array of key-pointer pairs to avoid moving large data structures. This section describes an extension of the vectorized sorting networks to handle key-pointer pairs with floating-point keys and a byte sequence representing the pointer.

The solution to the key-pointer sorting problem consists of storing the keys and the pointers into separate SIMD vectors. If keys and pointers appear interleaved in memory then they must be swizzled when loaded into the SIMD vectors and this swizzling must be reversed when storing the sorted result to memory. With the keys and pointers in separate vectors, the standard sorting network solution is implemented for the keys, while the pointers move in synchrony with the key movements.
This is accomplished by using a bitmask to apply the swap operations only to selected elements in the pointer vector, specifically those elements which correspond to changes in the key vector after applying the key comparator. The construction of this bitmask is supported in architectures that support SIMD operations. For instance, this may be done in a straightforward manner using the AltiVec vsel instruction, while x86-64 architectures must make use of a sequence of boolean operations to mask and combine registers as shown in Fig. 5.

  asm("pshufd xmm15, xmm1, 0xE4");  // xmm15 := copy of key_a
  asm("minps xmm1, xmm2");          // key_a := min(key_a, key_b)
  asm("maxps xmm2, xmm15");         // key_b := max(key_a, key_b)
  asm("cmpps xmm15, xmm1, 4");      // xmm15 := bitmask of lanes where key_a changed
  asm("pshufd xmm14, xmm3, 0xE4");  // xmm14 := copy of ptr_a
  asm("xorps xmm14, xmm4");         // q := ptr_a XOR ptr_b
  asm("andps xmm15, xmm14");        // q := q AND bitmask
  asm("xorps xmm3, xmm15");         // ptr_a := ptr_a XOR q
  asm("xorps xmm4, xmm15");         // ptr_b := ptr_b XOR q

Figure 5: Key-pointer comparator using SSE2 assembly instructions. Vector registers xmm1 and xmm2 hold keys, registers xmm3 and xmm4 hold the respective pointers. Registers xmm14 and xmm15 are used as temporary storage.
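The same comparator can be written with compiler intrinsics, which makes it easy to test in isolation. The sketch below assumes an x86-64 target and 32-bit payloads standing in for the pointers; the function name `kp_comp` is hypothetical, and integer logic operations are used on the payload registers where Fig. 5 uses xorps and andps.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics; assumes an x86-64 target */

/* Key-pointer comparator on 4 lanes. Afterwards *ka holds the lane-wise
   minimum key and *kb the maximum; each 32-bit payload in *pa / *pb has
   followed its key. Lanes with equal keys are left untouched. */
void kp_comp(__m128 *ka, __m128 *kb, __m128i *pa, __m128i *pb) {
    __m128 old_a = *ka;                              /* copy of key_a */
    *ka = _mm_min_ps(*ka, *kb);                      /* key_a := min  */
    *kb = _mm_max_ps(*kb, old_a);                    /* key_b := max  */
    /* All-ones in every lane where key_a changed, i.e. where a swap occurred. */
    __m128i mask = _mm_castps_si128(_mm_cmpneq_ps(old_a, *ka));
    /* q = (ptr_a XOR ptr_b) in swapped lanes, zero elsewhere. */
    __m128i q = _mm_and_si128(_mm_xor_si128(*pa, *pb), mask);
    *pa = _mm_xor_si128(*pa, q);                     /* swaps only masked lanes */
    *pb = _mm_xor_si128(*pb, q);
}
```

The XOR trick swaps the two payloads in exactly the masked lanes: XORing a value with (a XOR b) turns a into b and b into a, while XORing with zero leaves it unchanged.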

6. VECTORIZING D-HEAPS
d-heaps are a straightforward generalization of binary heaps where each internal node has d children instead of 2. Increasing the value of d results in a shallower tree at the expense of requiring delete-min operations to perform more work when searching for the child node with minimum key value. For concreteness assume min-heaps. Assume an implicit heap layout, with all elements stored in a contiguous array. The root node is located at index 0, and the nth child of a node at index i is located at index $d \cdot i + n$, with $1 \le n \le d$. The parent of any node may be similarly computed by dividing its index − 1 by d.

In [9, 10] LaMarca and Ladner investigate the performance of traditional implicit heaps and how they are affected by data caches. They suggest increasing the branching factor as well as the data alignment techniques described here and used in our implementation. We present here a method for increasing d-heap performance by using SIMD vector instructions to quickly compute the index of the child with minimum key value. This computation is used within heapify-down operations.

This method is similar to the one used for sorting key-pointer pairs in that it relies on the synchronous movement of values within a second set of registers. In this situation the values moving in synchrony are the indices of each child node (specifically the offset from the first child, such that the values range from 0 to d − 1).

For simplicity, assume that d is a multiple of k, the number of elements in a SIMD vector. This assumption also aligns a node's children on both cache-line and SIMD vector boundaries. This alignment requires that the root node be located at the end of a cache-line such that its first child is at the beginning of the next cache-line. If the nodes in the heap are key-pointer pairs, rather than just keys, loading into a SIMD vector may require additional swizzle instructions to interleave the keys from 2 separate vector loads. Only the key values are required; the associated pointer data may be discarded. When a block of keys is loaded into a SIMD vector, the index offsets for those keys are loaded into another vector from a constant and static array containing values 0, 1, ..., d − 1.
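A sketch of this minimum-child computation with SSE2 intrinsics is shown below, for keys only and k = 4. It is an illustration under stated assumptions rather than the evaluated implementation: the function names are hypothetical, unaligned loads are used so that no cache-line placement is required, the offsets are generated on the fly instead of being loaded from a static array, and when several children share the minimum key any one of them may be reported.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics; assumes an x86-64 target */

/* Implicit d-heap layout: the children of node i are d*i + 1 .. d*i + d. */
int first_child(int d, int i) { return d * i + 1; }
int parent(int d, int i)      { return (i - 1) / d; }

/* Lane-wise: where b < a, take b and its offset; otherwise keep a. */
static void select_min(__m128 *a, __m128i *ia, __m128 b, __m128i ib) {
    __m128i m = _mm_castps_si128(_mm_cmplt_ps(b, *a));
    *a  = _mm_min_ps(*a, b);
    *ia = _mm_or_si128(_mm_and_si128(m, ib), _mm_andnot_si128(m, *ia));
}

/* Offset (0 .. d-1) of a smallest key among d consecutive child keys.
   Requires d to be a positive multiple of 4. */
int min_child_offset(const float *keys, int d) {
    assert(d > 0 && d % 4 == 0);
    __m128  a  = _mm_loadu_ps(keys);
    __m128i ia = _mm_setr_epi32(0, 1, 2, 3);
    for (int j = 4; j < d; j += 4)                 /* running lane-wise minimum */
        select_min(&a, &ia, _mm_loadu_ps(keys + j),
                   _mm_setr_epi32(j, j + 1, j + 2, j + 3));
    /* Compare the upper half of the vector against the lower half. */
    select_min(&a, &ia, _mm_movehl_ps(a, a),
               _mm_shuffle_epi32(ia, _MM_SHUFFLE(3, 2, 3, 2)));
    /* Compare lane 1 against lane 0; the answer ends up in lane 0. */
    select_min(&a, &ia, _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 1, 1, 1)),
               _mm_shuffle_epi32(ia, _MM_SHUFFLE(1, 1, 1, 1)));
    return _mm_cvtsi128_si32(ia);
}
```

Within a heapify-down step, the child to compare against node i would then be `first_child(d, i) + min_child_offset(&heap[first_child(d, i)], d)`, where `heap` is a hypothetical key array and the node is assumed to have all d children.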
The synchronous movement of the index offsets is implemented in the same manner as the movement of the pointer values in Section 5. The loading and movement of these offsets is omitted from the algorithm description for brevity. The algorithm proceeds as follows:

(1) load the first k keys into one SIMD vector, call this register A;
(2) while unread keys remain, read the next k keys into SIMD vector B and set A := min(A, B);
(3) finally, repeatedly compare one half of the values in A against the other half until only one element remains;
(4) return the index of this element.

If the node being examined does not have d children (this may only occur at the last internal heap node) then the vectorized search is replaced by a straightforward linear scan.

7. EXPERIMENTAL EVALUATION

7.1 Sorting Algorithms

The three versions of vectorized sorting described in this paper were evaluated by integrating them as the low-level algorithms for DTSL's quicksort. The main findings of this experimental evaluation are:

Significant reductions in execution time are possible for sorting on the Pentium 4, with lesser reductions on the G5 and Core 2 Duo, depending on array size.

The integration of SIMD-based sorting algorithms to sort sequences smaller than a fixed threshold improves the performance of DTSL when sorting arrays of floating-point key-pointer pairs by up to 22%.

This performance improvement is due not only to a reduction in the number of loads, stores, and branch instructions, but also to a significant decrease in the number of branch mispredictions.

7.1.1 Integrating Algorithms into DTSL

Table 2: Algorithms studied

Algorithm     Description
MSortX - Y    MSort algorithm with X streams applied at Y threshold.
ISortX - Y    ISort algorithm with X streams applied at Y threshold.
RSort - Y     One-pass register sort applied at Y threshold.
DTSL - Y      Original DTSL quicksort with SN applied at Y threshold.
Ins - Y       Standard insertion sort applied at Y threshold.

[Figure 6 plot omitted: "P4 Reduction of Quicksort Branch Mispredictions per Low-Level Algorithm (Key-Pointer pairs, 5000 trials)"; x-axis: low-level algorithm used; y-axis: branch misprediction decrease from DTSL (%).]

Figure 6: Reduction of branch mispredictions on 64-bit 3.40 GHz Pentium 4.

The SIMD-based algorithms presented in this paper were integrated in the quicksort implementation of DTSL. DTSL's quicksort is not recursive. Instead it maintains an in-function stack of current partitions. When the number of elements to be sorted drops below a threshold, DTSL switches to a low-level sorting algorithm. The version of quicksort that produces the best, or close to the best, performance when sorting elements in DTSL uses a scalar sorting network, SN, as the low-level algorithm [11]. The single-element comparators in this sorting network are written in the C language and use branch instructions to conditionally perform element interchanges. The default threshold to switch to this low-level algorithm is sixteen elements. This version of DTSL's quicksort is the baseline for the comparative performance study in this paper. Table 2 lists the algorithms used in this performance evaluation. The standard insertion-sort algorithm, Ins - Y, is included to provide a familiar comparison point.

[Figures 7, 8, and 9 plots omitted: "Reduction of Quicksort Cycles per Low-Level Algorithm (Key-Pointer pairs, 5000 trials)" for the P4, G5, and Core 2; x-axis: low-level algorithm used; y-axis: time decrease from DTSL (%).]

Figure 7: Quicksort cycle counts relative to DTSL on 64-bit 3.40 GHz Pentium 4.

Figure 8: Quicksort wall-clock times relative to DTSL on 2.7 GHz Power Mac G5.

Figure 9: Quicksort cycle counts relative to DTSL on 3.2 GHz Core 2 Duo E6400.

7.1.2 Wall-Clock Execution Time

Experiments were performed on a 64-bit 3.4 GHz Pentium 4, an IBM 2.7 GHz PowerPC G5, and a 3.2 GHz Core 2 Duo E6400. Figs. 7, 8, and 9 show the relative wall-clock execution times for the sorting of a vector of key-pointer pairs in relation to the DTSL baseline. Each bar represents the average runtime over 5000 trials on uniformly distributed keys relative to the DTSL baseline. The large thresholds for MSort, ISort, and RSort, extending beyond what can concurrently fit within the physical vector registers, are the result of the register spilling mentioned in Sec. 4.

Time reductions for the Pentium 4 are quite strong for a range of array sizes, with the greatest reduction of 58% for 200 elements, where all elements are immediately sorted by MSort. RSort - 96 becomes the better alternative for larger arrays, with a time reduction of 22%. Large time reductions on the Core 2 Duo and the G5 are limited to small array sizes. For 200 elements MSort8-509 has respective time reductions of 43% and 33%. For the largest array RSort - 32 achieves only 7% and 4% relative improvement on these architectures.

As seen in Fig. 6, the number of branch mispredictions for each algorithm tends to decrease as the threshold becomes larger, reflecting the reduced number (or lack) of branch instructions involved. However, larger RSort thresholds require additional functions, much more so than for the two-pass algorithms. These functions grow in proportion to the size of their respective sorting networks and include alignment operations. The size of some of the generated object files reaches several megabytes. The first-pass sorting instructions for MSort and ISort do not require nearly as much space.

7.1.3 Low-Level Algorithm Timing

[Figure 10 plot omitted: "P4 Low-level Sorting Algorithm Timing for Key-Pointer Pairs"; series: RSort, MSort4, MSort12, ISort4, ISort12, DTSL SN, Insertion; x-axis: array size; y-axis: PAPI TOT_CYC event count.]

Figure 10: Low-level algorithm cycle counts on 64-bit 3.40 GHz Pentium 4.

Fig. 10 shows the number of clock cycles, obtained through the PAPI library, required by each algorithm as the number of elements to be sorted varies up to the maximum (as implemented) for each algorithm. Each point in the graph is the average over trials with uniformly distributed keys. This graph shows that RSort is significantly superior to both the branch-intensive SN algorithm and standard insertion sort, and confirms that MSort is also an excellent choice for the sorting of short sequences. The performance of RSort on the G5 and Core 2 is roughly the same as that of the Pentium 4 results shown in Fig. 10 for sequences smaller than 32 elements.

A detailed study of other performance counters showed a correlation between a reduction in the number of branches, loads, and stores executed and the relative performance of the algorithms.
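The integration scheme evaluated in this section (a non-recursive quicksort that keeps an explicit stack of partitions and hands small partitions to a low-level algorithm) can be sketched as follows. This is a minimal sketch under stated assumptions: `quicksort_with_threshold` and `low_level_sort` are hypothetical names, the low-level routine is a plain insertion sort standing in for SN, RSort, MSort, or ISort, and the middle-element pivot with a three-way partition is a simplification, not DTSL's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the low-level algorithm (SN, RSort, MSort, ISort in the text).
// Plain insertion sort is used here only to keep the sketch self-contained.
template <typename T>
void low_level_sort(T* a, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        T v = a[i];
        std::size_t j = i;
        while (j > 0 && v < a[j - 1]) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = v;
    }
}

// Non-recursive quicksort: partitions live on an explicit stack, and any
// partition with fewer than `threshold` elements goes to low_level_sort.
template <typename T>
void quicksort_with_threshold(std::vector<T>& v, std::size_t threshold = 16) {
    std::vector<std::pair<std::size_t, std::size_t>> stack;  // half-open [lo, hi)
    stack.emplace_back(std::size_t{0}, v.size());
    while (!stack.empty()) {
        const std::size_t lo = stack.back().first;
        const std::size_t hi = stack.back().second;
        stack.pop_back();
        const std::size_t n = hi - lo;
        if (n < 2 || n < threshold) {
            low_level_sort(v.data() + lo, n);
            continue;
        }
        // Three-way partition around the middle element: < pivot, == pivot, > pivot.
        const T pivot = v[lo + n / 2];
        auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
        auto last = v.begin() + static_cast<std::ptrdiff_t>(hi);
        auto mid1 = std::partition(first, last,
                                   [&pivot](const T& x) { return x < pivot; });
        auto mid2 = std::partition(mid1, last,
                                   [&pivot](const T& x) { return !(pivot < x); });
        stack.emplace_back(lo, static_cast<std::size_t>(mid1 - v.begin()));
        stack.emplace_back(static_cast<std::size_t>(mid2 - v.begin()), hi);
    }
}
```

Because the pivot value itself always lands in the middle group, both pushed partitions are strictly smaller than the one popped, so the loop terminates for any threshold. Swapping in a different low-level routine only requires replacing the body of `low_level_sort`.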

7.2 D-Heaps

The performance of d-heaps was investigated by comparing highly optimized versions with different branching factors against SIMD variants where vector instructions were used during heapify-down operations. The main finding is a significant reduction in cycle count for larger heaps, when comparing the best SIMD d-heap against the best non-SIMD d-heap.

[Figure 12 plot omitted: "Ratio of Best d-Heap Times" for the P4, G5, and Core 2; x-axis: heap size (log2); y-axis: SIMD time versus non-SIMD (%).]

Figure 12: Ratio of the best SIMD heap times relative to best non-SIMD heap times from Fig. 11.

All source code was written in C++ and was compiled using gcc 3.4.6, 4.0.0, and [...] on the Pentium 4, G5, and Core 2 Duo respectively, with full optimizations and loop unrolling. The branching factor d was known at compile time. The heap itself was aligned in memory such that the root node's children began on a cache-line boundary. A consequence is that all children are aligned for SIMD vector accesses. The binary heap had further optimized index computations.

As with the previous experiments, heap elements are key-pointer pairs. Heaps have sizes which are powers of 2, from $2^{4}$ to $2^{26}$, and are initialized by inserting n elements, where n is the maximum size of the heap. Keys for initial elements are drawn uniformly from 0, ..., $n - 1$. 10,000,000 iterations based on the Hold model as described in [7] were then performed. Each iteration consists of a call to delete-min followed by insert-element. The key of the new element is equal to the key of the element last removed plus a value drawn uniformly from 0, ..., $n - 1$.

As seen in Fig. 11, when the heap size becomes $2^{18}$ there is a crossover between values of d in the performance of traditional heaps. For small heaps d = 2 performs better, while d = 8 or d = 16 performs better for larger heaps, resulting from better locality of each node's children as well as decreased heap depth. All graphs in Fig. 11 show only the best or near-best values of d for clarity. Fig. 12 shows the ratio of execution times between the best SIMD heap at each size versus the best traditional heap.
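The Hold-model workload described above (fill the heap with n keys, then repeat delete-min followed by an insert whose key is the removed key plus a uniform draw) can be sketched as follows. The function name `run_hold_model`, the `seed` parameter, and the use of `std::priority_queue` (a binary heap) in place of a d-heap are assumptions made for illustration only; the sketch shows the shape of the workload, not the measured implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <random>
#include <vector>

// Runs `iterations` Hold-model steps on a min-heap holding n keys (n >= 1)
// and returns the last key removed. std::priority_queue is only a stand-in
// for the d-heap variants discussed in the text.
inline std::uint64_t run_hold_model(std::size_t n, std::size_t iterations,
                                    std::uint64_t seed = 1) {
    assert(n >= 1);
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::uint64_t> draw(0, n - 1);

    std::priority_queue<std::uint64_t, std::vector<std::uint64_t>,
                        std::greater<std::uint64_t>> heap;
    for (std::size_t i = 0; i < n; ++i) {
        heap.push(draw(rng));  // initial keys drawn uniformly from 0 .. n-1
    }

    std::uint64_t last = 0;
    for (std::size_t it = 0; it < iterations; ++it) {
        const std::uint64_t removed = heap.top();  // delete-min
        heap.pop();
        assert(removed >= last);  // minima leave in non-decreasing order
        last = removed;
        heap.push(removed + draw(rng));  // insert-element: removed key + draw
    }
    assert(heap.size() == n);  // heap size is held constant
    return last;
}
```

Each step removes one element and inserts one, so the heap size stays at n throughout, which is what lets the benchmark isolate steady-state delete-min and insert costs at a given heap size.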
For the Pentium 4, G5, and Core 2 Duo, the SIMD heaps have an average reduction in cycles of 31%, 18%, and 15% respectively, with the largest reductions occurring at the $2^{18}$ crossover point for the Pentium 4 and G5, and at $2^{20}$ for the Core 2.

8. RELATED WORK

The implementation of sorting in large-scale vector machines has been extensively studied. Siegel produced one of the earliest descriptions of how to implement Batcher's sorting network, also known as bitonic sorting, in SIMD machines [17]. Bitton et al. provide an extensive description of such implementations [3]. The new contribution of this paper is to demonstrate how the well-known sorting networks can be implemented in the SIMD machinery of contemporary processors and to indicate that code generators can instantiate such implementations to improve the performance of recursive sorting algorithms and heaps.

The idea of making better use of register resources within the processor to reduce the number of loads and stores, in our case to put the SIMD resources to good use in sorting, is also explored by Arge et al. [20]. Their idea of forming cache-load-size runs with quicksort is similar to our idea of switching to SIMD-register-based sorting at an appropriate threshold. The contrast is that we also benefit from the SIMD machinery, which allows more parallelism in the execution and the elimination of branches, while they use the general-purpose registers and the storage available in a cache line.

Recently compilers have been used more often to improve the code generation for SIMD machinery in contemporary processors. Ren et al.'s approach of using an optimization algorithm to improve the data permutations is more general than our specific iterative-deepening search [15]. Nuzman et al. describe a compiler framework to generate vectorized code for interleaved data [13].

The SIMD-register-based sorting algorithms presented in this paper are an orthogonal improvement to the DTSL library generator [11]. Li et al. focused on the dynamic identification of the best sorting algorithm for a given input sequence [12]. They selected an efficient algorithm for the tail of their recursive method. This paper offers a better solution for the sorting of sequences that are small enough to benefit from the use of the SIMD machinery. Similarly, we provide a faster mechanism for selecting a minimum (maximum) child in the implicit d-heaps studied by LaMarca and Ladner [9, 10].

Our SIMD-register-based sorting could also improve partition-based sorting methods. For instance, Shen and Ding use an adaptive partitioning scheme to attempt to evenly partition data into chunks smaller than the cache size and then use quicksort or insertion sort to finish sorting each bucket [16].

9. CONCLUSIONS

This paper proposes the use of the SIMD machinery provided in modern processors to improve the performance of recursion tails. The idea is that whenever the number of elements to be processed fits within the SIMD registers available in the processor, these values should be loaded once into the SIMD registers and then an efficient SIMD execution should be used. While the feasibility of this idea was demonstrated with the integration of a more efficient algorithm for sorting short sequences into DTSL, the idea should be generally applicable to recursive computation. Once efficient low-level SIMD algorithms are crafted, they can be generated into a solution database to be instantiated by code generators into optimized libraries. Alternatively, if a suitable identification algorithm is created, the compiler should be able to integrate these solutions directly into general programs.

[Figure 11 plots omitted: "d-Heap Timing (Hold Model)" for (a) P4, (b) G5, and (c) Core 2 Duo, with SIMD and non-SIMD series; x-axis: heap size (log2); y-axis: PAPI TOT_CYC event count for (a) and (c), time in microseconds for (b).]

Figure 11: Cycle count / wall-clock time for different heap sizes and values of d on: (a) 64-bit 3.40 GHz Pentium 4; (b) 2.7 GHz Power Mac G5; (c) 3.20 GHz Core 2 Duo E6400 [...] insertions and deletions.

Acknowledgments

The experimental evaluation of these ideas was made possible thanks to David Padua's generous sharing of his group's DTSL code. This research is supported by grants from the Natural Science and Engineering Research Council (NSERC) of Canada, and by IBM Corporation.

10. REFERENCES

[1] K. E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computing Conference.
[2] L. Bishop, D. Eberly, T. Whitted, M. Finch, and M. Shantz. Designing a PC game engine. IEEE Computer Graphics and Applications, 18(1):46-53.
[3] D. Bitton, D. J. DeWitt, D. K. Hsiao, and J. Menon. A taxonomy of parallel sorting. Computing Surveys, 16(3), September.
[4] J. D. Frens and D. S. Wise. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code. In Principles and Practice of Parallel Programming (PPoPP), Las Vegas, Nevada.
[5] M. Frigo. A fast Fourier transform compiler. In Programming Language Design and Implementation (PLDI), Atlanta, GA, June.
[6] Intel. IA-32 Intel 64 and IA-32 architectures software developer's manual, volume 1: Basic architecture.
[7] Douglas W. Jones. An empirical comparison of priority-queue and event-set implementations. Commun. ACM, 29(4).
[8] Donald Ervin Knuth. The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
[9] A. LaMarca and R. E. Ladner. The influence of caches on the performance of heaps. ACM Journal of Experimental Algorithms, 1:4.
[10] A. LaMarca and R. E. Ladner. The influence of caches on the performance of sorting. In SODA: ACM-SIAM Symposium on Discrete Algorithms (A Conference on Theoretical and Experimental Analysis of Discrete Algorithms).
[11] X. Li, M. Garzarán, and D. Padua. A dynamically tuned sorting library. In Code Generation and Optimization (CGO), Palo Alto, CA.
[12] X. Li, M. J. Garzarán, and D. Padua. Optimizing sorting with genetic algorithms. In Code Generation and Optimization (CGO), San Jose, CA, March.
[13] D. Nuzman, I. Rosen, and A. Zaks. Auto-vectorization of interleaved data for SIMD. In Programming Language Design and Implementation (PLDI).
[14] A. Ranade, S. Kothari, and R. Udupa. Register efficient mergesorting. In High Performance Computing (HiPC), volume 1970 of LNCS. Springer.
[15] Gang Ren, Peng Wu, and David Padua. Optimizing data permutations for SIMD devices. In Programming Language Design and Implementation (PLDI).
[16] Xipeng Shen and Chen Ding. Adaptive data partition for sorting using probability distribution. In ICPP 04: Proceedings of the 2004 International Conference on Parallel Processing (ICPP 04), Washington, DC, USA. IEEE Computer Society.
[17] H. J. Siegel. The universality of various types of SIMD machine interconnection networks. In Proceedings of the 4th Annual Symposium on Computer Architecture, pages 23-25, Silver Spring, MD, March. ACM SIGARCH/IEEE-CS.
[18] S. A. A. Touati. Register saturation in instruction level parallelism. International Journal of Parallel Programming, 33(4).
[19] R. Whaley, A. Petitet, and J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27(1-2):3-35.
[20] R. Wickremesinghe, L. Arge, J. S. Chase, and J. S. Vitter. Efficient sorting using registers and caches. ACM Journal of Experimental Algorithmics, 7:9.
[21] J. Xiong, J. Johnson, R. Johnson, and D. Padua. SPL: A language and compiler for DSP algorithms. In Programming Language Design and Implementation (PLDI), Snowbird, Utah, June 2001.


More information

Resource and Memory Management Techniques for the High-Level Synthesis of Software Threads into Parallel FPGA Hardware

Resource and Memory Management Techniques for the High-Level Synthesis of Software Threads into Parallel FPGA Hardware Resoure n Memory Mngement Tehniques for the High-Level Synthesis of Softwre Thres into Prllel FPGA Hrwre Jongsok Choi, Stephen Brown, n Json Anerson ECE Deprtment, University of Toronto, Toronto, ON, Cn

More information

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining EECS150 - Digitl Design Lecture 23 - High-level Design nd Optimiztion 3, Prllelism nd Pipelining Nov 12, 2002 John Wwrzynek Fll 2002 EECS150 - Lec23-HL3 Pge 1 Prllelism Prllelism is the ct of doing more

More information

Using a User-Level Memory Thread for Correlation Prefetching

Using a User-Level Memory Thread for Correlation Prefetching Using User-Level Memory Thre for Correltion Prefething Yn Solihin Jejin Lee Josep Torrells University of Illinois t Urn-Chmpign Mihign Stte University http://iom.s.uiu.eu http://www.se.msu.eu/ jlee Astrt

More information

WORKSHOP 19 GLOBAL/LOCAL MODELING USING FEM FIELDS

WORKSHOP 19 GLOBAL/LOCAL MODELING USING FEM FIELDS WORKSHOP 19 GLOBAL/LOCAL MODELING USING FEM FIELDS WS19-1 WS19-2 Prolem Desription This exerise is use to emonstrte how to mp isplement results from the nlysis of glol(overll) moel onto the perimeter of

More information

Cellular-based Population to Enhance Genetic Algorithm for Assignment Problems

Cellular-based Population to Enhance Genetic Algorithm for Assignment Problems Amerin Journl of Intelligent Systems. 0; (): -5 DOI: 0. 593/j.jis.000.0 Cellulr-se Popultion to Enhne Geneti Algorithm for Assignment Prolems Hossein Rjlipour Cheshmehgz *, Hiollh Hron, Mohmm Rez Myoi

More information

Distance Computation between Non-convex Polyhedra at Short Range Based on Discrete Voronoi Regions

Distance Computation between Non-convex Polyhedra at Short Range Based on Discrete Voronoi Regions Distne Computtion etween Non-onvex Polyhedr t Short Rnge Bsed on Disrete Voronoi Regions Ktsuki Kwhi nd Hiroms Suzuki Deprtment of Preision Mhinery Engineering, The University of Tokyo 7-3-1 Hongo, Bunkyo-ku,

More information

1 Which of the following keyword can not be appeared inside the class? a)virtual b)static c)template d)friend c

1 Which of the following keyword can not be appeared inside the class? a)virtual b)static c)template d)friend c 1 Whih of the following keywor n not e ppere insie the lss? )virtul )stti )templte )frien 2 Wht is templte? )Templte is formul for reting generi lss )Templte is use to mnipulte lss )Templte is use for

More information

Evaluating Regular Expression Matching Engines on Network and General Purpose Processors

Evaluating Regular Expression Matching Engines on Network and General Purpose Processors Evluting Regulr Expression Mthing Engines on Network n Generl Purpose Proessors Mihel Behi Wshington University Computer Siene n Engineering St. Louis, MO 63130-4899 mehi@se.wustl.eu Chrlie Wisemn Wshington

More information

The Droplet Virtual Brush for Chinese Calligraphic Character Modeling

The Droplet Virtual Brush for Chinese Calligraphic Character Modeling The Droplet Virtul Brush for Chinese Clligrphi Chrter Moeling Xiofeng Mi Jie Xu Min Tng Jinxing Dong CAD & CG Stte Key L of Chin, Zhejing University, Hngzhou, Chin Artifiil Intelligene Institute, Zhejing

More information

Outline. Motivation Background ARCH. Experiment Additional usages for Input-Depth. Regular Expression Matching DPI over Compressed HTTP

Outline. Motivation Background ARCH. Experiment Additional usages for Input-Depth. Regular Expression Matching DPI over Compressed HTTP ARCH This work ws supported y: The Europen Reserh Counil, The Isreli Centers of Reserh Exellene, The Neptune Consortium, nd Ntionl Siene Foundtion wrd CNS-119748 Outline Motivtion Bkground Regulr Expression

More information

McAfee Web Gateway

McAfee Web Gateway Relese Notes Revision C MAfee We Gtewy 7.6.2.11 Contents Aout this relese Enhnement Resolved issues Instlltion instrutions Known issues Additionl informtion Find produt doumenttion Aout this relese This

More information

CS553 Lecture Introduction to Data-flow Analysis 1

CS553 Lecture Introduction to Data-flow Analysis 1 ! Ide Introdution to Dt-flow nlysis!lst Time! Implementing Mrk nd Sweep GC!Tody! Control flow grphs! Liveness nlysis! Register llotion CS553 Leture Introdution to Dt-flow Anlysis 1 Dt-flow Anlysis! Dt-flow

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

Class Overview. Database Design. Database Design Process. Database Design. Introduction to Data Management CSE 414

Class Overview. Database Design. Database Design Process. Database Design. Introduction to Data Management CSE 414 Introution to Dt Mngement CSE 44 Unit 6: Coneptul Design E/R Digrms Integrity Constrints BCNF Introution to Dt Mngement CSE 44 E/R Digrms ( letures) CSE 44 Autumn 08 Clss Overview Dtse Design Unit : Intro

More information

Hash-based Subgraph Query Processing Method for Graph-structured XML Documents

Hash-based Subgraph Query Processing Method for Graph-structured XML Documents Hsh-bse Subgrph Query Proessing Metho for Grph-struture XML Douments Hongzhi Wng Hrbin Institute of Teh. wngzh@hit.eu.n Jinzhong Li Hrbin Institute of Teh. lijzh@hit.eu.n Jizhou Luo Hrbin Institute of

More information

Shared Memory Architectures. Programming and Synchronization. Today s Outline. Page 1. Message passing review Cosmic Cube discussion

Shared Memory Architectures. Programming and Synchronization. Today s Outline. Page 1. Message passing review Cosmic Cube discussion Tody s Outline Arhitetures Progrmming nd Synhroniztion Disuss pper on Cosmi Cube (messge pssing) Messge pssing review Cosmi Cube disussion > Messge pssing mhine Shred memory model > Communition > Synhroniztion

More information

Graph Contraction and Connectivity

Graph Contraction and Connectivity Chpter 14 Grph Contrtion n Connetivity So fr we hve mostly overe tehniques for solving problems on grphs tht were evelope in the ontext of sequentil lgorithms. Some of them re esy to prllelize while others

More information

4.3 Balanced Trees. let us assume that we can manipulate them conveniently and see how they can be put together to form trees.

4.3 Balanced Trees. let us assume that we can manipulate them conveniently and see how they can be put together to form trees. 428 T FOU 4.3 Blned Trees T BT GOIT IN T VIOU setion work well for wide vriety of pplitions, ut they hve poor worst-se performne. s we hve noted, files lredy in order, files in reverse order, files with

More information

Lecture 13: Graphs I: Breadth First Search

Lecture 13: Graphs I: Breadth First Search Leture 13 Grphs I: BFS 6.006 Fll 2011 Leture 13: Grphs I: Bredth First Serh Leture Overview Applitions of Grph Serh Grph Representtions Bredth-First Serh Rell: Grph G = (V, E) V = set of verties (ritrry

More information

COSC 6374 Parallel Computation. Non-blocking Collective Operations. Edgar Gabriel Fall Overview

COSC 6374 Parallel Computation. Non-blocking Collective Operations. Edgar Gabriel Fall Overview COSC 6374 Prllel Computtion Non-loking Colletive Opertions Edgr Griel Fll 2014 Overview Impt of olletive ommunition opertions Impt of ommunition osts on Speedup Crtesin stenil ommunition All-to-ll ommunition

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Generating Editors for Direct Manipulation of Diagrams

Generating Editors for Direct Manipulation of Diagrams Generting Eitors for Diret Mnipultion of Digrms Gerhr Viehstet n Mrk Mins Lehrstuhl für Progrmmiersprhen Universität Erlngen-Nürnerg Mrtensstr. 3, 91058 Erlngen, Germny E-mil: fviehste,minsg@informtik.uni-erlngen.e

More information

Problem Final Exam Set 2 Solutions

Problem Final Exam Set 2 Solutions CSE 5 5 Algoritms nd nd Progrms Prolem Finl Exm Set Solutions Jontn Turner Exm - //05 0/8/0. (5 points) Suppose you re implementing grp lgoritm tt uses ep s one of its primry dt strutures. Te lgoritm does

More information

Introduction to Algebra

Introduction to Algebra INTRODUCTORY ALGEBRA Mini-Leture 1.1 Introdution to Alger Evlute lgeri expressions y sustitution. Trnslte phrses to lgeri expressions. 1. Evlute the expressions when =, =, nd = 6. ) d) 5 10. Trnslte eh

More information

COSC 6374 Parallel Computation. Communication Performance Modeling (II) Edgar Gabriel Fall Overview. Impact of communication costs on Speedup

COSC 6374 Parallel Computation. Communication Performance Modeling (II) Edgar Gabriel Fall Overview. Impact of communication costs on Speedup COSC 6374 Prllel Computtion Communition Performne Modeling (II) Edgr Griel Fll 2015 Overview Impt of ommunition osts on Speedup Crtesin stenil ommunition All-to-ll ommunition Impt of olletive ommunition

More information

Width and Bounding Box of Imprecise Points

Width and Bounding Box of Imprecise Points Width nd Bounding Box of Impreise Points Vhideh Keikh Mrten Löffler Ali Mohdes Zhed Rhmti Astrt In this pper we study the following prolem: we re given set L = {l 1,..., l n } of prllel line segments,

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

Photovoltaic Panel Modelling Using a Stochastic Approach in MATLAB &Simulink

Photovoltaic Panel Modelling Using a Stochastic Approach in MATLAB &Simulink hotovolti nel Modelling Using Stohsti Approh in MATLAB &Simulink KAREL ZALATILEK, JAN LEUCHTER eprtment of Eletril Engineering University of efene Kouniov 65, 61 City of Brno CZECH REUBLIC krelzpltilek@unoz,

More information

An Efficient Code Update Scheme for DSP Applications in Mobile Embedded Systems

An Efficient Code Update Scheme for DSP Applications in Mobile Embedded Systems An Effiient Code Updte Sheme for DSP Applitions in Moile Emedded Systems Weiji Li, Youto Zhng Computer Siene Deprtment,University of Pittsurgh,Pittsurgh, PA 526 {weijili,zhngyt}@s.pitt.edu Astrt DSP proessors

More information

Bayesian Networks: Directed Markov Properties (Cont d) and Markov Equivalent DAGs

Bayesian Networks: Directed Markov Properties (Cont d) and Markov Equivalent DAGs Byesin Networks: Direte Mrkov Properties (Cont ) n Mrkov Equivlent DAGs Huizhen Yu jney.yu@s.helsinki.fi Dept. Computer Siene, Univ. of Helsinki Proilisti Moels, Spring, 2010 Huizhen Yu (U.H.) Byesin Networks:

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

1. Be able to do System Level Designs by: 2. Become proficient in a hardware-description language (HDL)

1. Be able to do System Level Designs by: 2. Become proficient in a hardware-description language (HDL) Ojetives CENG53 Digitl Sstem Design Digitl Mhine Design Overview 1. Be le to do Sstem Level Designs : Mstering design issues in ottom-up fshion nd Designing sstems for speifi pplitions in top-down methodolog

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

c s ha2 c s Half Adder Figure 2: Full Adder Block Diagram

c s ha2 c s Half Adder Figure 2: Full Adder Block Diagram Adder Tk: Implement 2-it dder uing 1-it full dder nd 1-it hlf dder omponent (Figure 1) tht re onneted together in top-level module. Derie oth omponent in VHDL. Prepre two implementtion where VHDL omponent

More information

Fault tree conversion to binary decision diagrams

Fault tree conversion to binary decision diagrams Loughorough University Institutionl Repository Fult tree onversion to inry deision digrms This item ws sumitted to Loughorough University's Institutionl Repository y the/n uthor. Cittion: ANDREWS, J.D.

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

TEMPLATE FOR ENTRY in Encyclopedia of Database Systems: GRID FILE. Yannis Manolopoulos

TEMPLATE FOR ENTRY in Encyclopedia of Database Systems: GRID FILE. Yannis Manolopoulos TEMPLATE FOR ENTRY in Enylopedi of Dtse Systems: GRID FILE Apostolos N. Ppdopoulos Ynnis Mnolopoulos Ynnis Theodoridis Vssilis Tsotrs Deprtment of Informtis Aristotle University of Thessloniki Thessloniki,

More information

WORKSHOP 3 FRAME MODEL CREATION USING CURVES, AND ANALYSIS

WORKSHOP 3 FRAME MODEL CREATION USING CURVES, AND ANALYSIS WORKSHOP 3 FRAME MODEL CREATION USING CURVES, AND ANALYSIS WS3-1 WS3-2 Workshop Ojetives Moel simple frme struture using geometri urves n 1D Br elements. The frme moel is to e onstrine using pin restrints,

More information

Structure in solution spaces: Three lessons from Jean-Claude

Structure in solution spaces: Three lessons from Jean-Claude Struture in solution spes: Three lessons from Jen-Clue Dvi Eppstein Computer Siene Deprtment, Univ. of Cliforni, Irvine Conferene on Meningfulness n Lerning Spes: A Triute to the Work of Jen-Clue Flmgne

More information

SAS Event Stream Processing 5.1: Using SAS Event Stream Processing Studio

SAS Event Stream Processing 5.1: Using SAS Event Stream Processing Studio SAS Event Strem Proessing 5.1: Using SAS Event Strem Proessing Stuio Overview to SAS Event Strem Proessing Stuio Overview SAS Event Strem Proessing Stuio is we-se lient tht enles you to rete, eit, uplo,

More information

Chapter 2. 3/28/2004 H133 Spring

Chapter 2. 3/28/2004 H133 Spring Chpter 2 Newton believe tht light ws me up of smll prticles. This point ws ebte by scientists for mny yers n it ws not until the 1800 s when series of experiments emonstrte wve nture of light. (But be

More information

Kinetic Collision Detection: Algorithms and Experiments

Kinetic Collision Detection: Algorithms and Experiments Kineti Collision Detetion: Algorithms n Experiments Leonis J. Guis Feng Xie Li Zhng Computer Siene Deprtment, Stnfor University Astrt Effiient ollision etetion is importnt in mny rooti tsks, from high-level

More information

SMALL SIZE EDGE-FED SIERPINSKI CARPET MICROSTRIP PATCH ANTENNAS

SMALL SIZE EDGE-FED SIERPINSKI CARPET MICROSTRIP PATCH ANTENNAS Progress In Eletromgnetis Reserh C, Vol. 3, 195 22, 28 SMALL SIZE EDGE-FED SIERPINSKI CARPET MICROSTRIP PATCH ANTENNAS W.-L. Chen nd G.-M. Wng Rdr Engineering Deprtment Missile Institute of Air Fore Engineering

More information

Compiling a Parallel DSL to GPU

Compiling a Parallel DSL to GPU Compiling Prllel DSL to GPU Rmesh Nrynswmy Bdri Gopln Synopsys In. Synopsys 2012 1 Agend Overview of Verilog Simultion Prllel Verilog Simultion Algorithms Prllel Simultion Trdeoffs on GPU Chllenges Synopsys

More information