Prefetching in an Intelligent Memory Architecture Using a Helper Thread

Yan Solihin, Jaejin Lee, and Josep Torrellas
University of Illinois at Urbana-Champaign; Michigan State University
jlee@cse.msu.edu

Abstract

Data prefetching is a popular technique for tolerating long memory access latencies. In this paper, we introduce a novel type of prefetching: memory-side correlation prefetching implemented in a user-level thread. The prefetching thread runs on a general-purpose processor embedded in the main memory. By allocating the correlation table in main memory, we can afford the large space required by the table. In addition, the scheme can be supported with few modifications to the L2 cache and no modification to the main processor core. We introduce a new organization of the correlation table and a new prefetching algorithm that enable fast and accurate far-ahead prefetching with high coverage. Overall, our evaluation shows that the algorithm effectively prefetches irregular applications, speeding up three applications by an average of [...]. Furthermore, our scheme can work synergistically with a conventional processor-side prefetcher to deliver an average speedup of [...].

1 Introduction

Data prefetching is a popular technique to tolerate long memory access latencies. There have been many proposals using a helper thread to help prefetching for the main thread, such as [12, 15]. These proposals have focused on either SMT or CMP platforms. In this paper, we propose a prefetching thread scheme that is suitable for implementation in an Intelligent Memory Architecture (IMA). In IMA, the memory system is augmented with one or more memory processors. The nature of the problems in IMA is quite different from that in SMT or CMP platforms. First, in SMT/CMP, Processor-Side prefetching is used, while in IMA, Memory-Side prefetching is used, because prefetch requests are generated by the processor in the main memory. Secondly, communication between the threads is cheap in SMT/CMP, while it is expensive in IMA.
Thus, a suitable prefetching scheme is one that operates autonomously and that can be effective with coarse-grain communication between the prefetching and the main threads. In this work, we implement the prefetcher as a user-level thread that can prefetch irregular applications effectively using correlation prefetching algorithms. The only communication needed by the prefetching thread is the miss address stream of the main thread.

(This work was supported in part by the National Science Foundation under grants CCR [...], EIA [...], and EIA [...], by DARPA under grant F C-0078, and by NCSA, Michigan State University, and gifts from IBM and Intel.)

Memory-side prefetching is attractive for several reasons. First, it eliminates the overheads that prefetch requests and state bookkeeping introduce in the paths between the main processor and its caches. Secondly, it can be supported with very few modifications to the L2 cache and no modification to the main processor core. Thirdly, the prefetcher can exploit its proximity to the memory to its advantage. Memory-side prefetching has the additional attraction of riding the technology trend of increased chip integration. Indeed, popular platforms like PCs are being equipped with graphics engines in the memory system [16]. Some chipsets, like NVIDIA's nForce [13], even integrate a powerful processor in the North Bridge chip. Similar engines can be provided for prefetching, or existing graphics processors can be reused for prefetching when under-utilized. Moreover, there are proposals to integrate processing logic in DRAM chips, such as IRAM [8].

Using an engine for memory-side prefetching has been proposed elsewhere [1, 2, 4, 13, 14, 16, 18]. However, in most cases, these engines perform either very simple operations or highly specific operations, such as prefetching linked data structures [4, 18]. Instead, what we would like is a very flexible, general-purpose prefetcher.

While a memory-side prefetcher can support a variety of prefetching algorithms, one type that is particularly suitable is Correlation Prefetching [1, 3, 5, 11]. Correlation prefetching relies on correlation of miss addresses to predict and prefetch future misses based on the current state.
Because the only information the prefetch thread needs is the miss address stream, correlation prefetching is suitable for an IMA platform.

In the past, general correlation prefetching has been supported by hardware controllers that require a large dedicated hardware table structure [1, 3, 5, 11]. In all but one case, these controllers have been placed between the L1 and L2 caches or between the L1 and the processor. While effective, the approach has a very high hardware cost. Furthermore, it does not prefetch far enough and tends to have low coverage.

This paper introduces a novel prefetching scheme where memory-side correlation prefetching algorithms are implemented in software by using a user-level thread. The algorithms run on a general-purpose processor in the main memory system. The scheme allows prefetching algorithms to evolve with the applications, even after the computer system is shipped. In addition, the system can be supported with few modifications to the L2 cache, and no modifications to the main processor core. We introduce a new organization of the correlation table and a new correlation prefetching algorithm that enable fast and far-ahead prefetching, with high coverage and accuracy. By allocating the correlation table in main memory, we can afford the large space required by the table. We demonstrate that the software algorithm can effectively prefetch data for irregular applications. Indeed, our scheme speeds up three SPECInt2000 applications by an average of [...]. We also show that our scheme can work synergistically with a conventional processor-side prefetcher to deliver an average speedup of [...].

The rest of the paper is organized as follows: Section 2 discusses memory-side prefetching and correlation prefetching; Section 3 presents our design; Section 4 discusses our evaluation setup; Section 5 evaluates our design; and Section 6 concludes.

2 Related Issues

2.1 Memory-Side Prefetching

Memory-Side prefetching occurs when prefetching is initiated by one or a set of engines that reside in or beside the main memory (definitely beyond any memory bus). Chip manufacturers have integrated hardwired controllers that probably recognize very simple sequences like strides, such as NVIDIA's DASP engine in the North Bridge chip [13] and Intel's prefetch cache in its i860 chipset. In this paper, we propose to use a simple general-purpose memory processor for memory-side prefetching. Although this idea is applicable to a generic memory system, we will illustrate it on a PC-like memory system depicted in Figure 1-(a). The memory processor can be placed in several places, such as in the North Bridge (Memory Controller) chip (1), or in the DRAM chips (2).
The advantages of the first case are that it is simple to support, because the DRAM interface is not modified, and that the memory processor can be employed for other uses, such as a graphics engine. The second case, although more complicated to support, has the advantage of lower memory access latency and higher memory bandwidth due to higher integration. In this paper, we study the performance potential of the DRAM case.

Memory- and processor-side prefetching are not the same as Push and Pull (or on-demand) prefetching [18], respectively. Push prefetch occurs when prefetched data is sent to a cache or processor that has not requested it, while pull prefetch is the opposite. Clearly, a memory prefetcher can act as a pull prefetcher, by simply storing the prefetched data in a local buffer and supplying it to the processor on demand. In general, however, memory-side prefetching is most interesting when it performs push prefetching to the caches of the processor, because it can hide a larger fraction of memory access latency.

Figure 1: Architecture of the system (a), and actions of the prefetches (b). Panel (a) shows the CPU, the L1 and L2 caches, the North Bridge chip (1), and the DRAM memory (2). Panel (b) shows the steps 1: miss i, 2: lookup, 3: prefetch j, k.

In our system, the memory processor observes the requests from the main processor that reach main memory. Based on them, and after examining some internal state, the memory processor prefetches other lines that it expects the main processor to need in the future (Figure 1-(b)). In this paper, we concentrate on push prefetching into the L2 cache. Since the memory processor only sees L2 cache miss streams, it aims to eliminate L2 cache misses by pushing the prefetched data into the L2 cache. L2 cache miss penalty is the largest component of memory access latency, and it is the hardest to hide, even by an out-of-order processor.

Our scheme is inexpensive to support. The main processor core does not need to be modified at all. The L2 cache needs to have the following supports. First, as in many other systems [4, 7], the L2 cache controller has to be able to accept lines from the memory system that it has not requested. To do so, the L2 has to assign unused Miss Status Handling Registers (MSHRs) [10] to such lines.
Secondly, if the L2 has a pending request for the same line when a prefetch arrives, the prefetch simply steals the MSHR and updates the cache as if it were the reply. Finally, a prefetched line arriving at L2 is dropped in the following cases: the L2 cache already has a copy of the line, the write-back queue has a copy of the line because the L2 is trying to write it back to memory, all MSHRs are full, or all the lines in the set where the prefetch line wants to go are in pending state.

2.2 Correlation Prefetching

Correlation Prefetching uses the current state of the reference or miss stream to predict and prefetch future misses. Two popular correlation schemes are the Stride-Based and Pair-Based schemes. The former tries to find a stride pattern in the miss stream and prefetch all the locations that would be accessed if the pattern continues in the future. The latter tries to identify correlation between pairs of misses, for example between a miss and its immediate successor. It basically records a sequence of miss addresses in a table, and later, when it encounters the head of the sequence, it looks up the table and prefetches the rest of the sequence.

What makes pair-based schemes attractive is their general applicability, i.e., they work for any miss sequences that repeat. This is true for regular applications and for a wide range of irregular applications such as those that operate on sparse matrices and linked data structures. Furthermore, the schemes can be employed without any compiler support or changes in the application binaries.

Pair-based correlation prefetching has only been studied using hardware implementation of prefetch engines [1, 3, 5, 11, 17], usually by placing the engine between the L1 and L2 cache [3, 5, 11, 17]. These studies have demonstrated the applicability of pair-based correlation prefetching on a wide variety of applications. However, they also reveal shortcomings of the approach. One critical problem is that, to be effective, it needs a large storage space to match the footprints of the applications. One and two Megabytes of dedicated on-chip SRAM tables have been proposed [5, 11], while some applications with larger footprints even need a 7.6 MB off-chip SRAM table [11]. Furthermore, it does not prefetch far enough and has low coverage (unless it is tightly coupled to the main processor and uses more fine-grain information [11]). For example, for each miss, Joseph and Grunwald only store immediate successors [5]. The coverage is low because it needs one miss to trigger the prefetcher to prefetch the successor of the miss. At best only half of the misses can be eliminated. This scheme uses a wide table that stores many successors per miss and continuously rebuilds the table to increase the coverage. However, it causes excessive useless prefetches.

3 Proposed Scheme

Pair-based correlation prefetching is suitable for our memory-side prefetching system to support because it has general applicability and can be supported inexpensively. We show that shortcomings of the current correlation prefetching schemes can be eliminated by improving the correlation algorithms and implementing them in software.
The algorithms described are implemented in a prefetching thread running on the memory processor. The code for the prefetching thread is written in C and hand-optimized for minimal prefetch response and occupancy time. In the following sections, we discuss the concepts (Section 3.1), the architecture (Section 3.2), pair-based correlation prefetching algorithms (Section 3.3), and conventional processor-side prefetching (Section 3.4).

3.1 Concepts

Prefetching algorithms are implemented as a user-level helper thread that we call the prefetching thread. The actions of the memory processor are determined by the behavior of the prefetching thread that we implement. The operation of the prefetching thread can be conceptually divided into two phases: learning and prefetching. In the learning phase, the prefetching thread records the L2 read and write miss patterns that it observes in a correlation table, one miss at a time. In the prefetching phase, every time that the prefetching thread sees a miss, it looks up the correlation table and prefetches several memory lines to the L2 cache of the main processor. No action is taken on a write-back memory access. In practice, as in [5], we found that combining the learning and prefetching phases enables the correlation table to quickly learn new patterns and provides the best performance in most cases (Figure 2).

Figure 2: Timing of the prefetching thread. The timeline runs from the point where the miss address is available, through the prefetching phase (ending when the prefetch addresses are available, which bounds the response time), to the end of the learning phase (when the handler finishes processing, which bounds the occupancy time).

The prefetching algorithm can be characterized by its response time and occupancy time (Figure 2). The response time is defined as the time from when the prefetching thread obtains a miss address until it produces the prefetch addresses. The occupancy time is the time the prefetching thread is busy and cannot process another miss address. As can be seen in the figure, the prefetching phase is always executed before the learning phase to minimize the response time. For the software implementation to be viable, the occupancy time has to be smaller than the average time between two consecutive L2 cache misses. Also, for best performance, the response time needs to be as small as possible.
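The viability condition above can be illustrated with a small timing model. This is only a sketch under stated assumptions: the function name `simulate_prefetch_thread`, the single-handler queueing model, and the example cycle counts are hypothetical and do not come from the paper's simulator.

```python
def simulate_prefetch_thread(miss_cycles, response, occupancy):
    """Model one prefetching thread that handles misses in arrival order.

    miss_cycles: sorted arrival times (in cycles) of L2 miss addresses.
    response:    cycles from starting a miss until prefetch addresses are ready.
    occupancy:   cycles the thread stays busy per miss (prefetching + learning).
    Returns one (ready_cycle, wait_cycles) pair per miss.
    """
    if response > occupancy:
        raise ValueError("response time cannot exceed occupancy time")
    free_at = 0          # cycle at which the thread can accept the next miss
    results = []
    for arrival in miss_cycles:
        start = max(arrival, free_at)      # wait while the thread is still busy
        results.append((start + response, start - arrival))
        free_at = start + occupancy        # learning phase ends the handler
    return results


# Hypothetical example: misses arrive every 200 cycles.
ok = simulate_prefetch_thread([0, 200, 400, 600], response=30, occupancy=150)
slow = simulate_prefetch_thread([0, 200, 400, 600], response=30, occupancy=250)
```

With an occupancy below the inter-miss time, no miss ever waits. With an occupancy above it, the wait grows with every miss, which is why the occupancy time has to stay below the average time between consecutive misses.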
By using a prefetching thread that stores the correlation table in the main memory, we eliminate the high hardware cost required by the table in the traditional implementation. We further address the inadequacies of traditional correlation prefetching, namely low prefetching coverage and not prefetching far enough, by improving the correlation algorithms (Section 3.3).

3.2 Architecture of the System

When we integrate the memory processor in the DRAM chips, the DRAM chips and possibly the DRAM interface need to be modified. Extra complexities in handling multiple DRAM chips must also be addressed. Our goal in this paper is to study the performance potential of this case. Consequently, we abstract away the implementation complexity of integrating the processor in the DRAM by assuming a single-chip main memory with a single memory processor in it (Figure 3). The key communication occurs through queues 1, 2, and 3. Miss requests from the main processor are deposited in queue 1 and then in queue 2. In the learning phase, the memory processor uses the entries in queue 2 to build its state. In the prefetching phase, the memory processor uses the entries in queue 2 and its state to generate addresses to prefetch. The

lines prefetched are deposited in queue 3. If the memory processor suffers a cache miss on its correlation table structure, it accesses the DRAM directly. Queue 4 is in the replying path from memory to the main processor.

Figure 3: Microarchitecture of a DRAM chip that includes a memory processor used for correlation prefetching.

3.3 Pair-Based Correlation Algorithms

We now discuss the pair-based correlation prefetching algorithms. We consider two different organizations for the correlation table: a basic one that does not allow data replication and a more advanced one that allows replication. Their use gives rise to different algorithms. We consider them in turn.

Pair-Based Algorithms with Basic Table Organization

Each row in this table stores the tag of the miss address, and the addresses of a set of immediate successor misses stored in MRU order. We consider two algorithms that use this basic organization: Base and Chain.

Base follows the scheme proposed by Joseph and Grunwald [5]. For any given miss, Base is only interested in prefetching immediate successor misses. The parameters of the algorithm are the number of immediate successors predicted (NumSucc), the number of misses that the correlation table can store predictions for (NumRows), and the associativity of the correlation table (Assoc).

Base is illustrated in Figure 4-(a). It shows two snapshots of the correlation table at the point that the corresponding miss trace has been consumed (i and ii). In the example, NumSucc is 2, NumRows is 4, and Assoc is 1. Within a row, successors are replaced using an LRU replacement policy. As in Joseph and Grunwald's study [5], we find that an LRU replacement policy for the successors in each row works best. The figures show the successors in MRU order from left to right. In the learning phase, the processor keeps a pointer to the row of the last miss observed. When a miss occurs, its address is placed as one of the immediate successors of the last miss, and a new row is allocated for the new miss unless an entry for the address already exists.
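The Base table organization and its learning step can be sketched as follows. This is a minimal illustration, not the authors' C code: the class name `BaseTable`, the use of a fully associative table with LRU row replacement (instead of the Assoc parameter), and the single-letter miss trace are all assumptions made for the example.

```python
from collections import OrderedDict


class BaseTable:
    """Sketch of a Base-style correlation table (hypothetical names)."""

    def __init__(self, num_rows=4, num_succ=2):
        self.num_rows = num_rows      # NumRows
        self.num_succ = num_succ      # NumSucc
        self.rows = OrderedDict()     # miss address -> successors, MRU first
        self.last = None              # address of the last miss observed

    def prefetch(self, addr):
        """Prefetching phase: return the immediate successors, MRU first."""
        return list(self.rows.get(addr, []))

    def learn(self, addr):
        """Learning phase: record addr as MRU successor of the last miss."""
        if self.last is not None and self.last in self.rows:
            succ = self.rows[self.last]
            if addr in succ:
                succ.remove(addr)
            succ.insert(0, addr)          # MRU position
            del succ[self.num_succ:]      # drop the LRU successor if needed
        if addr in self.rows:
            self.rows.move_to_end(addr)   # existing row becomes most recent
        else:
            if len(self.rows) >= self.num_rows:
                self.rows.popitem(last=False)   # evict the LRU row (assumption)
            self.rows[addr] = []
        self.last = addr

    def on_miss(self, addr):
        """Prefetch first, then learn, to keep the response time short."""
        predicted = self.prefetch(addr)
        self.learn(addr)
        return predicted


table = BaseTable(num_rows=4, num_succ=2)
predictions = [table.on_miss(a) for a in "abcadc"]   # hypothetical miss trace
```

After this trace, a later miss on `a` would return `d` and then `b`, since `d` followed `a` most recently.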
In the prefetching phase (iii), when a miss is observed, the processor finds the corresponding row and prefetches all the NumSucc immediate successors, starting from the MRU one.

Since Base only prefetches immediate successors, its coverage and latency hiding capabilities are limited. To improve this, we propose the Chain algorithm, which for every miss prefetches multiple levels of successors. The algorithm takes one extra parameter called NumLevels, which is the number of levels of successors prefetched. The algorithm is illustrated in Figure 4-(b). In the learning phase, Chain is identical to Base (i and ii). However, Chain does more work in the prefetching phase (iii). After prefetching the row of immediate successors, it takes the most recently used successor among them and indexes the correlation table with its address. If the entry is found, it prefetches all NumSucc successors there. Then, it takes the most recently used successor in that row and repeats the process NumLevels-1 times. As an example, suppose that a miss on a given line occurs (iii). The memory processor first prefetches the two immediate successors stored in that line's row. Then, it takes the MRU entry, looks up the table, and prefetches that entry's successor.

While improving the coverage and far-ahead prefetching capability over Base, Chain has two limitations. One limitation is that the response time of the algorithm is high. To issue prefetches in response to a miss, it needs to make NumLevels accesses to different rows in the table, each possibly involving a low-associative search and potentially causing a cache miss. The second limitation is that it does not prefetch the correct MRU successors of each level of successors. Instead, it only prefetches successors found along the MRU path.

Pair-Based Algorithms with Replicated Table Organization

Each row in this table stores the tag of the miss address, and NumLevels levels of successors. Each level contains NumSucc addresses, which are MRU-ordered. We propose a new algorithm called Replicated that exploits this table organization. Replicated takes the same parameters as Chain. In the learning phase, NumLevels pointers to the table are kept for efficient access, pointing to the rows for the address of the last miss, second last, and so on.
When a miss occurs, its address is recorded in the correct position of MRU successors of the last few misses by using these pointers. Figure 4-(c) illustrates the algorithm. In the example, NumSucc is 2, NumRows is 4, Assoc is 1, and NumLevels is 2. The figure shows two snapshots of the correlation table in the learning phase at the point where the corresponding miss trace has been consumed (i and ii). The figure also shows the position of the two pointers, and the algorithm in the prefetching phase (iii).

Note that this organization solves the two problems of Chain. First, the response time is much shorter. We can prefetch several levels of successors with a single row access, possibly with only one cache miss. In fact, we shift some computation from the prefetching phase, which is the critical phase, to the learning phase. Now the learning phase needs to update several rows in the table. However, the rows are most likely still in the cache and, since we keep the pointers to the entries of the last few miss addresses, the associative search is avoided. Secondly, by grouping together all the successors from a given level, we can identify the correct MRU successors from that level, yielding higher accuracy.
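The difference between Chain and Replicated described above can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the authors' implementation: the class names, the unbounded dictionary used in place of a table with NumRows and Assoc, and the miss trace are all hypothetical.

```python
def touch(successors, addr, limit):
    """Move addr to the MRU (front) slot, keeping at most `limit` entries."""
    if addr in successors:
        successors.remove(addr)
    successors.insert(0, addr)
    del successors[limit:]


class ChainTable:
    """Basic organization: one level of successors per row."""

    def __init__(self, num_succ=2, num_levels=2):
        self.num_succ, self.num_levels = num_succ, num_levels
        self.rows, self.last = {}, None

    def learn(self, addr):
        if self.last is not None:
            touch(self.rows[self.last], addr, self.num_succ)
        self.rows.setdefault(addr, [])
        self.last = addr

    def prefetch(self, addr):
        """Follow the MRU path; returns (addresses, row accesses)."""
        out, accesses = [], 0
        for _ in range(self.num_levels):
            accesses += 1                 # each level needs its own row search
            row = self.rows.get(addr)
            if not row:
                break
            out.extend(row)
            addr = row[0]                 # continue from the MRU successor
        return out, accesses


class ReplicatedTable:
    """Replicated organization: NumLevels successor lists per row."""

    def __init__(self, num_succ=2, num_levels=2):
        self.num_succ, self.num_levels = num_succ, num_levels
        self.rows = {}
        self.recent = []                  # last misses, most recent first

    def learn(self, addr):
        # recent[i] is the miss seen i+1 steps ago, so addr is its level-i successor.
        for level, prev in enumerate(self.recent):
            touch(self.rows[prev][level], addr, self.num_succ)
        self.rows.setdefault(addr, [[] for _ in range(self.num_levels)])
        self.recent = [addr] + self.recent[: self.num_levels - 1]

    def prefetch(self, addr):
        """Single row access; returns (addresses, row accesses)."""
        row = self.rows.get(addr, [])
        return [a for level in row for a in level], 1


chain, repl = ChainTable(), ReplicatedTable()
for miss in "abcadedx":                   # hypothetical miss trace
    chain.learn(miss)
    repl.learn(miss)
chain_result = chain.prefetch("a")
repl_result = repl.prefetch("a")
```

On this trace, Chain reaches its second level through the row of `d` and therefore picks up `x`, which never occurred two misses after `a`, and it needs two row accesses. Replicated returns the true second-level successors `e` and `c` with a single row access.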

Figure 4: Pair-based correlation algorithms: Base (a), Chain (b), and Replicated (c). Each panel shows the software correlation table after parts (i) and (ii) of the miss trace, and the prefetches issued on a miss (iii).

Characteristics                                           Base    Chain       Replicated
Levels of successors prefetched                           1       NumLevels   NumLevels
Full MRU ordering for each level?                         Yes     No          Yes
Num. row accesses in the prefetching phase (SEARCH)       1       NumLevels   1
Num. row accesses in the learning phase (NO SEARCH)       1       1           NumLevels
Response time                                             Low     High        Low
Space requirement (for constant number of prefetches)     [...]   [...]       [...]

Table 1: Comparing the different pair-based algorithms.

Algorithm Comparison

Table 1 compares the three pair-based schemes. From the table, we see that the Replicated algorithm tries to solve problems in current correlation prefetching algorithms: it looks far ahead by prefetching several levels of successors, thereby improving coverage, while keeping high accuracy by prefetching the correct MRU successors in each level. Its only shortcoming is its high space requirement for the correlation table. Fortunately, this is a minor issue, since the table is allocated in the main memory. The response time is better with the Replicated algorithm than with the Chain algorithm. The handler in Replicated runs very efficiently because cache lines are well utilized. Note that all the correlation algorithms could be implemented in hardware. However, Replicated is very suitable for software implementation because it has low response time, far-ahead prefetching capability, and uses cache lines well.

3.4 Conventional Prefetching

Previous studies found that placing a stride-based prefetcher as a front end of a pair-based prefetcher makes pair-based prefetching more effective [3, 17]. We exploit this finding by including processor-side prefetching in the form of a hardware multi-stream sequential prefetcher at the L1 cache.
The prefetcher has similar capabilities to stream buffers [6], except that the prefetch lines are put directly in the L1 cache. In our system, we assume that the memory controller can distinguish the prefetches issued by the processor-side prefetcher from regular misses. The memory controller chooses not to pass such prefetches to the memory processor. As a result, in general, the processor-side prefetcher targets the regular misses while the memory-side prefetcher targets the irregular ones.

4 Evaluation Environment

Applications. To evaluate our prefetching scheme, we use three mostly irregular memory-intensive applications from the SPECInt2000 suite. Irregular applications are hardly amenable to compiler-based prefetching. Consequently, they are the obvious target for the type of prefetching that we propose. We choose Gap, Mcf, and Parser. Gap uses a subset of the test input set, Mcf uses the test input set, and Parser uses a subset of the train input set.

Simulation Environment. The evaluation is performed using execution-driven simulation. Our environment is based on an extension to MINT that supports dynamic superscalar processor models with register renaming, branch prediction, and non-blocking memory operations [9]. The architecture modeled is that of a high-end PC with

a memory processor that is integrated in the DRAM, following the microarchitecture of Figure 3. Table 2 shows the parameters used for each component of the architecture. The architecture is modeled cycle by cycle, including contention effects. In the simulation, both the application thread and the prefetching thread are run simultaneously. We model the contention between the two threads on memory subsystems that are shared (memory controller, DRAM channels, DRAM banks, etc.). The simulation includes all overheads incurred by running the two threads on different processors.

Main Proc
  6-issue dynamic, 1.6 GHz. Int, fp, ld/st FU: 4, 4, 2.
  Pending ld/st: 8/16. Branch penalty: 12 cycles.
  L1 data: write-back, 16 KB, 2-way, 32-B line, 3-cycle hit RT.
  L2 data: write-back, 512 KB, 4-way, 64-B line, 19-cycle hit RT.
  RT memory latency: 243 cycles (row miss), 208 cycles (row hit).
  Main memory bus: split-transaction, 8-B wide, 400 MHz, 3.2 GB/sec peak.
Mem Proc in DRAM
  2-issue dynamic, 800 MHz. Int, fp, ld/st FU: 2, 2, 1.
  Pending ld/st: 4/4. Branch penalty: 6 cycles.
  L1 data: write-back, 32 KB, 2-way, 32-B line, 4-cycle hit RT.
  RT memory latency: 56 cycles (row miss), 21 cycles (row hit).
  Internal DRAM data bus: 32-B wide, 800 MHz, 25.6 GB/sec.
DRAM parameters
  Dual channel; each channel 2-B wide, 800 MHz; total 3.2 GB/sec peak.
  Random access time (tRAC) 45 ns; from Mem Controller (tsystem) 60 ns.
Other
  Depth of queues 1 through 4: [...]

Table 2: Parameters of the simulated architecture. Latencies correspond to contention-free conditions. RT stands for round-trip from the processor. All cycles are 1.6 GHz cycles. A 512-KB L2 cache is chosen for the main processor because we run small inputs for the applications.

Figure 5: Characterizing the predictability of misses. The figure plots the percentage of correct predictions for Seq1, Seq4, Base, and Seq4+Base on Gap, Mcf, Parser, and their average.

Algorithm Parameters. Table 3 shows the default parameter values that we use for the algorithms described in Section 3.3. For the Base algorithm, we use values similar to those that Joseph and Grunwald used for their system [5] to make the comparison easier. For all the algorithms, we use NumRows = 64K, which results in a table of size 1.3 MBytes, 0.66 MBytes, and 1.8 MBytes for Base, Chain, and Repl, respectively.
These sizes are very tolerable, since the table is a plain software data structure that is stored in main memory, is dynamically allocated, and is cached by the memory processor. The conventional prefetching discussed in Section 3.4 takes two parameters: the number of streams it is able to prefetch simultaneously (NumSeq) and the number of prefetches that it issues per miss in a sequence observed (NumPref). We implement this algorithm in hardware in the L1 cache (Conven4) and also in software running on the memory processor (Seq1 and Seq4).

Algorithm                Label      Parameter Values
Base                     Base       NumSucc = 4, Assoc = 4
Chain                    Chain      NumSucc = 2, Assoc = 2, NumLevels = 3
Replicated               Repl       NumSucc = 2, Assoc = 2, NumLevels = 3
Conventional 1-Stream    Seq1       NumSeq = 1, NumPref = 6
Conventional 4-Stream    Seq4       NumSeq = 4, NumPref = 6
Conventional 4-Stream    Conven4    NumSeq = 4, NumPref = 6

Table 3: Parameter values used in the algorithms.

5 Evaluation

To evaluate our prefetching scheme, we first characterize the behavior of applications (Section 5.1) and then compare the performance of different algorithms (Section 5.2).

5.1 Characterizing Application Behavior

For memory-side correlation prefetching to be effective, the miss address streams have to be predictable. In this experiment, shown in Figure 5, we record the fraction of L2 cache misses that are correctly predicted. For a sequential scheme, this means that the upcoming address exactly matches the one predicted, while for a pair-based scheme, the upcoming address matches one of the predicted successors. The thread does not perform prefetching here; it only observes the addresses of all L2 cache misses. We try stride-based schemes that detect up to one stream (Seq1) and four streams (Seq4), the Base algorithm, and the combination. The figure shows that the miss stream is largely predictable, with Seq4, Base, and Seq4+Base correctly predicting roughly 40%, 70%, and 80% of the misses on average, respectively. However, the predictability of each application differs. For example, Mcf does not have sequential patterns, while Parser has mostly sequential patterns, and Gap is mixed.

Seq4 always outperforms Seq1, indicating that multiple-stream support is necessary for a sequential scheme. The figure shows that in all applications, Base is almost as good as the combination Seq4+Base. This is because a correlation table is able to detect both sequential and irregular patterns, as long as the patterns repeat. Once the table learns a pattern, it can predict it effectively. However, it is still beneficial to have a multi-stream sequential prefetcher at the processor side for several reasons: it does not need learning, it can be cheaply implemented, and it can hide the full memory latency if integrated with the L1 cache. Furthermore, it splits the misses into regular and irregular streams, and by tackling the regular one, it removes some load from the memory prefetcher.

Figure 6: Characterizing the time between consecutive misses. The figure plots the percentage of misses in the bins [0,80), [80,200), [200,280), and [280,Infinity) for Gap, Mcf, Parser, and their average.

Figure 7: Execution time of the different algorithms. The figure plots the normalized execution time of NoPref, Conven4, Base, Chain, Repl, and Conven4+Repl for Gap, Mcf, Parser, and their average, broken down into Busy, L1toL2, and PastL2.

We now consider the time between misses. Figure 6 classifies the misses according to the number of cycles between two consecutive misses arriving at the memory. The misses are grouped in bins corresponding to [0,80) 1.6 GHz processor cycles, [80,200), etc. The most significant bins in the figure are [200,280), [280,Infinity), and [0,80), which contribute on average 54%, 28%, and 18% of all miss distances, respectively. The misses with distances between 200 and 280 cycles are critical as they are both frequent and hard to hide even with out-of-order processors. Furthermore, since the round-trip memory latency is between 208 and 243 cycles, dependent misses are likely to fall in this bin. This characterization suggests that, to be on the safe side, the occupancy time of the prefetching algorithm should be less than 200 cycles. The [0,80) bin contains misses that may not give enough time for our prefetching thread to respond. Fortunately, these misses are not frequent and are likely to be overlapped with each other or with computation. Thus, they harm the performance much less than the bin size implies.
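The binning of inter-miss distances used in this characterization can be sketched as follows. The bin boundaries come from the text; the function name `classify_miss_distances` and the example arrival times are hypothetical, and the sketch assumes the arrival times are sorted.

```python
from bisect import bisect_right

BIN_EDGES = [0, 80, 200, 280]    # lower bounds, in 1.6 GHz processor cycles
BIN_LABELS = ["[0,80)", "[80,200)", "[200,280)", "[280,Infinity)"]


def classify_miss_distances(arrival_cycles):
    """Return the fraction of consecutive-miss distances falling in each bin.

    arrival_cycles must be sorted in non-decreasing order.
    """
    counts = dict.fromkeys(BIN_LABELS, 0)
    for prev, cur in zip(arrival_cycles, arrival_cycles[1:]):
        distance = cur - prev
        # bisect_right finds the last bin whose lower bound is <= distance.
        counts[BIN_LABELS[bisect_right(BIN_EDGES, distance) - 1]] += 1
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in BIN_LABELS}
    return {label: n / total for label, n in counts.items()}


# Hypothetical arrival times: distances are 50, 210, 240, and 500 cycles.
fractions = classify_miss_distances([0, 50, 260, 500, 1000])
```

A profile dominated by the [200,280) bin, as reported above, is what motivates keeping the occupancy time of the prefetching thread below 200 cycles.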
5.2 Comparing the Different Algorithms

Figure 7 compares the execution time of the applications in different cases: no prefetching (NoPref), hardware processor-side L1 prefetching as shown in Table 3 (Conven4), different software memory-side prefetching schemes as shown in Table 3 (Base, Chain, and Repl), and the combination of Conven4 and Repl (Conven4+Repl). For each application and the average, the bars are normalized to NoPref. They are broken down into miss stall time past the L2 cache (PastL2), miss stall time between the L1 and L2 caches (L1toL2), and the remaining time (Busy) that represents processor computation plus various pipeline stalls.

On average, the PastL2 time is the most significant component of the execution time, contributing about 40%, while Busy and L1toL2 follow with 35% and 25%, respectively. Thus, although our software scheme can only target L2 cache misses, we are targeting the main performance bottleneck.

The conventional scheme (Conven4) performs well on applications with some sequential patterns, such as Gap and Parser, but is ineffective in the application that has purely irregular patterns (Mcf). On average, Conven4 reduces the execution time by 10%.

The pair-based schemes show mixed performance. The Base scheme, modeled after Joseph and Grunwald's, shows limited speedups because it does not prefetch far enough. Chain performs slightly better than Base, but is limited by inaccuracy and high response time. Repl is able to reduce the execution time significantly. It outperforms both Base and Chain in all applications. Its impact comes from the nice properties of the Replicated algorithm, as discussed in Section 3.

The combined scheme (Conven4+Repl) performs the best. Its impact is significant: it removes on average 60% of PastL2 stall time, providing an average speedup of [...]. Compared to processor-side prefetching only (Conven4) with an average speedup of 1.11, and memory-side prefetching only (Repl) with an average speedup of 1.28, there is a clear synergistic effect in the combined scheme. Memory-side prefetching helps processor-side prefetching in irregular patterns, while processor-side prefetching helps in regular patterns.

Workload of the Prefetching Thread

We can gain further insight by examining the workload of the prefetching thread.
Figure 8 shows the average response time and occupancy of the prefetching thread for each of the memory-side prefetching algorithms. The latencies are shown in 1.6 GHz cycles and correspond to the average of all applications. Each bar is broken down into computation time (Busy) and memory stall time (Mem). The numbers on top of each bar show the average IPC of the prefetching thread. The IPC is calculated as the number of instructions divided by the number of memory processor cycles. The figure shows that, for all the algorithms, the occupancy time is less than 200 cycles, showing the viability of the software implementation. Chain and [...] have the lowest occupancy time. Due to the fewer associative searches and the better cache use, [...] has only slightly higher occupancy time compared to Chain, despite performing more table updates. The response time is very important for prefetching effectiveness. The figure shows that [...] has the lowest response time; its value is around 30 cycles.

Figure 8: Response and occupancy time of the prefetching thread for each of the prefetching algorithms. (Plot: number of processor cycles per algorithm, each bar split into Mem and Busy, with Response Time and Occupancy Time panels.)

6 Conclusions

This paper introduced memory-side correlation-based prefetching implemented in a user-level thread. The scheme runs on a general-purpose processor in the main memory. The scheme can be supported with few modifications to the L2 cache and no modification to the main processor. We introduced a new organization of the correlation table and a new correlation prefetching algorithm that enable fast and accurate far-ahead prefetching with high coverage. Overall, our scheme effectively prefetched irregular applications, speeding up three SPECInt2000 applications by an average of [...]. Furthermore, our scheme can work synergistically with a conventional processor-side prefetcher to deliver an average speedup of [...].

References

[1] T. Alexander and G. Kedem. Distributed Predictive Cache Design for High Performance Memory Systems. In Proceedings of the 2nd HPCA, Feb.
[2] J.B. Carter, W.C. Hsieh, L.B. Stoller, M.R. Swanson, L. Zhang, E.L. Brunvand, A. Davis, C.-C. Kuo, R. Kuramkote, M.A. Parker, L. Schaelicke, and T. Tateyama. Impulse: Building a Smarter Memory Controller. In Proceedings of the 5th HPCA, January.
[3] M.J. Charney and A.P. Reeves. Generalized Correlation Based Hardware Prefetching. Technical Report EE-CEG-95-1, Cornell University, Feb.
[4] C.J. Hughes. Prefetching Linked Data Structures in Systems with Merged DRAM-Logic. Master's thesis, University of Illinois at Urbana-Champaign, May. URL: jhughes/jhughesmsthesis.pf.
[5] D. Joseph and D. Grunwald. Prefetching Using Markov Predictors. In Proceedings of the 24th ISCA, June.
[6] N.P. Jouppi. Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers. In Proceedings of the 17th ISCA.
[7] D. Koufaty and J. Torrellas. Comparing Data Forwarding and Prefetching for Communication-Induced Misses in Shared-Memory MPs. In Proceedings of the ICS, July.
[8] C. Kozyrakis, S. Perissakis, D. Patterson, T. Anderson, K. Asanovic, N. Cardwell, R. Fromm, J. Golbus, B. Gribstad, K. Keeton, R. Thomas, N. Treuhaft, and K. Yelick. Scalable Processors in the Billion-Transistor Era: IRAM. IEEE Computer, September.
[9] V. Krishnan and J. Torrellas. An Execution-Driven Framework for Fast and Accurate Simulation of Superscalar Processors. In International Conference on Parallel Architectures and Compilation Techniques (PACT), October.
[10] D. Kroft. Lockup-free Instruction Fetch/Prefetch Cache Organization. In Proceedings of the 8th ISCA, pages 87 85.
[11] A.-C. Lai, C. Fide, and B. Falsafi. Dead-Block Prediction and Dead-Block Correlating Prefetchers. In Proceedings of the 28th ISCA.
[12] C.-K. Luk. Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors. In Proceedings of the 28th ISCA.
[13] NVIDIA.
[14] R. Cooksey, D. Colarelli, and D. Grunwald. Content-based Prefetching: Initial Results. In The 2nd Workshop on Intelligent Memory Systems, Nov.
[15] A. Roth and G.S. Sohi. Speculative Data-Driven Multithreading. In Proceedings of the 7th HPCA, pages 37-48, Jan.
[16] Sony Computer Entertainment Inc.
[17] T. Sherwood, S. Sair, and B. Calder. Predictor-Directed Stream Buffers. In Proceedings of the 33rd MICRO, Dec.
[18] C.-L. Yang and A.R. Lebeck. Push vs. Pull: Data Movement for Linked Data Structures. In Proceedings of the 2000 ICS, May.

Acknowledgement

We thank James Tuck, Jose F. Martinez, Jose Renau, and Michael Huang for contributions to this work.
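
For readers who want to reproduce the metrics in the Figure 8 discussion, the following is a minimal sketch of the two calculations described there: the IPC of the prefetching thread (instructions divided by memory processor cycles) and the conversion of memory processor cycles into 1.6 GHz main-processor cycles. The function names, the 800 MHz memory processor clock, and all sample counts are illustrative assumptions; none of them are taken from the paper.

```python
MAIN_PROCESSOR_HZ = 1_600_000_000  # latencies are reported in 1.6 GHz cycles


def thread_ipc(instructions: int, mem_proc_cycles: int) -> float:
    """IPC of the prefetching thread: instructions / memory processor cycles."""
    if mem_proc_cycles <= 0:
        raise ValueError("mem_proc_cycles must be positive")
    return instructions / mem_proc_cycles


def to_main_cycles(mem_proc_cycles: int, mem_proc_hz: int) -> float:
    """Express a latency measured in memory processor cycles in 1.6 GHz cycles.

    mem_proc_hz is a hypothetical clock rate for the memory processor.
    """
    if mem_proc_hz <= 0:
        raise ValueError("mem_proc_hz must be positive")
    return mem_proc_cycles * MAIN_PROCESSOR_HZ / mem_proc_hz


def occupancy(busy_cycles: float, mem_cycles: float) -> float:
    """Occupancy time as the sum of computation (Busy) and stall (Mem) time."""
    return busy_cycles + mem_cycles


if __name__ == "__main__":
    # Hypothetical sample: 120 instructions in 100 memory processor cycles.
    print(thread_ipc(120, 100))
    # Hypothetical 800 MHz memory processor: 50 of its cycles in 1.6 GHz cycles.
    print(to_main_cycles(50, 800_000_000))
    print(occupancy(60.0, 40.0))
```

Keeping the clock conversion separate from the IPC calculation mirrors the text, which reports latencies in main-processor cycles but defines IPC in memory processor cycles.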

Using a User-Level Memory Thread for Correlation Prefetching

Using a User-Level Memory Thread for Correlation Prefetching Using User-Level Memory Thre for Correltion Prefething Yn Solihin Jejin Lee Josep Torrells University of Illinois t Urn-Chmpign Mihign Stte University http://iom.s.uiu.eu http://www.se.msu.eu/ jlee Astrt

More information

Distance vector protocol

Distance vector protocol istne vetor protool Irene Finohi finohi@i.unirom.it Routing Routing protool Gol: etermine goo pth (sequene of routers) thru network from soure to Grph strtion for routing lgorithms: grph noes re routers

More information

UTMC APPLICATION NOTE UT1553B BCRT TO INTERFACE PSEUDO-DUAL-PORT RAM ARCHITECTURE INTRODUCTION ARBITRATION DETAILS DESIGN SELECTIONS

UTMC APPLICATION NOTE UT1553B BCRT TO INTERFACE PSEUDO-DUAL-PORT RAM ARCHITECTURE INTRODUCTION ARBITRATION DETAILS DESIGN SELECTIONS UTMC APPLICATION NOTE UT1553B BCRT TO 80186 INTERFACE INTRODUCTION The UTMC UT1553B BCRT is monolithi CMOS integrte iruit tht provies omprehensive Bus Controller n Remote Terminl funtions for MIL-STD-

More information

Table-driven look-ahead lexical analysis

Table-driven look-ahead lexical analysis Tle-riven look-he lexil nlysis WUU YANG Computer n Informtion Siene Deprtment Ntionl Chio-Tung University, HsinChu, Tiwn, R.O.C. Astrt. Moern progrmming lnguges use regulr expressions to efine vli tokens.

More information

Greedy Algorithm. Algorithm Fall Semester

Greedy Algorithm. Algorithm Fall Semester Greey Algorithm Algorithm 0 Fll Semester Optimiztion prolems An optimiztion prolem is one in whih you wnt to fin, not just solution, ut the est solution A greey lgorithm sometimes works well for optimiztion

More information

CS 241 Week 4 Tutorial Solutions

CS 241 Week 4 Tutorial Solutions CS 4 Week 4 Tutoril Solutions Writing n Assemler, Prt & Regulr Lnguges Prt Winter 8 Assemling instrutions utomtilly. slt $d, $s, $t. Solution: $d, $s, nd $t ll fit in -it signed integers sine they re 5-it

More information

10.2 Graph Terminology and Special Types of Graphs

10.2 Graph Terminology and Special Types of Graphs 10.2 Grph Terminology n Speil Types of Grphs Definition 1. Two verties u n v in n unirete grph G re lle jent (or neighors) in G iff u n v re enpoints of n ege e of G. Suh n ege e is lle inient with the

More information

V = set of vertices (vertex / node) E = set of edges (v, w) (v, w in V)

V = set of vertices (vertex / node) E = set of edges (v, w) (v, w in V) Definitions G = (V, E) V = set of verties (vertex / noe) E = set of eges (v, w) (v, w in V) (v, w) orere => irete grph (igrph) (v, w) non-orere => unirete grph igrph: w is jent to v if there is n ege from

More information

Shared Memory Architectures. Programming and Synchronization. Today s Outline. Page 1. Message passing review Cosmic Cube discussion

Shared Memory Architectures. Programming and Synchronization. Today s Outline. Page 1. Message passing review Cosmic Cube discussion Tody s Outline Arhitetures Progrmming nd Synhroniztion Disuss pper on Cosmi Cube (messge pssing) Messge pssing review Cosmi Cube disussion > Messge pssing mhine Shred memory model > Communition > Synhroniztion

More information

Internet Routing. Reminder: Routing. CPSC Network Programming

Internet Routing. Reminder: Routing. CPSC Network Programming PS 360 - Network Progrmming Internet Routing Mihele Weigle eprtment of omputer Siene lemson University mweigle@s.lemson.eu pril, 00 http://www.s.lemson.eu/~mweigle/ourses/ps360 Reminer: Routing Internet

More information

Distributed Systems Principles and Paradigms. Chapter 11: Distributed File Systems

Distributed Systems Principles and Paradigms. Chapter 11: Distributed File Systems Distriuted Systems Priniples nd Prdigms Mrten vn Steen VU Amsterdm, Dept. Computer Siene steen@s.vu.nl Chpter 11: Distriuted File Systems Version: Deemer 10, 2012 2 / 14 Distriuted File Systems Distriuted

More information

Chapter 9. Greedy Technique. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

Chapter 9. Greedy Technique. Copyright 2007 Pearson Addison-Wesley. All rights reserved. Chpter 9 Greey Tehnique Copyright 2007 Person Aison-Wesley. All rights reserve. Greey Tehnique Construts solution to n optimiztion prolem piee y piee through sequene of hoies tht re: fesile lolly optiml

More information

Outline. Motivation Background ARCH. Experiment Additional usages for Input-Depth. Regular Expression Matching DPI over Compressed HTTP

Outline. Motivation Background ARCH. Experiment Additional usages for Input-Depth. Regular Expression Matching DPI over Compressed HTTP ARCH This work ws supported y: The Europen Reserh Counil, The Isreli Centers of Reserh Exellene, The Neptune Consortium, nd Ntionl Siene Foundtion wrd CNS-119748 Outline Motivtion Bkground Regulr Expression

More information

MITSUBISHI ELECTRIC RESEARCH LABORATORIES Cambridge, Massachusetts. Introduction to Matroids and Applications. Srikumar Ramalingam

MITSUBISHI ELECTRIC RESEARCH LABORATORIES Cambridge, Massachusetts. Introduction to Matroids and Applications. Srikumar Ramalingam Cmrige, Msshusetts Introution to Mtrois n Applitions Srikumr Rmlingm MERL mm//yy Liner Alger (,0,0) (0,,0) Liner inepenene in vetors: v, v2,..., For ll non-trivil we hve s v s v n s, s2,..., s n 2v2...

More information

Containers: Queue and List

Containers: Queue and List Continers: Queue n List Queue A ontiner in whih insertion is one t one en (the til) n eletion is one t the other en (the he). Also lle FIFO (First-In, First-Out) Jori Cortell n Jori Petit Deprtment of

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distriuted Systems Priniples nd Prdigms Christoph Dorn Distriuted Systems Group, Vienn University of Tehnology.dorn@infosys.tuwien..t http://www.infosys.tuwien..t/stff/dorn Slides dpted from Mrten vn Steen,

More information

Error Numbers of the Standard Function Block

Error Numbers of the Standard Function Block A.2.2 Numers of the Stndrd Funtion Blok evlution The result of the logi opertion RLO is set if n error ours while the stndrd funtion lok is eing proessed. This llows you to rnh to your own error evlution

More information

CICS Application Design

CICS Application Design CICS Applition Design In orer to lern whih questions hve een nswere orretly: 1. Print these pges. 2. Answer the questions. 3. Sen this ssessment with the nswers vi:. FAX to (212) 967-3498. Or. Mil the

More information

CMPUT101 Introduction to Computing - Summer 2002

CMPUT101 Introduction to Computing - Summer 2002 CMPUT Introdution to Computing - Summer 22 %XLOGLQJ&RPSXWHU&LUFXLWV Chpter 4.4 3XUSRVH We hve looked t so fr how to uild logi gtes from trnsistors. Next we will look t how to uild iruits from logi gtes,

More information

GENG2140 Modelling and Computer Analysis for Engineers

GENG2140 Modelling and Computer Analysis for Engineers GENG4 Moelling n Computer Anlysis or Engineers Letures 9 & : Gussin qurture Crete y Grn Romn Joles, PhD Shool o Mehnil Engineering, UWA GENG4 Content Deinition o Gussin qurture Computtion o weights n points

More information

Duality in linear interval equations

Duality in linear interval equations Aville online t http://ijim.sriu..ir Int. J. Industril Mthemtis Vol. 1, No. 1 (2009) 41-45 Dulity in liner intervl equtions M. Movhedin, S. Slhshour, S. Hji Ghsemi, S. Khezerloo, M. Khezerloo, S. M. Khorsny

More information

Robust internal multiple prediction algorithm Zhiming James Wu, Sonika, Bill Dragoset*, WesternGeco

Robust internal multiple prediction algorithm Zhiming James Wu, Sonika, Bill Dragoset*, WesternGeco Roust internl multiple preition lgorithm Zhiming Jmes Wu, Sonik, Bill Drgoset*, WesternGeo Summry Multiple ttenution is n importnt t proessing step for oth mrine n ln t. Tehniques for surfe- rpily in the

More information

COMPUTER EDUCATION TECHNIQUES, INC. (WEBLOGIC_SVR_ADM ) SA:

COMPUTER EDUCATION TECHNIQUES, INC. (WEBLOGIC_SVR_ADM ) SA: In orer to lern whih questions hve een nswere orretly: 1. Print these pges. 2. Answer the questions. 3. Sen this ssessment with the nswers vi:. FAX to (212) 967-3498. Or. Mil the nswers to the following

More information

CS553 Lecture Introduction to Data-flow Analysis 1

CS553 Lecture Introduction to Data-flow Analysis 1 ! Ide Introdution to Dt-flow nlysis!lst Time! Implementing Mrk nd Sweep GC!Tody! Control flow grphs! Liveness nlysis! Register llotion CS553 Leture Introdution to Dt-flow Anlysis 1 Dt-flow Anlysis! Dt-flow

More information

Internet Routing. IP Packet Format. IP Fragmentation & Reassembly. Principles of Internet Routing. Computer Networks 9/29/2014.

Internet Routing. IP Packet Format. IP Fragmentation & Reassembly. Principles of Internet Routing. Computer Networks 9/29/2014. omputer Networks 9/29/2014 IP Pket Formt Internet Routing Ki Shen IP protool version numer heder length (words) for qulity of servie mx numer remining hops (deremented t eh router) upper lyer protool to

More information

Scalable Spatio-temporal Continuous Query Processing for Location-aware Services

Scalable Spatio-temporal Continuous Query Processing for Location-aware Services Slle Sptio-temporl Continuous uery Proessing for Lotion-wre Servies iopeng iong Mohme F. Mokel Wli G. Aref Susnne E. Hmrush Sunil Prhkr Deprtment of Computer Sienes, Purue University, West Lfyette, IN

More information

COMMON FRACTIONS. or a / b = a b. , a is called the numerator, and b is called the denominator.

COMMON FRACTIONS. or a / b = a b. , a is called the numerator, and b is called the denominator. COMMON FRACTIONS BASIC DEFINITIONS * A frtion is n inite ivision. or / * In the frtion is lle the numertor n is lle the enomintor. * The whole is seprte into "" equl prts n we re onsiering "" of those

More information

Using SIMD Registers and Instructions to Enable Instruction-Level Parallelism in Sorting Algorithms

Using SIMD Registers and Instructions to Enable Instruction-Level Parallelism in Sorting Algorithms Using SIMD Registers n Instrutions to Enle Instrution-Level Prllelism in Sorting Algorithms Timothy Furtk furtk@s.ulert. José Nelson Amrl mrl@s.ulert. Roert Niewiomski niewio@s.ulert. Deprtment of Computing

More information

Class Overview. Database Design. Database Design Process. Database Design. Introduction to Data Management CSE 414

Class Overview. Database Design. Database Design Process. Database Design. Introduction to Data Management CSE 414 Introution to Dt Mngement CSE 44 Unit 6: Coneptul Design E/R Digrms Integrity Constrints BCNF Introution to Dt Mngement CSE 44 E/R Digrms ( letures) CSE 44 Autumn 08 Clss Overview Dtse Design Unit : Intro

More information

COSC 6374 Parallel Computation. Dense Matrix Operations

COSC 6374 Parallel Computation. Dense Matrix Operations COSC 6374 Prllel Computtion Dense Mtrix Opertions Edgr Griel Fll Edgr Griel Prllel Computtion Edgr Griel erminology Dense Mtrix: ll elements of the mtrix ontin relevnt vlues ypilly stored s 2-D rry, (e.g.

More information

Comparison-based Choices

Comparison-based Choices Comprison-se Choies John Ugner Mngement Siene & Engineering Stnfor University Joint work with: Jon Kleinerg (Cornell) Senhil Mullinthn (Hrvr) EC 17 Boston June 28, 2017 Preiting isrete hoies Clssi prolem:

More information

CS453 INTRODUCTION TO DATAFLOW ANALYSIS

CS453 INTRODUCTION TO DATAFLOW ANALYSIS CS453 INTRODUCTION TO DATAFLOW ANALYSIS CS453 Leture Register llotion using liveness nlysis 1 Introdution to Dt-flow nlysis Lst Time Register llotion for expression trees nd lol nd prm vrs Tody Register

More information

Paradigm 5. Data Structure. Suffix trees. What is a suffix tree? Suffix tree. Simple applications. Simple applications. Algorithms

Paradigm 5. Data Structure. Suffix trees. What is a suffix tree? Suffix tree. Simple applications. Simple applications. Algorithms Prdigm. Dt Struture Known exmples: link tble, hep, Our leture: suffix tree Will involve mortize method tht will be stressed shortly in this ourse Suffix trees Wht is suffix tree? Simple pplitions History

More information

Lesson 4.4. Euler Circuits and Paths. Explore This

Lesson 4.4. Euler Circuits and Paths. Explore This Lesson 4.4 Euler Ciruits nd Pths Now tht you re fmilir with some of the onepts of grphs nd the wy grphs onvey onnetions nd reltionships, it s time to egin exploring how they n e used to model mny different

More information

WORKSHOP 9 HEX MESH USING SWEEP VECTOR

WORKSHOP 9 HEX MESH USING SWEEP VECTOR WORKSHOP 9 HEX MESH USING SWEEP VECTOR WS9-1 WS9-2 Prolem Desription This exerise involves importing urve geometry from n IGES file. The urves re use to rete other urves. From the urves trimme surfes re

More information

Evaluating Regular Expression Matching Engines on Network and General Purpose Processors

Evaluating Regular Expression Matching Engines on Network and General Purpose Processors Evluting Regulr Expression Mthing Engines on Network n Generl Purpose Proessors Mihel Behi Wshington University Computer Siene n Engineering St. Louis, MO 63130-4899 mehi@se.wustl.eu Chrlie Wisemn Wshington

More information

McAfee Web Gateway

McAfee Web Gateway Relese Notes Revision C MAfee We Gtewy 7.6.2.11 Contents Aout this relese Enhnement Resolved issues Instlltion instrutions Known issues Additionl informtion Find produt doumenttion Aout this relese This

More information

Pattern Matching. Pattern Matching. Pattern Matching. Review of Regular Expressions

Pattern Matching. Pattern Matching. Pattern Matching. Review of Regular Expressions Pttern Mthing Pttern Mthing Some of these leture slides hve een dpted from: lgorithms in C, Roert Sedgewik. Gol. Generlize string serhing to inompletely speified ptterns. pplitions. Test if string or its

More information

Asurveyofpractical algorithms for suffix tree construction in external memory

Asurveyofpractical algorithms for suffix tree construction in external memory Asurveyofprtil lgorithms for suffix tree onstrution in externl memory M. Brsky,, U. Stege n A. Thomo University of Vitori, PO Box, STN CSC Vitori, BC, VW P, Cn SUMMAY The onstrution of suffix trees in

More information

Resource and Memory Management Techniques for the High-Level Synthesis of Software Threads into Parallel FPGA Hardware

Resource and Memory Management Techniques for the High-Level Synthesis of Software Threads into Parallel FPGA Hardware Resoure n Memory Mngement Tehniques for the High-Level Synthesis of Softwre Thres into Prllel FPGA Hrwre Jongsok Choi, Stephen Brown, n Json Anerson ECE Deprtment, University of Toronto, Toronto, ON, Cn

More information

Introduction to Algebra

Introduction to Algebra INTRODUCTORY ALGEBRA Mini-Leture 1.1 Introdution to Alger Evlute lgeri expressions y sustitution. Trnslte phrses to lgeri expressions. 1. Evlute the expressions when =, =, nd = 6. ) d) 5 10. Trnslte eh

More information

Declarative Routing: Extensible Routing with Declarative Queries

Declarative Routing: Extensible Routing with Declarative Queries elrtive Routing: Extensile Routing with elrtive Queries Boon Thu Loo 1 Joseph M. Hellerstein 1,2, Ion toi 1, Rghu Rmkrishnn3, 1 University of Cliforni t Berkeley, 2 Intel Reserh Berkeley, 3 University

More information

Cooperative Routing in Multi-Source Multi-Destination Multi-hop Wireless Networks

Cooperative Routing in Multi-Source Multi-Destination Multi-hop Wireless Networks oopertive Routing in Multi-Soure Multi-estintion Multi-hop Wireless Networks Jin Zhng Qin Zhng eprtment of omputer Siene n ngineering Hong Kong University of Siene n Tehnology, HongKong {zjzj, qinzh}@se.ust.hk

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

FASTEST METHOD TO FIND ALTERNATIVE RE-ROUTE

FASTEST METHOD TO FIND ALTERNATIVE RE-ROUTE INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 FASTEST METHOD TO FIND ALTERNATIVE RE-ROUTE 1 M.JothiLkshmi, M.S., M.Phil. 2 C.Theeendr, M.S., M.Phil. 3 M.K.Pvithr,

More information

COSC 6374 Parallel Computation. Non-blocking Collective Operations. Edgar Gabriel Fall Overview

COSC 6374 Parallel Computation. Non-blocking Collective Operations. Edgar Gabriel Fall Overview COSC 6374 Prllel Computtion Non-loking Colletive Opertions Edgr Griel Fll 2014 Overview Impt of olletive ommunition opertions Impt of ommunition osts on Speedup Crtesin stenil ommunition All-to-ll ommunition

More information

Inter-domain Routing

Inter-domain Routing COMP 631: NETWORKED & DISTRIBUTED SYSTEMS Inter-domin Routing Jsleen Kur Fll 2016 1 Internet-sle Routing: Approhes DV nd link-stte protools do not sle to glol Internet How to mke routing slle? Exploit

More information

LINX MATRIX SWITCHERS FIRMWARE UPDATE INSTRUCTIONS FIRMWARE VERSION

LINX MATRIX SWITCHERS FIRMWARE UPDATE INSTRUCTIONS FIRMWARE VERSION Overview LINX MATRIX SWITCHERS FIRMWARE UPDATE INSTRUCTIONS FIRMWARE VERSION 4.4.1.0 Due to the omplex nture of this updte, plese fmilirize yourself with these instrutions nd then ontt RGB Spetrum Tehnil

More information

Introduction. Example

Introduction. Example OMS0 Introution isjoint sets n minimum spnning trees In this leture we will strt by isussing t struture use for mintining isjoint subsets of some bigger set. This hs number of pplitions, inluing to mintining

More information

Graph Contraction and Connectivity

Graph Contraction and Connectivity Chpter 14 Grph Contrtion n Connetivity So fr we hve mostly overe tehniques for solving problems on grphs tht were evelope in the ontext of sequentil lgorithms. Some of them re esy to prllelize while others

More information

Graph theory Route problems

Graph theory Route problems Bhelors thesis Grph theory Route prolems Author: Aolphe Nikwigize Dte: 986 - -5 Sujet: Mthemtis Level: First level (Bhelor) Course oe: MAE Astrt In this thesis we will review some route prolems whih re

More information

Comparing Hierarchical Data in External Memory

Comparing Hierarchical Data in External Memory Compring Hierrhil Dt in Externl Memory Surshn S. Chwthe Deprtment of Computer Siene University of Mryln College Prk, MD 090 hw@s.um.eu Astrt We present n externl-memory lgorithm for omputing minimum-ost

More information

COSC 6374 Parallel Computation. Communication Performance Modeling (II) Edgar Gabriel Fall Overview. Impact of communication costs on Speedup

COSC 6374 Parallel Computation. Communication Performance Modeling (II) Edgar Gabriel Fall Overview. Impact of communication costs on Speedup COSC 6374 Prllel Computtion Communition Performne Modeling (II) Edgr Griel Fll 2015 Overview Impt of ommunition osts on Speedup Crtesin stenil ommunition All-to-ll ommunition Impt of olletive ommunition

More information

PROBLEM OF APOLLONIUS

PROBLEM OF APOLLONIUS PROBLEM OF APOLLONIUS In the Jnury 010 issue of Amerin Sientist D. Mkenzie isusses the Apollonin Gsket whih involves fining the rius of the lrgest irle whih just fits into the spe etween three tngent irles

More information

A matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications $

A matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications $ Informtion Systems 29 (2004) 23 46 A mthing lgorithm for mesuring the struturl similrity etween n XML oument n DTD n its pplitions $ Elis Bertino, Giovnn Guerrini, Mro Mesiti, * Diprtimento i Informti

More information

Loop Shifting and Compaction for the High-Level Synthesis of Designs with Complex Control Flow

Loop Shifting and Compaction for the High-Level Synthesis of Designs with Complex Control Flow Shifting n Comption for the High-Level Synthesis of Designs with Complex Control low Sumit Gupt Nikil Dutt Rjesh Gupt Alexnru Niolu Center for Emee Computer Systems Shool of Informtion n Computer Siene

More information

Parallelization Optimization of System-Level Specification

Parallelization Optimization of System-Level Specification Prlleliztion Optimiztion of System-Level Speifition Luki i niel. Gjski enter for Emedded omputer Systems University of liforni Irvine, 92697, US {li, gjski} @es.ui.edu strt This pper introdues the prlleliztion

More information

Midterm Exam CSC October 2001

Midterm Exam CSC October 2001 Midterm Exm CSC 173 23 Otoer 2001 Diretions This exm hs 8 questions, severl of whih hve suprts. Eh question indites its point vlue. The totl is 100 points. Questions 5() nd 6() re optionl; they re not

More information

Compiling a Parallel DSL to GPU

Compiling a Parallel DSL to GPU Compiling Prllel DSL to GPU Rmesh Nrynswmy Bdri Gopln Synopsys In. Synopsys 2012 1 Agend Overview of Verilog Simultion Prllel Verilog Simultion Algorithms Prllel Simultion Trdeoffs on GPU Chllenges Synopsys

More information

WORKSHOP 19 GLOBAL/LOCAL MODELING USING FEM FIELDS

WORKSHOP 19 GLOBAL/LOCAL MODELING USING FEM FIELDS WORKSHOP 19 GLOBAL/LOCAL MODELING USING FEM FIELDS WS19-1 WS19-2 Prolem Desription This exerise is use to emonstrte how to mp isplement results from the nlysis of glol(overll) moel onto the perimeter of

More information

A METHOD FOR CHARACTERIZATION OF THREE-PHASE UNBALANCED DIPS FROM RECORDED VOLTAGE WAVESHAPES

A METHOD FOR CHARACTERIZATION OF THREE-PHASE UNBALANCED DIPS FROM RECORDED VOLTAGE WAVESHAPES A METHOD FOR CHARACTERIZATION OF THREE-PHASE UNBALANCED DIPS FROM RECORDED OLTAGE WAESHAPES M.H.J. Bollen, L.D. Zhng Dept. Eletri Power Engineering Chlmers University of Tehnology, Gothenurg, Sweden Astrt:

More information

Lecture 8: Graph-theoretic problems (again)

Lecture 8: Graph-theoretic problems (again) COMP36111: Advned Algorithms I Leture 8: Grph-theoreti prolems (gin) In Prtt-Hrtmnn Room KB2.38: emil: iprtt@s.mn..uk 2017 18 Reding for this leture: Sipser: Chpter 7. A grph is pir G = (V, E), where V

More information

Minimal Memory Abstractions

Minimal Memory Abstractions Miniml Memory Astrtions (As implemented for BioWre Corp ) Nthn Sturtevnt University of Alert GAMES Group Ferury, 7 Tlk Overview Prt I: Building Astrtions Minimizing memory requirements Performnes mesures

More information

5 ANGLES AND POLYGONS

5 ANGLES AND POLYGONS 5 GLES POLYGOS urling rige looks like onventionl rige when it is extene. However, it urls up to form n otgon to llow ots through. This Rolling rige is in Pington sin in Lonon, n urls up every Friy t miy.

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

Outline. CS38 Introduction to Algorithms. Graphs. Graphs. Graphs. Graph traversals

Outline. CS38 Introduction to Algorithms. Graphs. Graphs. Graphs. Graph traversals Outline CS38 Introution to Algorithms Leture 2 April 3, 2014 grph trversls (BFS, DFS) onnetivity topologil sort strongly onnete omponents heps n hepsort greey lgorithms April 3, 2014 CS38 Leture 2 2 Grphs

More information

Pipeline Example: Cycle 1. Pipeline Example: Cycle 2. Pipeline Example: Cycle 4. Pipeline Example: Cycle 3. 3 instructions. 3 instructions.

Pipeline Example: Cycle 1. Pipeline Example: Cycle 2. Pipeline Example: Cycle 4. Pipeline Example: Cycle 3. 3 instructions. 3 instructions. ipeline Exmple: Cycle 1 ipeline Exmple: Cycle X X/ /W X X/ /W $3,$,$1 lw $,0($5) $3,$,$1 3 instructions 8 9 ipeline Exmple: Cycle 3 ipeline Exmple: Cycle X X/ /W X X/ /W sw $6,($7) lw $,0($5) $3,$,$1 sw

More information

Solids. Solids. Curriculum Ready.

Solids. Solids. Curriculum Ready. Curriulum Rey www.mthletis.om This ooklet is ll out ientifying, rwing n mesuring solis n prisms. SOM CUES The Som Cue ws invente y Dnish sientist who went y the nme of Piet Hein. It is simple 3 # 3 #

More information

CS481: Bioinformatics Algorithms

CS481: Bioinformatics Algorithms CS481: Bioinformtics Algorithms Cn Alkn EA509 clkn@cs.ilkent.edu.tr http://www.cs.ilkent.edu.tr/~clkn/teching/cs481/ EXACT STRING MATCHING Fingerprint ide Assume: We cn compute fingerprint f(p) of P in

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

UT1553B BCRT True Dual-port Memory Interface

UT1553B BCRT True Dual-port Memory Interface UTMC APPICATION NOTE UT553B BCRT True Dul-port Memory Interfce INTRODUCTION The UTMC UT553B BCRT is monolithic CMOS integrted circuit tht provides comprehensive MI-STD- 553B Bus Controller nd Remote Terminl

More information

SMALL SIZE EDGE-FED SIERPINSKI CARPET MICROSTRIP PATCH ANTENNAS

SMALL SIZE EDGE-FED SIERPINSKI CARPET MICROSTRIP PATCH ANTENNAS Progress In Eletromgnetis Reserh C, Vol. 3, 195 22, 28 SMALL SIZE EDGE-FED SIERPINSKI CARPET MICROSTRIP PATCH ANTENNAS W.-L. Chen nd G.-M. Wng Rdr Engineering Deprtment Missile Institute of Air Fore Engineering

More information

ORGANIZER QUICK START GUIDE

ORGANIZER QUICK START GUIDE NOTES ON USING GOTOWEBINAR GoToWeinr Orgnizers my hol Weinrs for up to 1,000 ttenees. The Weinr proess n e roken into three stges: Weinr Plnning, Weinr Presenttion n Weinr Follow-up. Orgnizers nee to first

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Accurate Indirect Branch Prediction

Accurate Indirect Branch Prediction Aurte Indiret Brnh Predition Krel Driesen nd Urs Hölzle Deprtment of Computer Siene University of Cliforni Snt Brbr, CA 9 Abstrt Indiret brnh predition is likely to beome inresingly importnt in the future

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

CS 551 Computer Graphics. Hidden Surface Elimination. Z-Buffering. Basic idea: Hidden Surface Removal

CS 551 Computer Graphics. Hidden Surface Elimination. Z-Buffering. Basic idea: Hidden Surface Removal CS 55 Computer Grphis Hidden Surfe Removl Hidden Surfe Elimintion Ojet preision lgorithms: determine whih ojets re in front of others Uses the Pinter s lgorithm drw visile surfes from k (frthest) to front

More information

An Efficient Algorithm for the Physical Mapping of Clustered Task Graphs onto Multiprocessor Architectures

An Efficient Algorithm for the Physical Mapping of Clustered Task Graphs onto Multiprocessor Architectures An Effiient Algorithm for the Physil Mpping of Clustere Tsk Grphs onto Multiproessor Arhitetures Netrios Koziris Pnyiotis Tsnks Mihel Romesis George Ppkonstntinou Ntionl Tehnil University of Athens Dept.

More information

The Droplet Virtual Brush for Chinese Calligraphic Character Modeling

The Droplet Virtual Brush for Chinese Calligraphic Character Modeling The Droplet Virtul Brush for Chinese Clligrphi Chrter Moeling Xiofeng Mi Jie Xu Min Tng Jinxing Dong CAD & CG Stte Key L of Chin, Zhejing University, Hngzhou, Chin Artifiil Intelligene Institute, Zhejing

More information

[SYLWAN., 158(6)]. ISI

[SYLWAN., 158(6)]. ISI The proposl of Improved Inext Isomorphi Grph Algorithm to Detet Design Ptterns Afnn Slem B-Brhem, M. Rizwn Jmeel Qureshi Fulty of Computing nd Informtion Tehnology, King Adulziz University, Jeddh, SAUDI

More information

A decision support system prototype for fuzzy multiple objective optimization

A decision support system prototype for fuzzy multiple objective optimization EUSFLAT - LFA A eision support system prototype for fuzzy multiple ojetive optimiztion Fengjie Wu Jie Lu n Gungqun Zhng Fulty of Informtion Tehnology University of Tehnology Syney Austrli E-mil: {fengjiewjieluzhngg}@it.uts.eu.u

More information

Rolling Back Remote Provisioning Changes. Dell Command Integration for System Center

Rolling Back Remote Provisioning Changes. Dell Command Integration for System Center Rolling Bk Remote Provisioning Chnges Dell Commn Integrtion for System Center Notes, utions, n wrnings NOTE: A NOTE inites importnt informtion tht helps you mke etter use of your prout. CAUTION: A CAUTION

More information

Advanced Programming Handout 5. Enter Okasaki. Persistent vs. Ephemeral. Functional Queues. Simple Example. Persistent vs.

Advanced Programming Handout 5. Enter Okasaki. Persistent vs. Ephemeral. Functional Queues. Simple Example. Persistent vs. Avne Progrmming Hnout 5 Purel Funtionl Dt Strutures: A Cse Stu in Funtionl Progrmming Persistent vs. Ephemerl An ephemerl t struture is one for whih onl one version is ville t time: fter n upte opertion,

Partitioning for Parallelization Using Graph Parsing

C. L. McCreary, Department of Computer Science and Engineering, Auburn University, Al 36849. E-mail: mccreary@eng.auburn.edu, (205) 844-6307. List of Figures Figure 3.1 Figure

Using Red-Eye to improve face detection in low quality video images

Richard Youmaran, School of Information Technology, University of Ottawa, Canada, youmaran@site.uottawa.ca; Andy Adler, School of Information Technology, University of Ottawa,

Generating Editors for Direct Manipulation of Diagrams

Gerhard Viehstaedt and Mark Minas, Lehrstuhl für Programmiersprachen, Universität Erlangen-Nürnberg, Martensstr. 3, 91058 Erlangen, Germany. E-mail: {viehste,minas}@informatik.uni-erlangen.de

Hash-based Subgraph Query Processing Method for Graph-structured XML Documents

Hongzhi Wang, Harbin Institute of Tech., wangzh@hit.edu.cn; Jianzhong Li, Harbin Institute of Tech., lijzh@hit.edu.cn; Jizhou Luo, Harbin Institute of

FEEDBACK: The standard error of a regression is not an unbiased estimator for the standard deviation of the error in a multiple regression model.

Introductory Econometrics: A Modern Approach 6th Edition Wooldridge Test Bank Solutions. Complete download: https://testbankarea.com/download/introductory-econometrics-modern-approach-6th-edition-jeffreym-wooldridge-test-bank/ Solutions

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

Nov 12, 2002. John Wawrzynek. Fall 2002, EECS150 - Lec23-HL3, Page 1. Parallelism: Parallelism is the act of doing more

6.045J/18.400J: Automata, Computability and Complexity. Quiz 2: Solutions. Please write your name in the upper corner of each page.

March 30, 2005. Quiz 2: Solutions. Prof. Nancy Lynch, Vinod Vaikuntanathan. Please write your name in the upper corner of each page. Problem Score 1 2 3 4 5 6 Total Q2-1 Problem

COMP108 Algorithmic Foundations

Graph Theory. Prudence Wong, http://www.csc.liv.ac.uk/~pwong/teaching/comp108/201617. How to Measure 4L? 3L container & 5L container (without mark), infinite supply of water. You can pour water from one container to another

1 Which of the following keyword can not be appeared inside the class? a)virtual b)static c)template d)friend c

2 What is template? a)Template is a formula for creating a generic class b)Template is used to manipulate a class c)Template is used for

COMBINATORIAL PATTERN MATCHING

Genomic Repeats. Example of repeats: ATGGTCTAGGTCCTAGTGGTC. Motivation to find them: Genomic rearrangements are often associated with repeats. Trace evolutionary secrets. Many tumors are characterized

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

Random Numbers and Monte Carlo Methods. Random Number Methods: The integration methods discussed so far all are based upon making polynomial approximations to the integrand. Another class of numerical methods relies upon using

Kinetic Collision Detection: Algorithms and Experiments

Leonidas J. Guibas, Feng Xie, Li Zhang. Computer Science Department, Stanford University. Abstract: Efficient collision detection is important in many robotic tasks, from high-level

Cellular-based Population to Enhance Genetic Algorithm for Assignment Problems

American Journal of Intelligent Systems. 0; (): -5 DOI: 0. 593/j.jis.000.0. Hossein Rajabalipour Cheshmehgaz *, Habibollah Haron, Mohammad Reza Maybodi

Structure in solution spaces: Three lessons from Jean-Claude

David Eppstein, Computer Science Department, Univ. of California, Irvine. Conference on Meaningfulness and Learning Spaces: A Tribute to the Work of Jean-Claude Falmagne

Width and Bounding Box of Imprecise Points

Vahideh Keikha, Maarten Löffler, Ali Mohades, Zahed Rahmati. Abstract: In this paper we study the following problem: we are given a set L = {l_1, ..., l_n} of parallel line segments,
