Using a User-Level Memory Thread for Correlation Prefetching


Yan Solihin, Jaejin Lee, Josep Torrellas
University of Illinois at Urbana-Champaign; Michigan State University

Abstract

This paper introduces the idea of using a User-Level Memory Thread (ULMT) for correlation prefetching. In this approach, a user thread runs on a general-purpose processor in main memory, either in the memory controller chip or in a DRAM chip. The thread performs correlation prefetching in software, sending the prefetched data into the L2 cache of the main processor. This approach requires minimal hardware beyond the memory processor: the correlation table is a software data structure that resides in main memory, while the main processor only needs a few modifications to its L2 cache so that it can accept incoming prefetches. In addition, the approach has wide usability, as it can effectively prefetch even for irregular applications. Finally, it is very flexible, as the prefetching algorithm can be customized by the user on an application basis. Our simulation results show that, through a new design of the correlation table and prefetching algorithm, our scheme delivers good results. Specifically, nine mostly-irregular applications show an average speedup of 1.32. Furthermore, our scheme works well in combination with a conventional processor-side sequential prefetcher, in which case the average speedup increases further. Finally, by exploiting the customization of the prefetching algorithm, we increase the average speedup further still.

1. Introduction

Data prefetching is a popular technique to tolerate long memory access latencies. Most of the past work on data prefetching has focused on processor-side prefetching [6, 7, 8, 12, 13, 14, 15, 19, 20, 23, 25, 26, 28, 29]. In this approach, the processor or an engine in its cache hierarchy issues the prefetch requests. An interesting alternative is memory-side prefetching, where the engine that prefetches data for the processor is in the main memory system [1, 4, 9, 11, 22, 28].

Memory-side prefetching is attractive for several reasons. First, it eliminates the overheads and state bookkeeping that prefetch requests introduce in the paths between the main processor and its caches.
Second, it can be supported with a few modifications to the controller of the L2 cache and no modification to the main processor. Third, the prefetcher can exploit its proximity to the memory to its advantage, for example by storing its state in memory. Finally, memory-side prefetching has the additional attraction of riding the technology trend of increased chip integration. Indeed, popular platforms like PCs are being equipped with graphics engines in the memory system [27]. Some chipsets like NVIDIA's nForce even integrate a powerful processor in the North Bridge chip [22]. Simpler engines can be provided for prefetching, or existing graphics processors can be augmented with prefetching capabilities. Moreover, there are proposals to integrate processing logic in DRAM chips, such as IRAM [16].

(This work was supported in part by the National Science Foundation under grants CCR , EIA , EIA , and CHE ; by DARPA under grant F C-0078; by Michigan State University; and by gifts from IBM, Intel, and Hewlett-Packard.)

Unfortunately, existing proposals for memory-side prefetching engines have a narrow scope [1, 9, 11, 22, 28]. Indeed, some designs are hardware controllers that perform simple and specific operations [1, 9, 22]. Other designs are specialized engines that are custom-designed to prefetch linked data structures [11, 28]. Instead, we would like an engine that is usable in a wide variety of workloads and that offers flexibility of use to the programmer.

While memory-side prefetching can support a variety of prefetching algorithms, one type that is particularly suited to it is Correlation prefetching [1, 6, 12, 18, 26]. Correlation prefetching uses past sequences of reference or miss addresses to predict and prefetch future misses. Since no program knowledge is needed, correlation prefetching can be easily moved to the memory side.

In the past, correlation prefetching has been supported by hardware controllers that typically require a large hardware table to keep the correlations [1, 6, 12, 18]. In all cases but one, these controllers are placed between the L1 and L2 caches, or between the processor and the L1. While effective, this approach has a high hardware cost.
Furthermore, it is often unable to prefetch far enough ahead and deliver good prefetch coverage.

In this paper, we present a new scheme where correlation prefetching is performed by a User-Level Memory Thread (ULMT) running on a simple general-purpose processor in memory. Such a processor is either in the memory controller chip or in a DRAM chip, and prefetches lines to the L2 cache of the main processor. The scheme requires minimal hardware support beyond the memory processor: the correlation table is a software data structure that resides in main memory, while the main processor only needs a few modifications to its L2 cache controller so that it can accept incoming prefetches. Moreover, our scheme has wide usability, as it can effectively prefetch even for irregular applications. Finally, it is very flexible, as the prefetching algorithm executed by the ULMT can be customized by the programmer on an application basis.

Using a new design of the correlation table and correlation prefetching algorithm, our scheme delivers an average speedup of 1.32 for nine mostly-irregular applications. Furthermore, our scheme works well in combination with a conventional processor-side sequential prefetcher, in which case the average speedup increases further. Finally, by exploiting the customization of the prefetching algorithm, we increase the average speedup further still.

This paper is organized as follows: Section 2 discusses memory-side and correlation prefetching; Section 3 presents ULMT for correlation prefetching; Section 4 discusses our evaluation setup; Section 5 evaluates our design; Section 6 discusses related work; and Section 7 concludes.

[Figure 1. Memory-side prefetching: some locations where the memory processor can be placed (a), and actions under push passive (b) and push active (c) prefetching. Panel (a) shows the CPU, L1 and L2 caches, North Bridge chip, and DRAM, with the possible locations of the memory processor. In panel (b), the main processor fetches i, the memory processor performs a lookup, memory replies with i, and j and k are prefetched. In panel (c), the memory processor executes code and fetches i, which is returned to it and also prefetched to the main processor.]

2. Memory-Side and Correlation Prefetching

2.1. Memory-Side Prefetching

Memory-side prefetching occurs when prefetching is initiated by an engine that resides either close to the main memory (beyond any memory bus) or inside of it [1, 4, 9, 11, 22, 28]. Some manufacturers have built such engines. Typically, they are simple hardwired controllers that probably recognize only simple stride-based sequences and prefetch data into local buffers. Some examples are NVIDIA's DASP engine in the North Bridge chip [22] and Intel's prefetch cache in the i860 chipset.

In this paper, we propose to support memory-side prefetching with a user-level thread running on a general-purpose core. The core can be very simple and does not need to support floating point. For illustration purposes, Figure 1-(a) shows the memory system of a PC. The core can be placed in different places, such as in the North Bridge (memory controller) chip or in the DRAM chips. Placing it in the North Bridge simplifies the design because the DRAM is not modified. Moreover, some existing systems already include a core in the North Bridge for graphics processing [22], which could potentially be reused for prefetching. Placing the core in a DRAM chip complicates the design, but the resulting highly-integrated system has lower memory access latency and higher memory bandwidth. In this paper, we examine the performance potential of both designs.

Memory- and processor-side prefetching are not the same as Push and Pull (or On-Demand) prefetching [28], respectively.
Push prefetching occurs when prefetched data is sent to a cache or processor that has not requested it, while pull prefetching is the opposite. Clearly, a memory prefetcher can act as a pull prefetcher by simply buffering the prefetched data locally and supplying it to the processor on demand [1, 22]. In general, however, memory-side prefetching is most interesting when it performs push prefetching to the caches of the processor because it can hide a larger fraction of the memory access latency.

Memory-side prefetching can also be classified into Passive and Active. In passive prefetching, the memory processor observes the requests from the main processor that reach main memory. Based on them, and after examining some internal state, the memory processor prefetches other data for the main processor that it expects the latter to need in the future (Figure 1-(b)). In active prefetching, the memory processor runs an abridged version of the code that is running on the main processor. The execution of the code induces the memory processor to fetch data that the main processor will need later. The data fetched by these requests is also sent to the main processor (Figure 1-(c)).

In this paper, we concentrate on passive push memory-side prefetching into the L2 cache of the main processor. The memory processor aims to eliminate only L2 cache misses, since they are the only ones that it sees. Typically, L2 cache miss time is an important contributor to the processor stall due to memory accesses, and is usually the hardest to hide with out-of-order execution.

This approach to prefetching is inexpensive to support. The main processor core does not need to be modified at all. Its L2 cache needs the following support. First, as in other systems [11, 15, 28], the L2 cache has to accept lines from the memory that it has not requested. To do so, the L2 uses free Miss Status Handling Registers (MSHRs) in such events. Second, if the L2 has a pending request and a prefetched line with the same address arrives, the prefetch simply steals the MSHR and updates the cache as if it were the reply.
Finally, a prefetched line arriving at L2 is dropped in the following cases: the L2 cache already has a copy of the line, the write-back queue has a copy of the line because the L2 cache is trying to write it back to memory, all MSHRs are busy, or all the lines in the set where the prefetched line wants to go are in transaction-pending state.

2.2. Correlation Prefetching

Correlation prefetching uses past sequences of reference or miss addresses to predict and prefetch future misses [1, 6, 12, 18, 26]. Two popular correlation schemes are Stride-based and Pair-based schemes. Stride-based schemes find stride patterns in the address sequences and prefetch all the addresses that will be accessed if the patterns continue in the future. Pair-based schemes identify a correlation between pairs or groups of addresses, for example between a miss and a sequence of successor misses. A typical implementation of pair-based schemes uses a Correlation Table to record the addresses that are correlated. Later, when a miss is observed, all the addresses that are correlated with its address are prefetched.

Pair-based schemes are attractive because they have general applicability: they work for any miss patterns as long as miss address sequences repeat. Such behavior is common in both regular and irregular applications, including those with sparse matrices or linked data structures. Furthermore, pair-based schemes, like all correlation schemes, need neither compiler support nor changes in the application binary.

Pair-based correlation prefetching has only been studied using hardware-based implementations [1, 6, 12, 18, 26], typically by placing a custom prefetch engine and a hardware correlation table between the processor and L1 cache, or between the L1 and L2 caches. The typical correlation table, as used in [6, 12, 26], is organized as follows. Each row stores the tag of an address that missed, and the addresses of a set of immediate successor misses. These are misses that have been seen to immediately follow the first one at different points in the application. The parameters of the table are the maximum number of immediate successors per miss (NumSucc), the maximum number of misses that the table can store predictions for (NumRows), and the associativity of the table (Assoc). According to [12], for best performance, the entries in a row should replace each other with an LRU policy.

Figure 4-(a) illustrates how the algorithm works. We call the algorithm Base. The figure shows two snapshots of the table at different points in the miss stream ((i) and (ii)). Within a row, successors are listed in MRU order from left to right. At any time, the hardware keeps a pointer to the row of the last miss observed. When a miss occurs, the table learns by placing the miss address as one of the immediate successors of the last miss, and a new row is allocated for the new miss unless it already exists. When the table is used to prefetch ((iii)), it reacts to an observed miss by finding the corresponding row and prefetching all NumSucc successors, starting from the MRU one.

The designs in [1, 18] work slightly differently. They are discussed in Section 6.

Overall, past work has demonstrated the applicability of pair-based correlation prefetching for many applications. However, it has also revealed the shortcomings of the approach. One critical problem is that, to be effective, this approach needs a large table. Proposed schemes typically need a 1-2 Mbyte on-chip SRAM table [12, 18], while some applications with large footprints even need a 7.6 Mbyte off-chip SRAM table [18].

Furthermore, the popular schemes that prefetch several potential immediate successors for each miss [6, 12, 26] have two limitations: they do not prefetch very far ahead and, intuitively, they need to observe one miss to eliminate another miss (its immediate successor). As a result, they tend to have low coverage. Coverage is the number of useful prefetches over the original number of misses [12].

3. ULMT for Correlation Prefetching

We propose to use a ULMT to eliminate the shortcomings of pair-based correlation prefetching while enhancing its advantages. In the following, we discuss the main concept (Section 3.1), the architecture of the system (Section 3.2), modified correlation prefetching algorithms (Section 3.3), and related operating system issues (Section 3.4).

3.1. Main Concept

A ULMT running on a general-purpose core in memory performs two conceptually distinct operations: learning and prefetching. Learning involves observing the misses on the main processor's L2 cache and recording them in a correlation table one miss at a time. The prefetching operation involves reacting to one such miss by looking up the correlation table and triggering the prefetching of several memory lines for the L2 cache of the main processor. No action is taken on a write-back to memory.

In practice, in agreement with past work [12], we find that combining both learning and prefetching works best: the correlation table continuously learns new patterns, while uninterrupted prefetching delivers higher performance. Consequently, the ULMT executes the infinite loop shown in Figure 2. Initially, the thread waits for a miss to be observed. When it observes one, it looks up the table and generates the addresses of the lines to prefetch (Prefetching Step). Then, it updates the table with the address of the observed miss (Learning Step). It then resumes waiting.

[Figure 2. Infinite loop executed by the ULMT. The loop goes from Wait, to the Prefetching step when a miss address is observed, to the Learning step once the prefetch addresses are generated, and back to Wait after the table is updated. The response time spans the Prefetching step; the occupancy time spans both steps.]

Any prefetch algorithm executed by the ULMT is characterized by its Response and Occupancy times. The response time is the time from when the ULMT observes a miss address until it generates the addresses to prefetch. For best performance, the response time should be as small as possible. This is why we always execute the Prefetching step before the Learning one. Moreover, we shift as much computation as possible from the Prefetching to the Learning step, retaining only the most critical operations in the Prefetching step. The occupancy time is the time when the ULMT is busy processing a single observed miss.
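As a rough illustration of this loop, the Python model below pairs it with the conventional pair-based table of Section 2.2. It is only a sketch under stated assumptions: the names `PairTable` and `ulmt_loop`, the fully associative row store with LRU row eviction, and the small default sizes are hypothetical choices made for the example, not the authors' implementation (which the paper says is written in C).

```python
from collections import OrderedDict


class PairTable:
    """Toy pair-based correlation table (assumed layout, not the paper's code).

    Each row maps a miss address to its immediate successors, MRU first.
    """

    def __init__(self, num_rows=4, num_succ=2):
        self.num_rows = num_rows
        self.num_succ = num_succ
        self.rows = OrderedDict()  # miss address -> successor list (MRU first)
        self.last = None           # address of the last miss observed

    def _row(self, addr):
        """Return the row for addr, allocating it (and evicting the LRU row) if needed."""
        if addr not in self.rows:
            if len(self.rows) >= self.num_rows:
                self.rows.popitem(last=False)  # assumption: LRU replacement of rows
            self.rows[addr] = []
        self.rows.move_to_end(addr)
        return self.rows[addr]

    def prefetch_step(self, miss):
        """Critical path: one lookup, return the recorded successors (MRU first)."""
        return list(self.rows.get(miss, []))

    def learning_step(self, miss):
        """Record miss as the MRU successor of the previous miss, then allocate its row."""
        if self.last is not None:
            succ = self._row(self.last)
            if miss in succ:
                succ.remove(miss)
            succ.insert(0, miss)
            del succ[self.num_succ:]
        self._row(miss)
        self.last = miss


def ulmt_loop(misses, table):
    """Process observed misses one at a time: Prefetching step first, then Learning step."""
    issued = []
    for miss in misses:  # stands in for "wait for a miss to be observed"
        issued.append((miss, table.prefetch_step(miss)))
        table.learning_step(miss)
    return issued


issued = ulmt_loop(list("abcadca"), PairTable())
print(issued[-1])
```

Running the prefetching step before the learning step keeps the response time down to a single table lookup, while all table maintenance is charged to the occupancy time.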
For the ULMT implementation of the prefetcher to be viable, the occupancy time has to be smaller than the time between two consecutive L2 misses most of the time.

The correlation table that the ULMT reads and writes is simply a software data structure in memory. Consequently, our scheme eliminates the costly hardware table required by current implementations of correlation prefetching [12, 18]. Moreover, accesses to the software table are inexpensive because the memory processor transparently caches the table in its cache. Finally, our new scheme enables the redesign of the correlation table and prefetching algorithms (Section 3.3) to address the low-coverage and short-distance prefetching limitations of current implementations.

3.2. Architecture of the System

Figures 3-(a) and (b) show the architecture of a system that integrates the memory processor in the North Bridge chip or in a DRAM chip, respectively. The first design requires no modification to the DRAM or its interface, and is largely compatible with conventional memory systems. The second design needs changes to the DRAM chips and their interface, and needs special support to work in typical memory systems, which have multiple DRAM chips. However, since our goal is to examine the performance potential of the two designs, we abstract away some of the implementation complexity of the second design by assuming a single-chip main memory. In the following, we outline how the systems work. In our discussion, we only consider memory accesses resulting from misses; we ignore write-backs for simplicity and because they do not affect our algorithms.

[Figure 3. Architecture of a system that integrates the memory processor in the North Bridge chip (a) or in a DRAM chip (b). Both panels show the main processor, the bus interface, the memory controller, the memory processor with its cache, the Filter module, the numbered queues, and the DRAM.]

In Figure 3-(a), the key communication occurs through queues 1, 2, and 3. Miss requests from the main processor are deposited in queues 1 and 2 simultaneously. The ULMT uses the entries in queue 2 to build its table and, based on it, generate the addresses to prefetch. The latter are deposited in queue 3. Queues 1 and 3 compete to access memory, although queue 3 has a lower priority than queue 1.

When the address of a line to prefetch is deposited in queue 3, the hardware compares it against all the entries in queue 2. If a match for address X is detected, X is removed from both queues. We remove X from queue 3 because it is redundant: a higher-priority request for X is already in queue 1. X is removed from queue 2 to save computation in the ULMT. Note that it is unclear whether we lose the opportunity to prefetch X's successors by not processing X. The reason is that our algorithms prefetch several levels of successor misses (Section 3.3) and, as a result, some of X's successors may already be in queue 3. Processing X may help improve the state in the correlation table. However, minimizing the total occupancy of the ULMT is crucial in our scheme.

Similarly, when a main-processor miss is about to be deposited in queues 1 and 2, the hardware compares its address against those in queue 3. If there is a match, the request is put only in queue 1 and the matching entry in queue 3 is removed.

It is possible that requests from the main processor arrive too fast for the ULMT to consume them and queue 2 overflows. In this case, the memory processor simply drops these requests.

Figure 3-(a) also shows the Filter module associated with queue 3. This module improves the performance of correlation prefetching, which may sometimes try to prefetch the same address several times in a short time. The Filter module drops prefetch requests directed to any address to which another prefetch request has recently been issued. The module is a fixed-size FIFO list that records the addresses of all the recently-issued requests. Before a request is issued to queue 3, the hardware checks the Filter list. If it finds its address, the request is dropped and the list is left unmodified. Otherwise, the address is added to the tail of the list. With this support, some unnecessary prefetch requests are eliminated.

For completeness, the figure shows other queues. Replies from memory to the main processor go through queue 4. In addition, the ULMT needs to access the software correlation table in main memory.
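The Filter module's check-then-append behavior can be modeled in a few lines. The sketch below is a software illustration only (the paper describes the Filter as hardware); the class name `PrefetchFilter` and its `admit` method are hypothetical, and the default of 32 entries follows the Filter size listed among the simulation parameters.

```python
from collections import deque


class PrefetchFilter:
    """Software model of a fixed-size FIFO filter for prefetch addresses (illustrative)."""

    def __init__(self, size=32):
        # A bounded deque discards its oldest entry (the head) when a new one is appended.
        self.recent = deque(maxlen=size)

    def admit(self, addr):
        """Return True if the request may go to queue 3, False if it is dropped."""
        if addr in self.recent:
            return False            # recently issued: drop, list left unmodified
        self.recent.append(addr)    # otherwise add the address to the tail
        return True


flt = PrefetchFilter(size=2)
decisions = [flt.admit(a) for a in (0x100, 0x100, 0x140, 0x180, 0x100)]
print(decisions)
```

Because a hit leaves the list unmodified, an address ages out of the filter a fixed number of admitted requests after it was first issued, regardless of how often it is requested again in between.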
Recall that the table is transparently cached by the memory processor. Logical queues 5 and 6 provide the necessary paths for the memory processor to access main memory. In practice, queues 5 and 6 are merged with the others.

If the memory processor is in the DRAM chip (Figure 3-(b)), the system works slightly differently. Miss requests from the main processor are deposited first in queue 1 and then in queue 2. The ULMT in the memory processor accesses the correlation table from its cache and, on a miss, directly from the DRAM. The addresses to prefetch are passed through the Filter module and placed in queue 3. As in Figure 3-(a), entries in queues 2 and 3 are checked against each other, and the common entries are dropped. The replies to both prefetches and main-processor requests are returned to the memory controller. As they reach the memory controller, their addresses are compared to the processor miss requests in queue 1. If a memory-prefetched line matches a miss request from the main processor, the former is considered to be the reply of the latter, and the latter is not sent to the memory chip.

Finally, in machines that include a form of processor-side prefetching, we envision our architecture to operate in two modes: Verbose and Non-Verbose. In Verbose mode, queue 2 in Figures 3-(a) and (b) receives both main-processor misses and main-processor prefetch requests. In Non-Verbose mode, queue 2 only receives main-processor misses. This mode assumes that main-processor prefetch requests are distinguishable from other requests, for example with a tag as in the MIPS R10000 [21].

The Non-Verbose mode is useful to reduce the total occupancy of the ULMT. In this case, the processor-side prefetcher can focus on the easy-to-predict sequential or regular miss patterns, while the ULMT can focus on the hard-to-predict irregular ones. The Verbose mode is also useful: the ULMT can implement a prefetch algorithm that enhances the effectiveness of the processor-side prefetcher. We present an example of this case later in the paper.

3.3. Correlation Prefetching Algorithms

Simply taking the current pair-based correlation table and algorithm and implementing them in software is not good enough.
Indeed, as indicated in Section 2.2, the Base algorithm has two limitations: it does not prefetch very far ahead and, intuitively, it needs to observe one miss to eliminate another miss (its immediate successor). As a result, it tends to have low coverage.

To increase coverage, three things need to occur. First, we need to eliminate these two limitations by storing in the table (and prefetching) several levels of successor misses per miss: immediate successors, successors of immediate successors, and so on for several levels. Second, these prefetches have to be highly accurate. Finally, the prefetcher has to take decisions early enough so that the prefetched lines reach the main processor before they are needed.

These conditions are easier to support and ensure when the correlation algorithm is implemented as a ULMT. There are two reasons for this. The first one is that storage is now cheap and, therefore, the correlation table can be inexpensively expanded to hold multiple levels of successor misses per miss, even if that means replicating information. The second reason is the Customizability provided by a software implementation of the prefetching algorithm.

In the rest of this section, we describe how a ULMT implementation of correlation prefetching can deliver high coverage. We describe three approaches: using a conventional table organization, using a table re-organized for ULMT, and exploiting customizability.

[Figure 4. Pair-based correlation algorithms: Base (a), Chain (b), and Replicated (c). Each panel shows the correlation table at two points, (i) and (ii), of the miss sequence a, b, c, a, d, c, ..., and the prefetches issued on a miss (iii). The parameters shown are NumRows=4 and NumSucc=2 and, for Chain and Replicated, NumLevels=2. Replicated also keeps the Last and SecondLast pointers into the table.]

3.3.1. Using Conventional Table Organization

As a first step, we attempt to improve coverage without specifically exploiting the low-cost storage or customizability advantages of ULMT. We simply take the conventional table organization of Section 2.2 and force the ULMT to prefetch multiple levels of successors for every miss. The resulting algorithm we call Chain. Chain takes the same parameters as Base plus NumLevels, which is the number of levels of successors prefetched. The algorithm is illustrated in Figure 4-(b). Chain updates the table like Base ((i) and (ii)) but prefetches differently ((iii)). Specifically, after prefetching the row of immediate successors, it takes the MRU one among them and accesses the correlation table again with its address. If the entry is found, it prefetches all NumSucc successors there. Then, it takes the MRU successor in that row and repeats the process. This is done NumLevels-1 times. As an example, suppose that a miss on a occurs ((iii)). The ULMT first prefetches d and b. Then, it takes the MRU entry d, looks up the table, and prefetches d's successor, c.

Chain addresses the two limitations of Base, namely not prefetching very far ahead, and needing one miss to eliminate a second one. However, Chain may not deliver high coverage for two reasons: the prefetches may not be highly accurate and the ULMT may have a high response time to issue all the prefetches.

The prefetches may be inaccurate because Chain does not prefetch the true MRU successors in each level of successors. Instead, it only prefetches successors found along the MRU path. For example, consider a sequence of misses that alternates between a,b,c and b,e,b,f: a,b,c,...,b,e,b,f,...,a,b,c,...
When miss a is encountered, Chain prefetches its immediate successor (b), and then accesses the entry for b to prefetch e and f. Note that c is not prefetched.

The high response time of Chain to a miss comes from having to make NumLevels accesses to different rows in the table. Each access involves an associative search because the table is associative and, potentially, one or more cache misses.

3.3.2. Using Table Re-Organized for ULMT

We now attempt to improve coverage by exploiting the low cost of storage in ULMT solutions. Specifically, we expand the table to allow replicated information. Each row of the table stores the tag of the miss address, and NumLevels levels of successors. Each level contains NumSucc addresses that use LRU for replacement. Using this table, we propose an algorithm called Replicated (Figure 4-(c)). Replicated takes the same parameters as Chain. As shown in Figure 4-(c), Replicated keeps NumLevels pointers to the table. These pointers point to the entries for the address of the last miss, second last, and so on, and are used for efficient table access. When a miss occurs, these pointers are used to access the entries of the last few misses, and insert the new address as the MRU successor of the correct level ((i) and (ii)). In the figure, the NumSucc entries at each level are MRU ordered. Finally, prefetching in Replicated is simple: when a miss is seen, all the entries in the corresponding row are prefetched ((iii)).

Note that Replicated eliminates the two problems of Chain. First, prefetches are accurate because they contain the true MRU successors at each level. This is the result of grouping together all the successors from a given level, irrespective of the path taken. In the sequence shown above, a,b,c,...,b,e,b,f,...,a,b,c,..., on a miss on a, Replicated prefetches b and c. Second, the response time of Replicated is much smaller than that of Chain. Indeed, Replicated prefetches several levels of successors with a single row access, and maybe even with a single cache miss. Replicated effectively shifts some computation from the Prefetching step to the Learning one: prefetching needs a single table access, while learning a miss needs multiple table updates. This is a good trade-off because the Prefetching step is the critical one.
Furthermore, these multiple learning updates are inexpensive: the use of the pointers eliminates the need to do any associative searches on the table, and the rows to be updated are most likely still in the cache of the memory processor (since they were updated most recently).

3.3.3. Exploiting the Customizability of ULMT

We can also improve coverage by exploiting the second advantage of ULMT solutions: customizability. The programmer or system can choose to run a different algorithm in the ULMT for each application. The chosen algorithm can be highly customized to the application's needs.

One approach to customization is to use the table organizations and prefetching algorithms described above but to tune their parameters on an application basis. For example, in applications where the miss sequences are highly predictable, we can set the number of levels of successors to prefetch (NumLevels) to a high value. As a result, we will prefetch more levels of successors with high accuracy. In applications with unpredictable sequences, we can do the opposite. We can also tune the number of rows in the table (NumRows). In applications that have large footprints, we can set NumRows to a high value to hold more information in the table. In small applications, we can do the opposite to save space.

A second approach to customization is to use a different prefetching algorithm. For example, we can add support for sequential prefetching to all the algorithms described above. The resulting algorithms will have low response time for sequential miss patterns.

Another approach is to adaptively decide the algorithm on-the-fly, as the application executes. In fact, this approach can also be used to execute different algorithms in different parts of one application. Such intra-application customizability may be useful in complex applications.

Finally, the ULMT can also be used for profiling purposes. It can monitor the misses of an application and infer higher-level information such as cache performance, application access patterns, or page conflicts.

3.3.4. Comparing the Algorithms

Table 1. Comparing different pair-based correlation prefetching algorithms running on a ULMT. (The Space requirement entries for Base and Chain did not survive in this copy.)

Characteristics                                                      Base   Chain       Replicated
Levels of successors prefetched                                      1      NumLevels   NumLevels
True MRU ordering for each level?                                    Yes    No          Yes
Number of row accesses in the Prefetching step (requires SEARCH)     1      NumLevels   1
Number of row accesses in the Learning step (requires NO SEARCH)     1      1           NumLevels
Response time                                                        Low    High        Low
Space requirement (for constant number of prefetches)                                   NumLevels

Table 1 compares the Base, Chain, and Replicated algorithms executing on a ULMT. Replicated has the highest potential for high coverage: it supports far-ahead prefetching by prefetching several levels of successors, its prefetches have high accuracy because they prefetch the true MRU successors at each level, and it has a low response time, in part because it only needs to access a single table row in the Prefetching step. Accessing a single row minimizes the associative searches and the cache misses. The only shortcoming of Replicated is the larger space that it requires for the correlation table. However, this is a minor issue since the table is a software structure allocated in main memory.
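To make the accuracy difference concrete, the sketch below models the two multi-level algorithms on the alternating miss sequence used earlier in this section: one follows the MRU path through the conventional table, the other stores several levels of successors in each row. All names (`ChainedLookup`, `ReplicatedTable`, `mru_insert`) are hypothetical, the tables are unbounded dictionaries rather than set-associative arrays, and the filler addresses x and y stand in for the unrelated misses elided by "..." in the text; this is an illustration under those assumptions, not the authors' implementation.

```python
def mru_insert(entries, addr, limit):
    """Move or insert addr at the MRU position and keep at most `limit` entries."""
    if addr in entries:
        entries.remove(addr)
    entries.insert(0, addr)
    del entries[limit:]


class ChainedLookup:
    """Conventional table; prefetching follows the MRU successor for num_levels rows."""

    def __init__(self, num_succ=2, num_levels=2):
        self.num_succ, self.num_levels = num_succ, num_levels
        self.rows = {}      # miss address -> immediate successors (MRU first)
        self.last = None

    def learn(self, miss):
        if self.last is not None:
            mru_insert(self.rows[self.last], miss, self.num_succ)
        self.rows.setdefault(miss, [])
        self.last = miss

    def prefetch(self, miss):
        out, addr = [], miss
        for _ in range(self.num_levels):   # one row access (a search) per level
            succ = self.rows.get(addr)
            if not succ:
                break
            out.extend(succ)
            addr = succ[0]                 # continue along the MRU path only
        return out


class ReplicatedTable:
    """Each row holds num_levels successor lists; prefetching reads a single row."""

    def __init__(self, num_succ=2, num_levels=2):
        self.num_succ, self.num_levels = num_succ, num_levels
        self.rows = {}      # miss address -> one successor list per level
        self.recent = []    # stand-in for the pointers: last miss, second last, ...

    def learn(self, miss):
        # The i-th most recent miss gains `miss` as a level-i successor (no search needed).
        for level, prev in enumerate(self.recent):
            mru_insert(self.rows[prev][level], miss, self.num_succ)
        self.rows.setdefault(miss, [[] for _ in range(self.num_levels)])
        self.recent = ([miss] + self.recent)[: self.num_levels]

    def prefetch(self, miss):
        return [a for level in self.rows.get(miss, []) for a in level]


history = list("abc") + ["x"] + list("bebf") + ["y"]
chain, repl = ChainedLookup(), ReplicatedTable()
for m in history:
    chain.learn(m)
    repl.learn(m)

print(chain.prefetch("a"))
print(repl.prefetch("a"))
```

In this model, the chained lookup misses c on a miss to a because b's row has since been overwritten by the b,e,b,f pattern, whereas the replicated row for a still records c as its second-level successor. The price is that learning touches one row per level and each row stores NumLevels times as many successors.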
Note that all these algorithms can also be implemented in hardware. However, Replicated is more suitable for a ULMT implementation because providing the larger space required in hardware is expensive.

3.4. Operating System Issues

There are some operating system issues that are related to ULMT operation. We outline them here.

Protection. The ULMT has its own separate address space with its instructions, the correlation table, and a few other data structures. The ULMT shares neither instructions nor data with any application. The ULMT can observe the physical addresses of the application misses. It can also issue prefetches for these addresses on behalf of the main processor. However, it can neither read from nor write to these addresses. Therefore, protection is guaranteed.

Multiprogrammed Environment. It is a poor approach to have all the applications share a single table: the table is likely to suffer a lot of interference. A better approach is to associate a different ULMT, with its own table, to each application. This eliminates interference in the tables. In addition, it enables the customization of each ULMT to its own application. If we conservatively assume a 4-Mbyte table on average per application, 8 applications require 32 Mbytes, which is only a modest fraction of today's typical main memory. If this requirement is excessive, we can save space by dynamically sizing the tables. In this case, if an application does not use the space, its table shrinks.

Scheduling. The scheduler knows the ULMT associated with each application. Consequently, the scheduler schedules and preempts both application and ULMT as a group. Furthermore, the operating system provides an interface for the application to control its ULMT.

Page Re-mapping. Sometimes, a page gets re-mapped. Since ULMTs operate on physical addresses, such events can cause some table entries to become stale. We can choose to take no action and let the table update itself automatically through learning. Alternatively, the operating system can inform the corresponding ULMT when a re-mapping occurs, passing the old and new physical page number. Then, the ULMT indexes its table for each line of the old page. If the entry is found, the ULMT relocates it and updates both the tag and any applicable successors in the row.
Given current page sizes, we estimate the table update to take a few microseconds. Such overhead may be overlapped with the execution of the operating system page mapping handler in the main processor. Note that some other entries in the table may still keep stale successor information. Such information may cause a few useless prefetches, but the table will quickly update itself automatically.

4. Evaluation Environment

Applications. To evaluate the ULMT approach, we use nine mostly-irregular, memory-intensive applications. Irregular applications are hardly amenable to compiler-based prefetching. Consequently, they are the obvious target for ULMT correlation prefetching. The exception is CG, which is a regular application. Table 2 describes the applications. The last four columns of the table will be explained later.

Table 2. Applications used. (The last four columns, NumRows (K) and the correlation table Size (Mbytes), as well as the Average row and the input for Sparse, did not survive in this copy.)

Appl     Suite                 Problem                                Input
CG       NAS                   Conjugate gradient                     Class S
Equake   SpecFP2000            Seismic wave propagation simulation    Test
FT       NAS                   3D Fourier transform                   Class S
Gap      SpecInt2000           Group theory solver                    Rako (subset of test)
Mcf      SpecInt2000           Combinatorial optimization             Test
MST      Olden                 Finding minimum spanning tree          1024 nodes
Parser   SpecInt2000           Word processing                        Subset of train
Sparse   SparseBench [10]      GMRES with compressed row storage
Tree     Univ. of Hawaii [3]   Barnes-Hut N-body problem              2048 bodies

Simulation Environment. The evaluation is done using an execution-driven simulation environment that supports a dynamic superscalar processor model [17]. We model a PC architecture with a simple memory processor that is integrated in either the North Bridge chip or in a DRAM chip, following the micro-architecture of Figure 3. Table 3 shows the parameters used for each component of the architecture. All cycles are 1.6 GHz cycles. The architecture is modeled cycle by cycle. We model only a uni-programmed environment with a single application and a single ULMT that execute concurrently. We model all the contention in the system, including the contention of the application thread and the ULMT on shared resources such as the memory controller, DRAM channels, and DRAM banks.

Table 3. Parameters of the simulated architecture. Latencies correspond to contention-free conditions. RT stands for round-trip from the processor. All cycles are 1.6 GHz cycles.

PROCESSOR
  Main Processor: 6-issue dynamic. 1.6 GHz. Int, fp, ld/st FUs: 4, 4, 2. Pending ld, st: 8, 16. Branch penalty: 12 cycles
  Memory Processor: 2-issue dynamic. 800 MHz. Int, fp, ld/st FUs: 2, 0, 1. Pending ld, st: 4, 4. Branch penalty: 6 cycles
MEMORY
  Main Processor's Memory Hierarchy:
    L1 data: write-back, 16 KB, 2 way, 32-B line, 3-cycle hit RT
    L2 data: write-back, 512 KB, 4 way, 64-B line, 19-cycle hit RT
    RT memory latency: 243 cycles (row miss), 208 cycles (row hit)
    Memory bus: split-transaction, 8 B, 400 MHz, 3.2 GB/sec peak
  Memory Processor's Memory Hierarchy:
    L1 data: write-back, 32 KB, 2 way, 32-B line, 4-cycle hit RT
    In North Bridge: RT mem latency: 100 cycles (row miss), 65 cycles (row hit). Latency of a prefetch request to reach DRAM: 25 cycles
    In DRAM: RT mem latency: 56 cycles (row miss), 21 cycles (row hit). Internal DRAM data bus: 32-B wide, 800 MHz, 25.6 GB/sec peak
  DRAM Parameters (applicable to all procs):
    Dual channel. Each channel: 2 B, 800 MHz. Total: 3.2 GB/sec peak
    Random access time (tRAC): 45 ns
    Time from memory controller (tSystem): 60 ns
OTHER
  Depth of queues 1 through 6: 16
  Filter module: 32 entries, FIFO

Processor-Side Prefetching. The main processor optionally includes a hardware prefetcher that can prefetch multiple streams of stride 1 or -1 into the L1 cache. The prefetcher monitors L1 cache misses and can identify and prefetch up to NumSeq sequential streams concurrently. It works as follows. When the third miss in a sequence is observed, the prefetcher recognizes a stream. Then, it prefetches the next NumPref lines in the stream into the L1 cache. Furthermore, it stores the stride and the next address expected in the stream in a special register.
If the processor later misses on the address in the register, the prefetcher prefetches the next NumPref lines in the stream and updates the register. The prefetcher contains NumSeq such registers. As we can see, while this scheme works somewhat like stream buffers [13], the prefetched lines go to L1. We choose this approach to minimize hardware complexity. A shortcoming is that the L1 cache may get polluted. For completeness, we resimulated the system with the prefetches going into separate buffers rather than into L1. We found that the performance changes very little, in part because checking the buffers on L1 misses introduces delay.

Algorithm Parameters. Table 4 lists the prefetching algorithms that we evaluate and the default parameters that we use. The sequential prefetching supported in hardware by the main processor is called Conven4 for conventional. It can also be implemented in software by a ULMT. We evaluate two such software implementations (Seq1 and Seq4). In this case, the prefetcher in memory observes L2 misses rather than L1. Unless otherwise indicated, the processor-side prefetcher is off and, if it is on, the ULMT algorithms operate in Non-Verbose mode (Section 3.2). For the Base algorithm, we choose the parameter values used by Joseph and Grunwald [12] so that we can compare the work.

The last four columns of Table 2 give a conservative value for the size of the correlation table for each application. The table is two-way set-associative. We have sized the number of rows in the table (NumRows) to be the lowest power of two such that, with a trivial hashing function that simply takes the lower bits of the line address, less than 5% of the insertions replace an existing entry. This is a very generous allocation. A more sophisticated hash function can reduce NumRows significantly without increasing conflicts much. In any case, knowing that each row in Base, Chain, and Repl takes 20, 12, and 28 bytes, respectively, in a 32-bit machine, we can compute the total table size. Overall, while some applications need more space than others, the average value is tolerable: 2.7, 1.6, and 3.8 Mbytes for Base, Chain, and Repl, respectively.

ULMT Implementation. We wrote all ULMTs in C and hand-optimized them for minimal response and occupancy time.
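The table-sizing rule described above (pick the lowest power-of-two NumRows for which fewer than 5% of insertions replace an existing entry, then multiply by the row size) can be sketched as follows. This is only an illustrative Python sketch, not the authors' C code. It assumes that NumRows counts all entries, that they are grouped into two-way sets with LRU replacement, and that a row holds a 4-byte tag plus 4-byte successor entries, which happens to reproduce the 20, 12, and 28 byte row sizes quoted in the text. All function names and the toy miss trace are hypothetical.

```python
def row_bytes(successors_per_row, word_bytes=4):
    """Row size assuming one tag word plus one word per stored successor."""
    return word_bytes * (1 + successors_per_row)


def replacement_fraction(miss_lines, num_rows, assoc=2):
    """Fraction of insertions that evict an existing entry."""
    num_sets = num_rows // assoc
    sets = [[] for _ in range(num_sets)]  # each set lists tags, MRU first
    insertions = 0
    replacements = 0
    for line in miss_lines:
        ways = sets[line % num_sets]  # trivial hash: lower bits of the line address
        if line in ways:
            ways.remove(line)         # already present: just refresh MRU order
            ways.insert(0, line)
            continue
        insertions += 1
        if len(ways) == assoc:
            ways.pop()                # evict the LRU entry
            replacements += 1
        ways.insert(0, line)
    return replacements / insertions if insertions else 0.0


def choose_num_rows(miss_lines, assoc=2, limit=0.05):
    """Lowest power-of-two row count whose replacement fraction is below limit."""
    num_rows = assoc
    while replacement_fraction(miss_lines, num_rows, assoc) >= limit:
        num_rows *= 2
    return num_rows


# Hypothetical miss trace: 100 distinct lines, visited three times.
trace = list(range(100)) * 3
rows = choose_num_rows(trace)
table_bytes = rows * row_bytes(6)  # 6 successors per row gives a 28-byte row
print(rows, table_bytes)
```

A better hash function would plug in where the sketch computes `line % num_sets`, which is the knob the text identifies for reducing NumRows.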
One major performance bottleneck of the implementation is frequent branches. We remove branches by unrolling loops and hardwiring all algorithm parameters. We also perform optimizations to increase the spatial locality and to reduce instruction count. None of the algorithms uses floating-point operations.

Table 4. Parameter values used for the different algorithms.

Prefetching Algorithm   Name      Implementation                      Parameter Values
Base                    Base      Software in memory as ULMT          NumSucc = 4, Assoc = 4
Chain                   Chain     Software in memory as ULMT          NumSucc = 2, Assoc = 2, NumLevels = 3
Replicated              Repl      Software in memory as ULMT          NumSucc = 2, Assoc = 2, NumLevels = 3
Sequential 1-Stream     Seq1      Software in memory as ULMT          NumSeq = 1, NumPref = 6
Sequential 4-Streams    Seq4      Software in memory as ULMT          NumSeq = 4, NumPref = 6
Sequential 4-Streams    Conven4   Hardware in L1 of main processor    NumSeq = 4, NumPref = 6

5. Evaluation

5.1. Characterizing Application Behavior

Predictability of the Miss Sequences. We start by characterizing how well our ULMT algorithms can predict the miss sequences of the applications. For that, we run each ULMT algorithm simply observing all L2 cache miss addresses without performing prefetching. We record the fraction of L2 cache misses that are correctly predicted. For a sequential prefetcher, this means that the upcoming miss address matches the next address predicted by one of the streams identified; for a pair-based prefetcher, the upcoming address matches one of the successors predicted for that level.

Figure 5 shows the results of prediction for up to three levels of successors. Given a miss, the Level 1 chart shows the predictability of the immediate successor, while Level 2 shows the predictability of the next successor, and Level 3 the successor after that one. The experiments for the pair-based schemes use large tables to ensure that practically no prediction is missed due to conflicts in the table: NumRows is 256 K, Assoc is 4, and NumSucc is 4. Under these conditions, for level 1, Chain and Repl are equivalent to Base. For levels 2 and 3, Base is not applicable. The figure also shows the effect of combining algorithms.

Figure 5. Fraction of L2 cache misses that are correctly predicted by different algorithms for different levels of successors. (Three panels, Level 1 to Level 3; y-axis: % correct prediction; x-axis: the nine applications and their average.)

Figure 5 shows that our ULMT algorithms can effectively predict the miss streams of the applications. For example, at level 1, Seq4 and Base can correctly predict on average 49% and 82% of the misses, respectively. Moreover, the best algorithms keep predicting correctly across several levels of successors. For example, Repl correctly predicts on average 77% and 73% of the misses for levels 2 and 3, respectively. Therefore, these algorithms have good potential.

The figure also shows that different applications have different miss behavior. For instance, applications such as Mcf and Tree do not have sequential patterns and, therefore, only pair-based algorithms can predict misses. In other applications such as CG, instead, sequential patterns dominate. As a result, sequential prefetching can predict practically all L2 misses. Most applications have a mix of both patterns.

Among pair-based algorithms, Repl almost always outperforms Chain by a wide margin. This is because Chain does not maintain the true MRU successors at each level. However, while Repl is effective under all patterns, it is better when combined with multi-stream sequential prefetching (Seq4+Repl).

Time Between L2 Misses. Another important issue is the time between L2 misses.
Figure 6 classifies L2 misses according to the number of cycles between two consecutive misses arriving at the memory. The misses are grouped in bins corresponding to [0,80) cycles, [80,200) cycles, etc. The unit is 1.6 GHz processor cycles. The most significant bin is [200,280), which contributes 60% of all miss distances on average. These misses are critical beyond their numbers because their latencies are hard to hide with out-of-order execution. Indeed, since the round-trip latency to memory (Table 3) falls within this range, dependent misses are likely to fall in this bin. They contribute more to processor stall than the figure suggests because dependent misses cannot be overlapped with each other. Consequently, we want the ULMT to prefetch them. To make sure that the ULMT is fast enough to learn these misses, its occupancy should be less than 200 cycles.

The misses in the other bins are fewer and less critical. Those in [280,Infinity) are too far apart to put pressure on the ULMT's timing. Those in [0,80) may not give enough time to the ULMT to respond. Fortunately, these misses are more likely to be overlapped with each other and with computation.

Figure 6. Characterizing the time between L2 misses. (Stacked bars per application and for the average; y-axis: % of misses; bins: [0,80), [80,200), [200,280), [280,Infinity).)

5.2. Comparing the Different Algorithms

Figure 7 compares the execution time of the applications under different cases: no prefetching (NoPref), processor-side prefetching as listed in Table 4 (Conven4), different ULMT schemes listed in Table 4 (Base, Chain, and Repl), the combination of Conven4 and Repl (Conven4+Repl), and some customized algorithms (Custom). The results are for the case where the memory processor is integrated in the DRAM. For each application and the average, the bars are normalized to NoPref. The bars show the memory-induced processor stall time that is caused by requests between the processor and the L2 cache (UptoL2), and by requests beyond the L2 cache (BeyondL2). The remaining time (Busy) includes processor computation plus other pipeline stalls. A system with a perfect L2 cache would only have the Busy and UptoL2 times.

Figure 7. Execution time of the applications with different prefetching algorithms. (y-axis: normalized execution time, broken down into Busy, UptoL2, and BeyondL2.)

On average, BeyondL2 is the most significant component of the execution time under NoPref. It accounts for 44% of the time. Thus, although our ULMT schemes only target L2 cache misses, they target the main contributor to the execution time.

Conven4 performs very well on CG because sequential patterns dominate. However, it is ineffective in applications such as Mcf and Tree that have purely irregular patterns. On average, Conven4 reduces the execution time by 17%.

The pair-based schemes show mixed performance. Base shows limited speedups, mostly because it does not prefetch far enough. On average, it reduces NoPref's execution time by 6%. Chain performs a little better, but it is limited by inaccuracy (Figure 5) and high response time (Section 3.3.1). On average, it reduces NoPref's execution time by 12%. Repl is able to reduce the execution time significantly. It performs well in almost all applications. It outperforms both Base and Chain in all cases. Its impact comes from the nice properties of the Replicated algorithm, as discussed in Section 3.3.

Finally, Conven4+Repl performs the best. On average, it removes over half of the BeyondL2 stall time, and delivers an average application speedup of 1.46 over NoPref. If we compare the impact of processor-side prefetching only (Conven4) and memory-side prefetching only (Repl), we see that they have a constructive effect in Conven4+Repl. The reason is that the two schemes help each other. Specifically, the processor-side prefetcher prefetches and eliminates the sequential misses. The memory-side prefetcher works in Non-Verbose mode (Section 3.2) and, therefore, does not see the prefetch requests. As a result, it can fully focus on the irregular miss patterns. With the resulting reduced load, the ULMT is more effective.

Algorithm Customization.
In this first paper on ULMT prefetching, we have attempted only very simple customization for a few applications. Table 5 shows the changes. For CG, we run Seq1+Repl in Verbose mode. For MST and Mcf, we run Repl with higher NumLevels. In all cases, Conven4 is on. The results are shown in Figure 7 as the Custom bar in the three applications.

Table 5. Customizations performed. Conven4 is also on.

Application   Customized ULMT Algorithm
CG            Seq1+Repl in Verbose mode
MST, Mcf      Repl with NumLevels = 4

The customization in CG tries to further exploit positive interaction between processor- and memory-side prefetching. While CG only has sequential miss patterns (Figure 5), its multiple streams overwhelm the conventional prefetcher. Indeed, although processor-side prefetches are very accurate (99.8% of the prefetched lines are referenced), they are not timely enough (only 64% are timely) because some of them miss in the L2 cache. In our customization, we turn on the Verbose mode so that processor-side prefetch requests are seen by the ULMT. Furthermore, the ULMT is extended with a single-stream sequential prefetch algorithm (Seq1) before executing Repl. In this environment, the positive interaction between the two prefetchers increases. Specifically, while the application references the different streams in an interleaved manner, the processor-side prefetcher unscrambles the miss sequence into chunks of same-stream prefetch requests. The Seq1 prefetcher in the ULMT then easily identifies each stream and, very efficiently, prefetches ahead. As a result, 81% of the processor-side prefetches arrive in a timely manner. With this customization, the speedup of CG improves beyond the 2.19 obtained with Conven4+Repl. This case demonstrates that even regular applications that are amenable to sequential processor-side prefetching can benefit from ULMT prefetching.

The customization in MST and Mcf tries to exploit predictability beyond the third level of successor misses by setting NumLevels to 4 in Repl. As shown in Figure 7, this approach is successful for MST, but it produces marginal gains in Mcf. Overall, this initial attempt at customization shows promising results.
After applying customization on three applications, the average execution speedup of the nine applications relative to NoPref becomes 1.53.
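This section quotes some results as execution-time reductions (6%, 12%, 17%) and others as speedups (1.46, 1.53). The two are related by simple arithmetic, sketched below in Python. The helper names are hypothetical, and the two-application example at the end uses made-up normalized times. It is there only to show that an average of per-application speedups generally differs from the speedup implied by the average normalized execution time, so the averages quoted in the text should not be converted into one another too literally.

```python
def speedup_from_reduction(reduction):
    """Speedup implied by a fractional reduction in execution time."""
    return 1.0 / (1.0 - reduction)


def reduction_from_speedup(speedup):
    """Fractional execution-time reduction implied by a speedup."""
    return 1.0 - 1.0 / speedup


def mean(values):
    return sum(values) / len(values)


# Reductions quoted in the text, converted to speedups for a single run.
for reduction in (0.06, 0.12, 0.17):
    print(f"{reduction:.0%} less time -> speedup {speedup_from_reduction(reduction):.2f}")

# A speedup of 1.46 corresponds to roughly a 31.5% shorter execution time.
print(f"{reduction_from_speedup(1.46):.3f}")

# Hypothetical normalized execution times for two applications.
normalized_times = [0.5, 1.0]
average_of_speedups = mean([1.0 / t for t in normalized_times])
speedup_of_average_time = 1.0 / mean(normalized_times)
print(average_of_speedups, speedup_of_average_time)
```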

Figure 8. Execution time for different locations of the memory processor. (y-axis: normalized execution time, broken down into Busy, UptoL2, and BeyondL2; bars labeled MC have the memory processor in the memory controller.)

Location of Memory Processor. Figure 8 examines the impact of where we place the memory processor (Figure 3). The first two bars for each application are taken from Figure 7: NoPref and Conven4+Repl. The last bar for each application corresponds to the Conven4+Repl algorithm with the memory processor placed in the memory controller (North Bridge) chip (MC). With the processor in the North Bridge chip, we have twice the memory access latency (100 cycles vs. 56 cycles), eight times lower memory bandwidth (3.2 GB/sec vs. 25.6 GB/sec), and an additional 25-cycle delay seen by the prefetch requests before they reach the DRAM. However, Figure 8 shows that the impact on the execution time is very small. The average speedup decreases only slightly from the 1.46 obtained with the processor in the DRAM. The impact is small thanks to the ability of Repl to accurately prefetch far ahead. Only the timeliness of the immediate successor prefetches is affected, while the prefetching of further levels of successors is still timely. Overall, given these results and the hardware cost of the two designs, we conclude that putting the memory processor in the North Bridge chip is the most cost-effective design of the two.

Prefetching Effectiveness. To gain further insight into these prefetching schemes, Figure 9 examines the effectiveness of the lines prefetched into the L2 cache by the ULMT. These lines are called prefetches. The figure shows data for Sparse, Tree, and the average of the other seven applications. The figure combines both L2 misses and prefetches, and breaks them down into 5 categories: prefetches that eliminate an L2 miss (Hits), prefetches that eliminate part of the latency of an L2 miss because they arrive a bit late (DelayedHits), L2 misses that pay the full latency (NonPrefMisses), and useless prefetches. Useless prefetches are further broken down into prefetches that are brought into the L2 but that are not referenced by the time they are replaced (Replaced), and prefetches that are dropped on arrival to L2 because the same line is already in the cache (Redundant).
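The derived quantities read off Figure 9 follow directly from these categories once everything is expressed relative to the original number of L2 misses (normalized to 1). The Python sketch below is only an illustration with hypothetical names. In the usage, the 0.74 coverage and the 20% of additional conflict misses are the values the text reports for the best pair-based scheme, the split between Hits and DelayedHits is invented, and the NonPrefMisses value of 0.46 is derived from those two figures rather than read from the paper.

```python
def miss_breakdown(hits, delayed_hits, non_pref_misses):
    """Derive coverage and conflict misses from Figure 9 style categories.

    All inputs are fractions of the original number of L2 misses.
    """
    # Coverage: original misses fully or partially eliminated by prefetches.
    coverage = hits + delayed_hits
    # Original misses that remain (the part of NonPrefMisses below the 1.0 line).
    remaining_original = 1.0 - coverage
    # New conflict misses caused by prefetches (the part above the 1.0 line),
    # i.e. hits + delayed_hits + non_pref_misses - 1.0.
    new_conflict = non_pref_misses - remaining_original
    return coverage, remaining_original, new_conflict


# Hits/DelayedHits split is hypothetical; their sum matches the reported 0.74.
coverage, remaining, conflicts = miss_breakdown(0.60, 0.14, 0.46)
print(f"coverage={coverage:.2f} remaining={remaining:.2f} conflicts={conflicts:.2f}")
```

When no prefetch causes a conflict, NonPrefMisses equals 1.0 minus the coverage and the last value is zero.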
Since Coverage is the fraction of the original L2 misses that are fully or partially eliminated, it is represented by the sum of Hits and DelayedHits as shown in Figure 9. NonPrefMisses in Figure 9 is the number of L2 misses left after prefetching, relative to the original number of L2 misses. Note that NonPrefMisses can be higher than 1.0 for some algorithms. 1.0 - NonPrefMisses is the number of L2 misses eliminated relative to the original number of L2 misses. NonPrefMisses can be broken down into two groups: those misses below the 1.0 line in Figure 9 (1.0 - Hits - DelayedHits) come from the original misses, while those above the 1.0 line (Hits + DelayedHits + NonPrefMisses - 1.0) are the new L2 conflict misses caused by prefetches.

Figure 9. Breakdown of the L2 misses and lines prefetched by the ULMT (prefetches). The original misses are normalized to 1. (Categories: Hits, DelayedHits, NonPrefMisses, Replaced, Redundant; shown for Sparse, Tree, and the average of the 7 applications other than Sparse and Tree.)

Footnote: All these cycle counts are in main-processor cycles.

Looking at the average of the seven applications, we see why Base and Chain are not effective: their coverage is small. Base is hurt by its inability to prefetch far ahead, while Chain is hampered by its high response time and limited accuracy. The figure also shows that Repl has a high coverage (0.74). However, this comes at the cost of useless prefetches (Replaced plus Redundant are equivalent to 50% of the original misses) and additional misses due to conflicts with prefetches (20% of the original misses). We can see, therefore, that advanced pair-based schemes need additional bandwidth.

Conven4+Repl seems to have low coverage, despite its high performance in Figure 7. The reason is that the prefetch requests issued by the processor-side prefetcher, while effective in eliminating L2 misses, are lumped into the NonPrefMisses category in the figure if they reach memory. Since the ULMT prefetcher is in Non-Verbose mode, it does not see these requests. Consequently, the ULMT prefetcher only focuses on the irregular miss patterns. ULMT prefetches that eliminate irregular misses appear as Hits+DelayedHits.

Finally, Figure 9 also shows why Sparse and Tree showed limited speedups in Figure 7.
They have too many conflicts in the cache, which results in many remaining NonPrefMisses. Furthermore, their prefetches are not very accurate, which results in large Replaced and Redundant categories.

Work Load of the ULMT. Figure 10 shows the average response time and occupancy time (Section 3.1) for each of the ULMT algorithms, averaged over all applications. The times are measured in 1.6 GHz cycles. Each bar is broken down into computation time (Busy) and memory stall time (Mem). The numbers on top of each bar show the average IPC of the ULMT. The IPC is calculated as the number of instructions divided by the number of memory processor cycles.

Figure 10. Average response and occupancy time of different ULMT algorithms in main-processor cycles. (y-axis: number of processor cycles, broken down into Busy and Mem; one group of bars for response time and one for occupancy time.)

The figure shows that, in all the algorithms, the occupancy time is less than 200 cycles. Consequently, the ULMT is fast enough to process most of the L2 misses (Figure 6). Memory stall time is roughly half of the ULMT execution time when the processor is in the DRAM, and more when the processor is in the North Bridge chip (ReplMC). Base and Chain have the lowest occupancy time. Note that Repl's occupancy is not much higher than Chain's, despite the higher number of table updates performed by Repl. The reasons are the fewer associative searches and the better cache line reuse in Repl.

The response time is most important for prefetching effectiveness. The figure shows that Repl has the lowest response time, at around 30 cycles. The response time of ReplMC is about twice as much. Fortunately, the Replicated algorithm is able to prefetch far ahead accurately and, therefore, the effectiveness of prefetching is not very sensitive to a modest increase in the response time.

Main Memory Bus Utilization. Finally, Figure 11 shows the utilization of the main memory bus for various algorithms, averaged over all applications. The increase in bus utilization induced by the advanced algorithms is divided into two parts: the increase caused naturally by the reduced execution time, and the additional increase caused by the prefetching traffic. Overall, the figure shows that the increase in bus utilization is tolerable. The utilization increases from the original 20% to only 36% in the worst case. Moreover, most of the increase comes from the faster execution; only 6% utilization is directly attributable to the prefetches. In general, the fact that memory-side prefetching only adds one-way traffic to the main memory bus limits its bandwidth needs.

Figure 11. Main memory bus utilization. (y-axis: % utilization, broken down into: no prefetching, due to the reduced execution time, due to prefetching.)

6. Related Work

Memory-Side Prefetching. Some memory-side prefetchers are simple hardware controllers.
For example, the NVIDIA chipset includes the DASP controller in the North Bridge chip [22]. It seems that it is mostly targeted to stride recognition and buffers data locally. The i860 chipset from Intel is reported to have a prefetch cache, which may indicate the presence of a similar engine. Cooksey et al. [9] propose the Content-Based prefetcher, which is a hardware controller that monitors the data coming from memory. If an item appears to be an address, the engine prefetches it. Alexander and Kedem [1] propose a hardware controller that monitors requests at the main memory. If it observes repeatable patterns, it prefetches rows of data from the DRAM to an SRAM buffer inside the memory chip. Overall, our scheme is different in that we use a general-purpose processor running a prefetching algorithm as a user-level thread.

Other studies propose specialized programmable engines. For example, Hughes [11] and Yang and Lebeck [28] propose adding a specialized engine to prefetch linked data structures. While Hughes focuses on a multiprocessor processing-in-memory system, Yang and Lebeck focus on a uniprocessor and put the engine at every level of the cache hierarchy. The main processor downloads information on these engines about the linked structures and what prefetches to perform. Our scheme is different in that it has general applicability.

Another related system is Impulse, an intelligent memory controller capable of remapping physical addresses to improve the performance of irregular applications [4]. Impulse could prefetch data, but only implements next-line prefetching. Furthermore, it buffers data in the memory controller, rather than sending it to the processor.

Correlation Prefetching. Early work on correlation prefetching can be found in [2, 24]. More recently, several authors have made further contributions. Charney and Reeves study correlation prefetching and suggest combining a stride prefetcher with a general correlation prefetcher [6]. Joseph and Grunwald propose the basic correlation table organization and algorithm that we evaluate [12]. Alexander and Kedem use correlation prefetching slightly differently [1], as we indicated above. Sherwood et al. use it to help stream buffers prefetch irregular patterns [26]. Finally, Lai et al. design a slightly different correlation prefetcher [18]. Specifically, a prefetch is not triggered by a miss; instead, it is triggered by a dead-line predictor indicating that a line in the cache will not be used again and, therefore, a new line should be prefetched in. This scheme improves prefetching timeliness at the expense of tighter integration of the prefetcher with the processor, since the prefetcher needs to observe not only miss addresses, but also reference addresses and program counters.

We differ from the recent works in important ways. First, they propose hardware-only engines, which often require expensive hardware tables; we use a flexible user-level thread on a general-purpose core that stores the table as a software structure in memory. Second, except for Alexander and Kedem [1], they place their engines between the L1 and L2 caches, or between the processor and the L1; we place the prefetcher in memory and focus on L2 misses. Time intervals between L2 misses are large enough for a ULMT to be viable and effective. Finally, we propose a new table organization and prefetching algorithm that, by exploiting inexpensive memory space, increases far-ahead prefetching and prefetch coverage.

Prefetching Regular Structures. Several schemes have been proposed to prefetch sequential or strided patterns. They include the Reference Prediction table of Chen and Baer [7], and the Stream buffers of Jouppi [13], Palacharla and Kessler [23], and Sherwood et al. [26]. We base our processor-side prefetcher on these schemes.

Processor-Side Prefetching. There are many more proposals for processor-side prefetching, often for irregular applications. A tiny, non-exhaustive list includes Choi et al. [8], Karlsson et al. [14], Lipasti et al. [19], Luk and Mowry [20], Roth et al. [25], and Zhang and Torrellas [29]. Most of these schemes specifically target linked data structures. They tend to rely on program information that is available to the processor, like the addresses and sizes of data stru-


More information

Scalable Spatio-temporal Continuous Query Processing for Location-aware Services

Scalable Spatio-temporal Continuous Query Processing for Location-aware Services Slle Sptio-temporl Continuous uery Proessing for Lotion-wre Servies iopeng iong Mohme F. Mokel Wli G. Aref Susnne E. Hmrush Sunil Prhkr Deprtment of Computer Sienes, Purue University, West Lfyette, IN

More information

A matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications $

A matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications $ Informtion Systems 29 (2004) 23 46 A mthing lgorithm for mesuring the struturl similrity etween n XML oument n DTD n its pplitions $ Elis Bertino, Giovnn Guerrini, Mro Mesiti, * Diprtimento i Informti

More information

Evaluating Regular Expression Matching Engines on Network and General Purpose Processors

Evaluating Regular Expression Matching Engines on Network and General Purpose Processors Evluting Regulr Expression Mthing Engines on Network n Generl Purpose Proessors Mihel Behi Wshington University Computer Siene n Engineering St. Louis, MO 63130-4899 mehi@se.wustl.eu Chrlie Wisemn Wshington

More information

CS453 INTRODUCTION TO DATAFLOW ANALYSIS

CS453 INTRODUCTION TO DATAFLOW ANALYSIS CS453 INTRODUCTION TO DATAFLOW ANALYSIS CS453 Leture Register llotion using liveness nlysis 1 Introdution to Dt-flow nlysis Lst Time Register llotion for expression trees nd lol nd prm vrs Tody Register

More information

COSC 6374 Parallel Computation. Communication Performance Modeling (II) Edgar Gabriel Fall Overview. Impact of communication costs on Speedup

COSC 6374 Parallel Computation. Communication Performance Modeling (II) Edgar Gabriel Fall Overview. Impact of communication costs on Speedup COSC 6374 Prllel Computtion Communition Performne Modeling (II) Edgr Griel Fll 2015 Overview Impt of ommunition osts on Speedup Crtesin stenil ommunition All-to-ll ommunition Impt of olletive ommunition

More information

Asurveyofpractical algorithms for suffix tree construction in external memory

Asurveyofpractical algorithms for suffix tree construction in external memory Asurveyofprtil lgorithms for suffix tree onstrution in externl memory M. Brsky,, U. Stege n A. Thomo University of Vitori, PO Box, STN CSC Vitori, BC, VW P, Cn SUMMAY The onstrution of suffix trees in

More information

Paradigm 5. Data Structure. Suffix trees. What is a suffix tree? Suffix tree. Simple applications. Simple applications. Algorithms

Paradigm 5. Data Structure. Suffix trees. What is a suffix tree? Suffix tree. Simple applications. Simple applications. Algorithms Prdigm. Dt Struture Known exmples: link tble, hep, Our leture: suffix tree Will involve mortize method tht will be stressed shortly in this ourse Suffix trees Wht is suffix tree? Simple pplitions History

More information

WORKSHOP 9 HEX MESH USING SWEEP VECTOR

WORKSHOP 9 HEX MESH USING SWEEP VECTOR WORKSHOP 9 HEX MESH USING SWEEP VECTOR WS9-1 WS9-2 Prolem Desription This exerise involves importing urve geometry from n IGES file. The urves re use to rete other urves. From the urves trimme surfes re

More information

Declarative Routing: Extensible Routing with Declarative Queries

Declarative Routing: Extensible Routing with Declarative Queries elrtive Routing: Extensile Routing with elrtive Queries Boon Thu Loo 1 Joseph M. Hellerstein 1,2, Ion toi 1, Rghu Rmkrishnn3, 1 University of Cliforni t Berkeley, 2 Intel Reserh Berkeley, 3 University

More information

COSC 6374 Parallel Computation. Dense Matrix Operations

COSC 6374 Parallel Computation. Dense Matrix Operations COSC 6374 Prllel Computtion Dense Mtrix Opertions Edgr Griel Fll Edgr Griel Prllel Computtion Edgr Griel erminology Dense Mtrix: ll elements of the mtrix ontin relevnt vlues ypilly stored s 2-D rry, (e.g.

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

Graph Contraction and Connectivity

Graph Contraction and Connectivity Chpter 14 Grph Contrtion n Connetivity So fr we hve mostly overe tehniques for solving problems on grphs tht were evelope in the ontext of sequentil lgorithms. Some of them re esy to prllelize while others

More information

Introduction. Example

Introduction. Example OMS0 Introution isjoint sets n minimum spnning trees In this leture we will strt by isussing t struture use for mintining isjoint subsets of some bigger set. This hs number of pplitions, inluing to mintining

More information

Introduction to Algebra

Introduction to Algebra INTRODUCTORY ALGEBRA Mini-Leture 1.1 Introdution to Alger Evlute lgeri expressions y sustitution. Trnslte phrses to lgeri expressions. 1. Evlute the expressions when =, =, nd = 6. ) d) 5 10. Trnslte eh

More information

Loop Shifting and Compaction for the High-Level Synthesis of Designs with Complex Control Flow

Loop Shifting and Compaction for the High-Level Synthesis of Designs with Complex Control Flow Shifting n Comption for the High-Level Synthesis of Designs with Complex Control low Sumit Gupt Nikil Dutt Rjesh Gupt Alexnru Niolu Center for Emee Computer Systems Shool of Informtion n Computer Siene

More information

FASTEST METHOD TO FIND ALTERNATIVE RE-ROUTE

FASTEST METHOD TO FIND ALTERNATIVE RE-ROUTE INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 FASTEST METHOD TO FIND ALTERNATIVE RE-ROUTE 1 M.JothiLkshmi, M.S., M.Phil. 2 C.Theeendr, M.S., M.Phil. 3 M.K.Pvithr,

More information

5 ANGLES AND POLYGONS

5 ANGLES AND POLYGONS 5 GLES POLYGOS urling rige looks like onventionl rige when it is extene. However, it urls up to form n otgon to llow ots through. This Rolling rige is in Pington sin in Lonon, n urls up every Friy t miy.

More information

McAfee Web Gateway

McAfee Web Gateway Relese Notes Revision C MAfee We Gtewy 7.6.2.11 Contents Aout this relese Enhnement Resolved issues Instlltion instrutions Known issues Additionl informtion Find produt doumenttion Aout this relese This

More information

The Network Layer: Routing in the Internet. The Network Layer: Routing & Addressing Outline

The Network Layer: Routing in the Internet. The Network Layer: Routing & Addressing Outline CPSC 852 Internetworking The Network Lyer: Routing in the Internet Mihele Weigle Deprtment of Computer Siene Clemson University mweigle@s.lemson.edu http://www.s.lemson.edu/~mweigle/ourses/ps852 1 The

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

Solids. Solids. Curriculum Ready.

Solids. Solids. Curriculum Ready. Curriulum Rey www.mthletis.om This ooklet is ll out ientifying, rwing n mesuring solis n prisms. SOM CUES The Som Cue ws invente y Dnish sientist who went y the nme of Piet Hein. It is simple 3 # 3 #

More information

CS 551 Computer Graphics. Hidden Surface Elimination. Z-Buffering. Basic idea: Hidden Surface Removal

CS 551 Computer Graphics. Hidden Surface Elimination. Z-Buffering. Basic idea: Hidden Surface Removal CS 55 Computer Grphis Hidden Surfe Removl Hidden Surfe Elimintion Ojet preision lgorithms: determine whih ojets re in front of others Uses the Pinter s lgorithm drw visile surfes from k (frthest) to front

More information

A decision support system prototype for fuzzy multiple objective optimization

A decision support system prototype for fuzzy multiple objective optimization EUSFLAT - LFA A eision support system prototype for fuzzy multiple ojetive optimiztion Fengjie Wu Jie Lu n Gungqun Zhng Fulty of Informtion Tehnology University of Tehnology Syney Austrli E-mil: {fengjiewjieluzhngg}@it.uts.eu.u

More information

[SYLWAN., 158(6)]. ISI

[SYLWAN., 158(6)]. ISI The proposl of Improved Inext Isomorphi Grph Algorithm to Detet Design Ptterns Afnn Slem B-Brhem, M. Rizwn Jmeel Qureshi Fulty of Computing nd Informtion Tehnology, King Adulziz University, Jeddh, SAUDI

More information

LINX MATRIX SWITCHERS FIRMWARE UPDATE INSTRUCTIONS FIRMWARE VERSION

LINX MATRIX SWITCHERS FIRMWARE UPDATE INSTRUCTIONS FIRMWARE VERSION Overview LINX MATRIX SWITCHERS FIRMWARE UPDATE INSTRUCTIONS FIRMWARE VERSION 4.4.1.0 Due to the omplex nture of this updte, plese fmilirize yourself with these instrutions nd then ontt RGB Spetrum Tehnil

More information

Minimal Memory Abstractions

Minimal Memory Abstractions Miniml Memory Astrtions (As implemented for BioWre Corp ) Nthn Sturtevnt University of Alert GAMES Group Ferury, 7 Tlk Overview Prt I: Building Astrtions Minimizing memory requirements Performnes mesures

More information

ORGANIZER QUICK START GUIDE

ORGANIZER QUICK START GUIDE NOTES ON USING GOTOWEBINAR GoToWeinr Orgnizers my hol Weinrs for up to 1,000 ttenees. The Weinr proess n e roken into three stges: Weinr Plnning, Weinr Presenttion n Weinr Follow-up. Orgnizers nee to first

More information

Rolling Back Remote Provisioning Changes. Dell Command Integration for System Center

Rolling Back Remote Provisioning Changes. Dell Command Integration for System Center Rolling Bk Remote Provisioning Chnges Dell Commn Integrtion for System Center Notes, utions, n wrnings NOTE: A NOTE inites importnt informtion tht helps you mke etter use of your prout. CAUTION: A CAUTION

More information

Advanced Programming Handout 5. Enter Okasaki. Persistent vs. Ephemeral. Functional Queues. Simple Example. Persistent vs.

Advanced Programming Handout 5. Enter Okasaki. Persistent vs. Ephemeral. Functional Queues. Simple Example. Persistent vs. Avne Progrmming Hnout 5 Purel Funtionl Dt Strutures: A Cse Stu in Funtionl Progrmming Persistent vs. Ephemerl An ephemerl t struture is one for whih onl one version is ville t time: fter n upte opertion,

More information

Comparison-based Choices

Comparison-based Choices Comprison-se Choies John Ugner Mngement Siene & Engineering Stnfor University Joint work with: Jon Kleinerg (Cornell) Senhil Mullinthn (Hrvr) EC 17 Boston June 28, 2017 Preiting isrete hoies Clssi prolem:

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

Troubleshooting. Verify the Cisco Prime Collaboration Provisioning Installation (for Advanced or Standard Mode), page

Troubleshooting. Verify the Cisco Prime Collaboration Provisioning Installation (for Advanced or Standard Mode), page Trouleshooting This setion explins the following: Verify the Ciso Prime Collortion Provisioning Instlltion (for Advned or Stndrd Mode), pge 1 Upgrde the Ciso Prime Collortion Provisioning from Smll to

More information

6.045J/18.400J: Automata, Computability and Complexity. Quiz 2: Solutions. Please write your name in the upper corner of each page.

6.045J/18.400J: Automata, Computability and Complexity. Quiz 2: Solutions. Please write your name in the upper corner of each page. 6045J/18400J: Automt, Computbility nd Complexity Mrh 30, 2005 Quiz 2: Solutions Prof Nny Lynh Vinod Vikuntnthn Plese write your nme in the upper orner of eh pge Problem Sore 1 2 3 4 5 6 Totl Q2-1 Problem

More information

To access your mailbox from inside your organization. For assistance, call:

To access your mailbox from inside your organization. For assistance, call: 2001 Ative Voie, In. All rights reserved. First edition 2001. Proteted y one or more of the following United Sttes ptents:,070,2;,3,90;,88,0;,33,102;,8,0;,81,0;,2,7;,1,0;,90,88;,01,11. Additionl U.S. nd

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Kulleġġ San Ġorġ Preca Il-Liċeo tas-subien Ħamrun. Name & Surname: A) Mark the correct answer by inserting an X in the correct box. a b c d.

Kulleġġ San Ġorġ Preca Il-Liċeo tas-subien Ħamrun. Name & Surname: A) Mark the correct answer by inserting an X in the correct box. a b c d. Kulleġġ Sn Ġorġ Pre Il-Liċeo ts-suien Ħmrun Hlf Yerly Exmintion 2012 Trk 3 Form 3 INFORMATION TECHNOLOGY Time : 1hr 30 mins Nme & Surnme: Clss: A) Mrk the orret nswer y inserting n X in the orret ox. 1)

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

Graph theory Route problems

Graph theory Route problems Bhelors thesis Grph theory Route prolems Author: Aolphe Nikwigize Dte: 986 - -5 Sujet: Mthemtis Level: First level (Bhelor) Course oe: MAE Astrt In this thesis we will review some route prolems whih re

More information

The Droplet Virtual Brush for Chinese Calligraphic Character Modeling

The Droplet Virtual Brush for Chinese Calligraphic Character Modeling The Droplet Virtul Brush for Chinese Clligrphi Chrter Moeling Xiofeng Mi Jie Xu Min Tng Jinxing Dong CAD & CG Stte Key L of Chin, Zhejing University, Hngzhou, Chin Artifiil Intelligene Institute, Zhejing

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

SMALL SIZE EDGE-FED SIERPINSKI CARPET MICROSTRIP PATCH ANTENNAS

SMALL SIZE EDGE-FED SIERPINSKI CARPET MICROSTRIP PATCH ANTENNAS Progress In Eletromgnetis Reserh C, Vol. 3, 195 22, 28 SMALL SIZE EDGE-FED SIERPINSKI CARPET MICROSTRIP PATCH ANTENNAS W.-L. Chen nd G.-M. Wng Rdr Engineering Deprtment Missile Institute of Air Fore Engineering

More information

An Efficient Algorithm for the Physical Mapping of Clustered Task Graphs onto Multiprocessor Architectures

An Efficient Algorithm for the Physical Mapping of Clustered Task Graphs onto Multiprocessor Architectures An Effiient Algorithm for the Physil Mpping of Clustere Tsk Grphs onto Multiproessor Arhitetures Netrios Koziris Pnyiotis Tsnks Mihel Romesis George Ppkonstntinou Ntionl Tehnil University of Athens Dept.

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

WORKSHOP 8B TENSION COUPON

WORKSHOP 8B TENSION COUPON WORKSHOP 8B TENSION COUPON WS8B-2 Workshop Ojetives Prtie reting n eiting geometry Prtie mesh seeing n iso meshing tehniques. WS8B-3 Suggeste Exerise Steps 1. Crete new tse. 2. Crete geometry moel of the

More information

WORKSHOP 19 GLOBAL/LOCAL MODELING USING FEM FIELDS

WORKSHOP 19 GLOBAL/LOCAL MODELING USING FEM FIELDS WORKSHOP 19 GLOBAL/LOCAL MODELING USING FEM FIELDS WS19-1 WS19-2 Prolem Desription This exerise is use to emonstrte how to mp isplement results from the nlysis of glol(overll) moel onto the perimeter of

More information

PROBLEM OF APOLLONIUS

PROBLEM OF APOLLONIUS PROBLEM OF APOLLONIUS In the Jnury 010 issue of Amerin Sientist D. Mkenzie isusses the Apollonin Gsket whih involves fining the rius of the lrgest irle whih just fits into the spe etween three tngent irles

More information

SAS Event Stream Processing 5.1: Using SAS Event Stream Processing Studio

SAS Event Stream Processing 5.1: Using SAS Event Stream Processing Studio SAS Event Strem Proessing 5.1: Using SAS Event Strem Proessing Stuio Overview to SAS Event Strem Proessing Stuio Overview SAS Event Strem Proessing Stuio is we-se lient tht enles you to rete, eit, uplo,

More information

Outline. CS38 Introduction to Algorithms. Graphs. Graphs. Graphs. Graph traversals

Outline. CS38 Introduction to Algorithms. Graphs. Graphs. Graphs. Graph traversals Outline CS38 Introution to Algorithms Leture 2 April 3, 2014 grph trversls (BFS, DFS) onnetivity topologil sort strongly onnete omponents heps n hepsort greey lgorithms April 3, 2014 CS38 Leture 2 2 Grphs

More information

Calculus Differentiation

Calculus Differentiation //007 Clulus Differentition Jeffrey Seguritn person in rowot miles from the nerest point on strit shoreline wishes to reh house 6 miles frther down the shore. The person n row t rte of mi/hr nd wlk t rte

More information

1 Which of the following keyword can not be appeared inside the class? a)virtual b)static c)template d)friend c

1 Which of the following keyword can not be appeared inside the class? a)virtual b)static c)template d)friend c 1 Whih of the following keywor n not e ppere insie the lss? )virtul )stti )templte )frien 2 Wht is templte? )Templte is formul for reting generi lss )Templte is use to mnipulte lss )Templte is use for

More information

COMP108 Algorithmic Foundations

COMP108 Algorithmic Foundations Grph Theory Prudene Wong http://www.s.liv..uk/~pwong/tehing/omp108/201617 How to Mesure 4L? 3L 5L 3L ontiner & 5L ontiner (without mrk) infinite supply of wter You n pour wter from one ontiner to nother

More information

Pipeline Example: Cycle 1. Pipeline Example: Cycle 2. Pipeline Example: Cycle 4. Pipeline Example: Cycle 3. 3 instructions. 3 instructions.

Pipeline Example: Cycle 1. Pipeline Example: Cycle 2. Pipeline Example: Cycle 4. Pipeline Example: Cycle 3. 3 instructions. 3 instructions. ipeline Exmple: Cycle 1 ipeline Exmple: Cycle X X/ /W X X/ /W $3,$,$1 lw $,0($5) $3,$,$1 3 instructions 8 9 ipeline Exmple: Cycle 3 ipeline Exmple: Cycle X X/ /W X X/ /W sw $6,($7) lw $,0($5) $3,$,$1 sw

More information

Lecture 13: Graphs I: Breadth First Search

Lecture 13: Graphs I: Breadth First Search Leture 13 Grphs I: BFS 6.006 Fll 2011 Leture 13: Grphs I: Bredth First Serh Leture Overview Applitions of Grph Serh Grph Representtions Bredth-First Serh Rell: Grph G = (V, E) V = set of verties (ritrry

More information

Comparing Hierarchical Data in External Memory

Comparing Hierarchical Data in External Memory Compring Hierrhil Dt in Externl Memory Surshn S. Chwthe Deprtment of Computer Siene University of Mryln College Prk, MD 090 hw@s.um.eu Astrt We present n externl-memory lgorithm for omputing minimum-ost

More information

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining EECS150 - Digitl Design Lecture 23 - High-level Design nd Optimiztion 3, Prllelism nd Pipelining Nov 12, 2002 John Wwrzynek Fll 2002 EECS150 - Lec23-HL3 Pge 1 Prllelism Prllelism is the ct of doing more

More information

Compiling a Parallel DSL to GPU

Compiling a Parallel DSL to GPU Compiling Prllel DSL to GPU Rmesh Nrynswmy Bdri Gopln Synopsys In. Synopsys 2012 1 Agend Overview of Verilog Simultion Prllel Verilog Simultion Algorithms Prllel Simultion Trdeoffs on GPU Chllenges Synopsys

More information

Lecture 8: Graph-theoretic problems (again)

Lecture 8: Graph-theoretic problems (again) COMP36111: Advned Algorithms I Leture 8: Grph-theoreti prolems (gin) In Prtt-Hrtmnn Room KB2.38: emil: iprtt@s.mn..uk 2017 18 Reding for this leture: Sipser: Chpter 7. A grph is pir G = (V, E), where V

More information

An Efficient Code Update Scheme for DSP Applications in Mobile Embedded Systems

An Efficient Code Update Scheme for DSP Applications in Mobile Embedded Systems An Effiient Code Updte Sheme for DSP Applitions in Moile Emedded Systems Weiji Li, Youto Zhng Computer Siene Deprtment,University of Pittsurgh,Pittsurgh, PA 526 {weijili,zhngyt}@s.pitt.edu Astrt DSP proessors

More information

UT1553B BCRT True Dual-port Memory Interface

UT1553B BCRT True Dual-port Memory Interface UTMC APPICATION NOTE UT553B BCRT True Dul-port Memory Interfce INTRODUCTION The UTMC UT553B BCRT is monolithic CMOS integrted circuit tht provides comprehensive MI-STD- 553B Bus Controller nd Remote Terminl

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

Hash-based Subgraph Query Processing Method for Graph-structured XML Documents

Hash-based Subgraph Query Processing Method for Graph-structured XML Documents Hsh-bse Subgrph Query Proessing Metho for Grph-struture XML Douments Hongzhi Wng Hrbin Institute of Teh. wngzh@hit.eu.n Jinzhong Li Hrbin Institute of Teh. lijzh@hit.eu.n Jizhou Luo Hrbin Institute of

More information

FEEDBACK: The standard error of a regression is not an unbiased estimator for the standard deviation of the error in a multiple regression model.

FEEDBACK: The standard error of a regression is not an unbiased estimator for the standard deviation of the error in a multiple regression model. Introutory Eonometris: A Moern Approh 6th Eition Woolrige Test Bnk Solutions Complete ownlo: https://testbnkre.om/ownlo/introutory-eonometris-moern-pproh-6th-eition-jeffreym-woolrige-test-bnk/ Solutions

More information

Agilent Mass Hunter Software

Agilent Mass Hunter Software Agilent Mss Hunter Softwre Quick Strt Guide Use this guide to get strted with the Mss Hunter softwre. Wht is Mss Hunter Softwre? Mss Hunter is n integrl prt of Agilent TOF softwre (version A.02.00). Mss

More information

Accurate Indirect Branch Prediction

Accurate Indirect Branch Prediction Aurte Indiret Brnh Predition Krel Driesen nd Urs Hölzle Deprtment of Computer Siene University of Cliforni Snt Brbr, CA 9 Abstrt Indiret brnh predition is likely to beome inresingly importnt in the future

More information