Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

Pierre Michaud, André Seznec
IRISA/INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France
{pmichaud,

Abstract
The performance of out-of-order processors increases with the instruction window size. In conventional processors, the effective instruction window cannot be larger than the issue buffer. Determining which instructions from the issue buffer can be launched to the execution units is a time-critical operation whose complexity increases with the issue buffer size. We propose to relieve the issue stage by reordering instructions before they enter the issue buffer. This study introduces the general principle of data-flow prescheduling. Then we describe a possible implementation. Our preliminary results show that data-flow prescheduling makes it possible to enlarge the effective instruction window while keeping the issue buffer small.

1. Introduction
Processor performance is strongly correlated with the clock cycle. Shorter clock cycles have been made possible by both improvements in silicon technology and careful processor design. As a consequence of this evolution, the IPC (average number of instructions committed per clock cycle) of future processors may decrease rather than increase [1]. This IPC decay comes from the dispersion of instruction latencies. In particular, load latencies in CPU cycles tend to increase across technology generations, while ALU operation latency remains one cycle. One way to overcome the IPC decay is to enlarge the processor instruction window [6, 15], both physically (issue buffer, physical registers, ...) and logically, through better branch prediction accuracy or branches removed by predication. However, the instruction window should be enlarged without impairing the clock cycle. In particular, the issue buffer and issue logic are among the most serious obstacles to enlarging the physical instruction window [11].
In this paper, we study the addition of a preschedule stage before the issue stage to combine the benefits of a large instruction window and a short clock cycle. We introduce data-flow prescheduling: instructions are sent to the issue buffer in a predicted data-flow order instead of the sequential order, which allows a smaller issue buffer. The rationale of this proposal is to avoid spending issue buffer entries on instructions whose operands are known to be not yet available. In our proposal, this reordering of instructions is accomplished through an array of schedule lines. Each schedule line corresponds to a different depth in the data-flow graph. The depth of each instruction in the data-flow graph is determined, and the instruction is inserted in the corresponding schedule line. Lines are consumed by the issue buffer sequentially.

Section 2 briefly describes issue buffers and discusses related works. Section 3 describes our processor model and experimental set-up. Section 4 presents the general principle of prescheduling and introduces data-flow prescheduling. Section 5 describes a possible implementation of data-flow prescheduling. Section 6 analyzes the efficiency of the proposed implementation based on experimental results. Finally, Section 7 gives some directions for future research.

2. Background and related works
The issue buffer is the hardware structure that materializes the instruction window. Instructions wait in the issue buffer until they are ready to be launched to the execution units. Unlike in the reorder buffer [1], instructions can be removed from the issue buffer soon after issuing, to make room for new instructions. The two main phases of instruction issue are the wake-up phase and the selection phase [11]. The wake-up phase determines which instructions have their data dependencies resolved. The selection phase resolves resource conflicts and determines which instructions can effectively issue. The delay of the wake-up and selection phases increases with the issue buffer size [11], which makes a large issue buffer hardly compatible with a short clock cycle.
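The two phases can be illustrated with a minimal behavioral sketch. It models behavior, not circuitry, and the names `Entry`, `wakeup` and `select` are hypothetical. Wake-up clears the source tags that match results broadcast in the current cycle. Selection grants issue slots to ready entries, oldest first, which matches the priority assumed for the simulated issue buffer. In hardware, both loops run over every entry within one cycle, so their delay grows with the buffer size.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare entries by identity, not by field values
class Entry:
    seq: int                    # program-order sequence number (lower = older)
    srcs: set                   # source tags (renamed registers) still awaited
    dest: Optional[str] = None  # tag broadcast when this entry executes


def wakeup(buffer: list[Entry], broadcast: set) -> None:
    """Wake-up phase: every entry compares its pending source tags with the
    result tags broadcast this cycle and clears the ones that match."""
    for e in buffer:
        e.srcs -= broadcast


def select(buffer: list[Entry], issue_width: int) -> list[Entry]:
    """Selection phase: among entries whose operands are all available,
    grant up to issue_width slots, oldest first, and free their entries."""
    ready = [e for e in buffer if not e.srcs]
    granted = sorted(ready, key=lambda e: e.seq)[:issue_width]
    for e in granted:
        buffer.remove(e)        # removed soon after issue, unlike a ROB entry
    return granted


# Usage: two ready entries issue first; entry 0 wakes up when p1 is broadcast.
buf = [Entry(0, {"p1"}, "p3"), Entry(1, set(), "p4"),
       Entry(2, set(), "p5"), Entry(3, {"p3"}, "p6")]
first = [e.seq for e in select(buf, issue_width=2)]
wakeup(buf, {"p1"})
second = [e.seq for e in select(buf, issue_width=2)]
```

After these calls, `first` is `[1, 2]` and `second` is `[0]`. Entry 3 keeps waiting for `p3`.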
In some processors, like the Alpha 21264 [8], the issue buffer is collapsible in order to maintain instructions in sequential order and facilitate the insertion of new instructions. Maintaining the sequential order allows the selection logic to give priority to older instructions. In currently available processors, separate issue buffers are implemented for integer and floating-point instructions, typically 2 to 4 times smaller than the reorder buffer (in number of instructions). The integer issue buffer typically does not exceed 20 entries in current processors (20-entry integer queue in the Alpha 21264, 18-entry integer scheduler in the AMD Athlon, 20-entry reservation station in the Intel P6, ...).

Both micro-architectural and circuit-level solutions have been proposed for enabling the use of a large instruction window. In [11, 1], it was proposed to distribute the issue logic among multiple clusters of execution units. This solution trades global communications for fast local communications. The trace processor [13] is an example of such a proposition. A characteristic of these propositions is that the instruction window size is proportional to the number of execution units. A circuit-level approach was proposed recently for tackling the window size problem specifically [5]: the reorder buffer and the issue buffer are merged, and parallel-prefix circuits are used for the wake-up and selection phases.

The idea of prescheduling is not new. A dependence-based prescheduler, which tries to form chains of dependent instructions in a set of FIFOs, was proposed in [11]. It is further discussed in Section 4.1. An idea close to ours was proposed in [3], but with a different implementation.

Note on the issue buffer size. In some processors, instructions may have to be re-issued. For example, on the Alpha 21264 [7], when a load is predicted to hit in the data cache but actually misses, two issue cycles are annulled and the issue buffer state is restored. This requires that instructions remain valid in the issue buffer for a few cycles after they have been issued. These instructions constitute an invisible part of the issue buffer, whose size depends on the issue width and on the number of pipeline stages between the issue stage and the execution stage.
All issue buffer sizes reported in this study are for the visible part of the issue buffer.

3. Processor model and experimental set-up
The processor simulated in this paper is an out-of-order superscalar processor. The two processor configurations simulated, ideal and 8-way, are described in Tables 1 and 2 respectively. The branch predictor simulated is a 3x16k-entry e-gskew predictor [1]. The size of the reorder buffer, i.e., the number of physical registers, was fixed large enough so that it does not interfere with our study. Branch misprediction recovery is performed as soon as a mispredicted branch is executed. The cache latencies reported in Tables 1 and 2 are futuristic values anticipating smaller feature sizes [1].

Table 1. Ideal configuration
instruction cache: perfect
branch predictor: 3x16k-entry e-gskew, global history: 1 branches
fetch bandwidth: unlimited
front-end stages: 10 (fetch/decode)
reorder buffer: 4096 instructions
issue buffer: variable
issue width: N
execution units: N universal
back-end stages: pipelined issue, X execute, retire
main latencies: most int: X=1 cycle; int mul: X=7; div: X=; load: X=1+ (address, cache); store: X=1+1 (address, forward)
data cache: perfect
memory dependency predictor: perfect

Table 2. 8-way configuration: parameters not specified are identical to the ideal configuration.
fetch: 8 instructions
front-end stages: 10 (+)
issue width: 8
execution units: 8 universal
L1 data cache: 8 Kbytes, direct mapped, 64-byte lines, unlimited bandwidth
L2 data cache: perfect, 15-cycle latency
store set predictor: SSIT: 16k entries (tagged); LFST: 18 entries (tagged)

We assume the issue buffer is distinct from the reorder buffer. It schedules all the instructions, except those that are executed in the pipeline front-end, like unconditional jumps. The issue buffer is collapsible. When instructions are competing for the issue bandwidth, instructions that entered the issue buffer first are given priority.
As we focus our attention on the visible part of the issue buffer, we did not simulate the impact of pipeline stages between the issue stage and the execution stage, which is a distinct problem.

Load/store dependencies. When considering large instruction windows, we must pay attention to dependencies between loads and stores. Previous studies have shown that memory dependencies can be predicted with high accuracy using past behavior. The memory dependency predictor used in this study for the 8-way configuration is the store set predictor []. The Store Set Identifier Table (SSIT) is a 16k-entry tagged table (-way set-associative). When a load misses in the SSIT, it is predicted not to depend on any in-flight store. A load is predicted to depend on the most recently encountered store in its store set. The dependency is enforced by the issue logic: the load issues after the store, so that it can catch the correct value. As recommended in [], dependencies are enforced between stores belonging to the same store set in order to reduce the number of memory order violations. In our simulations, the number of memory order violations did not exceed % of the number of branch mispredictions.

Figure 1. The prescheduler sends instructions to the issue buffer in the order defined by data dependencies. (Diagram: fetch & decode feed the prescheduler in sequential order; the prescheduler feeds the issue buffer in data-flow order; the issue buffer feeds execution.)

Benchmarks. All simulations are trace-driven simulations using the IBS traces [16]. The eight traces reflect the execution of sequential applications on a MIPS-based workstation, including system activity. With the L1 data cache simulated in the 8-way configuration, there is an average 5% cache miss ratio on our benchmarks, nroff having the lowest ( %) and verilog and video play the highest (7-8 %). With the branch predictor simulated, the average number of instructions between consecutive branch mispredictions lies between 1 (real gcc) and 35 (video play), and between and 5 for other benchmarks.

4. Prescheduling
In today's processors, instructions are pushed into the issue buffer in sequential order; therefore, instructions depending on a long dependency chain occupy the issue buffer for a long time. All the issue buffer entries are checked on every cycle. This process is time-consuming, and the delay increases with the number of entries in the issue buffer. The general idea behind prescheduling is to allow only instructions that are likely to become fireable in the very next cycles to enter the issue buffer.
Information on data dependencies and instruction latencies is known before the issue stage and can be used for prescheduling. The principle of prescheduling is depicted in Figure 1. Instead of being sent to the issue buffer in sequential order, instructions are reordered by a prescheduler so that they enter the issue buffer in the data-flow order, i.e., the order of execution assuming unlimited execution resources and taking into account only data dependencies. The instructions wait in a preschedule buffer until they can enter the issue buffer. If the predicted data-flow order is close enough to an optimal issue order, then the issue buffer can be very small, because it is relieved of the task of buffering instructions that are not yet fireable. In fact, the issue buffer size should be closer to the issue width than to the effective instruction window size. The job of the hardware prescheduler is somewhat similar to that of a compiler scheduling instructions within basic blocks. However, a hardware prescheduler works on large traces of several tens or hundreds of instructions that are discovered at run time and whose length is not known a priori.

Deadlocks. To prevent deadlocks, the prescheduler must ensure that if instruction B depends on instruction A, then A enters the issue buffer before B.

4.1. Dependence-based prescheduling
The dependence-based prescheduler presented in [11] is an example of a prescheduling scheme. The preschedule buffer consists of several FIFOs. The issue buffer is the set of all FIFO heads; hence the issue buffer size is equal to the number of FIFOs. The prescheduling logic forms chains of dependent instructions in FIFOs: an instruction is steered to a FIFO whose last instruction it depends on. If it is not possible to append an instruction to an existing chain, the instruction is steered to an empty FIFO. When this is not possible either, the steering logic stalls until one FIFO becomes empty. We verified experimentally that a dependence-based prescheduler with N FIFOs is roughly equivalent to an issue buffer of N instructions.
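The steering rule can be sketched as follows. This is a simplified reading of the description above, not the exact heuristic of [11]. It assumes renamed, unique destination tags and checks FIFOs in index order when several qualify. `Instr`, `steer` and `fifo_heads` are hypothetical names.

```python
from collections import deque
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    srcs: frozenset   # renamed source tags
    dest: str         # renamed destination tag (unique after renaming)


def steer(fifos: list, ins: Instr) -> bool:
    """Place one instruction into a FIFO; return False when steering must stall."""
    # 1. Extend a chain: the FIFO's last instruction produces one of our sources.
    for q in fifos:
        if q and q[-1].dest in ins.srcs:
            q.append(ins)
            return True
    # 2. Otherwise start a new chain in an empty FIFO.
    for q in fifos:
        if not q:
            q.append(ins)
            return True
    # 3. No suitable FIFO: stall until one FIFO becomes empty.
    return False


def fifo_heads(fifos: list) -> list:
    """The issue buffer is the set of FIFO heads, so its size is len(fifos)."""
    return [q[0] for q in fifos if q]


fifos = [deque(), deque()]
prog = [Instr("A", frozenset({"p1"}), "p2"),        # starts a chain in FIFO 0
        Instr("B", frozenset({"p2"}), "p3"),        # A is the tail of FIFO 0
        Instr("C", frozenset({"p1"}), "p4"),        # independent: empty FIFO 1
        Instr("D", frozenset({"p3", "p4"}), "p5"),  # B is the tail of FIFO 0
        Instr("E", frozenset({"p9"}), "p6")]        # no match, no empty FIFO
placed = [steer(fifos, i) for i in prog]
```

Here `placed` is `[True, True, True, True, False]`. D also needs C's result but can follow only one of its two chains, which is the chain-merging case discussed as a limitation.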
A first limitation comes from the complexity of the data-flow structure of programs. There are many very short chains ending on a branch or a store, some chains are merged because of dyadic instructions, and several chains are forked when the same register value is used several times. There is another limitation: the optimal distribution in FIFOs would require enqueuing instructions out of program order and taking instruction latencies into account. Trying to find the optimal distribution on simple examples convinced us that this is a hard problem, and that it would be difficult to improve on the published heuristic. To overcome these limitations, the data-flow prescheduler proposed in this paper takes a different approach. First, it defines a global data-flow order instead of a partial one. Second, it takes instruction latencies into account, in particular load latencies.

Figure 2. Data-flow prescheduling example. (Diagram: two iterations, A to F and A1 to F1, of a loop made of a load, an add, a store, an add, a sub and a bltz, placed in schedule lines starting from the active line, using schedule_line = max({source_use_line}, active_line) and use_line = schedule_line + execution_latency.)

4.2. Data-flow prescheduling
Ideally, one would like to send instructions to the issue buffer only when they become fireable. We try to approach this ideal with real hardware. First we assume unlimited execution resources. The depth of each instruction in the data-flow graph is computed, taking into account data dependencies and instruction latencies (for simplicity, we assume all loads hit in the L1 cache). The data-flow depth of an instruction corresponds to its ideal issue cycle, assuming unlimited execution resources. The reordering of instructions is done through a preschedule buffer implemented as an array of schedule lines. Each schedule line is associated with an issue cycle. An instruction is inserted in the schedule line corresponding to its ideal issue cycle. The issue buffer consumes the lines sequentially. Hence, assuming unlimited execution resources and perfect prescheduling, instructions spend a single cycle in the issue buffer, and instructions in the same line are issued simultaneously. The principle of data-flow prescheduling is illustrated on an example in Figure 2. The active line is the line that is currently feeding the issue buffer. The schedule line number of an instruction is never lower than the current active line number.
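The depth computation can be sketched in a few lines. The sketch assumes unlimited lines and unlimited line width and uses the sequential rule given with Figure 2: schedule_line = max({source_use_line}, active_line), use_line = schedule_line + execution_latency. The loop only mimics the shape of the Figure 2 example. Its register numbers and latencies (2 cycles for loads, 1 for everything else) are assumptions, because those details did not survive in the figure.

```python
from collections import defaultdict

# Hypothetical latencies in cycles; the paper's values depend on the configuration.
LATENCY = {"load": 2, "add": 1, "sub": 1, "store": 1, "bltz": 1}


def dataflow_preschedule(instrs, active_line=0):
    """Assign each instruction to the line of its ideal issue cycle.

    instrs: (name, opcode, dest register or None, source registers) in program order.
    Returns {line: [names]} and the final register -> use line table.
    """
    use_line = {}                      # register -> line where its value is usable
    lines = defaultdict(list)
    for name, op, dest, srcs in instrs:
        # A register with no in-flight producer is ready: default to active_line.
        sched = max([use_line.get(r, active_line) for r in srcs] + [active_line])
        lines[sched].append(name)
        if dest is not None:           # stores and branches produce no register
            use_line[dest] = sched + LATENCY[op]
    return dict(lines), use_line


def loop_body(suffix):
    return [(f"A{suffix}", "load", "r2", ["r1"]),        # r2 <- mem[r1]
            (f"B{suffix}", "add", "r2", ["r2"]),         # r2 <- r2 + 1
            (f"C{suffix}", "store", None, ["r2", "r1"]), # mem[r1] <- r2
            (f"D{suffix}", "add", "r1", ["r1"]),         # r1 <- r1 + 1
            (f"E{suffix}", "sub", "r3", ["r1", "r4"]),   # r3 <- r1 - r4
            (f"F{suffix}", "bltz", None, ["r3"])]        # loop while r3 < 0


lines, _ = dataflow_preschedule(loop_body("") + loop_body("1"))
```

With these assumed latencies, the second iteration's load A1 lands on line 1, ahead of the first iteration's store C on line 3. Lines are filled in data-flow order, not program order.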
The schedule line is determined with the following sequential prescheduling algorithm:
schedule_line = max({source_use_line}, active_line)
use_line = schedule_line + execution_latency
For each instruction, we define the use line as the line where its result is available as a source operand for dependent instructions. Real hardware implementations will require further trade-offs, as shown in the next section.

5. A possible implementation for data-flow prescheduling

5.1. The preschedule buffer
The preschedule buffer is an array of schedule lines. Each line is associated with a line counter indicating how many instructions are currently stored in the line. We define the line width as the maximum line counter value, that is, the maximum number of instructions that we allow in the same line. The line counter is incremented each time an instruction is written into the line. If the line counter value is already equal to the line width, a line overflow occurs. In each cycle, as slots are freed in the issue buffer, instructions are taken from the current active line to fill these slots. Once all the instructions in the current active line have been consumed, the active line number is incremented. We assume the active line number is incremented at most once per cycle, and only after the current active line is totally consumed. Note that the active line number keeps increasing monotonically. In practice, however, the number of physical lines, which we define as the preschedule window, is limited. The active line, schedule line and use line numbers are virtual line numbers, which we map onto physical lines circularly. When the current active line is consumed, the physical line is recycled and its line counter is reset. In this study, we have chosen the following policies for preschedule window overflows and line overflows:
Preschedule window overflow.
If the schedule line of an instruction is greater than or equal to the sum of the active line and the preschedule window, prescheduling is blocked, waiting for the active line to proceed and physical lines to be recycled.
Line overflow. Similarly, if the targeted schedule line is full, prescheduling is blocked, waiting for the active line to proceed. The schedule line of the blocked instruction is simply recomputed with the new active line number, as many times as necessary, until the instruction can be written in the preschedule buffer.

Note on the preschedule buffer implementation. In this study, it is implicitly assumed that the preschedule buffer is implemented with a direct-mapped two-dimensional array: one dimension is the line number, and the other dimension is the line counter value. We did not focus on optimizing the size of the preschedule buffer, as access to the preschedule buffer can be pipelined without impairing performance excessively. It should be noted that this is not the only possible implementation. In particular, it may be interesting to introduce some associativity by using line numbers and/or line counter values as tags.
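The buffer management of Section 5.1 can be sketched as a small class under a few assumptions. Physical lines are plain Python lists indexed by the virtual line number modulo the window. The caller supplies the source use lines. A blocked instruction is simply retried in a later cycle, which recomputes its schedule line with the new active line. `PrescheduleBuffer`, `try_insert` and `cycle` are hypothetical names.

```python
class PrescheduleBuffer:
    def __init__(self, window: int, line_width: int):
        self.window = window                  # number of physical lines
        self.line_width = line_width          # max instructions per line
        self.lines = [[] for _ in range(window)]
        self.active = 0                       # virtual number of the active line

    def try_insert(self, name, src_use_lines):
        """Return the schedule line, or None if prescheduling must block."""
        sched = max(list(src_use_lines) + [self.active])
        if sched >= self.active + self.window:
            return None                       # preschedule window overflow
        line = self.lines[sched % self.window]  # circular virtual -> physical map
        if len(line) >= self.line_width:
            return None                       # line overflow
        line.append(name)                     # line counter == len(line)
        return sched

    def cycle(self, free_slots):
        """Move up to free_slots instructions from the active line into the
        issue buffer. The active line advances (at most once per cycle) only
        when it is totally consumed, which recycles its physical line."""
        line = self.lines[self.active % self.window]
        taken = line[:free_slots]
        del line[:free_slots]
        if not line:
            self.active += 1
        return taken


pb = PrescheduleBuffer(window=4, line_width=2)
r = [pb.try_insert(n, s) for n, s in
     [("a", [0]), ("b", [0]), ("c", [0]), ("d", [5]), ("e", [3])]]
out = [pb.cycle(1), pb.cycle(1)]      # line 0 drained one slot per cycle
retry_c = pb.try_insert("c", [0])     # recomputed with the new active line
```

In this run, `r` is `[0, 0, None, None, 3]`: c hits a line overflow and d a window overflow. After line 0 drains, the retried c lands on line 1.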

5.2. Schedule line computation
This section describes the hardware support used for implementing the prescheduling algorithm introduced in Section 4.2.

5.2.1 Register dependencies
The register use line numbers are stored in a Register Use Line Table (RULT) similar to a register rename table. Each RULT entry is associated with a logical register. For each instruction, we must:
1. read the RULT entries corresponding to its source register operands,
2. compute the schedule line as the maximum of the current active line number and the two source register use line numbers,
3. add the instruction execution latency to the schedule line number to determine the destination register use line.

5.2.2 Load/store dependencies
We slightly modified the store set predictor so that it can preschedule a load on a line after all the stores in its store set. When a store set ID (SSID) is obtained from the SSIT for a load or a store, this SSID is used to index the Last Fetched Store Table (LFST) []. Each LFST entry holds the inum of the most recent store in that store set. The inum is used to enforce load-store and store-store dependencies. We modified the LFST entry by adding a field indicating the store set maximum use line number (SSMUL). After prescheduling a store, we compare the schedule line number of the store with the SSMUL obtained from the LFST (in case there was a hit). Then we take the maximum of the two line numbers, and we write the result in the LFST entry. We assume the instruction set architecture does not allow indexed addressing, so that loads have a single register dependency (e.g., MIPS, Alpha). When prescheduling a load, the first use line number is the register use line read from the RULT, and the second use line number is the SSMUL read from the LFST, so that the load is scheduled on a line after all the stores in its store set.
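Steps 1 to 3 and the load/store extension can be combined into one sketch. Dictionaries stand in for the RULT, the SSIT and the LFST's SSMUL field, and table sizes, tags and the inum field are omitted. One detail is an interpretation: the sketch records each store's use line (schedule line plus latency) in SSMUL, as the field's name suggests, so that a dependent load lands strictly after the store. The text above describes comparing the store's schedule line instead. `ScheduleLineUnit` and `schedule` are hypothetical names.

```python
class ScheduleLineUnit:
    """Register and load/store dependency tracking for schedule line computation."""

    def __init__(self, ssit=None):
        self.rult = {}           # logical register -> use line (RULT)
        self.ssit = ssit or {}   # instruction PC -> store set ID (SSIT)
        self.ssmul = {}          # store set ID -> max use line of its stores (SSMUL)

    def schedule(self, active, kind, srcs, dest=None, latency=1, pc=None):
        # 1. Read the RULT entries of the source registers.
        use_lines = [self.rult.get(r, active) for r in srcs]
        ssid = self.ssit.get(pc)                # None models an SSIT miss
        if kind == "load" and ssid is not None:
            # Loads have one register source; the SSMUL is the second input.
            use_lines.append(self.ssmul.get(ssid, active))
        # 2. Schedule line = max of the active line and the source use lines.
        sched = max(use_lines + [active])
        # 3. Destination use line = schedule line + execution latency.
        if dest is not None:
            self.rult[dest] = sched + latency
        if kind == "store" and ssid is not None:
            # Stores do not read SSMUL (it would need an extra operator input),
            # but they update it for later loads of the same store set.
            self.ssmul[ssid] = max(self.ssmul.get(ssid, active), sched + latency)
        return sched


unit = ScheduleLineUnit(ssit={0x10: 7, 0x20: 7})   # store at 0x10, load at 0x20
s_add = unit.schedule(0, "alu", ["r5"], dest="r1")
s_st = unit.schedule(0, "store", ["r1", "r2"], latency=2, pc=0x10)
s_ld = unit.schedule(0, "load", ["r4"], dest="r3", latency=2, pc=0x20)
s_ld2 = unit.schedule(0, "load", ["r4"], dest="r6", latency=2, pc=0x30)
```

The dependent load is placed on line 3, after the store on line 1. The load whose PC misses in the SSIT stays on line 0.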
Although a store is forced to depend on previous stores in its store set, the SSMUL is not used for prescheduling stores: stores are dyadic instructions, and this would require an extra input in the schedule line computation.

The preschedule pipeline stage
Data-flow prescheduling requires a few extra pipeline stages. In particular, a preschedule stage is necessary for computing the schedule line numbers. This preschedule stage is critical for performance, because prescheduling is basically a sequential task. Nevertheless, we show how it can be parallelized. The basic operation involved in data-flow prescheduling computes max(a, b, c) + x, with x being a small constant value depending on the instruction opcode. Figure 3 shows a possible implementation. To shorten the delay, the +x operation can be performed in parallel with the comparisons, as shown in Figure 3. First, we show that the prescheduling logic may operate on small virtual line numbers, typically 12 bits wide. Therefore, the operator depicted in Figure 3 should have a shorter propagation time than a full 64-bit ALU. Second, we show that dependent max(a, b, c) + x operations can be chained without increasing the circuit depth.

Figure 3. Prescheduling operator computing max(a, b, c) + x.

Maximum virtual line number. In Section 5.1, we did not consider the limitation of virtual line numbers. In practice, however, virtual line numbers are coded with a limited number of bits. When computing the use line of the result of an instruction, if the maximum virtual line number is exceeded, prescheduling is blocked until the processor instruction window is completely drained. Then, the active line number and all RULT and LFST entries are reset to 0, and prescheduling resumes from the blocked instruction. This method ensures that the content of the RULT is always coherent, which is important for avoiding deadlocks. We found that virtual line numbers can be coded with 12 bits with no significant performance loss.
For example, with 12 bits, if 4 instructions are issued per cycle, a control-flow break is necessary every 16k instructions (4096 lines times 4 instructions), which is an order of magnitude larger than the average distance between branch mispredictions. Though this solution is simple, other solutions are possible for keeping the RULT content coherent. For example, we could invalidate a RULT entry when the last instruction that wrote to it leaves the preschedule buffer (this would require changing the definition of the max operation in the schedule line computation).

Figure 4. Parallel computing of the use line u_n and schedule line s_n of a group of instructions. (Diagram: the RULT and an input multiplexer, with a bypass from the previous group's u1 to u4, feed four max operators with increments +x1 to +x4, each producing o_n and u_n; an output multiplexer selects s1 = o1, s2 among {u1, o2}, s3 among {u1, u2, o3}, s4 among {u1, u2, u3, o4}; a "verify dyadic" check compares s_n with the neglected source use line j_n.)

Figure 5. Example of parallel prescheduling of a group of four instructions: a load from (r1), load r3 <- (r7), an add combining the first load's result with r3 into r3, and store r3, (r7). Loads and stores have a latency of 2 cycles. We neglect source r3 of the add and source r7 of the store. (The four operators use increments +2, +2, +3 and +5, and the output multiplexer selects s1 = o1, s2 = o2, s3 = u1, s4 = u3.)

Parallel prescheduling. The preschedule stage mainly consists of N max(a, b, c) + x operators, N being the pipeline width. Previous pipeline stages participate in the preschedule task, performing intra-group dependency analysis and determining instruction latencies in order to set the inputs of all max(a, b, c) + x operators. However, this preliminary work is not the core of the problem (the analysis of intra-group dependencies is also necessary for register renaming). The main issue is to perform several chained max(a, b, c) + x operations in the same cycle. We present here a possible solution to break dependency chains and allow the implementation of data-flow prescheduling. Figure 4 shows the circuit for computing the schedule line and use line numbers {s_n} and {u_n} of a group of instructions, based on the operation max(a, b, c) + x. One input, a, is the active line number. The two other inputs, i_n and i'_n, are the source operand use lines. The settings for the i_n and i'_n sources and the command for the output multiplexer depend on intra-group dependencies determined in previous pipeline stages. If instruction n does not depend on previous instructions in the group, then i_n and i'_n are taken from the RULT or the LFST, and the increment x_n is equal to the instruction latency l_n. In this case, the schedule line number s_n is read at the output o_n of the max operator.
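The group computation of Figure 4 can be modeled behaviorally as follows. The model includes the monadic chaining rule (i_n = i_m, i'_n = i'_m, x_n = x_m + l_n, s_n = u_m) and the not-the-max check with its 2-bit counter, which the text details in the next paragraph. The source encoding, the counter update convention ("strengthen" moves away from the decision threshold) and all names are assumptions of this sketch. Hardware evaluates all operators in parallel; this loop is sequential only for readability. The usage reproduces Figure 5 with the caption's 2-cycle load and store latency.

```python
from dataclasses import dataclass


@dataclass
class GroupInstr:
    srcs: list      # each source: ("ext", use_line) from RULT/LFST, or ("grp", m)
    lat: int        # execution latency l_n
    ctr: int = 2    # 2-bit not-the-max counter; MSB = index of the source to neglect


def max_plus_x(a, i, i2, x):
    """One operator: o = max(a, i, i'), u = o + x. (The Figure 3 hardware
    adds x in parallel with the comparisons; the result is the same.)"""
    o = max(a, i, i2)
    return o, o + x


def preschedule_group(active, group):
    """Return [(s_n, u_n)] for the instructions prescheduled this cycle.

    A not-the-max misprediction splits the group: the returned prefix stops
    just before the mispredicted instruction, which is retried next cycle."""
    ops, out = [], []                 # per instruction: (i, i', x) and (s, u)
    for ins in group:
        srcs, neglected = list(ins.srcs), None
        if len(srcs) == 2 and any(kind == "grp" for kind, _ in srcs):
            neglected = srcs.pop(ins.ctr >> 1)      # treat dyadic as monadic
        chained = [v for kind, v in srcs if kind == "grp"]
        if chained:                                 # depends on instruction m
            m = chained[0]
            i, i2, x_m = ops[m]
            x = x_m + ins.lat                       # x_n = x_m + l_n
            _, u = max_plus_x(active, i, i2, x)
            s = out[m][1]                           # s_n = u_m
        else:                                       # inputs from RULT/LFST
            vals = [v for _, v in srcs] + [active, active]
            i, i2, x = vals[0], vals[1], ins.lat
            s, u = max_plus_x(active, i, i2, x)     # s_n = o_n
        if neglected is not None:
            kind, v = neglected
            j = v if kind == "ext" else out[v][1]   # neglected use line j_n
            correct = s >= j
            up = (ins.ctr >= 2) == correct          # strengthen or weaken
            ins.ctr = min(3, ins.ctr + 1) if up else max(0, ins.ctr - 1)
            if not correct:
                return out                          # split the group here
        ops.append((i, i2, x))
        out.append((s, u))
    return out


fig5 = [GroupInstr([("ext", 0)], 2),                # first load, from (r1)
        GroupInstr([("ext", 0)], 2),                # load r3 <- (r7)
        GroupInstr([("grp", 0), ("grp", 1)], 1),    # add: neglect r3 (source 1)
        GroupInstr([("grp", 2), ("ext", 0)], 2)]    # store r3, (r7): neglect r7
result = preschedule_group(0, fig5)
```

With the active line at 0, `result` is `[(0, 2), (0, 2), (2, 3), (3, 5)]`. That gives s3 = u1 and s4 = u3 with increments +2, +2, +3 and +5, as in Figure 5.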
Now let us suppose that instruction n depends on a previous instruction in the group. If instruction n is monadic and depends on instruction m, then the n-th operator is configured as follows: i_n = i_m, i'_n = i'_m, x_n = x_m + l_n, and s_n = u_m. The difficulty comes from dyadic instructions that depend on previous instructions in the group. We propose to treat dyadic instructions like monadic instructions by neglecting one source operand, that is, by predicting which of the two source use line numbers is not the maximum of the three input use line numbers. The not-the-max predictor we simulated is a 2-bit saturating counter stored along with the instruction. The most significant bit of the counter indicates which source operand to neglect. To check the prediction, we verify that the schedule line is greater than or equal to the neglected source use line: we compare the s_n value with the source use line j_n neglected in the computation of s_n. If the prediction is correct, the 2-bit counter is strengthened; otherwise it is weakened. Upon a misprediction, the group is split: the mispredicted dyadic instruction and the following instructions are prescheduled in the next cycle. An example is given in Figure 5. In our experiments, we found an average of one not-the-max misprediction every 3 instructions. When fetching 8 instructions per cycle, not-the-max mispredictions decrease the fetch rate by 5-1%.

Deadlocks
Keeping the RULT coherent and stalling upon preschedule window overflow ensures that if an instruction B is register-dependent on an instruction A, then B cannot enter the issue buffer before A. So the data-flow prescheduler described previously cannot experience deadlocks because of register dependencies. Load-store dependencies cannot generate deadlocks either: a load is always scheduled on a line after the store it is predicted to depend on. Note that if the prescheduler failed to detect a load-store dependency, the load cannot be blocked in the issue buffer by the store.
Nevertheless, artificial dependencies between stores in the same store set can cause deadlocks, because they are not taken into account in the schedule line computation of a store. However, such deadlocks are very rare. Most of our simulations experienced no deadlock at all. Only a few simulations experienced deadlocks, and never more often than once every 1 million instructions. Deadlocks can be detected and solved easily: when no instruction has been issued for a certain number of cycles, we release all stores in the issue buffer by clearing their artificial dependencies. These dependencies are not necessary for correct execution; they were introduced only to reduce the number of memory order traps.

6. Experimental evaluation

6.1. Line size trade-off
The line size is an important parameter of the data-flow prescheduler. If the line is chosen too small, prescheduling stalls too often, which limits the effective instruction window. On the other hand, if the line is chosen too large, many wrong-path instructions enter the issue buffer before correct-path instructions and may delay the correct path. The effective instruction window grows in proportion to the square of the line size. As the line size is increased, prescheduling stalls less frequently, and more instructions can enter the prescheduler. So there is a direct relation between the line size and the effective instruction window. We simulated an ideal configuration, replacing the e-gskew branch predictor with a perfect branch predictor. In these conditions, the instruction fetch rate is limited only by line overflows. In this experiment and all subsequent ones, the preschedule window is fixed to 128 physical lines so that it is not a performance bottleneck. Figure 6 shows the IPC with and without a prescheduler. For the configuration with a prescheduler, we keep the issue buffer size fixed to 32, and we vary the line size. For the configuration with no prescheduler, we vary the issue buffer size. The issue width is kept equal to the issue buffer size. This experiment shows the relation between the line size and the effective instruction window size.
For example, a prescheduler with a line size of 16 instructions gives the same IPC as an issue buffer of 128 instructions (for the instruction latencies simulated, and with unlimited execution resources). We observed that the average number of instructions waiting in the preschedule buffer is roughly proportional to the square of the line size, which is consistent with the square-root law observed in [9]. It should be noted that, in Figure 6, the issue width is larger than the line size. However, when execution resources are limited, a data-flow prescheduler is not exactly equivalent to a large issue buffer, because the data-flow order differs from the optimal issue order. In case of a resource conflict, a large issue buffer should give priority to older instructions. A data-flow prescheduler does not have this degree of freedom. In particular, wrong-path instructions can delay the execution of correct-path instructions if the line size is larger than the issue width.

Sampling method. Our simulator is trace-driven, so it cannot simulate instructions on the wrong path. However, we have simulated the impact of wrong-path instructions based on the observation that, from the point of view of the data-flow structure, it is very hard to distinguish the wrong path from the correct path (otherwise, this would provide a way to detect mispredicted branches). This observation led us to a sampling method that uses correct-path instructions to simulate the wrong path. A similar technique was used in []. The whole instruction trace is injected in the simulator, as usual, so that its internal structures (branch predictor, cache, store sets, ...) are kept warm. However, we collect statistics only for one slice in every 1 on average. We define a slice as a piece of the instruction trace delimited by two consecutive branch mispredictions. To simulate the wrong path, we inject into the simulator the correct-path instructions that follow the slice currently sampled.
The time counter starts counting when the first instruction in the slice is fetched, and stops when the mispredicted branch ending the slice is executed. The sample IPC is the total number of instructions in all slices divided by the time accumulated over all samples. To verify the validity of the method, we simulated a large issue buffer giving priority to older instructions, so that instructions on the wrong path have no effect. We also ran simulations without sampling to obtain the oracle IPC, that is, the IPC obtained when instruction fetching stalls after each mispredicted branch. The differences between the sample IPC and the oracle IPC measured on the IBS benchmarks are within ±% for 5 of the 8 IBS benchmarks, the three others being .% (gs), +.7% (sdet) and 3.8% (nroff). Note that our sampler uses a random number generator that is always initialized with the same seed. Hence the sequence of sampled slices is fixed for a given benchmark, which makes comparisons safer. In the remainder of this paper, the sample IPC is used as the performance metric.

Impact of wrong-path instructions. Figure 7 shows the sample and oracle IPC measured on an ideal configuration as a function of the line size, for an issue width of and 8. To save space, we show only the harmonic mean over all benchmarks. The issue buffer size was fixed to 3, so that it is not a performance bottleneck. The difference between the oracle and sample IPC values quantifies the performance loss caused by potentially issuing wrong-path instructions before correct-path instructions. We observe that when the line size is equal to the issue width, the instructions on the wrong path have no impact on performance, as expected. As the line size increases, so does the effective instruction window, which increases the sample IPC. However, beyond a certain line size, wrong-path instructions begin to consume too much issue bandwidth, and the sample IPC falls. With the instruction latencies simulated, the optimal line size is approximately 5 % larger than the issue width. For example, for an issue width of 8, we should take a line size of 1. In this case, in our simulations, wrong-path instructions cause a 5% performance loss on average.

[Figure 6. IPC of an ideal configuration with perfect branch prediction. Left graph: with a prescheduler, IPC versus line size. Right graph: without a prescheduler, IPC versus issue buffer size. Benchmarks: groff, gs, mpeg_play, nroff, real_gcc, sdet, verilog, video_play.]

[Figure 7. Data-flow prescheduling on an ideal configuration. Harmonic mean over all benchmarks of the sample and oracle IPC as a function of the line size, for an issue width of and 8.]

6.2 Data-flow prescheduling effectiveness

In this section, we compare three 8-way configurations with the same issue buffer size: one uses a data-flow prescheduler, another uses a dependence-based prescheduler (the issue buffer size is the number of FIFOs), and the last has no prescheduler. For the data-flow prescheduler, the line size is set to 1 instructions (following the conclusion of Section 6.1), and we take into account specific implementation constraints: virtual line numbers are coded on 1 bits, the preschedule stage uses not-the-max predictions, and the pipeline front-end features 13 stages instead of 10 for the other two configurations (i.e., we assume data-flow prescheduling requires 3 extra pipeline stages). Figures 8 and 9 show the IPC for issue buffer sizes of 8 and 16, respectively. First, we observe that, on average, the data-flow prescheduler outperforms the dependence-based prescheduler for these issue buffer sizes. With an 8-entry issue buffer, the data-flow prescheduler is on average % faster than the dependence-based prescheduler and 5% faster than the configuration with no prescheduler. With a 16-entry issue buffer, the data-flow prescheduler is still 7% faster than the dependence-based prescheduler and 33% faster than the configuration with no prescheduler.

[Figure 8. IPC on an 8-way configuration with an 8-entry issue buffer, with a data-flow prescheduler (1-instruction lines), a dependence-based prescheduler (8 FIFOs), and no prescheduler. Benchmarks: groff, gs, mpeg, nroff, gcc, sdet, verilog, video.]

[Figure 9. IPC on an 8-way configuration with a 16-entry issue buffer, with a data-flow prescheduler (1-instruction lines), a dependence-based prescheduler (16 FIFOs), and no prescheduler. Same benchmarks as Figure 8.]

Analysis. Figure 10 shows the harmonic mean of the IPC over all benchmarks as we vary the issue buffer size. With a data-flow prescheduler, it is beneficial for the issue buffer to be larger than the line size: the IPC with an issue buffer of 16 is higher than with an issue buffer of 8. The main reason is that, because of the limited issue width, the data-flow order is not the optimal issue order. An issue buffer larger than the line size gives the issue logic more opportunities to correct the preschedule order and get closer to an optimal issue order. We can observe that increasing the issue buffer size from 16 to 3 brings, on average, a slight performance gain with a data-flow prescheduler. Looking at benchmarks individually, this gain is in fact correlated with the frequency of data cache misses. Benchmarks with a high data cache miss rate (e.g., verilog, video_play) benefit from an issue buffer of 3, whereas benchmarks with few cache misses (e.g., nroff) do not. This is because cache misses degrade the accuracy of the predicted data-flow order.

[Figure 10. Harmonic mean of the IPC over all benchmarks as the issue buffer size varies, with no prescheduler, a dependence-based prescheduler, and a data-flow prescheduler.]

[Figure 11. Harmonic mean of the IPC when the L1 cache is removed and predicted load latencies correspond to an L2 cache access. Same three configurations, IPC versus issue buffer size.]
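The evaluation above relies on three pieces of arithmetic: the sample IPC (total instructions over all sampled slices divided by the accumulated time), the signed percentage difference used both for sample-versus-oracle IPC and for "X% faster" comparisons, and the harmonic mean over benchmarks. The following is a minimal sketch of that arithmetic. The record layout `SampledSlice` and the function names are hypothetical, and the numbers in the usage block are illustrative inputs, not results from this study.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import harmonic_mean


@dataclass
class SampledSlice:
    """One sampled slice (hypothetical record layout for this sketch)."""
    instructions: int  # instructions in the slice
    cycles: int        # cycles from fetching its first instruction to executing
                       # the mispredicted branch that ends it


def sample_ipc(slices: list[SampledSlice]) -> float:
    """Total instructions in all slices divided by the time accumulated on all samples."""
    total_instructions = sum(s.instructions for s in slices)
    total_cycles = sum(s.cycles for s in slices)
    return total_instructions / total_cycles


def percent_difference(measured: float, reference: float) -> float:
    """Signed difference of `measured` relative to `reference`, in percent.

    Usable for sample-vs-oracle IPC and for "X% faster than" comparisons.
    """
    return 100.0 * (measured - reference) / reference


def summary_ipc(ipc_per_benchmark: dict[str, float]) -> float:
    """Harmonic mean of per-benchmark IPC values."""
    return harmonic_mean(list(ipc_per_benchmark.values()))


if __name__ == "__main__":
    slices = [SampledSlice(instructions=120, cycles=60),
              SampledSlice(instructions=300, cycles=100)]
    ipc = sample_ipc(slices)                   # 420 / 160
    print(ipc, percent_difference(ipc, 2.5))   # 2.625 5.0
    print(summary_ipc({"bench_a": 2.0, "bench_b": 4.0}))  # about 2.667
```

Summing instructions and cycles before dividing, rather than averaging per-slice IPC values, matches the definition given above: long slices weigh more than short ones.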
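To make the notions of preschedule lines, line size, and line overflow used in this section more concrete, here is a toy software model of the data-flow prescheduling idea. It is a simplified sketch, not the hardware implementation evaluated here: each instruction is placed in the line whose index is its predicted issue cycle (the latest predicted ready cycle of its sources), and lines are read in index order to produce the order in which instructions would enter the issue buffer. The overflow policy (slide to the next line with room), the absence of line draining and virtual line numbers, and the `Inst`/`preschedule` names are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    name: str
    dst: str | None          # destination register, or None
    srcs: tuple[str, ...]    # source registers
    latency: int = 1         # predicted latency in cycles


def preschedule(trace: list[Inst], line_size: int) -> dict[int, list[str]]:
    """Toy data-flow prescheduling of a trace (simplified model).

    Each instruction goes to the line whose index is its predicted issue
    cycle, i.e. the latest predicted ready cycle of its sources. When that
    line already holds `line_size` instructions (a line overflow), this
    sketch ASSUMES the instruction slides to the next line with room.
    Reading lines in index order gives the issue-buffer insertion order.
    """
    ready: dict[str, int] = defaultdict(int)        # predicted ready cycle per register
    lines: dict[int, list[str]] = defaultdict(list)
    for inst in trace:
        cycle = max((ready[r] for r in inst.srcs), default=0)
        while len(lines[cycle]) >= line_size:      # line overflow (assumed policy)
            cycle += 1
        lines[cycle].append(inst.name)
        if inst.dst is not None:
            ready[inst.dst] = cycle + inst.latency
    return dict(sorted(lines.items()))


if __name__ == "__main__":
    trace = [
        Inst("ld", "r1", ("r0",), latency=3),  # load with an assumed 3-cycle latency
        Inst("add", "r2", ("r1",)),
        Inst("mul", "r3", ("r2",)),
        Inst("sub", "r4", ("r5",)),
        Inst("or", "r6", ("r5",)),
    ]
    print(preschedule(trace, line_size=2))
    # {0: ['ld', 'sub'], 1: ['or'], 3: ['add'], 4: ['mul']}
```

In this toy run, the independent `sub` and `or` move ahead of the dependent `add`/`mul` chain, and `or` overflows line 0 into line 1. Raising the predicted load latency pushes the dependent chain to later lines. This is the mechanism by which, as discussed in this section, longer predicted latencies spread instructions over more lines.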
From these curves, it appears that an effective instruction window of 18 instructions is sufficient (in fact, real_gcc requires a window of only 6 instructions because the distance between branch mispredictions is half that of the other benchmarks). Without a prescheduler, we would need a very large issue buffer to implement such a large window. With a dependence-based prescheduler, the difficulty is halved: 16 FIFOs emulate an effective window of about 3 instructions. On the other hand, according to Figure 6, a data-flow prescheduler emulates a window of about 6 instructions with a line size of 1. In practice, part of this potential is consumed by the impact of wrong-path instructions, by the extra pipeline stages, and by cache misses degrading the accuracy of the predicted data-flow order. To better demonstrate the potential of data-flow prescheduling and give hints for future work, we performed a simple experiment whose results are shown in Figure 11. In this experiment, we remove the L1 data cache so that all loads directly access the L2 cache, and the prescheduler predicts that the load latency corresponds to an L2 cache access. We observe that this emphasizes the importance of a large issue buffer: a larger instruction window is needed to saturate the execution units. We also observe that we get the full potential of the data-flow prescheduler with an issue buffer of 16. This is because the data-flow order is now very accurate, as there are no longer L1 cache misses. More interestingly, the no-prescheduler curve now crosses the data-flow-prescheduler curve at an issue buffer size of 6, despite the impact of wrong-path instructions and extra pipeline stages. By predicting longer load latencies, we decrease the frequency of line overflows and allow more instructions to enter the prescheduler. In other words, predicting longer latencies enlarges the effective instruction window. Now, with the same 1-instruction line size, we are emulating an effective window larger than 6 instructions.

7. Conclusion and future work

The issue stage is one of the critical pipeline stages in modern out-of-order processors. The traversal time of the issue stage increases with the issue buffer size, which may prevent the implementation of large issue buffers. Data-flow prescheduling reorders instructions dynamically. The goal is to push instructions into the issue buffer in data-flow order rather than in sequential order. This makes it possible to reach the same IPC with a smaller issue buffer.

The implementation proposed in this study is only one point in the design space. In particular, we did not explore the possibility of introducing associativity in the preschedule buffer. Associativity might be useful for smoothing the prescheduler behavior. This concerns both the utilization of the preschedule buffer space and the definition of line overflows.

Prescheduling (or other techniques tackling the same problem) should be viewed as a way to tolerate long instruction latencies. A large instruction window is mainly needed when there is not enough instruction parallelism to saturate the execution units. This is often the case in code sections experiencing frequent data cache misses. The data-flow prescheduler we simulated predicts that all load latencies correspond to an L1 data cache hit. However, for applications with frequent cache misses, we would like the prescheduler to predict longer load latencies. As part of future work, it would be interesting to study how the memory hierarchy design could take advantage of the latency tolerance afforded by a prescheduler. In particular, hit-miss prediction techniques [17, 7] should be considered as part of the problem.

Future work should also address the problem of bypass latencies. In this study, we assumed a centralized instruction window feeding a compact pool of execution units. However, clustered architectures with restricted bypass networks are appearing. Data-flow prescheduling might also be interesting for those architectures.

References

[1] V. Agarwal, M.S. Hrishikesh, S.W. Keckler, and D. Burger. Clock rate versus IPC: the end of the road for conventional microarchitectures. In Proceedings of the 27th Annual International Symposium on Computer Architecture, 2000.
[2] M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th International Symposium on Microarchitecture, 1992.
[3] R. Canal and A. González. A low-complexity issue logic. In Proceedings of the 14th International Conference on Supercomputing, 2000.
[4] G. Chrysos and J. Emer. Memory dependence prediction using store sets. In Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998.
[5] D.S. Henry, B.C. Kuszmaul, G.H. Loh, and R. Sami. Circuits for wide-window superscalar processors. In Proceedings of the 27th Annual International Symposium on Computer Architecture, 2000.
[6] N.P. Jouppi and P. Ranganathan. The relative importance of memory latency, bandwidth, and branch limits to performance. Workshop on Mixing Logic and DRAM (ISCA 97), 1997.
[7] R.E. Kessler. The Alpha 21264 microprocessor. IEEE Micro, March 1999.
[8] D. Leibholz and R. Razdan. The Alpha 21264: a 500 MHz out-of-order execution microprocessor. In Proceedings of IEEE COMPCON, 1997.
[9] P. Michaud, A. Seznec, and S. Jourdan. Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 1999.
[10] P. Michaud, A. Seznec, and R. Uhlig. Trading conflict and capacity aliasing in conditional branch predictors. In Proceedings of the 24th Annual International Symposium on Computer Architecture, 1997.
[11] S. Palacharla, N.P. Jouppi, and J.E. Smith. Complexity-effective superscalar processors. In Proceedings of the 24th International Symposium on Computer Architecture, 1997.
[12] N. Ranganathan and M. Franklin. An empirical study of decentralized ILP execution models. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998.
[13] E. Rotenberg, Q. Jacobson, Y. Sazeides, and J. Smith. Trace processors. In Proceedings of the 30th International Symposium on Microarchitecture, 1997.
[14] J.E. Smith and A.R. Pleszkun. Implementation of precise interrupts in pipelined processors. In Proceedings of the 12th Annual International Symposium on Computer Architecture, 1985.
[15] S.T. Srinivasan and A.R. Lebeck. Load latency tolerance in dynamically scheduled processors. In Proceedings of the 31st Annual International Symposium on Microarchitecture, 1998.
[16] R. Uhlig, D. Nagle, T. Mudge, S. Sechrest, and J. Emer. Coping with code bloat. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, June 1995.
[17] A. Yoaz, M. Erez, R. Ronen, and S. Jourdan. Speculation techniques for improving load related scheduling. In Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999.

II. THE ALGORITHM. A. Depth Map Processing Lerning Plnr Geometric Scene Context Using Stereo Vision Pul G. Bumstrck, Bryn D. Brudevold, nd Pul D. Reynolds {pbumstrck,brynb,pulr2}@stnford.edu CS229 Finl Project Report December 15, 2006 Abstrct A

More information

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012 Dynmic Progrmming Andres Klppenecker [prtilly bsed on slides by Prof. Welch] 1 Dynmic Progrmming Optiml substructure An optiml solution to the problem contins within it optiml solutions to subproblems.

More information

9 Graph Cutting Procedures

9 Graph Cutting Procedures 9 Grph Cutting Procedures Lst clss we begn looking t how to embed rbitrry metrics into distributions of trees, nd proved the following theorem due to Brtl (1996): Theorem 9.1 (Brtl (1996)) Given metric

More information

Digital Design. Chapter 6: Optimizations and Tradeoffs

Digital Design. Chapter 6: Optimizations and Tradeoffs Digitl Design Chpter 6: Optimiztions nd Trdeoffs Slides to ccompny the tetbook Digitl Design, with RTL Design, VHDL, nd Verilog, 2nd Edition, by Frnk Vhid, John Wiley nd Sons Publishers, 2. http://www.ddvhid.com

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Pointwise convergence need not behave well with respect to standard properties such as continuity.

Pointwise convergence need not behave well with respect to standard properties such as continuity. Chpter 3 Uniform Convergence Lecture 9 Sequences of functions re of gret importnce in mny res of pure nd pplied mthemtics, nd their properties cn often be studied in the context of metric spces, s in Exmples

More information

Digital Signal Processing: A Hardware-Based Approach

Digital Signal Processing: A Hardware-Based Approach Digitl Signl Processing: A Hrdwre-Bsed Approch Roert Esposito Electricl nd Computer Engineering Temple University troduction Teching Digitl Signl Processing (DSP) hs included the utilition of simultion

More information

Presentation Martin Randers

Presentation Martin Randers Presenttion Mrtin Rnders Outline Introduction Algorithms Implementtion nd experiments Memory consumption Summry Introduction Introduction Evolution of species cn e modelled in trees Trees consist of nodes

More information

UNIT 11. Query Optimization

UNIT 11. Query Optimization UNIT Query Optimiztion Contents Introduction to Query Optimiztion 2 The Optimiztion Process: An Overview 3 Optimiztion in System R 4 Optimiztion in INGRES 5 Implementing the Join Opertors Wei-Png Yng,

More information

Exam #1 for Computer Simulation Spring 2005

Exam #1 for Computer Simulation Spring 2005 Exm # for Computer Simultion Spring 005 >>> SOLUTION

More information

INTRODUCTION TO SIMPLICIAL COMPLEXES

INTRODUCTION TO SIMPLICIAL COMPLEXES INTRODUCTION TO SIMPLICIAL COMPLEXES CASEY KELLEHER AND ALESSANDRA PANTANO 0.1. Introduction. In this ctivity set we re going to introduce notion from Algebric Topology clled simplicil homology. The min

More information

a < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1

a < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1 Mth 33 Volume Stewrt 5.2 Geometry of integrls. In this section, we will lern how to compute volumes using integrls defined by slice nlysis. First, we recll from Clculus I how to compute res. Given the

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

Caches I. CSE 351 Autumn Instructor: Justin Hsia

Caches I. CSE 351 Autumn Instructor: Justin Hsia L01: Intro, L01: L16: Combintionl Introduction Cches I Logic CSE369, CSE351, Autumn 2016 Cches I CSE 351 Autumn 2016 Instructor: Justin Hsi Teching Assistnts: Chris M Hunter Zhn John Kltenbch Kevin Bi

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

Engineer-to-Engineer Note

Engineer-to-Engineer Note Engineer-to-Engineer Note EE-204 Technicl notes on using Anlog Devices DSPs, processors nd development tools Visit our Web resources http://www.nlog.com/ee-notes nd http://www.nlog.com/processors or e-mil

More information

9 4. CISC - Curriculum & Instruction Steering Committee. California County Superintendents Educational Services Association

9 4. CISC - Curriculum & Instruction Steering Committee. California County Superintendents Educational Services Association 9. CISC - Curriculum & Instruction Steering Committee The Winning EQUATION A HIGH QUALITY MATHEMATICS PROFESSIONAL DEVELOPMENT PROGRAM FOR TEACHERS IN GRADES THROUGH ALGEBRA II STRAND: NUMBER SENSE: Rtionl

More information

Stack Manipulation. Other Issues. How about larger constants? Frame Pointer. PowerPC. Alternative Architectures

Stack Manipulation. Other Issues. How about larger constants? Frame Pointer. PowerPC. Alternative Architectures Other Issues Stck Mnipultion support for procedures (Refer to section 3.6), stcks, frmes, recursion mnipulting strings nd pointers linkers, loders, memory lyout Interrupts, exceptions, system clls nd conventions

More information

Fault injection attacks on cryptographic devices and countermeasures Part 2

Fault injection attacks on cryptographic devices and countermeasures Part 2 Fult injection ttcks on cryptogrphic devices nd countermesures Prt Isrel Koren Deprtment of Electricl nd Computer Engineering University of Msschusetts Amherst, MA Countermesures - Exmples Must first detect

More information

Performance Evaluation of Dynamic Reconfiguration in High-Speed Local Area Networks

Performance Evaluation of Dynamic Reconfiguration in High-Speed Local Area Networks Performnce Evlution of Dynmic Reconfigurtion in High-Speed Locl Are Networks Rfel Csdo, Aurelio Bermúdez, Frncisco J. Quiles, JoséL.Sánchez Depto. de Informátic Universidd de Cstill-L Mnch 271- Albcete,

More information

OPERATION MANUAL. DIGIFORCE 9307 PROFINET Integration into TIA Portal

OPERATION MANUAL. DIGIFORCE 9307 PROFINET Integration into TIA Portal OPERATION MANUAL DIGIFORCE 9307 PROFINET Integrtion into TIA Portl Mnufcturer: 2018 burster präzisionsmesstechnik gmbh & co kg burster präzisionsmesstechnik gmbh & co kg Alle Rechte vorbehlten Tlstrße

More information

a Technical Notes on using Analog Devices' DSP components and development tools

a Technical Notes on using Analog Devices' DSP components and development tools Engineer To Engineer Note EE-146 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

A Fast Imaging Algorithm for Near Field SAR

A Fast Imaging Algorithm for Near Field SAR Journl of Computing nd Electronic Informtion Mngement ISSN: 2413-1660 A Fst Imging Algorithm for Ner Field SAR Guoping Chen, Lin Zhng, * College of Optoelectronic Engineering, Chongqing University of Posts

More information

Allocator Basics. Dynamic Memory Allocation in the Heap (malloc and free) Allocator Goals: malloc/free. Internal Fragmentation

Allocator Basics. Dynamic Memory Allocation in the Heap (malloc and free) Allocator Goals: malloc/free. Internal Fragmentation Alloctor Bsics Dynmic Memory Alloction in the Hep (mlloc nd free) Pges too corse-grined for llocting individul objects. Insted: flexible-sized, word-ligned blocks. Allocted block (4 words) Free block (3

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-208 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

Control-Flow Analysis and Loop Detection

Control-Flow Analysis and Loop Detection ! Control-Flow Anlysis nd Loop Detection!Lst time! PRE!Tody! Control-flow nlysis! Loops! Identifying loops using domintors! Reducibility! Using loop identifiction to identify induction vribles CS553 Lecture

More information

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation Representtion of Numbers Number Representtion Computer represent ll numbers, other thn integers nd some frctions with imprecision. Numbers re stored in some pproximtion which cn be represented by fixed

More information