Extending STI for Demanding Hard-Real-Time Systems


Benjamin Welch, Shobhit Kanaujia, Adarsh Seetharam, Deepaksrivats Thirumalai, Alexander G. Dean
Center for Embedded Systems Research, North Carolina State University, Raleigh, NC
{sokanau, aseetha,

ABSTRACT
Software thread integration (STI) is a compilation technique which enables the efficient use of an application's fine-grain idle time on generic processors without special hardware support. With STI, a primary function (with real-time requirements on specific instructions) is automatically interleaved with a secondary function to create a single implicitly multithreaded function which minimizes context switching, and hence both improves performance and offers very fine-grain concurrency. In this paper we extend STI techniques to address two challenges. First, we reduce response time for interrupts or other high-priority threads by introducing polling servers into integrated threads. Currently, integrated threads disable interrupts, delaying all other work until their completion. Second, we enable integration with long host threads, expanding the domain of STI. With current techniques, if there are frequent interrupts, only host threads which can finish execution before the next interrupt can be integrated. We derive methods to evaluate the response time for threads in systems with and without these new integration methods. We demonstrate these concepts with the integration of various threads in a sample hard-real-time system on a highly constrained microcontroller. We use an inexpensive 20 MHz AVR 8-bit microcontroller to generate monochrome NTSC video while servicing a high-speed (115.2 kbaud) serial communication link. We have built and tested this system and demonstrate graphics rendering speed-ups of 3.99x to 13.5x.
Categories and Subject Descriptors
B.1.4 [Control Structures and Microprogramming]: Microprogram Design Aids: languages and compilers, optimization; C.3 [Special-Purpose and Application-Based Systems]: real-time and embedded systems; D.3.4 [Programming Languages]: Processors: code generation, compilers, optimization, run-time environments; D.4.1 [Operating Systems]: Process Management: concurrency, multitasking, scheduling, threads; D.4.7 [Operating Systems]: Organization and Design: real-time systems and embedded systems.

CASES'03, Oct. 30 to Nov. 2, 2003, San Jose, California, USA. Copyright 2003 ACM /03/000 $5.00.

General Terms
Algorithms, Design, Experimentation, Performance, Theory.

Keywords
Embedded systems, software thread integration, hardware-to-software migration, post-pass compiler, fine-grain concurrency, NTSC video, AVR, STIGLitz.

1. INTRODUCTION
1.1 Software Thread Integration
Software thread integration (STI) is a back-end compiler technique that provides fine-grain concurrency on generic processors by eliminating many context switches. By eliminating the need for special architectural features, it allows generic, low-cost processors to replace more expensive specialized devices. STI reduces the clock speed needed to implement a given functionality on a generic processor, saving money, power and energy, and simplifying design efforts.

[Figure 1 labels: hardware function; guest schedule (execution time requirements); idle time; real-time guest (primary) thread; host (secondary) thread; software thread integration; integrated thread with idle time reclaimed.]
Figure 1. Overview of hardware to software migration with STI. Idle time is statically filled at compile time with other useful work from the system.
Hardware to software migration (HSM) is often performed in industry to reduce costs and increase design flexibility. The economies of scale make microcontrollers much less expensive than most dedicated hardware circuits (e.g. protocol controllers).

STI simplifies hardware-to-software migration by squeezing more performance out of generic processors while insulating the programmer from implementation details. STI works by merging two functions into one implicitly multithreaded function, as shown in Figure 1 [8][9][10][11][12]. When used for real-time software, it enables the placement of time-critical instructions from one thread so they execute at a given time relative to the beginning of the integrated function, regardless of the control or data flow characteristics of either thread. STI begins with timing regularization: execution paths of uneven duration are padded to last the same amount of time. Next, either the register file is partitioned or def-use webs are extracted for later register reallocation. Code from one thread can then be moved to execute at a given time in the other, using code transformations such as code motion, replication, and loop peeling, splitting, guarding, unrolling and fusion. The transformations are driven by user-supplied timing directives which indicate when specific real-time instructions must execute. We have implemented a thread-integrating compiler, Thrint, which performs many of these analyses and transformations for the AVR [] architecture, which is 8-bit, load/store, and optimized for embedded C code.

Although the timing variability of modern high-performance CPUs and memory hierarchies greatly reduces the temporal determinism which STI requires, this is a non-issue: STI targets applications which neither need nor can afford these CPUs and memory systems. For perspective, in [...], [...]% of the 8 billion microprocessors sold were four- and eight-bit units [8]. These microcontrollers run applications which are not computationally intensive and do not need more parallelism or faster clock rates. They lack sophisticated microarchitectures and memory systems, and often cannot afford them. Instead, these applications are constrained by other issues such as functionality, cost, power dissipation, design time and the use of commercial off-the-shelf products. Hardware to software migration with STI allows the designer to address these issues efficiently.
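Timing regularization can be illustrated for a single conditional. The sketch below is a minimal illustration under stated assumptions, not Thrint's implementation: each path's cycle count (including its own branch instructions) is a hypothetical input, and every path is padded with single-cycle AVR `nop` instructions up to the length of the longest path, so the conditional takes the same time whichever path runs.

```c
#include <assert.h>
#include <stddef.h>

/* Pads every path of one conditional up to the longest path.
 * path_cycles: hypothetical cycle counts per path (branches included).
 * pad:         output, number of 1-cycle nops to append to each path.
 * Returns the common, regularized duration of the conditional. */
unsigned regularize_paths(const unsigned *path_cycles, size_t n_paths,
                          unsigned *pad)
{
    assert(path_cycles != NULL && pad != NULL);
    unsigned longest = 0;
    for (size_t p = 0; p < n_paths; p++)       /* find the longest path */
        if (path_cycles[p] > longest)
            longest = path_cycles[p];
    for (size_t p = 0; p < n_paths; p++)       /* nops needed on path p */
        pad[p] = longest - path_cycles[p];
    return longest;
}
```

For paths of 5, 9 and 7 cycles the function returns 9 and pads the paths with 4, 0 and 2 nops. Padding to the longest path trades execution time for timing determinism, which is the trade STI makes throughout.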
Each thread consists of one or more functions; it is the functions which are integrated. Previous STI work classifies threads to be integrated as either guest threads (moved from hardware into software, and hence containing the most time-critical instructions) or host threads (application threads which have always been software). When not referring to hardware-to-software migration, guest threads are called primary threads due to their fine-grain (internal) idle time, and host threads are called secondary threads due to their lack of fine-grain idle time.

1.2 Demanding Hard Real-Time Systems and Software Thread Integration
Figure 2 classifies the system's threads (including interrupt service routines (ISRs)) in two dimensions. The horizontal axis indicates the worst-case execution time (WCET) C of a thread, while the vertical axis shows its laxity L. The lower a thread is on the plot, the less its laxity and the sooner it must be executed. The farther to the right a thread is, the more processing it requires per instance. These are the threads from which one or more secondary functions for integration must be chosen.

Current STI methods make portions of this design space unreachable for two reasons. First, a function integrated with STI disables interrupts (and hence context switches) to ensure that all its instructions execute at the appropriate times. In turn, this can delay the execution of other threads, potentially resulting in missed deadlines, which will make hard real-time systems fail. This conflict limits the use of STI to systems whose threads have laxities of at least the worst-case duration of the integrated functions. In Figure 2, regions 2 and 4 show response times which can no longer be achieved when STI is applied.

[Figure 2 axes: WCET for application thread (horizontal); laxity, i.e. maximum latency allowed (vertical). Regions 1 to 4, with marked levels "worst case execution time of integrated thread" and "minimum primary thread period minus maximum primary thread work".]
Figure 2. Design space overview considers real-time requirements.
Second, existing STI methods rely upon integrating primary functions with secondary functions that are short enough to complete execution before the occurrence of the next primary function (whether periodically scheduled or triggered by an interrupt). In regions 3 and 4 of Figure 2, the next instance of the periodic primary thread arrives before the already-running instance completes its execution.

These constraints limit a designer's options for using existing STI methods. First, all interrupt service routines (and threads in general) can be delayed by the duration of the longest integrated function. This implies that if any threads are present in regions 2 and 4, STI cannot be used in the system. Second, integrated functions must run to completion to ensure the real-time instructions within them are executed on time. An integrated function consists of a primary function lasting up to C_Pr with some idle time I_Pr, and a secondary function lasting up to C_Sec. This implies that the integrated function will last at least C_Pr + C_Sec - I_Pr. The system designer must choose the secondary function(s) such that the integrated function's duration is shorter than the minimum period of either the primary or the secondary thread. With a frequently running primary thread, the number of valid secondary functions is dramatically reduced.

In this paper we present new methods which eliminate both of these constraints, allowing integration with threads in regions 2, 3 and 4. First, we show how to integrate polling servers to handle short-latency interrupts and so reduce response time. Second, we present methods which partition a secondary function into multiple shorter segments, each of which is integrated with the primary function, enabling longer secondary functions to be used.

The paper is organized as follows. Section 2 summarizes software thread integration. Section 3 presents our new methods based upon polling servers and thread partitioning. Section 4 applies the methods to a demanding hard real-time system: a 20 MHz 8-bit microcontroller which generates monochrome NTSC video with minimal hardware support. Section 5 analyzes the results of integration upon the system. Section 6 draws conclusions and presents future work.

2. STI OVERVIEW
Figure 1 presents an overview of how STI is used for hardware-to-software migration (HSM). A hardware function is replaced with software written by a programmer. This code consists of one or more guest threads (represented by the solid bar) with real-time requirements. When the threads are scheduled for execution on a sufficiently fast CPU, gaps will appear in the schedule of guest instructions, as illustrated by the white gaps in the black bar. These gaps are pieces of idle time which can be reclaimed to perform useful host work. STI recovers fine-grain idle time efficiently and automatically.

STI uses a control dependence graph (CDG, a subset of the program dependence graph [4]) to represent each function in a program. In this hierarchical graph, control dependence regions such as conditionals and loops are represented as non-leaf nodes, and assembly language instructions are stored in leaf nodes. Conditional nesting is represented vertically, while execution order is horizontal. The CDG is a good form for holding a program for STI because its hierarchy simplifies analysis and transformation. Program constructs such as loops and conditionals, as well as single basic blocks, are moved efficiently in a coarse-grain fashion, yet the transformations also provide fine-grain scheduling of instructions as needed.

Using STI for HSM involves moving guest code into the correct position within the host code for execution at the correct time. The guest function code is first appended to the end of each host function. The resulting function is then automatically integrated by moving guest nodes to the left in the CDG to locations which correspond to the target time ranges.
A tight target time range may fall completely within a host node, forcing movement down into that node or its subgraph. Before code motion, the host and guest threads are statically analyzed for timing behavior, with best and worst cases predicted. Hardware and software both conspire to make this a difficult problem in the general case. However, we focus on applications without recursion or dynamic function calls, and on processors without superscalar execution, virtual memory or variable-latency instructions. We assume locked caches or fast on-chip memory and predictable pipelining; these restrictions have little impact on the design space targeted. During integration, programmer-supplied timing directives guide integration. Timing jitter from uneven predicates in the host thread is automatically reduced (using padding instructions) to meet guest requirements. The CDG's structure makes the timing analysis and integration straightforward.

STI produces code which is more efficient than context-switching or busy-waiting: the processor spends fewer cycles performing overhead work. The price is expanded code memory. STI may duplicate code, unroll and split loops and add guard instructions. It may also duplicate both host and guest threads. The memory expansion can be reduced by trading off execution speed or timing accuracy. This flexibility allows STI transformations to be tailored to a particular embedded system's constraints. For more details please see [9] and related work.

3. NEW TECHNIQUES
The methods presented here use polling servers to support short-laxity secondary threads, and segmentation to enable integration of secondary functions with frequent, short primary functions. These extensions assume that the integrated threads have the highest priority in the system, are run soon enough to meet their deadlines, and are not preemptable (by other threads, the OS, or ISRs). Other than this, there are no restrictions on other threads, any OS/RTOS, or the scheduling policy.

3.1 Integrated Polling Servers

Table 1. Terms for Scheduling Analysis
Term       Definition
i (a-z)    Thread i
D_i        Deadline of thread i
T_i        Period of thread i
T_i^PS     Period of polling server for thread i
R_i        Response time of thread i
R'_i       Response time of thread i after integrating polling servers
C_i        Worst-case computation time of thread i (including fine-grain idle time Z_i)
C'_i       Worst-case computation time of thread i (including fine-grain idle time Z_i) after integrating polling servers
L_i        Laxity or slack time of thread i
I_i        Interference from higher-priority threads for thread i
N_i^PS     Number of polling servers for thread i
hp(i)      Set of threads with higher priority than thread i
t          Set of integrated threads before polling server integration
ps         Set of polling servers
Z_i        Idle time in thread i
Z_i^*      Minimum idle time in thread i, including minimum idle time between invocations

Table 1 presents the terms used in this analysis. When thread i is released, it completes on time (before its deadline D_i) only if its worst-case computation time C_i plus the maximum time I_i by which it can be delayed through interference from higher-priority threads hp(i) does not exceed D_i, i.e. C_i + I_i <= D_i. Integrated threads (t) cannot be preempted, hence they are all in hp(i) and can interfere with thread i:

$$ I_i \ge \sum_{k \in t} C_k $$

In a system with multiple integrated threads, this may raise response times to levels which make real-time performance unachievable. We solve this problem by integrating polling servers into these integrated threads.

A polling server is a periodic thread with relatively high priority which services an aperiodic thread in order to ensure a fixed response time. It consists of polling code and conditionally executed service (thread or ISR) code. If the laxity L_i of thread i is greater than an integrated thread's duration C_k, k ∈ t, then i can be deferred until k completes, and no polling server is needed. Otherwise, in order to ensure that an aperiodic thread is serviced soon enough to meet its deadline, a polling server for i is integrated at least every T_i^PS in each integrated thread. The maximum polling server period is T_i^PS = D_i - R_i. The response time R_i can be computed from the standard recurrence equation [2], in which estimate n+1 depends upon the previous estimate n, starting from R_i^0 = C_i:

$$ R_i^{n+1} = C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i^{n}}{T_j} \right\rceil C_j $$

Overlapping multiple polling servers within an integrated thread are scheduled in order of decreasing priority, leading to the delay (static preemption) of the lower-priority polling servers. After integration there will be one or more copies of the polling server for each thread i in the integrated thread k. Threads with little tolerance for delay (T_i^PS << D_i) will require many more polling server copies than those with more tolerance (T_i^PS ≈ D_i):

$$ N_i^{PS} = \left\lceil \frac{C_k}{T_i^{PS}} \right\rceil $$

Integration requires reducing timing jitter (typically through nop padding) to ensure that upcoming primary thread instructions execute on time. This implies that the polling server code must be padded to last a constant time regardless of whether the service is actually needed or not. As a result, the polling server lasts C_i in both cases and is not bandwidth-preserving. The execution time of the integrated thread k then rises to:

$$ C'_k = C_k + \sum_{i \in ps} N_i^{PS} C_i $$

The interference time also rises, as polling servers for higher-priority threads will affect thread i regardless of whether those threads are released. The servers run more often than the threads they serve and always last the worst-case time C.
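The recurrence and the polling-server sizing rules above can be sketched in C. This is a minimal illustration under stated assumptions: times are integers in a common unit (tenths of a microsecond in the usage example), every period is positive and C_i > 0, and the struct and function names are hypothetical. Because each interfering task is passed together with the period used in its ceiling term, the same loop can also evaluate the post-integration recurrence R'_i given next, by passing each higher-priority polling server with its period T^PS and every other interfering thread with its own period.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long C;  /* worst-case computation time */
    unsigned long T;  /* period used in the ceiling term (> 0) */
} interferer_t;

static unsigned long ceil_div(unsigned long a, unsigned long b)
{
    return (a + b - 1) / b;
}

/* Iterates R^{n+1} = C_i + sum_j ceil(R^n / T_j) * C_j from R^0 = C_i.
 * The sequence never decreases, so it either reaches a fixed point
 * (returned as R_i) or passes D_i (returns 0: not schedulable). */
unsigned long response_time(unsigned long C_i, unsigned long D_i,
                            const interferer_t *hp, size_t n_hp)
{
    assert(C_i > 0);                        /* keeps 0 free as a sentinel */
    unsigned long R = C_i;
    for (;;) {
        unsigned long next = C_i;
        for (size_t j = 0; j < n_hp; j++)
            next += ceil_div(R, hp[j].T) * hp[j].C;
        if (next > D_i)
            return 0;                       /* deadline cannot be guaranteed */
        if (next == R)
            return R;                       /* fixed point reached */
        R = next;
    }
}

/* Maximum polling-server period T_PS = D_i - R_i. */
unsigned long ps_period(unsigned long D_i, unsigned long R_i)
{
    assert(R_i <= D_i);
    return D_i - R_i;
}

/* Copies of thread i's server in integrated thread k: ceil(C_k / T_PS),
 * or none if the laxity L_i already covers the whole of C_k. */
unsigned long ps_copies(unsigned long L_i, unsigned long C_k,
                        unsigned long T_PS)
{
    if (L_i > C_k)
        return 0;                           /* thread i can wait for k */
    assert(T_PS > 0);
    return ceil_div(C_k, T_PS);
}

/* C'_k = C_k + sum over servers of N_PS * C (servers padded to full C). */
unsigned long integrated_duration(unsigned long C_k,
                                  const unsigned long *n_ps,
                                  const unsigned long *C_ps, size_t n_srv)
{
    unsigned long total = C_k;
    for (size_t s = 0; s < n_srv; s++)
        total += n_ps[s] * C_ps[s];
    return total;
}
```

With Table 2's figures in tenths of a microsecond, a receive thread with C = 42 and D = 868 that must wait behind a non-preemptable 16.7 ms refresh (C = 167000, T = 167200) is rejected on the first iteration, which is the situation integrated polling servers are meant to fix.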
The response time below includes three summations, which consider the polling servers which service higher-priority threads, all integrated threads (not including the polling servers), and the remaining higher-priority threads which do not use integrated polling servers:

$$ R'^{\,n+1}_i = C_i + \sum_{l \in hp(i) \cap ps} \left\lceil \frac{R'^{\,n}_i}{T_l^{PS}} \right\rceil C_l + \sum_{k \in t} \left\lceil \frac{R'^{\,n}_i}{T_k} \right\rceil C_k + \sum_{l \in hp(i) \setminus (ps \cup t)} \left\lceil \frac{R'^{\,n}_i}{T_l} \right\rceil C_l $$

Once laxity drops much below the duration of a secondary thread, both the code size and the secondary thread's execution time suffer. The execution time also rises as the duration of the polling server rises, emphasizing the importance of minimizing the duration of these high-priority threads, which are typically implemented as interrupt service routines. There has been much work extending polling servers (e.g. the deferred server and the sporadic server [20]); investigating which of these are compatible with STI's static scheduling characteristics is left for future work.

3.2 Partitioning Long Secondary Threads into Segments
A primary thread which runs frequently will not be able to provide much idle time for a secondary thread to execute, as the integrated thread must complete before the next instance of the primary thread (as shown in Figure 3). Because integrated threads are not preemptable, the maximum computation time C_Sec of a secondary thread must not exceed Z*, the available idle time per period, where T_cocall is the duration of a coroutine call:

$$ C_{Sec} \le Z^*, \qquad Z^* = Z + (T - C) - 2\,T_{cocall} $$

Our solution is to partition the secondary thread into segments and integrate each segment with the primary thread. Coroutine calls and a separate stack frame for the integrated thread preserve live variables across its successive invocations, until the completion of the secondary thread. Each invocation of the integrated thread executes one segment of the secondary thread.

[Figure 3 labels: occurrences of the primary task at T and 2*T; C, WCET including fine-grain idle time Z; T, period or minimum interarrival time; T - C; Z*, maximum time available for secondary thread; 2*T_CS, context-switching delay.]
Figure 3. Frequently executing primary thread has limited idle time, limiting duration of potential secondary thread for integration.
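The per-period budget Z* and the greedy segment-growth rule described in the following paragraphs can be sketched in C. This is a simplified, hypothetical variant, not the Appendix A algorithm: the secondary function is a flat sequence of nodes with known durations, conditionals are assumed already padded to one duration, and nesting, unknown-duration nodes and coroutine-call placement are omitted. Loops are split by whole iterations to fill, but not exceed, the remaining idle time.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long body;   /* cycles of a block, or of one loop iteration */
    unsigned long iters;  /* 0 for a straight-line block, else iterations */
} node_t;

/* Z* = Z + (T - C) - 2*T_cocall, clamped at zero. Assumes T >= C. */
unsigned long z_star(unsigned long Z, unsigned long T, unsigned long C,
                     unsigned long t_cocall)
{
    assert(T >= C);
    unsigned long avail = Z + (T - C);
    return avail > 2 * t_cocall ? avail - 2 * t_cocall : 0;
}

/* Returns the number of segments of at most zs cycles each, or 0 if a
 * block or a single loop iteration is longer than zs. */
size_t form_segments(const node_t *nodes, size_t n, unsigned long zs)
{
    size_t segments = 1;
    unsigned long left = zs;                /* idle time left in segment */

    for (size_t k = 0; k < n; k++) {
        if (nodes[k].body == 0)
            continue;                       /* nothing to schedule */
        if (nodes[k].body > zs)
            return 0;                       /* would need finer splitting */
        if (nodes[k].iters == 0) {          /* block: fits or opens a segment */
            if (nodes[k].body > left) {
                segments++;
                left = zs;
            }
            left -= nodes[k].body;
        } else {                            /* loop: split across segments */
            unsigned long remaining = nodes[k].iters;
            while (remaining > 0) {
                unsigned long fit = left / nodes[k].body;
                if (fit == 0) {             /* segment full: start another */
                    segments++;
                    left = zs;
                    continue;
                }
                if (fit > remaining)
                    fit = remaining;
                left -= fit * nodes[k].body;
                remaining -= fit;
            }
        }
    }
    return segments;
}
```

For a budget of 100 cycles, a 40-cycle block, a six-iteration loop of 25-cycle iterations and a 30-cycle block need three segments; the loop is split two and four iterations across the first two segments.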
Segments are formed by traversing the CDG of the secondary thread function. Subroutine calls are not supported, as it might be necessary to integrate primary code with the called function. Unstructured or irreducible code (strongly connected components with multiple entries [7]) is not supported, as it introduces timing variability; future work will use node splitting and padding to correct this. In the first step of segment formation, the conditionals in the secondary thread function are padded so each case has the same duration; however, this padding is not performed for loops with unknown iteration counts. Second, the CDG is split into one or more segments, each of which completes within the available idle time. Third, a coroutine call is placed at the end of each segment. Fourth, the primary thread is integrated with each segment using existing STI techniques.

The algorithm for segment formation appears in Appendix A and is applied hierarchically. A new segment is created, its available idle time is set to the maximum available (Z*), and the first node in the function is examined. If the node's duration is less than the available idle time, the node is added to the segment and the method is applied to the successor node. The segment grows until the next node does not fit. If the node's duration is known but too long, there are two possible actions. If the node is a conditional, a new segment is started for each case (true and false). If the node is a loop, it is split to fill but not exceed the remaining idle time. If the duration of the node is unknown, then its type is examined. For a predicate, new segments are started with the successor node and also with each condition. For a loop, a new segment is started with the successor node and also with the first child in the loop body.

Previous work [] integrates primary and secondary loops by creating a fused loop and two clean-up loops. The fused version matches secondary work to the available idle time in the primary loop body through unrolling and padding. The fused loop executes while both primary and secondary work exist. The second loop (primary clean-up) finishes any remaining primary iterations, padded with nops, and the third loop (secondary clean-up) finishes any remaining secondary iterations. These techniques are now modified so that if the fused loop terminates because the primary loop iterations have been completed, but enough secondary loop iterations remain to execute the segment again, the segment will repeat. Recall that the segment begins at the start of the loop body. This is implemented by saving the current (not the next) segment's start address when performing the coroutine call. These transformations enable long secondary threads to be partitioned into segments short enough to be integrated with frequent primary threads which have little idle time.

3.3 Combining Integration of Polling Servers with Partitioned Threads
In order to combine integrated polling servers with partitioned long secondary threads, the following approach is used. First, the polling servers are integrated with the primary thread. This fills the first C_i of idle time with the polling server for thread i. Polling servers for multiple threads are integrated sequentially. The remaining idle time then determines the segment time used to partition the long secondary threads. In this fashion the polling servers are executed each time the integrated thread is run, regardless of which segment executes.

4. EXPERIMENTAL METHOD
4.1 Target Application

[Figure 4: one scan line; front porch 1.5 usec, horizontal sync 4.7 usec, back porch 4.7 usec, active video 52.6 usec.]
Figure 4. Video portion of monochrome NTSC video signal.

To demonstrate the benefits of STI for HSM we use an NTSC video refresh controller application for driving a display. We replace a complex video generator chip with simple, inexpensive hardware and software-implemented functionality, as shown in Figure 5. NTSC video signal generation represents a large application domain, with video outputs present in consumer electronic devices such as DVD players, digital cameras, camcorders, and video games. Video overlay, or on-screen display, is a related function in which locally generated graphics are overlaid atop the incoming video signal. This overlaying requires precise timing analysis of that video signal's synchronization information to selectively replace pixels with frame-buffer contents. It appears in televisions and in hospital and security monitoring systems. Picture-in-picture display is also related, but requires actual sampling, buffering and resizing of multiple incoming video streams, and hence requires much more processing. From a computational perspective, the generation and overlay threads are quite similar, and we have developed on-screen display software and hardware [6] based upon the work presented here. This paper's application can easily be applied to video overlay.

The processor must generate an NTSC-compatible monochrome video signal; Figure 4 summarizes the video portion of the signal. Although the CRT's electron beam scans 525 times per frame (in two interlaced passes (fields) per 33.3 ms frame), only 494 rows are visible and require video data. There are additional features in a video signal (vertical sync and equalization pulses); these are generated by our software as well. The video data portion of the signal is the most demanding, as a pixel of video data must be generated every 200 ns (for 256 pixels per row). We use an external shift register to serialize a byte packed with data, reducing the processor loading.
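The timing figures in this section follow from simple arithmetic, sketched below with the constants from Figure 4 and the text. The function names are illustrative; the four-pixels-per-byte figure assumes the two-bit-per-pixel format used by the system.

```c
#include <assert.h>

/* Segment durations of one NTSC scan line, from Figure 4 (ns). */
#define FRONT_PORCH_NS   1500UL
#define HSYNC_NS         4700UL
#define BACK_PORCH_NS    4700UL
#define ACTIVE_VIDEO_NS 52600UL

/* One scan line: 1.5 + 4.7 + 4.7 + 52.6 us = 63.5 us. */
unsigned long line_period_ns(void)
{
    return FRONT_PORCH_NS + HSYNC_NS + BACK_PORCH_NS + ACTIVE_VIDEO_NS;
}

/* Time for a number of scan lines; 525 lines give one 33.3 ms frame. */
unsigned long lines_time_ns(unsigned long lines)
{
    return lines * line_period_ns();
}

/* Pixel time, rounded down: 52,600 ns / 256 = 205 ns (about 200 ns). */
unsigned long pixel_time_ns(unsigned long pixels_per_row)
{
    assert(pixels_per_row > 0);
    return ACTIVE_VIDEO_NS / pixels_per_row;
}

/* CPU cycles between output bytes: pixel time * pixels per byte, in
 * cycles of a cpu_mhz clock (4 * 200 ns = 800 ns = 16 cycles at 20 MHz). */
unsigned long cycles_per_byte(unsigned long pixel_ns,
                              unsigned long pixels_per_byte,
                              unsigned long cpu_mhz)
{
    return pixel_ns * pixels_per_byte * cpu_mhz / 1000UL;
}
```

The 16-cycle byte slot is what rules out context switching: on the AVR a single call and return alone already cost several cycles of that budget.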
With a 20 MHz CPU, sending one packed byte (four two-bit pixels) every 800 ns corresponds to 16 clock cycles per byte, which is too frequent for context switching or dynamic scheduling. A digital-to-analog converter (DAC) converts the serialized pixels from the data byte to an analog voltage for the NTSC output. Our system generates a monochrome 256 x 254 pixel image with two bits per pixel, although resolutions of up to 512 x 525 with 1 bit per pixel are possible with minor modifications.

4.2 Experimental Platform
4.2.1 Hardware

[Figure 5 blocks: ATmega128 MCU; MCU clock; latch; 64 kbyte SRAM; pixel clock divider; byte clock divider; two 4-bit shift registers (shift, load, clear, sync); NTSC video out.]
Figure 5. Block diagram of overall circuit.

Our experimental platform, called STIGLitz [3], provides a library of graphics primitive rendering functions integrated with video refresh code and high-speed serial communications, using very simple hardware. We target the Atmel AVR architecture, which features an 8-bit native word size, 32 general-purpose registers, and limited support for 16-bit operations. Three register pairs can be used as index registers, speeding memory access. The ATmega128 processor [] is inexpensive (about $3 in volume) and provides 128 kilobytes of Flash program memory, 4 kilobytes of on-board data SRAM and numerous peripherals. The CPU core features a two-stage pipeline; most instructions take one cycle, but some take more (branches are 2 cycles, multiplies 2, calls 5, returns 4, loads 2-3 and stores 2-3). Data memory is byte-accessible and byte-aligned, and there is no cache. An Atmel STK500 evaluation board and STK501 processor expansion card are used to execute the integrated code. These are clocked at 20 MHz. 64 kilobytes of external SRAM are used as well, with a one-cycle performance penalty, so loads and stores take three cycles. The C compiler used is GCC 3.2 [3]. No operating system is used, although STI does not preclude the use of one.

Figure 6. Photograph of video serializer board.

Every 16 cycles the circuit shown in Figure 5 and Figure 6 samples the data byte present on an MCU output port, serializes it into four pixels (using two four-bit shift registers) and converts it into an NTSC-compliant voltage. This hardware relieves the MCU of shifting out individual pixels, but still leaves it with the responsibility of ensuring that the video data is present on its output port at the appropriate time. The shift registers are clocked by the pixel clock divider and are loaded by the byte clock divider. We have built and tested this circuit and it works correctly.

4.3 Software
4.3.1 Overview and Software Architecture

[Figure 7 labels: one 63.5 us scan line (HSync, back porch, video, front porch); NTSC ISR; dispatcher; integrated function or PumpPixel_UART; video refresh; polling servers (Tx, Rx); graphics primitives rendering; Service_DrawHLine_Queue, Service_DrawXLine_Queue, Service_DrawDiagonalLine_Queue.]
Figure 7. NTSC ISR calls dispatcher, which calls or resumes an integrated thread to render most of video data and service the UART.

Figure 7 shows a timeline of processor activity during the generation of a video scan line and its relationship to portions of the NTSC signal. A periodic timer-based interrupt triggers an ISR which generates the video signal. Its two responsibilities are to draw a full field (which takes 16.7 milliseconds and occurs 60 times per second) and to generate the equalization pulses of the vertical synchronization signal (Figure 4). The ISR examines the queues and selects one of the integrated functions (if data is present in the queue) or else a dedicated busy-wait refresh function. Figure 9 presents an overview of which integrated threads are available and how they are composed. The chosen thread then reads video data from the frame buffer in memory and sends it out to the CRT through the DAC. The ISR consumes 98.2% of the processor's time.

Serial communication at 115.2 kbaud is handled by a UART, two circular queues, and routines for enqueueing and dequeuing data. Figure 8 presents an overview of the software constructs which allow the application to gather work (i.e. rendering graphics primitives) to perform during video refresh. The application program can specify whether rendering work is to be performed immediately or deferred. In the latter case, parameters for each deferred primitive are saved in the appropriate deferred work queue.

[Figure 8 labels: graphics application; graphics primitive arguments; other graphics primitive rendering functions; rendering pixels; frame buffer; DrawXLine, DrawHLine, DrawVLine and DrawCircle, each integrated with PumpPixel_UART; dispatcher in periodic ISR selects one of these functions to refresh display; refresh pixels; output port and video digital-to-analog converter; NTSC video out.]
Figure 8. Overview of software architecture for STIGLitz. Deferrable graphics work is enqueued for later rendering during video refresh.

[Figure 9 labels: integrated functions with polling servers, each composed of PumpPixel_UART and numbered segments of a secondary thread: DrawHLine_PumpPixel_UART, DrawDLine_PumpPixel_UART, DrawXLine_PumpPixel_UART.]
Figure 9. Overview of integrated threads, including original threads and sequence of integration.
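The interplay of the deferred work queue, the dispatcher and a partitioned secondary thread can be sketched on a host PC. This is a hypothetical illustration, not STIGLitz code: a ring buffer holds horizontal-line requests, and the resume point of the current line is kept in a static struct between scan lines, standing in for the coroutine call and separate stack frame of Section 3.2. On the real system each segment is interleaved with PumpPixel_UART; here a segment simply renders a fixed number of pixels. All names and sizes are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QLEN       8u   /* ring capacity (power of two, so wraparound works) */
#define SEG_PIXELS 16u  /* pixels rendered per segment (illustrative) */
#define WIDTH      64u  /* illustrative frame size, one byte per pixel */

typedef struct { uint8_t x0, x1, y; } hline_t;  /* requires x0 <= x1 */

static hline_t queue[QLEN];
static unsigned head, tail;          /* head: next to pop, tail: next to push */
static uint8_t frame[WIDTH][WIDTH];

/* Saved state of the partitioned secondary thread between invocations. */
static struct { bool active; hline_t job; unsigned x; } seg;

/* Deferred rendering request, as made by the application. */
bool enqueue_hline(hline_t h)
{
    assert(h.x1 < WIDTH && h.y < WIDTH);
    if (tail - head == QLEN)
        return false;                /* queue full */
    queue[tail++ % QLEN] = h;
    return true;
}

/* One integrated-thread invocation: run one segment, then "yield". */
static void draw_hline_segment(void)
{
    unsigned n = 0;
    while (seg.x <= seg.job.x1 && n < SEG_PIXELS) {
        frame[seg.job.y][seg.x++] = 1;
        n++;
    }
    if (seg.x > seg.job.x1) {        /* line finished: retire request */
        seg.active = false;
        head++;
    }
}

/* Called once per scan line by the (simulated) video ISR. Returns true
 * if an integrated function ran, false if plain refresh would be used. */
bool dispatch(void)
{
    if (!seg.active) {
        if (head == tail)
            return false;            /* no deferred work */
        seg.job = queue[head % QLEN];
        seg.x = seg.job.x0;
        seg.active = true;
    }
    draw_hline_segment();            /* resumes where the last segment left off */
    return true;
}
```

A 40-pixel line is therefore rendered over three scan lines (16, 16 and 8 pixels), after which the dispatcher falls back to plain refresh until new work is queued.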

4.3.2 Real-Time Workload Analysis
The real-time threads appear in Table 2. Without integration, the UART service code could be delayed for the entire duration of the video refresh ISR, or over 16 ms. In this case most incoming data on the UART will be lost due to preemption by the video refresh. Similarly, the effective transmission speed will be a fraction of the speed possible, as the transmit ISR will only be serviced between video refresh processing. With integration of the polling servers the response time drops to 63.5 us, guaranteeing proper UART operation at full speed.

Table 2. STIGLitz Real-Time Thread Set for 20 MHz MCU
Columns: Video Refresh (highest priority); UART Rx = UART Receive (115.2 kbaud); UART Tx = UART Transmit (115.2 kbaud, lowest priority).
Term  Definition                                                 Video Refresh  UART Rx    UART Tx
D     Deadline                                                   [...] ms       86.8 us    86.8 us
T     Period                                                     16.72 ms       86.8 us    86.8 us
C     Worst-case computation time (incl. fine-grain idle time)   16.7 ms        4.2 us     1.6 us
I     Interference from higher-priority threads                  n/a            16.72 ms   [...] ms
L     Laxity or slack time                                       n/a            82.6 us    85.2 us
R     Maximum response time                                      n/a            [...] ms   [...] ms
R'    Maximum response time after integrating polling servers    n/a            63.5 us    63.5 us

4.3.3 Primary Thread Function
The function PumpPixel (Figure 10) is called once per scan line (263 times per video refresh interrupt) to send out a row of video data from the frame buffer. The processing time is 7 cycles per byte, while idle time varies with display resolution, bit depth and processor clock speed. With a 256-pixel-wide, two-bit-per-pixel display, a byte must be sent out every 16 cycles, so the idle time is 9 cycles per byte. However, integration may involve loop unrolling, which will reduce the loop overhead (3 cycles) and raise the idle time toward its bound of 12 cycles per byte. The idle time remaining within PumpPixel is thus between 567 cycles (28.35 us) and 756 cycles (37.8 us) for 64 bytes of video data, i.e. 63 inter-byte gaps of 9 to 12 cycles each.
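The cycle counts above can be checked with a short calculation. This sketch assumes, as the figures imply, that idle time is counted only in the 63 gaps between the 64 bytes of a row, and that a UART character occupies 10 bit times (8N1 framing is an assumption; the text gives only the 86.8 us result). Times are kept in hundredths of a microsecond so the arithmetic stays exact; the names are illustrative.

```c
#include <assert.h>

#define SLOT_CYCLES   16u  /* one byte every 16 cycles at 20 MHz */
#define WORK_ROLLED    7u  /* ld (3) + out (1) + dec (1) + taken brne (2) */
#define LOOP_OVERHEAD  3u  /* dec + taken brne, removed by full unrolling */

/* Idle cycles in one PumpPixel call; only gaps between bytes count. */
unsigned idle_cycles(unsigned bytes, int fully_unrolled)
{
    assert(bytes > 0);
    unsigned work = fully_unrolled ? WORK_ROLLED - LOOP_OVERHEAD
                                   : WORK_ROLLED;
    return (bytes - 1u) * (SLOT_CYCLES - work);
}

/* Cycles at mhz MHz, in hundredths of a microsecond (truncated). */
unsigned long cycles_to_hundredth_us(unsigned long cycles, unsigned long mhz)
{
    return cycles * 100UL / mhz;
}

/* Time of one character of `bits` bit times at `baud`, in 0.01 us. */
unsigned long char_time_hundredth_us(unsigned long baud, unsigned long bits)
{
    return bits * 100000000UL / baud;
}
```

At 20 MHz this reproduces the 567-cycle (28.35 us) and 756-cycle (37.80 us) budgets, and a 115.2 kbaud character comes out at 86.80 us, leaving the 82.6 us receive laxity of Table 2 once the 4.2 us receive service time is subtracted.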
    unsigned char *FBptr;
    unsigned char i;

    void PumpPixel(void) {
        FBptr = CurFrameBuffer;
        i = FrameBufferWidth;
        do {
            PORTE = *FBptr++;
        } while (--i);
    }

    PP_Init:
        ldi  r26, lo8(framebuffer)
        ldi  r27, hi8(framebuffer)
        lds  r4, framebufferwidth  ; row width in bytes
    PP_Loop:
        ld   r5, x+                ; 3 cy
        out  PORTE, r5             ; 1 cy
        dec  r4                    ; 1 cy
        brne PP_Loop               ; 1-2 cy

Figure 10. PumpPixel function body in C and assembly code.

4.3.4 Secondary Thread Functions

Table 3. Secondary Thread Function Sizes
Function             Lines of C Code   Compiled Size (.text, bytes)
DrawCircle
DrawXMajorLine
DrawHorizontalLine
DrawYMajorLine
DrawVerticalLine
DrawSprite
DrawDiagonalLine
DrawSpriteOVR
UART_Rx              3                 94
UART_Tx              10                46

The idle time within a single invocation of PumpPixel is too short for most graphics rendering primitives. For example, rendering an 80-pixel-long x-major line takes 435 us. DrawLine is split into five sub-functions based on line type in order to simplify integration. In order to be able to use STI, we partition the frequently executed graphics primitives DrawLine, DrawCircle, DrawSprite and DrawSpriteOVR. The sizes of these functions are presented in Table 3. The UART polling servers, previously described in Table 2, have no fine-grain (internal) idle time and so are secondary threads.
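A rough lower bound shows why partitioning is unavoidable here. The sketch below, an illustration rather than the method used to form the actual segments, divides the rendering time of a primitive by the idle time one PumpPixel invocation offers. It ignores polling servers, coroutine-call overhead and fragmentation at segment boundaries, so real segment counts can only be equal or higher. Units are hundredths of a microsecond.

```c
#include <assert.h>

/* Minimum number of PumpPixel invocations (scan lines) needed to
 * complete work_hundredth_us of rendering when each invocation offers
 * at most idle_hundredth_us of idle time: ceil(work / idle). */
unsigned long min_invocations(unsigned long work_hundredth_us,
                              unsigned long idle_hundredth_us)
{
    assert(idle_hundredth_us > 0);
    return (work_hundredth_us + idle_hundredth_us - 1) / idle_hundredth_us;
}
```

For the 435 us x-major line, the bound is 12 invocations with the unrolled 37.8 us budget and 16 with the 28.35 us budget, so even in the best case the line spans a dozen scan lines.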

4.3.5 Impact of Register File Partitioning upon Performance
STI requires that all threads share the register file, so high register pressure could lead to more spilling and filling and hence less efficient code. The optimal method for sharing the register file is to perform register allocation [7] after integration, allowing each thread to use registers for live values only. However, this could lead to spill and fill code which would disrupt timing. The alternative we use is to partition the register file using avr-gcc and allocate specific registers to each thread statically before integration. This may reduce performance but eliminates post-integration timing perturbations.

One common characteristic of embedded processors is a small and irregular register set, which complicates the task of register allocation. Much research has attempted to improve register allocation for these architectures, building upon allocation methods for general CPUs and DSPs [5][5][5][9]. Although the AVR architecture features 32 registers, their use is irregular. Various classes of registers arise due to limitations such as register pairing for word instructions, implicit result registers for certain instructions, pointer registers and immediate operand constraints. This irregularity may lead to reduced performance after register allocation.

For simplicity we divide the AVR registers into three classes for our sensitivity analysis. Pointer registers (r26-r31) can be used as address pointers and operate on immediate operands. Immediate registers (r16-r25) can use immediate operands. Other registers (r0-r15) can be used only for register-register and I/O operations. We examine the sensitivity of the host threads' execution time to the number of registers available by decreasing the number of registers in a given class available to the compiler's register allocator through gcc's -ffixed-reg option. Each of the three host functions is called 100 times with varying parameters by a test harness to render a series of graphics primitives.
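The three-class split and the option list for one experiment can be sketched in C. This is a hypothetical helper, not part of the authors' toolchain: it classifies AVR register numbers as in the text and builds gcc `-ffixed-<reg>` options that withhold `count` registers of one class from the allocator. Choosing the highest-numbered registers of the class is an assumption; avr-gcc already reserves r0 (temporary) and r1 (zero register) and uses r28:r29 as the frame pointer, so not every combination will compile.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { REG_OTHER, REG_IMMEDIATE, REG_POINTER } reg_class_t;

/* Register classes used in the sensitivity analysis. */
reg_class_t reg_class(int r)
{
    if (r >= 26) return REG_POINTER;    /* r26-r31: X, Y and Z pairs */
    if (r >= 16) return REG_IMMEDIATE;  /* r16-r25 */
    return REG_OTHER;                   /* r0-r15 */
}

/* Writes space-separated -ffixed-rN options for up to `count` registers
 * of class `cls` (highest-numbered first) into buf; returns how many
 * were written. Stops early if buf is too small. */
int ffixed_flags(reg_class_t cls, int count, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    int excluded = 0;
    buf[0] = '\0';
    for (int r = 31; r >= 0 && excluded < count; r--) {
        if (reg_class(r) != cls)
            continue;
        char opt[16];
        snprintf(opt, sizeof opt, "%s-ffixed-r%d", excluded ? " " : "", r);
        if (strlen(buf) + strlen(opt) + 1 > len)
            break;                      /* no room for this option */
        strcat(buf, opt);
        excluded++;
    }
    return excluded;
}
```

Excluding "two pointer registers" then yields `-ffixed-r31 -ffixed-r30`, i.e. one register pair, the largest pointer exclusion the text reports as still compiling.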
Execution time is measured using an on-chip timer/counter, with all interrupts other than the timer/counter overflow disabled. Compilation fails when more than one pair of pointer registers is excluded. Figure 11 shows how host function execution time is affected by register reductions in each of the three classes. DrawSprite is most sensitive to reductions in immediate or other registers, while DrawLine and DrawCircle are largely insensitive. Based on these measurements, we exclude eight other registers and two pointer registers when compiling DrawLine and DrawCircle, but only one other register and two pointer registers when compiling DrawSprite. This accelerates context switching for DrawLine and DrawCircle (reducing overhead) without slowing DrawSprite's execution.

(Plot: normalized run time versus number of registers excluded, for DrawSprite, DrawCircle and DrawLine, each with immediate, pointer or other registers excluded.)
Figure 11. Execution times for graphics functions rise as fewer registers are available, reflecting varying register pressures.

4.4 Integration
Thread integration is performed as follows. First, the secondary thread functions are prepared for integration by compiling them with gcc and excluding various registers as described above. Second, Thrint is used to pad away timing jitter in conditionals with nops and to create time-annotated CDGs to guide integration. The remaining steps are summarized in Figure 9. Third, the polling servers are manually integrated with the PumpPixel function to create PumpPixel_UART. Fourth, segments are manually formed for the secondary thread functions, using the timing information previously derived. Fifth, PumpPixel_UART is manually integrated with each segment of each secondary thread function. The resulting code can then be assembled, linked and downloaded.

5. ANALYSIS
We evaluate the timing accuracy of the integrated code through empirical testing; the video signal successfully drives two different types of television sets.
The timing of the packed video byte from the microcontroller is evaluated with a digital sampling oscilloscope to ensure its correctness. Serial communication at 115.2 kbaud with a PC is performed correctly. The oscilloscope screen shot in Figure 12 demonstrates the proper simultaneous operation of video generation with serial data reception and transmission. We measure the performance of the original discrete rendering and the new integrated rendering using debugging signals and the oscilloscope.

Figure 12. Oscilloscope shows simultaneous high-speed serial transmission and reception during video signal generation. Traces: serial receive data, serial transmit data, sync out, video out.

5.1 Rendering performance of integrated code
Figure 13 shows the processor utilization for our system under various conditions. The leftmost bar (No Integration) shows that before applying STI, the MCU spends most of its time refreshing the display or executing the nops between video output instructions. This leaves only 0.36 MIPS for foreground processing. The four bars to the right show processor utilization when rendering various types of lines. STI reclaims large amounts of idle time, providing 1.3 to 4.5 MIPS of line rendering and 2.1 MIPS of serial communication processing instead. Some time is wasted in the dispatcher or context switching, and some is lost because STI integration is not completely efficient when dealing with unpredictable loops.

(Stacked bar chart: MIPS used for No Integration and for integrated horizontal, vertical, diagonal and x-major line rendering, broken down into wasted capacity, integrated rendering, UART polling servers, dispatcher with context switching, display refresh & sync, and foreground processing.)
Figure 13. Processor utilization for video signal generation by STIGLitz. Without integration, 12 MIPS of idle time is trapped in fragments only 9 cycles long and is unusable for other processing.

(Bar chart: normalized performance (1/time) of discrete and integrated rendering for H-Line, V-Line, D-Line and X-Major-Line.)
Figure 14. Integration leads to speed-ups of 3.99x to 13.5x in time for rendering graphics primitives.

Figure 14 shows the normalized rendering performance for two different design alternatives: discrete rendering with display refresh, and integrated rendering with refresh.
In the first case, the graphics primitives are rendered with discrete (non-integrated) code, which can run only when the video refresh ISR is not active, that is, during the 1.8% of the total time available. The second case uses integrated code when possible to render graphics primitives. Integration speeds up rendering by 3.99x to 13.54x over the discrete case. The variation in speed-up comes from the amount of rendering work performed per segment and the number of segments needed per secondary thread. Recall that each loop with an unknown iteration count requires at least one segment; this is very inefficient if the loop's execution time is much less than the idle time of the segment. The horizontal, vertical and diagonal line functions each contain a single such loop with a single level of conditionals, allowing efficient integration with only three segments. The x-major function has a much more complex CDG and contains four such loops, one of them doubly nested. These loops require the formation of nine segments, wasting much of the available idle time.

5.2 Code memory expansion
Table 4 shows how code sizes increase by a factor of 3x to 5x after integration. Various factors contribute to the increase, including padding, loop unrolling and splitting, and code replication into conditionals. Although these code size increases are significant, they apply only to integrated functions and may be an acceptable price to pay given the dramatic performance improvement.
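The dependence of rendering efficiency on segment count, discussed in Section 5.1, can be illustrated with a simplified segment-counting model in C. It assumes a host function is summarized as a sequence of hypothetical `Region` records (straight-line code with a known cycle cost, or a loop with an unknown trip count) and applies a greedy packing rule. The segments in this work were formed manually from timing information, so `count_segments` is an illustrative sketch of the bookkeeping, not the procedure actually used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of host code, in program order. */
typedef enum { REGION_STRAIGHT, REGION_UNKNOWN_LOOP } RegionKind;

typedef struct {
    RegionKind kind;
    unsigned cycles;  /* cost of a straight-line region; ignored for loops */
} Region;

/* Greedily packs consecutive straight-line regions into segments whose total
   cost does not exceed idle_budget cycles. Each loop with an unknown trip
   count is charged exactly one segment of its own, the minimum described in
   the text. Returns the number of segments, or 0 if a single straight-line
   region exceeds the budget and would first need to be split. */
static size_t count_segments(const Region *r, size_t n, unsigned idle_budget) {
    assert(r != NULL || n == 0);
    size_t segments = 0;
    unsigned used = 0;
    int open = 0;  /* nonzero while a straight-line segment is being filled */
    for (size_t i = 0; i < n; i++) {
        if (r[i].kind == REGION_UNKNOWN_LOOP) {
            segments++;  /* the loop occupies its own segment */
            open = 0;    /* code after the loop starts a new segment */
            continue;
        }
        if (r[i].cycles > idle_budget)
            return 0;
        if (!open || used + r[i].cycles > idle_budget) {
            segments++;  /* start a new straight-line segment */
            used = 0;
            open = 1;
        }
        used += r[i].cycles;
    }
    return segments;
}
```

With a single unknown-trip loop between two short straight-line regions, the model yields three segments; three straight-line regions of 30, 40 and 50 cycles against a 100-cycle idle budget pack into two. The model does not capture loops that must span several guest invocations, which would need more than one segment.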

Table 4. Sizes of Original and Integrated Functions
Function | Original Size (bytes) | Padded Size (bytes) | Integrated Size (bytes) | Code Expansion Ratio
DrawHorizontalLine | | | |
DrawVerticalLine | | | |
DrawDiagonalLine | | | |
DrawXMajorLine | | | |

6. CONCLUSIONS AND FUTURE WORK
This paper introduces two new methods which enable STI to be used for hard-real-time systems with urgent threads and long-running application work. First, previous STI work did not allow maximum response times to fall below the duration of the integrated threads; this work presents polling servers and methods of finding the resulting maximum response time. Second, previous work required that secondary (host) functions be no longer than the worst-case idle time plus slack time of the primary (guest) function. This work introduces segmentation methods which remove this restriction. Our methods are demonstrated on a software video generation application and enable simultaneous high-speed serial communication, video refresh and graphics rendering. We have built and tested the system, and performance improves by a factor of about 4x to over 13.5x.

There are various directions for improving these techniques. Better segment formation methods could offer more consistent performance improvement from STI by making better use of available idle time when loops have unknown iteration counts. It may be possible to reduce code size expansion through schedule-sensitive abstraction of common code segments. Implementing the new segment-based transformations automatically would accelerate the integration process and simplify debugging.

7. ACKNOWLEDGEMENTS
This work was funded in part by NSF CAREER award CCR. The authors thank the various students who contributed to this project: Robert Morrison, Barret Krupnow, Jimmy Hill, Craig Nowell, and Paul Lee. In addition, thanks go to Atmel and BITS for the kind donations of development tools.

8. REFERENCES
[1] Atmel Corp., "ATmega128: 8-Bit AVR Microcontroller with 128K Bytes In-System Programmable Flash."
[2] Audsley, N. et al., "Applying New Scheduling Theory to Static Priority Preemptive Scheduling," Software Engineering Journal, 8(5):284-292.
[3] avr-gcc.
[4] Baker, T. P. and Shaw, A., "The Cyclic Executive Model and Ada," Real-Time Systems, 1(1):7-25, 1989.
[5] Bairagi, D., Pande, S. and Agarwal, D. P., "A Framework for Enhancing Code Quality in Limited Register Set Embedded Processors," ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems (LCTES) 2000, June.
[6] Barthelmann, V., "Inter-Task Register-Allocation for Static Operating Systems," ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems / Software and Compilers for Embedded Systems (LCTES'02-SCOPES'02), June.
[7] Chaitin, G. J., "Register Allocation and Spilling via Graph Coloring," ACM SIGPLAN Notices, 17(6):98-105.
[8] Dean, A. G. and Grzybowski, R. R., "A High-Temperature Embedded Network Interface using Software Thread Integration," Second Workshop on Compiler and Architectural Support for Embedded Systems, Washington, DC, October.
[9] Dean, A. G., "Compiling for Concurrency: Planning and Performing Software Thread Integration," 23rd IEEE Real-Time Systems Symposium, Austin, TX, December 3-5, 2002.
[10] Dean, A. and Shen, J. P., "System-Level Issues for Software Thread Integration: Guest Triggering and Host Selection," 20th IEEE Real-Time Systems Symposium, Phoenix, Arizona, December 1-3, 1999.
[11] Dean, A. and Shen, J. P., "Techniques for Software Thread Integration in Real-Time Embedded Systems," 19th IEEE Real-Time Systems Symposium, Madrid, Spain, December 2-4, 1998.
[12] Dean, A. and Shen, J. P., "Hardware to Software Migration with Real-Time Thread Integration," 24th EuroMicro Conference, Västerås, Sweden, August 25-27, 1998.
[13] Dean, A., STIGLitz Project Manual, CESR Technical Report, June.
[14] Ferrante, J., Ottenstein, K. J. and Warren, J. D., "The Program Dependence Graph and Its Use in Optimization," ACM Transactions on Programming Languages and Systems, 9(3), July 1987.
[15] Kong, T. and Wilken, K., "Precise Register Allocation for Irregular Architectures," 31st International Microarchitecture Conference (MICRO-31), December 1998.
[16] Krupnow, B., Hill, J., Nowell, C. and Lee, P., "Video Software Thread Integration," CESR Technical Report, December.
[17] Muchnick, S. S., Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, 1997.
[18] Nisley, E., "Rising Tides," Dr. Dobb's Journal, #346, March 2003.
[19] Scholz, B. and Eckstein, E., "Register Allocation for Irregular Architectures," ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems / Software and Compilers for Embedded Systems (LCTES'02-SCOPES'02), June.
[20] Strosnider, J. K., Lehoczky, J. P. and Sha, L., "The Deferrable Server Algorithm for Enhanced Aperiodic Responsiveness in Hard Real-Time Environments," IEEE Transactions on Computers, 44(1):73-91, January 1995.


Improving High Level Synthesis Optimization Opportunity Through Polyhedral Transformations Improvng Hgh Level Synthess Optmzaton Opportunty Through Polyhedral Transformatons We Zuo 2,5, Yun Lang 1, Peng L 1, Kyle Rupnow 3, Demng Chen 2,3 and Jason Cong 1,4 1 Center for Energy-Effcent Computng

More information

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS CACHE MEMORY DESIGN FOR INTERNET PROCESSORS WE EVALUATE A SERIES OF THREE PROGRESSIVELY MORE AGGRESSIVE ROUTING-TABLE CACHE DESIGNS AND DEMONSTRATE THAT THE INCORPORATION OF HARDWARE CACHES INTO INTERNET

More information

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management.

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management. //7 Prnceton Unversty Computer Scence 7: Introducton to Programmng Systems Goals of ths Lecture Storage Management Help you learn about: Localty and cachng Typcal storage herarchy Vrtual memory How the

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

with Optic65 and Optic25 Cameras FOR OUTDOOR TRACKING ONLY unless used in conjunction with the Indoor Tracking Accessory.

with Optic65 and Optic25 Cameras FOR OUTDOOR TRACKING ONLY unless used in conjunction with the Indoor Tracking Accessory. wth Optc6 and Optc Cameras Quck Start Gude FOR OUTDOOR TRACKING ONLY unless used n conjuncton wth the Indoor Trackng Accessory. CONGRATULATIONS ON SCORING YOUR SOLOSHOT Our category-creatng lne of personal

More information

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems:

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems: Speed/RAP/CODA Presented by Octav Chpara Real-tme Systems Many wreless sensor network applcatons requre real-tme support Survellance and trackng Border patrol Fre fghtng Real-tme systems: Hard real-tme:

More information

3D vector computer graphics

3D vector computer graphics 3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres

More information

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to: 4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://e-tellgentnternetmarketng.com/webste/frustrated-computer-user-2/

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

Resource and Virtual Function Status Monitoring in Network Function Virtualization Environment

Resource and Virtual Function Status Monitoring in Network Function Virtualization Environment Journal of Physcs: Conference Seres PAPER OPEN ACCESS Resource and Vrtual Functon Status Montorng n Network Functon Vrtualzaton Envronment To cte ths artcle: MS Ha et al 2018 J. Phys.: Conf. Ser. 1087

More information

Space-Optimal, Wait-Free Real-Time Synchronization

Space-Optimal, Wait-Free Real-Time Synchronization 1 Space-Optmal, Wat-Free Real-Tme Synchronzaton Hyeonjoong Cho, Bnoy Ravndran ECE Dept., Vrgna Tech Blacksburg, VA 24061, USA {hjcho,bnoy}@vt.edu E. Douglas Jensen The MITRE Corporaton Bedford, MA 01730,

More information

Petri Net Based Software Dependability Engineering

Petri Net Based Software Dependability Engineering Proc. RELECTRONIC 95, Budapest, pp. 181-186; October 1995 Petr Net Based Software Dependablty Engneerng Monka Hener Brandenburg Unversty of Technology Cottbus Computer Scence Insttute Postbox 101344 D-03013

More information

Scheduling and queue management. DigiComm II

Scheduling and queue management. DigiComm II Schedulng and queue management Tradtonal queung behavour n routers Data transfer: datagrams: ndvdual packets no recognton of flows connectonless: no sgnallng Forwardng: based on per-datagram forwardng

More information

Array transposition in CUDA shared memory

Array transposition in CUDA shared memory Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries Run-Tme Operator State Spllng for Memory Intensve Long-Runnng Queres Bn Lu, Yal Zhu, and lke A. Rundenstener epartment of Computer Scence, Worcester Polytechnc Insttute Worcester, Massachusetts, USA {bnlu,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning Parallel Inverse Halftonng by Look-Up Table (LUT) Parttonng Umar F. Sddq and Sadq M. Sat umar@ccse.kfupm.edu.sa, sadq@kfupm.edu.sa KFUPM Box: Department of Computer Engneerng, Kng Fahd Unversty of Petroleum

More information

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution Dynamc Voltage Scalng of Supply and Body Bas Explotng Software Runtme Dstrbuton Sungpack Hong EE Department Stanford Unversty Sungjoo Yoo, Byeong Bn, Kyu-Myung Cho, Soo-Kwan Eo Samsung Electroncs Taehwan

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Evaluation of Parallel Processing Systems through Queuing Model

Evaluation of Parallel Processing Systems through Queuing Model ISSN 2278-309 Vkas Shnde, Internatonal Journal of Advanced Volume Trends 4, n Computer No.2, March Scence - and Aprl Engneerng, 205 4(2), March - Aprl 205, 36-43 Internatonal Journal of Advanced Trends

More information

WCET-Directed Dynamic Scratchpad Memory Allocation of Data

WCET-Directed Dynamic Scratchpad Memory Allocation of Data WCET-Drected Dynamc Scratchpad Memory Allocaton of Data Jean-Franços Deverge and Isabelle Puaut Unversté Européenne de Bretagne / IRISA, Rennes, France Abstract Many embedded systems feature processors

More information

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL)

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL) Crcut Analyss I (ENG 405) Chapter Method of Analyss Nodal(KCL) and Mesh(KVL) Nodal Analyss If nstead of focusng on the oltages of the crcut elements, one looks at the oltages at the nodes of the crcut,

More information

Optimized Resource Scheduling Using Classification and Regression Tree and Modified Bacterial Foraging Optimization Algorithm

Optimized Resource Scheduling Using Classification and Regression Tree and Modified Bacterial Foraging Optimization Algorithm World Engneerng & Appled Scences Journal 7 (1): 10-17, 2016 ISSN 2079-2204 IDOSI Publcatons, 2016 DOI: 10.5829/dos.weasj.2016.7.1.22540 Optmzed Resource Schedulng Usng Classfcaton and Regresson Tree and

More information

Real-time Fault-tolerant Scheduling Algorithm for Distributed Computing Systems

Real-time Fault-tolerant Scheduling Algorithm for Distributed Computing Systems Real-tme Fault-tolerant Schedulng Algorthm for Dstrbuted Computng Systems Yun Lng, Y Ouyang College of Computer Scence and Informaton Engneerng Zheang Gongshang Unversty Postal code: 310018 P.R.CHINA {ylng,

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

Memory Modeling in ESL-RTL Equivalence Checking

Memory Modeling in ESL-RTL Equivalence Checking 11.4 Memory Modelng n ESL-RTL Equvalence Checkng Alfred Koelbl 2025 NW Cornelus Pass Rd. Hllsboro, OR 97124 koelbl@synopsys.com Jerry R. Burch 2025 NW Cornelus Pass Rd. Hllsboro, OR 97124 burch@synopsys.com

More information