A Generic and Compositional Framework for Multicore Response Time Analysis


Sebastian Altmeyer (University of Luxembourg / University of Amsterdam), Claire Maiza (Grenoble INP, Verimag), Robert I. Davis (University of York / INRIA, Paris-Rocquencourt), Vincent Nelis (CISTER, ISEP, Porto), Leandro Indrusiak (University of York), Jan Reineke (Saarland University)

ABSTRACT

In this paper, we introduce a Multicore Response Time Analysis (MRTA) framework. This framework is extensible to different multicore architectures, with various types and arrangements of local memory, and different arbitration policies for the common interconnects. We instantiate the framework for single-level local data and instruction memories (cache or scratchpads), for a variety of memory bus arbitration policies, including Round-Robin, FIFO, Fixed-Priority, Processor-Priority, and TDMA, and account for DRAM refreshes. The MRTA framework provides a general approach to timing verification for multicore systems that is parametric in the hardware configuration and so can be used at the architectural design stage to compare the guaranteed levels of performance that can be obtained with different hardware configurations. The MRTA framework decouples response time analysis from a reliance on context-independent WCET values. Instead, the analysis formulates response times directly from the demands on different hardware resources.

1. INTRODUCTION

Effective analysis of the worst-case timing behaviour of systems built on multicore architectures is essential if these high performance platforms are to be deployed in critical real-time embedded systems used in the automotive and aerospace industries. We identify four different approaches to solving the problem of determining timing correctness.

With single core systems, a traditional two-step approach is typically used. This consists of timing analysis, which determines the context-independent worst-case execution time (WCET) of each task, followed by schedulability analysis, which uses task WCETs and information about the processor scheduling policy to determine if each task can be guaranteed to meet its deadline. When local memory (e.g. cache) is present, this approach can be augmented by analysis of Cache Related Pre-emption Delays (CRPD) [3], or by partitioning the cache to avoid CRPD altogether. Both approaches are effective and result in tight upper bounds on task response times [4].

With a multicore system, the situation is more complex since WCETs are strongly dependent on the amount of cross-core interference on shared hardware resources such as main memory, L2-caches, and common interconnects, due to tasks running on other cores. The uncertainty and variability in this cross-core interference renders the traditional two-step process ineffective for many multicore processors. For example, on the Freescale P4080, the latency of a read operation varies from 40 to 600 cycles depending on the total number of cores running and the number of competing tasks [35]. Similarly, a 14 times slowdown has been reported [38] due to interference on the L2-cache for tasks running on Intel Core 2 Quad processors.

At the other extreme is a fully integrated approach. This involves considering the precise interleaving of instructions originating from different cores [22]; however, such an approach suffers from potentially insurmountable problems of combinatorial complexity, due to the proliferation of different path combinations, as well as different release times and schedules.

An alternative approach is based on temporal isolation [15]. The idea here is to statically partition the use of shared resources, e.g. space partitioning of cache and DRAM banks, and time partitioning of bus access, so that context-independent WCET values can be used and the traditional two-step process applied. This approach raises a further challenge: how to partition the resources to obtain schedulability [39]. Techniques which seek to limit the worst-case cross-core interference, for example by using TDMA arbitration on the memory bus or by limiting the amount of contention by suspending execution on certain cores [35], can have a significant detrimental effect on performance, effectively negating the performance benefits of using a multicore system altogether. We note that TDMA is rarely, if ever, used as a bus-arbitration policy in real multicore processors, since it is not work-conserving and so wastes significant bandwidth. This impacts both worst-case and average-case performance; the latter is essential for application areas such as telecommunications, which have a major influence on processor design.

The final approach is the one presented in this paper, based on explicit interference modelling. We explore the premise that, due to the strong interdependencies between timing analysis and schedulability analysis on multicore systems, they need to be considered together. In our approach, we omit the notion of WCET per se and instead directly target the calculation of task response times. In this work, we use execution traces to model the behaviour of tasks. Traces provide a simple yet expressive way to model task behaviour. Note that relying on execution traces does not pose a fundamental limitation to our approach, as all required performance quantities can also be derived using static analysis [31, 20, 1] as within traditional context-independent timing analysis; however, traces enable a near-trivial static cache analysis and so allow us to focus on response time analysis. The main performance metrics are the processor demand and the memory demand of each task. The latter quantity feeds into analysis of the arbitration policy used by the common interconnect, enabling us to upper bound the total memory access delays which may occur during the response time of the task. By computing the overall processor demand and memory demand over a relatively long interval of time (i.e. the task response time), as opposed to summing the worst case over many short intervals (e.g. individual memory accesses), we are able to obtain much tighter response time bounds.

The Multicore Response Time Analysis framework (MRTA) that we present is extensible to different types and arrangements of local memory, and different arbitration policies for the common interconnect. In this paper, we instantiate the MRTA framework assuming the local memories used for instructions and data are single-level and either cache, scratchpad, or not present. Further, we assume that the memory bus arbitration policy may be TDMA, FIFO, Round-Robin, Fixed-Priority (based on task priorities), or Processor-Priority. We also account for the effects of DRAM refresh [5, 11]. The general approach embodied in the MRTA framework is extensible to more complex, multi-level memory hierarchies, and other sources of interference. It provides a general timing verification framework that is parametric in the hardware configuration (common interconnect, local memories, number of cores etc.) and so can be used at the architectural design stage to compare the guaranteed levels of performance that can be obtained with different hardware configurations, and also during the development and integration stages to verify the timing behaviour of a specific system. While the specific hardware models and their mathematical representations used in this paper cannot capture all of the interference and complexity of actual hardware, they serve as a valid starting point. They include the dominant sources of interference and represent current architectures reasonably well.

The rest of the paper is organised as follows. Section 2 discusses the related work. Section 3 describes the system model and notation used. Sections 4 and 5 show how the effects of a local memory and the common interconnect can be modelled. Section 6 presents the core of our framework, interference-aware Multicore Response Time Analysis (MRTA). This analysis integrates processor and memory demands, accounting for cross-core interference. Extensions to the presented analysis are discussed in Section 7. Section 8 describes the results of an experimental evaluation using the MRTA framework, and Section 9 concludes with a summary and perspectives on future work.

2. RELATED WORK

In 2007, Rosen et al. [40] proposed an implementation in which TDMA slots on the bus are statically allocated to cores. This technique relies on the availability of a user-programmable table-driven bus arbiter, which is typically not available in real hardware, and on knowledge, at design time, of the characteristics of the entire workload that executes on each core. Chattopadhyay et al. [17] and Kelter et al. [25] proposed an analysis which takes into account a shared bus and instruction cache, assuming separate buses and memories for both code and data (uncommon in real hardware) and TDMA bus arbitration. The method has limited applicability as it does not address data accesses to memory. In 2010, Schranzhofer et al. [42] developed a framework for analysing the worst-case response time of real-time tasks on a multicore with TDMA arbitration. This was followed by work on resource-adaptive arbiters [43]. They proposed a task model in which tasks consist of sequences of super-blocks, themselves divided into phases that represent implicit communication (fetching or writing of data from/to memory), computation (processing the data), or both. Contrary to the technique presented here, their approach requires major program intervention and compiler assistance to prefetch data. Also in 2010, Lv et al. [33] proposed a method to model request patterns and the memory bus using timed automata. Their method handles instruction accesses only and may suffer from state-space explosion when applied to data accesses. A method employing timed automata was proposed by Gustavsson et al. [22] in which the WCET is obtained by proving special predicates through model checking. This approach allows for a detailed system modelling but is also prone to the state-space explosion problem. In 2014, Kelter et al. [26] analysed the maximum bus arbitration delays for multiprocessor systems sharing a TDMA bus and using both (private) L1 and (shared) L2 instruction and data caches.

Pellizzoni et al. [37] compute an upper bound on the contention delay incurred by periodic tasks, for systems comprising any number of cores and peripheral buses sharing a single main memory. Their method does not cater for non-periodic tasks and does not apply to systems with shared caches. In addition, it relies on accurate profiling of cache utilization and suitable assignment of the TDMA time-slots to the tasks' super-blocks, and it imposes a restriction on where the tasks can be pre-empted. Schliecker et al. [41] proposed a method that employs a general event-based model to estimate the maximum load on a shared resource. This approach makes very few assumptions about the task model and is thus quite generally applicable. However, it only supports a single bus arbiter that is an unspecified work-conserving arbiter. Paolieri et al. [36] proposed a hardware platform that enforces a constant upper bound on the latency of each access to a shared resource. This approach enables the analysis of tasks in isolation, since the interference on other tasks can be conservatively accounted for using this bound on the latency. Similarly, the PTARM [32] enforces constant latencies for all instructions, including loads and stores. However, both cases represent customized hardware. Kim et al. [27] presented a model to upper bound the memory interference delay caused by concurrent accesses to a shared DRAM main memory. Their work differs from this paper in that they do not assume a unique shared bus to access the main memory and they primarily focus on the contention at the DRAM controller by assuming a fully partitioned private and shared cache model. (For shared caches they simply assume that the extra number of requests generated due to cache line evictions at runtime is given.) Yun et al. [46] proposed a software-based memory throttling mechanism to explicitly limit the memory request rate of each core and thereby control the memory interference. They also developed analytical solutions to compute proper throttling parameters that satisfy schedulability of critical tasks while minimising the performance impact of throttling. In 2015, Dasari et al. [18] proposed a general framework to compute the maximum interference caused by the shared memory bus and its impact on the execution time of the tasks running on the cores. The method of computation in [18] is more complex than that proposed in this paper, and may be more accurate when it estimates the delay due to the shared bus, but it does not take cache-related effects into account (by assuming partitioned caches), which makes it less generic than the framework proposed here.

Regarding shared caches, Yan and Zhang [45] addressed the problem of computing the WCET of tasks assuming direct-mapped, shared L2 instruction caches on multicores. The applicability of the approach is unfortunately limited as it makes very restrictive assumptions, such as (1) data caches are perfect, i.e. all accesses are hits, and (2) data references from different threads will not interfere with each other in the shared L2 cache. Li et al. [30] proposed a method to estimate the worst-case response time of concurrent programs running on multicores with shared L2 caches, assuming set-associative instruction caches using the LRU replacement policy. Their work was later extended [17] by adding a TDMA bus analysis technique to bound the memory access delay. Finally, one must also note that some other techniques, such as [14, 19] for instance, aim at modifying the scheduling algorithm so that its scheduling decisions reduce the impact of the CRPD.

3. SYSTEM MODEL

In this paper, we provide a theoretical framework that can be instantiated for a range of different multicore architectures with different types of memory hierarchy and different arbitration policies for the common interconnect. Our aim is to create a flexible, adaptable, and generic analysis framework wherein a large number of common multicore architecture designs can be modeled and analysed. In this paper we can inevitably only cover a limited number of types of local memory, bus, and global memory behaviour. We select common approaches to model the different hardware components and integrate them into an extensible framework.

3.1 Multicore Architectural Model

We model a generic multicore platform with $l$ timing-compositional cores $P_1, \ldots, P_l$ as depicted in Figure 1. By timing-compositional cores we mean cores where it is safe to separately account for delays from different sources, such as computation on a given core and interference on a shared bus [23]. The set of cores is defined as $\mathcal{P}$. Each core has a local memory which is connected via a shared bus to a global memory and IO interface. We assume a constant delay $d_{main}$ to retrieve data from global memory under the assumption of an immediate bus access, i.e., no wait-cycles or contention on the bus. We assume atomic bus transactions, i.e., no split transactions, which furthermore are not re-ordered, and non-preemptable busy waiting on the processor for requests to be serviced. Further, we assume that bus access may be given to cores for one access at a time. The types of the memories and the bus policy are parameters that can be instantiated to model different multicore systems.

[Figure 1: Multicore Platform. A set of $l$ processors with local memories connected via a common bus to a global memory.]

In this paper, we omit consideration of delays due to cache coherence and synchronization, and we assume write-through caches only. Write-back caches are discussed in Section 7.5.

3.2 Task Model

We assume a set of $n$ sporadic tasks $\{\tau_1, \ldots, \tau_n\}$, where each task $\tau_i$ has a minimum period or inter-arrival time $T_i$ and a deadline $D_i$. Deadlines are assumed to be constrained, hence $D_i \le T_i$. We assume that the tasks are statically partitioned to the set of $l$ identical cores $\{P_1, \ldots, P_l\}$, and scheduled on each processor using fixed-priority pre-emptive scheduling. The set of tasks assigned to core $P_x$ is denoted by $\Gamma_x$. The index of each task is unique and thus provides a global priority order, with $\tau_1$ having the highest priority and $\tau_n$ the lowest. The global priority of each task translates to a local priority order on each core which is used for scheduling purposes. We use $hp(i)$ ($lp(i)$) to denote the set of tasks with higher (lower) priority than that of task $\tau_i$, and we use $hep(i)$ ($lep(i)$) to denote the set of tasks with higher or equal (lower or equal) priority to task $\tau_i$. We initially assume that the tasks are independent, in so far as they do not share mutually exclusive software resources (discussed in Section 7); nevertheless, the tasks compete for hardware resources such as the processor, local memory, and the memory bus.

The execution of task $\tau_i$ is modelled using a set of traces $O_i$, where each trace $o = [\iota_1, \ldots, \iota_k]$ is an ordered list of instructions. For ease of notation, we treat the ordered list of instructions as a multi-set whenever we can abstract away from the specific order.
We distinguish three types of instruction $t$:

$$t = \begin{cases} r[m^{da}] & \text{read data from memory block } m^{da} \\ w[m^{da}] & \text{write data to memory block } m^{da} \\ e & \text{execute} \end{cases} \quad (1)$$

An instruction $\iota$ is a triple consisting of the instruction's memory address $m^{in}$, its execution time $\Delta$ without memory delays, i.e., assuming a perfect local memory, and the instruction type $t$:

$$\iota = (m^{in}, \Delta, t) \quad (2)$$

The set of memory blocks is defined as $M$. $M^{in}$ denotes the instruction memory blocks and $M^{da}$ the data memory blocks. We assume that data memory and instruction memory are disjoint, i.e., $M^{in} \cap M^{da} = \emptyset$.

The use of traces to model a task's behaviour is unusual, as the number of traces is exponential in the number of control-flow branches. Despite this obvious drawback, traces provide a simple yet expressive way to model task behaviour. They enable a near-trivial static cache analysis and a simple multicore simulation to evaluate the accuracy of the timing verification framework. Most importantly, however, traces show that the worst-case execution behaviour of a task $\tau_i$ on a multicore system is not uniquely defined. From the viewpoint of a task scheduled on the same core, $\tau_i$ may have the highest impact when it uses the core for the longest possible time interval, whereas the impact on tasks scheduled on any other core may be maximized when $\tau_i$ produces the largest number of bus accesses. These two cases may well correspond to different execution traces. As a remedy for the exponential number of traces, the complexity can be reduced by (i) computing a synthetic worst-case trace or (ii) deriving the set of Pareto optimal traces that maximize the task's impact according to a pre-defined cost function (see [31]). We can also completely resort to static analysis to derive upper bounds on the performance metrics. Static analyses provide independent upper bounds on the different performance quantities. This strongly reduces the computational complexity, but may lead to pessimism. An evaluation of this trade-off is future work.
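To make the model concrete, the following sketch shows one possible encoding of instructions and traces in Python; the names (Instruction, Trace, processor_demand) are illustrative and not part of the paper, and the execution-time field corresponds to $\Delta$ in Equation (2).

```python
from dataclasses import dataclass
from typing import List, Literal, Optional

# Instruction types as in Equation (1): read, write, or pure execute.
AccessType = Literal["r", "w", "e"]

@dataclass(frozen=True)
class Instruction:
    m_in: int                    # instruction memory block m^in
    delta: int                   # execution time assuming a perfect local memory
    kind: AccessType             # "r"/"w" for data accesses, "e" otherwise
    m_da: Optional[int] = None   # data memory block m^da, None for "e"

# A trace o is an ordered list of instructions; a task has a set of traces O_i.
Trace = List[Instruction]

def processor_demand(traces: List[Trace]) -> int:
    # Equation (17): maximum over the traces of the summed execution times.
    return max(sum(ins.delta for ins in o) for o in traces)
```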

4. MEMORY MODELLING

In this section we show how the effects of a local memory can be modelled via a MEM function which describes the number of accesses due to a task which are passed to the next level of the memory hierarchy, in this case main memory. The MEM function is instantiated for both caches and scratchpads. We model the effect of a (local) memory using a function of the form:

$$\mathrm{MEM}\colon O \rightarrow \mathbb{N} \times 2^{2^{\mathbb{N}}} \times 2^{\mathbb{N}} \quad (3)$$

where $\mathrm{MEM}(o) = (MD_o, UCB_o, ECB_o)$ computes, for a trace $o$: the number of bus accesses, i.e., the number of memory accesses which cannot be served by the local memory alone (denoted as the memory demand $MD$); $UCB_o$, a multiset containing, for each program point in trace $o$, the set of Useful Cache Blocks (UCBs) [28] which may need to be reloaded when trace $o$ is pre-empted at that program point; and the set of Evicting Cache Blocks (ECBs), which is the set of all cache blocks accessed by trace $o$ and which may evict memory blocks of other tasks.

The value $MD$ does not just cover cache misses, but also has to account for write accesses. In the case of write-through caches, each write access will cause a bus access, irrespective of whether or not the memory block is present in the cache. The number of bus accesses $MD$ assumes non-preemptive execution. With pre-emptive execution and caches, more than $MD$ memory accesses can contribute to the bus contention due to cache eviction. In this paper, we make use of the CRPD analysis for fixed-priority pre-emptive scheduling introduced by Altmeyer et al. [3].

We now derive instantiations of the function $\mathrm{MEM}(o)$ for a trace $o = [\iota_1, \ldots, \iota_k]$ for instruction memories and data memories for systems (i) without cache, (ii) with scratchpads, and (iii) with direct-mapped or LRU caches. In the following, the superscripts indicate data ($da$) or instruction ($in$) memory, and the subscripts the type of memory, i.e., uncached ($nc$), scratchpad ($sp$), or cache ($ca$).

4.1 Uncached

Considering instruction memory, the number of bus accesses for a system with no cache is given by the number of instructions $k$ in the trace. The sets of UCBs and ECBs are empty, as pre-emption has no effect on the performance of the local memory, since none exists.

$$\mathrm{MEM}^{in}_{nc}(o) = (k, \emptyset, \emptyset) \quad (4)$$

Considering data memory, we have to account for the number of data accesses, irrespective of whether they are reads or writes. The number of accesses is thus equal to the number of data access instructions.

$$\mathrm{MEM}^{da}_{nc}(o) = \left( \left| \left\{ \iota \mid \iota \in o \wedge \iota = (\_, \_, r/w[m^{da}]) \right\} \right|, \emptyset, \emptyset \right) \quad (5)$$

4.2 Scratchpads

A scratchpad memory is defined using a function $\mathrm{SPM}\colon M \rightarrow \{true, false\}$, which returns true for memory blocks that are stored in the scratchpad. For ease of presentation, we assume a static, write-through scratchpad configuration which does not change at runtime. An extension to dynamic scratchpads and the write-back policy is straightforward, but beyond the scope of this paper. Each memory access to a memory block which is not stored in the scratchpad causes an additional bus access.

$$\mathrm{MEM}^{in}_{sp}(o) = \left( \left| \left\{ m^{in} \mid (m^{in}, \_, \_) \in o \wedge \neg\mathrm{SPM}(m^{in}) \right\} \right|, \emptyset, \emptyset \right) \quad (6)$$

Further, in the case of write accesses, even if a memory block is stored in the scratchpad, the access also contributes to the bus contention, as we assume a write-through policy.

$$\mathrm{MEM}^{da}_{sp}(o) = \left( \left| \left\{ m^{da} \mid \left( (\_, \_, r[m^{da}]) \in o \wedge \neg\mathrm{SPM}(m^{da}) \right) \vee (\_, \_, w[m^{da}]) \in o \right\} \right|, \emptyset, \emptyset \right) \quad (7)$$

The sets of UCBs and ECBs are empty, as no pre-emption overhead is assumed with static scratchpad memory. Dynamic scratchpad management is discussed in Section 7.2.

4.3 Caches

We assume a function $\mathrm{Hit}\colon M \times I \rightarrow \{true, false\}$, which classifies each memory access at each instruction as a cache hit or a cache miss. This function can be derived using cache simulation of the access trace starting with an empty cache, or by using traditional cache analysis [20], where each unclassified memory access is considered a cache miss. This means that we upper bound the number of cache misses. For each possible pre-emption point $\iota$ on trace $o$, the set of UCBs is derived using the corresponding analysis described in Altmeyer's thesis [1], Chapter 5, Section 4. It is sufficient to only store the cache sets a useful memory block maps to, instead of the useful memory blocks themselves. The multiset $UCB_o$ then contains, for each program point $\iota$ in trace $o$, the set of UCBs at that program point, i.e., $UCB_o = \bigcup_{\iota \in o} UCB_\iota$. The set of ECBs is the set of all cache sets of memory blocks on trace $o$.

$$\mathrm{MEM}^{in}_{ca}(o) = \left( \left| \left\{ m^{in} \mid \iota = (m^{in}, \_, \_) \in o \wedge \neg\mathrm{Hit}(m^{in}, \iota) \right\} \right|, UCB^{in}_o, ECB^{in}_o \right) \quad (8)$$

Since we assume a write-through policy, each write access contributes to the bus contention and has to be treated accordingly.

$$\mathrm{MEM}^{da}_{ca}(o) = \left( \left| \left\{ m^{da} \mid \left( \iota = (\_, \_, r[m^{da}]) \in o \wedge \neg\mathrm{Hit}(m^{da}, \iota) \right) \vee (\_, \_, w[m^{da}]) \in o \right\} \right|, UCB^{da}_o, ECB^{da}_o \right) \quad (9)$$

4.4 Memory Combinations

To allow different combinations of local memories, for example scratchpad memory for instructions and an LRU cache for data, we define the combination of instruction memory $\mathrm{MEM}^{in}$ and data memory $\mathrm{MEM}^{da}$ as follows:

$$\mathrm{MEM}(o) = \left( MD^{in}_o + MD^{da}_o, \; UCB^{in}_o \cup UCB^{da}_o, \; ECB^{in}_o \cup ECB^{da}_o \right) \quad (10)$$

with $\mathrm{MEM}^{in}(o) = (MD^{in}_o, UCB^{in}_o, ECB^{in}_o)$ being the result for the instruction memory and $\mathrm{MEM}^{da}(o) = (MD^{da}_o, UCB^{da}_o, ECB^{da}_o)$ for the data memory.
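As an illustration, the MEM instantiations of Equations (4)-(7) and (10) for uncached and static scratchpad memories could be coded roughly as follows, building on the Trace/Instruction sketch from Section 3 (the function names are again illustrative; the cache variants of Equations (8) and (9) would additionally require the Hit classifier and the UCB/ECB sets).

```python
from typing import Callable, List, Set, Tuple

# MEM result: (memory demand MD, multiset of UCB sets, set of ECBs).
MemResult = Tuple[int, List[Set[int]], Set[int]]

def mem_instr_uncached(trace: Trace) -> MemResult:
    # Equation (4): every instruction fetch goes to the bus; no UCBs/ECBs.
    return (len(trace), [], set())

def mem_data_uncached(trace: Trace) -> MemResult:
    # Equation (5): every data read or write goes to the bus.
    return (sum(1 for ins in trace if ins.kind in ("r", "w")), [], set())

def mem_instr_spm(trace: Trace, spm: Callable[[int], bool]) -> MemResult:
    # Equation (6): only fetches of blocks not held in the scratchpad hit the bus.
    return (sum(1 for ins in trace if not spm(ins.m_in)), [], set())

def mem_data_spm(trace: Trace, spm: Callable[[int], bool]) -> MemResult:
    # Equation (7): reads that miss the SPM, plus all writes (write-through).
    md = sum(1 for ins in trace
             if (ins.kind == "r" and not spm(ins.m_da)) or ins.kind == "w")
    return (md, [], set())

def mem_combined(instr: MemResult, data: MemResult) -> MemResult:
    # Equation (10): combine instruction and data memory results.
    (md_i, ucb_i, ecb_i), (md_d, ucb_d, ecb_d) = instr, data
    return (md_i + md_d, ucb_i + ucb_d, ecb_i | ecb_d)
```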

5. BUS MODELLING

In this section we show how the memory bus delays experienced by a task can be modelled via a BUS function of the form:

$$\mathrm{BUS}\colon \mathbb{N} \times \mathcal{P} \times \mathbb{N} \rightarrow \mathbb{N} \quad (11)$$

where $\mathrm{BUS}(i, x, t)$ determines an upper bound on the number of bus accesses that can delay task $\tau_i$ on processor $P_x$ during a time interval of length $t$. This abstraction covers a variety of bus arbitration policies, including Round-Robin, FIFO, Fixed-Priority, and Processor-Priority, all of which are work-conserving, and also TDMA, which is not work-conserving. We now introduce the mathematical representations of the delays incurred under these arbitration policies. We note that the framework is extensible to a wide variety of different policies. The only constraint we place on instantiations of the $\mathrm{BUS}(i, x, t)$ function is that they are monotonically non-decreasing in $t$.

Let $\tau_i$ be the task of interest, and $x$ the index of the processor $P_x$ on which it executes. Other task indices are represented by $j$, $k$ etc., while $y$, $z$ are used for processor indices. Let $S_i^x(t)$ denote an upper bound on the total number of bus accesses due to $\tau_i$ and all higher priority tasks that run on processor $P_x$ during an interval of length $t$. Let $A_j^y(t)$ be an upper bound on the total number of bus accesses due to all tasks of priority $j$ or higher executing on some processor $P_y \ne P_x$ during an interval of length $t$. (Note, $j$ may not necessarily be the priority of a task allocated to processor $P_y$.) As memory bus requests are typically non-preemptive, one lower priority memory request may block a higher priority one, since the global, shared memory may have just received a lower priority request before the higher priority one arrives. (Here we mean priorities on the bus, which are not necessarily the same as task priorities.) To account for these blocking accesses, we use $L_j^y(t)$, which denotes an upper bound on the total number of bus accesses due to all tasks of priority lower than $j$ executing on some other processor $P_y \ne P_x$ during an interval of length $t$. In Section 6 we show how the values of $S_i^x(t)$, $A_j^y(t)$ and $L_j^y(t)$ are computed and explain why $S_i^x(t)$ and $A_j^y(t)$ are subtly different and hence require distinct notation.

In the following equations for the $\mathrm{BUS}(i, x, t)$ function, we account for blocking due to one non-preemptive access from lower priority tasks running on the same core $P_x$ as task $\tau_i$ (i.e. the $+1$ in the equations). This holds because such blocking can only occur at the start of the priority level-$i$ (processor) busy period.

For a Fixed-Priority bus with memory accesses inheriting the priority of the task that generates them, we have:

$$\mathrm{BUS}(i, x, t) = S_i^x(t) + \sum_{y \ne x} A_i^y(t) + \min\left( S_i^x(t), \sum_{y \ne x} L_i^y(t) \right) + 1 \quad (12)$$

The term $\min\left( S_i^x(t), \sum_{y \ne x} L_i^y(t) \right)$ upper bounds the blocking due to tasks of lower priority than $\tau_i$ running on other cores.

For a Processor-Priority bus with memory accesses inheriting the priority of the core rather than the task, we have:

$$\mathrm{BUS}(i, x, t) = S_i^x(t) + \sum_{y \in HP(x)} A_n^y(t) + \min\left( S_i^x(t), \sum_{y \in LP(x)} A_n^y(t) \right) + 1 \quad (13)$$

where $HP(x)$ ($LP(x)$) is the set of processors with higher (lower) priority than that of $P_x$, and $n$ is the index of the task with the lowest priority. The term $A_n^y(t)$ thus captures the interference of all tasks running on processor $P_y$, independent of their priority, and the term $\min\left( S_i^x(t), \sum_{y \in LP(x)} A_n^y(t) \right)$ upper bounds the blocking due to tasks running on processors with priority lower than that of $P_x$.

For a FIFO bus, we assume that all accesses generated on the other processors may be serviced ahead of the last access of $\tau_i$, hence we have:

$$\mathrm{BUS}(i, x, t) = S_i^x(t) + \sum_{y \ne x} A_n^y(t) + 1 \quad (14)$$

Note that accesses from other cores do not contribute blocking, since we already pessimistically account for all of these accesses in the summation term.

For a Round-Robin bus with a cycle consisting of an equal number of slots $v$ per processor, we have:

$$\mathrm{BUS}(i, x, t) = S_i^x(t) + \sum_{y \ne x} \min\left( A_n^y(t), \; v \cdot S_i^x(t) \right) + 1 \quad (15)$$

The worst-case situation occurs when each access in $S_i^x(t)$ is delayed by each core $P_y \ne P_x$ for $v$ slots. Interference by core $P_y$ is limited to the number of accesses from core $P_y$. Again, as we already account for all accesses from all other cores, there is no separate contribution to blocking. Note that unlike TDMA, Round-Robin moves to the next slot immediately if a processor has no access pending.

For a TDMA bus with $v$ adjacent slots per core in a cycle of length $l \cdot v$, we have:

$$\mathrm{BUS}(i, x, t) = S_i^x(t) + ((l-1) \cdot v) \cdot S_i^x(t) + 1 \quad (16)$$

Since TDMA is not work-conserving, the worst case corresponds to each access in $S_i^x(t)$ just missing a slot for processor $P_x$ and hence having to wait at most $((l-1) \cdot v + 1)$ slots to be serviced. Effectively, there is additional interference from the $(l-1) \cdot v$ slots reserved for other processors on each access, irrespective of whether these slots are used or not. As all accesses due to higher priority tasks on $P_x$ may be serviced prior to the last access of task $\tau_i$, we require $S_i^x(t)$ accesses in total to be serviced for $P_x$. Note that when $v = 1$, Equation (16) simplifies to $\mathrm{BUS}(i, x, t) = l \cdot S_i^x(t) + 1$.

It is interesting to note that while TDMA provides more predictable behaviour, this comes at the cost of significantly worse guaranteed performance over long time intervals (e.g. the response time of a task) due to the fact that it is not work-conserving. Effectively, this means that the memory accesses of a task may suffer additional interference due to empty slots on the bus. Nevertheless, Round-Robin behaves like TDMA when all other cores create a large number of competing memory accesses. We note that the equal number of slots per core for Round-Robin and TDMA, and the grouping of slots per core, are simplifying assumptions to exemplify how TDMA and Round-Robin buses can be analysed. An analysis for more complex configurations is reserved for future work.
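A minimal sketch of how Equations (12)-(16) might be implemented, assuming the bounds $S$, $A$ and $L$ of Section 6 are supplied as callables (all names are illustrative):

```python
def bus_fixed_priority(S, A, L, others, i, t):
    # Equation (12): accesses of priority >= i from other cores interfere; lower
    # priority accesses cause at most S(t) blocking accesses, plus one local block.
    interference = sum(A[y](i, t) for y in others)
    blocking = min(S(t), sum(L[y](i, t) for y in others))
    return S(t) + interference + blocking + 1

def bus_processor_priority(S, A, hp_cores, lp_cores, n, t):
    # Equation (13): accesses inherit the priority of the core; n is the lowest
    # task priority, so A[y](n, t) counts all accesses generated by core y.
    interference = sum(A[y](n, t) for y in hp_cores)
    blocking = min(S(t), sum(A[y](n, t) for y in lp_cores))
    return S(t) + interference + blocking + 1

def bus_fifo(S, A, others, n, t):
    # Equation (14): every access from every other core may be served first.
    return S(t) + sum(A[y](n, t) for y in others) + 1

def bus_round_robin(S, A, others, n, v, t):
    # Equation (15): each own access waits at most v slots per competing core,
    # but a core cannot interfere more often than it actually requests the bus.
    return S(t) + sum(min(A[y](n, t), v * S(t)) for y in others) + 1

def bus_tdma(S, num_cores, v, t):
    # Equation (16): the (l-1)*v slots of the other cores delay every own access,
    # whether those slots are used or not (TDMA is not work-conserving).
    return S(t) + (num_cores - 1) * v * S(t) + 1
```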
6. RESPONSE TIME ANALYSIS

In this section, we present the centre point of our timing verification framework: interference-aware Multicore Response Time Analysis (MRTA). This analysis integrates the processor and memory demands of the task of interest and the higher priority tasks running on the same processor, including CRPD. It also accounts for the cross-core interference on the memory bus due to tasks running on the other processors. A task set is deemed schedulable if, for each task $\tau_i$, the response time $R_i$ is less than or equal to its deadline $D_i$:

$$\forall i\colon R_i \le D_i \implies \text{schedulable}$$

The traditional response time calculation [6, 24] for fixed-priority pre-emptive scheduling on a uniprocessor is based on an upper bound on the WCET of each task $\tau_i$, denoted by $C_i$. By contrast, our MRTA framework dissects the individual components (processor and memory demands) that contribute to the WCET bound and re-assembles them at the level of the worst-case response time. It thus avoids the over-approximation inherent in using context-independent WCET bounds. In the following, we assume that $\tau_i$ is the task of interest whose schedulability we are checking, and $P_x$ is the processor on which it runs. Recall that there is a unique global ordering of task priorities even though the scheduling is partitioned, with a fixed-priority pre-emptive scheduler on each processor.

6.1 Interference on the Core

We compute the maximal processor demand $PD_i$ for each task $\tau_i$ as follows:

$$PD_i = \max_{o \in O_i} \sum_{(\_, \Delta, \_) \in o} \Delta \quad (17)$$

where $\Delta$ is the execution time of an instruction without memory delays. Task $\tau_i$ suffers interference $I^{PROC}(i, x, t)$ on its core $P_x$ due to tasks of higher priority running on the same core within a time interval of length $t$ starting from the critical instant:

$$I^{PROC}(i, x, t) = \sum_{j \in \Gamma_x \wedge j \in hp(i)} \left\lceil \frac{t}{T_j} \right\rceil PD_j \quad (18)$$
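For illustration, Equation (18) is the standard ceiling-based interference term; a direct transcription might look as follows (names are illustrative).

```python
from math import ceil

def i_proc(i, t, pd, period, hp_same_core):
    # Equation (18): processor demand of higher-priority tasks on the same core
    # released within a window of length t starting at the critical instant.
    return sum(ceil(t / period[j]) * pd[j] for j in hp_same_core(i))
```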

6.2 Interference on the Local Memory

Local memory improves a task's execution time by reducing the number of accesses to main memory. The memory demand of a trace gives the number of accesses that go to main memory, and hence the bus, despite the presence of the local memory. The maximal memory demand $MD_i$ of a task $\tau_i$ is defined by the maximum number of bus accesses of any of its traces:

$$MD_i = \max_{o \in O_i} \left\{ MD \mid \mathrm{MEM}(o) = (MD, \_, \_) \right\} \quad (19)$$

Note that the maximal memory demand refers to the demand of the combined instruction and data memory as defined in Equation (10). The memory demand $MD_i$ is derived assuming non-preemptive execution, i.e. that the task runs to completion without interference on the local memory. The sets of UCBs and ECBs are used to compute the additional overhead due to pre-emption. In the computation of this overhead, we use the sets of UCBs per trace $o$ to preserve precision,

$$UCB_o = UCB \text{ with } \mathrm{MEM}(o) = (\_, UCB, \_) \quad (20)$$

and derive the maximal set of ECBs per task $\tau_i$ as the union of the ECBs on all traces:

$$ECB_i = \bigcup_{o \in O_i} \left\{ ECB \mid \mathrm{MEM}(o) = (\_, \_, ECB) \right\} \quad (21)$$

We use $\gamma_{i,j,x}$ (with $j \in hp(i)$) to denote the overhead (additional accesses) due to a pre-emption of task $\tau_i$ by task $\tau_j$ on core $P_x$. We use the ECB-Union [2] approach as an exemplar of CRPD analysis, as it provides a reasonably precise bound on the pre-emption overhead with low complexity. Other techniques [3, 29] could also be integrated into this framework, but we omit the explanation due to space constraints. The ECB-Union approach considers the UCBs of the pre-empted task per pre-emption point and assumes that the pre-empting task $\tau_j$ has itself already been pre-empted by all tasks with higher priority on the same processor $P_x$. This nested pre-emption of the pre-empting task is represented by the union of the ECBs of all tasks with higher or equal priority than task $\tau_j$ (see [3] for a detailed description).

$$\gamma_{i,j,x} = \max_{k \in hep(i) \cap lp(j) \cap \Gamma_x} \; \max_{o \in O_k} \; \max_{UCB_\iota \in UCB_o} \left| UCB_\iota \cap \bigcup_{h \in hep(j) \cap \Gamma_x} ECB_h \right| \quad (22)$$

6.3 Interference on the Bus

We now compute the number of accesses that compete for the bus during a time interval of length $t$, equating to the worst-case response time of the task of interest $\tau_i$. We use $S_i^x(t)$ to denote an upper bound on the total number of bus accesses that can occur due to tasks running on processor $P_x$ during that time. Since lower priority tasks cannot execute on $P_x$ during the response time of task $\tau_i$ (a priority level-$i$ processor busy period), the only contribution from those tasks is a single blocking access, as discussed in Section 5. The maximum delay is computed assuming task $\tau_i$ is released simultaneously with all higher priority tasks that run on $P_x$, subsequent releases of those tasks occur as soon as possible, and the maximum possible number of pre-emptions occurs.

$$S_i^x(t) = \sum_{k \in \Gamma_x \wedge k \in hep(i)} \left\lceil \frac{t}{T_k} \right\rceil \left( MD_k + \gamma_{i,k,x} \right) \quad (23)$$

$MD_k$ denotes the memory demand of task $\tau_k$ and $\gamma_{i,k,x}$ accounts for the pre-emption costs on core $P_x$ due to jobs of task $\tau_k$.
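A sketch of Equations (22) and (23), i.e. the ECB-Union pre-emption cost and the per-core access bound (illustrative names; ucb_per_trace[k] is assumed to hold, per trace of $\tau_k$, the list of UCB sets per program point):

```python
from math import ceil

def gamma_ecb_union(i, j, core_tasks, hep, lp, ucb_per_trace, ecb):
    # Equation (22): worst-case number of additional accesses caused by one
    # pre-emption by task tau_j of any task with priority between j and i.
    ecb_union = set().union(*(ecb[h] for h in core_tasks if h in hep(j)))
    cost = 0
    for k in (k for k in core_tasks if k in hep(i) and k in lp(j)):
        for trace_ucbs in ucb_per_trace[k]:       # one entry per trace o in O_k
            for ucbs_at_point in trace_ucbs:      # one entry per program point
                cost = max(cost, len(ucbs_at_point & ecb_union))
    return cost

def s_x(i, t, core_tasks, hep, md, period, gamma):
    # Equation (23): accesses generated on tau_i's own core P_x within t,
    # including the pre-emption overhead gamma(i, k) per job of tau_k.
    return sum(ceil(t / period[k]) * (md[k] + gamma(i, k))
               for k in core_tasks if k in hep(i))
```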
We use $A_j^y(t)$ to denote an upper bound on the total number of bus accesses due to all tasks of priority $j$ or higher executing on processor $P_y \ne P_x$ during an interval of length $t$. A special case is $A_n^y(t)$: since $\tau_n$ is the lowest priority task, this term includes accesses due to all tasks running on processor $P_y$. In contrast to the derivation of $S_i^x(t)$, for $A_n^y(t)$ we can make no assumptions about the synchronisation or otherwise of tasks on processor $P_y$ with respect to the release of task $\tau_i$ on processor $P_x$. The value of $A_j^y(t)$ is therefore obtained by assuming, for each task, that the first job executes as late as possible, i.e. just prior to its worst-case response time, while the next and subsequent jobs execute as early as possible. We assume that the first interfering job of a task $\tau_k$ has all of its memory accesses as late as possible during its execution, while for subsequent jobs the opposite is true, with execution and memory accesses occurring as early as possible after the release of the job. This treatment is similar to the concept of carry-in interference used in the analysis of global multiprocessor fixed-priority scheduling [10], and is illustrated in Figure 2.

[Figure 2: Illustration of the carry-in interference analysis.]

The number of complete jobs of task $\tau_k$ contributing accesses in an interval of length $t$ on processor $P_y$ is given by:

$$N_{j,k}^y(t) = \left\lfloor \frac{t + R_k - (MD_k + \gamma_{j,k,y}) \cdot d_{main}}{T_k} \right\rfloor \quad (24)$$

Note that the term $(MD_k + \gamma_{j,k,y}) \cdot d_{main}$ represents the time for the memory accesses. Hence the total number of accesses possible in an interval of length $t$ due to task $\tau_k$ and its cache related pre-emption effects is given by:

$$W_{j,k}^y(t) = N_{j,k}^y(t) \cdot (MD_k + \gamma_{j,k,y}) + \min\left( MD_k + \gamma_{j,k,y}, \; \left\lceil \frac{t + R_k - (MD_k + \gamma_{j,k,y}) \cdot d_{main} - N_{j,k}^y(t) \cdot T_k}{d_{main}} \right\rceil \right) \quad (25)$$

Hence we have:

$$A_j^y(t) = \sum_{k \in \Gamma_y \wedge k \in hep(j)} W_{j,k}^y(t) \quad (26)$$

The value of $L_j^y(t)$ is obtained in a similar way to $A_j^y(t)$, but considering accesses with lower priority than $j$:

$$L_j^y(t) = \sum_{k \in \Gamma_y \wedge k \in lp(j)} W_{n,k}^y(t) \quad (27)$$

We note that the carry-in interference has not been accounted for in Equations (5) and (6) of [27], resulting in potentially optimistic bounds on the number of competing memory requests in [27].

The numbers of accesses on the cores are used as input to the BUS function (see Section 5), which we use to derive the maximum bus delay that task $\tau_i$ on processor $P_x$ can experience during a time interval of length $t$:

$$I^{BUS}(i, x, t) = \mathrm{BUS}(i, x, t) \cdot d_{main} \quad (28)$$

where $d_{main}$ is the bus access latency to the global memory.
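The carry-in bound of Equations (24)-(26) could be transcribed as follows (illustrative names; negative intermediate values, which can arise for very short windows, are clamped to zero in this sketch):

```python
from math import ceil, floor

def carry_in_accesses(j, t, tasks_on_y, hep, md, gamma, period, wcrt, d_main):
    # Equations (24)-(26): bus accesses of priority j or higher generated on
    # another core P_y within a window of length t, with carry-in for the first job.
    total = 0
    for k in (k for k in tasks_on_y if k in hep(j)):
        per_job = md[k] + gamma(j, k)              # accesses per job incl. CRPD
        slack = t + wcrt[k] - per_job * d_main
        n = max(0, floor(slack / period[k]))                    # Equation (24)
        carry = max(0, ceil((slack - n * period[k]) / d_main))
        total += n * per_job + min(per_job, carry)              # Equation (25)
    return total                                                # Equation (26)
```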

6.4 Global Memory

So far we have assumed a global memory with a constant access latency $d_{main}$. Global memory is usually realized as dynamic random-access memory (DRAM), which needs to be refreshed periodically. We now show how to relax the constant-latency assumption to take into account the delays imposed by refreshes. We assume a DRAM controller with a First Come First Served (FCFS) scheduling policy, so that memory accesses cannot be reordered within the controller. Further, we assume a closed-page policy to minimize the effect of the memory access history on access latencies. We consider two refresh strategies [34]: distributed refresh, where the controller refreshes each row at a different time, at regular intervals, and burst refresh, where all rows are refreshed immediately one after another. Under burst refresh, an upper bound on the maximum number of refreshes within an interval of length $t$ in which $m$ memory accesses occur is given by:

$$\mathrm{DRAM}_{burst}(t, m) = \left\lceil \frac{t}{T_{refresh}} \right\rceil \cdot \#rows \quad (29)$$

where $\#rows$ is the number of rows in the DRAM module, and $T_{refresh}$ is the interval at which each row needs to be refreshed. $T_{refresh}$ is usually 64 ms for DDR2 and DDR3 modules. Under distributed refresh, the upper bound is:

$$\mathrm{DRAM}_{dist}(t, m) = \min\left( m, \; \left\lceil \frac{t \cdot \#rows}{T_{refresh}} \right\rceil \right) \quad (30)$$

This is the case since at most one memory access can be delayed by each of the refreshes, whereas under burst refresh a single memory access can be delayed by $\#rows$ many refreshes. As the number of memory accesses within $t$ is equal to the number of bus accesses, we can bound the interference due to DRAM refreshes on task $\tau_i$ on core $P_x$ as follows:

$$I^{DRAM}(i, x, t) = \mathrm{DRAM}(t, \mathrm{BUS}(i, x, t)) \cdot d_{refresh} \quad (31)$$

where $d_{refresh}$ is the refresh latency.
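Equations (29)-(31) translate directly into code; the following sketch covers both refresh strategies (illustrative names):

```python
from math import ceil

def dram_refresh_interference(t, num_bus_accesses, n_rows, t_refresh, d_refresh,
                              distributed=True):
    # Bound on the delay added by DRAM refreshes within a window of length t
    # containing the given number of bus accesses.
    if distributed:
        refreshes = min(num_bus_accesses, ceil(t * n_rows / t_refresh))  # Eq. (30)
    else:
        refreshes = ceil(t / t_refresh) * n_rows                         # Eq. (29)
    return refreshes * d_refresh                                         # Eq. (31)
```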
6.5 Multicore Response Time Analysis

The response time $R_i$ of task $\tau_i$ is given by the smallest solution to the following recurrence relation:

$$R_i = PD_i + I^{PROC}(i, x, R_i) + I^{BUS}(i, x, R_i) + I^{DRAM}(i, x, R_i) \quad (32)$$

where $I^{PROC}(i, x, R_i)$ is the interference due to processor demand from higher priority tasks running on the same processor assuming no misses on the local memory (see Equation (18)), $I^{BUS}(i, x, R_i)$ is the delay due to bus accesses from tasks running on all cores, including $MD_i$ (see Equation (28)), and $I^{DRAM}(i, x, R_i)$ is the delay due to DRAM refreshes (see Equation (31)).

Since the response time of each task can depend on the response times of other tasks via the functions (26) and (27) describing memory accesses $A_j^y(t)$ and $L_j^y(t)$, we use an outer loop around a set of fixed-point iterations to compute the response times of all the tasks, and so deal with an apparent circular dependency. Iteration starts with $\forall i\colon R_i = PD_i + MD_i \cdot d_{main}$ and ends when all the response times have converged (i.e. no response time changes w.r.t. the previous iteration), or the response time of a task exceeds its deadline, in which case that task is unschedulable. See Algorithm 1 for pseudo-code of the response time calculation. Since the response time $R_i$ of a task $\tau_i$ is monotonically increasing w.r.t. increases in the response time of any other task, convergence or exceeding a deadline is guaranteed in a bounded number of iterations.

We note that the analysis is sustainable [8] with respect to the processor demands $PD_j$ and memory demands $MD_j$ of each task, since values that are smaller than the upper bounds used in the analysis cannot result in a larger response time. This sustainability extends to traces: if any trace of task execution results in practice in a lower processor or memory demand than that considered by the analysis, then this also cannot result in an increase in the response time. Similarly, a decrease in the set of UCBs or ECBs such that they are a subset of those considered by the analysis cannot increase the worst-case response time.

Algorithm 1: Response Time Computation

function MultiCoreRTA
    ∀i: R_i^0 = 0
    ∀i: R_i^1 = PD_i + MD_i · d_main
    l = 1
    while (∃i: R_i^l ≠ R_i^{l-1}) ∧ (∀i: R_i^l ≤ D_i) do
        for all i do
            R_i^{l,0} = R_i^{l-1}
            R_i^{l,1} = R_i^l
            k = 1
            while R_i^{l,k} ≠ R_i^{l,k-1} ∧ R_i^{l,k} ≤ D_i do
                R_i^{l,k+1} = PD_i + I^PROC(i, x, R_i^{l,k}) + I^BUS(i, x, R_i^{l,k}) + I^DRAM(i, x, R_i^{l,k})
                k = k + 1
            end while
            R_i^{l+1} = R_i^{l,k}
        end for
        l = l + 1
    end while
    if ∀i: R_i^l ≤ D_i then
        return schedulable
    else
        return not schedulable
    end if
end function

Note that the definitions of $MD_i$, $PD_i$ and $ECB_i$ completely decouple the traces from the response time analysis. This comes at the cost of possible pessimism, but strongly reduces the complexity of the analysis. Different traces may maximize different parameters, meaning that the combination of the parameters in this way may represent a synthetic worst case that cannot occur in practice. An alternative solution is to define a multicore response time analysis that is parametric in the execution traces. In the extreme, completely expanding the analysis to explore every combination of traces from different tasks would be intractable. However, as a first step in this direction, response times could be computed for each individual trace of the task of interest $\tau_i$, using combined traces for all other tasks. The maximum such response time would then provide an improved upper bound.
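A runnable sketch of the outer/inner fixed-point iteration of Algorithm 1 is given below. The interference terms are passed in as callables that also receive the current vector of response times, since $I^{BUS}$ and $I^{DRAM}$ depend on the response times of tasks on other cores via Equations (26) and (27). Names and calling conventions are illustrative.

```python
def multicore_rta(tasks, pd, md, deadline, d_main, i_proc, i_bus, i_dram):
    """i_proc/i_bus/i_dram are callables (i, t, R) -> int, where R maps each
    task to the response time computed in the previous outer iteration."""
    R_old = {i: 0 for i in tasks}                          # R^0
    R = {i: pd[i] + md[i] * d_main for i in tasks}         # R^1, initial values
    while R != R_old and all(R[i] <= deadline[i] for i in tasks):
        R_next = {}
        for i in tasks:
            r_prev, r = R_old[i], R[i]
            # Inner fixed point of Equation (32); the interference terms see the
            # response times of the other tasks from the last completed round.
            while r != r_prev and r <= deadline[i]:
                r_prev = r
                r = (pd[i] + i_proc(i, r_prev, R)
                           + i_bus(i, r_prev, R)
                           + i_dram(i, r_prev, R))
            R_next[i] = r
        R_old, R = R, R_next
    return all(R[i] <= deadline[i] for i in tasks), R
```

As in Algorithm 1, iteration stops either when all response times have converged or when some task exceeds its deadline, in which case the task set is reported unschedulable.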

7. EXTENSIONS

Above, we instantiated the Multicore Response Time Analysis (MRTA) framework for relatively simple task and multicore architectural models. In this section, we briefly discuss extensions including: RTOS and interrupts, dynamic scratchpad management, sharing software resources, open systems and incremental verification, write-back cache policies, and multi-level caches. However, the presented analysis framework is not fine-tuned to specific hardware features or execution scenarios such as burst accesses, since this would counteract its extensibility and generality.

7.1 RTOS and Interrupts

The analysis presented in the paper only considers tasks and their execution, as represented by traces. We now give a brief outline of how the MRTA framework can be extended to cover RTOS and interrupt handler behaviour. We assume that task release is triggered via interrupts from a timer/counter or other interrupt sources. When an interrupt is raised, the appropriate handler is dispatched and may pre-empt the currently executing task (or interrupt handler, if multiple interrupt priority levels are supported). When the interrupt handler returns, then if a higher priority task has been released, the scheduler will run and dispatch that task, otherwise control returns to the previously running task. When a task completes, the scheduler again runs and chooses the next highest priority task to execute. The behaviour of each interrupt handler is represented by a set of execution traces similar to those for tasks. Thus interrupt handlers can be included in the MRTA framework in a similar way to tasks, but at higher priorities. (We note that there may be some differences if all interrupts share the same interrupt priority level; however, due to restrictions on space and the wide variety of possible arrangements of interrupt priorities, we do not go into details here.) In some cases, interrupts may be prohibited from using the cache, have their own cache partition, or have their code permanently locked into a scratchpad. All of these possibilities can be covered using variants of the analysis described in the paper.

The RTOS is different from interrupt handlers and tasks in that it is not a schedulable entity in itself; rather, RTOS code is run as part of each task, typically before and after the actual task code, and interleaved with it in the form of system calls. Similarly, with interrupt handlers that release tasks, RTOS code is typically called as the handler returns. With our representation of tasks and interrupt handlers as sets of traces, execution of the RTOS can be fully accounted for by a concatenation of the appropriate sub-traces for the RTOS onto the start and end of the traces for tasks and interrupt handlers.

7.2 Dynamic Scratchpad Management

In Section 4.2, we assumed that scratchpad contents were static; however, dynamic scratchpad management schemes [44] are better able to make use of limited scratchpad memory in multi-tasking systems. In this case pre-emption costs are incurred in saving, loading, and restoring the scratchpad contents on each pre-emption. These operations may be explicit, implemented by code in the operating system, in which case the additional processing and memory demands can easily be accounted for via the sub-traces for the RTOS. Alternatively, these operations may be under the control of specialised DMA hardware [44], requiring specific modelling of the additional memory demands.

7.3 Sharing Software Resources

The analysis presented in the paper assumes that tasks are independent in the sense that they do not share software resources that must be accessed in mutual exclusion; rather, the only contention is over hardware resources. We now consider how that restriction can be lifted. We assume that tasks executing on the same processor may share software resources that are accessed in mutual exclusion according to the Stack Resource Protocol (SRP) [7]. Under SRP, a task $\tau_i$ may be blocked from executing by at most a single critical section where a task of priority lower than $i$ locks a resource shared with task $\tau_i$ or a task of higher priority. Further, under SRP, blocking only occurs before a task starts to execute, thus SRP introduces no extra context switches. We assume a set of traces $O^B_i$ for all of the critical sections that may block task $\tau_i$. In the MRTA framework, the impact of blocking needs to be considered in terms of both processor and memory demand. This can be achieved by considering the traces $O^B_i$ as belonging to a single virtual task with higher priority than $\tau_i$. Thus we obtain a contribution $PD^B_i$ to the processor demand, which is added into $I^{PROC}(i, x, t)$, and a contribution $MD^B_i$ to the memory demand, which contributes to $S_i^x(t)$.
Accounting for the CRPD effects due to blocking is more complex, and its integration into the MRTA framework is beyond the scope of this paper; the basic method is however explained in [3]. We note that blocking due to software resources accessed by tasks on other processors does not affect the term $A_n^y(t)$, since SRP introduces no additional context switches, and at the lowest priority level $n$ there are no extra tasks to include in the CRPD computation (see Section 5 of [3]). The value of $A_j^y(t)$ used in the analysis of a Fixed-Priority bus is also unchanged due to resource accesses, since we assume that the bus access priority reflects only a task's base priority, rather than any raised priority as a result of SRP.

7.4 Open Systems and Incremental Verification

The basic analysis for the MRTA framework given in the paper assumes that we have information (i.e. traces etc.) for all of the tasks in the system. There are a number of reasons why this may not be the case: (i) the system may be open, with tasks on one or more processors loadable post deployment, (ii) the system may be under development and the tasks on another processor not yet known, (iii) incremental verification may be required, so no assumption can be made about the tasks executing on another processor, (iv) the system may be mixed criticality and tasks on another processor may not be developed to the same criticality level, and hence cannot be assumed to be well behaved. Instead we must assume they may exhibit the worst possible behaviour. For a processor $P_y$ where we have no information, or need to assume the worst, we may replace $A_j^y(t)$ and $A_n^y(t)$ with a function that represents continual generation of memory accesses at the maximum possible rate. In practice, this may be equivalent to simply setting $A_j^y(t) = A_n^y(t) = \infty$. We note that the analysis for TDMA and Round-Robin bus arbitration still results in bounded response times in this case, while the analysis for FIFO and Fixed-Priority arbitration will result in unbounded response times. With arbitration based on Processor-Priority, bounded response times can only be obtained if $P_y$ is a lower priority processor than $P_x$.

7.5 Caches with a Write-Back Policy

In this paper, we consider write-through caches only; however, in practice write-back caches are usually preferred, as they reduce the number of accesses to main memory and thus increase performance. Write-back caches introduce three challenges for future work. The first challenge is to devise analyses that precisely bound the number of write backs, which is equal to the number of evictions of dirty cache lines. The second, and perhaps greater, challenge is that write backs corresponding to the execution of a task $\tau_i$ may occur after the termination of $\tau_i$ and thus contribute to the delay of another task. Thirdly, write-back caches require the implementation of coherence protocols, which may generate additional traffic on the memory bus, which would have to be safely bounded. A naive solution to the first two challenges assumes pessimistically that each cache line is dirty and thus that each cache eviction leads to two bus accesses. Alternatively, we can derive for each task in a closed system a set of dirty cache lines, which have to be written back if evicted by another task. Write-backs can then be considered an additional source of interference in the framework. A detailed analysis for write-back caches remains an interesting area for future work.

7.6 Multi-level Caches

Modern multicore processors often feature multiple cache levels, where usually one level is shared between multiple cores. Dealing with such a scenario in our framework is in principle feasible. As long as all caches are private, the challenge would be to integrate an extension of CRPD analysis to multiple cache levels. Chattopadhyay and Roychoudhury [16] have recently proposed such an analysis for non-inclusive memory hierarchies. Shared second- or third-level caches add the extra complication of cross-core interference on the cache. Different more or less precise and efficient approaches to bound this interference are conceivable, and again form an interesting area for future work.

8. EXPERIMENTAL EVALUATION

[Figure 3: Multicore Architecture Case Study: m = 4 cores with local caches connected via a common bus to a global memory.]

In this section we describe the results of an experimental evaluation using the MRTA framework. (The software is available on demand.) For the evaluation, we use the Mälardalen benchmark suite [21] to provide traces. We model a multicore system based on an ARM Cortex A5 multicore (product page: cortex-a5.php) as a reference architecture to provide a cache configuration and memory and bus latencies. As this work is intended to provide an overview of our generic and extensible framework, we do not model all details of the specific multicore architecture. A case study comparing measurements on real hardware with the computed bounds is future work.

The reference architecture depicted in Figure 3 is configured as follows: It has 4 ARMv7 cores connected to the global memory/IO over a shared bus, assuming a Round-Robin arbitration policy and a core frequency of 200 MHz. Each core has separate instruction and data caches, with 256 cache sets each and a block size of 32 bytes. The global memory latency $d_{main}$ and the DRAM refresh latency $d_{refresh}$ are both 5 cycles. The DRAM refresh period $T_{refresh}$ is 64 ms. We assume the DRAM implements the distributed refresh strategy (see Section 6.4). We examine derivatives of the reference configuration assuming the different bus arbitration policies presented in Section 5, and a hypothetical perfect bus which eliminates all bus interference if the bus utilization is at most 1. We compare the reference configuration with two alternative architectures. The first, referred to as the full-isolation architecture, implements complete spatial and temporal isolation. The local caches are partitioned with an equal partition size for each task and the bus uses a TDMA arbitration policy. All other parameters remain the same as in the reference architecture. The performance of the isolation architecture corresponds to the traditional two-step approach to timing verification with context-independent WCETs. The second alternative, referred to as the uncached architecture, assumes no local caches except for a buffer of size 1, and uses Round-Robin bus arbitration. All other parameters are again the same as in the reference configuration.

The traces for the benchmarks were generated using the gem5 instruction set simulator [13] and contain statically linked library calls. As the benchmark code corresponds to independent tasks, no data is shared between the tasks. Table 1 shows information for all 39 benchmark programs used to provide traces, including the total number of instructions (which is equal to the processor demand), the number of read/write operations, the memory demand, and the maximum number of UCBs and ECBs on the reference multicore architecture. Each benchmark is assigned only one trace, which is sufficient due to the simple structure of the benchmark suite: the benchmarks are either single-path or worst-case input is provided. Despite the rather simple structure of the benchmarks, the tasks show a strong variation in processor and memory demand. As all benchmarks exhibit only one trace, the worst-case processor and memory demand coincide. Evaluation of more complex tasks, including evaluation of the trade-off between the pessimism of independent upper bounds and the computational complexity of explicit traces, remains as future work.

We identify three main sources of over-approximation in our multicore response time analysis framework: the number of memory accesses on the same core cannot be precisely estimated due to imprecision in the pre-emption cost analysis; the interference due to bus accesses may be pessimistic, as not all tasks running on another core can simultaneously access the bus; and the DRAM refreshes are assumed too frequently if the number of main memory accesses is over-approximated. A sophisticated evaluation of the precision of our analysis requires measurements on a real architecture, which we cannot yet provide. However, the different architecture configurations provide an estimate of the influence of the different sources of pessimism. The reference architecture with a perfect bus eliminates any pessimism due to bus interference and DRAM accesses; only the pessimism of the pre-emption cost analysis remains, which has been quantified in [2]. The full-isolation architecture removes all pessimism due to the bus interference and the pre-emption costs, and thus only suffers from the pessimism in the DRAM analysis.

We evaluated the guaranteed performance of the various configurations as computed using the MRTA framework on a large number of randomly generated task sets. The task set parameters were as follows: The default task set size was 32, with 8 tasks per core. Each task was randomly assigned a trace from Table 1. The base WCET per task $\tau_i$, needed solely to set the task periods and deadlines, was defined as

$$C_i = PD_i + MD_i \cdot d_{main} + \mathrm{DRAM}(PD_i + MD_i \cdot d_{main}, MD_i) \cdot d_{refresh}$$

$C_i$ denotes the execution time of the task without any interference from any other task. The task utilizations were generated using UUniFast [12] with an equal utilization assumed for each core. Task periods were set based on task utilization and base WCET, i.e., $T_i = C_i / U_i$. Task deadlines were implicit. Priorities were assigned in deadline monotonic order. We note that the processor utilization is often not the limiting factor on a multicore system; rather, the memory utilization, defined as

$$U^{BUS} = \sum_i \frac{MD_i \cdot d_{main}}{T_i} \quad (33)$$

is the limiting factor. Only if $U^{BUS} \le 1$ can the tasks be scheduled.
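The base WCET and bus utilization used for task-set generation can be computed as in the following sketch (distributed refresh as in Equation (30); names are illustrative):

```python
from math import ceil

def base_wcet(pd_i, md_i, d_main, d_refresh, n_rows, t_refresh):
    # Interference-free execution time C_i used only to set periods and deadlines.
    busy = pd_i + md_i * d_main
    refreshes = min(md_i, ceil(busy * n_rows / t_refresh))  # DRAM_dist(busy, MD_i)
    return busy + refreshes * d_refresh

def bus_utilisation(md, period, d_main):
    # Equation (33): total memory (bus) utilization; schedulability requires <= 1.
    return sum(md[i] * d_main / period[i] for i in md)
```

A task's period is then set as T_i = C_i / U_i from the per-task utilization generated by UUniFast.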
Evaluaton of more complex tasks ncludng evaluaton of the trade off between pessmsm of ndependent upper bounds and the computatonal complexty of explct traces remans as future work. We dentfy three man sources of over-approxmaton of our multcore response tme analyss framework: The number of memory accesses on the same core cannot be precsely estmated due to mprecson n the pre-empton cost analyss. The nterference due to bus accesses may be pessmstc as not all tasks runnng on another core can smultaneously access the bus. The DRAM refreshes are assumed too frequently f the number of man memory accesses s over-approxmated. A sophstcated evaluaton of the precson of our analyss requres measurements on a real archtecture, whch we cannot yet provde. However, the dfferent archtecture confguratons provde an estmate of the nfluence of the dfferent sources of pessmsm. The reference archtecture wth a perfect bus elmnates any pessmsm due to bus nterference and DRAM accesses. Only the pessmsm of the pre-empton cost analyss remans, whch has been quantfed n [2]. The full-solaton archtecture removes all pessmsm due to the bus nterference and the pre-empton costs, and thus only suffers from the pessmsm n the DRAM analyss. We evaluated the guaranteed performance of the varous confguratons as computed usng the MRTA framework on a large number of randomly generated task sets. The task set parameters were as follows: The default task set sze was 32, wth 8 tasks per core. Each task was randomly assgned a trace from Table 1. The base WCET per task τ, needed solely to set the task perods and deadlne, was defned as C = PD + MD d man + DRAM(PD + MD d man, MD ) d refresh C denotes the executon tme of the task wthout any nterference from any other task. The task utlzatons were generated usng UUnfast [12] wth an equal utlzaton assumed for each core. Task perods were set based on task utlzaton and base WCET,.e., T = C /U. Task deadlnes were mplct. Prortes were assgned n deadlne monotonc order. We note that the processor utlzaton s often not the lmtng factor on a multcore system, but the memory utlzaton, defned as: U BUS MD d man = (33) T s the lmtng factor. Only f U BUS 1, can the tasks be scheduled. The utlzaton per core was vared from to n steps of For each utlzaton value, 1000 tasksets were generated and the schedulablty was determned for each archtectural confguraton. Fgure 4 shows the number of schedulable task sets plotted aganst the core utlzaton (computed usng the base WCETs) and Fgure 5 aganst the bus utlzaton U BUS. Most traces from Table 1 have a hgh memory demand, whch results n a hgh


More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access Agenda Cache Performance Samra Khan March 28, 217 Revew from last lecture Cache access Assocatvty Replacement Cache Performance Cache Abstracton and Metrcs Address Tag Store (s the address n the cache?

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Mixed-Criticality Scheduling on Multiprocessors using Task Grouping

Mixed-Criticality Scheduling on Multiprocessors using Task Grouping Mxed-Crtcalty Schedulng on Multprocessors usng Task Groupng Jankang Ren Lnh Th Xuan Phan School of Software Technology, Dalan Unversty of Technology, Chna Computer and Informaton Scence Department, Unversty

More information

Space-Optimal, Wait-Free Real-Time Synchronization

Space-Optimal, Wait-Free Real-Time Synchronization 1 Space-Optmal, Wat-Free Real-Tme Synchronzaton Hyeonjoong Cho, Bnoy Ravndran ECE Dept., Vrgna Tech Blacksburg, VA 24061, USA {hjcho,bnoy}@vt.edu E. Douglas Jensen The MITRE Corporaton Bedford, MA 01730,

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

A Predictable Execution Model for COTS-based Embedded Systems

A Predictable Execution Model for COTS-based Embedded Systems 2011 17th IEEE Real-Tme and Embedded Technology and Applcatons Symposum A Predctable Executon Model for COTS-based Embedded Systems Rodolfo Pellzzon, Emlano Bett, Stanley Bak, Gang Yao, John Crswell, Marco

More information

Petri Net Based Software Dependability Engineering

Petri Net Based Software Dependability Engineering Proc. RELECTRONIC 95, Budapest, pp. 181-186; October 1995 Petr Net Based Software Dependablty Engneerng Monka Hener Brandenburg Unversty of Technology Cottbus Computer Scence Insttute Postbox 101344 D-03013

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Real-Time Guarantees. Traffic Characteristics. Flow Control

Real-Time Guarantees. Traffic Characteristics. Flow Control Real-Tme Guarantees Requrements on RT communcaton protocols: delay (response s) small jtter small throughput hgh error detecton at recever (and sender) small error detecton latency no thrashng under peak

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal

More information

Scheduling. In general, a scheduling scheme provides two features: An algorithm for ordering the use of system resources (in particular the CPUs)

Scheduling. In general, a scheduling scheme provides two features: An algorithm for ordering the use of system resources (in particular the CPUs) Schedulng Goal To understand the role that schedulng and schedulablty analyss plays n predctng that real-tme applcatons meet ther deadlnes Topcs Smple process model The cyclc executve approach Process-based

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Memory and I/O Organization

Memory and I/O Organization Memory and I/O Organzaton 8-1 Prncple of Localty Localty small proporton of memory accounts for most run tme Rule of thumb For 9% of run tme next nstructon/data wll come from 1% of program/data closest

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information

Multitasking and Real-time Scheduling

Multitasking and Real-time Scheduling Multtaskng and Real-tme Schedulng EE8205: Embedded Computer Systems http://www.ee.ryerson.ca/~courses/ee8205/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrcal and Computer Engneerng Ryerson Unversty

More information

A comparison of MPCP and MSRP when sharing resources in the Janus multiple-processor on a chip platform

A comparison of MPCP and MSRP when sharing resources in the Janus multiple-processor on a chip platform A comparson of MPCP and MSRP when sharng resources n the Janus multple-processor on a chp platform Paolo Ga, Marco D Natale, Guseppe Lpar, Scuola Superore Sant Anna, Psa, Italy {pj,marco,lpar}@sssup.t

More information

Reliability and Energy-aware Cache Reconfiguration for Embedded Systems

Reliability and Energy-aware Cache Reconfiguration for Embedded Systems Relablty and Energy-aware Cache Reconfguraton for Embedded Systems Yuanwen Huang and Prabhat Mshra Department of Computer and Informaton Scence and Engneerng Unversty of Florda, Ganesvlle FL 326-62, USA

More information

Avoiding congestion through dynamic load control

Avoiding congestion through dynamic load control Avodng congeston through dynamc load control Vasl Hnatyshn, Adarshpal S. Seth Department of Computer and Informaton Scences, Unversty of Delaware, Newark, DE 976 ABSTRACT The current best effort approach

More information

On Achieving Fairness in the Joint Allocation of Buffer and Bandwidth Resources: Principles and Algorithms

On Achieving Fairness in the Joint Allocation of Buffer and Bandwidth Resources: Principles and Algorithms On Achevng Farness n the Jont Allocaton of Buffer and Bandwdth Resources: Prncples and Algorthms Yunka Zhou and Harsh Sethu (correspondng author) Abstract Farness n network traffc management can mprove

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Concurrent models of computation for embedded software

Concurrent models of computation for embedded software Concurrent models of computaton for embedded software and hardware! Researcher overvew what t looks lke semantcs what t means and how t relates desgnng an actor language actor propertes and how to represent

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management.

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management. //7 Prnceton Unversty Computer Scence 7: Introducton to Programmng Systems Goals of ths Lecture Storage Management Help you learn about: Localty and cachng Typcal storage herarchy Vrtual memory How the

More information

Array transposition in CUDA shared memory

Array transposition in CUDA shared memory Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some

More information

3. CR parameters and Multi-Objective Fitness Function

3. CR parameters and Multi-Objective Fitness Function 3 CR parameters and Mult-objectve Ftness Functon 41 3. CR parameters and Mult-Objectve Ftness Functon 3.1. Introducton Cogntve rados dynamcally confgure the wreless communcaton system, whch takes beneft

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Technical Report. i-game: An Implicit GTS Allocation Mechanism in IEEE for Time- Sensitive Wireless Sensor Networks

Technical Report. i-game: An Implicit GTS Allocation Mechanism in IEEE for Time- Sensitive Wireless Sensor Networks www.hurray.sep.pp.pt Techncal Report -GAME: An Implct GTS Allocaton Mechansm n IEEE 802.15.4 for Tme- Senstve Wreless Sensor etworks Ans Koubaa Máro Alves Eduardo Tovar TR-060706 Verson: 1.0 Date: Jul

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,

More information

Coordinated Bank and Cache Coloring for Temporal Protection of Memory Accesses

Coordinated Bank and Cache Coloring for Temporal Protection of Memory Accesses Coordnated Bank and Cache Colorng for Temporal Protecton of Memory Accesses 1 Norak Suzuk, 2 Hyoseung Km, 2 Donso de Nz, 2 Bjorn Andersson, 2 Lutz Wrage, 2 Mark Klen, and 2 Ragunathan (Raj) Rajkumar n-suzuk@ha.jp.nec.com,

More information

Scheduling and queue management. DigiComm II

Scheduling and queue management. DigiComm II Schedulng and queue management Tradtonal queung behavour n routers Data transfer: datagrams: ndvdual packets no recognton of flows connectonless: no sgnallng Forwardng: based on per-datagram forwardng

More information

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries Run-Tme Operator State Spllng for Memory Intensve Long-Runnng Queres Bn Lu, Yal Zhu, and lke A. Rundenstener epartment of Computer Scence, Worcester Polytechnc Insttute Worcester, Massachusetts, USA {bnlu,

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

A protocol for mixed-criticality management in switched Ethernet networks

A protocol for mixed-criticality management in switched Ethernet networks A protocol for mxed-crtcalty management n swtched Ethernet networks Olver CROS, Laurent GEORGE Unversté Pars-Est, LIGM / ESIEE, France cros@ece.fr,lgeorge@eee.org Xaotng LI ECE Pars / LACSC, France xaotng.l@ece.fr

More information

Vectorization in the Polyhedral Model

Vectorization in the Polyhedral Model Vectorzaton n the Polyhedral Model Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty October 200 888. Introducton: Overvew Vectorzaton: Detecton

More information

Perfecting Preemption Threshold Scheduling for Object-Oriented Real-Time System Design: From The Perspective of Real-Time Synchronization

Perfecting Preemption Threshold Scheduling for Object-Oriented Real-Time System Design: From The Perspective of Real-Time Synchronization Perfectng Preempton Threshold Schedulng for Obect-Orented Real-Tme System Desgn: From The Perspectve of Real-Tme Synchronzaton Saehwa Km School of Electrcal Engneerng and Computer Scence Seoul Natonal

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

On the Fairness-Efficiency Tradeoff for Packet Processing with Multiple Resources

On the Fairness-Efficiency Tradeoff for Packet Processing with Multiple Resources On the Farness-Effcency Tradeoff for Packet Processng wth Multple Resources We Wang, Chen Feng, Baochun L, and Ben Lang Department of Electrcal and Computer Engneerng, Unversty of Toronto {wewang, cfeng,

More information

ARTICLE IN PRESS. Signal Processing: Image Communication

ARTICLE IN PRESS. Signal Processing: Image Communication Sgnal Processng: Image Communcaton 23 (2008) 754 768 Contents lsts avalable at ScenceDrect Sgnal Processng: Image Communcaton journal homepage: www.elsever.com/locate/mage Dstrbuted meda rate allocaton

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

A Sub-Critical Deficit Round-Robin Scheduler

A Sub-Critical Deficit Round-Robin Scheduler A Sub-Crtcal Defct ound-obn Scheduler Anton Kos, Sašo Tomažč Unversty of Ljubljana, Faculty of Electrcal Engneerng, Ljubljana, Slovena E-mal: anton.kos@fe.un-lj.s Abstract - A scheduler s an essental element

More information

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS CACHE MEMORY DESIGN FOR INTERNET PROCESSORS WE EVALUATE A SERIES OF THREE PROGRESSIVELY MORE AGGRESSIVE ROUTING-TABLE CACHE DESIGNS AND DEMONSTRATE THAT THE INCORPORATION OF HARDWARE CACHES INTO INTERNET

More information

An Efficient Garbage Collection for Flash Memory-Based Virtual Memory Systems

An Efficient Garbage Collection for Flash Memory-Based Virtual Memory Systems S. J and D. Shn: An Effcent Garbage Collecton for Flash Memory-Based Vrtual Memory Systems 2355 An Effcent Garbage Collecton for Flash Memory-Based Vrtual Memory Systems Seunggu J and Dongkun Shn, Member,

More information

Response-Time Guarantees in ATM Networks

Response-Time Guarantees in ATM Networks Response-Tme Guarantees n ATM Networks Andreas Ermedahl Hans Hansson Mkael Sjödn Department of Computer Systems Uppsala Unversty Sweden E-mal: febbe,hansh,mcg@docs.uu.se Abstract We present a method for

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

A Genetic Algorithm Based Dynamic Load Balancing Scheme for Heterogeneous Distributed Systems

A Genetic Algorithm Based Dynamic Load Balancing Scheme for Heterogeneous Distributed Systems Proceedngs of the Internatonal Conference on Parallel and Dstrbuted Processng Technques and Applcatons, PDPTA 2008, Las Vegas, Nevada, USA, July 14-17, 2008, 2 Volumes. CSREA Press 2008, ISBN 1-60132-084-1

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations*

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations* Confguraton Management n Mult-Context Reconfgurable Systems for Smultaneous Performance and Power Optmzatons* Rafael Maestre, Mlagros Fernandez Departamento de Arqutectura de Computadores y Automátca Unversdad

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Hybrid Job Scheduling Mechanism Using a Backfill-based Multi-queue Strategy in Distributed Grid Computing

Hybrid Job Scheduling Mechanism Using a Backfill-based Multi-queue Strategy in Distributed Grid Computing IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.12 No.9, September 2012 39 Hybrd Job Schedulng Mechansm Usng a Backfll-based Mult-queue Strategy n Dstrbuted Grd Computng Ken Park

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to: 4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://e-tellgentnternetmarketng.com/webste/frustrated-computer-user-2/

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Response-Time Analysis for Single Core Equivalence Framework

Response-Time Analysis for Single Core Equivalence Framework 1 Response-Tme Analyss for Sngle Core Equvalence Framework Renato Mancuso, Rodolfo Pellzzon, Marco Caccamo, Lu Sha, Heechul Yun Unversty of Illnos at Urbana-Champagn, USA, {rmancus2, mcaccamo, lrs}@llnos.edu

More information

Advanced Computer Networks

Advanced Computer Networks Char of Network Archtectures and Servces Department of Informatcs Techncal Unversty of Munch Note: Durng the attendance check a stcker contanng a unque QR code wll be put on ths exam. Ths QR code contans

More information

Analysis of Collaborative Distributed Admission Control in x Networks

Analysis of Collaborative Distributed Admission Control in x Networks 1 Analyss of Collaboratve Dstrbuted Admsson Control n 82.11x Networks Thnh Nguyen, Member, IEEE, Ken Nguyen, Member, IEEE, Lnha He, Member, IEEE, Abstract Wth the recent surge of wreless home networks,

More information

Pricing Network Resources for Adaptive Applications in a Differentiated Services Network

Pricing Network Resources for Adaptive Applications in a Differentiated Services Network IEEE INFOCOM Prcng Network Resources for Adaptve Applcatons n a Dfferentated Servces Network Xn Wang and Hennng Schulzrnne Columba Unversty Emal: {xnwang, schulzrnne}@cs.columba.edu Abstract The Dfferentated

More information

Self-Tuning, Bandwidth-Aware Monitoring for Dynamic Data Streams

Self-Tuning, Bandwidth-Aware Monitoring for Dynamic Data Streams Self-Tunng, Bandwdth-Aware Montorng for Dynamc Data Streams Navendu Jan, Praveen Yalagandula, Mke Dahln, Yn Zhang Mcrosoft Research HP Labs The Unversty of Texas at Austn Abstract We present, a self-tunng,

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information