A Generic and Compositional Framework for Multicore Response Time Analysis


Sebastian Altmeyer (University of Luxembourg / University of Amsterdam), Claire Maiza (Grenoble INP, Verimag), Robert I. Davis (University of York / INRIA, Paris-Rocquencourt), Vincent Nelis (CISTER, ISEP, Porto), Leandro Indrusiak (University of York), Jan Reineke (Saarland University)

ABSTRACT

In this paper, we introduce a Multicore Response Time Analysis (MRTA) framework. This framework is extensible to different multicore architectures, with various types and arrangements of local memory, and different arbitration policies for the common interconnects. We instantiate the framework for single-level local data and instruction memories (cache or scratchpads), for a variety of memory bus arbitration policies, including Round-Robin, FIFO, Fixed-Priority, Processor-Priority, and TDMA, and account for DRAM refreshes. The MRTA framework provides a general approach to timing verification for multicore systems that is parametric in the hardware configuration and so can be used at the architectural design stage to compare the guaranteed levels of performance that can be obtained with different hardware configurations. The MRTA framework decouples response time analysis from a reliance on context-independent WCET values. Instead, the analysis formulates response times directly from the demands on different hardware resources.

1. INTRODUCTION

Effective analysis of the worst-case timing behaviour of systems built on multicore architectures is essential if these high performance platforms are to be deployed in critical real-time embedded systems used in the automotive and aerospace industries. We identify four different approaches to solving the problem of determining timing correctness.

With single core systems, a traditional two-step approach is typically used. This consists of timing analysis, which determines the context-independent worst-case execution time (WCET) of each task, followed by schedulability analysis, which uses task WCETs and information about the processor scheduling policy to determine if each task can be guaranteed to meet its deadline. When local memory (e.g. cache) is present, this approach can be augmented by analysis of Cache Related Pre-emption Delays (CRPD) [3], or by partitioning the cache to avoid CRPD altogether. Both approaches are effective and result in tight upper bounds on task response times [4].

With a multicore system, the situation is more complex since WCETs are strongly dependent on the amount of cross-core interference on shared hardware resources such as main memory, L2-caches, and common interconnects, due to tasks running on other cores. The uncertainty and variability in this cross-core interference renders the traditional two-step process ineffective for many multicore processors. For example, on the Freescale P4080, the latency of a read operation varies from 40 to 600 cycles depending on the total number of cores running and the number of competing tasks [35]. Similarly, a 14 times slowdown has been reported [38] due to interference on the L2-cache for tasks running on Intel Core 2 Quad processors.

At the other extreme is a fully integrated approach. This involves considering the precise interleaving of instructions originating from different cores [22]; however, such an approach suffers from potentially insurmountable problems of combinatorial complexity, due to the proliferation of different path combinations, as well as different release times and schedules.

An alternative approach is based on temporal isolation [15]. The idea here is to statically partition the use of shared resources, e.g. space partitioning of cache and DRAM banks, and time partitioning of bus access, so that context-independent WCET values can be used and the traditional two-step process applied. This approach raises a further challenge: how to partition the resources to obtain schedulability [39]. Techniques which seek to limit the worst-case cross-core interference, for example by using TDMA arbitration on the memory bus or by limiting the amount of contention by suspending execution on certain cores [35], can have a significant detrimental effect on performance, effectively negating the performance benefits of using a multicore system altogether. We note that TDMA is rarely, if ever, used as a bus-arbitration policy in real multicore processors, since it is not work-conserving and so wastes significant bandwidth. This impacts both worst-case and average-case performance; the latter is essential for application areas such as telecommunications, which have a major influence on processor design.

The final approach is the one presented in this paper, based on explicit interference modelling. We explore the premise that, due to the strong interdependencies between timing analysis and schedulability analysis on multicore systems, they need to be considered together. In our approach, we omit the notion of WCET per se and instead directly target the calculation of task response times. In this work, we use execution traces to model the behaviour of tasks. Traces provide a simple yet expressive way to model task behaviour. Note that relying on execution traces does not pose a fundamental limitation to our approach, as all required performance quantities can also be derived using static analysis [31, 20, 1] as within traditional context-independent timing analysis; however, traces enable a near-trivial static cache analysis and so allow us to focus on response time analysis. The main performance metrics are the processor demand and the memory demand of each task. The latter quantity feeds into analysis of the arbitration policy used by the common interconnect, enabling us to upper bound the total memory access delays which may occur during the response time of the task. By computing the overall processor demand and memory demand over a relatively long interval of time (i.e. the task response time), as opposed to summing the worst case over many short intervals (e.g. individual memory accesses), we are able to obtain much tighter response time bounds.

The Multicore Response Time Analysis framework (MRTA) that we present is extensible to different types and arrangements of local memory, and different arbitration policies for the common interconnect. In this paper, we instantiate the MRTA framework assuming the local memories used for instructions and data are single-level and either cache, scratchpad, or not present. Further, we assume that the memory bus arbitration policy may be TDMA, FIFO, Round-Robin, Fixed-Priority (based on task priorities), or Processor-Priority. We also account for the effects of DRAM refresh [5, 11]. The general approach embodied in the MRTA framework is extensible to more complex, multi-level memory hierarchies, and other sources of interference. It provides a general timing verification framework that is parametric in the hardware configuration (common interconnect, local memories, number of cores etc.) and so can be used at the architectural design stage to compare the guaranteed levels of performance that can be obtained with different hardware configurations, and also during the development and integration stages to verify the timing behaviour of a specific system. While the specific hardware models and their mathematical representations used in this paper cannot capture all of the interference and complexity of actual hardware, they serve as a valid starting point. They include the dominant sources of interference and represent current architectures reasonably well.

The rest of the paper is organised as follows. Section 2 discusses the related work. Section 3 describes the system model and notation used. Sections 4 and 5 show how the effects of a local memory and the common interconnect can be modelled. Section 6 presents the core of our framework, interference-aware Multicore Response Time Analysis (MRTA). This analysis integrates processor and memory demands, accounting for cross-core interference. Extensions to the presented analysis are discussed in Section 7. Section 8 describes the results of an experimental evaluation using the MRTA framework, and Section 9 concludes with a summary and perspectives on future work.

2. RELATED WORK

In 2007, Rosen et al. [40] proposed an implementation in which TDMA slots on the bus are statically allocated to cores. This technique relies on the availability of a user-programmable table-driven bus arbiter, which is typically not available in real hardware, and on knowledge, at design time, of the characteristics of the entire workload that executes on each core. Chattopadhyay et al. [17] and Kelter et al. [25] proposed an analysis which takes into account a shared bus and instruction cache, assuming separate buses and memories for both code and data (uncommon in real hardware) and TDMA bus arbitration. The method has limited applicability as it does not address data accesses to memory. In 2010, Schranzhofer et al. [42] developed a framework for analysing the worst-case response time of real-time tasks on a multicore with TDMA arbitration. This was followed by work on resource-adaptive arbiters [43]. They proposed a task model in which tasks consist of sequences of super-blocks, themselves divided into phases that represent implicit communication (fetching or writing of data from/to memory), computation (processing the data), or both. Contrary to the technique presented here, their approach requires major program intervention and compiler assistance to prefetch data. Also in 2010, Lv et al. [33] proposed a method to model request patterns and the memory bus using timed automata. Their method handles instruction accesses only and may suffer from state-space explosion when applied to data accesses. A method employing timed automata was proposed by Gustavsson et al. [22] in which the WCET is obtained by proving special predicates through model checking. This approach allows for a detailed system modelling but is also prone to the state-space explosion problem. In 2014, Kelter et al. [26] analysed the maximum bus arbitration delays for multiprocessor systems sharing a TDMA bus and using both (private) L1 and (shared) L2 instruction and data caches.

Pellizzoni et al. [37] compute an upper bound on the contention delay incurred by periodic tasks, for systems comprising any number of cores and peripheral buses sharing a single main memory. Their method does not cater for non-periodic tasks and does not apply to systems with shared caches. In addition, it relies on accurate profiling of cache utilization and suitable assignment of the TDMA time-slots to the tasks' super-blocks, and it imposes a restriction on where the tasks can be pre-empted. Schliecker et al. [41] proposed a method that employs a general event-based model to estimate the maximum load on a shared resource. This approach makes very few assumptions about the task model and is thus quite generally applicable. However, it only supports a single bus arbiter that is an unspecified work-conserving arbiter. Paolieri et al. [36] proposed a hardware platform that enforces a constant upper bound on the latency of each access to a shared resource. This approach enables the analysis of tasks in isolation, since the interference on other tasks can be conservatively accounted for using this bound on the latency. Similarly, the PTARM [32] enforces constant latencies for all instructions, including loads and stores. However, both cases represent customized hardware. Kim et al. [27] presented a model to upper bound the memory interference delay caused by concurrent accesses to a shared DRAM main memory. Their work differs from this paper in that they do not assume a unique shared bus to access the main memory and they primarily focus on the contention at the DRAM controller by assuming a fully partitioned private and shared cache model. (For shared caches they simply assume that the extra number of requests generated due to cache line evictions at runtime is given.) Yun et al. [46] proposed a software-based memory throttling mechanism to explicitly limit the memory request rate of each core and thereby control the memory interference. They also developed analytical solutions to compute proper throttling parameters that satisfy schedulability of critical tasks while minimising the performance impact of throttling. In 2015, Dasari et al. [18] proposed a general framework to compute the maximum interference caused by the shared memory bus and its impact on the execution time of the tasks running on the cores. The method of computation in [18] is more complex than that proposed in this paper, and may be more accurate when it estimates the delay due to the shared bus, but it does not take cache-related effects into account (by assuming partitioned caches), which makes it less generic than the framework proposed here.

Regarding shared caches, Yan and Zhang [45] addressed the problem of computing the WCET of tasks assuming direct-mapped, shared L2 instruction caches on multicores. The applicability of the approach is unfortunately limited as it makes very restrictive assumptions, such as (1) data caches are perfect, i.e. all accesses are hits, and (2) data references from different threads will not interfere with each other in the shared L2 cache. Li et al. [30] proposed a method to estimate the worst-case response time of concurrent programs running on multicores with shared L2 caches, assuming set-associative instruction caches using the LRU replacement policy. Their work was later extended [17] by adding a TDMA bus analysis technique to bound the memory access delay. Finally, one must also note that some other techniques, such as [14, 19] for instance, aim at modifying the scheduling algorithm so that its scheduling decisions reduce the impact of the CRPD.

3. SYSTEM MODEL

In this paper, we provide a theoretical framework that can be instantiated for a range of different multicore architectures with different types of memory hierarchy and different arbitration policies for the common interconnect. Our aim is to create a flexible, adaptable, and generic analysis framework wherein a large number of common multicore architecture designs can be modeled and analysed. In this paper we can inevitably only cover a limited number of types of local memory, bus, and global memory behaviour. We select common approaches to model the different hardware components and integrate them into an extensible framework.

3.1 Multicore Architectural Model

We model a generic multicore platform with $l$ timing-compositional cores $P_1, \ldots, P_l$ as depicted in Figure 1. By timing-compositional cores we mean cores where it is safe to separately account for delays from different sources, such as computation on a given core and interference on a shared bus [23]. The set of cores is defined as $\mathcal{P}$. Each core has a local memory which is connected via a shared bus to a global memory and IO interface. We assume a constant delay $d_{main}$ to retrieve data from global memory under the assumption of an immediate bus access, i.e., no wait-cycles or contention on the bus. We assume atomic bus transactions, i.e., no split transactions, which furthermore are not re-ordered, and non-preemptable busy waiting on the processor for requests to be serviced. Further, we assume that bus access may be given to cores for one access at a time. The types of the memories and the bus policy are parameters that can be instantiated to model different multicore systems.

[Figure 1: Multicore Platform. A set of $l$ processors with local memories connected via a common bus to a global memory.]

In this paper, we omit consideration of delays due to cache coherence and synchronization, and we assume write-through caches only. Write-back caches are discussed in Section 7.5.

3.2 Task Model

We assume a set of $n$ sporadic tasks $\{\tau_1, \ldots, \tau_n\}$, where each task $\tau_i$ has a minimum period or inter-arrival time $T_i$ and a deadline $D_i$. Deadlines are assumed to be constrained, hence $D_i \le T_i$. We assume that the tasks are statically partitioned to the set of $l$ identical cores $\{P_1, \ldots, P_l\}$, and scheduled on each processor using fixed-priority pre-emptive scheduling. The set of tasks assigned to core $P_x$ is denoted by $\Gamma_x$. The index of each task is unique and thus provides a global priority order, with $\tau_1$ having the highest priority and $\tau_n$ the lowest. The global priority of each task translates to a local priority order on each core which is used for scheduling purposes. We use $hp(i)$ ($lp(i)$) to denote the set of tasks with higher (lower) priority than that of task $\tau_i$, and we use $hep(i)$ ($lep(i)$) to denote the set of tasks with higher or equal (lower or equal) priority to task $\tau_i$. We initially assume that the tasks are independent, in so far as they do not share mutually exclusive software resources (discussed in Section 7); nevertheless, the tasks compete for hardware resources such as the processor, local memory, and the memory bus.

The execution of task $\tau_i$ is modelled using a set of traces $O_i$, where each trace $o = [\iota_1, \ldots, \iota_k]$ is an ordered list of instructions. For ease of notation, we treat the ordered list of instructions as a multi-set whenever we can abstract away from the specific order.
We distinguish three types of instruction $t$:

$$t = \begin{cases} r[m^{da}] & \text{read data from memory block } m^{da} \\ w[m^{da}] & \text{write data to memory block } m^{da} \\ e & \text{execute} \end{cases} \quad (1)$$

An instruction $\iota$ is a triple consisting of the instruction's memory address $m^{in}$, its execution time $\Delta$ without memory delays, i.e., assuming a perfect local memory, and the instruction type $t$:

$$\iota = (m^{in}, \Delta, t) \quad (2)$$

The set of memory blocks is defined as $M$. $M^{in}$ denotes the instruction memory blocks and $M^{da}$ the data memory blocks. We assume that data memory and instruction memory are disjoint, i.e., $M^{in} \cap M^{da} = \emptyset$.

The use of traces to model a task's behaviour is unusual, as the number of traces is exponential in the number of control-flow branches. Despite this obvious drawback, traces provide a simple yet expressive way to model task behaviour. They enable a near-trivial static cache analysis and a simple multicore simulation to evaluate the accuracy of the timing verification framework. Most importantly, however, traces show that the worst-case execution behaviour of a task $\tau_i$ on a multicore system is not uniquely defined. From the viewpoint of a task scheduled on the same core, $\tau_i$ may have the highest impact when it uses the core for the longest possible time interval, whereas the impact on tasks scheduled on any other core may be maximized when $\tau_i$ produces the largest number of bus accesses. These two cases may well correspond to different execution traces. As a remedy for the exponential number of traces, the complexity can be reduced by (i) computing a synthetic worst-case trace or (ii) deriving the set of Pareto optimal traces that maximize the task's impact according to a pre-defined cost function (see [31]). We can also completely resort to static analysis to derive upper bounds on the performance metrics. Static analyses provide independent upper bounds on the different performance quantities. This strongly reduces the computational complexity, but may lead to pessimism. An evaluation of this trade-off is future work.
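To make the model concrete, the following sketch shows one possible encoding of instructions and traces in Python; the names (Instruction, Trace, processor_demand) are illustrative and not part of the paper, and the execution-time field corresponds to $\Delta$ in Equation (2).

```python
from dataclasses import dataclass
from typing import List, Literal, Optional

# Instruction types as in Equation (1): read, write, or pure execute.
AccessType = Literal["r", "w", "e"]

@dataclass(frozen=True)
class Instruction:
    m_in: int                    # instruction memory block m^in
    delta: int                   # execution time assuming a perfect local memory
    kind: AccessType             # "r"/"w" for data accesses, "e" otherwise
    m_da: Optional[int] = None   # data memory block m^da, None for "e"

# A trace o is an ordered list of instructions; a task has a set of traces O_i.
Trace = List[Instruction]

def processor_demand(traces: List[Trace]) -> int:
    # Equation (17): maximum over the traces of the summed execution times.
    return max(sum(ins.delta for ins in o) for o in traces)
```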

4. MEMORY MODELLING

In this section we show how the effects of a local memory can be modelled via a MEM function which describes the number of accesses due to a task which are passed to the next level of the memory hierarchy, in this case main memory. The MEM function is instantiated for both caches and scratchpads. We model the effect of a (local) memory using a function of the form:

$$\mathrm{MEM}\colon O \rightarrow \mathbb{N} \times 2^{2^{\mathbb{N}}} \times 2^{\mathbb{N}} \quad (3)$$

where $\mathrm{MEM}(o) = (MD_o, UCB_o, ECB_o)$ computes, for a trace $o$: the number of bus accesses, i.e., the number of memory accesses which cannot be served by the local memory alone (denoted as the memory demand $MD$); $UCB_o$, a multiset containing, for each program point in trace $o$, the set of Useful Cache Blocks (UCBs) [28] which may need to be reloaded when trace $o$ is pre-empted at that program point; and the set of Evicting Cache Blocks (ECBs), which is the set of all cache blocks accessed by trace $o$ and which may evict memory blocks of other tasks.

The value $MD$ does not just cover cache misses, but also has to account for write accesses. In the case of write-through caches, each write access will cause a bus access, irrespective of whether or not the memory block is present in the cache. The number of bus accesses $MD$ assumes non-preemptive execution. With pre-emptive execution and caches, more than $MD$ memory accesses can contribute to the bus contention due to cache eviction. In this paper, we make use of the CRPD analysis for fixed-priority pre-emptive scheduling introduced by Altmeyer et al. [3].

We now derive instantiations of the function $\mathrm{MEM}(o)$ for a trace $o = [\iota_1, \ldots, \iota_k]$ for instruction memories and data memories for systems (i) without cache, (ii) with scratchpads, and (iii) with direct-mapped or LRU caches. In the following, the superscripts indicate data ($da$) or instruction ($in$) memory, and the subscripts the type of memory, i.e., uncached ($nc$), scratchpad ($sp$), or cache ($ca$).

4.1 Uncached

Considering instruction memory, the number of bus accesses for a system with no cache is given by the number of instructions $k$ in the trace. The sets of UCBs and ECBs are empty, as pre-emption has no effect on the performance of the local memory, since none exists.

$$\mathrm{MEM}^{in}_{nc}(o) = (k, \emptyset, \emptyset) \quad (4)$$

Considering data memory, we have to account for the number of data accesses, irrespective of whether they are reads or writes. The number of accesses is thus equal to the number of data access instructions.

$$\mathrm{MEM}^{da}_{nc}(o) = \left( \left| \left\{ \iota \mid \iota \in o \wedge \iota = (\_, \_, r/w[m^{da}]) \right\} \right|, \emptyset, \emptyset \right) \quad (5)$$

4.2 Scratchpads

A scratchpad memory is defined using a function $\mathrm{SPM}\colon M \rightarrow \{true, false\}$, which returns true for memory blocks that are stored in the scratchpad. For ease of presentation, we assume a static, write-through scratchpad configuration which does not change at runtime. An extension to dynamic scratchpads and the write-back policy is straightforward, but beyond the scope of this paper. Each memory access to a memory block which is not stored in the scratchpad causes an additional bus access.

$$\mathrm{MEM}^{in}_{sp}(o) = \left( \left| \left\{ m^{in} \mid (m^{in}, \_, \_) \in o \wedge \neg\mathrm{SPM}(m^{in}) \right\} \right|, \emptyset, \emptyset \right) \quad (6)$$

Further, in the case of write accesses, even if a memory block is stored in the scratchpad, the access also contributes to the bus contention, as we assume a write-through policy.

$$\mathrm{MEM}^{da}_{sp}(o) = \left( \left| \left\{ m^{da} \mid \left( (\_, \_, r[m^{da}]) \in o \wedge \neg\mathrm{SPM}(m^{da}) \right) \vee (\_, \_, w[m^{da}]) \in o \right\} \right|, \emptyset, \emptyset \right) \quad (7)$$

The sets of UCBs and ECBs are empty, as no pre-emption overhead is assumed with static scratchpad memory. Dynamic scratchpad management is discussed in Section 7.2.

4.3 Caches

We assume a function $\mathrm{Hit}\colon M \times I \rightarrow \{true, false\}$, which classifies each memory access at each instruction as a cache hit or a cache miss. This function can be derived using cache simulation of the access trace starting with an empty cache, or by using traditional cache analysis [20], where each unclassified memory access is considered a cache miss. This means that we upper bound the number of cache misses. For each possible pre-emption point $\iota$ on trace $o$, the set of UCBs is derived using the corresponding analysis described in Altmeyer's thesis [1], Chapter 5, Section 4. It is sufficient to only store the cache sets a useful memory block maps to, instead of the useful memory blocks themselves. The multiset $UCB_o$ then contains, for each program point $\iota$ in trace $o$, the set of UCBs at that program point, i.e., $UCB_o = \bigcup_{\iota \in o} UCB_\iota$. The set of ECBs is the set of all cache sets of memory blocks on trace $o$.

$$\mathrm{MEM}^{in}_{ca}(o) = \left( \left| \left\{ m^{in} \mid \iota = (m^{in}, \_, \_) \in o \wedge \neg\mathrm{Hit}(m^{in}, \iota) \right\} \right|, UCB^{in}_o, ECB^{in}_o \right) \quad (8)$$

Since we assume a write-through policy, each write access contributes to the bus contention and has to be treated accordingly.

$$\mathrm{MEM}^{da}_{ca}(o) = \left( \left| \left\{ m^{da} \mid \left( \iota = (\_, \_, r[m^{da}]) \in o \wedge \neg\mathrm{Hit}(m^{da}, \iota) \right) \vee (\_, \_, w[m^{da}]) \in o \right\} \right|, UCB^{da}_o, ECB^{da}_o \right) \quad (9)$$

4.4 Memory Combinations

To allow different combinations of local memories, for example scratchpad memory for instructions and an LRU cache for data, we define the combination of instruction memory $\mathrm{MEM}^{in}$ and data memory $\mathrm{MEM}^{da}$ as follows:

$$\mathrm{MEM}(o) = \left( MD^{in}_o + MD^{da}_o, \; UCB^{in}_o \cup UCB^{da}_o, \; ECB^{in}_o \cup ECB^{da}_o \right) \quad (10)$$

with $\mathrm{MEM}^{in}(o) = (MD^{in}_o, UCB^{in}_o, ECB^{in}_o)$ being the result for the instruction memory and $\mathrm{MEM}^{da}(o) = (MD^{da}_o, UCB^{da}_o, ECB^{da}_o)$ for the data memory.
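As an illustration, the MEM instantiations of Equations (4)-(7) and (10) for uncached and static scratchpad memories could be coded roughly as follows, building on the Trace/Instruction sketch from Section 3 (the function names are again illustrative; the cache variants of Equations (8) and (9) would additionally require the Hit classifier and the UCB/ECB sets).

```python
from typing import Callable, List, Set, Tuple

# MEM result: (memory demand MD, multiset of UCB sets, set of ECBs).
MemResult = Tuple[int, List[Set[int]], Set[int]]

def mem_instr_uncached(trace: Trace) -> MemResult:
    # Equation (4): every instruction fetch goes to the bus; no UCBs/ECBs.
    return (len(trace), [], set())

def mem_data_uncached(trace: Trace) -> MemResult:
    # Equation (5): every data read or write goes to the bus.
    return (sum(1 for ins in trace if ins.kind in ("r", "w")), [], set())

def mem_instr_spm(trace: Trace, spm: Callable[[int], bool]) -> MemResult:
    # Equation (6): only fetches of blocks not held in the scratchpad hit the bus.
    return (sum(1 for ins in trace if not spm(ins.m_in)), [], set())

def mem_data_spm(trace: Trace, spm: Callable[[int], bool]) -> MemResult:
    # Equation (7): reads that miss the SPM, plus all writes (write-through).
    md = sum(1 for ins in trace
             if (ins.kind == "r" and not spm(ins.m_da)) or ins.kind == "w")
    return (md, [], set())

def mem_combined(instr: MemResult, data: MemResult) -> MemResult:
    # Equation (10): combine instruction and data memory results.
    (md_i, ucb_i, ecb_i), (md_d, ucb_d, ecb_d) = instr, data
    return (md_i + md_d, ucb_i + ucb_d, ecb_i | ecb_d)
```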

5. BUS MODELLING

In this section we show how the memory bus delays experienced by a task can be modelled via a BUS function of the form:

$$\mathrm{BUS}\colon \mathbb{N} \times \mathcal{P} \times \mathbb{N} \rightarrow \mathbb{N} \quad (11)$$

where $\mathrm{BUS}(i, x, t)$ determines an upper bound on the number of bus accesses that can delay task $\tau_i$ on processor $P_x$ during a time interval of length $t$. This abstraction covers a variety of bus arbitration policies, including Round-Robin, FIFO, Fixed-Priority, and Processor-Priority, all of which are work-conserving, and also TDMA, which is not work-conserving. We now introduce the mathematical representations of the delays incurred under these arbitration policies. We note that the framework is extensible to a wide variety of different policies. The only constraint we place on instantiations of the $\mathrm{BUS}(i, x, t)$ function is that they are monotonically non-decreasing in $t$.

Let $\tau_i$ be the task of interest, and $x$ the index of the processor $P_x$ on which it executes. Other task indices are represented by $j$, $k$ etc., while $y$, $z$ are used for processor indices. Let $S_i^x(t)$ denote an upper bound on the total number of bus accesses due to $\tau_i$ and all higher priority tasks that run on processor $P_x$ during an interval of length $t$. Let $A_j^y(t)$ be an upper bound on the total number of bus accesses due to all tasks of priority $j$ or higher executing on some processor $P_y \ne P_x$ during an interval of length $t$. (Note, $j$ may not necessarily be the priority of a task allocated to processor $P_y$.) As memory bus requests are typically non-preemptive, one lower priority memory request may block a higher priority one, since the global, shared memory may have just received a lower priority request before the higher priority one arrives. (Here we mean priorities on the bus, which are not necessarily the same as task priorities.) To account for these blocking accesses, we use $L_j^y(t)$, which denotes an upper bound on the total number of bus accesses due to all tasks of priority lower than $j$ executing on some other processor $P_y \ne P_x$ during an interval of length $t$. In Section 6 we show how the values of $S_i^x(t)$, $A_j^y(t)$ and $L_j^y(t)$ are computed and explain why $S_i^x(t)$ and $A_j^y(t)$ are subtly different and hence require distinct notation.

In the following equations for the $\mathrm{BUS}(i, x, t)$ function, we account for blocking due to one non-preemptive access from lower priority tasks running on the same core $P_x$ as task $\tau_i$ (i.e. the $+1$ in the equations). This holds because such blocking can only occur at the start of the priority level-$i$ (processor) busy period.

For a Fixed-Priority bus with memory accesses inheriting the priority of the task that generates them, we have:

$$\mathrm{BUS}(i, x, t) = S_i^x(t) + \sum_{y \ne x} A_i^y(t) + \min\left( S_i^x(t), \sum_{y \ne x} L_i^y(t) \right) + 1 \quad (12)$$

The term $\min\left( S_i^x(t), \sum_{y \ne x} L_i^y(t) \right)$ upper bounds the blocking due to tasks of lower priority than $\tau_i$ running on other cores.

For a Processor-Priority bus with memory accesses inheriting the priority of the core rather than the task, we have:

$$\mathrm{BUS}(i, x, t) = S_i^x(t) + \sum_{y \in HP(x)} A_n^y(t) + \min\left( S_i^x(t), \sum_{y \in LP(x)} A_n^y(t) \right) + 1 \quad (13)$$

where $HP(x)$ ($LP(x)$) is the set of processors with higher (lower) priority than that of $P_x$, and $n$ is the index of the task with the lowest priority. The term $A_n^y(t)$ thus captures the interference of all tasks running on processor $P_y$, independent of their priority, and the term $\min\left( S_i^x(t), \sum_{y \in LP(x)} A_n^y(t) \right)$ upper bounds the blocking due to tasks running on processors with priority lower than that of $P_x$.

For a FIFO bus, we assume that all accesses generated on the other processors may be serviced ahead of the last access of $\tau_i$, hence we have:

$$\mathrm{BUS}(i, x, t) = S_i^x(t) + \sum_{y \ne x} A_n^y(t) + 1 \quad (14)$$

Note that accesses from other cores do not contribute blocking, since we already pessimistically account for all of these accesses in the summation term.

For a Round-Robin bus with a cycle consisting of an equal number of slots $v$ per processor, we have:

$$\mathrm{BUS}(i, x, t) = S_i^x(t) + \sum_{y \ne x} \min\left( A_n^y(t), \; v \cdot S_i^x(t) \right) + 1 \quad (15)$$

The worst-case situation occurs when each access in $S_i^x(t)$ is delayed by each core $P_y \ne P_x$ for $v$ slots. Interference by core $P_y$ is limited to the number of accesses from core $P_y$. Again, as we already account for all accesses from all other cores, there is no separate contribution to blocking. Note that unlike TDMA, Round-Robin moves to the next slot immediately if a processor has no access pending.

For a TDMA bus with $v$ adjacent slots per core in a cycle of length $l \cdot v$, we have:

$$\mathrm{BUS}(i, x, t) = S_i^x(t) + ((l-1) \cdot v) \cdot S_i^x(t) + 1 \quad (16)$$

Since TDMA is not work-conserving, the worst case corresponds to each access in $S_i^x(t)$ just missing a slot for processor $P_x$ and hence having to wait at most $((l-1) \cdot v + 1)$ slots to be serviced. Effectively, there is additional interference from the $(l-1) \cdot v$ slots reserved for other processors on each access, irrespective of whether these slots are used or not. As all accesses due to higher priority tasks on $P_x$ may be serviced prior to the last access of task $\tau_i$, we require $S_i^x(t)$ accesses in total to be serviced for $P_x$. Note that when $v = 1$, Equation (16) simplifies to $\mathrm{BUS}(i, x, t) = l \cdot S_i^x(t) + 1$.

It is interesting to note that while TDMA provides more predictable behaviour, this comes at the cost of significantly worse guaranteed performance over long time intervals (e.g. the response time of a task) due to the fact that it is not work-conserving. Effectively, this means that the memory accesses of a task may suffer additional interference due to empty slots on the bus. Nevertheless, Round-Robin behaves like TDMA when all other cores create a large number of competing memory accesses. We note that the equal number of slots per core for Round-Robin and TDMA, and the grouping of slots per core, are simplifying assumptions to exemplify how TDMA and Round-Robin buses can be analysed. An analysis for more complex configurations is reserved for future work.
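A minimal sketch of how Equations (12)-(16) might be implemented, assuming the bounds $S$, $A$ and $L$ of Section 6 are supplied as callables (all names are illustrative):

```python
def bus_fixed_priority(S, A, L, others, i, t):
    # Equation (12): accesses of priority >= i from other cores interfere; lower
    # priority accesses cause at most S(t) blocking accesses, plus one local block.
    interference = sum(A[y](i, t) for y in others)
    blocking = min(S(t), sum(L[y](i, t) for y in others))
    return S(t) + interference + blocking + 1

def bus_processor_priority(S, A, hp_cores, lp_cores, n, t):
    # Equation (13): accesses inherit the priority of the core; n is the lowest
    # task priority, so A[y](n, t) counts all accesses generated by core y.
    interference = sum(A[y](n, t) for y in hp_cores)
    blocking = min(S(t), sum(A[y](n, t) for y in lp_cores))
    return S(t) + interference + blocking + 1

def bus_fifo(S, A, others, n, t):
    # Equation (14): every access from every other core may be served first.
    return S(t) + sum(A[y](n, t) for y in others) + 1

def bus_round_robin(S, A, others, n, v, t):
    # Equation (15): each own access waits at most v slots per competing core,
    # but a core cannot interfere more often than it actually requests the bus.
    return S(t) + sum(min(A[y](n, t), v * S(t)) for y in others) + 1

def bus_tdma(S, num_cores, v, t):
    # Equation (16): the (l-1)*v slots of the other cores delay every own access,
    # whether those slots are used or not (TDMA is not work-conserving).
    return S(t) + (num_cores - 1) * v * S(t) + 1
```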
6. RESPONSE TIME ANALYSIS

In this section, we present the centre point of our timing verification framework: interference-aware Multicore Response Time Analysis (MRTA). This analysis integrates the processor and memory demands of the task of interest and the higher priority tasks running on the same processor, including CRPD. It also accounts for the cross-core interference on the memory bus due to tasks running on the other processors. A task set is deemed schedulable if, for each task $\tau_i$, the response time $R_i$ is less than or equal to its deadline $D_i$:

$$\forall i\colon R_i \le D_i \implies \text{schedulable}$$

The traditional response time calculation [6, 24] for fixed-priority pre-emptive scheduling on a uniprocessor is based on an upper bound on the WCET of each task $\tau_i$, denoted by $C_i$. By contrast, our MRTA framework dissects the individual components (processor and memory demands) that contribute to the WCET bound and re-assembles them at the level of the worst-case response time. It thus avoids the over-approximation inherent in using context-independent WCET bounds. In the following, we assume that $\tau_i$ is the task of interest whose schedulability we are checking, and $P_x$ is the processor on which it runs. Recall that there is a unique global ordering of task priorities even though the scheduling is partitioned, with a fixed-priority pre-emptive scheduler on each processor.

6.1 Interference on the Core

We compute the maximal processor demand $PD_i$ for each task $\tau_i$ as follows:

$$PD_i = \max_{o \in O_i} \sum_{(\_, \Delta, \_) \in o} \Delta \quad (17)$$

where $\Delta$ is the execution time of an instruction without memory delays. Task $\tau_i$ suffers interference $I^{PROC}(i, x, t)$ on its core $P_x$ due to tasks of higher priority running on the same core within a time interval of length $t$ starting from the critical instant:

$$I^{PROC}(i, x, t) = \sum_{j \in \Gamma_x \wedge j \in hp(i)} \left\lceil \frac{t}{T_j} \right\rceil PD_j \quad (18)$$
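For illustration, Equation (18) is the standard ceiling-based interference term; a direct transcription might look as follows (names are illustrative).

```python
from math import ceil

def i_proc(i, t, pd, period, hp_same_core):
    # Equation (18): processor demand of higher-priority tasks on the same core
    # released within a window of length t starting at the critical instant.
    return sum(ceil(t / period[j]) * pd[j] for j in hp_same_core(i))
```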

6.2 Interference on the Local Memory

Local memory improves a task's execution time by reducing the number of accesses to main memory. The memory demand of a trace gives the number of accesses that go to main memory, and hence the bus, despite the presence of the local memory. The maximal memory demand $MD_i$ of a task $\tau_i$ is defined by the maximum number of bus accesses of any of its traces:

$$MD_i = \max_{o \in O_i} \left\{ MD \mid \mathrm{MEM}(o) = (MD, \_, \_) \right\} \quad (19)$$

Note that the maximal memory demand refers to the demand of the combined instruction and data memory as defined in Equation (10). The memory demand $MD_i$ is derived assuming non-preemptive execution, i.e. that the task runs to completion without interference on the local memory. The sets of UCBs and ECBs are used to compute the additional overhead due to pre-emption. In the computation of this overhead, we use the sets of UCBs per trace $o$ to preserve precision,

$$UCB_o = UCB \text{ with } \mathrm{MEM}(o) = (\_, UCB, \_) \quad (20)$$

and derive the maximal set of ECBs per task $\tau_i$ as the union of the ECBs on all traces:

$$ECB_i = \bigcup_{o \in O_i} \left\{ ECB \mid \mathrm{MEM}(o) = (\_, \_, ECB) \right\} \quad (21)$$

We use $\gamma_{i,j,x}$ (with $j \in hp(i)$) to denote the overhead (additional accesses) due to a pre-emption of task $\tau_i$ by task $\tau_j$ on core $P_x$. We use the ECB-Union [2] approach as an exemplar of CRPD analysis, as it provides a reasonably precise bound on the pre-emption overhead with low complexity. Other techniques [3, 29] could also be integrated into this framework, but we omit the explanation due to space constraints. The ECB-Union approach considers the UCBs of the pre-empted task per pre-emption point and assumes that the pre-empting task $\tau_j$ has itself already been pre-empted by all tasks with higher priority on the same processor $P_x$. This nested pre-emption of the pre-empting task is represented by the union of the ECBs of all tasks with higher or equal priority than task $\tau_j$ (see [3] for a detailed description).

$$\gamma_{i,j,x} = \max_{k \in hep(i) \cap lp(j) \cap \Gamma_x} \; \max_{o \in O_k} \; \max_{UCB_\iota \in UCB_o} \left| UCB_\iota \cap \bigcup_{h \in hep(j) \cap \Gamma_x} ECB_h \right| \quad (22)$$

6.3 Interference on the Bus

We now compute the number of accesses that compete for the bus during a time interval of length $t$, equating to the worst-case response time of the task of interest $\tau_i$. We use $S_i^x(t)$ to denote an upper bound on the total number of bus accesses that can occur due to tasks running on processor $P_x$ during that time. Since lower priority tasks cannot execute on $P_x$ during the response time of task $\tau_i$ (a priority level-$i$ processor busy period), the only contribution from those tasks is a single blocking access, as discussed in Section 5. The maximum delay is computed assuming task $\tau_i$ is released simultaneously with all higher priority tasks that run on $P_x$, subsequent releases of those tasks occur as soon as possible, and the maximum possible number of pre-emptions occurs.

$$S_i^x(t) = \sum_{k \in \Gamma_x \wedge k \in hep(i)} \left\lceil \frac{t}{T_k} \right\rceil \left( MD_k + \gamma_{i,k,x} \right) \quad (23)$$

$MD_k$ denotes the memory demand of task $\tau_k$ and $\gamma_{i,k,x}$ accounts for the pre-emption costs on core $P_x$ due to jobs of task $\tau_k$.
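A sketch of Equations (22) and (23), i.e. the ECB-Union pre-emption cost and the per-core access bound (illustrative names; ucb_per_trace[k] is assumed to hold, per trace of $\tau_k$, the list of UCB sets per program point):

```python
from math import ceil

def gamma_ecb_union(i, j, core_tasks, hep, lp, ucb_per_trace, ecb):
    # Equation (22): worst-case number of additional accesses caused by one
    # pre-emption by task tau_j of any task with priority between j and i.
    ecb_union = set().union(*(ecb[h] for h in core_tasks if h in hep(j)))
    cost = 0
    for k in (k for k in core_tasks if k in hep(i) and k in lp(j)):
        for trace_ucbs in ucb_per_trace[k]:       # one entry per trace o in O_k
            for ucbs_at_point in trace_ucbs:      # one entry per program point
                cost = max(cost, len(ucbs_at_point & ecb_union))
    return cost

def s_x(i, t, core_tasks, hep, md, period, gamma):
    # Equation (23): accesses generated on tau_i's own core P_x within t,
    # including the pre-emption overhead gamma(i, k) per job of tau_k.
    return sum(ceil(t / period[k]) * (md[k] + gamma(i, k))
               for k in core_tasks if k in hep(i))
```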
We use $A_j^y(t)$ to denote an upper bound on the total number of bus accesses due to all tasks of priority $j$ or higher executing on processor $P_y \ne P_x$ during an interval of length $t$. A special case is $A_n^y(t)$: since $\tau_n$ is the lowest priority task, this term includes accesses due to all tasks running on processor $P_y$. In contrast to the derivation of $S_i^x(t)$, for $A_n^y(t)$ we can make no assumptions about the synchronisation or otherwise of tasks on processor $P_y$ with respect to the release of task $\tau_i$ on processor $P_x$. The value of $A_j^y(t)$ is therefore obtained by assuming, for each task, that the first job executes as late as possible, i.e. just prior to its worst-case response time, while the next and subsequent jobs execute as early as possible. We assume that the first interfering job of a task $\tau_k$ has all of its memory accesses as late as possible during its execution, while for subsequent jobs the opposite is true, with execution and memory accesses occurring as early as possible after the release of the job. This treatment is similar to the concept of carry-in interference used in the analysis of global multiprocessor fixed-priority scheduling [10], and is illustrated in Figure 2.

[Figure 2: Illustration of the carry-in interference analysis.]

The number of complete jobs of task $\tau_k$ contributing accesses in an interval of length $t$ on processor $P_y$ is given by:

$$N_{j,k}^y(t) = \left\lfloor \frac{t + R_k - (MD_k + \gamma_{j,k,y}) \cdot d_{main}}{T_k} \right\rfloor \quad (24)$$

Note that the term $(MD_k + \gamma_{j,k,y}) \cdot d_{main}$ represents the time for the memory accesses. Hence the total number of accesses possible in an interval of length $t$ due to task $\tau_k$ and its cache related pre-emption effects is given by:

$$W_{j,k}^y(t) = N_{j,k}^y(t) \cdot (MD_k + \gamma_{j,k,y}) + \min\left( MD_k + \gamma_{j,k,y}, \; \left\lceil \frac{t + R_k - (MD_k + \gamma_{j,k,y}) \cdot d_{main} - N_{j,k}^y(t) \cdot T_k}{d_{main}} \right\rceil \right) \quad (25)$$

Hence we have:

$$A_j^y(t) = \sum_{k \in \Gamma_y \wedge k \in hep(j)} W_{j,k}^y(t) \quad (26)$$

The value of $L_j^y(t)$ is obtained in a similar way to $A_j^y(t)$, but considering accesses with lower priority than $j$:

$$L_j^y(t) = \sum_{k \in \Gamma_y \wedge k \in lp(j)} W_{n,k}^y(t) \quad (27)$$

We note that the carry-in interference has not been accounted for in Equations (5) and (6) of [27], resulting in potentially optimistic bounds on the number of competing memory requests in [27].

The numbers of accesses on the cores are used as input to the BUS function (see Section 5), which we use to derive the maximum bus delay that task $\tau_i$ on processor $P_x$ can experience during a time interval of length $t$:

$$I^{BUS}(i, x, t) = \mathrm{BUS}(i, x, t) \cdot d_{main} \quad (28)$$

where $d_{main}$ is the bus access latency to the global memory.
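The carry-in bound of Equations (24)-(26) could be transcribed as follows (illustrative names; negative intermediate values, which can arise for very short windows, are clamped to zero in this sketch):

```python
from math import ceil, floor

def carry_in_accesses(j, t, tasks_on_y, hep, md, gamma, period, wcrt, d_main):
    # Equations (24)-(26): bus accesses of priority j or higher generated on
    # another core P_y within a window of length t, with carry-in for the first job.
    total = 0
    for k in (k for k in tasks_on_y if k in hep(j)):
        per_job = md[k] + gamma(j, k)              # accesses per job incl. CRPD
        slack = t + wcrt[k] - per_job * d_main
        n = max(0, floor(slack / period[k]))                    # Equation (24)
        carry = max(0, ceil((slack - n * period[k]) / d_main))
        total += n * per_job + min(per_job, carry)              # Equation (25)
    return total                                                # Equation (26)
```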

6.4 Global Memory

So far we have assumed a global memory with a constant access latency $d_{main}$. Global memory is usually realized as dynamic random-access memory (DRAM), which needs to be refreshed periodically. We now show how to relax the constant-latency assumption to take into account the delays imposed by refreshes. We assume a DRAM controller with a First Come First Served (FCFS) scheduling policy, so that memory accesses cannot be reordered within the controller. Further, we assume a closed-page policy to minimize the effect of the memory access history on access latencies. We consider two refresh strategies [34]: distributed refresh, where the controller refreshes each row at a different time, at regular intervals, and burst refresh, where all rows are refreshed immediately one after another. Under burst refresh, an upper bound on the maximum number of refreshes within an interval of length $t$ in which $m$ memory accesses occur is given by:

$$\mathrm{DRAM}_{burst}(t, m) = \left\lceil \frac{t}{T_{refresh}} \right\rceil \cdot \#rows \quad (29)$$

where $\#rows$ is the number of rows in the DRAM module, and $T_{refresh}$ is the interval at which each row needs to be refreshed. $T_{refresh}$ is usually 64 ms for DDR2 and DDR3 modules. Under distributed refresh, the upper bound is:

$$\mathrm{DRAM}_{dist}(t, m) = \min\left( m, \; \left\lceil \frac{t \cdot \#rows}{T_{refresh}} \right\rceil \right) \quad (30)$$

This is the case since at most one memory access can be delayed by each of the refreshes, whereas under burst refresh a single memory access can be delayed by $\#rows$ many refreshes. As the number of memory accesses within $t$ is equal to the number of bus accesses, we can bound the interference due to DRAM refreshes on task $\tau_i$ on core $P_x$ as follows:

$$I^{DRAM}(i, x, t) = \mathrm{DRAM}(t, \mathrm{BUS}(i, x, t)) \cdot d_{refresh} \quad (31)$$

where $d_{refresh}$ is the refresh latency.
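Equations (29)-(31) translate directly into code; the following sketch covers both refresh strategies (illustrative names):

```python
from math import ceil

def dram_refresh_interference(t, num_bus_accesses, n_rows, t_refresh, d_refresh,
                              distributed=True):
    # Bound on the delay added by DRAM refreshes within a window of length t
    # containing the given number of bus accesses.
    if distributed:
        refreshes = min(num_bus_accesses, ceil(t * n_rows / t_refresh))  # Eq. (30)
    else:
        refreshes = ceil(t / t_refresh) * n_rows                         # Eq. (29)
    return refreshes * d_refresh                                         # Eq. (31)
```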
6.5 Multicore Response Time Analysis

The response time $R_i$ of task $\tau_i$ is given by the smallest solution to the following recurrence relation:

$$R_i = PD_i + I^{PROC}(i, x, R_i) + I^{BUS}(i, x, R_i) + I^{DRAM}(i, x, R_i) \quad (32)$$

where $I^{PROC}(i, x, R_i)$ is the interference due to processor demand from higher priority tasks running on the same processor assuming no misses on the local memory (see Equation (18)), $I^{BUS}(i, x, R_i)$ is the delay due to bus accesses from tasks running on all cores, including $MD_i$ (see Equation (28)), and $I^{DRAM}(i, x, R_i)$ is the delay due to DRAM refreshes (see Equation (31)).

Since the response time of each task can depend on the response times of other tasks via the functions (26) and (27) describing memory accesses $A_j^y(t)$ and $L_j^y(t)$, we use an outer loop around a set of fixed-point iterations to compute the response times of all the tasks, and so deal with an apparent circular dependency. Iteration starts with $\forall i\colon R_i = PD_i + MD_i \cdot d_{main}$ and ends when all the response times have converged (i.e. no response time changes w.r.t. the previous iteration), or the response time of a task exceeds its deadline, in which case that task is unschedulable. See Algorithm 1 for pseudo-code of the response time calculation. Since the response time $R_i$ of a task $\tau_i$ is monotonically increasing w.r.t. increases in the response time of any other task, convergence or exceeding a deadline is guaranteed in a bounded number of iterations.

We note that the analysis is sustainable [8] with respect to the processor demands $PD_j$ and memory demands $MD_j$ of each task, since values that are smaller than the upper bounds used in the analysis cannot result in a larger response time. This sustainability extends to traces: if any trace of task execution results in practice in a lower processor or memory demand than that considered by the analysis, then this also cannot result in an increase in the response time. Similarly, a decrease in the set of UCBs or ECBs such that they are a subset of those considered by the analysis cannot increase the worst-case response time.

Algorithm 1: Response Time Computation

function MultiCoreRTA
    ∀i: R_i^0 = 0
    ∀i: R_i^1 = PD_i + MD_i · d_main
    l = 1
    while (∃i: R_i^l ≠ R_i^{l-1}) ∧ (∀i: R_i^l ≤ D_i) do
        for all i do
            R_i^{l,0} = R_i^{l-1}
            R_i^{l,1} = R_i^l
            k = 1
            while R_i^{l,k} ≠ R_i^{l,k-1} ∧ R_i^{l,k} ≤ D_i do
                R_i^{l,k+1} = PD_i + I^PROC(i, x, R_i^{l,k}) + I^BUS(i, x, R_i^{l,k}) + I^DRAM(i, x, R_i^{l,k})
                k = k + 1
            end while
            R_i^{l+1} = R_i^{l,k}
        end for
        l = l + 1
    end while
    if ∀i: R_i^l ≤ D_i then
        return schedulable
    else
        return not schedulable
    end if
end function

Note that the definitions of $MD_i$, $PD_i$ and $ECB_i$ completely decouple the traces from the response time analysis. This comes at the cost of possible pessimism, but strongly reduces the complexity of the analysis. Different traces may maximize different parameters, meaning that the combination of the parameters in this way may represent a synthetic worst case that cannot occur in practice. An alternative solution is to define a multicore response time analysis that is parametric in the execution traces. In the extreme, completely expanding the analysis to explore every combination of traces from different tasks would be intractable. However, as a first step in this direction, response times could be computed for each individual trace of the task of interest $\tau_i$, using combined traces for all other tasks. The maximum such response time would then provide an improved upper bound.
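A runnable sketch of the outer/inner fixed-point iteration of Algorithm 1 is given below. The interference terms are passed in as callables that also receive the current vector of response times, since $I^{BUS}$ and $I^{DRAM}$ depend on the response times of tasks on other cores via Equations (26) and (27). Names and calling conventions are illustrative.

```python
def multicore_rta(tasks, pd, md, deadline, d_main, i_proc, i_bus, i_dram):
    """i_proc/i_bus/i_dram are callables (i, t, R) -> int, where R maps each
    task to the response time computed in the previous outer iteration."""
    R_old = {i: 0 for i in tasks}                          # R^0
    R = {i: pd[i] + md[i] * d_main for i in tasks}         # R^1, initial values
    while R != R_old and all(R[i] <= deadline[i] for i in tasks):
        R_next = {}
        for i in tasks:
            r_prev, r = R_old[i], R[i]
            # Inner fixed point of Equation (32); the interference terms see the
            # response times of the other tasks from the last completed round.
            while r != r_prev and r <= deadline[i]:
                r_prev = r
                r = (pd[i] + i_proc(i, r_prev, R)
                           + i_bus(i, r_prev, R)
                           + i_dram(i, r_prev, R))
            R_next[i] = r
        R_old, R = R, R_next
    return all(R[i] <= deadline[i] for i in tasks), R
```

As in Algorithm 1, iteration stops either when all response times have converged or when some task exceeds its deadline, in which case the task set is reported unschedulable.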

7. EXTENSIONS

Above, we instantiated the Multicore Response Time Analysis (MRTA) framework for relatively simple task and multicore architectural models. In this section, we briefly discuss extensions including: RTOS and interrupts, dynamic scratchpad management, sharing software resources, open systems and incremental verification, write-back cache policies, and multi-level caches. However, the presented analysis framework is not fine-tuned to specific hardware features or execution scenarios such as burst accesses, since this would counteract its extensibility and generality.

7.1 RTOS and Interrupts

The analysis presented in the paper only considers tasks and their execution, as represented by traces. We now give a brief outline of how the MRTA framework can be extended to cover RTOS and interrupt handler behaviour. We assume that task release is triggered via interrupts from a timer/counter or other interrupt sources. When an interrupt is raised, the appropriate handler is dispatched and may pre-empt the currently executing task (or interrupt handler, if multiple interrupt priority levels are supported). When the interrupt handler returns, then if a higher priority task has been released, the scheduler will run and dispatch that task, otherwise control returns to the previously running task. When a task completes, the scheduler again runs and chooses the next highest priority task to execute. The behaviour of each interrupt handler is represented by a set of execution traces similar to those for tasks. Thus interrupt handlers can be included in the MRTA framework in a similar way to tasks, but at higher priorities. (We note that there may be some differences if all interrupts share the same interrupt priority level; however, due to restrictions on space and the wide variety of possible arrangements of interrupt priorities, we do not go into details here.) In some cases, interrupts may be prohibited from using the cache, have their own cache partition, or have their code permanently locked into a scratchpad. All of these possibilities can be covered using variants of the analysis described in the paper.

The RTOS is different from interrupt handlers and tasks in that it is not a schedulable entity in itself; rather, RTOS code is run as part of each task, typically before and after the actual task code, and interleaved with it in the form of system calls. Similarly, with interrupt handlers that release tasks, RTOS code is typically called as the handler returns. With our representation of tasks and interrupt handlers as sets of traces, execution of the RTOS can be fully accounted for by a concatenation of the appropriate sub-traces for the RTOS onto the start and end of the traces for tasks and interrupt handlers.

7.2 Dynamic Scratchpad Management

In Section 4.2, we assumed that scratchpad contents were static; however, dynamic scratchpad management schemes [44] are better able to make use of limited scratchpad memory in multi-tasking systems. In this case pre-emption costs are incurred in saving, loading, and restoring the scratchpad contents on each pre-emption. These operations may be explicit, implemented by code in the operating system, in which case the additional processing and memory demands can easily be accounted for via the sub-traces for the RTOS. Alternatively, these operations may be under the control of specialised DMA hardware [44], requiring specific modelling of the additional memory demands.

7.3 Sharing Software Resources

The analysis presented in the paper assumes that tasks are independent in the sense that they do not share software resources that must be accessed in mutual exclusion; rather, the only contention is over hardware resources. We now consider how that restriction can be lifted. We assume that tasks executing on the same processor may share software resources that are accessed in mutual exclusion according to the Stack Resource Protocol (SRP) [7]. Under SRP, a task $\tau_i$ may be blocked from executing by at most a single critical section where a task of priority lower than $i$ locks a resource shared with task $\tau_i$ or a task of higher priority. Further, under SRP, blocking only occurs before a task starts to execute, thus SRP introduces no extra context switches. We assume a set of traces $O^B_i$ for all of the critical sections that may block task $\tau_i$. In the MRTA framework, the impact of blocking needs to be considered in terms of both processor and memory demand. This can be achieved by considering the traces $O^B_i$ as belonging to a single virtual task with higher priority than $\tau_i$. Thus we obtain a contribution $PD^B_i$ to the processor demand, which is added into $I^{PROC}(i, x, t)$, and a contribution $MD^B_i$ to the memory demand, which contributes to $S_i^x(t)$.
Accounting for the CRPD effects due to blocking is more complex, and its integration into the MRTA framework is beyond the scope of this paper; the basic method is however explained in [3]. We note that blocking due to software resources accessed by tasks on other processors does not affect the term $A_n^y(t)$, since SRP introduces no additional context switches, and at the lowest priority level $n$ there are no extra tasks to include in the CRPD computation (see Section 5 of [3]). The value of $A_j^y(t)$ used in the analysis of a Fixed-Priority bus is also unchanged due to resource accesses, since we assume that the bus access priority reflects only a task's base priority, rather than any raised priority as a result of SRP.

7.4 Open Systems and Incremental Verification

The basic analysis for the MRTA framework given in the paper assumes that we have information (i.e. traces etc.) for all of the tasks in the system. There are a number of reasons why this may not be the case: (i) the system may be open, with tasks on one or more processors loadable post deployment, (ii) the system may be under development and the tasks on another processor not yet known, (iii) incremental verification may be required, so no assumption can be made about the tasks executing on another processor, (iv) the system may be mixed criticality and tasks on another processor may not be developed to the same criticality level, and hence cannot be assumed to be well behaved. Instead we must assume they may exhibit the worst possible behaviour. For a processor $P_y$ where we have no information, or need to assume the worst, we may replace $A_j^y(t)$ and $A_n^y(t)$ with a function that represents continual generation of memory accesses at the maximum possible rate. In practice, this may be equivalent to simply setting $A_j^y(t) = A_n^y(t) = \infty$. We note that the analysis for TDMA and Round-Robin bus arbitration still results in bounded response times in this case, while the analysis for FIFO and Fixed-Priority arbitration will result in unbounded response times. With arbitration based on Processor-Priority, bounded response times can only be obtained if $P_y$ is a lower priority processor than $P_x$.

7.5 Caches with a Write-Back Policy

In this paper, we consider write-through caches only; however, in practice write-back caches are usually preferred, as they reduce the number of accesses to main memory and thus increase performance. Write-back caches introduce three challenges for future work. The first challenge is to devise analyses that precisely bound the number of write backs, which is equal to the number of evictions of dirty cache lines. The second, and perhaps greater, challenge is that write backs corresponding to the execution of a task $\tau_i$ may occur after the termination of $\tau_i$ and thus contribute to the delay of another task. Thirdly, write-back caches require the implementation of coherence protocols, which may generate additional traffic on the memory bus, which would have to be safely bounded. A naive solution to the first two challenges assumes pessimistically that each cache line is dirty and thus that each cache eviction leads to two bus accesses. Alternatively, we can derive for each task in a closed system a set of dirty cache lines, which have to be written back if evicted by another task. Write-backs can then be considered an additional source of interference in the framework. A detailed analysis for write-back caches remains an interesting area for future work.

7.6 Multi-level Caches

Modern multicore processors often feature multiple cache levels, where usually one level is shared between multiple cores. Dealing with such a scenario in our framework is in principle feasible. As long as all caches are private, the challenge would be to integrate an extension of CRPD analysis to multiple cache levels. Chattopadhyay and Roychoudhury [16] have recently proposed such an analysis for non-inclusive memory hierarchies. Shared second- or third-level caches add the extra complication of cross-core interference on the cache. Different more or less precise and efficient approaches to bound this interference are conceivable, and again form an interesting area for future work.

8. EXPERIMENTAL EVALUATION

[Figure 3: Multicore Architecture Case Study: m = 4 cores with local caches connected via a common bus to a global memory.]

In this section we describe the results of an experimental evaluation using the MRTA framework. (The software is available on demand.) For the evaluation, we use the Mälardalen benchmark suite [21] to provide traces. We model a multicore system based on an ARM Cortex A5 multicore (product page: cortex-a5.php) as a reference architecture to provide a cache configuration and memory and bus latencies. As this work is intended to provide an overview of our generic and extensible framework, we do not model all details of the specific multicore architecture. A case study comparing measurements on real hardware with the computed bounds is future work.

The reference architecture depicted in Figure 3 is configured as follows: It has 4 ARMv7 cores connected to the global memory/IO over a shared bus, assuming a Round-Robin arbitration policy and a core frequency of 200 MHz. Each core has separate instruction and data caches, with 256 cache sets each and a block size of 32 bytes. The global memory latency $d_{main}$ and the DRAM refresh latency $d_{refresh}$ are both 5 cycles. The DRAM refresh period $T_{refresh}$ is 64 ms. We assume the DRAM implements the distributed refresh strategy (see Section 6.4). We examine derivatives of the reference configuration assuming the different bus arbitration policies presented in Section 5, and a hypothetical perfect bus which eliminates all bus interference if the bus utilization is at most 1. We compare the reference configuration with two alternative architectures. The first, referred to as the full-isolation architecture, implements complete spatial and temporal isolation. The local caches are partitioned with an equal partition size for each task and the bus uses a TDMA arbitration policy. All other parameters remain the same as in the reference architecture. The performance of the isolation architecture corresponds to the traditional two-step approach to timing verification with context-independent WCETs. The second alternative, referred to as the uncached architecture, assumes no local caches except for a buffer of size 1, and uses Round-Robin bus arbitration. All other parameters are again the same as in the reference configuration.

The traces for the benchmarks were generated using the gem5 instruction set simulator [13] and contain statically linked library calls. As the benchmark code corresponds to independent tasks, no data is shared between the tasks. Table 1 shows information for all 39 benchmark programs used to provide traces, including the total number of instructions (which is equal to the processor demand), the number of read/write operations, the memory demand, and the maximum number of UCBs and ECBs on the reference multicore architecture. Each benchmark is assigned only one trace, which is sufficient due to the simple structure of the benchmark suite: the benchmarks are either single-path or worst-case input is provided. Despite the rather simple structure of the benchmarks, the tasks show a strong variation in processor and memory demand. As all benchmarks exhibit only one trace, the worst-case processor and memory demand coincide. Evaluation of more complex tasks, including evaluation of the trade-off between the pessimism of independent upper bounds and the computational complexity of explicit traces, remains as future work.

We identify three main sources of over-approximation in our multicore response time analysis framework: the number of memory accesses on the same core cannot be precisely estimated due to imprecision in the pre-emption cost analysis; the interference due to bus accesses may be pessimistic, as not all tasks running on another core can simultaneously access the bus; and the DRAM refreshes are assumed too frequently if the number of main memory accesses is over-approximated. A sophisticated evaluation of the precision of our analysis requires measurements on a real architecture, which we cannot yet provide. However, the different architecture configurations provide an estimate of the influence of the different sources of pessimism. The reference architecture with a perfect bus eliminates any pessimism due to bus interference and DRAM accesses; only the pessimism of the pre-emption cost analysis remains, which has been quantified in [2]. The full-isolation architecture removes all pessimism due to the bus interference and the pre-emption costs, and thus only suffers from the pessimism in the DRAM analysis.

We evaluated the guaranteed performance of the various configurations as computed using the MRTA framework on a large number of randomly generated task sets. The task set parameters were as follows: The default task set size was 32, with 8 tasks per core. Each task was randomly assigned a trace from Table 1. The base WCET per task $\tau_i$, needed solely to set the task periods and deadlines, was defined as

$$C_i = PD_i + MD_i \cdot d_{main} + \mathrm{DRAM}(PD_i + MD_i \cdot d_{main}, MD_i) \cdot d_{refresh}$$

$C_i$ denotes the execution time of the task without any interference from any other task. The task utilizations were generated using UUniFast [12] with an equal utilization assumed for each core. Task periods were set based on task utilization and base WCET, i.e., $T_i = C_i / U_i$. Task deadlines were implicit. Priorities were assigned in deadline monotonic order. We note that the processor utilization is often not the limiting factor on a multicore system; rather, the memory utilization, defined as

$$U^{BUS} = \sum_i \frac{MD_i \cdot d_{main}}{T_i} \quad (33)$$

is the limiting factor. Only if $U^{BUS} \le 1$ can the tasks be scheduled.
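The base WCET and bus utilization used for task-set generation can be computed as in the following sketch (distributed refresh as in Equation (30); names are illustrative):

```python
from math import ceil

def base_wcet(pd_i, md_i, d_main, d_refresh, n_rows, t_refresh):
    # Interference-free execution time C_i used only to set periods and deadlines.
    busy = pd_i + md_i * d_main
    refreshes = min(md_i, ceil(busy * n_rows / t_refresh))  # DRAM_dist(busy, MD_i)
    return busy + refreshes * d_refresh

def bus_utilisation(md, period, d_main):
    # Equation (33): total memory (bus) utilization; schedulability requires <= 1.
    return sum(md[i] * d_main / period[i] for i in md)
```

A task's period is then set as T_i = C_i / U_i from the per-task utilization generated by UUniFast.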
Evaluaton of more complex tasks ncludng evaluaton of the trade off between pessmsm of ndependent upper bounds and the computatonal complexty of explct traces remans as future work. We dentfy three man sources of over-approxmaton of our multcore response tme analyss framework: The number of memory accesses on the same core cannot be precsely estmated due to mprecson n the pre-empton cost analyss. The nterference due to bus accesses may be pessmstc as not all tasks runnng on another core can smultaneously access the bus. The DRAM refreshes are assumed too frequently f the number of man memory accesses s over-approxmated. A sophstcated evaluaton of the precson of our analyss requres measurements on a real archtecture, whch we cannot yet provde. However, the dfferent archtecture confguratons provde an estmate of the nfluence of the dfferent sources of pessmsm. The reference archtecture wth a perfect bus elmnates any pessmsm due to bus nterference and DRAM accesses. Only the pessmsm of the pre-empton cost analyss remans, whch has been quantfed n [2]. The full-solaton archtecture removes all pessmsm due to the bus nterference and the pre-empton costs, and thus only suffers from the pessmsm n the DRAM analyss. We evaluated the guaranteed performance of the varous confguratons as computed usng the MRTA framework on a large number of randomly generated task sets. The task set parameters were as follows: The default task set sze was 32, wth 8 tasks per core. Each task was randomly assgned a trace from Table 1. The base WCET per task τ, needed solely to set the task perods and deadlne, was defned as C = PD + MD d man + DRAM(PD + MD d man, MD ) d refresh C denotes the executon tme of the task wthout any nterference from any other task. The task utlzatons were generated usng UUnfast [12] wth an equal utlzaton assumed for each core. Task perods were set based on task utlzaton and base WCET,.e., T = C /U. Task deadlnes were mplct. Prortes were assgned n deadlne monotonc order. We note that the processor utlzaton s often not the lmtng factor on a multcore system, but the memory utlzaton, defned as: U BUS MD d man = (33) T s the lmtng factor. Only f U BUS 1, can the tasks be scheduled. The utlzaton per core was vared from to n steps of For each utlzaton value, 1000 tasksets were generated and the schedulablty was determned for each archtectural confguraton. Fgure 4 shows the number of schedulable task sets plotted aganst the core utlzaton (computed usng the base WCETs) and Fgure 5 aganst the bus utlzaton U BUS. Most traces from Table 1 have a hgh memory demand, whch results n a hgh


More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access Agenda Cache Performance Samra Khan March 28, 217 Revew from last lecture Cache access Assocatvty Replacement Cache Performance Cache Abstracton and Metrcs Address Tag Store (s the address n the cache?

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Mixed-Criticality Scheduling on Multiprocessors using Task Grouping

Mixed-Criticality Scheduling on Multiprocessors using Task Grouping Mxed-Crtcalty Schedulng on Multprocessors usng Task Groupng Jankang Ren Lnh Th Xuan Phan School of Software Technology, Dalan Unversty of Technology, Chna Computer and Informaton Scence Department, Unversty

More information

Space-Optimal, Wait-Free Real-Time Synchronization

Space-Optimal, Wait-Free Real-Time Synchronization 1 Space-Optmal, Wat-Free Real-Tme Synchronzaton Hyeonjoong Cho, Bnoy Ravndran ECE Dept., Vrgna Tech Blacksburg, VA 24061, USA {hjcho,bnoy}@vt.edu E. Douglas Jensen The MITRE Corporaton Bedford, MA 01730,

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

A Predictable Execution Model for COTS-based Embedded Systems

A Predictable Execution Model for COTS-based Embedded Systems 2011 17th IEEE Real-Tme and Embedded Technology and Applcatons Symposum A Predctable Executon Model for COTS-based Embedded Systems Rodolfo Pellzzon, Emlano Bett, Stanley Bak, Gang Yao, John Crswell, Marco

More information

Petri Net Based Software Dependability Engineering

Petri Net Based Software Dependability Engineering Proc. RELECTRONIC 95, Budapest, pp. 181-186; October 1995 Petr Net Based Software Dependablty Engneerng Monka Hener Brandenburg Unversty of Technology Cottbus Computer Scence Insttute Postbox 101344 D-03013

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Real-Time Guarantees. Traffic Characteristics. Flow Control

Real-Time Guarantees. Traffic Characteristics. Flow Control Real-Tme Guarantees Requrements on RT communcaton protocols: delay (response s) small jtter small throughput hgh error detecton at recever (and sender) small error detecton latency no thrashng under peak

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal

More information

Scheduling. In general, a scheduling scheme provides two features: An algorithm for ordering the use of system resources (in particular the CPUs)

Scheduling. In general, a scheduling scheme provides two features: An algorithm for ordering the use of system resources (in particular the CPUs) Schedulng Goal To understand the role that schedulng and schedulablty analyss plays n predctng that real-tme applcatons meet ther deadlnes Topcs Smple process model The cyclc executve approach Process-based

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Memory and I/O Organization

Memory and I/O Organization Memory and I/O Organzaton 8-1 Prncple of Localty Localty small proporton of memory accounts for most run tme Rule of thumb For 9% of run tme next nstructon/data wll come from 1% of program/data closest

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information

Multitasking and Real-time Scheduling

Multitasking and Real-time Scheduling Multtaskng and Real-tme Schedulng EE8205: Embedded Computer Systems http://www.ee.ryerson.ca/~courses/ee8205/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrcal and Computer Engneerng Ryerson Unversty

More information

A comparison of MPCP and MSRP when sharing resources in the Janus multiple-processor on a chip platform

A comparison of MPCP and MSRP when sharing resources in the Janus multiple-processor on a chip platform A comparson of MPCP and MSRP when sharng resources n the Janus multple-processor on a chp platform Paolo Ga, Marco D Natale, Guseppe Lpar, Scuola Superore Sant Anna, Psa, Italy {pj,marco,lpar}@sssup.t

More information

Reliability and Energy-aware Cache Reconfiguration for Embedded Systems

Reliability and Energy-aware Cache Reconfiguration for Embedded Systems Relablty and Energy-aware Cache Reconfguraton for Embedded Systems Yuanwen Huang and Prabhat Mshra Department of Computer and Informaton Scence and Engneerng Unversty of Florda, Ganesvlle FL 326-62, USA

More information

Avoiding congestion through dynamic load control

Avoiding congestion through dynamic load control Avodng congeston through dynamc load control Vasl Hnatyshn, Adarshpal S. Seth Department of Computer and Informaton Scences, Unversty of Delaware, Newark, DE 976 ABSTRACT The current best effort approach

More information

On Achieving Fairness in the Joint Allocation of Buffer and Bandwidth Resources: Principles and Algorithms

On Achieving Fairness in the Joint Allocation of Buffer and Bandwidth Resources: Principles and Algorithms On Achevng Farness n the Jont Allocaton of Buffer and Bandwdth Resources: Prncples and Algorthms Yunka Zhou and Harsh Sethu (correspondng author) Abstract Farness n network traffc management can mprove

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Concurrent models of computation for embedded software

Concurrent models of computation for embedded software Concurrent models of computaton for embedded software and hardware! Researcher overvew what t looks lke semantcs what t means and how t relates desgnng an actor language actor propertes and how to represent

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management.

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management. //7 Prnceton Unversty Computer Scence 7: Introducton to Programmng Systems Goals of ths Lecture Storage Management Help you learn about: Localty and cachng Typcal storage herarchy Vrtual memory How the

More information

Array transposition in CUDA shared memory

Array transposition in CUDA shared memory Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some

More information

3. CR parameters and Multi-Objective Fitness Function

3. CR parameters and Multi-Objective Fitness Function 3 CR parameters and Mult-objectve Ftness Functon 41 3. CR parameters and Mult-Objectve Ftness Functon 3.1. Introducton Cogntve rados dynamcally confgure the wreless communcaton system, whch takes beneft

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Technical Report. i-game: An Implicit GTS Allocation Mechanism in IEEE for Time- Sensitive Wireless Sensor Networks

Technical Report. i-game: An Implicit GTS Allocation Mechanism in IEEE for Time- Sensitive Wireless Sensor Networks www.hurray.sep.pp.pt Techncal Report -GAME: An Implct GTS Allocaton Mechansm n IEEE 802.15.4 for Tme- Senstve Wreless Sensor etworks Ans Koubaa Máro Alves Eduardo Tovar TR-060706 Verson: 1.0 Date: Jul

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,

More information

Coordinated Bank and Cache Coloring for Temporal Protection of Memory Accesses

Coordinated Bank and Cache Coloring for Temporal Protection of Memory Accesses Coordnated Bank and Cache Colorng for Temporal Protecton of Memory Accesses 1 Norak Suzuk, 2 Hyoseung Km, 2 Donso de Nz, 2 Bjorn Andersson, 2 Lutz Wrage, 2 Mark Klen, and 2 Ragunathan (Raj) Rajkumar n-suzuk@ha.jp.nec.com,

More information

Scheduling and queue management. DigiComm II

Scheduling and queue management. DigiComm II Schedulng and queue management Tradtonal queung behavour n routers Data transfer: datagrams: ndvdual packets no recognton of flows connectonless: no sgnallng Forwardng: based on per-datagram forwardng

More information

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries Run-Tme Operator State Spllng for Memory Intensve Long-Runnng Queres Bn Lu, Yal Zhu, and lke A. Rundenstener epartment of Computer Scence, Worcester Polytechnc Insttute Worcester, Massachusetts, USA {bnlu,

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

A protocol for mixed-criticality management in switched Ethernet networks

A protocol for mixed-criticality management in switched Ethernet networks A protocol for mxed-crtcalty management n swtched Ethernet networks Olver CROS, Laurent GEORGE Unversté Pars-Est, LIGM / ESIEE, France cros@ece.fr,lgeorge@eee.org Xaotng LI ECE Pars / LACSC, France xaotng.l@ece.fr

More information

Vectorization in the Polyhedral Model

Vectorization in the Polyhedral Model Vectorzaton n the Polyhedral Model Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty October 200 888. Introducton: Overvew Vectorzaton: Detecton

More information

Perfecting Preemption Threshold Scheduling for Object-Oriented Real-Time System Design: From The Perspective of Real-Time Synchronization

Perfecting Preemption Threshold Scheduling for Object-Oriented Real-Time System Design: From The Perspective of Real-Time Synchronization Perfectng Preempton Threshold Schedulng for Obect-Orented Real-Tme System Desgn: From The Perspectve of Real-Tme Synchronzaton Saehwa Km School of Electrcal Engneerng and Computer Scence Seoul Natonal

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

On the Fairness-Efficiency Tradeoff for Packet Processing with Multiple Resources

On the Fairness-Efficiency Tradeoff for Packet Processing with Multiple Resources On the Farness-Effcency Tradeoff for Packet Processng wth Multple Resources We Wang, Chen Feng, Baochun L, and Ben Lang Department of Electrcal and Computer Engneerng, Unversty of Toronto {wewang, cfeng,

More information

ARTICLE IN PRESS. Signal Processing: Image Communication

ARTICLE IN PRESS. Signal Processing: Image Communication Sgnal Processng: Image Communcaton 23 (2008) 754 768 Contents lsts avalable at ScenceDrect Sgnal Processng: Image Communcaton journal homepage: www.elsever.com/locate/mage Dstrbuted meda rate allocaton

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

A Sub-Critical Deficit Round-Robin Scheduler

A Sub-Critical Deficit Round-Robin Scheduler A Sub-Crtcal Defct ound-obn Scheduler Anton Kos, Sašo Tomažč Unversty of Ljubljana, Faculty of Electrcal Engneerng, Ljubljana, Slovena E-mal: anton.kos@fe.un-lj.s Abstract - A scheduler s an essental element

More information

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS CACHE MEMORY DESIGN FOR INTERNET PROCESSORS WE EVALUATE A SERIES OF THREE PROGRESSIVELY MORE AGGRESSIVE ROUTING-TABLE CACHE DESIGNS AND DEMONSTRATE THAT THE INCORPORATION OF HARDWARE CACHES INTO INTERNET

More information

An Efficient Garbage Collection for Flash Memory-Based Virtual Memory Systems

An Efficient Garbage Collection for Flash Memory-Based Virtual Memory Systems S. J and D. Shn: An Effcent Garbage Collecton for Flash Memory-Based Vrtual Memory Systems 2355 An Effcent Garbage Collecton for Flash Memory-Based Vrtual Memory Systems Seunggu J and Dongkun Shn, Member,

More information

Response-Time Guarantees in ATM Networks

Response-Time Guarantees in ATM Networks Response-Tme Guarantees n ATM Networks Andreas Ermedahl Hans Hansson Mkael Sjödn Department of Computer Systems Uppsala Unversty Sweden E-mal: febbe,hansh,mcg@docs.uu.se Abstract We present a method for

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

A Genetic Algorithm Based Dynamic Load Balancing Scheme for Heterogeneous Distributed Systems

A Genetic Algorithm Based Dynamic Load Balancing Scheme for Heterogeneous Distributed Systems Proceedngs of the Internatonal Conference on Parallel and Dstrbuted Processng Technques and Applcatons, PDPTA 2008, Las Vegas, Nevada, USA, July 14-17, 2008, 2 Volumes. CSREA Press 2008, ISBN 1-60132-084-1

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations*

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations* Confguraton Management n Mult-Context Reconfgurable Systems for Smultaneous Performance and Power Optmzatons* Rafael Maestre, Mlagros Fernandez Departamento de Arqutectura de Computadores y Automátca Unversdad

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Hybrid Job Scheduling Mechanism Using a Backfill-based Multi-queue Strategy in Distributed Grid Computing

Hybrid Job Scheduling Mechanism Using a Backfill-based Multi-queue Strategy in Distributed Grid Computing IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.12 No.9, September 2012 39 Hybrd Job Schedulng Mechansm Usng a Backfll-based Mult-queue Strategy n Dstrbuted Grd Computng Ken Park

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to: 4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://e-tellgentnternetmarketng.com/webste/frustrated-computer-user-2/

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Response-Time Analysis for Single Core Equivalence Framework

Response-Time Analysis for Single Core Equivalence Framework 1 Response-Tme Analyss for Sngle Core Equvalence Framework Renato Mancuso, Rodolfo Pellzzon, Marco Caccamo, Lu Sha, Heechul Yun Unversty of Illnos at Urbana-Champagn, USA, {rmancus2, mcaccamo, lrs}@llnos.edu

More information

Advanced Computer Networks

Advanced Computer Networks Char of Network Archtectures and Servces Department of Informatcs Techncal Unversty of Munch Note: Durng the attendance check a stcker contanng a unque QR code wll be put on ths exam. Ths QR code contans

More information

Analysis of Collaborative Distributed Admission Control in x Networks

Analysis of Collaborative Distributed Admission Control in x Networks 1 Analyss of Collaboratve Dstrbuted Admsson Control n 82.11x Networks Thnh Nguyen, Member, IEEE, Ken Nguyen, Member, IEEE, Lnha He, Member, IEEE, Abstract Wth the recent surge of wreless home networks,

More information

Pricing Network Resources for Adaptive Applications in a Differentiated Services Network

Pricing Network Resources for Adaptive Applications in a Differentiated Services Network IEEE INFOCOM Prcng Network Resources for Adaptve Applcatons n a Dfferentated Servces Network Xn Wang and Hennng Schulzrnne Columba Unversty Emal: {xnwang, schulzrnne}@cs.columba.edu Abstract The Dfferentated

More information

Self-Tuning, Bandwidth-Aware Monitoring for Dynamic Data Streams

Self-Tuning, Bandwidth-Aware Monitoring for Dynamic Data Streams Self-Tunng, Bandwdth-Aware Montorng for Dynamc Data Streams Navendu Jan, Praveen Yalagandula, Mke Dahln, Yn Zhang Mcrosoft Research HP Labs The Unversty of Texas at Austn Abstract We present, a self-tunng,

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information