Hardware Support for Histogram-based Performance Analysis of Embedded Systems


2017 IEEE 20th International Symposium on Real-Time Distributed Computing

Thomas Ballenthin, HBM GmbH, Darmstadt, Germany
Boris Dreyer and Christian Hochberger, Fachgebiet Rechnersysteme, Technische Universität Darmstadt, Darmstadt, Germany
Simon Wegener, AbsInt Angewandte Informatik GmbH, Saarbrücken, Germany

Abstract

Timing analysis in embedded systems has focused mainly on the Worst-Case Execution Time (WCET) in the past. This was (and still is) important to make guarantees for the application of the system in safety-critical environments. Today, two reasons call for a slightly changed perspective. Firstly, the complex and often unpredictable internal structure of modern system-on-chip architectures prohibits the calculation of realistic upper bounds for the WCET. Secondly, even if we can compute a realistic value for the WCET, the developer still does not know how the code under scrutiny behaves in general and whether it is useful or necessary to spend time on optimising this code. In this contribution, we present a new method and hardware architecture to collect Execution Time Profiles (ETPs), which give much more insight into the execution time behaviour on modern system-on-chip architectures than previously available.

I. INTRODUCTION

System-on-Chip architectures (SoC) are the predominant way to implement embedded systems. They offer high performance and are flexible through their software programmability. They are often used to control a technical context like the engine or the brake system in a car. For safety-critical systems, safety standards (e.g. ISO and DO-178B/C) require that guarantees for the upper bounds of execution time of specific parts of the software are given. In the past, this Worst-Case Execution Time (WCET) could be analysed statically [1]. Unfortunately, today's SoCs are highly complex systems with components working in parallel, often in an unpredictable way.
Examples of this unpredictable behaviour are random cache replacement strategies or complex bus arbiters. Although such elements can be taken into account during WCET estimation by assuming maximal effects, the resulting times are far from realistic. Nowotsch et al. [2], for example, show the effect of assuming maximal bus contention in a worst-case analysis compared to a more sophisticated approach. Additionally, the WCET does not help programmers to identify sources of high variance.

In this article, we present a new method and HW architecture that allows us to capture detailed Execution Time Profiles (ETPs) in the form of histograms. Our method is based on a new approach to process trace data of SoCs which we have previously published [3]. The great benefit of this approach is that it is non-intrusive and runs online (while the target system executes). Thus, it can aggregate statistics over arbitrarily long test cycles. The execution time profiles can be used to understand the runtime behaviour of complex SoCs, which cannot be analysed statically. The histogram bin distribution gives an overview of the execution time and enables us to make statistical statements about execution time probabilities.

This work was funded within the project CONIRAS by the German Federal Ministry for Education and Research with the funding ID 1IS1329. The responsibility for the content remains with the authors.

The following section discusses related work. Section III describes the environment and tools that have been used to carry out the experiments. The mechanism to detect the currently executed function is reviewed in Section IV. It is followed by our main contribution in Section V, where we explain how the detailed execution profile is captured in hardware. Section VI discusses different options to store the histogram in an FPGA. It is followed by an explanation of the computation of execution profiles in Section VII.
In Section VIII we apply our tool and method to the well-known debie1 timing analysis benchmark. Finally, a conclusion is given.

II. RELATED WORK

Some research has been carried out to improve the predictability of processor architectures, either by design or by configuration (e.g. Cullmann et al. [4], Schoeberl et al. [5]). We, however, focus on a COTS architecture (ARM Cortex-A9) which has many unpredictable features.

ETPs can be used in many different ways. Many papers about probabilistic schedulability analysis [6] [7] [8] require ETPs, for example to compute the response time distribution of each task in the system. Abeni et al. [9] and Cazorla et al. [10] have investigated how to schedule various tasks while maintaining a given quality of service level. This is done using reservation-based scheduling algorithms which utilise ETPs [11]. With the help of ETPs, the probability that a timing constraint can be met is determined. The major challenge in these approaches is to obtain reasonable execution time profiles of any given program code. Santinelli et al. [12] and Kaczynski et al. [13] use ETPs to estimate the execution times of programs and tasks.

A measurement-based approach to create ETPs is presented by Hansen et al. [14]. For this, the trace data of the processor was recorded and then evaluated. The recording time of the trace data was limited to 12 minutes. Furthermore, the program code was annotated with markers to assign the trace data, which alters the execution time. We, in contrast, process the trace data of the processor in real-time and are therefore not limited in observation time. Also, we do not have to instrument the program under test, so we do not alter the program's timing behaviour and thus create authentic histograms.

In order to process the high-speed trace data in real-time, we generate histograms in hardware. The use of hardware implementations to speed up histogram calculations is not new. It is a common feature to accelerate image processing [15] or face recognition. Alsuwailem and Alshebeili [16] presented an approach for computing histogram statistics and histogram equalisation in parallel to speed up image enhancement. Sanny et al. [17] developed a histogram implementation for image processing with a frame rate of 3 on a Virtex-7 FPGA. Stekas and v. d. Heuvel [18] use Local Binary Patterns Histograms (LBPH) to extract features from test face images, implemented on a Zynq-73 SoC.

In our architecture, the trace extraction unit generates the events to be processed as histograms. These events are emitted at high frequency and require a fast histogram computation. We designed and implemented our own histogram module because none of the histogram implementations above meet our real-time demands, e.g. processing one event per cycle and an FPGA clock above 200 MHz.

III. TIMING ANALYSIS FRAMEWORK

In previous work [3], we developed a non-intrusive measurement-based timing analysis framework. Our framework works on the object code level and is split into three phases: an offline pre-processing phase, the continuous online aggregation phase and an offline post-processing phase.
For this work, we added a histogram-based statistics module to the FPGA and replaced the hybrid WCET estimation backend with an ETP computation backend. The workflow of our method is shown in Figure 1. We assume a static schedule where each task runs on one (predefined) core of a multicore system. Each core uses its own trace extraction and continuous aggregation modules. Hence it suffices to describe the workflow for a single core. Moreover, we assume that the analysed software consists of non-recursive functions only (loops are allowed, though).

A. Pre-Processing

First, the binary reader disassembles a fully linked binary executable into its individual instructions. Architecture-specific patterns decide whether an instruction is a call, branch, return or just an ordinary instruction. This knowledge is used to form the basic blocks of the control flow graph (CFG). Then, the control flow between the basic blocks is reconstructed. In most cases, this is done completely automatically. However, if a target of a call or branch cannot be statically resolved, then the user needs to write some annotations to guide the control flow reconstruction. This can happen, for example, if the program contains calls via function pointer arrays.

The embedded trace unit (ETU) of modern ARM processors (like the Xilinx Zynq XC7Z020 featuring a dual-core ARM Cortex-A9) is not fully compatible with the CFG model. The ETU emits a waypoint event for each non-linear control flow, for example, interrupts and hardware exceptions, but also for normal calls and branches. So-called waypoint instructions always generate waypoint events. Amongst others, instructions that possibly modify the program counter are waypoint instructions. This is enough to fully reconstruct the control flow, but less fine-grained than the CFG (see Figure 2). Therefore, after CFG reconstruction, the waypoint graph (WPG) is computed.
To do so, a pattern matcher checks for each instruction whether it is a waypoint instruction. Afterwards, the edges of the WPG are computed. For each waypoint instruction found, the algorithm follows the edges in the CFG to find reachable waypoints. This gives the direction of a waypoint edge and its target.

Now, the analysis hardware is powered on and its Virtex-7 FPGA is configured with a bitstream that contains the trace extraction and the continuous aggregation unit. This configuration is not application-specific. Therefore, it only needs to be created once and can be used to create ETPs of distinct applications. Then, the WPG is used to create an application-specific meta-configuration for the trace extraction module as well as for the continuous online aggregation module. A unique ID is assigned to each edge in the WPG and the lookup tables in the function automata cluster are instantiated. After the creation, the modules are meta-configured. In contrast to the creation and configuration of a Virtex-7 FPGA with a large bitstream, the creation and configuration of a small application-specific meta-configuration takes only a few seconds.

B. Trace Extraction and Continuous Aggregation

During the program's execution, the ETU continuously emits raw trace data. This stream of data is fed into the trace extraction module. There, the raw data is decoded and compiled into an event stream. An event is generated for each traversal of a waypoint and consists of an ID and a timestamp. The special ID 0 is used if the waypoint does not belong to the WPG computed during the pre-processing phase. This happens, for example, in case of an interrupt. The resulting waypoint event stream is then fed into the continuous aggregation module.

The continuous aggregation module handles the recording of histograms. It consists of the function automata cluster and the histogram module.
The function automata cluster generates a function event stream based on the waypoint event stream (see Section IV). The histogram module updates the histograms by processing either the function event stream or the raw waypoint event stream (see Section V).
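To make the pre-processing step more concrete, the derivation of waypoint edges from a CFG can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tool: the function name `waypoint_graph`, the dictionary-based CFG representation, the toy instruction names and the choice to number edges from 1 (keeping 0 free for waypoints outside the WPG) are all hypothetical.

```python
from collections import deque

def waypoint_graph(cfg, waypoints):
    """Derive waypoint edges from a CFG.

    cfg:       {node: [successor, ...]} (hypothetical representation)
    waypoints: set of nodes that are waypoint instructions
    Returns {(source, direction, target): edge_id}, where `direction` is the
    index of the CFG successor through which the source waypoint was left.
    """
    edges = {}
    next_id = 1  # assumption: ID 0 stays reserved for unknown waypoints
    for src in sorted(waypoints):
        for direction, first in enumerate(cfg.get(src, [])):
            seen, todo = set(), deque([first])
            while todo:
                node = todo.popleft()
                if node in seen:
                    continue
                seen.add(node)
                if node in waypoints:
                    # Reached the next waypoint: record one edge, stop here.
                    edges[(src, direction, node)] = next_id
                    next_id += 1
                else:
                    # Ordinary instruction: keep following the CFG.
                    todo.extend(cfg.get(node, []))
    return edges

# Toy CFG: a conditional branch, two straight-line paths, a second branch.
toy_cfg = {
    "blt": ["add", "b3"],
    "add": ["beq"],
    "b3": ["sub"],
    "sub": ["beq"],
    "beq": ["ret", "blt"],
    "ret": [],
}
toy_edges = waypoint_graph(toy_cfg, {"blt", "beq", "ret"})
```

Keeping the direction in the edge key means that the taken and the not-taken path of a branch receive different IDs even when both end at the same waypoint, which is what allows separate histograms per edge.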

Fig. 1. Workflow of our approach. It is split into three phases: offline pre-processing, continuous online aggregation and offline post-processing. The figure shows the SoC (Cortex-A9 CPU at 667 MHz with embedded trace unit), the FPGA (Xilinx Virtex-7) with trace extraction and continuous aggregation (function automata cluster and histogram module), and the exchanged data: trace data, edge events (edge ID + cycles) and function events (function ID + cycles).

Fig. 2. CFG with highlighted waypoint instructions.

Fig. 3. Exemplary comparator tree for one function having one entry and three exits. Unused comparators (shown in grey) return false.

C. Post-Processing

After the program has finished (or the test engineer has collected enough data), the post-processing phase is started by downloading the histograms from the FPGA's memory. Subsequently, ETPs are calculated from the measured histograms, either directly from the function histograms or derived from the individual waypoint edge histograms (see Section VII).

IV. FUNCTION AUTOMATA CLUSTER

The function automata cluster models the mapping from the waypoint edge events to function events. For each function in the WPG, it contains a set of comparator trees and a finite state machine (FSM).
The comparator trees (Figure 3) translate waypoint edge events into inputs for the FSM, namely enter (the function has been entered), exit (the function has been exited), and exception (knowledge about the function has been lost). The compare values (= edge IDs) of the comparator trees are part of the configuration that is loaded before the online aggregation phase is started.

The FSM (Figure 4) is used to measure the execution time of one function. It consists of three states, namely Out (the function is not being executed), In (the function is being executed), and Unknown (it is not known if the function is being executed or not). If the function is not executed, the FSM is in state Out and the execution cycle counter is zero. Once the function is executed, the state changes to In and the counter accumulates the execution times of the executed waypoint edge events. As soon as the machine changes its state from In to Out, the counter value is considered as the function's execution time and a function event is emitted.

It is possible that a trace analysis starts after the program execution has been started. Consequently, there is a lack of function execution information at the beginning of the analysis. Therefore, the initial state of the FSM is Unknown.

Fig. 4. Finite state machine that measures the execution time of one function. Dashed edges are traversed for exception events. As long as the automaton is in state In, the counter accumulates the execution time of executed waypoint edge events.

V. HISTOGRAM MODULE

The histogram module can be connected either to the function event stream generated by the function automata cluster, calculating histograms at the function level, or directly to the waypoint edge event stream generated by the instruction reconstruction unit. For the sake of simplicity, we call both function events and waypoint edge events simply events. Each event consists of an identifier (event_id) that identifies either a function or a waypoint edge and the elapsed cycles (event_cycles) since the last event.

The collected statistical data for each event_id is stored as a bin distribution, also known as a histogram. Each histogram consists of $k = 2^n$, $n \in \mathbb{N}$, bins. Each bin has an identifier $j$, a lower bound $x^l_j \in \mathbb{N}$ and an upper bound $x^u_j \in \mathbb{N}$. The bins are defined as disjoint left-open intervals $]x^l_j, x^u_j]$. Therefore, the lower bin boundary $x^l_j$ is not part of the interval, and the first lower bound is set to zero, $x^l_0 = 0$, since all execution times are strictly positive. The upper bin boundary $x^u_j$ is part of the interval and is simultaneously the lower bound $x^l_{j+1}$ of bin $j + 1$; thus $\forall j \in \{0, \ldots, k-2\}: x^u_j = x^l_{j+1}$ applies. The last upper bin bound is set to infinity: $x^u_{k-1} = \infty$.

A. Linear Bin Distribution

Using a linear bin distribution, the upper bin boundaries $x^u_j$ are linearly distributed, starting from zero. The range of a single bin is given as $\mathit{step} \in \mathbb{N}$. This directly gives the cycle count up to which an event is sorted into the individual bins: $\mathit{step} \cdot (k - 1)$.
All other events with an associated cycle count greater than this threshold are put in the last (accumulative) bin. Table I shows an example of a linear distribution with $k = 8$ bins and $\mathit{step} = 4$.

TABLE I
LINEAR BIN DISTRIBUTION WITH PARAMETERS k = 8 AND step = 4.

bin j                    0    1    2    3    4    5    6    7
upper bin bound x^u_j    4    8    12   16   20   24   28   ∞

B. Central Bin Distribution

The central bin distribution has been developed under the assumption that the expected run times are clustered around a reference time. To obtain a more detailed view of the measured times, the bins are distributed around this reference time. When an event_id is handled for the first time, the corresponding execution time event_cycles is stored for further calculation as the reference value event_cycles_ref. All future events with the same event_id use the same event_cycles_ref to determine the bin distribution.

The distribution of the bins around the reference value event_cycles_ref can be adjusted by several parameters. These can be modified to meet the varying requirements of different program code. The overall number of bins is determined by the parameter $k$. These bins are spread around the event_cycles_ref value, whereas the parameter $k_l$ sets the bin number into which event_cycles_ref would be sorted and thus the number of lower bins. The reference value is the upper bound of the corresponding bin, $x^u_{k_l - 1} = \mathit{event\_cycles\_ref}$, and it holds that $k_l \le k$.

The compressing factor $f_{comp}$ sets the range $r$ of the histogram. As seen in Eq. 1, the range depends on the event_cycles_ref value. It limits the overall span of the histogram, which may be truncated to give a more detailed view of the relevant area. Furthermore, it reduces the number of potentially empty bins. Every value exceeding the range is put in either the smallest or the largest bin. Dividing the range $r$ by the number of bins $k$ gives the size of a single bin, $\mathit{step}$, which is also the span between two boundaries (Eq. 2).
The upper boundary $x^u_j$ of an individual bin $j$ is calculated relative to the reference value event_cycles_ref, as shown in Eq. 3.

$$r := \mathit{event\_cycles\_ref} \cdot f_{comp} \tag{1}$$

$$\mathit{step} := \frac{r}{k} \tag{2}$$

$$x^u_j := \begin{cases} \mathit{event\_cycles\_ref} - (k_l - j - 1) \cdot \mathit{step}, & 0 \le j \le k - 2 \\ \infty, & j = k - 1 \end{cases} \tag{3}$$

A special case occurs when $\mathit{event\_cycles\_ref} < k - 1$. This results in $\mathit{step} = 0$. Whenever this happens, a linear bin distribution is used for all events with the affected event_id. The procedure is shown as a flow chart in Figure 5.

An example for the calculation of bin boundaries is given in Figure 6. The number of bins is set to $k = 8$, with $k_l = 6$ lower bins. Processing an event with $\mathit{event\_cycles\_ref} = 60$ and using a compressing factor of $f_{comp} = 0.8$ leads to a range of $r = 48$. It is evident that bin zero covers all events with event_cycles smaller than or equal to 30 and bin seven covers all events with event_cycles larger than 67 cycles. The remaining bins, one to six, provide a detailed view of recorded events.

VI. FAST HISTOGRAM STORAGE

After the current bin for a single event has been calculated, the corresponding histogram needs to be updated.
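The bin calculation of Section V can be sketched in software as follows. This is a minimal Python illustration under stated assumptions, not the hardware implementation: the function names are hypothetical, the range is rounded to an integer and divided with integer division (which is how a zero step can arise), and the fallback for a zero step is assumed to be a linear distribution of step 1.

```python
import bisect
import math

def linear_upper_bounds(k, step):
    """Upper bounds of a linear distribution: bin j covers ]j*step, (j+1)*step],
    and the last bin is open-ended."""
    return [step * (j + 1) for j in range(k - 1)] + [math.inf]

def central_upper_bounds(k, k_l, f_comp, ref):
    """Upper bounds of a central distribution around the reference value `ref`."""
    r = round(ref * f_comp)  # Eq. 1 (rounding to an integer is an assumption)
    step = r // k            # Eq. 2 (integer division is an assumption)
    if step == 0:
        # Special case: fall back to a linear distribution (step 1 assumed).
        return linear_upper_bounds(k, 1)
    # Eq. 3: bin k_l - 1 ends exactly at the reference value.
    return [ref - (k_l - j - 1) * step for j in range(k - 1)] + [math.inf]

def bin_index(cycles, upper_bounds):
    """Index of the left-open bin ]lower, upper] that contains `cycles`."""
    return bisect.bisect_left(upper_bounds, cycles)

table_1 = linear_upper_bounds(k=8, step=4)
figure_6 = central_upper_bounds(k=8, k_l=6, f_comp=0.8, ref=60)
```

With the parameters of the Figure 6 example, the sketch places the upper bound of bin zero at 30 cycles and that of bin five at the reference value of 60 cycles. Because the upper bound belongs to the bin, `bisect_left` is the matching search: a value equal to a bound is sorted into the bin that ends at it.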

Fig. 5. Flow chart for determining the bin using the central bin distribution.

Fig. 6. Example bin distribution with k = 8 bins, k_l = 6 lower bins (and thus k_u = 2 upper bins), a compressing factor of f_comp = 0.8 and event_cycles_ref = 60.

This happens within the storage module, which provides two storage targets. All histogram data is stored either within the internal memory blocks (BRAM) or in external memory.

A. Storage Using BRAM

Using BRAM simplifies the architecture, because a word can be written to the memory in every clock cycle. Each bin is stored at a specific address. Whenever a bin needs to be updated, its old value needs to be read, incremented by one and stored to the same address. This is achieved using dual-port memory, which allows simultaneous read and write access. We used a pipelined architecture to improve the performance of the storage module. Whenever the same bin is incremented several times in a row, the access is buffered and the accumulated value is written to the BRAM only once. This reduces the number of memory accesses.

If the program to be analysed is so complex that it has a large number of event_ids, the memory requirement will exceed the available BRAM resources. In this case, external memory can be used.

B. Storage Using External Memory

To handle the access delays caused by the use of external memory, the Memory Buffer module (Figure 7) provides a caching and buffer infrastructure. Different Memory Master modules, for either RLDRAM or AXI memory, can be used.
Depending on the functionality provided by the Memory Master, the Memory Adapter module uses either sequential or parallel read and write accesses. The Histogram Cache is implemented in BRAM and temporarily stores several histograms. We anticipate a clustering of event_ids according to the principle of locality. The FIFO Cache-Map stores the received event_ids, and a new and unique cache_id is assigned to every entry. Whenever a new event_id is processed, the FIFO is checked first. A memory access needs to be performed only if the FIFO does not already contain the event_id. The event_ids and the corresponding cache_ids are directly mapped. Thus, the Histogram Cache needs to hold as many histograms as the FIFO Cache-Map provides entries.

If the external memory is ready to be accessed, the Memory Adapter takes the first entry from the FIFO Cache-Map. This entry contains the pair of event_id and cache_id. The cached histogram and the externally stored histogram are read simultaneously and the corresponding bins are summed up. Afterwards, the result is written back to the external memory.

The efficiency of the FIFO Cache-Map depends on the executed program. The performance improves when the same set of event_ids is aggregated continuously. If an overflow of the FIFO occurs, its size needs to be increased.

VII. COMPUTING EXECUTION TIME PROFILES

ETPs can easily be computed from histograms by normalising each value from raw counts to its corresponding percentage in the histogram. This gives the probability for an execution to have an execution time corresponding to a given bin in the histogram (i.e. the probability mass function). When interested in worst-case behaviour, other representations of ETPs might be of interest, in particular the complementary cumulative representation [19]. There, for each bin, we calculate the probability that an execution has an execution time greater than the given bin. This is done by computing $1 - \sum_{i \le j} \mathit{bin}_i$ for each bin $j$, where $\mathit{bin}_i$ denotes the normalised value of bin $i$.
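Both representations can be illustrated with a short Python sketch. The function names `etp_pmf` and `etp_ccdf` and the example counts are hypothetical and serve only to show the normalisation and the complementary cumulative sum described above.

```python
def etp_pmf(counts):
    """Normalise raw bin counts to the probability of each bin."""
    total = sum(counts)
    return [c / total for c in counts]

def etp_ccdf(counts):
    """For each bin j, the probability that an execution time exceeds bin j,
    i.e. 1 minus the sum of the probabilities of bins 0..j."""
    total = sum(counts)
    result, seen = [], 0
    for c in counts:
        seen += c
        # Work on integer counts and divide once to limit rounding error.
        result.append((total - seen) / total)
    return result

example_counts = [1, 2, 1]                   # hypothetical histogram
example_pmf = etp_pmf(example_counts)        # [0.25, 0.5, 0.25]
example_ccdf = etp_ccdf(example_counts)      # [0.75, 0.25, 0.0]
```

The last entry of the complementary cumulative representation is always zero, because no event can fall beyond the open-ended last bin.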
Figures 8 and 9 show the ETPs in complementary cumulative representation for the measured histograms. Note that events from the last bin of a histogram, i.e. the bin that spans until infinity, are not properly represented in the ETPs, as we assumed that these events have a value of $x^u_{k-2} + 1$ during conversion.

The aforementioned ETPs directly correspond to the measured histograms based on the function event stream. We also measured histograms based on the waypoint edge event stream. This gives ETPs for individual edges in the WPG. Although these ETPs might be interesting in themselves, a more natural (but also more coarse-grained) view on performance asks for the computation of ETPs for whole functions based on the ETPs of the individual edges of a function.

Fig. 7. Histogram caching and storing module.

Two operations are needed to carry out this computation: (a) choice and (b) convolution. Choice is used to calculate the least upper bound of two paths in the WPG. Depending on the context, we either maximise or minimise with the choice operation. Convolutions are used to construct paths through the WPG out of individual edges. If a function contains loops, the application of choice and convolution needs to be iterated to generate all possible paths through the loop.

Depending on the purpose of the analysis, different convolutions can be used to construct paths. Under the assumption that the execution times of edges are independent of each other, one can use the Gaussian convolution. However, in the presence of caches and other hardware features which take the history of instructions into account, this assumption is rather simplistic. For worst-case or best-case assessments, one can use the supremal convolution or the infimal convolution, respectively. However, both might heavily overestimate (or underestimate) the real probabilities for a given execution time. Convolutions that correctly model the features that influence the execution time depend on the hardware on which a program is executed. They might be arbitrarily complex and are not in the scope of this publication. In Section VIII-D, we compare the ETPs derived with the help of different convolutions with an ETP that has been measured.

VIII. USE CASE: ANALYSIS OF THE DEBIE1 BENCHMARK

A. Settings

The target (COTS) SoC in our prototype is a Xilinx Zynq XC7Z020 featuring a dual-core ARM Cortex-A9 running at 667 MHz. The FPGA part of this SoC was only used to route the trace data to the timing analysis platform, which utilises a Xilinx Virtex-7 FPGA. The memory subsystem of the SoC consists of separate L1 instruction and data caches, each storing 32 kilobytes, 512 kilobytes of shared L2 cache and 1 gigabyte of DDR main memory. The debie1 benchmark was running on one core. On the second core, a custom benchmark in a FreeRTOS instance was running to generate interference on the shared L2 cache and the shared interconnects. This program consisted of multiple threads which communicated over a shared buffer. The debie1 benchmark was compiled with the C++ compiler provided with the Xilinx SDK 2016.1, GNU C/C++ (prerelease), with flags -mcpu=cortex-a9, -mfpu=vfpv3, -mfloat-abi=hard, -g3 and -O.

B. The Benchmark

The debie1 benchmark [20], [21] is based on the on-board software of the DEBIE-1 satellite instrument for measuring impacts of small space debris or micro-meteoroids. It defines six analysis problem sets, each derived from the original real-time requirements of the satellite instrument. For example, one problem considers the required deadline of the Interrupt Service Routine (ISR) TM_InterruptService.

For our evaluation, we measured the execution times of the four tasks and two ISRs of the debie1 benchmark. Table II shows the number of observed function events for each of them. It also shows the reference value used for the centered bin distributions (Figure 9) as well as the minimal and maximal observed execution times of the tasks and ISRs. The aforementioned ISR TM_InterruptService, for example, had a maximal observed execution time of 239 cycles over all its calls during the execution of the benchmark.

C. Analysis Based on Function Events

Figures 8 and 9 depict the results of our measurements of the tasks and ISRs of the debie1 benchmark. Histograms are shown on the left (in red) and ETPs are shown on the right (in blue). Figure 8 shows the histograms and ETPs when using a linear bin distribution with 128 bins, each of width 8. Figure 9 shows the histograms and ETPs when using a centered bin distribution.

Fig. 8. Histograms and ETPs of the four tasks and two ISRs of the debie1 benchmark (panels: HandleAcquisition, HandleHealthMonitoring, HandleHitTrigger, HandleTelecommand, TC_InterruptService, TM_InterruptService). Configuration: linear bin distribution, k = 128 bins, step = 8.

TABLE II
NUMBER OF FUNCTION EVENTS AND REFERENCE VALUES AS WELL AS MINIMAL AND MAXIMAL OBSERVED EXECUTION TIMES OF THE TASKS AND ISRS OF THE DEBIE1 BENCHMARK.

Columns: Name, #Events, Reference, Min., Max.
Rows: HandleAcquisition, HandleHealthMonitoring, HandleHitTrigger, HandleTelecommand, TC_InterruptService, TM_InterruptService

By construction, the histograms using a linear bin distribution can only track events well that have an execution time of less than 1017 cycles. All events with a higher execution time are accumulated in the last bin.
This works well for the ISRs TC_InterruptService and TM_InterruptService, which have maximal observed execution times below this threshold. For the task HandleTelecommand, only a few events exceed this threshold, and thus the histogram represents the execution time distribution of this task rather well. However, since the real maximal observed execution time cannot be tracked by the histogram, the resulting ETP has been cut off at 1017 cycles. This also happens for the tasks HandleAcquisition, HandleHealthMonitoring and HandleHitTrigger. For the last two tasks, we can in fact infer no meaningful ETPs with a linear bin distribution, as almost all events have associated execution times above the threshold.

For HandleHealthMonitoring, this problem can be solved with the centered bin distribution. The histogram nicely shows how the execution time of most runs is distributed. For the other tasks and ISRs, the reference values do not fit the distributions well, and therefore most events are put either in the first or in the last bin. The centered bin distribution is thus very sensitive to the choice of reference values.

Overall, we obtain good results for those tasks and ISRs having low execution times (with the linear bin distribution) and for HandleHealthMonitoring, where we have a good reference value. Our method performs particularly badly for HandleAcquisition and HandleHitTrigger, which have a wide spread between the minimal and maximal observed execution times and low reference values (see Table II).

[Fig. 9. Histograms and ETPs of the four tasks and two ISRs of the debie1 benchmark. Panels: HandleAcquisition, HandleHealthMonitoring, HandleHitTrigger, HandleTelecommand, TC_InterruptService and TM_InterruptService; histogram bins are labelled with their upper bounds in cycles. Configuration: centered bin distribution, k = 128 bins, k_l = 1, f_comp = 1.]

D. Analysis Based on Waypoint Edge Events

Another point of interest for us was the comparison of ETPs derived from function measurements with ETPs derived from edge measurements (see Figure 10). Here, we took a closer look at TM_InterruptService. We took the ETPs of its individual edges and combined them using three different convolutions: (a) the Gaussian convolution, (b) the infimal convolution and (c) the supremal convolution. For (a) and (c), we used maximisation as the choice operation; for (b), we used minimisation. In our example, the infimal and supremal convolutions were of lesser use, as they heavily under- and overestimated the probabilities of overrunning a given deadline. The Gaussian convolution proved to be too simplistic: it contains path combinations that could never be observed in the measurements. Consequently, it predicts that the execution time of this ISR is below 548 cycles in % of all runs, whereas the measured ETP predicts that the execution time is below 24 cycles in % of all runs.

E. Resource Usage

The Histogram Module has been implemented on a Xilinx Virtex-7 XC7V585T FPGA. As shown in Table III, the characteristics of the events depend on whether they are edge or function events. The former have a shorter average cycle count, which results in a higher rate of new events to be processed. The overall number of unique event IDs is rather low in both cases, which makes it possible to store the results solely in BRAM. In doing so, we miss the original objective of a clock period shorter than 5 ns by .2 ns (equates to MHz). The resulting hardware consumption can be seen in Table IV. The clock period can be further reduced by using the memory buffer implementation with external memory. In this case, we achieve an overall clock period of 4.89 ns (equates to 204.5 MHz). The limiting factor is the memory buffer and its FIFO size, which should not exceed 32 entries. If more entries are needed, the FIFO needs to be multiplexed. We interpolated the resulting hardware requirements for a FIFO size of 128 (see Table V).
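The relation between the clock periods and the clock frequencies quoted for the two implementations can be checked with a short calculation. The Python sketch below is only a worked example; the helper names are hypothetical, and the 5 ns target and the 4.89 ns period are the values stated in the text.

```python
def period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz.

    f [MHz] = 1000 / T [ns], since 1 / (1 ns) = 1000 MHz.
    """
    if period_ns <= 0:
        raise ValueError("clock period must be positive")
    return 1000.0 / period_ns


def meets_target(period_ns, target_ns=5.0):
    """Check whether a design is faster than the target clock period."""
    return period_ns < target_ns


# A 5 ns target period corresponds to 200 MHz.
print(period_ns_to_mhz(5.0))
# The external-memory variant with a 4.89 ns period runs at about 204.5 MHz.
print(round(period_ns_to_mhz(4.89), 1))
print(meets_target(4.89))
```

Under this reading, any implementation with a period below 5 ns clocks faster than 200 MHz, which is why the 4.89 ns variant satisfies the original objective.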

[Fig. 10. ETPs of TM_InterruptService, computed from the edge histograms with (a) Gaussian convolution, (b) infimal convolution or (c) supremal convolution.]

TABLE III. PROCESSED EVENTS AND CONSEQUENTIAL REQUIREMENTS FOR FIFO SIZES WHEN USING EXTERNAL MEMORY
Columns: Unique IDs, Avg. Cycles, FIFO Avg. Size, FIFO Max. Size. Rows: Function Events, Edge Events.

TABLE IV. RESOURCE USAGE OF IMPLEMENTATION USING BRAM MEMORY
Columns: LUTs, Regs, BRAM, Delay (ns). Rows: Bin Calculation, BRAM Storage, Total.

TABLE V. RESOURCE USAGE OF IMPLEMENTATION USING EXTERNAL MEMORY, WITH A MEMORY BUFFER FIFO SIZE OF 128
Columns: LUTs, Regs, BRAM, Delay (ns). Rows: Bin Calculation, Memory Buffer, Total.

IX. CONCLUSION

In this contribution, we have shown that, beyond normal WCET analysis, advanced tools and HW architectures are capable of capturing ETPs of modern SoCs, which cannot be analysed by other means. The challenge was to make this HW-based analysis fast enough to run in parallel even to fast SoCs. This feature enables the developer to go through rather long and complex test cycles with the software. The gathered ETPs help us to identify those software parts with a high variance. They also enable us to make statistical predictions of upper bounds of the execution time with certain confidence intervals.

One important point to consider is the choice of the bin distribution. The linear bin distribution works well when an upper bound of the execution time is known (or can be easily estimated). The quality of the results when using the centered bin distribution strongly depends on the chosen reference value. Finding good heuristics for choosing the right bin distribution remains future work.

In the future, we want to provide the user with a presentation of the results on a more abstract basis. In particular, this could mean identifying certain types of statistical distributions and computing the relevant parameters of such distributions.
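As a simple illustration of how an ETP can be used to reason about deadlines, the following Python sketch computes an empirical overrun probability from a binned ETP. It is a minimal sketch under stated assumptions, not a method from this paper: the function name and the example values are hypothetical, each bin is described by its upper bound in cycles, and a bin is counted as overrunning as soon as its upper bound exceeds the deadline, which is conservative at bin granularity. It yields an observed frequency only and no statistical confidence bound.

```python
def exceedance_probability(bin_upper_bounds, probabilities, deadline):
    """Empirical probability that the execution time exceeds the deadline.

    bin_upper_bounds: upper bound of each bin in cycles (ascending)
    probabilities:    relative frequency of each bin (sums to 1)
    A bin contributes fully if its upper bound lies above the deadline.
    """
    if len(bin_upper_bounds) != len(probabilities):
        raise ValueError("need one probability per bin")
    return sum(
        p for upper, p in zip(bin_upper_bounds, probabilities)
        if upper > deadline
    )


# Hypothetical ETP with four bins.
bounds = [100, 200, 300, 400]
etp = [0.5, 0.3, 0.15, 0.05]
print(exceedance_probability(bounds, etp, deadline=250))
```

Because the last bin of a cut-off histogram also collects all longer execution times, a deadline at or above the last bin boundary cannot be assessed with such a model.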
ACKNOWLEDGMENT

The authors would like to thank Alexander Lange and Alexander Weiss of Accemic for providing the waypoint edge event stream.


Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling Tobias Schwarzer 1, Joachim Falk 1, Michael Glaß 1, Jürgen Teich 1, Christian Zebelein 2, Christian

More information

Parametric Timing Analysis for Complex Architectures

Parametric Timing Analysis for Complex Architectures Parametric Timing Analysis for Complex Architectures Sebastian Altmeyer Department of Computer Science Saarland University altmeyer@cs.uni-sb.de Björn Lisper Department of Computer Science and Electronics

More information

Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University

Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage silage@temple.edu ECE Temple University www.temple.edu/scdl Signal Processing Algorithms into Fixed Point FPGA Hardware Motivation

More information

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009 Digital Signal Processing 8 December 24, 2009 VIII. DSP Processors 2007 Syllabus: Introduction to programmable DSPs: Multiplier and Multiplier-Accumulator (MAC), Modified bus structures and memory access

More information

A Template for Predictability Definitions with Supporting Evidence

A Template for Predictability Definitions with Supporting Evidence A Template for Predictability Definitions with Supporting Evidence Daniel Grund 1, Jan Reineke 2, and Reinhard Wilhelm 1 1 Saarland University, Saarbrücken, Germany. grund@cs.uni-saarland.de 2 University

More information

Improving Timing Analysis for Matlab Simulink/Stateflow

Improving Timing Analysis for Matlab Simulink/Stateflow Improving Timing Analysis for Matlab Simulink/Stateflow Lili Tan, Björn Wachter, Philipp Lucas, Reinhard Wilhelm Universität des Saarlandes, Saarbrücken, Germany {lili,bwachter,phlucas,wilhelm}@cs.uni-sb.de

More information

Static WCET Analysis: Methods and Tools

Static WCET Analysis: Methods and Tools Static WCET Analysis: Methods and Tools Timo Lilja April 28, 2011 Timo Lilja () Static WCET Analysis: Methods and Tools April 28, 2011 1 / 23 1 Methods 2 Tools 3 Summary 4 References Timo Lilja () Static

More information

Eliminating Annotations by Automatic Flow Analysis of Real-Time Programs

Eliminating Annotations by Automatic Flow Analysis of Real-Time Programs Eliminating Annotations by Automatic Flow Analysis of Real-Time Programs Jan Gustafsson Department of Computer Engineering, Mälardalen University Box 883, S-721 23 Västerås, Sweden jangustafsson@mdhse

More information

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern Introduction WCET of program ILP Formulation Requirement SPM allocation for code SPM allocation for data Conclusion

More information

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System HU WEI CHEN TIANZHOU SHI QINGSONG JIANG NING College of Computer Science Zhejiang University College of Computer Science

More information

Timed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of SOC Design

Timed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of SOC Design IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 1, JANUARY 2003 1 Timed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of

More information

SIC: Provably Timing-Predictable Strictly In-Order Pipelined Processor Core

SIC: Provably Timing-Predictable Strictly In-Order Pipelined Processor Core SIC: Provably Timing-Predictable Strictly In-Order Pipelined Processor Core Sebastian Hahn and Jan Reineke RTSS, Nashville December, 2018 saarland university computer science SIC: Provably Timing-Predictable

More information

The Worst-Case Execution-Time Problem Overview of Methods and Survey of Tools

The Worst-Case Execution-Time Problem Overview of Methods and Survey of Tools The Worst-Case Execution-Time Problem Overview of Methods and Survey of Tools REINHARD WILHELM TULIKA MITRA Saarland University National University of Singapore JAKOB ENGBLOM FRANK MUELLER Virtutech AB

More information

This is the forth SAP MaxDB Expert Session and this session covers the topic database performance analysis.

This is the forth SAP MaxDB Expert Session and this session covers the topic database performance analysis. 1 This is the forth SAP MaxDB Expert Session and this session covers the topic database performance analysis. Analyzing database performance is a complex subject. This session gives an overview about the

More information

Dissecting Execution Traces to Understand Long Timing Effects

Dissecting Execution Traces to Understand Long Timing Effects Dissecting Execution Traces to Understand Long Timing Effects Christine Rochange and Pascal Sainrat February 2005 Rapport IRIT-2005-6-R Contents 1. Introduction... 5 2. Long timing effects... 5 3. Methodology...

More information

Tradeoff Analysis and Architecture Design of a Hybrid Hardware/Software Sorter

Tradeoff Analysis and Architecture Design of a Hybrid Hardware/Software Sorter Tradeoff Analysis and Architecture Design of a Hybrid Hardware/Software Sorter M. Bednara, O. Beyer, J. Teich, R. Wanka Paderborn University D-33095 Paderborn, Germany bednara,beyer,teich @date.upb.de,

More information

Statistical Timing Analysis Using Bounds and Selective Enumeration

Statistical Timing Analysis Using Bounds and Selective Enumeration IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 9, SEPTEMBER 2003 1243 Statistical Timing Analysis Using Bounds and Selective Enumeration Aseem Agarwal, Student

More information

An Approach to Task Attribute Assignment for Uniprocessor Systems

An Approach to Task Attribute Assignment for Uniprocessor Systems An Approach to ttribute Assignment for Uniprocessor Systems I. Bate and A. Burns Real-Time Systems Research Group Department of Computer Science University of York York, United Kingdom e-mail: fijb,burnsg@cs.york.ac.uk

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

Evaluating Static Worst-Case Execution-Time Analysis for a Commercial Real-Time Operating System

Evaluating Static Worst-Case Execution-Time Analysis for a Commercial Real-Time Operating System Evaluating Static Worst-Case Execution-Time Analysis for a Commercial Real-Time Operating System Daniel Sandell Master s thesis D-level, 20 credits Dept. of Computer Science Mälardalen University Supervisor:

More information

Predicated Software Pipelining Technique for Loops with Conditions

Predicated Software Pipelining Technique for Loops with Conditions Predicated Software Pipelining Technique for Loops with Conditions Dragan Milicev and Zoran Jovanovic University of Belgrade E-mail: emiliced@ubbg.etf.bg.ac.yu Abstract An effort to formalize the process

More information

CONTENTION IN MULTICORE HARDWARE SHARED RESOURCES: UNDERSTANDING OF THE STATE OF THE ART

CONTENTION IN MULTICORE HARDWARE SHARED RESOURCES: UNDERSTANDING OF THE STATE OF THE ART CONTENTION IN MULTICORE HARDWARE SHARED RESOURCES: UNDERSTANDING OF THE STATE OF THE ART Gabriel Fernandez 1, Jaume Abella 2, Eduardo Quiñones 2, Christine Rochange 3, Tullio Vardanega 4 and Francisco

More information

Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors

Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors Francisco Barat, Murali Jayapala, Pieter Op de Beeck and Geert Deconinck K.U.Leuven, Belgium. {f-barat, j4murali}@ieee.org,

More information

The Argo software perspective

The Argo software perspective The Argo software perspective A multicore programming exercise Rasmus Bo Sørensen Updated by Luca Pezzarossa April 4, 2018 Copyright 2017 Technical University of Denmark This work is licensed under a Creative

More information

A Frame Study for Post-Processing Analysis on System Behavior: A Case Study of Deadline Miss Detection

A Frame Study for Post-Processing Analysis on System Behavior: A Case Study of Deadline Miss Detection Journal of Computer Science 6 (12): 1505-1510, 2010 ISSN 1549-3636 2010 Science Publications A Frame Study for Post-Processing Analysis on System Behavior: A Case Study of Deadline Miss Detection Junghee

More information

AirTight: A Resilient Wireless Communication Protocol for Mixed- Criticality Systems

AirTight: A Resilient Wireless Communication Protocol for Mixed- Criticality Systems AirTight: A Resilient Wireless Communication Protocol for Mixed- Criticality Systems Alan Burns, James Harbin, Leandro Indrusiak, Iain Bate, Robert Davis and David Griffin Real-Time Systems Research Group

More information

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System HU WEI, CHEN TIANZHOU, SHI QINGSONG, JIANG NING College of Computer Science Zhejiang University College of Computer

More information

A Direct Memory Access Controller (DMAC) IP-Core using the AMBA AXI protocol

A Direct Memory Access Controller (DMAC) IP-Core using the AMBA AXI protocol SIM 2011 26 th South Symposium on Microelectronics 167 A Direct Memory Access Controller (DMAC) IP-Core using the AMBA AXI protocol 1 Ilan Correa, 2 José Luís Güntzel, 1 Aldebaro Klautau and 1 João Crisóstomo

More information

Timing Analysis of Automatically Generated Code by MATLAB/Simulink

Timing Analysis of Automatically Generated Code by MATLAB/Simulink Timing Analysis of Automatically Generated Code by MATLAB/Simulink Rômulo Silva de Oliveira, Marcos Vinicius Linhares, Ricardo Bacha Borges Systems and Automation Department - DAS Federal University of

More information

Performance Tuning on the Blackfin Processor

Performance Tuning on the Blackfin Processor 1 Performance Tuning on the Blackfin Processor Outline Introduction Building a Framework Memory Considerations Benchmarks Managing Shared Resources Interrupt Management An Example Summary 2 Introduction

More information

Integration of Code-Level and System-Level Timing Analysis for Early Architecture Exploration and Reliable Timing Verification

Integration of Code-Level and System-Level Timing Analysis for Early Architecture Exploration and Reliable Timing Verification Integration of Code-Level and System-Level Timing Analysis for Early Architecture Exploration and Reliable Timing Verification C. Ferdinand 1, R. Heckmann 1, D. Kästner 1, K. Richter 2, N. Feiertag 2,

More information

A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems

A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems Martin Schoeberl, Florian Brandner, Jens Sparsø, Evangelia Kasapaki Technical University of Denamrk 1 Real-Time Systems

More information

DESIGN AND IMPLEMENTATION OF SDR SDRAM CONTROLLER IN VHDL. Shruti Hathwalia* 1, Meenakshi Yadav 2

DESIGN AND IMPLEMENTATION OF SDR SDRAM CONTROLLER IN VHDL. Shruti Hathwalia* 1, Meenakshi Yadav 2 ISSN 2277-2685 IJESR/November 2014/ Vol-4/Issue-11/799-807 Shruti Hathwalia et al./ International Journal of Engineering & Science Research DESIGN AND IMPLEMENTATION OF SDR SDRAM CONTROLLER IN VHDL ABSTRACT

More information