COTS Multicore Processors in Avionics Systems: Challenges and Solutions

COTS Multicore Processors in Avionics Systems: Challenges and Solutions Dionisio de Niz Bjorn Andersson and Lutz Wrage dionisio@sei.cmu.edu, baandersson@sei.cmu.edu, lwrage@sei.cmu.edu

Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. 1. REPORT DATE 06 JAN 2015 2. REPORT TYPE N/A 3. DATES COVERED 4. TITLE AND SUBTITLE COTS Multicore Processors in Avionics Systems: Challenges and Solutions 6. AUTHOR(S) Wrage /Dionisio de Niz Bjorn Andersson Lutz 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) Software Engineering Institute Carnegie Mellon University Pittsburgh, PA 15213 8. PERFORMING ORGANIZATION REPORT NUMBER 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR S ACRONYM(S) 12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release, distribution unlimited. 13. SUPPLEMENTARY NOTES The original document contains color images. 14. ABSTRACT 15. SUBJECT TERMS 11. SPONSOR/MONITOR S REPORT NUMBER(S) 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT SAR a. REPORT unclassified b. ABSTRACT unclassified c. THIS PAGE unclassified 18. NUMBER OF PAGES 28 19a. NAME OF RESPONSIBLE PERSON Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18

Copyright 2015 Carnegie Mellon University This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN AS-IS BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. This material has been approved for public release and unlimited distribution. This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at permission@sei.cmu.edu. Carnegie Mellon is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University. DM-0002049 2

Why Multi-Core Processors? Processor development trend Increasing overall performance by integrating multiple cores Embedded systems: Actively adopting multi-core CPUs Automotive: Freescale i.mx6 4-core CPU NVIDIA Tegra K1 platform Avionics and defense: Rugged Intel i7 single board computers Freescale P4080 8-core CPU 3

Shared Hardware: Multicore Memory System Core 1 L1/L2 Core 2 L1/L2 Core 3 L1/L2 Core N L1/L2 Last-Level Cache (L3) Memory Bus (and Mem Controller) DRAM Bank 0 DRAM Bank 1 DRAM Bank 2 DRAM Bank 2 DRAM Bank B 4

Cache Interference Across Cores X Core 1 XY Y Core 2 Cache 5

Bank Interference Across Cores BANK columns Core 1 X Y Z A B C rows Core 2 AX BY CZ 6

Impact of Memory Interference 1 attacker Max 5.5x increase 2 attackers Max 8.4x increase 3 attackers Max 12x increase We should predict, bound and reduce the memory interference delay! Norm. execution time (%) 1200 1000 800 600 400 200 0 blackscholes bodytrack 12x increase observed canneal ferret fluidanimate freqmine raytrace streamcluster swaptions vips x264 7

Cache / Bank Partitioning (Coloring) Main Mem Virtual Pages Cache Set 8

Cache / Bank Partitioning (Coloring) Main Mem Banks Virtual Pages 9

Cache / Bank Partitioning (Coloring) Main Mem Set associativity Virtual Pages Cache Address bits Cache sets 16 15 14 13 12 One page 6 Cache Index 10

Cache and Bank Address Bits Cache Index Address bits 20 19 18 17 16 15 14 13 12 XOR XOR XOR Bank Index E.g. 2 bank bits 2 cache bits 1 shared bit 11

Coordinated Cache and Bank Partitioning (Private Partitions) Avoid conflicting color assignments Take advantage of different conflict behaviors Banks can be shared within same core but not across cores Cache cannot be shared within or across cores Take advantage of sensitivity of execution time to cache Task with highest sensitivity to cache is assigned more cache Diminishing returns taken into account Two algorithms explored Mixed-Integer Linear Programming Knapsack 12

Experimental Results 180% 170% 160% Cache coloring only Our coordinated approach 150% 140% 130% 120% 110% 100% 90% 80% PS.canneal PS.ferret PS.streamcluster PS.fluidanimate PS.facesim PS.freqmine PS.x264 SPEC.leslie3d SPEC.mcf SPEC.milc SPEC.sphinx3 13

Shared Bank Partitioning Explicitly considers the timing characteristics of major DRAM resources Rank/bank/bus timing constraints (JEDEC standard) Request re-ordering effect Bounding memory interference delay for a task Combines request-driven and job-driven approaches Task s own memory requests Interfering memory requests during the job execution Software DRAM bank partitioning awareness Analyzes the effect of dedicated and shared DRAM banks 14

DRAM Organization DRAM Rank Command bus Address bus Data bus 64-bit CHIP 1 8-bit CHIP 2 CHIP 3 CHIP 4 CHIP 5 CHIP 6 CHIP 7 CHIP 8 DRAM Chip Bank 8 Bank... Bank 1 Columns Command bus Address bus Data bus Command decoder 8-bit Row address Row decoder Column address Row buffer Column decoder Rows Row hit Row conflict DRAM access latency varies depending on which row is stored in the row buffer 15

Memory Controller Memory requests from CPU cores Processor data bus Request buffer Bank 1 Priority Queue Bank 2 Priority Queue... Bank n Priority Queue Memory scheduler Bank 1 Scheduler Bank 2 Scheduler... Bank n Scheduler Read/ Write Buffers Two-level hierarchical scheduling structure Channel Scheduler DRAM address/command buses DRAM data bus 16

Memory Scheduling Policy FR-FCFS: First-Ready, First-Come First-Serve Goal: maximize DRAM throughput Maximize row buffer hit rate Bank 1 Scheduler Bank 2 Scheduler... Bank n Scheduler 1. Bank scheduler Considers bank timing constraints Prioritizes row-hit requests In case of tie, prioritizes older requests Channel Scheduler 2. Channel scheduler Considers channel timing constraints Prioritizes older requests Memory access interference occurs at both bank and channel schedulers Intra-bank interference at bank scheduler Inter-bank interference at channel scheduler 17

DRAM Bank Partitioning Prevents intra-bank interference by dedicating different DRAM banks to each core Can be supported in the OS kernel (1) w/o bank partitioning Core 1 Core 2 (2) w/ bank partitioning Core 1 Core 2 Bank 1 Scheduler Bank 2 Scheduler Bank 1 Scheduler Bank 2 Scheduler Channel Scheduler Channel Scheduler Intra-bank and inter-bank interference Only inter-bank interference 18

Bounding Memory Interference Delay 1. Request-Driven Bounding 2. Job-Driven Bounding Response-Time Based Schedulability Analysis 19

Response-Time Test Memory interference delay cannot exceed any results from the RD and JD approaches We take the smaller result from the two approaches Extended response-time test Classical iterative response-time test Request-Driven (RD) Approach Job-Driven (JD) Approach 20

Experiment Severe Memory Interference Private DRAM Bank Norm. Response Time (%) 500 400 300 200 100 Observed Predicted 4.1x increase DRAM bank partitioning helps reducing the memory interference 0 blackscholes bodytrack canneal ferret fluidanimate freqmine raytrace streamcluster swaptions vips x264 Our analysis enables the quantification of the benefit of DRAM bank partitioning 21

Non-Severe Memory Interference Private DRAM Bank Norm. Response Time (%) 180 160 140 120 100 80 60 40 20 0 Observed Predicted blackscholes bodytrack Average over-estimates are 8% (13% for a shared bank) canneal ferret fluidanimate freqmine raytrace streamcluster swaptions vips x264 Our analysis bounds memory interference delay with low pessimism under both high and low memory contentions 22

Implementation Cache and Bank Partitioning implemented in Linux/RK Associates Resource Reservations to Linux Threads Memory reservation Cache reservation CPU reservation Portable Kernel Module Hooks into on-demand page allocation At boot time create large memory reserve Pages are classified in cache and bank colors 23

Model Problem Ball-following Controller on Odroid+Linux/RK Memory interference Fixed distance Ball-Following: Keep fixed distance as ball moves around 24

Experimental Results (1) 0.4-0.3- ~ 0.2-1- 0.1-0.0- Frame Processing (no protection, no attackers)...,. J -.....: - -<'--~.:~-~~- ---.. ' \.:.- c._..-:&.: :r. I I I I 0 20 40 60 Time [s] Frame inter- arrival time Frame processing time Software Engineering Institute I Ca.rnegieMellon 25

Experimental Results (2) 0.4- Frame Processing (no protection, 3 attackers) l- n en an ann s nn a en w s a t111eo sa a 6 U 7 T&LlP 7 7 A 7 7 7 S,..., (J)... E o.2-0.1-0.0- I 0 I I I 20 40 60 Time [s] Frame inter- arrival time Frame processing time Deadlines misses at 0.2. only 195 out of 279 frames processed (30% loss) Software Engineering Institute I CarnegieMellon 26

Experimental Results (3) 0.4-0.3- l- Frame Processing (with protection, 3 attackers) " IS - tf - e -... -........ ~t'.t r nlv -:,. t:+:ja 4ki'Dztcl OtPCUsloutJ'tlt& Jtl,.-...,..., (J)... E o.2-0.1-0.0- I I I I 0 20 40 60 Time [s] Frame inter- arrival time Frame processing time No deadline misses. Processing below 0.2 s interarrival (period) Software Engineering Institute I CamegieMellon 27

Concluding Remarks Multicore processor challenges previous results in real-time systems Interference from shared hardware Cache, Memory banks, Memory bus Leads to less usable processing capacity 1200% increase in a four core machine (92% reduction from single core) Our approach Coordinated private partitions for cache and memory Shared bank partitions Implemented in Linux/RK Experimental results for model avionics application Protects control algorithm from interference 28