Best Practice for Caching of Single-Path Code


1 Best Practice for Caching of Single-Path Code
Martin Schoeberl, Bekim Cilku, Daniel Prokesch, and Peter Puschner
Technical University of Denmark, Vienna University of Technology

2 Context
- Real-time systems
  - Worst-case execution time (WCET) counts
- Different from average-case performance
  - Standard processors are optimized for average-case performance
- Design a processor and a compiler for real-time systems
  - The T-CREST approach

3 T-CREST
- Time-predictable multicore
  - Processor
  - Network-on-chip
  - Memory hierarchy
  - Compiler
  - WCET analysis (AbsInt aiT and platin)
- Most parts open-source
[Figure: T-CREST chip with three Patmos cores; labels: R, NI, SPM, M$, S/D$, Memory Tree, Memory Controller, SDRAM Memory]

4 Patmos Processor
- Time-predictable processor
- Called Patmos
- Flexibility to define the instruction set
  - LLVM compiler adapted for Patmos
- Co-design for low WCET of
  - Patmos
  - Compiler
  - WCET analysis

5 Patmos Processor
- RISC pipeline
- Dual issue
- Special caches
- No time dependency between instructions
[Figure: Patmos pipeline with the stages Fetch, Decode, Execute, Memory, and Writeback; labels: M$, PC, IR, Dec, RF, S$, D$, SP]

6 Hardware Description
- Chisel
  - Scala-embedded language
  - Higher level than VHDL/Verilog
- Generates two versions
  - C++-based emulator
  - Verilog-based hardware description
- Cycle-accurate emulation in C++ is faster than VHDL/Verilog simulation
  - Based on the hardware description

7 Single-path Programming
- Remove input-data-dependent control flow decisions
  - Gives constant execution time
  - Uses predicates (heavily)
- If-conversion
  - Execute both branches
  - Use the if condition for result write-back
- Constant loop iterations
  - Use loop bounds
  - Use the exit condition for result write-back
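
The if-conversion listed above can be illustrated in plain Python. This is a minimal sketch, not the output of the T-CREST compiler: the function names `clamp_branching` and `clamp_single_path` are hypothetical, and Python itself gives no timing guarantee. The point is only that both candidate results are computed and the condition is used solely to select the value that is written back.

```python
def clamp_branching(x, limit):
    """Conventional version: control flow depends on the input data."""
    if x > limit:
        return limit
    return x


def clamp_single_path(x, limit):
    """If-converted version: both 'branches' are evaluated unconditionally."""
    pred = int(x > limit)      # predicate: 1 if the condition holds, else 0
    then_value = limit         # result of the 'then' branch
    else_value = x             # result of the 'else' branch
    # The predicate only selects which result is written back.
    return pred * then_value + (1 - pred) * else_value
```

Both functions return the same value for every input; only the second one executes the same sequence of operations regardless of `x`.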

8 Single-path Programming
- Loops need to be bounded
  - In WCET-analyzable programs anyway
- T-CREST compiler can generate single-path code from C programs
  - For non-recursive programs
- Simply measure execution time
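
Because a single-path program executes the same sequence of operations for every input, one measurement is representative. The sketch below shows the loop transformation in Python: a linear search always runs to its loop bound and uses the exit condition only to guard the result write-back. `MAX_LEN`, `find_single_path`, and the iteration counter are illustrative assumptions, and counting iterations merely stands in for measuring cycles on real hardware.

```python
MAX_LEN = 8  # assumed loop bound, known before the program runs


def find_single_path(values, key):
    """Return (index of first match or -1, number of loop iterations executed)."""
    assert len(values) == MAX_LEN
    result = -1
    found = 0
    iterations = 0
    for i in range(MAX_LEN):           # always MAX_LEN iterations, no early exit
        hit = int(values[i] == key)
        write = hit * (1 - found)      # write back only on the first match
        result = write * i + (1 - write) * result
        found = found | hit
        iterations += 1
    return result, iterations
```

Searching for a key at the front, at the end, or not present at all reports the same iteration count, which is why a single measurement suffices in this model.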

9 Single-Path Support in Patmos
- Constant execution time of all instructions
- Predicated instructions
  - 8 predicates
  - One is constant true
  - Write result when predicate is true
  - Otherwise do nothing (NOP instruction)
- All instructions are predicated
  - Execution time independent of the predicate
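
The predication scheme listed above can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the Patmos instruction set: the class name `PredicatedCore`, the choice of predicate 0 as the constant-true predicate, the register file size, the one-cycle cost per instruction, and the two operations `add` and `cmp_lt` are all made up for illustration.

```python
class PredicatedCore:
    """Toy model: every instruction is guarded by one of 8 predicates."""

    NUM_PREDICATES = 8

    def __init__(self):
        self.regs = [0] * 32                      # assumed register file size
        self.preds = [False] * self.NUM_PREDICATES
        self.preds[0] = True                      # assumption: predicate 0 is constant true
        self.cycles = 0

    def add(self, guard, rd, rs1, rs2):
        """rd = rs1 + rs2 if the guard predicate is true, otherwise no effect."""
        value = self.regs[rs1] + self.regs[rs2]   # computed in either case
        if self.preds[guard]:                     # models the write enable
            self.regs[rd] = value
        self.cycles += 1                          # same cost whether or not it writes

    def cmp_lt(self, guard, pd, rs1, rs2):
        """pd = (rs1 < rs2) if the guard predicate is true."""
        if pd == 0:
            raise ValueError("predicate 0 is read-only in this model")
        value = self.regs[rs1] < self.regs[rs2]
        if self.preds[guard]:
            self.preds[pd] = value
        self.cycles += 1


def run_max(a, b):
    """Compute max(a, b) without a branch; returns (result, cycles)."""
    core = PredicatedCore()
    core.regs[1], core.regs[2] = a, b   # register 0 stays 0 in this sketch
    core.add(0, 3, 1, 0)                # (p0) r3 = r1 + r0, always writes
    core.cmp_lt(0, 1, 1, 2)             # (p0) p1 = r1 < r2
    core.add(1, 3, 2, 0)                # (p1) r3 = r2 + r0, writes only if p1
    return core.regs[3], core.cycles
```

In this model `run_max` takes three cycles for any pair of inputs, because a disabled instruction still occupies its slot and simply does not write.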

10 Caches in Patmos
- Configurable: type and size
- For data: normal data cache, stack cache, and scratchpad memory
- For instructions:
  - Standard instruction cache
  - Prefetching instruction cache (SP)
  - Method cache
  - Scratchpad memory
Currently only single core (loader issue)

11 Method Cache
- Originally developed for the Java processor JOP
  - Therefore called method cache
- Now also used in
  - SHAP
  - Merasa processor (CarCore)
  - Metzlaff PhD thesis
- Also in Patmos

12 Method Cache
- Caches whole methods/functions
  - May load unused instructions
- Misses only on call or return
  - Other instructions guaranteed hits
- Cache is divided into blocks
- Method can span several blocks
- Contiguous blocks for a method
- Replacement: FIFO
- Tag memory: one entry per block
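
The organisation described above can be sketched as a small Python model. It is a simplification under assumptions that are not from the slides: the class name `MethodCache`, the default block count and block size (arbitrary illustrative values), and the eviction of whole methods in FIFO order are all hypothetical. The model tracks how many blocks each method occupies rather than their physical positions, so contiguity is not modelled.

```python
import math
from collections import deque


class MethodCache:
    """Toy method cache: whole functions are loaded, evicted in FIFO order."""

    def __init__(self, num_blocks=16, block_bytes=512):
        self.num_blocks = num_blocks
        self.block_bytes = block_bytes
        self.fifo = deque()        # (method name, blocks used), oldest first
        self.used_blocks = 0
        self.hits = 0
        self.misses = 0

    def _access(self, name, size_bytes):
        """Make `name` resident; returns True on a hit, False on a miss."""
        if any(entry[0] == name for entry in self.fifo):
            self.hits += 1
            return True
        needed = math.ceil(size_bytes / self.block_bytes)
        if needed > self.num_blocks:
            raise ValueError(f"{name} does not fit into the cache")
        # Evict the oldest methods until the new one fits.
        while self.used_blocks + needed > self.num_blocks:
            _, freed = self.fifo.popleft()
            self.used_blocks -= freed
        self.fifo.append((name, needed))
        self.used_blocks += needed
        self.misses += 1
        return False

    # A miss can only happen at these two points: on a call (name is the
    # callee) or on a return (name is the method being returned to).
    call = _access
    ret = _access
```

With a four-block cache, calling `main` (600 bytes) and `foo` (1000 bytes) fills the cache, returning to `main` hits, calling `bar` (400 bytes) evicts `main` as the oldest entry, and the final return to `main` therefore misses again. All other instruction fetches are hits by construction, since the model only ever runs code from a resident method.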

13 Evaluation
- TACLeBench benchmarks V 1.9
  - Self-contained benchmarks
- Patmos configured for DE2-115 FPGA board
- 8 KB instruction cache
  - 16 methods when method cache
- Cycle-accurate emulator to collect the data

14 Method vs. Standard Cache
[Figure: relative performance per benchmark. Benchmarks: adpcm_dec, adpcm_enc, binarysearch, bsort, cjpeg_wrbmp, complex_updates, countnegative, cover, duff, fac, g723_enc, gsm_dec, h264_dec, huff_dec, iir, insertsort, jfdctint, lift, lms, matrix1, md5, ndes, petrinet, powerwindow, prime, sha, st, statemate]

15 2-way vs Direct Mapped
[Figure: relative performance per benchmark. Benchmarks: adpcm_dec, adpcm_enc, binarysearch, bsort, cjpeg_wrbmp, complex_updates, countnegative, cover, duff, fac, g723_enc, gsm_dec, h264_dec, huff_dec, iir, insertsort, jfdctint, lift, lms, matrix1, md5, ndes, petrinet, powerwindow, prime, sha, st, statemate]

16 Dynamic Benchmark Sizes
[Figure: size in bytes per benchmark. Benchmarks: adpcm_dec, adpcm_enc, binarysearch, bsort, cjpeg_wrbmp, complex_updates, countnegative, cover, duff, fac, g723_enc, gsm_dec, h264_dec, huff_dec, iir, insertsort, jfdctint, lift, lms, ludcmp, matrix1, md5, minver, ndes, petrinet, powerwindow, prime, sha, st, statemate]

17 Method vs Standard Cache 2 KB
[Figure: relative performance per benchmark. Benchmarks: adpcm_dec, adpcm_enc, binarysearch, bsort, cjpeg_wrbmp, complex_updates, countnegative, cover, duff, fac, g723_enc, gsm_dec, h264_dec, huff_dec, iir, insertsort, jfdctint, lift, lms, matrix1, md5, ndes, petrinet, powerwindow, prime, sha, st, statemate]

18 Reproducing the Results

19 Conclusion
- Single-path code gives constant execution time
- Compared different caching organizations
- No single winner
- In an FPGA we can use application-specific caching
