Hybrid SPM-Cache Architectures to Achieve High Time Predictability and Performance


1 Hybrid SPM-Cache Architectures to Achieve High Time Predictability and Performance
Wei Zhang and Yiqiang Ding
Department of Electrical and Computer Engineering, Virginia Commonwealth University

2 Outline
- Introduction
- Related works
- Architectural time-predictability factor
- Hybrid SPM-cache architectures
- Evaluation methodology
- Experimental results
- Conclusions

3 Worst-Case Execution Time (WCET)
Application domains: robotics, medical equipment, avionics
Figure: distribution of possible execution times, marking the BCET, the average-case execution time, the WCET, and the estimated WCET.

4 Cache vs SPM
High performance and good time predictability
Cache
- Good average-case performance
- Harmful to time predictability, because access time depends on the history of memory accesses and on the placement and replacement algorithms
Scratch-Pad Memory (SPM)
- Time predictable, because memory access time is statically predictable
- Better energy efficiency
- Inferior average-case performance, because the on-chip memory space is not used dynamically
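The contrast above can be illustrated with a minimal sketch. It assumes a direct-mapped cache of 128 bytes with 16-byte blocks (the real-time configuration used later in these slides); the latency constants, class name, and function names are hypothetical and chosen only for illustration. The same final access to address 0 hits or misses in the cache depending on what was accessed before, while the SPM latency is fixed by the address alone.

```python
from dataclasses import dataclass, field

HIT_CYCLES = 1    # assumed on-chip access latency (illustrative value)
MISS_CYCLES = 20  # assumed off-chip access latency (illustrative value)


@dataclass
class DirectMappedCache:
    size_bytes: int = 128
    block_bytes: int = 16
    tags: dict = field(default_factory=dict)  # line index -> tag currently stored

    def access(self, addr: int) -> int:
        """Return the latency of one access and update the cache state."""
        num_lines = self.size_bytes // self.block_bytes
        block = addr // self.block_bytes
        index, tag = block % num_lines, block // num_lines
        if self.tags.get(index) == tag:
            return HIT_CYCLES
        self.tags[index] = tag  # replace whatever was in this line
        return MISS_CYCLES


def spm_access(addr: int, spm_base: int, spm_bytes: int) -> int:
    """SPM latency depends only on whether the address lies in the SPM range."""
    return HIT_CYCLES if spm_base <= addr < spm_base + spm_bytes else MISS_CYCLES


def latency_of_last_access(trace):
    """Run a trace through a fresh cache and report the final access latency."""
    cache = DirectMappedCache()
    return [cache.access(addr) for addr in trace][-1]


if __name__ == "__main__":
    # Same final address, different histories: address 128 maps to the same
    # cache line as address 0 and evicts it.
    print(latency_of_last_access([0, 0]))
    print(latency_of_last_access([0, 128, 0]))
    # The SPM answer does not depend on any history.
    print(spm_access(0, spm_base=0, spm_bytes=128))
```

A static analysis of the cache has to reason about every possible history to bound the last access, whereas the SPM latency follows directly from the allocation.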

5 Our Contributions
- We propose a hybrid on-chip memory architecture that leverages the SPM to achieve time predictability while exploiting the cache to improve average-case performance.
- We have studied three different hybrid architectures to understand how best to use the hybrid cache and SPM to store instructions and data, balancing performance and time predictability.
- While most prior works indicate that performance and time predictability usually conflict with each other, this research shows that hybrid cache-SPM models can improve both time predictability and performance.

6 Related Works
Cache partitioning [1,2,3] and cache locking [4,5,6]
- Reduce cache interference between tasks
- Prevent dynamic use of cache space, which degrades performance
- Some hybrid architectures, such as IH-DH, can actually boost performance
SPM allocation algorithms [7,8,9]
- Improve either the average-case performance or the WCET
- Do not target hybrid SPM-cache architectures

7 Related Works
Exploration of hybrid models of cache and SPM
- Panda et al. studied partitioning scalar and array variables into SPM to minimize the execution time [10]
- Verma et al. studied an SPM allocation based on instruction cache behavior to reduce energy consumption [11]
- Cong et al. examined an adaptive hybrid cache that reconfigures part of the cache as software-managed SPM to improve both performance and energy efficiency [12]
- None of these works targets time predictability

8 Architectural Time-predictability Factor
Architectural Time-predictability Factor (ATF) [Ding et al., SIGBED Review, Nov. 2012]
Given a processor P, an arbitrary real-time trace T, the actual execution time D(P, T), and the statically predicted execution time based on the timing contract S(P, T), ATF(P, T) is defined in terms of D(P, T) and S(P, T).
Figure: instruction scheduling gives the static execution time, running on a processor gives the dynamic execution time, and the ATF relates the dynamic and static execution times.
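As a rough illustration of how such a factor could be evaluated for one trace, the sketch below assumes the simplest form, the plain ratio of dynamic to static execution time. That form is an assumption made here for illustration; the exact definition is the one given in the cited SIGBED Review paper. The function name and the cycle counts are hypothetical.

```python
def atf_ratio(dynamic_cycles: float, static_cycles: float) -> float:
    """Ratio of measured (dynamic) to statically predicted execution time.

    Assumption: the factor is taken as D / S. Under this assumed form a value
    of 1.0 means the static prediction matched the actual execution exactly.
    """
    if dynamic_cycles <= 0 or static_cycles <= 0:
        raise ValueError("execution times must be positive")
    return dynamic_cycles / static_cycles


if __name__ == "__main__":
    # Hypothetical numbers: the trace was predicted to take 1000 cycles
    # but took 1200 cycles on the simulated processor.
    print(atf_ratio(dynamic_cycles=1200, static_cycles=1000))
```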

9 Three Hybrid SPM-cache Architectures

10 SPM Allocation Algorithm

11 Simulation
- We use the Trimaran compiler/simulator framework
- Baseline processor: 2 integer ALUs, 2 float ALUs, 1 branch predictor, 1 ld/st unit, and 1-level on-chip memory
- Benchmarks: 6 Mälardalen WCET benchmarks and 7 MediaBench benchmarks
- On-chip memory configuration
  - Real-time benchmarks: 128 bytes
  - MediaBench benchmarks: 16K bytes
- Cache configuration
  - Real-time benchmarks: 16-byte block size, direct-mapped
  - MediaBench benchmarks: 32-byte block size, 4-way, LRU
- Hybrid SPM-cache configuration
  - Two different partitions while the total on-chip size is fixed
  - i-m scheme: an M-byte cache and an (N-M)-byte SPM
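The partition arithmetic in this configuration can be sketched as follows. The sketch assumes that the sizes attached to the hybrid labels on the result slides (64 and 32 for real-time benchmarks, 8K and 4K for media benchmarks) are the cache size M, and that the fixed total N applies to one hybrid memory; both are assumptions, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    total_bytes: int  # N: fixed on-chip size
    cache_bytes: int  # M: portion managed as cache
    block_bytes: int
    ways: int         # 1 means direct-mapped

    @property
    def spm_bytes(self) -> int:
        """The remaining (N - M) bytes are software-managed SPM."""
        return self.total_bytes - self.cache_bytes

    @property
    def cache_sets(self) -> int:
        """Number of sets = cache size / (block size * associativity)."""
        return self.cache_bytes // (self.block_bytes * self.ways)


# Real-time benchmarks: 128 bytes total, 16-byte blocks, direct-mapped.
REAL_TIME = [HybridConfig(128, m, block_bytes=16, ways=1) for m in (64, 32)]
# Media benchmarks: 16K bytes total, 32-byte blocks, 4-way set-associative.
MEDIA = [HybridConfig(16 * 1024, m, block_bytes=32, ways=4) for m in (8 * 1024, 4 * 1024)]

if __name__ == "__main__":
    for cfg in REAL_TIME + MEDIA:
        print(cfg.cache_bytes, cfg.spm_bytes, cfg.cache_sets)
```

Under these assumptions, shrinking M trades dynamically managed cache lines for statically allocated SPM space while N stays constant.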

12 IH-DC Architecture
Figure: ATF of the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate) for IS-DS, IC-DC, IH-DC 64, and IH-DC 32; ATF of the media benchmarks for IS-DS, IC-DC, IH-DC 8K, and IH-DC 4K.

13 IH-DC Architecture
Figure: normalized performance of the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate) for IS-DS, IC-DC, IH-DC 64, and IH-DC 32; normalized performance of the media benchmarks for IS-DS, IC-DC, IH-DC 8K, and IH-DC 4K.

14 IC-DH Architecture
Figure: ATF of the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate) for IS-DS, IC-DC, IC-DH 64, and IC-DH 32; ATF of the media benchmarks for IS-DS, IC-DC, IC-DH 8K, and IC-DH 4K.

15 IC-DH Architecture
Figure: normalized performance of the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate) for IS-DS, IC-DC, IC-DH 64, and IC-DH 32; normalized performance of the media benchmarks for IS-DS, IC-DC, IC-DH 8K, and IC-DH 4K.

16 IH-DH Architecture
Figure: ATF of the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate) for IC-DC, IH-DH 64, IH-DH 32, and IS-DS; ATF of the media benchmarks for IC-DC, IH-DH 8K, IH-DH 4K, and IS-DS.

17 IH-DH Architecture
Figure: normalized performance of the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate) for IC-DC, IH-DH 64, IH-DH 32, and IS-DS; normalized performance of the media benchmarks for IC-DC, IH-DH 8K, IH-DH 4K, and IS-DS.
On average, the performance of IH-DH is 1.9% better than that of the IC-DC architecture for real-time benchmarks, and it is 4% better for media benchmarks.

18 WCET Results

19 Conclusions
- These hybrid architectures offer a range of performance and time-predictability trade-offs across a wide range of benchmarks.
- IH-DH is the best hybrid on-chip memory architecture, achieving both good time predictability and high performance.
- IH-DH can outperform the pure cache-based architecture IC-DC for most benchmarks, revealing that improving time predictability and improving performance do not always have to conflict with each other.

20 Future Work
Further architectural exploration
- Other hybrid on-chip memories, such as instruction (data) SPM and data (instruction) cache
- Impact on energy consumption, in addition to performance and time predictability
- Explore different SPM allocation algorithms for various hybrid SPM-cache architectures
- Investigate the use of hybrid on-chip memory architectures in a multicore platform to balance time predictability and performance for multi-threaded and multi-programmed workloads

21 References
[1] D. Kirk. "SMART (strategic memory allocation for real-time) cache design," In Proceedings of 5th IEEE International Real-Time Systems Symposium (RTSS), 1989.
[2] D. Kirk and J. Strosnider. "SMART (strategic memory allocation for real-time) cache design using the MIPS R3000," In Proceedings of 6th IEEE International Real-Time Systems Symposium (RTSS), 1990.
[3] F. Mueller. "Compiler support for software-based cache partitioning," SIGPLAN Notices, Vol. 30, Nov.
[4] I. Puaut and D. Decotigny. "Low-complexity algorithms for static cache locking in multitasking hard real-time systems," In Proceedings of 18th IEEE International Real-Time Systems Symposium (RTSS), 2002.
[5] X. Vera, B. Lisper and J. Xue. "Data cache locking for higher program predictability," ACM SIGMETRICS, 2003.
[6] V. Suhendra and T. Mitra. "Exploring locking & partitioning for predictable shared caches on multi-cores," In Proceedings of 45th Design Automation Conference, 2008.

22 References
[7] S. Steinke et al. "Assigning program and data objects to scratchpad for energy reduction," In Proceedings of Europe Design and Test Conference, 2002.
[8] M. Kandemir, I. Kadayif, A. Choudhary and J. Ramanujam. "Compiler-directed scratch pad memory optimization for embedded multiprocessors," IEEE Transactions on VLSI Systems, Vol. 12, No. 3, March 2004.
[9] J. F. Deverge and I. Puaut. "WCET-directed dynamic scratchpad memory allocation of data," In Proceedings of 19th Euromicro Conference on Real-Time Systems (ECRTS), July 2007.
[10] P. Panda, N. Dutt and A. Nicolau. "Efficient utilization of scratch-pad memory in embedded processor applications," In Proceedings of Europe Design and Test Conference, March
[11] M. Verma, L. Wehmeyer, and P. Marwedel. "Cache-aware scratchpad allocation algorithm," In Proceedings of Design, Automation and Test in Europe Conference, 2004.
[12] J. Cong, K. Gururaj, H. Huang, C. Liu, G. Reinman, and Y. Zou. "An energy-efficient adaptive hybrid cache," In Proceedings of International Symposium on Low Power Electronics and Design, 2011.

23 Acknowledgment: This work was funded in part by the NSF grant CCF

24 Static Execution Time Analysis
