Hybrid SPM-Cache Architectures to Achieve High Time Predictability and Performance

1 Hybrid SPM-Cache Architectures to Achieve High Time Predictability and Performance
Wei Zhang and Yiqiang Ding
Department of Electrical and Computer Engineering, Virginia Commonwealth University

2 Outline
- Introduction
- Related work
- Architectural time-predictability factor
- Hybrid SPM-cache architectures
- Evaluation methodology
- Experimental results
- Conclusions

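3 Worst-Case Execution Time (WCET)
Application domains: robotics, medical equipment, avionics.
[Figure: distribution of possible execution times, marking the best-case execution time (BCET), the average-case execution time, the WCET, and the estimated WCET.]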
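The relationship among these quantities can be illustrated with a small sketch. Measurements only ever cover a subset of possible executions, so the largest observed time is a lower bound on the real WCET, while a safe estimated WCET must lie above it. The sketch checks that ordering and reports how pessimistic the bound is; the function name and the cycle counts are hypothetical.

```python
from statistics import mean


def summarize_execution_times(measured, estimated_wcet):
    """Compare observed execution times (cycles) with a static WCET bound.

    measured: execution times observed over different inputs (hypothetical data).
    estimated_wcet: a statically computed WCET bound for the same task.
    """
    bcet_observed = min(measured)
    wcet_observed = max(measured)
    if estimated_wcet < wcet_observed:
        # A safe static bound can never be below an observed execution time.
        raise ValueError("estimated WCET is unsafe: below an observed run")
    return {
        "bcet_observed": bcet_observed,
        "average": mean(measured),
        "wcet_observed": wcet_observed,
        # How much the bound overestimates the longest observed run.
        "pessimism": estimated_wcet / wcet_observed,
    }


print(summarize_execution_times([900, 1000, 1100, 1200], estimated_wcet=1500))
```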

4 Cache vs. SPM
Goal: high performance and good time predictability.
- Cache
  - Good average-case performance.
  - Harmful to time predictability, because its behavior depends on the history of memory accesses and on the placement and replacement algorithms.
- Scratch-Pad Memory (SPM)
  - Time predictable, because the memory access time is statically predictable.
  - Better energy efficiency.
  - Inferior average-case performance, because the on-chip memory space cannot be used dynamically.
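The difference in predictability can be made concrete with a toy latency model. In the sketch below, the 1-cycle on-chip and 10-cycle off-chip latencies, the direct-mapped organization, and the class names are all assumptions for illustration: the same cache address costs 1 or 10 cycles depending on what was accessed before it, whereas an SPM address always costs the same.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: latency depends on the access history."""

    def __init__(self, num_lines, block_size, hit_cycles=1, miss_cycles=10):
        self.num_lines = num_lines
        self.block_size = block_size
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles
        self.tags = [None] * num_lines  # one tag per line; None means invalid

    def access(self, addr):
        block = addr // self.block_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            return self.hit_cycles
        self.tags[index] = tag  # fill the line on a miss, evicting the old block
        return self.miss_cycles


class ScratchPad:
    """Toy SPM: a fixed address range with a constant, statically known latency."""

    def __init__(self, base, size, spm_cycles=1, memory_cycles=10):
        self.base, self.size = base, size
        self.spm_cycles, self.memory_cycles = spm_cycles, memory_cycles

    def access(self, addr):
        # SPM contents are fixed at compile time, so the latency depends only
        # on the address, never on which addresses were accessed earlier.
        if self.base <= addr < self.base + self.size:
            return self.spm_cycles
        return self.memory_cycles


cache = DirectMappedCache(num_lines=4, block_size=16)
print([cache.access(a) for a in [0, 0, 64, 0]])  # [10, 1, 10, 10]
spm = ScratchPad(base=0, size=64)
print([spm.access(a) for a in [0, 0, 64, 0]])    # [1, 1, 10, 1]
```

Addresses 0 and 64 map to the same cache line, so the last access to address 0 misses even though address 0 was used just before; the SPM returns the same latency for address 0 every time.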

5 Our Contributions
- We propose a hybrid on-chip memory architecture that leverages the SPM to achieve time predictability while exploiting the cache to improve average-case performance.
- We study three different hybrid architectures to understand how to make the best use of the hybrid cache and SPM for storing instructions and data, balancing performance and time predictability.
- While most prior work indicates that performance and time predictability usually conflict with each other, this research shows that hybrid cache-SPM models can improve both time predictability and performance.

6 Related Work
- Cache partitioning [1,2,3] and cache locking [4,5,6]
  - Reduce cache interference between tasks.
  - Prevent dynamic use of the cache space, which degrades performance.
  - By contrast, some hybrid architectures, such as IH-DH, can actually boost performance.
- SPM allocation algorithms [7,8,9]
  - Improve either the average-case performance or the WCET.
  - Do not target hybrid SPM-cache architectures.
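To see why locking trades performance for predictability, consider a minimal sketch of static cache locking, assuming the whole cache is preloaded and locked for the entire run (the class name and latencies are hypothetical). Every access can be classified offline as a hit or a miss, but a block that is reused heavily at run time still misses if it was not chosen for locking.

```python
class LockedCache:
    """Toy fully locked cache: a fixed set of blocks is preloaded and pinned."""

    def __init__(self, locked_blocks, block_size, hit_cycles=1, miss_cycles=10):
        self.locked = frozenset(locked_blocks)  # block numbers chosen offline
        self.block_size = block_size
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles

    def access(self, addr):
        # Locked blocks always hit; all other blocks always miss, because the
        # locked contents are never replaced at run time.
        if addr // self.block_size in self.locked:
            return self.hit_cycles
        return self.miss_cycles


cache = LockedCache(locked_blocks={0}, block_size=16)
print([cache.access(a) for a in [0, 64, 64, 64]])  # [1, 10, 10, 10]
```

A conventional cache would hit on the repeated accesses to address 64; the locked cache cannot, which is the performance cost of preventing dynamic use of the cache space.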

7 Related Work
Exploration of hybrid models of cache and SPM:
- Panda et al. studied partitioning scalar and array variables into the SPM to minimize execution time [10].
- Verma et al. studied an SPM allocation based on instruction cache behavior to reduce energy consumption [11].
- Cong et al. examined an adaptive hybrid cache that reconfigures part of the cache as a software-managed SPM to improve both performance and energy efficiency [12].
- None of these targets time predictability.

8 Architectural Time-Predictability Factor
Architectural Time-predictability Factor (ATF) [Ding et al., SIGBED Review, Nov. 2012]: given a processor P, an arbitrary real-time trace T, the actual execution time D(P, T), and the statically predicted execution time based on the timing contract S(P, T), ATF can be defined as:

$$ATF(P, T) = \frac{D(P, T)}{S(P, T)}, \qquad S(P, T) \ge D(P, T)$$

so ATF lies in (0, 1] and reaches 1 when the static prediction matches the actual execution time exactly.
[Figure: instruction scheduling yields the static execution time; running on a processor yields the dynamic execution time; ATF = dynamic execution time / static execution time.]
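As a worked example, assume ATF is the ratio of the dynamic (actual) execution time to the static (predicted) execution time, as the slide's diagram indicates, and that the static prediction never falls below the actual time. A trace that runs in 800 cycles against a static prediction of 1000 cycles then has ATF = 800/1000 = 0.8. The function name below is hypothetical.

```python
def atf(dynamic_cycles, static_cycles):
    """Architectural time-predictability factor as D(P, T) / S(P, T).

    Assumes the static prediction is a safe bound, i.e. 0 < D <= S.
    """
    if not 0 < dynamic_cycles <= static_cycles:
        raise ValueError("expected 0 < D(P, T) <= S(P, T)")
    return dynamic_cycles / static_cycles


print(atf(dynamic_cycles=800, static_cycles=1000))   # 0.8
print(atf(dynamic_cycles=1000, static_cycles=1000))  # 1.0: prediction is exact
```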

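9 Three Hybrid SPM-Cache Architectures
[Figure: the three evaluated architectures, IH-DC (hybrid instruction memory, data cache), IC-DH (instruction cache, hybrid data memory), and IH-DH (hybrid instruction and data memories).]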
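The hybrid organization can be sketched as a memory that routes each access by address. Assuming the architecture names combine I/D (instruction/data) with S, C, or H (SPM, cache, hybrid), IH-DH would pair one such memory for instructions with one for data, while IC-DC corresponds to an empty SPM range. The class name, address ranges, and latencies below are hypothetical.

```python
class HybridMemory:
    """Toy hybrid on-chip memory: an SPM address range plus a direct-mapped cache.

    Addresses inside the SPM range get a constant latency; every other address
    goes through the cache. spm_size=0 models a pure cache and cache_lines=0 a
    pure SPM (non-SPM addresses then always go off-chip).
    """

    def __init__(self, spm_base, spm_size, cache_lines, block_size,
                 on_chip_cycles=1, memory_cycles=10):
        self.spm_base, self.spm_size = spm_base, spm_size
        self.cache_lines, self.block_size = cache_lines, block_size
        self.on_chip_cycles, self.memory_cycles = on_chip_cycles, memory_cycles
        self.tags = [None] * cache_lines  # None marks an invalid line

    def access(self, addr):
        if self.spm_base <= addr < self.spm_base + self.spm_size:
            return self.on_chip_cycles  # SPM: latency known statically
        if self.cache_lines == 0:
            return self.memory_cycles   # no cache: straight to memory
        block = addr // self.block_size
        index, tag = block % self.cache_lines, block // self.cache_lines
        if self.tags[index] == tag:
            return self.on_chip_cycles  # cache hit
        self.tags[index] = tag          # cache miss: refill the line
        return self.memory_cycles


# IH-DH sketch: one hybrid memory for instructions, one for data.
imem = HybridMemory(spm_base=0x1000, spm_size=64, cache_lines=4, block_size=16)
dmem = HybridMemory(spm_base=0x8000, spm_size=64, cache_lines=4, block_size=16)
print([imem.access(a) for a in [0x1000, 0x1000, 0, 0, 64, 0]])  # [1, 1, 10, 1, 10, 10]
print(dmem.access(0x8010))  # 1
```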

10 SPM Allocation Algorithm
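The slide does not spell out the allocation algorithm, so the following is only a generic illustration, not the authors' method: a greedy heuristic that ranks candidate memory objects by accesses per byte and places them in the SPM until it is full. The object names, sizes, and access counts are hypothetical. Because this is a knapsack problem, the greedy choice is not guaranteed to be optimal; ILP formulations are a common alternative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryObject:
    name: str      # e.g. a basic block, function, or variable (hypothetical)
    size: int      # bytes
    accesses: int  # profiled or worst-case-path access count


def greedy_spm_allocation(objects, spm_size):
    """Pick objects for the SPM by accesses per byte until the SPM is full."""
    ranked = sorted(objects, key=lambda o: o.accesses / o.size, reverse=True)
    chosen, used = [], 0
    for obj in ranked:
        if used + obj.size <= spm_size:  # skip objects that no longer fit
            chosen.append(obj.name)
            used += obj.size
    return chosen, used


objects = [
    MemoryObject("loop_body", 32, 3200),
    MemoryObject("init", 48, 48),
    MemoryObject("table", 64, 1280),
    MemoryObject("helper", 16, 160),
]
print(greedy_spm_allocation(objects, spm_size=96))  # (['loop_body', 'table'], 96)
```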

11 Simulation
- We use the Trimaran compiler/simulator framework.
- The baseline processor has 2 integer ALUs, 2 floating-point ALUs, 1 branch predictor, 1 load/store unit, and one level of on-chip memory.
- Benchmarks: 6 Mälardalen WCET benchmarks and 7 MediaBench benchmarks.
- On-chip memory size
  - Real-time benchmarks: 128 bytes
  - MediaBench benchmarks: 16 KB
- Cache configuration
  - Real-time benchmarks: 16-byte blocks, direct-mapped
  - MediaBench benchmarks: 32-byte blocks, 4-way set-associative, LRU replacement
- Hybrid SPM-cache configuration
  - Two different partitions, with the total on-chip memory size N fixed
  - i-m scheme: an M-byte cache and an (N-M)-byte SPM
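The partition arithmetic can be checked with a short sketch. It assumes that the number in labels such as IH-DC 64 or IH-DC 8K is the cache size M, with the remaining N-M bytes used as SPM; the helper name is hypothetical.

```python
def hybrid_partition(total_bytes, cache_bytes, block_size, ways):
    """Split an N-byte on-chip memory into an M-byte cache and an (N-M)-byte SPM."""
    if not 0 <= cache_bytes <= total_bytes:
        raise ValueError("cache size must be between 0 and the total size")
    sets = cache_bytes // (block_size * ways)  # cache sets available
    return {"cache_bytes": cache_bytes,
            "spm_bytes": total_bytes - cache_bytes,
            "sets": sets}


# Real-time benchmarks: N = 128 bytes, 16-byte blocks, direct-mapped (1 way).
for m in (64, 32):
    print(hybrid_partition(128, m, block_size=16, ways=1))
# MediaBench: N = 16 KB, 32-byte blocks, 4-way set-associative.
for m in (8 * 1024, 4 * 1024):
    print(hybrid_partition(16 * 1024, m, block_size=32, ways=4))
```

For example, a 64-byte direct-mapped cache with 16-byte blocks has only 4 lines and leaves 64 bytes of SPM, while an 8 KB 4-way cache with 32-byte blocks has 64 sets and leaves 8 KB of SPM.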

12 IH-DC Architecture
[Figure: ATF of IS-DS, IC-DC, IH-DC 64, and IH-DC 32 for the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IS-DS, IC-DC, IH-DC 8K, and IH-DC 4K for the media benchmarks.]

13 IH-DC Architecture
[Figure: normalized performance of IS-DS, IC-DC, IH-DC 64, and IH-DC 32 for the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IS-DS, IC-DC, IH-DC 8K, and IH-DC 4K for the media benchmarks.]

14 IC-DH Architecture
[Figure: ATF of IS-DS, IC-DC, IC-DH 64, and IC-DH 32 for the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IS-DS, IC-DC, IC-DH 8K, and IC-DH 4K for the media benchmarks.]

15 IC-DH Architecture
[Figure: normalized performance of IS-DS, IC-DC, IC-DH 64, and IC-DH 32 for the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IS-DS, IC-DC, IC-DH 8K, and IC-DH 4K for the media benchmarks.]

16 IH-DH Architecture
[Figure: ATF of IS-DS, IC-DC, IH-DH 64, and IH-DH 32 for the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IS-DS, IC-DC, IH-DH 8K, and IH-DH 4K for the media benchmarks.]

17 IH-DH Architecture
[Figure: normalized performance of IS-DS, IC-DC, IH-DH 64, and IH-DH 32 for the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IS-DS, IC-DC, IH-DH 8K, and IH-DH 4K for the media benchmarks.]
On average, the performance of IH-DH is 1.9% better than that of the IC-DC architecture for the real-time benchmarks, and 4% better for the media benchmarks.

18 WCET Results

19 Conclusions
- These hybrid architectures provide a range of performance and time-predictability trade-offs across a wide range of benchmarks.
- IH-DH is the best hybrid on-chip memory architecture, achieving both good time predictability and high performance.
- IH-DH outperforms the pure cache-based architecture IC-DC for most benchmarks, revealing that improving time predictability and improving performance do not always have to conflict.

20 Future Work
Further architectural exploration:
- Other hybrid on-chip memories, such as an instruction (data) SPM combined with a data (instruction) cache
- Impact on energy consumption, in addition to performance and time predictability
- Different SPM allocation algorithms for various hybrid SPM-cache architectures
- Use of hybrid on-chip memory architectures in a multicore platform to balance time predictability and performance for multi-threaded and multi-programmed workloads

21 References
[1] D. Kirk. "SMART (strategic memory allocation for real-time) cache design," in Proceedings of the 5th IEEE International Real-Time Systems Symposium (RTSS), 1989.
[2] D. Kirk and J. Strosnider. "SMART (strategic memory allocation for real-time) cache design using the MIPS R3000," in Proceedings of the 6th IEEE International Real-Time Systems Symposium (RTSS), 1990.
[3] F. Mueller. "Compiler support for software-based cache partitioning," SIGPLAN Notices, Vol. 30, Nov.
[4] I. Puaut and D. Decotigny. "Low-complexity algorithms for static cache locking in multitasking hard real-time systems," in Proceedings of the 18th IEEE International Real-Time Systems Symposium (RTSS), 2002.
[5] X. Vera, B. Lisper and J. Xue. "Data cache locking for higher program predictability," ACM SIGMETRICS, 2003.
[6] V. Suhendra and T. Mitra. "Exploring locking & partitioning for predictable shared caches on multi-cores," in Proceedings of the 45th Design Automation Conference, 2008.

22 References
[7] S. Steinke et al. "Assigning program and data objects to scratchpad for energy reduction," in Proceedings of the European Design and Test Conference, 2002.
[8] M. Kandemir, I. Kadayif, A. Choudhary and J. Ramanujam. "Compiler-directed scratch pad memory optimization for embedded multiprocessors," IEEE Transactions on VLSI Systems, Vol. 12, No. 3, March 2004.
[9] J. F. Deverge and I. Puaut. "WCET-directed dynamic scratchpad memory allocation of data," in Proceedings of the 19th Euromicro Conference on Real-Time Systems (ECRTS), July 2007.
[10] P. Panda, N. Dutt and A. Nicolau. "Efficient utilization of scratch-pad memory in embedded processor applications," in Proceedings of the European Design and Test Conference, March.
[11] M. Verma, L. Wehmeyer, and P. Marwedel. "Cache-aware scratchpad allocation algorithm," in Proceedings of the Design, Automation and Test in Europe Conference, 2004.
[12] J. Cong, K. Gururaj, H. Huang, C. Liu, G. Reinman, and Y. Zou. "An energy-efficient adaptive hybrid cache," in Proceedings of the International Symposium on Low Power Electronics and Design, 2011.

23 Acknowledgment: This work was funded in part by the NSF grant CCF

24 Static Execution Time Analysis
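A common basis for static execution time analysis is to bound the WCET by the longest path through the program's control-flow graph. The sketch below assumes an acyclic graph of basic blocks with a known worst-case cycle cost per block; loops would additionally need iteration bounds, for example via an ILP-based path formulation. All names and costs are hypothetical. With an SPM, the memory part of each block's cost is statically known, whereas with a cache each access would first have to be classified as a hit or a miss.

```python
from functools import lru_cache


def wcet_bound(cfg, cost, entry, exit_block):
    """Longest-path WCET bound over an acyclic control-flow graph.

    cfg: maps each basic block to its successor blocks.
    cost: maps each basic block to its worst-case execution cycles.
    """

    @lru_cache(maxsize=None)
    def longest_from(block):
        if block == exit_block:
            return cost[block]
        successors = cfg.get(block, [])
        if not successors:
            raise ValueError(f"block {block!r} cannot reach the exit")
        return cost[block] + max(longest_from(s) for s in successors)

    return longest_from(entry)


cfg = {"entry": ["then", "else"], "then": ["join"], "else": ["join"], "join": ["exit"]}
cost = {"entry": 5, "then": 20, "else": 8, "join": 3, "exit": 1}
print(wcet_bound(cfg, cost, "entry", "exit"))  # 29
```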


More information

Compiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems

Compiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems Compiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems Florin Balasa American University in Cairo Noha Abuaesh American University in Cairo Ilie I. Luican Microsoft Inc., USA Cristian

More information

Achieving Predictable Multicore Execution of Automotive Applications Using the LET Paradigm

Achieving Predictable Multicore Execution of Automotive Applications Using the LET Paradigm Achieving Predictable Multicore Execution of Automotive Applications Using the LET Paradigm Alessandro Biondi and Marco Di Natale Scuola Superiore Sant Anna, Pisa, Italy Introduction The introduction of

More information

Static and Dynamic Frequency Scaling on Multicore CPUs

Static and Dynamic Frequency Scaling on Multicore CPUs Static and Dynamic Frequency Scaling on Multicore CPUs Wenlei Bao 1 Changwan Hong 1 Sudheer Chunduri 2 Sriram Krishnamoorthy 3 Louis-Noël Pouchet 4 Fabrice Rastello 5 P. Sadayappan 1 1 The Ohio State University

More information

Data Cache Locking for Tight Timing Calculations

Data Cache Locking for Tight Timing Calculations Data Cache Locking for Tight Timing Calculations XAVIER VERA and BJÖRN LISPER Mälardalens Högskola and JINGLING XUE University of New South Wales Caches have become increasingly important with the widening

More information

One-Level Cache Memory Design for Scalable SMT Architectures

One-Level Cache Memory Design for Scalable SMT Architectures One-Level Cache Design for Scalable SMT Architectures Muhamed F. Mudawar and John R. Wani Computer Science Department The American University in Cairo mudawwar@aucegypt.edu rubena@aucegypt.edu Abstract

More information

A Novel Instruction Scratchpad Memory Optimization Method based on Concomitance Metric

A Novel Instruction Scratchpad Memory Optimization Method based on Concomitance Metric A Novel Instruction Scratchpad Memory Optimization Method based on Concomitance Metric Andhi Janapsatya, Aleksandar Ignjatović, Sri Parameswaran School of Computer Science and Engineering, The University

More information

Reducing Energy Consumption by Dynamic Copying of Instructions onto Onchip Memory

Reducing Energy Consumption by Dynamic Copying of Instructions onto Onchip Memory Reducing Consumption by Dynamic Copying of Instructions onto Onchip Memory Stefan Steinke *, Nils Grunwald *, Lars Wehmeyer *, Rajeshwari Banakar +, M. Balakrishnan +, Peter Marwedel * * University of

More information

An Approach for Adaptive DRAM Temperature and Power Management

An Approach for Adaptive DRAM Temperature and Power Management IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 An Approach for Adaptive DRAM Temperature and Power Management Song Liu, Yu Zhang, Seda Ogrenci Memik, and Gokhan Memik Abstract High-performance

More information

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1) Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized for most of the time by doubling

More information

Compilation for Heterogeneous Platforms

Compilation for Heterogeneous Platforms Compilation for Heterogeneous Platforms Grid in a Box and on a Chip Ken Kennedy Rice University http://www.cs.rice.edu/~ken/presentations/heterogeneous.pdf Senior Researchers Ken Kennedy John Mellor-Crummey

More information

Predictive Line Buffer: A fast, Energy Efficient Cache Architecture

Predictive Line Buffer: A fast, Energy Efficient Cache Architecture Predictive Line Buffer: A fast, Energy Efficient Cache Architecture Kashif Ali MoKhtar Aboelaze SupraKash Datta Department of Computer Science and Engineering York University Toronto ON CANADA Abstract

More information

A Software Solution for Dynamic Stack Management on Scratch Pad Memory

A Software Solution for Dynamic Stack Management on Scratch Pad Memory A Software Solution for Dynamic Stack Management on Scratch Pad Memory Arun Kannan, Aviral Shrivastava, Amit Pabalkar and Jong-eun Lee Department of Computer Science and Engineering Arizona State University,

More information

WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed Multicores

WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed Multicores This is the author prepared accepted version. 2014 IEEE. The published version is: Yooseong Kim, David Broman, Jian Cai, and Aviral Shrivastaval. WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed

More information

CS 654 Computer Architecture Summary. Peter Kemper

CS 654 Computer Architecture Summary. Peter Kemper CS 654 Computer Architecture Summary Peter Kemper Chapters in Hennessy & Patterson Ch 1: Fundamentals Ch 2: Instruction Level Parallelism Ch 3: Limits on ILP Ch 4: Multiprocessors & TLP Ap A: Pipelining

More information

ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors

ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors Weifeng Liu and Brian Vinter Niels Bohr Institute University of Copenhagen Denmark {weifeng, vinter}@nbi.dk March 1, 2014 Weifeng

More information

History-based Schemes and Implicit Path Enumeration

History-based Schemes and Implicit Path Enumeration History-based Schemes and Implicit Path Enumeration Claire Burguière and Christine Rochange Institut de Recherche en Informatique de Toulouse Université Paul Sabatier 6 Toulouse cedex 9, France {burguier,rochange}@irit.fr

More information

Performance and Power Impact of Issuewidth in Chip-Multiprocessor Cores

Performance and Power Impact of Issuewidth in Chip-Multiprocessor Cores Performance and Power Impact of Issuewidth in Chip-Multiprocessor Cores Magnus Ekman Per Stenstrom Department of Computer Engineering, Department of Computer Engineering, Outline Problem statement Assumptions

More information

Cost-Driven Hybrid Configuration Prefetching for Partial Reconfigurable Coprocessor

Cost-Driven Hybrid Configuration Prefetching for Partial Reconfigurable Coprocessor Cost-Driven Hybrid Configuration Prefetching for Partial Reconfigurable Coprocessor Ying Chen, Simon Y. Chen 2 School of Engineering San Francisco State University 600 Holloway Ave San Francisco, CA 9432

More information

Selfish-LRU: Preemption-Aware Caching for Predictability and Performance

Selfish-LRU: Preemption-Aware Caching for Predictability and Performance Selfish-LRU: Preemption-Aware Caching for Predictability and Performance Jan Reineke, Sebastian Altmeyer, Daniel Grund, Sebastian Hahn, and Claire Maiza Saarland University, Saarbrücken, Germany reineke@cs.uni-saarland.de,

More information

Hybrid Heuristics for Optimizing Energy Consumption in Embedded Systems

Hybrid Heuristics for Optimizing Energy Consumption in Embedded Systems Hybrid Heuristics for Optimizing Energy Consumption in Embedded Systems Maha IDRISSI AOUAD 1, René SCHOTT 2 and Olivier ZENDRA 1 1 INRIA Nancy - Grand Est / LORIA. 615, Rue du Jardin Botanique, 54600 Villers-Lès-Nancy,

More information

SYSTEMS MEMO #12. A Synchronization Library for ASIM. Beng-Hong Lim Laboratory for Computer Science.

SYSTEMS MEMO #12. A Synchronization Library for ASIM. Beng-Hong Lim Laboratory for Computer Science. ALEWIFE SYSTEMS MEMO #12 A Synchronization Library for ASIM Beng-Hong Lim (bhlim@masala.lcs.mit.edu) Laboratory for Computer Science Room NE43-633 January 9, 1992 Abstract This memo describes the functions

More information

Cache contents selection for statically-locked instruction caches: an algorithm comparison

Cache contents selection for statically-locked instruction caches: an algorithm comparison Cache contents selection for statically-locked instruction caches: an algorithm comparison Antonio Martí Campoy *, Isabelle Puaut **, Angel Perles Ivars *, Jose Vicente Busquets Mataix * * Computer Engineering

More information

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management International Journal of Computer Theory and Engineering, Vol., No., December 01 Effective Memory Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management Sultan Daud Khan, Member,

More information

Pull based Migration of Real-Time Tasks in Multi-Core Processors

Pull based Migration of Real-Time Tasks in Multi-Core Processors Pull based Migration of Real-Time Tasks in Multi-Core Processors 1. Problem Description The complexity of uniprocessor design attempting to extract instruction level parallelism has motivated the computer

More information

IBM's POWER5 Micro Processor Design and Methodology

IBM's POWER5 Micro Processor Design and Methodology IBM's POWER5 Micro Processor Design and Methodology Ron Kalla IBM Systems Group Outline POWER5 Overview Design Process Power POWER Server Roadmap 2001 POWER4 2002-3 POWER4+ 2004* POWER5 2005* POWER5+ 2006*

More information

Lecture: SMT, Cache Hierarchies. Topics: memory dependence wrap-up, SMT processors, cache access basics (Sections B.1-B.3, 2.1)

Lecture: SMT, Cache Hierarchies. Topics: memory dependence wrap-up, SMT processors, cache access basics (Sections B.1-B.3, 2.1) Lecture: SMT, Cache Hierarchies Topics: memory dependence wrap-up, SMT processors, cache access basics (Sections B.1-B.3, 2.1) 1 Problem 3 Consider the following LSQ and when operands are available. Estimate

More information

Benchmarking the Memory Hierarchy of Modern GPUs

Benchmarking the Memory Hierarchy of Modern GPUs 1 of 30 Benchmarking the Memory Hierarchy of Modern GPUs In 11th IFIP International Conference on Network and Parallel Computing Xinxin Mei, Kaiyong Zhao, Chengjian Liu, Xiaowen Chu CS Department, Hong

More information

Virtual Machines. 2 Disco: Running Commodity Operating Systems on Scalable Multiprocessors([1])

Virtual Machines. 2 Disco: Running Commodity Operating Systems on Scalable Multiprocessors([1]) EE392C: Advanced Topics in Computer Architecture Lecture #10 Polymorphic Processors Stanford University Thursday, 8 May 2003 Virtual Machines Lecture #10: Thursday, 1 May 2003 Lecturer: Jayanth Gummaraju,

More information

Lecture: SMT, Cache Hierarchies. Topics: memory dependence wrap-up, SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.

Lecture: SMT, Cache Hierarchies. Topics: memory dependence wrap-up, SMT processors, cache access basics and innovations (Sections B.1-B.3, 2. Lecture: SMT, Cache Hierarchies Topics: memory dependence wrap-up, SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1) 1 Problem 0 Consider the following LSQ and when operands are

More information