Hybrid SPM-Cache Architectures to Achieve High Time Predictability and Performance
Slide 1: Hybrid SPM-Cache Architectures to Achieve High Time Predictability and Performance
Wei Zhang and Yiqiang Ding
Department of Electrical and Computer Engineering, Virginia Commonwealth University
Slide 2: Outline
- Introduction
- Related works
- Architectural time-predictability factor
- Hybrid SPM-cache architectures
- Evaluation methodology
- Experimental results
- Conclusions
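Slide 3: Worst-Case Execution Time (WCET)
Application domains: robotics, medical equipment, avionics.
[Figure: the range of possible execution times of a task, marking the best-case execution time (BCET), the average-case execution time, the actual WCET, and the estimated WCET.]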
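The ordering in the figure can be stated compactly. This is the standard safety condition of WCET analysis rather than a formula taken from the slides; T denotes the execution time of any single run and the hatted term denotes the statically estimated WCET:

```latex
\mathrm{BCET} \;\le\; T \;\le\; \mathrm{WCET} \;\le\; \widehat{\mathrm{WCET}},
\qquad
\text{overestimation} \;=\; \widehat{\mathrm{WCET}} - \mathrm{WCET} \;\ge\; 0
```

A smaller overestimation means a tighter, and therefore more useful, bound.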
Slide 4: Cache vs. SPM
Goal: high performance and good time predictability.
Cache:
- Good average-case performance.
- Harmful to time predictability, because its behavior depends on the history of memory accesses and on the placement and replacement algorithms.
Scratch-Pad Memory (SPM):
- Time predictable, because memory access time is statically predictable.
- Better energy efficiency.
- Inferior average-case performance, because the on-chip memory space is not used dynamically.
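The cache-versus-SPM contrast can be made concrete with a toy model. The sketch below is illustrative only: the latencies (1 cycle for a hit or an SPM access, 10 cycles for a miss) and the tiny direct-mapped cache geometry are assumed values, not parameters from the slides. It shows the same load to address 0x00 taking different times depending on what was accessed before it, while the SPM latency never changes.

```python
# Minimal sketch: cache latency depends on access history, SPM latency does not.
# Assumed, illustrative parameters (not from the slides).
HIT_CYCLES = 1
MISS_CYCLES = 10
SPM_CYCLES = 1
BLOCK_BYTES = 16
NUM_LINES = 4


class DirectMappedCache:
    def __init__(self, num_lines: int = NUM_LINES, block_bytes: int = BLOCK_BYTES):
        self.num_lines = num_lines
        self.block_bytes = block_bytes
        self.tags = [None] * num_lines  # one tag per line; None means invalid

    def access(self, addr: int) -> int:
        """Return the latency of one access and update the cache state."""
        block = addr // self.block_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            return HIT_CYCLES
        self.tags[index] = tag  # miss: fetch the block, evicting the previous one
        return MISS_CYCLES


def spm_access(addr: int) -> int:
    """SPM access time is a constant, independent of earlier accesses."""
    return SPM_CYCLES


def latencies(trace, access):
    return [access(a) for a in trace]


# 0x40 maps to the same cache line as 0x00 (64 bytes apart = 4 lines * 16 bytes).
print(latencies([0x00, 0x00], DirectMappedCache().access))        # [10, 1]
print(latencies([0x00, 0x40, 0x00], DirectMappedCache().access))  # [10, 10, 10]
print(latencies([0x00, 0x40, 0x00], spm_access))                  # [1, 1, 1]
```

A static analyzer must account for the conflicting access to 0x40 to know whether the final load to 0x00 hits; for the SPM, no such history is needed.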
Slide 5: Our Contributions
- We propose a hybrid on-chip memory architecture that leverages the SPM to achieve time predictability while exploiting the cache to improve average-case performance.
- We study three different hybrid architectures to understand how best to use the hybrid cache and SPM to store instructions and data, balancing performance and time predictability.
- While most prior work indicates that performance and time predictability usually conflict with each other, this research shows that hybrid cache-SPM models can improve both time predictability and performance.
Slide 6: Related Works
Cache partitioning [1, 2, 3] and cache locking [4, 5, 6]:
- Reduce cache interference between tasks.
- Prevent dynamic use of cache space, which degrades performance; in contrast, a hybrid architecture such as IH-DH can actually boost performance.
SPM allocation algorithms [7, 8, 9]:
- Improve either the average-case performance or the WCET.
- Do not target hybrid SPM-cache architectures.
Slide 7: Related Works (continued)
Exploration of hybrid models of cache and SPM:
- Panda et al. studied partitioning scalar and array variables into SPM to minimize execution time [10].
- Verma et al. studied an SPM allocation based on instruction cache behavior to reduce energy consumption [11].
- Cong et al. examined an adaptive hybrid cache that reconfigures part of the cache as a software-managed SPM to improve both performance and energy efficiency [12].
None of these works targets time predictability.
Slide 8: Architectural Time-Predictability Factor
Architectural Time-predictability Factor (ATF) [Ding et al., SIGBED Review, Nov. 2012]: given a processor P, an arbitrary real-time trace T, the actual execution time D(P, T), and the execution time S(P, T) statically predicted from the timing contract, ATF is defined as
$$ATF(P, T) = \frac{D(P, T)}{S(P, T)}, \qquad S(P, T) \ge D(P, T)$$
[Figure: the trace is instruction-scheduled to obtain the static execution time and run on the processor to obtain the dynamic execution time; ATF = dynamic execution time / static execution time.]
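A minimal sketch of computing ATF for one trace, assuming (as the slide's figure labels suggest) that ATF is the ratio D(P, T) / S(P, T) of the dynamic execution time to the statically predicted one, with S(P, T) >= D(P, T). The function name `atf` and the cycle counts below are hypothetical.

```python
def atf(dynamic_time: float, static_time: float) -> float:
    """Architectural time-predictability factor of one trace (assumed D / S form).

    dynamic_time: measured execution time D(P, T) on the processor.
    static_time:  execution time S(P, T) predicted statically from the timing contract.
    Returns a value in (0, 1]; 1.0 means the static prediction is exact.
    """
    if dynamic_time <= 0 or static_time <= 0:
        raise ValueError("execution times must be positive")
    if static_time < dynamic_time:
        raise ValueError("a safe static prediction cannot be below the dynamic time")
    return dynamic_time / static_time


print(atf(1000, 1000))  # 1.0: the static prediction matches the measured time exactly
print(atf(900, 1200))   # 0.75: the static analysis overestimates by a third
```

Under this reading, a higher ATF means the architecture's timing is easier to predict statically.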
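Slide 9: Three Hybrid SPM-Cache Architectures
The three hybrid architectures are IH-DC, IC-DH, and IH-DH, where I and D denote the instruction and data sides, and S, C, and H denote SPM, cache, and hybrid SPM-cache, respectively. The pure-SPM (IS-DS) and pure-cache (IC-DC) architectures serve as reference points.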
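On a side marked H, part of the on-chip memory is an SPM and part is a cache. The sketch below shows how a single access might be served in such a hybrid memory. Everything in it is a hypothetical illustration, not the authors' design: the SPM is assumed to occupy a fixed address range, all other addresses go through a small direct-mapped cache, and the latencies are invented values.

```python
# Hypothetical hybrid SPM-cache access path (illustrative values only).
SPM_BASE, SPM_SIZE = 0x0000, 64
HIT, MISS, SPM = 1, 10, 1


class HybridMemory:
    def __init__(self, cache_lines: int = 4, block_bytes: int = 16):
        self.cache_lines = cache_lines
        self.block_bytes = block_bytes
        self.tags = [None] * cache_lines  # None means the line is invalid

    def access(self, addr: int) -> int:
        # Objects the compiler placed in the SPM: fixed, statically known latency.
        if SPM_BASE <= addr < SPM_BASE + SPM_SIZE:
            return SPM
        # Everything else is cached: latency depends on earlier accesses.
        block = addr // self.block_bytes
        index, tag = block % self.cache_lines, block // self.cache_lines
        if self.tags[index] == tag:
            return HIT
        self.tags[index] = tag
        return MISS


mem = HybridMemory()
print([mem.access(a) for a in [0x10, 0x100, 0x10, 0x100]])  # [1, 10, 1, 1]
```

The design point this illustrates: accesses to SPM-resident objects stay analyzable regardless of history, while the cache still captures reuse among the remaining objects at run time.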
Slide 10: SPM Allocation Algorithm
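The slide's content is not recoverable from the transcript, so the allocation algorithm the authors used cannot be stated here. For reference only, SPM allocation is commonly formulated as a 0/1 knapsack problem: choose the objects (functions or variables) that maximize the accesses served by the SPM without exceeding its capacity. The function `allocate_spm` and the profile below are hypothetical.

```python
from typing import List, Tuple


def allocate_spm(objects: List[Tuple[str, int, int]], capacity: int) -> List[str]:
    """Generic 0/1-knapsack SPM allocation (not the authors' algorithm).

    objects: (name, size_bytes, profit) tuples, where profit is the number of
             accesses that would be served by the SPM instead of the cache.
    Returns the names of the chosen objects, in input order.
    """
    n = len(objects)
    # best[i][c]: maximum profit using the first i objects with c bytes available.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, profit) in enumerate(objects, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if size <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - size] + profit)
    # Walk the table backwards to recover which objects were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, size, _ = objects[i - 1]
            chosen.append(name)
            c -= size
    return list(reversed(chosen))


# Hypothetical profile: (object, size in bytes, access count).
profile = [("loop_body", 48, 900), ("table", 64, 700), ("init", 32, 50), ("buf", 40, 600)]
print(allocate_spm(profile, 128))  # ['loop_body', 'table']
```

Here the profit is an average-case access count; a WCET-oriented allocator would instead weight objects by their access counts along the worst-case path.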
Slide 11: Simulation
- Framework: the Trimaran compiler/simulator.
- Baseline processor: 2 integer ALUs, 2 floating-point ALUs, 1 branch predictor, 1 load/store unit, and a single level of on-chip memory.
- Benchmarks: 6 Mälardalen WCET benchmarks and 7 MediaBench benchmarks.
- On-chip memory size: 128 bytes for the real-time benchmarks; 16 KB for the MediaBench benchmarks.
- Cache configuration: the real-time benchmarks use a direct-mapped cache with 16-byte blocks; the MediaBench benchmarks use a 4-way set-associative LRU cache with 32-byte blocks.
- Hybrid SPM-cache configuration: two different partitions are evaluated while the total on-chip size N is fixed; each partition consists of an M-byte cache and an (N-M)-byte SPM.
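The partition and cache-geometry arithmetic can be sketched as follows. This rests on a hypothetical reading of the chart labels: in names such as "IH-DC 64" or "IH-DC 8K", the number is assumed to be M, the cache size in bytes, with the remaining N - M bytes forming the SPM. The names `CacheGeometry` and `partition` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CacheGeometry:
    size_bytes: int
    block_bytes: int
    ways: int

    @property
    def sets(self) -> int:
        return self.size_bytes // (self.block_bytes * self.ways)

    @property
    def offset_bits(self) -> int:
        return self.block_bytes.bit_length() - 1  # assumes a power-of-two block size

    @property
    def index_bits(self) -> int:
        return self.sets.bit_length() - 1  # assumes a power-of-two number of sets


def partition(total_bytes: int, cache_bytes: int):
    """Split a fixed on-chip budget N into an M-byte cache and an (N - M)-byte SPM."""
    if not 0 <= cache_bytes <= total_bytes:
        raise ValueError("cache size must be between 0 and the total size")
    return cache_bytes, total_bytes - cache_bytes


# Real-time benchmarks: N = 128 bytes, direct-mapped, 16-byte blocks.
for m in (64, 32):
    cache, spm = partition(128, m)
    g = CacheGeometry(cache, block_bytes=16, ways=1)
    print(f"M={m}: cache={cache} B ({g.sets} sets), SPM={spm} B")

# MediaBench: N = 16 KB, 4-way LRU, 32-byte blocks.
for m in (8 * 1024, 4 * 1024):
    cache, spm = partition(16 * 1024, m)
    g = CacheGeometry(cache, block_bytes=32, ways=4)
    print(f"M={m}: cache={cache} B ({g.sets} sets), SPM={spm} B")
```

Under this reading, a 32-byte direct-mapped cache with 16-byte blocks has only two sets, which shows how small the real-time configurations are.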
Slide 12: IH-DC Architecture (ATF)
[Figure: ATF of IS-DS, IC-DC, IH-DC 64, and IH-DC 32 on the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IS-DS, IC-DC, IH-DC 8K, and IH-DC 4K on the media benchmarks.]
Slide 13: IH-DC Architecture (Performance)
[Figure: normalized performance of IS-DS, IC-DC, IH-DC 64, and IH-DC 32 on the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IS-DS, IC-DC, IH-DC 8K, and IH-DC 4K on the media benchmarks.]
Slide 14: IC-DH Architecture (ATF)
[Figure: ATF of IS-DS, IC-DC, IC-DH 64, and IC-DH 32 on the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IS-DS, IC-DC, IC-DH 8K, and IC-DH 4K on the media benchmarks.]
Slide 15: IC-DH Architecture (Performance)
[Figure: normalized performance of IS-DS, IC-DC, IC-DH 64, and IC-DH 32 on the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IS-DS, IC-DC, IC-DH 8K, and IC-DH 4K on the media benchmarks.]
Slide 16: IH-DH Architecture (ATF)
[Figure: ATF of IC-DC, IH-DH 64, IH-DH 32, and IS-DS on the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IC-DC, IH-DH 8K, IH-DH 4K, and IS-DS on the media benchmarks.]
Slide 17: IH-DH Architecture (Performance)
[Figure: normalized performance of IC-DC, IH-DH 64, IH-DH 32, and IS-DS on the real-time benchmarks (crc, edn, lms, matmult, ndes, statemate), and of IC-DC, IH-DH 8K, IH-DH 4K, and IS-DS on the media benchmarks.]
On average, the performance of IH-DH is 1.9% better than that of the IC-DC architecture for the real-time benchmarks, and 4% better for the media benchmarks.
Slide 18: WCET Results
Slide 19: Conclusions
- The hybrid architectures offer a range of trade-offs between performance and time predictability across a wide range of benchmarks.
- IH-DH is the best hybrid on-chip memory architecture, achieving both good time predictability and high performance.
- IH-DH outperforms the pure cache-based architecture IC-DC for most benchmarks, showing that improving time predictability and improving performance need not always conflict.
Slide 20: Future Work
Further architectural exploration:
- Other hybrid on-chip memories, such as an instruction (data) SPM combined with a data (instruction) cache.
- Impact on energy consumption, in addition to performance and time predictability.
- Different SPM allocation algorithms for the various hybrid SPM-cache architectures.
- Hybrid on-chip memory architectures on a multicore platform, to balance time predictability and performance for multi-threaded and multi-programmed workloads.
Slide 21: References
[1] D. Kirk, "SMART (strategic memory allocation for real-time) cache design," in Proceedings of the IEEE Real-Time Systems Symposium (RTSS), 1989.
[2] D. Kirk and J. Strosnider, "SMART (strategic memory allocation for real-time) cache design using the MIPS R3000," in Proceedings of the IEEE Real-Time Systems Symposium (RTSS), 1990.
[3] F. Mueller, "Compiler support for software-based cache partitioning," SIGPLAN Notices, Vol. 30, Nov. 1995.
[4] I. Puaut and D. Decotigny, "Low-complexity algorithms for static cache locking in multitasking hard real-time systems," in Proceedings of the IEEE Real-Time Systems Symposium (RTSS), 2002.
[5] X. Vera, B. Lisper, and J. Xue, "Data cache locking for higher program predictability," ACM SIGMETRICS, 2003.
[6] V. Suhendra and T. Mitra, "Exploring locking & partitioning for predictable shared caches on multi-cores," in Proceedings of the 45th Design Automation Conference (DAC), 2008.
Slide 22: References (continued)
[7] S. Steinke et al., "Assigning program and data objects to scratchpad for energy reduction," in Proceedings of the Design, Automation and Test in Europe Conference (DATE), 2002.
[8] M. Kandemir, I. Kadayif, A. Choudhary, and J. Ramanujam, "Compiler-directed scratch pad memory optimization for embedded multiprocessors," IEEE Transactions on VLSI Systems, Vol. 12, No. 3, March 2004.
[9] J. F. Deverge and I. Puaut, "WCET-directed dynamic scratchpad memory allocation of data," in Proceedings of the 19th Euromicro Conference on Real-Time Systems (ECRTS), July 2007.
[10] P. Panda, N. Dutt, and A. Nicolau, "Efficient utilization of scratch-pad memory in embedded processor applications," in Proceedings of the European Design and Test Conference, March 1997.
[11] M. Verma, L. Wehmeyer, and P. Marwedel, "Cache-aware scratchpad allocation algorithm," in Proceedings of the Design, Automation and Test in Europe Conference (DATE), 2004.
[12] J. Cong, K. Gururaj, H. Huang, C. Liu, G. Reinman, and Y. Zou, "An energy-efficient adaptive hybrid cache," in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2011.
Slide 23: Acknowledgment
This work was funded in part by NSF grant CCF
Slide 24: Static Execution Time Analysis
More informationSelfish-LRU: Preemption-Aware Caching for Predictability and Performance
Selfish-LRU: Preemption-Aware Caching for Predictability and Performance Jan Reineke, Sebastian Altmeyer, Daniel Grund, Sebastian Hahn, and Claire Maiza Saarland University, Saarbrücken, Germany reineke@cs.uni-saarland.de,
More informationHybrid Heuristics for Optimizing Energy Consumption in Embedded Systems
Hybrid Heuristics for Optimizing Energy Consumption in Embedded Systems Maha IDRISSI AOUAD 1, René SCHOTT 2 and Olivier ZENDRA 1 1 INRIA Nancy - Grand Est / LORIA. 615, Rue du Jardin Botanique, 54600 Villers-Lès-Nancy,
More informationSYSTEMS MEMO #12. A Synchronization Library for ASIM. Beng-Hong Lim Laboratory for Computer Science.
ALEWIFE SYSTEMS MEMO #12 A Synchronization Library for ASIM Beng-Hong Lim (bhlim@masala.lcs.mit.edu) Laboratory for Computer Science Room NE43-633 January 9, 1992 Abstract This memo describes the functions
More informationCache contents selection for statically-locked instruction caches: an algorithm comparison
Cache contents selection for statically-locked instruction caches: an algorithm comparison Antonio Martí Campoy *, Isabelle Puaut **, Angel Perles Ivars *, Jose Vicente Busquets Mataix * * Computer Engineering
More informationEffective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management
International Journal of Computer Theory and Engineering, Vol., No., December 01 Effective Memory Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management Sultan Daud Khan, Member,
More informationPull based Migration of Real-Time Tasks in Multi-Core Processors
Pull based Migration of Real-Time Tasks in Multi-Core Processors 1. Problem Description The complexity of uniprocessor design attempting to extract instruction level parallelism has motivated the computer
More informationIBM's POWER5 Micro Processor Design and Methodology
IBM's POWER5 Micro Processor Design and Methodology Ron Kalla IBM Systems Group Outline POWER5 Overview Design Process Power POWER Server Roadmap 2001 POWER4 2002-3 POWER4+ 2004* POWER5 2005* POWER5+ 2006*
More informationLecture: SMT, Cache Hierarchies. Topics: memory dependence wrap-up, SMT processors, cache access basics (Sections B.1-B.3, 2.1)
Lecture: SMT, Cache Hierarchies Topics: memory dependence wrap-up, SMT processors, cache access basics (Sections B.1-B.3, 2.1) 1 Problem 3 Consider the following LSQ and when operands are available. Estimate
More informationBenchmarking the Memory Hierarchy of Modern GPUs
1 of 30 Benchmarking the Memory Hierarchy of Modern GPUs In 11th IFIP International Conference on Network and Parallel Computing Xinxin Mei, Kaiyong Zhao, Chengjian Liu, Xiaowen Chu CS Department, Hong
More informationVirtual Machines. 2 Disco: Running Commodity Operating Systems on Scalable Multiprocessors([1])
EE392C: Advanced Topics in Computer Architecture Lecture #10 Polymorphic Processors Stanford University Thursday, 8 May 2003 Virtual Machines Lecture #10: Thursday, 1 May 2003 Lecturer: Jayanth Gummaraju,
More informationLecture: SMT, Cache Hierarchies. Topics: memory dependence wrap-up, SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.
Lecture: SMT, Cache Hierarchies Topics: memory dependence wrap-up, SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1) 1 Problem 0 Consider the following LSQ and when operands are
More information