Efficient Computing in Cyber-Physical Systems

Size: px
Start display at page:

Download "Efficient Computing in Cyber-Physical Systems"

Transcription

1 12 Efficient Computing in Cyber-Physical Systems Peter Marwedel TU Dortmund (Germany) Informatik 12 Springer, /06/20

2 Cyber-physical systems and embedded systems Embedded systems (ES): information processing embedded into a larger product [Peter Marwedel, 2003] Cyber-physical systems (CPS): integrations of computation with physical processes [Edward Lee, 2006]. CPS = ES + physical environment Information processing Embedded systems ("small computers") Physics - 2 -

3 Outline Scope, opportunities & challenges (Partial) solutions to challenges Timing predictability - WCET-aware compilation: unrolling, locking - register allocation, scratch pads Energy efficiency - scratch pads: (non)-overlaying, compile/run-time allocation Tradeoffs between multiple objectives Summary - PAMONO bio sensor - automatic parallelization: energy efficiency vs. performance - approximate computations: timeliness vs. exactness - 3 -

4 Opportunities Transportation Automotive Avionics Rail Medical Hospitals Support of handicapped Patient information Support for the elderly ~2000 : Microsoft Cliparts + P. Marwedel, 2011, - 4 -

5 Opportunities (2) (Urban) living Telecommunication Consumer electronics Smart home Logistics Smart grid Public safety Structural health monitoring Robotics Military systems Graphics: P. Marwedel, 2011, Microsoft - 5 -

6 Challenge (1): timing predictability Timing: The system must react within a time interval dictated by the environment [Adopted from Bergé] execute t Consequences frequently underestimated Always! Not subject to statistical arguments (except possibly extremely small probabilities) Graphics: Microsoft Cliparts - 6 -

7 Challenges (2): Energy/power efficiency Need for energy/power efficient processing ( sweet corner ) Graphics: P. Marwedel, H. De Man, Microsoft - 7 -

8 Challenge(3): Many more objectives exactness of results, QoS average case delay safety security thermal behavior reliability EMC characteristics radiation hardness cost, size weight environmental friendliness extendability Threatened by increased openness Graphics: P. Marwedel, 2011; Microsoft - 8 -

9 Challenges (4) Analog/digital interface (quantization, ) Consequences of networked embedded systems digital-analog signal e.g. maintain level of local control - 9 -

10 Outline Scope, opportunities & challenges (Partial) solutions to challenges Timing predictability - WCET-aware compilation: unrolling, locking - register allocation, scratch pads Energy efficiency - scratch pads: (non)-overlaying, compile/run-time allocation Tradeoffs between multiple objectives Summary - PAMONO bio sensor - automatic parallelization: energy efficiency vs. performance - approximate computations: timeliness vs. exactness

11 Let s look at timing! Definition of worst case execution time: disrtribution of times BCET EST BCET worst-case guarantee worst-case performance possible execution times timing predictability WCET WCET EST Time constraint time WCET EST must be 1. safe (i.e. WCET) and 2. tight (WCET EST -WCET WCET EST ) Graphics: adopted from R. Wilhelm + Microsoft Cliparts

12 Current Trial-and-Error Based Development 1. Specification of CPS/ES software 2. Generation of Code (ANSI-C or similar) 3. Compilation for given target processor 4. Execution and/or simulation of machine code, using a (e.g. random) set of input data 5. Measurement-based computation of estimated worst case execution time (WCET meas ) 6. Adding safety margin (e.g. 20%) on top of WCET meas and call this WCET hypo 7. If WCET hypo > real-time constraint: change some detail, go back to 1 or

13 Structure of WCC (WCET-aware C-compiler) CRL2LLIR LLIR2CRL

14 Challenges for WCET EST -Minimization Worst-Case Execution Path (WCEP) WCET EST of a program = Length of longest execution path (WCEP) in that program WCET EST -Minimization: Reduction of the longest path Other optimizations do not result in a reduction of WCET EST Optimizations need to know the WCEP

15 WCET-oriented optimizations Extended loop analysis (CGO 09) Instruction cache locking (CODES/ISSS 07, CGO 12) Cache partitioning (WCET-Workshop 09) Procedure cloning (WCET-WS 07, CODES 07, SCOPES 08) Procedure/code positioning (ECRTS 08, CASES 11 (2x)) Function inlining (SMART 09, SMART 10) Loop unswitching/invariant paths (SCOPES 09) Loop unrolling (ECRTS 09) Register allocation (DAC 09, ECRTS 11)) Scratchpad optimization (DAC 09) Extension towards multi-objective optimization (RTSS 08) Superblock-based optimizations (ICESS 10) Surveys (Springer 10, Springer 12)

16 WCET-aware Loop Unrolling via Back-annotation WCET-information available at assembly level Unrolling to be applied at internal representation of source code Solution: Back-annotation: Experimental WCET-aware compiler WCC allows copying information: assembly code source code WCET data Back- Annotation High-Level IR (Source Code) Mem. Spec. Code generation Assembly code size Amount of spill code Low-Level IR (Assembly Code) Memory architecture info available

17 Results for unrolling: WCET Relative WCET [%] kbyte 1 kbyte 2 kbyte 4 kbyte 8 kbyte 16 kbyte 16 I-Cache Size STANDARD 100%: Avg. WCET for all benchmarks with O3 & no unrolling WCET reduction between 10.2% and 15.4% WCET-driven Unrolling outperforms std. unrolling by 13.7%

18 Relative WCET EST with I-Cache Locking 5 Benchmarks/ARM920T/Postpass-Opt Rel. WCET_EST [%] 110% 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% ADPCM G723 Statemate Compress MPEG2 H. Falk, Heiko, S. Plazar, H. Theiling: Compile Time Decided Instruction Cache Locking Using Worst-Case Execution Paths, Intern. Conf. on Hardware/Software Codesign and System Synthesis (CODES+ISSS), % (ARM920T) Cache Size [bytes] [S. Plazar et al.]

19 Register Allocation H. Falk, Heiko: WCET-aware Register Allocation based on Graph Coloring, 46th Design Automation Conference (DAC), % 100% = WCET EST using Standard Graph Coloring (highest degree) 100% 90% 80% Relative WCETEST [%] (Optimization Level -O3) 70% 60% 50% 40% 30% 20% 10% 0% adpcm_verify cjpeg_transupp compress crc dijkstra duff edge_detect edn epic expint fdct fft_1024 fft_256 fir fir2dim gsm gsm_encode h263 h264dec_block h264dec_macro iir_4_64 iir_biquad_n jfdctint latnrm_32_64 lmsfir_8_1 lmsfir_32_64 lpc ludcmp matmult matrix2_fixed matrix2_float md5 minver mult_10_10 mult_4_4 ndes prime qmf_receive qmf_transmit qurt rijndael_enc select sha spectral startup v32_benc Average

20 Scratch pad memories (SPM): Fast, energy-efficient, timing-predictable SPMs are small, physically separate memories mapped into the address space; Selection is by an appropriate address decoder (simple!) select Address space SPM 0 FFF.. Example scratch pad memory ARM7TDMI cores, wellknown for low power consumption Small; no tag memory

21 WCET EST -aware scratch pad allocation Setup Bosch Democar: Runnable IgnitionSWCSync WCET EST -aware scratch pad allocation of program code by WCC, including fully automated WCET EST analyses using ait P. Marwedel, 2011 WCET EST reduced to about 50%, compared to gcc. Property of the PREDATOR consortium. Confidential

22 Outline Scope, opportunities & challenges (Partial) solutions to challenges Timing predictability - WCET-aware compilation: unrolling, locking - register allocation, scratch pads Energy efficiency - scratch pads: (non)-overlaying, compile/run-time allocation Tradeoffs between multiple objectives Summary - PAMONO bio sensor - automatic parallelization: energy efficiency vs. performance - approximate computations: timeliness vs. exactness

23 Why care about energy efficiency? Execution platform E.g. Global warming Cost of energy Increasing performance Problems with cooling, avoiding hot spots Avoiding high currents & metal migration Reliability Energy a very scarce resource Plugged Factory Relevant during use? Uncharged periods Car Unplugged Sensor Power Graphics: P. Marwedel,

24 Where is the energy consumed? - Consumer portable systems - According to International Technology Roadmap for Semiconductors (ITRS), 2010 update, [ Violation of W constraint for small mobiles. It not just the logic, don t ignore memory! ITRS,

25 Energy consumption of memories CACTI: Scratchpad (SRAM) vs. DRAM (DDR2): DRAM High Performance - 16 banks SPM Access time (ns) K SRAM 8K 32K 128K 512K 2M 8M 32M 128M 512M 2G High Performance - 16 banks SPM Energy (nj) - read High Performance - 16 banks DDR Access time (ns) High Performance - 16 banks DDR Energy (nj) - read 16 bit read; size in bytes; 65 nm for SRAM, 80 nm for DRAM Source: Olivera Jovanovic, TU Dortmund,

26 Migration of data & instructions, global optimization model (TU Dortmund) Example: main memory for i.{ } for j..{ } while... repeat call... array... Which memory object (array, loop, etc.) to be stored in SPM? Non-overlaying ( static ) allocation: Gain g k and size s k for each object k. Maximise gain G = g k, respecting size of SPM SSP s k. scratch pad memory, capacity SSP? array Solution: knapsack algorithm. Overlaying ( dynamic ) allocation: processor int... Moving objects back and forth

27 Reduction in energy and average run-time for migrating functions and variables Cycles [x100] Energy [µj] Multi_sort (mix of sort algorithms) Measured processor / external memory energy + CACTI values for SPM (combined model) Numbers will change with technology, algorithms remain unchanged

28 Non-overlaying allocation problematic for multiple hot spots Overlaying allocation Memory CPU Memory SPM Effectively results in a kind of compilercontrolled overlays for SPM Address assignment within SPM required

29 Overlaying allocation by Verma et al. (1) B1 DEF A Based on control flow graph. B2 MOD A B3 B5 USE C C USE A B4 B6 USE C B7 B8 USE A [M.Verma, P.Marwedel: Dynamic Overlay of Scratchpad Memory for Energy Minimization, ISSS, 2004]

30 Overlaying allocation by Verma et al. (2) B1 B2 DEF A B9 SPILL_STORE(A); SPILL_LOAD(C); MOD A USE A B3 B4 B7 B5 B6 USE C C USE C B10 Global set of ILP equations reflects cost/benefit relations of potential copy points SPILL_LOAD(A); B8 USE A Code handled like data

31 Runtime/energy reduction with respect to non-overlaying ( static ) allocation

32 Dynamic set of multiple applications Compile-time partitioning of SPM not feasible Introduction of SPM-manager Runtime decisions, but compile-time supported CPU MEM SPM Address space: SPM App. 1 App. 2 App. n? SPM Manager SPM App. 3 App. 2 App. 1 t [R. Pyka, Ch. Faßbach, M. Verma, H. Falk, P. Marwedel: Operating system integrated energy aware scratchpad allocation strategies for multi-process applications, SCOPES, 2007]

33 Comparison of SPMM to Caches for SORT Baseline: Main memory only SPMM peak energy reduction by 83% at 4k Bytes scratchpad SPM Size Δ 4-way ,81% ,35% ,39% ,64% ,73%

34 Outline Scope, opportunities & challenges (Partial) solutions to challenges Timing predictability - WCET-aware compilation: unrolling, locking - register allocation, scratch pads Energy efficiency - scratch pads: (non)-overlaying, compile/run-time allocation Tradeoffs between multiple objectives Summary - PAMONO bio sensor - automatic parallelization: energy efficiency vs. performance - approximate computations: timeliness vs. exactness

35 Energy consumption of (bio-) virus detection Detection of nano objects using plasmon effect (Optical effect of objects wavelength of light) Nano object 0,06 Reflectivity 0,04 Prisma Sensor Image sequence Signal ( I/I) 0,02 0,00-0,02 Laser CCD -0,04 55,0 55,2 55,4 55,6 55,8 56,0 56,2 incidence angle, grad

36 Timeliness in (bio-) virus detection (2) Target: 30 frames per second at 1024x256 pixels Frame Rate (fps) Image Size 1024x x x x1024 CPU: i CPU: i7-620m (mobile) GPU: 3100M (mobile) GPU: GTX GPU: GTX 560Ti Low-end Mobile GPU meets realtime constraint up to 1024x256 pixels Faster GPUs give room for larger usable sensor area

37 Timeliness in (bio-) virus detection (3) current clamp Energy per frame CPU 3.26 J 5.84 J J Reduced to Energy per frame GPU 0.93 J 1.56 J 2.76 J avg 27% C. Timm, A. Gelenberg, P. Marwedel, F. Weichert: Energy Considerations within the Integration of General Purpose GPUs in Embedded Systems. Intern. Conf. on Advances in Distributed and Parallel Computing, 2010 C. Timm, F. Weichert, P. Marwedel, H. Müller: Design Space Exploration Towards a Realtime and Energy-Aware GPGPU-based Analysis of Biosensor Data. Computer Science - Research and Development, ENA-HPC,

38 Multi-objective based parallelization of sequential programs D. Cordes, P. Marwedel: Multi-Objective Aware Extraction of Task-Level Parallelism Using Genetic Algorithms, DATE

39 Tradeoff between timeliness and exactness Traditional error correction requires resources, including time Conflicting with timing requirements of real-time systems ➀ ➁ ➂

40 Tradeoff between timeliness and exactness (2) In some cases, errors have no effect In some cases, timing requirements are dominating & error correction may be compromised In other cases, error correction requirements are dominating & timing requirements may be compromised Register 0 Register 1 Register 2 Register 3 unused variable i unused variable j No effect Imprecise output System crash Classification requires application knowledge

41 Analysis of H.264 video decoder (~ 3500 lines of C code) Fraction of memory space Resolution Memory size of reliable data Memory size of unreliable data 176 x kb (55%) 74 kb (45%) 352 x kb (43%) 297 kb (57%) 1280 x kb (37%) 2700 kb (63%) Impact of uncorrected errors: Injection into unreliable memory 240 faults / sec 80 faults / sec Average PSNR of frames with errors db db Deteriorated exactness, timeliness maintained!

42 Summary Scope, opportunities & challenges (Partial) solutions to challenges Timing predictability - WCET-aware compilation: unrolling, locking - register allocation, scratch pads Energy efficiency - scratch pads: (non)-overlaying, compile/run-time allocation Tradeoffs between multiple objectives Summary - PAMONO bio sensor - automatic parallelization: energy efficiency vs. performance - approximate computations: timeliness vs. exactness

Memory-architecture aware compilation

Memory-architecture aware compilation - 1- ARTIST2 Summer School 2008 in Europe Autrans (near Grenoble), France September 8-12, 8 2008 Memory-architecture aware compilation Lecturers: Peter Marwedel, Heiko Falk Informatik 12 TU Dortmund, Germany

More information

Optimizations - Compilation for Embedded Processors -

Optimizations - Compilation for Embedded Processors - 12 Optimizations - Compilation for Embedded Processors - Peter Marwedel TU Dortmund Informatik 12 Germany Graphics: Alexandra Nolte, Gesine Marwedel, 23 211 年 1 月 12 日 These slides use Microsoft clip arts.

More information

Optimizations - Compilation for Embedded Processors -

Optimizations - Compilation for Embedded Processors - Springer, 2010 12 Optimizations - Compilation for Embedded Processors - Peter Marwedel TU Dortmund Informatik 12 Germany 2014 年 01 月 17 日 These slides use Microsoft clip arts. Microsoft copyright restrictions

More information

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications University of Dortmund Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications Robert Pyka * Christoph Faßbach * Manish Verma + Heiko Falk * Peter Marwedel

More information

WCET-Aware C Compiler: WCC

WCET-Aware C Compiler: WCC 12 WCET-Aware C Compiler: WCC Jian-Jia Chen (slides are based on Prof. Heiko Falk) TU Dortmund, Informatik 12 2015 年 05 月 05 日 These slides use Microsoft clip arts. Microsoft copyright restrictions apply.

More information

fakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund, Informatik 12 Germany 2008/06/25

fakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund, Informatik 12 Germany 2008/06/25 12 Optimizations Peter Marwedel TU Dortmund, Informatik 12 Germany 2008/06/25 Application Knowledge Structure of this course New clustering 3: Embedded System HW 2: Specifications 5: Application mapping:

More information

Hybrid SPM-Cache Architectures to Achieve High Time Predictability and Performance

Hybrid SPM-Cache Architectures to Achieve High Time Predictability and Performance Hybrid SPM-Cache Architectures to Achieve High Time Predictability and Performance Wei Zhang and Yiqiang Ding Department of Electrical and Computer Engineering Virginia Commonwealth University {wzhang4,ding4}@vcu.edu

More information

Evaluation and Validation

Evaluation and Validation Springer, 2010 Evaluation and Validation Peter Marwedel TU Dortmund, Informatik 12 Germany 2013 年 12 月 02 日 These slides use Microsoft clip arts. Microsoft copyright restrictions apply. Application Knowledge

More information

Best Practice for Caching of Single-Path Code

Best Practice for Caching of Single-Path Code Best Practice for Caching of Single-Path Code Martin Schoeberl, Bekim Cilku, Daniel Prokesch, and Peter Puschner Technical University of Denmark Vienna University of Technology 1 Context n Real-time systems

More information

Managing Hybrid On-chip Scratchpad and Cache Memories for Multi-tasking Embedded Systems

Managing Hybrid On-chip Scratchpad and Cache Memories for Multi-tasking Embedded Systems Managing Hybrid On-chip Scratchpad and Cache Memories for Multi-tasking Embedded Systems Zimeng Zhou, Lei Ju, Zhiping Jia, Xin Li School of Computer Science and Technology Shandong University, China Outline

More information

Combining Worst-Case Timing Models, Loop Unrolling, and Static Loop Analysis for WCET Minimization

Combining Worst-Case Timing Models, Loop Unrolling, and Static Loop Analysis for WCET Minimization Combining Worst-Case Timing Models, Loop Unrolling, and Static Loop Analysis for WCET Minimization Paul Lokuciejewski, Peter Marwedel Computer Science 12 TU Dortmund University D-44221 Dortmund, Germany

More information

Evaluation and Validation

Evaluation and Validation 12 Evaluation and Validation Peter Marwedel TU Dortmund, Informatik 12 Germany Graphics: Alexandra Nolte, Gesine Marwedel, 2003 2010 年 12 月 05 日 These slides use Microsoft clip arts. Microsoft copyright

More information

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern Introduction WCET of program ILP Formulation Requirement SPM allocation for code SPM allocation for data Conclusion

More information

Cache-Aware Instruction SPM Allocation for Hard Real-Time Systems

Cache-Aware Instruction SPM Allocation for Hard Real-Time Systems Cache-Aware Instruction SPM Allocation for Hard Real-Time Systems Arno Luppold Institute of Embedded Systems Hamburg University of Technology Germany Arno.Luppold@tuhh.de Christina Kittsteiner Institute

More information

Optimizations. - Compilation for Embedded Processors - Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/13. technische universität dortmund

Optimizations. - Compilation for Embedded Processors - Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/13. technische universität dortmund 12 Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/13 Graphics: Alexandra Nolte, Gesine Marwedel, 2003 - Compilation for Embedded Processors - Structure of this course Application

More information

Cache-Aware Scratchpad Allocation Algorithm

Cache-Aware Scratchpad Allocation Algorithm 1530-1591/04 $20.00 (c) 2004 IEEE -Aware Scratchpad Allocation Manish Verma, Lars Wehmeyer, Peter Marwedel Department of Computer Science XII University of Dortmund 44225 Dortmund, Germany {Manish.Verma,

More information

Evaluating MMX Technology Using DSP and Multimedia Applications

Evaluating MMX Technology Using DSP and Multimedia Applications Evaluating MMX Technology Using DSP and Multimedia Applications Ravi Bhargava * Lizy K. John * Brian L. Evans Ramesh Radhakrishnan * November 22, 1999 The University of Texas at Austin Department of Electrical

More information

Scratchpad memory vs Caches - Performance and Predictability comparison

Scratchpad memory vs Caches - Performance and Predictability comparison Scratchpad memory vs Caches - Performance and Predictability comparison David Langguth langguth@rhrk.uni-kl.de Abstract While caches are simple to use due to their transparency to programmer and compiler,

More information

4/6/2011. Informally, scheduling is. Informally, scheduling is. More precisely, Periodic and Aperiodic. Periodic Task. Periodic Task (Contd.

4/6/2011. Informally, scheduling is. Informally, scheduling is. More precisely, Periodic and Aperiodic. Periodic Task. Periodic Task (Contd. So far in CS4271 Functionality analysis Modeling, Model Checking Timing Analysis Software level WCET analysis System level Scheduling methods Today! erformance Validation Systems CS 4271 Lecture 10 Abhik

More information

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck

More information

Embedded Systems 1. Introduction

Embedded Systems 1. Introduction Embedded Systems 1. Introduction Lothar Thiele 1-1 Organization WWW: http://www.tik.ee.ethz.ch/tik/education/lectures/es/ Lecture: Lothar Thiele, thiele@ethz.ch Coordination: Rehan Ahmed, rehan.ahmed@tik.ee.ethz.ch

More information

Improving Nanoobject Detection in Optical Biosensor Data

Improving Nanoobject Detection in Optical Biosensor Data Improving Nanoobject Detection in Optical Biosensor Data Constantin TIMM Department of Computer Science XII, Technical University of Dortmund, Otto-Hahn-Str. 16 Pascal LIBUSCHEWSKI Dominic SIEDHOFF Frank

More information

HARDWARE SOFTWARE CO-DESIGN

HARDWARE SOFTWARE CO-DESIGN HARDWARE SOFTWARE CO-DESIGN BITS Pilani Dubai Campus Dr Jagadish Nayak Introduction BITS Pilani Dubai Campus What is this? Hardware/Software codesign investigates the concurrent design of hardware and

More information

Multi-Objective Exploration of Compiler Optimizations for Real-Time Systems¹

Multi-Objective Exploration of Compiler Optimizations for Real-Time Systems¹ 2010 13th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Multi-Objective Exploration of Compiler Optimizations for Real-Time Systems¹ Paul Lokuciejewski,

More information

System Design and Methodology/ Embedded Systems Design (Modeling and Design of Embedded Systems)

System Design and Methodology/ Embedded Systems Design (Modeling and Design of Embedded Systems) Design&Methodologies Fö 1&2-1 Design&Methodologies Fö 1&2-2 Course Information Design and Methodology/ Embedded s Design (Modeling and Design of Embedded s) TDTS07/TDDI08 Web page: http://www.ida.liu.se/~tdts07

More information

TuBound A tool for Worst-Case Execution Time Analysis

TuBound A tool for Worst-Case Execution Time Analysis TuBound A Conceptually New Tool for Worst-Case Execution Time Analysis 1 Adrian Prantl, Markus Schordan, Jens Knoop {adrian,markus,knoop}@complang.tuwien.ac.at Institut für Computersprachen Technische

More information

Compiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems

Compiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems Compiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems Florin Balasa American University in Cairo Noha Abuaesh American University in Cairo Ilie I. Luican Microsoft Inc., USA Cristian

More information

Shared Cache Aware Task Mapping for WCRT Minimization

Shared Cache Aware Task Mapping for WCRT Minimization Shared Cache Aware Task Mapping for WCRT Minimization Huping Ding & Tulika Mitra School of Computing, National University of Singapore Yun Liang Center for Energy-efficient Computing and Applications,

More information

Complementing Software Pipelining with Software Thread Integration

Complementing Software Pipelining with Software Thread Integration Complementing Software Pipelining with Software Thread Integration LCTES 05 - June 16, 2005 Won So and Alexander G. Dean Center for Embedded System Research Dept. of ECE, North Carolina State University

More information

Loop Bound Analysis based on a Combination of Program Slicing, Abstract Interpretation, and Invariant Analysis

Loop Bound Analysis based on a Combination of Program Slicing, Abstract Interpretation, and Invariant Analysis Loop Bound Analysis based on a Combination of Program Slicing, Abstract Interpretation, and Invariant Analysis Andreas Ermedahl, Christer Sandberg, Jan Gustafsson, Stefan Bygde, and Björn Lisper Department

More information

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy

More information

Power Efficient Instruction Caches for Embedded Systems

Power Efficient Instruction Caches for Embedded Systems Power Efficient Instruction Caches for Embedded Systems Dinesh C. Suresh, Walid A. Najjar, and Jun Yang Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA

More information

Tag der mündlichen Prüfung: 03. Juni 2004 Dekan / Dekanin: Prof. Dr. Bernhard Steffen Gutachter / Gutachterinnen: Prof. Dr. Francky Catthoor, Prof. Dr

Tag der mündlichen Prüfung: 03. Juni 2004 Dekan / Dekanin: Prof. Dr. Bernhard Steffen Gutachter / Gutachterinnen: Prof. Dr. Francky Catthoor, Prof. Dr Source Code Optimization Techniques for Data Flow Dominated Embedded Software Dissertation zur Erlangung des Grades eines Doktors der Naturwissenschaften der Universität Dortmund am Fachbereich Informatik

More information

ait: WORST-CASE EXECUTION TIME PREDICTION BY STATIC PROGRAM ANALYSIS

ait: WORST-CASE EXECUTION TIME PREDICTION BY STATIC PROGRAM ANALYSIS ait: WORST-CASE EXECUTION TIME PREDICTION BY STATIC PROGRAM ANALYSIS Christian Ferdinand and Reinhold Heckmann AbsInt Angewandte Informatik GmbH, Stuhlsatzenhausweg 69, D-66123 Saarbrucken, Germany info@absint.com

More information

Mapping of Applications to Multi-Processor Systems

Mapping of Applications to Multi-Processor Systems Springer, 2010 Mapping of Applications to Multi-Processor Systems Peter Marwedel TU Dortmund, Informatik 12 Germany 2014 年 01 月 17 日 These slides use Microsoft clip arts. Microsoft copyright restrictions

More information

Abstract. 1. Introduction

Abstract. 1. Introduction THE MÄLARDALEN WCET BENCHMARKS: PAST, PRESENT AND FUTURE Jan Gustafsson, Adam Betts, Andreas Ermedahl, and Björn Lisper School of Innovation, Design and Engineering, Mälardalen University Box 883, S-721

More information

Optimizations - Compilation for Embedded Processors -

Optimizations - Compilation for Embedded Processors - 12 Optimizations - Compilation for Embedded Processors - Jian-Jia Chen (Slides are based on Peter Marwedel) TU Dortmund Informatik 12 Germany 2015 年 01 月 13 日 These slides use Microsoft clip arts. Microsoft

More information

technische universität dortmund fakultät für informatik informatik 12 Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/10

technische universität dortmund fakultät für informatik informatik 12 Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/10 12 Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/10 Graphics: Alexandra Nolte, Gesine Marwedel, 2003 Optimizations Structure of this course Application Knowledge 2: Specification Design repository

More information

Program design and analysis

Program design and analysis Program design and analysis Optimizing for execution time. Optimizing for energy/power. Optimizing for program size. Motivation Embedded systems must often meet deadlines. Faster may not be fast enough.

More information

Techniques for Optimizing Performance and Energy Consumption: Results of a Case Study on an ARM9 Platform

Techniques for Optimizing Performance and Energy Consumption: Results of a Case Study on an ARM9 Platform Techniques for Optimizing Performance and Energy Consumption: Results of a Case Study on an ARM9 Platform BL Standard IC s, PL Microcontrollers October 2007 Outline LPC3180 Description What makes this

More information

Architectural Musings

Architectural Musings Architectural Musings Rethinking Computer Systems Architecture & Evaluation Christopher Vick cvick@qti.qualcomm.com March 23, 2014 1 Introduction Vision Talk How should we analyze, reason about and evaluate

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

Predictable paging in real-time systems: an ILP formulation

Predictable paging in real-time systems: an ILP formulation Predictable paging in real-time systems: an ILP formulation Damien Hardy Isabelle Puaut Université Européenne de Bretagne / IRISA, Rennes, France Abstract Conventionally, the use of virtual memory in real-time

More information

D 5.7 Report on Compiler Evaluation

D 5.7 Report on Compiler Evaluation Project Number 288008 D 5.7 Report on Compiler Evaluation Version 1.0 Final Public Distribution Vienna University of Technology, Technical University of Denmark Project Partners: AbsInt Angewandte Informatik,

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

-- the Timing Problem & Possible Solutions

-- the Timing Problem & Possible Solutions ARTIST Summer School in Europe 2010 Autrans (near Grenoble), France September 5-10, 2010 Towards Real-Time Applications on Multicore -- the Timing Problem & Possible Solutions Wang Yi Uppsala University,

More information

Embedded Systems. For other titles published in this series, go to

Embedded Systems. For other titles published in this series, go to Embedded Systems Series Editors Nikil D. Dutt, Department of Computer Science, Donald Bren School of Information and Computer Sciences, University of California, Irvine, Zot Code 3435, Irvine, CA 92697-3435,

More information

RTL2GDS Low Power Convergence for Chip-Package-System Designs. Aveek Sarkar VP, Technology Evangelism, ANSYS Inc.

RTL2GDS Low Power Convergence for Chip-Package-System Designs. Aveek Sarkar VP, Technology Evangelism, ANSYS Inc. RTL2GDS Low Power Convergence for Chip-Package-System Designs Aveek Sarkar VP, Technology Evangelism, ANSYS Inc. Electronics Design Complexities Antenna Design and Placement Chip Low Power and Thermal

More information

Stack Frames Placement in Scratch-Pad Memory for Energy Reduction of Multi-task Applications

Stack Frames Placement in Scratch-Pad Memory for Energy Reduction of Multi-task Applications Stack Frames Placement in Scratch-Pad Memory for Energy Reduction of Multi-task Applications LOVIC GAUTHIER 1, TOHRU ISHIHARA 1, AND HIROAKI TAKADA 2 1 System LSI Research Center, 3rd Floor, Institute

More information

Computer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James

Computer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James Computer Systems Architecture I CSE 560M Lecture 18 Guest Lecturer: Shakir James Plan for Today Announcements No class meeting on Monday, meet in project groups Project demos < 2 weeks, Nov 23 rd Questions

More information

Embedded Systems: Hardware Components (part I) Todor Stefanov

Embedded Systems: Hardware Components (part I) Todor Stefanov Embedded Systems: Hardware Components (part I) Todor Stefanov Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System

More information

Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop

Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop http://cscads.rice.edu/ Discussion and Feedback CScADS Autotuning 07 Top Priority Questions for Discussion

More information

Information Processing. Peter Marwedel Informatik 12 Univ. Dortmund Germany

Information Processing. Peter Marwedel Informatik 12 Univ. Dortmund Germany Information Processing Peter Marwedel Informatik 12 Univ. Dortmund Germany Embedded System Hardware Embedded system hardware is frequently used in a loop ( hardware in a loop ): actuators - 2 - Processing

More information

A Stack Cache for Real-Time Systems

A Stack Cache for Real-Time Systems A Stack Cache for Real-Time Systems Martin Schoeberl and Carsten Nielsen Department of Applied Mathematics and Computer Science Technical University of Denmark Email: masca@imm.dtu.dk, carstenlau.nielsen@uzh.ch

More information

Bus-Aware Static Instruction SPM Allocation for Multicore Hard Real-Time Systems

Bus-Aware Static Instruction SPM Allocation for Multicore Hard Real-Time Systems Bus-Aware Static Instruction SPM Allocation for Multicore Hard Real-Time Systems Dominic Oehlert 1, Arno Luppold 2, and Heiko Falk 3 1 Hamburg University of Technology, Hamburg, Germany dominic.oehlert@tuhh.de

More information

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors Murali Jayapala 1, Francisco Barat 1, Pieter Op de Beeck 1, Francky Catthoor 2, Geert Deconinck 1 and Henk Corporaal

More information

MARACAS: A Real-Time Multicore VCPU Scheduling Framework

MARACAS: A Real-Time Multicore VCPU Scheduling Framework : A Real-Time Framework Computer Science Department Boston University Overview 1 2 3 4 5 6 7 Motivation platforms are gaining popularity in embedded and real-time systems concurrent workload support less

More information

Strato and Strato OS. Justin Zhang Senior Applications Engineering Manager. Your new weapon for verification challenge. Nov 2017

Strato and Strato OS. Justin Zhang Senior Applications Engineering Manager. Your new weapon for verification challenge. Nov 2017 Strato and Strato OS Your new weapon for verification challenge Justin Zhang Senior Applications Engineering Manager Nov 2017 Emulation Market Evolution Emulation moved to Virtualization with Veloce2 Data

More information

Understanding multimedia application chacteristics for designing programmable media processors

Understanding multimedia application chacteristics for designing programmable media processors Understanding multimedia application chacteristics for designing programmable media processors Jason Fritts Jason Fritts, Wayne Wolf, and Bede Liu SPIE Media Processors '99 January 28, 1999 Why programmable

More information

A Software Solution for Dynamic Stack Management on Scratch Pad Memory

A Software Solution for Dynamic Stack Management on Scratch Pad Memory A Software Solution for Dynamic Stack Management on Scratch Pad Memory Arun Kannan, Aviral Shrivastava, Amit Pabalkar and Jong-eun Lee Department of Computer Science and Engineering Arizona State University,

More information

Processors, Performance, and Profiling

Processors, Performance, and Profiling Processors, Performance, and Profiling Architecture 101: 5-Stage Pipeline Fetch Decode Execute Memory Write-Back Registers PC FP ALU Memory Architecture 101 1. Fetch instruction from memory. 2. Decode

More information

Hardware/ Software Partitioning

Hardware/ Software Partitioning Hardware/ Software Partitioning Peter Marwedel TU Dortmund, Informatik 12 Germany Marwedel, 2003 Graphics: Alexandra Nolte, Gesine 2011 年 12 月 09 日 These slides use Microsoft clip arts. Microsoft copyright

More information

Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems

Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems Ayse K. Coskun Electrical and Computer Engineering Department Boston University http://people.bu.edu/acoskun

More information

Hardware-Software Codesign. 9. Worst Case Execution Time Analysis

Hardware-Software Codesign. 9. Worst Case Execution Time Analysis Hardware-Software Codesign 9. Worst Case Execution Time Analysis Lothar Thiele 9-1 System Design Specification System Synthesis Estimation SW-Compilation Intellectual Prop. Code Instruction Set HW-Synthesis

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

Embedded System Hardware - Processing -

Embedded System Hardware - Processing - Springer, 2010 12 Embedded System Hardware - Processing - Peter Marwedel Informatik 12 TU Dortmund Germany 2013 年 11 月 12 日 These slides use Microsoft clip arts. Microsoft copyright restrictions apply.

More information

A Dynamic Instruction Scratchpad Memory for Embedded Processors Managed by Hardware

A Dynamic Instruction Scratchpad Memory for Embedded Processors Managed by Hardware A Dynamic Instruction Scratchpad Memory for Embedded Processors Managed by Hardware Stefan Metzlaff 1, Irakli Guliashvili 1,SaschaUhrig 2,andTheoUngerer 1 1 Department of Computer Science, University of

More information

Caches in Real-Time Systems. Instruction Cache vs. Data Cache

Caches in Real-Time Systems. Instruction Cache vs. Data Cache Caches in Real-Time Systems [Xavier Vera, Bjorn Lisper, Jingling Xue, Data Caches in Multitasking Hard Real- Time Systems, RTSS 2003.] Schedulability Analysis WCET Simple Platforms WCMP (memory performance)

More information

HPC VT Machine-dependent Optimization

HPC VT Machine-dependent Optimization HPC VT 2013 Machine-dependent Optimization Last time Choose good data structures Reduce number of operations Use cheap operations strength reduction Avoid too many small function calls inlining Use compiler

More information

Performance Tuning on the Blackfin Processor

Performance Tuning on the Blackfin Processor 1 Performance Tuning on the Blackfin Processor Outline Introduction Building a Framework Memory Considerations Benchmarks Managing Shared Resources Interrupt Management An Example Summary 2 Introduction

More information

New Embedded NVM architectures

New Embedded NVM architectures New Embedded NVM architectures for Secure & Low Power Microcontrollers Jean DEVIN, Bruno LECONTE Microcontrollers, Memories & Smartcard Group STMicroelectronics 11 th LETI Annual review, June 24th, 2009

More information

Analysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation

Analysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation Analysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation Meng Xu Linh Thi Xuan Phan Hyon-Young Choi Insup Lee Department of Computer and Information Science

More information

Automatic Selection of Machine Learning Models for WCET-aware Compiler Heuristic Generation

Automatic Selection of Machine Learning Models for WCET-aware Compiler Heuristic Generation Automatic Selection of Machine Learning Models for WCET-aware Compiler Heuristic Generation Paul Lokuciejewski 1, Marco Stolpe 2, Katharina Morik 2, Peter Marwedel 1 1 Computer Science 12 (Embedded Systems

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common

More information

Automatic Selection of Machine Learning Models for Compiler Heuristic Generation

Automatic Selection of Machine Learning Models for Compiler Heuristic Generation Automatic Selection of Machine Learning Models for Compiler Heuristic Generation Paul Lokuciejewski 1, Marco Stolpe 2, Katharina Morik 2, Peter Marwedel 1 1 Computer Science 12 (Embedded Systems Group)

More information

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find

More information

Mapping of Applications to Multi-Processor Systems

Mapping of Applications to Multi-Processor Systems Mapping of Applications to Multi-Processor Systems Peter Marwedel TU Dortmund, Informatik 12 Germany Marwedel, 2003 Graphics: Alexandra Nolte, Gesine 2011 年 12 月 09 日 These slides use Microsoft clip arts.

More information

Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension

Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension Hamid Noori, Farhad Mehdipour, Koji Inoue, and Kazuaki Murakami Institute of Systems, Information

More information

Architectural Time-predictability Factor (ATF) to Measure Architectural Time Predictability

Architectural Time-predictability Factor (ATF) to Measure Architectural Time Predictability Architectural Time-predictability Factor (ATF) to Measure Architectural Time Predictability Yiqiang Ding, Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University Outline

More information

A Novel Technique to Use Scratch-pad Memory for Stack Management

A Novel Technique to Use Scratch-pad Memory for Stack Management A Novel Technique to Use Scratch-pad Memory for Stack Management Soyoung Park Hae-woo Park Soonhoi Ha School of EECS, Seoul National University, Seoul, Korea {soy, starlet, sha}@iris.snu.ac.kr Abstract

More information

Embedded Systems Lecture 11: Worst-Case Execution Time. Björn Franke University of Edinburgh

Embedded Systems Lecture 11: Worst-Case Execution Time. Björn Franke University of Edinburgh Embedded Systems Lecture 11: Worst-Case Execution Time Björn Franke University of Edinburgh Overview Motivation Worst-Case Execution Time Analysis Types of Execution Times Measuring vs. Analysing Flow

More information

Lecture 19: Memory Hierarchy Five Ways to Reduce Miss Penalty (Second Level Cache) Admin

Lecture 19: Memory Hierarchy Five Ways to Reduce Miss Penalty (Second Level Cache) Admin Lecture 19: Memory Hierarchy Five Ways to Reduce Miss Penalty (Second Level Cache) Professor Alvin R. Lebeck Computer Science 220 Fall 1999 Exam Average 76 90-100 4 80-89 3 70-79 3 60-69 5 < 60 1 Admin

More information

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27 1 / 27 GPU Programming Lecture 1: Introduction Miaoqing Huang University of Arkansas 2 / 27 Outline Course Introduction GPUs as Parallel Computers Trend and Design Philosophies Programming and Execution

More information

Reducing Energy Consumption by Dynamic Copying of Instructions onto Onchip Memory

Reducing Energy Consumption by Dynamic Copying of Instructions onto Onchip Memory Reducing Consumption by Dynamic Copying of Instructions onto Onchip Memory Stefan Steinke *, Nils Grunwald *, Lars Wehmeyer *, Rajeshwari Banakar +, M. Balakrishnan +, Peter Marwedel * * University of

More information

Fast, predictable and low energy memory references through architecture-aware compilation 1

Fast, predictable and low energy memory references through architecture-aware compilation 1 ; Fast, predictable and low energy memory references through architecture-aware compilation 1 Peter Marwedel, Lars Wehmeyer, Manish Verma, Stefan Steinke +, Urs Helmig University of Dortmund, Germany +

More information

Design-Induced Latency Variation in Modern DRAM Chips:

Design-Induced Latency Variation in Modern DRAM Chips: Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms Donghyuk Lee 1,2 Samira Khan 3 Lavanya Subramanian 2 Saugata Ghose 2 Rachata Ausavarungnirun

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

Evaluating Static Worst-Case Execution-Time Analysis for a Commercial Real-Time Operating System

Evaluating Static Worst-Case Execution-Time Analysis for a Commercial Real-Time Operating System Evaluating Static Worst-Case Execution-Time Analysis for a Commercial Real-Time Operating System Daniel Sandell Master s thesis D-level, 20 credits Dept. of Computer Science Mälardalen University Supervisor:

More information

Efficient Resource Management based on NonFunctional Requirements for Sensor/Actuator Networks

Efficient Resource Management based on NonFunctional Requirements for Sensor/Actuator Networks Chapter 4 Applications and Impacts Efficient Resource Management based on NonFunctional Requirements for Sensor/Actuator Networks C.Timm¹, F.Weichert², C.Prasse³, H.Müller², M.ten Hompel4 and P.Marwedel¹

More information

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework and numa control Examples

More information

Real-Time Cache Management for Multi-Core Virtualization

Real-Time Cache Management for Multi-Core Virtualization Real-Time Cache Management for Multi-Core Virtualization Hyoseung Kim 1,2 Raj Rajkumar 2 1 University of Riverside, California 2 Carnegie Mellon University Benefits of Multi-Core Processors Consolidation

More information

Improving Cache Performance

Improving Cache Performance Improving Cache Performance Computer Organization Architectures for Embedded Computing Tuesday 28 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition,

More information

Time Triggered and Event Triggered; Off-line Scheduling

Time Triggered and Event Triggered; Off-line Scheduling Time Triggered and Event Triggered; Off-line Scheduling Real-Time Architectures -TUe Gerhard Fohler 2004 Mälardalen University, Sweden gerhard.fohler@mdh.se Real-time: TT and ET Gerhard Fohler 2004 1 Activation

More information

Single-Path Programming on a Chip-Multiprocessor System

Single-Path Programming on a Chip-Multiprocessor System Single-Path Programming on a Chip-Multiprocessor System Martin Schoeberl, Peter Puschner, and Raimund Kirner Vienna University of Technology, Austria mschoebe@mail.tuwien.ac.at, {peter,raimund}@vmars.tuwien.ac.at

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

A Framework for Modeling GPUs Power Consumption

A Framework for Modeling GPUs Power Consumption A Framework for Modeling GPUs Power Consumption Sohan Lal, Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink Embedded Systems Architecture Technische Universität Berlin Berlin, Germany January

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

Improving Timing Analysis for Matlab Simulink/Stateflow

Improving Timing Analysis for Matlab Simulink/Stateflow Improving Timing Analysis for Matlab Simulink/Stateflow Lili Tan, Björn Wachter, Philipp Lucas, Reinhard Wilhelm Universität des Saarlandes, Saarbrücken, Germany {lili,bwachter,phlucas,wilhelm}@cs.uni-sb.de

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information