Efficient Computing in Cyber-Physical Systems
- Ella Francis
Transcription
1 Efficient Computing in Cyber-Physical Systems. Peter Marwedel, TU Dortmund (Germany), Informatik 12. Springer, /06/20
2 Cyber-physical systems and embedded systems
- Embedded systems (ES): information processing embedded into a larger product [Peter Marwedel, 2003].
- Cyber-physical systems (CPS): integrations of computation with physical processes [Edward Lee, 2006].
- CPS = ES + physical environment.
[Figure: information processing in embedded systems ("small computers") combined with physics.]
3 Outline
- Scope, opportunities & challenges
- (Partial) solutions to challenges
- Timing predictability: WCET-aware compilation (unrolling, locking, register allocation, scratch pads)
- Energy efficiency: scratch pads ((non-)overlaying, compile/run-time allocation)
- Tradeoffs between multiple objectives: PAMONO bio sensor; automatic parallelization (energy efficiency vs. performance); approximate computations (timeliness vs. exactness)
- Summary
4 Opportunities
- Transportation: automotive, avionics, rail
- Medical: hospitals, support of handicapped, patient information, support for the elderly
Graphics: ~2000: Microsoft Cliparts + P. Marwedel, 2011
5 Opportunities (2)
(Urban) living, telecommunication, consumer electronics, smart home, logistics, smart grid, public safety, structural health monitoring, robotics, military systems.
Graphics: P. Marwedel, 2011, Microsoft
6 Challenge (1): timing predictability
Timing: the system must react within a time interval dictated by the environment [adopted from Bergé]. Always! This is not subject to statistical arguments (except possibly for extremely small probabilities). The consequences are frequently underestimated.
Graphics: Microsoft Cliparts
7 Challenges (2): energy/power efficiency
Need for energy/power efficient processing ("sweet corner").
Graphics: P. Marwedel, H. De Man, Microsoft
8 Challenge (3): many more objectives
Exactness of results, QoS; average case delay; safety; security; thermal behavior; reliability; EMC characteristics; radiation hardness; cost, size, weight; environmental friendliness; extendability. Threatened by increased openness.
Graphics: P. Marwedel, 2011; Microsoft
9 Challenges (4)
- Analog/digital interface (quantization, ...)
- Consequences of networked embedded systems, e.g. maintain level of local control
[Figure: digital and analog signals.]
11 Let's look at timing!
Definition of worst-case execution time (WCET):
[Figure: distribution of possible execution times along a time axis, marking BCET_EST, BCET, WCET, WCET_EST, and the time constraint, with annotations for worst-case performance, worst-case guarantee, and timing predictability.]
WCET_EST must be
1. safe, i.e. $\mathrm{WCET}_{\mathrm{EST}} \geq \mathrm{WCET}$, and
2. tight, i.e. $\mathrm{WCET}_{\mathrm{EST}} - \mathrm{WCET} \ll \mathrm{WCET}_{\mathrm{EST}}$.
Graphics: adopted from R. Wilhelm + Microsoft Cliparts
12 Current trial-and-error based development
1. Specification of CPS/ES software
2. Generation of code (ANSI-C or similar)
3. Compilation for given target processor
4. Execution and/or simulation of machine code, using a (e.g. random) set of input data
5. Measurement-based computation of estimated worst-case execution time (WCET_meas)
6. Adding a safety margin (e.g. 20%) on top of WCET_meas and calling the result WCET_hypo
7. If WCET_hypo > real-time constraint: change some detail, go back to 1 or
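The weakness of steps 4 to 7 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_cycles` is a hypothetical cost model, the 50 random inputs and the deadline are invented, and only the 20% margin comes from the slide. The point is that a measured maximum plus a margin can still lie below the true WCET when a rare, expensive input is never exercised.

```python
import random

def measured_wcet(run_once, inputs):
    """WCET_meas: the largest execution time observed over the given inputs."""
    return max(run_once(x) for x in inputs)

def hypothetical_wcet(wcet_meas, margin=0.20):
    """WCET_hypo: WCET_meas plus a safety margin (20% as in the slide)."""
    return wcet_meas * (1.0 + margin)

def toy_cycles(x):
    """Hypothetical cost model: input 255 triggers a rare, expensive path."""
    return 1000 if x == 255 else 100 + x % 16

rng = random.Random(0)                               # fixed seed for repeatability
samples = [rng.randrange(255) for _ in range(50)]    # values 0..254, never 255

wcet_meas = measured_wcet(toy_cycles, samples)
wcet_hypo = hypothetical_wcet(wcet_meas)
true_wcet = max(toy_cycles(x) for x in range(256))   # exhaustive, toy-sized only

deadline = 200                                       # hypothetical real-time constraint
accepted = wcet_hypo <= deadline                     # step 7 would accept the design
is_safe = wcet_hypo >= true_wcet                     # but the estimate is not safe

print(wcet_meas, wcet_hypo, true_wcet, accepted, is_safe)
```

In this toy setting the design passes the check in step 7 although the estimate is unsafe, which is the motivation for static WCET_EST analysis.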
13 Structure of WCC (WCET-aware C compiler)
[Figure: compiler structure, including the CRL2LLIR and LLIR2CRL converters.]
14 Challenges for WCET_EST minimization: the worst-case execution path (WCEP)
- WCET_EST of a program = length of the longest execution path (WCEP) in that program.
- WCET_EST minimization: reduction of the longest path.
- Other optimizations do not result in a reduction of WCET_EST.
- Optimizations need to know the WCEP.
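A small Python sketch may make the role of the WCEP concrete. It assumes a loop-free control flow graph with one fixed cost per basic block; the block names and costs are hypothetical, and real analyzers handle loops, caches, and pipelines, which this sketch does not.

```python
from functools import lru_cache

def wcep(cfg, cost, entry):
    """Return (length, path) of the longest path from entry in an acyclic CFG."""
    @lru_cache(maxsize=None)
    def longest_from(block):
        # Best continuation among the successors; exit blocks have none.
        best_len, best_path = max(
            (longest_from(s) for s in cfg.get(block, ())), default=(0, ())
        )
        return cost[block] + best_len, (block,) + best_path

    length, path = longest_from(entry)
    return length, list(path)

# Hypothetical diamond-shaped CFG: A branches to B or C, both join in D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cost = {"A": 10, "B": 50, "C": 30, "D": 5}

base = wcep(cfg, cost, "A")                    # WCEP runs through B
off_path = wcep(cfg, {**cost, "C": 10}, "A")   # speeding up C changes nothing
on_path = wcep(cfg, {**cost, "B": 20}, "A")    # speeding up B switches the WCEP to C

print(base, off_path, on_path)
```

Shortening block B by 30 cycles lowers the estimate by only 20, because the WCEP switches to the path through C. An optimizer therefore has to recompute the WCEP after each transformation.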
15 WCET-oriented optimizations
- Extended loop analysis (CGO 09)
- Instruction cache locking (CODES/ISSS 07, CGO 12)
- Cache partitioning (WCET-Workshop 09)
- Procedure cloning (WCET-WS 07, CODES 07, SCOPES 08)
- Procedure/code positioning (ECRTS 08, CASES 11 (2x))
- Function inlining (SMART 09, SMART 10)
- Loop unswitching/invariant paths (SCOPES 09)
- Loop unrolling (ECRTS 09)
- Register allocation (DAC 09, ECRTS 11)
- Scratchpad optimization (DAC 09)
- Extension towards multi-objective optimization (RTSS 08)
- Superblock-based optimizations (ICESS 10)
- Surveys (Springer 10, Springer 12)
16 WCET-aware loop unrolling via back-annotation
- WCET information is available at assembly level.
- Unrolling is to be applied at the internal representation of the source code.
- Solution: back-annotation. The experimental WCET-aware compiler WCC allows copying information from assembly code to source code.
[Figure: back-annotation from the low-level IR (assembly code), where memory architecture info is available, to the high-level IR (source code) across code generation; annotated data: WCET data, memory specification, assembly code size, amount of spill code.]
17 Results for unrolling: WCET
[Figure: relative WCET (%) versus I-cache size (kbyte), compared with standard unrolling.]
- 100%: average WCET for all benchmarks with O3 and no unrolling.
- WCET reduction between 10.2% and 15.4%.
- WCET-driven unrolling outperforms standard unrolling by 13.7%.
18 Relative WCET_EST with I-cache locking
5 benchmarks / ARM920T / postpass optimization.
[Figure: relative WCET_EST (%) versus cache size (bytes) on the ARM920T for the benchmarks ADPCM, G723, Statemate, Compress, and MPEG2.] [S. Plazar et al.]
H. Falk, S. Plazar, H. Theiling: Compile Time Decided Instruction Cache Locking Using Worst-Case Execution Paths, Intern. Conf. on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
19 Register allocation
H. Falk: WCET-aware Register Allocation based on Graph Coloring, 46th Design Automation Conference (DAC)
100% = WCET_EST using standard graph coloring (highest degree).
[Figure: relative WCET_EST (%) at optimization level -O3 per benchmark, plus the average.]
Benchmarks shown: adpcm_verify, cjpeg_transupp, compress, crc, dijkstra, duff, edge_detect, edn, epic, expint, fdct, fft_1024, fft_256, fir, fir2dim, gsm, gsm_encode, h263, h264dec_block, h264dec_macro, iir_4_64, iir_biquad_n, jfdctint, latnrm_32_64, lmsfir_8_1, lmsfir_32_64, lpc, ludcmp, matmult, matrix2_fixed, matrix2_float, md5, minver, mult_10_10, mult_4_4, ndes, prime, qmf_receive, qmf_transmit, qurt, rijndael_enc, select, sha, spectral, startup, v32_benc.
20 Scratch pad memories (SPM): fast, energy-efficient, timing-predictable
- SPMs are small, physically separate memories mapped into the address space.
- Selection is by an appropriate address decoder (simple!).
- Small; no tag memory.
- Example: scratch pad memories in ARM7TDMI cores, well known for low power consumption.
[Figure: address space from 0 to FFF.., with a select signal for the SPM region.]
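Selection by address decoder can be sketched in a few lines of Python. The base address and size below are hypothetical values chosen for illustration, not taken from the slide. The sketch shows why no tag memory is needed: whether an access goes to the SPM depends only on the address, so the access time of every memory object is known statically.

```python
SPM_BASE = 0x0000   # hypothetical start of the SPM region
SPM_SIZE = 0x1000   # hypothetical SPM size: 4 KiB

def select_memory(address):
    """Model of the address decoder: a range check, with no tag comparison."""
    if SPM_BASE <= address < SPM_BASE + SPM_SIZE:
        return "SPM"
    return "MAIN"

def count_accesses(trace):
    """Count how many accesses of an address trace each memory serves."""
    counts = {"SPM": 0, "MAIN": 0}
    for address in trace:
        counts[select_memory(address)] += 1
    return counts

trace = [0x0000, 0x0FFF, 0x1000, 0x8000_0000]
print(count_accesses(trace))
```

Unlike a cache, the mapping never changes at run time unless software copies data explicitly, which is what makes SPMs timing-predictable.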
21 WCET_EST-aware scratch pad allocation
Setup: Bosch Democar, runnable IgnitionSWCSync. WCET_EST-aware scratch pad allocation of program code by WCC, including fully automated WCET_EST analyses using aiT.
Result: WCET_EST reduced to about 50%, compared to gcc.
P. Marwedel, 2011. Property of the PREDATOR consortium. Confidential
23 Why care about energy efficiency?
Reasons: global warming; cost of energy; increasing performance; problems with cooling, avoiding hot spots; avoiding high currents & metal migration; reliability; energy a very scarce resource.
[Table: relevance of each reason during use, by execution platform: plugged (e.g. factory), uncharged periods (e.g. car), unplugged (e.g. sensor); cell entries not recoverable.]
Graphics: P. Marwedel
24 Where is the energy consumed? Consumer portable systems
According to the International Technology Roadmap for Semiconductors (ITRS), 2010 update: violation of W constraint for small mobiles. It's not just the logic; don't ignore memory!
Source: ITRS
25 Energy consumption of memories
CACTI: scratchpad (SRAM) vs. DRAM (DDR2).
[Figure: access time (ns) and read energy (nJ) versus memory size, for SPM (high performance, 16 banks) and DDR (high performance, 16 banks); sizes range up to 2G bytes.]
16 bit read; size in bytes; 65 nm for SRAM, 80 nm for DRAM.
Source: Olivera Jovanovic, TU Dortmund
26 Migration of data & instructions, global optimization model (TU Dortmund)
Which memory object (array, loop, etc.) should be stored in the SPM?
[Figure: example program with loops, calls, and arrays in main memory; a processor; and a scratch pad memory of capacity SSP.]
- Non-overlaying ("static") allocation: each object $k$ has a gain $g_k$ and a size $s_k$. Maximise the gain $G = \sum_k g_k$ over the selected objects, respecting the size of the SPM: $\sum_k s_k \leq SSP$. Solution: knapsack algorithm.
- Overlaying ("dynamic") allocation: moving objects back and forth.
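The non-overlaying allocation can be sketched as a 0/1 knapsack solved by dynamic programming. This is a minimal illustration and not the tool flow used in the talk: the object names, gains, and sizes are hypothetical, and sizes are assumed to be positive integers (bytes).

```python
def allocate_spm(objects, capacity):
    """Choose memory objects for the SPM so that total gain is maximal.

    objects:  list of (name, gain, size) tuples, size in bytes (positive int)
    capacity: SPM capacity in bytes (SSP in the slide)
    Returns (total_gain, list_of_chosen_names).
    """
    # best[c] = (gain, names) of the best selection that fits into c bytes
    best = [(0, [])] * (capacity + 1)
    for name, gain, size in objects:
        # Iterate capacities downwards so each object is used at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + gain
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[capacity]

# Hypothetical memory objects: (name, gain g_k, size s_k in bytes)
objects = [("array_a", 60, 512), ("loop_i", 100, 768), ("func_f", 120, 1024)]

total_gain, chosen = allocate_spm(objects, capacity=1536)
print(total_gain, chosen)
```

With these numbers the two objects with the highest individual gains do not fit together (1792 bytes), so the best selection combines `array_a` with `func_f`.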
27 Reduction in energy and average run-time for migrating functions and variables
[Figure: cycles (x100) and energy (µJ) for Multi_sort (a mix of sort algorithms).]
Measured processor / external memory energy + CACTI values for SPM (combined model).
Numbers will change with technology; algorithms remain unchanged.
28 Overlaying allocation
- Non-overlaying allocation is problematic for multiple hot spots.
- Overlaying allocation effectively results in a kind of compiler-controlled overlays for the SPM.
- Address assignment within the SPM is required.
[Figure: CPU, main memory, and SPM.]
29 Overlaying allocation by Verma et al. (1)
Based on the control flow graph.
[Figure: control flow graph with basic blocks B1 to B8 containing definitions (DEF), modifications (MOD), and uses (USE) of the objects A and C.]
[M. Verma, P. Marwedel: Dynamic Overlay of Scratchpad Memory for Energy Minimization, ISSS, 2004]
30 Overlaying allocation by Verma et al. (2)
[Figure: the control flow graph extended by blocks B9 and B10 at potential copy points, containing SPILL_STORE(A), SPILL_LOAD(C), and SPILL_LOAD(A).]
- A global set of ILP equations reflects the cost/benefit relations of potential copy points.
- Code is handled like data.
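The cost/benefit relation at a single copy point can be sketched as follows. This is not the ILP formulation of Verma et al.; it is a simplified per-object estimate, and all energy values, access counts, and sizes are hypothetical. It only illustrates the trade-off the ILP has to capture: copying an object into the SPM (SPILL_LOAD) and possibly back (SPILL_STORE) costs energy that the cheaper SPM accesses must recover.

```python
def overlay_benefit(accesses, e_main_nj, e_spm_nj, size_words, e_copy_nj, write_back):
    """Net energy benefit (nJ) of overlaying one object at one copy point.

    accesses:   accesses to the object while it resides in the SPM
    e_main_nj:  energy per main-memory access
    e_spm_nj:   energy per SPM access
    size_words: object size in words
    e_copy_nj:  energy to copy one word between main memory and SPM
    write_back: True if the object was modified and must be stored back
    """
    saved = accesses * (e_main_nj - e_spm_nj)
    transfers = 2 if write_back else 1      # load only, or load plus store
    copy_cost = size_words * e_copy_nj * transfers
    return saved - copy_cost

# Hypothetical numbers: a hot and a cold use of the same 256-word object.
hot = overlay_benefit(10_000, 5.0, 1.0, 256, 6.0, write_back=True)
cold = overlay_benefit(100, 5.0, 1.0, 256, 6.0, write_back=True)
print(hot, cold)
```

A positive value suggests the copy pays off at that point; a negative value suggests leaving the object in main memory there.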
31 Runtime/energy reduction with respect to non-overlaying ("static") allocation
32 Dynamic set of multiple applications
- Compile-time partitioning of the SPM is not feasible.
- Introduction of an SPM manager.
- Runtime decisions, but compile-time supported.
[Figure: CPU, main memory, and SPM; the SPM manager assigns the SPM address space to applications App. 1 to App. n over time t.]
[R. Pyka, Ch. Faßbach, M. Verma, H. Falk, P. Marwedel: Operating system integrated energy aware scratchpad allocation strategies for multi-process applications, SCOPES, 2007]
33 Comparison of SPMM to caches for SORT
- Baseline: main memory only.
- SPMM peak energy reduction by 83% at 4k bytes scratchpad.
[Table: SPM size versus Δ relative to a 4-way cache; only fragments of the percentages survive (,81% ,35% ,39% ,64% ,73%).]
35 Energy consumption of (bio-) virus detection
Detection of nano objects using the plasmon effect (an optical effect of objects smaller than the wavelength of light).
[Figure: sensor setup with laser, prism, sensor, nano object, and CCD producing an image sequence; plot of reflectivity and signal (ΔI/I) versus incidence angle in degrees.]
36 Timeliness in (bio-) virus detection (2)
Target: 30 frames per second at 1024x256 pixels.
[Table: frame rate (fps) by image size for several CPUs and GPUs, including an i7-620m (mobile) CPU, a 3100M (mobile) GPU, and a GTX 560Ti GPU; values not recoverable.]
- A low-end mobile GPU meets the real-time constraint up to 1024x256 pixels.
- Faster GPUs give room for a larger usable sensor area.
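The real-time target translates into a per-frame time budget and a required pixel throughput. The Python sketch below works this out for the stated target of 30 frames per second at 1024x256 pixels; the measured frame rates passed to `meets_realtime` are hypothetical placeholders, not values from the slide.

```python
def frame_budget_ms(target_fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / target_fps

def required_pixel_rate(width, height, target_fps):
    """Pixels per second that must be processed to sustain the target."""
    return width * height * target_fps

def meets_realtime(measured_fps, target_fps=30.0):
    """True if a platform's measured frame rate satisfies the constraint."""
    return measured_fps >= target_fps

budget = frame_budget_ms(30)                    # about 33.3 ms per frame
pixel_rate = required_pixel_rate(1024, 256, 30)

print(round(budget, 1), pixel_rate)
print(meets_realtime(41.5), meets_realtime(12.0))   # hypothetical measurements
```

A larger usable sensor area raises the required pixel rate proportionally, which is why faster GPUs leave room for larger images.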
37 Timeliness in (bio-) virus detection (3)
[Figure: measurement setup with current clamp.]
Energy per frame, CPU: 3.26 J, 5.84 J, [missing] J
Energy per frame, GPU: 0.93 J, 1.56 J, 2.76 J
Reduced to avg 27%.
C. Timm, A. Gelenberg, P. Marwedel, F. Weichert: Energy Considerations within the Integration of General Purpose GPUs in Embedded Systems. Intern. Conf. on Advances in Distributed and Parallel Computing, 2010
C. Timm, F. Weichert, P. Marwedel, H. Müller: Design Space Exploration Towards a Realtime and Energy-Aware GPGPU-based Analysis of Biosensor Data. Computer Science - Research and Development, ENA-HPC
38 Multi-objective based parallelization of sequential programs D. Cordes, P. Marwedel: Multi-Objective Aware Extraction of Task-Level Parallelism Using Genetic Algorithms, DATE
39 Tradeoff between timeliness and exactness
- Traditional error correction requires resources, including time.
- This conflicts with the timing requirements of real-time systems.
40 Tradeoff between timeliness and exactness (2)
- In some cases, errors have no effect.
- In some cases, timing requirements dominate and error correction may be compromised.
- In other cases, error correction requirements dominate and timing requirements may be compromised.
[Figure: Register 0 to Register 3, labelled unused / variable i / unused / variable j, with possible error outcomes: no effect, imprecise output, system crash.]
Classification requires application knowledge.
41 Analysis of H.264 video decoder (~3500 lines of C code)
Fraction of memory space by resolution (memory size of reliable data / memory size of unreliable data):
- 176 x [missing]: [missing] kB (55%) / 74 kB (45%)
- 352 x [missing]: [missing] kB (43%) / 297 kB (57%)
- 1280 x [missing]: [missing] kB (37%) / 2700 kB (63%)
Impact of uncorrected errors, with injection into unreliable memory at 240 faults/sec and 80 faults/sec: average PSNR of frames with errors [missing] dB and [missing] dB.
Deteriorated exactness, timeliness maintained!
42 Summary
- Scope, opportunities & challenges
- (Partial) solutions to challenges
- Timing predictability: WCET-aware compilation (unrolling, locking, register allocation, scratch pads)
- Energy efficiency: scratch pads ((non-)overlaying, compile/run-time allocation)
- Tradeoffs between multiple objectives: PAMONO bio sensor; automatic parallelization (energy efficiency vs. performance); approximate computations (timeliness vs. exactness)
HPC VT 2013 Machine-dependent Optimization Last time Choose good data structures Reduce number of operations Use cheap operations strength reduction Avoid too many small function calls inlining Use compiler
More informationPerformance Tuning on the Blackfin Processor
1 Performance Tuning on the Blackfin Processor Outline Introduction Building a Framework Memory Considerations Benchmarks Managing Shared Resources Interrupt Management An Example Summary 2 Introduction
More informationNew Embedded NVM architectures
New Embedded NVM architectures for Secure & Low Power Microcontrollers Jean DEVIN, Bruno LECONTE Microcontrollers, Memories & Smartcard Group STMicroelectronics 11 th LETI Annual review, June 24th, 2009
More informationAnalysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation
Analysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation Meng Xu Linh Thi Xuan Phan Hyon-Young Choi Insup Lee Department of Computer and Information Science
More informationAutomatic Selection of Machine Learning Models for WCET-aware Compiler Heuristic Generation
Automatic Selection of Machine Learning Models for WCET-aware Compiler Heuristic Generation Paul Lokuciejewski 1, Marco Stolpe 2, Katharina Morik 2, Peter Marwedel 1 1 Computer Science 12 (Embedded Systems
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University Large and Fast: Exploiting Memory Hierarchy The Basic of Caches Measuring & Improving Cache Performance Virtual Memory A Common
More informationAutomatic Selection of Machine Learning Models for Compiler Heuristic Generation
Automatic Selection of Machine Learning Models for Compiler Heuristic Generation Paul Lokuciejewski 1, Marco Stolpe 2, Katharina Morik 2, Peter Marwedel 1 1 Computer Science 12 (Embedded Systems Group)
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More informationMapping of Applications to Multi-Processor Systems
Mapping of Applications to Multi-Processor Systems Peter Marwedel TU Dortmund, Informatik 12 Germany Marwedel, 2003 Graphics: Alexandra Nolte, Gesine 2011 年 12 月 09 日 These slides use Microsoft clip arts.
More informationEnhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension
Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension Hamid Noori, Farhad Mehdipour, Koji Inoue, and Kazuaki Murakami Institute of Systems, Information
More informationArchitectural Time-predictability Factor (ATF) to Measure Architectural Time Predictability
Architectural Time-predictability Factor (ATF) to Measure Architectural Time Predictability Yiqiang Ding, Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University Outline
More informationA Novel Technique to Use Scratch-pad Memory for Stack Management
A Novel Technique to Use Scratch-pad Memory for Stack Management Soyoung Park Hae-woo Park Soonhoi Ha School of EECS, Seoul National University, Seoul, Korea {soy, starlet, sha}@iris.snu.ac.kr Abstract
More informationEmbedded Systems Lecture 11: Worst-Case Execution Time. Björn Franke University of Edinburgh
Embedded Systems Lecture 11: Worst-Case Execution Time Björn Franke University of Edinburgh Overview Motivation Worst-Case Execution Time Analysis Types of Execution Times Measuring vs. Analysing Flow
More informationLecture 19: Memory Hierarchy Five Ways to Reduce Miss Penalty (Second Level Cache) Admin
Lecture 19: Memory Hierarchy Five Ways to Reduce Miss Penalty (Second Level Cache) Professor Alvin R. Lebeck Computer Science 220 Fall 1999 Exam Average 76 90-100 4 80-89 3 70-79 3 60-69 5 < 60 1 Admin
More informationGPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27
1 / 27 GPU Programming Lecture 1: Introduction Miaoqing Huang University of Arkansas 2 / 27 Outline Course Introduction GPUs as Parallel Computers Trend and Design Philosophies Programming and Execution
More informationReducing Energy Consumption by Dynamic Copying of Instructions onto Onchip Memory
Reducing Consumption by Dynamic Copying of Instructions onto Onchip Memory Stefan Steinke *, Nils Grunwald *, Lars Wehmeyer *, Rajeshwari Banakar +, M. Balakrishnan +, Peter Marwedel * * University of
More informationFast, predictable and low energy memory references through architecture-aware compilation 1
; Fast, predictable and low energy memory references through architecture-aware compilation 1 Peter Marwedel, Lars Wehmeyer, Manish Verma, Stefan Steinke +, Urs Helmig University of Dortmund, Germany +
More informationDesign-Induced Latency Variation in Modern DRAM Chips:
Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms Donghyuk Lee 1,2 Samira Khan 3 Lavanya Subramanian 2 Saugata Ghose 2 Rachata Ausavarungnirun
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017
Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the
More informationEvaluating Static Worst-Case Execution-Time Analysis for a Commercial Real-Time Operating System
Evaluating Static Worst-Case Execution-Time Analysis for a Commercial Real-Time Operating System Daniel Sandell Master s thesis D-level, 20 credits Dept. of Computer Science Mälardalen University Supervisor:
More informationEfficient Resource Management based on NonFunctional Requirements for Sensor/Actuator Networks
Chapter 4 Applications and Impacts Efficient Resource Management based on NonFunctional Requirements for Sensor/Actuator Networks C.Timm¹, F.Weichert², C.Prasse³, H.Müller², M.ten Hompel4 and P.Marwedel¹
More informationLECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY
LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal
More informationParallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework and numa control Examples
More informationReal-Time Cache Management for Multi-Core Virtualization
Real-Time Cache Management for Multi-Core Virtualization Hyoseung Kim 1,2 Raj Rajkumar 2 1 University of Riverside, California 2 Carnegie Mellon University Benefits of Multi-Core Processors Consolidation
More informationImproving Cache Performance
Improving Cache Performance Computer Organization Architectures for Embedded Computing Tuesday 28 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition,
More informationTime Triggered and Event Triggered; Off-line Scheduling
Time Triggered and Event Triggered; Off-line Scheduling Real-Time Architectures -TUe Gerhard Fohler 2004 Mälardalen University, Sweden gerhard.fohler@mdh.se Real-time: TT and ET Gerhard Fohler 2004 1 Activation
More informationSingle-Path Programming on a Chip-Multiprocessor System
Single-Path Programming on a Chip-Multiprocessor System Martin Schoeberl, Peter Puschner, and Raimund Kirner Vienna University of Technology, Austria mschoebe@mail.tuwien.ac.at, {peter,raimund}@vmars.tuwien.ac.at
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationA Framework for Modeling GPUs Power Consumption
A Framework for Modeling GPUs Power Consumption Sohan Lal, Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink Embedded Systems Architecture Technische Universität Berlin Berlin, Germany January
More informationEECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141
EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role
More informationImproving Timing Analysis for Matlab Simulink/Stateflow
Improving Timing Analysis for Matlab Simulink/Stateflow Lili Tan, Björn Wachter, Philipp Lucas, Reinhard Wilhelm Universität des Saarlandes, Saarbrücken, Germany {lili,bwachter,phlucas,wilhelm}@cs.uni-sb.de
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2
More information