Enabling Arm DynamIQ support. Dan Handley (Arm) Ionela Voinescu (Arm) Vincent Guittot (Linaro)

1 Enabling Arm DynamIQ support Dan Handley (Arm) Ionela Voinescu (Arm) Vincent Guittot (Linaro)

2 Agenda
- DynamIQ introduction
- DynamIQ and Arm Trusted Firmware
- OS Power Management with DynamIQ
- L3 partial power-down support
ENGINEERS AND DEVICES WORKING TOGETHER

3 DynamIQ key features
From
1. A new single-cluster design
2. Intelligent compute capabilities
3. Interfaces for closely coupled accelerators
4. Built-in power-saving features
5. DynamIQ big.LITTLE
6. Advanced RAS and safety features

5 DynamIQ Shared Unit (DSU)
TRM:
- Armv8.2+ Cortex-A CPU support, e.g. Cortex-A55, Cortex-A75
- 2 different CPU types in the same cluster
- Maximum 8 CPUs
- Per-CPU L1+L2 caches and shared L3
- Per-CPU DVFS control
- Partial L3 cache power down
- Hardware assisted power management
  - Simplifies power up/down software

7 DynamIQ Shared Unit (DSU) and Arm TF
- DSU enables simpler, faster and more robust software during power up/down
  - Simplified micro-architectural programming sequence
  - Automatic enabling and disabling of coherency with the interconnect
  - Automatic and faster cache flushing at all levels without software intervention
  - Reduced power controller communication via P-channel interface
- TF enables more performant PSCI operations via the HW_ASSISTED_COHERENCY option
  - CPU idle, hotplug, secondary CPU boot
  - Will still work without HW_ASSISTED_COHERENCY but won't get the benefits
- Allows more aggressive OSPM tuning
  - Warning: Some HW operations will be invisible to SW and may give misleading statistics

8 CPU idle to power down (Armv8.0 CPUs)
OS calls SMC CPU_SUSPEND
Power Down:
1. Validate CPU_SUSPEND arguments
2. Acquire locks for non-CPU levels
3. PSCI state coordination
4. CPU-specific power down handling
5. Disable data caches
6. Flush data cache(s)
7. Disable intra-cluster coherency (!SMP_BIT)
8. Stack maintenance
9. Platform suspend operations
10. Release locks for non-CPU levels
11. Wait For Interrupt (WFI)
Reset
Power Up:
1. Minimal SCTLR initialization
2. Platform reset handling
3. CPU-specific reset handling
4. Errata handling
5. Enable intra-cluster coherency (SMP_BIT)
6. CPU architectural register initialization
7. Enable MMU
8. Acquire locks for non-CPU levels
9. Platform suspend-finish operations
10. Stack maintenance
11. Enable data caches
12. Restore OS context
13. PSCI bookkeeping
14. Release locks for non-CPU levels
15. ERET to OS

9 CPU idle to power down (Armv8.2 CPUs)
OS calls SMC CPU_SUSPEND
Power Down:
1. Validate CPU_SUSPEND arguments
2. Acquire locks for non-CPU levels
3. PSCI state coordination
4. CPU-specific power down handling
5. Request CPU power down (CORE_PWRDN_EN)
6. Platform suspend operations
7. Release locks for non-CPU levels
8. Wait For Interrupt (WFI)
Reset
Power Up:
1. Minimal SCTLR initialization
2. Platform reset handling
3. CPU-specific reset handling
4. Errata handling (none yet)
5. CPU architectural register initialization
6. Enable MMU and data caches
7. Acquire locks for non-CPU levels
8. Platform suspend-finish operations
9. Restore OS context
10. PSCI bookkeeping
11. Release locks for non-CPU levels
12. ERET to OS

10 CPU idle to power down (Armv8.2 CPUs)
Same sequence as slide 9, annotated:
- Power down: D$ remains enabled throughout
- Power up: D$ enabled much earlier

11 CPU idle to power down (Armv8.2 CPUs)
Same sequence as slide 9, annotated:
- No need for explicit cache flushes or stack maintenance

12 CPU idle to power down (Armv8.2 CPUs)
Same sequence as slide 9, annotated:
- Much more efficient spin locks instead of bakery locks (using the v8.1 CAS instruction)

13 CPU idle to power down (Armv8.2 CPUs)
Same sequence as slide 9, annotated:
- (Potentially) reduced power controller communication
- No need for explicit interconnect programming for masters to enter/exit coherency
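The difference between the Armv8.0 and Armv8.2 power-down paths on slides 8 and 9 can be checked mechanically. The sketch below transcribes the power-down steps from those two slides into flat lists and computes which steps disappear and which are new. Flattening the steps into one list per path is an assumption made for illustration; the slides may nest some of them under "CPU-specific power down handling".

```python
# Steps shared by both power-down paths, as listed on the slides.
COMMON_HEAD = [
    "Validate CPU_SUSPEND arguments",
    "Acquire locks for non-CPU levels",
    "PSCI state coordination",
    "CPU-specific power down handling",
]
COMMON_TAIL = [
    "Platform suspend operations",
    "Release locks for non-CPU levels",
    "Wait For Interrupt (WFI)",
]

# Armv8.0: software disables and flushes caches and leaves coherency itself.
V80_POWER_DOWN = COMMON_HEAD + [
    "Disable data caches",
    "Flush data cache(s)",
    "Disable intra-cluster coherency (!SMP_BIT)",
    "Stack maintenance",
] + COMMON_TAIL

# Armv8.2 with DSU: software only requests the power down.
V82_POWER_DOWN = COMMON_HEAD + [
    "Request CPU power down (CORE_PWRDN_EN)",
] + COMMON_TAIL


def removed_steps(old, new):
    """Steps present in the old sequence but absent from the new one."""
    return [step for step in old if step not in new]


def added_steps(old, new):
    """Steps present in the new sequence but absent from the old one."""
    return [step for step in new if step not in old]


if __name__ == "__main__":
    print("Removed:", removed_steps(V80_POWER_DOWN, V82_POWER_DOWN))
    print("Added:  ", added_steps(V80_POWER_DOWN, V82_POWER_DOWN))
```

Four software steps drop out of the power-down path and a single register request replaces them, which matches the "no need for explicit cache flushes or stack maintenance" annotation.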

14 Future TF enhancements
- Use per-thread cluster power voting register CLUSTERPWRDN_EL1
  - Automatic cluster power down or memory retention if the power controller hardware and firmware support it
  - Remove cluster level locks or at least reduce the time they are held
- Analyze performance on DynamIQ hardware platforms

16 OS Power Management with DynamIQ
- Finer grained power capabilities
- Already handled by PM frameworks
- Per-core Frequency/Voltage domain
- DSU Frequency/Voltage domain

17 Scheduler domains
- Current big.LITTLE system
- Energy model layout matches scheduler domain
Example of 4 big cores + 4 LITTLE cores:

18 Scheduler domains
- DynamIQ changes domain boundaries
  - Not necessarily congruent: Physical / Voltage / Frequency / Architecture
- Change the scheduler topology
  - And energy model layout
Example of 4 big cores + 4 LITTLE cores:

19 Phantom domains
- Add intermediate domain
  - Voltage/Frequency boundary
Example of 4 big cores + 4 LITTLE cores: Per core DVFS

20 Phantom domains
Example of 4 big cores + 4 LITTLE cores:
- One frequency domain for big cores and one for LITTLE cores
- Frequency domain close to current big.LITTLE system
- Enable similar scheduler topology
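One way to picture the phantom-domain idea is as a grouping of the CPUs of a single DynamIQ cluster by the frequency domain they share. The sketch below is only an illustration under stated assumptions: the function names, the integer CPU and domain ids, and the two-level dictionary are hypothetical and are not taken from the scheduler code.

```python
from collections import defaultdict


def phantom_domains(freq_domain_of):
    """Group CPU ids by shared frequency domain.

    freq_domain_of maps a CPU id to a frequency-domain id (both hypothetical
    integers). Each returned group is one intermediate ("phantom") domain.
    """
    groups = defaultdict(list)
    for cpu in sorted(freq_domain_of):
        groups[freq_domain_of[cpu]].append(cpu)
    return [groups[domain] for domain in sorted(groups)]


def topology(freq_domain_of):
    """Two-level view: phantom domains nested inside one cluster-wide domain."""
    return {
        "phantom": phantom_domains(freq_domain_of),
        "cluster": [sorted(freq_domain_of)],
    }


if __name__ == "__main__":
    # 4 LITTLE cores (CPUs 0-3) share domain 0, 4 big cores (CPUs 4-7) share domain 1.
    shared = {cpu: (0 if cpu < 4 else 1) for cpu in range(8)}
    print(topology(shared))

    # Per-core DVFS: every CPU is its own frequency domain.
    per_core = {cpu: cpu for cpu in range(8)}
    print(topology(per_core))
```

With one frequency domain per CPU type, the intermediate level reproduces the two groups of a classic big.LITTLE system; with per-core DVFS, every group shrinks to a single CPU.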

21 OSPM next steps
- Shared frequency domains
- Shared voltage domains
- Impact on energy model
- Impact on compute capacity
- Getting notified of power domain OPP change
- Multiple DynamIQ clusters
Reference:

23 L3 partial power-down
Arm DynamIQ Shared Unit (DSU) L3 cache
- Implementation specific number of portions controlled through a power control register
- Counters for cache misses and cache hits to help drive decisions
Support in software
- DevFreq driver
- Control of active portions based on:
  - Cache hit/miss rates
  - Computed power benefit
  - Bias for performance
- Out of tree reference implementation:

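24 L3 partial power-down: architecture
[Diagram: inside the Linux kernel, a DevFreq governor and a DevFreq device sit on top of the DSU register interface to the DSU L3 cache. Every 10 ms a timer triggers a DevFreq update; the governor reads the hit counter and miss counter and produces the target portions; the DevFreq device sets the target portions in the control register.]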
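The governor works from hit and miss counters sampled on a 10 ms timer, while the algorithm slides express their conditions in MiB/sec. A minimal sketch of that conversion follows. It assumes 32-bit free-running counters, that each counted event moves one 64-byte cache line, and the helper names shown; none of these details come from the slides.

```python
MIB = 1024 * 1024


def counter_delta(prev, curr, width_bits=32):
    """Events since the last sample, tolerating one counter wrap-around.

    The 32-bit width is an assumption for illustration.
    """
    return (curr - prev) % (1 << width_bits)


def bandwidth_mib_per_sec(events, period_ms=10, line_bytes=64):
    """Convert an event count over one polling period into MiB/sec.

    Assumes every counted hit or miss transfers one cache line of
    line_bytes bytes (64 is an assumed value, not from the slides).
    """
    return events * line_bytes * 1000 / (MIB * period_ms)


if __name__ == "__main__":
    # Hypothetical samples taken 10 ms apart.
    misses = counter_delta(prev=1_000, curr=164_840)
    print(bandwidth_mib_per_sec(misses))
```

Keeping the wrap-around handling in its own helper means the governor can keep sampling without ever resetting the hardware counters.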

25 L3 partial power-down: algorithm
Upsize: Weigh the additional energy cost of enabling another portion (L3 cache static leakage) against the potential savings from decreasing the dynamic cost of accessing DRAM (DRAM dynamic energy).
Condition for upsize: MBW > (1.0 - Tu) * CB
- Comparing MBW with CB compares energy consumption; the (1.0 - Tu) factor is the bias for performance
- MBW: miss bandwidth (MiB/sec)
- CB: cost bandwidth (MiB/sec), CB = L / ED
- L: static leakage of a single portion (uJ/sec)
- ED: dynamic energy of DRAM (uJ/MiB)
- Tu: upsizing threshold, a fraction from 0.00 to 1.00
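The upsize test can be written out as a small worked example. The sketch assumes that the operator missing from the slide text between 1.0 and Tu is a minus sign, which is the reading consistent with a bias for performance (a larger Tu makes upsizing easier). The leakage and DRAM energy figures are made-up illustration values.

```python
def cost_bandwidth(leakage_uj_per_sec, dram_uj_per_mib):
    """CB = L / ED: DRAM traffic (MiB/sec) whose dynamic energy equals
    the static leakage of one L3 portion."""
    return leakage_uj_per_sec / dram_uj_per_mib


def should_upsize(mbw, cb, tu):
    """Upsize condition, assuming the form MBW > (1.0 - Tu) * CB."""
    return mbw > (1.0 - tu) * cb


if __name__ == "__main__":
    # Hypothetical numbers: 1000 uJ/sec leakage, 50 uJ/MiB DRAM energy.
    cb = cost_bandwidth(1000.0, 50.0)   # 20 MiB/sec
    for mbw in (14.0, 16.0):
        print(mbw, should_upsize(mbw, cb, tu=0.25))
```

With Tu = 0.25 the threshold is 15 MiB/sec, so a miss bandwidth of 16 MiB/sec triggers an upsize even though it is below the break-even cost bandwidth of 20 MiB/sec.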

26 L3 partial power-down: algorithm
Downsize: From an energy trade-off perspective, keeping a portion powered on is justified only if the hit bandwidth pays for its leakage (L3 cache static leakage against DRAM dynamic energy). If that requirement is not met, the portion can be powered off.
Condition for downsize: HBW < (N - Td) * CB
- Comparing HBW with N * CB compares energy consumption; the Td term is the bias for performance
- HBW: hit bandwidth (MiB/sec)
- N: current number of portions enabled
- CB: cost bandwidth (MiB/sec), CB = L / ED
- L: static leakage of a single portion (uJ/sec)
- ED: dynamic energy of DRAM (uJ/MiB)
- Td: downsize threshold, a fraction from 0.00 to 1.00
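Putting the downsize test next to the upsize test gives one decision step per polling period. The sketch below assumes a minus sign in both conditions (the operator is missing from the slide text), checks upsize before downsize as a further bias for performance, and keeps at least one portion powered. The ordering, the minimum, and the function names are assumptions for illustration, not the reference implementation.

```python
def should_downsize(hbw, n, cb, td):
    """Downsize condition, assuming the form HBW < (N - Td) * CB."""
    return hbw < (n - td) * cb


def next_portions(n, max_portions, hbw, mbw, cb, tu, td, min_portions=1):
    """Return the number of portions to enable for the next period.

    Assumptions: one portion is added or removed per step, upsize is
    checked first, and min_portions stays powered.
    """
    if n < max_portions and mbw > (1.0 - tu) * cb:
        return n + 1
    if n > min_portions and should_downsize(hbw, n, cb, td):
        return n - 1
    return n


if __name__ == "__main__":
    cb = 20.0  # hypothetical cost bandwidth in MiB/sec
    print(next_portions(2, 4, hbw=100.0, mbw=16.0, cb=cb, tu=0.25, td=0.25))
    print(next_portions(4, 4, hbw=70.0, mbw=0.0, cb=cb, tu=0.25, td=0.25))
    print(next_portions(2, 4, hbw=100.0, mbw=1.0, cb=cb, tu=0.25, td=0.25))
```

In the second call four portions need 75 MiB/sec of hit bandwidth to stay on, so 70 MiB/sec drops the cache to three portions; in the third call neither condition holds and the size is unchanged.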

27 L3 partial power-down: behaviour
Example:
- 2 MB L3 cache
- Memcpy workload with buffer size of 4 MB

28 L3 partial power-down: behaviour
Expected behaviour:
- CPU intensive workloads should not have an effect on the number of active portions
- I/O intensive loads should raise portions when the cache is well used

29 L3 partial power-down
Limitations of current reference implementation
- Portion is the smallest single unit of the cache that can be powered up/down
- Only support for a single DynamIQ Shared Unit
- Not suitable for use with the simple on-demand governor
L3 partial power-down in Arm Trusted Firmware?
Reference:

30 Thank You
#SFO17
BUD17 keynotes and videos on: connect.linaro.org
For further information:

Copyright 2017 ARM Limited or its affiliates. All rights reserved.

Copyright 2017 ARM Limited or its affiliates. All rights reserved. Page 1 of 33 Revision Information The following revisions have been made to this document. Date Version Confidentiality Change 31 March 2017 1.0 Non-confidential Initial version of the document 30 October

More information

Trusted Firmware Deep Dive. Dan Handley Charles Garcia-Tobin

Trusted Firmware Deep Dive. Dan Handley Charles Garcia-Tobin Trusted Firmware Deep Dive Dan Handley Charles Garcia-Tobin 1 Agenda Architecture overview Memory usage Code organisation Cold boot deep dive PSCI deep dive 2 Example System Architecture Normal World Secure

More information

ARM Trusted Firmware From Embedded to Enterprise. Dan Handley

ARM Trusted Firmware From Embedded to Enterprise. Dan Handley ARM Trusted Firmware From Embedded to Enterprise Dan Handley Agenda Quick recap Project news Security hardening AArch32 support ENGINEERS AND DEVICES WORKING TOGETHER Other enhancements Translation table

More information

How to get realistic C-states latency and residency? Vincent Guittot

How to get realistic C-states latency and residency? Vincent Guittot How to get realistic C-states latency and residency? Vincent Guittot Agenda Overview Exit latency Enter latency Residency Conclusion Overview Overview PMWG uses hikey960 for testing our dev on b/l system

More information

Attack Your SoC Power Challenges with Virtual Prototyping

Attack Your SoC Power Challenges with Virtual Prototyping Attack Your SoC Power Challenges with Virtual Prototyping Stefan Thiel Gunnar Braun Accellera Systems Initiative 1 Agenda Part #1: Power-aware Architecture Definition Part #2: Power-aware Software Development

More information

ARM Trusted Firmware: Changes for Axxia

ARM Trusted Firmware: Changes for Axxia ARM Trusted Firmware: Changes for Axxia atf_84091c4_axxia_1.39 Clean up klocwork issues, Critical and Error only, and only in code added to support Axxia. atf_84091c4_axxia_1.38 Allow non-secure access

More information

ARM big.little Technology Unleashed An Improved User Experience Delivered

ARM big.little Technology Unleashed An Improved User Experience Delivered ARM big.little Technology Unleashed An Improved User Experience Delivered Govind Wathan Product Specialist Cortex -A Mobile & Consumer CPU Products 1 Agenda Introduction to big.little Technology Benefits

More information

ARM Trusted Firmware ARM UEFI SCT update

ARM Trusted Firmware ARM UEFI SCT update presented by ARM Trusted Firmware ARM UEFI SCT update UEFI US Fall Plugfest September 20-22, 2016 Presented by Charles García-Tobin (ARM) Updated 2011-06-01 Agenda ARM Trusted Firmware What and why UEFI

More information

ARM Vision for Thermal Management and Energy Aware Scheduling on Linux

ARM Vision for Thermal Management and Energy Aware Scheduling on Linux ARM Vision for Management and Energy Aware Scheduling on Linux Charles Garcia-Tobin, Software Power Architect, ARM Thomas Molgaard, Director of Product Management, ARM ARM Tech Symposia China 2015 November

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 22

ECE 571 Advanced Microprocessor-Based Design Lecture 22 ECE 571 Advanced Microprocessor-Based Design Lecture 22 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 19 April 2018 HW#11 will be posted Announcements 1 Reading 1 Exploring DynamIQ

More information

Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7

Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7 Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7 Improving Energy Efficiency in High-Performance Mobile Platforms Peter Greenhalgh, ARM September 2011 This paper presents the rationale and design

More information

ACPI-next C-States Charles Garcia-Tobin Oct 2013

ACPI-next C-States Charles Garcia-Tobin Oct 2013 1 ACPI-next C-States Charles Garcia-Tobin Oct 2013 ACPI-Next C-states Numerical non-equivalency Types of states Topology awareness Additional Information: Version, BreakEven, S/R, Cache Device, Power Resource,

More information

SoC Idling & CPU Cluster PM

SoC Idling & CPU Cluster PM SoC Idling & CPU Cluster PM Presented by Ulf Hansson Lina Iyer Kevin Hilman Date BKK16-410 March 10, 2016 Event Linaro Connect BKK16 SoC Idling & CPU Cluster PM Idle management of devices via runtime PM

More information

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving Cortex-A75 and Cortex- DynamIQ processors Powering applications from mobile to autonomous driving Lionel Belnet Sr. Product Manager Arm Arm Tech Symposia 2017 Agenda Market growth and trends DynamIQ technology

More information

Dynamic secure firmware configuration. Dan Handley (Arm)

Dynamic secure firmware configuration. Dan Handley (Arm) Dynamic secure firmware configuration Dan Handley (Arm) Recap BUD17 had a session to discuss possible secure world use of kernel Device Tree (DT) Like the kernel, it s desirable to have a single set of

More information

A Study on C-group controlled big.little Architecture

A Study on C-group controlled big.little Architecture A Study on C-group controlled big.little Architecture Renesas Electronics Corporation New Solutions Platform Business Division Renesas Solutions Corporation Advanced Software Platform Development Department

More information

Towards Power Management for FreeBSD

Towards Power Management for FreeBSD Towards Power Management for FreeBSD Robin Randhawa robin.randhawa@arm.com FreeBSD Developer Summit Computer Laboratory University of Cambridge August 2015 Agenda An overview of Energy Aware Scheduling

More information

KeyStone II. CorePac Overview

KeyStone II. CorePac Overview KeyStone II ARM Cortex A15 CorePac Overview ARM A15 CorePac in KeyStone II Standard ARM Cortex A15 MPCore processor Cortex A15 MPCore version r2p2 Quad core, dual core, and single core variants 4096kB

More information

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving Stefan Rosinger Director, Product Management Arm Arm TechCon 2017 Agenda Market growth and trends DynamIQ

More information

Intelligent Power Allocation for Consumer & Embedded Thermal Control

Intelligent Power Allocation for Consumer & Embedded Thermal Control Intelligent Power Allocation for Consumer & Embedded Thermal Control Ian Rickards ARM Ltd, Cambridge UK ELC San Diego 5-April-2016 Existing Linux Thermal Framework Trip1 Trip0 Thermal trip mechanism using

More information

Maximizing heterogeneous system performance with ARM interconnect and CCIX

Maximizing heterogeneous system performance with ARM interconnect and CCIX Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable

More information

LCA14-104: GTS- A solution to support ARM s big.little technology. Mon-3-Mar, 11:15am, Mathieu Poirier

LCA14-104: GTS- A solution to support ARM s big.little technology. Mon-3-Mar, 11:15am, Mathieu Poirier LCA14-104: GTS- A solution to support ARM s big.little technology Mon-3-Mar, 11:15am, Mathieu Poirier Today s Presentation: Things to know about Global Task Scheduling (GTS). MP patchset description and

More information

Next Generation Enterprise Solutions from ARM

Next Generation Enterprise Solutions from ARM Next Generation Enterprise Solutions from ARM Ian Forsyth Director Product Marketing Enterprise and Infrastructure Applications Processor Product Line Ian.forsyth@arm.com 1 Enterprise Trends IT is the

More information

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014 Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline

More information

Load-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise)

Load-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise) Load-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise) Application vs. Pure Workloads Benchmarks that reproduce application workloads Assist in

More information

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models Jason Andrews Agenda System Performance Analysis IP Configuration System Creation Methodology: Create,

More information

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400

More information

Designing Security & Trust into Connected Devices

Designing Security & Trust into Connected Devices Designing Security & Trust into Connected Devices Eric Wang Sr. Technical Marketing Manager Tech Symposia China 2015 November 2015 Agenda Introduction Security Foundations on ARM Cortex -M Security Foundations

More information

Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM

Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM Integrating CPU and GPU, The ARM Methodology Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM The ARM Business Model Global leader in the development of

More information

Power Management for Embedded Systems

Power Management for Embedded Systems Power Management for Embedded Systems Minsoo Ryu Hanyang University Why Power Management? Battery-operated devices Smartphones, digital cameras, and laptops use batteries Power savings and battery run

More information

SMP bring up on ARM SoCs

SMP bring up on ARM SoCs Embedded Linux Conference 2014 SMP bring up on ARM SoCs Gregory CLEMENT Bootlin gregory.clement@bootlin.com - Kernel, drivers and embedded Linux - Development, consulting, training and support - https://bootlin.com

More information

RA3 - Cortex-A15 implementation

RA3 - Cortex-A15 implementation Formation Cortex-A15 implementation: This course covers Cortex-A15 high-end ARM CPU - Processeurs ARM: ARM Cores RA3 - Cortex-A15 implementation This course covers Cortex-A15 high-end ARM CPU OBJECTIVES

More information

ARM instruction sets and CPUs for wide-ranging applications

ARM instruction sets and CPUs for wide-ranging applications ARM instruction sets and CPUs for wide-ranging applications Chris Turner Director, CPU technology marketing ARM Tech Forum Taipei July 4 th 2017 ARM computing is everywhere #1 shipping GPU in the world

More information

Low Power System-on-Chip Design Chapters 3-4

Low Power System-on-Chip Design Chapters 3-4 1 Low Power System-on-Chip Design Chapters 3-4 Tomasz Patyk 2 Chapter 3: Multi-Voltage Design Challenges in Multi-Voltage Designs Voltage Scaling Interfaces Timing Issues in Multi-Voltage Designs Power

More information

Reliability, Availability, and Serviceability (RAS) on AArch64. Fu Wei (Linaro LEG) Supreeth Venkatesh (ARM)

Reliability, Availability, and Serviceability (RAS) on AArch64. Fu Wei (Linaro LEG) Supreeth Venkatesh (ARM) Reliability, Availability, and Serviceability (RAS) on AArch64 Fu Wei (Linaro LEG) Supreeth Venkatesh (ARM) AGENDA 1. Brief introduction of RAS 2. RAS on AArch64 3. Definition, Importance, History Overview

More information

19: I/O Devices: Clocks, Power Management

19: I/O Devices: Clocks, Power Management 19: I/O Devices: Clocks, Power Management Mark Handley Clock Hardware: A Programmable Clock Pulses Counter, decremented on each pulse Crystal Oscillator On zero, generate interrupt and reload from holding

More information

Accurate and Stable Empirical CPU Power Modelling for Multi- and Many-Core Systems

Accurate and Stable Empirical CPU Power Modelling for Multi- and Many-Core Systems Accurate and Stable Empirical CPU Power Modelling for Multi- and Many-Core Systems Matthew J. Walker*, Stephan Diestelhorst, Geoff V. Merrett* and Bashir M. Al-Hashimi* *University of Southampton Arm Ltd.

More information

QoS Handling with DVFS (CPUfreq & Devfreq)

QoS Handling with DVFS (CPUfreq & Devfreq) QoS Handling with DVFS (CPUfreq & Devfreq) MyungJoo Ham SW Center, 1 Performance Issues of DVFS Performance Sucks w/ DVFS! Battery-life Still Matters More Devices (components) w/ DVFS More Performance

More information

Lecture 21: Virtual Memory. Spring 2018 Jason Tang

Lecture 21: Virtual Memory. Spring 2018 Jason Tang Lecture 21: Virtual Memory Spring 2018 Jason Tang 1 Topics Virtual addressing Page tables Translation lookaside buffer 2 Computer Organization Computer Processor Memory Devices Control Datapath Input Output

More information

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Li Chen, Staff AE Cadence China Agenda Performance Challenges Current Approaches Traffic Profiles Intro Traffic Profiles Implementation

More information

Embedded Systems Architecture

Embedded Systems Architecture Embedded System Architecture Software and hardware minimizing energy consumption Conscious engineer protects the natur M. Eng. Mariusz Rudnicki 1/47 Software and hardware minimizing energy consumption

More information

ARM Trusted Firmware Evolution HKG15 February Andrew Thoelke Systems & Software, ARM

ARM Trusted Firmware Evolution HKG15 February Andrew Thoelke Systems & Software, ARM ARM Trusted Evolution HKG15 February 2015 Andrew Thoelke Systems & Software, ARM 1 ARM Trusted for 64-bit ARMv8-A A refresher Standardized EL3 Runtime For all 64-bit ARMv8-A systems Reducing porting and

More information

Cortex-A15 MPCore Software Development

Cortex-A15 MPCore Software Development Cortex-A15 MPCore Software Development Course Description Cortex-A15 MPCore software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to

More information

POWER MANAGEMENT AND ENERGY EFFICIENCY

POWER MANAGEMENT AND ENERGY EFFICIENCY POWER MANAGEMENT AND ENERGY EFFICIENCY * Adopted Power Management for Embedded Systems, Minsoo Ryu 2017 Operating Systems Design Euiseong Seo (euiseong@skku.edu) Need for Power Management Power consumption

More information

ARM64 Server RAS Solutions. Jonathan (Zhixiong) Zhang Cavium Inc.

ARM64 Server RAS Solutions. Jonathan (Zhixiong) Zhang Cavium Inc. ARM64 Server RAS Solutions Jonathan (Zhixiong) Zhang Cavium Inc. Agenda Overview Solutions Building blocks Reflections Overview Reliability, Availability, Serviceability RAS is one of the most important

More information

ARMv8-A Software Development

ARMv8-A Software Development ARMv8-A Software Development Course Description ARMv8-A software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop software for

More information

Power management for in-vehicle infotainment systems

Power management for in-vehicle infotainment systems Automotive Linux Summit 2017 Power management for in-vehicle infotainment systems 2017/05/31 Takahiko Gomi Automotive Information Solution Business Division Renesas Electronics Corporation 1 Who am I?

More information

CS 326: Operating Systems. CPU Scheduling. Lecture 6

CS 326: Operating Systems. CPU Scheduling. Lecture 6 CS 326: Operating Systems CPU Scheduling Lecture 6 Today s Schedule Agenda? Context Switches and Interrupts Basic Scheduling Algorithms Scheduling with I/O Symmetric multiprocessing 2/7/18 CS 326: Operating

More information

Multiprocessor Systems. Chapter 8, 8.1

Multiprocessor Systems. Chapter 8, 8.1 Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor

More information

Cortex-A9 MPCore Software Development

Cortex-A9 MPCore Software Development Cortex-A9 MPCore Software Development Course Description Cortex-A9 MPCore software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop

More information

How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC

How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC Three Consortia Formed in Oct 2016 Gen-Z Open CAPI CCIX complex to rack scale memory fabric Cache coherent accelerator

More information

Reliability, Availability, and Serviceability(RAS) on ARM64. Wei Fu

Reliability, Availability, and Serviceability(RAS) on ARM64. Wei Fu Reliability, Availability, and Serviceability(RAS) on ARM64 Wei Fu AGENDA What is RAS? ARMv8 CPU requirements for RAS BERT and CPER, HEST and GHESv2, EINJ/ERST SW components for RAS(in example) CPU core,

More information

Operating System Review

Operating System Review COP 4225 Advanced Unix Programming Operating System Review Chi Zhang czhang@cs.fiu.edu 1 About the Course Prerequisite: COP 4610 Concepts and Principles Programming System Calls Advanced Topics Internals,

More information

Hardware OS & OS- Application interface

Hardware OS & OS- Application interface CS 4410 Operating Systems Hardware OS & OS- Application interface Summer 2013 Cornell University 1 Today How my device becomes useful for the user? HW-OS interface Device controller Device driver Interrupts

More information

Arm Server Ready. Dong Wei

Arm Server Ready. Dong Wei Arm Server Ready Dong Wei Agenda Arm ServerReady Program SBSA/SBBR Updates PCIe Integration Updates UEFI Forum Updates Server Management Strategy ENGINEERS AND DEVICES WORKING TOGETHER Agenda Arm ServerReady

More information

Energy Discounted Computing On Multicore Smartphones Meng Zhu & Kai Shen. Atul Bhargav
