Multitasking on Cortex-M(0) class MCU: A deep dive into the Chromium-EC scheduler

$ whoami
- Embedded Software Engineer at National Instruments
- We just finished our first product using Chromium-EC, with future ones to come
- Other stuff I do:
  - fpga-mgr framework (co)maintainer for the Linux kernel
  - random drive-by contributions to other projects

Cortex-M0 registers
- 16 core registers, r0-r15; r0-r12 are general purpose
  - Low registers (r0-r7)
  - High registers (r8-r12)
  - Undefined state at reset
  - Most 16-bit Thumb instructions can only address the low registers
- r13 is the stack pointer (sp) for the current context
  - It is banked: either msp or psp
- r14 is the link register (lr), r15 is the program counter (pc)
- xpsr is a combined register; which parts matter depends on mode (later)
  - APSR (application program status register)
  - EPSR (execution program status register)
  - IPSR (interrupt program status register)
- PRIMASK (later)
- CONTROL (later)
- Register diagram: r0-r12, r13/sp (banked as msp and psp), r14/lr, r15/pc, xpsr, PRIMASK, CONTROL

Stack pointer (sp / r13)
- Used for accessing stack memory via push and pop instructions
- Can be modified / accessed like any other register via ldr, str, subs, adds, ...
- Either name, r13 or sp, will work
- It is banked (more later)
- Always word aligned, i.e. the lowest two bits always read as 0
- The stack is full descending: push decrements sp first, then stores at the new sp
- Example, push {r7} with r7 = 0x42:
  - before: sp = 0x20000000
  - after: sp = 0x1ffffffc, and the word at 0x1ffffffc holds 0x42
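The full descending behaviour can be modelled on a host machine. The sketch below is not Chromium-EC code; the struct, the function names and the four-word stack size are made up for illustration. The stack pointer is held as a word index instead of an address.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_STACK_WORDS 4u /* hypothetical size, for illustration only */

/* Hypothetical model of a small stack region. */
struct model_stack {
    uint32_t mem[MODEL_STACK_WORDS]; /* mem[0] is the lowest address */
    uint32_t sp;                     /* word index; MODEL_STACK_WORDS means "empty" */
};

static void model_stack_init(struct model_stack *s)
{
    /* Start one word past the highest slot, like sp = 0x20000000. */
    s->sp = MODEL_STACK_WORDS;
}

/* push: decrement first, then store at the new sp (full descending). */
static void model_push(struct model_stack *s, uint32_t value)
{
    assert(s->sp > 0); /* overflow check that the real hardware does not do */
    s->sp -= 1;
    s->mem[s->sp] = value;
}

/* pop: load from the current sp, then increment. */
static uint32_t model_pop(struct model_stack *s)
{
    assert(s->sp < MODEL_STACK_WORDS); /* nothing to pop otherwise */
    uint32_t value = s->mem[s->sp];
    s->sp += 1;
    return value;
}
```

After one model_push() the value sits in the highest word and sp points at it, which mirrors the push {r7} example.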

Link Register (r14) & Program Counter (r15)
- The link register lr is used with subroutine calls and exceptions (later)
- During a subroutine call (using bl / blx), the address of the sequentially next instruction is loaded into lr
- lr[0] set indicates a return to Thumb state
- Some instructions (e.g. bx lr) need lr[0] to be set
- Reading pc gives the address of the current instruction + 4 (pipeline)
- pc[0] should be zero; however, bx / blx require bit 0 of the target address to be set, to make sure we stay in Thumb state

Combined Program Status Register
- The xpsr is a combined register made up of apsr, ipsr and epsr
- The apsr is the application program status register
  - Bits 31:28 are the ALU flags: Negative, Zero, Carry, oVerflow
- The ipsr contains the exception number in the lower bits 5:0
  - if 0, we are in thread mode (later)
- The epsr is the execution program status register
  - Bit 24 is the T (Thumb) bit
- All of them can be accessed with the msr / mrs instructions
- Bit layout:
  apsr: 31 = N, 30 = Z, 29 = C, 28 = V, 27:0 reserved
  ipsr: 31:6 reserved, 5:0 exception number
  epsr: 31:25 reserved, 24 = T, 23:0 reserved
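As a rough illustration of the field layout (N, Z, C, V in bits 31:28, T in bit 24, exception number in bits 5:0), the following host-side sketch splits a raw xpsr value into its parts. The macro, struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XPSR_N        (1u << 31) /* negative */
#define XPSR_Z        (1u << 30) /* zero */
#define XPSR_C        (1u << 29) /* carry */
#define XPSR_V        (1u << 28) /* overflow */
#define XPSR_T        (1u << 24) /* Thumb state */
#define XPSR_EXC_MASK 0x3Fu      /* exception number, bits 5:0 */

/* Hypothetical decoded view of xpsr. */
struct xpsr_fields {
    bool n, z, c, v, thumb;
    unsigned exception;
};

static struct xpsr_fields xpsr_decode(uint32_t xpsr)
{
    struct xpsr_fields f;

    f.n = (xpsr & XPSR_N) != 0;
    f.z = (xpsr & XPSR_Z) != 0;
    f.c = (xpsr & XPSR_C) != 0;
    f.v = (xpsr & XPSR_V) != 0;
    f.thumb = (xpsr & XPSR_T) != 0;
    f.exception = xpsr & XPSR_EXC_MASK;
    assert(f.exception < 64); /* the field is 6 bits wide */
    return f;
}

/* Exception number 0 means thread mode. */
static bool xpsr_in_thread_mode(uint32_t xpsr)
{
    return (xpsr & XPSR_EXC_MASK) == 0;
}
```

On the target, the raw value would come from an mrs instruction; here it is simply passed in as an integer.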

Calling convention
- r0-r3 are the argument and scratch registers
- r0-r1 are also the result registers (r1 only if the result is larger than one word)
- r4-r8 are callee-saved registers (i.e. the callee has to restore them on return)
- r9 might be a callee-saved register or not (in some variants of the AAPCS it is a special register; ignore that)
- r10-r11 are callee-saved registers
- r12-r15 are special registers (r12 is the intra-procedure-call scratch register)

PRIMASK
- If set, no exception with programmable priority is entered
- If not set, it has no effect
- Bit layout: 31:1 reserved, 0 = PRIMASK

CONTROL
- Only one privilege level in Cortex-M0
- CONTROL[0] is reserved (it is the privilege bit on the M3)
- CONTROL[1]: if set, sp = psp; if not set, sp = msp
- Using CONTROL[1] one can switch between psp and msp
- Bit layout: 31:2 reserved, 1 = active stack, 0 = reserved

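Thumb State (overview)
- Two modes: Handler mode and Thread mode
- Handler mode always uses msp as its stack
- In Thread mode, the stack in use depends on the setting in the CONTROL register
- After reset we start out in Thread mode with msp active
- Thread to Handler mode transition happens via an exception
- State diagram:
  - Reset -> Thread mode (sp == msp)
  - Thread mode (sp == msp) -> Thread mode (sp == psp): CONTROL[1] = 1
  - Thread mode (sp == psp) -> Thread mode (sp == msp): CONTROL[1] = 0
  - Thread mode (either stack) -> Handler mode (sp == msp): Exception
  - Handler mode -> Thread mode: Exception return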

Exceptions
- An event that changes program flow
- Suspend the current code, run the handler, resume
- Some exceptions on Cortex-M0 have fixed priorities: Reset, NMI, Hard Fault
- Some exceptions have programmable priority: SysTick (later), PendSV (later), IRQs
- Among the programmable priorities, 0 is the highest
- PRIMASK can be used to mask interrupts
- Interrupts can be pending

Exceptions: Vectors
- Cortex-M0 processors support vectored exceptions
- The table contains the addresses of the handlers; the processor fetches the matching address on exception
- The processor jumps to the correct handler, instead of having a single handler
- Make sure to set a default handler

Exceptions (Stacking with MSP)
- Cortex-M(0) comes with hardware features that make dealing with exceptions easier
- On exception entry some registers are pushed onto the stack (which stack depends on the current mode)
- These registers form the exception context: r0-r3, r12, lr, pc, xpsr
- Stacking happens on the current sp (this example assumes psp is not used)
- This makes nesting possible
- Unstacking happens based on the value in lr
- Diagram: on entry from Thread mode the frame (xpsr, pc, lr, r12, r3, r2, r1, r0, from high to low address) is stacked and sp moves down; on return to Thread mode it is unstacked and sp moves back up

Exceptions (Stacking with PSP)
- Cortex-M(0) comes with hardware features that make dealing with exceptions easier
- On exception entry some registers are pushed onto the stack (which stack depends on the current mode)
- These registers form the exception context: r0-r3, r12, lr, pc, xpsr
- Stacking happens on the current sp, here psp
- This makes nesting possible
- Unstacking happens based on the value in lr
- Diagram: Thread mode runs with sp == psp, so the frame (xpsr, pc, lr, r12, r3, r2, r1, r0, from high to low address) is stacked at psp; the handler runs with sp == msp; on return the frame is unstacked from psp
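One way to picture the stacked exception context is as a C struct laid over the eight words that the hardware pushed, lowest address first. This is only a sketch for reading such a frame (for example in a fault handler or a test); the struct and function names are made up here and do not come from Chromium-EC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Registers stacked by hardware on exception entry, lowest address first. */
struct exception_frame {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* lr of the interrupted code, not EXC_RETURN */
    uint32_t pc;   /* where the interrupted code resumes */
    uint32_t xpsr;
};

/*
 * Copy the frame found at the given stack pointer value (msp or psp,
 * as it was right after stacking) into a struct.
 */
static struct exception_frame frame_read(const uint32_t *sp_after_stacking)
{
    struct exception_frame f;

    assert(((uintptr_t)sp_after_stacking & 3u) == 0); /* sp is word aligned */
    memcpy(&f, sp_after_stacking, sizeof(f));
    return f;
}
```

The frame is 8 words (32 bytes), so a task stack needs at least that much headroom for every exception that can interrupt it.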

Exceptions (Tail Chaining)
- Speeds up exception servicing
- On completion of a handler, if there is a pending exception, unstacking is skipped
- The new handler runs right away
- Timeline: Exception A -> stacking -> Handler A (Exception B becomes pending) -> exception return -> Handler B -> exception return -> unstacking -> Thread

Exceptions (Late arrival)
- Applies if a higher priority exception arrives before the handler starts executing, but after stacking has begun
- The stacking gets reused, and the higher priority handler runs first
- Timeline: Exception A -> stacking (Exception B arrives) -> Handler B -> exception return -> Handler A -> exception return -> unstacking -> Thread

Exceptions (Nested)
- Applies if a higher priority exception arrives after the exception handler has started
- The lower priority handler is preempted by the higher priority one
- Afterwards the lower priority handler finishes
- Timeline: Exception A -> stacking -> Handler A (Exception B arrives) -> stacking -> Handler B -> exception return -> unstacking -> Handler A resumes -> exception return -> unstacking -> Thread

Exceptions (EXC_RETURN)
- Depending on the value of lr at exception return, different paths are taken:
  0xfffffff1  unstack using msp, return to handler mode (nested exception)
  0xfffffff9  unstack using msp, return to thread mode (sp == msp)
  0xfffffffd  unstack using psp, return to thread mode (sp == psp)
- Stacking on entry uses msp when coming from handler mode or from thread mode with sp == msp, and psp when coming from thread mode with sp == psp
- Setting / clearing CONTROL[1] switches thread mode between psp and msp
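The three EXC_RETURN values can be decoded with a small lookup. The following host-side sketch uses invented enum and function names; it only restates the table of lr values from the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the three return paths. */
enum exc_return_kind {
    EXC_RETURN_HANDLER_MSP, /* 0xfffffff1: back to handler mode, unstack from msp */
    EXC_RETURN_THREAD_MSP,  /* 0xfffffff9: back to thread mode, unstack from msp */
    EXC_RETURN_THREAD_PSP,  /* 0xfffffffd: back to thread mode, unstack from psp */
    EXC_RETURN_INVALID      /* anything else is not one of the three values */
};

static enum exc_return_kind exc_return_decode(uint32_t lr)
{
    switch (lr) {
    case 0xFFFFFFF1u:
        return EXC_RETURN_HANDLER_MSP;
    case 0xFFFFFFF9u:
        return EXC_RETURN_THREAD_MSP;
    case 0xFFFFFFFDu:
        return EXC_RETURN_THREAD_PSP;
    default:
        return EXC_RETURN_INVALID;
    }
}

/* True if this return leaves handler mode. lr must be a valid EXC_RETURN. */
static bool exc_return_to_thread_mode(uint32_t lr)
{
    enum exc_return_kind kind = exc_return_decode(lr);

    assert(kind != EXC_RETURN_INVALID);
    return kind != EXC_RETURN_HANDLER_MSP;
}
```

A context switch that hands control to a task running on psp therefore has to return with lr == 0xfffffffd.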

Startup code / Getting to main()
- On reset, execution starts at the reset vector
- We need to do a bunch of stuff before we can run normal C:
  - Make sure we're in the right state (MSP, Thumb, privileged, ...)
  - Initialize the bss section to zero
  - Copy the exception vectors to SRAM
  - Copy the initialized data section (e.g. global variables) to SRAM
  - Set the initial stack pointer
  - Jump to main()

Getting to main() in Chromium-EC
- After reset make sure CONTROL = 0
- Write bss to zero
- Copy over vectors
- Set vector table to SRAM
- Copy initialized data
- Go!

    reset:
        movs r0, #0
        msr control, r0         @ CONTROL = 0: thread mode uses msp
        isb
        movs r0, #0
        ldr r1,_bss_start
        ldr r2,_bss_end
    bss_loop:                   @ zero the bss section, one word at a time
        str r0, [r1]
        adds r1, #4
        cmp r1, r2
        blt bss_loop
        ldr r1, =vectors
        ldr r2, =sram_vtable
        movs r0, #0
    vtable_loop:                @ copy 48 vector entries to SRAM
        ldr r3, [r1]
        str r3, [r2]
        adds r1, #4
        adds r2, #4
        adds r0, #1
        cmp r0, #48
        blt vtable_loop
        movs r0, #3
        ldr r1, =0x40010000
        str r0, [r1]            @ chip-specific write: set vector table to SRAM
        ldr r0,_ro_end
        ldr r1,_data_start
        ldr r2,_data_end
    data_loop:                  @ copy initialized data to SRAM
        ldr r3, [r0]
        adds r0, #4
        str r3, [r1]
        adds r1, #4
        cmp r1, r2
        blt data_loop
        ldr r0, =stack_end
        mov sp, r0              @ set the initial stack pointer
        bl main
    fini_loop:                  @ spin forever if main() returns
        b fini_loop
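The two memory loops in the reset code are easier to follow in C. The sketch below is a host-side rendering with made-up function names. Unlike the assembly, which stores before it compares, this version checks the bound first, so an empty region is left untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Zero the words in [start, end), like bss_loop. */
static void zero_words(uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    while (start < end)
        *start++ = 0;
}

/* Copy words from src into [dst, dst_end), like data_loop. */
static void copy_words(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    assert(dst <= dst_end);
    while (dst < dst_end)
        *dst++ = *src++;
}
```

In a real startup file the bounds would come from linker symbols such as the _bss_start and _data_start labels used in the listing; here they are ordinary pointers so the code can run anywhere.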

Multitasking - Context Switching
- General idea: run tasks in a way that each of them can use the processor exclusively
- As opposed to cooperative approaches, tasks don't need to be aware of each other
- To make that work, each of them needs state, i.e. a context to be restored
- As seen before, context = registers + stack
- The OS decides who goes next
- Timeline: Task A, Task B and Task C take turns over time, with the OS running in between

Multitasking - Stack layout
- Need one stack per task
- Need one stack for the OS
- The OS stack needs to be large enough to deal with all exceptions
- A task doesn't need to know its own stack
- The OS takes care of dealing with stack pointers
- Heap (malloc, free, ...) is optional
- Memory layout: OS stack, Task A stack, Task B stack, Task C stack, heap

Multitasking - SysTick
- Use a timer as a periodic event source
- Use these events (ticks) to run the scheduler
- Correct prioritization is required
- So common that ARM provides an (optional) one, SysTick, in ARMv6-M
- Timeline: on each tick the OS runs and then Task A, Task B or Task C continues

Multitasking in Chromium-EC
- Task struct as shown:

    struct task {
        u32 sp;
        u32 events;
        u32 *stack;
    };

- No heap
- Fixed priorities
- Events: Timer, Mutex, Wake, Peripherals
- Uses a 32 bit or 16 bit hardware timer instead of SysTick

Task states
- fls() gives the index of the most significant set bit in an integer
- Always run the task with the highest priority
- tskid is the id of the current task
- All tasks start out disabled, except the HOOKS task
- All scheduling happens on events
- State transitions:
  - Disabled -> Ready: tasks_enabled[tskid] = 1
  - Ready -> Running: tskid == fls(tasks_ready & tasks_enabled)
  - Running -> Ready: tskid != fls(tasks_ready & tasks_enabled)
  - Running -> Wait for any event: wait_evt() / task_wait_event()
  - Running -> Wait for specific event: task_wait_event_mask()
  - Wait for any event -> Ready: task_set_event()
  - Wait for specific event -> Ready: task_set_event()

Scheduling (example)
- Tasks by priority, highest first: CONSOLE, HOOKS, << IDLE >>
- Timeline annotations: "Only enabled task at start" (HOOKS), "Enable all tasks", "Waiting for event" (four times), "Event unblocked CONSOLE" (twice)

schedule(int desched, int resched)
- Wrapper function for the SVC call, passing desched in r0 and resched in r1
- Switches to handler mode (uses MSP)

    void schedule(int desched, int resched)
    {
        register int p0 asm("r0") = desched;
        register int p1 asm("r1") = resched;

        /* list p0/p1 as inputs so the compiler keeps them in r0/r1 */
        asm("svc 0" : : "r"(p0), "r"(p1));
    }

svc_handler
- Push r3 and lr on the stack
- Call the C function __svc_handler to figure out who goes next

    svc_handler:
        push {r3, lr}
        bl __svc_handler        @ C function: picks the next task, returns the old one in r0
        ldr r3, =current_task
        ldr r1, [r3]
        cmp r0, r1
        beq svc_handler_return
        bl switchto
    svc_handler_return:
        pop {r3, pc}

svc_handler (scheduling decision)
- If desched is set and the current task has no pending events (!current->events), the current task is no longer ready
- The task given by resched is now ready
- Pick by priority amongst the enabled tasks via fls(tasks_ready & tasks_enabled)
- Return current in r0

    task_ *__svc_handler(int desched, task_id resched)
    {
        task_ *current, *next;

        current = current_task;
        if (desched && !current->events)
            tasks_ready &= ~BIT(current - tasks);
        tasks_ready |= BIT(resched);
        next = &tasks[fls(tasks_ready & tasks_enabled)];
        current_task = next;
        return current;
    }
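To see the decision logic in action without hardware, here is a small host-side model. Everything prefixed with model_ is invented for this example, as is the choice of three tasks with IDLE as id 0 and CONSOLE as id 2. It assumes that resched is OR-ed into the ready mask and that a higher task id means a higher priority.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TASK_COUNT 3 /* hypothetical: 0 = IDLE, 1 = HOOKS, 2 = CONSOLE */

struct model_task {
    uint32_t events; /* pending events bitmap */
};

static struct model_task model_tasks[MODEL_TASK_COUNT];
static uint32_t model_tasks_ready = 0x1u;   /* IDLE is always ready */
static uint32_t model_tasks_enabled = 0x7u; /* all three tasks enabled */
static int model_current;                   /* id of the running task */

/* Index of the most significant set bit; x must be non-zero. */
static int model_fls(uint32_t x)
{
    int bit = 31;

    assert(x != 0);
    while (!(x & (1u << bit)))
        bit--;
    return bit;
}

/* Returns the task that was running; model_current becomes the next one. */
static int model_svc(int desched, int resched)
{
    int previous = model_current;

    /* A task with pending events stays ready even if it asks to sleep. */
    if (desched && !model_tasks[previous].events)
        model_tasks_ready &= ~(1u << previous);
    model_tasks_ready |= 1u << resched;
    model_current = model_fls(model_tasks_ready & model_tasks_enabled);
    return previous;
}
```

Calling model_svc(0, 2) from IDLE makes CONSOLE ready and selects it; a later model_svc(1, 0) from CONSOLE with no pending events drops back to IDLE.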

back in svc_handler
- The C function returned current in r0
- Compare current in r0 (the return value) with next in r1 (loaded from current_task, which the function call changed; r3 holds the address of current_task)
- If a context switch is required, call switchto
- Pop r3 and lr (into pc)
- The slide repeats the svc_handler listing shown earlier.

switchto
- Get psp for the current task into r2
- Store the currently active sp (msp) in r3
- Make psp our stack (from r2)
- Push the old context (remember: we only have Thumb, so push only takes low registers and r8-r11 go through r4-r7)

    switchto:
        mrs r2, psp
        mov r3, sp
        mov sp, r2
        push {r4-r7}
        mov r4, r8
        mov r5, r9
        mov r6, r10
        mov r7, r11
        push {r4-r7}            @ saves r8-r11
        mov r2, sp
        mov sp, r3
        str r2, [r0]
        ldr r2, [r1]
        ldmia r2!, {r4-r7}
        mov r8, r4
        mov r9, r5
        mov r10, r6
        mov r11, r7
        ldmia r2!, {r4-r7}
        msr psp, r2
        bx lr

switchto (continued)
- Store the active sp into r2
- Make r3 (old msp) our stack again
- Store r2 (old psp) into the old task (first member of the struct, pointer in r0)
- Load the new sp into r2 (first member of the struct, pointer in r1)
- Restore r8-r11, then r4-r7 from the new stack
- Make r2 the new psp
- Remember: struct task { u32 sp; [...] };
- The slide repeats the switchto listing from the previous slide.

back in svc_handler
- Pop r3 and lr (into pc) to return from the exception
- The slide repeats the svc_handler listing shown earlier.

how do we get this thing going?
- Load the scratchpad stack into r2
- Make room for 17 registers (16 registers (r0-r15) + psr)
- Make that psp
- Switch the thread mode stack pointer to psp
- task_start was called with a pointer to the started_scheduling variable -> set it to 1
- Call schedule (remember r0 = 0 -> don't deschedule, r1 contains the task_id to schedule)

    task_start:
        ldr r2,=scratchpad
        movs r3, #2             @ CONTROL[1] = 1: thread mode uses psp
        adds r2, #17*4          @ make room for 17 registers
        movs r1, #0             @ resched argument for schedule
        msr psp, r2
        movs r2, #1
        isb
        msr control, r3
        movs r3, r0             @ r3 = pointer to started_scheduling
        movs r0, #0             @ desched = 0
        isb
        str r2, [r3]            @ started_scheduling = 1
        bl schedule
        movs r0, #1
        bx lr

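Task states (recap)
- Disabled -> Ready: tasks_enabled[tskid] = 1
- Ready -> Running: tskid == fls(tasks_ready & tasks_enabled)
- Running -> Ready: tskid != fls(tasks_ready & tasks_enabled)
- Running -> Wait for any event: wait_evt() / task_wait_event()
- Running -> Wait for specific event: task_wait_event_mask()
- Wait for any event -> Ready: task_set_event()
- Wait for specific event -> Ready: task_set_event()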

task_set_event(tskid, event)
- Event source is IRQ context:
  - (atomically) set the event flag in the receiver task
  - mark the task as ready
  - use PendSV to call schedule() after the IRQs are done
  - Example: Timer (process_timers)
  - Timeline: Thread (Task A) -> Handler (IRQ, IRQ, pendsv_handler, svc_handler) -> Thread (Task B)
- Event source is task context:
  - (atomically) set the event flag in the receiver task
  - directly call schedule()
  - Example: Mutex (mutex_unlock)
  - Timeline: Thread (Task A) -> Handler (schedule) -> Thread (Task B)

wait_evt(timeout_us, resched)
- MUST NOT be called in IRQ context
- Arm a timer with the timeout
- While an (atomic) read of task->events returns 0, deschedule ourselves and reschedule resched
- If the timer expires, return a timeout
- Wrapped in a helper:

    u32 task_wait_event(int timeout_us)
    {
        return wait_evt(timeout_us, TASK_IDLE);
    }

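Example: usleep(u32 timeout)

    u32 evt = 0;
    u32 t0 = hw_clock_source_read();

    /* sleep until the timer event fires or the timeout has elapsed */
    do {
        evt |= task_wait_event(timeout);
    } while (!(evt & TASK_EVENT_TIMER) &&
             (hw_clock_source_read() - t0 < timeout));

    /* put back any other events that arrived in the meantime */
    if (evt)
        atomic_or(&task->events, evt & ~TASK_EVENT_TIMER);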

Atomic operations
- Cortex-M0 does not have strex etc., so all we can do is: disable IRQs, modify, enable IRQs
- This then looks something like this (assuming the address is passed in r0 and the value in r1):

    #define ATOMIC_OP(asm_op, a, v) do {                \
        uint32_t reg0;                                  \
        asm volatile ("   cpsid i\n"                    \
                      "   ldr  %0, [%1]\n"              \
                      #asm_op" %0, %0, %2\n"            \
                      "   str  %0, [%1]\n"              \
                      "   cpsie i\n"                    \
                      : "=&b" (reg0)                    \
                      : "b" (a), "r" (v) : "cc");       \
    } while (0)

    static inline void atomic_or(uint32_t volatile *addr, uint32_t bits)
    {
        ATOMIC_OP(orr, addr, bits);
    }
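The same disable / modify / enable pattern can be written in plain C for experimenting on a host. In this sketch a boolean stands in for the interrupt mask, and all model_ names are invented. Because the pattern re-enables interrupts unconditionally, the sketch asserts that it is never entered with interrupts already masked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the CPU interrupt mask; true means IRQs can be taken. */
static bool model_irqs_enabled = true;

static void model_irq_disable(void) { model_irqs_enabled = false; } /* like cpsid i */
static void model_irq_enable(void)  { model_irqs_enabled = true; }  /* like cpsie i */

/* OR bits into *addr without an IRQ handler seeing a half-done update. */
static void model_atomic_or(volatile uint32_t *addr, uint32_t bits)
{
    assert(model_irqs_enabled); /* unconditional re-enable below would break nesting */
    model_irq_disable();
    *addr |= bits;
    model_irq_enable();
}

/* Read the current value and clear it in one uninterrupted step. */
static uint32_t model_atomic_read_clear(volatile uint32_t *addr)
{
    uint32_t old;

    assert(model_irqs_enabled);
    model_irq_disable();
    old = *addr;
    *addr = 0;
    model_irq_enable();
    return old;
}
```

A read-and-clear helper of this kind is what a task would use to collect its pending events, while interrupt handlers add new ones with the OR variant.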

Timers / Timer Events
- Using 32 bit or 16 bit hardware timers
- Each task can have a timer, activated via timer_arm(timestamp_t tstamp, task_id_t tskid)
- The compare value of the hardware timer is set to the closest deadline
- On interrupt, process_timers() is called; it compares each task's deadline with the timer value and expires timers as required
- Expired timers set the timer event, i.e. a task that called task_wait_event() or task_wait_event_mask() becomes ready
- To cancel a timer, use timer_cancel()
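The per-task timer bookkeeping can be sketched as follows. This is a simplified host-side model, not the Chromium-EC implementation: the model_ names, the four-task limit and the use of plain 64-bit microsecond deadlines are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TASKS 4 /* hypothetical number of tasks */

static uint64_t model_deadline[MODEL_TASKS]; /* absolute deadlines in us */
static uint32_t model_timer_running;         /* bit n set: task n has an armed timer */
static uint32_t model_timer_events;          /* bit n set: timer event delivered to task n */

static void model_timer_arm(uint64_t deadline, int task)
{
    assert(task >= 0 && task < MODEL_TASKS);
    model_deadline[task] = deadline;
    model_timer_running |= 1u << task;
}

static void model_timer_cancel(int task)
{
    assert(task >= 0 && task < MODEL_TASKS);
    model_timer_running &= ~(1u << task);
}

/*
 * Expire every armed timer whose deadline has passed and return the
 * closest remaining deadline, or UINT64_MAX if no timer is armed.
 * The return value is what the hardware compare register would be set to.
 */
static uint64_t model_process_timers(uint64_t now)
{
    uint64_t next = UINT64_MAX;

    for (int t = 0; t < MODEL_TASKS; t++) {
        if (!(model_timer_running & (1u << t)))
            continue;
        if (model_deadline[t] <= now) {
            model_timer_running &= ~(1u << t);
            model_timer_events |= 1u << t; /* stands in for setting the timer event */
        } else if (model_deadline[t] < next) {
            next = model_deadline[t];
        }
    }
    return next;
}
```

Keeping one deadline per task means a single hardware compare channel is enough, at the cost of a linear scan over the tasks on every timer interrupt.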

Future work
- Add OS awareness to OpenOCD (kinda works already, just gotta clean up)
- Port to RISC-V (just because)
- Port to MicroBlaze

Questions?
- email: moritz.fischer@ettus.com / mdf@kernel.org
- gpg-fingerprint: 135A 2159 8651 9D68 DA5B C3F1 958A 4C04 7304 62CC
- github: mfischer