OUTLINE
- STM32F0 Architecture Overview
- STM32F0 Core
- Motivation for RISC and Pipelining
- Cortex-M0 Programming Model
- Toolchain and Project Structure

ARCHITECTURE AND PROGRAMMING
George E. Hadley, Timothy Rogers, and David G. Meyer, 2018. Images property of their respective owners.

THE ARM BUSINESS MODEL
A vertically integrated chip vendor (for contrast):
- Controls the whole hardware design stack
- Fabricates its own chips, so it can beat the competition by reaching smaller transistors first
ARM:
- Does not fabricate anything
- Only designs and sells intellectual property (IP)
- Licenses its IP to other companies and, for a fee, allows them to modify the designs

STM32F0 OVERVIEW
STM32F0 top-level block diagram (from Figure 1 of the STM32F051R8T6 datasheet). Major blocks highlighted on the diagram:
- Microcontroller core
- Memory (SRAM and Flash)
- Reset and clock control
- Internal system bus
- General-purpose I/O (GPIO)
- Timers and real-time clock (RTC)
- ADC, DAC, and comparators
- Universal Synchronous/Asynchronous Receiver/Transmitter (USART)
- Serial Peripheral Interface (SPI)
- Inter-Integrated Circuit (I2C)

STM32F0 CORE
- CPU: central processor; executes instructions, performs embedded math, etc.
- JTAG: programming/debug interface to the processor
- Serial Wire: programming/debug interface to the processor
- Nested Vectored Interrupt Controller (NVIC): handles context switching, interrupt priorities, etc.
- Wake-Up Interrupt Controller (WIC): allows the processor to wake from low-power operation in response to external stimuli
- Data Watchpoint: used to monitor accesses to selected data locations (used for debugging)
- Breakpoint Unit: used to pause processor execution at specific locations in memory (used for debugging)

STM32F0 CORE
- The CPU is treated as a proprietary black box from ARM
- Design and implementation of processor cores is taught in ECE 437
[Figure: illustrative single-cycle (left) and pipelined (right) processor cores]

MOTIVATION FOR RISC
CISC (e.g., x86, 9S12):
- Large variety of opcodes and addressing modes
- Variable-length instructions
- More complicated hardware
- Less complicated compilers
- Many multi-cycle instructions
- Better code density
- Difficult to pipeline
RISC (e.g., ARM, PIC):
- Fewer operations and addressing modes
- Fixed-size instructions
- Less complicated hardware
- More complicated compilers
- Most instructions are single-cycle
- Worse code density
- Easier to pipeline

MOTIVATION FOR RISC
Why Bother with RISC?
1. Simpler to decode
2. Simpler instructions = simpler hardware
3. Easier to pipeline

MOTIVATION FOR PIPELINING
Steps Required to Process a Machine Instruction
1. Get the instruction from memory (Fetch)
2. Figure out what the instruction does (Decode)
3. Do it! (Execute)
Question: What determines the maximum clock frequency of a digital circuit?
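The three steps can be sketched as a toy interpreter loop. This is only an illustration: the tuple instruction format `(opcode, rd, rn, rm)` and the two opcodes are hypothetical and are not the Cortex-M0 encoding.

```python
# Toy fetch/decode/execute loop. The instruction format and opcodes are
# hypothetical, chosen only to make the three steps visible.

def run(program, registers):
    """Execute a list of (opcode, rd, rn, rm) tuples and return the registers."""
    pc = 0  # program counter: index of the next instruction to fetch
    while pc < len(program):
        instruction = program[pc]          # 1. Fetch
        opcode, rd, rn, rm = instruction   # 2. Decode
        if opcode == "ADD":                # 3. Execute
            registers[rd] = (registers[rn] + registers[rm]) & 0xFFFFFFFF
        elif opcode == "SUB":
            registers[rd] = (registers[rn] - registers[rm]) & 0xFFFFFFFF
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        pc += 1
    return registers


regs = run([("ADD", 3, 1, 2), ("ADD", 4, 3, 5)],
           {1: 10, 2: 20, 3: 0, 4: 0, 5: 7})
print(regs[3], regs[4])
```

In this sketch each instruction finishes all three steps before the next one is fetched, which is the behavior a single-cycle core models.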

MOTIVATION FOR PIPELINING
[Figure: single-cycle processor core, one long cycle]

MOTIVATION FOR PIPELINING
[Figure: pipelined processor core, one shorter cycle]

MOTIVATION FOR PIPELINING
Pipelining Analogy
[Figure: a sequence of tasks overlapped in time, from the first task (early in time) to the last task (later in time), with simultaneous activity from 7:30 to 8:00 pm]

MOTIVATION FOR PIPELINING
Pipelining Instructions

Clock Cycle Number   1    2    3    4    5    6    7    8    9
Instruction i        IF   ID   EX   MEM  WB
Instruction i+1           IF   ID   EX   MEM  WB
Instruction i+2                IF   ID   EX   MEM  WB
Instruction i+3                     IF   ID   EX   MEM  WB
Instruction i+4                          IF   ID   EX   MEM  WB

Improves instruction throughput. Once the pipeline is full, instructions finish at a rate of one per cycle.
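The pipeline diagram can be regenerated with a short script. This is a minimal sketch that assumes an ideal pipeline (no stalls, every stage takes one cycle); the function name `pipeline_schedule` is made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in
    clock cycle c + 1, or '' if the instruction is not in the pipeline."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(stages):
            row[i + s] = stage  # instruction i enters the pipeline in cycle i + 1
        rows.append(row)
    return rows


rows = pipeline_schedule(5)
print("cycle  " + " ".join(f"{c:>3}" for c in range(1, len(rows[0]) + 1)))
for i, row in enumerate(rows):
    print(f"i+{i}    " + " ".join(f"{stage:>3}" for stage in row))
```

Under these assumptions, five instructions through five stages need 5 + 5 - 1 = 9 cycles, rather than the 25 cycles required if each instruction had to leave the pipeline before the next one entered.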

MOTIVATION FOR PIPELINING
Why Pipelining is Important
- Increases clock speed
- Allows overlapped execution of instructions, which increases hardware utilization
- All high-performance processors are pipelined

MOTIVATION FOR PIPELINING
Why Not Pipeline Indefinitely?
- Pipeline registers at each stage add latency (the setup/hold time of the registers)
- More hardware is required for each stage
- Deeper pipelines are more sensitive to branch misprediction
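A rough calculation shows why the register overhead limits the benefit of deeper pipelines. The model below is a simplification, and both delay values are hypothetical numbers chosen for illustration: the logic is assumed to split evenly across stages, and each stage pays a fixed register overhead.

```python
def clock_period_ns(logic_delay_ns, stages, register_overhead_ns):
    """Clock period when the logic is split evenly across `stages` stages
    and every stage adds a fixed pipeline-register overhead."""
    return logic_delay_ns / stages + register_overhead_ns


LOGIC_DELAY_NS = 10.0        # hypothetical total combinational delay
REGISTER_OVERHEAD_NS = 0.5   # hypothetical per-register timing overhead

for stages in (1, 5, 10, 20):
    period = clock_period_ns(LOGIC_DELAY_NS, stages, REGISTER_OVERHEAD_NS)
    latency = period * stages  # time for one instruction to pass every stage
    print(f"{stages:2d} stages: period {period:.2f} ns, latency {latency:.2f} ns")
```

With these numbers, going from 1 to 5 stages shortens the clock period from 10.5 ns to 2.5 ns, but going from 5 to 20 stages only reaches 1.0 ns, while the latency of a single instruction grows from 10.5 ns to 20 ns. The period can never drop below the register overhead itself.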

MOTIVATION FOR PIPELINING
An Ideal Pipeline
- Uniform Sub-computations
  - Goal: each stage has the same delay
  - Achieve by balancing pipeline stages
- Identical Computations
  - Goal: each computation uses the same number of stages
  - Achieve by unifying instruction types
- Independent Computations
  - Goal: avoid hazards
  - Look for ways to minimize pipeline stalls

MOTIVATION FOR PIPELINING
Dependencies
  I1: Add R3,R1,R2
  I2: Add R4,R3,R5
- I2 needs the result of I1 (stored in R3)
- We want both instructions to execute at the same time
- I2 might have to wait until I1 has completed
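The dependency between I1 and I2 is a read-after-write (RAW) dependency, and it can be found mechanically. The sketch below uses a hypothetical representation in which each instruction is a `(destination, sources)` pair; it only detects dependencies and does not model how many cycles the pipeline would stall.

```python
def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for every
    read-after-write dependency in a list of (dest, sources) pairs."""
    deps = []
    last_writer = {}  # register name -> index of the latest instruction writing it
    for idx, (dest, sources) in enumerate(program):
        for src in sources:
            if src in last_writer:
                deps.append((last_writer[src], idx, src))
        last_writer[dest] = idx
    return deps


program = [
    ("R3", ("R1", "R2")),  # I1: Add R3,R1,R2
    ("R4", ("R3", "R5")),  # I2: Add R4,R3,R5
]
print(raw_dependencies(program))
```

For the two instructions above, the only dependency reported is that instruction 1 (I2) reads R3, which instruction 0 (I1) writes.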

MOTIVATION FOR PIPELINING
How RISC Facilitates Pipelining
- Uniform Sub-computations
  - Memory addressing modes <=> disparity of speed between processor and memory
- Identical Computations
  - RISC: reducing complexity makes each instruction use roughly the same number of stages
- Independent Computations
  - Reg-Reg (Load-Store): makes it easier to identify dependencies (versus Reg-Mem)

CORTEX-M0 PROGRAMMING MODEL
High-Level Concepts
The ARMv6-M architecture supports multiple states, modes, and levels:
- Thumb State: general operating state
  - Thread Mode: used when running normal code
    - Privileged Level: full processor access
    - Unprivileged Level: some memory regions and operations inaccessible (optional on Cortex-M0+, unavailable on Cortex-M0)
  - Handler Mode: used when running exception handlers
- Debug State: debugging state, active when the processor is halted by a debugger

CORTEX-M0 PROGRAMMING MODEL
Core Registers
- Cortex-M0 uses a load-store architecture (data is loaded from memory into a register, processed, and then stored back to memory)
- Cortex-M0/M0+ feature sixteen 32-bit registers, as well as special registers:
  - R0-R7 (low registers): general purpose, accessible by all instructions
  - R8-R12 (high registers): general purpose, accessible by some instructions (such as MOV)

CORTEX-M0 PROGRAMMING MODEL
Core Registers (Continued)
- R13 (SP): stack pointer, used in push/pop instructions. References two physically distinct pointers:
  - Main Stack Pointer (MSP): default stack pointer
  - Process Stack Pointer (PSP): can be used in Thread Mode
- R14 (LR): link register, contains the return address from an interrupt, subroutine, or function
- R15 (PC): program counter, tracks the current memory location

CORTEX-M0 PROGRAMMING MODEL
Core Registers (Continued)
Combined Program Status Registers (xPSR): provide information about program execution and the arithmetic logic unit (ALU) flags. Consists of three separate registers:
- Application PSR (APSR): contains the ALU flags (N, Z, C, V)
  - Negative Flag (N): indicates the result of the last operation was negative
  - Zero Flag (Z): indicates the result of the last operation was zero
  - Carry Flag (C): indicates a carry out of the most significant (sign) bit position from the previous operation
  - Overflow Flag (V): indicates a two's complement overflow condition has occurred
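The meaning of the four ALU flags can be checked with a small model of a 32-bit addition. This is a sketch of the standard flag definitions for addition only, not of any particular Cortex-M0 instruction; the function name `add_with_flags` is made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b):
    """Add two 32-bit values and return (result, flags), where flags is a
    dict holding the N, Z, C, and V bits produced by the addition."""
    a &= MASK32
    b &= MASK32
    full = a + b              # unbounded sum, may need 33 bits
    result = full & MASK32    # what fits in a 32-bit register
    flags = {
        "N": (result >> 31) & 1,      # sign bit of the result
        "Z": int(result == 0),        # result is zero
        "C": int(full > MASK32),      # carry out of bit 31
        # Overflow: both operands have the same sign and the result's sign differs
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,
    }
    return result, flags


print(add_with_flags(0x7FFFFFFF, 1))  # largest positive value plus one
print(add_with_flags(0xFFFFFFFF, 1))  # -1 plus one
```

The first call sets N and V (signed overflow without a carry), while the second sets Z and C (a carry without signed overflow), which shows that C and V report different conditions.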

CORTEX-M0 PROGRAMMING MODEL
Core Registers (Continued)
Combined Program Status Registers (cont.):
- Interrupt PSR (IPSR): contains the exception number of the currently executing interrupt service routine (ISR)
- Execution PSR (EPSR): contains the T state bit, indicating Thumb operation (hardcoded to 1 on Cortex-M0/M0+)
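The three status registers share one 32-bit word, so a raw xPSR value (for example, one read in a debugger) can be unpacked with shifts and masks. The sketch below assumes the ARMv6-M bit positions as I understand them (flags in bits 31 to 28, T in bit 24, exception number in bits 5 to 0); verify them against the architecture reference manual before relying on them. The function name `decode_xpsr` is hypothetical.

```python
def decode_xpsr(xpsr):
    """Split a raw 32-bit xPSR value into its APSR, EPSR, and IPSR fields.
    Bit positions are assumed from the ARMv6-M layout."""
    return {
        "N": (xpsr >> 31) & 1,       # APSR: negative
        "Z": (xpsr >> 30) & 1,       # APSR: zero
        "C": (xpsr >> 29) & 1,       # APSR: carry
        "V": (xpsr >> 28) & 1,       # APSR: overflow
        "T": (xpsr >> 24) & 1,       # EPSR: Thumb state bit
        "exception": xpsr & 0x3F,    # IPSR: 0 means Thread Mode (no exception)
    }


print(decode_xpsr(0x61000000))
```

For the example value, the Z and C flags are set, the T bit is 1, and the exception number is 0, meaning no exception handler is running.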

CORTEX-M0 PROGRAMMING MODEL
Core Registers (Continued)
- Interrupt Mask Special Register (PRIMASK): contains a single bit for masking interrupts; when set, it masks all interrupts except NMI and HardFault. Only accessible using special access instructions (MSR, MRS, CPS)
- CONTROL Register: 2-bit special-purpose register
  - SPSEL (bit 1): selects the stack pointer used in Thread Mode, MSP (0) or PSP (1); Handler Mode always uses MSP
  - nPRIV (bit 0): specifies privileged access (0) or unprivileged access (1) in Thread Mode (unprivileged access is unavailable on Cortex-M0; hardcoded as 0)

CORTEX-M0 PROGRAMMING MODEL
Memory System and the Stack
ARM Cortex-M processors feature a 4 GB memory address space, divided into architecturally defined regions.
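As an illustration of the architecturally defined regions, the sketch below classifies a 32-bit address. The region names and boundaries are the standard Cortex-M memory map as I recall it, not something stated on the slide, so check them against the ARMv6-M reference manual; which addresses are actually populated depends on the specific device.

```python
# (start, end, name) for each region; boundaries assumed from the
# standard Cortex-M memory map.
REGIONS = [
    (0x00000000, 0x1FFFFFFF, "Code"),
    (0x20000000, 0x3FFFFFFF, "SRAM"),
    (0x40000000, 0x5FFFFFFF, "Peripheral"),
    (0x60000000, 0x9FFFFFFF, "External RAM"),
    (0xA0000000, 0xDFFFFFFF, "External device"),
    (0xE0000000, 0xFFFFFFFF, "System"),
]

def region_of(address):
    """Return the name of the memory-map region containing a 32-bit address."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address is outside the 32-bit address space")
    for start, end, name in REGIONS:
        if start <= address <= end:
            return name
    raise AssertionError("regions should cover the whole address space")


for addr in (0x08000000, 0x20000000, 0x48000000, 0xE000E100):
    print(f"0x{addr:08X}: {region_of(addr)}")
```

The six regions together cover exactly 2^32 bytes, which is the 4 GB address space.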

SOFTWARE TOOLCHAIN
- The software toolchain for ECE 362 is used to translate source code into microcontroller instructions and to program the generated hex file into the target device
- Many toolchains are available, but Eclipse + OpenSTM32 have been selected for development in the lab experiments (toolchain setup is the subject of Experiment 0)
- The Eclipse/OpenSTM32 toolchain is based on the GNU Compiler Collection (gcc)

SOFTWARE TOOLCHAIN
Software tools in the Eclipse/OpenSTM32 toolchain:
- Toolchain: combines source files into an executable/linkable format (.elf) file; calls the compiler
- Compiler: converts C/C++ code to assembly; calls the assembler and linker
- Assembler: converts assembly code to machine code
- Linker: combines the assembled code segments into a single unified image, from which the hex file is produced
- Flash Programming: writes the hex machine code file to onboard memory on the target device

Questions?