Efficiency and memory footprint of Xilkernel for the Microblaze soft processor

Dariusz Caban, Institute of Informatics, Gliwice, Poland - June 18, 2014

The use of a real-time multitasking kernel simplifies the design process of embedded software, but the kernel requires some portion of the system's resources. This paper describes the results of research work performed to determine the overheads incurred by Xilkernel, a real-time multitasking kernel developed by Xilinx.

Introduction

An embedded system should react to events in its environment within a predetermined period of time. When timing requirements are modest, the system software cyclically checks the state of the environment in an infinite loop and, after detecting an event, takes the proper action. The processor can also be informed about events by interrupt request signals, which reduces the probability of missing them. It is recommended that the interrupt service routine (ISR) execute as quickly as possible. Often the ISR only sets an event flag, and the event is serviced in the loop. If the occurrence of an event is tested once per loop iteration, in the worst case its service will start only after all the remaining events have been served. When interrupts are used, the execution times of the ISRs must be added to this delay. A test can be repeated several times in the loop, which reduces the waiting time for serving the event.

Meeting higher timing requirements can be difficult or impossible with software of such a structure. A solution is then to use a real-time multitasking kernel, also called a real-time operating system (RTOS). The system software is divided into tasks of various degrees of importance. Each task is in one of several states. Only tasks that are ready to run can be executed. A task can become ready to run as a result of an event. The kernel will start this task immediately, provided that its priority is higher than the priority of the task currently running [1, 2].
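The loop-based structure described above, in which the ISR only sets a flag and the loop services the event, can be sketched in C as follows. This is a minimal illustration, not code from the measured system: the number of event sources and the names `event_isr`, `poll_once` and `serviced_count` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_EVENTS 3  /* hypothetical number of event sources */

/* Flags set by ISRs and cleared by the polling loop. */
static volatile bool event_flag[NUM_EVENTS];

/* Counts how many times each event has been serviced (for illustration). */
static unsigned serviced[NUM_EVENTS];

/* Hypothetical ISR body: it only records that the event occurred. */
void event_isr(size_t source)
{
    assert(source < NUM_EVENTS);
    event_flag[source] = true;
}

/* One pass of the polling loop: each event is tested once per pass. */
void poll_once(void)
{
    for (size_t i = 0; i < NUM_EVENTS; i++) {
        if (event_flag[i]) {
            event_flag[i] = false;  /* acknowledge before servicing */
            serviced[i]++;          /* placeholder for the real service code */
        }
    }
}

/* Number of times the given event has been serviced so far. */
unsigned serviced_count(size_t source)
{
    assert(source < NUM_EVENTS);
    return serviced[source];
}
```

In a real system `poll_once()` would be called from an endless `for (;;)` loop. An event flagged just after its test has to wait for a full pass over all other events, which is the worst case mentioned above.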
However, the kernel requires some portion of the program and data memory of the system, as well as CPU time. This paper presents the results of research whose purpose was to determine the efficiency and memory footprint of Xilkernel, a real-time multitasking kernel developed by Xilinx [3].

Xilkernel

Xilkernel is a kernel for the following embedded processors: Microblaze, PowerPC 405 and PowerPC 440. It is integrated with the Xilinx Platform Studio (XPS) framework and is a free software library that the user gets with the Xilinx Embedded Development Kit (EDK). The library functions are written mainly in C. Apart from task management, the kernel also provides typical services which enable tasks to, among others:
- use semaphores, mutexes, message queues and shared memory,
- use software timers,
- dynamically allocate and free memory buffers,
- preempt themselves.

The user must configure the kernel appropriately. The configuration selects, among others, the scheduling algorithm (priority-driven or round-robin), the sizes of the queues (ready, wait) and the system services that the application will use. The XPS framework generates the Xilkernel library with the object code of the selected service functions. The library is then linked with the object code of the application into one executable file. Xilkernel provides a POSIX interface to most of the library functions [3].

The measurement system

The measurement system was implemented in a Virtex-5 FPGA on the Xilinx ML505 Evaluation Platform [4]. XPS ver. 10.1 was used to design and configure it. The structure of the system is depicted in Fig. 1.

Figure 1. Structure of the system in an FPGA circuit

The Microblaze is a 32-bit embedded RISC soft-core processor, optimized for implementation in Xilinx FPGAs. A computer system based on Microblaze has a Harvard architecture, with separate address spaces for instruction and data memory. The processor communicates with these memories through separate buses, ILMB (Instruction Local Memory Bus) and DLMB (Data Local Memory Bus), respectively [5]. The measurement system uses version 7.10d of the processor, and the instructions and data are stored in a dual-port BRAM (XPS BRAM). Input-output devices are connected to the processor through the PLB (Processor Local Bus). There are two timer/counters (XPS Timer): one is the system timer, the other was used to measure the time of operations performed by Xilkernel. During the measurements, pulses of a fixed frequency were counted. The UART device (XPS UART) was used to transmit the measurement results to the computer. Five digital input-output devices (XPS GPIO) were added so that there were eight sources of interrupts in the system.
The interrupt controller (XPS INTC) was necessary because the Microblaze processor supports only one external interrupt source.

Microblaze instruction execution is pipelined. The pipeline can have three or five stages, to minimize hardware cost or to maximize performance, respectively. For most instructions, each stage takes one clock cycle to complete. The processor in the measurement system was configured with a five-stage pipeline. The operating frequency of the processor was 100 MHz, and the frequency of the pulses counted during the time measurements was 125 MHz (8 ns resolution).

The efficiency of Xilkernel v4.00.a was investigated. The kernel used priority-driven preemptive scheduling only (it also supports round-robin scheduling). The following parameters were measured: interrupt latency, task latency and the execution time of the most important Xilkernel services used during normal system operation [6].

Interrupt latency

The response time of the system to events depends, among others, on the interrupt latency. This term refers to the amount of time that elapses from the appearance of an interrupt request signal to the start of the corresponding interrupt service routine. It depends in turn on the rules of interrupt handling.

The Microblaze processor supports only one external interrupt source. When an interrupt occurs, the instruction in the execution stage completes and the instruction in the decode stage is replaced by a branch to address 0x00000010 (for most instructions each stage takes one clock cycle). The return address is saved and further interrupts are disabled. A jump instruction to the system ISR must be stored at program memory addresses 0x00000010-0x00000013. The source code of this ISR is generated by XPS. The system ISR calls the user-specified ISR. If there are multiple interrupt sources, it must first determine the current source. If a multitasking kernel is not used, the interrupted program is resumed upon completion of the system ISR [7].

Interrupt handling under Xilkernel supervision

Interrupt handling is slightly different if the system software works under the supervision of Xilkernel. The system ISR saves the context of the interrupted task and then calls the user-specified ISR. The user-specified ISR can release a semaphore or send a message to a queue. Only then does a task waiting for the semaphore or message become ready to run. Task scheduling, restoring the context of the selected task and enabling interrupts are carried out at the end of the system ISR [3].

To measure interrupt latencies, the timer/counter was programmed to work in the Generate mode. This mode is useful for generating repetitive interrupt requests with a specified interval [8]. At the start of the user-specified timer ISR, the content of the counter register was read.
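Converting such a counter reading into a time is simple arithmetic. The sketch below assumes the 125 MHz count frequency stated earlier (8 ns per count) and takes the elapsed time as the number of counts since the counter was loaded. The function names and the raw counter values in the usage note are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Count frequency of the measurement timer, as stated in the text. */
#define COUNT_FREQ_HZ 125000000u
/* Duration of one count in nanoseconds: 1e9 / 125e6 = 8 ns. */
#define NS_PER_COUNT (1000000000u / COUNT_FREQ_HZ)

/* Number of counts elapsed between loading the counter and reading it.
   Unsigned subtraction stays correct across a single 32-bit wrap-around. */
uint32_t elapsed_counts(uint32_t initial_value, uint32_t read_value)
{
    return read_value - initial_value;
}

/* Elapsed time in nanoseconds for a given pair of counter values. */
uint64_t elapsed_ns(uint32_t initial_value, uint32_t read_value)
{
    return (uint64_t)elapsed_counts(initial_value, read_value) * NS_PER_COUNT;
}
```

With a hypothetical initial value of 1000 and a read value of 1110, `elapsed_ns(1000, 1110)` gives 110 counts of 8 ns each, that is 880 ns or 0.88 µs.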
The counter counted up from an initial non-zero value, so the result of the measurement was equal to the difference between the read and initial values. Latencies were measured for the highest-priority (see footnote [1]) and lowest-priority interrupt requests, which occurred while the processor was executing the program normally. The results are presented in Table 1. These values are minimal, since no other request was being serviced when these requests occurred. The longest instruction executed by the processor took 3 clock cycles to complete (it did not execute floating-point arithmetic instructions, which require 4-30 cycles to complete).

Interrupt priority    System software
                      without Xilkernel    with Xilkernel
highest               0.87-0.88 µs         1.39-1.41 µs
lowest                1.54-1.56 µs         1.96-1.98 µs

Table 1. Interrupt latencies

Task latency

If the service of an event is time-consuming, the most urgent actions should be done in the interrupt service routine, and the rest in a task [2]. The task can wait for a semaphore to be released or for a message in a queue. Releasing the semaphore or sending the message in the user-specified ISR only makes the waiting task ready to run. Task scheduling is performed at the end of the system ISR. Consequently, it was possible to measure the task latency, which is the sum of the task scheduling and context restoring times. The results are presented in Table 2. The application consisted of only two tasks: the measurement task and the system idle task (with the lowest priority, always ready to run).

Task activation             Task latency
by releasing a semaphore    9.97-10.16 µs
by sending a message        14.01-14.35 µs

Table 2. Task latencies

Execution time of Xilkernel service functions

The execution times of the kernel services that can be used during interrupt service were measured, and the results are presented in Table 3. There are two modes of message passing: basic and enhanced. The user chooses the mode during the configuration of the kernel. In the basic mode, Xilkernel allocates and frees the space for the messages. For the allocation it uses the bufmalloc() service function, which allocates a memory block from a pool. The execution time of this function is short and predictable. When the enhanced mode is chosen, the user must allocate a memory block with the malloc() function. This function is typically slow and its execution time is unpredictable [2]. The enhanced mode should not be used if an ISR can also send a message. The allocation time given in the table was obtained when there was only 1 free block in every pool.

Kernel service                                               Execution time
releasing a semaphore                                        3.2-3.39 µs
sending a 4-byte message (basic mode)                        7.22-7.54 µs
sending a 16-byte message (basic mode)                       7.19-7.51 µs
memory block allocation from identified pool                 1.41 µs
memory block allocation from any pool (there are 2 pools)    1.49-1.73 µs

Table 3. Execution time of kernel services utilized during interrupt service

Task preemption

Performing some of the kernel's services may or may not result in task preemption.
Preemption occurs when, for example, a task tries to take a semaphore that is already taken, or releases a semaphore for which a higher-priority task is waiting. Execution times of these services were measured for both cases and the results are given in Table 4. Block diagrams of the tasks in the measurement applications are presented in Figures 2 and 3.

Kernel service                              Execution time
                                            a)              b)
taking a semaphore                          1.36 µs         9.31-9.44 µs
releasing a semaphore                       1.35 µs         13.08-13.74 µs
taking a mutex                              1.53 µs         9.41-9.54 µs
releasing a mutex                           1.34 µs         13.37-13.88 µs
sending a 4-byte message (basic mode)       5.31-5.5 µs     21.37-22.07 µs
sending a 16-byte message (basic mode)      5.42-5.62 µs    21.32-22.02 µs
reading a 4-byte message (basic mode)       5.55-5.74 µs    9.55-9.69 µs
reading a 16-byte message (basic mode)      5.53-5.72 µs    9.55-9.69 µs
sending a 4-byte message (enhanced mode)    6.26-6.46 µs    9.55-9.69 µs
sending a 16-byte message (enhanced mode)   6.26-6.46 µs    9.55-9.69 µs
reading a 4-byte message (enhanced mode)    5.54-5.74 µs    22.1-22.94 µs
reading a 16-byte message (enhanced mode)   5.54-5.74 µs    22.1-22.8 µs

a) without preemption of the task
b) with preemption of the task

Table 4. Execution time of kernel services that can cause task preemption

Figure 2. Block diagram of the task used to measure the execution time of kernel services when they do not cause task preemption
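The two cases in Table 4 can be illustrated with a toy model of a counting semaphore under priority-driven scheduling. This is a sketch for illustration only, not Xilkernel's implementation: the `toy_sem` type, the function names and the convention that a larger number means a higher priority are all assumptions, and only one waiting task is modelled.

```c
#include <stdbool.h>

/* Toy semaphore: a count plus at most one blocked task (illustrative only). */
typedef struct {
    int  count;        /* available units */
    bool has_waiter;   /* true if a task is blocked on the semaphore */
    int  waiter_prio;  /* priority of the blocked task (larger = higher) */
} toy_sem;

/* Take the semaphore on behalf of a task with priority `prio`.
   Returns true if the caller blocks, so the processor goes to another task. */
bool toy_sem_take(toy_sem *s, int prio)
{
    if (s->count > 0) {
        s->count--;            /* case a): taken at once, no preemption */
        return false;
    }
    s->has_waiter = true;      /* case b): already taken, caller must wait */
    s->waiter_prio = prio;
    return true;
}

/* Release the semaphore from a task with priority `prio`.
   Returns true if a waiting task of higher priority is woken,
   which means the caller is preempted. */
bool toy_sem_release(toy_sem *s, int prio)
{
    if (!s->has_waiter) {
        s->count++;            /* nobody waiting: just return the unit */
        return false;
    }
    s->has_waiter = false;     /* the unit is handed directly to the waiter */
    return s->waiter_prio > prio;
}
```

In this model the extra cost in column b) corresponds to the scheduling and context switch that follow whenever one of the functions returns true.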

Figure 3. Block diagrams of the tasks used to measure the execution time of semaphore services when they cause task preemption

Memory block allocation time does not depend on the caller of the service function (ISR or task, see Table 3). Freeing a block takes 3.18 µs when the pool is identified. Freeing a block from any pool, if there are 2 pools, takes 3.18-3.51 µs.
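Short and predictable allocation and freeing times are typical of fixed-size block pools. The sketch below is a generic free-list pool written for illustration; it is not the Xilkernel source, and `block_pool`, `pool_init`, `pool_alloc`, `pool_free` and the sizes are hypothetical. Both operations touch only the head of the list, so their cost does not depend on how many blocks are in use.

```c
#include <stddef.h>

#define BLOCK_SIZE 32  /* hypothetical block size in bytes */
#define NUM_BLOCKS 4   /* hypothetical number of blocks in the pool */

/* A free block stores the link to the next free block in its own memory. */
typedef union block {
    union block  *next;
    unsigned char data[BLOCK_SIZE];
} block;

typedef struct {
    block  storage[NUM_BLOCKS];
    block *free_list;  /* head of the singly linked list of free blocks */
} block_pool;

/* Chain all blocks into the free list. */
void pool_init(block_pool *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        p->storage[i].next = p->free_list;
        p->free_list = &p->storage[i];
    }
}

/* Pop the first free block; returns NULL when the pool is exhausted. */
void *pool_alloc(block_pool *p)
{
    block *b = p->free_list;
    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* Push a block back onto the free list. The block must come from this pool. */
void pool_free(block_pool *p, void *ptr)
{
    block *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}
```

A general-purpose `malloc()` usually has to search for a region of suitable size, which is one reason its execution time is harder to bound.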

A task can yield the processor to the next task that is ready to run. This takes 10.76-11.08 µs.

Memory footprint of Xilkernel

The size of the Xilkernel code depends mainly on which services are utilized by the application. If the application uses semaphores only, the kernel code occupies about 12 kB. If all services are used, the code size is about 20.5 kB. The size of the required RAM depends on many factors, among others: the number of priority levels, the length of the queues of ready and blocked tasks, the number of semaphores and software timers, and the length of the message queues. Adopting the default values of the configuration parameters results in the following size of the required RAM:
- about 46.5 kB, if the application uses semaphores only,
- about 59 kB, if the application uses all kernel services and there are 10 blocks of shared memory, 1 kB each.
Also, an individual stack is assigned to each task. The default size of the stack is about 1 kB.

Summary

The use of a multitasking kernel simplifies the design process of system software. The software is divided into tasks responsible for individual portions of work. Each task is assigned a priority that determines its importance. The kernel ensures that more important tasks are performed first. Still, the kernel requires some portion of the system's memory and consumes some CPU time. The research work was done to determine the efficiency and memory footprint of Xilkernel, and this paper presents its results. Knowledge of the kernel's efficiency should help a potential developer of an embedded system to estimate whether the system can meet its timing requirements.

References

1. Kalinsky D.: Context switch. Embedded Systems Programming, February 2001.
2. Simon D.E.: An Embedded Software Primer. Addison-Wesley, 1999.
3. Xilkernel (v4.00.a). www.xilinx.com.
4. Xilinx ML505 Evaluation Platform Documentation. www.xilinx.com.
5. Microblaze Processor Reference Guide. www.xilinx.com.
6. Lamie W., Carbone J.: Measure your RTOS's real-time performance. Embedded Systems Programming, May 2007.
7. Glover P., MacMahon S., Man Shakya D.: Using and Creating Interrupt-Based Systems. Application Note XAPP778, www.xilinx.com.
8. LogiCORE IP XPS Timer/Counter. www.xilinx.com.

Footnote [1]: It is the second most important interrupt request if Xilkernel is used; the most important is the request from the system timer.