Commercial Real-time Operating Systems: An Introduction
Swaminathan Sivasubramanian
Dependable Computing & Networking Laboratory
swamis@iastate.edu
Outline
- Introduction
- RTOS issues and functionalities
- LynxOS
- QNX/Neutrino
- VRTX
- VxWorks
- Spring Kernel
- Distributed RTOS
  - ARTS
  - MARS
Commercial and Research RTOS
- Commercial RTOSes differ from traditional OSes in that they give more predictability
- Used in areas such as:
  - Embedded systems or industrial control systems
  - Parallel and distributed systems
- E.g. LynxOS, VxWorks, pSOS, Spring, ARTS, Maruti, MARS
- Traditionally, these systems are classified as uniprocessor, multiprocessor or distributed real-time OSes
RTOS Issues
- Real-Time POSIX API standard compliance
  - Whether preemptive fixed-priority scheduling is supported
  - Support for standard synchronization primitives
  - Support for lightweight real-time threads
  - APIs used for task handling
- Scalability
  - Footprint of the kernel: how large is the kernel?
  - Can the kernel be scaled down to fit in the ROM of the system?
RTOS Issues (contd.)
- Modularity
  - How do functionalities such as I/O, file system and networking services behave?
  - Can they be added or changed at run-time?
  - Can a new service be added at run-time?
- Type of RTOS kernel
  - Monolithic kernel: less run-time overhead, but not extensible
  - Microkernel: higher run-time overhead, but highly extensible
RTOS Issues (contd.)
- Speed and efficiency
  - Run-time overhead: most modern RTOSes are microkernels, but unlike traditional microkernels they have low overhead
  - Run-time overhead is decreased by avoiding unnecessary context switches
  - Important timings such as context-switch time, interrupt latency and semaphore latency must be minimal
- System calls
  - Non-preemptable portions of kernel functions, necessary for mutual exclusion, are highly optimized and made short and deterministic
RTOS Issues (contd.)
- Interrupt handling
  - Non-preemptable portions of the interrupt handler routines are kept small and deterministic
  - Interrupt handlers are scheduled and executed at an appropriate priority
- Scheduling
  - Type of scheduling supported: RMS or EDF
  - Number of priority levels supported: at least 32 to be RT-POSIX compliant; many offer between 128 and 256
  - Type of scheduling for equal-priority threads: FIFO or round-robin
  - Can thread priorities be changed at run-time?
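As a rough illustration of the RMS and EDF options listed above, the sketch below applies the classic utilization tests for independent, preemptable periodic tasks on one processor, with deadlines equal to periods. The task set and function names are hypothetical and are not taken from any particular RTOS.

```python
def utilization(tasks):
    """tasks: list of (worst_case_execution_time, period) pairs."""
    return sum(c / p for c, p in tasks)


def rms_bound(n):
    """Liu-Layland utilization bound for n periodic tasks under RMS."""
    return n * (2 ** (1.0 / n) - 1)


def rms_schedulable(tasks):
    """Sufficient (not necessary) test for rate-monotonic scheduling."""
    return utilization(tasks) <= rms_bound(len(tasks))


def edf_schedulable(tasks):
    """Utilization test for EDF when deadlines equal periods."""
    return utilization(tasks) <= 1.0


# Hypothetical task set: (execution time, period) in the same time unit.
tasks = [(1, 4), (2, 6), (3, 12)]
rms_ok = rms_schedulable(tasks)   # bound exceeded, so the RMS test is inconclusive
edf_ok = edf_schedulable(tasks)   # total utilization is below 1
```

Because the RMS test is only sufficient, a task set that fails it may still be schedulable under RMS; an exact answer would need a response-time analysis, which is not shown here.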
RTOS Issues (contd.)
- Priority inversion control
  - Does it support the priority inheritance or priority ceiling protocols?
- Memory management
  - Can provide virtual-to-physical address mapping
  - Traditionally does not do paging
- Networking
  - Type of networking supported: deterministic network stack or not
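To make the priority inheritance idea concrete, here is a minimal single-mutex model. It is a sketch under simplifying assumptions (no nested locks, no transitive inheritance, larger number means higher priority), and the class names are hypothetical rather than any RTOS API.

```python
class Task:
    def __init__(self, name, base_priority):
        self.name = name
        self.base_priority = base_priority        # larger number = higher priority
        self.effective_priority = base_priority


class InheritanceMutex:
    def __init__(self):
        self.owner = None
        self.waiters = []

    def lock(self, task):
        """Return True if acquired; otherwise queue the task and boost the owner."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        # Priority inheritance: the owner runs at the highest waiting priority.
        self.owner.effective_priority = max(
            self.owner.effective_priority, task.effective_priority
        )
        return False

    def unlock(self):
        """Drop the boost and hand the mutex to the highest-priority waiter."""
        self.owner.effective_priority = self.owner.base_priority
        if self.waiters:
            self.waiters.sort(key=lambda t: t.effective_priority, reverse=True)
            self.owner = self.waiters.pop(0)
        else:
            self.owner = None


low = Task("low", 1)
high = Task("high", 10)
mutex = InheritanceMutex()
mutex.lock(low)                          # low-priority task takes the mutex
mutex.lock(high)                         # high-priority task blocks on it
boosted = low.effective_priority         # low now runs at high's priority
mutex.unlock()                           # low releases; high becomes the owner
```

While boosted, the low-priority owner cannot be preempted by medium-priority tasks, which is what bounds the inversion.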
LynxOS
- Microkernel design
  - Means the kernel footprint is small: only 28 kilobytes
  - The small kernel provides essential services in scheduling, interrupt dispatching and synchronization
  - Other services are provided by lightweight kernel service modules, called Kernel Plug-Ins (KPIs)
  - New KPIs can be added to the microkernel and configured to support I/O, file systems, TCP/IP, streams and sockets
- Can function as a multipurpose UNIX OS
LynxOS (contd.)
- KPIs are multi-threaded: each KPI can create as many threads as it wants
- There is no context switch when sending a message to a KPI
  - For example, when an RFS (Request for Service) message is sent to a file system KPI, this does not require a context switch
  - Hence run-time overhead is minimal
  - Further, inter-KPI communication incurs minimal overhead, consuming only a few instructions
- LynxOS is a self-hosted system: development can be done on the same system
LynxOS (contd.)
- In a self-hosted system, the OS must be protected from large, memory-hungry applications (compilers, debuggers)
- LynxOS offers memory protection through hardware MMUs
- Applications make I/O requests to the I/O system through system calls
  - The kernel directs each I/O request to the device driver
  - Each device driver has an interrupt handler and a kernel thread
LynxOS (contd.)
- The interrupt handler carries out the first step of interrupt handling
- If it does not complete the processing, it sets an asynchronous trap to the kernel
- Later, when the kernel can respond to the software interrupt, it schedules an instance of the kernel thread to complete the interrupt processing
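The two-stage interrupt handling described above can be modelled roughly as follows. This is only an illustrative sketch: the function names, the priority values and the use of a heap to order the deferred work are assumptions, not LynxOS interfaces.

```python
import heapq
import itertools

_sequence = itertools.count()   # tie-breaker so equal priorities stay FIFO
pending = []                    # heap of work deferred to kernel threads


def interrupt_handler(device, priority, log):
    """First stage: do the minimum, then defer the rest (the asynchronous trap)."""
    log.append("ack " + device)
    heapq.heappush(pending, (-priority, next(_sequence), device))


def run_kernel_threads(log):
    """Second stage: complete the deferred processing, highest priority first."""
    while pending:
        _, _, device = heapq.heappop(pending)
        log.append("complete " + device)


log = []
interrupt_handler("disk", 5, log)       # hypothetical devices and priorities
interrupt_handler("network", 9, log)
run_kernel_threads(log)
```

Keeping the first stage short is what limits the time spent with interrupts disabled; the longer second stage runs as an ordinary schedulable thread.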
QNX/Neutrino
- SMP RTOS: suited to real-time applications that require high-end, networked SMP machines with gigabytes of physical memory
- Microkernel design: the kernel provides essential thread and real-time services
- Other services are treated as resource managers and can be added or removed at run-time
- The footprint of the microkernel is 12 KB
QNX/Neutrino (contd.)
- QNX is a message-passing operating system
  - Messages are the basic means of interprocess communication among all threads
- Follows a message-based priority tracking scheme
  - Messages are delivered in priority order, and the service provider executes at the priority of the highest-priority client waiting for service
  - So, if the highest-priority task wants to read some data from a file, the file system resource manager executes at this task's priority
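A simplified model of message-based priority tracking is sketched below. The Channel class and its methods are hypothetical and do not correspond to the actual QNX API; the sketch only shows messages being delivered in priority order while the server's priority follows its clients.

```python
import heapq
import itertools


class Channel:
    def __init__(self, idle_priority=0):
        self._queue = []
        self._order = itertools.count()      # keeps equal priorities FIFO
        self.server_priority = idle_priority

    def send(self, client_priority, request):
        """Queue a request; the server floats up to the sender's priority if higher."""
        heapq.heappush(self._queue, (-client_priority, next(self._order), request))
        self.server_priority = max(self.server_priority, client_priority)

    def receive(self):
        """Deliver the highest-priority request; the server runs at that priority."""
        neg_priority, _, request = heapq.heappop(self._queue)
        self.server_priority = -neg_priority
        return request


channel = Channel()
channel.send(5, "write log")     # low-priority client
channel.send(20, "read file")    # high-priority client arrives later
first = channel.receive()        # the high-priority request is served first
```

The effect is that a resource manager never serves a high-priority client at a lower priority than the client itself, which avoids priority inversion through the server.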
QNX/Neutrino (contd.)
- When a service provider thread wants to provide a service, it creates a channel (for exchanging messages), identified by its service identifier
- To get a service from a provider, the client thread attaches itself to the provider's channel
- Within the client, this connection is mapped directly to a file descriptor (so an RFS can be sent directly to the file descriptor)
- QNX messages are blocking, unlike messages in the POSIX standard
VRTX
- VRTX has two multitasking kernels
- VRTXsa: designed for performance
  - Provides priority inheritance and POSIX-compliant libraries
  - Supports multiprocessing
  - System calls are fully preemptable and deterministic
- VRTXmc: designed for low power consumption
  - Used for cellular phones and hand-held devices
- Rather than providing optional components, VRTX provides hooks for extensibility: an application can add its own system calls
VxWorks
- Monolithic kernel
  - Leads to improved performance with less run-time overhead
  - However, scalability suffers somewhat, i.e. the footprint of the kernel is affected
- Provides interfaces specified by RT-POSIX standards in addition to its own APIs
- Though not a multiprocessor OS, provides shared-memory objects: shared binary and counting semaphores
- Has standard MMU support, as in a modern OS
  - Provides basic virtual-to-physical memory mapping
  - Allows new mappings to be added and portions of memory to be made non-cacheable
VxWorks (contd.)
- When memory boards are added dynamically to increase the address space for interprocess communication, the data is made non-cacheable to ensure cache consistency
- Reduced context-switch time
  - Saves only those register windows that are actually in use (on a SPARC)
  - When a task's context is restored, only the relevant register window is restored
  - To improve response time, it saves the register windows in a register cache, which is useful for recurring tasks
Spring Kernel
- Goal: development of dynamic, distributed real-time systems
- The system is a network of multiprocessors, each containing one or more processors and I/O subsystems
- The I/O subsystem is a separate entity from the Spring kernel, handling non-critical I/O, slow I/O devices and fast sensors
- Design principles: segmentation and reflection
Spring Kernel (contd.)
- Segmentation: dividing the resources of the system into units
  - Size of a unit depends on application requirements
  - Helps in determining the resource constraints for online scheduling algorithms
- Reflection: the system reasoning about its own state and its environment
  - Required for handling situations in highly dynamic environments (where handcrafting is infeasible)
Spring Kernel (contd.)
- Scheduling consists of 4 modules
  - Process-resident dispatcher: simply removes the task from the Global System Task Table (GSTT)
  - Local scheduler (per processor): responsible for locally guaranteeing that a new task can make its deadline and for ordering processor-specific tasks in the STT
  - Global scheduler: finds a site for execution for any task that cannot be locally guaranteed
  - Meta-level controller: can adapt various parameters on noticing significant changes
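The local guarantee step can be pictured with the following sketch. It is a deliberately simplified stand-in, not the Spring algorithm itself: it assumes all tasks are ready now, run without preemption on one processor, need no other resources, and are ordered by earliest deadline. The function name and task tuples are hypothetical.

```python
def guarantee(accepted, new_task, now=0):
    """Return a feasible execution order including new_task, or None.

    Each task is (name, computation_time, absolute_deadline).
    None means the task cannot be guaranteed locally and would be
    passed on to the global scheduler.
    """
    candidate = sorted(accepted + [new_task], key=lambda t: t[2])
    finish = now
    for _, computation, deadline in candidate:
        finish += computation
        if finish > deadline:
            return None
    return candidate


accepted = [("a", 2, 5), ("b", 3, 10)]          # already guaranteed tasks
plan = guarantee(accepted, ("c", 2, 7))         # fits between a and b
rejected = guarantee(accepted, ("d", 5, 6))     # would miss its deadline
```

The important property is that a new task is admitted only if every previously guaranteed task still meets its deadline.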
Spring Kernel (contd.)
- Memory management
  - The OS is core-resident
  - No dynamic memory allocation, to eliminate large and unpredictable delays (due to page faults and page replacements)
  - The kernel pre-allocates a fixed number of instances of some kernel data structures
  - Tasks are accepted dynamically if the necessary data structures are available
- Inter-process communication
  - Mailboxes and communication primitives are used for communication
  - No need for semaphores, since mutual exclusion is taken care of in scheduling
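The pre-allocation policy above can be sketched as a fixed pool of control blocks, with task admission depending on a free block being available. The names are hypothetical, and since Python itself allocates memory dynamically, the sketch models only the admission policy, not the timing behaviour.

```python
class ControlBlockPool:
    def __init__(self, capacity):
        # All instances are created once, up front.
        self._free = [{} for _ in range(capacity)]

    def acquire(self):
        """Return a free block, or None if the pool is exhausted."""
        return self._free.pop() if self._free else None

    def release(self, block):
        block.clear()
        self._free.append(block)


def admit(pool, task_name):
    """Accept a task only if a pre-allocated control block is available."""
    block = pool.acquire()
    if block is None:
        return None
    block["task"] = task_name
    return block


pool = ControlBlockPool(capacity=2)
a = admit(pool, "a")
b = admit(pool, "b")
c = admit(pool, "c")     # pool exhausted: the task is rejected
pool.release(a)
d = admit(pool, "d")     # accepted again once a block has been released
```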
ARTS - Distributed OS
- A distributed real-time OS provides a predictable distributed real-time computing environment
- Distributed computing environment
  - Heterogeneous computing environment
  - Need for a global view of the system and resources
  - No over-utilization or under-utilization of a particular system in the distributed system
- Guaranteeing predictability in such a system is more difficult than in the multiprocessor case
ARTS (contd.)
- How to synchronize the clocks in a distributed system?
- Scheduling
  - Integrated Time-Driven Scheduler (ITDS)
  - The ITDS scheduler provides an interface between the scheduling policies and the rest of the operating system
  - Allows different scheduling policies to exist (though only one can be used at a time)
- Communication scheduling
  - Extended RMS for communication scheduling, integrating message and processor scheduling
MARS - Distributed RTOS
- Maintainable Real-Time System (MARS) focuses on fault tolerance in a distributed RTOS
- Objectives
  - To provide guaranteed timely response under peak load conditions
  - To support real-time testability by breaking up the system into subsystems
- Time-driven system: the system initiates activities at pre-determined times
  - Better performance than event-driven systems
  - Control signals are based on physical time; hence, in the presence of a global physical time, there is no need for control signals across subsystem interfaces
MARS (contd.)
- System architecture
  - A MARS application consists of a set of clusters (autonomous subsystems); the components of a cluster are connected by a real-time bus
  - Each component runs an identical copy of the operating system
  - Different clusters are connected through an inter-cluster interface, forming a network
- A cluster consists of Fault-Tolerant Units (FTUs), each consisting of replicated components providing redundancy
  - Shadow components update their own internal state and monitor the operation of active components
  - A shadow becomes active when the active component fails
  - Each message is also sent twice on the real-time bus
MARS (contd.)
- Fault tolerance
  - Addresses both transient and permanent faults
  - Messages have checksums, and hardware components are self-checking
  - Uses robust storage structures
  - Application software detects errors by executing each task twice (catching transient faults)
- MARS is fail-silent: a component is turned off on detecting the first error, to avoid fault propagation
  - Upon detection, the shadow component takes over from the failed one
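The double-execution check and the fail-silent behaviour can be sketched together as below. This is an illustrative model only: the class, the helper function and the simulated transient fault are hypothetical, and None is reserved here to mean that a component stayed silent.

```python
class FailSilentComponent:
    def __init__(self, name):
        self.name = name
        self.active = True

    def run(self, task, *args):
        """Execute the task twice; go silent on the first mismatch."""
        if not self.active:
            return None
        first = task(*args)
        second = task(*args)
        if first != second:
            self.active = False      # turn the component off, emit nothing
            return None
        return first


def run_on_ftu(primary, shadow, task, *args):
    """The shadow takes over when the primary has gone silent."""
    result = primary.run(task, *args)
    if result is None:
        result = shadow.run(task, *args)
    return result


calls = {"n": 0}


def flaky_square(x):
    """Simulates one transient fault on the second invocation."""
    calls["n"] += 1
    return x * x + (1 if calls["n"] == 2 else 0)


primary = FailSilentComponent("primary")
shadow = FailSilentComponent("shadow")
result = run_on_ftu(primary, shadow, flaky_square, 3)
```

Because a faulty component produces no output at all rather than a wrong one, the rest of the cluster never has to distinguish good results from bad ones.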
MARS (contd.)
- Tasks and messages
  - Tasks (periodic and aperiodic) are scheduled by static scheduling schemes
  - Hard real-time tasks are run at specific intervals that are known at system initialization
  - Soft real-time tasks are run in intervals not used by hard real-time tasks
- Communication is through message passing; it also uses state messages (produced periodically at predetermined times) conveying the state of the system
- To avoid the unpredictable delays of CSMA/CD protocols, MARS uses a TDMA protocol to provide collision-free access to the Ethernet (at most one hard RT message per slot, with the rest of the slot left for soft RT messages)
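A TDMA scheme of the kind mentioned above can be sketched as a fixed rotation of time slots among the nodes. The node names, the slot length and the helper functions are assumptions made for illustration and are not MARS parameters.

```python
def slot_owner(time_us, nodes, slot_us):
    """Return the node that owns the bus at the given global time."""
    return nodes[(time_us // slot_us) % len(nodes)]


def may_transmit(node, time_us, nodes, slot_us):
    """A node sends only inside its own slot, so transmissions never collide."""
    return slot_owner(time_us, nodes, slot_us) == node


NODES = ["A", "B", "C"]    # hypothetical components on the real-time bus
SLOT_US = 100              # hypothetical slot length in microseconds

owners = [slot_owner(t, NODES, SLOT_US) for t in (0, 150, 250, 300)]
```

The scheme depends on all nodes agreeing on the global time, which is why clock synchronization matters in a time-driven system.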
MARS (contd.)
- MARS uses only one kind of interrupt: the periodic clock interrupt
- Interaction with peripherals is through polling
- Scheduling
  - Scheduling is done offline
  - Assumes that the running task will yield the CPU at the end of its quantum
  - Task switching is done by the major handler every 8 milliseconds
  - A change of schedule can be triggered by invoking a system call or receiving an appropriate message