Design Patterns for Real-Time Computer Music Systems


Roger B. Dannenberg and Ross Bencina
4 September 2005

This document contains a set of design patterns for real-time systems, particularly for computer music systems. We see these patterns often because the problems that they solve come up again and again. Hopefully, these patterns will serve as more than just a set of canned solutions; it is perhaps even more important to understand the underlying problems, which often have subtle aspects and ramifications. By describing these patterns, we have tried to capture the problems, the solutions, and a way of thinking about real-time systems design. We welcome your comments and questions.

Static Priority Scheduling

Also known as Deadline Monotonic Scheduling.

Real-time systems often have a mix of tasks. Tasks may have hard deadlines, as in sample buffer computation, or strongly desirable upper limits on latency, as in interactive MIDI processing. If the allowable time-to-completion for some task is less than the execution time of some other task, then the long-running task must be preempted so that the low-latency task can meet its deadline.

Organize the program so that deadlines are met and latencies are acceptable, using relatively few processes or threads. When there is coordination and communication between tasks (e.g. shared variables, memory, and data structures), it is much easier for the programmer if these tasks do not preempt one another; they can then run in a single thread. Design is a tradeoff between letting tasks share threads, to simplify coordination, and allocating tasks to separate threads, to allow fine-grained scheduling.

Minimize the number of threads, typically using 2 or 3. Divide tasks according to their latency requirements, i.e. the time from when the task can run to its completion deadline. Low-latency tasks are assigned to the highest-priority thread, the highest-latency tasks are assigned to the lowest-priority thread, and so on. The threads are scheduled according to fixed priorities.

Analysis can sometimes be used to evaluate solutions before they are implemented. Essentially, the worst-case latency of the highest-priority thread is just the sum of all of its tasks' run times (assuming tasks will not run again until other tasks have finished). The latency of the thread at the next lower priority level is also the sum of the run times for tasks running in that thread, but in addition, the higher-priority thread steals time, and this must be accounted for. Continuing until worst-case latencies are estimated for all threads, one can then decide whether the latencies are small enough to satisfy all tasks.
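The analysis above amounts to a small calculation. Here is a minimal illustration in C under the same simplifying assumption (each task runs at most once before a level's deadline); the priority levels and run times are invented for the example:

```c
#include <assert.h>
#include <math.h>

/* Worst-case latency estimate per priority level, following the
 * simplified analysis above: each task runs at most once, so the
 * latency of a level is the sum of its own tasks' run times plus the
 * run times of all tasks at higher-priority levels (the time those
 * threads "steal"). The run times below are hypothetical. */

#define NUM_LEVELS 3

/* run_time[level][task]: worst-case run time in ms of each task;
 * level 0 is the highest priority; each row is 0-terminated. */
static const double run_time[NUM_LEVELS][4] = {
    { 0.5, 0.3, 0.0 },   /* e.g. audio callback tasks  */
    { 2.0, 1.0, 0.0 },   /* e.g. MIDI processing tasks */
    { 10.0, 5.0, 0.0 },  /* e.g. GUI / file I/O tasks  */
};

double worst_case_latency(int level) {
    double total = 0.0;
    for (int l = 0; l <= level; l++)           /* this level + all higher */
        for (int t = 0; run_time[l][t] > 0.0; t++)
            total += run_time[l][t];
    return total;
}
```

With these numbers, the middle-priority thread's worst-case latency is 3.8 ms (its own 3 ms of work plus 0.8 ms stolen by the audio thread), which one would then compare against the MIDI latency requirement.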
Ken Tindell, Deadline Monotonic Analysis. http://www.embedded.com/2000/0006/0006feat1.htm
J. Y. T. Leung and J. Whitehead, On the Complexity of Fixed-Priority Scheduling of Periodic, Real-Time Tasks. Performance Evaluation 2(4), pages 237-250, 1982.

© 2005, Roger B. Dannenberg and Ross Bencina

Real-Time Memory Management

Also known as non-blocking memory management or deterministic memory management.

When programming real-time code to run on a general-purpose operating system, such as to generate audio in a high-priority callback, it is not advisable (or sometimes not even possible) to call operating system functions, including those which allocate memory, because it is impossible to predict the time which may be taken by such calls (which could acquire locks, cause priority inversion, or cause page faults). Yet objects often need to be allocated dynamically in real-time music software.

Allocate memory for program objects using methods which avoid calling the operating system to allocate memory in real-time code.

How well is the allocation behavior of these objects understood? How many different types (or size classes) of objects are needed? Is a fully generalised allocation mechanism necessary, or does the allocation behavior allow a simpler solution? How are allocated objects shared between threads of execution?

Depending on the type of allocation patterns which need to be supported, there are a number of practical options; many can be combined or used in different parts of the same application.

Fixed up-front allocation with a free list. If the maximum number of required objects is known, arrays of these objects can be preallocated, and a linked list of free objects can be maintained for fast allocation and deallocation.

Size-segregated free lists. An extension of the previous pattern is to keep free lists for separate size classes, instead of distinct object types.

Per-thread memory pools. One way to avoid locks is to only use memory within a specific thread context and write a custom allocator which allocates memory from a large preallocated block. This avoids calling the OS, but the memory allocation algorithm should be chosen carefully if deterministic timing is important.
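As a concrete illustration of the first option, here is a minimal sketch in C of fixed up-front allocation with a free list. The Voice type and the function names are hypothetical, and the pool is assumed to be used from a single thread:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: fixed up-front allocation with a free list, assuming the
 * maximum number of objects (here, synthesizer voices) is known in
 * advance. All names are invented for illustration. */

#define MAX_VOICES 64

typedef struct Voice {
    struct Voice *next_free;  /* links free voices; unused while allocated */
    float freq;
    float amp;
} Voice;

static Voice voice_pool[MAX_VOICES];  /* preallocated; never grows */
static Voice *free_list = NULL;

/* Build the free list once, at startup (non-real-time context). */
void voice_pool_init(void) {
    free_list = NULL;
    for (int i = 0; i < MAX_VOICES; i++) {
        voice_pool[i].next_free = free_list;
        free_list = &voice_pool[i];
    }
}

/* O(1), no locks, no system calls: safe to call from real-time code. */
Voice *voice_alloc(void) {
    Voice *v = free_list;
    if (v) free_list = v->next_free;
    return v;                 /* NULL when the pool is exhausted */
}

void voice_free(Voice *v) {
    v->next_free = free_list;
    free_list = v;
}
```

Note that exhaustion is reported by returning NULL rather than by blocking or growing the pool; the caller (e.g. a voice-stealing policy) decides what to do when polyphony runs out.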

Allocate in a non-real-time context. Don't allocate any memory in the real-time context; instead, allocate memory in a non-real-time thread and send it using an asynchronous message queue.

Real-time garbage collection. Write a garbage collector for objects which are used in your real-time thread. An incremental collector can be coded to provide deterministic timing behavior by interrupting collection if its timing allocation is exceeded.

Examples

The SuperCollider language and Serpent both use real-time garbage collectors to manage dynamic memory. Aura and AudioMulch use per-thread size-segregated memory pools. AudioMulch allocates in a non-real-time context for many types of dynamic objects, especially large ones. Fixed up-front allocation is often used for things like synthesizer voices, where the maximum polyphony is known up front.

See the Memory Patterns chapter of Real-Time Design Patterns: Robust Scalable Architecture for Real-Time Systems by Bruce Powel Douglass, Addison-Wesley Professional, 2002, or the online extract at http://www.awprofessional.com/articles/article.asp?p=30309

Communication and Synchronization with Lock-Free Queues

Also known as asynchronous message passing.

On general-purpose operating systems it is not usually possible to depend on synchronization primitives such as mutexes (locks) to provide deterministic timing behavior, since these systems are often optimized for high throughput rather than low latency.

Exchange data or events safely between multiple threads of execution without using operating system synchronization primitives such as mutexes or semaphores.

Polling can (sometimes) add latency and overhead compared to implementations based on synchronization primitives (but suitable real-time synchronization primitives may not exist, and they tend to be less portable). Single-reader, single-writer queues are simple, but may be more wasteful of resources than more sophisticated lock-free schemes.

Use lock-free queues to pass messages or data between threads. The queues can contain things such as data samples, message records, variable-length messages, or pointers to other data structures. Such queues need to be polled periodically to ensure that they don't overflow. If a queue has data in it, a semaphore (or a pthreads condition variable) may be used to signal this state to a non-real-time thread; however, a real-time thread should not normally block waiting for a queue. By virtue of the algorithms used, access to these queues is lock-free.

Examples

The SuperCollider 3 server uses lock-free queues for communication between the real-time audio callback and non-real-time threads such as the OSC network communications thread. AudioMulch uses lock-free queues for most communication between real-time audio and other threads, such as those for the GUI, incoming MIDI callbacks, disk I/O, and sample loading. JSyn uses a lock-free ring buffer for communications to the audio thread, and a lock-free linked-list queue for communications from the audio thread. Aura uses lock-free queues between zones.
PortMIDI includes an example using a lock-free queue to communicate between the main thread and a high-priority MIDI thread.

Some Notes on Lock-Free and Wait-Free Algorithms: http://www.audiomulch.com/~rossb/code/lockfree/
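The single-reader, single-writer variant described above can be sketched with C11 atomics. This is an illustrative version, not the implementation used by any of the systems cited in the examples; it assumes exactly one thread calls queue_push and exactly one calls queue_pop:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal single-reader, single-writer lock-free queue (ring buffer).
 * Each index is written by exactly one thread, so no mutex is needed;
 * acquire/release ordering makes the written slot visible before the
 * index update is observed. */

#define QUEUE_SIZE 8               /* must be a power of two */

typedef struct {
    int buffer[QUEUE_SIZE];
    atomic_size_t head;            /* next slot to read  (reader-owned) */
    atomic_size_t tail;            /* next slot to write (writer-owned) */
} Queue;

/* Returns 0 if the queue is full; real-time code must not block. */
int queue_push(Queue *q, int value) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUE_SIZE) return 0;        /* full */
    q->buffer[tail % QUEUE_SIZE] = value;
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}

/* Returns 0 if the queue is empty. */
int queue_pop(Queue *q, int *value) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) return 0;                     /* empty */
    *value = q->buffer[head % QUEUE_SIZE];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 1;
}
```

Both operations return immediately on full or empty rather than waiting, which is what allows the real-time side to poll the queue without ever blocking.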

Accurate Timing with Timestamps

This pattern is closely tied to discrete event simulation.

Many musical sequences and computations give rise to a set of events with precise timings. The simple timing approach, which is something like A(); sleep(5); B(); sleep(3); C(); sleep(7);, will accumulate error due to finite computation speed and system latencies. We assume that tasks run when methods or functions are called by a scheduler, and that tasks do not suspend or sleep.

Deliver accurately timed outputs. Momentarily high CPU load should at most cause momentary degradation of the output, but no long-term errors.

Sometimes it is desirable to let time slip when systems encounter too much load. In other words, once a system falls behind, it may be better to remain behind than to catch up.

Tasks always compute the ideal time at which they should run. The scheduler remembers the time at which the task requested to wake up. When the task wakes up, it operates as if this ideal time is the true real time. If future events are scheduled relative to this one, they will be scheduled relative to the ideal time rather than real time, ensuring accurate timing. For example, if every time a task runs it schedules itself to run again in 0.1 seconds, then it will compute the ideal sequence of wake-up times 0.1, 0.2, 0.3, etc., with no error due to finite CPU speed.

Anderson, D. P. and Kuivila, R. A system for computer music performance. ACM Transactions on Computer Systems 8(1), pages 56-82, February 1990.
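The idea can be sketched in a few lines of C. This hypothetical fragment contrasts naive scheduling relative to real time, where lateness accumulates, with scheduling relative to ideal time, where it does not; the function names and the constant lateness are invented for the example:

```c
#include <assert.h>
#include <math.h>

/* Simulate n periodic wake-ups where the OS delivers every wake-up
 * `lateness` seconds late. */

/* Naive: each rescheduling starts from the (late) real time, so the
 * final time drifts by n * lateness. */
double run_naive(double period, double lateness, int n) {
    double t = 0.0;
    for (int i = 0; i < n; i++)
        t = (t + lateness) + period;   /* schedule relative to real time */
    return t;
}

/* Timestamped: the task reschedules relative to its ideal wake-up
 * time, so lateness never accumulates; after n periods the logical
 * time is exactly n * period. */
double run_timestamped(double period, double lateness, int n) {
    double ideal = 0.0;
    for (int i = 0; i < n; i++) {
        double real = ideal + lateness;  /* when the task actually ran */
        (void)real;   /* outputs would be stamped with `ideal`, not `real` */
        ideal += period;                 /* schedule relative to ideal time */
    }
    return ideal;
}
```

With period 0.1 s and 3 ms of lateness per wake-up, ten iterations of the naive version end at 1.03 s while the timestamped version ends at exactly 1.0 s, matching the 0.1, 0.2, 0.3, ... sequence described above.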

Event Buffers

Also called Action Buffers. This pattern is often discussed in the context of jitter, and it could be considered a special case of Accurate Timing with Timestamps.

Sometimes, data cannot be computed accurately at the desired time. This is usually caused by latency when the operating system fails to run a process immediately. It may also be caused when the application has too much to do, perhaps because work comes in bursts. In some of these cases, data can be computed early. In other words, in some cases it is better to reduce timing deviations, called jitter, than to reduce overall timing delay, or latency.

Deliver output with timing that is more accurate than can be achieved by the task that computes the output.

This design relies on the ability to pre-compute data, which means that latency will suffer in order to reduce jitter. Sometimes it is better to just minimize latency: as latency approaches zero, jitter also approaches zero, so sometimes this pattern has no advantage. This pattern will be most useful when the computation time for output data is significant and exhibits a lot of variation or burstiness.

Compute both the data and the time at which the data should be delivered. Typically, the data is computed using the Accurate Timing with Timestamps pattern, and the ideal time of the computation is used as the time at which the data should be delivered. However, since the computation will often be somewhat late, a constant time offset is added to the ideal time to form an output timestamp. The data is then transferred through a buffer to another process (or device driver, or hardware) running at higher priority and better able to output the data accurately according to the timestamp.

A trivial example is prefetching a MIDI sequence from disk so that it can be output with lower latency than disk read operations would otherwise allow. Another common example is the use of timestamps on MIDI data which is delivered to a device driver.

D. P. Anderson and G. Homsy. A Continuous Media I/O Server and its Synchronization Mechanism. IEEE Computer 24(10), pages 51-57, October 1991.
Roger B. Dannenberg, Thomas P. Neuendorffer, Joseph M. Newcomer, Dean Rubine, and David B. Anderson. Tactus: Toolkit-Level Support for Synchronized Interactive Multimedia. Multimedia Systems 1(2), pages 77-86, 1993.
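The producer and consumer sides of the pattern can be sketched in C. All names are invented, and the 10 ms output latency is an arbitrary illustrative constant, not a recommendation:

```c
#include <assert.h>
#include <math.h>

/* Sketch of an event buffer: data is computed early, at ideal time t
 * (see Accurate Timing with Timestamps), and stamped t + OUTPUT_LATENCY;
 * a higher-priority consumer delivers each event when the real clock
 * reaches its timestamp. */

#define OUTPUT_LATENCY 0.010   /* constant offset traded for reduced jitter */

typedef struct {
    double timestamp;   /* when the data should be delivered */
    int    payload;     /* illustrative stand-in for e.g. MIDI data */
} Event;

/* Producer side: form the output timestamp from the ideal time. */
double output_timestamp(double ideal_time) {
    return ideal_time + OUTPUT_LATENCY;
}

/* Consumer side: copy out every buffered event whose timestamp has
 * arrived; returns the number delivered. */
int deliver_due(const Event *buf, int n, double now, int *delivered) {
    int count = 0;
    for (int i = 0; i < n; i++)
        if (buf[i].timestamp <= now)
            delivered[count++] = buf[i].payload;
    return count;
}
```

The consumer needs no knowledge of how bursty the producer's computation was; as long as each event arrives in the buffer before its timestamp, output timing depends only on the consumer's clock.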

Synchronous Dataflow Graph

Also known as synthesis graph, Signal Graph (Pd), Filter Graph (DirectShow), or UGen Graph (SuperCollider).

Multiple independent signal generation and processing modules (unit generators) need to be dynamically interconnected at program runtime, such as when loading a new configuration file (i.e. an orchestra) or in response to dynamic interconnection requests from the user interface. The nature of the interconnections is not limited to sequential or parallel topologies.

Unit generators have data dependencies: the inputs to a unit generator must be computed before the unit generator runs. The many interconnections and their implied dependencies require that special attention be paid to the order of unit generator execution. Some systems are static, i.e. the graph does not change at run time; other systems are dynamic, allowing the graph to be modified interactively.

Organise the modules in a directed graph which indicates the signal flow between modules. By traversing backwards through the graph it is possible to establish the data dependencies, and hence which modules need to be evaluated first. By applying graph-coloring techniques from the compiler literature it is possible to minimize the number of buffers necessary to communicate intermediate data between modules.

Two major variations of this technique exist: one is to traverse the graph once to determine evaluation order and cache this ordering in a secondary structure; the other is to traverse the graph directly each time. Each alternative has its own benefits. For example, although direct graph traversal is simpler, caching the execution order means that the graph may be altered without needing to protect the execution list from concurrent access.

Read the compiler literature on graph coloring.
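The "traverse once and cache the ordering" variation can be sketched as a depth-first traversal from the graph's output. This hypothetical C fragment assumes an acyclic graph and an invented module representation; a real system would also handle feedback connections and buffer assignment:

```c
#include <assert.h>

/* Sketch: compute a cached evaluation order for a unit-generator
 * graph by depth-first traversal from a sink (output) module. A
 * module's inputs are visited before the module itself, so every
 * module appears after its dependencies in the resulting order.
 * Assumes the graph is acyclic. */

#define MAX_MODULES 16

typedef struct {
    int inputs[MAX_MODULES];  /* indices of upstream modules */
    int num_inputs;
    int visited;
} Module;

static void visit(Module *mods, int m, int *order, int *count) {
    if (mods[m].visited) return;
    mods[m].visited = 1;
    for (int i = 0; i < mods[m].num_inputs; i++)
        visit(mods, mods[m].inputs[i], order, count);  /* inputs first */
    order[(*count)++] = m;     /* then the module itself */
}

/* Fill `order` with an execution order ending at `sink`; returns the
 * number of reachable modules. Run once when the graph changes, then
 * execute the cached list each audio block. */
int compute_order(Module *mods, int sink, int *order) {
    int count = 0;
    visit(mods, sink, order, &count);
    return count;
}
```

For example, with two oscillators feeding a mixer that feeds the output, the cached order runs both oscillators, then the mixer, then the output, and the audio callback simply iterates over that list without touching the graph itself.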