A Freely Configurable Audio-Mixing Engine. M. Rosenthal, M. Klebl, A. Gunzinger, G. Tröster

A Freely Configurable Audio-Mixing Engine with Automatic Loadbalancing

M. Rosenthal, M. Klebl, A. Gunzinger, G. Tröster
Electronics Laboratory, Swiss Federal Institute of Technology
CH-8092 Zurich, Switzerland
March 7, 1995

Abstract

The most important design issue for digital audio mixing consoles is the communication concept used to interconnect an array of signal processors. This paper demonstrates the implementation of a digital mixing console that is capable of routing up to 100 audio signal-paths. The audio algorithms run on a modular DSP network. Signal-paths can be edited with a graphical user interface and are automatically mapped onto the DSP network with optimal load-balancing.

1 Introduction

The typical architecture of a digital audio mixing console is shown in figure 1. The most obvious difference from an analog mixing console is the clear separation of mixing-desk and mixing-engine. The mixing-engine performs the digital signal processing and is completely controlled by the mixing-desk.

A digital mixing console can be described as a network of signal-paths in which a large number of digital audio functions are combined. Each function processes some input signals and produces some output signals. The output signals of each function are used as input signals for other functions. This network can be described as a signal flow-graph in which every function has at least one input and one output. The sources and drains of the flow-graph are the audio inputs and outputs of the entire mixing console. The mixing-desk controls the functions directly through a set of parameters. For example, a scale function scales a particular audio signal according to the position of a slider on the mixing-desk.

The huge amount of processing power that a digital mixing console demands can only be delivered by an array of processors. Such an array needs interprocessor communication with a high bandwidth to provide the data exchange between audio functions.
The communication network is often designed directly in hardware, because software-controlled communication does not satisfy the strict speed requirements or uses too much processing power. As a result, the communication network becomes very inflexible and changes in the digital data flow of audio signals are difficult.

To tackle the lack of routability and communication speed of the interprocessor network, this paper presents a new, highly flexible communication structure. It has a bandwidth of 800 MBit/sec and communicates up to 512 internal audio channels autonomously without decreasing the processing power.

2 System Requirements

As mentioned above, audio functions need to be spread among an array of processors. To provide an efficient distribution, a homogeneous architecture of processing elements (PEs) is required. In other words, all PEs have the same capabilities and there is no master processor. Under this condition a load-balancing analysis can be performed in which the processing time needed by each function is measured. The goal is to find the best possible way of spreading the functions among the processing elements and to use the fewest possible PEs. Such a system works with optimal load-balancing.

The next design issue is finding a way of mapping the signal flow-graph of the functions onto the existing hardware. For a fully routable system, one that allows any interconnection between two or more functions running on any processor in the system, the communication structure of the interprocessor network must be orthogonal. This means each processor must be able to access all data produced by all other processors.

The last important requirement of the system is that interprocessor communication runs independently and does not consume processing power.

3 System Architecture

Figure 2 shows the system architecture of the mixing-engine.
One major goal of the project was to build a fully scalable system in which the number of PEs can be anything between 1 and 100. This corresponds directly to the fact that the processing power needed varies with the number of functions describing a mixing console. For reasons of scalability the chosen network topology is a ring. Buses and crossbars are other network examples. However, a bus can establish only one connection at a time and must be arbitrated. A crossbar of order N can establish N connections at a time and therefore offers the best connectivity, but a scalable system with a crossbar network can hardly be realized. Other systems with a ring architecture have been built, such as the WARP [1], the iWarp and the RAP [2].

3.1 Communication Concept

Every PE has its own communication controller (CC), which is responsible for the data flow on the ring bus. The data to be transferred passes every CC in a strictly ordered fashion, value by value. Every controller is programmed to insert data from its PE at a certain position of the data stream. It also copies data coming from other PEs out of the data stream and stores it for its PE. This is a special kind of independent time-division multiplexed (TDM) bus. In order to reduce bandwidth but still meet the requirement of orthogonality of the interprocessor network, every CC communicates only data that is needed by other processors. Before the communication starts, the CCs have to be configured by the processors. After that the network works completely autonomously.

Figure 3 illustrates the overlapping of communication and processing. The CCs synchronize themselves and start the communication as soon as the first data values are available. Data transfer and processing are executed simultaneously without slowing down the processors. After the last value has reached the last CC, the communication cycle is finished and the processing can start immediately. Processing is therefore synchronized with the end of the communication cycle and not with the master sampling clock.

Let S_C be the synchronization time of the CC and S_P the synchronization time of the PE. The synchronization time over the entire system is then the maximum of S_C and S_P. If no CC is involved, as in other implementations, all synchronization is done by the processors, and the synchronization time is the sum of S_C and S_P. A new cycle begins after the next master sampling clock. Because of the independent communication controller, the communication concept is named "Intelligent Communication".

3.2 Global Audio Channels

The data that is communicated on the ring bus can be described as a set of global channels.
Each channel is a digital audio connection between audio functions on different processors. Each processor produces a certain part of these channels, depending on which functions are running on it. If two functions on the same processor need a connection, there is no need to use a global audio channel. The audio data can be transferred within the memory of the processor. This is equivalent to a local audio connection.

The amount of communicated data is limited by the highest possible clock frequency on the ring bus, which at the moment allows 25 MWord/sec. At a digital audio sampling rate of 48 kHz it is possible to communicate up to 512 global channels at 32 bits/word. Assuming an optimal signal flow-graph of a digital mixing console, in which most of the audio connections can be kept local and no more than 5 global audio channels are used for a full signal-path, it is possible to route up to 100 audio signal-paths on the system.
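The channel budget and the slot-based exchange of Sections 3.1 and 3.2 can be illustrated with a minimal sketch. The figures for word rate, sampling rate, serial buffer size and channels per signal-path are taken from the text; the class and function names (CommunicationController, run_cycle, and so on) are hypothetical and only serve the illustration. The two-pass loop in run_cycle is a deliberate simplification of the value-by-value circulation on the ring, not a model of the actual controller timing.

```python
# Illustrative sketch only; all names are assumptions, not the authors' design.
RING_WORD_RATE = 25_000_000   # words per second on the ring bus (from the text)
SAMPLE_RATE = 48_000          # audio sampling rate in Hz (from the text)
VRAM_BUFFER_WORDS = 512       # serial buffer size stated in Section 4
CHANNELS_PER_PATH = 5         # global channels per full signal-path (from the text)


def max_global_channels():
    """Words that fit into one sampling period, capped by the serial buffer."""
    slots_per_period = RING_WORD_RATE // SAMPLE_RATE
    return min(slots_per_period, VRAM_BUFFER_WORDS)


def max_signal_paths():
    """Upper bound on signal-paths if each one needs CHANNELS_PER_PATH slots."""
    return max_global_channels() // CHANNELS_PER_PATH


class CommunicationController:
    """Hypothetical CC: a fixed table of slots to write and slots to read."""

    def __init__(self, produces, consumes):
        self.produces = dict(produces)  # slot index -> value from the local PE
        self.consumes = set(consumes)   # slot indices the local PE needs
        self.received = {}              # slot index -> value copied off the ring

    def insert(self, frame):
        """Write the local PE's values into their assigned slots."""
        for slot, value in self.produces.items():
            frame[slot] = value

    def copy_out(self, frame):
        """Copy only the slots that the local PE consumes."""
        for slot in self.consumes:
            self.received[slot] = frame[slot]


def run_cycle(controllers, n_slots):
    """One communication cycle: every CC inserts, then every CC copies out."""
    frame = [None] * n_slots
    for cc in controllers:
        cc.insert(frame)
    for cc in controllers:
        cc.copy_out(frame)
    return frame


if __name__ == "__main__":
    ccs = [
        CommunicationController({0: 0.5}, {2}),
        CommunicationController({1: -0.25}, {0}),
        CommunicationController({2: 1.0}, {0, 1}),
    ]
    run_cycle(ccs, max_global_channels())
    print(max_global_channels(), max_signal_paths(), ccs[2].received)
```

In this sketch the ring clock alone would allow slightly more slots per sampling period than the serial buffer holds, so the buffer size sets the limit of 512 channels. Dividing by 5 channels per signal-path gives a bound just above the 100 signal-paths quoted in the text.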

3.3 Data Input Output

Two interface modules provide the exchange of audio raw-data with external audio resources. One is an AES/EBU interface, the other is a Multichannel Audio Digital Interface (MADI). With one MADI interface a maximum of 56 digital audio channels can be connected directly to the mixing-engine. To support an efficient data flow it is important to connect the interface modules directly to a CC. This way no processing power is lost at all.

Figure 4 shows the topology of the mixing-engine with the I/O features. An interface module can be positioned anywhere between two processors. A system can have several MADI and AES/EBU interfaces. Like the processors, the interface modules produce a certain part of the global channels. The CC of each module works autonomously and is responsible for the data that the module produces and consumes.

3.4 Parameter Processing

The mixing-desk needs to control every function that is currently running on the system through a certain set of parameters. This information also needs to be communicated through the interprocessor network. However, parameters do not change as fast as audio raw-data, so it is not necessary to use one global audio channel for each parameter. In the current implementation 512 parameters are multiplexed on one global audio channel. This corresponds to an update rate of more than 100 times per second per parameter. Only a few global audio channels are needed and no special communication network for parameters has to be implemented. Between two parameter updates the parameters are interpolated to avoid audible discontinuities.

4 Hardware Implementation

The hardware platform of the mixing-engine is the MUSIC parallel computer built at the Electronics Lab of the Swiss Federal Institute of Technology [3] [4]. However, the communication concept and the operating system were completely redesigned. Figure 5 shows a processing board.
Three PEs fit on one board (22 cm by 23 cm) and up to 63 PEs can be connected together in a standard 19 inch rack. A special I/O board makes it possible to connect a MADI or an AES/EBU module directly to the interprocessor network. The modular design allows the system to be scaled according to individual needs. Only the necessary number of PEs and modules are inserted in the system, so hardware overhead can be substantially reduced.

One PE consists of a Motorola DSP 96002 floating-point digital signal processor, 1 MByte of static RAM and 2 MBytes of dual-ported DRAM (Video RAM) organized in two blocks called "producer" memory and "consumer" memory. Each PE has its own communication controller, which is responsible for the data flow between the PE and the interprocessor network. The CC is implemented in a Xilinx XC3190 FPGA. It fetches data through the serial port of the producer VRAM and writes arriving data into the serial port of the consumer VRAM. The serial buffers of the producer and consumer VRAM can each store 512 IEEE floating-point values of 32 bits.

5 Software

The software for the mixing-engine consists of three parts: a signal-flow-graph editor, a configuration software and a runtime kernel. Figure 6 shows the three steps for the reconfiguration of the mixing-engine. Each step corresponds to a separate software module.

5.1 Signal-flow-graph Editor

Audio functions are programmed in optimized assembler code. They appear as icons in the signal-flow-graph editor. Figure 7 demonstrates how functions can be placed and connected together. Subgraphs can be defined for later use. For example, a complete channel structure can be designed and inserted as a block into the total system. The graphical user interface runs on a UNIX workstation.

5.2 Signal-path Router

After the audio functions have been placed and connected with the signal-flow-graph editor, the signal-path router configures the mixing-engine according to the designed signal network. In a load-balancing analysis the functions are placed on the processors. For parallel programs with asynchronous data exchange this is a known problem [5] [6]. However, a synchronous system like a mixing-engine already has well-partitioned functions, and the processing time available on each processor is fixed. What matters is an optimal load on the processors.

In the next step the routing of the signal-flow-graph is performed and mapped onto the mixing-engine. If a connection between two processors is needed, the interprocessor network is used through one of the global audio channels. If two linked functions run on the same processor, a local connection is established.

5.3 Runtime Kernel

After the system has booted, a runtime kernel runs on each PE. It synchronizes all running functions with the communication controller.
At a master sampling frequency of 48 kHz new audio signals arrive about every 20 µs. This time also corresponds to the processing time available on each processor when no pipelining of functions is involved. The kernel uses less than 5% of the processing time on each processor. The remaining time is reserved exclusively for the audio functions.
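The per-sample time budget described above and the placement and routing steps of Section 5.2 can be sketched together. The paper does not state which placement algorithm the signal-path router uses, so the sketch below substitutes a simple first-fit decreasing heuristic, which generally yields a good but not provably minimal number of PEs. The function names and the per-function costs in the usage example are hypothetical values chosen for illustration.

```python
# Illustrative sketch only: first-fit decreasing is an assumed stand-in for the
# router's placement step; names and example costs are hypothetical.
SAMPLE_RATE = 48_000   # master sampling frequency in Hz (from the text)
KERNEL_SHARE = 0.05    # upper bound on kernel overhead (from the text)


def budget_us():
    """Time per sample left for audio functions on one PE, in microseconds."""
    period_us = 1e6 / SAMPLE_RATE
    return period_us * (1.0 - KERNEL_SHARE)


def place_functions(costs_us, budget):
    """Assign functions to PEs; costs_us maps name -> measured time per sample."""
    loads = []       # accumulated time per PE
    placement = {}   # function name -> PE index
    # Largest functions first; ties are broken by name for reproducibility.
    for name, cost in sorted(costs_us.items(), key=lambda kv: (-kv[1], kv[0])):
        if cost > budget:
            raise ValueError(f"{name} does not fit into one sampling period")
        for pe, load in enumerate(loads):
            if load + cost <= budget:
                loads[pe] += cost
                placement[name] = pe
                break
        else:
            # No existing PE has room, so a further PE is added.
            loads.append(cost)
            placement[name] = len(loads) - 1
    return placement, loads


def classify_connections(edges, placement):
    """Split flow-graph edges into local connections and global audio channels."""
    local, global_channels = [], []
    for src, dst in edges:
        if placement[src] == placement[dst]:
            local.append((src, dst))
        else:
            global_channels.append((src, dst))
    return local, global_channels


if __name__ == "__main__":
    costs = {"eq": 9.0, "compressor": 8.0, "limiter": 7.0,
             "bus_sum": 6.0, "pan": 3.0, "fader": 2.0}
    edges = [("eq", "compressor"), ("compressor", "fader"), ("fader", "pan"),
             ("pan", "bus_sum"), ("bus_sum", "limiter")]
    placement, loads = place_functions(costs, budget_us())
    print(placement, loads, classify_connections(edges, placement))
```

A heuristic of this kind considers only processing time. A router that also wants to keep the number of global audio channels small would have to take the edges of the flow-graph into account during placement.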

5.4 Audio Function Design

New audio functions can be included with a minimum of software effort. Using a well-defined software interface, the user can insert any self-written audio function into the system. The new function can be programmed in C or DSP assembler code. After the integration it is visible in the signal-flow-graph editor and can be placed in any audio signal network.

6 Conclusion

This paper describes a communication network which is very flexible and still reaches the speed necessary for multichannel digital audio communication. Reconfiguration of signal paths is done easily with automatic load-balancing on all processors, which guarantees an optimal usage of processing resources. Therefore any configuration of a digital mixing console described in a signal-flow-graph can be implemented. The presented implementation is the result of research work and is not a cost-effective solution. However, the system is fully operational and serves as the platform for an industrial product.

References

[1] M. Annaratone, E. Arnould, T. Gross, H. T. Kung, M. Lam, O. Menzilcioglu, J. A. Webb. The WARP Computer: Architecture, Implementation and Performance. IEEE Trans. on Computers, Vol. C-36, No. 12, December 1987, pp. 1523-1538.
[2] N. Morgan, J. Beck, P. Kohn, J. Bilmes, E. Allman, J. Beer. The RAP: A Ring Array Processor for Layered Network Calculations. In International Conference on Application Specific Array Processors. IEEE Computer Society Press, 1990.
[3] A. Gunzinger, U. A. Müller, W. Scott, B. Bäumle, P. Kohler, W. Guggenbühl. Architecture and Realization of a Multi Signalprocessor System. In International Conference on Application Specific Array Processors. IEEE Computer Society Press, 1992.
[4] U. A. Müller, B. Bäumle, P. Kohler, A. Gunzinger, W. Guggenbühl. Achieving Supercomputer Performance for Neural Net Simulation with an Array of Digital Signal Processors. IEEE Micro, October 1992.
[5] Ch. W. Kessler (ed). Automatic Parallelization, New Approaches to Code Generation, Data Distribution, and Performance Prediction. Vieweg, Wiesbaden, Germany, 1994.
[6] G. Haring (ed), G. Kotsis (ed). Performance Measurement and Visualization of Parallel Systems. North-Holland, Amsterdam, London, New York, Tokyo, 1993.

Figure 1: Architecture of a Digital Mixing Console. [Diagram: the mixing desk sends parameters to the mixing engine, which receives audio raw-data as input and delivers processed audio data as output.]

Figure 2: Topology of the scalable mixing-engine. [Diagram: a ring bus connects controllers 1 to n, each attached to its PE.]

Figure 3: Communication and processing on all PEs run in parallel. After the end of a communication cycle, the processing can start immediately. The synchronization of communication (S_C) and processing (S_P) is done separately and can be pipelined.

Figure 4: Input Output features of the mixing-engine. [Diagram: AES/EBU and MADI interface modules, each with its own controller, sit on the ring between the controllers of PE 1 to PE n.]

Figure 5: A processing board with 3 PEs.

Figure 6: Reconfiguration of the mixing-engine is done in 3 steps. [Diagram: signal flow graph, routing, runtime.]

Figure 7: Signal-flow-graph Editor.