Port of a fixed point MPEG2-AAC encoder on an ARM platform

Romain Pagniez
University College Dublin, Information Hiding Laboratory
Department of Computer Science, Belfield, Dublin 4, Ireland
http://ihl.ucd.ie/
UCD, August 2004

Overview
- Introduction
- Perceptual Audio Coding
- Fixed Point Elements
- Development Toolset
- Implementation
- Results
- Summary

Introduction: hardware encoding
MPEG2 AAC is the state of the art in perceptual audio compression. It achieves a compression rate about 30% higher than mp3. Stereo audio compressed at 128 kbit/s is indistinguishable from the original 1.4 Mbit/s CD-quality audio.
MPEG-like perceptual encoders are quite complex and involve a significant amount of computation. Running the algorithm on a dedicated hardware chip could be an efficient solution, allowing fast encoding at reduced cost.
Research on hardware encoding is being conducted in the IHL, with particular emphasis on fixed point encoding.

Introduction: main points of my work
- Getting the encoder to work properly on an ARM platform.
- Communication functions: serial port of the board, Ethernet port.
- Double precision multiplication algorithm.

Introduction: workflow
[Figure: project workflow. AAC Specifications -> Floating Point Software Encoder (Keith Cullen, PhD) -> Fixed Point Software Encoder (Keith Cullen, PhD) -> Fixed Point ARM Encoder (my project) and Fixed Point FPGA Encoder (Alexis Guerin, MSc).]

Perceptual Audio Coding: Encoding
[Figure: encoder block diagram. Digital Audio Input -> Filterbank (Time / Frequency Mapping) -> Bit / Noise Allocation and Coding -> Bitstream Formatting -> Encoded Bitstream, with a Psychoacoustic Model block and optional Ancillary Data.]
- Filter bank: divides the input stream into multiple frequency subbands.
- Psychoacoustic model: simultaneously determines the overall masking threshold for each subband.
- Allocation block: uses the masking threshold to decide how many bits should be used.
- Bitstream formatting: multiplexes all the data to be transmitted.

Perceptual Audio Coding: Threshold in Quiet
[Figure: threshold in quiet. Sound pressure level (dB, 0 to 100) versus frequency (Hz, logarithmic axis from 10^1 to 10^4).]

Perceptual Audio Coding: Threshold in Quiet
[Figure: amplitude versus frequency, showing a masking tone, its masking threshold and a masked tone.]

Perceptual Audio Coding: Masking
[Figure: masking level (dB) as a function of frequency and time.]

Fixed point
[Figure: a word of length WL made of a sign bit S, an integer part of length IWL and a fractional part of length FWL, with bit weights ... 2^1, 2^0, 2^-1, 2^-2 ...]
Fixed point representation virtually places a radix point somewhere in the middle of the digits and uses integer arithmetic. This is equivalent to treating each integer as a count of some fraction of a unit. For example, one might count 1/100ths of a unit; with 4 decimal digits, 10.82 or 00.01 can be represented.
What is the most accurate position for the radix point?
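The idea of storing a count of fractional steps can be sketched in a few lines of Python. This is only an illustration, not the encoder's code: the function names are hypothetical, the radix is binary (steps of 2^-FWL rather than 1/100), and multiplication simply truncates the extra fractional bits.

```python
def to_fixed(value, fwl):
    """Quantise a real value to an integer count of 2**-fwl steps."""
    return round(value * (1 << fwl))


def from_fixed(raw, fwl):
    """Convert an integer count of 2**-fwl steps back to a real value."""
    return raw / (1 << fwl)


def fixed_add(a, b):
    """Two numbers with the same FWL add as plain integers."""
    return a + b


def fixed_mul(a, b, fwl):
    """The raw product carries 2*fwl fractional bits; shift back to fwl."""
    return (a * b) >> fwl


# 1.75 with two fractional bits is stored as the integer 7 (7 * 0.25).
raw = to_fixed(1.75, 2)
total = fixed_add(to_fixed(1.25, 2), to_fixed(0.5, 2))
product = fixed_mul(to_fixed(1.5, 4), to_fixed(2.25, 4), 4)
```

Only the interpretation of the integers changes with FWL; the additions and multiplications themselves are ordinary integer operations.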

Fixed point: range and error
4-bit unsigned
Range            Step Size
[0, 0.46875]     0.03125
[0, 0.93750]     0.06250
[0, 1.87500]     0.12500
[0, 3.75000]     0.25000
[0, 7.50000]     0.50000
[0, 15.00000]    1.00000
[0, 30.00000]    2.00000
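The rows of this table follow from two relations: the step size is 2^-FWL and the largest 4-bit unsigned value is 15 steps. The short sketch below regenerates the table under that reading; the function name is hypothetical, and a negative FWL is used for the last row, where the step is larger than one.

```python
def unsigned_range(word_length, fwl):
    """Return (largest value, step size) of an unsigned fixed point word."""
    step = 2.0 ** -fwl
    max_value = (2 ** word_length - 1) * step
    return max_value, step


# Move the radix point one position to the right on each row.
for fwl in range(5, -2, -1):
    max_value, step = unsigned_range(4, fwl)
    print(f"[0, {max_value:.5f}]  {step:.5f}")
```

Each shift of the radix point doubles both the range and the step size, which is the trade-off between range and precision that the table illustrates.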

Fixed point: optimal position
- The optimal position of the radix point must be chosen considering the precision and range of the data.
- The goal is to minimize error: overflow must be avoided and precision maximized.
- The position of the binary point may vary along the encoding process.
- Simulations must be carried out to determine the optimal position of each binary point.
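One way to read this rule: once simulations have given the largest magnitude a variable can take, choose the largest fractional word length that still avoids overflow. The sketch below assumes a signed two's complement word and a known, strictly positive peak value; the function name and the search strategy are illustrative, not taken from the encoder.

```python
def best_fwl(peak, word_length):
    """Largest FWL for which `peak` (> 0) fits in a signed word of the given length."""
    max_raw = (1 << (word_length - 1)) - 1  # largest positive raw integer

    def fits(fwl):
        return peak * 2.0 ** fwl <= max_raw

    fwl = 0
    if fits(fwl):
        # Add fractional bits while the peak still fits.
        while fits(fwl + 1):
            fwl += 1
    else:
        # Give up precision until the peak no longer overflows.
        while not fits(fwl):
            fwl -= 1
    return fwl
```

For a 16-bit word, a signal peaking at 1.0 would get 14 fractional bits under this rule, while a signal peaking at 100.0 would get only 8.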

Development Toolset: development kit
EPXA1 development board:
- EPXA1F484C device
- ARM922T 32-bit RISC microprocessor
- 100 k gates APEX 20KE FPGA
- 32 MB RAM
- 8 MB flash
- 100 Mbit Ethernet
- JTAG header
- LCD display
Quartus II software
GNUpro compiler

Development Toolset: development board
[Figure: the development board.]

Development Toolset: architecture
[Figure: block diagram with the ARM processor, the AHB bus, the PLD and the peripherals (LCD, Ethernet, LEDs, switch, serial port, etc.).]
The PLD is the only interface of the ARM processor. We must configure the FPGA.

Development Toolset: flash programming
[Figure: flash programming flow around the Quartus II Software Builder (quartus_swb). Blocks shown: Software Source Files; MegaWizard Plug-in Manager; System Build Descriptor File; Slave Binary Image File, from the Quartus II Hardware Compiler (full compilation); Simulator Initialisation Files, to the Quartus II Simulator or other EDA simulation tools; Flash Programming File.]

Implementation: simulation files
[Figure: simulation program running on a PC. Audio data -> Fixed Point AAC Encoder (simulated fixed point environment) -> AAC compressed audio file.]

Implementation: master slave architecture
[Figure: a master program running on a host PC reads the audio data and writes the AAC compressed audio file. It exchanges data through communication functions with the slave fixed point encoder running on the EPXA1, which has its own communication functions and the Fixed Point AAC Encoder.]

Implementation: communication sequence
From Host PC to Board:
- Bit Rate (4 Bytes)
- Number of channels (4 Bytes)
- Sample Rate (4 Bytes)
- Bytes per sample (4 Bytes)
- Number of samples (4 Bytes)
From Host PC to Board:
- encode (2 Bytes)
- 1024 audio samples (mono - 1024 * 2 Bytes) or 2048 audio samples (stereo - 2048 * 2 Bytes)
From Board to Host PC:
- Size of Buffer (4 Bytes)
- Encoded Bitstream (size_of_buffer * 1 Byte)
From Host PC to Board:
- encode (2 Bytes)
From Board to Host PC:
- Size of Buffer (4 Bytes)
- Encoded Bitstream (size_of_buffer * 1 Byte)
From Host PC to Board:
- STOP (2 Bytes)
From Board to Host PC:
- Size of Buffer (4 Bytes)
- Encoded Bitstream (size_of_buffer * 1 Byte)
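A host-side sketch of these messages, using Python's struct module, might look as follows. Several details are assumptions rather than facts from the slide: little-endian byte order, unsigned 32-bit header fields sent in the order listed, signed 16-bit samples, and the numeric values of the `encode` and `STOP` command words (`CMD_ENCODE` and `CMD_STOP` are hypothetical names and values).

```python
import struct

CMD_ENCODE = 1  # hypothetical value of the 2-byte "encode" command
CMD_STOP = 0    # hypothetical value of the 2-byte "STOP" command


def pack_header(bit_rate, channels, sample_rate, bytes_per_sample, num_samples):
    """Five 4-byte fields sent once from the host PC to the board."""
    return struct.pack("<5I", bit_rate, channels, sample_rate,
                       bytes_per_sample, num_samples)


def pack_frame(samples):
    """2-byte encode command followed by 16-bit samples (1024 mono or 2048 stereo)."""
    return struct.pack("<H", CMD_ENCODE) + struct.pack(f"<{len(samples)}h", *samples)


def pack_stop():
    """2-byte command telling the board that no more frames follow."""
    return struct.pack("<H", CMD_STOP)


def unpack_reply(data):
    """Split a board reply into its 4-byte size field and the encoded bitstream."""
    (size,) = struct.unpack_from("<I", data, 0)
    return data[4:4 + size]


header = pack_header(128000, 2, 44100, 2, 44100)
mono_frame = pack_frame([0] * 1024)
```

With these assumptions the header is 20 bytes long and a mono frame message is 2 + 2048 bytes, matching the sizes listed on the slide.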

Implementation: naive vs overlapped protocol
[Figure: two timelines of the exchanges between the Host PC and the EPXA1 development board, one for the naive protocol and one for the overlapped protocol, showing the successive AAC encoding periods along the time axis.]
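The difference between the two protocols can be captured by a very simple timing model. The sketch below assumes steady state and perfect overlap, which the slides do not state explicitly: in the naive protocol each frame costs the communication time plus the computation time, while in the overlapped protocol it costs only the slower of the two. The function names are hypothetical.

```python
def naive_time_per_frame(comm, comp):
    """Transfer and encoding happen one after the other."""
    return comm + comp


def overlapped_time_per_frame(comm, comp):
    """The next transfer runs while the current frame is being encoded."""
    return max(comm, comp)


# Per-frame times in seconds, taken from the results slide.
naive = naive_time_per_frame(0.411, 0.434)
overlapped = overlapped_time_per_frame(0.392, 0.260)
```

With the measured per-frame times this model gives 0.845 s/frame for the naive protocol and 0.392 s/frame for the overlapped one, in line with the reported totals.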

Results: communication vs computations
Naive implementation
                  Total time   Time per frame   Proportion   Real Time
Communications    8.63 s       0.411 s/frame    48.6 %       0.056
Computations      8.88 s       0.434 s/frame    51.4 %       0.053
Total             17.51 s      0.845 s/frame    100 %        0.027
Communication overlapped with (faster) computations
                  Total time   Time per frame   Proportion   Real Time
Communications    140 s        0.392 s/frame    100 %        0.059
Computations      93 s         0.260 s/frame    66.3 %       0.089
Total             140 s        0.392 s/frame    100 %        0.059
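The Real Time column appears to be the ratio between the duration of the audio in one frame and the time spent on that frame. This reading is an inference, not something stated on the slide. The sketch below assumes 1024 samples per channel per frame (as in the communication sequence) and a 44.1 kHz sample rate (CD audio); the function name is hypothetical.

```python
FRAME_SAMPLES = 1024  # samples per channel in one frame


def real_time_factor(seconds_per_frame, sample_rate=44100):
    """Audio duration of one frame divided by the time spent on it."""
    frame_duration = FRAME_SAMPLES / sample_rate  # about 23.2 ms at 44.1 kHz
    return frame_duration / seconds_per_frame


naive_total = real_time_factor(0.845)
overlapped_total = real_time_factor(0.392)
```

Under these assumptions the totals come out at about 0.027 for the naive implementation and 0.059 for the overlapped one. A value of 1 would mean encoding exactly in real time.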

Results: overlapped and high speed mult.
Encoding time at 128 kbit/s for the encoder with communications overlapped with computations (high speed multiplication).
File size   Duration     Encoding Time   Time per frame   Real Time
86.1 kB     0.48 s       8.30 s          0.415 s/frame    0.058
1.41 MB     8.45 s       139.3 s         0.385 s/frame    0.063
10.9 MB     1 min 04 s   1065 s          0.384 s/frame    0.060

Conclusion
- The encoder works as expected.
- Communications on the serial port, even if slower than Ethernet, are sufficient to manage the encoding of a complete file.