A review of lossless audio compression standards and algorithms

Fathiah Abdul Muin, Teddy Surya Gunawan, Mira Kartiwi, and Elsheikh M. A. Elsheikh
Citation: AIP Conference Proceedings 1883, (2017)
Published by the American Institute of Physics

A Review of Lossless Audio Compression Standards and Algorithms

Fathiah Abdul Muin 1, a), Teddy Surya Gunawan 1, b), Mira Kartiwi 2 and Elsheikh M.A. Elsheikh 1

1 Department of Electrical and Computer Engineering, Kulliyyah of Engineering, International Islamic University Malaysia, Jalan Gombak, Kuala Lumpur, Selangor, Malaysia
2 Department of Information Systems, Kulliyyah of Engineering, International Islamic University Malaysia, Jalan Gombak, 5310 Kuala Lumpur, Malaysia

a) Corresponding author: fathiah_muin@hotmail.com
b) tsgunawan@iium.edu.my

Advances in Electrical and Electronic Engineering: From Theory to Applications, AIP Conf. Proc. 1883 (2017). Published by AIP Publishing.

Abstract. Over the years, lossless audio compression has gained popularity as researchers and businesses have become more aware of the need for better quality and of the growing demand for storage. This paper analyses various lossless audio coding algorithms and standards that are in use and available in the market, focusing on Linear Predictive Coding (LPC) because of its popularity and robustness in audio compression; other prediction methods are compared to verify this. Advanced representations of LPC, such as LSP decomposition techniques, are also discussed.

INTRODUCTION

Many lossless audio compression mechanisms have been introduced over the years. Many standards and popular non-standard compressors use Linear Predictive Coding at their core, for reasons explained in later sections. The basic operation of most lossless compression encoders is shown in Fig. 1 [1], where the intrachannel decorrelation block contains the prediction model, including the quantization operation, for each channel of the audio signal. The prediction model on its own is the lossy part of the operation, as there is always a prediction error. The entropy coding block compensates for this by encoding the error residual of the predictor model. In other words, the encoded error residual is combined with the output of the predictor model to form a lossless audio compression scheme.

FIGURE 1. General lossless audio encoder [1]

In lossless audio compression, the predictor model acts as a lossy decoder that reconstructs the original signal as closely as possible in order to minimize the residual error. The smaller the error, the more efficient the entropy coding and the smaller the compressed signal, as the error residual will contain more zeroes [2]. A general predictor structure with entropy coding is shown in Fig. 2, in which the output of the predictor is the error residual, which is encoded along with the calculated predictor coefficients.
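The predictor-plus-residual structure described above can be illustrated with a short sketch. It assumes a fixed second-order predictor (the prediction is twice the previous sample minus the one before it), which is chosen here only for illustration and is not taken from any of the standards reviewed; the function names are hypothetical. The point is that the prediction alone is approximate, while prediction plus residual reproduces the input exactly.

```python
def encode_residual(samples):
    """Return the prediction residual for a list of integer PCM samples."""
    residual = []
    for n, x in enumerate(samples):
        prev1 = samples[n - 1] if n >= 1 else 0
        prev2 = samples[n - 2] if n >= 2 else 0
        prediction = 2 * prev1 - prev2  # assumed fixed predictor, illustration only
        residual.append(x - prediction)
    return residual


def decode_residual(residual):
    """Rebuild the samples by running the same predictor and adding the residual."""
    samples = []
    for n, e in enumerate(residual):
        prev1 = samples[n - 1] if n >= 1 else 0
        prev2 = samples[n - 2] if n >= 2 else 0
        samples.append(e + 2 * prev1 - prev2)
    return samples


pcm = [0, 3, 7, 12, 18, 23, 27, 30]
res = encode_residual(pcm)          # small values clustered around zero
restored = decode_residual(res)     # identical to pcm
```

For a smooth signal the residual values are much smaller than the samples themselves, which is what makes the subsequent entropy coding effective.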

FIGURE 2. General predictor structure with entropy coding in the encoder

When MPEG-4 ALS was first introduced, linear predictive coding was proposed as its main predictive model. This was the most commonly used model for lossless audio compression at the time. The concept of linear predictive coding is that the speech sample, $x(n)$, can be estimated as a linear combination of its past samples, $x(n-k)$ [2]. Eq. (1) gives the predicted sample, $\hat{x}(n)$, for a predictor of order $K$ with coefficients $a_k$.

$$\hat{x}(n) = \sum_{k=1}^{K} a_k\, x(n-k) \qquad (1)$$

The disadvantage of this model is that it requires the predictor coefficients to be encoded together with the output bit stream, which adds overhead and lowers the compression ratio of the audio file. The accuracy of the prediction depends on the order; however, the higher the order, the more coefficient overhead there will be.

(2)

Apart from the typical LPC coefficients ($a_k$), there are other advanced representations of LPC for speech analysis, such as Log Area Ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Comparatively, LSP decomposition ensures stability of the predictor, and spectral errors are local for small coefficient deviations. The LSP decomposition, as used by the IEEE standard, assumes that the prediction filter $A(z)$ is made of a symmetric and an antisymmetric polynomial, as shown in Eq. (3), where $P(z)$ is a symmetric polynomial and $Q(z)$ is an antisymmetric polynomial of a quantized vector [3].

$$A(z) = \frac{1}{2}\left[P(z) + Q(z)\right] \qquad (3)$$

The objective of this paper is to provide a systematic review of lossless audio compression standards and algorithms. Furthermore, the applications of lossless audio compression are elaborated. The rest of the paper is organized as follows: Section 2 explains the related works on lossless audio compression, Section 3 is the discussion, and Section 4 covers the different applications of each standard and algorithm.
Specifically, Section 2 is divided into subsections: Section 2.1 elaborates on the various lossless MPEG-4 standards, Section 2.2 covers the Free Lossless Audio Codec (FLAC), Section 2.3 covers the IEEE standard released in 2013, and Section 2.4 covers various enhanced methods proposed by different researchers based on these standards and algorithms, which were not included in the market tools themselves. Lastly, Section 5 concludes the paper.
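The LSP decomposition mentioned in the introduction splits the prediction filter into a symmetric and an antisymmetric polynomial. The sketch below assumes the common textbook construction, in which the prediction-error filter A(z) = 1 - sum of a_k z^-k is padded by one degree and combined with its own mirror image; the exact construction and quantization used by the IEEE standard may differ. The function name and the example coefficients are hypothetical.

```python
def lsp_polynomials(a):
    """Build the symmetric (P) and antisymmetric (Q) polynomials of an LPC filter.

    `a` holds the predictor coefficients a_1..a_p, so the prediction-error
    filter has coefficients [1, -a_1, ..., -a_p]. Assumed textbook
    construction: P = A + mirror(A), Q = A - mirror(A), after padding A
    to degree p + 1.
    """
    err_filter = [1.0] + [-ak for ak in a] + [0.0]
    mirrored = err_filter[::-1]
    p_poly = [c + m for c, m in zip(err_filter, mirrored)]
    q_poly = [c - m for c, m in zip(err_filter, mirrored)]
    return p_poly, q_poly


p_poly, q_poly = lsp_polynomials([0.5, -0.25])
# Averaging P and Q gives back the padded prediction-error filter.
recovered = [(p + q) / 2 for p, q in zip(p_poly, q_poly)]
```

Here `p_poly` reads the same forwards and backwards, `q_poly` changes sign when reversed, and their average recovers the original filter, which is the property the decomposition relies on.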

LOSSLESS AUDIO CODING

MPEG-4 STANDARDS

From lossy to lossless audio compression, one of the major players among the audio compressors available in the market is MPEG-4, whose most popular lossy compressor is AAC (Advanced Audio Coding). Much research went into the development of the MPEG-4 lossless audio standards, which are ALS LPC (Audio Lossless Coding, Linear Predictive Coding) [4], RLS-LMS (Recursive Least Squares-Least Mean Squares) [5], and SLS (Scalable Lossless Coding) [6], all of which are discussed in this section.

MPEG-4 Audio Lossless Coding

The working draft for MPEG-4 ALS standardization started in 2003 and was finalized in 2009. At its release in 2003, the compression ratio and decoding speed of MPEG-4 ALS were found to be better than those of Monkey's Audio [4]. The MPEG-4 ALS encoder consists of five blocks, which are illustrated in Fig. 3. The description of these blocks follows from the overview given previously:

Buffer: The audio frame.
Coefficients Estimation and Quantization, and Predictor block: The intrachannel decorrelation.
Entropy Coding: Encodes the error residual using Golomb-Rice codes.
Multiplexer: Outputs the compressed signal along with the necessary header.

To reconstruct the signal from the output bitstream of the encoder, the corresponding decoder consists of a demultiplexer, which decomposes the bitstream, followed by decoding of the residual, the code indices and the quantized coefficients.

FIGURE 3. MPEG-4 lossless audio encoder [2]

In the finalized version of the standard in 2009, the MPEG-4 ALS compression ratio was better than that of FLAC, but its decoding speed was slower [2]. In the finalized version, the predictor was further divided into two parts, a short-term predictor block and a long-term predictor block, together with an adaptive predictor order mechanism to determine the optimum predictor order.
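The entropy coding block listed above encodes the residual with Golomb-Rice codes. The following is a minimal sketch of Rice coding under stated assumptions: signed residuals are first mapped to non-negative integers by interleaving, the quotient is written in unary as ones terminated by a zero, and the Rice parameter `k` is fixed for the whole block. The actual bit conventions and parameter selection in MPEG-4 ALS are not given in this paper and may differ; the bitstream is modelled as a string of '0' and '1' characters for readability.

```python
def rice_encode(values, k):
    """Encode signed integers as a bit string with Rice parameter k."""
    bits = []
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1      # map signed to non-negative
        quotient, remainder = u >> k, u & ((1 << k) - 1)
        bits.append("1" * quotient + "0")        # unary quotient
        if k:
            bits.append(format(remainder, "0{}b".format(k)))  # k-bit remainder
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` signed integers from a bit string produced by rice_encode."""
    values, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1                                  # skip the terminating zero
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        u = (quotient << k) | remainder
        values.append(u >> 1 if u % 2 == 0 else -((u + 1) >> 1))
    return values


residual = [0, 3, 1, 1, 1, -1, -1, -1]
stream = rice_encode(residual, k=1)
decoded = rice_decode(stream, k=1, count=len(residual))
```

Small residuals produce short codewords, so the better the predictor, the shorter the stream; a larger `k` suits blocks whose residuals have a larger typical magnitude.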
The long-term predictor block and adaptive predictor order further minimize the error residual by exploiting the correlation between the residual errors of the frames, at the cost of encoding speed due to the additional complexity. Eq. (4) shows the calculation of the long-term residual error, $\tilde{e}(n)$, where $\gamma$ is the gain and $\tau$ is the lag.

$$\tilde{e}(n) = e(n) - \sum_{j=-2}^{2} \gamma_{\tau+j}\, e(n-\tau+j) \qquad (4)$$

RLS (Recursive Least Squares)-LMS (Least Mean Squares) mode

RLS-LMS is a derivative mode of MPEG-4 ALS. As an alternative to LPC models, in which many coefficients are multiplexed into the encoded signal, LMS was investigated because of its low complexity. However, LMS by itself converges slowly, which leads to poor prediction gain, so RLS models were integrated to compensate for this property; RLS by itself is infeasible because of its high complexity. By balancing both properties, it was possible to obtain the optimal predictor of this method, as described in [7].

FIGURE 4. Structure of the RLS-LMS predictor [7][8]

Unlike the LPC mode, the RLS-LMS mode does not require the predictor coefficients to be coded into the bitstream, as shown in Fig. 4 and Fig. 5, because the decoder contains the same predictor filter as the encoder. However, the high computational requirements of the RLS and long LMS filters make RLS-LMS prediction slower than the ALS LPC mode [5], although the compression ratio is around 1% higher [7,8] for conventional music files (sampled at 44.1 kHz). Nevertheless, RLS-LMS showed a better compression ratio for higher-resolution audio (sampled at 96 kHz) [5].

FIGURE 5. Structure of ALS encoder (top) and decoder (bottom) [5]
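The reason a backward-adaptive mode needs no transmitted coefficients can be shown with a small sketch. It is not the RLS-LMS cascade of the standard: a single normalised LMS filter is swapped in for simplicity, and the function name, filter order and step size `mu` are all assumptions made for illustration. The encoder and decoder update identical filter state from data both sides already have, so the decoder can regenerate every prediction on its own.

```python
def nlms_codec(values, order=4, mu=0.5, decode=False):
    """Encode samples to residuals, or decode residuals back to samples.

    The same adaptive filter runs on both sides, so no coefficients are
    sent. Illustrative single-stage normalised LMS, not the RLS-LMS cascade.
    """
    weights = [0.0] * order
    history = [0] * order                 # past samples, most recent first
    out = []
    for value in values:
        prediction = int(round(sum(w * h for w, h in zip(weights, history))))
        if decode:
            sample, error = value + prediction, value
        else:
            sample, error = value, value - prediction
        out.append(sample if decode else error)
        # Update uses only the residual and past samples, which the
        # decoder also knows at this point.
        energy = sum(h * h for h in history) + 1.0
        step = mu * error / energy
        weights = [w + step * h for w, h in zip(weights, history)]
        history = [sample] + history[:-1]
    return out


pcm = [0, 50, 98, 141, 178, 207, 227, 238, 238, 227,
       207, 178, 141, 98, 50, 0, -50, -98]
residual = nlms_codec(pcm)
restored = nlms_codec(residual, decode=True)
```

The trade-off noted in the text is visible in the structure: the decoder has to run the full adaptation loop for every sample, which is why backward-adaptive modes decode more slowly than modes that simply read coefficients from the bitstream.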

Despite this impressive compression ratio, the variance of the deviation increases for correlated signals and signals with large variance, and the performance of the RLS algorithm degrades for white signals or signals with smaller variance [7]. In addition, a higher-order LMS filter makes the decoder more complex and thus slower in both decoding and encoding [2].

(5) (6)

In Eq. (5)-(7), the RLS-LMS filters differ in calculation, as the error residual of each stage is calculated from the difference between the previous residual and the output prediction of the k-th stage. The final prediction is the weighted sum of the output predictions of the stages [5].

MPEG-4 SLS (Scalable Lossless Coding)

Unlike the LPC models, this model can optionally utilize the lossy MPEG-4 Advanced Audio Coding (AAC) structure, which provides high-quality perceptual coding using the Modified Discrete Cosine Transform (MDCT) [6]. The lossless extension of SLS analyses the audio signal in blocks using the Integer Modified Discrete Cosine Transform (IntMDCT) [9]. In the same way that a residual error is output from the adaptive LPC model, a residual error is output from the SLS model, and the resulting quantized signal is coded by an entropy coding scheme, either Bit-Plane Golomb Coding (BPGC) or Context-Based Arithmetic Coding (CBAC) [6].

FIGURE 6. Structure of SLS decoder (top) and encoder (bottom) [6]

Owing to its scalability from MPEG-4 AAC, the approach was adopted into the MPEG-4 Audio standard tools [10]. Differing from LPC models, this method transforms the audio signal to the frequency domain by IntMDCT, and the coefficients are coded with a two-layer structure, as shown in Fig. 6: a core AAC layer and a lossless layer that also preserves the spectral shape of the quantization noise of the core AAC bitstream at intermediate rates. This results in an increase of noise-to-mask ratio, allowing for optimal perceptual quality at these operating bit rates.

FLAC

FLAC stands for Free Lossless Audio Codec, originating from the early 2000s. Due to its public availability, the codec has continued to be enhanced by various developers over the years. To compress audio, it uses the four-stage method shown in Fig. 7 [11].

FIGURE 7. Structure of FLAC encoder

The blocking stage divides the audio signal into blocks of a specified size. This directly affects the compression ratio: if the blocks are too small, the total number of blocks increases and bits are wasted on encoding headers, while blocks that are too large make it harder to find a single predictor that fits the whole block. The inter-channel decorrelation stage (or mid-side conversion) removes redundancy between the left and right channels of a stereo signal. By encoding the left and right channels as a mid channel (the average of left and right) and a side channel (left minus right), the number of bits needed to store the signal can be reduced. Where the left and right channels are very different, they can be passed through without any decorrelation. The prediction stage is essential for good compression, and the predictor can change from block to block [11]. The advantage of this method is that the prediction method can be switched for each audio block and sub-block, so that the computational complexity is matched to the current signal, which allows faster decoding. However, this requires additional overhead, which lowers the compression ratio. A comparison of FLAC with MPEG-4 ALS shows its adaptive nature, whereby the compression ratio, encoding speed and decoding speed improve with increasing WAV file size. With MD5 disabled, it can be argued that FLAC is superior in compression ratio and speed to all other lossless audio codecs; even with the fast presets, the MD5 checksum feature costs it up to 25% in speed [12].

IEEE STANDARD

In 2013, the finalized standard for IEEE Advanced Audio Coding was introduced. The standard is a combination of linear predictive coding and a newly introduced entropy coding algorithm [3]. This section discusses the main components of this standard.

General Block Diagram

Figure 8 illustrates the general structure of the IEEE lossless audio encoder and decoder. Differing from the typical LPC model, the IEEE standard uses an Integer Wavelet Transform to find the detail component of the signal, followed by a pre-processing stage in which the signal is flattened and subsequently coded by the entropy coder into the lossless bit stream. The reverse process of de-flattening (the post-processor) reconstructs a decoded signal that is an exact replica of the original. One important point is that the system uses fixed-point rather than floating-point arithmetic, which contributes to its fast decoding speed.
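An integer wavelet transform stays lossless because each lifting step can be undone exactly in integer arithmetic. The sketch below assumes the simplest case, an integer Haar lifting step; the wavelet actually specified by the IEEE standard is not stated in this paper, and the function names are hypothetical. The same arithmetic, a floored average plus a difference, is also what makes the mid/side conversion described for FLAC reversible.

```python
def haar_lift_forward(frame):
    """Split an even-length integer frame into approximation and detail parts."""
    approx, detail = [], []
    for a, b in zip(frame[0::2], frame[1::2]):
        d = a - b                      # detail (high-frequency) component
        approx.append(b + (d >> 1))    # equals floor((a + b) / 2)
        detail.append(d)
    return approx, detail


def haar_lift_inverse(approx, detail):
    """Undo haar_lift_forward exactly, using the same floor operation."""
    frame = []
    for s, d in zip(approx, detail):
        b = s - (d >> 1)
        frame.extend([b + d, b])
    return frame


frame = [5, 2, -3, 4, 7, 7, -8, -1]
approx, detail = haar_lift_forward(frame)
restored = haar_lift_inverse(approx, detail)
```

The floor in the forward step discards the low bit of the sum, but that bit is still recoverable from the detail value, so nothing is lost. Only integer additions and shifts are involved, which fits the fixed-point design noted above.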

FIGURE 8. IEEE encoder (top) and decoder (bottom) [3], [14]

IEEE Predictor

Figure 9 shows the various components of the predictor model. From the integer lifting wavelet transform block, only the high-frequency part of each input audio frame is passed to the predictor module, in which Linear Predictive Coding (LPC) is then performed on each frame via LSP decomposition [3]. The partial-correlation (PARCOR) coefficients are then computed with the Levinson-Durbin algorithm [13]. The PARCOR coefficients are quantized and sent in the lossless bit stream. In addition, the quantized PARCOR coefficients are locally de-quantized and converted to LPC coefficients, which generate a prediction for each sample in the frame, thus allowing the residual error to be computed by subtracting the predicted sample from the input signal.

FIGURE 9. IEEE predictor overview [3]

Eq. (8) is used to obtain the quantized PARCOR coefficients.

(8)

Even within the encoder itself, the PARCOR coefficients must be de-quantized, as a decoder would do, to generate the prediction signal for the linear predictor block. Eq. (9) describes the de-quantization of the PARCOR coefficients. Eq. (10) is used to obtain the prediction matrix, which is subtracted from the audio frame to get the prediction residue for encoding.

(9) (10)

FIGURE 10. IEEE predictor basic block [14]

The performance of this predictor has shown that it is slightly faster in encoding and decoding than other available lossless compression tools with LPC models, such as MPEG-4 ALS and FLAC. However, it is still slower than MPEG-4 SLS, which uses an audio synthesis perception model for its predictor block, although MPEG-4 SLS has a slightly lower compression ratio [14].
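The predictor path described above (Levinson-Durbin, PARCOR quantization, local de-quantization, conversion to LPC coefficients, residual) can be sketched as follows. Several parts are assumptions: the uniform quantizer with 64 steps is a hypothetical stand-in and not the quantization rule of the standard, sign conventions for PARCOR coefficients vary between texts, and all function names are illustrative.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of a frame for lags 0..max_lag."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (lpc, parcor) for the predictor xhat(n) = sum lpc[k-1] * x(n-k)."""
    lpc, parcor, error = [], [], r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(lpc[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / error
        lpc = [lpc[i] - k * lpc[m - 2 - i] for i in range(m - 1)] + [k]
        parcor.append(k)
        error *= 1.0 - k * k
    return lpc, parcor


def parcor_to_lpc(parcor):
    """Step-up recursion: rebuild LPC coefficients from PARCOR coefficients."""
    lpc = []
    for m, k in enumerate(parcor, start=1):
        lpc = [lpc[i] - k * lpc[m - 2 - i] for i in range(m - 1)] + [k]
    return lpc


def quantize_parcor(parcor, steps=64):
    """Hypothetical uniform quantizer, not the rule used by the standard."""
    return [int(round(k * steps)) for k in parcor]


def dequantize_parcor(indices, steps=64):
    return [i / steps for i in indices]


def prediction_residual(frame, lpc):
    """Subtract the rounded prediction from each sample of the frame."""
    out = []
    for n, x in enumerate(frame):
        pred = sum(a * frame[n - 1 - i] for i, a in enumerate(lpc) if n - 1 - i >= 0)
        out.append(x - int(round(pred)))
    return out


frame = [4, 2, 1, 1, 0, -1, -2, -2]
lpc, parcor = levinson_durbin(autocorrelation(frame, 2), 2)
indices = quantize_parcor(parcor)                      # sent in the bit stream
local_lpc = parcor_to_lpc(dequantize_parcor(indices))  # what the decoder will use
residual = prediction_residual(frame, local_lpc)
```

The encoder computes the residual with `local_lpc`, built from the de-quantized values, rather than with the unquantized `lpc`. This mirrors the point made in the text: both sides must predict from exactly the same coefficients, or the reconstruction would no longer be lossless.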

ENHANCED MECHANISMS

This section covers research that builds on the previously established standards or algorithms and results in new ideas for predictor designs.

Sparse Linear Predictor

This technique replaces the LPC predictive model with a sparse predictive model, which uses a minimum description length approach approximated by a greedy algorithm, and which can alleviate the problems of the least-squares approach used in LPC models. The objective of the technique is to improve decoding speed and compression ratio compared with the widely used non-sparse models in lossless audio encoders and decoders [15]. The audio compression tool created is an enhanced version of the existing popular lossless audio tool OptimFROG, which was created by one of the authors of the paper, Florin Ghido, in 1996 [16]. The technique optimizes multichannel audio specifically: the main channel uses a typical LPC model, and its reference (second) channel is used to reduce the error of the main channel. The model is described by Eq. (11), in terms of the main and second channels and their regression coefficients.

(11)

The test results of this model show that it achieves a better compression ratio than the MPEG-4 ALS (RLS-LMS) algorithm, although lower than OptimFROG and MPEG-4 ALS (RLS-LMS) at their best compression settings, while its decoding speed is comparatively better at this setting [15].

Cascaded OLS-NLMS based on MPEG-4 RLS-LMS

To address the numerical stability problem of the RLS algorithm in MPEG-4 RLS-LMS, the RLS algorithm can be replaced by Ordinary Least Squares (OLS), which minimizes this problem; the approach also accepts higher computational complexity by increasing the LMS filter order, and uses Normalized LMS (NLMS) to avoid slow convergence [17]. However, a previous study has shown that a higher LMS order can degrade decoder performance [2].

FIGURE 11. Cascaded OLS-NLMS predictor structure

Figure 11 shows the OLS-NLMS filter, which outputs the residual from the OLS filter instead of the RLS filter, together with a cascaded filter coefficient, Cmix, which determines the learning parameter of the stochastic gradient descent algorithm for the three-stage NLMS filter [17].

Enhanced MPEG-4 SLS

The original MPEG-4 standard assumes Laplacian-distributed input data for the BPGC calculation; the enhanced model instead assumes generalized Gaussian-distributed input data. The advantage of this enhancement is a better compression ratio at the same computational complexity as the original scheme, with an improvement of at least 0.01% over the original scheme [18].

Lossless Audio Coding using CELP

To analyse prediction models other than LPC, Code Excited Linear Prediction (CELP) has been proposed, without the use of a lossless arithmetic decorrelator, in order to reduce the complexity of the arithmetic computation [19]. CELP uses a large stochastic codebook to determine its linear predictive coefficients and, as with the LPC predictor, any residual error is encoded together with the coefficients. Although the compression ratio is generally high, it is inconclusive whether its decoding and encoding speeds are better than those of algorithms other than MPEG-4 ALS (RLS-LMS).

DISCUSSION

Table 1 compares the various lossless audio compression methods, with the strengths and weaknesses of each. From the table, it can be concluded that FLAC is superior for larger wave files and remains popular because it is open source, whereas MPEG-4, although not as popular, is faster in encoding and decoding. The IEEE standard, however, has started to show its competitiveness against other lossless audio compression methods, offering a higher compression ratio with negligible additional encoding and decoding time.

TABLE 1. Comparison of the related works on lossless audio compression algorithms.

Method | Description
MPEG-4 ALS (LPC) | This model encodes the residual error and the predictor coefficients in the bitstream to regenerate the original audio.
MPEG-4 ALS (RLS-LMS) | This model removes the predictor coefficients from the encoded stream and replaces the LPC model with RLS-LMS predictors.
MPEG-4 SLS | This model extends the MPEG-4 AAC lossy compression with an additional lossless computational layer.
FLAC | Free Lossless Audio Compression under BSD public license.
Enhanced CELP | This model employs code-excited sample-by-sample adaptive coding such that intersample correlations are removed.
Enhanced SLS | Replacement of the Laplacian by a Gaussian distribution of input data for the BPGC entropy block.
IEEE | This standard employs a new pre- and post-processor to flatten and de-flatten the signal, and new approaches for entropy coding.
Sparse Linear Predictor | Replaces LPC predictors with sparse predictors.
Cascaded OLS-NLMS | Replaces the RLS algorithm with OLS, and LMS with a Normalized LMS filter.

*Non-production source code.

Year | Reference Software | Advantage | Disadvantage
2003 Available
Lossless audio compression can be achieved.
Overhead of predictor coefficients during decoding demultiplexing.
Available
Fast decoding speed.
Numerical instability with white signals or signals with small variance.
Available
Scalability to enhance the current MPEG-4 AAC lossy algorithm.
Available
2010 Not available
2010 Not available
Adaptive computational complexity due to the interchangeable prediction model based on the audio signal block, and authenticity data check by MD5 code.
Faster encoding/decoding speed compared to MPEG-4 RLS-LMS, due to the fixed excitation codebook.
Superior encoding and decoding speed.
Available*
High compression ratio.
Not available
2014 Not available
High compression ratio.
Low computational complexity (faster encoding/decoding speed compared to MPEG-4 RLS-LMS).
High computational complexity of arithmetic coding (low encoding/decoding speed).
Low compression ratio due to the overhead of MD5 and prediction model interchangeability with smaller wave files.
Large stochastic codebook needs to be stored from training on numerous collections of real audio signals.
Average compression ratio.
Average computational complexity of arithmetic coding (average encoding/decoding speed).
Average decoding speed.

From Table 1 and previous research on lossless audio compression, it can be concluded that there is still a gap in providing lossless audio compression that is both high in compression ratio and fast in encoding and decoding. Although encoding speed does not have a significant impact on users, fast decoding speed is relevant, as it affects playback from storage and over transmission.

APPLICATION OF LOSSLESS AUDIO CODING

In the future, lossless audio compression can also be applied in various fields of science, not only in music applications but also in medicine or biology, as these fields are concerned with higher-quality analysis from their respective machines and thus require more storage space for their data. Additionally, it is not only applicable to audio data, but might also be useful in other forms of compression, such as video compression. Furthermore, more investigation should be done to improve LPC-based lossless audio compression models in terms of encoding and decoding speed, especially for multimedia usage, since there is high demand for the FLAC codec in the market, with continuous software releases up to 2017 [11].

TABLE 2. Various applications of each standard and algorithm

Method | Data Storage | Medical | Transmission
MPEG-4 ALS (LPC) | Yes | Yes | Yes
MPEG-4 ALS (RLS-LMS) | Yes | Yes | Yes
MPEG-4 SLS | Yes | - | Yes
FLAC | Yes | Yes | Yes
Enhanced CELP | Yes | - | -
Enhanced SLS | Yes | - | -
IEEE | Yes | - | Yes
Sparse Linear Predictor | Yes | - | Yes
Cascaded OLS-NLMS | Yes | - | -

Table 2 shows the applications of each algorithm according to the description of each standard and algorithm, as well as existing systems that use them. Although, in general, each standard and algorithm in this paper was created for data storage purposes, their application to medicine and transmission specifically was highlighted by the authors and researchers of the standards.
For example, the MPEG-4 ALS authors mentioned these applications, and the codec was verified to be applicable in a medical system such as ECG as well as for streaming [20]. FLAC was also verified to be applicable in medical systems that use biosignal data [21]. Although the others have not been used in a medical or transmission system, their authors mention possible applications in these fields; for example, the IEEE standard provides two file format headers, one for data storage and another for streaming purposes, such as transmission [3].

CONCLUSION

This paper presents the research done on various lossless audio compression standards and algorithms. It is a widely discussed topic, since people have become more interested in reducing online file sizes while using higher-quality and multi-channel audio files. Over the years, numerous lossless codecs have surfaced and been refined to the point that they deliver comparable performance and capability. The only differentiating factors are the adoption rate and the feature set, which might appeal to different audiences. From the various studies, the IEEE standard can provide the best compression ratio by adding a pre-processing block that reduces the dynamic range of the prediction residue instead of encoding it directly. This increases the complexity of the algorithm and thus affects the encoding time, but can greatly improve the compression ratio. The IEEE audio compression tool promises not only a higher compression ratio but also better encoding and decoding speed than other existing lossless codecs with LPC models, such as MPEG-4 ALS and FLAC, through the fixed-point implementation of its encoder and decoder modules, though studies have shown that audio synthesis perception models such as MPEG-4 SLS have superior speed, with a slightly lower compression ratio. It is also worth noting that although the IEEE audio compression tool can provide a high compression ratio, the FLAC codec is adaptive, so its compression ratio, encoding speed and decoding speed are higher with larger file sizes despite the MD5 checksum overhead.

ACKNOWLEDGMENTS

The authors gratefully acknowledge that this research has been supported by the Ministry of Higher Education Malaysia Research Fund, FRGS

REFERENCES

1. Hans M, Schafer RW. IEEE Signal Process Mag. 18, (2001).
2. Liebchen T. T J Acoust Soc Korea. 28, 1-19 (2009).
3. IEEE. IEEE Standard for Systems of Advanced Audio and Video Coding.
4. Liebchen T. IEEE Int. Conf. Acoust. Speech Signal Process. (IEEE, 2004).
5. Huang H, Rahardja S, Lin X, Yu R, Franti P. IEEE Int. Conf. Acoust. Speech Signal Process. (IEEE, 2006).
6. Geiger R, Schmidt M, Yu R. AAC. Proc EURASIP. (2006).
7. Huang D-Y. IEEE Int. Conf. Acoust. Speech Signal Process. (IEEE, 2004).
8. Huang H, Fränti P, Huang D, Rahardja S. IEEE Trans Audio, Speech Lang Process. 16, (2008).
9. Yu R, Geiger R, Rahardja S, Herre J, Lin X, Huang H. Proc. 117th AES Conv. (2004).
10. Yu R, Lin X, Rahardja S, Huang H. IEEE 7th Work. Multimed. Signal Process. (IEEE, 2005).
11. Coalson J. FLAC - Free Lossless Audio Codec.
12. Beurden M van. Lossless Audio Codec Comparison - Revision 4, 1-31 (2015).
13. B SB, Vagdevi U, Bineetha Y. Int. J. Eng. Sci. 2, 1-7 (2013).
14. Huang H, Shu H, Yu R. IEEE Int. Conf. Acoust. Speech Signal Process. (IEEE, 2014).
15. Ghido F, Tabus I. IEEE Trans Audio, Speech Lang Process. 21, (2013).
16. Ghido F. OptimFROG.
17. Ulacha G, Stasinski R. Int. Conf. Signals Electron. Syst. (2014).
18. Shu H, Huang H, Li T, Rahardja S. IEEE Int. Conf. Acoust. Speech Signal Process. (IEEE, 2010).
19. Li Y, Chan CF. Eur. Signal Process. Conf. (2010).
20. Harada N, Moriya T, Kamamoto Y. MPEG-4 ALS: Performance, applications, and related standardization activities.
21. Yoo S, Rho D, Cheon G, Choi J. 5th Int. Conf. Inf. Technol. Appl. Biomed. (2008).
