Image Compression using Discrete Wavelet Transform
Preston Dye
ME 535
6/2/18
Introduction

Social media is an essential part of the American lifestyle. Recent polls show that roughly 80 percent of the US population uses social media in some form. People are not only reading content, they are providing it as well. Social media giant Facebook takes in roughly 600 terabytes of data a day. Storing all of that data is difficult and expensive, so data compression is a critical component of social media on the internet. Companies like Facebook must choose wisely how much they compress the data, striking a balance between image quality and storage cost.

Discrete Fourier Transform

Signal transforms are a popular method for data compression. To begin the discussion we will look at a more common transform for signal processing, the Discrete Fourier Transform (DFT). The DFT and its computational implementation, the Fast Fourier Transform (FFT), are widely used in mathematics and engineering. The core idea is that a signal can be expressed as a combination of sinusoidal waves. The DFT takes a signal expressed as amplitude versus time and transforms it into a signal expressed as amplitude versus frequency.

Experimental or signal data is collected as a finite set of discrete data points in the time/amplitude domain. To find the underlying frequencies, the data must be transformed to the frequency/amplitude domain. The subscript j indexes the specific times t_j at which the samples were taken. The variable f_j is the value of the continuous signal at time t_j. Omega is the fundamental frequency, ω = 2π/period. The integer k indexes the multiples of the fundamental frequency, or harmonics. For N samples taken at times t_j = jT/N over a period T, the discrete Fourier transform is written (in one common normalization) as:

$$F_k = \sum_{j=0}^{N-1} f_j \, e^{-i k \omega t_j}$$

And the inverse discrete Fourier transform:

$$f_j = \frac{1}{N} \sum_{k=0}^{N-1} F_k \, e^{i k \omega t_j}$$
The following is an example of a DFT of a signal computed with the FFT. The signal runs for 1 s and is sampled at 0.02 s intervals. For the given function we would expect the FFT to identify a frequency of 12.5 Hz in the real part and 18.75 Hz in the imaginary part. Using the fft function in MATLAB, the figures below show that the FFT does correctly extract these frequencies from the sample function.

Figure 1: FFT of example signal

The DFT is also useful for denoising data. The noisy signal is transformed and the dominant frequencies are identified. The insignificant frequencies are removed and the signal is then inverse transformed back to the time domain. The same approach can be used for signal compression. First, transform the signal. Then identify the frequencies that are insignificant, which are typically the higher frequencies. Threshold or remove those frequencies. Finally, transform back to the time domain.
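The transform, threshold and inverse transform steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (dft, inverse_dft, compress_by_threshold) and the test signal are hypothetical and are not the ones used for the figures, and the transform is computed by direct summation rather than by a true FFT to keep the sketch self-contained.

```python
import cmath
import math

def dft(samples):
    """Direct O(N^2) discrete Fourier transform (no 1/N factor)."""
    n = len(samples)
    return [sum(samples[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]

def inverse_dft(coeffs):
    """Inverse transform, carrying the 1/N factor."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * j * k / n)
                for k in range(n)) / n
            for j in range(n)]

def compress_by_threshold(samples, threshold):
    """Zero every coefficient whose magnitude is below the threshold,
    then transform back to the time domain."""
    coeffs = dft(samples)
    kept = [c if abs(c) >= threshold else 0j for c in coeffs]
    restored = [value.real for value in inverse_dft(kept)]
    return kept, restored

# Assumed test signal: a strong low-frequency cosine plus a weak
# high-frequency cosine, 64 samples.
n = 64
signal = [math.cos(2 * math.pi * 4 * j / n)
          + 0.05 * math.cos(2 * math.pi * 20 * j / n)
          for j in range(n)]

kept, restored = compress_by_threshold(signal, threshold=5.0)
nonzero = sum(1 for c in kept if c != 0)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(nonzero, "of", n, "coefficients kept, max error", round(max_error, 3))
```

In this sketch only the two coefficients of the strong cosine survive the threshold, and the reconstruction error is bounded by the amplitude of the discarded component.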
Principles of Image Compression

Most images are composed of thousands of pixels, which require a large amount of storage space. When one zooms into a picture, one notices that many neighboring pixels are identical or very close in color. These redundant pixels do not provide any additional detail for the picture yet still require storage space, and as mentioned above storage space can be very expensive. Because many pixels are redundant, many transform models aim to identify these redundancies and remove them in a logical way. Talukder identified three types of redundancy that compression models try to minimize: first, spatial redundancy, the correlation between neighboring pixels mentioned previously; second, spectral redundancy, the correlation between color planes and spectral bands; and third, temporal redundancy, the correlation between adjacent frames in video applications. Talukder states it succinctly: image compression aims to reduce the number of bits needed to represent an image. (Ref 4)

Lossy vs Lossless

There are two types of image compression schemes, lossy and lossless. Lossless compression is as it sounds: the compression model reduces the storage space without losing any of the data. Because it must retain all the data, its compression ratios are typically not very high. Lossy compression typically achieves much higher compression ratios but loses some of the data in the process. With lossy compression one can usually set a compression threshold that determines how much data is lost. (Ref 4)

Image Compression Process

The essential steps for image compression are seen below. The image is first processed via a transform, either Fourier or wavelet, which will be covered in greater detail further on. The image is then quantized.
The process of quantization involves the conversion of floating point values into integers. This typically results in reconstruction error because the signal is changed. The level to which the numbers are quantized can be set, but for wavelet transforms it is typically set by the chosen wavelet method. After quantization the image matrix typically contains several redundant values. To further compress the image, an entropy encoder is employed to reduce these redundancies. (Ref 7) To decompress the image, the reverse process is followed.
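As a small illustration of the quantization step, the sketch below uses a uniform quantizer with a fixed step size. The uniform scheme, the step size and the function names are assumptions made for this example; the text above only requires that floating point values be mapped to integers.

```python
def quantize(coeffs, step):
    """Map each floating point coefficient to the nearest integer multiple of step."""
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    """Recover approximate coefficient values from the integer indices."""
    return [q * step for q in indices]

# Hypothetical transform coefficients.
coeffs = [12.7, -3.2, 0.4, 0.1, -0.3]
step = 1.0

indices = quantize(coeffs, step)        # integers that would be stored
restored = dequantize(indices, step)    # values seen after decompression
max_error = max(abs(a - b) for a, b in zip(coeffs, restored))
print(indices, restored, max_error)
```

The small coefficients all collapse to the same integer, which is the kind of redundancy the entropy encoder then exploits, while the reconstruction error stays within half a step.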
Figure 2: Image Compression Flow Chart

Entropy Encoding

One popular entropy encoder is the Huffman code, which is built from a Huffman tree. David A. Huffman identified a method to minimize code length while maintaining lossless compression. As the name describes, the data is arranged in a tree as shown in the example below. The root of the tree holds the total probability of all the symbols, which is one. To create the tree the leaves are made first, one for each distinct symbol, and the probability of each symbol is evaluated. The leaves are then ordered from lowest to highest probability. A node is created by combining the two lowest-probability leaves, and its probability is their sum. The list is then sorted again and the process continues by combining the two least probable nodes. This is repeated until a single node remains, known as the root. The final step of Huffman encoding is to walk from the root back down the tree and label every branch with either a 0 or a 1. Typically every left branch is assigned a 0 and every right branch a 1. Each symbol is then assigned the binary code read along the path from the root to its leaf, so the most probable symbols receive the shortest codes. (Ref 3) The example below is the conversion of a string into Huffman code.

Figure 3: Huffman Coding Example
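The tree construction described above can be sketched with a priority queue. This is a minimal sketch, not the implementation behind Figure 3: the function names and the input string are hypothetical, and symbol counts are used in place of probabilities, which gives the same ordering.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table {symbol: bit string} for the symbols in text."""
    counts = Counter(text)
    # Heap entries are (weight, tie_breaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) tuple (internal node).
    heap = [(weight, i, symbol)
            for i, (symbol, weight) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate one-symbol input
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least probable nodes
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def assign(tree, prefix):
        if isinstance(tree, tuple):
            assign(tree[0], prefix + "0")   # left branch gets 0
            assign(tree[1], prefix + "1")   # right branch gets 1
        else:
            codes[tree] = prefix
    assign(heap[0][2], "")
    return codes

def decode(bits, codes):
    """Invert the code table; works because Huffman codes are prefix-free."""
    lookup = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)

text = "aaaabbc"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes, encoded, len(encoded), "bits")
```

For this input the most frequent symbol receives a one-bit code and the whole string fits in 10 bits, compared with 56 bits at one byte per character.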
DFT Limitations

As the DFT became more popular in different fields, some of its limitations were identified. Specifically, the DFT does not handle sharp discontinuities in data well. Sharp discontinuities are common in images and are typically found where the image goes from dark to light. The flower picture below is an example of an image with several discontinuities. The DFT has a difficult time representing these discontinuities, which usually results in choppy images after compression. The figure below shows the image compressed at different levels. As can be seen, even with only 10% compression the image quickly becomes blurred.

Figure 4: Gray scale image example

Figure 5: FFT compression of example image
DCT and JPEG

To remedy the shortcomings of the DFT, a different transform was developed, known as the Discrete Cosine Transform (DCT). It was later adopted as the transform of choice for image compression by the Joint Photographic Experts Group, or JPEG. This form of compression is still widely used today. The details of the DCT are outside the scope of this text. As an example of the DCT, a similar compression was made with the flower image. This transform is much better at compressing the image but, as can be seen, there is still much to be desired in terms of image quality at higher compression levels.

Figure 6: DCT of example image

Why Wavelets

As seen earlier, cosine and sine based transforms have a difficult time representing discontinuities in signals or images. The real source of the issue goes back to what is known as the Heisenberg Uncertainty Principle. Though it originated as a physics principle, it also applies to the time-frequency dilemma. In this context the principle states that one cannot know exactly which frequencies are present at an exact instant in time. The way around this limitation is to analyze time intervals chosen to capture the frequencies that you are after.
When using the Fourier Transform to interpret a signal we are in either the time domain or the frequency domain. In the time domain we know the signal exactly at every time instant, but we have no frequency information. The same thing happens in the frequency domain: we can see the spectrum perfectly but we do not know when each frequency occurs. (The Wavelet Tutorial by Robi Polikar) One solution is to restrict the window over which we analyze the signal. Though this may help in some situations, we are then faced with another problem: we must determine the length of our window. When the window is too narrow we achieve good time resolution but at the expense of frequency resolution, and when the window is too wide the opposite happens.

Discrete Wavelet Transform

The Discrete Wavelet Transform (DWT) is unique in that it applies the method of multiresolution analysis, or MRA. MRA applies several different window lengths to a signal in order to capture as much detail as possible. It is designed to achieve good time but poor frequency resolution at high frequencies, and good frequency but poor time resolution at low frequencies. The reason for this is that we as humans are better able to recognize lower frequencies than higher ones. The DWT can be thought of as a way of zooming in and out of an image. As you zoom out, detail is lost and only the general image remains. For example, a single-pixel image on the interval [0,1) belongs to the vector space V^0. If the image instead has two pixels, one on [0,½) and one on [½,1), it belongs to the vector space V^1. One could continue in this manner up to the space V^j, or conversely move down the ladder to V^-j. A signal at any resolution can always be represented at the next finer resolution, just as a piecewise constant function of two parts can be rewritten as a piecewise constant function of four. The nested spaces V^j are a key ingredient of MRA.
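(Ref 4) A basis for each vector space V^j must be defined; the basis functions are known as scaling functions and are typically denoted by the symbol φ. As an example, the Haar scaling function is the following:

$$\phi(x) = \begin{cases} 1, & 0 \le x < 1 \\ 0, & \text{otherwise} \end{cases}$$

The wavelet ψ corresponding to the Haar scaling function is:

$$\psi(x) = \begin{cases} 1, & 0 \le x < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le x < 1 \\ 0, & \text{otherwise} \end{cases}$$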
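In discrete terms, the Haar scaling function acts as the average of two neighboring samples and the Haar wavelet as half of their difference. A minimal Python sketch of the resulting transform and its inverse follows. The function names are illustrative only, the averaging normalization is one of several in use, and the input length is assumed to be a power of two.

```python
def haar_step(values):
    """One Haar level: pairwise averages (approximation) and half differences (detail)."""
    pairs = list(zip(values[0::2], values[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail

def haar_transform(values):
    """Full decomposition: [final approximation, coarsest detail, ..., finest detail]."""
    approx = list(values)
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details = detail + details      # coarser details go in front
    return approx + details

def haar_inverse(coeffs):
    """Rebuild the signal by adding and subtracting the detail at each level."""
    approx = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        finer = []
        for a, d in zip(approx, detail):
            finer += [a + d, a - d]
        approx = finer
    return approx

signal = [8, 5, 3, 4]
coeffs = haar_transform(signal)
restored = haar_inverse(coeffs)
print(coeffs)    # [5.0, 1.5, 1.5, -0.5]
print(restored)  # [8.0, 5.0, 3.0, 4.0]
```

Note that a detail coefficient is negative whenever the second value of a pair is larger than the first; the sign is needed for exact reconstruction.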
Haar Wavelet Example

A wavelet transform applied to a signal results in two parts, the approximation and the detail. Each approximation and detail pair is known as a level. Typically the decomposition is applied again to the approximation, which is further coarsened, and a new detail is created at each level. As mentioned before, the approximation represents the lower frequencies and is usually a good estimate of the original signal. The detail portion holds the high frequencies and is kept so that the original signal may be reconstructed. A simple 1D matrix will be used to illustrate how the DWT is implemented. In this example we will use the Haar wavelet. This is the simplest of wavelets and consists of two steps, a positive 1 for the first ½ of the interval then a negative 1 for the second ½. Our 1D matrix is the four values [8 5 3 4]. The first approximation is the original data. The second approximation is created by taking the average of each pair of values, giving [6.5 3.5]. The detail for this first level is half the difference of each pair, [1.5 -0.5]. The third and final approximation is [5] and its detail is [1.5]. The final wavelet transform is a four-value matrix combining the final approximation and the detail coefficients, [5 1.5 1.5 -0.5]. The original matrix can be recreated by taking the approximation at each level and both adding and subtracting the corresponding detail: 5 ± 1.5 gives [6.5 3.5], then 6.5 ± 1.5 gives [8 5] and 3.5 ± (-0.5) gives [3 4].

Table 1: Haar Wavelet Example

HAAR WAVELET TRANSFORM
Resolution   Approximations   Detail Coefficients
4            [8 5 3 4]
2            [6.5 3.5]        [1.5 -0.5]
1            [5]              [1.5]

Two Dimensional Decomposition

Two dimensional decomposition is similar to one dimensional. The same scaling and wavelet functions are applied to a two dimensional signal. The two dimensional scaling function is made by multiplying the two 1D scaling functions, φ(x,y)=φ(x)φ(y).
Similarly, the 2D wavelet functions are obtained by multiplying two 1D wavelet functions, or a wavelet and a scaling function. There are three wavelet functions for the 2D case: the horizontal details Ψ(1)(x,y)=φ(x)ψ(y), the vertical details Ψ(2)(x,y)=ψ(x)φ(y) and the diagonal details Ψ(3)(x,y)=ψ(x)ψ(y). Together with the scaling function, these four functions are the required pieces for perfect reconstruction of the 2D signal. The output of the 2D scaling function, also known as the Low Low (LL) portion, can then be used to continue decomposing the image. The other three portions, horizontal (High Low or HL), vertical (LH) and diagonal (HH), are kept for reconstruction. Each filtering step is followed by downsampling by two, which the Nyquist rule permits because each filter output contains only half of the frequency band. The flow for image decomposition is as follows.

Figure 7: 2D image decomposition

The Haar decomposition of the image at level 1 is shown in the following figure. The approximation is in the upper left hand corner. The vertical detail is at the bottom left, the diagonal detail at the bottom right and the horizontal detail at the top right.

Figure 8: Decomposition example, Haar level 1
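A single level of the 2D Haar decomposition can be sketched by filtering along x within each row and then along y down each column. This is an illustrative sketch only: the function names are hypothetical, the image is a plain list of rows with even dimensions, x is taken as the column index and y as the row index, and sub-band naming conventions vary between toolboxes.

```python
def haar_step(values):
    """Pairwise averages (low pass) and half differences (high pass), downsampled by two."""
    pairs = list(zip(values[0::2], values[1::2]))
    return [(a + b) / 2 for a, b in pairs], [(a - b) / 2 for a, b in pairs]

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def filter_rows(matrix):
    """Apply haar_step to every row; return the low-pass and high-pass halves."""
    lows, highs = [], []
    for row in matrix:
        low, high = haar_step(row)
        lows.append(low)
        highs.append(high)
    return lows, highs

def filter_columns(matrix):
    """Apply haar_step down every column by filtering the rows of the transpose."""
    lows, highs = filter_rows(transpose(matrix))
    return transpose(lows), transpose(highs)

def haar2d_level(image):
    """Return (approximation, horizontal, vertical, diagonal) sub-bands."""
    low_x, high_x = filter_rows(image)             # phi(x) and psi(x)
    approx, horizontal = filter_columns(low_x)     # phi(x)phi(y), phi(x)psi(y)
    vertical, diagonal = filter_columns(high_x)    # psi(x)phi(y), psi(x)psi(y)
    return approx, horizontal, vertical, diagonal

# A 2x2 image with a horizontal edge: bright top row, dark bottom row.
edge_image = [[4, 4],
              [0, 0]]
approx, horizontal, vertical, diagonal = haar2d_level(edge_image)
print(approx, horizontal, vertical, diagonal)
```

For the horizontal edge in this example only the approximation and the horizontal detail are nonzero, which is why the detail sub-bands of a real image look like edge maps.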
Energy and Thresholding

After the image is wavelet transformed, the real compression comes from thresholding. The threshold is a set point provided by the user that determines which coefficients to keep and which to set to zero. Replacing the smaller coefficients with zeros makes the compression lossy, in that the reconstructed image will not be identical to the original. One way to determine how alike the original and the reconstructed image are is via energy. Energy in the discrete domain is the sum of the squares of all the values. The energy can be calculated before and after processing, and the ratio indicates how alike the images are.

Figure 9: Thresholding example

The threshold can be adjusted to control how much energy remains after compression. The image was first passed through a level 4 Haar wavelet decomposition. The dashed line on the left of the graph marks the threshold below which values are set to zero. The blue line represents the percentage of zeros and the purple line represents the percentage of energy retained. In the example above, with the global threshold set to 30, the retained energy is 99% of the original while 95% of the coefficients are zero.
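The two quantities plotted in the thresholding figure can be computed directly from the coefficients. The sketch below assumes a hard global threshold applied to a flat list of coefficients; the function names and the coefficient values are made up for illustration and are not taken from the flower image.

```python
def energy(values):
    """Energy in the discrete domain: the sum of the squared values."""
    return sum(v * v for v in values)

def hard_threshold(coeffs, threshold):
    """Keep coefficients whose magnitude reaches the threshold; zero the rest."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]

# Hypothetical wavelet coefficients: a few large ones carry most of the energy.
coeffs = [50.0, 12.0, -3.0, 1.0, -0.5, 0.25, 0.0, 2.0]
kept = hard_threshold(coeffs, threshold=5.0)

retained_energy = 100 * energy(kept) / energy(coeffs)
percent_zeros = 100 * kept.count(0.0) / len(kept)
print(f"energy retained: {retained_energy:.1f}%, zeros: {percent_zeros:.0f}%")
```

Even in this tiny example most of the coefficients can be zeroed while over 99% of the energy is retained, which is the behavior the threshold plot shows for the image.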
Reconstruction

After thresholding, the image is reconstructed in the reverse order of its decomposition. The compressed image is compared to the original by its compression ratio, the ratio between the uncompressed size and the compressed size. Compressing a 10 MB file to 2 MB gives a compression ratio of 10/2 = 5, often written as an explicit ratio, 5:1.

Wavelet Families

One of the most convenient things about the discrete wavelet transform is the ability to choose the type of wavelet applied to the signal. One researcher, Daubechies, created several different wavelets that are excellent at representing polynomial behavior. (Ref 1) This wavelet family is known in MATLAB as dbn, where n can vary from 1 to 10. db1 is in fact the simple Haar wavelet. The additional wavelets are the following:

Figure 10: Daubechies wavelets. Source: https://www.mathworks.com/help/wavelet/gs/introduction-to-the-wavelet-families.html

There are several other wavelet families. Two popular ones are Coiflets and Symlets, both of which were created by Daubechies. The Wavelet Analyzer in MATLAB allows one to switch between the different families and determine which best suits the image being compressed. As an example, here is the same flower image compressed using a few different families.

Figure 11: Db4 at level 4 and 18.74 global threshold
Figure 12: Sym2 at level 4 and 18.74 threshold

Conclusion

The discrete wavelet transform is a powerful tool for signal compression. The wavelet transform stems from earlier work on the Fourier Transform. Though the Fourier Transform is commonly used for signal analysis, it has a difficult time representing the discontinuities found in images. The DWT and its different wavelet families can account for these sharp discontinuities, allowing images to be processed more smoothly. They are also able to provide significantly better compression ratios.
References:
1. Daubechies I. Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics. 1988;41(7):909-996. doi: 10.1002/cpa.3160410705.
2. Mallat SG. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1989;11(7):674-693. doi: 10.1109/34.192463.
3. Huffman D. A method for the construction of minimum-redundancy codes. Reson. 2006;11(2):91-99. doi: 10.1007/BF02837279.
4. Talukder KH, Harada K. Haar wavelet based approach for image compression and quality assessment of compressed image. 2010.
5. Graps A. An introduction to wavelets. IEEE Computational Science & Engineering. 1995;2(2):50-61. doi: 10.1109/99.388960.
6. Antonini M, Barlaud M, Mathieu P, Daubechies I. Image coding using wavelet transform. IEEE Transactions on Image Processing. 1992;1(2):205-220. doi: 10.1109/83.136597.
7. Pearlman WA. Wavelet image compression. San Rafael, Calif.: Morgan & Claypool; 2013.