Analysis of Huffman and Run-length encoding Compression Algorithms on Different Image Files

Aliyu Ishola Nasiru and Afolayan Tolulope Ambibola
Department of Information and Communication Science, University of Ilorin
Email: aliyu.in@unilorin.edu.ng

Abstract

Viewing and downloading uncompressed images from the internet on mobile devices can take a long time. This makes the data plan costly and leads to an unpleasant user experience. Usually, when a virtual server is hired to host a website with images, the cost depends on the amount of storage used and the amount of data that the server sends and receives over a period of time. Image compression allows more images to be streamed to viewers without paying more for bandwidth. For this purpose, this paper studied and implemented two compression algorithms, Run-length encoding (RLE) and Huffman coding, on four image file formats: Joint Photographic Experts Group (JPEG), Bitmap Image File (BMP), Graphics Interchange Format (GIF), and Portable Network Graphics (PNG), using C#. Experimentally, the results show that RLE performs better than Huffman in compressing GIF, BMP, JPG, and PNG images, with a very low compression ratio and a high saving percentage. The only instance where Huffman performed better was on a BMP file with fewer repeating strings. RLE also compresses in less time than Huffman. It is recommended that Huffman and RLE be used when lossless compression is required. When Huffman and RLE are the available options, RLE could be considered first, but for complicated images likely to contain fewer repeating strings, Huffman should be considered.

Keywords: Run-length Encoding (RLE), Huffman Coding (HC), Image Compression, Image files.

1. Introduction

Pictures have been with us since the dawn of time. However, the way pictures are represented and displayed has changed greatly.
Originally, every picture was unique and was represented or displayed physically, such as paint on a cave wall or etchings on stone. The use of digital images has increased at a rapid pace over the past decade due to computer-generated (synthetic) images, particularly for special effects in advertising and entertainment (Shankar, 2010). Image compression plays a major role in the digital domain. The more an image is compressed, the less storage is required (Mahmud, 2012; Arora and Kumar, 2018).

Data compression is the science of reducing the amount of data used to convey information. It relies on the fact that information, by its nature, is not random but exhibits order and patterns. If that order and those patterns can be extracted, the essence of the information can be represented and transmitted using less data than would be needed for the original. Then, at the receiving end, the original can be reconstructed exactly or closely (David and Giovanni, 2010). Basically, data compression is performed by a program that uses a formula or algorithm to determine how to shrink the size of particular data (Joshi, Raval, Dandawate, Joshi and Metkar, 2014). These programs find common blocks of data that can be omitted, shrunk, removed or substituted with smaller patterns. The more repeated blocks a program finds, the more it can compress (David and Giovanni, 2010). Compressing data can save storage capacity, speed up file transfer, and decrease the cost of storage hardware and network bandwidth (Joshi et al., 2014).

Images are composed of pixels, and each pixel represents the color at a single point in the image; an image can therefore consist of millions of pixels. The richer the image, the more pixels it has, the larger its size, and the more bandwidth and space it requires. An uncompressed image, that is, an image in its raw form, is quite expensive in terms of space and bandwidth requirements. Hence, image compression, which removes redundant information from the image to save storage space and ease transfer, is needed.

Compression techniques can be categorized into two types: lossless and lossy. Lossless compression enables the original data to be reconstructed exactly as it was before compression, without the loss of a single bit of data. It is usually used for text, executable files and medical data, where the loss of words or numbers could change the information or could be harmful (Mozammil, Zakariya and Inamullah, 2012). Lossy compression, on the other hand, permanently eliminates bits of data that are redundant, unimportant or imperceptible. Lossy compression is used in graphics, audio, video, and images, where the removal of some data bits has little or no discernible effect on the representation of the content (Joshi et al., 2014). This paper presents a comparative analysis of two lossless compression algorithms, Huffman and Run-length encoding, on the image file formats JPEG, BMP, GIF and PNG, using C#.

2. Related Work

Comparisons between RLE and Huffman encoding are usually made on text; the two are rarely compared on images. In this work, the comparison is based on their performance on different image file formats. Sharma (2010) studied various compression techniques and compared them based on their usage in different applications and their advantages and disadvantages.
The work concludes that Huffman coding is easy to implement and produces optimal and compact code, but is relatively slow, depends on a statistical model of the data, is difficult to decode because of the different code lengths, and carries the overhead of the Huffman tree; it is always used in JPEG. Run-length coding is simple to implement and fast to execute, but achieves less compression than other algorithms; it is used mostly for TIFF, BMP and PCX files. Ibrahim and Mustafa (2015) compared Huffman and RLE using a C++ program to compress a set of text files, and the results show that Huffman performs better than RLE on all types of text file. Shankar (2010) compared Run-length coding and Huffman on a single image; based on the results, it was concluded that RLE is very easy to implement but would not necessarily reduce the size of an image, and that a greater compression ratio can be achieved in a crowded image. Huffman coding can provide optimal compression and error-free decompression. Kodituwakku and Amarasinghe (2010) and Maan (2013) opined that in most cases Huffman performs better than RLE on text files and images. The interest of this work is to see on which image file formats Huffman outperforms RLE. Yuan, Guo, Sun and Ju (2016) proposed a power-efficient System-on-a-Chip test data compression method using alternating statistical run-length coding. Experimental results show that a high compression ratio, low scan-in test power dissipation and little extra area overhead during System-on-a-Chip scan testing were obtained. Shukla and Gupta (2015) combined DCT and run-length encoding for image compression; the results show that high compression rates are achieved with a visually negligible difference between compressed and original images.

3. Methodology

In this work, two popular data compression algorithms are implemented, analyzed and compared. Performance is measured using the following parameters: compression ratio, saving percentage and computational time. The file formats are JPEG, BMP, GIF, and PNG.
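The paper does not state formulas for the first two parameters. Most values reported in Tables 1 and 2 are consistent with the definitions assumed in the Python sketch below: compression ratio as compressed size divided by original size, and saving percentage as the share of the original size that was removed. The function names are illustrative and are not taken from the authors' C# implementation.

```python
def compression_ratio(original_kb: float, compressed_kb: float) -> float:
    """Compressed size divided by original size (assumed definition; lower is better)."""
    if original_kb <= 0:
        raise ValueError("original size must be positive")
    return compressed_kb / original_kb


def saving_percentage(original_kb: float, compressed_kb: float) -> float:
    """Percentage of the original size removed by compression (assumed definition)."""
    return (1.0 - compression_ratio(original_kb, compressed_kb)) * 100.0


if __name__ == "__main__":
    # Row 1 of Table 1 (Huffman): 147 kb compressed to 105 kb.
    print(round(compression_ratio(147, 105), 2))  # 0.71
    print(round(saving_percentage(147, 105), 1))  # 28.6
```

Under these definitions a ratio close to 0 and a saving percentage close to 100 both indicate strong compression, which matches how the results are read in Section 4.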

3.1 RLE Algorithm (Run Length Encoding)

Run-length encoding is a data compression algorithm that is supported by most bitmap file formats, such as TIFF, BMP, and PCX. RLE can be applied to any type of data regardless of its information content, but the content of the data affects the compression ratio achieved. Although most RLE algorithms cannot achieve the high compression ratios of the more advanced compression methods, RLE is both easy to implement and quick to execute, making it a good alternative to either using a complex compression algorithm or leaving image data uncompressed. RLE works by reducing the physical size of a repeating string of characters. This repeating string, called a run, is typically encoded into two bytes (Ibrahim and Mustafa, 2015).

RLE Pseudocode

Given a binary image of dimension n x m, with a background pixel intensity of 0 and a foreground intensity of 1:

    set color to 0
    set count to 0
    for each pixel in the image
        if current pixel not equal to color
            write count
            set color to current pixel color
            set count to 1
        else
            increment count by 1
    if count not equal to 0
        write count    // record last run

3.2 Huffman Algorithm

The Huffman algorithm generates a variable-length code in such a way that high-frequency symbols are represented with a minimum number of bits and low-frequency symbols are represented by a relatively high number of bits (Yadav, 2006). Huffman coding is an entropy encoding algorithm used for lossless data compression in computer science and information theory. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file), where the code table has been derived in a particular way based on the estimated probability of occurrence of each possible value of the source symbol.
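To make the variable-length idea concrete, the following Python sketch builds a Huffman code table for a byte string and checks that decoding restores the input. It is only an illustration under stated assumptions: the paper's C# implementation is not shown, the sketch uses a binary heap from the standard library (heapq) to pick the two lowest-weight nodes, and the names huffman_codes, encode and decode are hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict:
    """Map each byte value in data to a prefix-free bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, unique_id, tree). The unique id breaks ties,
    # so the tree objects themselves are never compared.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # lowest weight
        w2, _, t2 = heapq.heappop(heap)  # second lowest weight
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    root = heap[0][2]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a byte value
            codes[node] = prefix

    walk(root, "")
    return codes


def encode(data: bytes, codes: dict) -> str:
    """Concatenate the code of every byte into one bit string."""
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: dict) -> bytes:
    """Invert the code table and read the bit string one code at a time."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)


if __name__ == "__main__":
    sample = b"aaaabbc"
    table = huffman_codes(sample)
    bits = encode(sample, table)
    print(len(bits), "bits instead of", 8 * len(sample))
    print(decode(bits, table) == sample)
```

For the sample b"aaaabbc", the most frequent byte receives a 1-bit code and the two rarer bytes receive 2-bit codes, so the payload is 10 bits instead of 56. A real file format would also have to store the code table or the symbol frequencies, which is the tree overhead mentioned in Section 2.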
The following are the steps of the Huffman algorithm:

Step 1: Compute or collect the total number of symbols and their relative frequencies.
Step 2: Arrange all the symbols in decreasing order of their frequencies.
Step 3: Construct the Huffman tree from the list of symbols.

Creating the tree:
1. Start with as many leaves as there are symbols.
2. Enqueue all leaf nodes into the first queue (by probability in increasing order, so that the least likely item is at the head of the queue).
3. While there is more than one node in the queues:
   1. Dequeue the two nodes with the lowest weight.
   2. Create a new internal node, with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight.
   3. Enqueue the new node into the rear of the second queue.
4. The remaining node is the root node; the tree has now been generated.

Step 4: Assign the codes.

Figure 1: The flowchart of Run-length Encoding (Murray and vanRyper, 1996)

4. Result and Discussion

The lower the compression ratio, the better the algorithm performs. RLE has a very low compression ratio on all image files except Hildebrantmed.bmp, for which the ratio is high for RLE (0.81) and lower for Huffman (0.43). From Figure 3, RLE has a better saving percentage than Huffman, although Huffman runs neck and neck with it on the GIF images; both do very well on GIF. Huffman does not perform well on JPEG images compared to RLE. The saving percentage increases when the file size after compression is far smaller than the original file size; that is, the smaller the difference between file size before and after compression, the lower the saving percentage. Hildebrantmed.bmp is still the only image on which Huffman has an advantage over RLE.

Table 4 shows the compressed file sizes. The table shows that RLE clearly compresses better than Huffman, except on the large, complicated image with fewer repeating strings, where Huffman does better by compressing a 470 kb file to 204 kb, while RLE compresses the same picture to 381 kb. The outputs of the two algorithms on .gif images are very close compared to the differences on the other image file formats.

Table 1: Compression rate for Huffman Coding

S/N  File Name        Type  Size (kb)  Output Size (kb)  Compression Ratio  Saving (%)  Decompressed Size (kb)
1    05 TIFF1d        Jpg   147        105               0.71               28.6        147
2    BisonTeton       Jpg   93         75                0.81               19.4        93
3    Yoyin            Jpg   2470       2250              0.91               8.9         2470
4    Sciurusvulgaris  Png   1599       205               0.13               87.2        1599
5    Lady             Png   514        51                0.1                90.1        514
6    latest-1         Png   81         15                0.19               81.5        81
7    Tiger-1          Bmp   655        67                0.1                89.8        655
8    Hildebrantmed    Bmp   470        204               0.43               56.6        470
9    Adafruit         Bmp   226        56                0.25               75.2        226
10   Earth            Gif   1319       15                0.01               98.9        1319
11   PeterPan         Gif   380        11                0.03               97.1        380
12   SpongeBob        Gif   48         9                 0.19               81.3        48

Table 2: Compression rate for Run-length Encoding

S/N  File Name        Type  Size (kb)  Output Size (kb)  Compression Ratio  Saving (%)  Decompressed Size (kb)
1    05 TIFF1d        Jpg   147        21                0.14               85.7        147
2    BisonTeton       Jpg   93         24                0.26               74.2        93
3    Yoyin            Jpg   2470       243               0.1                90.1        2470
4    Sciurusvulgaris  Png   1599       33                0.2                97.9        1599
5    Lady             Png   514        15                0.03               97.1        514
6    latest-1         Png   81         7                 0.09               91.2        81
7    Tiger-1          Bmp   655        17                0.03               97.4        655
8    Hildebrantmed    Bmp   470        381               0.81               18.9        470
9    Adafruit         Bmp   226        11                0.05               95.1        226
10   Earth            Gif   1319       13                0.01               99          1319
11   PeterPan         Gif   380        9                 0.02               97.6        380
12   SpongeBob        Gif   48         8                 0.12               83.3        48
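The weak RLE result on Hildebrantmed.bmp, the image with fewer repeating strings, can be reproduced in miniature. The Python sketch below is not the authors' C# implementation; it assumes the simple two-byte (count, value) form of RLE described in Section 3.1, with runs capped at 255 so that the count fits in one byte, and the names rle_encode and rle_decode are hypothetical.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as repeated (count, value) byte pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the next byte matches, up to 255 per pair.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)


if __name__ == "__main__":
    run_rich = b"\x00" * 200 + b"\xff" * 100   # two long runs
    run_poor = bytes(range(256))               # no repeated neighbours
    for name, data in (("run-rich", run_rich), ("run-poor", run_poor)):
        packed = rle_encode(data)
        assert rle_decode(packed) == data      # lossless round trip
        print(name, len(data), "->", len(packed), "bytes")
```

The run-rich input shrinks from 300 bytes to 4, while the run-poor input doubles from 256 bytes to 512, because every byte becomes a pair with a count of 1. This is why the benefit of RLE depends so strongly on how many repeated runs the image data contains.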

Figure 2: Compression ratio comparison between RLE and Huffman (compression ratio, on a scale of 0 to 1, for each of the 12 test files, labelled by file size and file type)

Figure 3: Saving Percentage comparison between RLE and Huffman (saving percentage for each of the 12 test files, labelled by file size and file type)

Table 3: Compression time for RLE and Huffman

S/N  File Type  File Size (kb)  RLE Time (secs)  Huffman Time (secs)
1    Jpg        147             7                25
2    Jpg        93              4                23
3    Jpg        2470            264              587
4    Png        1599            6                125
5    Png        514             3                22
6    Png        81              2                5
7    Bmp        655             5                14
8    Bmp        470             4                37
9    Bmp        226             2                7
10   Gif        1319            2                8
11   Gif        380             2                7
12   Gif        48              1                4

Table 3 displays the compression time comparison between both algorithms. As can be seen, RLE compresses faster than Huffman on all image files, regardless of size.

Table 4: Compression output file size for both RLE and Huffman

S/N  File Type  File Size (kb)  RLE Output File Size (kb)  Huffman Output File Size (kb)
1    Jpg        147             21                         105
2    Jpg        93              24                         75
3    Jpg        2470            243                        2250
4    Png        1599            33                         205
5    Png        514             15                         51
6    Png        81              7                          15
7    Bmp        655             17                         67
8    Bmp        470             381                        204
9    Bmp        226             11                         56
10   Gif        1319            13                         15
11   Gif        380             9                          11
12   Gif        48              8                          9

Figure 4: JPG image after compression and decompression with Huffman

Figure 5: BMP image after compression and decompression with RLE

5. Conclusion

This paper studied and implemented two well-known compression algorithms, Run-length Encoding (RLE) and Huffman Coding (HC). Both algorithms were tested on the following types of image file: GIF, BMP, JPG, and PNG. Experimentally, the results showed that RLE performs better than Huffman in compressing GIF, BMP, JPG, and PNG images, with a very low compression ratio and a high saving percentage. The only instance where Huffman performed better was on a BMP file with fewer repeating strings. RLE also compresses in less time than Huffman. All things being equal, it is concluded that RLE performs better than Huffman based on the stipulated parameters, which are compression ratio, saving percentage, compressed file size, and compression time. In the future, we intend to implement the two algorithms on various video file formats.

REFERENCES

Arora, S., & Kumar, G. (2018). Review of Image Compression Techniques. International Journal of Recent Research, 5(1), 185-188.
David, S., & Giovanni, M. (2010). Handbook of Data Compression. New York: Springer.
Ibrahim, A. M. A., & Mustafa, M. E. (2015). Comparison between (RLE and Huffman) algorithms for lossless data compression. IJITR, 3(1), 1808-1812.
Joshi, M. A., Raval, M. S., Dandawate, Y. H., Joshi, K. R., & Metkar, S. P. (2014). Image and Video Compression: Fundamentals, Techniques, and Applications. CRC Press.
Kodituwakku, S. R., & Amarasinghe, U. S. (2010). Comparison of lossless data compression algorithms for text data. Indian Journal of Computer Science and Engineering, 1(4), 416-425.
Maan, A. J. (2013). Analysis and comparison of algorithms for lossless data compression. International Journal of Information and Computation Technology, 3(3), 139-146.
Mahmud, S. (2012). An improved data compression method for general data. International Journal of Scientific & Engineering Research, 3(3), 2.
Zakariya, S. M., & Inamullah, M. (2012). Analysis of video compression algorithms on different video files. In Computational Intelligence and Communication Networks (CICN), 2012 Fourth International Conference on (pp. 257-262). IEEE.
Murray, J. D., & vanRyper, W. (1996). Encyclopedia of Graphics File Formats: The Complete Reference on CD-ROM with Links to Internet Resources. O'Reilly.
Run-length Encoding Pseudocode. Retrieved 12 May 2017 from http://www.cs.unca.edu/~reiser/imaging/rle.html
Shankar, U. B. (2010). Image compression techniques. International Journal of Information Technology and Knowledge Management, 2(2), 265-269.
Sharma, M. (2010). Compression using Huffman coding. IJCSNS International Journal of Computer Science and Network Security, 10(5), 133-141.
Shukla, R., & Gupta, N. K. (2015). Image Compression through DCT and Huffman Coding Technique. International Journal of Current Engineering and Technology, 5(3), 1942-1946.
Yadav, D. S. (2006). Foundations of Information Technology. New Delhi: New Age International.
Yuan, H., Guo, K., Sun, X., & Ju, Z. (2016). A Power Efficient Test Data Compression Method for SoC using Alternating Statistical Run-Length Coding. Journal of Electronic Testing, 32(1), 59-68.