2017 IJSRST | Volume 3 | Issue 8 | Print ISSN: 2395-6011 | Online ISSN: 2395-602X | Themed Section: Science and Technology

Comparative Analysis of Sequitur Algorithm with Adaptive Huffman Coding Algorithm on Text File Compression with Exponential Method

Muhammad Fadlan 1, Surya Darma Nasution 2, Fadlina 3, Saidi Ramadan Siregar 4, Pristiwanto 5
1 Research Scholar, STMIK Budi Darma, Medan, Indonesia
2,3,4,5 Department of Computer Engineering, STMIK Budi Darma, Medan, Indonesia

IJSRST173869 | Received: 10 Nov 2017 | Accepted: 24 Nov 2017 | November-December 2017 [(3)8: 349-353]

ABSTRACT

This research analyzes and compares the Sequitur algorithm with the adaptive Huffman coding algorithm for text file compression, in order to determine which algorithm is more effective and efficient. The compressed results of the two algorithms are compared with the exponential comparison method so that the performance of each algorithm can be assessed in terms of compression ratio (CR), space savings (SS), and ratio of compression (RC). Text file compression reads the string currently contained in the text file: the adaptive Huffman coding algorithm converts the text into bit form, while the Sequitur algorithm rewrites the file on the basis of a grammar. The test results show that the adaptive Huffman coding algorithm performs no better than the Sequitur algorithm.

Keywords: Compression, Text File, Adaptive Huffman Coding, Sequitur, Exponential.

I. INTRODUCTION

As technology grows, so does the amount of data and the number of files we want to store or send, but the memory capacity we have is sometimes not commensurate with the data we want to save. The data is therefore compressed before it is stored, so that its size becomes smaller. If the data can be compressed to a size smaller than the original, the same memory can automatically hold more data, and transmission becomes faster and saves the time required. Nowadays a great deal of software is used to handle data compression problems. In the data compression process there are several things to consider: processing time (the time that elapses while data is compressed and decompressed), ratio (the size of the data after compression and decompression), completeness (the completeness of the data after the files have been compressed and decompressed), and space savings (the percentage difference between the data size after compression and the data size before compression). Data compression plays an essential role in file storage and distribution systems, and it is used for multimedia content, text documents, and database records [1].

The adaptive Huffman coding algorithm is an extension of Huffman coding in which the file is encoded dynamically, which has a significant impact on the effectiveness of the tree as an encoder. Unlike the adaptive Huffman coding algorithm, which works character by character, the Sequitur algorithm operates by enforcing its digram uniqueness and rule utility constraints. In previous research, Craig G. Nevill-Manning and Ian H. Witten showed that SEQUITUR addresses this problem by operating incrementally; moreover, the simple structure of the method allows it to operate in space and time that are linear in the size of the input. Their implementation processes about 50,000 symbols per second and has been applied in practice [2].

II. METHODS AND MATERIAL

2.1 Data Compression
Compression is the process of reducing the size of data so as to produce a compact digital representation that still conveys the quantity of information contained in the data. There are two types of compression: lossy compression, which discards specific parts of the data, and lossless compression, in which the information contained in the compressed result is identical to the information in the data before compression [3][4][5].

2.2 Text File
A text file is a computer file composed of a series of lines of text. Files in this category contain a sequence of characters without any visual formatting information. Their contents are usually personal notes or lists, articles, books, and so on. Text files are similar to the files produced by word processing programs, whose principal content is textual [6].

2.3 Sequitur Algorithm
The Sequitur algorithm is based on the concept of a context-free grammar. The algorithm distinguishes non-terminal symbols and terminal symbols; both kinds of symbols are elements of the production rules (the rules used to construct a sentence). A non-terminal symbol is placed on the left-hand side and a string of terminal and non-terminal symbols on the right-hand side; the non-terminal symbol on the left becomes the name of the string on the right. Sequitur builds its grammar using two principles (a minimal illustrative sketch is given after Section 2.4):
1. Digram uniqueness: no pair of adjacent symbols may appear more than once.
2. Rule utility: every rule must be used more than once; a rule used only once is removed [7].

2.4 Adaptive Huffman Coding
Adaptive Huffman coding was first introduced by Faller and Gallager. Knuth contributed improvements to their algorithm and produced what is known as the FGK algorithm, and Vitter later introduced the most recent version of adaptive Huffman coding. All of these methods are define-word schemes that determine the mapping from source messages to codewords based on estimates of the source message probabilities. The code is adaptive and changes so that it remains optimal for the current estimates. In this sense, adaptive Huffman coding responds to locality: the encoder learns the characteristics of the source, and the decoder must learn the same characteristics by updating its Huffman tree so that it stays in sync with the encoder [7].

The main idea of compression and decompression is to begin with an empty Huffman tree and to modify it as each symbol is read and processed. The compressor and decompressor modify the tree in the same way, so that at any point in the process they use the same codes, even though those codes may change from step to step; the decoder thus mirrors the encoder's operations. Initially, compression begins with an empty Huffman tree and no pre-existing symbols. The first symbol is written to the output stream in uncompressed form and then added to the tree. Whenever a symbol that is already in the tree is encountered again, its frequency is increased by one and the tree is modified; the tree is then rechecked to see whether it is still a Huffman tree (i.e., still gives the best code), and if it is not, the tree is rearranged, which causes the codes to change [7].
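As an illustration of the digram uniqueness principle described in Section 2.3, the following minimal Python sketch repeatedly replaces a repeated pair of adjacent symbols with a fresh non-terminal and records the corresponding rule. It is a simplified, non-incremental rendering of the idea (the helper name compress_digrams and the rule names A, B, ... are introduced here only for illustration), not the linear-time Sequitur algorithm of Nevill-Manning and Witten [2].

def compress_digrams(text):
    symbols = list(text)          # working sequence of terminal/non-terminal symbols
    rules = {}                    # non-terminal -> the digram it stands for
    next_name = iter("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

    while True:
        # count adjacent symbol pairs in the current sequence
        counts = {}
        for i in range(len(symbols) - 1):
            pair = (symbols[i], symbols[i + 1])
            counts[pair] = counts.get(pair, 0) + 1
        repeated = [p for p, c in counts.items() if c > 1]
        if not repeated:
            break                 # digram uniqueness holds: no pair appears twice
        pair = repeated[0]
        name = next(next_name)
        rules[name] = "".join(pair)
        # replace every occurrence of the repeated digram with the new non-terminal
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return "".join(symbols), rules

print(compress_digrams("gadjahmada"))   # ('gAjahmAa', {'A': 'ad'})

For the string used later in Section 3.1, the sketch produces the compressed sequence "gAjahmAa" together with the single rule A -> "ad".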
2.5 Exponential Method
Using the exponential comparison method involves the following steps:
1. Develop the alternative decisions to be compared.
2. Determine the criteria or decision comparisons that are important for the evaluation.
3. Determine the degree of importance (weight) of each decision criterion.
4. Assess all of the alternatives on each criterion.
5. Calculate the score or total value of each alternative.
6. Determine the priority order of the decisions based on the score or total value of each alternative [8].

The score of each alternative in the exponential comparison method is calculated as

    TNi = sum for j = 1 to m of (RKij)^TKKj

where:
TNi  : total value of the i-th alternative
RKij : relative importance (value) of the j-th criterion for decision choice i
TKKj : degree of importance of the j-th decision criterion; TKKj > 0 and an integer
m    : number of decision criteria
n    : number of decision choices
i    : 1, 2, 3, ..., n
j    : 1, 2, 3, ..., m
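The scoring formula can be made concrete with a short sketch. The function below (a helper named exponential_score, written here only for illustration) raises each criterion value to its weight and sums the results, which is exactly the TNi formula above.

def exponential_score(values, weights):
    """Exponential comparison method: TNi = sum over criteria j of (RKij ** TKKj).
    values[j] is the value RKij of the alternative on criterion j;
    weights[j] is the degree of importance TKKj of criterion j."""
    return sum(value ** weight for value, weight in zip(values, weights))

# For three criteria weighted 0.25, 0.25 and 0.5 this evaluates
# values[0]**0.25 + values[1]**0.25 + values[2]**0.5 for one alternative.
print(exponential_score([3.2, 0.3, 0.7], [0.25, 0.25, 0.5]))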

III. RESULTS AND DISCUSSION

A comparative analysis of the Sequitur algorithm and the adaptive Huffman coding algorithm is carried out to find out which algorithm compresses a text file more effectively. The following example shows the process for the text "gadjahmada".

Table I. Size of the string before compression: the ten 8-bit characters of "gadjahmada" occupy 80 bits in total.

3.1 Sequitur Algorithm
Suppose we want to compress the string "gadjahmada" with the Sequitur algorithm; the original size of the string is given in Table I.

Table II. Sequitur Algorithm

As shown in Table II, the string "gadjahmada" is scanned for pairs of adjacent characters that appear more than once. The pair "ad" appears more than once, so it is processed: "ad" is replaced by the non-terminal A (rule A -> ad), producing the new string "gAjahmAa". The string is then checked again for repeated character pairs; since there are none, the process is complete.

Table III. Total Bit x Frequency

Char   Freq   Dec   Binary     Bit   Bit x Freq
g      1      103   01100111   8     8
A      2      65    01000001   8     16
j      1      106   01101010   8     8
a      2      97    01100001   8     16
h      1      104   01101000   8     8
m      1      109   01101101   8     8
Total Bit x Frequency                64

3.2 Adaptive Huffman Coding
The adaptive Huffman coding algorithm is a development of the Huffman algorithm. Suppose we want to compress the string "gadjahmada"; the steps are as follows:
1. The character g is inserted into the tree, giving a frequency value of 1.
2. The character a is inserted into the tree, creating a new tree branch; the total frequency becomes 2.
3. The character d is inserted, creating a branch below the character a; the total frequency becomes 3.
4. The character j is inserted, creating a new branch below the character d, and the character g is moved to the left so that the tree remains a Huffman tree.
5. The character a is inserted into the tree; because a has been input before, its frequency increases from 1 to 2, and the tree is adjusted so that it remains a Huffman tree. The character h is then input, and the total frequency becomes 6.
6. The character m is input, creating a new node and bringing the total frequency to 7, as shown in Figure 1; the character a is then input, its frequency is increased to 3, and the total frequency becomes 8.
7. The character d is input, its frequency is increased to 2, and the total frequency becomes 9.
8. The character a is input, its frequency is increased to 4, and the total frequency becomes 10.
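The incremental tree updates in steps 1-8 above are what the FGK and Vitter algorithms implement efficiently. As a much simpler stand-in, the following Python sketch keeps running symbol counts and rebuilds a Huffman code table from scratch after every symbol: an unseen character is emitted as its raw 8-bit code, and a previously seen character as its current Huffman code. The helper names (huffman_codes, adaptive_encode) are introduced here only for illustration, and the sketch omits the escape (NYT) code that a real adaptive coder emits before raw bytes; it illustrates the adaptive idea, not the actual FGK update used in the paper.

import heapq

def huffman_codes(freqs):
    # Rebuild a Huffman code table from scratch for the current frequencies.
    if len(freqs) == 1:                       # only one distinct symbol seen so far
        return {next(iter(freqs)): "0"}
    heap = [[f, i, [sym, ""]] for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tick = len(heap)                          # unique tie-breaker for the heap
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]           # prepend 0 in the lighter subtree
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]           # prepend 1 in the heavier subtree
        tick += 1
        heapq.heappush(heap, [lo[0] + hi[0], tick] + lo[2:] + hi[2:])
    return {sym: code for sym, code in heap[0][2:]}

def adaptive_encode(text):
    # Simplified adaptive coding: the model (symbol counts) is updated after each
    # symbol, so an encoder and a decoder that follow the same rule stay in sync.
    freqs, out = {}, []
    for ch in text:
        if ch in freqs:
            out.append(huffman_codes(freqs)[ch])   # code from counts seen so far
        else:
            out.append(format(ord(ch), "08b"))     # first occurrence: raw 8 bits
        freqs[ch] = freqs.get(ch, 0) + 1           # update the model after emitting
    return " ".join(out)

print(adaptive_encode("gadjahmada"))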

Figure 1. Huffman tree for the string "gadjahmada"

The compression result of the string "gadjahmada" is 100 11 00 101 0101, which amounts to 70 bits.

3.3 Comparison With Exponential Method
To compare the two algorithms we must first define the criteria. The criteria for the two algorithms are as follows:
1. Ratio of Compression (RC)
   a. RC Adaptive Huffman Coding = 80/25 = 3.2
   b. RC Sequitur = 80/64 = 1.25
2. Compression Ratio (CR)
   a. CR Adaptive Huffman Coding = 70/80 x 100% = 31.2%
   b. CR Sequitur = 64/80 x 100% = 80%
3. Space Savings (SS)
   a. SS Adaptive Huffman Coding = 100% - 31.2% = 68.8%
   b. SS Sequitur = 100% - 80% = 20%

Table IV. Criteria

Criterion      Weight   AHC   Sequitur
RC             0.25     3.2   1.2
CR             0.25     0.3   0.8
SS             0.5      0.7   0.2
Total weight   1

Point of the AHC algorithm      = (3.2)^0.25 + (0.3)^0.25 + (0.7)^0.25 = 2.9
Point of the Sequitur algorithm = (1.2)^0.25 + (0.8)^0.25 + (0.2)^0.25 = 2.6

Based on this comparison of the two algorithms with the exponential comparison method, the result is that the Sequitur algorithm is more efficient for compressing text files than the adaptive Huffman coding algorithm.

3.4 Experimental Results
In the experiment, the Sequitur algorithm and the adaptive Huffman coding algorithm are compared with an application in which there are buttons to load a text file, compress it with either algorithm, and compare the two algorithms, as shown in Figure 2.

Figure 2. Comparison process with the application
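The Sequitur figures in Section 3.3 follow directly from the uncompressed size (80 bits) and the compressed size (64 bits). Assuming RC = original size / compressed size, CR = compressed size / original size x 100%, and SS = 100% - CR, a small helper such as the following (a compression_metrics function written here only for illustration, with sizes in bits) reproduces them:

def compression_metrics(original_bits, compressed_bits):
    rc = original_bits / compressed_bits           # Ratio of Compression
    cr = compressed_bits / original_bits * 100     # Compression Ratio, in percent
    ss = 100 - cr                                  # Space Savings, in percent
    return rc, cr, ss

print(compression_metrics(80, 64))   # (1.25, 80.0, 20.0)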

IV. CONCLUSION

Based on the exponential comparison method, the Sequitur algorithm is more effective at compressing a text file than the adaptive Huffman coding algorithm. The Sequitur algorithm compresses when there is repetition of words in the input text file, and it stops processing when the non-terminal symbols in use reach non-terminal Z. The adaptive Huffman coding algorithm compresses by building a Huffman tree whose shape is changed as the binary codes change; in adaptive Huffman coding the tree keeps growing and stops only when all of the text has been entered.

V. REFERENCES

[1] S. Porwal, Y. Chaudhary, J. Joshi, and M. Jain, "Data Compression Methodologies for Lossless Data and Comparison between Algorithms," vol. 2, no. 2, pp. 142-147, 2013.
[2] C. G. Nevill-Manning and I. H. Witten, "Identifying Hierarchical Structure in Sequences: A Linear-Time Algorithm," J. Artif. Intell. Res., vol. 7, pp. 67-82, 1997.
[3] Darma Putra, Pengolahan Citra. Yogyakarta: C.V. ANDI OFFSET, 2010.
[4] S. D. Nasution and Mesran, "Goldbach Codes Algorithm for Text Compression," vol. 4, no. December, pp. 43-46, 2016.
[5] S. D. Nasution, G. L. Ginting, M. Syahrizal, and R. Rahim, "Data Security Using Vigenere Cipher and Goldbach Codes Algorithm," Int. J. Eng. Res. Technol., vol. 6, no. 1, pp. 360-363, 2017.
[6] E. Jubilee, Rahasia Manajemen File. Elex Media Komputindo, 2010.
[7] D. Salomon and G. Motta, Handbook of Data Compression. Springer.
[8] Marimin, Pengambilan Keputusan Kriteria Majemuk. Jakarta: Grasindo, 2004.