Visualizing Lempel-Ziv-Welch

Similar documents
Overview. Last Lecture. This Lecture. Next Lecture. Data Transmission. Data Compression Source: Lecture notes

EE-575 INFORMATION THEORY - SEM 092

Lempel-Ziv-Welch (LZW) Compression Algorithm

ITCT Lecture 8.2: Dictionary Codes and Lempel-Ziv Coding

Program Construction and Data Structures Course 1DL201 at Uppsala University Autumn 2010 / Spring 2011 Homework 6: Data Compression

CS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77

Data Compression. Media Signal Processing, Presentation 2. Presented By: Jahanzeb Farooq Michael Osadebey

Lempel-Ziv-Welch Compression

CIS 121 Data Structures and Algorithms with Java Spring 2018

Entropy Coding. - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic Code

Digital Image Processing

SIGNAL COMPRESSION Lecture Lempel-Ziv Coding

Abdullah-Al Mamun. CSE 5095 Yufeng Wu Spring 2013

Comparative Study of Dictionary based Compression Algorithms on Text Data

Digital Image Processing

An Efficient Implementation of LZW Compression in the FPGA

HARDWARE IMPLEMENTATION OF LOSSLESS LZMA DATA COMPRESSION ALGORITHM

EE67I Multimedia Communication Systems Lecture 4

An Improved Reversible Data-Hiding Scheme for LZW Codes

Chapter 5 Lempel-Ziv Codes To set the stage for Lempel-Ziv codes, suppose we wish to nd the best block code for compressing a datavector X. Then we ha

IMAGE COMPRESSION TECHNIQUES

A Compression Technique Based On Optimality Of LZW Code (OLZW)

Encoding. A thesis submitted to the Graduate School of University of Cincinnati in

Multimedia Systems. Part 20. Mahdi Vasighi

Journal of Computer Engineering and Technology (IJCET), ISSN (Print), International Journal of Computer Engineering

CS/COE 1501

Modeling Delta Encoding of Compressed Files

CS 202 Data Structures (Spring 2007)

MODELING DELTA ENCODING OF COMPRESSED FILES. and. and

Recitation 3 Problems

Engineering Mathematics II Lecture 16 Compression

Analysis of Parallelization Effects on Textual Data Compression

FPGA based Data Compression using Dictionary based LZW Algorithm

Image coding and compression

Ch. 2: Compression Basics Multimedia Systems

Data Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression

Worst-case running time for RANDOMIZED-SELECT

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints:

Modeling Delta Encoding of Compressed Files

CS 206 Introduction to Computer Science II

Supporting Level 2 Functionality

An On-line Variable Length Binary. Institute for Systems Research and. Institute for Advanced Computer Studies. University of Maryland

CS/COE 1501

LZW Compression. Ramana Kumar Kundella. Indiana State University December 13, 2014

THE RELATIVE EFFICIENCY OF DATA COMPRESSION BY LZW AND LZSS

Data Compression Techniques

A study in compression algorithms

2.2 Syntax Definition

Data Storage. Slides derived from those available on the web site of the book: Computer Science: An Overview, 11 th Edition, by J.

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 11, May 2014

Lossless Compression Algorithms

VIDEO SIGNALS. Lossless coding

LOSSLESS DATA COMPRESSION AND DECOMPRESSION ALGORITHM AND ITS HARDWARE ARCHITECTURE

Multimedia Systems. Part 4. Mahdi Vasighi

Noise Reduction in Data Communication Using Compression Technique

A Comprehensive Review of Data Compression Techniques

Multiple-Pattern Matching In LZW Compressed Files Using Aho-Corasick Algorithm ABSTRACT 1 INTRODUCTION

Image Compression Technique

On Data Latency and Compression

Enhancing Text Compression Method Using Information Source Indexing

Data Compression Scheme of Dynamic Huffman Code for Different Languages

2. ADDRESSING METHODS

DNA Inspired Bi-directional Lempel-Ziv-like Compression Algorithms

WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION

18.3 Deleting a key from a B-tree

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression

Department of electronics and telecommunication, J.D.I.E.T.Yavatmal, India 2

DNA Inspired Bi-directional Lempel-Ziv-like Compression Algorithms

5.1 The String reconstruction problem

Compressing Data. Konstantin Tretyakov

University of Waterloo CS240R Fall 2017 Review Problems

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding

Network Working Group Request for Comments: 1962 Category: Standards Track June 1996

Study of LZ77 and LZ78 Data Compression Techniques

Implementation of Robust Compression Technique using LZ77 Algorithm on Tensilica s Xtensa Processor

Image compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year

Massachusetts Institute of Technology. Problem Set 2 Solutions. Solution to Problem 1: Is it Over Compressed or is it Modern Art?

Ch. 2: Compression Basics Multimedia Systems

Optimized Compression and Decompression Software

1.Define image compression. Explain about the redundancies in a digital image.

Data Structures and Algorithms

The Sender- the person who wishes to communicate with. Encoding- the process whereby a sender translate the information into a message

I. Introduction II. Mathematical Context

15 July, Huffman Trees. Heaps

The Effect of Flexible Parsing for Dynamic Dictionary Based Data Compression

(Refer Slide Time: 01:12)

7: Image Compression

Simple variant of coding with a variable number of symbols and fixlength codewords.

Text Compression through Huffman Coding. Terminology

Dictionary techniques

Network Working Group Request for Comments: December 1998

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character.

Source Coding: Lossless Compression

Design and Implementation of FPGA- based Systolic Array for LZ Data Compression

Comparative data compression techniques and multi-compression results

Image Compression for Mobile Devices using Prediction and Direct Coding Approach

Data Representation. Types of data: Numbers Text Audio Images & Graphics Video

Hash Table and Hashing

CS337 Project 1 : Compression

Implementation and Optimization of LZW Compression Algorithm Based on Bridge Vibration Data

Transcription:

Visualizing Lempel-Ziv-Welch The following slides assume you have read and (more or less) understood the description of the LZW algorithm in the 6.02 notes. The intent here is to help consolidate your understanding by giving you a way to visualize the essentials of the compression and decompression phases of the algorithm. We use a string of lower-case letters to represent the message or file that is to be compressed (so we can save upper-case letters for other purposes in our visualization!). The toy example here uses the character string abcabcabcabcabcabc as the message, for illustration. The first slide of the visualization shows this message sitting at the SENDER. The characters in the message that the SENDER is examining at any time are highlighted in red, as in abcabcabcabcabcabc. As the algorithm progresses, you will see the message being compressed into a sequence of dictionary indices or pointers that are communicated to the RECEIVER, which uses these to reconstruct the original message. George Verghese (verghese@mit.edu) for MIT 6.02, Fall 2010

Upper-case letters will play multiple roles in these slides, but the following color coding will help you separate the different roles: ABC will denote the dictionary entry for the word or character string abc. You will see these dictionary entries accumulate (in an upward arc in the visualization) as more and more words are added to the dictionary at the SENDER. ABC will denote the index of (or pointer to) the dictionary entry for the word abc. These indices are what the SENDER communicates (downward in the visualization) to the RECEIVER. We will keep count of the number of such indices transmitted from SENDER to RECEIVER. The concatenation of these indices constitutes the compressed text. To compute the amount of compression obtained, compare how many binary digits (or bits) it takes to communicate the indices (e.g., at 12 bits per index, which allows a dictionary of size 2**12=4096 words), versus how many bits it would have taken to communicate the original message (e.g., at 8 bits per character, as when ASCII code is used). Finally, we shall switch (the color of) a dictionary entry from ABC to ABC once the RECEIVER has reconstructed the contents of that entry from the indices transmitted to it by the SENDER.

Recall that both SENDER and RECEIVER are initialized with dictionaries containing the elementary characters, i.e., a, b, c,... Click through the following slides slowly, taking the time to recognize the changes that happen from one slide to the next, then interpreting those changes in terms of what you know about LZW. Be sure to keep going till the end, so you see a special case that needs to be dealt with in a special way.

S E N D E R

D I C T I O N A R Y E N T R Y A I N D E X # 1 S E N T

A 1 a R E C E I V E R

A 1 a

2 a

2 a b

R E C E I V E R S D I C T. 2 a b

C 3 a b

C 3 a b c

C 3 a b c

C 3 a b c

C C 4 a b c

C C 4 a b c a b

C C 4 a b c a b

C C 4 a b c a b

C C C 5 a b c a b

C C C 5 a b c a b c a

C C C 5 a b c a b c a

C C C 5 a b c a b c a

A C C C C C 6 a b c a b c a

A C C C C C 6 a b c a b c a b c

A C C C C C 6 a b c a b c a b c

A C C C C C 6 a b c a b c a b c

A C C C C C 6 a b c a b c a b c

A C C C C C C 7 a b c a b c a b c

A C C C C C C 7 a b c a b c a b c a b c

A C C C C C C 7 a b c a b c a b c a b c

A C C C C C C 7 a b c a b c a b c a b c

A C C C C C C 7 a b c a b c a b c a b c

A C C C C C C 7 a b c a b c a b c a b c

C A C C C C C C 8 a b c a b c a b c a b c

C A C C C C C C 8 a b c a b c a b c a b c????

The RECEIVER for the first time in this transmission receives an index that it does not yet have in its dictionary, hence the question marks:???? Why isn t the word ABCA yet in the RECEIVER s dictionary? Because the SENDER has just entered the word in its dictionary at the previous time step, and there hasn t been time for the receiver to augment its dictionary with this word. More generally, this problem arises when, and only when, the index that the SENDER transmits is for a word that was just incorporated into its dictionary at the preceding step. The fix for this hinges on noting that the above situation only happens when this dictionary word has the same last letter as its first letter! The next slide makes this clear for our example:

C A C C C C C C 8 a b c a b c a b c a b c????

So the fix is as follows: When the RECEIVER gets an index for a word that is not yet in its dictionary, it completes the dictionary word that it is currently building, by appending the first character of that word to the end of the word. In our example, the RECEIVER has abc as the initial part of the dictionary word, so it appends a to get abca, then makes a dictionary entry for this, and thereby interprets the received index correctly. This is illustrated next:

C A C C C C C C 8 a b c a b c a b c a b c????

C A C C C C C C 8 a b c a b c a b c a b c a???

C A C C C C C C 8 a b c a b c a b c a b c a???

C A C C C C C C 8 a b c a b c a b c a b c a b c a

And finally:

C A C C C C C C 8 a b c a b c a b c a b c a b c a

C A C C C C C C C C 9 a b c a b c a b c a b c a b c a

C A C C C C C C C C 9

C A C C C C C C C C 9

The bottom line: Instead of sending 18 characters, we sent 9 dictionary indices. If characters cost 8 bits/character and dictionary indices cost 12 bits/index, then the compressed file had 108 bits, versus the 144 in the uncompressed file.