Overview. Last Lecture. This Lecture. Next Lecture. Data Transmission. Data Compression Source: Lecture notes

Similar documents
ITCT Lecture 8.2: Dictionary Codes and Lempel-Ziv Coding

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding

Lossless Compression Algorithms

Compressing Data. Konstantin Tretyakov

EE67I Multimedia Communication Systems Lecture 4

Entropy Coding. - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic Code

Lempel-Ziv-Welch (LZW) Compression Algorithm

Visualizing Lempel-Ziv-Welch

MCS-375: Algorithms: Analysis and Design Handout #G2 San Skulrattanakulchai Gustavus Adolphus College Oct 21, Huffman Codes

EE-575 INFORMATION THEORY - SEM 092

15 July, Huffman Trees. Heaps

Dictionary techniques

Engineering Mathematics II Lecture 16 Compression

More Bits and Bytes Huffman Coding

CIS 121 Data Structures and Algorithms with Java Spring 2018

Huffman Codes (data compression)

Lossless compression II

SIGNAL COMPRESSION Lecture Lempel-Ziv Coding

Modeling Delta Encoding of Compressed Files

MODELING DELTA ENCODING OF COMPRESSED FILES. and. and

Data Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression

Data Compression. Media Signal Processing, Presentation 2. Presented By: Jahanzeb Farooq Michael Osadebey

Multimedia Systems. Part 20. Mahdi Vasighi

A Comprehensive Review of Data Compression Techniques

Modeling Delta Encoding of Compressed Files

EE 368. Weeks 5 (Notes)

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints:

CS 206 Introduction to Computer Science II

Algorithms Dr. Haim Levkowitz

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression

IMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I

Greedy Algorithms CHAPTER 16

Digital Image Processing

Data Compression Techniques

Algorithms and Data Structures CS-CO-412

IMAGE COMPRESSION TECHNIQUES

LZW Compression. Ramana Kumar Kundella. Indiana State University December 13, 2014

University of Waterloo CS240R Fall 2017 Solutions to Review Problems

University of Waterloo CS240 Spring 2018 Help Session Problems

Ch. 2: Compression Basics Multimedia Systems

IMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson

Text Compression through Huffman Coding. Terminology

An Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security

University of Waterloo CS240R Winter 2018 Help Session Problems

Repetition 1st lecture

Data Compression 신찬수

Chapter 7 Lossless Compression Algorithms

COSC431 IR. Compression. Richard A. O'Keefe

University of Waterloo CS240R Fall 2017 Review Problems

Text Compression. Jayadev Misra The University of Texas at Austin July 1, A Very Incomplete Introduction To Information Theory 2

Second Semester - Question Bank Department of Computer Science Advanced Data Structures and Algorithms...

Digital Image Processing

7: Image Compression

Department of electronics and telecommunication, J.D.I.E.T.Yavatmal, India 2

CSE 143 Lecture 22. Huffman Tree

Greedy Algorithms. Alexandra Stefan

Journal of Computer Engineering and Technology (IJCET), ISSN (Print), International Journal of Computer Engineering

Greedy Algorithms and Huffman Coding

Intro. To Multimedia Engineering Lossless Compression

CSE100. Advanced Data Structures. Lecture 12. (Based on Paul Kube course materials)

6. Finding Efficient Compressions; Huffman and Hu-Tucker

Design and Analysis of Algorithms

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character.

Image Coding and Data Compression

CS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77

Data Compression Scheme of Dynamic Huffman Code for Different Languages

FPGA based Data Compression using Dictionary based LZW Algorithm

Improving LZW Image Compression

An On-line Variable Length Binary. Institute for Systems Research and. Institute for Advanced Computer Studies. University of Maryland

Image coding and compression

15 Data Compression 2014/9/21. Objectives After studying this chapter, the student should be able to: 15-1 LOSSLESS COMPRESSION

You can say that again! Text compression

Data and information. Image Codning and Compression. Image compression and decompression. Definitions. Images can contain three types of redundancy

DEFLATE COMPRESSION ALGORITHM

Final Exam Solutions

Fundamentals of Video Compression. Video Compression

Program Construction and Data Structures Course 1DL201 at Uppsala University Autumn 2010 / Spring 2011 Homework 6: Data Compression

Encoding. A thesis submitted to the Graduate School of University of Cincinnati in

Lecture: Analysis of Algorithms (CS )

Data Structures and Algorithms

4.8 Huffman Codes. These lecture slides are supplied by Mathijs de Weerd

Chapter 16: Greedy Algorithm

Information Retrieval. Chap 7. Text Operations

An undirected graph is a tree if and only of there is a unique simple path between any 2 of its vertices.

THE RELATIVE EFFICIENCY OF DATA COMPRESSION BY LZW AND LZSS

CS/COE 1501

Compression; Error detection & correction

DNA Inspired Bi-directional Lempel-Ziv-like Compression Algorithms

Linked Structures Songs, Games, Movies Part IV. Fall 2013 Carola Wenk

Huffman Coding. Version of October 13, Version of October 13, 2014 Huffman Coding 1 / 27

VIDEO SIGNALS. Lossless coding

Error Resilient LZ 77 Data Compression

Image compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year

Indexing. CS6200: Information Retrieval. Index Construction. Slides by: Jesse Anderton

yintroduction to compression ytext compression yimage compression ysource encoders and destination decoders

Efficient Sequential Algorithms, Comp309. Motivation. Longest Common Subsequence. Part 3. String Algorithms

Optimized Compression and Decompression Software

Chapter 4: Application Protocols 4.1: Layer : Internet Phonebook : DNS 4.3: The WWW and s

Greedy Algorithms. CLRS Chapters Introduction to greedy algorithms. Design of data-compression (Huffman) codes

Transcription:

Overview Last Lecture Data Transmission This Lecture Data Compression Source: Lecture notes Next Lecture Data Integrity 1 Source : Sections 10.1, 10.3 Lecture 4 Data Compression 1

Data Compression Decreases space, time to transmit, and cost Bit rate is limited, can we send fewer bits and still deliver the data reliably? (Reduce the number of bits while retaining its meaning) Various approaches for data compression: Huffman, Run Length, LZW 3

Data Compression Ideas 1. Apple, Apple, Apple, Banana, Banana, Banana, Banana, Lemon, Lemon, Lemon 3 Apples, 4 Bananas, 3 Lemons Run Length Compression 2. Texting Language: B4, we usd 2 go 2 NY 2C my bro, his GF & thr 3 :-@ kds FTF. Before, we used to go to New York to see my brother, his girlfriend and their three screaming kids face to face. Lempel-Ziv Compression 3. Most frequently used words are short (The, Be, To, Of, And, A, In, I, It); many long words are not frequently used ( Pseudopseudohypoparathyroidism ). Huffman Compression 2

Huffman code Variable length code based on the frequency of character use. Most frequently used characters -> shortest codes Least frequently used characters -> longest codes A simple example Text EEEEAEEBFEEE (ASCII 12 * 7 = 84 bits) E-0, A-100, B-101, F-110 Code 000010000101110000 (18 bits) 4

Huffman Code (cont.) Huffman code: Code formation - Assign weights to each character - Merge two lightest weights into one root node with sum of weights (why binary tree?) - Repeat until one tree is left - Traverse the tree from root to the leaf (for each node, assign 0 to the left, 1 to the right) Lecture 4 - Data compression 5

Huffman Code (cont.) Text: ABECADBC. Lecture 3 - Data transmission & compression 6 24

Huffman Code (cont.) Huffman code: Code Interpretation No prefix property (Restriction): The code for any character never appears as the prefix or start of the code for any other character. (guarantees the codes can be translated back) Receiver continues to receive bits until it finds a code and forms the character 01110001110110110111 (extract the string) Lecture 4 Data compression 7

Huffman Code (cont.) Huffman code steps (description in Textbook) To each character, associate a binary tree consisting of just one node. To each tree, assign the character s frequency, which is called the tree s weight. Look for the two lightest-weight trees. If there are more than two, choose among them randomly. Merge the two into a single tree with a new root node whose left and right sub trees are the two we chose. Assign the sum of weights of the merged trees as the weight of the new tree. Repeat the previous step until just one tree is left. Lecture 4 Data compression 8

Run Length Encoding (Character-Level) Used for character data only Send an alternating set of numbers and characters. Example HHHHHHHUFFFFFFFFFFFFFF 7H1U14F Lecture 4 Data compression 9

Run Length Encoding (Bit-Level) Consider a picture of the letter T. T" 70-90% of the space is white space, which means many continuous zeroes to be transmitted. Group the runs of zeroes and send their length instead. Lecture 4 Data compression 10

Run Length Encoding (Bit-Level cont.) Decide the number of bits to represent a run length. Encoding algorithm (4 bit lengths) Count the number of 0s between two 1s If the number is less than 15, write it down in binary form. If it is greater than or equal to 15, write down 1111, and a following binary number to indicate the rest of the 0s. If more than 30, repeat this process. If the data starts with a 1, write down 0000 at the beginning. If the data ends with a 1, write down 0000 at the end. Send the binary string. Lecture 4 Data compression 11

Run Length Encoding (Bit-Level cont.) Decoding algorithm: Group all the bits into 4-bit groups. 1. For each 4-bit group, write down that number of 0s. 2. If at the end of the bit string, stop. 3. If not at the end of the bit string: If the 4-bit group was less than 15, write down a 1. Go to step 1. If the 4-bit group is 15, go to step 1. Lecture 4 Data compression 12

Run Length Encoding (Bit-Level cont.) Lecture 3 Data compression 13

Lempel-Ziv Compression In text, phrases or entire words are repeated very often. Look for repeated strings. Store them and a code in a dictionary. In the output, replace these repeated strings with the code. zip, unzip, compress command in Unix. Lecture 4 Data compression 14

Lempel-Ziv Compression (cont.) From Crichton, M. Jurassic Park. The tropical rain fell in drenching sheets, hammering the corrugated roof of the clinic building, roaring down the metal gutters, splashing on the ground in a torrent. Some repetitions: the - # ro - $ ing - % en - & rr - * Lecture 4 Data compression 15

Lempel-Ziv Compression (cont.) The tropical rain fell in drenching sheets, hammering the corrugated roof of the clinic building, roaring down the metal gutters, splashing on the ground in a torrent. #t$pical rain fell in dr&ch% sheets, hammer% #co*ugated $of of #clinic build%, $ar% down # metal gutters, splash% on #g$und in a to*&t. Lecture 4 Data compression 16

Lempel-Ziv-Welch (LZW) algorithm (cont.) Add all possible character codes to the dictionary w = ""; for (every character c in the incoming data) { if ((w + c) exists in the dictionary) { w = w + c; } else { add (w + c) to the dictionary; add the dictionary code for w to output; w = c; } } add the dictionary code for w to output; display output; Lecture 4 Data Compression 19

Lempel-Ziv-Welch (LZW) algorithm A dictionary is initialized to contain all the single characters. Scan through the input string for successively longer substrings (w+c) that is not in the dictionary. When such a string (w+c) is found, the index for the string less the last character (i.e., the longest substring that is in the dictionary, w) is sent to output. The new string (including the last character, w+c) is added to the dictionary with the next available code. The last input character (c) is then used as the next starting point to scan for substrings. Lecture 4 Data Compression 17

Lempel-Ziv-Welch (LZW) algorithm (cont.) Lecture 4 Data Compression 18

Lempel-Ziv-Welch (LZW) algorithm (cont.) No need to send the dictionary except the initial encoding for the alphabet letters. Need to agree on the initial coding between the sender and the receiver. The dictionary can be reconstructed as decompression is done. Lecture 4 Data Compression 20

Other Types of Compression Differential Encoding 21

Summary Huffman encoding Run-length encoding Lempel-Ziv Compression 22