Introduction to Data Compression

Similar documents
Data Compression 신찬수

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints:

Compression; Error detection & correction

Fundamentals of Video Compression. Video Compression

Data Compression. Media Signal Processing, Presentation 2. Presented By: Jahanzeb Farooq Michael Osadebey

Compression; Error detection & correction

FPGA based Data Compression using Dictionary based LZW Algorithm

15 Data Compression 2014/9/21. Objectives After studying this chapter, the student should be able to: 15-1 LOSSLESS COMPRESSION

Image coding and compression

Chapter 1. Digital Data Representation and Communication. Part 2

Grayscale true two-dimensional dictionary-based image compression

Comparative Study of Dictionary based Compression Algorithms on Text Data

Image compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year

IMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I

VC 12/13 T16 Video Compression

Simple variant of coding with a variable number of symbols and fixlength codewords.

Lossless Compression Algorithms

EE67I Multimedia Communication Systems Lecture 4

Data compression with Huffman and LZW

Engineering Mathematics II Lecture 16 Compression

Noise Reduction in Data Communication Using Compression Technique

So, what is data compression, and why do we need it?

A Novel Image Compression Technique using Simple Arithmetic Addition

CS 335 Graphics and Multimedia. Image Compression

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding

Keywords Data compression, Lossless data compression technique, Huffman Coding, Arithmetic coding etc.

Multimedia Networking ECE 599

Compressing Data. Konstantin Tretyakov

DigiPoints Volume 1. Student Workbook. Module 8 Digital Compression

Lossless compression II

ECE 499/599 Data Compression & Information Theory. Thinh Nguyen Oregon State University

DEFLATE COMPRESSION ALGORITHM

Introduction to Compression. Norm Zeck

CS/COE 1501

S 1. Evaluation of Fast-LZ Compressors for Compacting High-Bandwidth but Redundant Streams from FPGA Data Sources

G64PMM - Lecture 3.2. Analogue vs Digital. Analogue Media. Graphics & Still Image Representation

TEXT COMPRESSION ALGORITHMS - A COMPARATIVE STUDY

IMAGE COMPRESSION TECHNIQUES

Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay

Ch. 2: Compression Basics Multimedia Systems

Data Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression

Image Compression for Mobile Devices using Prediction and Direct Coding Approach

A Comparative Study Of Text Compression Algorithms

Error Resilient LZ 77 Data Compression

Optimized Compression and Decompression Software

A Research Paper on Lossless Data Compression Techniques

Multimedia Communications ECE 728 (Data Compression)

7: Image Compression

Topic 5 Image Compression

A Comprehensive Review of Data Compression Techniques

Lecture 5 Lossless Coding (II)

Lecture 5 Lossless Coding (II) Review. Outline. May 20, Image and video encoding: A big picture

WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION

CS/COE 1501

Data Storage. Slides derived from those available on the web site of the book: Computer Science: An Overview, 11 th Edition, by J.

Repetition 1st lecture

General Computing Concepts. Coding and Representation. General Computing Concepts. Computing Concepts: Review

Compression Part 2 Lossy Image Compression (JPEG) Norm Zeck

compression and coding ii

An Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security

Comparative data compression techniques and multi-compression results

Huffman Coding Implementation on Gzip Deflate Algorithm and its Effect on Website Performance

More Bits and Bytes Huffman Coding

Department of electronics and telecommunication, J.D.I.E.T.Yavatmal, India 2

Bits and Bit Patterns

IMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1

I. Introduction II. Mathematical Context

13.6 FLEXIBILITY AND ADAPTABILITY OF NOAA S LOW RATE INFORMATION TRANSMISSION SYSTEM

Ch. 2: Compression Basics Multimedia Systems

A study in compression algorithms

Data Representation. Types of data: Numbers Text Audio Images & Graphics Video

Lecture Coding Theory. Source Coding. Image and Video Compression. Images: Wikipedia

Image Compression. CS 6640 School of Computing University of Utah

Sparse Transform Matrix at Low Complexity for Color Image Compression

Distributed source coding

A COMPRESSION TECHNIQUES IN DIGITAL IMAGE PROCESSING - REVIEW

Chapter 1. Data Storage Pearson Addison-Wesley. All rights reserved

Image Compression. cs2: Computational Thinking for Scientists.

Digital Image Processing

Technical lossless / near lossless data compression

EE-575 INFORMATION THEORY - SEM 092

Lempel-Ziv-Welch Compression

Lempel-Ziv-Welch (LZW) Compression Algorithm

Image Compression - An Overview Jagroop Singh 1

Improving LZW Image Compression

You can say that again! Text compression

IMGD The Game Development Process: File Formats

Lecture #3: Digital Music and Sound

CIS 121 Data Structures and Algorithms with Java Spring 2018

A Comparison between English and. Arabic Text Compression

LZW Compression. Ramana Kumar Kundella. Indiana State University December 13, 2014

DCT Based, Lossy Still Image Compression

VIDEO SIGNALS. Lossless coding

CS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77

International Journal of Computer & Organization Trends Volume 3 Issue 2 March to April 2013

2-D SIGNAL PROCESSING FOR IMAGE COMPRESSION S. Venkatesan, Vibhuti Narain Rai

MULTIMEDIA AND CODING

David Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc.

Elementary Computing CSC 100. M. Cheng, Computer Science

Transcription:

Introduction to Data Compression Guillaume Tochon guillaume.tochon@lrde.epita.fr LRDE, EPITA Guillaume Tochon (LRDE) CODO - Introduction 1 / 9

Data compression: whatizit? Guillaume Tochon (LRDE) CODO - Introduction 2 / 9

Data compression: whatizit? Data compression is the process of modifying, encoding or converting the bits structure of some input data in order to reduce the storage size of this data. Input data, large size Compression algorithm Compressed data, smaller size Guillaume Tochon (LRDE) CODO - Introduction 2 / 9

Data compression: whatizit? Data compression is the process of modifying, encoding or converting the bits structure of some input data in order to reduce the storage size of this data. Input data, large size Compression algorithm Compressed data, smaller size Data decompression is the inverse process, namely restoring the compressed data back to its original form (or a close one) for usage. Guillaume Tochon (LRDE) CODO - Introduction 2 / 9

Data compression: whatizit? Data compression is the process of modifying, encoding or converting the bits structure of some input data in order to reduce the storage size of this data. Input data, large size Compression algorithm Compressed data, smaller size Data decompression is the inverse process, namely restoring the compressed data back to its original form (or a close one) for usage. Talking about data compression talking about the compression and the decompression algorithm. Guillaume Tochon (LRDE) CODO - Introduction 2 / 9

Why does it matter? An illustrative example How much would weigh a non compressed HD movie? Guillaume Tochon (LRDE) CODO - Introduction 3 / 9

Why does it matter? An illustrative example How much would weigh a non compressed HD movie? Each color is encoded over 1 byte (2 8 = 256 possible gray levels). Guillaume Tochon (LRDE) CODO - Introduction 3 / 9

Why does it matter? An illustrative example How much would weigh a non compressed HD movie? Each color is encoded over 1 byte (2 8 = 256 possible gray levels). Each pixel is composed of 3 colors (R,G,B). Guillaume Tochon (LRDE) CODO - Introduction 3 / 9

Why does it matter? An illustrative example How much would weigh a non compressed HD movie? Each color is encoded over 1 byte (2 8 = 256 possible gray levels). Each pixel is composed of 3 colors (R,G,B). There are 1280 720 pixels in a given frame. Guillaume Tochon (LRDE) CODO - Introduction 3 / 9

Why does it matter? An illustrative example How much would weigh a non compressed HD movie? Each color is encoded over 1 byte (2 8 = 256 possible gray levels). Each pixel is composed of 3 colors (R,G,B). There are 1280 720 pixels in a given frame. There are 25 frames per second. Guillaume Tochon (LRDE) CODO - Introduction 3 / 9

Why does it matter? An illustrative example How much would weigh a non compressed HD movie? Each color is encoded over 1 byte (2 8 = 256 possible gray levels). Each pixel is composed of 3 colors (R,G,B). There are 1280 720 pixels in a given frame. There are 25 frames per second. An average movie is 1 hour 30 minutes long ( 5400 seconds). Guillaume Tochon (LRDE) CODO - Introduction 3 / 9

Why does it matter? An illustrative example How much would weigh a non compressed HD movie? Each color is encoded over 1 byte (2 8 = 256 possible gray levels). Each pixel is composed of 3 colors (R,G,B). There are 1280 720 pixels in a given frame. There are 25 frames per second. An average movie is 1 hour 30 minutes long ( 5400 seconds). Leading to a total weight of: 5400 25 1280 720 3 = Guillaume Tochon (LRDE) CODO - Introduction 3 / 9

Why does it matter? An illustrative example How much would weigh a non compressed HD movie? Each color is encoded over 1 byte (2 8 = 256 possible gray levels). Each pixel is composed of 3 colors (R,G,B). There are 1280 720 pixels in a given frame. There are 25 frames per second. An average movie is 1 hour 30 minutes long ( 5400 seconds). Leading to a total weight of: 5400 25 1280 720 3 = 373 248 Mb. 80 single-side, single-layer DVDs! And that is not even considering the sound... Guillaume Tochon (LRDE) CODO - Introduction 3 / 9

Why does it matter? If the previous example didn t convince you... Data compression is interesting for several reasons: To save space/memory. ++ Particularly true in the early days of computer science, when memory was über costly (it nonetheless remains the case nowadays). Guillaume Tochon (LRDE) CODO - Introduction 4 / 9

Why does it matter? If the previous example didn t convince you... Data compression is interesting for several reasons: To save space/memory. ++ Particularly true in the early days of computer science, when memory was über costly (it nonetheless remains the case nowadays). To speed-up processing time (especially for images). ++ Can also ease the transfer of voluminous data (less bandwidth needed). Guillaume Tochon (LRDE) CODO - Introduction 4 / 9

Why does it matter? If the previous example didn t convince you... Data compression is interesting for several reasons: To save space/memory. ++ Particularly true in the early days of computer science, when memory was über costly (it nonetheless remains the case nowadays). To speed-up processing time (especially for images). ++ Can also ease the transfer of voluminous data (less bandwidth needed). To increase compatibility. ++ There are as many RAW formats as imaging sensor manufacturers. A given image viewer is very likely not to be compatible with all of them, but it will always be compatible with JPEG. Guillaume Tochon (LRDE) CODO - Introduction 4 / 9

Why does it matter? If the previous example didn t convince you... Data compression is interesting for several reasons: To save space/memory. ++ Particularly true in the early days of computer science, when memory was über costly (it nonetheless remains the case nowadays). To speed-up processing time (especially for images). ++ Can also ease the transfer of voluminous data (less bandwidth needed). To increase compatibility. ++ There are as many RAW formats as imaging sensor manufacturers. A given image viewer is very likely not to be compatible with all of them, but it will always be compatible with JPEG. To increase security. ++ Data compression and cryptography are strongly linked: a compressed data is illegible for anyone who does not possess the correct decompression algorithm. However, a corrupted compressed file is irremediably lost (can be a problem for some applications). Guillaume Tochon (LRDE) CODO - Introduction 4 / 9

A brief historical review 1838 Morse code can be considered as the first compression algorithm since frequent letters ( e, t ) are given shorter support. 1948 Claude Shannon establishes the Information Theory with its seminal paper A Mathematical Theory of Communication, laying the mathematical basis for data compression and transmission. 1952 David Huffman publishes the encoding algorithm that is now named after him. 1977 Abraham Lempel and Jacob Ziv introduce LZ77 as the first adaptive compression algorithm. 1984 Terry Welch improves LZ77 to give birth to the LZW algorithm. 1980s Computing power and storage capacities increase, allowing for the manipulation of sound and images and calling for lossy compression algorithms. 1992 The first JPEG standard is released (still evolving nowadays). 1993 Following JPEG, the first MPEG-1 standard is completed. A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 3 4 5 6 7 8 9 0 Guillaume Tochon (LRDE) CODO - Introduction 5 / 9

Data compression Two wide and highly different families of algorithms Lossless compression: exploits statistical redundancy of the data to represent it without losing any information. Data before and after decompression are identical. Guillaume Tochon (LRDE) CODO - Introduction 6 / 9

Data compression Two wide and highly different families of algorithms Lossless compression: exploits statistical redundancy of the data to represent it without losing any information. Data before and after decompression are identical. Lossy compression: discard some information during the compression/decompression process. Data after compression is not the same as before compression. Guillaume Tochon (LRDE) CODO - Introduction 6 / 9

Lossless compression LZR (1981) LZB (1987) LZSS (1982) LZH (1987) LZRW1 (1991) LZRW1-A, 2, 3, 3- A, 4 Mature discipline (not so much research going on in the field nowadays). Very efficient on noise-free data (such as text documents, executable files, etc), but performances degrade with noise. Dictionary-based LZ77 (1977) ROLZ (1991) DEFLATE (1993) LZS (1994) LZX (1995) LZO (1996) LZMA (1998) Statistical Lempel- Ziv (2001) LZP (1995) DEFLATE64 (1999) LZMA2 (2009) LZJB (1998) Serves as base units for more elaborated lossy compression algorithms. LZ78 (1978) LZW (1984) LZJ (1985) LZC (1985) LZMW (1985) LZWL (2006) LZT (1987) LZAP (1988) Other PPM (1984) bzip2 (1996) PAQ (2002) 20+ variants (2003-2009) Guillaume Tochon (LRDE) CODO - Introduction 7 / 9

Lossless compression LZR (1981) LZB (1987) LZSS (1982) LZH (1987) LZRW1 (1991) LZRW1-A, 2, 3, 3- A, 4 Mature discipline (not so much research going on in the field nowadays). Very efficient on noise-free data (such as text documents, executable files, etc), but performances degrade with noise. Dictionary-based LZ77 (1977) ROLZ (1991) DEFLATE (1993) LZS (1994) LZX (1995) LZO (1996) LZMA (1998) Statistical Lempel- Ziv (2001) LZP (1995) DEFLATE64 (1999) LZMA2 (2009) LZJB (1998) Serves as base units for more elaborated lossy compression algorithms. LZ78 (1978) LZW (1984) LZJ (1985) LZC (1985) LZMW (1985) LZWL (2006) LZT (1987) LZAP (1988) Other PPM (1984) bzip2 (1996) PAQ (2002) 20+ variants (2003-2009) Guillaume Tochon (LRDE) CODO - Introduction 7 / 9

Lossy compression Lossy compression algorithms assume that some part of the data to compress can be discarded, such that the human user won t notice the difference. But how to evaluate which information is relevant and which one is redundant/useless in some data? Well suited for sound and image compression, but not for text files (you may not want a piece of code to be altered after compression/decompression...). Still an active field of research (wavelets, compressed sensing, etc). high compression, bad quality low compression, good quality Guillaume Tochon (LRDE) CODO - Introduction 8 / 9

General outline 1 Introduction 2 A flavor of Information Theory 3 Lossless compression algorithms - Run-length encoding algorithm - Huffman compression algorithm - bzip2 compression algorithm - LZW compression algorithm 4 Analog-to-digital conversion 5 Lossy compression algorithms - Some mathematical preliminaries to JPEG - JPEG compression algorithm for grayscale images - JPEG compression algorithm for color images - The one and only Principal Component Analysis Guillaume Tochon (LRDE) CODO - Introduction 9 / 9