Data Compression
Media Signal Processing, Presentation 2
Presented by: Jahanzeb Farooq, Michael Osadebey
What is Data Compression?
Definition
-Reducing the amount of data required to represent a source of information (while preserving the original content as much as possible).
Objectives
1-Reduce the amount of data storage space required.
2-Reduce data transmission time over the network.
Categories of Data Compression
Lossy Data Compression
-The original message can never be recovered exactly as it was before compression.
-Not suitable for critical data, where we cannot afford to lose even a single bit.
-Used mostly in sound, image, and video compression, where the losses can be tolerated.
-A threshold level is used for truncation. (For example, in a sound file, very high and very low frequencies, which the human ear cannot hear, may be truncated from the file.)
-Examples: JPEG, MPEG
-Lossy techniques are much more effective at compression than lossless methods, but the higher the compression ratio, the more noise is added to the data.
Categories of Data Compression
Lossless Data Compression
-The original message can be decoded exactly.
-Repeated patterns in a message are found and encoded in an efficient manner.
-Also referred to as redundancy reduction.
-Required for textual data, executable code, word-processing files, and tabulated numbers.
-Popular algorithms: LZW (Lempel-Ziv-Welch), RLE (Run Length Encoding), Huffman coding, Arithmetic Coding, Delta Encoding.
-GIF images are an example of lossless image compression.
Applications: Why We Need Data Compression?
The two most important points are:
1-Data storage
-Modern data processing applications require storage of large volumes of data.
-Compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium.
2-Data transmission
-Modern communication networks require massive transfer of data over communication channels.
-Compressing the amount of data to be transmitted is equivalent to increasing the capacity of the communication channel.
-The smaller a file, the faster it can be transferred over the channel.
Applications
-Wide range of applications; data compression is used almost everywhere.
Types
-Image Compression -(e.g. JPEG images)
-Audio Compression -(e.g. MP3 audio)
-Video Compression -(e.g. DVDs)
-General Data Compression -(e.g. ZIP files)
Data Compression Algorithms
1-Huffman coding
2-Run Length Encoding
3-Lempel-Ziv-Welch Encoding
4-Arithmetic coding
5-Delta Encoding
Some others...
6-Adaptive Huffman coding
7-Wavelet compression
8-Discrete Cosine Transform
Huffman Coding
-The characters in a data file are converted to binary codes.
-The most common characters in the input file (characters with higher probability) are assigned short binary codes, and
-the least common characters (with lower probabilities) are assigned longer binary codes.
-Codes can therefore be of different lengths (a variable-length code).
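The assignment of shorter codes to more frequent characters can be sketched with Python's heapq module; the function name and frequency values below are illustrative, not taken from the slides:

```python
import heapq

def huffman_codes(freqs):
    """Build a prefix-free code: frequent symbols get short codes."""
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in a merged
        # subtree gains one leading bit, so rare symbols end up deepest.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_codes({"e": 12, "t": 9, "a": 8, "q": 1})
```

With these frequencies the frequent "e" gets a 1-bit code while the rare "q" gets a 3-bit code, and no code is a prefix of another.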
Lempel-Ziv-Welch
-Uses a dictionary, or code table.
-Works by constructing a "dictionary" of words or parts of words in a message, and then using pointers to the words in the dictionary.
-LZW compresses text, executable code, and similar data files to about one-half their original size; on highly redundant data, compression ratios as high as 5:1 are achievable.
Example: The string "ain" can be stored in the dictionary and then pointed to whenever it repeats.
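The dictionary-and-pointer idea can be sketched in Python; this is a simplified string-keyed version (real LZW implementations, e.g. in GIF, emit bit-packed variable-width codes), and the function names are illustrative:

```python
def lzw_compress(text):
    """Emit one code per longest-match phrase, growing the dictionary as we go."""
    dictionary = {chr(i): i for i in range(256)}  # start with all single bytes
    current, output = "", []
    for ch in text:
        if current + ch in dictionary:
            current += ch  # keep extending the match
        else:
            output.append(dictionary[current])          # pointer to longest match
            dictionary[current + ch] = len(dictionary)  # remember the new phrase
            current = ch
    if current:
        output.append(dictionary[current])
    return output

def lzw_decompress(codes):
    """Rebuild the same dictionary while reading the code stream."""
    dictionary = {i: chr(i) for i in range(256)}
    prev = dictionary[codes[0]]
    result = [prev]
    for code in codes[1:]:
        entry = dictionary.get(code, prev + prev[0])  # "code not yet defined" corner case
        result.append(entry)
        dictionary[len(dictionary)] = prev + entry[0]
        prev = entry
    return "".join(result)
```

On a sentence with repeated fragments such as "ain", the second and later occurrences come out as single codes, which is where the compression comes from.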
Run Length Encoding
-Codes data containing frequently repeated characters.
-Called run-length encoding because each run of repeated characters is replaced by a shorter code that simply states which character was repeated and how many times.
Example:
-A file with 0 as the repeating character.
-Each run of zeros in the original file is replaced by two characters in the compressed file: the character (0) and the run length.
-For the first 3 repeating 0s in the original file, the first encoded pair in the compressed file shows that 0 was repeated 3 times.
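The zero-run example can be sketched as a generic byte-oriented RLE; the function names are illustrative, and run lengths are capped at 255 so each count fits in one byte:

```python
def rle_encode(data):
    """Replace each run of a repeated byte with a (byte, count) pair."""
    out, i = [], 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1  # extend the run, but keep the count within one byte
        out.extend([data[i], run])
        i += run
    return bytes(out)

def rle_decode(encoded):
    """Expand each (byte, count) pair back into a run."""
    out = []
    for i in range(0, len(encoded), 2):
        out.extend([encoded[i]] * encoded[i + 1])
    return bytes(out)
```

For input starting with three zeros, the first encoded pair is (0, 3), exactly as in the slide's example. Note that this scheme expands data with few repeats, since every isolated byte doubles in size.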
Arithmetic Coding
-The message is encoded as a single real number in the interval from 0 to 1.
-Typically achieves better compression than Huffman coding.
Disadvantages
-Decoding cannot start until the whole codeword has been received.
-A single corrupted bit in the codeword can corrupt the entire message.
-Only a limited number of symbols can be encoded within one codeword, due to finite arithmetic precision.
Arithmetic Coding
Worked example: encoding the message "ABD" with the alphabet below.

Symbol  Probability  Interval
A       0.2          [0.0, 0.2)
B       0.3          [0.2, 0.5)
C       0.1          [0.5, 0.6)
D       0.4          [0.6, 1.0)

After encoding A, the current interval is [0.0, 0.2), subdivided as:
A [0.0, 0.04)    B [0.04, 0.1)    C [0.1, 0.12)    D [0.12, 0.2)

After encoding B, the current interval is [0.04, 0.1), subdivided as:
A [0.04, 0.052)  B [0.052, 0.07)  C [0.07, 0.076)  D [0.076, 0.1)

After encoding D, the current interval is [0.076, 0.1), subdivided as:
A [0.076, 0.0808)  B [0.0808, 0.088)  C [0.088, 0.0904)  D [0.0904, 0.1)

Any number in [0.076, 0.1) encodes the message "ABD".
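The interval narrowing in the tables can be sketched in Python, assuming (as the step labels suggest) that the message being encoded is "ABD"; the function name is illustrative:

```python
def arithmetic_interval(message, probs):
    """Narrow [low, high) once per symbol; any number in the final interval encodes the message."""
    # Cumulative probability at the start of each symbol's slot.
    cum, start = {}, 0.0
    for sym, p in probs.items():
        cum[sym] = start
        start += p
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        high = low + width * (cum[sym] + probs[sym])  # uses the old low
        low = low + width * cum[sym]
    return low, high

probs = {"A": 0.2, "B": 0.3, "C": 0.1, "D": 0.4}
low, high = arithmetic_interval("ABD", probs)  # roughly (0.076, 0.1)
```

A single number inside the final interval, e.g. 0.08, represents the whole message. This floating-point sketch runs out of precision after a few dozen symbols, which is why practical arithmetic coders renormalize using integer arithmetic.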