Lossless compression B 1 U 1 B 2 C R D! CSCI 470: Web Science Keith Vertanen

Size: px
Start display at page:

Download "Lossless compression B 1 U 1 B 2 C R D! CSCI 470: Web Science Keith Vertanen"

Transcription

1 Lossless compression B U B U B 2 U B A ϵ CSCI 47: Web Science Keith Vertanen C R D!

2 Lossless compression Mo7va7on Overview Rules and limits of the game Things to exploit Run- length encoding (RLE) Exploit runs of same character Huffman coding Variable- length codeword for each pajern (character) Transmit codewords plus compressed data Sec6on 5.5 2

3 Lossless compression Reduce size of a file Mo7va7on Save space while storing it Data always expands to fill available drive space Save space while transmipng it Bandwidth growing rapidly, but so are files! HD video: (92 * 8) pixels/frame * 3 frames/sec * 24 bits/pixel =.5Gbps! Lossless = get back exactly what you put in (e.g. zip) Lossly compression (stay tuned) Informa7on is lost (e.g. JPEG, MP3) 3

4 Name Value Million 6 megabyte Billion 9 gigabyte Trillion 2 terabyte Quadrillion 5 petabyte Quin7llion 8 exabyte hjp://www-.ibm.com/sobware/data/bigdata/ 4

5 Lossless compression: applica7ons Generic file compression compress, gzip, zip, bzip2, 7z, xz NTFS, HFS+, ZFS Image files GIF, PNG, TIFF Audio files Free Lossless Audio Codec (FLAC) Apple Lossless Audio Codec (ALAC) Data transmission HTTP, PPP, SSH, fax machines, v.92 modems 5

6 Compression and expansion bitstream B compressed bitstream C(B) original bitstream B Compress Expand Data we want to be smaller Smaller (hopefully) version of data Exact version of original data Compression ra8o: bits in C(B) / bits in B Example: 7 ASCII characters, 7 bits each = 9 bits Output 2 codewords, 8 bits/codeword = 96 bits Compression ra7o = 8% 6

7 "created_at":"mon Apr 4 3:54:57 + \u8ea\u5df\u96e3\u53d7 \u53ea\u6 79\u8ea\u5df\u624d\u6c2\u32","source":"\u3ca href=\" rel=\"nofollow\"\u3etwitter for iphone\u3c\/a\u3e","t runcated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name ":null,"user":"id": ,"id_str":" ","name":"chyh Leng","screen_name":"chyhleng","location":"","url":null,"description":"272 :)Music\u266a Kpop \ ua7 Kdrama\uff2c\uff2f\uff36\uff25\uff32\u265 EverLastingFriend \u2665 Minoz \u2665","protected":false,"followers_count":96,"friends_count":39,"listed_count ":,"created_at":"wed Nov 2 3:4: ","favourites_count":82,"utc_offset":288,"time_zone":"Kuala Lumpur","geo_enabled":true,"verified":false,"sta tuses_count":4263,"lang":"en","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"9ae4e8","profile_bac kground_image_url":" e6\/bg.gif","profile_background_tile":false,"profile_image_url":" ge_url_https":" \/ \/ ","profile_link_color":"84B4","profile_sidebar_border_color":"BDDCAD","profile_sidebar_fill_color":"DDFFCC","profile_text_color":" ","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null," geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":"created_at":"mon Apr 4 3:47: + 24","id": ,"id_str" :" ","text":"\u8ea\u5df\u96e3\u53d7 \u53ea\u679\u8ea\u5df\u624d\u6c2\u32","source":"\u3ca href=\" ne\" rel=\"nofollow\"\u3etwitter for iphone\u3c\/a\u3e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_ id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":"id":7272,"id_str":"7272","name":"\u852\u7f8e\u5","screen_name":"fa nshu_choy","location":"","url":null,"description":"\u222 People live because of a dream \u222","protected":false,"followers_count":627,"friends_count":65,"li sted_count":4,"created_at":"tue Jul 7 4::4 + 22","favourites_count":46,"utc_offset":288,"time_zone":"Beijing","geo_enabled":true,"verified":false,"statuses_count":523,"lang":"zh- tw","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"356","pr ofile_background_image_url":" 29,743,396 katie339.txt plaintext mage_url_https":" profile_image_url":" 52,583,76 katie339.txt.z compress LZW e_images\/ \/utpanlr_normal.jpeg","profile_banner_url":" "555","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"E6F6F9","profile_text_color":"333333","profile_use_background_image":true,"defaul 3,875,422 katie339.txt.zip zip DEFLATE t_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null,"geo":null,"coordinates":null,"place":null,"con tributors":null,"retweet_count":,"favorite_count":5,"entities":"hashtags":[],"symbols":[],"urls":[],"user_mentions":[],"favorited":false,"retweeted":false," 23,862,52 katie339.txt.bz2 bzip2 Burrows- Wheeler lang":"zh","retweet_count":,"favorite_count":,"entities":"hashtags":[],"symbols":[],"urls":[],"user_mentions":["screen_name":"fanshu_choy","name":"\u852\u 7f8e\u5","id":7272,"id_str":"7272","indices":[3,5]],"favorited":false,"retweeted":false,"filter_level":"medium","lang":"zh" 2,84,286 katie339.txt.lzma2.7z 7- zip LZMA2 "created_at":"mon Apr 4 3:54: ","id": ,"id_str":" ","text":"hallo","source":"web","truncated":false,"in_reply_t o_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":"id": ,579,65 katie339.txt.7z 7- zip LZMA 73,"id_str":" ","name":"geilespuitfles","screen_name":"geilespuitfles","location":"","url":null,"description":null,"protected":false,"followers_count" :697,"friends_count":35,"listed_count":2,"created_at":"Thu Aug 9:8:8 + 23","favourites_count":,"utc_offset":null,"time_zone":null,"geo_enabled":f 2,32,68 katie339.txt.xz xz LZMA2 alse,"verified":false,"statuses_count":692,"lang":"nl","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_col or":"cdeed","profile_background_image_url":" 2,27,44 katie339.txt.ppmd.7z 7- zip PPMd om\/images\/themes\/theme\/bg.png","profile_background_tile":false,"profile_image_url":" e947fe4bf5d74f77_normal.jpeg","profile_image_url_https":" al.jpeg","profile_link_color":"84b4","profile_sidebar_border_color":"cdeed","profile_sidebar_fill_color":"ddeef6","profile_text_color":"333333","profile_use_ background_image":true,"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null,"geo":null,"coordi nates":null,"place":null,"contributors":null,"retweet_count":,"favorite_count":,"entities":"hashtags":[],"symbols":[],"urls":[],"user_mentions":[],"favorite d":false,"retweeted":false,"filter_level":"medium","lang":"et" "delete":"status":"id": ,"user_id":337896,"id_str":" ","user_id_str":"337896" "created_at":"mon Apr 4 3:54: ","id": ,"id_str":" ","text":"@leeaoilove \u342\u34a\u344\u369\u3fc\u357\u 35f\uff","source":"\u3ca href=\" rel=\"nofollow\"\u3etwitter for iphone\u3c\/a\u3e","truncated":false,"in_rep ly_to_status_id": ,"in_reply_to_status_id_str":" ","in_reply_to_user_id": ,"in_reply_to_user_id_str":" ","in _reply_to_screen_name":"leeaoilove","user":"id": ,"id_str":" ","name":"\u35\u344\u3fc\u35f","screen_name":"keeita_423","location":""," url":null,"description":"\u9cf4\u6d7728 \u3c6\u3cb\u3b9\u9e8 \u9cf4\u6d77\u36e\u4eba\u3d5\u3a9\u3ed\u3fc\u357\u366\u3fc\uff","protected":false,"f ollowers_count":2,"friends_count":237,"listed_count":,"created_at":"wed Dec 4 4:4: ","favourites_count":78,"utc_offset":null,"time_zone":null,"geo_enabled":true,"verified":false,"statuses_count":336,"lang":"ja","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profil e_background_color":"cdeed","profile_background_image_url":" :\/\/abs.twimg.com\/images\/themes\/theme\/bg.png","profile_background_tile":false,"profile_image_url":" 626\/XG8Azfho_normal.jpeg","profile_image_url_https":" ttps:\/\/pbs.twimg.com\/profile_banners\/ \/ ","profile_link_color":"84b4","profile_sidebar_border_color":"cdeed","profile_sidebar_fill_col or":"ddeef6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":true,"default_profile_image":false,"following":null,"follow_req uest_sent":null,"notifications":null,"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":,"favorite_count":,"entities":"hashtags" :[],"symbols":[],"urls":[],"user_mentions":["screen_name":"leeaoilove","name":"\u342\u34a\u374\u3c\u37e\u393","id": ,"id_str":" ","in dices":[,2]],"favorited":false,"retweeted":false,"filter_level":"medium","lang":"ja"

8 "A second aspect of the present inven7on which further enhances its ability to achieve high compression percentages, is its ability to be applied to data recursively. Specifically, the methods of the present inven7on are able to make mul7ple passes over a file, each 7me further compressing the file. Thus, a series of recursions are repeated un7l the desired compression level is achieved." "the direct bit encode method of the present inven7on is effec7ve for reducing an input string by one bit regardless of the bit pajern of the input string." 8

9 Universal data compression? Reality: No algorithm can compress every bitstream Proof (by contradic7on) Suppose you have a universal compressor U Given bitstream B, use U to compress to smaller B Compress B to get smaller B 2 Con7nue un7l bitstream is of size Thus all bitstream can be compressed to bits! Proof 2 (coun7ng) Suppose you can compress all - bit strings 2 possible bit strings with bits How many possible shorter encodings, 999 bits? (# of bit numbers) + (# of 2 bit numbers) + + (# of 999 bit numbers) = 2 - Thus fewer than the 2 we need for unique mapping B U B U B 2 U ϵ 9

10 24 x 768 = 786,432 bits java PictureDump < bits.bin Compressed with standard compression u7li7es My top- secret method! 98,34 bits.bin 98,346 bits.bin.gz 98,57 bits.bin.zip 99,8 bits.bin.bz2 98,368 bits.bin.xz 232 bits.bin.kdv 232 * 8 / 786,432.24% of original! public class RandomBits public static void main(string [] args) int x = ; for (int i = ; i < ; i++) x = x * ; BinaryStdOut.write(x > ); BinaryStdOut.close();

11 Another set of 24 x 768 = 786,432 bits What is the op7mal compressor for this image? Undecideable! In fact this image is completely random. java PictureDump < bin,48, bin,53, bz2,48, gz,49, zip

12 Redundancy in English How redundant is wrijen English? Yet aoccdrnig to a sudty at Cmabrigde Uinerv7sy, it deosn t mjaer in waht oredr the ljeers in a wrod are, the olny iprmoetnt 7hng is taht the frist and lsat ljeer be at the rghit pclae. The rset can be a Joal mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Answer: very! Shannon es7mate: bits per lejer 2

13 Approaches to compression Exploit + of the following: ) Small alphabets 2) Long sequences of iden7cal bits 3) Frequently used characters 4) Long reused bit sequences (next 7me) We'll look at an example of each Including Java implementa7on Using support classes for: Binary file input/output Data structures 3

14 Reading and wri7ng binary data public class BinaryStdIn boolean readboolean() // Read bit of data, return as a boolean value char readchar() // Read 8 bits of data, return as a char value char readchar(int r) // Read r bits of data, return as a char value // Read r bits for byte, short, int, long, double boolean isempty() // Is the bitstream empty? void close() // Close the bitstream public class BinaryStdOut void write(boolean b) // Write the specified bit void writechar(char c) // Write the specified 8- bit char void write(char c, int r) // Write r least significant bits of char c // Write r LSB of byte, short, int, long, double void close() // Close the bitstream 4

15 Visualizing a bitstream How to view a bitstream? % more abra.txt ABRACADABRA! Bitstream as characters % java BinaryDump 6 < abra.txt 4 bits Bitstream represented by and 's % java HexDump 4 < abra.txt a 4 bits Bitstream represented by 2- digit hex numbers % java PictureDump 6 6 < abra.txt Bitstream as pixels in a picture 5

16 Method : Small alphabets public class OutputDate public static void main(string [] args) int month = 2; int day = 3; int year = 999; int mode = Integer.parseInt(args[]); if (mode == ) System.out.print(month+"/"+day+"/"+year); else if (mode == ) BinaryStdOut.write(month); BinaryStdOut.write(day); BinaryStdOut.write(year); BinaryStdOut.close(); else BinaryStdOut.write(month, 4); BinaryStdOut.write(day, 5); BinaryStdOut.write(year, 2); BinaryStdOut.close(); 2 / 3 / % java OutputDate java BinaryDump 8 8 bits Mode : output as ASCII text Mode : output four 32- bit ints Mode 2: output variable length bit fields 6

17 public class OutputDate public static void main(string [] args) int month = 2; int day = 3; int year = 999; int mode = Integer.parseInt(args[]); if (mode == ) System.out.print(month+"/"+day+"/"+year); Method : Small alphabets else if (mode == ) BinaryStdOut.write(month); BinaryStdOut.write(day); BinaryStdOut.write(year); BinaryStdOut.close(); else BinaryStdOut.write(month, 4); BinaryStdOut.write(day, 5); BinaryStdOut.write(year, 2); BinaryStdOut.close(); % java OutputDate java BinaryDump bits Mode : output as ASCII text Mode : output four 32- bit ints Mode 2: output variable length bit fields 7

18 public class OutputDate public static void main(string [] args) int month = 2; int day = 3; int year = 999; int mode = Integer.parseInt(args[]); if (mode == ) System.out.print(month+"/"+day+"/"+year); Method : Small alphabets else if (mode == ) BinaryStdOut.write(month); BinaryStdOut.write(day); BinaryStdOut.write(year); BinaryStdOut.close(); else BinaryStdOut.write(month, 4); BinaryStdOut.write(day, 5); BinaryStdOut.write(year, 2); BinaryStdOut.close(); Compressing by using a small alphabet: e.g. use 2- bit code for nucleo8des A, C, T, G % java OutputDate 2 java BinaryDump bits Padding since we must end on a byte boundary. Mode : output as ASCII text Mode : output four 32- bit ints Mode 2: output variable length bit fields 8

19 Method 2: Long sequences Run Length Encoding (RLE) Exploits simple form of redundancy Long runs of same bit value Store the count using 8- bits (- 255) Alternate between and If >255, put run of length of other bit, con7nue 9

20 public class RunLength private static final int R = 256; // Maximum run- length count private static final int lgr = 8; // Number of bits per count public static void compress() char run = ; boolean old = false; while (!BinaryStdIn.isEmpty()) boolean b = BinaryStdIn.readBoolean(); if (b!= old) else BinaryStdOut.write(run, lgr); run = ; old =!old; if (run == R- ) run++; // Start out with - bit // Did the bit value change? // Write out the count for completed run // We have reached 255, time to output BinaryStdOut.write(run, lgr); // Write run of 255 run = ; BinaryStdOut.write(run, lgr); // Write a run of of the other bit BinaryStdOut.write(run, lgr); BinaryStdOut.close(); 2

21 public static void expand() boolean b = false; while (!BinaryStdIn.isEmpty()) int run = BinaryStdIn.readInt(lgR); // Read 8- bit count from stdin for (int i = ; i < run; i++) BinaryStdOut.write(b); // Write - bit to stdout b =!b; BinaryStdOut.close(); // Pads s to get to byte boundary public static void main(string[] args) if (args[].equals("- ")) compress(); else if (args[].equals("+")) expand(); else throw new RuntimeException("Illegal command line argument"); 2

22 % java BinaryDump 32 < q32x48.bin 536 bits bit- run = 79 () bit- run = 7 () bit- run = 22 () bit- run = 5 ()... % java RunLength - < q32x48.bin java BinaryDump bits Compression ratio: 44 / 536 = 74% 22

23 ABCDEFGHIJKLMNOPQRSTUVWXYZABABABABABABBABABBBABABABBA BBABABABABBBBBBABABABABABAABABABBABBABABABBBABABABABA Exploi8ng small alphabet: BABABABAABABABBABABABABABBABAABBABABABABAABABBBABABAB 55 columns x 8 rows = 99 characters ABABAAABABABABABAABABABABABBBABABABABAAABABABABABBABA 26 possible lejers, A- Z BABABABABABAAABABAABBABABAAABBBABABBABAAAAABABABABAAA 26 < 2 5 = 32 BABABABABABABABBBABABABABAAABAABABABABABBABABABABABAB 5 bits/character * 99 characters = 495 bits ABAAABABABABABABABABABABABABABABABABABABBABABBBABABAB ABABABABBBABABAAAAAAABAAAAAAABAABABABAAAAAABABBABABAB BABABAAABABBBBBBBBBAABABABBBBBBBBBBABABABABABABABBBAB Huffman coding: ABABABABABABABABABABBBABABABBABBBABBBABABABABBABABABB Key idea: Use shorter codewords for common characters ABABABABAAABAABABABABABBABBABBAABBBABABBBAABABABBAABA BABBABABAABABABABABABABAAABABABABABABABABABABABABABAB Different number of bits used to encode different characters ABABABABABABBABABBBABABABABABABABBBABBABAAAAAAABBAAAB ABABABABABABABABABABABABABABABBBABABABBABABBAABAABABA ABABABABABAABABBABABABABABABAABABABABABABABABABABBABA ABABBBBABABABBABBBABABBABABAAAABBAABBABABBABABABABABB ABBABBBBABAABABBABABABABAABBBABABABABAAABABABABABBABA ABAABABABABABABABAAABABABABABABBBABABAAABABAAAABBABBA 23

24 Method 3: Frequency of characters Variable- length prefix- free codes Map from characters to bit strings (codewords) Choose codewords so none is a prefix of another ABRACADABRA! Text to be compressed key value key value! A B C D R A B RA CA DA B RA! Scheme : 3 bits! A B C D R A B R A C A D A B R A! Scheme 2: 29 bits 24

25 Trie representa7on Represent prefix- free code as a binary trie Characters at leafs Codeword is path from root to leaf = leb, = right key value! A B C D B A R A B R A C A D A B R A! 29 bits C R D! 25

26 Using the trie Compression Start at leaf of target character Follow path to root, print bits in reverse order or create a symbol table Expansion Start at root Go leb if bit =, go right if bit = If leaf node, output character and return to root key value! A B C D B A R A B R A C A D A B R A! 29 bits C R D! 26

27 private static class Node implements Comparable<Node> private final char ch; private final int freq; private final Node left, right; // Initialize a new Node Node(char ch, int freq, Node left, Node right) this.ch = ch; this.freq = freq; this.left = left; this.right = right; // Is this node a leaf? private boolean isleaf() return (left == null && right == null); // Compare Nodes by frequency public int compareto(node that) return this.freq - that.freq; 27

28 Expansion public static void expand() // Read in the encoding trie Node root = readtrie(); // Number of bytes to write int length = BinaryStdIn.readInt(); // Decode using the Huffman trie for (int i = ; i < length; i++) Node x = root; // Expand codeword for the ith character while (!x.isleaf()) boolean bit = BinaryStdIn.readBoolean(); if (bit) x = x.right; else x = x.left; BinaryStdOut.write(x.ch); BinaryStdOut.flush(); Expansion Start at root Go leb if bit=, go right if bit= If leaf node, output character and return to root B C R D! A 28

29 TransmiPng the trie Trie needed in order to expand Must be sent along with the data Causes overhead, but small if message is long Write preorder traversal of trie Mark leaf and internal nodes with a bit B C R D! A private static void writetrie(node x) if (x.isleaf()) BinaryStdOut.write(true); BinaryStdOut.write(x.ch); return; BinaryStdOut.write(false); writetrie(x.left); writetrie(x.right); B C R D! A 29

30 Reading the trie Reconstruc7ng from preorder traversal Use / bits to decide if internal or leaf node B C R D! A private static Node readtrie() boolean isleaf = BinaryStdIn.readBoolean(); if (isleaf) return new Node(BinaryStdIn.readChar(), -, null, null); else return new Node('\', -, readtrie(), readtrie()); B C R D! A 3

31 Building the trie Can we always find op7mal prefix- free code? Yes! Discovered by David Huffman while a PhD student Huffman algorithm: Count frequency of each char in input Create forest of leaf nodes for each char Each leaf weighted according to its frequency Repeat un7l single trie: Select 2 tries with min sum of weights Merge into trie with sum of weights Provably op7mal This trie isn't the op7mal one for ABRACADBRA! 3

32 ABRACADABRA! 6) Join last two tries 2 5) Join two with sum = 7 4) Join two with sum = ) Join two with sum = 3 2) Join two with sum = 2 (in a 7e, exact choice arbitrary) 3 2 ) Create leafs with weight = frequency in en7re text Codewords: A 5 B 2 R 2 D C! 32

33 ABRACADABRA! % java Huffman - < abra.txt java BinaryDump 32 2 bits trie size of text 65 = A 68 = D 33 =! 67 = C 82 = R 66 = B int = 2 A B R D C! compressed data 33 A B R A C A D A B R A! padding 33

34 Lossless compression Summary Universal compression: impossible Op7mal data compression: undecidable Exploi7ng: Small alphabets Use only as many bits as needed to represent data Repeated symbols Run length encoding (RLE) Frequency of symbols Prefix- free codes Huffman coding 34

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE Algorithms ROBERT SEDGEWICK KEVIN WAYNE 5.5 DATA COMPRESSION Algorithms F O U R T H E D I T I O N introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu

More information

Algorithms 5.5 DATA COMPRESSION. basics run-length coding Huffman compression LZW compression

Algorithms 5.5 DATA COMPRESSION. basics run-length coding Huffman compression LZW compression 5.5 DATA COMPRESSION Algorithms F O U R T H E D I T I O N basics run-length coding Huffman compression LZW compression R O B E R T S E D G E W I C K K E V I N W A Y N E Algorithms, 4 th Edition Robert

More information

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE Algorithms ROBERT SEDGEWICK KEVIN WAYNE 5.5 DATA COMPRESSION Algorithms F O U R T H E D I T I O N introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE https://algs4.cs.princeton.edu

More information

5.5 Data Compression

5.5 Data Compression 5.5 Data Compression basics run-length coding Huffman compression LZW compression Algorithms, 4 th Edition Robert Sedgewick and Kevin Wayne Copyright 2002 2010 February 8, 2011 2:50:01 PM Data compression

More information

5.5 Data Compression

5.5 Data Compression 5.5 Data Compression basics run-length encoding Huffman compression LZW compression Algorithms in Java, 4 th Edition Robert Sedgewick and Kevin Wayne Copyright 2009 January 26, 2010 8:42:01 AM Data compression

More information

Semi-Lossless Text Compression: a Case Study

Semi-Lossless Text Compression: a Case Study Semi-Lossless Text Compression: a Case Study BRUNO CARPENTIERI Dipartimento di Informatica Università di Salerno Italy bc@dia.unisa.it Abstract: - Text compression is generally considered only as lossless

More information

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE lgorithms OBET SEDGEWICK KEVIN WYNE 5.5 DT COMPESSION lgorithms F O U T H E D I T I O N introduction run-length coding Huffman compression LZW compression OBET SEDGEWICK KEVIN WYNE http://algs4.cs.princeton.edu

More information

CS/COE 1501

CS/COE 1501 CS/COE 1501 www.cs.pitt.edu/~nlf4/cs1501/ Compression What is compression? Represent the same data using less storage space Can get more use out a disk of a given size Can get more use out of memory E.g.,

More information

CMPSC112 Lecture 37: Data Compression. Prof. John Wenskovitch 04/28/2017

CMPSC112 Lecture 37: Data Compression. Prof. John Wenskovitch 04/28/2017 CMPSC112 Lecture 37: Data Compression Prof. John Wenskovitch 04/28/2017 What You Don t Get to Learn Self-balancing search trees: https://goo.gl/houquf https://goo.gl/r4osz2 Shell sort: https://goo.gl/awy3pk

More information

CS/COE 1501

CS/COE 1501 CS/COE 1501 www.cs.pitt.edu/~lipschultz/cs1501/ Compression What is compression? Represent the same data using less storage space Can get more use out a disk of a given size Can get more use out of memory

More information

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE lgorithms ROBERT SEDGEWICK KEVIN WYNE 5.5 DT COMPRESSION lgorithms F O U R T H E D I T I O N introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WYNE http://algs4.cs.princeton.edu

More information

The impossible patent: an introduction to lossless data compression. Carlo Mazza

The impossible patent: an introduction to lossless data compression. Carlo Mazza The impossible patent: an introduction to lossless data compression Carlo Mazza Plan Introduction Formalization Theorem A couple of good ideas Introduction What is data compression? Data compression is

More information

CSSE SEMESTER 1, 2017 EXAMINATIONS. CITS1001 Object-oriented Programming and Software Engineering FAMILY NAME: GIVEN NAMES:

CSSE SEMESTER 1, 2017 EXAMINATIONS. CITS1001 Object-oriented Programming and Software Engineering FAMILY NAME: GIVEN NAMES: CSSE SEMESTER 1, 2017 EXAMINATIONS CITS1001 Object-oriented Programming and Software Engineering FAMILY NAME: GIVEN NAMES: STUDENT ID: SIGNATURE: This Paper Contains: 20 pages (including title page) Time

More information

Lossless compression II

Lossless compression II Lossless II D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0)

More information

CNT4406/5412 Network Security

CNT4406/5412 Network Security CNT4406/5412 Network Security Introduction to Cryptography Zhi Wang Florida State University Fall 2014 Zhi Wang (FSU) CNT4406/5412 Network Security Fall 2014 1 / 18 Introduction What is Cryptography Mangling

More information

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character.

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character. Project 2 1 P2-0: Text Files All files are represented as binary digits including text files Each character is represented by an integer code ASCII American Standard Code for Information Interchange Text

More information

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character.

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character. Project 2 1 P2-0: Text Files All files are represented as binary digits including text files Each character is represented by an integer code ASCII American Standard Code for Information Interchange Text

More information

Data Structures and Algorithms

Data Structures and Algorithms Data Structures and Algorithms CS245-2015S-P2 Huffman Codes Project 2 David Galles Department of Computer Science University of San Francisco P2-0: Text Files All files are represented as binary digits

More information

Data Compression. Guest lecture, SGDS Fall 2011

Data Compression. Guest lecture, SGDS Fall 2011 Data Compression Guest lecture, SGDS Fall 2011 1 Basics Lossy/lossless Alphabet compaction Compression is impossible Compression is possible RLE Variable-length codes Undecidable Pigeon-holes Patterns

More information

Lossless compression II

Lossless compression II Lossless II D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0)

More information

ECE 499/599 Data Compression & Information Theory. Thinh Nguyen Oregon State University

ECE 499/599 Data Compression & Information Theory. Thinh Nguyen Oregon State University ECE 499/599 Data Compression & Information Theory Thinh Nguyen Oregon State University Adminstrivia Office Hours TTh: 2-3 PM Kelley Engineering Center 3115 Class homepage http://www.eecs.orst.edu/~thinhq/teaching/ece499/spring06/spring06.html

More information

Data Representation. Types of data: Numbers Text Audio Images & Graphics Video

Data Representation. Types of data: Numbers Text Audio Images & Graphics Video Data Representation Data Representation Types of data: Numbers Text Audio Images & Graphics Video Analog vs Digital data How is data represented? What is a signal? Transmission of data Analog vs Digital

More information

Einführung in die Programmierung Introduction to Programming

Einführung in die Programmierung Introduction to Programming Chair of Software Engineering Einführung in die Programmierung Introduction to Programming Prof. Dr. Bertrand Meyer Michela Pedroni Lecture 11: Describing the Syntax Goals of today s lecture Learn about

More information

15 July, Huffman Trees. Heaps

15 July, Huffman Trees. Heaps 1 Huffman Trees The Huffman Code: Huffman algorithm uses a binary tree to compress data. It is called the Huffman code, after David Huffman who discovered d it in 1952. Data compression is important in

More information

5.5 Data Compression. basics run-length coding Huffman compression LZW compression. Data compression

5.5 Data Compression. basics run-length coding Huffman compression LZW compression. Data compression 5.5 ata ompression ata compression ompression reduces the size of a file: To save space when storing it. To save time when transmitting it. Most files have lots of redundancy. basics run-length coding

More information

Algorithms and Data Structures CS-CO-412

Algorithms and Data Structures CS-CO-412 Algorithms and Data Structures CS-CO-412 David Vernon Professor of Informatics University of Skövde Sweden david@vernon.eu www.vernon.eu Algorithms and Data Structures 1 Copyright D. Vernon 2014 Trees

More information

Multimedia Networking ECE 599

Multimedia Networking ECE 599 Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on B. Lee s lecture notes. 1 Outline Compression basics Entropy and information theory basics

More information

An example (1) - Conditional. An example (2) - Conditional. An example (3) Nested conditional

An example (1) - Conditional. An example (2) - Conditional. An example (3) Nested conditional Chair of Software Engineering Einführung in die Programmierung Introduction to Programming Prof. Dr. Bertrand Meyer Michela Pedroni October 2006 February 2007 Lecture 8: Describing the Syntax Intro. to

More information

Simple variant of coding with a variable number of symbols and fixlength codewords.

Simple variant of coding with a variable number of symbols and fixlength codewords. Dictionary coding Simple variant of coding with a variable number of symbols and fixlength codewords. Create a dictionary containing 2 b different symbol sequences and code them with codewords of length

More information

Compressing Data. Konstantin Tretyakov

Compressing Data. Konstantin Tretyakov Compressing Data Konstantin Tretyakov (kt@ut.ee) MTAT.03.238 Advanced April 26, 2012 Claude Elwood Shannon (1916-2001) C. E. Shannon. A mathematical theory of communication. 1948 C. E. Shannon. The mathematical

More information

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding Fundamentals of Multimedia Lecture 5 Lossless Data Compression Variable Length Coding Mahmoud El-Gayyar elgayyar@ci.suez.edu.eg Mahmoud El-Gayyar / Fundamentals of Multimedia 1 Data Compression Compression

More information

CIS 121 Data Structures and Algorithms with Java Spring 2018

CIS 121 Data Structures and Algorithms with Java Spring 2018 CIS 121 Data Structures and Algorithms with Java Spring 2018 Homework 6 Compression Due: Monday, March 12, 11:59pm online 2 Required Problems (45 points), Qualitative Questions (10 points), and Style and

More information

IMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I

IMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I IMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I 1 Need For Compression 2D data sets are much larger than 1D. TV and movie data sets are effectively 3D (2-space, 1-time). Need Compression for

More information

More Bits and Bytes Huffman Coding

More Bits and Bytes Huffman Coding More Bits and Bytes Huffman Coding Encoding Text: How is it done? ASCII, UTF, Huffman algorithm ASCII C A T Lawrence Snyder, CSE UTF-8: All the alphabets in the world Uniform Transformation Format: a variable-width

More information

Data compression with Huffman and LZW

Data compression with Huffman and LZW Data compression with Huffman and LZW André R. Brodtkorb, Andre.Brodtkorb@sintef.no Outline Data storage and compression Huffman: how it works and where it's used LZW: how it works and where it's used

More information

Data Compression Techniques

Data Compression Techniques Data Compression Techniques Part 1: Entropy Coding Lecture 1: Introduction and Huffman Coding Juha Kärkkäinen 31.10.2017 1 / 21 Introduction Data compression deals with encoding information in as few bits

More information

So, what is data compression, and why do we need it?

So, what is data compression, and why do we need it? In the last decade we have been witnessing a revolution in the way we communicate 2 The major contributors in this revolution are: Internet; The explosive development of mobile communications; and The

More information

Ch. 2: Compression Basics Multimedia Systems

Ch. 2: Compression Basics Multimedia Systems Ch. 2: Compression Basics Multimedia Systems Prof. Ben Lee School of Electrical Engineering and Computer Science Oregon State University Outline Why compression? Classification Entropy and Information

More information

Lossless Compression Algorithms

Lossless Compression Algorithms Multimedia Data Compression Part I Chapter 7 Lossless Compression Algorithms 1 Chapter 7 Lossless Compression Algorithms 1. Introduction 2. Basics of Information Theory 3. Lossless Compression Algorithms

More information

David Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc.

David Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc. David Rappaport School of Computing Queen s University CANADA Copyright, 1996 Dale Carnegie & Associates, Inc. Data Compression There are two broad categories of data compression: Lossless Compression

More information

15110 PRINCIPLES OF COMPUTING SAMPLE EXAM 2

15110 PRINCIPLES OF COMPUTING SAMPLE EXAM 2 15110 PRINCIPLES OF COMPUTING SAMPLE EXAM 2 Name Section Directions: Answer each question neatly in the space provided. Please read each question carefully. You have 50 minutes for this exam. No electronic

More information

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints:

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints: CS231 Algorithms Handout # 31 Prof. Lyn Turbak November 20, 2001 Wellesley College Compression The Big Picture We want to be able to store and retrieve data, as well as communicate it with others. In general,

More information

Text Compression through Huffman Coding. Terminology

Text Compression through Huffman Coding. Terminology Text Compression through Huffman Coding Huffman codes represent a very effective technique for compressing data; they usually produce savings between 20% 90% Preliminary example We are given a 100,000-character

More information

Data Compression Scheme of Dynamic Huffman Code for Different Languages

Data Compression Scheme of Dynamic Huffman Code for Different Languages 2011 International Conference on Information and Network Technology IPCSIT vol.4 (2011) (2011) IACSIT Press, Singapore Data Compression Scheme of Dynamic Huffman Code for Different Languages Shivani Pathak

More information

A Comparative Study of Lossless Compression Algorithm on Text Data

A Comparative Study of Lossless Compression Algorithm on Text Data Proc. of Int. Conf. on Advances in Computer Science, AETACS A Comparative Study of Lossless Compression Algorithm on Text Data Amit Jain a * Kamaljit I. Lakhtaria b, Prateek Srivastava c a, b, c Department

More information

Data compression.

Data compression. Data compression anhtt-fit@mail.hut.edu.vn dungct@it-hut.edu.vn Data Compression Data in memory have used fixed length for representation For data transfer (in particular), this method is inefficient.

More information

History of Typography. (History of Digital Font)

History of Typography. (History of Digital Font) History of Typography (History of Digital Font) 1 What is Typography? The art and technique of printing The study and process of typefaces Study Legibility or readability of typefaces and their layout

More information

Chapter 16: Greedy Algorithm

Chapter 16: Greedy Algorithm Chapter 16: Greedy Algorithm 1 About this lecture Introduce Greedy Algorithm Look at some problems solvable by Greedy Algorithm 2 Coin Changing Suppose that in a certain country, the coin dominations consist

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Design and Analysis of Algorithms Instructor: SharmaThankachan Lecture 10: Greedy Algorithm Slides modified from Dr. Hon, with permission 1 About this lecture Introduce Greedy Algorithm Look at some problems

More information

Data Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression

Data Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression An overview of Compression Multimedia Systems and Applications Data Compression Compression becomes necessary in multimedia because it requires large amounts of storage space and bandwidth Types of Compression

More information

Department of Image Processing and Computer Graphics University of Szeged. Fuzzy Techniques for Image Segmentation. Outline.

Department of Image Processing and Computer Graphics University of Szeged. Fuzzy Techniques for Image Segmentation. Outline. László G. Nyúl systems sets image László G. Nyúl Department of Processing and Computer Graphics University of Szeged 2009-07-07 systems sets image 1 systems 2 sets 3 image thresholding clustering 4 Dealing

More information

Technical University of Munich - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Technical University of Munich - FTP Site Statistics. Top 20 Directories Sorted by Disk Space Technical University of Munich - FTP Site Statistics Property Value FTP Server ftp.ldv.e-technik.tu-muenchen.de Description Technical University of Munich Country Germany Scan Date 23/May/2014 Total Dirs

More information

Huffman Coding Assignment For CS211, Bellevue College (rev. 2016)

Huffman Coding Assignment For CS211, Bellevue College (rev. 2016) Huffman Coding Assignment For CS, Bellevue College (rev. ) (original from Marty Stepp, UW CSE, modified by W.P. Iverson) Summary: Huffman coding is an algorithm devised by David A. Huffman of MIT in 95

More information

Binary Trees and Huffman Encoding Binary Search Trees

Binary Trees and Huffman Encoding Binary Search Trees Binary Trees and Huffman Encoding Binary Search Trees Computer Science E-22 Harvard Extension School David G. Sullivan, Ph.D. Motivation: Maintaining a Sorted Collection of Data A data dictionary is a

More information

Computer Security & Privacy. Why Computer Security Matters. Privacy threats abound (identity fraud, etc.) Multi-disciplinary solutions

Computer Security & Privacy. Why Computer Security Matters. Privacy threats abound (identity fraud, etc.) Multi-disciplinary solutions Computer Security & Privacy slides adopted from F. Monrose 1 Why Computer Security Matters Computers/Internet play a vital role in our daily lives Social Networks and Online Communities facebook, flickr,

More information

IMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1

IMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1 IMAGE COMPRESSION- I Week VIII Feb 25 02/25/2003 Image Compression-I 1 Reading.. Chapter 8 Sections 8.1, 8.2 8.3 (selected topics) 8.4 (Huffman, run-length, loss-less predictive) 8.5 (lossy predictive,

More information

CS 231 Data Structures and Algorithms Fall Binary Search Trees Lecture 23 October 29, Prof. Zadia Codabux

CS 231 Data Structures and Algorithms Fall Binary Search Trees Lecture 23 October 29, Prof. Zadia Codabux CS 231 Data Structures and Algorithms Fall 2018 Binary Search Trees Lecture 23 October 29, 2018 Prof. Zadia Codabux 1 Agenda Ternary Operator Binary Search Tree Node based implementation Complexity 2 Administrative

More information

Professional Communication

Professional Communication 1 Professional Communication 1 3 Agenda Communication Analyze the writing situation Personal Business E-mail, Memos, Letters Editing your work 4 Analyze the Writing Situation Consider the following Subject

More information

CSE 143, Winter 2013 Programming Assignment #8: Huffman Coding (40 points) Due Thursday, March 14, 2013, 11:30 PM

CSE 143, Winter 2013 Programming Assignment #8: Huffman Coding (40 points) Due Thursday, March 14, 2013, 11:30 PM CSE, Winter Programming Assignment #8: Huffman Coding ( points) Due Thursday, March,, : PM This program provides practice with binary trees and priority queues. Turn in files named HuffmanTree.java, secretmessage.short,

More information

CSE100. Advanced Data Structures. Lecture 12. (Based on Paul Kube course materials)

CSE100. Advanced Data Structures. Lecture 12. (Based on Paul Kube course materials) CSE100 Advanced Data Structures Lecture 12 (Based on Paul Kube course materials) CSE 100 Coding and decoding with a Huffman coding tree Huffman coding tree implementation issues Priority queues and priority

More information

INTRODUCTORY COMPUTER NETWORKS PHYSICAL LAYER, DIGITAL TRANSMISSION FUNDAMENTALS. Faramarz Hendessi

INTRODUCTORY COMPUTER NETWORKS PHYSICAL LAYER, DIGITAL TRANSMISSION FUNDAMENTALS. Faramarz Hendessi INTRODUCTORY COMPUTER NETWORKS PHYSICAL LAYER, DIGITAL TRANSMISSION FUNDAMENTALS Faramarz Hendessi Introductory Computer Networks Lecture 5 Fall 2010 Isfahan University of technology Dr. Faramarz Hendessi

More information

An Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security

An Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 10, October 2014,

More information

TREES Lecture 10 CS2110 Spring2014

TREES Lecture 10 CS2110 Spring2014 TREES Lecture 10 CS2110 Spring2014 Readings and Homework 2 Textbook, Chapter 23, 24 Homework: A thought problem (draw pictures!) Suppose you use trees to represent student schedules. For each student there

More information

4/16/2012. Data Compression. Exhaustive search, backtracking, object-oriented Queens. Check out from SVN: Queens Huffman-Bailey.

4/16/2012. Data Compression. Exhaustive search, backtracking, object-oriented Queens. Check out from SVN: Queens Huffman-Bailey. Data Compression Exhaustive search, backtracking, object-oriented Queens Check out from SVN: Queens Huffman-Bailey Bailey-JFC 1 Teams for EditorTrees project Greedy Algorithms Data Compression Huffman's

More information

CS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding

CS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding CS6B Handout 34 Autumn 22 November 2 th, 22 Data Compression and Huffman Encoding Handout written by Julie Zelenski. In the early 98s, personal computers had hard disks that were no larger than MB; today,

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Image Compression Caution: The PDF version of this presentation will appear to have errors due to heavy use of animations Material in this presentation is largely based on/derived

More information

Distributed source coding

Distributed source coding Distributed source coding Suppose that we want to encode two sources (X, Y ) with joint probability mass function p(x, y). If the encoder has access to both X and Y, it is sufficient to use a rate R >

More information

A Research Paper on Lossless Data Compression Techniques

A Research Paper on Lossless Data Compression Techniques IJIRST International Journal for Innovative Research in Science & Technology Volume 4 Issue 1 June 2017 ISSN (online): 2349-6010 A Research Paper on Lossless Data Compression Techniques Prof. Dipti Mathpal

More information

STUDY OF VARIOUS DATA COMPRESSION TOOLS

STUDY OF VARIOUS DATA COMPRESSION TOOLS STUDY OF VARIOUS DATA COMPRESSION TOOLS Divya Singh [1], Vimal Bibhu [2], Abhishek Anand [3], Kamalesh Maity [4],Bhaskar Joshi [5] Senior Lecturer, Department of Computer Science and Engineering, AMITY

More information

7: Image Compression

7: Image Compression 7: Image Compression Mark Handley Image Compression GIF (Graphics Interchange Format) PNG (Portable Network Graphics) MNG (Multiple-image Network Graphics) JPEG (Join Picture Expert Group) 1 GIF (Graphics

More information

Opera Web Browser Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Opera Web Browser Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space Property Value FTP Server ftp.opera.com Description Opera Web Browser Archive Country United States Scan Date 04/Nov/2015 Total Dirs 1,557 Total Files 2,211 Total Data 43.83 GB Top 20 Directories Sorted

More information

6. Finding Efficient Compressions; Huffman and Hu-Tucker

6. Finding Efficient Compressions; Huffman and Hu-Tucker 6. Finding Efficient Compressions; Huffman and Hu-Tucker We now address the question: how do we find a code that uses the frequency information about k length patterns efficiently to shorten our message?

More information

Intro. To Multimedia Engineering Lossless Compression

Intro. To Multimedia Engineering Lossless Compression Intro. To Multimedia Engineering Lossless Compression Kyoungro Yoon yoonk@konkuk.ac.kr 1/43 Contents Introduction Basics of Information Theory Run-Length Coding Variable-Length Coding (VLC) Dictionary-based

More information

An Overview 1 / 10. CS106B Winter Handout #21 March 3, 2017 Huffman Encoding and Data Compression

An Overview 1 / 10. CS106B Winter Handout #21 March 3, 2017 Huffman Encoding and Data Compression CS106B Winter 2017 Handout #21 March 3, 2017 Huffman Encoding and Data Compression Handout by Julie Zelenski with minor edits by Keith Schwarz In the early 1980s, personal computers had hard disks that

More information

Image coding and compression

Image coding and compression Image coding and compression Robin Strand Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Today Information and Data Redundancy Image Quality Compression Coding

More information

DigiPoints Volume 1. Student Workbook. Module 8 Digital Compression

DigiPoints Volume 1. Student Workbook. Module 8 Digital Compression Digital Compression Page 8.1 DigiPoints Volume 1 Module 8 Digital Compression Summary This module describes the techniques by which digital signals are compressed in order to make it possible to carry

More information

Lossy compression. CSCI 470: Web Science Keith Vertanen

Lossy compression. CSCI 470: Web Science Keith Vertanen Lossy compression CSCI 470: Web Science Keith Vertanen Digital audio Overview Sampling rate Quan5za5on MPEG audio layer 3 (MP3) JPEG s5ll images Color space conversion, downsampling Discrete Cosine Transform

More information

COSC-211: DATA STRUCTURES HW5: HUFFMAN CODING. 1 Introduction. 2 Huffman Coding. Due Thursday, March 8, 11:59pm

COSC-211: DATA STRUCTURES HW5: HUFFMAN CODING. 1 Introduction. 2 Huffman Coding. Due Thursday, March 8, 11:59pm COSC-211: DATA STRUCTURES HW5: HUFFMAN CODING Due Thursday, March 8, 11:59pm Reminder regarding intellectual responsibility: This is an individual assignment, and the work you submit should be your own.

More information

ENSC Multimedia Communications Engineering Huffman Coding (1)

ENSC Multimedia Communications Engineering Huffman Coding (1) ENSC 424 - Multimedia Communications Engineering Huffman Coding () Jie Liang Engineering Science Simon Fraser University JieL@sfu.ca J. Liang: SFU ENSC 424 Outline Entropy Coding Prefix code Kraft-McMillan

More information

EE67I Multimedia Communication Systems Lecture 4

EE67I Multimedia Communication Systems Lecture 4 EE67I Multimedia Communication Systems Lecture 4 Lossless Compression Basics of Information Theory Compression is either lossless, in which no information is lost, or lossy in which information is lost.

More information

SIGNAL COMPRESSION Lecture Lempel-Ziv Coding

SIGNAL COMPRESSION Lecture Lempel-Ziv Coding SIGNAL COMPRESSION Lecture 5 11.9.2007 Lempel-Ziv Coding Dictionary methods Ziv-Lempel 77 The gzip variant of Ziv-Lempel 77 Ziv-Lempel 78 The LZW variant of Ziv-Lempel 78 Asymptotic optimality of Ziv-Lempel

More information

a graph is a data structure made up of nodes in graph theory the links are normally called edges

a graph is a data structure made up of nodes in graph theory the links are normally called edges 1 Trees Graphs a graph is a data structure made up of nodes each node stores data each node has links to zero or more nodes in graph theory the links are normally called edges graphs occur frequently in

More information

GWDG Software Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

GWDG Software Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space GWDG Software Archive - FTP Site Statistics Property Value FTP Server ftp5.gwdg.de Description GWDG Software Archive Country Germany Scan Date 18/Jan/2016 Total Dirs 1,068,408 Total Files 30,248,505 Total

More information

Efficient Sequential Algorithms, Comp309. Motivation. Longest Common Subsequence. Part 3. String Algorithms

Efficient Sequential Algorithms, Comp309. Motivation. Longest Common Subsequence. Part 3. String Algorithms Efficient Sequential Algorithms, Comp39 Part 3. String Algorithms University of Liverpool References: T. H. Cormen, C. E. Leiserson, R. L. Rivest Introduction to Algorithms, Second Edition. MIT Press (21).

More information

15110 Principles of Computing, Carnegie Mellon University - CORTINA. Digital Data

15110 Principles of Computing, Carnegie Mellon University - CORTINA. Digital Data UNIT 7A Data Representa1on: Numbers and Text 1 Digital Data 10010101011110101010110101001110 What does this binary sequence represent? It could be: an integer a floa1ng point number text encoded with ASCII

More information

Building Java Programs. Priority Queues, Huffman Encoding

Building Java Programs. Priority Queues, Huffman Encoding Building Java Programs Priority Queues, Huffman Encoding Prioritization problems ER scheduling: You are in charge of scheduling patients for treatment in the ER. A gunshot victim should probably get treatment

More information

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson zhuyongxin@sjtu.edu.cn 2 Lossless Compression Algorithms 7.1 Introduction 7.2 Basics of Information

More information

Unit 2 Digital Information. Chapter 1 Study Guide

Unit 2 Digital Information. Chapter 1 Study Guide Unit 2 Digital Information Chapter 1 Study Guide 2.5 Wrap Up Other file formats Other file formats you may have encountered or heard of include:.doc,.docx,.pdf,.mp4,.mov The file extension you often see

More information

Lossy compression CSCI 470: Web Science Keith Vertanen Copyright 2013

Lossy compression CSCI 470: Web Science Keith Vertanen Copyright 2013 Lossy compression CSCI 470: Web Science Keith Vertanen Copyright 2013 Digital audio Overview Sampling rate Quan5za5on MPEG audio layer 3 (MP3) JPEG s5ll images Color space conversion, downsampling Discrete

More information

15-122: Principles of Imperative Computation, Spring 2013

15-122: Principles of Imperative Computation, Spring 2013 15-122 Homework 6 Page 1 of 13 15-122: Principles of Imperative Computation, Spring 2013 Homework 6 Programming: Huffmanlab Due: Thursday, April 4, 2013 by 23:59 For the programming portion of this week

More information

Entropy Coding. - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic Code

Entropy Coding. - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic Code Entropy Coding } different probabilities for the appearing of single symbols are used - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic

More information

MCS-375: Algorithms: Analysis and Design Handout #G2 San Skulrattanakulchai Gustavus Adolphus College Oct 21, Huffman Codes

MCS-375: Algorithms: Analysis and Design Handout #G2 San Skulrattanakulchai Gustavus Adolphus College Oct 21, Huffman Codes MCS-375: Algorithms: Analysis and Design Handout #G2 San Skulrattanakulchai Gustavus Adolphus College Oct 21, 2016 Huffman Codes CLRS: Ch 16.3 Ziv-Lempel is the most popular compression algorithm today.

More information

Bits and Bit Patterns

Bits and Bit Patterns Bits and Bit Patterns Bit: Binary Digit (0 or 1) Bit Patterns are used to represent information. Numbers Text characters Images Sound And others 0-1 Boolean Operations Boolean Operation: An operation that

More information

Algorithms Dr. Haim Levkowitz

Algorithms Dr. Haim Levkowitz 91.503 Algorithms Dr. Haim Levkowitz Fall 2007 Lecture 4 Tuesday, 25 Sep 2007 Design Patterns for Optimization Problems Greedy Algorithms 1 Greedy Algorithms 2 What is Greedy Algorithm? Similar to dynamic

More information

CSE 421 Greedy: Huffman Codes

CSE 421 Greedy: Huffman Codes CSE 421 Greedy: Huffman Codes Yin Tat Lee 1 Compression Example 100k file, 6 letter alphabet: File Size: ASCII, 8 bits/char: 800kbits 2 3 > 6; 3 bits/char: 300kbits a 45% b 13% c 12% d 16% e 9% f 5% Why?

More information

14.4 Description of Huffman Coding

14.4 Description of Huffman Coding Mastering Algorithms with C By Kyle Loudon Slots : 1 Table of Contents Chapter 14. Data Compression Content 14.4 Description of Huffman Coding One of the oldest and most elegant forms of data compression

More information

FauxCrypt - A Method of Text Obfuscation

FauxCrypt - A Method of Text Obfuscation FauxCrypt - A Method of Text Obfuscation Devlin M. Gualtieri Consulting Scientist Ledgewood, New Jersey gualtieri@ieee.org Abstract Warnings have been raised about the steady diminution of privacy. More

More information

CS101 Lecture 12: Image Compression. What You ll Learn Today

CS101 Lecture 12: Image Compression. What You ll Learn Today CS101 Lecture 12: Image Compression Vector Graphics Compression Techniques Aaron Stevens (azs@bu.edu) 11 October 2012 What You ll Learn Today Review: how big are image files? How can we make image files

More information

ENSC Multimedia Communications Engineering Topic 4: Huffman Coding 2

ENSC Multimedia Communications Engineering Topic 4: Huffman Coding 2 ENSC 424 - Multimedia Communications Engineering Topic 4: Huffman Coding 2 Jie Liang Engineering Science Simon Fraser University JieL@sfu.ca J. Liang: SFU ENSC 424 1 Outline Canonical Huffman code Huffman

More information

A Comparative Study of Entropy Encoding Techniques for Lossless Text Data Compression

A Comparative Study of Entropy Encoding Techniques for Lossless Text Data Compression A Comparative Study of Entropy Encoding Techniques for Lossless Text Data Compression P. RATNA TEJASWI 1 P. DEEPTHI 2 V.PALLAVI 3 D. GOLDIE VAL DIVYA 4 Abstract: Data compression is the art of reducing

More information