PERMUTATION-BASED DATA COMPRESSION. Amalya Mihnea. A Dissertation Submitted to the Faculty of The Charles E. Schmidt College of Science


PERMUTATION-BASED DATA COMPRESSION

by

Amalya Mihnea

A Dissertation Submitted to the Faculty of The Charles E. Schmidt College of Science in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Florida Atlantic University
Boca Raton, FL
December 2011

PERMUTATION-BASED DATA COMPRESSION

by

Amalya Mihnea

This dissertation was prepared under the direction of the candidate's dissertation advisor, Dr. Frederick Hoffman, Department of Mathematical Sciences, and has been approved by the members of her supervisory committee. It was submitted to the faculty of the Charles E. Schmidt College of Science and was accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

[Signature page: Dissertation Advisor; Chair, Department of Mathematical Sciences; Aaron Meyerowitz, Ph.D.; Gary W. Perry, Ph.D., Dean, The Charles E. Schmidt College of Science; Barry T. Rosson, Ph.D., Dean, Graduate College; Date]

ACKNOWLEDGMENTS

I would like to gratefully acknowledge the enthusiastic supervision of Dr. Frederick Hoffman, whose patience in proofreading various versions of the dissertation is greatly appreciated. I would also like to express my deepest gratitude to all members of my dissertation committee who offered useful suggestions and comments. Without their support this dissertation could not have been completed.

ABSTRACT

Author: Amalya Mihnea
Title: Permutation-Based Data Compression
Institution: Florida Atlantic University
Dissertation Advisor: Dr. Frederick Hoffman
Degree: Doctor of Philosophy
Year: 2011

The use of permutations in data compression is an aspect that is worthy of further exploration. The work that has been done in video compression based on permutations was primarily oriented towards lossless algorithms. The study of previous algorithms has led to a new algorithm that can be either lossless or lossy, and for which the amount of compression and the quality of the output can be controlled. The lossless version of our algorithm performs close to lossy versions of H.264 and improves on them for the majority of the videos that we analyzed. Our algorithm could be used in situations where there is a need for lossless compression and the video sequences are part of a single scene, e.g., medical videos, where loss of information could be risky or expensive. Some results on permutations, which may be of independent interest, arose in developing this algorithm. We report on these as well.

PERMUTATION-BASED DATA COMPRESSION

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
1.1 Preliminaries
CHAPTER 2: PERMUTATIONS
2.1 Representations and Encodings of Permutations
CHAPTER 3: PERMUTATIONS AND DATA COMPRESSION
3.1 Introduction
3.2 Two-Phase Linear Regression
3.3 Representation of a Permutation Resulting from an Image
3.4 Rounding the Residuals to Integers
3.5 An Algorithm for the Representation of the Permutation
3.6 Permutations and Video Compression
3.7 Comparison of the Third Algorithm with H.264
3.8 Conclusions
BIBLIOGRAPHY

LIST OF TABLES

Table 1: Comparison of the average cost per element, considering just n elements (neglecting the extra elements for the BWT and Min. entropy columns), for 50 random permutations of order n
Table 2: Comparison of the average cost per element, considering just n elements (neglecting the extra elements for the BWT and Min. entropy columns), for 500 random permutations of order n
Table 3: Representation of the inverse permutation for the image Lena (1)
Table 4: Representation of the inverse permutation for the image Lena (2)
Table 5: Results for the third algorithm (lossy) for the video sequence Akiyo
Table 6: A comparison of the three algorithms for the video sequence Akiyo
Table 7: Bitrate comparison of the three algorithms
Table 8: Bitrate comparison of Algorithm 3 with H.264
Table 9: A breakdown of the bitrate of our algorithm for the lossless case

LIST OF FIGURES

Figure 1: Linear patterns in the representation of the inverse permutation for the image Lena
Figure 2: Results for the image Balloons
Figure 3: Results for the image Lena
Figure 4: Block diagram for Algorithm 1
Figure 5: Block diagram for Algorithm 2
Figure 6: Motivation for further improvement
Figure 7: Changes relative to the original frame (F_2)
Figure 8: Changes relative to the sorted frame (P_2(F_2))
Figure 9: Frames recovered with the inverse of the sorting permutation of the previous frame
Figure 10: Differences between two sorted frames for the videos Akiyo and city
Figure 11: Differences between two sorted frames for the videos crew and harbour
Figure 12: Block diagram for Algorithm 3
Figure 13: PSNR (Peak Signal to Noise Ratio) for Algorithm 3 (lossy) with different values of M, for the video sequence Akiyo
Figure 14: Comparison of performance for the two methods (PSNR vs. bitrate), for each video
Figure 15: Comparison of performance for each method (PSNR vs. bitrate), for all the videos
Figure 16: Comparison of performance for our method
Figure 17: Comparison of performance for our method
Figure 18: Comparison of the two methods for the same PSNR
Figure 19: Comparison of the two methods for the same bitrate (Video mother_daughter)
Figure 20: Comparison of the two methods for the same bitrate (Video hall_monitor)
Figure 21: Comparison of the two methods for the same bitrate (Video harbour)
Figure 22: Comparison of artifacts for low quality videos (Our Method vs. H.264)
Figure 23: Screen shots of frames from the video city, as M increases (Our Method)
Figure 24: Screen shots of frames from the video city, as M increases (Our Method)
Figure 25: Screen shots of frames from the video harbour, as M increases (Our Method)
Figure 26: Screen shots of frames from the video harbour, as M increases (Our Method)

CHAPTER 1: INTRODUCTION

The continuing increase of the quantity of data used and stored calls for efficient data compression methods. Lossless compression has been shown to provide a limited amount of compression, so it is necessary to focus also on lossy methods, whenever their application is allowed. Video compression using permutations is certainly worthy of further study, and we are attempting to contribute to this study. We introduce an algorithm that can be either lossless or lossy, and for which the amount of compression and the quality of the output can be controlled.

The second chapter provides a theoretical background on permutations, along with some considerations other than data compression. The third chapter focuses on data compression, especially image and video compression.

The second chapter begins with a representation for permutations. We compare our representation with the Burrows-Wheeler Transform for situations in which the input is a permutation.

In the third chapter, we construct an algorithm which gives an efficient representation of a permutation resulting from an image. We compare some types of regression and conclude that, of them, two-phase segmented linear regression gives the best reasonable approximation of such a permutation. By reasonable, we mean that the quantity of additional information we have to save is not very large. Some possibilities of improvement for this method, such as rounding the residuals and the coefficients to integers or quantizing them, are also discussed. Further exploration could be related to a partial recovery of the permutation, resulting from the removal of outliers or from quantization of the permutation or some other data, which would lead to lossy compression. Permutations could also be applied to the blocks of an image.

The section related to video compression focuses on algorithms that take advantage of the cheap representation of sorted frames. The motivation for further improvement resulted from comparing the differences between two sorted frames with the differences between two almost-sorted frames. The former are much cheaper, especially for videos with slow motion. The algorithm that we introduce can be lossless or lossy, and its performance can be controlled by adjusting a parameter M and by the density of I frames.

1.1 Preliminaries

We now give some definitions which are needed for the presented work. Please note that logarithms are taken to base 2.

Lossless and lossy compression. Lossless compression techniques, as their name implies, involve no loss of information: the original data can be recovered exactly from the compressed data. Lossless compression is generally used for discrete data, such as text, computer-generated data, and some kinds of image and video information. Lossy compression techniques involve some loss of information, and data cannot always be recovered or reconstructed exactly.

Entropy. If we have a set of independent events A_i, which are sets of outcomes of some experiment S, such that the union of the A_i is the sample space S, then the entropy (or average self-information) associated with the random experiment is given by

    H = - Σ_i P(A_i) log P(A_i).

Shannon showed that if the experiment is a source that outputs symbols A_i from a set A, then the entropy measures the average number of binary symbols needed to code the output of the source. The best that a lossless compression algorithm can do is to encode the output of a source with an average number of bits equal to the entropy of the source (Sayood, 1996, p. 15).

Consider the message BAAD BARBARA BABBLED ABRACADABRA, which has 32 characters including the spaces. Some common computer methods encode each character from the standard alphabet using 8 bits, so this message would require 32*8 = 256 bits. Since the message only has 8 distinct characters, we could encode them using 3 bits each: 000 for space, 001 for A, etc. This would cut the message down to 96 bits, saving 160 bits. A header giving the coding might use approximately 96 more bits. For example, if A is 01000001, we would need to report that 01000001 becomes 001. This uses up over half the savings, and a cost of 96+96 = 192 bits comes out to a rate of 6 bits per character. However, for messages much longer than the underlying alphabet, the overhead is not significant. Because the letter A occurs in 11 out of the 32 characters, it contributes -11/32 * log(11/32) to the entropy. In all, the entropy of the message is 80.436477 bits, or about 2.514 bits per character. So this is the theoretical limit of the rate for an encoding of this message.
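As a check on these figures, the per-symbol entropy of the example message can be computed directly from the formula above. This is a minimal sketch; the function name is ours, not from the dissertation.

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(message):
    """Empirical entropy: -sum p * log2(p) over the symbol frequencies."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())

msg = "BAAD BARBARA BABBLED ABRACADABRA"
h = entropy_bits_per_symbol(msg)   # about 2.514 bits per character
total = h * len(msg)               # about 80.44 bits for the whole message
```

Multiplying the per-character entropy by the 32 characters reproduces the 80.436477 bits quoted above.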

Run-length encoding. Run-length encoding is a specification of the elements in a list as a list of pairs giving each element and the number of times it occurs in a run. For example, given the list {0, 0, 0, 1, 1, 3, 3, 3, 3, 7, 7, 7, 10, 10}, the run-length encoding is {{0, 3}, {1, 2}, {3, 4}, {7, 3}, {10, 2}}.

Huffman encoding. A prefix code is a code system characterized by the condition that no valid codeword is a prefix of any other valid codeword in the system. Huffman codes are prefix codes and are based on the frequencies of occurrence of data items. Fewer bits are used to encode the data that occur more frequently, compared to the data that occur less frequently. From this, it follows that this type of coding is more useful for data with non-uniform distributions (Sayood, 1996, pp. 25-59). For the message in the example above, a prefix code is A=00, B=01, R=100, space=101, D=110, C=1110, E=11110, L=11111. With this encoding, the message uses 82 bits, for a rate of 2.5625 bits per character.

Ziv-Lempel compression. Ziv-Lempel compression is based on finding exact repetitions of sequences. The algorithm reads the input data and builds a dictionary of observed sequences while looking for repetitions. The encoding is done by writing strings to the output the first time they are observed (i.e., for new data) and writing special codes or references when a repetition is found (i.e., for old data) (Sayood, 1996, pp. 100-113). For example, a long message which used the words ABRACADABRA, ABRASIVE, LABRADOR, ABRAHAM, and SABRA might have a special encoding for the string ABRA.
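The run-length example above can be reproduced with a short sketch (function names are ours, chosen for illustration):

```python
def rle_encode(data):
    """Collapse runs of equal elements into (value, run_length) pairs."""
    runs = []
    for x in data:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([x, 1])       # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    return [v for v, count in runs for _ in range(count)]

data = [0, 0, 0, 1, 1, 3, 3, 3, 3, 7, 7, 7, 10, 10]
runs = rle_encode(data)   # [(0, 3), (1, 2), (3, 4), (7, 3), (10, 2)]
```

Decoding the run list recovers the original 14-element sequence exactly, which is why RLE is lossless.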

MPEG and JPEG. MPEG gives excellent compression for static scenes. It relies on the storage of moderately compressed Intra Pictures (I-Pictures) every 15th frame, then Forward Predicted P-Pictures, storing only the change vectors of parts of the pictures, and finally Bidirectional B-Pictures, which are generated estimation pictures averaging between the I-Pictures and the P-Pictures (Horne, 1999). Because the stored information consists of changes to I-Pictures, the possible loss of an I-Picture will corrupt the whole data stream that depends on it. JPEG addresses each incoming video frame as a separate picture, compressing with a predictable, pre-settable compression rate, leading to a predictable file size (Horne, 1999).

The Burrows-Wheeler Transform (BWT) for permutations. We consider a permutation p = [5 4 3 7 1 6 2]. We note that BWT, as originally defined in (Burrows & Wheeler, 1994), was designed to be used for multi-set permutations. BWT generates a matrix G whose rows are consecutive cyclic left-shifts of p. Then it sorts the rows of G lexically, in ascending order, obtaining a matrix H.

The matrix G:
(1) 5 4 3 7 1 6 2
(2) 4 3 7 1 6 2 5
(3) 3 7 1 6 2 5 4
(4) 7 1 6 2 5 4 3
(5) 1 6 2 5 4 3 7
(6) 6 2 5 4 3 7 1
(7) 2 5 4 3 7 1 6

The matrix H:
(5) 1 6 2 5 4 3 7
(7) 2 5 4 3 7 1 6
(3) 3 7 1 6 2 5 4
(2) 4 3 7 1 6 2 5
(1) 5 4 3 7 1 6 2
(6) 6 2 5 4 3 7 1
(4) 7 1 6 2 5 4 3

We observe that the original sequence appears in the fifth row (i = 5) of H. Denote by S the second column of H, and by L the last column of H. From the pair (i, S) or (i, L) we can recover the initial permutation p using one of the following algorithms (Arnavut & Magliveras, 1997).

If the pair (i, S) is transmitted we use:
1. p[1] = i
2. for j = 2 to n
       p[j] = S[p[j-1]]

If the pair (i, L) is transmitted we use:
1. p[n] = L[i]
2. for j = 1 to n-1
       p[n-j] = L[p[n-j+1]]

We might use whichever of S and L has the lowest entropy, at the cost of an extra bit to specify which. For LPSA (a generalization of BWT), all the columns other than the first are examined (at least for prime n such as 7) and the lowest-entropy one is used. This requires transmitting the index of the chosen column as well.
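The two recovery procedures above can be sketched in Python, with 0-based lists emulating the 1-based pseudocode (all function names are ours):

```python
def bwt_matrix(p):
    """Rows are consecutive cyclic left-shifts of p, sorted lexically (H)."""
    n = len(p)
    h = sorted(p[j:] + p[:j] for j in range(n))
    i = h.index(p) + 1              # 1-based row of H holding the original p
    return h, i

def recover_from_S(i, S):
    """Recover p from the row index i and the second column S of H."""
    n = len(S)
    p = [0] * (n + 1)               # index 0 unused: 1-based like the pseudocode
    p[1] = i
    for j in range(2, n + 1):
        p[j] = S[p[j - 1] - 1]      # S itself is stored 0-based
    return p[1:]

def recover_from_L(i, L):
    """Recover p from the row index i and the last column L of H."""
    n = len(L)
    p = [0] * (n + 1)
    p[n] = L[i - 1]
    for j in range(1, n):
        p[n - j] = L[p[n - j + 1] - 1]
    return p[1:]

p = [5, 4, 3, 7, 1, 6, 2]
H, i = bwt_matrix(p)                # i == 5, as in the worked example
S = [row[1] for row in H]           # second column: [6, 5, 7, 3, 4, 2, 1]
L = [row[-1] for row in H]          # last column:   [7, 6, 4, 5, 2, 1, 3]
```

Both (i, S) and (i, L) reproduce the original p = [5 4 3 7 1 6 2] from the worked example.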

CHAPTER 2: PERMUTATIONS

Permutations have a rich combinatorial structure. A permutation can be represented in many equivalent ways: as a word (sequence), a function, a collection of disjoint cycles, a matrix, etc. It has been shown that transformations based on permutations can be used in data compression. In (Burrows & Wheeler, 1994), the authors introduced one such transformation, which is referred to as the Burrows-Wheeler Transformation (BWT). Their approach, called the Block Sorting Lossless Data Compression Algorithm, combined BWT with move-to-front coding and a standard compressor such as Huffman coding or arithmetic coding. The algorithm was shown to achieve compression rates similar to those of context-based lossless methods, but at execution times comparable to those of fast general-purpose lossless compressors, such as Ziv-Lempel techniques. The work by Burrows and Wheeler was later continued and improved by Deorowicz (Deorowicz, 2002). Arnavut and Magliveras introduced the lexical permutation sorting algorithm (LPSA), a generalized version of BWT (Arnavut & Magliveras, 1997). Among other things, they report on experiments comparing BWT and LPSA for encoding permutations presented as lists, where they find an advantage for LPSA but not for BWT. Socek used permutations to compress fairly static videos (Socek, 2006).

Our main purpose in studying permutations was to find a representation that has lower entropy compared to other representations. In this chapter, we present some concepts and applications related to permutations, including experiments inspired by those in (Arnavut & Magliveras, 1997). In Chapter 3 we study other improvements in the representation of a permutation, including a method based on linear regression. Then we focus on video compression using permutations.

2.1 Representations and Encodings of Permutations

We begin with a representation similar to methods described in (Bona, 2004) using paths. This is a variant of Lehmer codes (Lehmer, 1960). Suppose p = [p_1 p_2 ... p_n] is a permutation. Our method uses the list c = [c_1 c_2 ... c_n] where c_j is the rank of p_j among the first j entries; in other words, the number of entries p_i with i <= j and p_i <= p_j.

Theorem 1 (Properties of our representation)
1) If p = [1 2 3 ... n-1 n] then c = p = [1 2 3 ... n-1 n]
2) If p = [n n-1 n-2 ... 2 1] then c = [1 1 1 ... 1 1]
3) c_i <= i for all i from 1 to n
4) If p_i - p_{i+1} = 1 then c_i = c_{i+1}
5) If p_{i+1} - p_i = 1 then c_{i+1} - c_i = 1
6) c_n = p_n
7) We can recover p from c.

Only the last property needs justification. We provide this in the second algorithm below. Informally, the fact that c_n = p_n means that we can at least recover p_n. Then, from c_{n-1} and the remaining entries, we can recover p_{n-1}, and so on.

Algorithm EncodePerm(n, perm)
    Initialize empty array c of length n
    c[1] = 1
    for i = 2 to n
        sorted_perm = Sort(perm[1:(i-1)])
        pos = PositionInsertionInSorted(sorted_perm, perm[i])
        c[i] = pos
    return c

Algorithm DecodePerm(n, c)
    Initialize array left of length n
    for i = 1 to n
        left[i] = i
    Initialize empty array p of length n
    p[n] = c[n]
    for i = n-1 to 1
        left = EraseElem(left, c[i+1])    // remove the element at position c[i+1]
        p[i] = left[c[i]]
    return p

Next, we give some examples related to our method to encode a permutation. We observe that the number of distinct elements decreases as a result of this type of encoding. We should have a decrease in entropy, so if we use variable-length encoding (like Huffman encoding), our method would give compression compared to the original representation of the permutation, provided it is not the identity permutation.
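A direct Python transcription of the two procedures might look like the following sketch (lists are 0-based here; `bisect.insort` stands in for Sort and PositionInsertionInSorted):

```python
import bisect

def encode_perm(perm):
    """c[j] = rank of perm[j] among the first j entries (1-based ranks)."""
    c, seen = [], []
    for x in perm:
        bisect.insort(seen, x)          # keep the prefix sorted
        c.append(seen.index(x) + 1)     # rank of x within the prefix
    return c

def decode_perm(c):
    """Invert the encoding: c[n] = p[n], then work backwards."""
    n = len(c)
    left = list(range(1, n + 1))        # values still unassigned
    p = [0] * n
    p[n - 1] = c[n - 1]
    for i in range(n - 2, -1, -1):
        left.pop(c[i + 1] - 1)          # EraseElem: drop the position used by p[i+1]
        p[i] = left[c[i] - 1]
    return p
```

On the two examples in the text, encode_perm([2, 4, 3, 1, 6, 5]) gives [1, 2, 2, 1, 5, 5] and decode_perm inverts it exactly.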

Example. For the permutation [2 4 3 1 6 5] we get the representation [1 2 2 1 5 5]. For the permutation [6 1 2 4 3 5] we get the representation [1 1 2 3 3 5].

We have already mentioned the BWT and LPSA algorithms used in data compression. As mentioned, a study in the literature compares their performance when the input is a permutation (Arnavut & Magliveras, 1997). We compare our method to the other two and get better results.

The lossless Burrows-Wheeler Compression Algorithm (Burrows & Wheeler, 1994) has received considerable attention over recent years for its effectiveness. This algorithm is based on a permutation of the input sequence - the Burrows-Wheeler Transform - which groups symbols with a similar context close together. In the original version, this permutation was followed by a Move-To-Front (MTF) transformation and a final entropy coding stage. The Move-To-Front transformation replaces the input symbols with corresponding ranking values. An MTF replacement, called Inversion Frequencies, was introduced in (Arnavut & Magliveras, 1997), and Deorowicz presented another MTF replacement, named Weighted Frequency Count (Deorowicz, 2002).

BWT can be generalized to the Lexical Permutation Sorting Algorithm (LPSA). When the message has length n, BWT produces a certain cyclic permutation of length n. LPSA considers all ϕ(n) generators of the cyclic group generated by that permutation, where ϕ(n) is Euler's ϕ function of n. It is optimal to consider the case that n is prime, since then ϕ(n) = n-1. If we transmit a group generator and an exponent x, 1 <= x <= n, in addition to the information needed for BWT, we can reconstruct the input permutation. We could choose the generator that is the cheapest among the ϕ(n) possibilities, and therefore we could obtain better compressibility than BWT (Arnavut & Magliveras, 1997). This involves an additional cost, because we have to save n+2 elements, instead of n, in order to be able to recover the original permutation. This increases the rate by 2log(n)/n bits per character, which is not significant if n is very large.

If we apply our representation to a randomly generated set of permutations of order n, we obtain an average entropy of the outputs that is smaller than both average entropies for BWT and LPSA, with the same set of permutations taken as inputs. We have conducted a number of experiments for random permutations of selected degree n (n prime) to compare the behavior of the three algorithms. On average, our method is better than both LPSA and BWT in this rather unusual setting. Tables 1 and 2 present some results of these experiments for given values of n. For comparison, we consider the cost of transmitting a string Y = (Y[1], Y[2], Y[3], ..., Y[n]) as either the entropy of Y or the entropy of the difference string δY = (Y[1], Y[2]-Y[1], Y[3]-Y[2], ..., Y[n]-Y[n-1]) (the δ-transformation).

We pause to make a few observations about encoding permutations. A permutation presented as a list Y of n numbers is the worst possible case for compression, as there are no repeated symbols. Anything we do will be an improvement on this. In particular, δY may have some repeated entries and, if it does not, it is no better but no worse than Y itself. If we truly have random permutations, as in the experiments we extend, then the best course is to transmit an integer from 0 to n!-1, representing the position of the selected permutation in dictionary order. There are easy, well-known methods to reverse this, and information theory shows that nothing could be more efficient (Knuth, 1981, pp. 12-13).
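The dictionary-order index mentioned above can be computed and inverted via the factorial number system. This is an illustrative sketch with our own function names, not the dissertation's code:

```python
from math import factorial

def perm_rank(p):
    """0-based index of a permutation of 1..n in lexicographic order."""
    n = len(p)
    rank = 0
    remaining = sorted(p)
    for j, x in enumerate(p):
        # number of unused values smaller than x, weighted by (n-1-j)!
        rank += remaining.index(x) * factorial(n - 1 - j)
        remaining.remove(x)
    return rank

def perm_unrank(rank, n):
    """Invert perm_rank: rebuild the permutation digit by digit."""
    remaining = list(range(1, n + 1))
    p = []
    for j in range(n - 1, -1, -1):
        f = factorial(j)
        p.append(remaining.pop(rank // f))
        rank %= f
    return p
```

Transmitting the rank costs at most ceil(log2(n!)) bits, which is the information-theoretic minimum for a uniformly random permutation.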

Our representation always begins with a 1. This could be deleted with no loss of information and some improvement in length. However, we do not do this, for the following reason. Even in the least efficient representation as Y, we could drop the final entry Y[n] and recover it at the other end as the one integer missing. This would be simple to do but tedious, if n is very large. In the representation δY = (Y[1], Y[2]-Y[1], Y[3]-Y[2], ..., Y[n]-Y[n-1]) we could do the same. When n is large, the savings in transmission rate is not significant. So we use all the methods considered in the form which converts Y into another vector of length n. We give BWT and LPSA the advantage of not considering the added cost of transmitting an extra one or two parameters per transmission. Again, this is not significant for very large n.

The next tables present the average cost for transmitting a permutation in each of the cases that we analyzed. We observe that, on average, our representation is better than the other representations we considered, even with the extra advantage that we gave BWT and LPSA by neglecting the extra elements needed for their representations. Future research could be directed towards generating larger sets of random permutations and/or permutations of higher order. We could also repeat the experiments many times, for fixed parameters, and take the average behavior over a large number of executions. Finding patterns or combinatorial formulas related to the algorithms we considered could help us in proving the improvements in performance that we obtained. We believe that our representation is better than the other representations we considered, based on the experiments that we conducted. Every time we ran the algorithms for sets of permutations of degree n (n prime), we obtained better results with our method than with the other methods of representation. The next two tables give the results obtained by applying the three algorithms to the same sets of random permutations of order n. (The BWT last column requires n+1 elements and the minimum-entropy column n+2 elements; only n elements are counted in the costs below.)

n   | Original repr. (Y) | Original repr. (δY) | BWT last column (δY) | Min. entropy column (δY) | Our method (δY) | Our method (Y)
11  | 3.459432 | 2.881589 | 2.897254 | 2.662373 | 2.4511   | 2.321322
19  | 4.247928 | 3.723594 | 3.689352 | 3.411038 | 3.231313 | 3.085691
53  | 5.72792  | 5.144376 | 5.171265 | 4.895346 | 4.638808 | 4.527508
101 | 6.658211 | 6.091978 | 6.082887 | 5.876007 | 5.561401 | 5.410621
199 | 7.636625 | 7.072666 | 7.061392 | 6.899571 | 6.549617 | 6.365163
293 | 8.194757 | 7.608373 | 7.621553 | 7.47306  | 7.083711 | 6.913937

Table 1: Comparison of the average cost per element, considering just n elements (neglecting the extra elements for the BWT and Min. entropy columns), for 50 random permutations of order n

n   | Original repr. (Y) | Original repr. (δY) | BWT last column (δY) | Min. entropy column (δY) | Our method (δY) | Our method (Y)
11  | 3.459432 | 2.903297 | 2.903179 | 2.641596 | 2.508097 | 2.370673
19  | 4.247928 | 3.692994 | 3.68405  | 3.386754 | 3.225618 | 3.073631
53  | 5.72792  | 5.158698 | 5.161551 | 4.905767 | 4.661294 | 4.487254
101 | 6.658211 | 6.08937  | 6.082123 | 5.877019 | 5.57198  | 5.399066
199 | 7.636625 | 7.060105 | 7.063859 | 6.897279 | 6.542882 | 6.359778
293 | 8.194757 | 7.619876 | 7.618466 | 7.477393 | 7.091816 | 6.919428

Table 2: Comparison of the average cost per element, considering just n elements (neglecting the extra elements for the BWT and Min. entropy columns), for 500 random permutations of order n

CHAPTER 3: PERMUTATIONS AND DATA COMPRESSION 3.1 Introduction The applications of permutations to image and video compression constituted a motivation to study areas related to these topics. In order to understand how permutations can be applied to image processing in general, we give some definitions, examples and remarks based on (Arnavut, 1995). By a multiset M based on a set S, we mean a pair (S, f) where f : S N is a function from S into N and f is called the frequency (multiplicity) function. If the underlying set has an implicit or explicit linear order, say S = {a 1, a 2,..., a n } with a 1 < a 2 <... < a n, we sometimes denote the multiset by M = ( ) ( ) ( ). Example For the multiset M = T A 3 L 2 H S 2 E 2, TALLAHASSEE is a multiset permutation of M and so is SEESTAHALLA. If M = ( ) ( ) ( ), and f(a x ) = f x then the number of distinct multiset permutations of M is ( ). This quantity is also called a multinomial coefficient and is sometimes denoted by ( ) Given a two-dimensional image I, by raster scanning the image I, we can convert it to a linear array I, which can be considered a multiset permutation of the multiset of gray 14

values. For a 256-gray-valued image, I will be a multiset permutation of the multiset (0^f₀, 1^f₁, ..., 255^f₂₅₅), where fᵢ represents the frequency of gray value i in the image. An image could be compressed by sorting the data represented by its pixel values, and then taking the differences between consecutive elements. If we assume that all possible 256 intensities occur in an image, then, after sorting the image, we get blocks of f₀ elements with value 0, f₁ elements with value 1, etc. Because adjacent elements will be equal, we could use run-length encoding to represent the sorted image efficiently. If differences of consecutive elements are taken (referred to as the residual data/image in this section), we obtain an array of 0s and 1s, and due to the high frequency of 0s, this data can also be coded at a low rate. Simulation results have shown that such data can be coded at a rate of 0.015 to 0.035 bits per pixel (Arnavut, 1995).

In order to recover the original image, we need to apply a permutation to the sorted image. To get compression, this permutation has to be represented at a small cost; otherwise, we incur an overall loss. Suppose that the data, or residual data, for an 8-bit image is sorted in ascending order. If all possible gray values occur in the image, there will be at most 256 blocks, and at most 512 blocks in the case of its residual image. In our case, a particular block consists of the set of all indices for which the given data X = (x₁, x₂, ..., xₘ) has a particular fixed gray value. For example, if X = (x₁, ..., x₉) = (2, 0, 1, 1, 3, 3, 0, 2, 0), then there are four blocks: B₀ = {2, 7, 9}, B₁ = {3, 4}, B₂ = {1, 8} and B₃ = {5, 6}.
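The sort-and-difference idea above can be illustrated with a short Python sketch, reusing the example X = (2, 0, 1, 1, 3, 3, 0, 2, 0); the `rle` helper is our own, and a real image would supply the raster-scanned pixel values instead of the toy array:

```python
import numpy as np

def rle(arr):
    """Run-length encode a 1-D array as (value, run_length) pairs."""
    arr = np.asarray(arr)
    changes = np.flatnonzero(np.diff(arr)) + 1        # positions where the value changes
    starts = np.concatenate(([0], changes))
    lengths = np.diff(np.concatenate((starts, [len(arr)])))
    return list(zip(arr[starts].tolist(), lengths.tolist()))

pixels = np.array([2, 0, 1, 1, 3, 3, 0, 2, 0])        # the example X from the text
sorted_pixels = np.sort(pixels)                       # [0 0 0 1 1 2 2 3 3]
delta = np.diff(sorted_pixels)                        # [0 0 1 0 1 0 1 0]: only 0s and 1s
print(rle(delta))                                     # mostly runs of 0s, cheap to code
```

With all intensities present, consecutive sorted values differ by 0 or 1, which is what makes run-length coding of the differences so cheap.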

If we have the blocks and the corresponding gray values, as in the previous example, we can reconstruct the image. We can also reconstruct the image from the sorted X and the permutation P = [2 7 9 3 4 1 8 5 6], obtained by concatenating the blocks.

The objectives of the following study are to determine an effective way to represent a permutation resulting from an image, and to analyze some methods to compress a video sequence using permutations. We compare some types of regression, to see which gives a better approximation of a certain permutation. Then we discuss the possibility of using some restricted permutations which are cheap to represent. Further, we analyze an algorithm that uses permutations of the pixels of each frame to transmit a video sequence. We start by presenting some applications of permutations that illustrate their importance. Our final goal is to compress fairly static videos, consisting of a single scene with low motion.

We consider a signal obtained by reading an image in raster scan order, by rows (or by columns, if we prefer). When a sorting permutation acts on this signal, the correlation of the symbols is the best possible, and run-length encoding (RLE) can be used. But in order to recover the initial image, we need to apply the inverse of the sorting permutation. If a permutation is of degree N (i.e., if the total number of pixels in an image I is N), we could transmit it as a sequence of N unique ⌈log₂ N⌉-bit indices. The total transmission cost in this case is N⌈log₂ N⌉ bits. If both the sender and the receiver preselect an ordering of the permutations in S_N, such as the lexicographic ordering, each permutation has its own index according to that ordering. If the index is small, transmitting the index itself could be cheaper than the previous method. The cost

of this transmission is ⌈log₂ N!⌉ bits in the worst case. So transmitting a permutation is usually not efficient, unless we find a cheap way to represent it.

Burrows and Wheeler introduced a permutation-based transformation (called the Burrows-Wheeler Transformation, or BWT), which serves as a compression primitive (Burrows & Wheeler, 1994). In (Arnavut & Magliveras, 1997), the authors introduced the Lexical Permutation Sorting Algorithm (LPSA), a generalized version of BWT. Sample reordering is also used in JPEG and MPEG to reorder transform coefficients (e.g., DCT coefficients) in a fixed order that allows for more efficient entropy coding of the symbols (e.g., zig-zag ordering).

Secret permutations are frequently used in digital image encryption applications, to shuffle the positions of the pixels, DCT/wavelet coefficients, Huffman table codewords, and even blocks or macroblocks. Some of these algorithms are based solely on secret permutations generated by a secret key, and because of that they have been criticized as insecure against several types of cryptanalysis (known-plaintext, chosen-plaintext and chosen-ciphertext attacks). Permutations are also used as an encryption primitive in modern symmetric-key cryptography. Most symmetric-key block ciphers rely on permutations of symbols (e.g., bits) in order to provide data diffusion. Also, there are cryptosystems based on transformations that use permutation groups (Socek, Magliveras, Ćulibrk, Marques, Kalva, & Furht, 2007). Daniel Socek developed digital video encryption algorithms based on correlation-preserving permutations, in which the current frame is permuted using the sorting permutation of the previous frame (Socek, 2006). Such applications are of interest to us, and we shall come back to them in section 3.6.
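The two transmission costs above can be compared with a small Python sketch; `lex_index` (our own helper, a standard Lehmer-code ranking) computes the 0-based lexicographic index of a permutation, here for the block permutation P from the earlier example:

```python
from math import ceil, factorial, log2

def lex_index(perm):
    """0-based lexicographic rank (Lehmer code): for each position i, count the
    later elements smaller than perm[i] and weight that count by (N-1-i)!."""
    n, rank = len(perm), 0
    for i in range(n):
        smaller = sum(1 for j in range(i + 1, n) if perm[j] < perm[i])
        rank += smaller * factorial(n - 1 - i)
    return rank

perm = [2, 7, 9, 3, 4, 1, 8, 5, 6]         # the permutation P from the block example
n = len(perm)
print(lex_index(perm))                     # 69988
print(n * ceil(log2(n)))                   # naive cost: N * ceil(log2 N) = 36 bits
print(ceil(log2(factorial(n))))            # index cost, worst case: ceil(log2 N!) = 19 bits
```

For small N the index is cheaper in the worst case (19 vs. 36 bits here), but both costs grow quickly, which is why a structural representation of the permutation is sought instead.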

3.2 Two-Phase Linear Regression

In statistics, regression analysis is used to model relationships between random variables and to determine the magnitude of those relationships. It can be used to make predictions based on the models. Linear regression is a statistical method for modeling the relationship between two or more random variables using a linear equation. One variable is considered the dependent variable and the others are considered independent. In sections 3.2-3.5, the term residuals refers to the residuals from linear regression.

In the case of two-phase linear regression, we are interested in determining the point of separation between the two regression functions, which is not known a priori. This can be done by inspection. If the problem requires a more accurate approximation, the data set can be separated at different x values, and after calculating regression functions for the two subsets resulting from each separation, the x value giving the best fit may be chosen as the point of separation (Vieth, 1989). Linear regression is used either to predict future trends or to describe past events mathematically. Two models of two-phase linear regression, which provide valuable insight into the existence, timing, and significance of breaks in the data, are piecewise and segmented regression. For piecewise two-phase linear regression, the two best-fit lines are joined at the change point. It follows that there is always a difference in slope between the two new data sets (otherwise the regression is considered one-phase, or simple) (Dunnigan, Hammen, & Harris, 1997). Segmented two-phase linear regression does not require that the trend lines be joined at the change point, therefore allowing any change in slope and intercept. In this case the slopes of the two lines could even be equal.

The response of many variables can be described by two linear regression functions with a joint point (the point where the first regression line intersects the second) or a change point (the point where there is a step from one regression line to the next). These points are unknown a priori (Vieth, 1989). Regarding the joint or change point, we have to distinguish between two cases: the point lying between the x values of two observations, and the point being identical with the x value of an observation.

In previous experiments (Mihnea, 2008) we compared the performance of two types of two-phase linear regression: piecewise and segmented. The segmented one gives a better approximation (a smaller sum of squared residuals), so we used it for representing a permutation resulting from an image. For data where there is a sudden jump in the y values, the requirement that the lines meet, in the case of two-phase piecewise linear regression, is a restriction that worsens the approximation. This is evident if we look at the first cluster in the graph of Figure 1, in section 3.3, where the first portion of the data (first cluster) can be better approximated by two fitted lines that do not meet. The same is true for clusters 3 and 4 in the same graph, while for the second cluster, visual inspection tells us that simple linear regression is more appropriate. With an increase in complexity, we could combine simple linear regression and two-phase segmented linear regression to obtain a cheaper representation of the data considered here (a permutation resulting from sorting an image). A direction for future research would be to use this combination on blocks of images. More details are given at the end of section 3.5.
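Segmented two-phase fitting can be sketched in a few lines of Python (the function names and the synthetic data with a jump are our own; the break-point search minimizes the total sum of squared residuals, the criterion described in section 3.5):

```python
import numpy as np

def ssr_line(x, y):
    """Sum of squared residuals of an ordinary least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    r = y - (intercept + slope * x)
    return float(r @ r)

def segmented_fit(x, y):
    """Two-phase *segmented* regression: the two lines need not meet.
    Try every admissible break point j (first line on x[:j], second on x[j:])
    and keep the split that minimizes the total SSR."""
    best_j, best_ssr = None, np.inf
    for j in range(2, len(x) - 1):            # at least 2 points on each side
        total = ssr_line(x[:j], y[:j]) + ssr_line(x[j:], y[j:])
        if total < best_ssr:
            best_j, best_ssr = j, total
    return best_j, best_ssr

x = np.arange(1, 21, dtype=float)
y = np.where(x <= 10, 2 * x, 2 * x + 50)      # sudden jump in y after x = 10
j, ssr = segmented_fit(x, y)
print(j, ssr)                                 # split found at j = 10, SSR near 0
```

With a sudden jump like this, forcing the two lines to meet (piecewise regression) could not reach an SSR near zero, which is the advantage of the segmented variant noted above.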

3.3 Representation of a Permutation Resulting from an Image

We consider a signal obtained by reading an image in raster scan order (by rows, after observing that this order gives a better correlation between adjacent pixels than reading by columns). If a sorting permutation acts on this signal, we obtain a sorted image. In order to recover the initial image, we need to apply the inverse of the sorting permutation. The sorted image and the inverse permutation are sufficient to recover the original image.

The images and videos used in this chapter are frequently used in research on image and video processing. They can be found in many databases of images and/or videos made public on the Internet, such as (Burkardt) for images and (Kalva) for videos. For easier processing, we converted the images to .pgm and the videos to .yuv, formats that allow the extraction of pixels from images and of frames from videos. The experiments were performed using the Y component of the videos, which determines the brightness of the color (referred to as luminance or luma). The data used to analyze an inverse permutation consists of the images Balloons and Lena (Burkardt). We used simple linear regression, two-phase segmented linear regression and cubic spline regression to analyze the inverse permutation resulting from ordering the image Lena.

Figure 1: Linear patterns in the representation of the inverse permutation for the image Lena

The y values of the permutations in the graphs of Figures 2 and 3 represent the positions of the pixels of the sorted image in the original image. Their order follows the order of the pixels in the sorted version of the image. The x values in the same graphs are the numbers 1, 2, 3, ..., n, where n is the total number of pixels in the image. We call a cluster the portion of this permutation corresponding to one intensity. The graph of the inverse permutation breaks, and usually goes back down, where the intensities change in the sorted image (where a new cluster begins). There are nice patterns in these graphs, which are illustrated in Figures 1-3. Figure 1 gives a representation of the first four clusters of the inverse permutation, corresponding to the first intensities in the sorted version of the image Lena. The graph of the inverse permutation for the image Balloons (Figure 2) is represented on a partial domain.

Figure 2: Results for the image Balloons (the initial image and the final image, after ordering the pixels from low to high values). The image has 480 × 640 = 307,200 pixels. After ordering, the first (lowest) pixels in the sorted image have the following indices in the initial image: 231129, 232410, 233050, 233691, 234331, 234332, 234972, 236253, 236892, 236893.

Figure 3: Results for the image Lena (the initial image and the final image, after ordering the pixels from low to high values). The image has 222 × 208 = 46,176 pixels. After ordering, the first (lowest) pixels in the sorted image have the following indices in the initial image: 19243, 19656, 20276, 20277, 20484, 20485, 21500, 24424, 26070, 26794.

For the image Balloons, the linear patterns appear over the whole domain, but they become too dense to be represented in one graph, because the domain is too large for all of it to be shown clearly. The sorted image and the y values from the graph of the inverse permutation are sufficient to recover the original image.

We perform linear regression on each cluster of the image Lena (Figure 3), save the coefficients as a pair of real numbers (intercept, slope), round the residuals, and save them as integers for transmission purposes. Then we recover the initial permutation from these saved values. We observe that if we round the residuals and the recovered data, all of the errors (from rounding the residuals to integers) are equal to zero for simple linear regression in this case.

We compare the initial values (y values from the graph in Figure 3) that we need to transmit with the final values (intercept, slope and residuals), for the first cluster (lowest intensity). The sum of the initial values is 1,208,238, and after applying linear regression, the sum of the new (absolute) values needed to recover the permutation is 93,254.94. We have 1,208,238/93,254.94 = 12.9563, so the new sum is about 13 times smaller. The sum of the absolute values of the residuals is 74,340, and 1,208,238/74,340 = 16.2529, about 16 times smaller. The range of the residuals is also smaller than the range of the initial values (about half of it) for the first cluster. The intercept in this case is 18,335.9115 and the slope is 579.0263.

We apply two-phase segmented linear regression to the first cluster of the permutation and check whether the results improve significantly. In this case, we shall have two pairs (intercept, slope), but the residuals could be much smaller because of a better approximation. Indeed, their absolute sum is 35,403, which is 1,208,238/35,403 =

34.1281, about 34 times smaller than the sum of the initial values, and 74,340/35,403 = 2.0998, about 2 times smaller than the sum of the absolute residuals from one-phase (simple) regression. In this case, we need to save the residuals plus five values, the break point, two intercepts and two slopes: 26, (20,282.554, 35,631.5011) and (377.506, 105.6418). For the next cluster/intensity, we could consider x values starting from 1, 2, 3, ..., and this way the y-intercept (which we also need to save) would be smaller (the slope would probably stay the same). Knowing the dimension of each cluster would help us shift each cluster to the left, so that its first elements have x values 1, 2, 3, ... We don't need to send or save these dimensions, because they can be calculated from the sorted image, which can be sent as pairs (intensity, length of run). The vector formed by the run lengths gives us these dimensions.

3.4 Rounding the Residuals to Integers

For simple linear regression, if we round the residuals and the recovered data we get all the errors (Original - Recovered) equal to zero, so rounding does not change the recovered data in the case of the image Lena. For two-phase segmented linear regression we get rounding errors for just 2 out of 46,176 pixels, which means 99.9957% accuracy for the image Lena. The errors arise from residuals that end in .5, as can be seen next.

Recovered = Intercept + Slope · X + Residuals
Recovered_From_Rounded = Intercept + Slope · X + Rounded_Residuals

If abs(Residuals[i] - Rounded_Residuals[i]) < 0.5 then abs(Recovered[i] - Recovered_From_Rounded[i]) < 0.5.

So Rounded(Recovered[i]) = Rounded(Recovered_From_Rounded[i]) = Original[i].

If abs(Residuals[i] - Rounded_Residuals[i]) = 0.5 then abs(Recovered[i] - Recovered_From_Rounded[i]) = 0.5, and we could have some problems, but these can be solved in the program while saving the rounded residuals.

Example. If Recovered_From_Rounded[i] = 100.5 then Rounded(Recovered_From_Rounded[i]) = 101 (we could use/construct a function that always rounds up for numbers that end in .5), while Original[i] could be 100. These small errors (+1 or -1 in this case) are not significant and can be detected and corrected by adding/subtracting 1 to/from the corresponding Rounded_Residuals[i] value. In this case, we set Rounded_Residuals[i] := Rounded_Residuals[i] - 1, so Recovered_From_Rounded[i] decreases by 1 as well: Recovered_From_Rounded[i] := Recovered_From_Rounded[i] - 1 = 100.5 - 1 = 99.5, so Rounded(Recovered_From_Rounded[i]) = 100 = Original[i].

In the case of the image Lena, one-phase (simple) linear regression didn't give any errors, while for two-phase segmented linear regression we got two errors: -1 in position 41978 and +1 in position 9728. We could improve the compressibility of the data by also rounding the coefficients to integers. We could also use quantization (Sayood, 1996, p. 169): approximate them with multiples of a specific integer and then adapt the residuals accordingly. This could make the data even cheaper. The proposed algorithm to represent a permutation cheaply is given next.
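The per-cluster fit-round-recover cycle described above can be sketched in Python (a minimal illustration with our own function names; the toy cluster reuses the first Balloons indices, and NumPy's `rint` stands in for the rounding function discussed in the text, so the rare end-in-.5 cases would still need the ±1 adjustment described above):

```python
import numpy as np

def encode_cluster(y):
    """Fit one cluster of the inverse permutation (x = 1, 2, ..., len(y)) with a
    simple regression line; keep (intercept, slope) plus integer-rounded residuals."""
    x = np.arange(1, len(y) + 1, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = np.rint(y - (intercept + slope * x)).astype(int)
    return intercept, slope, residuals

def decode_cluster(intercept, slope, residuals):
    """Recover the cluster: fitted line + rounded residuals, rounded again."""
    x = np.arange(1, len(residuals) + 1, dtype=float)
    return np.rint(intercept + slope * x + residuals).astype(int)

# toy cluster: positions, in the original image, of the pixels of one intensity
y = np.array([231129, 232410, 233050, 233691, 234331, 234332, 234972])
intercept, slope, residuals = encode_cluster(y.astype(float))
recovered = decode_cluster(intercept, slope, residuals)
print(np.array_equal(recovered, y))        # True: exact recovery from rounded residuals
```

Because every rounding error here is strictly below 0.5, the final rounding snaps each value back to the original integer position, exactly as argued in the text.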

3.5 An Algorithm for the Representation of the Permutation

Next, we give an algorithm that uses regression for the representation of a permutation.

1) We cluster the sorted (by pixel value) image into clusters corresponding to each intensity value (one-dimensional clusters).

2) We separate the permutation (needed to recover the initial image and represented by the y values in the graph) into groups corresponding to the previous clustering of the sorted image.

3) We take advantage of the possible correlation within each cluster from (2) in one of the following ways:

i) We subtract each element (y) from the previous one within each cluster and transmit just the first element of the cluster and the differences (δ-transformation).

ii) We apply linear (simple or two-phase) or non-linear regression to the elements of each cluster, considered in two dimensions as in the graph of the permutation (we need to transmit the coefficients and the residuals = predicted y - actual y). We determine the compressibility of the data for both types of regression and choose the one that gives the better compressibility.

In order to get smaller values for the y-intercepts, we shift clusters 2, 3, 4, ... to the left, such that their x values start with 1, 2, 3, ... Case (ii) is the one that we implemented and analyzed, because we consider it better than case (i) (adjacent values in a cluster might not be well correlated). We don't need to send or save the dimensions of the clusters (or by how much we shift each cluster), because they can be calculated from the sorted image. We don't need to send or save the x values in the graph of the permutation, because we consider them ordered from 1, 2, 3 to n, where n is the total number of pixels in the image.
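Case (i), the δ-transformation, is straightforward to sketch (a toy example reusing the first Lena cluster indices from Figure 3; the function names are our own):

```python
import numpy as np

def delta_encode(cluster):
    """Case (i): keep the first element and the successive differences."""
    c = np.asarray(cluster)
    return np.concatenate(([c[0]], np.diff(c)))

def delta_decode(d):
    """Invert the delta-transformation with a running sum."""
    return np.cumsum(d)

y = np.array([19243, 19656, 20276, 20277, 20484, 20485, 21500])
d = delta_encode(y)
print(d.tolist())                          # [19243, 413, 620, 1, 207, 1, 1015]
print(np.array_equal(delta_decode(d), y))  # True
```

The differences are far smaller than the raw positions, but as noted above they are only cheap when adjacent values in a cluster are well correlated, which motivated choosing case (ii) instead.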

Suppose we have an image with w × h pixels. The w × h residuals obtained from 3(ii) are much smaller (with a much smaller range) than the values of the original permutation, and they can be rounded to integers, with rare ±1 adjustments of the residuals in case of a ±1 error in the recovered data, as explained in the previous section. Also, the entropy decreases, because many of these small values repeat, and they can be encoded efficiently. The only concern is the range of the coefficients. In the case of a simple linear regression fit, these coefficients are (slope, intercept) pairs of real numbers, but they are much fewer than the number w × h of residuals (2 × the number of clusters ≤ 2 × 256 = 512). So, if the number of different intensities (clusters) is small, we have just a few coefficients to transmit, and the rest are small residuals.

We could even try to set an upper bound for the slopes and see how significant the changes produced in the residuals are. In fact, the slopes are already bounded above by N = the number of pixels in the image, and they are always positive. To see this, take the worst case where, for example, the first y is 1 and the second y is N (the first cluster has two elements). The slope in this case is (N - 1)/(2 - 1) = N - 1 < N. Such cases are not likely to appear very often in an image (the number of pixels is much greater than the maximum number of clusters, 256).

A possible improvement of this method would be the removal of outliers (points that give high values for the residuals = predicted y - actual y). Depending on the number of residuals marked for transmission, some of them could be ignored and replaced with the value given by the linear function. In this case, we shall have a partial recovery of the

initial permutation, and we could approximate the rest of the pixels by the mean or median of their neighboring pixels. The visible result might not be impacted much, as a few pixels scattered over the image might not be noticeable (in this case, the bigger the frame resolution, the better).

Information                   Initial data     One-phase (simple)       Two-phase segmented
                              (permutation)    linear regression        linear regression
                                               (residuals +             (residuals + coefficients
                                               coefficients)            + break points)
Number of values              46,176           46,530                   47,061
Number of different values    46,176           11,940                   6,062
Entropy                       15.4949          13.1076                  11.7768
Sum of absolute values        1,066,134,576    92,048,112               34,782,585
                                               (8.6338% of original)    (3.2625% of original)
Minimum value                 1                -9,952                   -71,744.6
Maximum value                 46,176           18,335.91                35,631.5
Minimum value of residuals    -                -9,952                   -8,404
Maximum value of residuals    -                11,781                   8,014

Table 3: Representation of the inverse permutation for the image Lena (1)

Tables 3 and 4 show the outputs of our experiments. Table 3 gives some global results, from which we can see the differences between the initial permutation and the new data used to represent it. When we try to fit a third-degree spline to the data using cubic spline regression, the global results do not improve, so we can consider two-phase segmented linear regression a good method to use. We suppose that the change point for the two-phase linear regression is identical with the x value of an observation (x_j, j > 1). The parameters of the first regression line are estimated from the observations x₁, ..., x_j, and the parameters of the second regression line are based on the observations

x_{j+1}, ..., x_n. The break point x_j is the point for which the sum of squared residuals over both data subsets is minimal over all possible break points. We could apply the same method to blocks of an image. Also, we could sort the residuals and use the same method to represent them as a pair (sorted version, permutation).

Information                   Initial data     Cubic spline regression
                              (permutation)    (residuals + coefficients)
Number of values              46,176           46,884
Number of different values    46,176           7,279
Entropy                       15.4949          12.1255
Sum of absolute values        1,066,134,576    65,481,430 (6.1419% of original)
Minimum value                 1                -57,935.04
Maximum value                 46,176           51,612.89
Minimum value of residuals    -                -7,121
Maximum value of residuals    -                6,678

Table 4: Representation of the inverse permutation for the image Lena (2)

To further reduce the entropy of the residuals from linear regression, we fitted an autoregressive model AR(1) (Cryer & Chan, 2008) to them and then rounded the new residuals. When we did this, the entropy of the residuals decreased from 11.7017 to 9.4436 for two-phase segmented linear regression and from 13.0782 to 9.3249 for simple linear regression. We showed that the data can be recovered exactly if we round at each intermediate step of the recovery process. We could save a few bits per element in representing the residuals, but overall, the improvement is not satisfactory.
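The AR(1) step can be sketched as follows (our own least-squares fit, assuming zero-mean residuals; the synthetic series only illustrates the entropy drop, while the figures quoted above come from the actual Lena residuals):

```python
import numpy as np

def ar1_innovations(r):
    """Fit r[t] ~ phi * r[t-1] by least squares (zero-mean assumption) and
    return phi together with the rounded innovations r[t] - phi * r[t-1]."""
    r = np.asarray(r, dtype=float)
    phi = float(r[:-1] @ r[1:]) / float(r[:-1] @ r[:-1])
    return phi, np.rint(r[1:] - phi * r[:-1]).astype(int)

def entropy(values):
    """Empirical first-order entropy in bits per symbol."""
    _, counts = np.unique(np.asarray(values), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# synthetic stand-in for correlated regression residuals
rng = np.random.default_rng(0)
r = np.zeros(5000)
for t in range(1, 5000):
    r[t] = 0.9 * r[t - 1] + rng.integers(-3, 4)
r = np.rint(r).astype(int)

phi, innov = ar1_innovations(r)
print(entropy(r) > entropy(innov))         # True: the innovations have lower entropy
```

The innovations take far fewer distinct values than the correlated residuals, which is exactly the entropy reduction the experiment above measured.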

None of the results are satisfactory in this case, but further research could be directed towards using the same methods on permutations resulting from blocks of images. Our goal in such studies would be to get an entropy value lower than 8, corresponding to the number of bits it takes to represent a pixel in a monochrome image. In such cases, we think that one-phase or two-phase linear regression, or a combination of both, would give much better results. The use of blocks is left for future research because of the amount of computation involved.

The application of regression to compressing permutations corresponding to blocks of images could give better results for images characterized by low spatial frequencies (with flat areas), like those from synthetic and cartoon videos. In such cases, the number of clusters obtained for one block will be small, depending also on the size of the blocks. If the dimensions of the blocks are not very large, the residuals could also be small. If the number of clusters increases, i.e., if we have a higher number of intensities in the blocks of the image, the volume of data that we need to save in addition to the residuals increases. There is a trade-off between: 1) getting a better approximation by using two-phase segmented linear regression and saving more information in addition to the residuals (break point, two slopes and two intercepts per cluster); and 2) getting a worse approximation by using simple linear regression and saving less information in addition to the residuals (one slope and one intercept per cluster).

The effectiveness of our method also depends on the number of elements in a cluster. For clusters with 2 elements, for example, the residuals would be zero, but we have to

save the information related to the regression lines: slopes and intercepts. As the number of elements in a cluster increases, the correlation between these elements could decrease. In images with a lot of detail and high spatial frequencies, we shall have more clusters and less correlation in the data than in images with smooth areas and low spatial frequencies. The characteristics of the images we analyze will determine the block size that we choose and the type of regression that we apply when trying to compress the corresponding permutations.

3.6 Permutations and Video Compression

Another approach we considered was to compare the permutations, or inverse permutations, and the sorted images of two adjacent frames in different videos. The purpose is to exploit the temporal correlation of a video, to see if there are patterns in the differences of two consecutive frames. The data used here consists mainly of the sequence Akiyo (Burkardt), which is fairly static. We used three other videos, city, crew and harbour (Burkardt), to compare the differences between two adjacent sorted frames. In the case of fairly static sequences, these differences are small. All the videos used in the experiments have a resolution of 352 × 288 and a frame rate of 30 frames per second.

In a video where the frames do not change significantly over time, when a sorting permutation of the previous frame acts on the current frame, it produces an almost sorted frame. Once an initial permutation is transmitted, or calculated from a sent frame, the sender can use it to almost sort the next frame. Having the sorting permutation of the received frame, the receiver can use it to recover the next frame, and so on. Therefore,

just the first permutation or the first (compressed) frame needs to be sent, plus the almost sorted versions of the remaining frames, each of them sorted using the permutation of the previous frame (Socek, 2006). Next, we present two existing algorithms and discuss some possibilities for improving them.

We consider an encoder that compresses a video sequence with frames F₁, F₂, ..., Fₘ using a compression algorithm C and sends it to the decoder, which uses a decompression algorithm to obtain the original video sequence. In the following algorithms, we use these notations: Fᵢ (i = 1, 2, ..., m), the frames of the video; Pᵢ, the sorting permutations of the frames Fᵢ; and Pᵢ⁻¹, the inverses of the sorting permutations of the frames Fᵢ. Algorithm 2 is a variation of Algorithm 1. Both of them are lossless and are based on the concept of an almost sorted frame, denoting a frame that is sorted using the sorting permutation of the previous frame. Later, we shall present Algorithm 3, which improves both of them. Figures 4-5 give the block diagrams for Algorithm 1 and Algorithm 2.

Algorithm 1 (Socek, 2006)

The encoder. Given a video sequence F₁, F₂, ..., Fₘ, the encoder computes P₁, the sorting permutation of F₁. The encoder calculates C(F₁) and transmits it. For each frame Fᵢ, i = 2, 3, ..., m, the encoder does the following:
- Computes Pᵢ₋₁(Fᵢ) and Pᵢ;
- Encodes Pᵢ₋₁(Fᵢ) as C(Pᵢ₋₁(Fᵢ)) and transmits it.

The decoder. The decoder decodes C(F₁) into F₁ and obtains the sorting permutation P₁ of F₁ and P₁⁻¹. For each encoded frame it receives, C(Pᵢ₋₁(Fᵢ)), i = 2, 3, ..., m, it does the following:
- Decodes C(Pᵢ₋₁(Fᵢ)) into Pᵢ₋₁(Fᵢ) and calculates Fᵢ = Pᵢ₋₁⁻¹(Pᵢ₋₁(Fᵢ));
- Calculates the sorting permutation Pᵢ of Fᵢ and its inverse Pᵢ⁻¹.

The second algorithm is the following:

Algorithm 2 (Socek, Kalva, & Magliveras, 2007)

The encoder. Given a video sequence F₁, F₂, ..., Fₘ, the encoder computes P₁, the sorting permutation of F₁. The encoder calculates C(F₁) and transmits it. For each frame Fᵢ, i = 2, 3, ..., m, the encoder does the following:
- Computes Pᵢ₋₁(Fᵢ) - Pᵢ₋₁(Fᵢ₋₁);
- Encodes this as C(Pᵢ₋₁(Fᵢ) - Pᵢ₋₁(Fᵢ₋₁)) and transmits it.

The decoder. The decoder decodes C(F₁) into F₁ and obtains the sorting permutation P₁ of F₁ and P₁⁻¹. For each C(Pᵢ₋₁(Fᵢ) - Pᵢ₋₁(Fᵢ₋₁)) it receives, i = 2, 3, ..., m, it does the following:
- Decodes C(Pᵢ₋₁(Fᵢ) - Pᵢ₋₁(Fᵢ₋₁)) into Pᵢ₋₁(Fᵢ) - Pᵢ₋₁(Fᵢ₋₁) and calculates Pᵢ₋₁⁻¹(Pᵢ₋₁(Fᵢ) - Pᵢ₋₁(Fᵢ₋₁)) = Fᵢ - Fᵢ₋₁ = DF;
- Calculates Fᵢ = Fᵢ₋₁ + DF;
- Calculates the sorting permutation Pᵢ of Fᵢ and its inverse Pᵢ⁻¹.
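Algorithm 1's core permutation step can be sketched in Python; `argsort` plays the role of computing Pᵢ, the indexing convention (frame[p] is the permuted frame) is our assumption, and two tiny flattened "frames" stand in for real video frames:

```python
import numpy as np

def sorting_perm(frame):
    """P such that frame[P] is sorted (stable, so equal pixels keep their order)."""
    return np.argsort(frame, kind="stable")

def inverse(p):
    """Inverse permutation: inv[p[i]] = i."""
    inv = np.empty_like(p)
    inv[p] = np.arange(len(p))
    return inv

f1 = np.array([5, 3, 3, 9, 1, 5, 7, 2])    # previous frame, known to both sides
f2 = np.array([5, 3, 8, 9, 1, 5, 7, 2])    # current frame: one pixel changed

p1 = sorting_perm(f1)                      # P1, computable by encoder and decoder
almost_sorted = f2[p1]                     # P1(F2): what the encoder compresses and sends
recovered = almost_sorted[inverse(p1)]     # decoder: F2 = P1^{-1}(P1(F2))

print(f1[p1].tolist())                     # [1, 2, 3, 3, 5, 5, 7, 9] -- F1 fully sorted
print(almost_sorted.tolist())              # [1, 2, 3, 8, 5, 5, 7, 9] -- F2 almost sorted
print(np.array_equal(recovered, f2))       # True
```

For a static scene the almost sorted frame stays highly correlated, which is what makes C(P₁(F₂)) cheap to compress.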

Pixel permutations are the fundamental operations used in these two algorithms and, as a result, the compressed videos are also encrypted, except for the first frame. In applications that require secure transmission, the first frame and all the following I-frames have to be encrypted using traditional encryption schemes such as AES. The same holds for the improved algorithm (Algorithm 3) that is presented later.

Figure 4: Block diagram for Algorithm 1 (the permutation Pᵢ₋₁, obtained from the buffered frame Fᵢ₋₁, is applied to Fᵢ to produce Pᵢ₋₁(Fᵢ), which is encoded as cᵢ)

Figure 5: Block diagram for Algorithm 2 (Pᵢ₋₁ is applied to both Fᵢ and Fᵢ₋₁, and the difference Pᵢ₋₁(Fᵢ) - Pᵢ₋₁(Fᵢ₋₁) is encoded as cᵢ)

Next, we study some differences that will help us improve the previous algorithms. For Algorithm 1, we could send differences between consecutive Pᵢ₋₁(Fᵢ). A comparison of P₂(F₃) - P₁(F₂) and P₃(F₃) - P₂(F₂) for the video Akiyo (Figure 6) motivates us to seek further improvement by using differences between adjacent sorted frames. We observe that P₃(F₃) - P₂(F₂) is much cheaper than P₂(F₃) - P₁(F₂) for fairly static sequences, and even for sequences with moderate motion, as we shall see later. In the following graphs, we denote by AS (almost sorted) a frame that was sorted using the sorting permutation of the previous frame, and by S (sorted) a frame that was sorted using its own sorting permutation.

Figure 6: Motivation for further improvement (P₂(F₃) - P₁(F₂) and P₃(F₃) - P₂(F₂), each shown as a vector, by rows)

Next, we study what happens if we sort each frame using its own permutation and send some additional data to recover it from the data that was previously sent. The

experiment is done for a video with low motion, where the assumption that all frames are part of a single scene holds. Suppose we send the first frame, F₁, as a compressed image. The receiver can find the permutation P₁ that sorts F₁. Instead of sending P₁(F₂), an almost sorted frame (F₂ sorted with the permutation of the previous frame, F₁), we send P₂(F₂), a sorted frame (sorted F₂). The receiver can recover F₂ from P₂(F₂) in one of two ways:

1) Lossless. Send additional values (the non-fixed points of P₁(F₂) - P₂(F₂)), so that the receiver can get P₁(F₂) from P₂(F₂). He has P₁ and P₁⁻¹, so he can recover F₂.

2) Lossy. The receiver recovers a slightly different F₂.

The difference P₂(F₂) - P₁(F₁) is cheap to send for fairly static sequences, and it could be cheaper to send this difference instead of P₂(F₂). The difference P₁(F₂) - P₂(F₂) is more expensive, but we don't need to send all these values if we consider a lossy case. Later, we shall prove that taking the differences in P₁(F₂) - P₂(F₂) = d is like taking the differences in P₂⁻¹(P₂(F₂)) - P₁⁻¹(P₂(F₂)) = P₁⁻¹(d), when we consider mapping to the original frame, or finding the corresponding positions (row, column) in the original frame (F₂). The second formula is easier to use, so we use it for checking the differences which are not 0.

Suppose we are the sender. We take the two inverse permutations needed to recover the first two frames from their sorted versions, P₁⁻¹ and P₂⁻¹. We consider a function PermutationToPosition, which gives the position (row, column) of an element of the inverse permutation in the original image/frame. We neglect the points where P₁(F₂) = P₂(F₂) and consider just the positions where these are different. Instead of trying to go

back from P_1(F_2) and P_2(F_2) to F_2, we can go back from P_2(F_2) to F_2 in two different ways: using P_1^{-1} and using P_2^{-1}. We check whether the values in the original image are equal, and if not (i.e., if P_1^{-1}(P_2(F_2)) ≠ P_2^{-1}(P_2(F_2)) = F_2), we record the differences. For the lossless case, we have to send all these values, so that the receiver can recover the original F_2. For the lossy case, we do the following: if a difference is less than a certain threshold value M (M = 10, for example), we neglect it; if not, we save the position and the value and send them, or we send just the positions and try to approximate the corresponding pixels from their neighbors (as a mean or median of the neighboring values). We leave this last option, of sending just the positions and approximating from neighboring values, for future research.

Next, we prove that taking the differences in P_1(F_2) - P_2(F_2) = d is equivalent to taking the differences in P_2^{-1}(P_2(F_2)) - P_1^{-1}(P_2(F_2)) = P_1^{-1}(d), when we consider mapping to the original frame:

P_1(F_2) - P_2(F_2) = d
P_1^{-1}(P_1(F_2) - P_2(F_2)) = P_1^{-1}(d)
P_1^{-1}(P_1(F_2)) - P_1^{-1}(P_2(F_2)) = P_1^{-1}(d)
F_2 - P_1^{-1}(P_2(F_2)) = P_1^{-1}(d), or P_2^{-1}(P_2(F_2)) - P_1^{-1}(P_2(F_2)) = P_1^{-1}(d)
F_2 = P_1^{-1}(P_2(F_2)) + P_1^{-1}(d)

We get the same 9 values and positions of change (in the original frame) using either one of the two formulas. We observe the same 9 values from the graph of P_2^{-1}(P_2(F_2)) - P_1^{-1}(P_2(F_2)) = F_2 - P_1^{-1}(P_2(F_2)) (Figure 7.c) on the graph for P_1(F_2) - P_2(F_2) (Figure 8.c). In Figures 7.b and 8.b, the positions of the pixels that change by more than M = 10 are related to the images displayed in the corresponding figures (Figures 7.a and 8.a).
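The derivation above can be checked numerically. The following is a minimal Python/numpy sketch (illustrative only; the dissertation's experiments were done in R, and the toy data here is hypothetical). A sorting permutation P is obtained with a stable argsort, applying P to a vector x is x[P], and the inverse permutation is built by scattering indices:

```python
import numpy as np

def perm(x):
    """Sorting permutation P of x: x[perm(x)] is sorted."""
    return np.argsort(x, kind="stable")

def inv(p):
    """Inverse permutation: x[p][inv(p)] == x."""
    q = np.empty_like(p)
    q[p] = np.arange(p.size)
    return q

rng = np.random.default_rng(1)
f1 = rng.integers(0, 256, 24)           # toy frame F1, as a vector (by rows)
f2 = f1.copy()
f2[[3, 7]] += 5                         # F2: F1 with a few changed pixels

P1, P2 = perm(f1), perm(f2)
d = f2[P1] - f2[P2]                     # d = P1(F2) - P2(F2)

lhs = f2 - f2[P2][inv(P1)]              # F2 - P1^{-1}(P2(F2))
rhs = d[inv(P1)]                        # P1^{-1}(d)
f2_rec = f2[P2][inv(P1)] + d[inv(P1)]   # F2 = P1^{-1}(P2(F2)) + P1^{-1}(d)
```

Both sides agree elementwise, and adding P_1^{-1}(d) to the approximation P_1^{-1}(P_2(F_2)) recovers F_2 exactly, as in the proof above.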

a. Frame F_2 for the video Akiyo.
b. Pixels that change by more than M = 10 in P_2^{-1}(P_2(F_2)) - P_1^{-1}(P_2(F_2)).
c. P_2^{-1}(P_2(F_2)) - P_1^{-1}(P_2(F_2)) as a vector, by rows.
d. Details about the changes:

(Row, Column)   (104, 146) (140, 141) (141, 139) (141, 141) (142, 139) (202, 75) (202, 76) (203, 75) (203, 76)
Change amount       16        -16         16        -17         11        -20       -18       -23       -21

Just 9 out of 101,376 values (less than 0.0089%) change by more than 10: these are the points/pixels that change by more than ±10 when going from P_2(F_2) to P_1(F_2), or when approximating P_2^{-1}(P_2(F_2)) = F_2 with P_1^{-1}(P_2(F_2)).

Figure 5: Changes relative to the original frame (F_2)

a. Sorted frame P_2(F_2) for the video Akiyo.
b. Pixels that change by more than M = 10 in P_1(F_2) - P_2(F_2).
c. P_1(F_2) - P_2(F_2) as a vector, by rows.
d. Details about the changes:

(Row, Column)   (148, 14) (162, 39) (162, 128) (200, 156) (241, 28) (265, 63) (281, 203) (286, 102) (288, 340)
Change amount      -16        16       -17         16        -20        11       -21        -18        -23

Just 9 out of 101,376 values (less than 0.0089%) change by more than 10: these are the points/pixels that change by more than ±10 when going from P_2(F_2) to P_1(F_2), or when approximating P_2^{-1}(P_2(F_2)) = F_2 with P_1^{-1}(P_2(F_2)).

Figure 6: Changes relative to the sorted frame (P_2(F_2))

[Four images: original frame F_2 and F_2 recovered as P_1^{-1}(P_2(F_2)); original frame F_102 and F_102 recovered as P_101^{-1}(P_102(F_102)).]
Figure 7: Frames recovered with the inverse of the sorting permutation of the previous frame

[For each of the two videos: P_2(F_2) - P_1(F_1) as a vector, by rows; P_31(F_31) - P_1(F_1) as a vector, by rows; and a sample frame from the video.]
Figure 8: Differences between two sorted frames for the videos Akiyo and city

[For each of the two videos: P_2(F_2) - P_1(F_1) as a vector, by rows; P_31(F_31) - P_1(F_1) as a vector, by rows; and a sample frame from the video.]
Figure 9: Differences between two sorted frames for the videos crew and harbour

The formula needed to get F_2 is F_2 = P_1^{-1}(P_2(F_2)) + P_1^{-1}(d). P_2(F_2), P_1^{-1}, and d are known by the receiver, so he can recover F_2.

Further, we study the differences P_2(F_2) - P_1(F_1) for different sequences at the same resolution (352x288) to see how cheap they are. Also, we display P_31(F_31) - P_1(F_1) to see how much the sorted images change in one second (from frame 1 to frame 31). We observe that for videos with slow motion the differences are low, with few distinct values, while for the video crew, with fast action, the differences are higher and the number of distinct values is also higher.

The video Akiyo gives the best results. We need just two bits per element to send the difference P_2(F_2) - P_1(F_1), because this difference has only four distinct values. We can also send it using run-length encoding, because many adjacent values are equal. If we apply the δ-transformation, we could also get an efficient representation.

We check what happens if we consider sending the difference P_1(F_2) - P_1(F_1):

P_1(F_2) - P_1(F_1) = d
P_1^{-1}(P_1(F_2) - P_1(F_1)) = P_1^{-1}(d)
P_1^{-1}(P_1(F_2)) - P_1^{-1}(P_1(F_1)) = P_1^{-1}(d)
F_2 - F_1 = P_1^{-1}(d)
F_2 = F_1 + P_1^{-1}(d)

Suppose we send the first frame. That means the receiver knows F_1 and P_1^{-1}, and if we send d = P_1(F_2) - P_1(F_1), the receiver can find F_2 = F_1 + P_1^{-1}(d). So all we need to send in this case is the first frame and the differences d_i = P_i(F_{i+1}) - P_i(F_i), for i = 1, 2, ..., n-1, where n is the total number of frames in the sequence.
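This recovery rule is easy to demonstrate. A small numpy sketch (toy data; an illustration of the identity, not the experimental R code):

```python
import numpy as np

def inv(p):
    """Inverse of permutation p."""
    q = np.empty_like(p)
    q[p] = np.arange(p.size)
    return q

rng = np.random.default_rng(2)
f1 = rng.integers(0, 256, 30)        # frame F1 as a vector
f2 = f1.copy()
f2[[4, 11, 20]] += 7                 # F2 differs from F1 in a few pixels

P1 = np.argsort(f1, kind="stable")   # sorting permutation of F1
d = f2[P1] - f1[P1]                  # d = P1(F2) - P1(F1), the data to be sent

f2_recovered = f1 + d[inv(P1)]       # F2 = F1 + P1^{-1}(d)
```

Note that d has exactly as many non-zero entries as there are changed pixels, which is why it is cheap for static sequences.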

We study the difference P_1(F_2) - P_1(F_1) for the video Akiyo with resolution 352x288 to see how cheap it is. We compare it with the first difference we studied, P_1(F_2) - P_2(F_2): [P_1(F_2) - P_1(F_1)] - [P_1(F_2) - P_2(F_2)] = P_2(F_2) - P_1(F_1), which is cheap for fairly static sequences, as can be seen in Figures 10 and 11. There is no significant difference between them. The gain we get by using P_1(F_2) - P_1(F_1) comes from sending just these differences in addition to the first frame, without having to send P_2(F_2) - P_1(F_1). The use of this method could give good compression for fairly static sequences.

The results in Figures 7-9 are for the sequence Akiyo with resolution 352x288. We observe that more than 99.9911% of the values do not change significantly when we approximate F_2 with P_1^{-1}(P_2(F_2)), in the case of sending each frame sorted using its own permutation. We also saw that P_2(F_2) - P_1(F_1) is cheap for fairly static sequences. The video Akiyo has low motion; there is little change caused by the movements of the presenter, and the camera does not move. In the video city, just the camera moves, and this causes a slow change in the scenery. It can be seen that there are more significant changes in the sorted frames for this video than for the first one, from frame 1 to frame 31 (in one second). The frame rate of all the videos we considered is 30 frames per second, and the resolution is 352x288. The video crew has a lot of motion, and this affects the difference between the sorted frames, which is more expensive. Because the motion is fairly constant, the two graphs for this video do not look very different. The range of the differences does not increase significantly from frame 2 to frame 31.
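Because these difference vectors are dominated by long runs of equal (mostly zero) values for static sequences, a run-length code already captures much of the redundancy. A minimal sketch (illustrative only; the experiments used 7-Zip rather than this scheme, and the difference vector below is hypothetical):

```python
def rle_encode(values):
    """Encode a sequence as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

diff = [0, 0, 0, 0, 1, 1, 0, 0, -1, 0, 0, 0]   # toy slice of P2(F2) - P1(F1)
runs = rle_encode(diff)
```

For a nearly constant difference vector, the number of runs is far smaller than the number of elements.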

The video harbour is characterized by the motion of a few objects (boats) on a static background, so the differences are not significant, even after one second.

The third algorithm (lossless or lossy) can be summarized as follows:

Algorithm 3

The encoder

Given a video sequence F_1, F_2, ..., F_m, the encoder computes P_1, the sorting permutation of F_1. The encoder calculates C(F_1) and transmits it. He computes P_2(F_2) - P_1(F_1) and F_2' = P_1^{-1}(P_2(F_2)). He sends C(P_2(F_2) - P_1(F_1)) (very cheap). He calculates Δ = F_2 - F_2' and sends it (lossless), or sends just additional information (positions (x, y) and differences) for the pixels with |Δ_xy| > M; for example, M = 10 (lossy). He calculates the adjusted F_2'. Let F_2'' be the adjusted F_2' (F_2'' = F_2 for lossless). This F_2'' can be calculated by the decoder, who mimics the calculations done by the encoder (he has F_1, P_2(F_2) - P_1(F_1), and the information needed to adjust F_2' to get F_2''). The encoder does the same things for F_2'' and F_3 as he did for F_1 and F_2. For each frame F_i, i = 3, 4, ..., m, the encoder does the following:

- Computes P_i(F_i) - P_{i-1}''(F_{i-1}'') and F_i' = P_{i-1}''^{-1}(P_i(F_i)).
- Sends C(P_i(F_i) - P_{i-1}''(F_{i-1}'')) (very cheap).
- Calculates Δ = F_i - F_i' and sends it (lossless), or sends just additional information for the pixels with |Δ_xy| > M; for example, M = 10 (lossy).
- Calculates F_i'', the adjusted F_i'.
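One step of this loop can be sketched in Python/numpy for the lossless case (a hypothetical toy frame pair; the entropy coder C(·) is omitted, and the decoder side simply mimics the encoder, as described above):

```python
import numpy as np

def perm(x):
    """Sorting permutation of x."""
    return np.argsort(x, kind="stable")

def inv(p):
    """Inverse of permutation p."""
    q = np.empty_like(p)
    q[p] = np.arange(p.size)
    return q

rng = np.random.default_rng(3)
f_prev = rng.integers(0, 256, 40)      # F''_{i-1}: previous reconstructed frame
f_cur = f_prev.copy()
f_cur[[2, 9, 33]] -= 4                 # F_i: current frame, a few pixels changed

# Encoder: send the sorted-frame difference and the residual Delta.
Pp, Pc = perm(f_prev), perm(f_cur)
sorted_diff = f_cur[Pc] - f_prev[Pp]   # P_i(F_i) - P''_{i-1}(F''_{i-1}), very cheap
f_pred = f_cur[Pc][inv(Pp)]            # F_i' = P''_{i-1}^{-1}(P_i(F_i))
delta = f_cur - f_pred                 # Delta = F_i - F_i', sent in full (lossless)

# Decoder: knows f_prev; receives sorted_diff and delta.
Pd = perm(f_prev)                      # decoder recomputes P''_{i-1} itself
sorted_cur = f_prev[Pd] + sorted_diff  # recovers P_i(F_i)
f_dec = sorted_cur[inv(Pd)] + delta    # F_i' + Delta = F_i
```

The decoder never needs the permutation of the current frame to be transmitted: it rebuilds everything from the previous reconstructed frame, which is exactly the mimicking described in the algorithm.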

The decoder

The decoder decodes C(F_1) into F_1 and obtains the sorting permutation P_1 of F_1 and its inverse P_1^{-1}. He decodes C(P_2(F_2) - P_1(F_1)) into P_2(F_2) - P_1(F_1) and calculates P_2(F_2). He calculates F_2' = P_1^{-1}(P_2(F_2)). He adjusts F_2' to get F_2'', using the received information about F_2 - F_2'. He calculates the sorting permutation P_2'' of F_2'' and its inverse P_2''^{-1}. For each C(P_i(F_i) - P_{i-1}''(F_{i-1}'')) he receives, i = 3, 4, ..., m, he does the following:

- Decodes C(P_i(F_i) - P_{i-1}''(F_{i-1}'')) into P_i(F_i) - P_{i-1}''(F_{i-1}'') and calculates P_i(F_i).
- Calculates F_i' = P_{i-1}''^{-1}(P_i(F_i)).
- Adjusts F_i' to get F_i'', using the received information about F_i - F_i'.
- Calculates the sorting permutation P_i'' of F_i'' and its inverse P_i''^{-1}.

If we use the lossless version, F_i'' becomes F_i. In this case, the results depend on the distribution of Δ = F_i - F_i' = F_i - P_{i-1}''^{-1}(P_i(F_i)), which the encoder has to send for i = 2, 3, ..., m. In the lossy version, the loss can be reduced by using I frames from time to time, which are sent to the receiver with no prediction involved. For videos with slow motion, they can be sparsely distributed. By increasing their number, the propagated error for the lossy case can be reduced. The quality of the video for the lossy case and the amount of compression can be controlled by varying the value of M (for a smaller value, we get better quality). This is one advantage of the last algorithm compared to the previous ones. Another advantage is that the differences between two adjacent sorted frames are very cheap to send. The next figure gives a block diagram for Algorithm 3.

[Block diagram: the current frame F_i passes through an Apply Permutation block to give P_i(F_i); the previous reconstructed frame F_{i-1}'', taken from the Frame Buffer, passes through an Apply Permutation block to give P_{i-1}''(F_{i-1}''); the two sorted frames are differenced and coded as c_i; the prediction F_i' feeds the Error Control block, which produces the Reconstructed Frame.]
Figure 10: Block diagram for Algorithm 3

In the previous figure, the Error Control block is represented by the differences Δ = F_i - F_i' = F_i - P_{i-1}''^{-1}(P_i(F_i)), and the Reconstructed Frame block by F_i'', the adjusted F_i'. This F_i'' is calculated by adding to F_i' = P_{i-1}''^{-1}(P_i(F_i)) the differences from Δ = F_i - F_i' whose absolute values are greater than M. For M = 0, all the differences in Δ are sent. In our implementation of this algorithm, for M > 0, we chose the approach of setting to 0 all the differences in Δ that were less than or equal to M in absolute value, and sending this modified version of Δ, after observing that this method gave better compressibility than sending just the positions (x, y) in the frame and the differences for the pixels with |Δ_xy| > M.
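The compressibility gain from zeroing the small entries of Δ can be illustrated with any general-purpose compressor. Here zlib stands in for the 7-Zip archiver used in the experiments, and the Δ data is hypothetical:

```python
import zlib
import numpy as np

rng = np.random.default_rng(4)
delta = rng.integers(-3, 4, size=10_000).astype(np.int8)  # toy Delta: small noise
delta[rng.integers(0, 10_000, 50)] = 40                   # plus a few large changes

M = 10
# Keep only the differences with absolute value greater than M; zero the rest.
delta_lossy = np.where(np.abs(delta.astype(int)) > M, delta, 0).astype(np.int8)

raw_size = len(zlib.compress(delta.tobytes(), 9))         # lossless: send full Delta
lossy_size = len(zlib.compress(delta_lossy.tobytes(), 9)) # lossy: thresholded Delta
```

The thresholded version is almost entirely zeros, so it compresses to a small fraction of the lossless size, which is the trade-off the M parameter controls.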

Performance Evaluation

We compare the first two algorithms and the lossless version of the third algorithm by compressing and comparing the sizes of the files in which we stored the information necessary to recover the videos. These files were compressed using 7-Zip, with archive format 7z. For our experiments, we used the programming language R and a Dell Optiplex GX 745 desktop computer with an Intel(R) Core(TM) 2 CPU at 2.40 GHz and 3.00 GB of RAM, running MS Windows XP. The experiments were done on sub-sequences of the original videos (30 frames). We excluded the first frame from our comparisons, due to the variety of methods that could be used to compress that frame, which has to be saved or sent in all three algorithms. For Algorithm 3, we saved the information needed to recover a video sequence into two files: one file with the differences P_i(F_i) - P_{i-1}''(F_{i-1}'') and one file corresponding to the differences Δ = F_i - F_i' = F_i - P_{i-1}''^{-1}(P_i(F_i)). We did this so that we could take advantage of the high compressibility of the differences between two sorted frames, P_i(F_i) - P_{i-1}''(F_{i-1}'').

The next tables show that the lossless version of the third algorithm gives better results than the first two algorithms for the video Akiyo. From Table 5, corresponding to the lossy version of Algorithm 3 (with M > 0), we observe that as M increases, MSE (Mean Squared Error) increases and PSNR (Peak Signal to Noise Ratio) decreases. This reflects the fact that as M increases, the quality of the recovered video decreases (a lower MSE and a higher PSNR indicate better video quality). Table 6 was constructed using the sizes of the files in which we stored the information necessary to recover the videos, excluding the first frame.
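The quality metrics used in the tables are the standard ones for 8-bit frames. A brief sketch of their definitions (whether the tables use one MSE over the whole sequence or a per-frame average is not specified in this excerpt; this sketch uses a single MSE over all samples):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two frames or videos."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(a, b, peak=255.0):
    """Peak Signal to Noise Ratio in dB (infinite for identical inputs)."""
    e = mse(a, b)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)
```

For lossless recovery the MSE is 0 and the PSNR is undefined (infinite), which is why the lossless points in the later plots are drawn at the PSNR of H.264's near-lossless mode.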

        Algorithm 3, M=1   Algorithm 3, M=5   Algorithm 3, M=10
MSE     0.2053795          1.970367           4.987364
PSNR    54.10565           44.58534           41.1407

Table 5: Results for the third algorithm (lossy) for the video sequence Akiyo

                 Algorithm 3, M=1   Algorithm 3, M=5   Algorithm 3, M=10
Bitrate (Kbps)   2565.52            1059.31            670.34

                 Algorithm 1   Algorithm 2   Algorithm 3, M=0
Bitrate (Kbps)   4725.52       4386.21       3748.97

Table 6: A comparison of the three algorithms for the video sequence Akiyo

[Plot: "Comparison for the video Akiyo - PSNR vs. M (Algorithm 3)", with M from 0 to 25 on the horizontal axis and PSNR from 30 to 60 on the vertical axis.]
Figure 11: PSNR (Peak Signal to Noise Ratio) for Algorithm 3 (lossy) with different values of M, for the video sequence Akiyo
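The bitrates in these tables come from compressed file sizes. The exact conversion is not spelled out in this excerpt, but a plausible computation (an assumption on our part, consistent with 30 frames per second) spreads the stored bits over the clip's duration:

```python
def bitrate_kbps(file_bytes, n_frames, fps=30.0):
    """Kilobits per second for n_frames stored in file_bytes bytes.

    Assumed formula (not stated explicitly in the text): total bits
    divided by the clip duration n_frames / fps seconds, then by 1000.
    """
    duration_s = n_frames / fps
    return file_bytes * 8 / duration_s / 1000.0

one_second_clip = bitrate_kbps(30_000, 30)   # 30 KB stored for 1 s of video
```

Under this assumption, 30,000 bytes for a one-second clip corresponds to 240 Kbps.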

Video sequence    Alg. 1 (Kbps)   Alg. 2 (Kbps)   Alg. 3 (Kbps)   Alg. 3 vs. Alg. 1   Alg. 3 vs. Alg. 2
Akiyo             4725.52         4386.21         3748.97         -20.67%             -14.53%
mother_daughter   9310.34         9020.69         8151.72         -12.44%             -9.63%
hall_monitor      11842.76        12289.66        10303.45        -13.00%             -16.16%
crew              13944.83        13704.83        11445.52        -17.92%             -16.49%
harbour           19812.41        19166.90        15500.69        -21.76%             -19.13%
city              18960.00        18620.69        16841.38        -11.17%             -9.56%

Table 7: Bitrate comparison of the three algorithms (the last two columns give the rate of improvement)

Table 7 gives a comparison of the three algorithms for six videos. In this table, the smaller the number (i.e., the greater the absolute value of the negative rate), the better the improvement. We observe that Algorithm 2 gives better results than Algorithm 1 for five out of six videos (all except hall_monitor), and Algorithm 3 gives better results than the other two for all the videos. The level of compression for the third algorithm improves significantly when M increases, as we shall see in the next section.

3.7 Comparison of the Third Algorithm with H.264

H.264, also known as Advanced Video Coding (AVC), is the newest international video coding standard. Compared to the previous standards (MPEG-1, MPEG-2, MPEG-4, H.261, H.263), H.264 introduces changes in the details of their basic functional elements (prediction, transform, quantization, entropy encoding). (Richardson, 2002-2011)

We shall now give more details about H.264 and the way it works, taken from (Richardson, 2007-2011). The H.264/AVC standard was first published in 2003, and it offers better compression performance and greater flexibility in compressing, transmitting, and storing video. H.264 is more complex than the previous compression methods: it is more computationally expensive to compress or decompress videos with H.264 than with earlier methods. The current applications of H.264 include: high-definition DVDs (HD-DVD and Blu-ray formats), high-definition TV broadcasting in Europe, Apple products including iTunes video downloads, iPod video and Mac OS, NATO and US DoD video applications, mobile TV broadcasting, Internet video, and videoconferencing. (Richardson, 2007-2011)

The H.264 encoder divides a video frame into macroblocks (blocks of pixels). A prediction of a macroblock is constructed from the current frame (intra prediction) or from other frames that have already been coded and transmitted (inter prediction). By subtracting the prediction from the current macroblock, the encoder obtains a residual. Intra prediction uses 16x16 and 4x4 block sizes, while inter prediction uses a range of block sizes from 16x16 down to 4x4. Motion estimation refers to finding a suitable inter prediction, and motion compensation is subtracting an inter prediction from the current macroblock.

The next step of the encoder is to apply an integer transform, an approximate form of the Discrete Cosine Transform (DCT), to the residual data. This transform generates a block of coefficients, each of which is a weighting value for a standard basis pattern. These coefficients are then quantized; i.e., each coefficient is divided by an integer value, according to a quantization parameter (QP). This way, the precision of the transform

coefficients is reduced. If they cannot be recovered exactly, but only approximated, we obtain lossy compression. After quantization, the transform block typically contains mostly zero coefficients, with only a few non-zero coefficients. If we choose a high value for QP, more coefficients will be set to zero after quantization, resulting in high compression but poor quality for the decoded image. Choosing a low value for QP will leave more non-zero coefficients after quantization, which gives better quality for the decoded image, but lower compression.

In the next step of the encoding, the quantized transform coefficients are combined with other information needed to recover the video, and the encoding of this information forms the compressed bitstream, which can be stored and transmitted. The decoder reverses the encoding process by: decoding the compressed H.264 bitstream, rescaling the coefficients, applying an inverse transform to reconstruct the blocks of residual data, grouping these blocks into residual macroblocks, forming a prediction identical to the one created by the encoder, and adding the prediction to the decoded residual to reconstruct a decoded macroblock, which is part of a decoded video frame. (Richardson, 2007-2011)

Performance evaluation

Next, we present a comparison between our third algorithm and H.264, version JM 18. The algorithms were applied to grayscale videos. The encoder parameters for H.264 were: profile 66 (baseline, using only I and P frames, not B), level 4.0, intra period 0 (only the first picture is I), number of reference frames 5 (up to 5 previous frames used for

prediction of the current frame), and RD optimization off (we used fixed QP values for the frames, which means that the bitrate is not constant). In our algorithm, M is similar to QP in that it controls the quality and the amount of compression: the higher the M, the better the compression we obtain, at the expense of video quality.

As we mentioned in the previous section, for our algorithm (Algorithm 3), we saved the information needed to recover a video sequence into two files: one file with the differences P_i(F_i) - P_{i-1}''(F_{i-1}'') and one file corresponding to the differences Δ = F_i - F_i' = F_i - P_{i-1}''^{-1}(P_i(F_i)). We did this so that we could take advantage of the high compressibility of the differences between two sorted frames, P_i(F_i) - P_{i-1}''(F_{i-1}''). We then compressed these two files using 7-Zip, with archive format 7z. The bitrate was calculated taking into consideration the sizes of these two files, with the first frame excluded. In H.264, this first frame was also excluded from the calculations.

In addition to the previously considered videos, we added two more: hall_monitor and mother_daughter (Kalva). Hall_monitor is a video recorded by a surveillance camera, and mother_daughter is similar to Akiyo, but it has two moving characters instead of one. For lossless and close-to-lossless compression, we obtained better results than H.264 in terms of PSNR vs. bitrate for 5 out of the 6 videos. The one that gave worse results is the sequence city, recorded with a moving camera. For the same PSNR, the quality of the frames is better for our algorithm, as we do not lose details. We can see in the displayed frames how H.264 loses details for smaller PSNR (Figure 18). Our algorithm is less complex than H.264, because it does not use as many processing steps.
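The QP mechanism described above (and its analogy with our M) can be sketched with a toy uniform quantizer. The real H.264 step-size-from-QP mapping and integer transform are considerably more involved, so the names and values here are purely illustrative:

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform quantization: divide by a step size and round."""
    return np.round(coeffs / step).astype(int)

def rescale(levels, step):
    """Decoder-side rescaling (inverse quantization)."""
    return levels * step

coeffs = np.array([-31.0, -2.5, 0.8, 4.1, 60.0])  # toy transform coefficients
coarse = quantize(coeffs, step=8.0)   # high "QP": most coefficients become 0
fine = quantize(coeffs, step=1.0)     # low "QP": more coefficients survive
```

A larger step zeroes more coefficients (better compression, worse decoded quality), and rescaling recovers each coefficient only to within half a step, which is the source of the loss.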

The following table compares the lossless algorithm from the previous section with the near-lossless version (QP=0) of H.264. The near-lossless mode of H.264 (QP=0) has a PSNR of approximately 70 dB, which corresponds to a mean squared error of 0.0065. The lossless version of Algorithm 3 from section 3.6 has a mean squared error of 0.

Video sequence    H.264, QP=0 (Kbps)   Algorithm 3 (Kbps)   Alg. 3 (lossless) vs. H.264 (near lossless)
Akiyo             3985.57              3748.97              -5.94%
mother_daughter   9737.63              8151.72              -16.29%
hall_monitor      12711.31             10303.45             -18.94%
crew              12170.32             11445.52             -5.96%
harbour           15810.46             15500.69             -1.96%
city              12824.61             16841.38             +31.32%

Table 8: Bitrate comparison of Algorithm 3 with H.264 (the last column gives the rate of improvement)

In our discussion, when we refer to Our Method, we mean Algorithm 3 from section 3.6. For a better picture of the results, in Figures 14, 15 and 17 we plotted the lossless version of our algorithm with a PSNR equal to that of the near-lossless version of H.264 (QP=0), because for lossless compression PSNR is not defined. The average number of differences that are greater than M in absolute value (differences that we have to save or transmit) decreases as M increases. If the frequency of the I frames increases in our algorithm, we obtain better video quality, because the number of differences greater than M in Δ = F_i - F_i' = F_i - P_{i-1}''^{-1}(P_i(F_i)) increases with the distance of a P frame from the previous I frame.
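The "rate of improvement" columns in Tables 7 and 8 are consistent with the relative bitrate change between the two methods being compared. For example, for the Akiyo row of Table 8:

```python
def improvement_pct(new_kbps, old_kbps):
    """Relative bitrate change in percent; negative means improvement."""
    return (new_kbps - old_kbps) / old_kbps * 100.0

akiyo = improvement_pct(3748.97, 3985.57)   # Table 8: Alg. 3 vs. H.264, QP=0
city = improvement_pct(16841.38, 12824.61)  # the one sequence without improvement
```

These evaluate to approximately -5.94% and +31.32%, matching the table.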

We observe that our method gives a clearer image for the same PSNR (Figure 18), and that the quality of the image for the same bitrate is comparable to that of H.264 (Figures 19-21). An advantage of our algorithm is its simplicity compared to H.264, in which the data has to go through several processing steps before it reaches its final form in the compressed file. For videos in which lossless compression is necessary, the advantage of our algorithm is more evident, as, for the majority of the videos, we obtain a lower bitrate in the lossless case than the near-lossless case of H.264. From Table 8, we observe that for the videos mother_daughter and hall_monitor we obtained the best improvement rates for the lossless case of our method versus the near-lossless case of H.264. The worst improvement was obtained for the video harbour, characterized by fast movement of big objects (boats). The only video for which we did not obtain an improvement is the video city, recorded with a moving camera.

For the first three videos (Akiyo, mother_daughter and hall_monitor), which have less movement, the performance of the lossy versions of our algorithm is close to that of the lossy versions of H.264, as can be seen in Figures 14.a-14.c. From Figure 16, we observe that for the same three videos, the average number of differences Δ = F_i - F_i' = F_i - P_{i-1}''^{-1}(P_i(F_i)) that are greater than M decreases faster as M increases. The videos which give the worst performance in our lossless version (harbour and city) give the worst performance in our lossy versions also, as can be seen in Figures 14.e-14.f. For these two videos, the average number of differences Δ that are greater than M decreases slowest, compared to the other videos. We can observe this fact in Figure 16, where these differences decrease slowest for the video harbour, followed by city and crew. The performance of our algorithm does not depend

just on the number of these differences, but also on P_i(F_i) - P_{i-1}''(F_{i-1}''). The next table gives additional details on these differences for the lossless case.

Video sequence    P_i(F_i) - P_{i-1}''(F_{i-1}'') (Kbps)   Δ = F_i - F_i' (Kbps)   Total bitrate (Kbps)
Akiyo             124.14                                   3624.83                 3748.97
mother_daughter   115.86                                   8035.86                 8151.72
hall_monitor      148.97                                   10154.48                10303.45
crew              132.41                                   11313.10                11445.52
harbour           140.69                                   15360.00                15500.69
city              107.59                                   16733.79                16841.38

Table 9: A breakdown of the bitrate of our algorithm for the lossless case

The bitrate for P_i(F_i) - P_{i-1}''(F_{i-1}'') does not change significantly in the lossy case, compared to the lossless case. What really changes is the bitrate for Δ = F_i - F_i' = F_i - P_{i-1}''^{-1}(P_i(F_i)). In our experiments, we did not take advantage of the improvements that could be obtained by representing values using bit manipulation. Future work could be directed towards improving the compression of the data using operations on bits. This would add more complexity to our method, but would probably give further improvements for both the lossless and the lossy versions, bringing the performance of our lossy version closer to that of H.264. The next figures were used in our performance evaluation. We observe that the quality of our video is comparable to that of H.264 for the same bitrate (Figures 19-21). Also, for the same PSNR, the quality of the videos for H.264 is worse than that of our algorithm, even if the bitrate is much better (Figure 18).

[Three PSNR-vs.-bitrate plots, each comparing H.264 and Our Method:
a. Comparison for video Akiyo
b. Comparison for video mother_daughter
c. Comparison for video hall_monitor]
Figure 14: Comparison of performance for the two methods (PSNR vs. bitrate), for each video (continued on next page)

[Figure 14, continued: three more PSNR-vs.-bitrate plots, each comparing H.264 and Our Method:
d. Comparison for video crew
e. Comparison for video harbour
f. Comparison for video city]
Figure 14: Comparison of performance for the two methods (PSNR vs. bitrate), for each video

[Two PSNR-vs.-bitrate plots, each with curves for Akiyo, mother_daughter, hall_monitor, crew, harbour, and city:
a. Comparison for all the videos (Our Method)
b. Comparison for all the videos (H.264)]
Figure 13: Comparison of performance for each method (PSNR vs. bitrate), for all the videos

[Plot: average number of differences in Δ with absolute value greater than M, vs. M (0 to 25), with curves for Akiyo, mother_daughter, hall_monitor, crew, harbour, and city.]
Figure 14: Comparison of performance for our method (Differences vs. M), for all the videos

[Plot: "Comparison for all the videos - PSNR vs. M (Our Method)", PSNR from 20 to 65 vs. M from 0 to 25, with curves for Akiyo, mother_daughter, hall_monitor, crew, harbour, and city.]
Figure 15: Comparison of performance for our method (PSNR vs. M), for all the videos

[Two screen shots:
a. Frame from mother_daughter: Our Method, PSNR=35, bitrate=537.93 Kbps (M=15)
b. Frame from mother_daughter: H.264, PSNR=34.96, bitrate=25.49 Kbps (QP=35)]
Figure 16: Comparison of the two methods for the same PSNR (Video mother_daughter)

[Two screen shots:
a. Frame from mother_daughter: Our Method, PSNR=33.99, bitrate=496.55 Kbps (M=16)
b. Frame from mother_daughter: H.264, PSNR=45.18, bitrate=501.1 Kbps (QP=18)]
Figure 17: Comparison of the two methods for the same bitrate (Video mother_daughter)

[Two screen shots:
a. Frame from hall_monitor: Our Method, PSNR=35, bitrate=438.62 Kbps (M=20)
b. Frame from hall_monitor: H.264, PSNR=40.02, bitrate=426.95 Kbps (QP=24)]
Figure 18: Comparison of the two methods for the same bitrate (Video hall_monitor)

[Two screen shots:
a. Frame from harbour: Our Method, PSNR=35.9, bitrate=9856.55 Kbps (M=10)
b. Frame from harbour: H.264, PSNR=50.76, bitrate=9707.17 Kbps (QP=11)]
Figure 19: Comparison of the two methods for the same bitrate (Video harbour)

[Six screen shots:
a. Frame from city: Our Method, M=20        b. Frame from city: H.264, QP=35
c. Frame from harbour: Our Method, M=20     d. Frame from harbour: H.264, QP=35
e. Frame from crew: Our Method, M=20        f. Frame from crew: H.264, QP=35]
Figure 20: Comparison of artifacts for low-quality videos (Our Method vs. H.264)

[Six screen shots of a frame from city: a. M=0, b. M=3, c. M=5, d. M=10, e. M=15, f. M=20]
Figure 21: Screen shots of frames from the video city, as M increases (Our Method)

[Six screen shots of a frame from city: a. M=0, b. M=3, c. M=5, d. M=10, e. M=15, f. M=20]
Figure 22: Screen shots of frames from the video city, as M increases (Our Method)

[Six screen shots of a frame from harbour: a. M=0, b. M=3, c. M=5, d. M=10, e. M=15, f. M=20]
Figure 23: Screen shots of frames from the video harbour, as M increases (Our Method)

[Six screen shots of a frame from harbour: a. M=0, b. M=3, c. M=5, d. M=10, e. M=15, f. M=20]
Figure 24: Screen shots of frames from the video harbour, as M increases (Our Method)