Data Compression The Encoder and PCA Neural network techniques have been shown to be useful in the area of data compression. In general, data compression can be lossless or lossy. In the latter, some portion of the information represented is actually lost. The JPEG and MPEG (video & audio) compression standards are examples of lossy compression, whereas LZW and 'packbits' are lossless. Neural net techniques can be applied to achieve both lossless and lossy compression. The following is a closer look at examples of different neural net based compression techniques. 1
The Encoder Self-supervised backpropagation The input is reproduced on the output Hidden layer compresses data Only hidden layer outputs transmitted Output layer used for decoding The encoder is a multi-layer perceptron, trained to act as an autoassociator, using backpropagation. 2
The Encoder The net is trained to produce the same output pattern that appears on the input. This is also known as self-supervised backpropagation. The aim is to reproduce the input pattern on the output, but using as few hidden layer neurons as possible. The output of the hidden layer then becomes the data to be transmitted. The "compressed" data is decoded at the receiver using the weights on the output layer. The illustration shows how an n-dimensional input pattern can be transmitted using fewer than n values (since there are fewer than n hidden units). 3
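To make the idea concrete, the following is a minimal sketch (not from the original lecture) of an autoassociator trained with self-supervised backpropagation; the sigmoid activations, learning rate and weight initialisation are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_encoder(patterns, n_hidden, epochs=5000, lr=0.1, seed=0):
    """Train a single-hidden-layer autoassociator: input -> hidden -> input."""
    rng = np.random.default_rng(seed)
    n_in = patterns.shape[1]
    W1 = rng.normal(0, 0.1, (n_in, n_hidden))    # encoder weights (transmitter side)
    W2 = rng.normal(0, 0.1, (n_hidden, n_in))    # decoder weights (receiver side)
    for _ in range(epochs):
        h = sigmoid(patterns @ W1)               # hidden activations = compressed code
        y = sigmoid(h @ W2)                      # reconstruction of the input
        err = patterns - y
        # Backpropagate the reconstruction error through both layers
        d_out = err * y * (1 - y)
        d_hid = (d_out @ W2.T) * h * (1 - h)
        W2 += lr * h.T @ d_out
        W1 += lr * patterns.T @ d_hid
    return W1, W2
```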
The Encoder Lossless compression: N orthogonal input patterns can be mapped onto log2 N hidden units. Lossy compression: fewer than log2 N hidden units. It is known (Rumelhart & McClelland, 1986) that a set of N orthogonal input patterns can be mapped onto log2 N hidden units. Thus, log2 N can be taken as the theoretical minimum number of hidden units needed for lossless compression. 4
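As a small illustration of this bound, the classic 4-2-4 encoder problem maps N = 4 orthogonal (one-hot) patterns through log2 4 = 2 hidden units. The snippet below reuses the train_encoder and sigmoid sketches above; convergence is not guaranteed and may need more epochs or a different random seed.

```python
# N = 4 orthogonal (one-hot) patterns squeezed through log2(4) = 2 hidden units
patterns = np.eye(4)
W1, W2 = train_encoder(patterns, n_hidden=2, epochs=20000, lr=0.5)
codes = sigmoid(patterns @ W1)      # the 2-value code that would be transmitted
recon = sigmoid(codes @ W2)         # decoded at the receiver
print(np.round(recon))              # ideally reproduces the 4 input patterns
```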
The Encoder Cottrell et al. (1987) Image compression Greyscale 8 bit image, any dimensions Network size: 64 in, 64 out and 16 hidden Image processed in 8x8 patches. An example of this approach for image compression was investigated by Cottrell et al. (1987). The aim here was to compress an image (of any size). Their approach used a network with 64 inputs (representing an 8x8 image patch), 16 hidden units and 64 outputs. Each input represented one pixel with 256 grey levels. 5
The Encoder Near state-of-the-art results obtained! 64 greyscale pixels compressed by 16 hidden units. 150,000 training patterns Compression is image dependent, however. Encode & transmit first 8x8 patch The net was trained using 150,000 presentations of input taken randomly from 8x8 patches of the image. Applying the net to each non-overlapping 8x8 patch of the image, Cottrell obtained near state-of-the-art compression results! Note, however, that compression was very much tuned to the actual image compressed and that results with other kinds of images were less impressive. 6
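A sketch of how the patch-based training might be set up, reusing train_encoder from above; the helper image_to_patches, the grid-aligned sampling and the [0, 1] scaling are assumptions for illustration rather than Cottrell's exact procedure.

```python
def image_to_patches(image, size=8):
    """Split a greyscale image array into non-overlapping size x size patches (one per row)."""
    h, w = image.shape
    patches = []
    for r in range(0, h - h % size, size):
        for c in range(0, w - w % size, size):
            patches.append(image[r:r + size, c:c + size].flatten())
    return np.array(patches) / 255.0     # scale 8-bit grey levels to [0, 1]

# Hypothetical Cottrell-style run: 64 inputs, 16 hidden units, 64 outputs,
# trained on patches sampled at random from the image (150,000 presentations).
# patches = image_to_patches(image)
# idx = np.random.default_rng(0).integers(0, len(patches), 150_000)
# W1, W2 = train_encoder(patches[idx], n_hidden=16)
```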
Principal Component Analysis PCA is dimensionality reduction m bit data converted to n bit data where n < m Another way to view data compression is to regard it as a reduction in dimensionality. That is, can a representation of a set of patterns expressed using m bits of information be adequately described using n bits, where n is less than m? The goal is to effectively represent the same data using a reduced set of features. Given a set of data, principal component analysis, as we have already seen, attempts to identify axes (or principal components) along which the data varies the greatest. 7
PCA By definition PCA is lossy compression Reduction in number of features used to represent data Which features to keep and which to remove? 8
PCA Principal components Are axes along which data varies the most 1st principal component exhibits greatest variance 2nd principal component exhibits next greatest variance Etc. The first principal component is the axis along which the data exhibits the greatest variance. The second component is orthogonal to the first and shows the second greatest variance, the third is orthogonal to the first two, and so on. 9
PCA 2nd component orthogonal to 1st 3rd orthogonal to the first two Etc. Illustration: the same data plotted on its original axes, where the clusters are difficult to discriminate, and along the 1st and 2nd principal components, where they are easier to discriminate. 10
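For comparison, a standard (non-neural) way to compute the principal components is via the eigendecomposition of the covariance matrix. The synthetic 2-D data below is purely illustrative; it simply shows that the 1st component captures the direction of greatest variance.

```python
import numpy as np

def principal_components(data):
    """Return the principal components of `data` (rows = samples), ordered by variance."""
    centred = data - data.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    return eigvecs[:, order], eigvals[order]

# Example: elongated 2-D data; the 1st PC lies along the direction of greatest spread,
# the 2nd is orthogonal to it.
rng = np.random.default_rng(0)
data = rng.normal(0, [3.0, 0.5], size=(500, 2)) @ np.array([[0.8, -0.6], [0.6, 0.8]])
pcs, var = principal_components(data)
print(pcs)   # columns are the 1st and 2nd principal components
print(var)   # variance of the data along each component
```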
PCA The Hebb Rule Oja 1992: a single neuron can be trained to find the 1st PC Sanger 1989: in general, m neurons can be trained to find the first m PCs The generalized Hebbian algorithm (GHA) In terms of neural nets, a Hebb-like learning rule can be used to train a single neuron so that its weights converge to the first principal component of a distribution (Oja, 1992). In general, a layer of m neurons can be trained using a "generalized Hebbian algorithm" (GHA) to find the first m principal components of a set of input data (Sanger, 1989). 11
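A minimal sketch of one update step of Sanger's generalized Hebbian algorithm; with a single output neuron it reduces to Oja's rule. The learning rate and the plain NumPy formulation are illustrative assumptions.

```python
import numpy as np

def gha_update(W, x, lr=0.001):
    """One step of Sanger's generalized Hebbian algorithm.

    W : (m, n) weight matrix, one row per output neuron
    x : (n,) input vector
    After convergence the rows of W approximate the first m principal components.
    """
    y = W @ x                                              # outputs of the m linear neurons
    # Hebbian term minus a lower-triangular (Gram-Schmidt-like) decorrelation term
    W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```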
PCA & Image Compression Haykin 1994 Describes the GHA for image compression Example Input image 256 x 256, each pixel with 256 grey levels PCA network 8 neurons, each with an 8 x 8 receptive field Haykin describes an application of the GHA for image compression. A 256 by 256 image, where each pixel had 256 grey levels, was chosen for encoding. The image was coded using a linear feedforward network of 8 neurons, each with 64 inputs. Training was performed by presenting the net with data taken from 8x8 non-overlapping patches of the image. To allow convergence, the image was scanned from left to right and top to bottom, twice. After convergence, the 8 neurons (trained with Sanger's rule) represent the first 8 eigenvectors. 12
PCA & Image Compression Haykin 1994 Processing Image scanned top-to-bottom, and left-to-right. The neurons were left to converge. The 8 neurons represent the first 8 eigenvectors. 13
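A hypothetical training loop in the spirit of Haykin's example, reusing gha_update and image_to_patches from the sketches above; here `image` is assumed to be a 256 x 256 greyscale NumPy array, and the initialisation and learning rate are illustrative only.

```python
rng = np.random.default_rng(0)
W = rng.normal(0, 0.01, (8, 64))            # 8 linear neurons, each with an 8x8 receptive field
for _ in range(2):                          # two scans of the image, as in the lecture notes
    for patch in image_to_patches(image):   # patch-by-patch presentation of the image
        W = gha_update(W, patch)
# After convergence the rows of W approximate the first 8 principal components (eigenvectors).
```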
PCA & Image Compression Example input image From Haykin, S., "Neural Networks - A Comprehensive Foundation", 1994 Once the weights of the network had converged, they were used to encode the image (shown above) for transmission. 14
PCA & Image Compression Encoding details Each 8 x 8 block multiplied by each neuron This gives 8 coefficients Coefficients transmitted. In Haykin's example, 23 bits were needed, i.e. 23 bits encoded an 8x8 patch of 8-bit pixels. Transmission Each 8x8 block of the image was multiplied by the weights of each of the eight neurons (i.e. presented to each neuron). This generated 8 outputs, or coefficients. The coefficient from each neuron was transmitted. The number of bits chosen to represent each coefficient is determined by the variance of that coefficient over the whole image (i.e. more bits are needed to represent something that varies a lot than something that varies a little). In the example described in Haykin, this required 23 bits to code the outputs of the 8 neurons. That is, 23 bits were required to encode each 8x8 block of pixels, where each pixel was represented using 8 bits. 15
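A sketch of the encoding step under the assumptions above; the per-coefficient quantisation (which gives the 23-bit total in Haykin's example) is left out.

```python
def encode_patch(W, patch):
    """Project one flattened 8x8 patch onto the 8 converged neurons."""
    return W @ patch                        # 8 coefficients to transmit

# In the full scheme each coefficient would then be quantised with a number of
# bits chosen according to its variance over the whole image.
```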
PCA & Image Compression The weights of the 8 neurons From Haykin, S., "Neural Networks - A Comprehensive Foundation", 1994 The illustration above shows the weights obtained by each of the eight neurons. In the diagram, light areas depict positive weights and dark areas negative (or inhibitory) weights. 16
PCA & Image Compression Decoding details Neurons used to decode transmitted coefficients. Weights x coefficient = 8 x 8 patch reconstructed Receiving (decoding) The image was reconstructed from the transmitted coefficients using the neurons again. This time, however, the weights of each neuron were multiplied by the corresponding coefficient and the results added together to reconstruct each 8x8 patch of the image. 17
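And the matching decoding step, again only a sketch: each neuron's weight vector is scaled by its transmitted coefficient and the results are summed to rebuild the patch.

```python
def decode_patch(W, coeffs):
    """Reconstruct an 8x8 patch as a coefficient-weighted sum of the neurons' weight vectors."""
    return (W.T @ coeffs).reshape(8, 8)

# Round trip for one patch:
# coeffs = encode_patch(W, patch)
# patch_hat = decode_patch(W, coeffs)
```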
PCA & Image Compression Illustration: for transmission, each 8x8 patch of the image is presented to the eight neurons. The weights of each neuron represent one of the first eight principal components of the image data, obtained using Sanger's rule. 18
PCA & Image Compression Example output image: the input image alongside the reconstructed output image. From Haykin, S., "Neural Networks - A Comprehensive Foundation", 1994 19