USING FRACTAL CODING FOR PROGRESSIVE IMAGE TRANSMISSION


Y.-Kheong Chee
Internal Report
School of Electrical and Computer Engineering
Curtin University of Technology
GPO Box U 1987, Perth, Western Australia 6001

CONTENTS

1. ABSTRACT
2. INTRODUCTION
3. FRACTAL IMAGE COMPRESSION
   3.1 Background
       3.1.1 Jacquin's block-based fractal coder
   3.2 A Review of the State of the Art
   3.3 Advantages of Fractal Coding
       3.3.1 Multiresolutional decoding
       3.3.2 Good performance at high compression ratios
       3.3.3 Relatively fast decoding speed
       3.3.4 Opportunities for improvement
   3.4 Disadvantages of Fractal Coding
       3.4.1 Lengthy encoding time
       3.4.2 Lack of PIT capability
       3.4.3 Patented technology
       3.4.4 Approximation problems
   3.5 Selecting the Fractal Coder for PIT
4. PROGRESSIVE IMAGE TRANSMISSION
   4.1 Introduction
       4.1.1 Successive-Approximation Methods
       4.1.2 Multistage or Residual-Quantisation Methods
       4.1.3 Transmission-Sequence-Based Methods
       4.1.4 Methods Based on Multiresolutional Coding
5. TRANSMISSION-SEQUENCE-BASED FRACTAL CODER
   5.1 Introduction
   5.2 Specifying the Transmission Sequence
       5.2.1 The Traditional Approach
       5.2.2 The New Perceptually Based Approach
   5.3 The Block Classification System
       5.3.1 Introduction
       5.3.2 A Review of Classifiers Used in Image Coding
       5.3.3 Wang and Mitra (1993)'s Classifier
       5.3.4 Implementing the Classifier
   5.4 Fractal Coding System
       5.4.1 Overview of the Coder
       5.4.2 Quadtree Structure
       5.4.3 Creating Mean-Residual Blocks
       5.4.4 Coding the Transformation Parameters
   5.5 Preliminary Simulation Results and the Difficulties Encountered
       5.5.1 Simulation Results
       5.5.2 Low Compression Ratio
       5.5.3 Block-Interdependence Nature of Fractal Coding
   5.6 Conclusion
6. PYRAMIDAL FRACTAL CODER
   6.1 Introduction
   6.2 Motivation
   6.3 Pyramidal Decomposition
   6.4 Fractal Coding
       6.4.1 Interpolation Methods
       6.4.2 Iterated coding
       6.4.3 Coding of uniform blocks
       6.4.4 Coding of the transformation data
       6.4.5 Mean-removal coding
   6.5 Simulation Results
       6.5.1 Coding parameters
       6.5.2 Comparison with other coders
   6.6 Improvements to the Coder
       6.6.1 Pyramidal decomposition
       6.6.2 Higher-order approximation functions
       6.6.3 More efficient entropy coding
       6.6.4 Miscellaneous methods
7. CONCLUSIONS
8. ACKNOWLEDGEMENTS
9. REFERENCES

1. ABSTRACT

Fractal coding is gaining in popularity as an efficient still-image coder. One drawback of conventional block-based fractal coders, however, is that they do not offer progressive image transmission. This paper proposes two new methods by which fractal image coders can progressively transmit image data. The first method, the transmission-sequence-based fractal coder (TSFC), progressively transmits the image blocks according to their perceptual importance. The second method, the pyramidal fractal coder (PFC), operates by progressively transmitting the pyramid levels of the coded image. Simulation results with PFC showed that existing quadtree fractal coders can readily be extended to provide progressive transmission, and that high-quality images can be obtained at a compression ratio of 16:1.

Subject terms: progressive image transmission; image coding; fractal coding; transmission sequence; block classification; pyramidal structure

2. INTRODUCTION

Transmission of digital images is becoming an increasingly challenging task with the growing number of images to be handled and the enormous amount of data needed to represent each image. The problem can be partially countered by using higher-bandwidth communication systems. A more effective solution, however, also considers how the digital images are represented and how they are transmitted. The search for better ways to represent a digital image has produced a myriad of image-compression techniques, and this has proven an effective line of attack. A better solution still is to also improve the transmission method.

Progressive image transmission (PIT) is an elegant method for making effective use of communication bandwidth. Instead of transferring an image at full resolution in a sequential manner, PIT transmits an approximate image first. The quality of this image is then progressively improved over a number of transmission passes. The advantage is that it allows the user to quickly recognise an image and make an accept-or-reject decision.

Current state-of-the-art compression techniques, such as those based on the wavelet transform and vector quantisation, can deliver grayscale images of acceptable quality at compression ratios greater than 30:1. More recently, fractal coding has been proposed as a promising new method for image coding (Jacquin, 1992). Indeed, fractal coding can outperform many other methods in applications that require very high compression ratios (Fisher et al., 1994). However, quadtree fractal coders do not have the capability for progressive transmission. This shortcoming limits the types of applications fractal coding can be used for, considering that other high-compression methods such as wavelet coding do offer progressive transmission.

This paper considers the use of fractal coding for progressive image transmission (PIT). It explores the difficulties of implementing such a system and the main criteria considered in designing it. Two coding methods are proposed and simulation results presented. The first method, the transmission-sequence-based fractal coder (TSFC), progressively transmits the image blocks according to their perceptual importance. The transmission order depends on the block type, which is determined by a human-visual-system-based classifier. The second method, the pyramidal fractal coder (PFC), operates by progressively transmitting the pyramid levels of the coded image.

Section 3 of this paper outlines the mechanism of fractal image compression. A review of the state of the art is then given, showing how the basic theory has been extended to improve coding speed and rate-distortion performance. Section 4 reviews the four major categories of PIT methods. Based on a number of criteria, two of the methods are chosen for further analysis. Section 5 presents the motivation, implementation, and simulation results for TSFC. In addition, a comprehensive discussion of block-classification systems is given. Section 6 discusses PFC in depth. The use of the pyramid structure for image coding is explored, and implementation details of the PFC are given. In addition to the use of the basic pyramid structure, a number of other methods that improve fractal coding are proposed. Finally, simulation results are given and improvements to the coder are suggested.

3. FRACTAL IMAGE COMPRESSION

3.1 Background

The mathematical foundations of fractal geometry can be found in Barnsley (1988) and Peitgen et al. (1992). Discussion here is limited to the fractal mathematics relevant to image coding, namely the concepts of contractive transformations, the iterated function system (IFS), and the Collage Theorem. The premise behind fractal coding is that natural images exhibit self-similarity at different scales, and fractal coding uses contractive transformations in the form of an IFS to exploit this redundancy.

A metric space can be denoted by (X, d), where d(·,·) denotes the metric. For example, the space for the set of points of a binary image can be represented by $\mathbb{R}^2$. In addition, the dimensions M × N of the image can be used to define the rectangular support for this space:

    $H = \{(A, B) : 0 \le A \le M,\ 0 \le B \le N,\ A, B \in \mathbb{R}\}$.    (1)

This can be extended to a grayscale image with an intensity function $f : H \to I$. This intensity function is bounded so that $I = [a, b] \subset \mathbb{R}$, and the interval is typically [0, 255]. Thus, a grayscale image can be represented by a subset of points in the space $X = H \times I$. To complete the specification of a metric space, a measure of distance between points in the space is defined. A simple metric such as the sup metric can be selected:

    $d(f_1, f_2) = \sup\{\,|f_1(x, y) - f_2(x, y)| : (x, y) \in H\,\}$.    (2)

A transformation operates in a metric space, and a particular transformation $w : X \to X$ is contractive if there is a constant $0 \le s < 1$ such that (Barnsley 1988, Chapter 3)

    $d[w(x), w(y)] \le s\, d(x, y) \quad \forall x, y \in X$.    (3)

Thus, each iteration of a contractive mapping on a set of points will always bring the points closer together, as measured by the metric d(·,·). The value s is known as the contractivity factor for w, and it determines how quickly the points converge to a fixed point, or attractor.

This convergence is guaranteed by the contractive mapping theorem, which states that w has a unique fixed point $x_w \in X$ such that

    $w(x_w) = x_w$.    (4)

Significantly, for any point $x \in X$, the sequence of forward iterations $\{w^{\circ n}(x) : n = 0, 1, 2, \ldots\}$ converges to the fixed point $x_w$. Thus,

    $\lim_{n \to \infty} w^{\circ n}(x) = x_w \quad \forall x \in X$.    (5)

This theorem assures that, starting from any initial point x, the iterated transformation will always converge to the fixed point $x_w$.

A commonly used form of transformation is the affine transformation. In binary-image space it can be represented as

    $w(\mathbf{x}) = w \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix} = A\mathbf{x} + \mathbf{t}$,    (6)

where $(x_1, x_2)$ are the image-space coordinates and $\{a, b, \ldots, f\}$ are the transformation parameters. The affine transformation has important algebraic and geometrical properties because it can perform scaling, rotation, stretching, skewing, and translation on the input set of points.

The definition of an iterated function system (IFS) combines a number of these terms: an IFS $\{X;\ w_n,\ n = 1, 2, \ldots, N\}$ operates in a complete metric space (X, d) and consists of N contractive maps $w_n : X \to X$, for n = 1, 2, ..., N. The overall transformation of these maps is denoted $W : X \to X$. Each contractive map $w_n$ is associated with a contractivity factor $s_n$, with the overall contractivity factor of the IFS being $s = \max\{s_n : n = 1, 2, \ldots, N\}$.
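To make the contractivity condition concrete, the following sketch (Python with NumPy; not part of the original report) checks an affine map $w(\mathbf{x}) = A\mathbf{x} + \mathbf{t}$ for contractivity under the Euclidean metric, for which the contractivity factor is the largest singular value of A, and approximates the fixed point of (4) by forward iteration:

```python
import numpy as np

def contractivity(A: np.ndarray) -> float:
    """Contractivity factor of w(x) = Ax + t under the Euclidean metric:
    the largest singular value of A (w is contractive iff this is < 1)."""
    return float(np.linalg.svd(A, compute_uv=False)[0])

def fixed_point(A, t, iters=100):
    """Forward-iterate w from the origin to approximate its unique fixed point."""
    x = np.zeros(len(t))
    for _ in range(iters):
        x = A @ x + t
    return x

A = np.array([[0.5, 0.0], [0.0, 0.5]])
t = np.array([0.25, 0.5])
assert contractivity(A) < 1
print(fixed_point(A, t))   # converges to (I - A)^{-1} t = (0.5, 1.0)
```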

For example, the IFS for the Sierpinski gasket consists of three mappings, which can be written as

    $w_1 \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$, $\quad w_2 \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}$, $\quad w_3 \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0.25 \\ 0.5 \end{bmatrix}$.

These transformations are visualised in Figure 1.

[Figure 1: Affine transformations $w_1$, $w_2$, $w_3$ for the Sierpinski gasket.]

An iteration of this IFS on the point $x \in X$ is denoted

    $W(x) = \bigcup_{i=1}^{3} w_i(x)$.

Subsequent forward iterations are represented by $W^{\circ n} : X \to X$, where

    $W^{\circ 0}(x) = x,\quad W^{\circ 1}(x) = W(x),\quad W^{\circ 2}(x) = W(W(x)),\quad \ldots,\quad W^{\circ n}(x) = W(W^{\circ (n-1)}(x))$.

A number of forward iterations for the Sierpinski gasket are shown in Figure 2.
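The forward-iteration process of Figure 2 is easy to reproduce. The sketch below (Python with NumPy; the translation offsets follow the reconstruction of the maps above and are illustrative) applies $W^{\circ n}$ to an arbitrary starting set; by the contractive mapping theorem, the result approaches the gasket regardless of the starting point:

```python
import numpy as np

# The three affine maps of the Sierpinski-gasket IFS; the translation
# offsets mirror the reconstructed maps above and are illustrative.
MAPS = [
    (np.array([[0.5, 0.0], [0.0, 0.5]]), np.array([0.0, 0.0])),   # w1
    (np.array([[0.5, 0.0], [0.0, 0.5]]), np.array([0.5, 0.0])),   # w2
    (np.array([[0.5, 0.0], [0.0, 0.5]]), np.array([0.25, 0.5])),  # w3
]

def forward_iterate(points: np.ndarray, n: int) -> np.ndarray:
    """Apply n forward iterations of W(A) = w1(A) U w2(A) U w3(A) to a point set."""
    for _ in range(n):
        points = np.vstack([points @ A.T + t for A, t in MAPS])
    return points

# Any starting set converges to the attractor; a single point suffices.
gasket = forward_iterate(np.array([[0.3, 0.7]]), 8)   # ~3^8 points on the gasket
```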

[Figure 2: Forward iterations $W^{\circ n}(A_0)$ of the Sierpinski IFS, showing the sets $A_0$ through $A_6$.]

In this case, the iterations converge towards the fixed point very quickly: the image does not alter noticeably after the sixth iteration.

In applying an IFS to image coding, the aim is to find the set of transformations that will operate on any initial image to produce a coded version of the original image. This is known as the inverse problem of IFS (Vrscay, 1991). The solution of this problem is non-trivial because, for each candidate IFS, a check must be performed to see whether its attractor is close to the given image. This complexity is significantly reduced by applying Barnsley's Collage Theorem (Barnsley, 1988). A collage is the image that results when the IFS is applied to the original image. The Collage Theorem states that, given an IFS W whose attractor is $x_W$ and whose overall contractivity factor is s, for all $x \in X$,

    $d(x, x_W) \le \dfrac{1}{1 - s}\, d\!\left(x,\ \bigcup_{i=1}^{N} w_i(x)\right)$.    (7)

This means that if the IFS can generate a collage $W(x) = \bigcup_{i=1}^{N} w_i(x)$ that approximates the original image x, then the attractor of the IFS will also approximate x. Thus, the inverse problem is reduced to just finding a collage that is close to the given image.

To appreciate the motivation for fractal coding, observe that the Sierpinski gasket has been produced from an IFS that requires the storage of only six parameters per transformation, for a total of 18 parameters. This simple example demonstrates the ability of an IFS to produce a complex image using only a small number of parameters. Unfortunately, such an IFS fails to model real-world images well, because a non-trivial image cannot be reproduced by manipulating copies of itself alone. To overcome this difficulty, Barnsley and Jacquin (1988) extended IFS theory to allow the transformations to operate on parts of the image rather than only the entire image. Such an IFS is known as a recurrent IFS (RIFS), partitioned IFS (PIFS), piecewise IFS, or local IFS.

Since the transformations of an RIFS act only on specified regions of the image, each transformation $w_i$ has a region or domain area $D_i$ associated with it:

    $\{w_i : D_i \to X,\ i = 1, 2, \ldots, N\}$, where $D_i \subset X$.

Since one part of the image is mapped to another part of itself, fractal coding relies heavily on the self-similar, or fractal, properties of the image. This can be considered a form of redundancy whereby similar details are present at different scales: details present at a larger scale can be duplicated at a smaller scale because of the contractive properties of the transformation.

3.1.1 Jacquin's block-based fractal coder

Jacquin (1990) was the first to propose an automated fractal image coder. The majority of current fractal coders are based on this system, a basic version of which is briefly explained here. First, the input image is divided into disjoint blocks known as range blocks (Jacquin, 1992). Each range block $R_i$ is associated with an affine transformation $w_i$ of the form

    $w_i \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} a_i & b_i & 0 \\ c_i & d_i & 0 \\ 0 & 0 & s_i \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \\ o_i \end{bmatrix}$.    (8)

Note that this is a form of (6) extended to account for the intensity z(x, y) of the image. During coding, an iteration of this transformation gives the collage of the range block. To code each range block $R_i$, the following steps are carried out:

- Choose a source domain block $D_i$ from the pool of domain blocks. This pool comprises subblocks of the original image. Unlike range blocks, however, domain blocks can overlap and do not have to tile the image completely. To assure contractivity, the domain block is decimated (normally by a factor of 2); because of this, the domain block must be larger than the range block.

- The $a_i$, $b_i$, $c_i$, and $d_i$ parameters specify which isometric operation (Jacquin, 1993) to perform. Through this operation, the block is oriented to one of eight possible orientations, made up of four rotations and two flips. Along with $s_i$ and $o_i$, the contrast-scaling and brightness-offset parameters, the orientation is determined using least-squares regression (Fisher, 1992); a code sketch of this fit follows at the end of this subsection. For each pixel (x, y) in the resulting block, find the new intensity z'(x, y) such that

    $z'(x, y) = s_i z(x, y) + o_i$.    (9)

- Out of all the domain blocks, choose the block whose transformation yields the least distortion.

The problem with this basic coder is that the search requirements are far too large for it to be practical. For example, a 512 × 512 image subdivided into 4 × 4 range blocks has a corresponding pool of 4096 non-overlapping domain blocks of size 8 × 8. To find the best parameters for each of the 16,384 range blocks, we have to search through 8 orientations of each domain block, meaning 32,768 blocks must be searched. A number of ways have been devised to reduce the search time. Jacquin used block classification and a quadtree structure that allowed two sizes for the range blocks. With classification, only domain blocks that belonged to the same class as the range block were searched. The use of larger range blocks increased the compression ratio and also reduced the number of blocks to be coded and searched.

The decoding operation is simple and much faster than coding. The first step is to start with an initial image A, which is the source for the domain blocks. For each transformation, the parameters are read and the transformation performed; the resulting block is copied to image B, which is initially blank. After all the transformations have been processed, the roles of A and B are switched: image B is now the source for the domain blocks and the results of the next set of transformations are stored in image A. A fractal decoder has a number of interesting properties because of the contractive nature of the transformations. The transformations will always produce the same reconstruction regardless of the initial image, and the decoding is scaleless: it can reproduce the image at any desired resolution, constrained only by the blocks having an integral size. From simulations, it was observed that the typical number of iterations needed was around 8, increasing if decoding to a larger scale was desired.
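The least-squares step referred to above is the computational core of the encoder. The following sketch (Python with NumPy; the function names are ours and the isometry search is omitted) fits the contrast scaling $s_i$ and brightness offset $o_i$ of equation (9) for one range/domain pairing and evaluates the resulting collage error:

```python
import numpy as np

def fit_scale_offset(range_block: np.ndarray, domain_block: np.ndarray):
    """Least-squares fit of contrast scaling s and brightness offset o
    so that s * domain + o best approximates the range block (equation (9)).
    The domain block is assumed already decimated to the range-block size."""
    d = domain_block.ravel().astype(float)
    r = range_block.ravel().astype(float)
    n = d.size
    denom = n * (d * d).sum() - d.sum() ** 2
    if denom == 0:                       # flat domain block: just match the mean
        return 0.0, r.mean()
    s = (n * (d * r).sum() - d.sum() * r.sum()) / denom
    o = (r.sum() - s * d.sum()) / n
    return s, o

def collage_error(range_block, domain_block, s, o):
    """Squared error of the collage s * D + o against the range block."""
    diff = s * domain_block + o - range_block
    return float((diff * diff).sum())
```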

3.2 A Review of the State of the Art

Most developments in fractal coding have aimed to improve coding speed and rate-distortion performance. Saupe and Hamzaoui (1994a) provide a comprehensive survey and a relevant bibliography; a shorter survey can be found in Jacquin (1993).

Fisher et al. (1991) proposed an elegant method of orienting a block according to the brightness of its four subblocks. This meant that only the orientation-normalised representation of the blocks was searched, instead of 8 instances of the same block. Also, there were 3 possible canonical positions, depending on the location of the brightest, next brightest, and third brightest subblocks (Fisher et al., 1991). Thus the search was further reduced 3 times if only blocks from the same canonical position were searched. A further classification could be imposed on the block: the subblocks could be ordered according to their variances, resulting in 4! or 24 possible subclasses. In total, these classifications reduce the search complexity by a factor of 576. Simulations with this algorithm found that the brightness classification was effective in producing good matches in reduced search time. The variance sub-classification offered a harsher trade-off: although search time was reduced considerably, the image quality was much degraded too.
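A sketch may make this classification scheme clearer. The code below (Python with NumPy; a simplified reading of Fisher et al. (1991) that skips the isometry used to rotate each block into its canonical position) derives a major class from the brightness ordering of the four quadrants and one of the 24 variance subclasses:

```python
import numpy as np
from itertools import permutations

def quadrants(block):
    """Split a square block into its four subblocks (NW, NE, SW, SE)."""
    b = np.asarray(block, float)
    h, w = b.shape
    return [b[:h//2, :w//2], b[:h//2, w//2:], b[h//2:, :w//2], b[h//2:, w//2:]]

def fisher_class(block):
    """Major class from the brightness ordering of the quadrants, plus a
    subclass (one of 4! = 24) from the ordering of their variances. Only
    blocks in the same class need be searched against each other."""
    qs = quadrants(block)
    brightness_rank = tuple(int(i) for i in np.argsort([-q.mean() for q in qs]))
    variance_rank = tuple(int(i) for i in np.argsort([-q.var() for q in qs]))
    subclass = list(permutations(range(4))).index(variance_rank)   # 0..23
    return brightness_rank, subclass
```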

Table 1 shows the performance trade-offs measured with different search constraints; the results were produced from simulations performed on an HP-720 workstation with the monochrome Lena image.

[Table 1: Performance trade-offs with different search constraints: encoding time (sec), PSNR (dB), and compression ratio for each combination of brightness classification and variance ordering. The numeric entries did not survive transcription.]

Fisher et al. (1991, 1992) also extended IFS theory by proposing the use of eventually contractive mappings. An IFS $\{X;\ w_n,\ n = 1, 2, \ldots, N\}$ is eventually contractive if there exists a positive integer m such that the mth forward iteration $W^{\circ m} : X \to X$ is contractive. The significance of this theory is that the restriction of contractivity can be removed from some of the individual mappings $w_i$ in the system, as long as the overall effect of all the mappings is contractive. This means that the contrast-scaling parameter $s_i$ in equation (9) can exceed 1.0, greatly improving the approximation ability of the transformation.

Monro and Dudbridge (1992) developed an algorithm that did not require searching, by imposing a structure such that a domain block always had its four quadrants as its four range blocks, as shown in Figure 3. Each range block $R_i$ had an associated transformation $w_i$ that operated on the parent block. Since the domain block was always constrained to be the parent block, a linear equation such as (9) could no longer approximate the range block well. Instead, Monro and Dudbridge used a polynomial of the form

    $\hat{A} = w_i(A)$, with $z'(x, y) = a_1 x + a_2 y + a_3 z(x, y) + a_4$.    (10)

The parameters $a_k$, k = 1, 2, 3, 4, for each transformation i were determined using the least-squares criterion such that (Monro, 1993a)

    $\dfrac{\partial}{\partial a_k} d(A, \hat{A}) = 0, \quad k = 1, 2, 3, 4$.    (11)

This technique became known as the Bath transform and featured both fast coding and decoding. It was further generalised in Monro (1993a, 1993b), with additional simulation results and improvements given in Monro (1993c). The extensions to the algorithm allowed using polynomials different from (10), finding the best split point of the domain block, applying rotations and reflections, as well as searching for the best domain block. Having introduced this flexibility, however, Monro (1993c) concluded that a more favourable trade-off would be to forego searching but increase the order of the polynomial.

[Figure 3: A domain block A with its four quadrants as its associated range blocks, $w_1(A)$ through $w_4(A)$.]

The other class of fractal coders examined here are those based on an inner-product-space approach, the first of which was discussed by Øien et al. (1991). They used a transformation of the same form as equation (10) but expressed differently as

    $\hat{A} = w_i(A) = \sum_{k=1}^{3} a_k B_k + a_4 T(B_4)$,    (12)

where $\{a_1, a_2, a_3, a_4\}$ are real coefficients. The matrix $B_4$ was the domain block and $T(B_4)$ represented the scaled and isometrically transformed block. The matrices $\{B_1, B_2, B_3\}$ were fixed basis blocks, specified such that the pixel (i, j) was 1 in $B_1$, i in $B_2$, and j in $B_3$ (Øien et al., 1991). The approach to solving for the coefficients $a_k$ was to make the error $A - \hat{A}$ orthogonal to each basis $B_k$:

    $\langle A - \hat{A},\ B_k \rangle = 0$,    (13)

and solving the resulting set of orthogonality equations (Øien et al., 1991).

Later work in Øien et al. (1992) extended this inner-product-space approach in two major ways. First, all the fixed basis blocks $B_k$, $1 \le k \le 3$ in this example, were orthonormalised using Gram-Schmidt orthonormalisation. Then, for each new image to be coded, a number of candidate domain blocks were selected. Each of these domain blocks was orthonormalised with respect to the subspace formed by the fixed basis blocks; the resulting subspace will be denoted S. Now, given a range block A, its least-squares approximation can be found by projecting A onto the subspace S. Since S is spanned by the orthonormal bases $\{B_1, B_2, B_3, B_4\}$ (Anton, 1991),

    $\mathrm{proj}_S A = \langle A, B_1 \rangle B_1 + \langle A, B_2 \rangle B_2 + \langle A, B_3 \rangle B_3 + \langle A, B_4 \rangle B_4$.    (14)

This means that each coefficient $a_k$ of equation (12) can easily be found as the inner product $\langle A, B_k \rangle$. Compared with having to solve the orthogonality equations in (13), this is a significant reduction in complexity. Much work was also done by Vines (1993) on using orthonormal basis vectors.

The second mechanism introduced to reduce the encoding time was a clustering technique that reduced the number of domain blocks to be searched by grouping similar blocks together (Øien et al., 1992). Furthermore, both range and domain blocks were decimated to reduce the dimensions of the blocks.

In Øien et al. (1993), the inner-product-space approach was extended to provide a non-iterative decoder by selecting the basis $B_4$ in equation (14) so that it is independent of the image. Ramstad and Lepsøy (1993) discussed how such blocks can be chosen. The domain pool was formed from a subsampled version of the mean image. Although the decoder could reproduce such a domain pool without iteration, this had a number of disadvantages. First, the subsampling depended on the block size, e.g. 8:1 in each direction for 8 × 8 blocks. This meant that the domain pool was a highly subsampled version of the original image, weakening the self-similarity between the domain pool and the image to be coded. Second, the high subsampling factor also meant a great reduction in the domain-pool size.
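The projection mechanism of equations (13) and (14) is straightforward to sketch. The code below (Python with NumPy; illustrative, with blocks handled as flattened vectors and function names of our choosing) orthonormalises the fixed basis blocks together with a candidate domain block by Gram-Schmidt, then reads off each coefficient as an inner product:

```python
import numpy as np

def orthonormal_basis(fixed_blocks, domain_block):
    """Gram-Schmidt orthonormalisation of the fixed basis blocks plus one
    candidate domain block (all given as flattened vectors)."""
    basis = []
    for v in list(fixed_blocks) + [domain_block]:
        u = np.asarray(v, float).copy()
        for b in basis:
            u -= (u @ b) * b            # remove components along earlier bases
        norm = np.linalg.norm(u)
        if norm > 1e-12:                # skip vectors already in the span
            basis.append(u / norm)
    return basis

def project(range_block, basis):
    """Least-squares approximation of the range block via equation (14):
    each coefficient is simply the inner product <A, B_k>."""
    r = np.asarray(range_block, float)
    coeffs = [r @ b for b in basis]
    approx = sum(a * b for a, b in zip(coeffs, basis))
    return coeffs, approx
```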

There are a number of other ways to improve the decoding speed. Because of the contractive properties of the transformations, the decoding can commence with any initial image. In itself, this feature offers no performance advantage; however, convergence is faster with an initial image that is a closer approximation to the final image. One method of exploiting this is through the use of fixed basis blocks, as in the method proposed by Øien et al. (1993). A second method is to first decode a smaller version of the image, which is then expanded and used as the initial image for the full-size decoding (Fisher et al., 1992).

Another improvement is the use of tiling shapes that are non-square. Fisher et al. (1994) and Fisher (1992) investigated HV-partitioning, which allows rectangular blocks. The partitions were chosen so that the blocks fulfilled some form of homogeneity criterion. Since there was more flexibility in choosing the partitions, fewer blocks were needed to tile a given image compared with square blocks. Fisher et al. (1994) demonstrated that this partitioning could outperform conventional quadtree partitioning. Fisher (1992), Novak (1993), and Chassery et al. (1993) explored triangular partitioning, whose even greater flexibility showed encouraging results. Besides needing fewer partitions to tile an image, the artefacts arising from the partition boundaries were also less distracting than horizontal and vertical block boundaries.

3.3 Advantages of Fractal Coding

3.3.1 Multiresolutional decoding

Owing to the properties of contractive transformations, fractal coding offers multiresolutional decoding: a fractally coded image can be decoded, starting from any initial image, into a reconstructed image of any resolution. Furthermore, the expanded image shows less pixellation than conventionally expanded images. For this reason, a special application of fractal coding is resolution enhancement (Barnsley and Hurd, 1993). This is used for producing an enhanced image at a higher resolution, rather than for compression: the given image is fractally coded and then decoded at a high resolution; the enhanced image is kept but the transformation parameters are usually discarded. In practical implementations of block-based coders, there is a constraint on the decoded size, because the smallest block size must still be an integral number of pixels. For example, an image coded with a minimum block size of 4 × 4 cannot be decoded directly to a resolution whose scale factor would give those blocks a non-integral pixel size.

3.3.2 Good performance at high compression ratios

Although fractal coders offer competitive performance at moderate compression ratios, they compare best with other coders at very high compression ratios. This was also observed by Frigaard et al. (1994), who ran simulations against the popular JPEG scheme, and Fisher et al. (1994), who compared against wavelet coding as well. In terms of rate-distortion performance, Barthel et al.'s (1994) coder currently offers one of the best results, providing approximately 32 dB and 36 dB for … bpp and 0.43 bpp compressions, respectively.

3.3.3 Relatively fast decoding speed

Fractal decoding is a relatively fast operation because it basically involves taking the domain portion of the image, applying the transformation, and copying the resulting pixels to the range area of the image, as sketched below.
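The decoding loop can be sketched compactly. The following code (Python with NumPy; the transformation-record layout is ours, and the eight isometries are omitted for brevity) performs the ping-pong iteration of Section 3.1.1, decimating each domain block 2:1 and applying $z' = s z + o$ into the range slot:

```python
import numpy as np

def decode_pass(transforms, src):
    """One fractal-decoding iteration. Each transform t is a dict holding
    range/domain coordinates, the range size n, and the s, o parameters;
    the range blocks are assumed to tile the image, so dst is fully written."""
    dst = np.empty_like(src, dtype=float)
    for t in transforms:
        r, c, n = t["range_row"], t["range_col"], t["size"]
        dr, dc = t["dom_row"], t["dom_col"]
        dom = src[dr:dr + 2 * n, dc:dc + 2 * n]
        dom = dom.reshape(n, 2, n, 2).mean(axis=(1, 3))   # 2:1 decimation
        dst[r:r + n, c:c + n] = t["s"] * dom + t["o"]     # equation (9)
    return dst

def decode(transforms, shape, iterations=8):
    """Iterate from any initial image; contractivity drives convergence."""
    img = np.zeros(shape)
    for _ in range(iterations):
        img = decode_pass(transforms, img)
    return img
```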

3.3.4 Opportunities for improvement

Fractal coding is a relatively new technology and has received considerably less attention than techniques such as transform coding and vector quantisation (VQ). It is currently a highly active area of research, with numerous improvements continually being discovered (Saupe and Hamzaoui, 1994a).

3.4 Disadvantages of Fractal Coding

3.4.1 Lengthy encoding time

Unconstrained fractal coding is a very lengthy process, and many of the improvements that have been proposed either degraded image quality or did not reduce the coding time sufficiently. On the other hand, a number of recently proposed techniques appear promising. Saupe and Hamzaoui (1994b) provide a review of the different types of complexity-reducing techniques.

3.4.2 Lack of PIT capability

To date, no PIT-capable block-based fractal coding method has been published. This limits the applicability of fractal coding when such a capability is needed. There are several reasons fractal coding is not well suited to PIT. The major one is the block interdependency of the decoding operation: certain blocks can only be decoded after the domain blocks they depend upon have been decoded. Such a dependency generally means that all the transformations must be received before decoding can begin. Another disadvantage is that the decoding is iterative, so it is computationally expensive to decode an image from approximated transformation parameters.

3.4.3 Patented technology

Fractal image compression is a patented technology (Barnsley and Sloan, 1990; Barnsley and Sloan, 1991). The technology can be licensed from Iterated Systems, Inc., which also produces commercial software and hardware for both image and video compression. Being a patented technology, it is unlikely to enjoy widespread use, especially with the JPEG image-compression standard freely available.

3.4.4 Approximation problems

Since fractal coding relies on the self-similarity of the image, it carries a unique set of coding problems and is less robust for certain types of images. First, it is restricted to transformations that are contractive. Theoretically, the contrast-scaling parameter $s_i$ in equation (9) is constrained to $0 \le s_i < 1$, which means that some types of high-contrast patterns cannot be approximated well. Fortunately, this problem is somewhat eased through the use of eventually contractive maps and polynomial approximation functions.

The second problem is the large degree of non-uniformity in image quality. It is well known that images have non-stationary properties, so certain parts are easier to code than others. Most image-compression methods handle the changes in image statistics by adapting the coding parameters accordingly. In fractal compression, this is mainly done through the use of different block sizes. While this helps attain more uniform image quality, the degree of non-uniformity is still generally larger than in other coding methods. The problem arises from using the block-transformation process to create a range block from a domain block: if the domain block is not approximated well, the range block will be badly approximated too.

3.5 Selecting the Fractal Coder for PIT

In selecting the fractal coder to be used for PIT, there are three main criteria to consider: rate-distortion performance, codec speeds, and codec complexity. Figure 4 compares the rate-distortion (R-D) performances of different fractal coders. Rate is defined as bits per pixel and distortion is the peak-to-peak power signal-to-noise ratio (PPSNR). The results are presented for the widely used grayscale Lena image.

As with other comparisons of this type, it is important to be aware of the limitations. First, it is widely accepted that the PPSNR is not an accurate measure of subjective image quality (Girod, 1993). However, it is the most commonly used distortion measure, and no more convenient and accurate measure has been satisfactorily accepted in its place.

Second, although rate-distortion performance is frequently the most important criterion for a coder, there are other important issues such as codec speed, codec complexity, and application-specific criteria. Third, this comparison does not contain every published fractal coder, since not all researchers use Lena for their simulations. However, it does represent the coders that offer the best R-D results. The JPEG coder is also included, as it is a widely used standard and offers high R-D results at moderate to high bit rates.

It can be seen that Barthel et al. (1994) offers the best rate-distortion performance, followed by Fisher et al. (1993). The other fractal coders perform rather poorly against these two; indeed, the gap is so wide that only these two coders will be considered further.

The performance offered by Barthel et al.'s (1994) coder comes at a high cost. The coder complexity is very high, and it requires an exceedingly long encoding time due to its many levels of adaptivity. To obtain high coding efficiency, the bit allocation is based on a global rate-distortion cost factor. For each range block, the best block for each of the possible search classes must be determined. Furthermore, if the distortion is still too high for the largest search range, the order of the luminance transformation is increased, and the best block for each search class must be determined again. Another major disadvantage is the need to perform the DCT and IDCT for decoding. Added to the non-uniform tiling of the domain blocks, this can result in a lengthy decoding time.

On the other hand, Fisher et al.'s (1993) coder offers a more balanced performance trade-off. As shown in Table 1, rate-distortion performance can be traded off against encoding time, with remarkably good results and fast encoding produced when the 24 variance subclasses are searched. Decoding is also fast, since the coder operates in the spatial domain. Based on these points, this coder, referred to as the quadtree fractal coder from here on, was chosen as the basis for the PIT method discussed in the next section. In coding an image, the data that need to be stored for each transformation are listed in Table 2.

[Figure 4: R-D performance of fractal coders: PSNR (dB) versus bit rate (bits per pixel) for the 512 × 512 Lena image. Coders compared: JPEG, Barthel et al. (1994), Barthel and Voye (1994), Fisher et al. (1993), Jacquin (1993), Lepsøy et al. (1993), Thomas and Deravi (1993), Vines and Hayes III (1993), Øien et al. (1991), Øien et al. (1992), Novak (1993), and Zhao and Yuan (1994).]

    Data type                                                          Bits needed
    Quadtree level: a 2-level quadtree is used                                   1
    Address of domain block: e.g. non-overlapping 8 × 8 domain blocks           12
    Contrast scaling s_i                                                         5
    Luminance shift o_i                                                          7
    Symmetry operation                                                           3
    Total number of bits                                                        28

Table 2: Bit allocation for the quadtree-based fractal coder
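Table 2 maps naturally onto a fixed 28-bit record per transformation. The sketch below (plain Python; the field names and packing order are ours, chosen for illustration) packs and unpacks the five fields:

```python
def pack_transform(level, domain_addr, s_q, o_q, sym):
    """Pack one transformation into 28 bits using the Table 2 allocation:
    1 (quadtree level) + 12 (domain address) + 5 (scaling) + 7 (offset)
    + 3 (symmetry). Arguments are already-quantised non-negative integers."""
    fields = [(level, 1), (domain_addr, 12), (s_q, 5), (o_q, 7), (sym, 3)]
    word = 0
    for value, width in fields:
        assert 0 <= value < (1 << width), "field out of range"
        word = (word << width) | value
    return word

def unpack_transform(word):
    """Inverse of pack_transform: returns (level, domain_addr, s_q, o_q, sym)."""
    widths = [1, 12, 5, 7, 3]
    out = []
    for width in reversed(widths):
        out.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(reversed(out))
```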

4. PROGRESSIVE IMAGE TRANSMISSION

4.1 Introduction

PIT methods can be effectively classified according to the manner in which the progressive update of image content is achieved. Such a classification emphasises the method by which the data are progressively transmitted, rather than the type of data being sent. On this basis, PIT methods can be divided into four major classes. There can be overlap between these categories, and such hybrid techniques can improve the approximation ability of the overall system. For example, the JPEG progressive coder performs better when successive approximation is combined with spectral selection (Pennebaker and Mitchell, 1993), which is classified here as a transmission-sequence-based method. To maintain generality, it is assumed that these methods can be applied to any type of data: for example, coefficients derived from prediction error, transform coding, or wavelet coding, or VQ codewords.

The description of each class is followed by a discussion of its suitability for fractal-based coders. Since this does not consider combining different PIT methods, the objective is to quickly discount inappropriate methods. Through this process, it was decided that two of the four methods were feasible for a fractal coder. To cater for general applications, the following criteria were used in evaluating the PIT methods:

1. Good overall and progressive R-D performance.
2. Visually continuous progressive build-up and a sufficient number of transmission stages.
3. Fast decoding speed.
4. Other criteria such as fast encoding speed, low codec complexity, and capability for selective decoding.
5. With respect to a fractal coder, the proposed system should require only simple modifications to block-based coders. This ensures that existing fractal coders can readily be converted into progressive-transmission systems.

4.1.1 Successive-Approximation Methods

In this type of coding, the precisions of the coefficients are successively refined (Equitz and Cover, 1991). There are numerous ways in which this can be done, the effective ones updating the more important coefficients first. For many types of coefficients, transmitting the higher-order bits before the lower-order ones is a simple and often efficient choice (a minimal sketch is given at the end of this subsection). This method is used in coders such as bit-plane coders (Dürst and Kunii, 1991), tree-structured VQ (Gray et al., 1992:51), full-search VQ (Riskin et al., 1994), and transform-domain methods (Wallace, 1991). It is also used in embedded zerotree wavelet coders (Shapiro, 1993), where it can exploit the correlation between the pyramid levels.

The applicability of this method is limited in the sense that it can only be used if the coefficients are values that can be successively approximated. With a fractal coder, for example, only the contrast-scaling and luminance-offset parameters satisfy this criterion. Successively coding the other parameters would result in grossly distorted intermediate images; the domain coordinates, for instance, cannot be successively approximated, as this would mean that a different block is chosen each time. This is a serious weakness, because the parameters that cannot be transmitted progressively represent a considerable portion of the total bit rate. It also means that a sizeable amount of data must be transmitted before the first approximation can be displayed: with the bit allocation given in Table 2, only 12 out of the 28 bits can be successively approximated. A further disadvantage is the slow decoding speed of fractal coders due to the need to iterate. This is tolerable for non-progressive decoding, since the image is only decoded once; for progressive decoding, however, the image must be decoded repeatedly, once for each transmission stage. It is possible to reduce this problem to a limited extent by decoding only the domain and range blocks that have changed instead of all the transformations, but this is not feasible if the majority of the blocks have been altered.
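For parameters that can be successively approximated, the refinement is simply most-significant-bits-first. A minimal sketch (plain Python; illustrative) splits a quantised parameter into bit planes and reconstructs an estimate from any received prefix, using the midpoint of the remaining uncertainty interval:

```python
def bit_planes(value, width):
    """Split a quantised parameter into bits, most significant first, so its
    precision can be refined over successive transmission stages."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

def refine(bits_received, width):
    """Reconstruct an estimate from a prefix of the bit planes; unreceived
    bits are replaced by the midpoint of the remaining interval."""
    v = 0
    for b in bits_received:
        v = (v << 1) | b
    remaining = width - len(bits_received)
    return (v << remaining) | ((1 << (remaining - 1)) if remaining else 0)

# Example: a 7-bit luminance offset refined over three stages.
planes = bit_planes(93, 7)                 # [1, 0, 1, 1, 1, 0, 1]
print([refine(planes[:k], 7) for k in (2, 4, 7)])   # [80, 92, 93]
```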

4.1.2 Multistage or Residual-Quantisation Methods

A multistage-quantisation or residual-coding method operates by initially transmitting a crude approximation of the image. For each subsequent transmission stage, the approximated image is subtracted from the original, and the resulting residual image is coded, possibly with a different method from that used for the initial approximation, and transmitted. This process continues until the desired quality is achieved. As with the other PIT methods classified here, many types of coders can be used for coding the initial image and the residuals. Residual VQ (RVQ) (Ho and Gersho, 1988) is a popular form of residual coder and can yield very good rate-distortion performance (Kossentini et al., 1993). Another example is a residual coder that operates in the transform domain (Wang and Goldberg, 1988). Residual coders can also be extended to provide exact reconstruction by losslessly coding the final residual layer (Wang and Goldberg, 1988b).

Two methods can be used to incorporate fractal coding into a residual coder. The first is to code the initial image using a small bit allocation for the transformation parameters; this image is then subtracted from the original, and the residual image is iteratively coded in a similar fashion. Apart from its use of residual coding, this method resembles the successive-approximation method described previously, and it inherits the same set of disadvantages. The second method maintains the bit allocation per block but uses larger block sizes when coding the initial image; the residual images of later stages can then be coded with smaller block sizes. This coder resembles a hierarchical coder, except that the resolution is always maintained at the highest level. The main disadvantage of this approach is that it is not as efficient as the hierarchical coder, as will be explained later.
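The residual loop itself is codec-agnostic. A minimal sketch (Python with NumPy; encode and decode stand for any codec pair and are assumptions of this illustration, e.g. a fractal coder using progressively smaller blocks):

```python
import numpy as np

def residual_stages(original, encode, decode, n_stages):
    """Multistage residual transmission: code the image, subtract the decoded
    approximation, then code what remains, and so on."""
    residual = np.asarray(original, float)
    stages = []
    for _ in range(n_stages):
        code = encode(residual)
        stages.append(code)                  # transmit this stage
        residual = residual - decode(code)   # what is still unexplained
    return stages  # the receiver sums the decoded stages to reconstruct
```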

4.1.3 Transmission-Sequence-Based Methods

Transmission-sequence-based techniques operate by transmitting the image data in an order determined by the relative importance of the data. The most common implementation operates in the transform domain, where it is also known as the spectral-selection method (Pennebaker and Mitchell, 1993). The transmission sequence of the ac coefficients can be specified using the zig-zag scanning sequence (Ngan, 1984).

A more elaborate method, proposed by Chitprasert and Rao (1990), is to first scale the coefficients by a weighting function. For perceptually based ordering, this function is derived from the contrast-sensitivity function (CSF) of the human-visual-system (HVS) model. The resulting coefficient blocks are then classified into a number of classes, typically four, according to the total ac energy within each block. Finally, the transmission sequence is specified by ordering the variances of the coefficients, with the highest-variance coefficients transmitted first. The transmission sequence is sent as overhead data. A transmission-sequence PIT operating in the spatial domain using VQ is described by Gong et al. (1993); the sequence ordering is achieved by sorting the VQ codebook in decreasing order of codeword variance. These approaches are described more fully in a later section. The advantages of this approach are as follows:

- The additional decoding operations are minimal compared with a non-progressive decoder. In contrast, the previous two methods may require a much larger number of operations.
- There is a favourable trade-off between the number of build-up stages and the overhead data sent: the overhead is small for a small number of stages.
- The transmission sequence can be perceptually ordered and can adapt to the image, as shown by Chitprasert and Rao (1990).

This transmission method forms the basis for the proposed transmission-sequence-based fractal coder.

4.1.4 Methods Based on Multiresolutional Coding

A multiresolutional or pyramidal structure provides an intuitive approach to PIT, because it allows an image to be accessed at a number of different resolutions. Indeed, the earliest PIT method, proposed by Tanimoto (1977) and also described in Sloan and Tanimoto (1979), was based on such a structure. Many current state-of-the-art PIT coders, namely the wavelet and subband coders, are also based on multiresolutional structures.

Spatial-decomposition methods used to create the multiresolutional representation can be divided into two major groups. The first and simpler group uses subsampling and interpolation, e.g. Ang et al. (1991). This is a very fast approach, but it often yields insufficient compression and has poor approximation ability. As a result, it is usually combined with successive approximation of the grayscale values, e.g. Takahashi et al. (1989).

The second group is more widely adopted and uses a pyramid structure to achieve the multiresolutional representation. There is an inherent hierarchical ordering of data in a pyramid structure, and popular forms are the quadtree, binary-tree, quincunx, and hexagonal structures. An important advantage of a pyramidal structure is that it can be used with almost any type of coder, including a fractal coder. The proposed pyramidal fractal coder is based on this approach.
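A mean pyramid is the simplest such decomposition. The sketch below (Python with NumPy; illustrative) builds one by repeated 2 × 2 averaging; transmitting the levels coarse-to-fine gives the progressive build-up described above:

```python
import numpy as np

def mean_pyramid(image, levels):
    """Build a simple mean pyramid: each level halves the resolution by
    averaging 2x2 blocks. Level 0 is the full-resolution image."""
    pyr = [np.asarray(image, float)]
    for _ in range(levels - 1):
        a = pyr[-1]
        h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2   # trim odd edges
        pyr.append(a[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyr  # transmit pyr[-1] first, then successively finer levels
```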

5. TRANSMISSION-SEQUENCE-BASED FRACTAL CODER

5.1 Introduction

The proposed transmission-sequence-based fractal coder (TSFC) is shown in Figure 5. TSFC orders the transmission sequence of the image blocks according to the perceptual class of each block. A number of classification systems were considered before deciding on a spatial-domain-based classifier; the sensitivity of the system to the performance of the classifier will be discussed. Difficulties were encountered with this approach. Although a number of improvements were suggested to deal with them, the final conclusion was that the block-dependency nature of fractal coding was a major impediment to achieving a high-quality approximation and a pleasing progressive build-up of image detail.

[Figure 5: Block diagram of TSFC. Coder: input image, block division and subdivision, block classifier (type of block), transmission ordering, block buffer, channel.]

5.2 Specifying the Transmission Sequence

5.2.1 The Traditional Approach

Consider a transmission-sequence-based PIT (TSPIT) system that transmits an image X in M transmission stages. Invariably, the image is divided into non-overlapping blocks $x_{ij}$. Each transmission stage $\alpha_k$, k = 0, 1, ..., M - 1, is defined by the blocks that are transmitted for that stage. For example, $\alpha_1 = [x_{1,2},\ x_{2,4},\ x_{3,6}]$ means that this set of three blocks is transmitted in the second stage. For non-redundant transmission, if $x_{ab} \in \alpha_k$, then $x_{ab} \notin \alpha_j$ for $j \ne k$.

Also, for progressive transmission, M > 1 and no single stage may contain the entire set of blocks; that is, $\alpha_k$ cannot contain every $x_{ij}$ for any k. The overhead information incurred is $\log_2 M$ bits per block.

In previously proposed systems, the first stage of the transmission carries the mean values of the blocks, so that

    $\alpha_0 = \{\mu_{ij} : \forall i, j\}$,

where $\mu_{ij}$ is the mean value of $x_{ij}$. In such a case, the previous conditions still hold, except that mean-removed blocks $x'_{ij}$ are transmitted and the overhead information is $\log_2(M - 1)$ bits per block. The order of the blocks in the later stages is usually determined by the variance of the block, with higher-variance blocks transmitted first (e.g. Gong et al., 1993; Chitprasert and Rao, 1990). When operating in the transform domain, the added advantage of doing so is that lower-frequency transform coefficients receive higher priority than higher-frequency ones. In the spatial domain, however, the motivation may not be as obvious: to maximally reduce the error from one stage to the next, the most poorly coded blocks should be updated first, and if mean square error (MSE) is used as the distortion measure, the blocks should be selected in order of decreasing variance.

5.2.2 The New Perceptually Based Approach

The problem with the variance-based approach is that it does not take into account the perceptual significance of the image region. This was also noted by Sanz et al. (1984), who observed that the image regions with the highest activity did not always contain the most meaningful information. The novel approach proposed here incorporates the properties of the human visual system (HVS). It is widely accepted that the most perceptually important part of an image is its structural content, and this is largely defined by the presence of edges (Alter-Gartenberg and Narayanswamy, 1990; Marr, 1982).

Thus, edge blocks should be transmitted before other non-uniform blocks so that the structural content of the image is defined early in the transmission. A block classifier can be used to distinguish four types of blocks: uniform, edge, mixed, and texture, where a mixed block is defined as a block with two or more apparent edges.

There are a number of ways the transmission sequence can be modified to include this new approach. As before, the first stage transmits all the mean values. Some variations on the transmission order are as follows (the third is sketched in code below):

- Transmit an entire class for each stage, in the order edge, mixed, then texture.
- Obtain more stages by subdividing the blocks of a stage according to their variances. For example, 7 instead of 4 stages can be obtained by splitting each non-uniform class into high-variance and low-variance blocks.
- Order the blocks according to both class type and variance. For example, high-variance texture blocks can be transmitted before low-variance edge blocks for greater MSE reduction.

The second method is expected to perform better than the first, because it is a compromise between a variance-based transmission sequence and an HVS-based one, giving a more favourable trade-off between error reduction and perceptual significance. An example is a system that uses 3 variance divisions for the edge class, {a, b, c}, 2 for mixed, {d, e}, and 2 for texture, {f, g}. Notice that a total of 8 different types have been defined (including the mean class) for efficient bit representation. A reasonable transmission order of the classes is {a, d, b, e, f, c, g}, while {a, d, f, b, e, c, g} is biased more towards error reduction.
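Such an ordering reduces to a simple priority sort. The sketch below (plain Python; the labels and data layout are ours) schedules blocks by the {a, d, b, e, f, c, g} order given above:

```python
# Priority of each class/variance division; lower value = sent sooner.
# This is the {a, d, b, e, f, c, g} schedule from the text.
STAGE_ORDER = {"a": 0, "d": 1, "b": 2, "e": 3, "f": 4, "c": 5, "g": 6}

def schedule_blocks(blocks):
    """blocks: list of (block_id, stage_label) pairs, where the label is one
    of the seven class/variance divisions. Returns ids in transmission order.
    A real coder would first transmit the block means as stage 0."""
    return [bid for bid, lab in sorted(blocks, key=lambda b: STAGE_ORDER[b[1]])]

# Example: edge blocks (a) go out before mixed (d) and texture (f) blocks.
demo = [(1, "f"), (2, "a"), (3, "d"), (4, "g"), (5, "b")]
print(schedule_blocks(demo))   # -> [2, 3, 5, 1, 4]
```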

Gong et al.'s (1993) TSPIT used block classification as well. However, the classification was used solely to ensure edge fidelity and did not contribute to a more perceptually coherent transmission sequence. The coder employed was the classified separating mean Kohonen VQ (CSMKVQ), which classified the blocks into 2 edge and 2 non-edge classes. Progressive transmission was achieved by arranging each subcodebook in order of decreasing codeword variance. For the first stage following the mean-value stage, the $n_1$ codewords with the highest variances were chosen in each class. Similarly, for each subsequent stage i, the $n_i$ remaining highest-variance codewords were chosen. During transmission, the blocks that used one of the codewords defined for that stage were transmitted. The classes were given equal weighting, meaning that the same number of codewords was chosen from each class.

5.3 The Block Classification System

5.3.1 Introduction

The classifier should provide a visually continuous transmission build-up as well as reduce the search time in the fractal coder. On the coding side, the lengthy search time of a fractal coder can be significantly reduced by searching only domain blocks that belong to the same class as the range block; to form homogeneous domain pools, the classifier must be able to distinguish the blocks accurately. The classifier thus influences the system's R-D performance, transmission sequence, encoding time, and the reconstructed image's perceptual quality. For this reason, much attention was given to the design of the classifier. After an extensive analysis, the spatial-domain classifier proposed by Wang and Mitra (1993) was found to be the most suitable for the task.

5.3.2 A Review of Classifiers Used in Image Coding

The two widely used types of classifier operate in either the spatial or the transform domain. For a transform-domain coder, a classifier that operates in the transform domain offers a number of advantages. First, the transform coefficients can be used for both the classifier and the quantiser. Second, as demonstrated by Lee and Crebbin (1994) and Kim and Lee (1992b), the classification is very fast because it uses only a small number of feature values. In Lee and Crebbin (1994), with respect to Figure 6, the coefficients C(0, 1), C(0, 2), and C(0, 3) form the horizontal feature set, and the coefficients C(1, 0), C(2, 0), and C(3, 0) form the vertical feature set.

These six coefficients are sufficient to distinguish 24 different edges. Kim and Lee (1992b) used only the coefficients C(0, 1) and C(1, 0) to differentiate between 6 classes: a low-variance class, one high- and one medium-variance vertical-edge class, one high- and one medium-variance horizontal-edge class, and a mixed-edge class.

[Figure 6: The DCT coefficients C(i, j) of a 4 × 4 block, from C(0,0) to C(3,3).]

A survey of classified coding methods indicated that transform-domain classifiers did not have a class for texture blocks; such blocks were usually classified as mixed or edge blocks. Texture blocks are perceptually different, however, and should arguably be coded differently as well. There are a number of possible reasons for not doing so. First, Chen and Bovik (1990) argued that, for average viewing conditions, the limitations of the HVS mean the human eye can detect only a single edge in a 4 × 4 block. Therefore, for lower-quality coding, it is possible to code all non-uniform blocks as edge blocks. Second, the transformations used in image compression have strong energy-packing properties: after a DCT, most of the energy of a block is contained within a group of low- and mid-frequency coefficients, and the regions with large-magnitude coefficients change depending on the orientation of the edge block. Therefore, if non-uniform blocks are coded adaptively according to their orientation, the approximation quality will still be very high. Other advantages are that fewer bits are needed to represent the class information, and, with added complexity, the class information can be predicted (Ngan and Koh, 1992), further reducing the overhead.

However, for TSFC such a classifier is inadequate, because a perceptually meaningful transmission build-up requires the blocks to be classified into perceptually distinct groups. Another compelling reason is that, when coding in the spatial domain, texture blocks are statistically different from edge blocks; if treated the same, the approximation in the spatial domain will be poor. Vaisey and Gersho (1992) also reported that, even in the transform domain, coding texture and edge blocks differently can yield a substantial rate reduction for images with high texture content.
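The feature extraction behind these transform-domain classifiers is tiny. The sketch below (Python with NumPy; an illustration in the style of Kim and Lee (1992b), computing the two coefficients directly from the orthonormal DCT basis rather than via a full transform):

```python
import numpy as np

def dct_features(block):
    """First-order DCT coefficients of a square block: C(0,1) captures
    horizontal intensity change (a vertical edge), C(1,0) vertical change
    (a horizontal edge)."""
    b = np.asarray(block, float)
    n = b.shape[0]
    # Orthonormal DCT-II basis vectors for frequencies 0 and 1.
    dc = np.full(n, np.sqrt(1.0 / n))
    k1 = np.sqrt(2.0 / n) * np.cos((2 * np.arange(n) + 1) * np.pi / (2 * n))
    c01 = dc @ b @ k1   # constant along rows, first cosine along columns
    c10 = k1 @ b @ dc   # first cosine along rows, constant along columns
    return c01, c10
```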

These reasons suggest that a spatially based classifier is better suited, because its larger set of feature values can distinguish a larger number of perceptual classes.

The simplest spatial-domain classifier is that proposed by Nasiopoulos et al. (1991), which operated by taking the difference between the maximum and minimum pixel values. Their adaptive block-truncation coding (BTC) used 4 × 4 blocks, and a block was classified as smooth if this range was less than a smoothness threshold, determined empirically to be around 18. For non-smooth blocks, the range was compared with an edge-detection threshold of around 120: if the range was below the threshold, the block was classified as texture, otherwise as edge. A similar classifier was also used by Christopoulos and Skodra (1993). The advantage of this algorithm is its simplicity; however, the classification can be expected to be quite inaccurate, since a single feature value is not reliable enough to distinguish between the three block types. A sketch of this classifier is given below.
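As flagged above, a sketch of the min-max range classifier (plain Python; the threshold defaults are the empirical values quoted in the text):

```python
def range_classify(block, smooth_t=18, edge_t=120):
    """Min-max range classifier in the style of Nasiopoulos et al. (1991)."""
    flat = [p for row in block for p in row]
    spread = max(flat) - min(flat)
    if spread < smooth_t:
        return "smooth"
    return "texture" if spread < edge_t else "edge"
```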

Ramamurthi and Gersho's (1986) classifier is more complex and computationally intensive. The first step is an edge-enhancement process that accounts for Weber's effect, normalises the pixel gradients, and determines the transition matrices and transition counters. The gradient between a pair of adjacent pixels is normalised by dividing by their average intensity. A transition matrix is defined for each of the horizontal and vertical directions; each entry records whether the transition between the corresponding pair of pixels is negative or positive, and a transition is registered only if its magnitude is above a threshold $T_e$ based on Weber's Law. Furthermore, a transition counter is defined for each direction; the counter is incremented if the transition magnitude is above another threshold $T_s$, also based on Weber's Law. This is followed by a decision tree that classifies the block as shade, midrange, edge, or mixed. A block is classified as shade if both transition counters are below a threshold $J_s$. For an edge block, polarity counts are obtained from the transition matrix; each of the horizontal and vertical directions has a positive and a negative polarity count.

For a given direction α, if both the positive and negative counts are high, the block is classified as mixed. Otherwise, if only one polarity count β exceeds the threshold $J_e$, the block has an edge component in direction α with polarity β. A diagonal edge is characterised by the presence of edge components in both directions. The position of the edge can also be determined from the polarity counts (Ramamurthi and Gersho, 1986). Finally, if none of the polarity counts of a non-shade block exceeds $J_e$, the block is classified as midrange.

As can be seen, four thresholds are used, and they are defined heuristically. The algorithm was observed to work well for all classes except the mixed class, which wrongly contained some blocks with dominant edges (Ramamurthi and Gersho, 1986); this problem can be reduced by the additional processing described in Ramamurthi (1985). A minor disadvantage is that the classifier does not include a texture class. Nevertheless, since transition matrices and counters define the mixed class, texture blocks form a subset of that class; alternatively, lower-variance texture blocks can be classified as midrange.

5.3.3 Wang and Mitra (1993)'s Classifier

Wang and Mitra (1993)'s classifier was part of their coder based on block-pattern models and is shown in Figure 7. It classifies blocks into five classes: constant, edge, pseudo-constant, texture, and mixed. The classifier was derived from experimental findings on the HVS and was reported to work well with 4 × 4 and 5 × 5 block sizes (Wang and Mitra, 1993).

The first step uses a just-noticeable-difference (JND) function to determine whether the block appears perceptually uniform. Given the background intensity, the JND function returns the threshold at which a change in luminance is just noticeable to the HVS. This function, which accounts for Weber's Law, was determined through psychophysical experiments (Wang and Mitra, 1993) and is shown in Figure 8. The background intensity is approximated by $(f_{\max} + f_{\min})/2$, with f being the intensity function, and the luminance change is approximated by $f_{\max} - f_{\min}$. If this change is not noticeable, the block is classified as constant (shade).

Non-constant blocks require further processing, namely finding the orientation of each pixel and recording the counts for each orientation in an orientation histogram.

A number of methods can be used to find the pixel orientation. The Sobel operator (Gonzalez and Woods, 1992) was used due to its simplicity and accuracy. To calculate f'_x and f'_y, the gradients in the horizontal and vertical directions, the Sobel operator uses two separate 3×3 spatial masks (see Figure 9) centred on the pixel of interest. Note that this is a more accurate method than Ramamurthi and Gersho (1986)'s use of only a pair of adjacent pixels. Having found f'_x and f'_y, the quantised orientation is given by

    angle = Q(tan⁻¹(f'_x / f'_y)),    (15)

where Q(·) is the function that quantises the angle according to the orientation selectivity of the HVS. The orientation bandwidth used in the literature varies, with 30 degrees (Hubel and Wiesel, 1974) and 40 degrees (De Valois et al., 1982) being the most common. On the other hand, Wang and Mitra (1993) based their implementation on Olzak and Thomas (1986)'s finding of 15 to 20 degrees. Wang and Mitra (1993) reported that 15 degrees yielded good results, and this was confirmed by our experiments.

Having found the quantised angle, the orientation histogram is updated. When all the pixels in the block have been processed, the orientations with the highest and second highest counts, h_max and h_max2 respectively, are determined. The decision tree that follows is based on these two counts and correlates well with the intuitive definition of each class type. The classification process is summarised by the flowchart in Figure 7. An edge block has pixels with only one dominant orientation. A new class type known as pseudo-constant is also introduced: it is a uniform block with a higher luminance change than a constant block but with no definite edge orientation. A texture block is defined as one with a noticeable luminance change but no definite edge orientation. Finally, a mixed block is defined as a block with at least two perceptible orientations. A sketch of the gradient-and-histogram step is given below.
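The following C sketch combines the Sobel gradients with the quantiser Q of equation (15). It makes several assumptions: the 15-degree bandwidth follows the text, the argument order of atan2 follows our reconstruction of equation (15), the gradient gate t_g anticipates the modification described in 'Implementing the Classifier', and, for simplicity, the outermost pixels of the block are skipped rather than replicated.

    #include <math.h>

    #define ORIENT_BW 15                 /* orientation bandwidth, degrees */
    #define NUM_BINS  (180 / ORIENT_BW)  /* 12 quantised orientations      */

    /* Fills the orientation histogram of an s-by-s block whose top-left
       corner is (x0, y0) in an image of width w. */
    void orientation_histogram(const unsigned char *img, int w,
                               int x0, int y0, int s,
                               double t_g, int hist[NUM_BINS])
    {
        for (int b = 0; b < NUM_BINS; b++) hist[b] = 0;

        for (int y = y0 + 1; y < y0 + s - 1; y++) {
            for (int x = x0 + 1; x < x0 + s - 1; x++) {
                const unsigned char *p = img + y * w + x;
                /* Sobel masks of Figure 9, centred on the pixel */
                double fx = (p[-w+1] + 2*p[ 1] + p[w+1])
                          - (p[-w-1] + 2*p[-1] + p[w-1]);
                double fy = (p[ w-1] + 2*p[ w] + p[w+1])
                          - (p[-w-1] + 2*p[-w] + p[-w+1]);
                if (fabs(fx) + fabs(fy) < t_g) continue;  /* weak gradient: ignore */

                /* quantised orientation, equation (15) */
                double ang = atan2(fx, fy) * 180.0 / 3.14159265358979;
                if (ang < 0.0) ang += 180.0;              /* fold to [0, 180] */
                hist[(int)(ang / ORIENT_BW) % NUM_BINS]++;
            }
        }
    }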

Figure 7: Flowchart of the Wang and Mitra (1993) classifier (adapted from the paper). The input block is first tested against the JND-scaled threshold t0 (constant); non-constant blocks are then tested against the orientation-histogram thresholds t1 and t2 (edge), the luminance-change threshold t3 (pseudo-constant), and finally t4 (texture versus mixed).
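Restated as code, the decision tree reads as follows. This sketch reflects our reading of Figure 7: in particular, treating t0 and t3 as multiples of the JND, and the exact comparison senses, are assumptions rather than details confirmed by the paper.

    typedef enum { CONSTANT, EDGE, PSEUDO_CONSTANT, TEXTURE, MIXED } BlockClass;

    /* jnd:    just-noticeable difference for the block's background
       h_max:  highest orientation-histogram count
       h_max2: second-highest count
       t1, t4: histogram thresholds already scaled by the pixel count n */
    BlockClass classify_block(double fmax, double fmin, double jnd,
                              int h_max, int h_max2,
                              double t0, double t1, double t2,
                              double t3, double t4)
    {
        double change = fmax - fmin;

        if (change < t0 * jnd)                   return CONSTANT;
        if (h_max >= t1 && h_max2 <= t2 * h_max) return EDGE;            /* one dominant orientation */
        if (change < t3 * jnd)                   return PSEUDO_CONSTANT; /* uniform, larger change   */
        if (h_max < t4)                          return TEXTURE;         /* no definite orientation  */
        return MIXED;                            /* two or more perceptible orientations */
    }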

Figure 8: JND function [adapted from (Wang and Mitra, 1993)]

Figure 9: (a) Mask for computing the vertical gradient; (b) mask for the horizontal gradient

5.3.4 Implementing the Classifier

The advantages of this classifier are that it is based on HVS models that correlate well with experimental findings, and that the criteria used to categorise a block agree well with the qualitative definitions of the different classes. In addition, it recognises the whole group of perceptually distinct classes. However, there are a number of disadvantages associated with it. First, it uses five thresholds that have to be determined empirically. Second, it is quite computationally intensive, as can be seen from the approximate operation counts shown in Table 3.

Operation        Counts per pixel
Comparison       4
Addition         12
Multiplication   2
Division         1
Shift            4

Table 3: Computational complexity of the classifier

Nevertheless, these disadvantages are outweighed by the advantages, and the classifier used in TSFC is based on this classifier, with a number of modifications. First, with the Sobel operator, a pixel's orientation is considered only if its gradient is above a threshold value T_g, based on the minimum JND threshold of 6. Also, the pixels at the image borders are replicated so that the Sobel operator can still produce acceptable results.

Wang and Mitra (1993) used block sizes of 5×5 and the thresholds t_i, i = 0, …, 4, as follows: 1.0, 0.56n, 0.75, 5.0, and 0.48n, where n is the number of pixels in the block. The classifier proposed for TSFC uses 4×4, 8×8, and 16×16 blocks. The 16×16 blocks are restricted to the uniform class, and 8×8 blocks can be of any class apart from the texture class. The reason for this is that an 8×8 block might consist of non-homogeneous 4×4 blocks and, as a result, be labelled as a texture block by the classifier.

When this classifier was used with the threshold values proposed by Wang and Mitra, it was found that far too many blocks were classified as texture, and the blocking effect was unacceptable. Also, the algorithm could not distinguish well between the non-uniform blocks that were larger than 4×4. A significant number of the edge blocks in the Lena image were not identified as oriented blocks but were misclassified as mixed or, more commonly, as texture blocks. This agreed with the observations made by Ramamurthi and Gersho (1986) and Wang and Mitra (1993).

Figure 10: (a) 8×8 edge blocks; (b) 8×8 mixed blocks; (c) 4×4 edge blocks; (d) 4×4 mixed blocks; (e) 4×4 texture blocks

Further experimentation suggested that the following parameters yielded better results: 1.0, 0.35n, 0.8, 1.2, and 0.3n. The criteria used for this selection included the reduction of blocking effects and the consistent selection of edge blocks. Figure 10 shows the selection of some non-uniform classes of the Lena image.

5.4 Fractal Coding System

5.4.1 Overview of the Coder

The fractal coder used is a variation of that of Jacquin (1993), which was briefly described in 'Jacquin's block-based fractal coder'. The block diagram of the TSFC is shown in Figure 5. The first step involves dividing the image into blocks through the use of a quadtree structure: non-uniform blocks are recursively subdivided until they are uniform or the block size is at a minimum. After division, mean removal is performed on all the range blocks. The first transmission stage consists of sending the mean values of all the range blocks. For the subsequent second, third, and fourth transmission stages, the respective edge, mixed, and texture blocks are fractally coded and sent.

This section explains the aspects of the coding system that differ from Jacquin's block-based fractal coder. A component of the system that is modified but not mentioned below is the classifier, which was discussed in the previous section.

5.4.2 Quadtree Structure

In order to adapt to different regions in the image, a quadtree structure is used to provide variable block sizes. Two parameters to be determined are the minimum and maximum block sizes. For uniform blocks, larger block sizes can be used, with 16×16 being the most common. 32×32 blocks were experimented with, but did not produce acceptable results: first, there were very few uniform regions that were this large; second, blocking became a very noticeable artefact. Lee and Crebbin (1994) reported a similar observation. Therefore, a 3-level quadtree structure is used, with 4×4 being the minimum block size (a sketch of the subdivision follows).
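A minimal C sketch of the class-driven subdivision, assuming callbacks that stand in for the classifier and the block coder (is_uniform and emit are illustrative names, not the paper's):

    typedef int  (*UniformTest)(int x, int y, int size);
    typedef void (*EmitLeaf)(int x, int y, int size);

    /* Recursively divide a block: a block is split while it is
       non-uniform and larger than the minimum size. */
    void quadtree_divide(int x, int y, int size, int min_size,
                         UniformTest is_uniform, EmitLeaf emit)
    {
        if (size <= min_size || is_uniform(x, y, size)) {
            emit(x, y, size);       /* leaf: coded as one range block */
            return;
        }
        int h = size / 2;           /* split into four child blocks   */
        quadtree_divide(x,     y,     h, min_size, is_uniform, emit);
        quadtree_divide(x + h, y,     h, min_size, is_uniform, emit);
        quadtree_divide(x,     y + h, h, min_size, is_uniform, emit);
        quadtree_divide(x + h, y + h, h, min_size, is_uniform, emit);
    }

For the 3-level structure above, each 16×16 tile of the image would be passed in with size = 16 and min_size = 4.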

A quadtree structure must also be specified by a splitting criterion, i.e., a block is decomposed if it is non-homogeneous according to the criterion. The criterion used here is the class of the block, instead of the MSE. Jacquin's (1993) implementation also uses partial subdivision: the approximation for the parent block is kept if the distorted sub-blocks can be adequately approximated by two or fewer child blocks. This can help reduce the number of blocks needed, but the penalties are an increased coding time and an increased overhead in specifying the division. For these reasons, a simple quadtree structure without partial subdivision is employed.

5.4.3 Creating Mean-Residual Blocks

The first transmission stage of TSFC consists of sending the mean values of all the blocks. To do this efficiently, the fractal coder should process mean-removed blocks rather than the original image blocks. In addition to enabling PIT, there are a number of advantages in doing so. First, it is well known that there is a high degree of correlation between the mean values of neighbouring blocks; by using prediction, we are able to reduce the entropy of the mean values. Second, the calculation of the optimal s and o parameters in equation (9) can be done quickly (Øien et al., 1991). Removing the mean component from the domain block D and range block R has the effect of orthogonalising the domain residual block D' and the range residual block R' with respect to one another. Now, the optimal value for s is the normalised inner product between D' and R':

    s = <R', D'> / ||D'||^2.    (16)

Clearly, the optimal value for o is then the mean value of R. Without mean removal, the calculations of s and o would require many more operations [see Fisher et al. (1992)]. A disadvantage of using mean removal is the increased decoding and overall coding time, due to the need to perform mean removal for all the domain and range blocks.

The mean values are coded using predictive coding. Various degrees of complexity are possible, a simple method being to predict the mean value from the block to the left of the current block. With high correlation between neighbouring blocks, the entropy of the prediction error is expected to be much lower than that of the original mean values. This saving in bit rate can be realised through entropy coding (Huffman, 1952). This is the approach taken in the JPEG coder and was found to yield good results (Wallace, 1991). The same predictive coder is used here; however, this method can be improved for a classified coder such as TSFC. Ngan and Koh (1992) used 2-D prediction, which generally performs better for vertical edges and poorer for horizontal edges; the prediction accuracy was significantly improved by taking into account the orientations of the present block and of the surrounding blocks. Also, note that for a non-progressive coder the class information can itself be predicted (Ngan and Koh, 1992), but this is not possible with a progressive coder that uses the class information to specify the transmission sequence. A sketch of the mean-removed parameter calculation closes this subsection.
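A minimal C sketch of the calculation of equation (16), with d and r holding the decimated domain block and the range block (n pixels each):

    /* Least-squares parameters after mean removal: s is the normalised
       inner product of the residuals, and o is the range-block mean. */
    void fractal_params(const double *d, const double *r, int n,
                        double *s, double *o)
    {
        double dmean = 0.0, rmean = 0.0;
        for (int i = 0; i < n; i++) { dmean += d[i]; rmean += r[i]; }
        dmean /= n; rmean /= n;

        double num = 0.0, den = 0.0;
        for (int i = 0; i < n; i++) {
            double dd = d[i] - dmean;        /* domain residual D' */
            double rr = r[i] - rmean;        /* range residual  R' */
            num += rr * dd;                  /* <R', D'>  */
            den += dd * dd;                  /* ||D'||^2  */
        }
        *s = (den > 0.0) ? num / den : 0.0;  /* contrast scaling */
        *o = rmean;                          /* luminance shift  */
    }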

5.4.4 Coding the Transformation Parameters

Uniform blocks can be represented economically, as the entire block is estimated with just its mean value. As mentioned in the previous section, the mean value is predictively coded. The mean values must be quantised finely enough to diminish blocking effects; a 7-bit uniform quantiser was chosen and found to yield good results. This is followed by entropy coding, so the actual bit rate for the mean value should be noticeably less than 7 bits per sample. In coding all the non-uniform blocks, 5 bits are allocated for the s parameter and 7 bits for o.

5.5 Preliminary Simulation Results and the Difficulties Encountered

The simulations were done on an HP 720 workstation, with the programs coded in C. Rate-distortion results are not presented, because the preliminary results indicated a number of difficulties with the approach. Both types of transmission sequence mentioned in 'The New Perceptually Based Approach' were experimented with, and the comments here apply to both sets of results.

5.5.1 Simulation Results

Figure 11 illustrates the transmission sequence for the Lena image. However, because the simulation was incomplete, coding was applied only to the uniform blocks; the non-uniform blocks were merely duplicated from the original image to demonstrate the progressive build-up that would occur if a high-quality block coder were used.

Figure 11: Transmission sequence, ordered from (a) to (h)

The transmission sequence can be divided into three main phases. The first phase transmits the mean values of the blocks. This is done in three stages, with each stage handling blocks of size 16×16, 8×8, and 4×4, respectively. The second phase is divided into two stages: the first transmits the 8×8 edge blocks, and the second transmits the 8×8 mixed blocks.

Finally, the remaining 4×4 non-uniform blocks are transmitted in the third phase. There are three stages in this phase, and the classes are transmitted in the order edge, mixed, and texture. Table 4 displays the distribution of the block types.

Sequence   Block Type       Number of Blocks
a          16×16 Uniform    25
b          8×8 Uniform      485
c          4×4 Uniform      2631
d          8×8 Edge         812
e          8×8 Mixed        268
f          4×4 Edge         1574
g          4×4 Mixed        865
h          4×4 Texture      4654

Table 4: Distribution of block types

The transmission sequence demonstrates that the perceptually based approach does convey perceptually important information early in the progressive build-up. The first phase can be transmitted at a fairly low bit rate, since only mean values are transmitted. In the second phase, it can be observed that the classifier has effectively located the important edges in the image. This quickly allows the user to see the structural appearance of the image. In contrast, a method not based on such a classification scheme has a poorer chance of estimating the structural content of the image.

5.5.2 Low Compression Ratio

One disadvantage of the quadtree fractal coder is that approximately 28 bits (see Table 2) are needed to represent a non-uniform block. When the block size is 4×4, a block of 16 pixels at 8 bits per pixel occupies 128 bits, so this represents a low compression ratio of only 128/28 ≈ 4.6:1. Furthermore, in the simulations, there were a large number of 4×4 texture blocks.

One method to reduce the number of these blocks is to use 8×8 texture blocks. The reasons this can be done are as follows. First, being last in the transmission sequence, texture blocks have a more complete domain pool than the other block classes. Second, large texture blocks can be coded without compromising edge quality as follows: initially, all the blocks are classified using the minimum 4×4 block size, so that the classifier can work reliably; next, groups of texture blocks that make up a larger 8×8 texture block are located and coded as a single 8×8 texture block.

5.5.3 Block-Interdependence Nature of Fractal Coding

An inherent problem with fractal coding is that the quality of a given block depends on the decoded quality of its domain block. For example, in coding the edge blocks, the domain image consists of the edge blocks plus the mean values of all other blocks. In coding a given block, the coder assumes that the edge blocks in the domain image are coded very accurately. This assumption creates problems in the decoder, because a badly decoded edge block will mean a badly reproduced range block. The problem is intensified in TSFC because, for a given transmission stage, the numbers of domain and range blocks are relatively small, which greatly increases the block interdependency.

5.6 Conclusion

TSFC was not a very promising approach, due to the transmission-sequence constraints and the limitations of fractal coding. A number of improvements are possible. Since the domain image consists of mean blocks, increasing the decimation factor from 2:1 to 4:1 can be expected to increase the similarity between the domain and range pools. Another possibility is to explore hybrid coding. For example, an edge-pattern coder, e.g. Chen and Bovik (1990), could be used to code the edge blocks with a high compression ratio and image quality. In such a case, 8×8 block sizes could be used for the mixed and texture blocks, since the edge blocks that form the domain pool would have higher quality. The disadvantage of a hybrid coder is that multiresolutional decoding is no longer possible. The investigation of these extensions is the subject of further research.

6. PYRAMIDAL FRACTAL CODER

6.1 Introduction

A pyramidal coder transmits the highest pyramid level as the initial approximation image; lower levels are transmitted for the subsequent approximations. The proposed pyramidal fractal coder (PFC) codes each pyramid level with fractal compression. This section discusses the motivation, implementation, and simulation results of the PFC.

6.2 Motivation

The advantages of multiresolutional processing techniques can be found in Mallat (1989), Burt (1984), and Rosenfeld (1984), and the important ones are summarised here:

1. Psychophysical and physiological experiments on human vision have shown that the human visual system decomposes the retinal image into a number of spatially oriented frequency channels (Mallat, 1989). Furthermore, these channels can be modelled by independent bandpass filters with octave separations between peak responses (Mallat, 1989). A dyadic pyramidal decomposition, whereby the resolution halves from one level to the next lower one, mimics this latter property.

2. In image processing, there are two main advantages (Mallat, 1989). First, the hierarchical structure means that image detail at different scales can be treated differently; for example, a low resolution should be used when processing coarse image features. Second, it allows a coarse-to-fine processing approach, whereby knowledge about coarse features can be used to direct the processing of fine features.

3. Due to the similarity of the image at different scales, a hierarchical decomposition offers attractive decorrelation properties. This means that we can predict information at smaller scales using that of larger scales.

When used as a structure for a fractal coder, further advantages can be seen by comparing this approach with the residual-quantisation approach. First, the hierarchical approach is more computationally efficient, because the size of the initial approximation image is small.

In contrast, the residual coder deals with a full-size image throughout. Second, the artefacts in the initial images are less noticeable with the hierarchical approach, because the images are smaller. Hofmann and Troxel (1986) provided a discussion of this, concluding that the image display size should vary according to the image resolution: when the resolution is poor, a smaller-sized image is generally more recognisable than a larger-sized one. The proposed method is also easy to implement, since it requires only simple modifications to existing quadtree fractal coders. Yet another advantage is that the multiresolutional-decoding capability of fractal compression can be used to enhance the quality of the expanded pyramid levels.

6.3 Pyramidal Decomposition

Pyramid structures can be formed by both spatial and frequency decomposition. Frequency-decomposition methods such as wavelet and subband systems are not discussed here, because their implementation requires significant modifications to existing quadtree fractal coders.

In a pyramid structure, the information describing the image is organised into n hierarchical levels. For PIT, since the number of transmission stages is the same as the number of levels, it is advantageous to have a large n, so that the initial stage requires only a small number of bits to be recognisable and there is sufficient progression in the transmission. On the other hand, increasing n will increase the bit rate and the coding time, since additional data must be processed. For fractal coding, the added constraint is that the initial image must be sufficiently large so that the domain pool is large enough. Based on this, n = 3 was chosen, and Figure 12 illustrates the 3-level pyramid.

A number of different types of pyramids are described in Goldberg and Wang (1991). The difference pyramid is selected, as it is suitable for most types of coding; this versatile pyramid also reduces the interlevel correlation by forming residual images between the levels. To form an n-level difference pyramid of an image X_{n-1} of size N×N, the first step is to form a truncated mean pyramid (adapted from Goldberg and Wang (1991)):

1. Initialisation: Let level k = n − 1 and X_k = X_{n-1}.

2. Formation of level k − 1: Divide X_k into non-overlapping blocks of size 2×2 and calculate the mean

    X_{k-1,i,j} = (X_{k,2i,2j} + X_{k,2i,2j+1} + X_{k,2i+1,2j} + X_{k,2i+1,2j+1}) / 4,    (17)

where i and j index the pixels of level k − 1. This decimation step is also known as the Reduce process, by which X_{l-1} = Reduce(X_l).

3. Termination: Let k = k − 1 and go to step 2 if k > 0; otherwise, stop.

For generality, it is sufficient to assume that Reduce(·) results in a decimation of pixels from the lower pyramid level to form the pixels in the higher level. This decimation can be made more efficient by using a low-pass filter prior to subsampling (Houlding and Vaisey, 1994).

Figure 12: 3-level pyramid: X_0 is level 0 (N/4 × N/4), X_1 is level 1 (N/2 × N/2), and X_2 is level 2 (N × N, the original image)

The complement of Reduce(·) is Expand(·), by which X_k = Expand(X_{k-1}). A number of different interpolation methods can be used for Expand(·), and these will be discussed later. The simplest method uses pixel replication:

    X_{k,2i,2j} = X_{k,2i,2j+1} = X_{k,2i+1,2j} = X_{k,2i+1,2j+1} = X_{k-1,i,j}.    (18)
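As a sketch, the two operations of equations (17) and (18) in C, assuming square images stored row-major as doubles:

    /* Reduce, equation (17): each higher-level pixel is the mean of a
       2x2 block of the lower level.  src is s-by-s, dst is (s/2)-by-(s/2). */
    void reduce_level(const double *src, int s, double *dst)
    {
        int half = s / 2;
        for (int i = 0; i < half; i++)
            for (int j = 0; j < half; j++)
                dst[i * half + j] = (src[(2*i)   * s + 2*j] +
                                     src[(2*i)   * s + 2*j + 1] +
                                     src[(2*i+1) * s + 2*j] +
                                     src[(2*i+1) * s + 2*j + 1]) / 4.0;
    }

    /* Expand with pixel replication, equation (18).  src is half-by-half,
       dst is (2*half)-by-(2*half). */
    void expand_level(const double *src, int half, double *dst)
    {
        int s = 2 * half;
        for (int i = 0; i < half; i++)
            for (int j = 0; j < half; j++) {
                double v = src[i * half + j];
                dst[(2*i)   * s + 2*j]     = v;
                dst[(2*i)   * s + 2*j + 1] = v;
                dst[(2*i+1) * s + 2*j]     = v;
                dst[(2*i+1) * s + 2*j + 1] = v;
            }
    }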

After the mean pyramid X_k has been formed, the top level of the pyramid, X_0, is fractally coded, producing Y_0. To code the next lower pyramid level, Y_0 is decoded to form X̂_0 and then expanded. Now, we fractally code the residual image R_1 = X_1 − Expand(X̂_0) to get Y_1. This process repeats until we reach the lowest pyramid level, and can be summarised in the steps below, with X̂_{-1} = 0:

    R_k = X_k − Expand(X̂_{k-1}),    (19)
    Y_k = code[R_k],    (20)
    R̂_k = decode(Y_k),    (21)
    X̂_k = R̂_k + Expand(X̂_{k-1}),    (22)

for k = 0, 1, …, n − 1. As can be seen, at level k, X_k is the original truncated mean pyramid, X̂_k is the reconstructed mean pyramid, R_k is the residual pyramid, R̂_k is the decoded residual pyramid, and Y_k is the fractal code transmitted by the coder at each stage. The block diagram of the 3-level coder is shown in Figure 13. For generality, the residual images R_k can be coded using virtually any type of coder, making the difference pyramid a very versatile structure for PIT.

Using decode(Y_k) in equation (22), instead of forming the residuals from the original levels as X_k − Expand(X_{k-1}), ensures that the coding error does not propagate through the levels. For this reason, the final reconstruction error is due solely to the error incurred in coding the lowest level, n − 1.
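Equations (19)-(22) translate directly into a closed-loop coding loop. In the following C sketch, code_level and decode_level (and the Code type) are placeholders for the quadtree fractal coder, and expand_level is the replication routine sketched earlier; level[k] holds X_k of size s[k] × s[k], and recon[k] receives the reconstruction.

    typedef struct Code Code;
    Code *code_level(const double *residual, int s);        /* Y_k  = code[R_k]   */
    void  decode_level(const Code *y, int s, double *out);  /* R^_k = decode(Y_k) */
    void  expand_level(const double *src, int half, double *dst);

    void encode_pyramid(double **level, double **recon, Code **y,
                        const int *s, int n, double *pred, double *resid)
    {
        for (int k = 0; k < n; k++) {
            int npix = s[k] * s[k];

            /* prediction expand(X^_{k-1}); the top level is predicted from 0 */
            if (k == 0)
                for (int i = 0; i < npix; i++) pred[i] = 0.0;
            else
                expand_level(recon[k - 1], s[k - 1], pred);

            /* R_k = X_k - expand(X^_{k-1}), equation (19) */
            for (int i = 0; i < npix; i++) resid[i] = level[k][i] - pred[i];

            y[k] = code_level(resid, s[k]);       /* equation (20) */
            decode_level(y[k], s[k], recon[k]);   /* equation (21) */

            /* X^_k = R^_k + expand(X^_{k-1}), equation (22) */
            for (int i = 0; i < npix; i++) recon[k][i] += pred[i];
        }
    }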

Figure 13: Block diagram of the pyramidal fractal coder

Figure 14: Block diagram of the pyramidal fractal decoder

The decoding process is as follows, with X̂_{-1} = 0:

    X̂_k = decode(Y_k) + Expand(X̂_{k-1}),    (23)

for k = 0, 1, …, n − 1, which combines equations (21) and (22). The received fractal data Y_k is decoded and added to the expanded version of the previous, higher pyramid level. The result X̂_k is presented as the reconstructed pyramid level at each transmission stage. As can be observed, the coder is more complex than the decoder, as it includes the decoding process too. The block diagram of the 3-level decoder is shown in Figure 14.

6.4 Fractal Coding

From Figure 13 it can be seen that the coding of the residual images R_k is rather straightforward, because each image is coded independently of the others. This is possible because a significant amount of the interlevel correlation has already been removed through the structure of the difference pyramid. More specifically, the process of finding the residual images can be likened to a form of predictive coding. The result is that the pixel values of the residual images are centred around 0, with a variance considerably smaller than the original image's. Figure 15 shows a set of typical residual images, in which a residual value of 0 is represented by a gray pixel value of 127. As can be seen, a large majority of the pixels have values very close to 127. In addition, Figure 16 shows the distribution of block-mean values for the different residual images; again, the block-mean values are clustered around 127 for levels 1 and 2.

The proposed pyramidal fractal coder is rather simple. Each level of the difference pyramid is encoded using a quadtree fractal coder similar to the one proposed by Yuval et al. (1992). There is enormous flexibility in choosing the coding parameters for each level; however, the following general guidelines proved useful. Since there are fewer image data to be coded in the higher levels, it is logical to code these higher levels with more bits per pixel. Furthermore, doing so will aid the coding of the lower levels appreciably, since there will be fewer high-detail regions to be coded.

Apart from being allocated more bits per pixel, the highest pyramid level is also encoded differently from the other levels. Figure 16 shows the distribution of the block means for this level. As expected, its distribution resembles that of the original image, since this level is merely a subsampled version of the original. As with TSFC, this suggests that mean removal followed by prediction of the mean can lead to a bit-rate reduction.

Figure 15: Residual images; (a) level 1; (b) level 2

Figure 16: PDF curves for block-mean values in the different pyramid levels

6.4.1 Interpolation Methods

There is considerable flexibility in the choice of the Expand(·) interpolation function in Figure 13 that expands one pyramid level to the next larger size. Pixel replication, or nearest-neighbour interpolation, is commonly used, as it is the simplest to implement (Gonzalez and Woods, 1992). Sanz et al. (1984) compared its performance with those of cubic splines and cubic convolution (Bernstein, 1976). They concluded that both methods outperformed replication appreciably, with cubic splines outperforming convolution slightly. The main disadvantage of cubic splines is their high computational requirement, although newer approaches such as those suggested by Unser and Aldroubi (1992) and Lee and Paik (1993) can overcome this problem. Wang and Goldberg (1991) also evaluated a Gaussian-like 5×5 filter and concluded that it significantly outperformed pixel replication. The problem with this method is that it is computationally intensive and therefore increases the decoding time.

Since a fast decoding time is attractive for most applications, alternative approaches were explored. One method that offers a good performance-complexity trade-off is linear interpolation (Jain, 1991). Zeng and Venetsanopoulos (1992) suggested that non-linear interpolation is a better alternative, its advantages including faster computation and higher quality. The every-other-row-and-column (EORC) lattice is used to sub-sample the original image; the interpolator structure is shown in Figure 17. The X members of the lattice represent the sampled pixels, and the other members, Y and Z, receive interpolated values. Zeng and Venetsanopoulos (1992) proposed four different interpolation algorithms; INTER1 was chosen on the basis that it performs well with natural images and has very low computational requirements. With this scheme, the interpolated values are calculated as follows:

    Y = median(X_1, X_2, X_3, X_4),    (24)

    Z_1 = (X_1 + X_2) / 2,  Z_2 = (X_1 + X_3) / 2,    (25)

and Z_3 and Z_4 are found in similar fashion to Z_1 and Z_2, respectively.
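A C sketch of INTER1 for one lattice cell, with X1..X4 laid out as in Figure 17 (X1 top-left, X2 top-right, X3 bottom-left, X4 bottom-right). Defining the four-sample median as the mean of the two middle values is an assumption here, as the text does not spell it out.

    /* median of four values, taken as the mean of the two middle ones */
    static double median4(double a, double b, double c, double d)
    {
        double lo = a, hi = a;
        if (b < lo) lo = b; if (b > hi) hi = b;
        if (c < lo) lo = c; if (c > hi) hi = c;
        if (d < lo) lo = d; if (d > hi) hi = d;
        return (a + b + c + d - lo - hi) / 2.0;
    }

    /* INTER1, equations (24)-(25), for one cell of the EORC lattice */
    void inter1(double x1, double x2, double x3, double x4,
                double *y, double *z1, double *z2, double *z3, double *z4)
    {
        *y  = median4(x1, x2, x3, x4);  /* equation (24)          */
        *z1 = (x1 + x2) / 2.0;          /* between X1 and X2      */
        *z2 = (x1 + x3) / 2.0;          /* equation (25)          */
        *z3 = (x2 + x4) / 2.0;          /* Z3, Z4 found similarly */
        *z4 = (x3 + x4) / 2.0;
    }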

Figure 17: Interpolator structure for the EORC lattice

A fractal decoder offers an alternative to these interpolation methods, since a fractally coded image can be decoded to any resolution. This is particularly important for a pyramid coder, since fractal expansion can be used to interpolate a given pyramid level to the next lower level. For an image that has been coded with sufficient accuracy, it is expected that this interpolation will outperform pixel replication, since the latter results in aliasing, or a staircase effect. However, incorporating fractal interpolation is quite complicated. The other techniques mentioned above can be used to expand any image, but fractal interpolation can only expand an image that has been fractally coded. This means that only the residual images and the highest pyramid level can be expanded. Therefore, to fractally expand X̂_k to level k + 1, the residuals R̂_j, j = 1, …, k, must each be expanded to the size of level k + 1. This suggests that the increase in decoding time and memory space would be unacceptable if there were a large number of levels. For a 3-level pyramid, however, the additional computations are moderate: R̂_0 must be expanded to the sizes of levels 1 and 2, and R̂_1 must be expanded to the size of level 2.

6.4.2 Iterated coding

The degree of nonuniformity in image quality is generally larger for fractal coders than for other coders (see 'Approximation problems'). The root of the problem is the discrepancy between the domain image of the coder and that of the decoder. The coder's domain image is always based on a transformation of the original image; the decoder's, however, is based on the image being iteratively decoded. Expectedly, if the decoded-image quality is poor, the decoder's domain image will be quite different from the coder's. This is a problem because the transformation parameters are calculated based on the coder's domain image; when these parameters are used for decoding, the differences between the domain images add to the inaccuracy of the approximation.

Additionally, the coder cannot determine whether the approximation is poor until the domain block has been decoded. However, since there are interdependencies between the blocks, the domain block cannot be decoded until the whole image has been coded. One method to overcome this problem is to iterate the coding process; this method has also been described, independently, in Barthel and Voye (1994). The first coding iteration is done conventionally and uses a transformation of the original image as the domain image. After the image has been coded, it is immediately decoded, with the appropriate number of decoding iterations. The decoded image is then used as the basis for the domain image in the second coding iteration: the original image is coded again, using this new domain image. The parameters calculated in this iteration are expected to be more accurate, since the domain image more closely approximates the decoder's domain image.

A number of variations are possible with this iterated-coding process. First, the subsequent coding iterations do not have to repeat the entire coding process; to reduce coding time, the previous domain-block locations and symmetry operations can be retained, with just the s_i and o_i values altered (Barthel and Voye, 1994). The second variation recognises that even the domain images used in subsequent coding iterations will differ from the decoder's domain image, since the latter changes when the transformation parameters change. To deal with this, it is proposed that the coder's domain image be an average of the original image and the decoder's domain image; simulation results showed that this produced better results. A sketch of the iterated-coding loop follows.
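A minimal C sketch of the loop, including the averaging variation; fractal_encode, fractal_decode, free_code, and the FractalCode type are placeholders for the quadtree coder, not the paper's API:

    #include <stdlib.h>

    typedef struct FractalCode FractalCode;
    FractalCode *fractal_encode(const double *image, const double *domain_img, int n);
    void         fractal_decode(const FractalCode *c, int n, double *out);
    void         free_code(FractalCode *c);

    FractalCode *iterated_encode(const double *image, int n, int passes)
    {
        double *domain  = malloc(n * sizeof *domain);
        double *decoded = malloc(n * sizeof *decoded);
        FractalCode *code = NULL;

        for (int i = 0; i < n; i++) domain[i] = image[i]; /* pass 1: original */

        for (int p = 0; p < passes; p++) {
            if (code) free_code(code);
            code = fractal_encode(image, domain, n);  /* refit s_i, o_i, ... */
            fractal_decode(code, n, decoded);         /* the decoder's view  */
            /* next pass: average of the original and the decoded image */
            for (int i = 0; i < n; i++)
                domain[i] = 0.5 * (image[i] + decoded[i]);
        }
        free(domain); free(decoded);
        return code;
    }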

6.4.3 Coding of uniform blocks

It was observed that the sub-images generated by the pyramidal decomposition contained a fairly large number of uniform areas, which can be coded efficiently in a number of ways. First, the quadtree method allows variable block sizes to be used, which means that large uniform regions can be represented by a larger block. Second, to quickly identify uniform blocks, a method was devised to measure block uniformity. This technique has two advantages: searching is not required when a uniform block is coded, thus reducing the coding time, and the block can be represented with just the contrast-scaling parameter s_i and luminance-shift parameter o_i, since it does not require a domain block. This is a tremendous reduction in bit rate compared with representing the block by an affine transformation of a domain block.

A critical implementation issue was the selection of the uniformity measure. The approach adopted is to use an HVS-based measure, since block uniformity is a perceived quality. The measure is based loosely on the first stage of Wang and Mitra (1993)'s classifier: the maximum and minimum pixel values of the block are found, and the luminance change is approximated as the difference between these two values. If the change is greater than a threshold T_unif, the block is classified as non-uniform. A similar measure was also used in Nasiopoulos et al. (1991), with a threshold value of 18.

6.4.4 Coding of the transformation data

In image-coding methods, further coding gain can be obtained through quantisation and entropy coding of the coding data (Rabbani and Jones, 1991). Since the pyramidal images are differentially coded, it can be expected that each residual image in the pyramid has pixels with a highly skewed Laplacian probability density function (PDF) (Rabbani and Jones, 1991). Figure 18 confirms this, showing the PDF curves along with the fitted Laplacian PDFs.

For a discrete memoryless source, the entropy is the lower bound on the average number of bits needed to code each symbol (Storer, 1988). Thus, the effectiveness of applying entropy coding to the transformation data can be gauged by calculating the entropies of the transformation parameters. For a typical coded image, the entropies were found to be as shown in Table 5.

Data type                   Level 0          Level 1          Level 2
                            bits  entropy    bits  entropy    bits  entropy
Domain block coordinates    4     …          …     …          …     5.57
Contrast scaling s_i        …     …          …     …          …     …
Luminance shift o_i         …     …          7     2.18       …     …
Symmetry operation          …     …          …     …          …     …

Table 5: Bit allocation and entropies for the transformation parameters

In the case of coding level 1's luminance-shift parameter o_i, for example, an ideal entropy coder can code with an average of 2.18 bits per parameter, instead of 7 bits. For the simulations, the entropy coding was realised with an adaptive arithmetic coder (Nelson, 1993). Arithmetic coding overcomes the limitations of Huffman coding and can achieve the theoretical entropy bound in coding the source (Witten et al., 1987).

Figure 18: PDF curves for offset values in the different pyramid levels, with fitted Laplacian PDFs
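The entropies in Table 5 are first-order entropies of the quantised parameter streams, which can be estimated from a symbol histogram as in this short C sketch:

    #include <math.h>

    /* First-order entropy, -sum p log2 p, in bits per symbol.
       counts is a histogram over nsym symbol values. */
    double entropy_bits(const unsigned long *counts, int nsym)
    {
        unsigned long total = 0;
        for (int i = 0; i < nsym; i++) total += counts[i];
        if (total == 0) return 0.0;

        double h = 0.0;
        for (int i = 0; i < nsym; i++) {
            if (counts[i] == 0) continue;
            double p = (double)counts[i] / (double)total;
            h -= p * log2(p);
        }
        return h;
    }

Applied to a histogram of the 7-bit o_i values of level 1, for example, this estimate gives the 2.18 bits per parameter quoted above.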

6.4.5 Mean-removal coding

Mean-removal coding involves finding the mean value of the block, subtracting the mean value from each pixel in the block, and coding the mean residual. For PFC, the main advantage of coding the mean value instead of the offset value o_i is that the blocking artefact is reduced in low-bit-rate coding. Other advantages, as with TSFC, are that predictive coding can be used and that the mean blocks of the topmost pyramid level can be transmitted as a very low bit-rate approximation.

6.5 Simulation Results

The simulations were carried out on an HP 720 workstation, with the programs coded in C. The main image used was the monochrome Lena image, although the coder was shown to be robust with other images from the USC image database. The criteria considered here are compression ratio, objective image fidelity, coding time, and subjective image quality. The subjective image quality is presented through pictures of the reconstructed images and comments on the general appearance of the images and their artefacts.

Table 6 shows a sample of the simulation results for 3-level pyramids; these parameter sets give a generally smooth progressive build-up between the stages. The final images of the first six of these coding results are shown in Figure 19. For the coder in the last row of Table 6, Table 7 displays the bit rate and relative PPSNR of the different pyramid levels. Note that, for the first two levels, the PPSNR is derived from comparing the reconstructed image with the sub-sampled image, not the original image. Higher rate-distortion performance is possible if far fewer bits are used to code the first two levels; the results for this configuration are shown in Table 8. These results have not been included in Table 6 because the first two pyramid levels have very poor quality.

Figure 19: Last-level images from the PFC simulation

Figure 20: Images from the PIT stages of the PFC

Bit rate (bpp)   PPSNR (dB)   Coding time (s)   Subjective quality of final image
…                …            …                 relatively poor quality, strong quantisation noise and blurring, strong fuzziness on edges
…                …            …                 good quality, fuzziness around edges, contouring on background
…                …            …                 good quality, blocking on shoulder, contouring on background, some fuzziness around edges
…                …            …                 good quality, some fuzziness around edges, contouring on background
…                …            …                 good quality, good preservation of edges
…                …            …                 very high quality, few noticeable artefacts
…                …            …                 very high quality

Table 6: Results for PFC


More information

Wireless Communication

Wireless Communication Wireless Communication Systems @CS.NCTU Lecture 6: Image Instructor: Kate Ching-Ju Lin ( 林靖茹 ) Chap. 9 of Fundamentals of Multimedia Some reference from http://media.ee.ntu.edu.tw/courses/dvt/15f/ 1 Outline

More information

CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES

CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES 6.1 INTRODUCTION The exploration of applications of ANN for image classification has yielded satisfactory results. But, the scope for improving

More information

FAST FRACTAL IMAGE COMPRESSION

FAST FRACTAL IMAGE COMPRESSION FAST FRACTAL IMAGE COMPRESSION H.R. Mahadevaswamy New approaches to image compression Thesis. Department of Electronics Engineering, Regional Engineering College, University of Calicut, 2000 Chapter 3

More information

Fingerprint Image Compression

Fingerprint Image Compression Fingerprint Image Compression Ms.Mansi Kambli 1*,Ms.Shalini Bhatia 2 * Student 1*, Professor 2 * Thadomal Shahani Engineering College * 1,2 Abstract Modified Set Partitioning in Hierarchical Tree with

More information

Image Compression Algorithms using Wavelets: a review

Image Compression Algorithms using Wavelets: a review Image Compression Algorithms using Wavelets: a review Sunny Arora Department of Computer Science Engineering Guru PremSukh Memorial college of engineering City, Delhi, India Kavita Rathi Department of

More information

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami to MPEG Prof. Pratikgiri Goswami Electronics & Communication Department, Shree Swami Atmanand Saraswati Institute of Technology, Surat. Outline of Topics 1 2 Coding 3 Video Object Representation Outline

More information

Hybrid image coding based on partial fractal mapping

Hybrid image coding based on partial fractal mapping Signal Processing: Image Communication 15 (2000) 767}779 Hybrid image coding based on partial fractal mapping Zhou Wang, David Zhang*, Yinglin Yu Department of Electrical and Computer Engineering, University

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Genetic Algorithm based Fractal Image Compression

Genetic Algorithm based Fractal Image Compression Vol.3, Issue.2, March-April. 2013 pp-1123-1128 ISSN: 2249-6645 Genetic Algorithm based Fractal Image Compression Mahesh G. Huddar Lecturer, Dept. of CSE,Hirasugar Institute of Technology, Nidasoshi, India

More information

EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM

EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM TENCON 2000 explore2 Page:1/6 11/08/00 EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM S. Areepongsa, N. Kaewkamnerd, Y. F. Syed, and K. R. Rao The University

More information

5.1 Introduction. Shri Mata Vaishno Devi University,(SMVDU), 2009

5.1 Introduction. Shri Mata Vaishno Devi University,(SMVDU), 2009 Chapter 5 Multiple Transform in Image compression Summary Uncompressed multimedia data requires considerable storage capacity and transmission bandwidth. A common characteristic of most images is that

More information

EE 701 ROBOT VISION. Segmentation

EE 701 ROBOT VISION. Segmentation EE 701 ROBOT VISION Regions and Image Segmentation Histogram-based Segmentation Automatic Thresholding K-means Clustering Spatial Coherence Merging and Splitting Graph Theoretic Segmentation Region Growing

More information

In the name of Allah. the compassionate, the merciful

In the name of Allah. the compassionate, the merciful In the name of Allah the compassionate, the merciful Digital Video Systems S. Kasaei Room: CE 315 Department of Computer Engineering Sharif University of Technology E-Mail: skasaei@sharif.edu Webpage:

More information

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi 1. Introduction The choice of a particular transform in a given application depends on the amount of

More information

VIDEO COMPRESSION STANDARDS

VIDEO COMPRESSION STANDARDS VIDEO COMPRESSION STANDARDS Family of standards: the evolution of the coding model state of the art (and implementation technology support): H.261: videoconference x64 (1988) MPEG-1: CD storage (up to

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

CHAPTER-5 WATERMARKING OF COLOR IMAGES

CHAPTER-5 WATERMARKING OF COLOR IMAGES CHAPTER-5 WATERMARKING OF COLOR IMAGES 5.1 INTRODUCTION After satisfactorily developing the watermarking schemes for gray level images, we focused on developing the watermarking schemes for the color images.

More information

Reversible Wavelets for Embedded Image Compression. Sri Rama Prasanna Pavani Electrical and Computer Engineering, CU Boulder

Reversible Wavelets for Embedded Image Compression. Sri Rama Prasanna Pavani Electrical and Computer Engineering, CU Boulder Reversible Wavelets for Embedded Image Compression Sri Rama Prasanna Pavani Electrical and Computer Engineering, CU Boulder pavani@colorado.edu APPM 7400 - Wavelets and Imaging Prof. Gregory Beylkin -

More information

JPEG compression of monochrome 2D-barcode images using DCT coefficient distributions

JPEG compression of monochrome 2D-barcode images using DCT coefficient distributions Edith Cowan University Research Online ECU Publications Pre. JPEG compression of monochrome D-barcode images using DCT coefficient distributions Keng Teong Tan Hong Kong Baptist University Douglas Chai

More information

JPEG decoding using end of block markers to concurrently partition channels on a GPU. Patrick Chieppe (u ) Supervisor: Dr.

JPEG decoding using end of block markers to concurrently partition channels on a GPU. Patrick Chieppe (u ) Supervisor: Dr. JPEG decoding using end of block markers to concurrently partition channels on a GPU Patrick Chieppe (u5333226) Supervisor: Dr. Eric McCreath JPEG Lossy compression Widespread image format Introduction

More information

The Scope of Picture and Video Coding Standardization

The Scope of Picture and Video Coding Standardization H.120 H.261 Video Coding Standards MPEG-1 and MPEG-2/H.262 H.263 MPEG-4 H.264 / MPEG-4 AVC Thomas Wiegand: Digital Image Communication Video Coding Standards 1 The Scope of Picture and Video Coding Standardization

More information

ECE 533 Digital Image Processing- Fall Group Project Embedded Image coding using zero-trees of Wavelet Transform

ECE 533 Digital Image Processing- Fall Group Project Embedded Image coding using zero-trees of Wavelet Transform ECE 533 Digital Image Processing- Fall 2003 Group Project Embedded Image coding using zero-trees of Wavelet Transform Harish Rajagopal Brett Buehl 12/11/03 Contributions Tasks Harish Rajagopal (%) Brett

More information

ISSN (ONLINE): , VOLUME-3, ISSUE-1,

ISSN (ONLINE): , VOLUME-3, ISSUE-1, PERFORMANCE ANALYSIS OF LOSSLESS COMPRESSION TECHNIQUES TO INVESTIGATE THE OPTIMUM IMAGE COMPRESSION TECHNIQUE Dr. S. Swapna Rani Associate Professor, ECE Department M.V.S.R Engineering College, Nadergul,

More information

A Very Low Bit Rate Image Compressor Using Transformed Classified Vector Quantization

A Very Low Bit Rate Image Compressor Using Transformed Classified Vector Quantization Informatica 29 (2005) 335 341 335 A Very Low Bit Rate Image Compressor Using Transformed Classified Vector Quantization Hsien-Wen Tseng Department of Information Management Chaoyang University of Technology

More information

A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD

A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON WITH S.Shanmugaprabha PG Scholar, Dept of Computer Science & Engineering VMKV Engineering College, Salem India N.Malmurugan Director Sri Ranganathar Institute

More information

CMPT 365 Multimedia Systems. Media Compression - Video

CMPT 365 Multimedia Systems. Media Compression - Video CMPT 365 Multimedia Systems Media Compression - Video Spring 2017 Edited from slides by Dr. Jiangchuan Liu CMPT365 Multimedia Systems 1 Introduction What s video? a time-ordered sequence of frames, i.e.,

More information

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Yongying Gao and Hayder Radha Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48823 email:

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions Data In single-program multiple-data (SPMD) parallel programs, global data is partitioned, with a portion of the data assigned to each processing node. Issues relevant to choosing a partitioning strategy

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

A QUAD-TREE DECOMPOSITION APPROACH TO CARTOON IMAGE COMPRESSION. Yi-Chen Tsai, Ming-Sui Lee, Meiyin Shen and C.-C. Jay Kuo

A QUAD-TREE DECOMPOSITION APPROACH TO CARTOON IMAGE COMPRESSION. Yi-Chen Tsai, Ming-Sui Lee, Meiyin Shen and C.-C. Jay Kuo A QUAD-TREE DECOMPOSITION APPROACH TO CARTOON IMAGE COMPRESSION Yi-Chen Tsai, Ming-Sui Lee, Meiyin Shen and C.-C. Jay Kuo Integrated Media Systems Center and Department of Electrical Engineering University

More information

DCT-BASED IMAGE QUALITY ASSESSMENT FOR MOBILE SYSTEM. Jeoong Sung Park and Tokunbo Ogunfunmi

DCT-BASED IMAGE QUALITY ASSESSMENT FOR MOBILE SYSTEM. Jeoong Sung Park and Tokunbo Ogunfunmi DCT-BASED IMAGE QUALITY ASSESSMENT FOR MOBILE SYSTEM Jeoong Sung Park and Tokunbo Ogunfunmi Department of Electrical Engineering Santa Clara University Santa Clara, CA 9553, USA Email: jeoongsung@gmail.com

More information