CID:IQ - A new image quality database. Xinwei Liu


Xinwei Liu. Master's thesis (Masteroppgave), Master of Technology - Media Technology (Master i Teknologi - Medieteknikk), 30 ECTS. Department of Computer Science and Media Technology (Avdeling for informatikk og medieteknikk), Gjøvik University College (Høgskolen i Gjøvik), 2013.

Department of Computer Science and Media Technology (Avdeling for informatikk og medieteknikk)
Gjøvik University College (Høgskolen i Gjøvik)
Box 191 (Postboks 191)
N-2802 Gjøvik
Norway

Xinwei Liu, 2013/11/22


Abstract

The aim of our study is to develop a new image quality database, CID:IQ (Colorlab Image Database: Image Quality), capable of evaluating and benchmarking image quality algorithms. A large number of image quality algorithms, also known as image quality metrics, have been developed over the last two decades, and the number of metrics continues to grow. Existing image quality databases, however, are no longer satisfactory for the evaluation of image quality metrics. The lack of reference image design and assessment principles, the limited range of distortions, and uncontrolled experimental environments are the reasons why a newly developed image quality database is desired. In this thesis we propose methods to select and evaluate the reference images, which has never been considered in existing databases. In addition, more types of distortion, especially color related distortions, which received less attention in previous works, are used in the CID:IQ database. Another innovation in this project is that we apply two viewing distances and control the viewing conditions when conducting the psychophysical experiments. Finally, we employ new approaches to assess our CID:IQ database in comparison with existing ones. The CID:IQ database will be made available to the relevant research fields and can be used free of charge for research and academic purposes.


Acknowledgments

Most importantly, I thank my family, especially my father Bujian and my mother Zhuoping, who have supported and encouraged me to take every chance and make the best of everything. Thank you for the weekly calls and for sharing all your positive energy with me. I could not have made all these thoughts come true without your help.

I would also like to thank my first academic supervisor, Associate Professor Marius Pedersen, for introducing me to this research area and initially presenting me with this interesting master's project on perceptual image quality and psychophysical studies, and moreover for the invaluable and timely support that helped me overcome all obstacles. In addition, I thank my second supervisor, Professor Jon Yngve Hardeberg, who always proposed excellent ideas and showed great foresight. I also want to thank Professor Ivar Farup for the help he gave me with LaTeX compilation. Furthermore, I thank all members of The Norwegian Colour and Visual Computing Laboratory for providing me with all the necessary apparatus and materials for this work. I could not have completed this master's project without all your help.

To one of my incredibly best friends, Gerardo, and all his family members, Tiril, Zach, Ulla, and Kjell: thank you for all your helpfulness and for taking good care of me. After two and a half years together we are like a family. I could not have survived in Norway without all your help.

To everyone who participated in my 3-hour experiment on their personal time: Ping, Xiaolin, Andrew, Kiran, Mariia, Congcong, Guoqiang, Racheal, Deming, Tingru and others, thank you for your kind participation. I could not have completed my experiment without your help.

Last but definitely not least, to the two people who have been with me for more than two years, Annie and Parinaz: you made my life good and wonderful. Thank you for handling the good, the bad, and the ugly me with so much success. You have changed me for the better. You should remember that no matter where you are, you always have me.

Xinwei Liu, 2013/11/22


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
   Motivation
   Aims
   Research methods
2 Background
   Image quality
   Evaluation of image quality
      Subjective image quality evaluation
      Objective image quality evaluation
      Essential considerations in evaluation experiment
   Image quality database
3 Literature survey
   A survey of existing image quality databases
      Gray image quality databases
      Color image quality databases
      Privately owned image quality databases
   Experimental images design
      Number of images
      Type of images
      Standard test images
      Analysis of images used in existing databases
   Survey of distortions
      Distortions in image quality metrics
      Distortions in image quality databases
      Other distortions
      Classification of distortions
4 The CID:IQ database
   Reference images
      Subjective method for reference images evaluation
      Objective method for reference images evaluation
   Distortions
      Compression
      Noise
      Blurring
      Gamut mapping distortions
   Experiment setup
      Viewing conditions
      Psychophysical experiment
      Experiment results
5 Experiment results analysis
   Distortions data analysis
      JPEG2000 compression distortion
      JPEG compression distortion
      Poisson noise distortion
      Gaussian blur distortion
      Gamut mapping distortions
   Images data analysis
   An example of comparing subjective and objective results
6 Conclusions and Further Work
   Contributions
   Conclusions
   Further Work
Bibliography
A Reference image statistics
B ICC profile gamuts
C Frequency tables
D Subjective results
E Image mean opinion scores

List of Figures

1 Example pair comparison experiment
2 LFM values vs. Z-scores
3 95% Confidence Interval
4 Example rank order experiment
5 Example category judgment experiment
6 The quality scales of MOS
7 3 different kinds of image quality metrics
8 The image quality database design workflow
9 MacBeth ColorChecker color rendition chart
10 CIE recommended gamut mapping test ski image
11 Test images in ISO
12 Test images in ISO
13 The eight natural test images in ISO
14 Test images in ISO
15 Kodak Lossless True Color Image Suite
16 Reference images in TID2008 database
17 The synthesized image in TID2013 database
18 Reference images used in LIVE image quality database
19 Reference images in CSIQ database
20 Reference images in IVC database
21 Reference images in VCL database
22 Reference images in A57 database
23 SI vs. CF for color image databases
24 Relative ranges R_i of source characteristics
25 Stages of the image segmentation process
26 Image quality attributes
27 Distortion classification
28 Reference images
29 Results of objective reference images evaluation
30 Comparison of SI vs. CF results between 9 image quality databases
31 Comparison of Busyness results between 9 image quality databases
32 Comparison between the original image and JPEG compressed image
33 Comparison between the original image and JPEG2000 compressed image
34 Comparison between the original image and Poisson noise image
35 Comparison between the original image and Gaussian blurred image
36 CIE1931 color space chromaticity diagram vs. visible sRGB gamut
37 Comparison between the original image and two gamut mapped images
38 Screenshot of ICC3D software
39 Comparison between five ICC profile gamuts
40 Histogram of ICC profile gamut volumes
41 Real experiment setup
42 The quality scales of category judgment experiment
43 Tools used for observer visual test
44 The structure of the training sequence
45 EIZO ColorEdge CG inch monitor
46 Experimental Graphic User Interface
47 The demonstration of image appearance order
48 Z-scores for JPEG2000 distortion
49 Z-scores for JPEG distortion
50 Z-scores for Poisson noise distortion
51 Z-scores for Gaussian blur distortion
52 Z-scores for gamut mapping distortion
53 Image Z-scores and values for Image
54 Gamut for Image
55 Image 12 gamut and profile gamut comparison
56 Objective and subjective results comparison
57 Z-scores vs. SSIM for Image

List of Tables

1 The pair comparison results
2 Summed frequency table
3 Percentage matrix
4 Logistic Function Matrix
5 Final Z-score matrix
6 Mean Z-scores
7 Raw data recorded from a rank order experiment
8 The frequency matrix converted from rank order experiment
9 The category judgment results
10 Category judgment frequency table
11 Category judgment cumulative frequency table
12 Category judgment cumulative percentage table
13 Category judgment LFM table
14 Category judgment Z-score matrix
15 Category judgment difference matrix
16 Category judgment boundary matrix
17 Category judgment scale values
18 Existing public image quality databases
19 Existing preliminary image quality databases
20 Characteristics and typical usage of the natural images from ISO
21 Characteristics and typical usage of the natural images from ISO
22 Characteristics and typical usage of the natural images from ISO
23 Characteristics and typical usage of the natural images from ISO
24 Distortions tested in image quality metrics
25 Distortions used in image quality databases
26 Distortions discovered in other image quality issues
27 Table of features for subjective reference images evaluation
28 Z-score values for JPEG2000 distortion
29 Z-score values for JPEG distortion
30 Z-score values for Poisson noise distortion
31 Z-score values for Gaussian blur distortion
32 Z-score values for gamut mapping distortion
33 Frequency tables for gamut mapping distortions
34 Scale, boundary, and confidence interval values for Image
35 An example of comparing SSIM results and Z-scores
36 Spearman and Pearson Correlations


1 Introduction

The evaluation of digital image quality is a significant part of many image processing applications. Before a final observer, such as a human being, looks at the images, they are processed in many steps, each of which can introduce perceivable distortions that decrease image quality. In order to enhance image quality while reducing distortions, it is important to have an indicator that represents the image quality scale. For this purpose, image quality evaluation algorithms (also called image quality metrics) are commonly used to assess the quality of images. Numerous image quality metrics have been proposed over the years and the number is still increasing, each claiming progress in its particular domain. However, we also need ground truth for the assessment and benchmarking of image quality metrics. Therefore, image quality databases have been developed for this requirement.

1.1 Motivation

There are many existing image quality databases, such as the famous and commonly used TID2008 [1, 2, 3], LIVE [4, 5, 6], and Toyama (MICT) [7]. The TID2008 [1, 2, 3] database contains the largest number of observers, while the Toyama (MICT) [7] database focuses only on JPEG compression distortions. All databases have their own special purposes, but most of them have three main shortcomings that need to be taken into serious consideration in the development process. First, reference images are the kernel part of an image quality database, so they have to be selected very cautiously. More than half of the databases used analog images scanned from negatives as their references; the quality of these images is limited and not comparable to current digital images. Even when databases used high-quality digital images, they lacked an analysis of the reference images demonstrating whether these source contents cover a sufficient range of characteristics such as spatial information and colorfulness.
Second, the types of distortion used in most of the databases are similar: JPEG compression artifacts, Gaussian noise, and blurring appear in most current databases. Although these are the most common distortions, none of the databases focus on color distortions (e.g. color shifts, gamut mapping artifacts, and so on). Next, viewing conditions are an important experimental aspect, particularly the choice of environmental illumination and the viewing distance. Viewing condition information is rarely given attention or described in detail in the existing databases. In conclusion, the development of a new image quality database is required.

1.2 Aims

The current work aims at developing a new image quality database. Before achieving this goal we need to survey all existing databases in order to learn the advantages and disadvantages of these previous works. Along with the survey, we propose both objective and subjective methods to guide the selection and evaluation of the reference images. We then explore common distortions in the image quality field, especially color related

distortion types. Afterwards, the CID:IQ database will be developed with the following new characteristics:
1. Reference images are selected by proposed principles and analyzed by proposed approaches.
2. Distortions cover both the common distortion types and color related types.
3. Subjective experiments take viewing conditions into account based on recommended standards and apply different viewing distances.
4. Raw evaluation data recorded from human observers will be provided.
Finally, a comparison between the new database and the existing databases, together with an analysis of the experiment results, will be made. The CID:IQ database will be available at:

1.3 Research methods

Because the image quality database is used for evaluating the performance of image quality metrics, psychophysical experiments will be carried out with the involvement of human observers. For analyzing the results of these experiments we will use two kinds of methods: a qualitative approach and a quantitative approach. The qualitative approach allows a more detailed and integral characterization, while the quantitative approach provides a numerical analysis of the raw data in order to understand the results. Before conducting the psychophysical experiments and analyzing the perceptual data, it is necessary to have a good overview and understanding of the state-of-the-art knowledge in the literature. This thesis therefore includes both theoretical and practical work, such as literature surveys and experiments.
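As a small illustration of one of the reference-image characteristics discussed in the motivation, colorfulness, the sketch below implements the Hasler-Süsstrunk colorfulness measure. This is an assumed, commonly used formulation chosen for illustration; the thesis does not prescribe this exact formula, and the pixel data are hypothetical.

```python
from math import sqrt

def colorfulness(pixels):
    """Hasler-Susstrunk colorfulness for a list of (R, G, B) tuples (0-255).

    An illustrative sketch of one common colorfulness measure, not the
    thesis' own analysis code.
    """
    rg = [r - g for r, g, b in pixels]                  # red-green opponent
    yb = [0.5 * (r + g) - b for r, g, b in pixels]      # yellow-blue opponent

    def mean(v):
        return sum(v) / len(v)

    def std(v):
        m = mean(v)
        return sqrt(sum((x - m) ** 2 for x in v) / len(v))

    return (sqrt(std(rg) ** 2 + std(yb) ** 2)
            + 0.3 * sqrt(mean(rg) ** 2 + mean(yb) ** 2))

# A gray image has zero colorfulness; saturated primaries score high.
gray = [(128, 128, 128)] * 100
vivid = [(255, 0, 0)] * 50 + [(0, 0, 255)] * 50
print(colorfulness(gray))   # 0.0
print(colorfulness(vivid) > colorfulness(gray))  # True
```

Measures like this make it possible to check whether a set of reference images spans a wide range of a characteristic, rather than judging coverage by eye.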

2 Background

In order to allow the reader to understand and follow the content without any inconvenience, it is important to present all the terminology employed in this study before moving on to the technical parts of this document. In this chapter, all indispensable knowledge and concepts will be introduced.

2.1 Image quality

The study of image quality has a long history. The concept of image quality originated in the field of optics. Optics, as a science and technology, dates back to about 1200 B.C. with the invention of curved mirrors (Hecht and Zajac, 1974). In the twentieth century, the development of television and digital imaging technologies sped up significantly. With the introduction of television and digital imaging, image quality became a more and more everyday concern. Looking back at the history of imaging technology, image quality was not at the top of the list of design criteria during the initial phases of technology development. The first image quality topic to be addressed was the rendering of the tones that comprise the image, then the spatial structure or the image details. Finally, as imaging technology developed, attention focused on the color quality of the image.

The definition of image quality can be found in many publications. Engeldrum [8] defined image quality as the integrated set of perceptions of the overall degree of excellence of the image. Janssen [9] defined image quality as follows: the quality of an image is the adequacy of this image as input to visual perception, and the adequacy of an image as input to visual perception is given by the discriminability and identifiability of the items depicted in the image. Jacobson [10] defined image quality as the subjective impression of the observer of how good the observer thinks the particular image is. Keelan [11] defined image quality as the impression of the overall merit or excellence of an image, as perceived by an observer, involving neither the subjective methods nor the act of photography.

2.2 Evaluation of image quality

Digital imaging and image processing technologies have revolutionized imaging workflows. One of the most important purposes of these technologies is to enhance image quality. As a result, the need for methods that can evaluate image quality increases. Today there are two main categories of approaches to evaluate image quality: objective methods and subjective methods. The most common objective method is the use of an image quality metric, while the subjective approach is an experiment conducted with the involvement of human observers.

Subjective image quality evaluation

Subjective assessment usually relates to psychophysical experiments where observers are requested to rate a group of images based on particular principles. Subjective evaluation methods are very important in this work because we need to evaluate the objective approaches (image quality metrics) using subjective data. This is the main reason why we need an image quality database: if we want to verify whether objective data reflect perceived quality, the only way is to check the correlation between the objective results and the subjective results. A thorough introduction to the different subjective quality evaluation approaches is therefore presented.

Thresholds and just-noticeable differences

Ernst Weber conducted the first psychophysical experiments, from which he discovered that the Just Noticeable Difference (JND) is proportional to the stimulus magnitude; this is now called Weber's law. Gustav Fechner conducted similar experiments over the same period and published Elemente der Psychophysik (Elements of Psychophysics) [12], an investigation of the relationship between subjective experience and stimulus intensity. In general there are three basic approaches for measuring a psychophysical threshold: the method of adjustment, the method of limits, and the method of constant stimuli [8].

Psychophysical scaling

While the psychophysical threshold approaches aim at figuring out color tolerances, compression thresholds, and so on, the target of psychophysical scaling methods is usually to determine a scale of cognition. To this end, the connection between perceptual changes and physical changes must be obtained in scaling approaches. There are three commonly used approaches for carrying out psychophysical scaling experiments: rank order, pair comparison, and category judgment [8].
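Checking the correlation between objective and subjective results, as mentioned above, is typically done with Pearson and Spearman correlation coefficients (both are reported later in this work). The sketch below is a minimal self-contained illustration with hypothetical data, not results from the thesis.

```python
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    """Spearman rank-order correlation: Pearson computed on the ranks.

    No tie handling, for brevity.
    """
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))

# Hypothetical metric outputs vs. subjective Z-scores for five images
metric = [0.92, 0.85, 0.70, 0.60, 0.40]
subjective = [1.2, 0.8, 0.1, -0.5, -1.1]
print(spearman(metric, subjective))  # 1.0 (identical ordering)
```

Pearson measures linear agreement between metric scores and subjective scores, while Spearman only cares whether the two orderings match, which is why it reaches 1.0 here.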
Pair comparison approach

In a typical pair comparison experiment, observers judge image quality from a pair of images: a single observer is asked to decide which image in the pair is better according to a particular principle, for instance which image has the best color reproduction or the smallest difference compared to the original (Figure 1). There are two kinds of pair comparison experiments: forced-choice and non-forced-choice. In forced-choice experiments the observers have to give an answer, while in non-forced-choice experiments they may judge the two images as equal (a tie) without any constraints. Assume that the observers judge p reproductions for each of q reference images; then a total of p(p - 1)q/2 comparisons are made. If each pair additionally needs to be shown twice, for example with the left (top) and right (bottom) positions swapped to counter position bias, the number of comparisons becomes pq(p - 1). To analyze the data gathered from pair comparison experiments more conveniently, we can transform them into interval scale form. Thurstone's Law of Comparative Judgment [14] provides the method for this transformation; the interval scale data indicate the distance between a particular stimulus and the average value of the assessed group. The transformation has to follow the assumptions proposed by Silverstein et al. [15]: Each sample has a single value that can describe its quality. Each observer estimates the quality of this sample with a value from a normal distribution around this actual quality.
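The comparison counts just derived, together with the Z-score analysis worked through in the following example (Eq. 2.1 and Eq. 2.2), can be sketched in code. This is an illustrative reimplementation with hypothetical data, not the thesis' own analysis scripts; the frequency matrix below is made up.

```python
from math import log, sqrt
from statistics import NormalDist

def total_comparisons(p: int, q: int, show_twice: bool = False) -> int:
    """p reproductions per reference and q references give p(p-1)q/2 pairs,
    doubled if every pair is shown twice with positions swapped."""
    pairs = p * (p - 1) * q // 2
    return 2 * pairs if show_twice else pairs

def thurstone_scale(freq, n_obs, c=0.5):
    """Frequency matrix -> LFM (Eq. 2.1) -> Z-scores -> mean Z-scores + 95% CI.

    freq[i][j] = number of times sample j was preferred over sample i.
    """
    n = len(freq)
    inv_cdf = NormalDist().inv_cdf
    lfm = [[0.0] * n for _ in range(n)]
    xs, ys = [], []
    for i in range(n):
        for j in range(n):
            if i != j:
                lfm[i][j] = log((freq[i][j] + c) / (n_obs - freq[i][j] + c))
                p = min(max(freq[i][j] / n_obs, 1e-6), 1 - 1e-6)  # avoid +/-inf
                xs.append(lfm[i][j])
                ys.append(inv_cdf(p))
    # alpha: slope of the regression of inverse-normal percentages on the LFM
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    alpha = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    mean_z = [alpha * sum(row[j] for row in lfm) / n for j in range(n)]
    ci = 1.96 / sqrt(n_obs)  # Eq. 2.2 with sigma = 1
    return mean_z, ci

print(total_comparisons(4, 1))                   # 6
print(total_comparisons(4, 1, show_twice=True))  # 12
freq = [[0, 3, 2, 1],
        [1, 0, 2, 1],
        [2, 2, 0, 1],
        [3, 3, 3, 0]]  # hypothetical summed frequencies, 4 observers
mean_z, ci = thurstone_scale(freq, n_obs=4)
print(ci)  # 0.98
```

A higher mean Z-score means the reproduction was preferred more often; the confidence interval depends only on the number of observers once the Z-scores are on a unit-variance scale.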

Figure 1: Example pair comparison experiments. In the first experiment the observer considers the left reproduction to have the smaller difference to the reference; in the second experiment the observer again chose the left one; and in the third experiment the right one. The observer has to judge all possible combinations of pairs. Figure inspired by Pedersen [13].
Since the summed frequency matrix (Table 2) is not a logical matrix we need to transform it to a Logistic Function Matrix (LFM) (Table 4) by using the equation: f + c LFM = ln( N f + c ) (2.1) where f is the data in Table 2, N represents the number of observer, and c is an arbitrary additive constant which is 0.5 in our study suggested by Bartleson [16]. After we have LFM we need to transform it to z-scores which can be done by applying 5

Table 1: The pair comparison results for four observers with four reproductions.

Table 2: Summed frequency table.

Table 3: Percentage matrix.

Table 4: Logistic Function Matrix.

Figure 2: LFM values vs. Z-scores calculated from the percentage matrix. The slope of this linear regression line is 0.82.

Table 5: Final Z-score matrix.

a scaling coefficient α. α is calculated by taking into account the relationship between the LFM values and the inverse of the standard normal cumulative distribution for the percentage matrix. In practice this can be done in both Matlab and Microsoft Excel using the functions "norminv" and "normsinv" respectively. Because the transformation is a linear regression, α is the slope of the regression line (Figure 2). The α in this case is 0.82, so the final Z-score matrix can be computed by multiplying the LFM matrix by α (Table 5). The mean Z-scores of the four reproductions are calculated simply by averaging the values in each column (Table 6). A higher mean Z-score means the reproduction is preferred more by the observers. The 95% confidence interval (CI) can be calculated as:

CI = 1.96 σ / √N   (2.2)

where σ represents the standard deviation and N is the number of observers. Here the Z-scores are expressed in units of the standard deviation, so σ = 1. Now CI = 1.96 (1/√N), and in this example CI = 1.96 (1/√4) = 0.98. The 95% CI thus equals the mean Z-score ± CI. The plot of the 95% CI is shown in Figure 3. From the figure we can see that reproductions A and C are statistically significantly different from reproduction D at the 95% CI, while there is no

Table 6: Mean Z-scores.

Figure 3: 95% Confidence Interval Z-scores from an example of a pair comparison experiment.

Figure 4: Example of a rank order experiment.

significant difference among A, B, and C.

Rank order approach

Asking observers to rank samples is probably the easiest way to manage an experiment. In the rank order approach the observer is asked to rank the images in order (e.g. from best to worst) based on a particular principle such as image quality (Figure 4).

Example of rank order experiment

The rank results are collected for every reproduction. Using the comparative paired comparison model proposed by Cui [17], the data can be transformed into pair comparison data and analysed in the same way as in the pair comparison approach, giving a so-called rank score. For instance, if a reproduction w has been ranked at position q for p times, the rank score (RS) can be calculated as:

RS = (n - q)p / (N(n - 1))   (2.3)

where n is the number of reproductions, so that n - q represents the weight of w. The

Table 7: Raw data recorded from a rank order experiment.

Table 8: The frequency matrix converted from the rank order experiment.

difference indicator s, which captures the relationship between the rank order and pair comparison experiments, needs to be found before doing further calculations. In the pair comparison method the scale value s can be computed as s_pair:

s_pair = (1/(n - 1)) Σ z(Σu / N)   (2.4)

where z() transforms a proportion of choices into a Z-score and u represents a binary value that records the preference results. Equation 2.5 shows how rank order data can be transformed into a pair comparison matrix:

RS = (1/(N(n - 1))) Σ u   (2.5)

In summary, the scale value s_pair is:

s_pair = z((1/(N(n - 1))) Σ[u]) = z(RS)   (2.6)

Assume that there are 3 images (A, B, and C) and 5 observers participate in the rank order experiment. The raw data are recorded in Table 7. This matrix can be converted to a pair comparison frequency matrix using Equation 2.6 (Table 8). Further calculations are then the same as in the pair comparison approach.

Category judgment approach

The category judgment approach, which is supported by range-frequency theory and its model [18], is another subjective method for assessing image quality. In a category judgment experiment, the observers are asked to judge an image according to a certain rule (e.g. category "Big" and category "Small") and assign the image to one of the categories (Figure 5). Usually 5 or 7 categories are used. The category judgment method is faster than pair comparison in some cases, and fewer comparisons are necessary; because of this advantage, it is the approach most commonly used in experiments with a large number of images. The observers are allowed to assign more than one image to a single category.

Example of category judgment experiment

In this example 4 observers are asked to

Figure 5: Example of a category judgment experiment. All reproductions are shown to an observer; in this example category "1" means the smallest difference from the reference image while category "5" is the biggest difference.

Table 9: The category judgment results for four observers with four reproductions judged twice.

judge 4 reproductions (A, B, C, and D) twice. The raw data are recorded in Table 9, where the values represent the frequency with which a reproduction is judged to belong to a category. We then simply sum all the raw matrices to get the frequency matrix shown in Table 10. Next, a cumulative frequency matrix, containing n x (n - 1) values, and a cumulative percentage matrix are presented in Table 11 and Table 12 respectively. The LFM can be found using the same calculation method as in the pair comparison example (Table 13), and it can likewise be converted into Z-scores (Table 14); the slope of the corresponding linear regression line is computed in the same way. The next step is to calculate the difference matrix (Table 15), by subtracting the Z-score value in column i, row j + 1 from the value in column i + 1, row j + 1. The averages of the columns in the difference matrix can then be used to compute the category boundaries (Table 16): the first boundary value is set to 0, and each following boundary is obtained by adding the i-th average column value of the difference matrix to the current value. The final scale values will be calculated by subtracting the corresponding Z-score values from each

Table 10: Category judgment frequency table.

Table 11: Category judgment cumulative frequency table.

Table 12: Category judgment cumulative percentage table.

Table 13: Category judgment LFM table.

Table 14: Category judgment Z-score matrix.

Table 15: Category judgment difference matrix.

Table 16: Category judgment boundary matrix.

boundary value (Table 17).

Table 17: Category judgment scale values.

Figure 6: The quality scales of MOS (5 Excellent (imperceptible); 4 Good (perceptible but not annoying); 3 Fair (slightly annoying); 2 Poor (annoying); 1 Bad (very annoying)). Inspired by Pedersen [13].

Other method: Mean opinion score

The International Telecommunication Union (ITU) first defined the Mean Opinion Score (MOS) as an evaluation method for audio quality; it has since been extended to assess image quality as well. In a typical experiment the observers are asked to rate the image quality with a score from one to five (similar to category judgment), where one represents the worst quality and five the best. Another way of conducting the experiment is to use five quality levels (excellent, good, fair, poor, and bad) (Figure 6). The MOS can be calculated as:

MOS = (1/n) Σ_{i=1}^{n} S_i   (2.7)

where S_i is the score from observer i and n is the number of observers. A further calculation based on the MOS is the Difference Mean Opinion Score (DMOS). For the DMOS the reference image is also shown to the observer in the experiment, and the final result is found by taking the score of the reference image and subtracting the score of the reproduction. Each reproduction thus has a DMOS which indicates its subjective quality relative to the reference image.

Objective image quality evaluation

Objective image quality assessment measures aim to predict perceived image quality as judged by human beings, who are the end users of most image-based systems. One way of evaluating image quality is to automatically assess the quality of images in agreement with human quality judgments by using a quality assessment metric; this is called objective image quality evaluation. An image quality metric is an objective, algorithm-based mathematical method to compute image quality without human observers.

Figure 7: 3 different kinds of image quality metrics: (a) full-reference, (b) reduced-reference, and (c) no-reference image quality assessment. Image inspired by Pedersen [13].

In general, there are three kinds of image quality metrics: full-reference, reduced-reference, and no-reference metrics. In full-reference image quality assessment methods, the quality of a reproduced image is evaluated by comparing it with the original image, which is assumed to have perfect quality (Figure 7(a)). In reduced-reference image quality assessment methods, only limited information (e.g. a histogram or sub-signal) from the reproduced and original images is available for evaluating the quality of the test images (Figure 7(b)). No-reference metrics try to assess the quality of an image without any reference to the original one (Figure 7(c)). In the past two decades, numerous quality assessment algorithms have been developed. A survey of image quality metrics has been given by Pedersen et al. [19], in which about 134 image quality metrics are briefly introduced. These metrics were developed for different purposes: Human Visual System model based metrics; image structure based metrics; image statistics based metrics; machine learning based metrics; and metrics for blurriness, for sharpness, for compression artifacts, for gamut mapping artifacts, and so on.

Essential considerations in evaluation experiment

A few essential considerations need to be taken into account when preparing the evaluation experiment, such as observer type and number, observer task instruction, and viewing conditions.
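Before turning to the experimental considerations, the two evaluation routes just described, subjective scores (MOS/DMOS, Eq. 2.7) and an objective full-reference metric, can be sketched together. PSNR is used purely as a generic stand-in for a full-reference metric, not as a metric prescribed by this work; the ratings and pixel values are hypothetical.

```python
import math

def mean_opinion_score(scores):
    """MOS (Eq. 2.7): the average of the observers' 1-5 ratings."""
    return sum(scores) / len(scores)

def dmos(reference_scores, reproduction_scores):
    """DMOS: mean per-observer difference between the reference image's
    score and the reproduction's score."""
    return mean_opinion_score([r - t for r, t in
                               zip(reference_scores, reproduction_scores)])

def psnr(original, reproduction, max_value=255.0):
    """A minimal full-reference metric (PSNR) comparing a reproduction
    against the original; an illustrative stand-in metric only."""
    mse = (sum((o - r) ** 2 for o, r in zip(original, reproduction))
           / len(original))
    if mse == 0:
        return float('inf')  # identical images: no measurable distortion
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical observer ratings and pixel values
print(mean_opinion_score([5, 4, 4, 3, 4]))     # 4.0
print(dmos([5, 5, 4, 5, 4], [3, 4, 2, 4, 3]))  # 1.4
print(round(psnr([52, 55, 61, 59], [54, 55, 60, 59]), 2))
```

Benchmarking a metric against a database then amounts to correlating values like the PSNR output with subjective values like the MOS/DMOS across many distorted images.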

Observer type and number

In general, there are two types of observers: expert and non-expert (also known as naive). Observers who have experience in assessing image quality usually fall into the expert category. It is commonly acknowledged that experts see things differently, or give different scale values, than non-expert observers. Experts may learn to make very fine distinctions on the criteria they have experience with. In addition, experts can distinguish among categories of specific criteria, because they have greater power to resolve the criterion scale than non-expert observers. There is always a trade-off in the choice between expert and non-expert observers; it is difficult to say which type is preferable, as it depends on the task of the experiment.

When planning a subjective evaluation experiment, there is usually a common but difficult question that needs to be answered: "How many observers do we need?". There is no unified answer to this question, but there are some practical guidelines. The basic benefit of having a large number of observers is the increased precision of the result: the more observers, the smaller the errors in the result, depending on the details of the statistics of the outcome. The general principle is that precision increases as the square root of the number of observers. In practice, the number of observers to use is often determined by availability. Engeldrum [8] noted that experiments in the imaging arena are conducted with as few as 4 observers and with as many as 50; a suggested range for a normal subjective experiment is 10 to 30. The CIE [20] proposed having at least 15 observers in a typical gamut-mapped image evaluation experiment. There is also a trade-off between the number of observers and the number of images: because of the time consumption, it is better to have more observers than more images.
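The square-root rule can be illustrated numerically: the standard error of a mean score falls as 1/√n, so quadrupling the observer panel roughly halves the uncertainty. A small sketch, assuming an arbitrary per-observer score standard deviation of 1.0:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean score over n independent observers."""
    return sigma / math.sqrt(n)

sigma = 1.0  # assumed per-observer standard deviation (arbitrary choice)
for n in (4, 10, 30, 50):
    print(n, round(standard_error(sigma, n), 3))
# Going from 10 to 30 observers shrinks the error only by a factor of sqrt(3) ≈ 1.7,
# which is why adding observers beyond ~30 yields diminishing returns.
```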
Observer task instruction

The observer task instruction is a very important element that controls the context of the experiment. In order to get useful and meaningful results, observers need to be told what they are expected to do: for example, which attribute they are going to assess and what their evaluation task is; whether there is an explicit or implicit context to the task; and which criteria or definitions they should use in their assessment. These aspects are key to any successful experiment.

Viewing conditions

The viewing conditions are among the core factors that determine the accuracy, precision, and efficiency of an experiment.

Experimental environment

If the experimental environment is not controlled carefully, it will have adverse effects on the experimental results. The most important aspects to take into account when managing the environment are psychological and physical comfort, noise, and surround. If the observers have a psychologically and physically comfortable environment, no noise while they are doing the experiment, and an appropriate place in which to conduct it, the results will be considerably more reliable than those from an uncontrolled experimental environment.

Viewing illumination

It is necessary to control the illumination, since it influences the perceived quality. Illuminance is measured in lumens/m², or lux.

For a Liquid Crystal Display (LCD), the light coming from the object or surface is the luminance, measured in candelas/m². The illuminance of the display affects the perception of attributes such as colorfulness and graininess; increasing the illuminance will, for example, also increase colorfulness, and it is therefore important to have controlled illumination. Another consideration is the Correlated Color Temperature (CCT) of the source spectral power distribution. All color-based properties are affected by the spectral power distribution of the illuminating source, simply because the color appearance changes according to the spectral quality of the source. The CIE [20] has standardized many light sources, such as D5000 or D6500 (D65 for short); the D stands for daylight and the four-digit number indicates the CCT. The geometry of the sample illumination has often been neglected in softcopy image quality experiments. The recommended viewing practice is to illuminate the sample at 45 degrees from the normal to the surface and to view the sample along the normal to the surface [8].

Viewing distance

Generally speaking, image quality attributes associated with spatial image structure, such as sharpness, graininess, and raggedness, will have scale values that vary with the viewing distance (also known as visual angle) at which the observer views the samples [8]. When the viewing distance is small, graininess and raggedness are more visible than at larger distances, and as the viewing distance increases, the sharpness values decrease. This suggests that the viewing distance may need to be controlled during the observers' judgments, although this depends on the scaling objective and the level of scale precision required. On the other hand, distortions of small magnitude are invisible when the viewing distance is large. For example, in an image compression workflow the imperceptible information in an image can be compressed more heavily, or even completely removed.
The loss of imperceptible information is difficult for observers to recognize when they view the image at a given distance; however, it is easy for image quality metrics to identify this degradation if the metrics do not take the Human Visual System (HVS) into account. In summary, it is important to keep the observers' viewing distance constant within a single experimental session, while using different viewing distances in different sessions for the same observer can also be beneficial.

2.3 Image quality database

Ground truth is one of the most important and useful components for evaluating and benchmarking image quality metrics (algorithms). An image quality database offers this ground-truth information in the image quality field. The workflow for designing an image quality database is shown in Figure 8. A typical image quality database has the following characteristics:

- An image quality database contains a set of reference images and distorted images reproduced from the reference images.

- Reference images should reflect natural scenes, but synthesized images for special purposes can also be considered as reference images.

- All distorted images are produced from the reference images by applying one or more distortions at several degradation levels.

Figure 8: The image quality database design workflow: reference images are selected; distortions of different types and levels produce the distorted images; subjective image quality evaluation experiments are conducted with human observers; the outcome is the set of subjective experiment results. Inspired by Ponomarenko [2].

- For each distortion process, the distortion levels are selected to represent a wide range of visual quality: from high quality (distortions are not visible) to bad quality (distortions are annoying).

- A psychophysical scaling method has to be chosen for conducting the subjective experiments.

- The human observers in the experiments have to pass a visual acuity and color deficiency check.

The subjective image quality evaluation data collected from the psychophysical experiments is the ground truth used to evaluate image quality metrics. To compare the objective results from an image quality metric with the subjective results from an image quality database, we calculate the correlation between the objective and subjective results. If the correlation is high, the metric agrees closely with human perception; the higher the correlation with human perception, the better the performance and realism of the metric.
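The comparison between metric outputs and database ground truth usually boils down to a correlation coefficient. A minimal sketch using Pearson's linear correlation (the metric scores and MOS values below are made-up illustrative numbers; Spearman rank correlation is another common choice):

```python
import math

def pearson(x, y):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# One objective metric score and one subjective MOS per distorted image
# (hypothetical values; a real evaluation would use the full database).
metric_scores = [0.95, 0.88, 0.75, 0.60, 0.41]
mos_values    = [4.8, 4.1, 3.5, 2.6, 1.9]
print(round(pearson(metric_scores, mos_values), 2))
```

A value near 1 means the metric ranks the images much as the observers did; in practice a nonlinear mapping between metric scores and MOS is often fitted before computing the linear correlation.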

3 Literature survey

3.1 A survey of existing image quality databases

In this section we survey and summarize the existing image quality databases that are available to the public. At present, more than 25 image quality databases are available to the research community. Since image quality databases are based on different types of images and distortions, it is difficult to draw sharp boundaries between the wide variety of publicly available ones. To classify these databases, we divide them into two groups according to the image types they contain: color image databases and gray image databases. Within each group we further divide the databases into two categories, according to the distortion types and the particular image categories involved: special purpose databases (which only include specific distortions or images) and general databases (which include different types of distortions or images). In the rest of this section we introduce the existing public image quality databases based on these groups and categories. On the other hand, some image quality databases are not available to the public, because they are often used for narrow focuses and specific purposes; copyright issues are another reason why they cannot be published. Several privately owned image quality database examples are presented at the end of this section. The details of the summarized public image quality databases are enumerated in Table 18, and the privately owned database examples are listed in a separate table.

Gray image quality databases

There are six gray image quality databases: five of them are special purpose databases, and only one is a general database.

Special purpose databases

IRCCyN/IVC Watermarking Databases.
Four watermarking image databases (WID) were proposed by the Institut de Recherche en Communications et Cybernétique de Nantes (IRCCyN). The images in these databases were watermarked using four separate approaches: Enrico, Broken Arrows (BA), Fourier Subband (FSB), and Meerwald (MW). Because all four databases are based on watermarked images, we classify them as special purpose.

1. IRCCyN/IVC Watermarking Enrico Database [21]. This image quality database was developed in 2007. There are five reference images in this database, and 10 different distortion types with two distortion levels each were applied to them. In total, 100 distorted images, in BMP format at pixels resolution, were shown to 16 observers. The Double Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made at a viewing distance of six times the screen height.

2. IRCCyN/IVC Watermarking Broken Arrows Database [22]. This image quality database was developed in 2009. There are 10 reference images in this database, and two different distortion types with six distortion levels each were applied to them. In total, 120 distorted images, in PGM format at pixels resolution, were shown to 17 observers. The Double Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made at a viewing distance of six times the screen height.

3. IRCCyN/IVC Watermarking Fourier Subband Database [23, 24]. This image quality database was developed in 2009. There are five reference images in this database, and six different distortion types with seven distortion levels each were applied to them. In total, 210 distorted images, in BMP format at pixels resolution, were shown to seven observers. The Double Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. No viewing conditions are mentioned in [23, 24].

4. IRCCyN/IVC Watermarking Meerwald Database [25]. This image quality database was developed in 2009. There are 12 reference images in this database, and two different distortion types with five distortion levels each were applied to them. In total, 120 distorted images, in BMP format at pixels resolution, were shown to 14 observers. The Double Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made at a viewing distance of six times the screen height.

Another special purpose database that contains only grayscale images is the Wireless Imaging Quality (WIQ) Database.
To address the problem of quality assessment for image communication, the WIQ database was created using a simulation model of a wireless link. Since it focuses on wireless imaging systems, it is a special purpose database.

5. Wireless Imaging Quality (WIQ) Database [26, 27]. This image quality database was developed in 2009. There are seven reference images in this database, and only one distortion type was applied to them. In total, 80 distorted images, in BMP format at pixels resolution, were shown to 30 observers. The Double Stimulus Impairment Scale with the category judgement protocol (Bad, Poor, Fair, Good, and Excellent) was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made in a dark room with a pre-calibrated monitor, at a viewing distance of four times the image height.

General database

The A57 image quality database was created for the preliminary evaluation of the image quality metric Visual Signal-to-Noise Ratio (VSNR) [28]. It is the only grayscale image quality database that contains multiple distortion types.

6. A57 Image Quality Database [28, 29]. This image quality database was developed in 2007. There are three reference images in this database, and six different distortion types with three distortion levels each were applied to them. In total, 54 distorted images, in BMP format at pixels resolution, were shown to seven observers. A continuous rating system, in which each original image was tested against its distorted versions, was used to conduct the psychophysical experiment, and Z-scores are used to validate objective quality prediction models. No viewing conditions are mentioned in [28, 29].

Color image quality databases

Our survey found 19 color image quality databases. Looking at their details, six of them can be classified into the general databases group, and the rest are special purpose databases.

Special purpose databases

7. Toyama Image Quality (MICT) Database [7]. The MICT database was developed in 2008 by the University of Toyama, Japan. It includes 14 reference images and two kinds of distortion, each at six separate strengths. 16 observers took part in the experiment, and a total of 168 distorted images were shown to them. All images in the database are in BMP format at pixels resolution. The Single Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made in a room with low illumination, at a viewing distance of four times the image height. Because it only has JPEG and JPEG2000 compression distortions, it is one of the special purpose databases.

8. IRCCyN/IVC scores on the MICT Database [30, 31].
The Institut de Recherche en Communications et Cybernétique de Nantes (IRCCyN) repeated the MICT experiment in the IRCCyN laboratory in France, but replaced the CRT display with a liquid crystal display (LCD) in order to check whether the display is of central importance in such an experiment. It also uses a different psychophysical scaling and 11 more observers.

9. IRCCyN/IVC DIBR Image Quality Database [32]. The DIBR database was created in 2011 by the Institut de Recherche en Communications et Cybernétique de Nantes (IRCCyN). This database includes three reference images in BMP format at pixels resolution. What distinguishes the DIBR database is that the reference images were captured with three different cameras at three special camera spacings. A total of 96 images were extracted from three different multiview-plus-depth sequences, and 12 separate DIBR-based algorithms were applied to each sequence. The Single Stimulus Impairment Scale with Absolute Categorical Rating (ACR) and paired comparisons was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made in an ITU-conforming test environment, and the display was calibrated according to the ITU-R BT.500 standard.

10. IRCCyN/IVC 3D Image Quality Database [33, 34]. This database was developed in 2008 by the Institut de Recherche en Communications et Cybernétique de Nantes (IRCCyN). There are six reference images in this database, all of them 3D images. All tested images are in BMP format at pixels resolution. 15 different types

of distortions, each at one strength, were applied to the reference images. A total of 90 distorted images were shown to the observers. Difference of Mean Opinion Scores (DMOS) are used to validate objective quality prediction models. No viewing conditions are described in [33, 34].

11. IRCCyN/IVC Art Image Quality Database [35, 36]. This database was developed in 2009 by the Institut de Recherche en Communications et Cybernétique de Nantes (IRCCyN). There are eight reference images in this database, all of them art images. Three different types of distortion with five distortion strengths each were applied to the reference images, and a total of 120 distorted images were shown to 20 observers. All tested images are in PPM or JPG format at pixels resolution. The Double Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. The viewing distance is six times the screen height.

12. Visual Attention Image Quality (VAIQ) Database [37]. This database was developed in 2009 by the University of Western Sydney, Australia. It was created to facilitate the incorporation of Visual Attention (VA) into objective metric design, and because most image quality metrics fail to consider the stronger attention paid to salient regions. It provides ground-truth visual gaze patterns for 42 reference images, obtained through an eye-tracking experiment. All images are in BMP format but have no fixed resolution. Subjective evaluations were made in a laboratory with low light conditions.

The TU Delft Perceived Ringing (TUD1 and TUD2) Databases [38] were created at the Delft University of Technology, Netherlands.
The subjective data on perceived ringing were collected with the aim of better understanding where human beings perceive ringing artifacts in compressed images, and of developing a No-Reference (NR) metric to predict perceived ringing annoyance in such compressed images. The data result from two perception experiments: the so-called ringing region experiment (TUD1), and the so-called ringing annoyance experiment (TUD2).

13. TUD1 Database. This database was developed in 2010. For the ringing region experiment, eight reference images were JPEG compressed at two strengths, yielding a database of 16 stimuli. All the images are in BMP format but have no fixed resolution. 12 observers were requested to mark any region in each stimulus where ringing was perceived, independent of its annoyance. The results were transformed into a subjective ringing region (SRR) map, indicating where in an image people, on average, see ringing.

14. TUD2 Database. This database was developed in 2010 as well. For the ringing annoyance experiment, 11 reference images were JPEG compressed at four strengths, yielding a test database of 55 stimuli (including the originals). All the images are in BMP format but have no fixed resolution. 20 observers scored the annoyance of the ringing artifacts with a single-stimulus scoring method, and a Mean Opinion Score (MOS) was obtained for each stimulus.

15. MMSPG JPEG XR Image Compression Database [39, 40]. This database was developed in 2009 by École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. As the name implies, it only has one distortion type: JPEG XR compression. 10 reference images are included in this database, with six different distortion levels, so in total 60 distorted images were shown to a number of observers. All the images are in BMP format at pixels resolution. The Single Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made in a laboratory with a pre-calibrated monitor and a controlled lighting lamp; the viewing distance is about equal to the height of the screen. All these conditions are defined by the AIC JPEG XR ad-hoc group [41].

16. MMSPG 3D Image Quality Database [42, 43]. The 2D image quality assessment problem has been investigated for many years and has many mature applications, but 3D images require new image quality metrics that take into account the fundamental differences in human visual perception and the typical distortions of stereoscopic content. École Polytechnique Fédérale de Lausanne (EPFL) therefore created a 3D image quality database in 2010. There are nine reference images, and for each image six different stimuli corresponding to different camera distances have been considered, so in total 54 images, all in JPG format at pixels resolution, were shown to 20 observers. The Single Stimulus Impairment Scale with the category judgement protocol (Bad, Poor, Fair, Good, and Excellent) was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made in a laboratory with a pre-calibrated monitor and a controlled lighting lamp; the viewing distance is about equal to the height of the screen.

Liu et al. [44] proposed a no-reference image quality metric that quantifies perceived image quality induced by blur.
They also developed two image quality databases to validate the performance of the blur metric and its robustness against different image content. One of the databases only includes highly textured natural images (HTI), and the other only contains images with an intentionally blurred background (IBBI).

17. HTI Image Quality Database. This database was developed in 2010 by Liu et al. [44]. It contains 12 reference images, each blurred at five different levels, so in total 60 distorted images were shown to 18 observers. All these images are at pixels resolution. The Single Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made in a standard office environment, and the viewing distance was approximately 70 cm.

18. IBBI Image Quality Database. This database was developed in 2010 by Liu et al. [44] as well. It contains 12 reference images, each blurred at five different levels, so in total 60 distorted images were shown to 18 observers. All these images are at pixels resolution. The Single Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made in a

darkened room at a distance of approximately 60 cm.

It is important to note that almost all current full-reference image quality assessment methods were specifically designed for, and tested on, degraded images. However, in many image processing applications the target image is actually enhanced from the original image and thus has better visual quality. It is generally impossible for current image quality assessment algorithms to indicate whether the target image has better or worse quality than the original image. With this aspect in mind, the DRIQ (Digitally Retouched Image Quality) database [45] was developed. It is a full-reference enhanced-image database released by the Computational Perception and Image Quality Lab (CPIQ). To the best of our knowledge, it is currently the only such image database available.

19. Digitally Retouched Image Quality (DRIQ) Database. This database was created in 2012. It contains 26 reference images, each enhanced at three different levels, so in total 78 enhanced images were shown to nine observers. All these images are in PNG format at pixels resolution. Double Stimulus Impairment with a continuous-rating, linear displacement system was used to conduct the psychophysical experiment, and Difference of Mean Opinion Scores (DMOS) are used to validate objective quality prediction models. Subjective evaluations were made in a dark room with a pre-calibrated monitor, at a viewing distance of about 60 cm.

General databases

20. Tampere Image Database 2008 (TID2008) [1, 2, 3]. TID2008 is intended for the evaluation of full-reference image visual quality assessment metrics and was developed at the Tampere University of Technology, Finland. TID2008 contains 25 reference images and 1700 distorted images (25 reference images × 17 types of distortions × 4 levels of distortion). The reference images were obtained by cropping from the Kodak Lossless True Color Image Suite [46]. All images are in BMP format at pixels resolution.
Software developed on the basis of the Swiss system was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. The MOS was obtained from the results of 838 experiments carried out by observers from three countries: Finland, Italy, and Ukraine (251 experiments were carried out in Finland, 150 in Italy, and 437 in Ukraine). In total, the 838 observers performed comparisons of the visual quality of distorted images or evaluations of relative visual quality in image pairs. Subjective evaluations were made in a standard PC laboratory or via the internet.

21. Tampere Image Database 2013 (TID2013) [47, 48]. TID2013 is an extended version of TID2008, released in 2013. TID2013 contains 25 reference images and 3000 distorted images (25 reference images × 24 types of distortions × 5 levels of distortion). The reference images were obtained by cropping from the Kodak Lossless True Color Image Suite [46]. All images are in BMP format at pixels resolution. Software developed on the basis of the Swiss system was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. The MOS was obtained from the results of 971 experiments carried

out by observers from five countries: Finland, France, Italy, Ukraine, and the USA (116 experiments were carried out in Finland, 72 in France, 80 in Italy, 602 in Ukraine, and 101 in the USA). In total, the 971 observers performed comparisons of the visual quality of distorted images or evaluations of relative visual quality in image pairs. Subjective evaluations were made in a standard PC laboratory or via the internet.

22. LIVE Image Quality Database (Release 2) [4, 5, 6]. The LIVE database (Release 2) was released in 2006 by the Laboratory for Image & Video Engineering (LIVE). It contains 29 reference images with five different types of distortion, but the distortion strengths are not specified. In total, 808 images were shown to 29 observers. All images are in BMP format at pixels resolution. The Single Stimulus Impairment Scale with the category judgment protocol (Bad, Poor, Fair, Good, and Excellent) was used to conduct the psychophysical experiment, and Difference of Mean Opinion Scores (DMOS) are used to validate objective quality prediction models. Subjective evaluations were made with the same equipment and viewing conditions.

23. Categorical Subjective Image Quality (CSIQ) Database [49, 50]. The CSIQ database was released by the Computational Perception and Image Quality Lab in 2010. It consists of 30 original images, each distorted using six different types of distortion at four to five different levels. CSIQ images are subjectively rated based on a linear displacement of the images across four calibrated LCD monitors placed side by side, with equal viewing distance to the observer. The database contains 5000 subjective ratings from 35 different observers, and the ratings are reported in the form of Difference of Mean Opinion Scores (DMOS). Subjective evaluations were made on a pre-calibrated display at a viewing distance of 70 cm.

24. IRCCyN/IVC Image Quality Database (IVC) [51].
The IVC database was proposed by the Institut de Recherche en Communications et Cybernétique de Nantes (IRCCyN) in 2005. It contains 10 reference images with four different types of distortion at five distortion strengths. In total, 190 images were shown to 15 observers. All images are in BMP format at pixels resolution. The Double Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. No viewing condition information is given in [51].

25. VCL@FER image quality assessment database [52, 53]. The VCL@FER database was proposed by the Video Communications Laboratory at the University of Zagreb, Croatia, in 2011. It contains 23 reference images with four different types of distortion at six distortion strengths. In total, 552 images were shown to 118 observers. The Single Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Difference of Mean Opinion Scores (DMOS) are used to validate objective quality prediction models. Subjective evaluations were made in a room without natural light, under electric illumination, on a pre-calibrated monitor.

Privately owned image quality databases

Five privately owned image quality databases were surveyed, all of them color image quality databases. They have been proposed for different purposes, such as evaluating metrics that focus on image compression or gamut mapping. A brief introduction to these databases is given below.

26. Image Quality Database proposed by Le Callet et al. [54]. This database was used for evaluating a metric aimed at image coding assessment in [54]. There are 10 reference images in this database. Three different types of distortion with five distortion levels each were applied to the reference images, so in total 150 distorted images were shown to 14 observers. The Double Stimulus Impairment Scale was used to conduct the psychophysical experiment, and Mean Opinion Scores (MOS) are used to validate objective quality prediction models. Subjective evaluations were made under normalized conditions at a viewing distance of six times the screen height.

27. Image Quality Database proposed by Pedersen et al. [55]. This database was used for evaluating metrics that can assess image printing quality in [55]. There are 24 reference images in this database, and each was printed with three different rendering intents, so in total 72 reproductions were shown to 15 observers. The rank order protocol was used to conduct the psychophysical experiment, and Z-scores are used to validate objective quality prediction models. Subjective evaluations were made in a color laboratory under fixed viewing conditions, at a viewing distance of about 60 cm.

28. Image Quality Database proposed by Pedersen et al. [56]. This database was used for evaluating metrics able to evaluate color printing quality in [56]. There are 10 reference images in this database, and each was printed with four different rendering models, so in total 40 reproductions were shown to 10 observers. The rank order protocol was used to conduct the psychophysical experiment, and Z-scores are used to validate objective quality prediction models. Subjective evaluations were made in a color laboratory under fixed viewing conditions, at a viewing distance of about 60 cm.

29.
Image Quality Database proposed by Simone et al. [57]. This database is used for evaluating Gaussian-based metrics in relation to perceived compression artifacts in [57]. There are 10 reference images in this database. Four different distortion types with two distortion strengths were applied to the references, so in total 80 reproductions were shown to 10 observers. A category judgement protocol was used to conduct the psychophysical experiment, and Z-scores are used to validate objective quality prediction models. Subjective evaluations were made in a color laboratory under fixed viewing conditions, at a viewing distance of about 70 cm.

30. Image Quality Database proposed by Cao et al. [58]. This database is used for evaluating saliency-model-based metrics that can detect gamut-mapping artifacts in [58]. There are 19 reference images in this database. Each reference image was gamut mapped, and 12 observers were asked to mark the exact regions where they considered the artifacts to occur. Subjective evaluations were made in a color laboratory under fixed viewing conditions.

In this Section, 30 different existing image quality databases were surveyed. We introduced them from 11 separate aspects: release year, database type (color or gray), number of reference images, number of distortion types, number of distortion levels, number of distorted images, number of observers, image format, image resolution, experimental-data validation method, and viewing conditions. Having examined all the databases in detail, we can analyze their advantages and disadvantages and identify their shortcomings, which helps us determine what is necessary when designing a new image quality database.

Name | Year | Type | Format | Artif | Mult Dis | Validation | Viewing conditions
TID2013 | - | Color | BMP | Yes | No | MOS | PC lab, via internet
TID2008 | - | Color | BMP | Yes | No | MOS | PC lab, via internet
LIVE (Release2) | 2006 | Color | BMP | No | No | DMOS | Same equipment and viewing conditions
Toyama (MICT) | 2008 | Color | BMP | No | No | MOS | Distance: 4 image heights, low illumination
CPIQ/CSIQ | 2010 | Color | PNG | No | No | DMOS | Distance: 70 cm, calibrated display
CPIQ/DRIQ | 2012 | Color | PNG | No | No | DMOS | Distance: 60 cm, calibrated display, dark room
IRCCyN/IVC | 2005 | Color | BMP | No | No | MOS | -
IRCCyN/DIBR | 2011 | Color | BMP | No | No | MOS | ITU standard
IRCCyN/MICT | 2008 | Color | BMP | No | No | MOS | Distance: 4 image heights, low illumination
WM/Enrico | 2007 | Gray | BMP | No | No | MOS | Distance: 6 screen heights
WM/BA | 2009 | Gray | PGM | No | No | MOS | Distance: 6 screen heights
WM/FSB | 2009 | Gray | BMP | No | No | MOS | -
WM/MW | 2009 | Gray | BMP | No | No | MOS | Distance: 6 screen heights
IRCCyN/3D | 2008 | Color | BMP | No | No | DMOS | -
IRCCyN/ART | 2009 | Color | PPM+JPG | No | No | MOS | Distance: 6 screen heights
VCL@FER | 2011 | Color | - | No | No | DMOS | Room with electric light, calibrated monitor
VAIQ | 2009 | Color | BMP | No | No | Gaze points | Low illumination, distance: 60 cm
TUD | - | Color | BMP | No | No | MOS | -
TUD | - | Color | BMP | No | No | MOS | -
JPEGXR | 2009 | Color | BMP | No | No | MOS | Calibrated monitor, distance = screen height
HTI | 2011 | Color | - | No | No | MOS | Standard office, distance: 70 cm
IBBI | 2011 | Color | - | No | No | MOS | Dark room, distance: 60 cm
MMSP 3D | 2010 | Color | JPG | No | No | MOS | Calibrated monitor, distance = screen height
A57 | - | Gray | BMP | No | No | Z-score | -
WIQ | 2009 | Gray | BMP | No | Yes | MOS | Dark room, distance: 4 image heights
Notes: Artif: whether the database contains artificial images; Mult Dis: whether multiple distortions are used; -: not specified.
Table 18: Existing public image quality databases.

Name | Year | Type | Artif | Mult Dis | Validation | Viewing conditions
Callet et al. [54] | 2003 | Color | No | No | MOS | Normalized conditions, distance: 6 screen heights
Pedersen et al. [55] | 2010 | Color | No | No | Z-score | Fixed conditions, distance: 60 cm
Pedersen et al. [56] | 2011 | Color | No | No | Z-score | Fixed conditions, distance: 60 cm
Simone et al. [57] | 2010 | Color | No | No | Z-score | Fixed conditions, distance: 70 cm
Cao et al. [58] | 2010 | Color | No | No | Z-score | Fixed viewing conditions
Notes: Artif: whether the database contains artificial images; Mult Dis: whether multiple distortions are used; -: not specified.
Table 19: Existing preliminary image quality databases.

3.2 Experimental images design

Experimental images are the core of an image quality database, so the selection or design of experimental images is a significant issue that must be taken into account. Images may be used for different purposes and therefore may have different structures. In this Section we discuss experimental image design from the following aspects: number of images; type of images; standard test images; analysis of images used in existing databases; and the principles of our image design.

Number of image

Keelan et al. [59] proposed that at least three images have to be used in an experiment so that relative quality values in Just Noticeable Differences (JND) can be obtained. They also state that at least six images have to be used for the Standard Quality Scale (SQS) so that absolute quality values can be obtained. ISO [60] suggests that three or more images should be used in the experiment, and that six or more is preferable. CIE [20] strongly suggests using a minimum of one of the obligatory test images designated by the CIE, together with at least three further images, for the assessment of gamut mapping algorithms. Field [61] recommends that the number of test images should be between five and ten so that the full range of color and image quality factors can be assessed.
On the one hand, these recommendations can help us design the experimental images; on the other hand, the number of images also depends on other aspects, for example the number of observers, the psychophysical protocol, and the expected accuracy of the results.

Type of image

Typically there are two kinds of experimental images in the field of image quality assessment: pictorial images and research images [61]. Pictorial images are usually the best choice for many evaluation experiments, for instance tone reproduction, saturation compression, sharpness, graininess, and overall picture quality assessment. Since observers can easily assess pictorial images, they are the most widely used experimental images. However, pictorial images must be chosen very carefully, because the experimental results depend strongly on these test images. The main disadvantage of using pictorial images is that they make the measurements difficult to quantify consistently [61]. Several features of experimental pictorial images should be considered in image selection [61]:

Low, medium, and high level distribution of tones
Low, medium, and high levels of saturation for all important hues
Subtle tones

Figure 9: MacBeth ColorChecker color rendition chart.

Color transitions
Fine details
Sharp objects
Large smooth areas
Neutral grays
and so on

Field [61] classified pictorial images into two categories: consumer images and professional images. Consumer images should contain subjects such as:

Portraits of different ethnicities
Landscapes in different geographical locations and illuminations
Pets
Possessions (boats, buildings)
Seasonal celebrations
and so on

and professional images should contain subjects such as:

Portraits of different ethnicities
Vehicles
Food products
Dresses
Industrial and office equipment
Travel-inspired landscapes and scenes
Real estate
Sporting goods
and so on

Figure 10: Ski test image recommended by the CIE guidelines for the evaluation of gamut mapping algorithms.

Research images are images specially designed for a specific issue. Images of this kind are artificial, unreal images, such as the famous research image "Macbeth ColorChecker Color Rendition Chart" [62] (Figure 9). The Macbeth ColorChecker Color Rendition Chart is well known because it has been widely used in color reproduction work. It has two advantages compared to pictorial images: it is content-free, and it contains areas that can be read by measurement instruments. Many guidelines pertaining to the features of experimental images are stated in [20] and [61]; we refer the reader to them for more detailed information.

Standard test images

CIE gamut mapping test image

The CIE Guidelines for the evaluation of gamut mapping algorithms [20] recommend that experimenters choose test images based on criteria derived from their studies in gamut mapping experiments. Test image sets must contain the one obligatory test image specified by these criteria and a minimum of three other test images selected by the experimenter. The guidelines [20] make it obligatory to use a rendition of the Ski image (Figure 10) as one of the test images. The use of this image has been made freely available courtesy of its copyright holders, Fujifilm Electronic Imaging Limited, who need to be acknowledged as its source in any publication referring to this test image.

ISO test images

The international standard ISO Graphic technology - Prepress digital data exchange [63] proposed several sets of standard test images for different purposes.

ISO CMYK standard colour image data (CMYK/SCID)

The test images in ISO [64] consist of eight natural (photographed) images (Figure 11 (a)) and ten synthetic images (Figure 11 (b)) created digitally on a computer. The characteristics and typical usage of the natural images are shown in Table 20.
The synthetic images are resolution charts. They are used to evaluate the resolving power of output devices, the registration accuracy of separations, and moiré and aliasing effects. All these images are in the CMYK color space.

(a) The eight natural test images in ISO (b) The ten synthetic test images in ISO
Figure 11: Standard test images from the ISO data set [64].

Name | Aspect | Characteristics
Portrait | Portrait | Close-up image of a model used to evaluate the reproduction of human skin tones.
Cafeteria | Portrait | Image with complicated geometric shapes; suitable for evaluating the results of image processing.
Fruit Basket | Landscape | Image of a fruit basket, cloth and wood used to evaluate the reproduction of brown colours and fine texture.
Wine and Tableware | Landscape | Image of glassware and silverware used to evaluate the reproduction characteristics of highlight tones and neutral colours.
Bicycle | Portrait | Image of a (penny-farthing) bicycle, resolution charts and other items containing fine detail used to evaluate the sharpness of reproduction and the results of image processing.
Orchid | Landscape | Image of an orchid with background vignettes used to evaluate the reproduction of highlight and shadow vignettes.
Musicians | Landscape | Image of three women used to evaluate the reproduction of different skin tones and fine image detail.
Candle | Landscape | "Low-key" image of a room scene containing miscellaneous objects used to evaluate dark colors, particularly browns and greens.

Table 20: Characteristics and typical usage of the natural images from ISO [64].

(a) The eight natural test images in ISO (b) The seven synthetic test images in ISO
Figure 12: Standard test images from the ISO data set [65].

Name | Aspect | Characteristics
Woman with glass | Portrait | Close-up image of a woman with a glass; suitable for evaluating the reproduction of human skin tones.
Flowers | Landscape | Useful for assessing tonal reproduction of highlight tones and contouring in dark tones.
Fishing goods | Portrait | Low-key image of fishing goods; suitable for evaluating image sharpness.
Japanese goods | Landscape | Image obtained by photographing a collection of Japanese traditional handicrafts, including many highly saturated colors; suitable for evaluating color reproduction capabilities.
Field fire | Landscape | Useful for evaluating the accuracy of color reproduction for delicate colors.
Pier | Landscape | Image with complicated geometric shapes; suitable for evaluating the results of image processing.
Threads | Landscape | Image of woolen yarn, color pencils and ribbons; suitable for evaluating the color gamut of devices.
Silver | Portrait | Image of silverware; suitable for evaluating the tone reproduction of greys, as well as the reproduction of the lustrous appearance of metallic objects.

Table 21: Characteristics and typical usage of the natural images from ISO [65].

ISO XYZ/sRGB encoded standard colour image data (XYZ/SCID)

The test images in ISO [65] contain eight natural (photographed) images (Figure 12 (a)) and seven synthetic images (Figure 12 (b)) created digitally on a computer. The characteristics and typical usage of the natural images are provided in Table 21. The synthetic images consist of computer graphics images, a business graph, a color chart, and a series of color vignettes. This set of test images can be widely used for evaluating the color reproduction capability of imaging systems and output devices, and for evaluating the coding technologies necessary for the storage and transmission of high-definition image data.
ISO CIELAB standard colour image data (CIELAB/SCID)

The test images in ISO [66] consist of eight natural (photographed) images (Figure 13) and ten synthetic images created digitally on a computer. The description and typical usage of the natural images are given in Table 22. The synthetic images consist of eight color charts made up of various patches and two color vignettes. The purpose of this test image set is to provide a data set with a large color gamut related to illuminant D50. The bit depth of the natural images is 16 bits per channel, while the color charts and vignettes are 8 bits per channel.

Figure 13: The eight natural test images in ISO

Name | Aspect | Characteristics
Bride and groom | Horizontal | Image of a bride wearing white clothes and a groom wearing black clothes. Used to evaluate the rendering of human skin tones and neutral colors, especially highlights and shadows.
People | Horizontal | Image consisting of five people wearing colorful clothes, sitting on a dark leather couch. Used to evaluate the color rendering of extremely colorful objects in the presence of skin tones and neutrals.
Cashew nuts | Vertical | Image of dried fruits and filled containers used to evaluate tonal and color rendering, in particular adjustments for grey component replacement.
Meal | Horizontal | Image with widely recognizable cooked food and pastel colors. Used to evaluate high-key tonal rendering and food memory colors.
Mandolin | Vertical | Image of goods, including metallic objects, used to evaluate the reproduction of colors, as well as the reproduction of the lustrous appearances of metallic objects.
Tailor scene | Horizontal | Still-life image of textile used to evaluate the tone reproduction of a range of neutrals and textile structures (object moiré).
Wool | Horizontal | Image of different colored balls of wool used to evaluate the reproduction of details in highly chromatic areas.
Fruits | Square | Image of a range of fruits and vegetables. The memory colors of strawberries, oranges, lemons, green grapes, apples, pears, tomatoes and bell peppers are particularly suitable for the evaluation of the naturalness of color re-rendering processes.

Table 22: Characteristics and typical usage of the natural images from ISO [66].

ISO Wide gamut display-referred standard colour image data [Adobe RGB (1998)/SCID]

The test images in ISO [67] consist of 14 natural (photographed) images (Figure 14 (a)) and two synthetic images (Figure 14 (b)) created digitally on a computer. The characteristics and typical usage of the natural images are shown in Table 23.
The synthetic images consist of one color chart with various patches and one color vignette. This set of test images is mainly designed for systems that use Adobe RGB as the reference encoding, and as such is mainly applicable to the professional market and to systems for which the wide-gamut colour monitor is the hub device.

Kodak Lossless True Color Image Suite

Kodak released a set of recommended test images called the Kodak Lossless True Color Image Suite [46]. It contains 24 images (Figure 15), 22 of which are outdoor images and 2 of which are studio images. They have been released by the Eastman Kodak Company for unrestricted usage. This is the most commonly used set of standard test images for evaluating image quality for a wide variety of purposes, and some image quality databases also use it as their reference image set. However, the images in the Kodak suite are scans of negatives, made when digital imaging was still a new concept, so their quality is not suitable for the CID:IQ database.

Analysis of images used in existing databases

Overview of source images used in public databases

Before we analyze the characteristics of the source content in image databases, we first summarize the source images used in public databases. The images used in TID2013 [47, 48] and TID2008 [1, 2, 3] are almost the same: 24 out of 25 images are from the Kodak Lossless True Color Image Suite [46], and the last one is an artificial image synthesized by the authors (Figure 16 and Figure 17).

(a) The 14 natural test images in ISO (b) The two synthetic test images in ISO
Figure 14: Standard test images from the ISO data set [67].

Name | Aspect | Characteristics
Crayons | Horizontal | Picture of crayons with highly saturated colors; useful for checking edge-of-gamut reproduction.
Flowers | Vertical | Useful for assessing tonal reproduction of highlight tones and saturated reds.
Yarn | Horizontal | Image of yarn, wool and thread suitable for evaluating the color gamut of devices, texture and fine detail reproduction.
Fishing | Vertical | Fishing goods with fine detail, suitable for evaluating image sharpness and reproduction of detail.
Vases | Horizontal | Picture of transparent and semi-transparent vases, suitable for evaluating the reproduction of smooth highlight tones.
Leaves | Horizontal | Useful for evaluating the reproduction of subtle tonal variation in the leaves and of shadow detail in the dark brown of the tree trunks.
Borabora | Horizontal | Landscape image; suitable for evaluating the reproduction of deep blue and green colors with subtle tonal variation.
Sunflower | Horizontal | Field of sunflowers with memory colors of sky, trees and grass; suitable for evaluating the reproduction of natural scenes.
Bride | Vertical | Close-up image to evaluate the reproduction of human skin tones.
Walkathon | Vertical | Image of children in walking gear with bright balloons; can be used to check the reproduction of images that include saturated colors and skin tones.
Spoon | Horizontal | Image of silverware to evaluate the reproduction characteristics of highlight tones and neutral colors.
Violin | Vertical | Low-key image of a room scene containing miscellaneous objects to evaluate dark colors, particularly browns.
Glass | Horizontal | Image of glassware to evaluate the reproduction characteristics of highlight tones, shadow tones and neutral colors.
Beach | Horizontal | Image of a sunny beach shot from the shade of trees; can be used to evaluate the reproduction of images having a high dynamic range.

Table 23: Characteristics and typical usage of the natural images from ISO [67].

Figure 15: Kodak Lossless True Color Image Suite [46].

The images used in LIVE (Release2) [4, 5, 6] are all from the CD "Austin and Vicinity" by Visual Delights Inc. These images were resized from the originals and then distorted to obtain the images in the database (Figure 18). The Toyama (MICT) [7] database also used source images from the Kodak Lossless True Color Image Suite [46], but selected only 14 of them. All reference images in the CSIQ [49, 50] database are copyrighted by the authors; there are five different scene categories: animals, landscape, people, plants, and urban (Figure 19). Ten reference images were selected for the IRCCyN/IVC [51] database: "Avion", "Barba", "Boats", "Clown", "Fruit", "House", "Isabe", "Lena", "mandr", and "Pimen", which are commonly used as reference images (Figure 20). The VCL@FER [52, 53] database contains 23 reference images copyrighted by the authors (Figure 21). Three natural images were used as reference images in the grayscale image quality database A57 [28, 29]: "Horse", "Harbor", and "Baby" (Figure 22). Another grayscale image quality database, WIQ [26, 27], employed seven widely adopted reference images as its source images: "Barbara", "Elaine", "Goldhill", "Lena", "Mandrill", "Pepper", and "Tiffany".

Analysis of source images in public image quality databases

Analyzing the source images in different image quality databases using the same criteria can help us decide what kind of images are suited for our database, so that we can develop our own principles for experimental image design. Furthermore, the analysis provides information to other researchers, who can then choose the databases most suitable for their particular benchmarking needs or additional requirements. Winkler [68] proposed analyzing two parameters, spatial information (SI) and colorfulness (CF), in order to characterize the source images of different databases in terms of spatial structure and color.
In [68] Winkler defined SI and CF as follows. Spatial information (SI) represents edge energy. Let $si_h$ be the image filtered with the horizontal Sobel kernel and $si_v$ the image filtered with the vertical Sobel kernel. Then

$em = \sqrt{si_h^2 + si_v^2}$

represents the edge magnitude at each pixel. Then

Figure 16: Reference images (1-24) in TID2008, using the Kodak test set; the 25th reference image was synthesized by the authors.
Figure 17: The synthesized image in TID2013.
Figure 18: Reference images used in the LIVE image quality database.

Figure 19: Reference images in the CSIQ database fall into five different categories.
Figure 20: Reference images in the IVC database.
Figure 21: Reference images in the VCL@FER database.

Figure 22: Reference images in the A57 database.

Figure 23: SI and CF for color image databases (CSIQ, IVC, IVC 3D, IVC Art, LIVE, MICT, MMSP 3D, TID). Image reproduced from Winkler [68].

SI is the root mean square of the edge magnitude over the image:

$SI = \frac{vr}{1080} \sqrt{\frac{\sum em^2}{p}}$   (3.1)

where $p$ is the number of pixels in the image and $vr/1080$ ($vr$ being the vertical resolution of the image) is a normalization factor. Since SI is calculated on grayscale images, RGB images are first converted using:

$SI_c = 0.299R + 0.587G + 0.114B$   (3.2)

Colorfulness (CF) represents the intensity and variety of colors in the image. Defining $c_1 = R - G$ and $c_2 = 0.5(R + G) - B$ as a simple opponent color space, CF is [69]:

$CF = \sqrt{\sigma_{c_1}^2 + \sigma_{c_2}^2} + 0.3\sqrt{\mu_{c_1}^2 + \mu_{c_2}^2}$   (3.3)

where $\sigma$ and $\mu$ denote the standard deviation and the mean of each opponent channel.

In [68], eight image quality databases were selected and SI and CF were computed for every single reference image; the results are shown in Figure 23. From the results we can see that for most databases the source content is concentrated in the center of the plot, meaning that the range of SI and CF over their reference images is too narrow to represent real-world scenes. Ideal source images should cover the plot from its edges to its center. The SI-CF plot is therefore a very important indicator for experimental image design and selection. In order to evaluate how well the space of reference images is covered by an image quality database, Winkler proposed calculating the range of a source characteristic $K_i$, where

Figure 24: Relative ranges $R_i$ of the source characteristics SI and CF in image quality databases. Image reproduced from Winkler [68].

$K_1 = SI$ and $K_2 = CF$. The relative range $R_i$ is defined as:

$R_i = \frac{\max(K_i) - \min(K_i)}{K_i^{max}}$   (3.4)

where $K_i^{max}$ is the maximum value of SI or CF; Winkler used $K_1^{max} = 150$ and $K_2^{max} = 100$, so the range of each source characteristic lies between 0 and 1. The results of this criterion are plotted in Figure 24, which shows that most databases perform very weakly in terms of both criteria. We can use these criteria to evaluate the source images of the CID:IQ database, so that we can select better images compared to existing image quality databases.

Orfanidou et al. [70] proposed analyzing the busyness of the scene. Busyness is defined as the image property indicating the presence or absence of details in an image [70]. Because fine detail is a very important property of an image, we need reference images covering a wide range of busyness values. Busyness is calculated using a simple image segmentation algorithm with four main stages:

Applying a Sobel edge detector in both the horizontal and vertical orientations of the image, with a threshold of 0.04, in order to calculate the gradient image of the CIELAB L* channel (Figure 25 (b)).
Dilating the binary image with flat linear structuring elements in order to amplify the detail (Figure 25 (c)).
Filling the holes in the dilated image using a flood-filling operation (Figure 25 (d)).
Removing spurious noise by eroding the binary image (Figure 25 (e)).

The busyness value is computed as the ratio of the number of white pixels in the thresholded image to the total number of pixels; simply put, it is the percentage of white pixels in the final image (Figure 25 (e)). For the original image in Figure 25 (a), the busyness value is 0.4. We will also use busyness as an indicator to evaluate our reference images.
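The SI, CF, relative-range, and busyness measures above can be collected into a small script. The following is a minimal sketch using NumPy and SciPy (both assumed available); the exact direction of the vr/1080 normalization in eq. (3.1) and the structuring-element sizes used in the busyness stages are our assumptions, since the text does not fully specify them:

```python
import numpy as np
from scipy import ndimage

def spatial_information(rgb):
    """SI (eqs. 3.1-3.2): RMS Sobel edge magnitude of the luma channel,
    with the vr/1080 resolution normalization applied as a prefactor (assumed)."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    lum = 0.299 * r + 0.587 * g + 0.114 * b
    si_h = ndimage.sobel(lum, axis=1)    # horizontal Sobel kernel
    si_v = ndimage.sobel(lum, axis=0)    # vertical Sobel kernel
    em = np.hypot(si_h, si_v)            # edge magnitude per pixel
    vr = lum.shape[0]                    # vertical resolution
    return (vr / 1080.0) * np.sqrt(np.mean(em ** 2))

def colorfulness(rgb):
    """CF (eq. 3.3), after Hasler and Suesstrunk [69]."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    c1, c2 = r - g, 0.5 * (r + g) - b    # opponent channels
    return (np.hypot(c1.std(), c2.std())
            + 0.3 * np.hypot(c1.mean(), c2.mean()))

def relative_range(values, k_max):
    """R_i (eq. 3.4): spread of one source characteristic over a database."""
    v = np.asarray(values, dtype=float)
    return (v.max() - v.min()) / k_max

def busyness(lightness, edge_thresh=0.04):
    """Busyness [70]: fraction of white pixels after the four segmentation
    stages of Figure 25 (threshold, dilate, fill holes, erode).
    `lightness` is the CIELAB L* channel scaled to [0, 1]."""
    grad = np.hypot(ndimage.sobel(lightness, axis=1),
                    ndimage.sobel(lightness, axis=0))
    mask = grad > edge_thresh
    mask = ndimage.binary_dilation(mask, structure=np.ones((3, 1)))
    mask = ndimage.binary_dilation(mask, structure=np.ones((1, 3)))
    mask = ndimage.binary_fill_holes(mask)
    mask = ndimage.binary_erosion(mask, structure=np.ones((3, 3)))
    return mask.mean()
```

A flat gray image yields SI = 0, CF = 0, and busyness = 0, while images with edges and varied color score higher on all three measures; the threshold semantics of SciPy's Sobel output differ from MATLAB's `edge`, so the 0.04 default here is illustrative only.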

(a) Original image (b) Binary gradient mask (c) Dilated edges (d) Holes filled (e) Eroded (final) image
Figure 25: Stages of the image segmentation process used to calculate image busyness.

3.3 Survey of distortions

In this Section we survey existing distortions from three different angles: first, we explore the distortions that have been tested with image quality metrics; second, we summarize the distortions that have been used in image quality databases; last, we survey other distortions arising in color image quality issues (e.g. image acquisition, displays, color appearance).

Distortions in image quality metrics

A total of 17 image quality metrics were explored in order to find out which kinds of distortion have been tested with these metrics. The selection of these metrics is based on the following criteria: all surveyed metrics were developed after the year 2000, and each metric either focuses on color issues or takes the human visual system (HVS) into account for color images. Pedersen et al. [71] give an overview of image quality metrics; we selected 17 metrics from it according to our criteria. The distortions summarized from the selected metrics are enumerated in Table 24.

Distortions in image quality databases

The survey of existing image quality databases shows that a wide variety of distortions is used in different databases. Here we collect the distortions from the color and grayscale image quality databases that are most widely used and most often compared against: TID2013 [47, 48] (TID2013 includes all distortions introduced in TID2008), LIVE (release2) [4, 5, 6], Toyama (MICT) [7], CSIQ [49, 50], IRCCyN/IVC [51], VCL@FER [52, 53], A57 [28, 29], and WIQ [26, 27].
The distortions collected from the selected databases are enumerated in Table 25.

Other distortions

Many distortions arise in other image processing procedures, for instance in image acquisition, displays, and color appearance. The distortions surveyed from these other quality issues are enumerated in Table 26.

Classification of distortions

All the distortions stated above are alterations of image quality attributes. Some attributes are always present in images, such as contrast, sharpness, and lightness. Other attributes are not always present in images; these are called artifacts (e.g. noise, compression artifacts).

Metric | Author(s) | Distortions
P-CIELAB | Chou et al. [72] | JND contaminated distortion
M-DWT | Gayle et al. [73] | JPEG, JPEG2000, blur, noise, sharpening, DC-shift
Q color | Toet et al. [74] | Quantization
UIQ | Wang et al. [75] | Impulsive salt-pepper noise, additive Gaussian noise, multiplicative speckle noise, mean shift, contrast stretching, blurring, and JPEG compression
SDOG-CIELAB & SDOG-DEE | Ajagamelle et al. [76] | Luminance changes, JPEG, JPEG2000, contrast changes, lightness changes, and saturation changes
SMGMAD | Cao et al. [58] | Gamut mapping artifacts (hue shifts, gamut expansion, haloing)
SCIELABJ | Johnson et al. [77] | Sharpness
MDOG-DEE | Simone et al. [57] | JPEG and JPEG2000
SSIMIPT | Bonnier et al. [78] | Gamut mapping artifacts (hue shifts, gamut expansion, haloing)
FSIMC | Zhang et al. [79] | Additive Gaussian noise, spatially correlated noise, image denoising, JPEG2000, and JPEG transmission errors
mPSNR | Munkberg et al. [80] | High dynamic range compression
SHAME & SHAME-II | Pedersen et al. [81] | Gamut mapping artifacts and lightness changes
SDCGM | Kimmel et al. [82] | Gamut mapping artifacts
WCWSSIM | Brooks et al. [83] | JPEG, blur, mean shift, contrast changes, rotation, zoom in, spatial shift, white Gaussian noise
ABF | Wang et al. [84] | Lightness changes, chroma changes, hue changes, compression artifacts, noise, and sharpness

Table 24: Distortions tested in image quality metrics.
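For orientation alongside the metrics in Table 24, the classic PSNR baseline (not one of the 17 surveyed metrics, shown only to illustrate how a full-reference metric scores a distorted image against its reference) can be sketched as:

```python
import numpy as np

def mse(ref, dist):
    """Mean squared error between reference and distorted image."""
    return np.mean((ref.astype(float) - dist.astype(float)) ** 2)

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    m = mse(ref, dist)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)
```

An identical pair scores infinity, and the score drops as distortion strength grows; the color- and HVS-aware metrics surveyed above exist precisely because such pixel-wise scores correlate poorly with perceived quality.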

Database | Distortions
TID2013 | Gaussian noise, additive white Gaussian noise, additive Gaussian spatially correlated noise, masked noise, high frequency noise, impulse noise, quantization noise, Gaussian blur, image denoising (residual noise), JPEG, JPEG2000, JPEG transmission errors, JPEG2000 transmission errors, non-eccentricity pattern noise, local block-wise distortions, mean shift, contrast changes, change of color saturation, multiplicative Gaussian noise, comfort noise, lossy compression of noisy images, image color quantization with dither, chromatic aberrations, sparse sampling and reconstruction
LIVE (release2) | JPEG2000, JPEG, white noise, Gaussian blur, fast fading Rayleigh
Toyama (MICT) | JPEG, JPEG2000
CSIQ | JPEG, JPEG2000, global contrast decrements, additive pink Gaussian noise, additive white Gaussian noise, Gaussian blur
IRCCyN/IVC | JPEG, JPEG2000, LAR coding, and blurring
VCL@FER | Additive white Gaussian noise, Gaussian blur, JPEG, and JPEG2000
A57 | Additive Gaussian white noise, JPEG, JPEG2000, Gaussian blur, JPEG2000 with dynamic contrast-based quantization
WIQ | Blocking, ringing, block intensity shift, blurring, noise

Table 25: Distortions used in image quality databases.

Issue | Distortions
Image acquisition | Spherical aberration, astigmatism, coma, field curvature, pincushion distortion, barrel distortion, chromatic aberration, camera flare, demosaicking, stochastic noise, color shifts, aliasing effects, blurring, artifacts, moiré, stochastic color noise
Displays | Brightness changes, contrast changes, black level changes, color rendering, gamut mapping
Color appearance | Lightness changes, brightness changes, colorfulness changes, chroma changes, saturation changes, hue changes

Table 26: Distortions discovered in other image quality issues.

For the first category of attributes, there is an optimal value which gives the best image quality (Figure 26 (a)); whether the value increases or decreases away from it, the image quality is reduced.
For the latter category of attributes, the fewer artifacts in the image, the better the image quality (Figure 26 (b)). We therefore classify distortions into two main classes: attributes always present and attributes not always present. The first class is divided into four sub-classes: contrast, sharpness, lightness, and color. The second class has four sub-groups: noise, compression, image acquisition, and other artifacts. The classification of distortions is shown in Figure 27. It has to be noted that this is not the only possible basis for classification; here we classify distortions by their image quality attributes. Some of the distortions in the artifact group are also highly color-related, such as gamut mapping artifacts.
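The two classes can be illustrated by generating distortions of both kinds. The sketch below (NumPy and SciPy assumed; these are generic implementations, not the actual CID:IQ distortion pipeline) applies attribute changes (mean shift, contrast change) and artifacts (additive Gaussian noise, Gaussian blur) at several strengths:

```python
import numpy as np
from scipy import ndimage

def mean_shift(img, delta):
    """Attribute always present: lightness / intensity shift."""
    return np.clip(img.astype(float) + delta, 0, 255)

def contrast_change(img, gain):
    """Attribute always present: stretch or compress contrast around mid-gray."""
    return np.clip(128 + gain * (img.astype(float) - 128), 0, 255)

def gaussian_noise(img, sigma, seed=0):
    """Artifact: additive white Gaussian noise."""
    rng = np.random.default_rng(seed)
    return np.clip(img.astype(float) + rng.normal(0, sigma, img.shape), 0, 255)

def gaussian_blur(img, sigma):
    """Sharpness loss: Gaussian blur applied per color channel."""
    return ndimage.gaussian_filter(img.astype(float), sigma=(sigma, sigma, 0))

# Five strengths of one distortion, as in several of the surveyed databases.
ref = np.random.default_rng(1).random((32, 32, 3)) * 255
levels = [0.5, 1, 2, 4, 8]
blurred = [gaussian_blur(ref, s) for s in levels]
```

Increasing the strength parameter moves each distorted image further from the reference, which is exactly the monotone degradation that distortion-level designs in the surveyed databases rely on.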

[Figure 26: Difference between artifacts and quality attributes always present. (a) For quality attributes always present, image quality peaks at an optimal attribute value and falls off on either side; (b) for artifacts, image quality decreases as the amount of artifact increases.]

[Figure 27: Distortion classification.
Quality attributes always present in images:
Contrast: contrast stretching; local contrast change; high contrast transitional error; global contrast decrement.
Sharpness: sharpness change; blurring; Gaussian blur.
Lightness: lightness change; intensity shift; luminance change; darkening; block intensity shift; local block-wise distortions of different intensity; mean shift; scale change.
Color: hue change; color shift; color saturation change; chroma change.
Quality attributes not always present in images:
Noise: Gaussian noise; additive Gaussian noise; additive Gaussian spatially correlated noise; additive Gaussian pink noise; additive Gaussian white noise; average white Gaussian noise; multiplicative Gaussian noise; white noise; impulse noise; impulsive salt-pepper noise; multiplicative speckle noise; residual noise; high frequency noise; masked noise; comfort noise; quantization noise; non-eccentricity pattern noise.
Compression: JPEG; JPEG2000; bit error in JPEG2000; JPEG2000 transmission error; JPEG transmission error; DC coefficient loss; LAR coding; lossy compression of noisy images; high dynamic range compression.
Image acquisition: camera flare; coma; ringing; barrel distortion; blooming; aliasing effects; moire; demosaicking; astigmatism aberration; spherical aberration; pincushion distortion; field curvature; grid pattern; chromatic aberration.
Other artifacts: rotation; blockiness; edge impairment; halo; worms in error diffusion halftoning; halftoning artifact; blocking; spatial shift; image color quantization with dither; block spatial shift; JND contaminated; temporal replacement concealment; DCT coefficient quantization; spatial interpolation concealment; sparse sampling and reconstruction; gamut mapping.]

4 The CID:IQ database

4.1 Reference images

23 pictorial images are selected as the reference images in the CID:IQ database (Figure 28). The resolution of all images is 800 by 800 pixels. The reason for selecting this resolution is that the resolution of the monitor used to conduct the psychophysical experiment is 1920 x 1200; in order to display two images on the screen simultaneously, the maximum width of each image must be less than 960 pixels. Taking into account the area immediately surrounding the displayed image and its border, we set the width to 800 pixels. In addition, some image quality metrics process images block by block with square blocks; for this reason, we finally set the resolution of the reference images to 800 by 800 pixels. In order to verify whether these reference images are in line with the principles described in Section 3.2.2, both subjective and objective evaluation methods are used. Full reference image statistics, including the original images, their RGB histograms, and their gamuts in three different views, are given in Appendix A.

Subjective method for reference images evaluation

A table containing all of the features concluded earlier is generated to check whether the reference images have these attributes (Table 27). These attributes are: hue, saturation, lightness, contrast, memory colors and others. The hue attribute has eight elements: red, green, blue, cyan, magenta, yellow, black, and no specific hue. For saturation, lightness and contrast, three levels of intensity are stated: low, medium and high. Skin color, sky-blue and grass-green are the three features among the memory colors, and the skin colors include black, Caucasian and Asian. Attributes that do not fall into any of these categories are defined as attributes for other purposes: large area of the same color; neutral gray; color transition; fine detail; and text.
The numbers in this table represent the distribution of each feature over the reference images. The Sum values (the last row in the table) show that most of the features are covered by the reference images, except black skin color. Within the hue feature, the values for cyan and magenta are lower than the others because these two colors are most common in printing systems but less so in the natural world.

Objective method for reference images evaluation

Three objective indicators, spatial information, colorfulness and busyness, were introduced earlier. We use them to evaluate the reference images in the CID:IQ database and compare the results with those of other existing databases. The spatial information versus colorfulness of the CID:IQ database and the busyness are plotted separately in Figure 29. In Figure 29 (a) the X axis represents the spatial information, the Y axis represents the colorfulness, and the number beside each point is the reference image number. The X axis in Figure 29 (b) shows the busyness values in different ranges and the Y axis gives the number of images. Another 8 image quality databases are selected for comparison with the CID:IQ database; this selection is based on the selection in [68].

[Table 27: Table of features for subjective reference images evaluation. Rows: the 23 reference images plus a Sum row. Columns: hue (red, green, blue, cyan, magenta, yellow, black, no specific hue), saturation (low, medium, high), lightness (low, medium, high), contrast (low, medium, high), skin (black, Caucasian, Asian), memory colors (sky-blue, grass-green), and other purposes (large area of the same color, neutral gray, color transition, fine detail, text).]
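Two of the objective indicators introduced above, spatial information and colorfulness, can be sketched in pure Python. This is a simplified sketch under stated assumptions: colorfulness following Hasler and Süsstrunk, and spatial information as the standard deviation of Sobel gradient magnitudes; the thesis's exact implementations and the busyness measure are not reproduced here.

```python
import math

def colorfulness(pixels):
    """Hasler-Susstrunk colorfulness for a list of (R, G, B) tuples."""
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    def stats(values):
        mean = sum(values) / len(values)
        var = sum((x - mean) ** 2 for x in values) / len(values)
        return mean, math.sqrt(var)
    mu_rg, sd_rg = stats(rg)
    mu_yb, sd_yb = stats(yb)
    return math.hypot(sd_rg, sd_yb) + 0.3 * math.hypot(mu_rg, mu_yb)

def spatial_information(gray):
    """SI: stdev of Sobel gradient magnitudes over a 2-D list of gray levels."""
    h, w = len(gray), len(gray[0])
    mags = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y-1][x+1] + 2*gray[y][x+1] + gray[y+1][x+1]
                  - gray[y-1][x-1] - 2*gray[y][x-1] - gray[y+1][x-1])
            gy = (gray[y+1][x-1] + 2*gray[y+1][x] + gray[y+1][x+1]
                  - gray[y-1][x-1] - 2*gray[y-1][x] - gray[y-1][x+1])
            mags.append(math.hypot(gx, gy))
    mean = sum(mags) / len(mags)
    return math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
```

A uniformly gray image scores zero on both indicators, which matches the intuition that such an image carries neither color nor spatial detail.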

Figure 28: Reference images in the CID:IQ database.

From Figure 30 it can be seen that the spread of points for the CID:IQ database is wider than for the others. The CID:IQ database has images with high/low spatial information versus low/high colorfulness values, and the images are scattered both near the edges and in the center of the plot. This means the reference images in the CID:IQ database better represent real-world scenes. As can also be seen from Figure 31, the distribution of busyness values in the CID:IQ database covers more ranges than the others. The busyness of the CID:IQ database includes the ranges from 0.2 to 0.3 and from 0.5 to 1, while most existing databases only cover the range from 0.7 to 1. This means the CID:IQ database has both fine-detail images and images with low sharpness. In conclusion, our CID:IQ database has a significant advantage in reference image selection compared to most existing image quality databases.

4.2 Distortions

Six different distortions within four categories are used in the CID:IQ database. The categories are: compression artifacts, noise, blurring and gamut mapping artifacts. In the compression artifacts category, both the JPEG [85] and JPEG2000 [86] compression standards are selected; in the gamut mapping artifacts category, the constant-hue minimum ΔE clipping and SGCK gamut mapping methods [20] are used. These distortions are applied to all reference images at five levels (intensities), from a low degree of quality degradation to a high degree. In this section, a brief introduction to the distortions, the arguments for their selection, and their implementation are given respectively.

[Figure 29: Results of objective reference images evaluation. (a) SI vs. CF for the CID:IQ image quality database; (b) busyness distribution of the CID:IQ database.]

[Figure 30: Comparison of SI vs. CF results between 9 image quality databases: (a) CSIQ, (b) IVC, (c) IVC ART, (d) JPEGXR, (e) LIVE, (f) MICT, (g) CID:IQ, (h) TID2013, (i) VCL@FER.]

[Figure 31: Comparison of busyness results between 9 image quality databases: (a) CSIQ, (b) IVC, (c) IVC ART, (d) JPEGXR, (e) LIVE, (f) MICT, (g) CID:IQ, (h) TID2013, (i) VCL@FER.]

Compression

The purpose of image compression is to remove redundant information from the image so that storage and transmission of the image data can be efficient. There are two kinds of compression: lossy and lossless. Lossless compression is often used for medical imaging, technical drawings, clip art or comics, because these fields require archival quality. Lossy compression methods are most commonly used for natural images, for example photographs, in applications where a minor loss of information is acceptable in exchange for a large reduction in file size, and where the removed information is often imperceptible to human eyes. But when low bit rates are used in lossy compression, visible compression artifacts are unavoidably introduced into the images. In this study we therefore only pay attention to the distortions introduced by lossy compression methods.

JPEG compression standard

The name "JPEG" stands for Joint Photographic Experts Group [87], the committee that created the JPEG standard and also other still picture coding standards. JPEG [85] is a commonly used lossy compression standard for digital images. The JPEG standard specifies the codec, which defines how an image is compressed into a stream of bytes and decompressed back into an image, but not the file format used to contain that stream [88]. The level of compression can be controlled to allow a selectable trade-off between image quality and storage size. A typical JPEG compression process introduces little perceptible loss in image quality, and the compressed file type (seen most often with the .jpg extension) is the one usually produced in digital photography. An example of a JPEG compressed image (very low bit rate) compared to the original is shown in Figure 32.

[Figure 32: Comparison between the original image and a JPEG compressed image (very low bit rate).]
From the figure we can see that JPEG compression produces ringing artifacts and blocking artifacts in the image. Since JPEG compression is the most commonly used standard in the digital imaging workflow (image acquisition, storage, transmission and so on), it is necessary to include this distortion type in the CID:IQ database. In addition, most existing image quality databases use JPEG compression artifacts as a distortion; if the same

compression bit rates are used in the CID:IQ database as in the others, the psychophysical experiment results and the correlations between subjective and objective results are worth comparing. In order to apply the distortion to the reference images, we used the same resource as the JPEGXR database [39, 40]. The resource was downloaded from the Independent JPEG Group homepage [89], and we used the latest version, released on 13/01/2013. The resource was configured on an Ubuntu LTS operating system and the command line is:

cjpeg -baseline -quality q -outfile output.jpg input.bmp

The -baseline option forces baseline-compatible quantization tables to be generated; this clamps quantization values to 8 bits even at low quality settings. The -quality option represents the quality setting, and its value q ranges from 0 (worst image quality) to 100 (best image quality). In the CID:IQ database we used q = 70, 50, 30, 20 and 10 to represent the 5 distortion levels.

JPEG2000 compression standard

JPEG2000 [86] is also an image compression standard, created by the Joint Photographic Experts Group [87] committee in 2000. Compared to the original JPEG standard, which is based on the discrete cosine transform, the newer JPEG2000 is based on the wavelet transform. JPEG2000 has significantly better performance than JPEG. At high bit rates, where artifacts become nearly imperceptible, JPEG2000 has a small machine-measured fidelity advantage over JPEG. At lower bit rates, JPEG2000 has an advantage over certain modes of JPEG: artifacts are less visible and there is almost no blocking. In terms of visual artifacts, JPEG2000 produces ringing artifacts, manifested as blur and rings near edges in the image, while JPEG produces both ringing and blocking artifacts.

[Figure 33: Comparison between the original image and a JPEG2000 compressed image (very low bit rate).]
An example of a JPEG2000 compressed image (very low bit rate) compared to the original is shown in Figure 33. For the same reasons as discussed for the JPEG standard, JPEG2000 is also a valuable distortion type to include in the CID:IQ database, especially since the visual artifacts of JPEG2000 are quite different from those of JPEG.

To compress the reference images with the JPEG2000 standard, we again used the same method as the JPEGXR database [39, 40]. The software is named "Kakadu" and can be downloaded from its homepage [90]; we used the current version. The software was configured on the Windows 8 Professional operating system and the command line is:

kdu_compress.exe -i input.bmp -o output.j2c -rate r

The -i option specifies the original image and -o the compressed image. The -rate option represents the compression bit rate for the JPEG2000 standard. In the CID:IQ database, 5 levels of bit rates are selected from high to low: 0.9, 0.6, 0.4, 0.3, and a fifth, lowest rate.

Noise

Image noise is a random variation of color information or brightness in images; it does not exist in the imaged object. Image noise can be produced by the sensor and circuitry of a digital camera, so it is considered an aspect of electronic noise. The amount of image noise can range from almost invisible specks on a digital image captured in good light, to optical or radio-astronomical images in which very limited information can be recognized because the whole image is covered by noise. There is a wide variety of image noise types, such as amplifier noise (Gaussian noise), salt-and-pepper noise, shot noise, quantization noise (uniform noise), film grain, anisotropic noise and so on. From Figure 27 we can see that over 15 different noise types have been used in existing image quality studies, so there is a need for one noise distortion type in the CID:IQ database. The term shot noise (also known as Poisson noise) refers to a type of electronic noise which originates from the discrete nature of electric charge. The term also applies to photon counting in optical devices, where shot noise is associated with the particle nature of light. The movement of the discrete photons that compose light, and of discrete electric charges, causes Poisson noise.
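As an illustration of shot noise (not the thesis's MATLAB pipeline), a scaled Poisson-noise step in the spirit of output = input + l*(poisson(input) - input) can be written in pure Python; the Knuth sampler used here is adequate for 8-bit pixel means:

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's algorithm: count uniform draws until their product falls below e^-lam.
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def add_scaled_poisson_noise(pixels, l, seed=0):
    # pixels: flat list of 8-bit gray values. Mirrors the scaled-noise idea:
    # noise = poisson(pixel) - pixel; output = pixel + l * noise, clamped to [0, 255].
    rng = random.Random(seed)
    out = []
    for v in pixels:
        noisy = poisson_sample(v, rng)
        val = v + l * (noisy - v)
        out.append(min(255, max(0, round(val))))
    return out
```

Because the Poisson variance equals its mean, brighter pixels receive proportionally larger fluctuations, which is the defining property of shot noise.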
If light (a stream of discrete photons) comes out of a laser pointer and creates a visible spot on a wall, the fundamental physical processes that govern light emission are such that these photons are emitted from the laser at random times; but the many billions of photons needed to create a spot are so numerous that the brightness, the number of photons per unit time, varies only infinitesimally with time. However, if the laser brightness is reduced until only a handful of photons hit the wall every second, the relative fluctuations in the number of photons, i.e., in brightness, will be significant, just as when tossing a coin a few times. These fluctuations are Poisson noise. Several standard texts on digital image processing have discussed the different noise types involved in the degradation of digital images and have mentioned the importance of Poisson noise in image acquisition [91, 92]. Some studies of monochrome CCD (charge-coupled device) noise have shown a definite Poisson relationship between pixel levels and noise variance [93, 94]. Recent experiments proved that photon Poisson noise is the dominant contributor to uncertainty in the raw data captured by the high-performance sensors used in color digital cameras [95]. According to these arguments, Poisson noise is recommended for inclusion in the CID:IQ image quality database. An example of a Poisson noise image (very high magnitude) compared to the original is shown in Figure 34. The method used to add Poisson noise to the reference images is simple, using MATLAB (a high-level language and interactive environment for numerical computation, visualization, and programming). There is an existing function named imnoise designed

for this purpose. The version of MATLAB we used is R2013a, configured on the Windows 8 Professional operating system. The command lines are:

temp = imnoise(input, 'poisson');
noise = temp - input;
output = input + l*noise;

The parameter l here represents the magnitude of the added Poisson noise. Five values of l were used to obtain five distortion levels: 0.5, 1, 1.5, 2, 2.5 (from good quality to bad quality).

[Figure 34: Comparison between the original image and a Poisson noise image (very high magnitude).]

Blurring

Blurring is a distortion commonly present in images. A Gaussian blur (or Gaussian smoothing) is the result of blurring an image by a Gaussian function [96]. Gaussian blur is commonly used in graphics applications, especially for reducing image noise and detail. The visual effect of Gaussian blurring is a smooth blur similar to viewing the image through a translucent screen; it is different from the effect caused by defocus aberration in optics. Gaussian blur is considered in the CID:IQ database because it is an important distortion type often present in practical settings. For example, Gaussian blurring is commonly used with edge detection: most edge-detection algorithms are sensitive to noise, so a usual way to improve the result of an edge-detection algorithm is to apply Gaussian blurring beforehand to reduce the level of noise in the image. In addition, Gaussian blur is frequently used in studies dealing with visual quality metrics [5] and in existing image quality databases (see Table 25). An example of a Gaussian blurred image (very high level) compared to the original is shown in Figure 35. The method used to blur the reference images is implemented in MATLAB. The version of MATLAB we used is R2013a, configured on the Windows 8 Professional operating system.
The command lines are:

filter = fspecial('gaussian', [5 5], l);
output = imfilter(input, filter);

The first line produces the Gaussian blurring filter, which has a size of 5 x 5 pixels, and

the level is controlled by the parameter l. The second line applies the filter, blurring the reference image with the Gaussian filter. The 5 levels we used are l=0.5, l=0.7, l=0.9, l=1.1, and a fifth, highest value.

[Figure 35: Comparison between the original image and a Gaussian blurred image (very high level).]

Gamut mapping distortions

The term gamut, or color gamut, denotes a specific complete subset of colors in the field of color reproduction, including graphics and photography. The most common usage refers to the subset of colors that can be accurately represented in a given circumstance, such as by a given output device. A gamut can simply be represented as an area in the CIE 1931 chromaticity diagram [97], as shown in Figure 36. If the area inside the curved edge represents the natural colors, then the triangular area represents the visible gamut of an sRGB device such as an sRGB display. The gamut of a device is obviously smaller than the natural gamut; simply put, using a smaller gamut to represent a bigger gamut is what we call gamut mapping. Several descriptions of gamut and gamut mapping have been stated. Stone et al. [98] proposed that an image is represented as a set of points, and the color output device as a solid surrounding the set of all reproducible colors for that device, called its gamut; the shapes of the monitor and printer gamuts are very different, so it is necessary to transform the image points to fit into the destination gamut, a process called gamut mapping. Dugay [99] wrote in her thesis that a color gamut is the range of colors achievable on a given color reproduction medium (or present in an image on that medium) under a given set of viewing conditions; it is a volume in color space. Color gamut mapping is then a method for assigning colors from the reproduction medium to colors from the original medium or image. Levoy et al.
[100] proposed that the range of colors displayable on a particular computer screen, or reproducible on paper by a particular printer, is called the gamut of that display or printer; when colors to be reproduced lie outside the gamut of the output display or printer, the approach that modifies them to make them displayable, without excessively distorting the overall design of the image, is called gamut mapping. CIE [20] proposed that two gamut mapping algorithms must obligatorily be included: chroma-dependent sigmoidal lightness mapping followed by knee scaling towards the cusp [101, 102], and hue-angle preserving minimum ΔE_ab clipping [20]. The

68 Figure 36: The CIE 1931 color space chromaticity diagram comparing the visible gamut with srgb s. first algorithm is known as compression gamut mapping method and second algorithm is clipping gamut mapping method. Gamut mapping algorithms Chroma dependent sigmoidal lightness mapping and cusp knee scaling (SGCK) This algorithm keeps perceived hue constant uses a generic (image-independent) sigmoidal lightness scaling which is applied in a chroma-dependent way and a 90 % knee function chroma scaling toward the cusp. In short, this method keep the scale of the original gamut and compress the gamut to fit the target gamut. Hue-angle preserving minimum E ab clipping This algorithm keeps colors from the intersection of the original and reproduction gamuts unchanged and only alters original colors that are outside the reproduction gamut. This is done by clipping them to that colour in the reproduction gamut that has the smallest E ab color difference from the corresponding original color. An example of SGCK gamut mapped imgae and Hue-angle preserving minimum E ab clipping gamut mapped image (used the same profile) compare to the original is shown in Figure 37. One of the very important motivations to develop a CID:IQ database is that the existing databases have contained limited color distortions. Gamut mapping often produces not only one single color based distortion but multiple distortions. These distortions include both attributes always present in an image and artifacts such as color changes, saturation changes, contrast changes, hue changes, lightness changes, loss of details, contouring artifacts and so on. It is wise to have gamut mapping distortions in the CID:IQ database. 54

[Figure 37: Comparison between the original image and two gamut mapped images: (a) original image; (b) SGCK gamut mapped image; (c) hue-angle preserving minimum ΔE_ab clipping gamut mapped image.]

Processing

The software we used for gamut mapping is called ICC3D (Interactive Color Correction in 3 Dimensions), developed by The Norwegian Colour and Visual Computing Laboratory at Gjøvik University College [103]. Many functionalities related to gamut mapping are available in this software, such as gamut comparison, viewing of device and image gamuts, regression, and mapping tests, and in particular the ICC profile gamut mapping operation which we used. A screenshot of ICC3D is shown in Figure 38. A file can be opened by clicking the "Open image" button in the top left window. The image is then shown in the top right window, and clicking the "View 3D" button displays the 3D gamut information in the bottom left and right windows. In the "Control panel" window, 7 different image gamut visualizations can be selected: quantized, standard, segment maxima, convex hull, alpha shapes, cross-section and height map. In our experiment, convex hull is used. The gamut mapping operation is performed by clicking the "Add module" button and choosing a module suitable for the experiment; in our case we used the ICC profile gamut module. As discussed below, 5 different ICC profiles are selected. After adding these 5 ICC profile modules and clicking the "Update" button, all gamut information appears in the bottom right window. A gamut can be shown as a solid or by its wireframe, and easy comparisons are possible as several gamuts can be represented simultaneously. ICC3D provides 9 different gamut mapping algorithms: SGCK, cusp2cusp, gamma, clip towards center, constant hue and lightness, cusp, hue-angle preserving minimum ΔE_ab clipping, linear lightness mapping and sigmoidal lightness mapping.
The gamut mapping module we selected is the ICC profile gamut and the gamut visualization is convex hull. In order to produce five levels of gamut mapping distortion, 5 different ICC profiles were chosen: PSO Coated v2 300 Glossy laminate (volume = 552537), PSO LWC Standard (volume = 457606), PSO MFC Paper (volume = 359510), ISO uncoated yellowish (volume = 204334), and ISO newspaper 26v4 (volume = 141632). A preview of these gamuts is given in Figure 39 and a histogram of the profile volumes is shown in Figure 40. 3D views from 3 angles of these 5 ICC profile gamuts are shown in Appendix B.

[Figure 38: Screenshot of ICC3D software.]

[Figure 39: Comparison between five ICC profile gamuts: (a) PSO Coated v2 300 Glossy laminate; (b) PSO LWC Standard; (c) PSO MFC Paper; (d) ISO uncoated yellowish; (e) ISO newspaper 26v4.]

[Figure 40: Histogram of ICC profile gamut volumes.]

4.3 Experiment setup

Viewing conditions

As discussed in Section 2.2.3, the viewing conditions are very important factors that need to be well controlled when conducting the experiment.

Experimental environment

The entire experiment took place in a normal laboratory. The environment is psychologically and physically comfortable, there is no noise while observers do the experiments, and it is an appropriate place to conduct such an experiment.

Viewing illumination

The ITU (International Telecommunication Union) [104] recommends that the room viewing illumination should be low. The CIE [20] recommends that, when measured at the face of the monitor with a cosine-corrected photometer and with the monitor switched off, the level of ambient illumination shall be less than 64 lux and should be less than 32 lux, and that the correlated colour temperature of the ambient illumination shall be less than or equal to that of the monitor white point. Following these recommendations, the experiments for the CID:IQ database are conducted in a dark laboratory with little illumination; the level of ambient illumination is 4.2 lux. Both ITU [104] and CIE [20] recommend that the chromaticity of the white displayed on the color monitor should approximate that of CIE Standard Illuminant D65, and the CIE [20] recommends that the luminance level of the white displayed on the monitor shall be greater than 75 cd/m². Our experimental setup follows these recommendations: the chromaticity of the white displayed on the color monitor is D65 and the luminance level of the monitor is 80 cd/m². All settings are suited to the sRGB color space.

Viewing distance

Using different viewing distances in separate experiment sessions will yield different results from observers, and it is a challenge for image quality metrics to evaluate image quality while taking different viewing distances into account.
So we decided to use a normal viewing distance and a longer distance in our experiment. The ITU [104] recommends a maximum observation angle relative to the normal of 30 degrees. Converted to a viewing distance in our case, the shortest distance should be 38 cm. As a result we use 50 cm (observation angle 23.4 degrees) and 100 cm (observation angle 12.2 degrees) as our viewing distances. The real experiment setup is shown in Figure 41.
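The two observation angles can be reproduced with basic trigonometry, assuming the relevant half-extent is half the width of the two side-by-side 800-pixel images at the display's 94 DPI (this geometric reading is my assumption; the thesis does not spell it out):

```python
import math

def observation_angle_deg(distance_cm, half_extent_cm):
    # Angle from the display normal subtended by half_extent at the given distance.
    return math.degrees(math.atan(half_extent_cm / distance_cm))

# Two 800-pixel images side by side at 94 DPI: 1600 px / 94 dpi * 2.54 cm/in wide.
half_extent = (1600 / 94) * 2.54 / 2  # about 21.6 cm

print(round(observation_angle_deg(50, half_extent), 1))   # 23.4
print(round(observation_angle_deg(100, half_extent), 1))  # 12.2
```

Under this assumption the computed angles match the 23.4 and 12.2 degree figures quoted in the text.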

[Figure 41: Real experiment setup: (a) viewing distance 50 cm; (b) viewing distance 100 cm; (c) real experiment setup.]

[Figure 42: The quality scales of the category judgment experiment: 1 bad quality, 3 poor quality, 5 fair quality, 7 good quality, 9 excellent quality.]

Psychophysical experiment

Psychophysical scaling approach

Category judgment is the scaling approach used for the psychophysical experiment because the number of images is very large (23 reference images x 6 distortions x 5 levels x 2 distances = 1380 images). Instead of using the five categories recommended by the CIE [20], we extended the scale to nine categories by simply adding one extra category between each pair of recommended categories (Figure 42). The reasons are: first, it is necessary to keep all categories equally spaced; second, recent studies discovered that some observers tend not to choose the end points (e.g. category one or category nine in a nine-category experiment); and since there are five levels of degradation, it is better to have more than 5 categories. In conclusion, a nine-category scale is applied.

Human observers

In total, 17 human observers participated in the psychophysical experiment, including both expert and non-expert observers. We define an expert as someone with an image processing, color imaging or related background, and a non-expert as someone without such a background. All observers took and passed a visual test, including a visual acuity test using a Snellen chart (Figure 43 (a)) and a color blindness test using an Ishihara chart (Figure 43 (b)).

Observer task instruction

The observer was asked to assign the distorted image shown on the right of the display to one of the 9 categories according to its image quality compared to the reference image displayed on the left of the screen. The following script was given to all observers: "This is a subjective experiment for developing the CID:IQ database.
In this experiment, 23 reference (original) images representing different scenes are selected. For each reference image, different distortion types have been added, and several degradation levels were applied for every distortion type. The experiment will be conducted at 2 viewing distances: 50 cm and 100 cm. In order to improve the accuracy and reliability of the results, the entire experiment will be divided into 4 sessions: 2 distances x 2 sets of images (first part 12 and second part 11 reference images). Each session will last about minutes.

[Figure 43: Tools used for the observer visual test: (a) Snellen chart; (b) Ishihara chart.]

[Figure 44: The structure of the training sequence, reproduced from ITU [104]: training sequences (results for these items are not processed), followed by the main part of the experiment.]

During the experiment, each time you will only see 2 images on the screen: the left image is the reference image and the right one is the distorted image that you need to judge. The 9 categories are shown below the images, from 1 (bad image quality) to 9 (excellent image quality). Your task is to assign the distorted image to one category. The criterion of judgment is image quality compared to the reference. Be aware that the differences between distortion levels may be very small, so please compare the details with the reference image, such as sharpness, contrast, colors, saturation, lightness, edges and so on. Remember that you cannot go back to change your opinion, so please make your judgments carefully."

Training sequences

A training sequence should be included at the beginning of the first session, as recommended by ITU [104]. In the training sequence, about five "dummy presentations" should be introduced to stabilize the observers' opinions. The data from this training sequence must not be taken into account in the results of the main experiment. If several sessions are needed, only three dummy presentations are necessary at the beginning of each following session. In our experiment, one dummy image, with all distortion types and all degradation levels, is included at the beginning of each session as the training sequence. The structure of the training sequence is shown in Figure 44.

Experimental hardware

The display is an EIZO ColorEdge CG 24.1-inch LCD monitor (Figure 45). The display port is DVI and the resolution is 1920 x 1200 (16:10 aspect ratio). The dots per inch (DPI) of the display is 94. The display has been calibrated using Eye-One Match. The CPU of the PC used for the experiment is an Intel(R) Core(TM)2 Quad. The RAM is 3 GB and

74 Figure 45: EIZO ColorEdge CG inch monitor. Reproduced from visited October the operating system is 32 bit Windows 7 Professional. Experimental software A Graphic User Interface (GUI) was designed for the category judgment experiment in Matlab. In the begnining the observer will see a welcome interface, after click the "Start" button the experiment starts (Figure 46 (a)). There are two images shown at the same time, the left one is the reference image and the right one is the distorted image. The 9 categories are given below the images (Figure 46 (b)). There is no default choice of category, when observer choose a category and click "OK" the distorted image will changes to the next one automatically. The order of image appearance is organised as follow: first, one reference image from 23 is randomly selected; second, one distortion type from 6 is selected for this reference image; at last, one distortion level for the selected distortion type is selected and shown to observer. The demonstration of image appearance order is given in Figure 47. CIE [20] recommended that the area immediately surrounding the displayed image and its border shall be neutral, preferably dark grey or black to minimise flare, and of approximately the same chromaticity as the white point of the monitor. By taking into account this recommendation we set the surrounding as well as its border color to mid-gray (RGB values: 128,128,128). After an observer finish one session both image appearance order data and category judgment results will be stored automatically. 4.4 Experiment results All the raw data from psychophysical experiment have been stored in Excel files and Matlab data files by observer. These results include: raw category judgment data; reference images display order; distortion types display order; and degradation levels display order from individual observer. 
The goal of using this image quality database is to plot the experimental results for all images (with all distortions and levels) in the same plot, and then calculate the correlation between the subjective results and the metrics, in order to evaluate the metrics. To obtain results in graphical and/or numerical form that represent the subjective results, ITU [104] recommends transforming the raw data to mean opinion scores. Category judgment results are converted to mean opinion scores as:

M = (1/N) Σ_{o=1}^{N} M_o    (4.1)

Figure 46: Screenshots of the experimental graphical user interface: (a) welcome interface; (b) experiment interface.

Figure 47: The demonstration of image appearance order: for one reference image, one distortion type is shown at 5 levels; this is repeated over the 6 distortion types for the same reference image, and then over all 23 reference images.

where M_o is the score of observer o (the score equals the chosen category: if an observer selects category 1 the score is 1, and if an observer selects category 9 the score is 9) and N is the number of observers. Mean opinion scores for each image, per distortion type and degradation level, are summarized in Appendix E. At this point the CID:IQ database is fully developed. It is also important to analyze the experiment in order to evaluate whether the results are as expected.
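The conversion in Equation 4.1 is a simple average of the observers' category scores. A minimal Python sketch (illustrative only; the thesis' own processing was done in Matlab):

```python
def mean_opinion_score(scores):
    """Mean opinion score (Eq. 4.1): the average of the category
    scores M_o given by the N observers for one distorted image."""
    return sum(scores) / len(scores)

# Hypothetical judgments from N = 5 observers (1 = bad ... 9 = excellent):
ratings = [3, 4, 3, 5, 4]
print(mean_opinion_score(ratings))  # 3.8
```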

5 Experiment results analysis

We can analyze the data in two ways: by distortion or by image. Using the distortion data makes it easy to compare how the subjective results are affected by the different distortions, while using the image data makes it simple to investigate the effect of the different images. In order to analyze the data from category judgment (following the method introduced in Section 2.2.2), all raw data have to be further transformed into the observer table shown in Table 9. The data analysis from the observer table to the final Z-scores was implemented in Matlab by Green and MacDonald [105]. This Matlab script calculates the scale values (Table 17), the boundary matrix (Table 16), and the confidence intervals (CI). Finally, the Z-scores are plotted as the results.

5.1 Distortions data analysis

In this sub-section, the distortion data are analyzed according to the Z-score plots. All plots are calculated from frequency tables over all observers and all images.

JPEG2000 compression distortion

Subjective results for JPEG2000 compression distortion are shown in Figure 48 and Table 28. The plot on the left in Figure 48 shows the Z-scores from the experiment conducted at the 50cm viewing distance and the plot on the right shows the Z-scores from the experiment at 100cm; the tables in Table 28 follow the same order. The Z-score results show that when observers evaluated image quality at 50cm they could easily identify the differences between the degradation levels, because none of the confidence intervals for the 5 levels overlap. The first level has been assigned to category 3 (the scale value is 0.54, which lies between boundary two (1.01) and boundary three (0.52), so this level belongs to category 3), which means the image quality is good, while the fifth level is located at category 6, meaning the image quality is better than poor but worse than fair.
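The core idea behind converting a category frequency table into scale values can be sketched as follows. This is a simplified, assumed reading of the law of categorical judgment (cumulative proportions mapped through the inverse normal CDF); the Green and MacDonald Matlab script referred to above is more complete and also produces boundaries and confidence intervals.

```python
from statistics import NormalDist

def category_scale_values(freq):
    """Simplified category-judgment scaling sketch.
    freq[i][j] = how often degradation level i was put in category j.
    Cumulative proportions at the category boundaries are converted to
    z-scores via the inverse normal CDF; each level's scale value is
    the mean of its boundary z-scores."""
    inv = NormalDist().inv_cdf
    scales = []
    for row in freq:
        n = sum(row)
        cum, z = 0, []
        for f in row[:-1]:                       # boundaries between categories
            cum += f
            p = min(max(cum / n, 0.005), 0.995)  # clip to avoid +/- infinity
            z.append(inv(p))
        scales.append(sum(z) / len(z))
    return scales

# Hypothetical 5-category table: level 0 rated mostly in the lower
# (better) categories, level 1 mostly in the higher (worse) ones.
freq = [[10, 5, 2, 0, 0],
        [0, 2, 5, 10, 0]]
scales = category_scale_values(freq)
assert scales[0] > scales[1]  # the better-rated level gets the larger scale value
```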
The five levels span 4 consecutive categories with no overlap, meaning there is a statistically significant difference between the levels, even though the difference is not obvious to identify. It is a challenge for image quality metrics to reproduce this correlation in their objective results. The plot on the right shows that the first three levels are quite difficult for observers to distinguish, because the scale values are very similar and the confidence intervals overlap. This differs from the experimental results at 50cm. In addition, the fifth level in the right plot is in category 5 (fair image quality), which is better than the same level in the left plot (category 6). This is caused by the increase in viewing distance. It has been verified that the quality of compressed images is highly dependent on the viewing distance: when the viewing distance increases, the distortion is less perceptible. Our experimental results confirm this. However, it is difficult for image quality metrics to model the human visual system well enough to give the same results, so the subjective data in our CID:IQ database have a significant advantage compared to other databases. Most likely, image quality metrics that take the human visual system into account and can simulate viewing distance will perform better on the CID:IQ database. Furthermore, for JPEG2000 compression distortion, the first level at both distances is located in the same category. This is most likely caused by the improved design principles of JPEG2000 compression introduced earlier. The promise of JPEG2000, "at high bit rates, artifacts become nearly imperceptible", is thereby confirmed as well.

Figure 48: Z-scores for JPEG2000 distortion. The underlying data are based on 17 observers and 23 images.

Table 28: Z-score scale, boundary, and confidence interval values for JPEG2000 distortion ((a) results at 50cm viewing distance; (b) results at 100cm viewing distance).

JPEG compression distortion

Subjective results for JPEG compression distortion, presented as Z-scores, are shown in Figure 49 and Table 29. There is again no overlap in the plot for the 50cm viewing distance, and only two scale values overlap in the right plot for 100cm. The results for JPEG compression distortion are similar to those for JPEG2000 compression, which confirms that the quality of compressed images is highly dependent on the viewing distance.

Figure 49: Z-scores for JPEG distortion. The underlying data are based on 17 observers and 23 images.

Table 29: Z-score scale, boundary, and confidence interval values for JPEG distortion ((a) results at 50cm viewing distance; (b) results at 100cm viewing distance).

Figure 50: Z-scores for Poisson noise distortion. The underlying data are based on 17 observers and 23 images.

Poisson noise distortion

Subjective results for Poisson noise distortion, presented as Z-scores, are shown in Figure 50 and Table 30. The Z-scores for the 50cm viewing distance show that the perceptual image quality of distortion level 1 is much better than the others, and observers can distinguish the different degradation levels. It is interesting to discover, however, that at the 100cm viewing distance the average scale value of level 2 is not only similar to that of level 1 but even higher, although the confidence intervals overlap. This means observers found it difficult to judge which of level 1 and level 2 has better image quality. This result is unexpected. Two hypotheses can explain it: first, when observers view a low-magnitude noisy image from 100cm it is difficult to see the noise itself, but the apparent sharpness of the image is increased by the noise pixels; second, from Figure 34 we know that Poisson noise slightly increases the brightness of the image, especially when viewed from far away. In general, the farther away an image is viewed, the more blurred it appears. Fairchild et al. [106] stated that additive uniform noise also increases sharpness up to a certain noise level, and then decreases it. Based on this statement, human observers would consider the image with higher apparent sharpness to be better. But the increase in sharpness at the first degradation level is too minor to be perceived, and the noise in the level 3 images is no longer imperceptible; that is why only the scale value at level 2 is abnormal. It is also possible that observers prefer images with higher brightness; the same reasoning as for the increase in sharpness applies to the increase in brightness, per the second hypothesis.
Some observers remarked that there is an increase in brightness for the Poisson noise images when they looked only at the noisy images.
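A minimal sketch of how Poisson (shot-like) noise can be applied to an image, assuming intensities in [0, 1]. The `peak` parameter is hypothetical, not the thesis' actual degradation-level setting: lower `peak` means stronger noise. Note that the Poisson mean equals the scaled intensity, so the raw expected value is unchanged before clipping; the perceived brightness increase discussed above is a visual effect rather than a change in the raw mean.

```python
import numpy as np

def add_poisson_noise(img, peak=30.0):
    """Illustrative Poisson noise: scale the image so that full
    intensity maps to `peak` expected counts, draw Poisson samples,
    and scale back into [0, 1]."""
    img = np.asarray(img, dtype=float)
    noisy = np.random.poisson(img * peak) / peak
    return np.clip(noisy, 0.0, 1.0)

flat = np.full((64, 64), 0.5)          # a flat mid-gray test patch
noisy = add_poisson_noise(flat, peak=10.0)
```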

Table 30: Z-score scale, boundary, and confidence interval values for Poisson noise distortion ((a) results at 50cm viewing distance; (b) results at 100cm viewing distance).

Table 31: Z-score scale, boundary, and confidence interval values for Gaussian blur distortion ((a) results at 50cm viewing distance; (b) results at 100cm viewing distance).

Gaussian blur distortion

Subjective results for Gaussian blur distortion, presented as Z-scores, are shown in Figure 51 and Table 31. We would like to emphasize that the Z-score scale values for level 4 and level 5 at the 100cm viewing distance differ only slightly, and their confidence intervals overlap. This is because the blurred images at level 4 have already reached a threshold beyond which observers cannot distinguish such a tiny difference when the viewing distance is larger than normal.

Figure 51: Z-scores for Gaussian blur distortion. The underlying data are based on 17 observers and 23 images.

Gamut mapping distortions

Subjective results for the SGCK gamut mapping distortion and the hue-angle preserving minimum ΔE*ab clipping gamut mapping distortion, presented as Z-scores, are shown in Figure 52 and Table 32. The distribution of the Z-score scale values for both the SGCK and the ΔE*ab gamut mapping distortions is quite similar at 50cm and 100cm, and none of the confidence intervals overlap. This indicates that the perceptual image quality of gamut mapping artifacts produced by these two algorithms is not influenced by the viewing distance. We would like to draw the reader's attention to the extreme categories covered by the two gamut mapping methods. For the SGCK gamut mapping distortion, the best category in the plots for both viewing distances is category 3 (at the 100cm viewing distance, the scale value is 1.09, which lies between boundary two (1.74) and boundary three (1.09)), but the best category at both viewing distances for the ΔE*ab gamut mapping distortion is category 4. Meanwhile, the worst category for SGCK is category 8, but the worst category for ΔE*ab is 9. Since the same ICC gamut profiles were used in both methods, we can conclude that the overall image quality of the SGCK gamut mapping approach is better than that of the hue-angle preserving minimum ΔE*ab clipping method.

5.2 Images data analysis

All the details of the Z-score plots, with tables of scale, boundary, and confidence interval values, are given in Appendix D. The plots are calculated from frequency tables over all observers. Most of the Z-score plots behave as expected, but the Z-scores of the gamut mapping distortions for Image 12 show an unexpected behavior. In this section we therefore analyze only the gamut mapping data for Image 12. The original image is shown in Figure 53; it is a jellyfish image with a large area of blue. Looking at the category judgment frequency table of the gamut mapping distortions for this image in Table 33, we can easily see that most of the frequency values fall in the first few categories, while none fall in categories 8 and 9. Because there are many zero values in the table, it is impossible to do the further calculations or to plot the Z-scores. In order to compute the results we have to add the value "1" equally at all levels in the frequency tables for both gamut mapping methods. The adjusted Z-score plots, with scale, boundary, and confidence interval values, are shown in Figure 54 and Table 34. As a result of this adjustment, all scale values are close to zero. This does not mean that the perceptual image quality of these images is better than that of images with lower scale values; it is only caused by the adjustment. This is also the reason we cannot use mean Z-score scale values, but need to calculate mean opinion scores instead (Equation 2.7).
This will also enable us to calculate the correlation value for all images and distortions in the same plot, without having scale differences.

Figure 52: Z-scores for gamut mapping distortion: (a) Z-scores for SGCK gamut mapping distortion; (b) Z-scores for ΔE*ab gamut mapping distortion. The underlying data are based on 17 observers and 23 images.
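The ΔE*ab measure underlying the clipping algorithm is simply the CIE 1976 color difference: Euclidean distance in CIELAB. A minimal sketch (the example coordinates are hypothetical, not values from the thesis):

```python
import math

def delta_e_ab(lab1, lab2):
    """CIE 1976 color difference ΔE*ab: Euclidean distance between
    two (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)

# A just-noticeable difference is often quoted as roughly 2-3 ΔE*ab units.
d = delta_e_ab((50.0, 10.0, -20.0), (50.0, 13.0, -16.0))
print(round(d, 2))  # 5.0
```

Minimum-ΔE*ab clipping maps each out-of-gamut color to the in-gamut color that minimizes this distance (while, in the hue-angle preserving variant, keeping the CIELAB hue angle fixed).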

Table 32: Z-score scale, boundary, and confidence interval values for the gamut mapping distortions ((a), (b): SGCK at 50cm and 100cm; (c), (d): ΔE*ab at 50cm and 100cm).

The problem with the Z-score here is that no matter how "bad" the image quality is, the results will always be centered around 0. If the observers give category 1 for all levels, the Z-score will be 0, and if they give category 9 to all images, the Z-score will be 0 as well. But the mean opinion score will be 1 in the first case and 9 in the second. This is a big difference, and important when quantifying the quality. The figure shows that most of the scale values are in the same category, representing bad image quality. This phenomenon is caused by the gamut of Image 12. The image gamut is shown from three different views in Figure 55. The image has a very small gamut, located only in the blue region. After applying gamut mapping to this image, most of the gamut values are compressed or clipped. If we compare the image gamut to the profile gamuts in Figure 56, we can see that even for the profile gamut with the largest volume the overlap is very small, and there is no overlap at all between the image gamut and the level-5 ICC profile gamut. As a result of this large image quality degradation, the observers gave the subjective results described above. There are also some interesting points in Figure 54. For SGCK gamut mapping at 50cm, level 1 is better than level 5, and there is a tendency for the quality to be lower with a smaller gamut. This is not as obvious at 100cm, but the tendency is still visible. For the ΔE*ab clipping gamut mapping at 100cm, this is not the case.
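The centering problem described above can be demonstrated with a toy calculation: a mean-centered (Z-score-like) representation of a constant response collapses to zero regardless of the absolute level, while the mean opinion score preserves it.

```python
def mean_opinion_score(scores):
    return sum(scores) / len(scores)

def centered(scores):
    """Mean-centered scores: a simplified stand-in for the
    within-image standardisation behind the Z-score scale."""
    m = mean_opinion_score(scores)
    return [s - m for s in scores]

all_best  = [1] * 17   # every observer picks category 1
all_worst = [9] * 17   # every observer picks category 9

# Centered values are identical (all zero) in both cases ...
assert centered(all_best) == centered(all_worst)
# ... but the mean opinion scores differ by the full scale range:
print(mean_opinion_score(all_best), mean_opinion_score(all_worst))  # 1.0 9.0
```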
5.3 An example of comparing subjective and objective results

We select a well-known and commonly used image quality metric, the Structural SIMilarity (SSIM) index [4], as the objective method. SSIM measures the similarity between two images, but the SSIM index can also be viewed as a quality measure of one of the images being compared, provided the other image is regarded as having perfect quality.
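As a rough illustration of what SSIM computes, here is a simplified single-window version in Python. The published metric [4] averages this statistic over local (e.g. 11x11 Gaussian-weighted) windows; treating the whole image as one window is a simplification for brevity. The constants follow the paper's defaults (K1 = 0.01, K2 = 0.03).

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window simplification of the SSIM index."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

img = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)  # synthetic test image
print(round(global_ssim(img, img), 4))  # 1.0 for identical images
```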

Figure 53: Image 12.

Table 33: Frequency tables for the gamut mapping distortions of Image 12: (a) frequency table for SGCK gamut mapping; (b) frequency table for ΔE*ab gamut mapping.

Figure 54: Z-scores for Image 12. The underlying data are based on 17 observers.

Table 34: Scale, boundary, and confidence interval values for Image 12 (SGCK and ΔE*ab gamut mapping, at 50cm and 100cm).

Figure 55: Gamut of Image 12: (a) from the +L view; (b) from the +a view; (c) from the -b view.

Figure 56: Gamut comparison between the Image 12 gamut and the 5 levels of ICC profile gamuts ((a) level 1; (b) level 2; (c) level 3; (d) level 4; (e) level 5).

Table 35: An example comparing SSIM results and Z-scores for levels 1-5 of JPEG2000 and Poisson noise (SSIM at both 50cm and 100cm; Z-scores at 50cm and at 100cm).

Table 36: Spearman and Pearson correlations between the objective and subjective results.

Two examples of subjective data are chosen: Image 1 with JPEG2000 distortion and with Poisson noise distortion. The SSIM index results and the Z-scores from our database are shown in Table 35 and Figure 57. From the table and figure we can see that the SSIM metric shows no difference between the 50cm and 100cm viewing distances (SSIM has no parameter for the viewing distance and always gives the same result), whereas the Z-scores show different tendencies. We then calculate the Spearman correlation [107] and the Pearson correlation [108] between the Z-scores and the SSIM results; the correlation values are given in Table 36. The larger the value, the higher the correlation between the subjective and objective results. As the table shows, for both JPEG2000 and Poisson noise at 50cm the Spearman correlation is 1 and the Pearson correlation is close to 1, showing that the Z-scores and the SSIM results correlate highly at 50cm for Image 1. At the 100cm viewing distance, however, the Spearman correlation for JPEG2000 drops to 0.9, and for Poisson noise it is even lower: 0.3. The same tendency is shown by the Pearson correlation. This means the SSIM metric does not take into account the influence of a larger viewing distance on perceptual image quality, and it then correlates poorly with human perception.
If we plot the Z-scores against the SSIM values (see Figure 58), we can also see that the points for the longer distance are more dispersed than the points for 50cm. This example also demonstrates that having different viewing distances in an image quality database is a challenge for some metrics.
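The Spearman and Pearson correlations used above can be computed as follows. This is an illustrative sketch with hypothetical scores, not the thesis' actual values; Spearman is implemented here as Pearson on the ranks, which matches the standard definition when there are no ties.

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman rank-order correlation: Pearson on the ranks.
    (Simple argsort ranking; it ignores ties, which is fine for
    strictly monotonic metric scores.)"""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return pearson(rx, ry)

# Hypothetical metric and subjective scores for 5 degradation levels:
ssim_scores = np.array([0.98, 0.93, 0.85, 0.72, 0.60])
z_scores    = np.array([0.54, 0.10, -0.35, -0.80, -1.20])
print(round(spearman(ssim_scores, z_scores), 2))  # 1.0 (same ordering)
```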

Figure 57: Comparison between SSIM index results and Z-scores: (a) SSIM results for JPEG2000 at both 50cm and 100cm; (b) Z-scores for JPEG2000 at 50cm and 100cm; (c) SSIM results for Poisson noise at both 50cm and 100cm; (d) Z-scores for Poisson noise at 50cm and 100cm.

Figure 58: Z-scores vs. SSIM for JPEG2000 and Poisson noise: (a) Z-scores vs. SSIM for JPEG2000; (b) Z-scores vs. SSIM for Poisson noise.

6 Conclusions and Further Work

6.1 Contributions

Through this study, the CID:IQ database has been successfully developed. Most importantly, the present work contains several innovations. By introducing new aspects of image quality database development, this work and its subjective results provide strong motivation for further research, not only regarding the possibilities of image quality databases, but also regarding the use of its methodology and workflow for experimental image design, the application of color distortions, and experiment management. Using the CID:IQ database to evaluate state-of-the-art image quality metrics brings the results into a new domain. The combination of objective and subjective reference-image assessment methods extends the applicability and performance evaluation of traditional experimental image design guidelines. The focus not only on the most commonly used distortion types but also on a widespread color-related distortion type, gamut mapping artifacts, provides a brand-new image subset for the evaluation of image quality metrics. Well-designed and controlled viewing conditions demonstrate how to prepare and manage this kind of psychophysical experiment so that the results are much more reliable and accurate, and applying two viewing distances poses a significant challenge for image quality metrics. The CID:IQ database will be published and made available to the image quality research field; everyone can use it to assess their image quality metrics free of charge for academic and research purposes.

6.2 Conclusions

Firstly, the study introduces current knowledge of image quality evaluation by explaining both objective and subjective methods. The examples of three psychophysical scaling methods detail each method and show which is most suitable for the CID:IQ database.
The introduction of essential considerations in evaluation experiments covers the most important aspects that need to be taken into account when preparing an experiment, and guides how to conduct a successful one. We conclude that the category judgment method is most suitable for the CID:IQ database because a large number of images need to be assessed in the experiment. Moreover, none of the existing image quality databases has applied different viewing distances in its experiments; for this reason, our CID:IQ database is strongly motivated to use two viewing distances when conducting the experiment. Secondly, a survey of existing image quality databases provides detailed information on all databases from 11 aspects, allowing us to investigate their advantages and disadvantages in order to formulate the aims and motivations of this study. A study of experimental image design methods opens a new window for our work to answer the research questions: "How to select reference images? How to judge whether they are suitable or not?". A survey of distortion types creates a structure of different distortions in different fields and guides us in selecting reasonable distortion types for this work. After the pre-study of the relevant knowledge, the CID:IQ database was successfully developed. Three innovations are realized in the CID:IQ database: first, it integrates state-of-the-art experimental image design methods and evaluation approaches, both objective and subjective, to create a new reference image set; second, gamut mapping distortions, which are brand-new color-related distortions for image quality databases, are used in the CID:IQ database; third, using two viewing distances in experiments conducted under well-controlled and unified conditions allows us to investigate the relationship between viewing distance and perceptual image quality, and also provides a new indicator for assessing the performance of image quality metrics. Furthermore, we extend the experimental image design guidelines proposed by Field [61] into a subjective evaluation table (Table 27) in order to assess whether all reference images cover the features recommended in [61]. The table shows that, except for black skin tones, the CID:IQ database covers all the features. We then use the objective evaluation methods of image colorfulness and spatial information recommended by Winkler [68], along with the busyness measure proposed by Orfanidou [70]. The evaluation results from these objective approaches show that our reference images are significantly better than the images in existing databases. Six distortions are selected for the CID:IQ database: JPEG2000 compression, JPEG compression, Poisson noise, Gaussian blur, SGCK gamut mapping, and ΔE*ab gamut mapping. The first 4 are common distortions that have been used in most existing databases, although Poisson noise is a newer type of noise that appears in a wide variety of applications. Gamut mapping distortions are applied only in our CID:IQ database, because existing databases lack color-related distortions.
Therefore, the CID:IQ database has the capability to evaluate image quality metrics in a new domain. The rigorously managed and controlled experiments establish a benchmark for how to conduct a subjective psychophysical experiment in the field of image quality. The experiments follow the viewing conditions recommended by CIE [20] and ITU [104], extended to two viewing distances, which makes it possible to investigate the correlation between perceptual image quality and viewing distance. It is a big challenge for image quality metrics to produce the same results as human observers. Instead of using Mean Opinion Scores or Difference Mean Opinion Scores directly, the CID:IQ database uses the category judgment approach to produce the experimental results, and Z-scores are finally used to present the subjective results. The category judgment method is most suitable for this work because a large number of images need to be judged, and presenting the results as Z-scores makes the data more intuitive and easier for readers to understand.

6.3 Further Work

The current work only initially completes the development of the CID:IQ database, and suggests extensive possibilities for further study. As a first step, further work should start by adding images with black skin tones and images that contain cyan. From Figure 29 we know that it is necessary to include images with high colorfulness and high spatial information values, while images with low colorfulness and low spatial information values are also required. In addition, images representing low busyness are needed as well. If all of the above points are achieved, the quality of the reference image set in the CID:IQ database will improve further.

Regarding the distortion types in this work, more color-related distortions are still required; for example, a database containing only different gamut mapping distortions would be desirable. Moreover, multiple types of distortion should also be applied to the reference images. Image quality metrics have mostly been used to assess images containing a single type of distortion; however, some image processing procedures give rise to images containing multiple types of distortion simultaneously. This brings yet another level of difficulty for image quality metrics: an ideal metric must not only take into account the effects of each degradation on the images, but also consider the effects of the distortions on each other. In addition, a well-known shortcoming of image quality metrics is that they cannot predict geometric distortions very well, such as translation, scaling, rotation, shearing, or changes in viewpoint [109]. If these geometric changes are small, they normally have a tiny impact on perceptual image quality, but they cause big changes in pixel intensities; as a result, most image quality metrics rate such distorted images much lower than the actual subjective results indicate. In summary, these issues should be addressed by applying new distortions to the same set of images and conducting additional experiments.

The Z-scores for the Poisson noise distortion show an abnormal relationship between perceptual image quality and noise. In order to find the exact explanation for this, dedicated experiments covering a wider range of noise types are strongly recommended; such experiments should use more types of noise and more viewing distances.
Finally, the methodology used to create the CID:IQ database provides the possibility and knowledge to develop other types of image quality databases, for example 3D image quality databases, databases of enhanced rather than degraded images, and multispectral image quality databases. This work demonstrates that this is possible.


Bibliography

[1] Ponomarenko, N., Battisti, F., Egiazarian, K., Astola, J., & Lukin, V. Metrics performance comparison for color image database. In 4th International Workshop on Video Processing and Quality Metrics for Consumer Electronics, Scottsdale.
[2] Ponomarenko, N., Lukin, V., Zelensky, A., Egiazarian, K., Carli, M., & Battisti, F. TID2008 - a database for evaluation of full-reference visual quality assessment metrics. Advances of Modern Radioelectronics, 10(4).
[3] Ponomarenko, N., Lukin, V., Zelensky, A., Egiazarian, K., Carli, M., & Battisti, F. Tampere image database 2008 (TID2008). (Visited July 2013).
[4] Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. Image quality assessment: From error visibility to structural similarity. Image Processing, IEEE Transactions on, 13(4).
[5] Sheikh, H. R., Sabir, M. F., & Bovik, A. C. A statistical evaluation of recent full reference image quality assessment algorithms. Image Processing, IEEE Transactions on, 15(11).
[6] Sheikh, H., Wang, Z., Cormack, L., & Bovik, A. LIVE image quality assessment database release 2. (Visited July 2013).
[7] Tourancheau, S., Autrusseau, F., Sazzad, Z. M. P., & Horita, Y. MICT image quality evaluation database. (Visited July 2013).
[8] Engeldrum, P. G. Psychometric scaling: a toolkit for imaging systems development. Imcotek Press.
[9] Janssen, R. Computational image quality, volume 101. SPIE Press.
[10] Jacobson, R. An evaluation of image quality metrics. Journal of Photographic Science, 43(1).
[11] Keelan, B. Handbook of image quality: characterization and prediction. CRC Press.
[12] Fechner, G. T. Elemente der Psychophysik, volume 2. Breitkopf & Härtel.
[13] Pedersen, M. Image quality metrics for the evaluation of printing workflows.
[14] Thurstone, L. L. A law of comparative judgment. Psychological Review, 34(4).

[15] Silverstein, D. A. & Farrell, J. E. Quantifying perceptual image quality. In PICS, volume 98.
[16] Bartleson, C. & Grum, F. Optical radiation measurements, visual measurements, vol. 5.
[17] Cui, C. Comparison of two psychophysical methods for image color quality measurement: Paired comparison and rank order. In Color and Imaging Conference, volume 2000. Society for Imaging Science and Technology.
[18] Parducci, A. Category judgment: a range-frequency model. Psychological Review, 72(6), 407.
[19] Pedersen, M. & Hardeberg, J. Y. Survey of full-reference image quality metrics.
[20] CIE. Guidelines for the evaluation of gamut mapping algorithms. Technical Report, CIE TC8-03, 156:2004.
[21] Marini, E., Autrusseau, F., & Le Callet, P. Evaluation of standard watermarking techniques. autrusse/databases/enrico/. (Visited July 2013).
[22] Autrusseau, F. & Bas, P. Subjective quality assessment of the broken arrows watermarking technique. autrusse/databases/brokenarrows/. (Visited July 2013).
[23] Carosi, M., Pankajakshan, V., & Autrusseau, F. Toward a simplified perceptual quality metric for watermarking applications. In IS&T/SPIE Electronic Imaging, 75420D. International Society for Optics and Photonics.
[24] Carosi, M., Pankajakshan, V., & Autrusseau, F. Fourier subband database. autrusse/databases/fouriersb/. (Visited July 2013).
[25] Autrusseau, F. & Meerwald, P. DT-CWT versus DWT watermark embedding. autrusse/databases/meerwalddb/. (Visited July 2013).
[26] Engelke, U., Kusuma, M., Zepernick, H.-J., & Caldera, M. Reduced-reference metric design for objective perceptual quality assessment in wireless imaging. Signal Processing: Image Communication, 24(7).
[27] Engelke, U., Zepernick, H., & Kusuma, M. Wireless imaging quality database. (Visited July 2013).
[28] Chandler, D. M. & Hemami, S. S. Online supplement to VSNR: A visual signal-to-noise ratio for natural images based on near-threshold and suprathreshold vision.
[29] Chandler, D. M. & Hemami, S. S. A57 image quality database. http://foulard.ece.cornell.edu/dmc27/vsnr/vsnr.html#loc1. (Visited July 2013).

95 [30] Tourancheau, S., Autrusseau, F., Sazzad, Z. P., & Horita, Y Impact of subjective dataset on the performance of image quality metrics. In Image Processing, ICIP th IEEE International Conference on, IEEE. [31] Tourancheau, S., Autrusseau, F., Sazzad, Z. P., & Horita, Y Presentation of the images and associated subjective scores of the irccyn/ivc scores on toyama still images database. (Visited July. 2013). [32] Bosc, E., Pepion, R., Le Callet, P., Koppel, M., Ndjiki-Nya, P., Pressigout, M., & Morin, L Towards a new quality metric for 3-d synthesized view assessment. Selected Topics in Signal Processing, IEEE Journal of, 5(7), [33] Alexandre, B., Patrick, L. C., Patrizio, C., & Romain, C Quality assessment of stereoscopic images. EURASIP journal on image and video processing, [34] Alexandre, B., Patrick, L. C., Patrizio, C., & Romain, C Irccyn/ivc 3d images dataset. (Visited July. 2013). [35] Strauss, C., Pasteau, F., Autrusseau, F., Babel, M., Bédat, L., & Déforges, O Subjective and objective quality evaluation of lar coded art images. In Multimedia and Expo, ICME IEEE International Conference on, IEEE. [36] Autrusseau, F. & Babel, M Subjective quality assessment - lar database. autrusse/databases/. (Visited July. 2013). [37] Engelke, U., Maeder, A., & Zepernick, H.-J Visual attention modelling for subjective image quality databases. In Multimedia Signal Processing, MMSP 09. IEEE International Workshop on, 1 6. IEEE. [38] Liu, H., Klomp, N., & Heynderickx, I Tud image quality database: perceived ringing. (Visited July. 2013). [39] De Simone, F., Goldmann, L., Baroncini, V., & Ebrahimi, T Subjective evaluation of jpeg xr image compression. SPIE Applications of Digital Image Processing XXXII, [40] De Simone, F., Goldmann, L., Baroncini, V., & Ebrahimi, T Jpeg core experiment for the evaluation of jpeg xr image. (Visited July. 2013). [41] AIC & AhG Jpeg xr subjective assessment: Core experiments description 4.1. Tech. Rep. WG1N5001, ISO/IEC JTC1/SC29/WG1 (JPEG). 
[42] Goldmann, L., De Simone, F., & Ebrahimi, T Impact of acquisition distortions on the quality of stereoscopic images. In Fifth International Workshop on Video Processing and Quality Metrics for Consumer Electronics-VPQM Citeseer. [43] Goldmann, L., De Simone, F., & Ebrahimi, T Mmsp 3d image quality assessment. (Visited July. 2013). 81

96 [44] Liu, H., Wang, J., Redi, J., Le Callet, P., & Heynderickx, I An efficient no-reference metric for perceived blur. In Visual Information Processing (EUVIP), rd European Workshop on, IEEE. [45] Vu, C., Phan, T., Singh, P., & Chandler, D. M Digitally retouched image quality (driq) database. (Visited July. 2013). [46] Franzen, R Kodak lossless true color image suite. (Visited July. 2013). [47] Ponomarenko, N., Ieremeiev, O., Lukin, V., Egiazarian, K., Jin, L., Astola, J., Vozel, B., Chehdi, K., Carli, M., Battisti, F., et al Color image database tid2013: Peculiarities and preliminary results. In 4th European Workshop on Visual Information Processing, Paris. [48] Ponomarenko, N., Ieremeiev, O., Lukin, V., Egiazarian, K., Jin, L., Astola, J., Vozel, B., Chehdi, K., Carli, M., Battisti, F., et al Tampere image database 2013 (tid2013). (Visited July. 2013). [49] Larson, E. C. & Chandler, D. M Most apparent distortion: full-reference image quality assessment and the role of strategy. Journal of Electronic Imaging, 19(1), [50] Larson, E. C. & Chandler, D Categorical subjective image quality csiq database. (Visited July. 2013). [51] Le Callet, P. & Autrusseau, F Subjective quality assessment irccyn/ivc database. (Visited July. 2013). [52] Zarić, A., Tatalović, N., Brajković, N., Hlevnjak, H., Lončarić, M., Dumić, E., & Grgić, S Vclfer image quality assessment database. AUTOMATIKA: časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije, 53(4), [53] Zarić, A., Tatalović, N., Brajković, N., Hlevnjak, H., Lončarić, M., Dumić, E., & Grgić, S Vclfer image quality assessment database. (Visited July. 2013). [54] Le Callet, P. & Barba, D A robust quality metric for color image quality assessment. In Image Processing, ICIP Proceedings International Conference on, volume 1, I 437. IEEE. [55] Pedersen, M., Bonnier, N., Hardeberg, J. Y., & Albregtsen, F Estimating print quality attributes by image quality metrics. 
In Color and Imaging Conference, volume 2010, Society for Imaging Science and Technology. [56] Pedersen, M., Zheng, Y., & Hardeberg, J. Y Evaluation of image quality metrics for color prints. In Image Analysis, Springer. [57] Simone, G., Caracciolo, V., Pedersen, M., & Cheikh, F. A Evaluation of a difference of gaussians based image difference metric in relation to perceived compression artifacts. In Advances in Visual Computing, Springer. 82

97 [58] Cao, G., Pedersen, M., & Baranczuk, Z Saliency models as gamut-mapping artifact detectors. In Conference on Colour in Graphics, Imaging, and Vision, volume 2010, Society for Imaging Science and Technology. [59] Keelan, B. W. & Urabe, H Iso 20462: a psychophysical image quality measurement standard. In Electronic Imaging 2004, International Society for Optics and Photonics. [60] ISO Iso photography - psychophysical experimental methods to estimate image quality - part 1: Overview of psychophysical elements. [61] Field, G. G Test image design guidelines for color quality evaluation. In Color and Imaging Conference, volume 1999, Society for Imaging Science and Technology. [62] McCamy, C. S., Marcus, H., & Davidson, J A color-rendition chart. J. App. Photog. Eng, 2(3), [63] Standard. Iso Graphic Technology Prepress Digital Data Exchange. [64] Standard. Iso Part 1: Graphic Technology Prepress Digital Data Exchange CMYK standard colour image data (CMYK/SCID). [65] Standard. Iso Part 2: Graphic Technology Prepress Digital Data Exchange XYZ/sRGB encoded standard colour image data (XYZ/SCID). [66] Standard. Iso Part 3: Graphic Technology Prepress Digital Data Exchange CIELAB standard colour image data (CIELAB/SCID). [67] Standard. Iso Part 4: Graphic Technology Prepress Digital Data Exchange Wide gamut display-referred standard colour image data [Adobe RGB (1998)/SCID]. [68] Winkler, S Analysis of public image and video databases for quality assessment. [69] Hasler, D. & Susstrunk, S Measuring colourfulness in natural images. In Proc. SPIE, volume 5007, [70] Orfanidou, M., Triantaphillidou, S., & Allen, E Predicting image quality using a modular image difference model. In Electronic Imaging 2008, 68080F 68080F. International Society for Optics and Photonics. [71] Pedersen, M. & Hardeberg, J. Y Full-reference image quality metrics: classification and evaluation. Now Publishers, Incorporated. [72] Chou, C.-H. 
& Liu, K.-C A fidelity metric for assessing visual quality of color images. In Computer Communications and Networks, ICCCN Proceedings of 16th International Conference on, IEEE. [73] Gayle, D., Mahlab, H., Ucar, Y., & Eskicioglu, A. M A full-reference color image quality measure in the dwt domain. In European Signal Processing Conference, EUSIPCO, 4. Citeseer. 83

98 [74] Toet, A. & Lucassen, M. P A new universal colour image fidelity metric. Displays, 24(4), [75] Wang, Z. & Bovik, A. C A universal image quality index. Signal Processing Letters, IEEE, 9(3), [76] Ajagamelle, S. A., Pedersen, M., & Simone, G Analysis of the difference of gaussians model in image difference metrics. In Conference on Colour in Graphics, Imaging, and Vision, volume 2010, Society for Imaging Science and Technology. [77] Johnson, G. M. & Fairchild, M. D Darwinism of color image difference models. In Color and Imaging Conference, volume 2001, Society for Imaging Science and Technology. [78] Bonnier, N., Schmitt, F., Brettel, H., & Berche, S Evaluation of spatial gamut mapping algorithms. In Color and Imaging Conference, volume 2006, Society for Imaging Science and Technology. [79] Zhang, L., Zhang, L., Mou, X., & Zhang, D Fsim: a feature similarity index for image quality assessment. Image Processing, IEEE Transactions on, 20(8), [80] Munkberg, J., Clarberg, P., Hasselgren, J., & Akenine-Moller, T High dynamic range texture compression for graphics hardware. In ACM Transactions on Graphics (TOG), volume 25, ACM. [81] Pedersen, M. & Hardeberg, J. Y A new spatial hue angle metric for perceptual image difference. In Computational Color Imaging, Springer. [82] Kimmel, R., Shaked, D., Elad, M., & Sobel, I Space-dependent color gamut mapping: A variational approach. Image Processing, IEEE Transactions on, 14(6), [83] Brooks, A. C., Zhao, X., & Pappas, T. N Structural similarity quality metrics in a coding context: exploring the space of realistic distortions. Image Processing, IEEE Transactions on, 17(8), [84] Wang, Z. & Hardeberg, J. Y An adaptive bilateral filter for predicting color image difference. In Color and Imaging Conference, volume 2009, Society for Imaging Science and Technology. [85] Group. Jpeg standard. Joint Photographic Experts Group and others. (Visited October. 2013). [86] Group Jpeg2000 image compression standard. (Visited October. 
2013). [87] Group The jpeg committee home page (2007). (Visited October. 2013). [88] Pennebaker, W. B. & Mitchell, J. L Jpeg still image data compression standard. New York: Van Nostrand Reinhold, 1993, 1. 84

99 [89] Group Free library for jpeg image compression. (Visited October. 2013). [90] Software. Jpeg2000 developer toolkit. (Visited October. 2013). [91] Pratt, W Digital image processing: Piks scientific inside. [92] Gonzalez, R. & Richard, E. Re woods, 2007, digital image processing. [93] Healey, G. E. & Kondepudy, R Radiometric ccd camera calibration and noise estimation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(3), [94] Faraji, H. & MacLean, W. J Ccd noise removal in digital images. Image Processing, IEEE Transactions on, 15(9), [95] Trussell, H. J. & Zhang, R The dominance of poisson noise in color digital cameras. In Image Processing (ICIP), th IEEE International Conference on, IEEE. [96] Wikipedia. Gaussian function. (Visited October. 2013). [97] Wikipedia. Cie 1931 color space. chromaticity-diagram-the-cie-xy-chromaticity-diagram-and-the-cie-xyy-colorspace. (Visited October. 2013). [98] Stone, M. C., Cowan, W. B., & Beatty, J. C Color gamut mapping and the printing of digital color images. ACM Transactions on Graphics (TOG), 7(4), [99] Dugay, F. Perceptual evaluation of colour gamut mapping algorithms. Master s thesis, Gjøvik University College and Grenoble Institute of Technology, [100] Levoy, M., Willett, N., & Adams, A. Gamut mapping. (Visited October. 2013). [101] Morovic, J. To develop a universal gamut mapping algorithm. PhD thesis, Phd Thesis, [102] Braun, G. J. & Fairchild, M. D General-purpose gamut-mapping algorithms: evaluation of contrast-preserving rescaling functions for color gamut mapping. In Color and Imaging Conference, volume 1999, Society for Imaging Science and Technology. [103] Software. Icc3d-a color management application. (Visited October. 2013). [104] BT, I.-R. R methodology for the subjective assessment of the quality of television pictures. International Telecommunication Union, Geneva, Switzerland,

100 [105] Green, P. & MacDonald, L Colour engineering: achieving device independent colour, volume 30. Wiley. com. [106] Fairchild, M. & Johnson, G Sharpness rules. In Proceeding of the 8th IS&T/SID Color Imaging Conferences, [107] Spearman s rank correlation coefficient. srank-correlation-coefficient. (Visited October. 2013). [108] Pearson product-moment correlation coefficient. (Visited October. 2013). [109] Chandler, D. M Seven challenges in image quality assessment: past, present, and future research. ISRN Signal Processing,

A Reference image statistics

[Figures: for each of the 23 reference images, five panels are shown: (a) the reference image, (b) its histogram of pixel counts over RGB values, and (c)-(e) its image gamut viewed from the +L, +a, and -b directions.]
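The per-image histograms in panel (b) are plain pixel-count histograms over RGB values. A minimal sketch of such a computation (assuming numpy; `rgb_histogram` is an illustrative helper, not code from the thesis, and pooling the three channels into one histogram is an assumption about the plots):

```python
import numpy as np

def rgb_histogram(image, bins=256):
    # Pixel-count histogram over all RGB values (0-255), pooling the
    # three channels together, as in the panel (b) plots.
    counts, _ = np.histogram(image.ravel(), bins=bins, range=(0, 256))
    return counts

# Tiny synthetic stand-in for a reference image: any HxWx3 uint8 array works.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 255  # pure red image
hist = rgb_histogram(img)
```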

B ICC profile gamuts

[Figures: the gamuts of five ICC profiles (PSO Coated v2 300 Glossy laminate, PSO LWC Standard, PSO MFC Paper, ISO uncoated yellowish, and ISO newspaper 26v4), each viewed from the +L, +a, and -b directions.]

C Frequency tables

[Tables: for each of the 23 reference images, frequency data for each distortion type (JPEG 2000 compression, JPEG compression, Poisson noise, Gaussian blur, SGCK gamut mapping, and Delta E gamut mapping) at each of ten distortion levels.]
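Each distortion appears in the database at ten graded levels. As a rough illustration of how graded levels can be produced for one of the distortions, Gaussian blur, here is a sketch using a separable Gaussian filter (the sigma values are illustrative assumptions, not the parameters used in CID:IQ):

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    # Separable 1-D Gaussian kernel, normalized to sum to 1.
    if radius is None:
        radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    # Blur a 2-D (grayscale) image by convolving columns then rows
    # with the same 1-D kernel (separable filtering).
    k = gaussian_kernel1d(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, image)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)
    return out

# Ten distortion levels of increasing strength (illustrative sigmas only).
np.random.seed(0)
sigmas = np.linspace(0.5, 5.0, 10)
img = np.random.rand(32, 32)
levels = [gaussian_blur(img, s) for s in sigmas]
```

Stronger blur removes more high-frequency content, so later levels have lower pixel variance than earlier ones.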

D Subjective results

[Figures: z-scores of the subjective quality ratings for each of the 23 reference images, one plot per image.]
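Z-scores are the standard way to pool category ratings across observers: each observer's raw scores are standardized to zero mean and unit variance, then averaged per stimulus. A minimal sketch of this pooling (the thesis's exact normalization is not reproduced here, so this per-observer standardization is an assumption):

```python
import numpy as np

def zscores(ratings):
    # Standardize each observer's ratings (rows) to zero mean and unit
    # variance, then average across observers per stimulus (columns).
    ratings = np.asarray(ratings, dtype=float)
    mu = ratings.mean(axis=1, keepdims=True)
    sd = ratings.std(axis=1, keepdims=True)
    per_observer = (ratings - mu) / sd
    return per_observer.mean(axis=0)

# Three observers rating four distorted versions on a 1-5 category scale.
raw = [[1, 2, 4, 5],
       [2, 3, 4, 5],
       [1, 3, 3, 5]]
z = zscores(raw)
```

Standardizing per observer removes individual differences in how each person uses the rating scale before the scores are combined.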


More information

MULTI-SCALE STRUCTURAL SIMILARITY FOR IMAGE QUALITY ASSESSMENT. (Invited Paper)

MULTI-SCALE STRUCTURAL SIMILARITY FOR IMAGE QUALITY ASSESSMENT. (Invited Paper) MULTI-SCALE STRUCTURAL SIMILARITY FOR IMAGE QUALITY ASSESSMENT Zhou Wang 1, Eero P. Simoncelli 1 and Alan C. Bovik 2 (Invited Paper) 1 Center for Neural Sci. and Courant Inst. of Math. Sci., New York Univ.,

More information

COLOR FIDELITY OF CHROMATIC DISTRIBUTIONS BY TRIAD ILLUMINANT COMPARISON. Marcel P. Lucassen, Theo Gevers, Arjan Gijsenij

COLOR FIDELITY OF CHROMATIC DISTRIBUTIONS BY TRIAD ILLUMINANT COMPARISON. Marcel P. Lucassen, Theo Gevers, Arjan Gijsenij COLOR FIDELITY OF CHROMATIC DISTRIBUTIONS BY TRIAD ILLUMINANT COMPARISON Marcel P. Lucassen, Theo Gevers, Arjan Gijsenij Intelligent Systems Lab Amsterdam, University of Amsterdam ABSTRACT Performance

More information

SIMULATIVE ANALYSIS OF EDGE DETECTION OPERATORS AS APPLIED FOR ROAD IMAGES

SIMULATIVE ANALYSIS OF EDGE DETECTION OPERATORS AS APPLIED FOR ROAD IMAGES SIMULATIVE ANALYSIS OF EDGE DETECTION OPERATORS AS APPLIED FOR ROAD IMAGES Sukhpreet Kaur¹, Jyoti Saxena² and Sukhjinder Singh³ ¹Research scholar, ²Professsor and ³Assistant Professor ¹ ² ³ Department

More information

How Many Humans Does it Take to Judge Video Quality?

How Many Humans Does it Take to Judge Video Quality? How Many Humans Does it Take to Judge Video Quality? Bill Reckwerdt, CTO Video Clarity, Inc. Version 1.0 A Video Clarity Case Study page 1 of 5 Abstract for Subjective Video Quality Assessment In order

More information

STUDY ON DISTORTION CONSPICUITY IN STEREOSCOPICALLY VIEWED 3D IMAGES

STUDY ON DISTORTION CONSPICUITY IN STEREOSCOPICALLY VIEWED 3D IMAGES STUDY ON DISTORTION CONSPICUITY IN STEREOSCOPICALLY VIEWED 3D IMAGES Ming-Jun Chen, 1,3, Alan C. Bovik 1,3, Lawrence K. Cormack 2,3 Department of Electrical & Computer Engineering, The University of Texas

More information

MAXIMIZING BANDWIDTH EFFICIENCY

MAXIMIZING BANDWIDTH EFFICIENCY MAXIMIZING BANDWIDTH EFFICIENCY Benefits of Mezzanine Encoding Rev PA1 Ericsson AB 2016 1 (19) 1 Motivation 1.1 Consumption of Available Bandwidth Pressure on available fiber bandwidth continues to outpace

More information

Consistent Colour Appearance: proposed work at the NTNU ColourLab, Gjøvik, Norway

Consistent Colour Appearance: proposed work at the NTNU ColourLab, Gjøvik, Norway Consistent Colour Appearance: proposed work at the NTNU ColourLab, Gjøvik, Norway Gregory High The Norwegian Colour and Visual Computing Laboratory Faculty of Information Technology and Electrical Engineering

More information

An Introduction to Content Based Image Retrieval

An Introduction to Content Based Image Retrieval CHAPTER -1 An Introduction to Content Based Image Retrieval 1.1 Introduction With the advancement in internet and multimedia technologies, a huge amount of multimedia data in the form of audio, video and

More information

Image Quality Assessment based on Improved Structural SIMilarity

Image Quality Assessment based on Improved Structural SIMilarity Image Quality Assessment based on Improved Structural SIMilarity Jinjian Wu 1, Fei Qi 2, and Guangming Shi 3 School of Electronic Engineering, Xidian University, Xi an, Shaanxi, 710071, P.R. China 1 jinjian.wu@mail.xidian.edu.cn

More information

DCT-BASED IMAGE QUALITY ASSESSMENT FOR MOBILE SYSTEM. Jeoong Sung Park and Tokunbo Ogunfunmi

DCT-BASED IMAGE QUALITY ASSESSMENT FOR MOBILE SYSTEM. Jeoong Sung Park and Tokunbo Ogunfunmi DCT-BASED IMAGE QUALITY ASSESSMENT FOR MOBILE SYSTEM Jeoong Sung Park and Tokunbo Ogunfunmi Department of Electrical Engineering Santa Clara University Santa Clara, CA 9553, USA Email: jeoongsung@gmail.com

More information

Image and Video Quality Assessment Using Neural Network and SVM

Image and Video Quality Assessment Using Neural Network and SVM TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 18/19 pp112-116 Volume 13, Number 1, February 2008 Image and Video Quality Assessment Using Neural Network and SVM DING Wenrui (), TONG Yubing (), ZHANG Qishan

More information

Lab 9. Julia Janicki. Introduction

Lab 9. Julia Janicki. Introduction Lab 9 Julia Janicki Introduction My goal for this project is to map a general land cover in the area of Alexandria in Egypt using supervised classification, specifically the Maximum Likelihood and Support

More information

Visual Evaluation and Evolution of the RLAB Color Space

Visual Evaluation and Evolution of the RLAB Color Space Visual Evaluation and Evolution of the RLAB Color Space Mark D. Fairchild Munsell Color Science Laboratory, Center for Imaging Science Rochester Institute of Technology, Rochester, New York Abstract The

More information

NO-REFERENCE IMAGE QUALITY ASSESSMENT ALGORITHM FOR CONTRAST-DISTORTED IMAGES BASED ON LOCAL STATISTICS FEATURES

NO-REFERENCE IMAGE QUALITY ASSESSMENT ALGORITHM FOR CONTRAST-DISTORTED IMAGES BASED ON LOCAL STATISTICS FEATURES NO-REFERENCE IMAGE QUALITY ASSESSMENT ALGORITHM FOR CONTRAST-DISTORTED IMAGES BASED ON LOCAL STATISTICS FEATURES Ismail T. Ahmed 1, 2 and Chen Soong Der 3 1 College of Computer Science and Information

More information

Chapter Two: Descriptive Methods 1/50

Chapter Two: Descriptive Methods 1/50 Chapter Two: Descriptive Methods 1/50 2.1 Introduction 2/50 2.1 Introduction We previously said that descriptive statistics is made up of various techniques used to summarize the information contained

More information

Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space

Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space Orlando HERNANDEZ and Richard KNOWLES Department Electrical and Computer Engineering, The College

More information

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. + What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and

More information

An ICA based Approach for Complex Color Scene Text Binarization

An ICA based Approach for Complex Color Scene Text Binarization An ICA based Approach for Complex Color Scene Text Binarization Siddharth Kherada IIIT-Hyderabad, India siddharth.kherada@research.iiit.ac.in Anoop M. Namboodiri IIIT-Hyderabad, India anoop@iiit.ac.in

More information

Week 7 Picturing Network. Vahe and Bethany

Week 7 Picturing Network. Vahe and Bethany Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups

More information

Predicting connection quality in peer-to-peer real-time video streaming systems

Predicting connection quality in peer-to-peer real-time video streaming systems Predicting connection quality in peer-to-peer real-time video streaming systems Alex Giladi Jeonghun Noh Information Systems Laboratory, Department of Electrical Engineering Stanford University, Stanford,

More information

Evaluation of Two Principal Approaches to Objective Image Quality Assessment

Evaluation of Two Principal Approaches to Objective Image Quality Assessment Evaluation of Two Principal Approaches to Objective Image Quality Assessment Martin Čadík, Pavel Slavík Department of Computer Science and Engineering Faculty of Electrical Engineering, Czech Technical

More information

UNIBALANCE Users Manual. Marcin Macutkiewicz and Roger M. Cooke

UNIBALANCE Users Manual. Marcin Macutkiewicz and Roger M. Cooke UNIBALANCE Users Manual Marcin Macutkiewicz and Roger M. Cooke Deflt 2006 1 1. Installation The application is delivered in the form of executable installation file. In order to install it you have to

More information

Lecture 5: Compression I. This Week s Schedule

Lecture 5: Compression I. This Week s Schedule Lecture 5: Compression I Reading: book chapter 6, section 3 &5 chapter 7, section 1, 2, 3, 4, 8 Today: This Week s Schedule The concept behind compression Rate distortion theory Image compression via DCT

More information

MIXDES Methods of 3D Images Quality Assesment

MIXDES Methods of 3D Images Quality Assesment Methods of 3D Images Quality Assesment, Marek Kamiński, Robert Ritter, Rafał Kotas, Paweł Marciniak, Joanna Kupis, Przemysław Sękalski, Andrzej Napieralski LODZ UNIVERSITY OF TECHNOLOGY Faculty of Electrical,

More information

Characterizing and Controlling the. Spectral Output of an HDR Display

Characterizing and Controlling the. Spectral Output of an HDR Display Characterizing and Controlling the Spectral Output of an HDR Display Ana Radonjić, Christopher G. Broussard, and David H. Brainard Department of Psychology, University of Pennsylvania, Philadelphia, PA

More information

Sensor Modalities. Sensor modality: Different modalities:

Sensor Modalities. Sensor modality: Different modalities: Sensor Modalities Sensor modality: Sensors which measure same form of energy and process it in similar ways Modality refers to the raw input used by the sensors Different modalities: Sound Pressure Temperature

More information

CSE 167: Lecture #7: Color and Shading. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2011

CSE 167: Lecture #7: Color and Shading. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2011 CSE 167: Introduction to Computer Graphics Lecture #7: Color and Shading Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2011 Announcements Homework project #3 due this Friday,

More information

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging 1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant

More information

Saarbrucken Synthetic Image Database - An Image Database for Design and Evaluation of Visual Quality Metrics in Synthetic Scenarios

Saarbrucken Synthetic Image Database - An Image Database for Design and Evaluation of Visual Quality Metrics in Synthetic Scenarios Journal of Communication and Computer 13 (2016) 351-365 doi:10.17265/1548-7709/2016.07.004 D DAVID PUBLISHING Saarbrucken Synthetic Image Database - An Image Database for Design and Evaluation of Visual

More information

ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY

ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY Joseph Michael Wijayantha Medagama (08/8015) Thesis Submitted in Partial Fulfillment of the Requirements for the Degree Master of Science

More information

INTERNATIONAL TELECOMMUNICATION UNION

INTERNATIONAL TELECOMMUNICATION UNION INTERNATIONAL TELECOMMUNICATION UNION ITU-T TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU P.862.1 (11/2003) SERIES P: TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS Methods

More information

OBJECTIVE IMAGE QUALITY ASSESSMENT WITH SINGULAR VALUE DECOMPOSITION. Manish Narwaria and Weisi Lin

OBJECTIVE IMAGE QUALITY ASSESSMENT WITH SINGULAR VALUE DECOMPOSITION. Manish Narwaria and Weisi Lin OBJECTIVE IMAGE UALITY ASSESSMENT WITH SINGULAR VALUE DECOMPOSITION Manish Narwaria and Weisi Lin School of Computer Engineering, Nanyang Technological University, Singapore, 639798 Email: {mani008, wslin}@ntu.edu.sg

More information

Louis Fourrier Fabien Gaie Thomas Rolf

Louis Fourrier Fabien Gaie Thomas Rolf CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

More information

IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 4, NO. 1, MARCH

IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 4, NO. 1, MARCH IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 4, NO. 1, MARCH 2014 43 Content-Aware Modeling and Enhancing User Experience in Cloud Mobile Rendering and Streaming Yao Liu,

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Introduction to Digital Image Processing

Introduction to Digital Image Processing Fall 2005 Image Enhancement in the Spatial Domain: Histograms, Arithmetic/Logic Operators, Basics of Spatial Filtering, Smoothing Spatial Filters Tuesday, February 7 2006, Overview (1): Before We Begin

More information

MULTI-FINGER PENETRATION RATE AND ROC VARIABILITY FOR AUTOMATIC FINGERPRINT IDENTIFICATION SYSTEMS

MULTI-FINGER PENETRATION RATE AND ROC VARIABILITY FOR AUTOMATIC FINGERPRINT IDENTIFICATION SYSTEMS MULTI-FINGER PENETRATION RATE AND ROC VARIABILITY FOR AUTOMATIC FINGERPRINT IDENTIFICATION SYSTEMS I. Introduction James L. Wayman, Director U.S. National Biometric Test Center College of Engineering San

More information

CHAPTER-4 WATERMARKING OF GRAY IMAGES

CHAPTER-4 WATERMARKING OF GRAY IMAGES CHAPTER-4 WATERMARKING OF GRAY IMAGES 4.1 INTRODUCTION Like most DCT based watermarking schemes, Middle-Band Coefficient Exchange scheme has proven its robustness against those attacks, which anyhow, do

More information

CSE 167: Introduction to Computer Graphics Lecture #6: Colors. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2013

CSE 167: Introduction to Computer Graphics Lecture #6: Colors. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2013 CSE 167: Introduction to Computer Graphics Lecture #6: Colors Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2013 Announcements Homework project #3 due this Friday, October 18

More information

Introduction. Psychometric assessment. DSC Paris - September 2002

Introduction. Psychometric assessment. DSC Paris - September 2002 Introduction The driving task mainly relies upon visual information. Flight- and driving- simulators are designed to reproduce the perception of reality rather than the physics of the scene. So when it

More information

Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs

Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs 4.1 Introduction In Chapter 1, an introduction was given to the species and color classification problem of kitchen

More information

Texture Image Segmentation using FCM

Texture Image Segmentation using FCM Proceedings of 2012 4th International Conference on Machine Learning and Computing IPCSIT vol. 25 (2012) (2012) IACSIT Press, Singapore Texture Image Segmentation using FCM Kanchan S. Deshmukh + M.G.M

More information

Technical Report of ISO/IEC Test Program of the M-DISC Archival DVD Media June, 2013

Technical Report of ISO/IEC Test Program of the M-DISC Archival DVD Media June, 2013 Technical Report of ISO/IEC 10995 Test Program of the M-DISC Archival DVD Media June, 2013 With the introduction of the M-DISC family of inorganic optical media, Traxdata set the standard for permanent

More information

Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM

Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM * Which directories are used for input files and output files? See menu-item "Options" and page 22 in the manual.

More information

CHAPTER 6 QUANTITATIVE PERFORMANCE ANALYSIS OF THE PROPOSED COLOR TEXTURE SEGMENTATION ALGORITHMS

CHAPTER 6 QUANTITATIVE PERFORMANCE ANALYSIS OF THE PROPOSED COLOR TEXTURE SEGMENTATION ALGORITHMS 145 CHAPTER 6 QUANTITATIVE PERFORMANCE ANALYSIS OF THE PROPOSED COLOR TEXTURE SEGMENTATION ALGORITHMS 6.1 INTRODUCTION This chapter analyzes the performance of the three proposed colortexture segmentation

More information

Introduction to Visible Watermarking. IPR Course: TA Lecture 2002/12/18 NTU CSIE R105

Introduction to Visible Watermarking. IPR Course: TA Lecture 2002/12/18 NTU CSIE R105 Introduction to Visible Watermarking IPR Course: TA Lecture 2002/12/18 NTU CSIE R105 Outline Introduction State-of of-the-art Characteristics of Visible Watermarking Schemes Attacking Visible Watermarking

More information

Detecting Salient Contours Using Orientation Energy Distribution. Part I: Thresholding Based on. Response Distribution

Detecting Salient Contours Using Orientation Energy Distribution. Part I: Thresholding Based on. Response Distribution Detecting Salient Contours Using Orientation Energy Distribution The Problem: How Does the Visual System Detect Salient Contours? CPSC 636 Slide12, Spring 212 Yoonsuck Choe Co-work with S. Sarma and H.-C.

More information

Structural Similarity Based Image Quality Assessment Using Full Reference Method

Structural Similarity Based Image Quality Assessment Using Full Reference Method From the SelectedWorks of Innovative Research Publications IRP India Spring April 1, 2015 Structural Similarity Based Image Quality Assessment Using Full Reference Method Innovative Research Publications,

More information

CS 260: Seminar in Computer Science: Multimedia Networking

CS 260: Seminar in Computer Science: Multimedia Networking CS 260: Seminar in Computer Science: Multimedia Networking Jiasi Chen Lectures: MWF 4:10-5pm in CHASS http://www.cs.ucr.edu/~jiasi/teaching/cs260_spring17/ Multimedia is User perception Content creation

More information

Graphical Analysis of Data using Microsoft Excel [2016 Version]

Graphical Analysis of Data using Microsoft Excel [2016 Version] Graphical Analysis of Data using Microsoft Excel [2016 Version] Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

Predictive Analysis: Evaluation and Experimentation. Heejun Kim

Predictive Analysis: Evaluation and Experimentation. Heejun Kim Predictive Analysis: Evaluation and Experimentation Heejun Kim June 19, 2018 Evaluation and Experimentation Evaluation Metrics Cross-Validation Significance Tests Evaluation Predictive analysis: training

More information

Block Mean Value Based Image Perceptual Hashing for Content Identification

Block Mean Value Based Image Perceptual Hashing for Content Identification Block Mean Value Based Image Perceptual Hashing for Content Identification Abstract. Image perceptual hashing has been proposed to identify or authenticate image contents in a robust way against distortions

More information

Color Appearance in Image Displays. O Canada!

Color Appearance in Image Displays. O Canada! Color Appearance in Image Displays Mark D. Fairchild RIT Munsell Color Science Laboratory ISCC/CIE Expert Symposium 75 Years of the CIE Standard Colorimetric Observer Ottawa 26 O Canada Image Colorimetry

More information

OPTIMIZED QUALITY EVALUATION APPROACH OF TONED MAPPED IMAGES BASED ON OBJECTIVE QUALITY ASSESSMENT

OPTIMIZED QUALITY EVALUATION APPROACH OF TONED MAPPED IMAGES BASED ON OBJECTIVE QUALITY ASSESSMENT OPTIMIZED QUALITY EVALUATION APPROACH OF TONED MAPPED IMAGES BASED ON OBJECTIVE QUALITY ASSESSMENT ANJIBABU POLEBOINA 1, M.A. SHAHID 2 Digital Electronics and Communication Systems (DECS) 1, Associate

More information

Application of CIE with Associated CRI-based Colour Rendition Properties

Application of CIE with Associated CRI-based Colour Rendition Properties Application of CIE 13.3-1995 with Associated CRI-based Colour Rendition December 2018 Global Lighting Association 2018 Summary On September 18 th 2015, the Global Lighting Association (GLA) issued a position

More information

Regression. Dr. G. Bharadwaja Kumar VIT Chennai

Regression. Dr. G. Bharadwaja Kumar VIT Chennai Regression Dr. G. Bharadwaja Kumar VIT Chennai Introduction Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called

More information

packet-switched networks. For example, multimedia applications which process

packet-switched networks. For example, multimedia applications which process Chapter 1 Introduction There are applications which require distributed clock synchronization over packet-switched networks. For example, multimedia applications which process time-sensitive information

More information

A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD

A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON WITH S.Shanmugaprabha PG Scholar, Dept of Computer Science & Engineering VMKV Engineering College, Salem India N.Malmurugan Director Sri Ranganathar Institute

More information

Cluster quality assessment by the modified Renyi-ClipX algorithm

Cluster quality assessment by the modified Renyi-ClipX algorithm Issue 3, Volume 4, 2010 51 Cluster quality assessment by the modified Renyi-ClipX algorithm Dalia Baziuk, Aleksas Narščius Abstract This paper presents the modified Renyi-CLIPx clustering algorithm and

More information

Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision

Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision What Happened Last Time? Human 3D perception (3D cinema) Computational stereo Intuitive explanation of what is meant by disparity Stereo matching

More information

BIG DATA-DRIVEN FAST REDUCING THE VISUAL BLOCK ARTIFACTS OF DCT COMPRESSED IMAGES FOR URBAN SURVEILLANCE SYSTEMS

BIG DATA-DRIVEN FAST REDUCING THE VISUAL BLOCK ARTIFACTS OF DCT COMPRESSED IMAGES FOR URBAN SURVEILLANCE SYSTEMS BIG DATA-DRIVEN FAST REDUCING THE VISUAL BLOCK ARTIFACTS OF DCT COMPRESSED IMAGES FOR URBAN SURVEILLANCE SYSTEMS Ling Hu and Qiang Ni School of Computing and Communications, Lancaster University, LA1 4WA,

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are

More information

DIGITAL IMAGE WATERMARKING BASED ON A RELATION BETWEEN SPATIAL AND FREQUENCY DOMAINS

DIGITAL IMAGE WATERMARKING BASED ON A RELATION BETWEEN SPATIAL AND FREQUENCY DOMAINS DIGITAL IMAGE WATERMARKING BASED ON A RELATION BETWEEN SPATIAL AND FREQUENCY DOMAINS Murat Furat Mustafa Oral e-mail: mfurat@cu.edu.tr e-mail: moral@mku.edu.tr Cukurova University, Faculty of Engineering,

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Third Edition Rafael C. Gonzalez University of Tennessee Richard E. Woods MedData Interactive PEARSON Prentice Hall Pearson Education International Contents Preface xv Acknowledgments

More information

A New Psychovisual Threshold Technique in Image Processing Applications

A New Psychovisual Threshold Technique in Image Processing Applications A New Psychovisual Threshold Technique in Image Processing Applications Ferda Ernawan Fakulti Sistem Komputer & Kejuruteraan Perisian, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300 Gambang, Pahang,

More information

ON THE POTENTIAL OF COMPUTATIONALLY RENDERED SCENES FOR LIGHTING QUALITY EVALUATION

ON THE POTENTIAL OF COMPUTATIONALLY RENDERED SCENES FOR LIGHTING QUALITY EVALUATION Seventh International IBPSA Conference Rio de Janeiro, Brazil August -, 00 ON THE POTENTIAL OF COMPUTATIONALLY RENDERED SCENES FOR LIGHTING QUALITY EVALUATION Hesham Eissa and Ardeshir Mahdavi Carnegie

More information

Excel 2010 with XLSTAT

Excel 2010 with XLSTAT Excel 2010 with XLSTAT J E N N I F E R LE W I S PR I E S T L E Y, PH.D. Introduction to Excel 2010 with XLSTAT The layout for Excel 2010 is slightly different from the layout for Excel 2007. However, with

More information

Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3

Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3 Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3 Department of Computer Science & Engineering, Gitam University, INDIA 1. binducheekati@gmail.com,

More information

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview Chapter 888 Introduction This procedure generates D-optimal designs for multi-factor experiments with both quantitative and qualitative factors. The factors can have a mixed number of levels. For example,

More information

Brightness, Lightness, and Specifying Color in High-Dynamic-Range Scenes and Images

Brightness, Lightness, and Specifying Color in High-Dynamic-Range Scenes and Images Brightness, Lightness, and Specifying Color in High-Dynamic-Range Scenes and Images Mark D. Fairchild and Ping-Hsu Chen* Munsell Color Science Laboratory, Chester F. Carlson Center for Imaging Science,

More information

Continuous Improvement Toolkit. Normal Distribution. Continuous Improvement Toolkit.

Continuous Improvement Toolkit. Normal Distribution. Continuous Improvement Toolkit. Continuous Improvement Toolkit Normal Distribution The Continuous Improvement Map Managing Risk FMEA Understanding Performance** Check Sheets Data Collection PDPC RAID Log* Risk Analysis* Benchmarking***

More information

The ZLAB Color Appearance Model for Practical Image Reproduction Applications

The ZLAB Color Appearance Model for Practical Image Reproduction Applications The ZLAB Color Appearance Model for Practical Image Reproduction Applications Mark D. Fairchild Rochester Institute of Technology, Rochester, New York, USA ABSTRACT At its May, 1997 meeting in Kyoto, CIE

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

Improving Positron Emission Tomography Imaging with Machine Learning David Fan-Chung Hsu CS 229 Fall

Improving Positron Emission Tomography Imaging with Machine Learning David Fan-Chung Hsu CS 229 Fall Improving Positron Emission Tomography Imaging with Machine Learning David Fan-Chung Hsu (fcdh@stanford.edu), CS 229 Fall 2014-15 1. Introduction and Motivation High- resolution Positron Emission Tomography

More information

CHAPTER 6. 6 Huffman Coding Based Image Compression Using Complex Wavelet Transform. 6.3 Wavelet Transform based compression technique 106
