Perceptual Evaluation of Tone Mapping Operators

Size: px
Start display at page:

Download "Perceptual Evaluation of Tone Mapping Operators"

Transcription

1 Max-Planck-Institut für Informatik Computer Graphics Group Saarbrücken, Germany Perceptual Evaluation of Tone Mapping Operators Master Thesis in Computer Science Computer Science Department University of Saarland Akiko Yoshida Supervisors: Prof. Dr. Karol Myszkowski Prof. Dr. Hans-Peter Seidel Max-Planck-Institut für Informatik Computer Graphics Group Saarbrücken, Germany Begin: January 1, 2004 End: May 17, 2004

2

3 Eidesstattliche Erklärung Hiermit erkläre ich an Eides statt, dass ich die vorliegende Mastersarbeit selbständig und ohne fremde Hilfe verfasst habe. Ich habe dazu keine weiteren als die angeführten Hilfsmittel benutzt und die aus anderen Quellen entnommenen Stellen als solche gekennzeichnet. Saarbrücken, den 17. May, 2004 Akiko Yoshida

4

5 Abstract This thesis focuses on tone mapping operators and the perceptual differences between them when the tone mapped images are compared with their corresponding real-world views. To achieve this goal, two tone reproductions were implemented within the scope of this thesis. High dynamic range (HDR) images were acquired using the camera response curve recovery function of Robertson et al. for the perceptual experiment. After the perceptual experiment, multivariate statistical methods were used to analyze the set of data from the psychophysical experiment. Two tone reproductions, the Ashikhmin method and fast bilateral filtering introduced by Durand and Dorsey were implemented. Both of them are classified as local tone mapping techniques. Ashikhmin s method deals with the local adaptation level and local contrast separately. The bilateral filtering method is based upon the work of Tomasi and Manduchi and sped up by piece-wise linear and subsampling strategies. This algorithm divides an image into two layers (a base layer and a detail layer) in order to reduce contrast but preserve details. In order to create HDR images, the camera response curve was recovered by using the method of Robertson et al. Each HDR image was constructed from 15 images with different exposures and saved in the Radiance RGBE format. A psychophysical experiment was held with 14 human observers over seven tone mapped images on two scenes. The tone mapping operators are the followings: the linear tone mapping, the fast bilateral filtering by Durand and Dorsey, Pattanaik et al. s method, the Ashikhmin method, the Ward method, the photographic tone reproduction of Reinhard et al., and the adaptive logarithmic mapping of Drago et al. In the psychophysical experiment, subjects were asked to compare each of the images to its corresponding real-world view and rate its overall brightness, contrast, detail reproduction in dark and bright regions, and natui

6 ii ABSTRACT ralness. The set of data was analyzed by using multivariate statistical analyses for the main effect of the tone mapping operators. The result shows that those tone mapping operators are perceived very differently when compared with the corresponding real-world views. The multivariate analysis of variance (MANOVA) shows that means of the set of data are in a threedimensional space but neither along a line nor on a plane. An interesting result shown by this experiment is that those operators are divided into global and local categories by Mahalanobis distances. The main effects of the tone mapping operators to each of the attributes were quite significant. Overall brightness provides the biggest difference among the tone mapping operators. The linear tone mapping operator was perceived with the highest brightness and Pattanaik, Ward, and Drago also had a higher amount of overall brightness than the others. Additionally, the result shows that although the main effect of the tone mapping operators for the details in bright regions is highly significant, for the details in dark regions is not. Correlations of all of possible combination of the attributes were tested. Regarding the naturalness, this research shows that none of the other attributes has a strong influence by itself. This may suggest that naturalness is influenced by a combination of the other attributes.

7 Acknowledgement I have been fortunate to have a number of excellent people helping my work. First of all, I would like to express my biggest gratitude to my supervisors Prof. Dr. Karol Myszkowski and Prof. Dr. Hans-Peter Seidel for plenty of comments, suggestions, support, and encouragement. I am also pleased to have had many chances to discuss about statistics with Volker Blanz. He has given me a number of hints to lead my statistical work. Additionally, I have been helped so much by Grzegorz Krawczyk and Rafał Mantiuk regarding tone mapping operators and the pilot study of my experiment and also by Frédéric Drago who provided his software packages and tone mapped images by his tone reproduction. Alexandra Pietrzak-Mantiuk participated in the pilot study of the experiment. I also wish to thank Jeffrey Schoner for reviewing this thesis. I am also deeply indebted to the following people for their participation in my perceptual experiment: Edilson de Aguiar, Naveed Ahmed, Mardé Greeff, Samir Hammann, Ioannis Ivrissimtzis, Waqar Saleem, Martin Sunkel, Heiko Wanning, Hitoshi Yamauchi, Shin Yoshizawa, Hang Yu, Jozef Zajac, and Gernot Ziegler. My experiment would have been impossible without them. Lastly, I wish to thank my family and friends for their support and encouragement. My gratitude to them is beyond words. iii

8 iv ACKNOWLEDGEMENT

9 Contents Abstract Acknowledgement i iii 1 Introduction 1 2 The Human Visual Systems Introduction The Human Eye Overview The Retina Light Sensitivity and Visual Adaptation Introduction Threshold versus Intensity Functions Color Appearance Visual Acuity The Time Course of Adaptation High Dynamic Range Images Introduction HDR Image Construction Methods Mann-Picard Method Debevec-Malik Method Mitsunaga-Nayar Method Robertson et al. Method v

10 vi CONTENTS Ward Method HDR Image Formats Formats Comparison Taking HDR Images Tone Mapping Operators Introduction Global Operators Local Operators Time-Dependent Operators Ashikhmin s Tone Reproduction Introduction Overview of the Method Local Adaptation Level Band-Limited Local Contrast LAL Calculation Gaussian Pyramid Tone Mapping Function Perceptual Capacity Tone Mapping Function Complete Procedure Calculating New RGB Values Gamma Correction Fast Bilateral Filtering Introduction Bilateral Filtering Edge-Preserving Smoothing with Robust Estimators Efficient Bilateral Filtering Introduction Piecewise-Linear Bilateral Filtering Subsampling

11 CONTENTS vii Uncertainty Contrast Reduction Multivariate Statistical Methods Introduction Univariate Analysis: Analysis of Variance Introduction Between-Subjects ANOVA Within-Subjects ANOVA Analysis of Covariance Bivariate Analysis: Correlation and Regression Introduction Correlation Regression Multivariate Analysis of Variance Introduction Assumptions and Questions Mathematical Forms of MANOVA Perceptual Evaluation Introduction Overview Perception-Based Measurements Objective Measurements HVS-Based Measurements Experimental Design Results Conclusions 105 A Images for the Perceptual Experiment 107 B Values of the Perceptual Experiment 111

12 viii CONTENTS

13 List of Tables 3.1 HDR formats The generating kernel One-way between-subjects ANOVA Within-subjects ANOVA Correlations Mahalanobis distances ix

14 x LIST OF TABLES

15 List of Figures 2.1 The human eye The structure of rods and cones The range of luminances The threshold vs. intensity (TVI) function Luminous efficiency of cones and rods Spectral sensitivity at different luminance levels Visual acuity Visual acuity The time course of dark adaptation The time courses of light adaptation in the rod and cone systems HDR images Range-range plot and a film response curve Mann-Picard method The Mitsunaga-Nayar method Weighting function for Robertson et al. method An example of the recovered response curve by Robertson et al. method Bits/pixel vs. dynamic range The Stanford memorial church Quality curves on the Stanford Memorial Church Averaged quality curves over 33 images HDR images for our experiment Local adaptation level xi

16 xii LIST OF FIGURES 5.2 Gaussian pyramid in one dimension The equivalent weighting functions An example of the Gaussian pyramid The threshold vs. intensity (TVI) function An example of the tone mapped image by T M(L) Bilaterally filtered images by Tomasi and Manduchi Bilateral filtering Speed-up of the piecewise-linear acceleration Speed-up by downsampling Statistical techniques One-way between-subjects ANOVA One-way within-subjects ANOVA Correlation coefficients An advantage of MANOVA over ANOVA A screenshot of our experiment Distributions and F and p values of each attribute for the main effect of Scenes Distributions and F and p values of each attribute for the main effect of the tone mapping operators Details in dark regions for each of Scenes Relationships between naturalness and the other attributes Overall brightness vs. details in bright regions Mahalanobis distances A.1 Images for Scene A.2 Images for Scene B.1 Values of the perceptual experiment B.2 Values of the perceptual experiment B.3 Values of the perceptual experiment B.4 Values of the perceptual experiment B.5 Values of the perceptual experiment

17 Chapter 1 Introduction The need of high dynamic range (HDR) images has highly increased recently because they are useful not only for static images but also for multimedia applications. Therefore, how to produce visually sufficient HDR images has been one of the important discussions in computer graphics for years against the problems which come from the limited number of bits in image formats and the limited display range of the physical devices. A number of techniques have been introduced to conquer those problems such as how to construct HDR images and their formats. To represent HDR images on today s physical devices, a number of successful tone mapping operators have also been presented. They are of course useful for HDR photography and also for lighting simulations such as realistic rendering and global illumination techniques. To produce visually sufficient images, a knowledge about human visual systems (HVS) cannot be ignored. Many tone mapping operators have been produced based upon HVS theory. Obviously, it is also necessary to judge a quality of images. This thesis focuses upon the human perception when people compare tone mapped images with their corresponding real-world views. In this thesis, Chapter 2 provides an overview of HVS. A survey of HDR images in terms of constructing them from multiple photographs of different exposures and their formats is shown in Chapter 3. In Chapter 4, the existing tone mapping operators are presented and two of them, which were originally introduced by Ashikhmin in 2002 [8] and by Durand and Dorsey in 2002 [31], are 1

18 2 CHAPTER 1. INTRODUCTION in details explained in Chapter 5 and Chapter 6 respectively. In Chapter 7, multivariate statistical methods are presented and Chapter 8 presents our perceptual experiment and its result analyzed by multivariate statistical analyses.

19 Chapter 2 The Human Visual Systems 2.1 Introduction A number of human perception-based techniques have been presented to produce visually sufficient images in computer graphics area. The knowledge of the limitations and capabilities of the human visual systems (HVS) has contributed actively to advances in a number of areas such as perceptually driven rendering, realistic imaging, high-fidelity visualization, appearance-preserving geometric simplification, and image quality measurement methods. Most of those works are based upon physiology and psychophysics of visions which discover how HVS deal with the light incoming to eyes. This section provides an overview of HVS including visual anatomy and physiology, light sensitivity, and visual adaptation. 2.2 The Human Eye Overview The human eye and its structures are shown in Figure 2.1 [16]. The cornea provides most of the eye s refractive power. Its index of refraction is substantially greater than that of air [16]. The iris controls the entry of light into the eye. The variable opening within it is called the pupil and the pupil determines the amount of light that can reach the retina. It is the most obvious mechanism to regulate 3

20 4 CHAPTER 2. THE HUMAN VISUAL SYSTEMS the amount of light stimulating HVS. It changes its size approximately from 7 to 2 mm over a 10 log unit range of luminance levels [9]. The lens changes its shape during the act of accommodation [16] in order to provide focal control. The accommodation allows objects at various distances from the eye to be clearly imaged. The space between the cornea and lens is filled with an exceptionally clear fluid whose name is aqueous humor and the rest of the eye is filled with the vitreous body, which is a thin jelly-like substance interlaced with peculiar fibers [16]. The incoming light through the cornea and the lens is projected onto the retina that has photoreceptor cells and neural tissues. The retina is an important component to consider sensitivity and color vision of HVS. Figure 2.1: The structures of the human eye (adapted from Boynton [16]) The Retina The retina is contained within the inner and outer limiting membranes. The inner membrane separates the retina from the vitreous humor and the outer membrane

21 2.2. THE HUMAN EYE 5 occupies most of the retina. The retina is about 250 µm thick, contains the total area of about 1100 mm 2, and has a volume of about 0.25 cm 3. In this small volume, there are about 200 million nerve cells and they are involved with the processing of visual information [16]. The retina consists of two major photoreceptor cells: rods and cones, which were named from their shapes. Figure 2.2 shows the structures of the rods and the cones. The rods and cones are not equally distributed over the retinal surface. The rods are sensitive to light and provide Figure 2.2: The structure of photoreceptors (adapted from [118, 119]). Left: the structural features of a photoreceptor cell. Middle: the structure of the rod. Right: the structure of the cone. achromatic vision at low (scotopic) illumination levels. The cones are less sensitive to light than rods are, but they provide color vision at high (photopic) levels. How effective light of a particular wavelength for the rods and cones is shown by the luminous efficiency functions. The range of wavelengths that the rod and cone systems are sensitive is approximately from 400 to 700 nm. The peak of the rods sensitivity is at 498 nm and the cones have three types of sensitivities. The short wavelength (blue) cones have their peak response of sensitivity at 420 nm, the medium wavelength (green) cones peak at 534 nm, and the long wavelength (red) cones peak at 564 nm [37]. Before reaching the photosensitive segments

22 6 CHAPTER 2. THE HUMAN VISUAL SYSTEMS (photoreceptors) of the rods and cones, incoming light must pass through several retinal layers of neural tissue. Fovea, which is a small area near the optic axis (see Figure 2.1), is the photoreceptive surface which is directly exposed to light. 2.3 Light Sensitivity and Visual Adaptation Introduction Figure 2.3 shows the range of luminances, which the human encounter in natural environments, and associated parameters. HVS is processed on this big range of luminances by adaptation. Adaptation is achieved through the coordinated action of mechanical, photochemical, and neural processes in HVS [37]. Note that adaptation does not make people see equally well at all levels. Under dim light, the human eyes can detect small differences of luminance, but the abilities to distinguish pattern details and colors are poor. On the other hand, under daylight, the eyes have sharp color vision, but absolute sensitivity is low. In addition, adaptation process takes time sometimes as everybody experiences in daily life. Figure 2.3: The range of luminances and associated parameters (adapted from Spillman and Werner [97]) Threshold versus Intensity Functions The measurement of visual adaptation is done by detection threshold method. This experiment proceeds as follows. A subject has a seat in front of a big screen which is made dark. Before starting, the subject must wait for a while in order to adapt

23 2.3. LIGHT SENSITIVITY AND VISUAL ADAPTATION 7 the eyes to the illumination of the screen. On each trial, a disk of light is flashed near the center of the screen for a few hundred ms. The subject must answer whether the disk appear or not. If the disk is not seen by the subject, the intensity of the disk is increased on the next trial. If it is seen, the intensity is decreased. The detection threshold against the corresponding background luminance makes the threshold vs. intensity (TVI) function. Figure 2.4 shows TVI functions for rod and cone systems. As shown in the Figure 2.4: The threshold vs. intensity (TVI) functions in log-log space for the rod and cone systems (adapted from Ferwerda [37]). graph, the rod curve is almost flat below about 4 log cd/m 2 of background luminance. It means that the background luminance little affects the threshold. Above 2 log cd/m 2, the curve approaches a vertical asymptote. Over 3.5 log cd/m 2, the rod curve is linear. This relationship can be described by Weber s law (originally presented in [112]) as L = kl (2.1) where L is a luminance value and k is a constant which is experimentally defined

24 8 CHAPTER 2. THE HUMAN VISUAL SYSTEMS [52]. Weber s law is the change in stimulus intensity that can just be discriminated ( L) is a constant fraction (k) of the starting intensity of the stimulus (L) [43]. Weber s law indicates that HVS has constant contrast sensitivity because the increase in threshold with background luminance corresponds to a luminance pattern with constant contrast. The cone curve has a similar shape to that of rod system. Below 2.6 log cd/m 2, the curve is flat, which indicates that the cone system is operating at its absolute levels of sensitivity. Above 2 log cd/m 2, the function is linear and indicates Weber s law and constant contrast sensitivity Color Appearance Figure 2.5 shows the spectral sensitivities of rods and cones at scotopic, mesopic, and photopic levels. (a) shows that at scotopic levels, a detection is dominated by the rod system. Its sensitivity is quite high compared to the other two, but because Figure 2.5: Changes in the spectral sensitivity of HVS at scotopic (left), mesopic (middle), and photopic (right) (adapted from Ferwerda [38]). the rod system is achromatic, color is not apparent. (b) shows the sensitivity at mesopic levels. The rod and cone systems have nearly equal absolute sensitivities. Detection is done by more sensitive system for each wavelength value. It shows that the rod system detect wavelengths below about 575 nm and the cone system is used above it. (c) shows the sensitivity at photopic levels. Most of the detection is done by the cone system. The absolute sensitivity is very low, but because the cone system is trichromatic, color is apparent now.

25 2.3. LIGHT SENSITIVITY AND VISUAL ADAPTATION 9 Figure 2.6 shows the luminance efficiency functions with respect to the rod and cone system threshold sensitivities at different luminance levels. It represents Figure 2.6: Spectral sensitivity at different luminance levels (adapted from Ferwerda [37]). how the spectral sensitivity of HVS changes with luminance level changes and which of the rod or cone systems is used at a particular luminance level. Those changes of the spectral sensitivity lead the color appearance phenomena over the scotopic to photopic ranges. At low luminance levels, HVS is achromatic because the rod system detects wavelengths. At medium luminance levels, the cone system becomes active and colors start to be apparent from the long wavelength (red) to the middle wavelength (green). At high luminance levels, the short wavelength color (blue) becomes apparent.

26 10 CHAPTER 2. THE HUMAN VISUAL SYSTEMS Visual Acuity If there is a bright thin line in the visual field, the eyes optics produce a retinal image that has a slightly blurred. If there are two bright thin lines, the retinal intensity may be overlapped with each other as shown in the left of Figure 2.7. If those two lines are very close to each other, the central intensity between these two lines increases (see the right of Figure 2.7). Figure 2.7 shows that visual acuity is contrast sensitive. Visual acuity is a measure of the HVS ability to resolve spatial Figure 2.7: Visual acuity (adapted from Ferwerda [37]). The retinal intensity profiles the left image resolvable but the right image unresolvable line targets. Figure 2.8: Changes in grating acuity as a function of background luminance (adapted from Shaler [96]). details and usually measured by using the Snellen chart as measuring eyesight in

27 2.3. LIGHT SENSITIVITY AND VISUAL ADAPTATION 11 clinics. Visual acuity is lower at scotopic levels than at photopic levels as shown in Figure 2.8. Its background luminances cover the range from daylight to starlight. This data was measured by testing the detectability of square wave gratings of different spatial frequencies. This graph is useful to predict the visibility of scene details at different levels of illumination The Time Course of Adaptation As known from natural experiences, adaptation cannot be done instantaneously. Figure 2.9 shows the time course of dark adaptation, which adapts the eyes from higher to lower luminance level. This graph is originally measured by Hecht [75]. In his experiment, the observer is adapted to a high background luminance level at the beginning and then plunged into darkness. Detection thresholds are measured continuously over 20 minutes. In the graph, there is a crossing point of the rod and cone systems after about 7 minutes. This point is called the Purkinje break and indicates that transition from detection by the cone system to detection by the rod system [52]. Figure 2.9: The time course of dark adaptation (adapted from Ferwerda [37] and originally measured by Hecht [75]). The crossing point is called Purkinje break.

28 12 CHAPTER 2. THE HUMAN VISUAL SYSTEMS The inverse of the dark adaptation, which adapts the eyes from lower to higher luminance level, is called light adaptation. The time course of light adaptation is more rapid than dark adaptation although complete light adaptation may also take several minutes [52]. The time courses of light adaptation in the rod and the cone system are shown in Figure The left of Figure 2.10 was obtained by an experiment on the time course of light adaptation in the rod system by Adelson in 1982 [1]. This experiment was done as follows. Observers were dark adapted prior to the experiment. A large background field of 0.5 log cd/m 2 was switched on at the beginning of the experiment and measuring the threshold was started. Figure 2.10 was obtained by Baker in 1949 with the similar experiment to the above one [12]. Similar to the rod system, the threshold is the highest immediately after the onset of the background field. The threshold decreases once over time and reaches the minimum value after about 3 minutes. Then, the threshold rises slightly and reaches its adapted level at around 10 minutes. This experiment shows that the light adaptation in the cone system takes longer time than in the rod system.

29 2.3. LIGHT SENSITIVITY AND VISUAL ADAPTATION 13 (a) The time course in the rod systems (measured by Adelson [1]). (b) The time course in the cone systems (measured by Baker [12]). Figure 2.10: The time courses of light adaptation.

30 14 CHAPTER 2. THE HUMAN VISUAL SYSTEMS

31 Chapter 3 High Dynamic Range Images 3.1 Introduction Scenes can have much greater dynamic range than the range of a photographic film and electronic apparatus. Images whose dynamic range is huge are called high dynamic range (HDR) images. Figure 3.1 shows an example of an HDR image with different exposures. The exposure is changed by either the aperture Figure 3.1: HDR images with different exposures. setting or the shutter speed of a camera. As seen in Figure 3.1, underexposed pictures can show details in the highlighted area and overexposed pictures can show details in the dark area. This chapter presents five HDR image encoding techniques, which were first introduced by Mann and Picard, by Debevec and 15

32 16 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES Malik, by Mitsunaga and Nayar, by Robertson et al., and by Ward, and presents HDR file format comparison. 3.2 HDR Image Construction Methods Mann-Picard Method Introduction The first algorithm to construct an HDR image from multiple photographs with different exposures was presented by Mann and Picard in 1995 [66]. The input pictures were supposed to be taken by a camera at a fixed location in space and a fixed orientation, with a fixed focal length lens. From the set of pictures, they recovered the film response curve by using self calibration method and linearize the images. To create a single HDR image, they calculated the certainty functions for each of the images and found weights based upon them. Self Calibration Their algorithm selects a relatively dark pixel from an image a and obtains its location a(x 0, y 0 ) and numerical value f(q 0 ) where q 0 is unknown quantity of light that gives rise to the pixel value. Then, the algorithm picks the same pixel up from another image b, namely b(x 0, y 0 ) and its pixel value f(q 1 ). Because it is known that k times as much light rise to b(x 0, y 0 ) as to a(x 0, y 0 ), it leads to f(q 1 ) = f(kq 0 ). It also searches the pixel in image a such that it has the numerical value f(q 1 ). The coordinate of that pixel is considered as (x 1, y 1 ) and it is obvious that a(x 1, y 1 ) = f(q 1 ). After taking (x 1, y 1 ), their algorithm takes the same coordinate (x 1, y 1 ) in image b and takes its pixel value f(q 2 ). Because the scalar value k is known (f(q 1 ) = f(kq 0 )), it is clear that f(q 2 ) = f(kq 1 ) = f(k 2 q 0 ). Then, it searches the pixel whose value is f(q 2 ) in the image a and consider its coordinate as (x 2, y 2 ). By iterating those steps, the nonlinearity of the image sensor at the points f(q 0 ), f(kq 0 ), f(k 2 q 0 ),... f(k n q 0 ) is obtained. After the above steps, their method creates the points on a plot of f(q) as a function of q, where q is the quantity of light measured in arbitrary units. The left

33 3.2. HDR IMAGE CONSTRUCTION METHODS 17 graph in Figure 3.2 shows this process diagrammatically and it is called rangerange plot. With smaller value of k, more sample points on the response curve (the right graph in Figure 3.2) are obtained. Because estimating a function f(q) is not easy, they assume that f is semi-monotonic. Because of this assumption, the range-range plot is also semi-monotonic. Additionally, they impose that f(0) = 0 by taking a picture with the lens cap on and subtracting the resulting pixel value from each of the two images. By this step, the range-range plot passes through the origin. They also suggest to use stronger restrictions on the response curve, for example, a commonly used empirical law for film such that f(q) = α + βq γ. This restriction rises the linearity of the canonical curve D log E (density versus log exposure). Figure 3.2: The process for finding the point-wise nonlinearity from two pictures with different exposures (adapted from Mann and Picard [66]). Left: range-range plot. Right: a film response curve. Constructing an HDR Radiance Map Once the response curve is constructed, a set of the response curves with different exposures is obtained by shifting the response curve to left or to right. Mann and Picard use the two facts about the response curves. One is that for the parts of the film, which are greatly overexposed or greatly underexposed which is shown as

34 18 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES Figure 3.3: The whole procedure of Mann-Picard method (adapted from Mann and Picard [66]). flat parts in the response curves, details are lost. Another fact is that for the steep part of the response curves, details can be more accurately recovered. Based upon this fact, they plot the derivatives of those hypothetical response curves. Those derivatives are called the certainty functions and Mann and Picard use them as the weights of the images with different exposures. They also regard the Wyckoff film as performing an analysis by decomposing the light falling on the sensor into its Wyckoff layers and a synthesis by combining images [115, 116]. The whole

35 3.2. HDR IMAGE CONSTRUCTION METHODS 19 procedure of Mann-Picard method is drawn in Figure Debevec-Malik Method Introduction In 1997, Debevec and Malik presented an algorithm to recover HDR radiance maps from photographs taken with different exposures [25]. Their algorithm is based upon a physical property of imaging systems, both photochemical and electronic, which is known as reciprocity. It recovers the film response function f and then reconstructs an HDR image from those multiple photographs. f is assumed to increase monotonically, therefore its inverse f 1 can be well defined and the exposure X at each pixel is recomputed as X = f 1 (Z) where Z is a nonlinear function of the original exposure. Recovering a Film Response Curve The input of their algorithm is multiple photographs which are taken from the same point with different exposure times t j. It is assumed that the scene is static and changes of lighting can be ignored because the process is taken quickly. Therefore, each of the film irradiance values E i at pixel i is constant. The film reciprocity equation is written as Z ij = f(e i t j ) where i and j are the indexes over pixels and over exposure times respectively. Because of the assumption that f is monotonic, it is invertible as f 1 (Z ij ) = E i t j. Taking the natural logarithm, the function is g(z ij ) = ln E i + ln t j (3.1)

36 20 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES where g(z ij ) = ln f 1 (Z ij ). The reciprocity Z ij and exposure times t j are known but the irradiances E i and the inverse function g are unknown. Debevec and Malik recover E i and g by using a least-square problem and regularization problem. They formulate the problem as one of finding the (Z max Z min + 1) values of g(z) and the N values of ln E i, where Z min and Z max are the least and greatest pixel values in integer respectively, N is the number of pixel locations, and P is the number of photographs, that minimize the quadratic objective function O = N i=1 P [g(z ij ) ln E i ln t j ] 2 + λ j=1 Z max 1 z=z min +1 g (z) 2 (3.2) w where Debevec and Malik use g (z) = g(z 1) 2g(z) + g(z + 1). The value λ is the weight of the second term of O, which is called the smoothness term. They suggest that λ should be chosen appropriately for the amount of noise expected in the Z ij measurements. Minimizing O is considered as a linear least squares problem and Debevec and Malik use the singular value decomposition (SVD) method to solve it. Debevec and Malik suggest three additional points for their algorithm. Firstly, g(z) and E i can be scaled by the scaling factor α as g(z) + α and E i + α. They introduce the new constraint g(z mid ) = 0, which means that the pixel with the value Z mid is assumed to have unit exposure, in order to establish the value of α, where Z mid = 1(Z 2 min + Z max ). Secondly, by anticipating the basic shape of the response function, the solution can be much better. They introduce an additional weighting function w(z) for this improvement and w(z) emphasizes the smoothness. w(z) is calculated as z Z min z 1 w(z) = (Z 2 min + Z max ) (3.3) Z max z z > 1(Z 2 min + Z max ) and therefore, (3.2) is rewritten as O = N i=1 P {w(z ij )[g(z ij ) ln E i ln t j ]} 2 +λ j=1 Z max 1 z=z min +1 [w(z)g (z)] 2. (3.4)

37 3.2. HDR IMAGE CONSTRUCTION METHODS 21 Finally, their algorithm does not need to refer to all of the available pixels. For sufficient result, they suggest the condition N(P 1) > (Z max Z min ) where N is the number of pixels and P is the number of photographs. Those selected pixels must be distributed reasonably between Z min and Z max. Constructing an HDR Radiance Map Once the response curve g is recovered, the radiance values for each pixel can be computed with the exposure times t j. For this step, Debevec and Malik use the weighting function w in (3.3) again and give higher weight to exposures in which the pixel s value is close to the middle of the response function. It is written mathematically as ln E i = P j=1 w(z ij)(g(z ij ) ln t j ) P j=1 w(z. ij) According to Debevec and Malik, the advantages of combining multiple exposures are that it can reduce noise in recovered radiance values and it can also reduce artifacts such as film grain. The weighting function reduces the influence of blooming artifacts because it ignores saturated pixel values. The recovered radiance values are stored in floating point values. For their algorithm, the minimum requirement of the number of photographs is two for recovering the response curve and R for recovering a radiance map with the response curve, where R is F the radiance range which is going to be recovered and F is the dynamic range that the film is capable of representing. Color Images The algorithm which is described above is for grey scale images. To deal with color images with red, green, and blue channels, Debevec and Malik recover the response curves for each channel independently. For color images, each channel can have a scaling factor, but they found that different choices of the scaling factors change the color balance of a radiance map. By default, their algorithm chooses the scaling factor such that a pixel with value Z mid have unit exposure, and then the scaling terms can be adjusted by photographing a calibration lumi-

38 22 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES nance of the known color in case that the film is calibrated to respond achromatically to a particular color of light Mitsunaga-Nayar Method Introduction Mitsunaga and Nayar presented another algorithm to recover the radiometric response function in 1999 [72]. Their approach is to use a flexible parametric model and does not require precise estimates of the exposure times. They use the rough estimates of the ratios of exposures for recovering the response function and it is sufficient. In their method, they include the pre-processing which rejects measurements with large vignetting effects and temporal changes during image acquisition. This pre-processing works efficiently for video systems where it is easier to vary the aperture setting than the exposure time. A Flexible Radiometric Model and Self Calibration As in the Debevec-Malik method, the Mitsunaga-Nayar method is based upon the fact that the brightness of an image I is computed as I = Et where E is an image irradiance and t is the exposure time. According to Horn [47], an image irradiance E is given as E = L π 4 ( d h )2 cos 4 φ where L is the scene radiance, h is the focal length, d is the diameter of its aperture, and φ is the angle between the principal ray and the optical axis. Mitsunaga and Nayar claim that, if the imaging system is ideal, the linear radiometric response curve is given as I = Lke (3.5) where k = cos 4 φ/h 2 and e = (πd 2 /4)t which is considered as an exposure and can be varied by changing the aperture size d or the exposure time t. As Debevec and Malik do, Mitsunaga and Nayar assume that the response curve is monotonic or at least semi-monotonic. Then, they claim that any response

39 3.2. HDR IMAGE CONSTRUCTION METHODS 23 function can be modeled virtually by using a high-order polynomial such as I = f(m) = N c n M n. (3.6) n=0 In the Mitsunaga-Nayar method, they consider the radio of the scaled radiance at a pixel p between two images with different exposures e q and e q+1, where R q,q+1 = e q /e q+1, is given by using (3.5) as I p,q I p,q+1 = L pk p e q L p k p e q+1 = R q,q+1 and the response function of an imaging system is related to the exposure ratio f(m p,q ) f(m p,q+1 ) = R q,q+1. (3.7) Then, Mitsunaga and Nayar order the images such that e q < e q+1, and it leads to 0 < R q,q+1 < 1. By substituting their polynomial model (3.6) for the response function, the following equation is obtained: N n=0 c nm n p,q N n=0 c nm n p,q+1 = R q,q+1. (3.8) ( u From (3.7), they derive f(mp,q) f(m p,q+1 )) = (Rq,q+1 ) u and implies that there are an infinite number of f-r pairs which satisfy (3.7). According to Mitsunaga and Nayar, if the exposure ratios R q,q+1 is known, the response function can be recovered by formulating an error function that is the sum of the squares of the errors in (3.8): E = Q 1 q=1 p=1 [ P N c n Mp,q n R q,q+1 n=1 N n=1 c n M n p,q+1] 2 (3.9) where Q is the number of images. They normalize all of the measurements such that 0 M 1 and fix the indeterminable scale using f(1) = I max, and they add a new constraint: c N = I max N 1 n=1 c n. (3.10)

40 24 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES They obtain the response function coefficients by solving the system of linear equations that result from E c n = 0. (3.11) The problem at this point is that usually it is difficult to obtain accurate estimates of the exposure ratios R q,q+1. Mitsunaga and Nayar use an iterative scheme where the current ratio estimates R (k 1) q,q+! are used to compute the next set of coefficients c (k) n. Those coefficients are used to update the ratio estimates by using (3.8): R (k) q,q+1 = P p=1 N n=1 c(k) N M n p,q N n=1 c(k) N M n p,q+1 (3.12) where the initial ratio estimates R (0) q,q+1 is provided by a user. Finally, their algorithm is summarized as f (k) (M) f (k 1) (M) < ɛ ( M) where ɛ is a small number. Because it is hard to recover the order N in an elegant way, Mitsunaga and Nayar set the upper bound on N (N = 10 for their experiments) and run the algorithm repeatedly to find the N that gives the lowest error. Pre-Processing In order to reduce noises from video, Mitsunaga and Nayar implement a preprocessing step which uses temporal and spatial averaging to obtain robust pixel measurements. They describe three sources of noises: electrical readout from the camera, quantization by the digitizer hardware, and motion of scene objects during data acquisition. They reduce the first two noise sources by taking temporal averaging of images (e.g., 100 images) and reduce the last noise source by taking spatial averaging over flat areas. They use the χ 2 distribution to detect spatial flatness. In addition, Mitsunaga and Nayar propose the algorithm to reduce vignetting effects. According to Asada et al., vignetting increases with the aperture size

41 3.2. HDR IMAGE CONSTRUCTION METHODS 25 in most lenses and vignetting effects are minimal at the center of the image and increase towards the periphery [7]. Based upon this fact, Mitsunaga and Nayar introduce an algorithm that robustly detects pixels that are corrupted by vignetting. Their algorithm checks the brightness measurements M p,q and M p,q+1 in the consecutive images q and q + 1 and computes the vignetting-free area by finding the smallest image circle within which the M p,q -M p,q+1 plot is a compact curve with negligible scatter. Constructing an HDR Radiance Map Once the response curve is recovered, Mitsunaga and Nayar construct a single HDR image by the same algorithm of Debevec and Malik. They present three steps to construct an HDR image from Q images. First, each of M p,q is mapped to its scaled radiance I p,q by using the recovered response curve f. Then, the scaled radiance is normalized by the scaled exposure ẽ q. The scaled exposures are calculated such that their arithmetic mean is equal to 1. Finally, the radiance values are calculated as a weighted average of its individual normalized radiance values as Debevec and Malik use a hat function as a weighting function. Mitsunaga and Nayar define their weighting function based upon the signal-to-noise ratio (SNR). The SNR for the scaled radiance value I is given as SNR = I dm di 1 σ N (M) = f(m) σ N (M)f (M) (3.13) where σ N (M) is the standard deviation of the measurement noise. Mitsunaga and Nayar assume that the noise σ N is independent of the measurement pixel value M, and they define their weighting function as w(m) = f(m)/f (M). (3.14) Their procedure to construct a single HDR image from multiple images is shown in Figure 3.4.

42 26 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES Figure 3.4: The procedure of fusing multiple images in the Mitsunaga-Nayar method (adapted from Mitsunaga and Nayar [72]). Color Images As mentioned by Debevec and Malik, different response curves for each of the color channels must be recovered for the Mitsunaga-Nayar method for color images. To find the relative scalings among the three channels, Mitsunaga and Nayar assume that the three response curves preserve the chromaticity of scene points. With the measurement M = [M r, M g, M b ] T and its scaled radiances I = [I r, I g, I b ] T, they suggest to correct radiance values such that I c = [k r I r, k g I g, k b I b ] T where k r /k b and k g /k b are obtained by least-squares minimization to the chromaticity constraint I c / I c = M/ M. They extended their work to use spatially varying exposures (SVE) in 2000 with the same response curve recovering algorithm [76]. The basic idea is that the brighter pixels have greater exposure and the darker ones have lower exposure. To obtain spatially SVE, they used an optical mask with a pattern of cells with different transparencies Robertson et al. Method Introduction In 1999, Robertson et al. presented another algorithm to construct an HDR image from multiple photographs [92]. Their algorithm requires an initial calibration

43 3.2. HDR IMAGE CONSTRUCTION METHODS 27 whose determines the camera response function if necessary. Once the response curve is recovered, an HDR image is constructed by the weighted average of images with different exposures. With the similar manner to the Debevec-Malik method, Robertson et al. method is based upon the idea such that the relationship between the pixel value and the response curve is given as y ij = f(t i x j + N c ij) (3.15) where y ij is the j th pixel of the i th exposed image, f is the camera response function, t i is the known exposure times, x j is the irradiances, N is the number of pictures, and Nij c is an additive noise term. They assume the image data is 8 bits and explicitly write the camera response function as 0 if z [0, I 0 ] f(z) = m if z (I m 1, I m ], m = 1,..., 254 (3.16) 255 if z (I 254, ). Because y ij are digital numbers, f maps the positive real numbers to an interval of integers {0,..., 255} for 8 bits data. HDR Image with Known Response Function Robertson et al. set the inverse function of f as f 1 (y ij ) = t i x i + N c ij + N q ij = I jij (3.17) under the assumption that the function f is known. (3.17) is rewritten as I yij = t i x i + N ij (3.18) where N ij has the noise term and the dequantization uncertainty term N q ij. They modeled N ij as zero-mean independent Gaussian random variables with vari-

44 28 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES ances σij. 2 However, because characterizing the accurate σij 2 is quite difficult, they choose the variances heuristically. They replaced the variances with weights as w ij = 1/σ 2 ij. Then, Robertson et al. took the similar approach to the one by Debevec and Malik: the response function of a camera is typically steepest (i.e., more sensitive) towards the middle of its output range. With this idea, they set the Gaussian-like weighting function as w ij = w ij (y ij ) = exp ( 4 (y ) ij 127.5) 2 (3.19) (127.5) 2 such that values near 128 (i.e., middle of the range) are weighted more heavily than those near 0 and 255. This function is scaled and shifted such that w ij (0) = w ij (255) = 0 and w ij (127.5) = 1.0. By this function, pixel values near 0 and 255 have very low confidence in the accuracy and those near 128 have high confidence. The weighting function is shown in Figure 3.5. Now, I yij in (3.18) are independent Figure 3.5: Weighting function w ij (adapted from Robertson et al. [92]). Gaussian random variables and the joint probability density function of them is written as { P (I y ) exp } w ij (I yij t i x j ) 2. (3.20) i,j Robertson et al. took the maximum-likelihood (ML) approach to find the high dynamic range image values. The ML approach finds the x j values maximizing the probability in (3.20). They claim that maximizing (3.20) is equivalent to

45 3.2. HDR IMAGE CONSTRUCTION METHODS 29 minimizing the negative of its natural logarithm. It leads to the function O(x) = i,j w ij (I yij t i x j ) 2 (3.21) and this function must be minimized. They minimized Equation (3.21) by setting its gradient as O(x) = 0. It yields the desired irradiance estimate of an HDR image as ˆx j = i w ijt i I yij i w. (3.22) ijt 2 i Because images with longer exposure times are weighted more heavily due to t i component in (3.22), their method can remove quantization effects described in [64, 74, 117]. For Unknown Response Function In the previous section, the response curve is supposed to be known already. However, in most of the cases, it is unknown. Robertson et al. also presented an algorithm in case that the response curve is unknown. In such cases, the response curve must be recovered before applying Equation (3.22). Robertson et al. rewrote Equation (3.21) as Õ(I, x) = i,j w ij (I yij t i x j ) 2 (3.23) because both I m and x j values must be estimated simultaneously. They constrain the estimates for I m such that Î128 = 1.0 because the HDR image estimates ˆx j must be mapped to a usable range {0,..., 255} and the scale of ˆx j is directly dependent upon the scale of I m. Robertson et al. used the Gauss-Seidel relaxation to determine the response function because it minimizes an objective function with respect to a single variable, and then uses these new values when minimizing with respect to subsequent variables. In their algorithm, Equation (3.23) is minimized with respect to each I m and then minimized with respect to each x j. The initial Î is chosen as a linear function with Î128 = 1.0 and the initial ˆx is chosen by Equation (3.22) with the initial linear Î. Robertson et al. took the partial derivative of (3.23)

46 30 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES with respect to I m and set it equal to zero. This yields Î m = where the index set E m is defined as 1 Card(E m ) (i,j) E m t i x j (3.24) E m = {(i, j) : y i,j = m} (3.25) and the set of indices such that m was observed for the input images. Card(E m ) is the cardinality of E m, i.e., the number of times m is observed. After scaling the response function, (3.23) is minimized with respect to each x j by using (3.22). Robertson et al. suggest to iterate those process until reaching some convergence criterion. The convergence criterion is for the rate of decrease in the objective function to be below some minimal threshold. An example of the camera response function f is shown in Figure 3.6. Figure 3.6: An example of the recovered response curve (adapted from Robertson et al. [92]) Ward Method Introduction In 2003, Ward presented a new method to construct HDR images with translational alignment of hand-held photographs [109]. The input of this algorithm is

47 3.2. HDR IMAGE CONSTRUCTION METHODS 31 a set of N grayscale images in 8 bits and one of the N images is selected arbitrarily as the reference image. The output is a set of N 1 (x, y) integer offsets. Then, an HDR image can be constructed by using known camera response curve recovering function such as the Debevec-Malik [25] or the Mitsunaga-Nayar [72] methods described above. Alignment The alignment algorithm of Ward has the following features. Alignment is done on bilevel images using fast bit-manipulation routines, is insensitive to image exposure, and includes noise filtering for robustness. He introduced a new bitmap, which is called a median threshold bitmap (MTB) for alignment. An MTB is defined as follows. 1. It determines the median 8-bit value from a low-resolution histogram over the grayscale image pixels. 2. It creates a bitmap image with 0s where the input pixels are less than or equal to the median value and 1s where the pixels are greater. Once the threshold bitmaps of two exposures to be aligned are obtained, he constructs an image pyramid in order to compute overall offset. It starts with the lowest resolution MTB pair and computes the minimum difference offset between them within a range of ±1 pixel in each dimension. In the next level, the offset is multiplied by 2 and the minimum difference offset is computed again. This procedure continues until the highest resolution MTB (i.e., the original image) and the final offset result is obtained. Its computational cost is proportional to the size of the bitmaps. Threshold Noise Ward found that the above algorithm encounters a trouble with the exposures which have a large number of pixels near its median value because in such cases, the pixels near the median appear as noise in the MTB and difference computation is not done well. To solve this problem, he introduced another bitmap, exclusion bitmap. It has 0s wherever the grayscale value is within some specified distance

48 32 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES of the threshold and 1s for the other cases. He computes an exclusion bitmap for each exposure at each resolution level of the pyramid, takes the XOR difference result for the computed offset, and then takes AND of it with both offset exclusion bitmaps to compute the final difference. This ANDing process disregards differences which are less than the noise tolerance. Removing the pixels near the median value clears the least reliable bit positions in the smooth gradients but preserves the high confidence pixels near strong boundaries [109]. Additionally, he found that this optimization works efficiently in eliminating false minima in offset search algorithm. 3.3 HDR Image Formats Formats A number of HDR formats have been presented for years. This section presents an overview of them and the comparison over HDR formats provided by Ward [107] is shown in the next section. A log encoding is not an HDR image format, but it is necessary to be briefly explained because it is used in one of the HDR formats as shown in the following part of this section. Instead of using the power law I out = K v γ in gamma encoding, a log encoding quantizes values by using I out = I min [ Imax I min ] v (3.26) where the encoded value v is normalized between 0 and 1. Adjacent values differ by a constant factor which is [ Imax ] 1/N (3.27) I min where N is the number of steps in the quantization. Ward compared the log encoding with other encoding techniques, the gamma encoding and the floating point encoding. According to his comparison, the log encoding has constant error envelope over the full range of luminance, but the gamma encoding increases its error dramatically and the floating point encoding does not have perfectly equal step

49 3.3. HDR IMAGE FORMATS 33 sizes of errors [107]. The Pixar Log Encoding has three channels for RGB colors. Each of the channels is 11-bits log encodings. This encoding technique can represent a dynamic range of roughly 3.6 orders of magnitude (3600:1) in 0.4% steps. The Lawrence Berkeley National Laboratory developed the Radiance RGBE format [89]. It uses one byte for each of red, green, and blue components and one byte for an exponent. The exponent part is a scaling factor on RGB components, which is equal to two raised to the power of the exponent minus 128. According to Ward, although the Radiance RGBE format is much better than the others, it has shortcomings in terms of precision and dynamic range [107]. The first problem is its dynamic range is much more than the one that people use as color representation and most of the part of this format becomes useless. In addition, any RGB formats are restricted to a positive range, therefore it cannot cover the visible gamut using any set of real primaries. Another problem is that the distribution of errors is not perceptually uniform on this format. The Silicon Graphics Inc. developed the LogLuv format in Sam Leffler s TIFF library [58] in order to conquer the shortcomings of the RGBE format [110]. It is based upon human visual perception and designed such that the quantization steps match human contrast and color detection thresholds. This format consists of three variants of logarithmic encoding. The first variation squeezes a 10-bit log luminance value with a 14-bit CIE (u, v ) lookup into a standard 24-bit pixel. The details of CIE color space is described in [35]. The second one uses 16 bits for a pure luminance encoding. It deals with negative values and covers a dynamic range of 38 orders of magnitude in 0.3% steps. The last one also uses 16 bits for signed luminance and 8 bits for each of CIE u and v coordinates. The Industrial Light and Magic presented ILM OpenEXR format. It is a wrapper of 16-bit HALF data type which are stored as floating point numbers in 16 bits [59]. This format covers negative primary values along with positive ones, therefore it can represent the entire visible gamut and its range is about 10.7 orders of magnitude with relative precision of 0.1%. Microsoft and Hewlett-Packard proposed a new encoding technique of HDR images, scrgb, which was formerly known as srgb64. This format is based upon srgb specification and is a logical extension of 24-bit srgb to 16 linear

50 34 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES bits per primary, or 12 bits per primary using a gamma encoding. The scrgb consists of two parts. One uses 48 bits/pixel in an RGB encoding and another one uses 36 bits/pixel as either RGB or YCC encoding. Table 3.1 summarizes the information in this section. The first encoding technique (srgb) is not an HDR format, but it is given as a baseline for comparison. The bits/pixel item is for the tristimulus representation, which are the amounts of three primaries specifying a color stimulus, excluding alpha. The dynamic range is given as orders of magnitude or the 10-based logarithm of the maximum representable value over the minimum value. The actual maximum and minimum values are given in each parenthesis. RGBE and XYZE have the most dynamic range in the fewest bits and also XYZE covers the visible gamut although 76 orders of magnitude is much more than the one that can be used for human observers. Encoding Covers Gamut? Bits/pixel Dynamic Range Precision srgb No (1.0:0.025) Variable Pixar Log No (25.0:0.004) 0.4% RGBE No (10 38 : ) 1% XYZE Yes LogLuv 24 Yes (15.9: ) 1.1% LogLuv 32 Yes (10 19 : ) 0.3% EXR Yes (65000: ) 0.1% scrgb Yes (7.5:0.0023) Variable scrgb-nl Yes (6.2:0.0039) Variable scycc-nl Yes Table 3.1: HDR formats (adapted from Ward [107]) Comparison The comparison over HDR image formats was provided by Ward [107]. Figure 3.7 plots dynamic ranges vs. bits/pixel of each of the full-gamut formats. As shown in the figure, the LogLuv format in 24 bits has more dynamic range than the 36-bits scrgb-nl format and 48-bits scrgb format. The LogLuv and XYZE formats in 32 bits have much higher dynamic ranges than the others. Ward collected 33 HDR images in 8 different formats and then converted from one format to another in order to see what kind of errors occurred by encod-

51 3.3. HDR IMAGE FORMATS 35 Figure 3.7: Cost (bits/pixel) vs. benefit (dynamic range) of full-gamut formats (adapted from Ward [107]). ings with pixel-by-pixel comparison. To have a numerical comparison, he used CIE E, which is a popular metric to quantify color differences, with applying local reference white because the CIE metric assumes a global white adaptation value but it does not make sense for HDR images. Ward used those formats on a number of images and compared the results. One of the images is the Stanford Memorial Church (Figure 3.8) and Figure 3.9 shows the quality curves of each of the formats on that image. The ideal encoding behavior is a steep descent reaching a very small percentile for anything above 2 on the CIE E axis. By this judgment of idealness, the EXR format is the best one in terms of color fidelity. Ward performed similar comparisons over all of the 33 images with eight HDR formats. Figure 3.10 shows the average of the quality curves over images. According to Ward, the 48-bit/pixel OpenEXR format is the best which demonstrates very high accuracy over all of his example images. The second best one is the 32- bit RGBE format. On the other hand, the 48-bit scrgb, 36-bit scrgb-nl, and

52 36 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES Figure 3.8: The Stanford Memorial Church ( Paul E. Debevec). scycc-nl encoding techniques showed worse performance [107]. 3.4 Taking HDR Images For our perceptual experiment, static images with different shutter speeds are taken by a camera, Kodak Professional DCS560 [32], with lenses, CANON Lense EF 24mm and 14mm [19] in the Max-Planck-Institut für Informatik. Because those images were saved in a raw format of Kodak, they were converted to 36-bit TIFF format by using a program raw2image which is involved in a Kodak DCS Photo Desk package [32]. Then, the response curve of the camera was recovered by using the Robertson et al. method (see Section 3.2.4) which is the recent method to recover camera response curves with stable images. The advantage of this method is that no assumption is necessary for the response curve and response curves can be reconstructed in any arbitrary shape while Debevec-Malik method

53 3.4. TAKING HDR IMAGES 37 Figure 3.9: Quality curves on the Stanford Memorial Church (adapted from Ward [107]). It shows the percentage of pixels above a particular CIE E for each encoding. Figure 3.10: Average quality curves over 33 images (adapted from Ward [107]). assumes that the response curve is smooth and Mitsunaga-Nayar method returns a polynomial response curve. Since our camera was set in a stable condition, it was not necessary to use Ward method. As described in Section 3.2.5, his algo-

54 38 CHAPTER 3. HIGH DYNAMIC RANGE IMAGES rithm can be applied when hand-held pictures are taken. After the response curve was recovered, HDR images were created with it and then saved in the Radiance RGBE format. For constructing one HDR image, 15 images were taken for our project. The range of their shutter speed was 1/2000 to 8.0 seconds. HDR images for our experiment and their dynamic ranges are shown in Figure 3.11.
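For illustration, the following minimal sketch shows how differently exposed images could be merged into a radiance map once an inverse camera response curve is available, and how a linear RGB value is packed into the Radiance RGBE layout of Section 3.3. It is not the Robertson et al. implementation actually used for this thesis; the function names, the simple hat-shaped weighting, and the inv_response lookup table are illustrative assumptions.

```python
import numpy as np

def merge_exposures(images, exposure_times, inv_response):
    """Merge differently exposed 8-bit images (one channel) into an HDR radiance map.

    images:         list of uint8 arrays of shape (H, W)
    exposure_times: exposure times in seconds (e.g. 1/2000 ... 8.0)
    inv_response:   length-256 lookup table mapping pixel value -> relative exposure
    This is a simple weighted average; Robertson et al. additionally iterate
    between estimating the response curve and the radiance map.
    """
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros_like(num)
    for img, t in zip(images, exposure_times):
        w = 1.0 - np.abs(img.astype(np.float64) - 127.5) / 127.5  # trust mid-range pixels most
        num += w * inv_response[img] / t
        den += w
    return num / np.maximum(den, 1e-6)

def rgbe_encode(r, g, b):
    """Pack one linear RGB triple into the 4-byte Radiance RGBE layout: the shared
    exponent byte e encodes a scaling factor of 2**(e - 128)."""
    m = max(r, g, b)
    if m < 1e-38:
        return (0, 0, 0, 0)
    e = int(np.floor(np.log2(m))) + 1
    scale = 256.0 / (2.0 ** e)
    return (int(r * scale), int(g * scale), int(b * scale), e + 128)
```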

(a) Dynamic range: 638,019:1. (b) Minimum pixel luminance: 0.006, dynamic range: 26,616:1.

Figure 3.11: HDR images for our perceptual experiment. Both of them were taken in the Max-Planck-Institut für Informatik in Saarbrücken, Germany.


57 Chapter 4 Tone Mapping Operators 4.1 Introduction The concept of a tone reproduction was first introduced to the computer graphics community by Tumblin and Rushmeirer in 1993 [103]. The goal of tone reproductions is to compress the dynamic range of an image, which has much bigger luminance range than that of the physical devices, to the displayable dynamic range. Most of the tone mapping operators can be categorized into two types: global (also known as spatially uniform or single-scale ) and local (also known as spatially varying or multi-scale ) operators [27]. Global operators apply the same transformation onto every pixel of an image. Local operators choose different scales onto different area of an image. Global operators are more easily implemented than local ones because they need to find only one transformation, but local operators can usually provide better results. One of the well-known problems with local operators is halo effects (inverse gradients). This can manifest as a dark aura around a very bright light source. In addition to global and local tone reproductions, there is one more category: time-dependent tone mapping operators which use adaptation over time. This section provides an overview of the existing tone mapping operators and two of them implemented within the scope of this thesis are in details described in the following two chapters. Seven of those tone mapping methods are used in our perceptual experiment described in Chapter 8. 41

58 42 CHAPTER 4. TONE MAPPING OPERATORS 4.2 Global Operators The simplest tone reproduction is a linear mapping. It scales the radiances onto the range between 0 and 255 brutally. If the log of the radiances is taken and linearly scaled to [0, 255], it is called a logarithmic linear mapping. In 1993, Tumblin and Rushmeier presented a tone mapping operator which preserves an overall impression of observers in terms of brightness [103]. They created a HVS model mathematically including visual effects while converting real-world luminance values to perceived brightness images. This method transforms the real-world luminance values to the display ones in order to closely match the brightness of the real-world image and the display image. This tone mapping method works only on grayscale images. This approach was revised to incorporate a linear scaling factor based upon adaptation luminance by Tumblin in 1999 [104]. Ward developed a new tone mapping method preserving perceived contrast in 1994 [108]. This method is based upon human vision contrast sensitivity studied by Blackwell [14]. The linear scaling factor is calculated by using the ratio of the world and the display luminances. This approach was later used in the work of Ferwerda et al. [38] with more accurate expressions for the TVI function. The histogram adjustment tone mapping operator was presented by Ward Larson et al. in 1997 [111]. They extended the earlier works of Ward [108] and Ferwerda et al. [38]. This method uses the knowledge that human eyes are sensitive to relative rather than absolute changes to luminances. This approach avoids halos altogether. Although this method still has several problems, it can provide the best combination of practical simplicity and uniformly good perceptually advocated results. Tumblin et al. proposed two tone mapping reproductions in 1999 [102]. Their methods imitate some of the visual adaptation processes of HVS and are revised upon Tumblin and Rushmeier s work [103]. The first method is a layering method which builds a display image from several layers of lighting and surface properties. The second method is a foveal method which interactively adjusts to preserve the fine details in the region around the viewer s gaze. In 2000, Scheel et al. developed an interactive application of tone reproduction

59 4.3. LOCAL OPERATORS 43 by representing luminances as a texture [94]. The luminance of each vertex is coded into texture coordinates and those luminance values are mapped into display ones by using Ward [108] and Ward Larson s [111] models. Drago et al. presented a perception-motivated tone mapping algorithm for interactive display of HDR images in 2003 [29]. They used logarithmic functions with different bases for darkest and brightest area of an image to compress luminance values. To compress other luminance values in between, a bias function was used. Their method can be applied to HDR video sequences. 4.3 Local Operators In 1971, Land and McCann introduced the Retinex algorithm [55]. This method does not calculate physical or perceived reflectance but considers the human sensory response to lightness in a scene. It maximizes the range of luminance and improves visual difference of radiance. This algorithm was extended to deal with human visual perception by Jobson et al. in 1997 by using multiscale approach [51]. Chiu et al. developed a spatially non-uniform scaling function for HDR images in 1993 [21]. Their basis is that the eye is more sensitive to reflectance than luminance, so that slow spatial variation in luminance may not be greatly perceptible. Their work was extended by Schlick by concentrating computational efficiency and simplifying parameters in 1994 [95]. This method uses the first degree rational polynomial function in order to map real-world luminances onto the display values. Tanaka and Ohnishi created a locally varying scale factor from a Gaussian lowpass filter to reduce image contrast and modeled filters on the center-surround arrangement of retinal receptive fields in 1997 [100]. Because repeated filtering is unnecessary, their method is simpler and faster than that of Chiu et al. In 1998, Pattanaik et al. presented used a multiscale representation of pattern, luminance, and color processing in the HVS and addressed the problems of HDR images [83]. Their model computes adaptation and spatial vision for realistic tone reproduction. Their method consists of two parts: the visual model and the display model. The visual model processes an input image and encodes the perceived

60 44 CHAPTER 4. TONE MAPPING OPERATORS contrast for the chromatic and achromatic channels. The display model provides a reconstructed output by using the encoded information. All of the methods by Chiu et al., Tanaka and Ohnishi, and Pattanaik et al. are affected by halos. This problem is discussed in detail by Tumblin [102, 104]. The low curvature image simplifier (LCIS) method was produced by Tumblin and Turk in 1999 [102]. In this method, an input image is separated into large features and fine details. The features are compressed and the details are preserved. This method is not affected by halos. However, this algorithm is very slow in terms of computation and often leads to images with low contrast. A number of successful local tone mapping operators were introduced in Fattal, Lischinski, and Werman produced a tone mapping operator which identifies gradients at various scales and reduces the magnitudes of large luminance gradients while preserving small changes that are responsible for high frequency details in the image [36]. The Gaussian pyramid approach is used in order to divide low frequency details which has to be compressed strongly and high frequency details which must not be compressed. To avoid halos, this scaling is performed on the finer image resolution while the gradient attenuation function is applied for each pyramid level. Reinhard et al. produced the Photographic Tone Reproduction, which is inspired by dodging and burning used in photography [90]. This method works on various types of HDR images and its computational speed is very good. Additionally, the recent version of this method operates automatically freeing the user from setting parameters that are not particularly intuitive. Ashikhmin presented a new tone mapping method which works on multipass approach [8]. His method calculates local adaptation luminance, applies a tone mapping function using the TVI function, and then calculates the final pixel values to preserve details throughout an image. This method is implemented by us and is described in detail in Chapter 5. Durand and Dorsey presented an edge-preserving filtering called the bilateral filtering method [31]. This method considers two different spatial frequency layers, base and detail, then reduces contrast and preserves details of an image. Our implementation of this method are described in Chapter 6. In 2003, Choudhury and Tumblin introduced the trilateral filtering method

61 4.4. TIME-DEPENDENT OPERATORS 45 [22]. It is a single-pass nonlinear filter for edge-preserving smoothing and visual detail removal. They use two modified forms of Tomasi and Manduchi s bilateral filter [101] and then use the new trilateral filter in order to smooth an image towards a sharply-bounded, piecewise-linear approximation. 4.4 Time-Dependent Operators In 1996, Ferwerda et al. presented a model accounting for changes in color appearance, visual acuity, and temporal sensitivity while preserving global visibility [38]. Their method uses the concept of matching the just noticeable differences (JNDs) for a variety of adaptation levels. They consider both rod and cone response and the aspect of adaptation over time. In 2000, Pattanaik et al. produced a new time-dependent tone mapping operator which is based upon psychophysical experiments and creates color image sequence from any input scene [84]. Their model dealt with the changes of threshold visibility, color appearance, visual acuity, and sensitivity over time. It is also useful to predict the visibility and appearance of scene features. An interactive tone mapping operator was presented by Durand and Dorsey in 2000 [30]. Their algorithm is based upon the visual adaptation. They also proposed extensions to Ferwerda et al. s time-dependent reproduction and incorporated it into global illumination solutions and interactive walkthroughs. Scenes are rendered interactively by a tone mapping by being accelerated by caching the function in look-up tables.
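Before turning to the operators implemented for this thesis, the two simplest global mappings mentioned at the start of Section 4.2 can be stated in a few lines. The sketch below is only an illustration; the normalization choices (scaling by the image maximum, the small epsilon added before the logarithm) are assumptions rather than part of any cited method.

```python
import numpy as np

def linear_mapping(luminance):
    """Scale world luminances linearly onto the displayable range [0, 255]."""
    L = np.asarray(luminance, dtype=np.float64)
    return 255.0 * L / L.max()

def log_linear_mapping(luminance, eps=1e-6):
    """Logarithmic linear mapping: take the log of the luminances, then scale
    the result linearly to [0, 255]."""
    logL = np.log10(np.asarray(luminance, dtype=np.float64) + eps)
    return 255.0 * (logL - logL.min()) / (logL.max() - logL.min())
```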


Chapter 5

Ashikhmin's Tone Reproduction

5.1 Introduction

Ashikhmin introduced a new tone mapping operator in 2002 [8], which we implemented within the scope of this thesis. The operator is categorized as a local tone mapping technique. Its characteristic is that the local adaptation level, which signals absolute brightness, and the local contrast are handled within one framework but dealt with separately.

5.2 Overview of the Method

A local contrast c at a pixel (x, y) is defined as

    c(x, y) = L(x, y) / L_a(x, y) - 1                                   (5.1)

where L(x, y) is the pixel luminance and L_a(x, y) is the local adaptation level, an average luminance over some neighborhood around (x, y) whose size differs from pixel to pixel. L_a(x, y) cannot simply be an average over a constant-size neighborhood because of the well-known halo problem. The size of the neighborhood is adjusted while computing the adaptation luminance: in a smooth region it can be considerably larger than in a high-contrast one. This adjustment can be done by applying a simple constraint on the allowed output contrast signal associated with the

neighborhood of a given size. How to estimate the L_a image is explained in detail in Section 5.3.

Assume the local adaptation level L_a has been successfully calculated. The next step is to apply a tone mapping function TM to L_a in order to create a tone-mapped adaptation image TM(L_a(x, y)). The purpose of the TM function is to compress a high dynamic range image to the display range while trying to convey the overall impression of brightness. The argument of the TM function should be a luminance value only. It is the local adaptation level L_a in most cases, but applying the TM function directly to the original image (i.e., TM(L(x, y))) can also produce sufficient results in some cases. The details of the TM function are described in Section 5.4.

Once a local adaptation level L_a(x, y) and a tone-mapped adaptation image TM(L_a(x, y)) are obtained, the final pixel value can be calculated from these two values. Using the requirement of local contrast preservation, c_d(x, y) = c(x, y) (the subscript d refers to display, i.e., tone-mapped, quantities), and Equation (5.1), the formula for the final mapping is

    L_d(x, y) = L(x, y) · TM(L_a(x, y)) / L_a(x, y).                    (5.2)

Equation (5.2) can produce visually pleasing high-contrast images. To preserve visible contrast rather than the contrast given by Equation (5.2), a visible contrast c_v is defined as

    c_v(x, y) = (L(x, y) - L_a(x, y)) / TVI(L_a(x, y))                  (5.3)

where TVI(L_a(x, y)) is the threshold-vs-intensity function, which provides the just noticeable intensity difference for a given adaptation level L_a (see Section 2.3.2). Using Equation (5.3), the final pixel value is computed as

    L_d(x, y) = TM(L_a(x, y)) + [TVI(TM(L_a(x, y))) / TVI(L_a(x, y))] · (L(x, y) - L_a(x, y)).    (5.4)

Section 5.4 describes how to select appropriate TVI functions. The subsequent steps of Ashikhmin's tone mapping operator can be summarized as follows.

65 5.3. LOCAL ADAPTATION LEVEL 49 Obtaining luminance channel and constructing a Gaussian pyramid for the next step (see Section 5.3.3). Computing local adaptation luminance L a (x, y) (see Section 5.3). Applying tone mapping function T M onto L a (x, y) (see Section 5.4). Computing final pixel luminance by Equation (5.2) or Equation (5.4). Re-assembling color image by applying the scaling obtained for luminance to each of the original RGB channels. Correcting gamma values. 5.3 Local Adaptation Level Band-Limited Local Contrast Ashikhmin proposes an approach based upon balancing two opposing requirements faced by HVS: keeping the local contrast signal within reasonable bounds while maintaining enough information about image details. A local adaptation level (LAL) can be computed as an average luminance over some neighborhood around every pixel of an image. Larger neighborhood increase the range of contrast and preserve details better. However, large neighborhood in a high contrast image does not provide a good result. Therefore, the key of LAL estimation is that it is necessary to find the largest size of neighborhood which consists of some limit on the generated contrast signal, or, equivalently, over the largest neighborhood which is still sufficiently uniform not to generate excessive contrast signals [8]. Equations (5.1) and (5.3) can estimate the maximum output signal, but they are not efficient ways because adaptation level should be calculated at every pixel with its neighborhood every time. Additionally, smoothly varying local adaptation is wanted, but it is hard to be obtained by using single pixel sensitive measure such as Equation (5.1). Instead of that, Ashikhmin suggests to use a measure of

neighborhood uniformity. His key observation is that, for a uniform neighborhood, increasing the size of the neighborhood does not significantly affect its average luminance value. Perceptually, multiscale local contrast is a well-known feature of the HVS, and determining the band-limited local contrast lc at every (x, y) of an image is an appropriate way to characterize the effect of the local adaptation level. It is usually calculated as the ratio of two lowpass-filtered images whose widths are related by a factor of 2:

    lc(s, x, y) = (G_s(L(x, y)) - G_2s(L(x, y))) / G_s(L(x, y))         (5.5)

where G_s(L) denotes the result of applying a Gaussian filter of width s to the original image L.

LAL Calculation

The limit of this band-limited contrast must be decided. The calculation starts from the smallest neighborhood, i.e., a lowpass filter with width s = 1 in pixel units. Then the band-limited contrast lc is calculated by Equation (5.5). If it is larger than the upper bound on the contrast, the adaptation value at that point can be the pixel itself. Otherwise, the computation continues by incrementing s by 1. This procedure finishes either when lc reaches the ceiling of the local contrast value or when s reaches the maximum neighborhood size. Ashikhmin suggests s_max = 10 pixels; it is found that increasing s_max increases the potential for false intensity jumps in the adaptation image. In addition, lc is assumed to be a continuous function of s, therefore linear interpolation can locate s more accurately. The upper bound of lc is set to 0.5 by Ashikhmin for most images; however, for some images a different ceiling value works better. Once s is decided at (x, y), L_a(x, y) is simply taken to be G_s(L(x, y)). An example of an L_a image is shown in Figure 5.1.
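A direct sketch of this search is given below. It follows Equation (5.5) and the procedure just described, but filters the image at every width instead of using the Gaussian pyramid and linear interpolation of Section 5.3.3; equating the filter width s with the Gaussian sigma and the helper names are simplifying assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_adaptation(L, lc_ceiling=0.5, s_max=10):
    """Estimate the local adaptation level L_a (Section 5.3.2).

    For every pixel the neighborhood width s grows from 1 pixel upwards as long
    as the band-limited contrast of Equation (5.5) stays below the ceiling; the
    adaptation level is the Gaussian-filtered value at the last accepted width.
    """
    G = {s: gaussian_filter(L, sigma=s) for s in range(1, 2 * s_max + 1)}
    La = L.astype(np.float64).copy()         # width 0: the pixel itself
    growing = np.ones(L.shape, dtype=bool)   # pixels whose neighborhood may still grow
    for s in range(1, s_max + 1):
        lc = np.abs(G[s] - G[2 * s]) / np.maximum(np.abs(G[s]), 1e-9)
        accept = growing & (lc < lc_ceiling)
        La[accept] = G[s][accept]            # neighborhood of width s is still uniform enough
        growing = accept                     # pixels that exceeded the ceiling stop here
    return La
```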

67 5.3. LOCAL ADAPTATION LEVEL 51 (a) The original image (b) L a image Figure 5.1: An example of L a processing. (Proposed Burswood Hotel Suite Refurbishment. Interior design - the Marsh Partnership, Perth, Australia; computer simulation - Lighting Images, Perth, Australia. Copyright 1995 Simon Crone.)

Gaussian Pyramid

Overview

Because applying the Gaussian filter at every pixel with varying widths takes an extremely long time, a Gaussian pyramid (the Burt-Adelson image pyramid) is used to speed up the LAL computation. The Gaussian pyramid was first introduced by Adelson and Burt in 1981 [2]. It is an image encoding technique in which a Gaussian-like operator serves as the basis function. It reduces an image g_0 to a new image g_1 whose resolution and sample density are decreased with respect to g_0. Filtering is performed with a Gaussian-like weighting function, which is called the generating kernel. By iterating the same sampling step, a sequence g_0, g_1, g_2, ... is produced, and this sequence is called the Gaussian pyramid.

Generating Gaussian Pyramid

The original image becomes the base of the Gaussian pyramid, g_0, whose level is 0. The next level, level 1, is the image g_1, which is a lowpass-filtered version of g_0. Each value of g_1 is calculated as a weighted average of values in g_0. In the same manner, the l-th level, g_l, is obtained by lowpass filtering g_{l-1}. A graphical representation of this process in one dimension is shown in Figure 5.2. The general form of the Gaussian pyramid generation is written mathematically as

    g_l(x, y) = Σ_{i=-2}^{2} Σ_{j=-2}^{2} w(i, j) g_{l-1}(2x + i, 2y + j).      (5.6)

This level-to-level averaging process is named REDUCE by Adelson and Burt [2]. Equation (5.6) can be written as g_l = REDUCE(g_{l-1}). The averaging process over the neighborhood uses a pattern of weights which is described in the following section.

The Generating Kernel

The same pattern of weights w (5 × 5 for this work) is applied in each level of a pyramid. This pattern of weights is called the generating kernel [2] and must

Figure 5.2: Gaussian pyramid construction in one dimension (adapted from Burt and Adelson [18]). g_0 is the original image and g_l = REDUCE[g_{l-1}].

satisfy several constraints. First, w is separable:

    w(i, j) = Ŵ(i) Ŵ(j).                                                (5.7)

Secondly, Ŵ is normalized and symmetric:

    Σ_{k=-2}^{2} Ŵ(k) = 1                                               (5.8)

    Ŵ(k) = Ŵ(-k)                                                        (5.9)

for k = 0, 1, 2. In addition, there is one more constraint, equal contribution, which is defined by Burt [17]. Let w(0) = a, w(-1) = w(1) = b, and w(-2) = w(2) = c. Equal contribution requires

    a + 2c = 2b.                                                        (5.10)

All of the constraints (5.7)–(5.10) are satisfied when w(0) = a, w(-1) = w(1) = 1/4, and w(-2) = w(2) = 1/4 - a/2.
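A small sketch of the generating kernel and the REDUCE step of Equation (5.6) is given below; the value a = 0.4 anticipates the choice discussed in the next subsection, and the helper names are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve

def generating_kernel(a=0.4):
    """Separable 5x5 generating kernel w(i, j) = W(i) * W(j), Equations (5.7)-(5.10)."""
    W = np.array([0.25 - a / 2.0, 0.25, a, 0.25, 0.25 - a / 2.0])
    return np.outer(W, W)

def reduce_level(g_prev, a=0.4):
    """One REDUCE step of Equation (5.6): filter with the generating kernel,
    then keep every second sample in each direction."""
    filtered = convolve(g_prev, generating_kernel(a), mode='nearest')
    return filtered[::2, ::2]

def gaussian_pyramid(g0, levels=4):
    """Build the sequence g_0, g_1, ..., g_levels."""
    pyramid = [np.asarray(g0, dtype=np.float64)]
    for _ in range(levels):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid
```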

70 54 CHAPTER 5. ASHIKHMIN S TONE REPRODUCTION Equivalent Weighting Functions How can we find an appropriate a of the generating kernel? Iterative pyramid construction is equivalent to convolving an original image g 0 by using a set of weighting functions h l : g l = h l g 0. The size of the equivalent weighting function doubles level by level. Equivalent weighting functions for the levels 1, 2, 3, and infinity of the Gaussian pyramid are shown in Figure 5.3 in which a = 0.4 and characteristic shapes with four different a s are shown in Figure 5.3. The equivalent weighting functions are particularly Gaussian-like when a = 0.4. Finally, the 5 5 generating kernel is set as shown Figure 5.3: Left: The equivalent weighting functions h l (x) in the levels 1, 2, 3, and infinity of the Gaussian pyramid with a = 0.4 (adapted from Burt and Adelson [18]). Note that the axis scales are adjusted by factors of 2 for comparison. The resulting equivalent weighting functions are very similar to the Gaussian probability density functions. Right: The equivalent weighting functions with a = 0.6, a = 0.5, a = 0.4, and a = 0.3 (adapted from Burt and Adelson [18]). The function with a = 0.4 is particularly Gaussian-like.

in Table 5.1.

        0.0025   0.0125   0.0200   0.0125   0.0025
        0.0125   0.0625   0.1000   0.0625   0.0125
        0.0200   0.1000   0.1600   0.1000   0.0200
        0.0125   0.0625   0.1000   0.0625   0.0125
        0.0025   0.0125   0.0200   0.0125   0.0025

Table 5.1: The 5 × 5 generating kernel (the outer product of Ŵ = [0.05, 0.25, 0.4, 0.25, 0.05] for a = 0.4).

Linear Interpolation and Results

Since the size of the generating kernel is 5 × 5 and the size of the equivalent weighting functions doubles level by level, level l of the Gaussian pyramid, g_l, is the same as the original image convolved with a kernel of size 5l. In order to calculate the band-limited local contrast defined in Equation (5.5), the levels of the Gaussian pyramid need to be interpolated. They are simply interpolated linearly by using the ratio of the heights of the levels. An example of a linearly interpolated Gaussian pyramid is shown in Figure 5.4.

5.4 Tone Mapping Function

Perceptual Capacity

The purpose of a tone mapping function is to compress the range of luminances of an HDR image into the displayable range. In order to create a tone mapping function, a new notion, the perceptual capacity of a range of luminance values, is introduced by Ashikhmin. Its intuition is that human sensitivity to luminance changes, which is given by the TVI function, provides a natural scaling factor for a given small range of luminances ∆L; the capacity is defined as ∆L/TVI(L_a), no matter where in the range of luminances it lies.

How can we deal with the TVI functions? Threshold sensitivity does not depend on luminance but on the local adaptation value L_a. Most tone mapping operators use the simplest assumption that there exists a single L_a value for the

whole image, but as discussed in [73] and [23], L_a is highly localized. A low-contrast image could use a single L_a value for the whole image, but a high-contrast image must have different L_a values in different parts of the image. Ashikhmin suggests taking the world luminance L as an approximate measure of the adaptation level. Then an auxiliary capacity function C for a single world luminance value L is defined as

    C(L) = ∫_0^L 1 / TVI(l) dl                                          (5.11)

representing the perceptual capacity of the world luminance range from 0 to L. For the display image, the common approximation of a single adaptation value L_ad is used:

    C_d(L) = L / TVI(L_ad).                                             (5.12)

Different TVI functions can be used for the display and world perceptual capacities. The TVI function for cones and rods is shown in log-log space in Figure 5.5. The TVI function for cone vision is appropriate for display luminance; for world luminance, either the cone or the rod TVI function can be more appropriate. Ashikhmin follows the rule established by Ward et al. [111], who simply use a single function combined at the crossing point C in Figure 5.5. Because integrating Equation (5.11) is too costly, Ashikhmin simplifies the TVI shape into four linear segments AB, BC, CD, and DE, as shown in Figure 5.5. A more accurate approximation could work better, but for this tone mapping operator this simplification is sufficient. With that approximation, the world capacity function becomes (with the segment constants of [8])

    C(L) = L / 0.0014                               if L < 0.0034
    C(L) = 2.4483 + log10(L / 0.0034) / 0.4027      if 0.0034 ≤ L < 1
    C(L) = 16.563 + (L - 1) / 0.4027                if 1 ≤ L < 7.2444
    C(L) = 32.0693 + log10(L / 7.2444) / 0.0556     otherwise

where the unit of L is cd/m².

Tone Mapping Function

The mapping operation is based upon the concept that world luminances are mapped into display luminances according to their relative positions on the corresponding perceptual scales. This principle is written mathematically as

    (C_d(L_d) - C_d(L_d,min)) / (C_d(L_d,max) - C_d(L_d,min)) = (C(L) - C(L_min)) / (C(L_max) - C(L_min)).    (5.13)

For any world luminance L, Equation (5.13) defines its corresponding display luminance. Equations (5.12) and (5.13) lead to the tone mapping function

    L_d = TM(L) = LDMAX · (C(L) - C(L_min)) / (C(L_max) - C(L_min))

where LDMAX is the maximum displayable luminance and L_max and L_min are the maximum and minimum world luminance values, respectively. Ashikhmin suggests LDMAX = 100 cd/m², but it can be larger (e.g., LDMAX = 500 cd/m²) nowadays. For some images, applying this function directly to the pixel luminances leads to good results. An example is shown in Figure 5.6.

5.5 Complete Procedure

Calculating New RGB Values

Firstly, the L_a image is produced. Secondly, the tone mapping function TM is applied to L_a as TM(L_a(x, y)) for each pixel. Lastly, the final pixel value is calculated by using Equation (5.2) (or Equation (5.4)). This provides the new luminance value for each pixel. Then, multiplying each of the original RGB components by L_d(x, y)/L(x, y) produces the new RGB values.

Gamma Correction

The intensity of light on a monitor is usually not a linear function of the applied signal. It is a power function, and gamma γ is a numerical parameter describing the nonlinearity of the intensity reproduction, i.e., it is the exponent of that function.

74 58 CHAPTER 5. ASHIKHMIN S TONE REPRODUCTION In [91], it is presented as ( V v0 L = l where L is the intensity, V is the cathode-ray tube (CRT) drive voltage, v 0 is the voltage error in setting the black level, and l 0 is the combined effect of black-level setting error, stray light, and measurement instrument offset. A detailed survey about l 0 and v 0 is presented by Roberts [91]. What is the gamma correction and why is it necessary? Gamma correction is the process of compensating for the nonlinearity, which is mentioned above, in order to achieve correct reproduction of intensity. It is computed by using hardware lookup tables on the fly on their way to the display. The voltage between 0 and 1 which is required to display RGB intensity between 0 and 1 is given by signal = intensity 1 γ. Details about hardware lookup tables are explained by Poynton [88]. Ashikhmin suggests to set γ = 2.4 for tone mapped images which are generated by his algorithm. ) γ
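Putting Sections 5.2–5.5 together, the following sketch outlines the luminance part of the procedure. It assumes that an L_a image is already available (e.g., from the local adaptation sketch in Section 5.3), uses the piecewise capacity function as reconstructed in Section 5.4.1, and applies Equation (5.2) followed by gamma correction; it is an illustration of the procedure, not Ashikhmin's reference implementation.

```python
import numpy as np

def capacity(L):
    """World perceptual capacity C(L) of Section 5.4.1 (piecewise approximation)."""
    L = np.atleast_1d(np.asarray(L, dtype=np.float64))
    C = np.empty_like(L)
    a = L < 0.0034
    b = (L >= 0.0034) & (L < 1.0)
    c = (L >= 1.0) & (L < 7.2444)
    d = L >= 7.2444
    C[a] = L[a] / 0.0014
    C[b] = 2.4483 + np.log10(L[b] / 0.0034) / 0.4027
    C[c] = 16.563 + (L[c] - 1.0) / 0.4027
    C[d] = 32.0693 + np.log10(L[d] / 7.2444) / 0.0556
    return C

def tone_map(L, La, ldmax=100.0, gamma=2.4):
    """TM applied to the adaptation image, contrast preservation by Equation (5.2),
    then gamma correction. Returns a display signal in [0, 1]."""
    Cmin, Cmax = capacity(L.min()), capacity(L.max())
    TM_La = ldmax * (capacity(La) - Cmin) / (Cmax - Cmin)    # tone-mapped adaptation image
    Ld = L * TM_La / np.maximum(La, 1e-9)                    # Equation (5.2)
    return np.clip(Ld / ldmax, 0.0, 1.0) ** (1.0 / gamma)    # gamma correction

# New RGB values (Section 5.5.1): scale each original channel by Ld/L,
# e.g. R_new = R * Ld / L, and likewise for G and B.
```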

75 5.5. COMPLETE PROCEDURE 59 (a) Level 0 (The original image) (b) Level 1 (c) Level 2 (d) Level 3 (e) Level 4 Figure 5.4: An example of the Gaussian pyramid of the hotel room image. The level 0 is same as the original image and the level 4 is generated by using the generating kernel from it. The levels 1, 2, and 3 are linearly interpolated between them. Note that the RGB values for the levels 1, 2, 3, and 4 are ignored and only their luminance values are generated.

76 60 CHAPTER 5. ASHIKHMIN S TONE REPRODUCTION Figure 5.5: TVI function in log-log space (adapted from Ferwerda [37] and Ashikhmin [8]). Ashikhmin used the TVI function with linear approximation which is drawn by the lines AB, BC, CD, and DE. Figure 5.6: An example of tone mapped image by using the function T M(L). T M function is directly applied onto the original pixel values.

77 Chapter 6 Fast Bilateral Filtering 6.1 Introduction In 1998, Tomasi and Manduchi presented a non-linear filter, the bilateral filtering, for both gray and color images inspired by the Gaussian blur [101]. Their intuition is that images vary slowly over space, therefore pixels which are close to each other have similar values and averaging them together is appropriate filtering. However, this assumption of slow variation fails at edges of an image. To solve this problem, their method applies weights onto pixels as the Gaussian blur does, but if the difference of intensities is too big, it decreases the weights of pixels and prevents blurring across edges. Figure 6.1 shows examples of bilaterally (a) The original image. (b) Bilaterally filtered image after one iteration. (c) After five iterations. Figure 6.1: Examples of the bilaterally filtered images of Tomasi and Manduchi (adapted from Tomasi and Manduchi [101]). 61

filtered color images.

Based upon the bilateral filtering of Tomasi and Manduchi, Durand and Dorsey presented the fast bilateral filtering method to display HDR images in 2002 [31]. This method reduces the contrast of HDR images but preserves their details. It decomposes an image into two layers: a base layer and a detail layer. The base layer contains a contrast-reduced image and the detail layer preserves details. The final image is composed from those two layers. Durand and Dorsey place the bilateral filtering in the framework of robust statistics. Additionally, they improved the computational speed of the bilateral filtering, which is costly if implemented straightforwardly. They also computed the uncertainty of the output image, permitting the correction of doubtful values. Finally, they used the bilateral filtering to display high dynamic range images. This method is fast and stable and requires no parameter settings from users.

6.2 Bilateral Filtering

The bilateral filtering developed by Tomasi and Manduchi is an alternative to anisotropic diffusion. Anisotropic diffusion was presented by Perona and Malik in 1990 and was inspired by an interpretation of the Gaussian blur [87]. They introduced a new function, the edge-stopping function, which varies the conductance according to the image gradient. Perona and Malik introduced two expressions of the edge-stopping function g(x):

    g_1(x) = 1 / (1 + x²/σ²)                                            (6.1)

    g_2(x) = e^(-x²/σ²)                                                 (6.2)

where σ is a scale parameter in the intensity domain specifying which gradient intensity should stop diffusion. The discrete Perona-Malik diffusion equation for the value I_s at a pixel s is given as

    I_s^{t+1} = I_s^t + (λ/4) Σ_{p ∈ neighb_4(s)} g(I_p^t - I_s^t) (I_p^t - I_s^t)      (6.3)

where g is the edge-stopping function, t denotes the discrete time steps, neighb_4(s) is the 4-neighborhood of the pixel s, and λ is a scalar value determining the rate of diffusion. One disadvantage of anisotropic diffusion is that it takes a long time to process; additionally, its result depends upon the stopping time.

In 1998, Black et al. recast anisotropic diffusion in the framework of robust statistics [13], and Durand and Dorsey's bilateral filtering is derived from their analysis. Robust statistics develops estimators that are robust to outliers or deviations from the theoretical distribution [48, 46]. Black et al. tested the anisotropic diffusion which minimizes an energy over the whole image,

    min Σ_{s ∈ Ω} Σ_{p ∈ neighb_4(s)} ρ(I_p - I_s)                      (6.4)

where Ω is the whole image and ρ is an error norm. Equation (6.4) is solved by gradient descent for each pixel:

    I_s^{t+1} = I_s^t + (λ/4) Σ_{p ∈ neighb_4(s)} ψ(I_p - I_s)

where ψ is the derivative of ρ and t is a discrete time step variable. ψ is proportional to the influence function, which characterizes the influence of a sample on the estimate. Black et al. [13] defined g(x) = ψ(x)/x and tested several edge-stopping functions, such as the Huber function

    g_σ(x) = 1/σ     if |x| ≤ σ
             1/|x|   otherwise                                          (6.5)

used with scale σ, the Lorentz function

    g_σ(x) = 2 / (2 + x²/σ²)                                            (6.6)

used with scale σ/√2,

the Tukey function

    g_σ(x) = (1/2) [1 - (x/σ)²]²   if |x| ≤ σ
             0                     otherwise                            (6.7)

used with scale σ√5, and the Gauss function

    g_σ(x) = e^(-x²/(2σ²))                                              (6.8)

used with scale σ. Note that the value of σ should be changed accordingly for the Lorentz and Tukey functions in order to use a consistent scale across estimators. Black et al. found that Tukey's function yields more robust results.

Tomasi and Manduchi developed the bilateral filtering, a non-linear filter whose output is a weighted average of the input. As shown in Figure 6.2, it starts with Gaussian filtering with a spatial kernel f. The weights also depend upon a function g in the intensity domain in order to decrease the weights of pixels with large intensity differences. f is used for the spatial domain and g for the intensity domain; g is the edge-stopping function, similar to the one by Perona and Malik [87].

Figure 6.2: Bilateral filtering (adapted from Durand and Dorsey [31]).

The output of the bilateral filtering for a pixel s is given as

    J_s = (1/k(s)) Σ_{p ∈ Ω} f(p - s) g(I_p - I_s) I_p                  (6.9)

81 6.3. EDGE-PRESERVING SMOOTHING WITH ROBUST ESTIMATORS 65 where k(s) is a normalization term: k(s) = p Ω f(p s)g(i p I s ). The value at a pixel s is mainly influenced by pixels that are spatially close and have similar intensities. 6.3 Edge-Preserving Smoothing with Robust Estimators Although (6.3) uses I t p I t s as the derivative of I t in one dimension, it can also be considered as the 0-order difference between the two pixel intensities. Durand and Dorsey extended the 0-order anisotropic diffusion to a larger spatial support as I t+1 s = I t s + λ p Ω f(p s)g(i t p I t s)(i t p I t s) (6.10) where f is a spatial weighting function (e.g., the Gaussian), Ω is the whole image, and t is the discrete time variable. g is the anisotropic diffusion of Perona and Malik, which is called local diffusion in Durand and Dorsey s bilateral filtering, and g is zero except at the four neighbors of each pixel. Equation (6.10) defines a robust statistical estimator of M-estimators which is a generalized maximum likelihood estimator [46, 48]. M-estimators attempt to limit the influence of outliers by replacing the square of the residuals with a less rapidly increasing loss function of the data value and parameter estimate. If g is uniform (i.e., isotropic) and f is a Gaussian, (6.10) is a Gaussian blur. (6.9) defines an estimator based upon a weighted average of the data, which is called W-estimator [46]. Durand and Dorsey claim that the iterative formulation is an instance of iteratively reweighted least squares and that it is important because according to Hampel et al., M-estimators and W-estimators are essentially equivalent and solve the same energy minimization problem: min s Ω p(i s I p ) (6.11) p Ω

82 66 CHAPTER 6. FAST BILATERAL FILTERING or for each pixel s: ψ(i s I p ) = 0 (6.12) p Ω where ψ is the derivative of ρ [46]. W-estimators are an alternative of M- estimators. Each W-estimator has a characteristic weight function which represents the importance of each sample in its contribution to the estimation. Based upon the analysis of Black et al. [13], Durand and Dorsey define ψ as ψ(x) = g(x) x to find the original formulations. Because g 2 in (6.2) corresponds to the Gaussian influence function for the bilateral filtering, all of the edge-stopping functions for anisotropic diffusion can be used in the bilateral filtering. For influence functions, Durand and Dorsey had an observation over (6.5) (6.8). Then, they found that the Gaussian and Tukey influence functions are sufficient because they are more robust to outliers and better preserve edges. 6.4 Efficient Bilateral Filtering Introduction The direct implementation of the bilateral filtering requires O(n 2 ) time with n pixels in an image. Durand and Dorsey accelerate it by using two strategies. One is a piecewise-linear approximation for the intensity domain. Another one is a sub-sampling for the spatial domain Piecewise-Linear Bilateral Filtering It is well known that the Gaussian filtering can be highly accelerated by using the Fast Fourier Transform. However, this idea cannot be directly used for the bilateral filtering. For acceleration in the intensity domain, Durand and Dorsey discretized the set of signal intensities into segments {i j } and computed a linear

filter for each segment:

    J_s^j = (1/k_j(s)) Σ_{p ∈ Ω} f(p - s) g(I_p - i_j) I_p
          = (1/k_j(s)) Σ_{p ∈ Ω} f(p - s) h_p^j                         (6.13)

and

    k_j(s) = Σ_{p ∈ Ω} f(p - s) g(I_p - i_j) = Σ_{p ∈ Ω} f(p - s) g^j(p).     (6.14)

The final output for a pixel s is a linear interpolation between the outputs J_s^j of the two closest segment values i_j of I_s. This is called the piecewise-linear approximation of the original bilateral filter. Figure 6.3 shows its speed-up with 17 segments. As shown in the figure, the larger the spatial kernel, the more time the original bilateral filtering takes and the more computation time the piecewise-linear approximation saves.

Figure 6.3: Speed-up of the piecewise-linear acceleration for 17 segments (adapted from Durand and Dorsey [31]).
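The piecewise-linear approximation of Equations (6.13)–(6.14) can be sketched as follows; the segment spacing, the Gaussian influence function, and the hat-shaped interpolation weights are illustrative choices, and scipy's gaussian_filter stands in for the fast convolution used by Durand and Dorsey.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def piecewise_bilateral(I, sigma_s=16.0, sigma_r=0.4, n_segments=17):
    """Piecewise-linear approximation of the bilateral filter.

    For each intensity segment i_j a plain Gaussian convolution replaces the
    spatially varying kernel; the output is interpolated between the two
    nearest segments. g is the Gaussian influence function exp(-x^2 / (2 sigma_r^2))."""
    I = np.asarray(I, dtype=np.float64)
    segments = np.linspace(I.min(), I.max(), n_segments)
    step = segments[1] - segments[0]
    J = np.zeros_like(I)
    weight = np.zeros_like(I)
    for i_j in segments:
        g_j = np.exp(-((I - i_j) ** 2) / (2.0 * sigma_r ** 2))   # influence of segment i_j
        K_j = gaussian_filter(g_j, sigma_s)                      # k_j(s), Equation (6.14)
        H_j = gaussian_filter(g_j * I, sigma_s)                  # numerator of Equation (6.13)
        J_j = H_j / np.maximum(K_j, 1e-12)
        # hat-shaped interpolation weight: 1 at i_j, 0 at the neighboring segments
        w = np.clip(1.0 - np.abs(I - i_j) / step, 0.0, 1.0)
        J += w * J_j
        weight += w
    return J / np.maximum(weight, 1e-12)
```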

84 68 CHAPTER 6. FAST BILATERAL FILTERING Subsampling Durand and Dorsey found that downsampling accelerates their algorithm except the final interpolation. They selected the nearest-neighbor downsampling method because the histogram is not modified by it. The result of acceleration is shown in Figure 6.4. The downsampling factors 10 to 25 were tested and no artifacts were detected. Figure 6.4: Speed-up by downsampling for 17 segments and a image (adapted from Durand and Dorsey [31]). The value for the full-scale filtering is 173 sec Uncertainty As Tumblin et al. pointed that edge-preserving contrast reduction encounters small halo artifacts [104, 105], Durand and Dorsey s bilateral filtering had similar noises. By using the factor k of (6.9), which is the sum of the influence of each pixel, they detected pixels which need to be fixed. Then, they computed a lowpass version J of J by using a small Gaussian kernel and linearly interpolated J and J with the log of the uncertainty k.

85 6.5. CONTRAST REDUCTION Contrast Reduction At this point, Durand and Dorsey s method is not a tone reproduction in the sense of Tumblin and Rushmeier s definition [103] because it does not imitate human vision. Durand and Dorsey used their bilateral filtering method to reduce contrast of an image, i.e., to display a high dynamic range image. They divided an image into two layers: a base layer and a detail layer. A base layer is created by using the bilateral filtering and contains contrast reduced image. A detail layer is the division of the input intensity by the base layer and preserves detail. They compressed the range of the base layer with a scale factor in the log domain following Tumblin and Turk [104, 105]. In Durand and Dorsey s work, the scale factor is calculated such that the whole range of the base layer is compressed to a user-controllable base contrast. They found that a base contrast of 5 works well for most of the scenes, but for some situations, it needs to be changed. To deal with colors, Durand and Dorsey performed contrast reduction first and then recomposed colors. They suggest to calculate on the logs of pixel intensities. For filtering, they experimented over influence functions (6.5) (6.8) and found that Huber decreases halos compared to Gaussian but cannot remove them. Additionally, it was found that the results vary with the size of the spatial kernel. As a result, the best influence functions are Gaussian and Tukey to decompose an image accurately. For those two functions, the influence of their spatial kernel σ r is small on results as negligible. Durand and Dorsey found that σ r = 0.4 performs well after their experiments.
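A sketch of this base/detail contrast reduction is given below, assuming the piecewise_bilateral sketch from Section 6.4 is in scope; the intensity definition, the log10 domain, and the interpretation of the base contrast of 5 as a target ratio are assumptions rather than the authors' exact choices.

```python
import numpy as np

def contrast_reduction(R, G, B, base_contrast=5.0, eps=1e-6):
    """Base/detail contrast reduction in the spirit of Section 6.5 (a sketch only)."""
    intensity = (R + G + B) / 3.0 + eps           # a simple intensity proxy
    log_I = np.log10(intensity)
    base = piecewise_bilateral(log_I)             # base layer: bilateral-filtered log intensity
    detail = log_I - base                         # detail layer: log(input / base)
    # compress the base layer so its whole range corresponds to base_contrast : 1
    scale = np.log10(base_contrast) / (base.max() - base.min())
    log_out = (base - base.max()) * scale + detail
    out_I = 10.0 ** log_out
    ratio = out_I / intensity                     # recompose colors with the intensity ratio
    return R * ratio, G * ratio, B * ratio
```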


87 Chapter 7 Multivariate Statistical Methods 7.1 Introduction Before discussing our psychophysical experiment, this section presents an overview of multivariate statistical analyses. A variate is a generalization of the concept of a random variable that is defined without reference to a particular type of probabilistic experiment. It is defined as the set of all random variables that obey a given probabilistic law. Multivariate is a vector each of whose elements is a variate [113]. Multivariate statistical analysis has been increasingly popular to analyze a set of data. It can provide analysis when there are many independent variables (IVs) and dependent variables (DVs). If variables are measured (e.g., test scores), they are called dependent variables because they are dependent upon subjects responses. On the other hand, if they are manipulated or controlled (e.g., a teaching method), they are called independent variables or factors because they do not depend upon the initial reaction patterns, features, intentions, etc. of the subjects. There is a number of statistical methods as shown in Figure 7.1. Researchers choose an appropriate technique among them by a research question, numbers of DVs and IVs of a data set, and covariates. Multivariate statistics is an extension of univariate and bivariate statistics. Univariate statistics is an analysis of research with a single DV, but there can be multiple IVs. Analysis of variance (ANOVA) is the prime example of univariate statistics. Bivariate statistics are analyses with 71

88 72 CHAPTER 7. MULTIVARIATE STATISTICAL METHODS Figure 7.1: Choices among statistical techniques (adapted from Tabachnick [99]). two variables and study the relationship between them. Multivariate statistics provide analysis when there are many IVs and/or many DVs all correlated with one another to varying degrees [99]. Multivariate analysis of variance (MANOVA) is described in the following section.

89 7.2. UNIVARIATE ANALYSIS: ANALYSIS OF VARIANCE 73 Two effects of variables are considered for univariate, bivariate, and multivariate statistics: main effects and interaction effects. The idea of interaction was first presented by Fisher in 1926 [40]. It occurs if a relation between two or more variables is modified by other variables. In other words, the simplest interaction is that two IVs interact if the effect of one of the variables differs depending on the levels 1 of the other variable. This is a two-way interaction. The two-way interaction is that an effect for the factor A is modified by another factor B, the three-way interaction is that the two-way interaction between A and B is modified by the factor C, and so on. In many cases, five- or higher-way interactions are very difficult to interpret and not so meaningful [56]. Advantages of ANOVA over t test are that ANOVA can be tested with each factor although there is an interaction effect and that ANOVA can detect interaction effects between variables [98]. A main effect of an IV is the effect of the variable averaging over all levels of other variables in the experiment. ANOVA provides a significance test for the main effect of each variable in a design. If a main effect of an IV is significant, then a null hypothesis that there is no difference between means will be rejected. 7.2 Univariate Analysis: Analysis of Variance Introduction Analysis of variance (ANOVA) is used to test a hypothesis about differences of two or more means of variables. Mean µ is given by µ = P n i=1 x i n where x i is each value of the sample in the group and n is the sample size. To test a difference between two means, t test is usually selected. However, t test can only check differences of each pair of variables independently although there are more than two variables. Multiple t test is possible for such case, but it may lead to a severely high Type I error value [56]. Type I error is the mistake to reject a null hypothesis although the null hypothesis is correct. The mistake to accept a null hypothesis although it is wrong is called the Type II error. ANOVA has an advantage over t test such that it can test difference of several means without increasing the Type 1 The number of levels of an independent variable is the number of variations used in an experiment. An independent variable should have at least two levels.

I error rate. ANOVA returns the statistical significance, the probability value (p-value), with satisfactory accuracy. The p-value is the probability that the observed relationship or difference in a sample occurred by pure chance, and that in the population from which the sample was drawn no such relationship or difference exists. The higher the p-value, the less we can believe that the observed relation between variables in the sample is a reliable indicator of the relation between the respective variables in the population [98]. Commonly, if the p-value is less than 0.05 or 0.01, the result is called statistically significant.

ANOVA tests a null hypothesis and calculates the significance. A null hypothesis is a hypothesis that all of the population means are equal:

    H_0: µ_1 = µ_2 = ⋯ = µ_k                                            (7.1)

where µ_i is each population mean and k is the number of populations. A null hypothesis often states the reverse of what the experimenter actually believes. ANOVA tests a null hypothesis by comparing two estimates of the variance σ², where a variance is a measure of data spread and is given as

    σ² = Σ_{i=1}^{N} (x_i - µ)² / N

with the population mean µ and the population size N. The idea of variances was first introduced by Fisher in 1918 [39]. The standard deviation is the square root of the variance.

One estimate is the Mean Square Error (MS_wg), and it is based upon the variances within groups. MS_wg estimates σ² whether the null hypothesis is true or not. Another estimate is the Mean Square Between (MS_bg), and it is based upon the variance of the sample means. The logic of ANOVA is the following: if the null hypothesis is true, both MS_wg and MS_bg return about the same values (E[MS_bg] = E[MS_wg] = σ²) because they are estimates of the same quantity; however, if the null hypothesis is false, then MS_bg returns something larger than σ² (E[MS_bg] > E[MS_wg]). Because MS_bg is based upon the sample means, the greater the variance of the sample means, the greater MS_bg. Mathematically, the expected value of MS_bg is written as

    E[MS_bg] = σ² + N Σ_i (µ_i - µ)² / (k - 1)                          (7.2)

where µ_i is the i-th population mean, µ is the mean of the population means, k is the number of population means, and N is the number of subjects [56]. If the MS_bg value is sufficiently greater than that of MS_wg, the null hypothesis is rejected; otherwise, it is accepted. To determine whether MS_bg is sufficiently greater, a significance test, which returns the p-value, is done with the statistic F (named after Ronald A. Fisher). The sampling distribution of F is known as the distribution of the ratio of two estimates of variance and has two parameters: the numerator degrees of freedom and the denominator degrees of freedom. A degree of freedom is the number of independent pieces of information that go into the estimate of a parameter [56]. If the null hypothesis is false, the F ratio takes a relatively large value and the p-value is lower than the significance level. As described above, the significance level is usually 0.05 or 0.01.

To calculate the difference between means, ANOVA compares variances by using sums of squares (SS). SS is the variation among the subjects in an experiment. SS is the numerator of a variance and is computed as

    SS = Σ_{i=1}^{N} (x_i - µ)²                                         (7.3)

where the x_i are the scores of the subjects and µ is the mean. To run ANOVA soundly, there are three assumptions about a set of data:

- All samples are normally distributed. A normal distribution is symmetric, and more data are concentrated in the middle than in its tails. A normal distribution is defined by the mean µ and the standard deviation σ, and its height is given as (1 / (σ √(2π))) e^(-(x - µ)² / (2σ²)) [56].
- All samples have equal variance.
- All observations are mutually independent.

It is known that ANOVA is robust to modest violations of the first two assumptions [69].
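The logic described above — MS_wg and MS_bg both estimate σ² under the null hypothesis, while MS_bg grows when the means differ — can be illustrated numerically; the group sizes and means below are arbitrary illustrative values.

```python
import numpy as np
rng = np.random.default_rng(0)

def mean_squares(groups):
    """Return (MS_wg, MS_bg) for a list of equally sized groups."""
    k, n = len(groups), len(groups[0])
    grand = np.mean(np.concatenate(groups))
    ms_wg = sum(np.sum((g - g.mean()) ** 2) for g in groups) / (k * n - k)
    ms_bg = n * sum((g.mean() - grand) ** 2 for g in groups) / (k - 1)
    return ms_wg, ms_bg

# Under H0 (all population means equal) both estimate sigma^2 = 1:
same = [rng.normal(0.0, 1.0, 30) for _ in range(3)]
print(mean_squares(same))        # both values are close to 1

# With different population means MS_bg grows while MS_wg does not:
diff = [rng.normal(m, 1.0, 30) for m in (0.0, 1.0, 2.0)]
print(mean_squares(diff))
```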

Between-Subjects ANOVA

There are several types of ANOVA designs. The most basic design is called one-way between-subjects ANOVA. If a different group of subjects is used for each level of an IV, the design is called between-subjects [56]. An example of the subject assignment in a one-way between-subjects ANOVA design is shown in Table 7.1. The core idea of ANOVA is that variances can be partitioned by partitioning SS.

                          Tests
             T_1            T_2            T_3
Values       x_11 (S_1)     x_12 (S_4)     x_13 (S_7)
(by          x_21 (S_2)     x_22 (S_5)     x_23 (S_8)
subjects)    x_31 (S_3)     x_32 (S_6)     x_33 (S_9)
Means        µ_1            µ_2            µ_3            Grand mean µ

Table 7.1: An example of subject assignment for one-way between-subjects ANOVA.

As shown in Table 7.1, let x_ij be the value in the i-th row and the j-th column of the subject assignment table. Then the difference between each score and the grand mean, x_ij - µ, can be considered the sum of two components: the difference between the score and its group mean, and the difference between the group mean and the grand mean. It is written mathematically as

    x_ij - µ = (x_ij - µ_j) + (µ_j - µ)                                 (7.4)

Each component of (7.4) can be squared and summed up in order to produce the sums of squares within groups and between groups. The basic partition holds because, conveniently, the cross-product terms produced by squaring and summing cancel each other out [99]. Therefore, the SS partition is written as

    Σ_i Σ_j (x_ij - µ)² = Σ_i Σ_j (x_ij - µ_j)² + N Σ_j (µ_j - µ)²      (7.5)

where N is the sample size. Note that each component of (7.5) is a special case of (7.3). (7.5) shows SS_total = SS_wg + SS_bg, representing that the total SS of differences between scores and the grand mean can be partitioned into two parts, SS within

groups and SS between groups. This is shown in Figure 7.2.

Figure 7.2: Partition of the sums of squares and degrees of freedom in one-way between-subjects ANOVA (SS_total splits into SS_bg with df = k - 1 and SS_wg with df = N - k). N is the total sample size and k is the number of groups.

Degrees of freedom in ANOVA can be partitioned in the same way as SS: df_total = df_wg + df_bg. The total degrees of freedom are the total sample size minus 1, because one degree of freedom is lost when the grand mean is estimated: df_total = N - 1, where N is the total sample size [99]. The within-groups degrees of freedom are the total sample size minus k, because degrees of freedom are lost when the means of each of the k groups are estimated: df_wg = N - k [99]. The between-groups degrees of freedom are defined as df_bg = k - 1, because one degree of freedom is lost when the grand mean is estimated [99].

In ANOVA, a variance is obtained by dividing an SS by its degrees of freedom, and it is called the mean square (MS). As with SS and df, ANOVA produces three variances: the total variability among scores MS_total, the variability within groups MS_wg, and the variability between groups MS_bg. MS_wg and MS_bg provide the F ratio to test the null hypothesis H_0 in (7.1). The F ratio is computed as

    F = MS_bg / MS_wg.                                                  (7.6)

After the F value is calculated, it is compared with a critical F value obtained from an F table with numerator df_bg = k - 1 and denominator df_wg = N - k. If the obtained F exceeds the critical F, the null hypothesis is rejected [99]. One-way between-subjects ANOVA can be extended to factorial designs. The partition of SS and df and the F value computation are shown in [99].
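The SS partition of Equation (7.5) and the F ratio of Equation (7.6), together with a p-value from the F distribution, can be sketched as follows; the example scores are invented solely for illustration.

```python
import numpy as np
from scipy import stats

def one_way_anova(groups):
    """One-way between-subjects ANOVA: SS partition (7.5), F ratio (7.6), p-value."""
    scores = np.concatenate(groups)
    grand_mean = scores.mean()
    k, N = len(groups), scores.size
    ss_bg = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_wg = sum(np.sum((np.asarray(g) - np.mean(g)) ** 2) for g in groups)
    df_bg, df_wg = k - 1, N - k
    F = (ss_bg / df_bg) / (ss_wg / df_wg)
    p = stats.f.sf(F, df_bg, df_wg)     # area above F under the F distribution
    return F, p

# Example: three test conditions with three subjects each, as in Table 7.1
print(one_way_anova([[3.1, 2.8, 3.5], [4.0, 4.4, 3.9], [5.2, 4.9, 5.5]]))
```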

94 78 CHAPTER 7. MULTIVARIATE STATISTICAL METHODS Within-Subjects ANOVA Another design is that the results are obtained from the same subjects with different conditions. Such design is called within-subjects ANOVA and our experiment is classified in this type of design. An example of one-way within-subjects ANOVA is shown in Table 7.2. Subjects Tests T 1 T 2 T 3 S 1 S 1 S 1 S 1 S 2 S 2 S 2 S 2 S 3 S 3 S 3 S 3 Table 7.2: An example of subjects assignment for a one-way within-subjects ANOVA. In this design, computations of SS and MS are done in the same way as that of between-subjects designs. Then, SS wg is partitioned into two parts: the individual differences due to subjects SS subject and the interaction of individual differences with test methods SS subject test as shown in Figure 7.3. F ratio is computed with SStotal SSwg SSbg SSsubject SSsubject-test df = t - 1 df = s - 1 df = (s - 1)(t - 1) Figure 7.3: Partition of the sums of squares and degrees of freedom in one-way within-subjects ANOVA. MS subject test, the interaction of individual differences with test methods, and

MS_bg as

    F = MS_bg / MS_subject×test,                                        (7.7)

and the degrees of freedom are df_bg = t - 1, df_subject = s - 1, and df_subject×test = (t - 1)(s - 1), where t and s are the numbers of test methods and subjects, respectively. One-way within-subjects ANOVA can be extended to factorial designs. Additionally, it is possible to use between-subjects and within-subjects designs together. Their partitions of SS and df and their F value computations are shown in [99].

Analysis of Covariance

Covariance is a measure of data spread. While variance operates on one variable, covariance usually operates on two variables. It is given mathematically as

    cov(X, Y) = Σ_{i=1}^{N} (x_i - µ_x)(y_i - µ_y) / N

where x_i and y_i are the values of groups X and Y, the µ's are their means, and N is the number of samples in one group. Covariance is obviously commutative, and the covariance between a variable and itself is its variance. When there are more than two variables, a covariance matrix is obtained whose diagonal holds the variances and whose off-diagonal entries hold the covariances of each pair of variables.

Analysis of covariance (ANCOVA) is an extension of ANOVA. One-way ANCOVA is designed to assess group differences on a single DV after the effects of one or more covariates are statistically removed. The primary question of ANCOVA is the same as that of ANOVA: are mean differences among groups on the adjusted DV likely to have occurred by chance? ANCOVA is useful for three major purposes: to increase the sensitivity of the test of main effects and interaction effects, to adjust the means of DVs and covariates, and to assess one DV after adjustment for other DVs that are treated as covariates. The third purpose occurs in multivariate analysis of variance. ANCOVA can answer questions about the main effects and interaction effects of IVs and the effects of covariates. As in ANOVA, ANCOVA requires several assumptions in order to run properly. The specific assumptions for ANCOVA are reliability of covariates, linearity between covariates and between covariates and

DVs, and homogeneity of regression, in addition to the assumptions of ANOVA. Regression is explained in the next section. ANCOVA can use a factorial design if there are multiple IVs. The desirability and use of covariates are the same as in the one-way design.

7.3 Bivariate Analysis: Correlation and Regression

Introduction

Univariate analysis assesses the relation between a continuous DV and discrete levels of an IV. However, some researchers may be interested in measuring the strength of association between two continuous variables. In such a case, bivariate analysis is chosen to analyze a set of data. In this section, two bivariate statistical analyses are presented: correlation and regression. Correlation measures the size and direction of the relationship between two or more variables. Regression predicts a score on one variable from a score on the other.

Correlation

Galton introduced the concept of correlation in 1888 [42]. Since then, several measures of correlation have been proposed, and the Pearson Product Moment Correlation (called Pearson r for short) is the most widely used one [85]. It is given as

    r = (N ΣXY - (ΣX)(ΣY)) / √([N ΣX² - (ΣX)²][N ΣY² - (ΣY)²])          (7.8)

where Pearson r is the average cross-product of the standardized X and Y variable scores [99]. It can also simply be written as

    r = Σ Z_X Z_Y / (N - 1)                                             (7.9)

where the scores are converted into z scores. The z scores, Z_X and Z_Y, are given as

Z_X = (X − μ_X) / σ_X   (7.10)
Z_Y = (Y − μ_Y) / σ_Y   (7.11)

where the μ's and σ's are the means and standard deviations of X and Y. Pearson r is independent of the scale of measurement and independent of the sample size [99].

Figure 7.4: Examples of correlation coefficients of Pearson r (adapted from HyperStat [56]).

The correlation coefficient ranges between -1.0 and 1.0. A correlation coefficient of 0.0 means there is no relationship or predictability between the X and Y variables. When it is -1.0 or 1.0, there is perfect predictability of one score when the other is known. A correlation coefficient of -1.0 is called a perfect negative correlation, which means that as one variable increases, the other decreases. With 1.0, as one variable increases, the other also increases; this is called a perfect positive correlation. The difference between covariance and correlation is that covariance depends on the units of measurement but correlation does

not. Examples of correlations are shown in Figure 7.4.

Regression

The size and direction of the relationship between variables are discovered by correlation. Regression, on the other hand, predicts a score on one variable from a score on the other. In bivariate regression (i.e., with two variables), where Y is predicted from X, a straight line between the two variables is found. The best-fitting straight line goes through the means of X and Y and minimizes the sum of squared distances between the data points and the line [99]. The best-fitting straight line has the form

Y' = A + BX   (7.12)

where Y' is the predicted score, A is the value of Y when X is 0.0, B is the slope of the line, and X is the value from which Y is to be predicted. The difference between the predicted and the obtained value of Y at each value of X is called an error of prediction. The best-fitting straight line minimizes the squared errors of prediction. To solve (7.12), B is given as

B = [N ΣXY − (ΣX)(ΣY)] / [N ΣX² − (ΣX)²].   (7.13)

B is the ratio of the covariance of the variables to the variance of the one from which predictions are made. (7.8) and (7.13) are very similar, but their denominators differ: in correlation, the variances of both variables enter the denominator, while in regression only the variance of the predictor variable is used. A is given as

A = μ_Y − B μ_X.   (7.14)

It is the mean of the predicted variable minus the product of the regression coefficient B and the mean of the predictor variable.
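As an illustration only (the thesis itself used the Matlab Statistics Toolbox for all analyses), the following Python sketch evaluates (7.8), (7.13), and (7.14) on a small made-up data set and cross-checks the correlation against scipy.

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.7, 5.2, 5.8])
n = len(x)

# Pearson r via the raw-score formula (7.8).
num = n * (x * y).sum() - x.sum() * y.sum()
den = np.sqrt((n * (x ** 2).sum() - x.sum() ** 2) * (n * (y ** 2).sum() - y.sum() ** 2))
r = num / den

# Regression coefficients via (7.13) and (7.14): Y' = A + B X.
B = num / (n * (x ** 2).sum() - x.sum() ** 2)
A = y.mean() - B * x.mean()

r_check, _ = stats.pearsonr(x, y)   # cross-check against scipy
print(f"r = {r:.3f} (scipy: {r_check:.3f});  Y' = {A:.3f} + {B:.3f} X")
```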

7.4 Multivariate Analysis of Variance

Introduction

ANOVA deals with cases with a single DV. Multivariate analysis of variance (MANOVA) is a generalization of analysis of variance to multiple DVs. While ANOVA checks whether the means of multiple groups on a single DV come from the same sampling distribution, MANOVA checks whether the vectors of means on multiple DVs are sampled from the same sampling distribution [5, 82]. MANOVA has several advantages over ANOVA. Firstly, there is a better chance to discover which factor is truly important, because several dependent variables are measured in one experiment. Secondly, MANOVA can protect against the Type I errors that may occur when multiple ANOVAs are tested independently. In addition, MANOVA can reveal differences that separate ANOVA tests would not discover. This situation is shown in Figure 7.5. The axes in Figure 7.5 correspond to two DVs, Y1 and Y2, each with two levels, and represent their frequency distributions. Viewed along either single axis, the distributions overlap substantially, and the mean difference might not be detected by ANOVA. On the other hand, because MANOVA considers the mean differences in combinations of DVs, the difference becomes apparent.

Figure 7.5: An advantage of MANOVA over ANOVA (adapted from Tabachnick [99]). Each of the axes is a DV for a one-way design with two levels and represents its frequency distribution.

One of the disadvantages of MANOVA is that, because it is more complicated than ANOVA, there may be some ambiguity about which independent variable has an effect on each of the dependent ones. Just as ANOVA cannot tell which groups differ from which other groups, neither can MANOVA. Additionally, MANOVA cannot tell which variables are responsible for the differences in mean vectors.

Assumptions and Questions

As for ANOVA, MANOVA has several assumptions about a set of data:

- All of the DVs must be normally distributed.
- There exist linear relationships among all pairs of DVs, all pairs of covariates, and all DV-covariate pairs.
- The DVs exhibit equal levels of variance across the range of predictor variables (homogeneity of variances).
- The covariances of the DVs must be homogeneous. Many specific tests can check whether a set of data satisfies this assumption (homogeneity of covariances).

MANOVA can then answer the following questions:

What are the main effects of the IVs? MANOVA checks whether the mean difference in the DVs among groups at different levels of an IV is larger than expected by chance. To test this, the null hypothesis that the IV has no systematic effect on the optimal linear combination of DVs is used. If there are multiple IVs, separate tests are done for each of them. Additionally, if the sample sizes are equal in all of the cells, those separate tests are independent of each other.

What are the interactions among the IVs? MANOVA checks whether the change in the DVs over levels of one IV depends upon the level of another IV. If there are multiple IVs, there are multiple interaction effects. Each interaction is tested separately, and the tests are independent of each other.

What is the importance of the DVs? If there are significant differences between main effects or between interactions, researchers will be interested in which of the DVs are affected or unaffected by the IVs.

What is the strength of association between the DVs? If a main effect or an interaction is reliable, the next question is how large it is and what kind of linear combination of DVs is affected.

What are the effects of covariates and how can they be utilized? MANOVA checks whether the covariates provide reliable adjustment and what the nature of the DV-covariate relationship is.

Mathematical Forms of MANOVA

This section provides the mathematical forms of MANOVA. A data set for MANOVA consists of one or multiple IVs and two or more DVs. Basically, MANOVA follows the model of ANOVA, where variance in scores is partitioned into differences among groups and differences within groups. The difference in MANOVA is that each subject has scores on each of multiple DVs. The MANOVA equation with N samples is an extension of the ANOVA equation (7.5). The simplest partition is into differences between groups and differences within groups. It is mathematically written as

Σ_i Σ_j (y_ij − μ)² = n Σ_j (μ_yj − μ)² + Σ_j Σ_i (y_ij − μ_yj)².   (7.15)

The total SS of differences between scores on y (the DV) and the grand mean μ is partitioned into the SS between group means μ_yj and the grand mean, and the SS between scores on y and their group means μ_yj. (7.15) can be rewritten simply as SS_total = SS_bg + SS_wg. For factorial designs or with multiple IVs, SS_bg is further partitioned into variance associated with the first IV, variance associated with the second IV, and variance associated with the interaction between the first and the second IVs [99]. Because there is no single DV, a column matrix (also called a vector) of scores is given to MANOVA.
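The partition in (7.15) can be checked numerically; the small NumPy sketch below (with made-up scores for one DV and three groups of equal size n) verifies that SS_total = SS_bg + SS_wg.

```python
import numpy as np

# Hypothetical scores for one DV, three groups (columns), n = 4 subjects per group.
y = np.array([
    [3.0, 5.0, 8.0],
    [4.0, 6.0, 7.0],
    [2.5, 5.5, 9.0],
    [3.5, 4.5, 8.5],
])
n = y.shape[0]
grand_mean = y.mean()
group_means = y.mean(axis=0)

ss_total = ((y - grand_mean) ** 2).sum()
ss_bg = n * ((group_means - grand_mean) ** 2).sum()   # between groups
ss_wg = ((y - group_means) ** 2).sum()                # within groups

print(ss_total, ss_bg + ss_wg)   # the two numbers agree, as (7.15) states
```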

In ANOVA, SS is divided by degrees of freedom to produce MS. In MANOVA, on the other hand, the matrix analog of variance is a determinant [99]. While ANOVA checks main effects and interaction effects by a ratio of variances, MANOVA uses a ratio of determinants with the Wilks Lambda criterion. Wilks Lambda is given as

Λ = |S_error| / |S_effect + S_error|   (7.16)

with the determinant of the error cross-product matrix (i.e., differences within groups) and the determinant of the sum of the error and effect (i.e., differences between groups) cross-product matrices [99]. From the Wilks Lambda value, an approximate F is calculated as

F(df_1, df_2) = ((1 − y) / y) (df_2 / df_1)   (7.17)

where

y = Λ^{1/s}   (7.18)

s = sqrt{ [p² (df_effect)² − 4] / [p² + (df_effect)² − 5] }   (7.19)

where p is the number of DVs and df_effect is the degrees of freedom for the tested effect. Additionally, df_1 and df_2 are the degrees of freedom for testing the F value and are given as

df_1 = p (df_effect)   (7.21)

df_2 = s [df_error − (p − df_effect + 1)/2] − [p (df_effect) − 2]/2   (7.22)

where df_error is the degrees of freedom associated with the error term [99]. The approximate F is compared with the value from the F table, as is done in ANOVA. Tabachnick provides several practical examples of MANOVA in her book [99]. MANOVA can be extended to multivariate analysis of covariance (MANCOVA), which can be applied to problems where there are one or more covariates in addition to multiple DVs.
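Purely for illustration (not the Matlab code used for the thesis), the following sketch computes Wilks Lambda and the approximate F of (7.16)-(7.22) for a one-way design with two DVs and hypothetical data.

```python
import numpy as np
from scipy import stats

# Hypothetical one-way design: 3 groups, 5 subjects each, p = 2 DVs.
rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=1.0, size=(5, 2)) for m in (0.0, 0.5, 1.5)]
data = np.vstack(groups)
p = data.shape[1]
k = len(groups)                 # number of groups
n_total = data.shape[0]

grand_mean = data.mean(axis=0)
# Effect (between-groups) and error (within-groups) cross-product matrices.
S_effect = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                                 g.mean(axis=0) - grand_mean) for g in groups)
S_error = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

# Wilks Lambda (7.16) and the approximate F (7.17)-(7.22).
lam = np.linalg.det(S_error) / np.linalg.det(S_effect + S_error)
df_effect = k - 1
df_error = n_total - k
s = np.sqrt((p**2 * df_effect**2 - 4) / (p**2 + df_effect**2 - 5))
y = lam ** (1.0 / s)
df1 = p * df_effect
df2 = s * (df_error - (p - df_effect + 1) / 2) - (p * df_effect - 2) / 2
F = ((1 - y) / y) * (df2 / df1)
print(f"Lambda = {lam:.3f}, F({df1:.0f}, {df2:.1f}) = {F:.2f}, "
      f"p = {stats.f.sf(F, df1, df2):.4f}")
```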

Chapter 8

Perceptual Evaluation

8.1 Introduction

Overview

There are three categories of image comparison measurement: human perception-based, objective, and HVS-based measurements [114]. Human perception measurement is a subjective measurement method with human participants. Objective measurement is based upon theoretical models. HVS-based measurement is defined mathematically but is based upon HVS theory.

Perception-Based Measurements

Perception-based measurement has the advantage that it can be applied even without any reliable model. It involves predicting the number and type of dimensions a viewer uses in ratings. Its main disadvantage is the cost: collecting data takes a long time, and the amount of data necessary for a sufficient analysis grows exponentially with the number of parameters. Statistical methods are often used to analyze the data. In 1993, Ahumada and Null introduced the term preference factoring [3]. It collects rankings from a range of observers and then uses the inter-observer variability to extract dimensions of image quality. For measuring the preference of human observers, the individual differences scaling (INDSCAL) analysis is the most widely used method [20, 6]. It provides information about the dimensions the

observers used in judging images. Furthermore, several related works have been presented. Nijenhuis et al. focused on sharpness and sharpness-related attributes [79]. In 1998, Rogowitz et al. ran perceptual image similarity experiments [93]; they also compared the results to two algorithmic image similarity metrics and analyzed their results using multidimensional scaling techniques. In 2000, Pellacini et al. introduced a psychophysically-based light reflection model by finding perceptually meaningful axes derived from a perceptual experiment with human observers [86]. In 2001, Eissa and Mahdavi performed a subjective evaluation of lighting in architectural spaces [33]. In 2002, Drago et al. presented a perceptual evaluation of tone mapping operators in terms of similarity and preference using INDSCAL and the preference mapping (PREFMAP [71]) analyses [28], and Newsham et al. evaluated lighting quality by comparing conventional and HDR monitors using MANOVA [78].

Objective Measurements

For objective methods, a number of techniques have been proposed. The simplest method is the mean square error (MSE), written as

MSE(I, I') = (1 / NM) Σ_{i=1}^{N} Σ_{j=1}^{M} [I'(i, j) − I(i, j)]²   (8.1)

where I(i, j) and I'(i, j) are the luminances at the i-th row and j-th column of the images I and I', and N and M are the numbers of rows and columns respectively [50, 45]. As clearly seen in (8.1), it measures only the difference between two images and cannot predict which of the two images is better. Another basic objective measurement is the peak signal-to-noise ratio (PSNR). It is a normalization of MSE given by

PSNR(I, I') = 10 log_{10} ( R² / MSE(I, I') )   (8.2)

where R is the range of display luminances. Although both MSE and PSNR measure only the difference between two images, Martens and Meesters found in their experiments that the MSE performs equally well as a highly

sophisticated model such as the Sarnoff Visual Discrimination Model (VDM) [61] when applied to CIE 1976 lightness (L*) images instead of plain luminance images [67, 68]. This experiment was carried out on images degraded by noise and blur in JPEG format. In addition to MSE and PSNR, other techniques such as the root mean squared error (RMSE), the mean absolute error (MAE), and the signal-to-noise ratio (SNR) are also used to check image quality objectively. A number of objective image quality measurement techniques have been proposed. As one of the best-known objective methods, Wilson et al. presented a gray-scale image comparison in 1997 [114]. Their algorithm is based upon exploiting topologies in grey-scale images. Phase-based image comparison was proposed by Lees and Henshaw in 1987 [57]. Their phase-based method has two interesting properties. One is redundancy removal: two similar images can be compared simply by placing them side-by-side to create a composite image, to which a phase-only transformation is then applied. The other property is an automatic analysis of the processed data. A disadvantage of this method is its computational cost. Another phase-based image comparison technique was introduced by Lorenzatto and Kovesi in 1999 [60]. Their technique allows for the measurement of distortion regardless of translation. In 2002, another objective method was introduced by Wang [106]. His method is very simple to calculate and applicable to various image processing applications. He focused on loss of correlation, luminance distortion, and contrast distortion. Avcıbaş et al. presented reviews of the statistical analysis of a number of objective image quality measurements in [10, 11].
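To make the two simplest objective measures above, MSE (8.1) and PSNR (8.2), concrete, here is a small illustrative sketch (not part of the thesis). The choice R = 100 is an assumption made only to match the 100 cd/m² maximum display luminance mentioned later for the experiment.

```python
import numpy as np

def mse(img_a, img_b):
    """Mean square error between two luminance images of equal size, eq. (8.1)."""
    diff = img_b.astype(np.float64) - img_a.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(img_a, img_b, display_range=100.0):
    """Peak signal-to-noise ratio in dB, eq. (8.2); R is the display luminance range."""
    return 10.0 * np.log10(display_range ** 2 / mse(img_a, img_b))

# Toy example: a reference luminance image and a noisy version of it.
rng = np.random.default_rng(1)
reference = rng.uniform(0.0, 100.0, size=(64, 64))
degraded = np.clip(reference + rng.normal(0.0, 2.0, size=reference.shape), 0.0, 100.0)
print(f"MSE = {mse(reference, degraded):.2f}, PSNR = {psnr(reference, degraded):.1f} dB")
```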

HVS-Based Measurements

An HVS-based measurement is an extension of objective measurements with features of the HVS. In 1993, Daly introduced the visual difference predictor (VDP) algorithm [24]. It was motivated by the need to quantitatively describe the visual consequences of decisions regarding the design and quality control of imaging products. His model consists of three components: calibration mechanisms, an HVS model, and difference visualization. In 1997, Lai and Kuo presented the Haar wavelet to model the space-frequency localization property of the HVS [54]. They discovered that the physical contrast at different resolutions can be easily represented in terms of transform coefficients. Frese et al. introduced an image similarity metric based upon a multiscale model of the HVS [41]. Starting from perceptual phenomena, their method extracts features and forms an aggregate measure of similarity using a weighted linear combination of the feature differences. Another image quality measurement method was presented by Osberger et al. [81]. Their model focuses on contrast and frequency sensitivities and incorporates a masking strategy designed specifically for natural images. In the following year, Osberger et al. presented a model consisting of an early vision model based upon [81] and a visual attention model [80]. Their technique showed high correlation with subjective test data, and it was found to be particularly useful for images coded with spatially varying quality. In 2000, Albin et al. presented an image quality methodology based upon the LLAB color space for the perception of color in complex images [4]. The LLAB color space was presented by Luo [63, 62] and is a recent modification of the CIELab 1976 color space [35]. In 2001, Janssen introduced a new image quality measurement method that regards vision primarily as a process in which attributes of items in the outside world are measured and internally quantified with the aim to discriminate and/or identify these items [50]. Bornik et al. introduced another metric combining eight feature-based and perceptually-oriented image quality measurements [15]. Several surveys of image quality measurement methods have been produced. Eskicioglu and Fisher measured the performance of image quality measures for gray-scale image compression in 1995 [34]. They concluded that although some numerical measures correlate well with the observers' responses, they are not reliable for an evaluation across different techniques. On the other hand, they found that a graphical measure called Hosaka plots [77] can be used to appropriately specify not only the amount but also the type of degradation in reconstructed images. In 1997, Jackson provided a survey of the Sarnoff/ViDEOS model on visual recognition tasks [49]. In 2000, McNamara presented a survey of perceptually driven rendering techniques, HVS-based image quality metrics, and tone mapping operators [70]. Zhou et al. presented a study of 11 image comparison techniques [120]. They identified that three out of those 11 metrics are effective in separating similarity and difference among images.

8.2 Experimental Design

For our project, a human perception-based measurement was selected with seven tone mapping operators and 14 human subjects. The tone mapping operators are the linear tone mapping, the bilateral tone mapping by Durand and Dorsey [31], the Pattanaik et al. method [84], the Ashikhmin method [8], the Ward method [111], Reinhard et al.'s photographic tone mapping [90], and Drago et al.'s adaptive logarithmic mapping [29]. For all of the reproductions, the gamma value was set to γ = 2.2 and the maximum display luminance was set to 100 cd/m². For the Pattanaik method, luminance values are scaled by a factor of 650. The local contrast threshold of the Ashikhmin method was set to 0.1. The remaining parameters were left at the default settings presented in each of the papers. The resulting images are shown in Appendix A.

The two scenes for this experiment are shown in Figure 3.11. As clearly seen in the figure, the dynamic ranges of both scenes are large enough. Scene 1 has highly bright spot lights around the trees and quite dark areas behind the glass. Scene 2 also has highly bright and dark areas and, in addition, it has a gray area (on the pipes), which relates to the scaling problem pointed out by Gilchrist et al. [44]. The scaling problem concerns how the range of luminances in an image is mapped onto a range of perceived grays. The absolute luminances in both scenes were measured with a MINOLTA light meter LS-100 [53]. In Scene 1 (Figure 3.11(a)), the brightest area is 13,630 cd/m² and the darkest area is cd/m². In Scene 2 (Figure 3.11(b)), the brightest area is cd/m² and the darkest area is cd/m².

All 14 subjects were graduate students and researchers of the Computer Graphics group at the Max-Planck-Institut für Informatik, and one of them is the author of this thesis. Two of them are female and the rest are male. The range of their age is . All of them had normal or corrected-to-normal vision. Additionally, all subjects except the author were naïve with respect to the goal of our experiment and to tone mapping operators. Participants were asked to view seven images one after another for each of the two scenes (see Figure 3.11) of the Max-Planck-Institut für Informatik on an sRGB-calibrated monitor (DELL UltraSharp 1800FP [26]) whose resolution is at 60.0 Hz. For each of the 14 images, they were asked to compare

them with their corresponding real-world view and give ratings for image appearance and realism. The image appearance attributes are overall brightness, contrast, details in dark regions, and details in bright regions. The realism rating is only naturalness. For the image appearance attributes, subjects rated how much brightness, contrast, or detail each of the images has compared to its corresponding real-world view. For naturalness, they were asked to rate how real the image is. They were allowed to move back and forth among the images of a scene. A screenshot of our perceptual experiment is shown in Figure 8.1. The whole procedure took approximately 20 to 30 minutes per participant.

Figure 8.1: A screenshot of our experiment.
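As a rough illustration of the display settings given in Section 8.2 (γ = 2.2, maximum display luminance 100 cd/m²), the sketch below shows one common way a linear baseline mapping with gamma correction can be written. It is an assumed, generic formulation for illustration only, not necessarily the exact linear operator used in the thesis.

```python
import numpy as np

def linear_tone_map(luminance, l_max_display=100.0, gamma=2.2):
    """Scale HDR luminance linearly to the display range, then gamma-correct.

    Generic textbook formulation (an assumption for illustration); the thesis's
    own linear operator may differ in details such as clipping or scaling.
    """
    scaled = luminance / luminance.max()                   # linear scaling to [0, 1]
    display = np.clip(scaled, 0.0, 1.0) ** (1.0 / gamma)   # gamma correction for the display
    return display * l_max_display                         # cd/m^2 on a 100 cd/m^2 display

# Toy HDR luminance map spanning several orders of magnitude.
hdr = np.geomspace(0.01, 13630.0, num=5)
print(linear_tone_map(hdr))
```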

8.3 Results

The experimental design is seven (tone mapping operators) × two (scenes) within-subjects (see Section 7.2 for within-subjects designs). In this experiment, there are two IVs and five DVs. The IVs are the tone mapping operators and the scenes. The DVs are the five attributes: overall brightness, contrast, details in dark regions, details in bright regions, and naturalness. Our primary interest in this perception test is whether the images produced by different tone mapping operators are perceived differently when they are compared with their corresponding real-world views. To analyze the set of data obtained from the perception test, the Statistics Toolbox of Matlab was used [69]. As preliminary data processing, all scores were normalized over each attribute of each subject so that the standard deviation becomes 1. The normalization was done by

x_i ← (x_i − μ_x) / σ_x

where x_i is a score and μ_x and σ_x are respectively the mean and the standard deviation over an attribute of a subject. The normalized values are shown in Appendix B.

Figure 8.2 shows the scores and the F and p values of the attributes for the main effect of the two scenes. Scene 1 and Scene 2 are shown in Figures 3.11(a) and 3.11(b) respectively. Those results show that the difference between the two scenes is not significant. Only the detail reproduction in dark regions shows a significant difference; for most of the attributes, the effect of the scene difference is small enough to be ignored. This is also shown by the Mahalanobis distance. MANOVA in the Statistics Toolbox of Matlab provides Mahalanobis distances among the levels of the IVs. The Mahalanobis distance was originally introduced by Mahalanobis in 1936 [65]. It is a measure based upon the correlations between variables, computed with the covariance matrix

S = 1/(n − 1) Σ_{i=1}^{n} (X_i − X̄)ᵗ (X_i − X̄)   (8.3)

where X is a data matrix, X_i is the i-th row of X, X̄ is a row vector of means, and n is the number of rows. The Mahalanobis distance between two row vectors x and y is then given by sqrt( (x − y) S⁻¹ (x − y)ᵗ ). The Mahalanobis distance is the same as the Euclidean distance if the covariance matrix is the identity matrix. The Mahalanobis distance is very useful for determining the similarity of a set of values. It is sensitive to inter-variable changes in a data set, and it has been used to classify observations into different groups. The Mahalanobis distance between the scenes provided by MANOVA is . It shows that the two scenes were perceived similarly and that the difference between the scenes is quite small and can be ignored. This is why in our statistical data analysis we considered both scenes together. This also followed our goal to investigate tone mapping performance for architectural scenes.
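The per-subject, per-attribute normalization described at the beginning of this section is a plain z-scoring. The following sketch shows the operation for one subject, using a hypothetical array of 7-point ratings (the actual rating scale and data are not restated here).

```python
import numpy as np

# Hypothetical raw ratings of one subject: rows = 14 images (7 operators x 2 scenes),
# columns = 5 attributes (brightness, contrast, dark details, bright details, naturalness).
rng = np.random.default_rng(2)
scores = rng.integers(1, 8, size=(14, 5)).astype(np.float64)

# Normalize each attribute (column) of this subject to zero mean and unit standard deviation.
normalized = (scores - scores.mean(axis=0)) / scores.std(axis=0)

print(normalized.mean(axis=0).round(6))   # ~0 for every attribute
print(normalized.std(axis=0).round(6))    # 1 for every attribute
```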

Figure 8.3 shows the distributions of scores for each attribute and their F and p values for the main effect of the tone mapping operators. All of the attributes are highly significant for the main effect of the tone mapping operators. In particular, it is evident in Figure 8.3(a) that images produced by the linear tone mapping and the Pattanaik, Ward, and Drago methods have substantially higher overall brightness, and that it is for this attribute that the operators are perceived the most differently when compared with their corresponding real-world views. Note that it is shown very clearly that all global methods have stronger overall brightness than the local ones. The second most differently perceived attribute is the details in bright regions, as shown in Figure 8.3(d). The bilateral filtering and the Ashikhmin, Reinhard, and Drago methods provide significantly more details in bright regions than the others. All of the local operators are perceived with more details than the global ones according to the graph. This is plausible because local operators use different scales for small regions of an image, while global operators use only one scale for the whole image and tend to saturate bright parts. Contrast (Figure 8.3(b)) and naturalness (Figure 8.3(e)) show almost the same level of significance. The linear mapping and the Pattanaik and Ward methods have higher contrast, and the Ward, Reinhard, and Drago methods have more naturalness than the others. Global operators have stronger contrast than local ones, as shown in the graph. This makes sense because local operators deal with small regions and local contrasts of an image, while global operators deal with all pixels together and consider global contrast. Details in dark regions (Figure 8.3(c)) show the least significance among the attributes, but the effect is still significant because its p value is much smaller than the significance level, which is usually 0.05 or 0.01. The Drago method is perceived to have the most details in dark regions, and the Ashikhmin method the second most. The linear, Pattanaik, Ward, and Reinhard methods have almost the same scores, and the bilateral filtering has slightly more details than those four. Because the main effect of the scenes for details in dark regions is significant, the main effect of the tone mapping operators for details in dark regions in each scene was also tested; the results are shown in Figure 8.4. In this case, the values were normalized over each attribute of each subject for each scene. The operators are perceived more differently in Scene 1 than in Scene 2. For both scenes, the Ashikhmin and Drago methods are perceived as

the two most detailed ones in dark regions. Drago is perceived as the most detailed reproduction in dark regions in Scene 1 and Ashikhmin as the most detailed in Scene 2. An interesting result of this observation is that although the details in bright regions are perceived very differently, the details in dark regions are not perceived as differently as those in bright regions when tone mapped images are compared with their corresponding real-world views.

Another result is shown in Figure 8.5. These figures show the correlations between naturalness and each of the image appearance attributes. According to those correlation coefficients, naturalness has no correlation with overall brightness or with details in dark regions (Figures 8.5(a) and 8.5(c)). On the other hand, naturalness is correlated with contrast and with details in bright regions (Figures 8.5(b) and 8.5(d)), but these correlations are still small. It can be concluded from this result that none of the image appearance attributes has a strong influence on naturalness by itself. Naturalness appears to be determined multidimensionally by a combination of those attributes. All of the correlation coefficients (Pearson r) are shown in Table 8.1. All possible pairs of the attributes were tested, and it was found that overall brightness and details in bright regions have the biggest absolute value of the correlation coefficient, with Pearson r = , as shown in Figure 8.6. This is plausible because as overall brightness decreases, bright parts are less saturated and better visible.

Table 8.1: Correlations (Pearson r values) of all pairs of overall brightness (O.B.), overall contrast (O.C.), details in dark regions (D.D.), details in bright regions (D.B.), and naturalness (N.). [The numeric entries of the table, an upper-triangular matrix with rows O.B., O.C., D.D., D.B. and columns O.C., D.D., D.B., N., did not survive extraction.]
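The pairwise Pearson correlations of Table 8.1 can be obtained in one step from the normalized score matrix. The sketch below shows this on hypothetical stand-in data; the real scores are listed in Appendix B.

```python
import numpy as np

attributes = ["O.B.", "O.C.", "D.D.", "D.B.", "N."]

# Hypothetical normalized scores: one row per rated image across all subjects,
# one column per attribute (a stand-in for the Appendix B data).
rng = np.random.default_rng(3)
scores = rng.normal(size=(14 * 14, 5))   # 14 subjects x 14 images, 5 attributes

# Pearson correlation matrix of the five attributes (columns are variables).
corr = np.corrcoef(scores, rowvar=False)

for i, name in enumerate(attributes):
    row = "  ".join(f"{corr[i, j]:+.2f}" for j in range(len(attributes)))
    print(f"{name:>5}  {row}")
```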

MANOVA in Matlab provides an estimate of the dimension of the space containing the group means and the significance values for each of the dimensions. If d = 0, where d is the estimate of the dimension, it indicates that the means are at the same point; d = 1 indicates that the means are different but lie along a line; d = 2 indicates that the means lie on a plane but not along a line; and so on. The null hypotheses are tested by calculating the significance values (p-values) for each of the dimensions, i.e., testing whether the means lie in an N-dimensional space, where N is the number of dimensions. From our set of data, MANOVA returns d = 3, which indicates that the means are neither along a line nor on a plane but in a three-dimensional space. The p values for each of the dimensions in our perceptual experiment are p(d = 0) = , p(d = 1) = , p(d = 2) = , p(d = 3) = , p(d = 4) = . They show that there is no possibility at all that all means lie on a point or along a line. The possibility that the means lie on a plane is not zero, but it is much smaller than the significance level and can be ignored. At d = 3, the possibility becomes larger than the significance level; therefore, the means are located in a three-dimensional space.

Table 8.2 shows the Mahalanobis distances among the tone mapping operators given by MANOVA. According to Table 8.2, the linear tone mapping and the bilateral filtering are perceived the most differently when compared with their corresponding real-world views. The second and the third most different combinations are the linear tone mapping with the Ashikhmin method and the linear tone mapping with the Reinhard method. All three of the biggest differences involve the linear tone mapping. On the other hand, the smallest difference is between the bilateral filtering and the Ashikhmin method. This result is visualized in Figure 8.7. An interesting result shown in Figure 8.7 is that the seven tone mapping reproductions are divided into global and local methods by the Mahalanobis distances. The three local operators (bilateral, Ashikhmin, and Reinhard) are similar to each other and the four global operators (linear, Pattanaik, Ward, and Drago) are similar to each other, but the two categories are not very similar to each other.
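To illustrate how such pairwise distances can be obtained (the thesis relied on the output of Matlab's MANOVA rather than code like the following), this sketch computes Mahalanobis distances between group mean vectors using a pooled within-group covariance matrix, cf. (8.3). All data and operator subsets here are hypothetical.

```python
import numpy as np
from itertools import combinations

# Hypothetical normalized ratings: operator name -> (observations x 5 DVs).
rng = np.random.default_rng(4)
operators = ["linear", "bilateral", "Pattanaik", "Ashikhmin"]
data = {name: rng.normal(loc=i * 0.3, scale=1.0, size=(28, 5))
        for i, name in enumerate(operators)}

# Pooled within-group covariance matrix.
pooled = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in data.values())
pooled /= sum(len(g) for g in data.values()) - len(data)
pooled_inv = np.linalg.inv(pooled)

def mahalanobis(mean_a, mean_b, s_inv):
    """Mahalanobis distance between two group mean vectors."""
    d = mean_a - mean_b
    return float(np.sqrt(d @ s_inv @ d))

for a, b in combinations(operators, 2):
    dist = mahalanobis(data[a].mean(axis=0), data[b].mean(axis=0), pooled_inv)
    print(f"{a:>10} vs {b:<10}: {dist:.2f}")
```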

Table 8.2: Mahalanobis distances among the tone mapping operators provided by MANOVA (rows: linear, bilateral, Pattanaik, Ashikhmin, Ward, Reinhard; columns: bilateral, Pattanaik, Ashikhmin, Ward, Reinhard, Drago). The three biggest distances are written in a bold font and the three smallest distances are underlined. All of the biggest differences involve the linear tone mapping. These Mahalanobis distances are visualized in Figure 8.7. [The numeric entries of the table did not survive extraction.]

[Box plots; panels: (a) Overall brightness, F = 0.04; (b) Contrast, F = 0.09; (c) Details in dark regions, F = 58.98; (d) Details in bright regions; (e) Naturalness, F = 2.85. The remaining F and p values did not survive extraction.]

Figure 8.2: Distributions and F and p values of each attribute for the main effect of the scenes. Scenes 1 and 2 are shown in Figures 3.11(a) and 3.11(b) respectively. A box shows the lower quartile, median, and upper quartile values. The whiskers are lines extending from each end of the box to show the extent of the rest of the data. Outliers are data with values beyond the ends of the whiskers.

[Box plots; panels: (a) Overall brightness, F = 46.14, p = 0.0; (b) Contrast, F = 8.74; (c) Details in dark regions, F = 3.18; (d) Details in bright regions, F = 30.45, p = 0.0; (e) Naturalness, F = 8.11. The remaining p values did not survive extraction.]

Figure 8.3: Distributions and F and p values of each attribute for the main effect of the tone mapping operators. The operators are numbered as 1 (linear), 2 (bilateral), 3 (Pattanaik), 4 (Ashikhmin), 5 (Ward), 6 (Reinhard), and 7 (Drago).

[Box plots; panels: (a) Scene 1, F = 5.75; (b) Scene 2, F = 3. The p values did not survive extraction.]

Figure 8.4: Details in dark regions for each of the scenes.

[Scatter plots; panels: (a) Overall brightness vs. naturalness; (b) Contrast vs. naturalness; (c) Details in dark regions vs. naturalness; (d) Details in bright regions vs. naturalness. The Pearson r values did not survive extraction.]

Figure 8.5: Naturalness vs. the other attributes. Pearson r values are shown under each of them. The colors represent the operators as follows: yellow (linear), magenta (bilateral), cyan (Pattanaik), red (Ashikhmin), green (Ward), blue (Reinhard), and black (Drago). Note that none of the attributes has a strong influence on naturalness by itself; naturalness is determined multidimensionally.

[Scatter plot; the Pearson r value did not survive extraction.]

Figure 8.6: Overall brightness vs. details in bright regions. As in Figure 8.5, the colors represent yellow (linear), magenta (bilateral), cyan (Pattanaik), red (Ashikhmin), green (Ward), blue (Reinhard), and black (Drago).

Figure 8.7: Mahalanobis distances among the tone mapping operators given by MANOVA. As in Figure 8.3, the operators are numbered as 1 (linear), 2 (bilateral), 3 (Pattanaik), 4 (Ashikhmin), 5 (Ward), 6 (Reinhard), and 7 (Drago). Note that the tone mapping operators are divided into global and local methods by the Mahalanobis distances.


Chapter 9

Conclusions

The goal of this thesis is to investigate how differently tone mapping operators are perceived when compared with their corresponding real-world views. A perceptual experiment was conducted on seven existing tone mapping operators in two scenes with 14 human subjects. In order to carry out this experiment, two tone reproductions, the Ashikhmin method and the fast bilateral filtering by Durand and Dorsey, were implemented within the scope of this thesis. Both of these reproductions are local operators, which apply different scales in different parts of an image, while global operators apply the same scale to all pixels of an image. After acquiring the camera response curve with the response curve recovery method of Robertson et al., our HDR images were created with it and saved in the Radiance RGBE format. Each HDR image was constructed from 15 images with different exposures. Two scenes taken in the Max-Planck-Institut für Informatik were chosen for the perceptual experiment. Both of them have quite high dynamic ranges and also contain gray areas, which points to a scaling problem concerning how the range of luminances in an image is mapped onto a range of perceived grays.

In the perceptual experiment, subjects were asked to compare each of the images to its corresponding real-world view and rate its appearance and realism. The image appearance attributes are overall brightness, contrast, details in dark regions, and details in bright regions. The realism rating is the naturalness of an image. The subjects rated how much image appearance an image had and how real it was

when compared with its corresponding real view. The set of data was analyzed using a multivariate analysis of variance for the main effect of the tone mapping operators.

The result of the analysis shows that the seven tone mapping operators were perceived very differently in terms of all of the attributes when compared to their corresponding real-world views. Overall brightness shows the most significant differences among the tone reproductions, and global operators have more brightness than local ones. The second most differently perceived attribute is detail reproduction in bright regions. In contrast to overall brightness, local operators are perceived with more details in bright regions than global ones. The Ashikhmin method is perceived as the most detailed tone reproduction in bright regions. Contrast and naturalness show almost the same significance. Global operators have more contrast than local ones, but the difference is not as large as for overall brightness. The Ward method showed the strongest contrast and the Drago method is perceived as the most natural. The least significant attribute among the five is detail reproduction in dark regions, but it is still significant. The Drago method is perceived as the most detailed one in dark regions.

Because there are two types of attributes, correlations between naturalness and each of the image appearance attributes were tested. The result shows that none of the image appearance attributes has a strong influence on the perception of naturalness by itself. This may suggest that naturalness depends on a combination of the other attributes. All other possible pairs of attributes were also tested. The biggest correlation occurs between overall brightness and details in bright regions.

MANOVA shows that the means of the set of data are located in a three-dimensional space, but neither on a point, along a line, nor on a plane. In terms of the Mahalanobis distances, the biggest differences are between the linear tone mapping and each of the fast bilateral filtering, the Ashikhmin method, and the photographic tone reproduction. The smallest differences are between the fast bilateral filtering and Ashikhmin, between Pattanaik and Ward, and between Ashikhmin and the photographic reproduction. The analysis shows that the tone mapping operators are divided into global and local categories by the Mahalanobis distances.

Appendix A

Images for the Perceptual Experiment

Figures A.1 and A.2 show the images which were used in our perceptual experiment.

[Image panels: (a) Linear, (b) Bilateral by Durand and Dorsey, (c) Pattanaik, (d) Ashikhmin, (e) Ward, (f) Reinhard, (g) Drago.]

Figure A.1: Images for Scene 1.

[Image panels: (a) Linear, (b) Bilateral by Durand and Dorsey, (c) Pattanaik, (d) Ashikhmin, (e) Ward, (f) Reinhard, (g) Drago.]

Figure A.2: Images for Scene 2.


Appendix B

Values of the Perceptual Experiment

Figures B.1–B.5 show the values of our perceptual experiment. The values were normalized as

x_i ← (x_i − μ_x) / σ_x

where x_i is a score and μ_x and σ_x are respectively the mean and the standard deviation over an attribute of a subject. The normalization was done over each attribute of each subject.

Figure B.1: Values of the perceptual experiment for subjects 1–3. Tone mapping operators are numbered as 1 (linear), 2 (bilateral), 3 (Pattanaik), 4 (Ashikhmin), 5 (Ward), 6 (Reinhard), and 7 (Drago). The attributes are O.B. (overall brightness), O.C. (overall contrast), D.D. (details in dark regions), D.B. (details in bright regions), and N. (naturalness). The values are normalized over each attribute of each subject.

Figure B.2: Values of the perceptual experiment for subjects 4–6. Tone mapping operators are numbered as 1 (linear), 2 (bilateral), 3 (Pattanaik), 4 (Ashikhmin), 5 (Ward), 6 (Reinhard), and 7 (Drago). The attributes are O.B. (overall brightness), O.C. (overall contrast), D.D. (details in dark regions), D.B. (details in bright regions), and N. (naturalness). The values are normalized over each attribute of each subject.

Figure B.3: Values of the perceptual experiment for subjects 7–9. Tone mapping operators are numbered as 1 (linear), 2 (bilateral), 3 (Pattanaik), 4 (Ashikhmin), 5 (Ward), 6 (Reinhard), and 7 (Drago). The attributes are O.B. (overall brightness), O.C. (overall contrast), D.D. (details in dark regions), D.B. (details in bright regions), and N. (naturalness). The values are normalized over each attribute of each subject.

Figure B.4: Values of the perceptual experiment for subjects 10–12. Tone mapping operators are numbered as 1 (linear), 2 (bilateral), 3 (Pattanaik), 4 (Ashikhmin), 5 (Ward), 6 (Reinhard), and 7 (Drago). The attributes are O.B. (overall brightness), O.C. (overall contrast), D.D. (details in dark regions), D.B. (details in bright regions), and N. (naturalness). The values are normalized over each attribute of each subject.

Figure B.5: Values of the perceptual experiment for subjects 13–14. Tone mapping operators are numbered as 1 (linear), 2 (bilateral), 3 (Pattanaik), 4 (Ashikhmin), 5 (Ward), 6 (Reinhard), and 7 (Drago). The attributes are O.B. (overall brightness), O.C. (overall contrast), D.D. (details in dark regions), D.B. (details in bright regions), and N. (naturalness). The values are normalized over each attribute of each subject.


More information

Lecture 1 Image Formation.

Lecture 1 Image Formation. Lecture 1 Image Formation peimt@bit.edu.cn 1 Part 3 Color 2 Color v The light coming out of sources or reflected from surfaces has more or less energy at different wavelengths v The visual system responds

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational

More information

Laser sensors. Transmitter. Receiver. Basilio Bona ROBOTICA 03CFIOR

Laser sensors. Transmitter. Receiver. Basilio Bona ROBOTICA 03CFIOR Mobile & Service Robotics Sensors for Robotics 3 Laser sensors Rays are transmitted and received coaxially The target is illuminated by collimated rays The receiver measures the time of flight (back and

More information

Supplementary Figure 1. Decoding results broken down for different ROIs

Supplementary Figure 1. Decoding results broken down for different ROIs Supplementary Figure 1 Decoding results broken down for different ROIs Decoding results for areas V1, V2, V3, and V1 V3 combined. (a) Decoded and presented orientations are strongly correlated in areas

More information

CSE 167: Lecture #6: Color. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2011

CSE 167: Lecture #6: Color. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2011 CSE 167: Introduction to Computer Graphics Lecture #6: Color Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2011 Announcements Homework project #3 due this Friday, October 14

More information

CS 664 Segmentation. Daniel Huttenlocher

CS 664 Segmentation. Daniel Huttenlocher CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical

More information

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong)

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong) Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong) References: [1] http://homepages.inf.ed.ac.uk/rbf/hipr2/index.htm [2] http://www.cs.wisc.edu/~dyer/cs540/notes/vision.html

More information

CSE 167: Lecture #6: Color. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

CSE 167: Lecture #6: Color. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012 CSE 167: Introduction to Computer Graphics Lecture #6: Color Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012 Announcements Homework project #3 due this Friday, October 19

More information

Lecture 4 Image Enhancement in Spatial Domain

Lecture 4 Image Enhancement in Spatial Domain Digital Image Processing Lecture 4 Image Enhancement in Spatial Domain Fall 2010 2 domains Spatial Domain : (image plane) Techniques are based on direct manipulation of pixels in an image Frequency Domain

More information

Physics 11. Unit 8 Geometric Optics Part 1

Physics 11. Unit 8 Geometric Optics Part 1 Physics 11 Unit 8 Geometric Optics Part 1 1.Review of waves In the previous section, we have investigated the nature and behaviors of waves in general. We know that all waves possess the following characteristics:

More information

Color Appearance in Image Displays. O Canada!

Color Appearance in Image Displays. O Canada! Color Appearance in Image Displays Mark D. Fairchild RIT Munsell Color Science Laboratory ISCC/CIE Expert Symposium 75 Years of the CIE Standard Colorimetric Observer Ottawa 26 O Canada Image Colorimetry

More information

Introduction to Digital Image Processing

Introduction to Digital Image Processing Fall 2005 Image Enhancement in the Spatial Domain: Histograms, Arithmetic/Logic Operators, Basics of Spatial Filtering, Smoothing Spatial Filters Tuesday, February 7 2006, Overview (1): Before We Begin

More information

Lecture 4: Spatial Domain Transformations

Lecture 4: Spatial Domain Transformations # Lecture 4: Spatial Domain Transformations Saad J Bedros sbedros@umn.edu Reminder 2 nd Quiz on the manipulator Part is this Fri, April 7 205, :5 AM to :0 PM Open Book, Open Notes, Focus on the material

More information

CSE 167: Introduction to Computer Graphics Lecture #6: Colors. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2013

CSE 167: Introduction to Computer Graphics Lecture #6: Colors. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2013 CSE 167: Introduction to Computer Graphics Lecture #6: Colors Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2013 Announcements Homework project #3 due this Friday, October 18

More information

Physics-based Vision: an Introduction

Physics-based Vision: an Introduction Physics-based Vision: an Introduction Robby Tan ANU/NICTA (Vision Science, Technology and Applications) PhD from The University of Tokyo, 2004 1 What is Physics-based? An approach that is principally concerned

More information

Multidimensional image retargeting

Multidimensional image retargeting Multidimensional image retargeting 9:00: Introduction 9:10: Dynamic range retargeting Tone mapping Apparent contrast and brightness enhancement 10:45: Break 11:00: Color retargeting 11:30: LDR to HDR 12:20:

More information

Lecture 4. Digital Image Enhancement. 1. Principle of image enhancement 2. Spatial domain transformation. Histogram processing

Lecture 4. Digital Image Enhancement. 1. Principle of image enhancement 2. Spatial domain transformation. Histogram processing Lecture 4 Digital Image Enhancement 1. Principle of image enhancement 2. Spatial domain transformation Basic intensity it tranfomation ti Histogram processing Principle Objective of Enhancement Image enhancement

More information

An Intuitive Explanation of Fourier Theory

An Intuitive Explanation of Fourier Theory An Intuitive Explanation of Fourier Theory Steven Lehar slehar@cns.bu.edu Fourier theory is pretty complicated mathematically. But there are some beautifully simple holistic concepts behind Fourier theory

More information

Dense Image-based Motion Estimation Algorithms & Optical Flow

Dense Image-based Motion Estimation Algorithms & Optical Flow Dense mage-based Motion Estimation Algorithms & Optical Flow Video A video is a sequence of frames captured at different times The video data is a function of v time (t) v space (x,y) ntroduction to motion

More information

MODELING LED LIGHTING COLOR EFFECTS IN MODERN OPTICAL ANALYSIS SOFTWARE LED Professional Magazine Webinar 10/27/2015

MODELING LED LIGHTING COLOR EFFECTS IN MODERN OPTICAL ANALYSIS SOFTWARE LED Professional Magazine Webinar 10/27/2015 MODELING LED LIGHTING COLOR EFFECTS IN MODERN OPTICAL ANALYSIS SOFTWARE LED Professional Magazine Webinar 10/27/2015 Presenter Dave Jacobsen Senior Application Engineer at Lambda Research Corporation for

More information

Computer Vision I - Filtering and Feature detection

Computer Vision I - Filtering and Feature detection Computer Vision I - Filtering and Feature detection Carsten Rother 30/10/2015 Computer Vision I: Basics of Image Processing Roadmap: Basics of Digital Image Processing Computer Vision I: Basics of Image

More information

Colour Reading: Chapter 6. Black body radiators

Colour Reading: Chapter 6. Black body radiators Colour Reading: Chapter 6 Light is produced in different amounts at different wavelengths by each light source Light is differentially reflected at each wavelength, which gives objects their natural colours

More information

2012 Imaging Science Ph.D. Comprehensive Examination June 15, :00AM to 1:00PM IMPORTANT INSTRUCTIONS

2012 Imaging Science Ph.D. Comprehensive Examination June 15, :00AM to 1:00PM IMPORTANT INSTRUCTIONS 2012 Imaging Science Ph.D. Comprehensive Examination June 15, 2012 9:00AM to 1:00PM IMPORTANT INSTRUCTIONS You must complete two (2) of the three (3) questions given for each of the core graduate classes.

More information

Visual Acuity. Adler s Physiology of the Eye 11th Ed. Chapter 33 - by Dennis Levi.

Visual Acuity. Adler s Physiology of the Eye 11th Ed. Chapter 33 - by Dennis Levi. Visual Acuity Adler s Physiology of the Eye 11th Ed. Chapter 33 - by Dennis Levi http://www.mcgill.ca/mvr/resident/ Visual Acuity Keeness of Sight, possible to be defined in different ways Minimum Visual

More information

TEAMS National Competition High School Version Photometry 25 Questions

TEAMS National Competition High School Version Photometry 25 Questions TEAMS National Competition High School Version Photometry 25 Questions Page 1 of 14 Telescopes and their Lenses Although telescopes provide us with the extraordinary power to see objects miles away, the

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Nicholas J. Giordano. Chapter 24. Geometrical Optics. Marilyn Akins, PhD Broome Community College

Nicholas J. Giordano.   Chapter 24. Geometrical Optics. Marilyn Akins, PhD Broome Community College Nicholas J. Giordano www.cengage.com/physics/giordano Chapter 24 Geometrical Optics Marilyn Akins, PhD Broome Community College Optics The study of light is called optics Some highlights in the history

More information

What Can Be Known about the Radiometric Response from Images?

What Can Be Known about the Radiometric Response from Images? What Can Be Known about the Radiometric Response from Images? Michael D. Grossberg and Shree K. Nayar Columbia University, New York NY 10027, USA, {mdog,nayar}@cs.columbia.edu, http://www.cs.columbia.edu/cave

More information

CHAPTER 3 IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN

CHAPTER 3 IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN CHAPTER 3 IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN CHAPTER 3: IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN Principal objective: to process an image so that the result is more suitable than the original image

More information

IMAGE DE-NOISING IN WAVELET DOMAIN

IMAGE DE-NOISING IN WAVELET DOMAIN IMAGE DE-NOISING IN WAVELET DOMAIN Aaditya Verma a, Shrey Agarwal a a Department of Civil Engineering, Indian Institute of Technology, Kanpur, India - (aaditya, ashrey)@iitk.ac.in KEY WORDS: Wavelets,

More information

Chapter 26 Geometrical Optics

Chapter 26 Geometrical Optics Chapter 26 Geometrical Optics 26.1 The Reflection of Light 26.2 Forming Images With a Plane Mirror 26.3 Spherical Mirrors 26.4 Ray Tracing and the Mirror Equation 26.5 The Refraction of Light 26.6 Ray

More information

Edge and corner detection

Edge and corner detection Edge and corner detection Prof. Stricker Doz. G. Bleser Computer Vision: Object and People Tracking Goals Where is the information in an image? How is an object characterized? How can I find measurements

More information

UNIT VI OPTICS ALL THE POSSIBLE FORMULAE

UNIT VI OPTICS ALL THE POSSIBLE FORMULAE 58 UNIT VI OPTICS ALL THE POSSIBLE FORMULAE Relation between focal length and radius of curvature of a mirror/lens, f = R/2 Mirror formula: Magnification produced by a mirror: m = - = - Snell s law: 1

More information

Contrast Optimization A new way to optimize performance Kenneth Moore, Technical Fellow

Contrast Optimization A new way to optimize performance Kenneth Moore, Technical Fellow Contrast Optimization A new way to optimize performance Kenneth Moore, Technical Fellow What is Contrast Optimization? Contrast Optimization (CO) is a new technique for improving performance of imaging

More information

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging 1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant

More information

Coding and Modulation in Cameras

Coding and Modulation in Cameras Mitsubishi Electric Research Laboratories Raskar 2007 Coding and Modulation in Cameras Ramesh Raskar with Ashok Veeraraghavan, Amit Agrawal, Jack Tumblin, Ankit Mohan Mitsubishi Electric Research Labs

More information

Image restoration. Restoration: Enhancement:

Image restoration. Restoration: Enhancement: Image restoration Most images obtained by optical, electronic, or electro-optic means is likely to be degraded. The degradation can be due to camera misfocus, relative motion between camera and object,

More information

Image Formation. CS418 Computer Graphics Eric Shaffer.

Image Formation. CS418 Computer Graphics Eric Shaffer. Image Formation CS418 Computer Graphics Eric Shaffer http://graphics.cs.illinois.edu/cs418/fa14 Some stuff about the class Grades probably on usual scale: 97 to 93: A 93 to 90: A- 90 to 87: B+ 87 to 83:

More information

Chapter 34. Thin Lenses

Chapter 34. Thin Lenses Chapter 34 Thin Lenses Thin Lenses Mirrors Lenses Optical Instruments MFMcGraw-PHY 2426 Chap34a-Lenses-Revised: 7/13/2013 2 Inversion A right-handed coordinate system becomes a left-handed coordinate system

More information

A Survey of Light Source Detection Methods

A Survey of Light Source Detection Methods A Survey of Light Source Detection Methods Nathan Funk University of Alberta Mini-Project for CMPUT 603 November 30, 2003 Abstract This paper provides an overview of the most prominent techniques for light

More information

Physics 214 Midterm Fall 2003 Form A

Physics 214 Midterm Fall 2003 Form A 1. A ray of light is incident at the center of the flat circular surface of a hemispherical glass object as shown in the figure. The refracted ray A. emerges from the glass bent at an angle θ 2 with respect

More information