Elec4622: Multimedia Signal Processing
Chapter 9: Introduction to Colour


Dr D. S. Taubman
September 29, 2010

1 Introduction

The purpose of this chapter is to give you an appreciation of what colour means, to correct some popular misconceptions, and to explain why there is no absolute way to interpret the colour produced by an imaging device. We begin by exploring the colour properties of imaging devices, noting that the Human Visual System (HVS) is itself an imaging device. This exploration immediately throws doubt upon our ability to unambiguously convert between the colour representations associated with different devices, and hence our ability to convert the colour data produced by an artificial imager into colour sensations for humans. We show how the perception and the conversion of colour representations generally depend on both the physical properties of the sensors and the illuminant under which scenes are imaged and later viewed. We explore how colour perception can be synthesized for human viewers using a monitor with additive colour primary channels. Along the way, we show why three colour primaries are not enough, contrary perhaps to what you may have been taught in primary school. The colour representations you encounter in image and video files have already been prepared for use with one of a few standardized display devices. While the process of converting the colour samples produced by a real imaging device into RGB samples for use with a display device is highly non-trivial and ill-posed, the process of converting between different standard colour spaces of this form is relatively straightforward¹. We introduce a variety of these standard colour spaces and their properties.

¹ Although this is not normally as simple as applying a matrix transform, as some texts and web-sites may suggest, due to the presence of non-linear gamma functions.

2 Colour

2.1 Colour Imaging Systems

Recall that the general relationship between the scene radiant intensity $r_\lambda(\mathbf{s})$ and the samples output by an imaging device is
$$v_p[\mathbf{n}] = \iint h_s(\mathbf{n}-\mathbf{s})\, r_\lambda(\mathbf{s})\, \rho_p(\lambda)\; d\lambda\, d\mathbf{s}, \qquad 1 \le p \le P \tag{1}$$
Here, $v_p[\mathbf{n}]$ denotes the $p$-th of $P$ colour planes, whose colour sensitivity is given by $\rho_p(\lambda)$, and $h_s(\mathbf{s})$ denotes the system (virtual) point-spread function. Typical imaging devices have either $P=3$ colour planes (red, green and blue) or $P=4$ colour planes (cyan, magenta, yellow and green). More correctly, $h_s(\mathbf{s})$ should be modeled as a function also of the wavelength $\lambda$. However, in this chapter our focus is on sensing colour. For this reason, we are not interested in the spatial blur introduced by $h_s(\mathbf{s})$; indeed, we will completely ignore all spatial dependencies and collapse equation (1) down to
$$\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_P \end{bmatrix} = \begin{bmatrix} \int r(\lambda)\,\rho_1(\lambda)\,d\lambda \\ \int r(\lambda)\,\rho_2(\lambda)\,d\lambda \\ \vdots \\ \int r(\lambda)\,\rho_P(\lambda)\,d\lambda \end{bmatrix} = \begin{bmatrix} \langle r, \rho_1 \rangle \\ \langle r, \rho_2 \rangle \\ \vdots \\ \langle r, \rho_P \rangle \end{bmatrix} \tag{2}$$
If you like, equation (2) describes the vector of colour samples at a single pixel location, produced in response to a radiant scene power spectrum $r(\lambda)$. Note that we can think of the functions $r(\lambda)$ and $\rho_p(\lambda)$ as vectors (or objects) and the integration of their product as an inner product on these vectors. This is the normal definition of the inner product for functions, and it satisfies all the desirable properties and geometric interpretations of inner products.
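The following is a minimal numerical sketch of equation (2), with the sensitivities sampled on a uniform wavelength grid. The Gaussian sensitivity functions are illustrative placeholders, not measured camera responses.

```python
import numpy as np

lam = np.linspace(400.0, 700.0, 301)      # wavelength grid, in nanometres
dlam = lam[1] - lam[0]

def gauss(mu, sigma):
    return np.exp(-0.5 * ((lam - mu) / sigma) ** 2)

# Three hypothetical sensitivity functions rho_p (illustrative only)
rho = np.stack([gauss(610.0, 40.0),       # "red" channel, rho_1
                gauss(540.0, 40.0),       # "green" channel, rho_2
                gauss(450.0, 30.0)])      # "blue" channel, rho_3

r = 1.0 + 0.5 * np.sin(lam / 50.0)        # an arbitrary radiant power spectrum r(lambda)

# v_p = <r, rho_p>: approximate each integral by a Riemann sum
v = (rho * r).sum(axis=1) * dlam
print(v)                                  # one colour sample per plane, as in equation (2)
```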

The most straightforward mechanism for sensing colour is to use multiple imaging systems, one for each colour plane, carefully aligning the spatial and optical properties of these different systems. Ideally, the only distinction between the imaging systems associated with each colour plane is that they should each use a different colour filter, which selectively attenuates particular wavelengths, as identified by the different spectral response functions ρ_p(λ). This is essentially what happens in so-called 3-CCD video cameras; most professional video cameras have 3 CCDs. In this case, a prism is usually employed to split the chromatic components of the scene radiance into red, green and blue bands. Unfortunately, the manufacturing precision required to align the optical and sensor components for the separate colour planes renders this approach infeasible for low cost consumer video cameras, and also for the high resolution sensors of digital still cameras.

Alternatively, a single imaging system may be employed repeatedly to capture a succession of images, placing different colour filters between the scene and the image sensor surface in order to build up a collection of image planes. The drawbacks of this approach in the event of scene motion are obvious.

In most electronic colour imaging systems, a single sensor array is employed, with a super-imposed Colour Filter Array (CFA), such as that shown in Figure 1. In this case, colour aliasing becomes a serious problem, as demonstrated in Chapter 3. Aliasing can be reduced or eliminated by carefully shaping the system PSF, which is a combination of the optical PSF and the sensor integration aperture. Modern digital still cameras and many video cameras employ specific optical elements, known as anti-aliasing filters, to reduce the severity of aliasing artifacts. These often prove necessary because the PSF of the lens system itself depends upon properties such as lens aperture and distance from centre-field, which are difficult to control.

Figure 1: Bayer CFA pattern.

2.2 Metamerism

From a mathematical perspective, colour is not a three-dimensional phenomenon, but a property of the continuous (in wavelength) scene radiant intensity.

The individual colour samples given in equation (2) are essentially projections² of the continuous radiance function onto a collection of P different colour sensitivity functions ρ_p(λ). Evidently, it is not possible to reconstruct the original scene radiance from any finite number of such projections, since information is lost.

In colour science, two different scene radiances, r_1(λ) and r_2(λ), are said to be metamers with respect to a given imaging system if they are indistinguishable on the basis of the colour plane values v_p. That is,
$$\int \left(r_1(\lambda) - r_2(\lambda)\right)\rho_p(\lambda)\,d\lambda = \langle r_1 - r_2,\, \rho_p \rangle = 0, \qquad p = 1, 2, \ldots, P$$
Obviously, an imaging system with more colour planes has fewer metamers and can better distinguish scene radiances, provided the basis functions ρ_p are linearly independent, i.e. so long as there is no non-zero set of weights α_p for which $\sum_p \alpha_p \rho_p(\lambda)$ is identically equal to 0. It is also clear that two imaging systems will have exactly the same set of metamers if and only if their spectral sensitivities are linear combinations of one another. That is,
$$\begin{bmatrix} \rho_1^{(2)}(\lambda) \\ \rho_2^{(2)}(\lambda) \\ \vdots \\ \rho_P^{(2)}(\lambda) \end{bmatrix} = \mathbf{A} \begin{bmatrix} \rho_1^{(1)}(\lambda) \\ \rho_2^{(1)}(\lambda) \\ \vdots \\ \rho_P^{(1)}(\lambda) \end{bmatrix} \quad \text{for all } \lambda \tag{3}$$
where A is some P × P non-singular matrix (not a function of λ), which expresses the colour sensitivities of imaging system 2 in terms of those of imaging system 1.

As we shall see later on, the Human Visual System (HVS) contains three different types of colour receptors, known as the L (long wavelength), M (medium wavelength) and S (short wavelength) cones. In practice, it is extremely difficult to build a set of colour image sensors whose spectral sensitivities are linear combinations of the HVS cone sensitivities (in fact, even these vary somewhat between individuals, as a function of age, and as a function of the angle of incidence of the light). Consequently, colour imaging systems typically have different sets of metamers to the HVS.

² Recall that the projection of a vector v onto a basis vector e is the vector αe, such that v − αe is orthogonal to e. In particular, α = ⟨v, e⟩/‖e‖². If we first normalize the colour sensitivities such that ‖ρ_p‖ = 1 (i.e., ∫ρ_p²(λ)dλ = 1), then ⟨r, ρ_p⟩ is the projection of the scene radiance onto the colour sensitivity (basis) vector ρ_p.
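A small sketch of metamer construction, reusing the toy sensor from the earlier snippet: two spectra are metamers for a sensor exactly when their difference is orthogonal to every sensitivity function, so we can manufacture a metamer by adding a component orthogonal to span{ρ_p}.

```python
import numpy as np

lam = np.linspace(400.0, 700.0, 301)
dlam = lam[1] - lam[0]
gauss = lambda mu, sg: np.exp(-0.5 * ((lam - mu) / sg) ** 2)
rho = np.stack([gauss(610.0, 40.0), gauss(540.0, 40.0), gauss(450.0, 30.0)])

def samples(r):
    """Colour samples v_p = <r, rho_p>, via a Riemann sum."""
    return (rho * r).sum(axis=1) * dlam

# Perturb r1 by a spectrum orthogonal to span{rho_p}: the sensor cannot
# see the perturbation, so r1 and r2 are metamers for this device.
r1 = 1.0 + 0.5 * np.sin(lam / 50.0)
d = np.random.default_rng(0).standard_normal(lam.size)
Q, _ = np.linalg.qr(rho.T)                # orthonormal basis for span{rho_p}
d_perp = d - Q @ (Q.T @ d)                # component orthogonal to every rho_p
r2 = r1 + 0.2 * d_perp                    # (a physical metamer must also stay non-negative)

print(np.allclose(samples(r1), samples(r2)))   # True: indistinguishable colour vectors
```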

That is, some colours which appear identical to humans will yield different colour vectors, [v_1, v_2, ..., v_P]^t, while some colours which appear different to humans will yield the same colour vector. The existence of metamers serves to demonstrate fundamental limitations on what we can expect from a colour imaging system.

2.3 Colour Gamut

The term gamut refers to the set of coordinate values which are legal within some representation of colour. For example, since the physical radiance function r(λ) must be non-negative for all λ, we cannot generally expect that all possible colour vectors [v_1, v_2, ..., v_P]^t are realizable. Gamuts are usually expressed with respect to the normalized (P−1)-dimensional vector
$$\begin{bmatrix} \tilde v_1 \\ \tilde v_2 \\ \vdots \\ \tilde v_{P-1} \end{bmatrix} = \frac{1}{\sum_{p=1}^{P} v_p} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{P-1} \end{bmatrix}$$
Expressed in this way, the gamut inevitably occupies a bounded region in (P−1)-dimensional space. Certainly, each normalized coordinate must satisfy 0 ≤ ṽ_p ≤ 1.

We will restrict discussion for the moment to only P = 3 different spectral sensitivity functions ρ_p, as in the Human Visual System (HVS) and in most colour imaging devices. Then the gamut is the set of all possible combinations of the two normalized coordinates, $\tilde v_1 = \frac{v_1}{v_1+v_2+v_3}$ and $\tilde v_2 = \frac{v_2}{v_1+v_2+v_3}$. Conceptually, the gamut may be determined by considering all possible radiance functions r(λ) for which $\sum_p v_p = \sum_p \int \rho_p(\lambda)\,r(\lambda)\,d\lambda = 1$, and plotting the 2-D points $(\tilde v_1, \tilde v_2)$ which arise.

It is not difficult to show that the gamut must be convex. That is, if vectors ṽ_a and ṽ_b both belong to the gamut, then so do all convex combinations, $t\,\tilde{\mathbf{v}}_a + (1-t)\,\tilde{\mathbf{v}}_b$, $0 \le t \le 1$. Not only that, but convexity, combined with positivity of the spectral radiance function r(λ), implies that the boundary of the gamut is described by the points ṽ_λ associated with pure chromatic light sources with wavelength λ. Thus, determination of the gamut is a relatively straightforward matter. We shall see the most important example of this in Section 3. The above convexity relationship has important implications and is the subject of a tutorial problem.
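Since the gamut boundary is traced out by pure chromatic sources, it is easy to compute numerically: a monochromatic source at wavelength λ₀ contributes ρ_p(λ₀) to each channel. A sketch, again using the hypothetical Gaussian sensitivities from the earlier snippets:

```python
import numpy as np

lam = np.linspace(400.0, 700.0, 301)
gauss = lambda mu, sg: np.exp(-0.5 * ((lam - mu) / sg) ** 2)
rho = np.stack([gauss(610.0, 40.0), gauss(540.0, 40.0), gauss(450.0, 30.0)])

# Responses to a pure source at each grid wavelength: v_p = rho_p(lam[k])
v = rho.T                                       # shape (301, 3)
total = v.sum(axis=1)
visible = total > 1e-6                          # skip wavelengths the sensor barely sees
locus = v[visible, :2] / total[visible, None]   # boundary points (v1~, v2~)
print(locus.min(axis=0), locus.max(axis=0))
# The gamut itself is the convex hull of these boundary points.
```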

2.4 Effect of the Illuminant

Recall that the scene radiance is related to the underlying surface properties via r(λ) = s(λ)L(λ), where L(λ) is the illuminant power spectral density. Thus, the colour plane values v_p are related to the scene surface reflectance s(λ) according to
$$v_p = \int s(\lambda)\, \underbrace{L(\lambda)\,\rho_p(\lambda)}_{\rho'_p(\lambda)}\, d\lambda$$
This means that the effect of different illumination conditions is equivalent to constant illumination with a different set of virtual sensor sensitivities, ρ′_p(λ) = ρ_p(λ)L(λ). It also means that the sets of surfaces which are indistinguishable to the imaging system are the metamers of this virtual set of sensor sensitivities, which depends strongly upon the illuminant. This explains why the HVS can distinguish some colours (actually surfaces) under one illuminant, but not another. An example with which you are probably familiar arises when trying to select clothing under incandescent or fluorescent lighting, only to discover that the selected clothes do not match under broad daylight. The illuminants L(λ) under which the HVS can function effectively have wildly different spectral power distributions, ranging from incandescent lighting (tungsten filament light bulbs), which is strongly red, to broad daylight, which is significantly blue.

3 Perception and Colour Synthesis

3.1 The Eye as a Colour Sensor

Up until now we have been concerned with general imaging systems. The Human Visual System (HVS) is itself an imaging system. Figure 2 identifies the major elements of the HVS for the purpose of our current discussion. The front end of the HVS shares many properties in common with man-made imaging systems. Light from the scene passes through an optical system, after which it is integrated by a collection of photo-receptors which have different spectral responses. These processes are linear, and may be modeled and understood in the same way as a digital camera.
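The illuminant-dependence of metamerism can be demonstrated directly by folding the illuminant into the sensor model, as in the equation above. The sketch below constructs two reflectances that match under one illuminant but not another; the two illuminants are crude linear-ramp stand-ins, not CIE data.

```python
import numpy as np

lam = np.linspace(400.0, 700.0, 301)
dlam = lam[1] - lam[0]
gauss = lambda mu, sg: np.exp(-0.5 * ((lam - mu) / sg) ** 2)
rho = np.stack([gauss(610.0, 40.0), gauss(540.0, 40.0), gauss(450.0, 30.0)])

L_tung = np.linspace(0.2, 1.0, lam.size)    # red-heavy illuminant (stand-in for tungsten)
L_day  = np.linspace(1.0, 0.6, lam.size)    # blue-heavy illuminant (stand-in for daylight)

def samples(s, L):
    # v_p = integral of s(lambda) * L(lambda) * rho_p(lambda) dlambda
    return ((rho * L) * s).sum(axis=1) * dlam

# Perturb s1 by a spectrum orthogonal to the tungsten virtual sensitivities
# rho'_p = rho_p * L_tung, so the match holds under tungsten only.
d = np.random.default_rng(1).standard_normal(lam.size)
Q, _ = np.linalg.qr((rho * L_tung).T)
d_perp = d - Q @ (Q.T @ d)
s1 = np.full(lam.size, 0.5)
s2 = s1 + 0.05 * d_perp

print(np.allclose(samples(s1, L_tung), samples(s2, L_tung)))  # True: match under tungsten
print(np.allclose(samples(s1, L_day),  samples(s2, L_day)))   # False: mismatch in daylight
```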

Figure 2: Elements of the Human Visual System (scene radiance → optics: lens, iris, cornea → sampling: rods and cones → retinal representation → pattern sensors → cortical representation).

What follows is a highly non-linear processing system involving many layers of neurons, in both the eye and the brain, with bi-directional interactions. These subsequent processes are not fully understood. However, we will focus on the relatively simple front end.

The HVS contains two types of photo-receptors: rods and cones. The former are involved mostly in night vision and have no impact on colour vision, so we will concentrate only on the cones here. There are three different types of cones: L, signifying long wavelength, i.e. red; M, signifying medium wavelength; and S, signifying short wavelength, i.e. blue. The cones are organized on a roughly hexagonal lattice (better packing than a rectangular grid) and have higher density at the fovea, the restricted portion of the visual field where image details are resolved. The S cones appear in much lower density (about 5% of the total) than the L and M cones. Consequently, the S cones do not contribute substantially to the resolution of spatial details (that is why deep blue text on a black background, or yellow text on a white background, is almost unreadable; Web page designers, take note). Figure 3 shows the spectral responses of the three cone types.

In colour science, the colour properties of the HVS are usually expressed in terms of the spectral responses of three non-physical tri-stimulus functions, rather than the responses of the cones themselves. More specifically, colorimetry is normally performed with respect to the so-called 1931 CIE tri-stimulus functions, which are based on the so-called 2° observer³. The tri-stimulus functions are just linear combinations of the cone responses themselves, so they span the same sub-space of the visible spectrum and have the same metamers as the cones. These tri-stimulus functions are denoted t_x(λ), t_y(λ) and t_z(λ).

³ The test images subtend an angle of 2° at the observer's eye.

Figure 3: Responses of the Long, Medium and Short wavelength cones of the HVS, as a function of the wavelength, expressed in nanometres.

Their response to a particular radiant spectral density r(λ) is denoted by an XYZ vector,
$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} \int r(\lambda)\,t_x(\lambda)\,d\lambda \\ \int r(\lambda)\,t_y(\lambda)\,d\lambda \\ \int r(\lambda)\,t_z(\lambda)\,d\lambda \end{bmatrix}$$
Figure 4 depicts the spectral sensitivities of the tri-stimulus functions. These functions are related to the cone LMS functions c_l(λ), c_m(λ) and c_s(λ) according to
$$\begin{bmatrix} t_x(\lambda) \\ t_y(\lambda) \\ t_z(\lambda) \end{bmatrix} = \begin{bmatrix} 2.945 & -3.51 & 26.443 \\ 1 & 1 & 1 \\ 0 & 0 & 124.844 \end{bmatrix} \begin{bmatrix} c_l(\lambda) \\ c_m(\lambda) \\ c_s(\lambda) \end{bmatrix}$$
Note that even though this relationship involves negative matrix coefficients, the tri-stimulus functions are nevertheless strictly positive, so that the XYZ representation is also strictly positive. In particular, this means that the tri-stimulus functions might as well be the spectral sensitivity functions of the human visual system, as far as the gross perception of colour is concerned.

Figure 4: Tri-stimulus response functions.

The Y coordinate is associated with luminance, so t_y(λ) can and should be interpreted as the relative significance of different wavelengths to the perceptual sensation of brightness in the HVS, as determined by psychovisual experiments. The X coordinate is essentially the difference between the responses of the L and M cones.

There is substantial physiological and psychovisual evidence to support the claim that such a difference is formed as one of the first neurological processing steps in the HVS, and that these differences play a key role in perception. The Z coordinate is just a scaled version of the S cone response. The reason for working with the tri-stimulus functions, rather than the cone responses, is that they provide a better model of the way in which the cone responses contribute to perception at later stages in the HVS, without sacrificing the generality of the representation: both sets of spectral sensitivity functions are non-negative and have the same sets of metamers.

Since the XYZ coordinates scale uniformly with variations in scene illuminant power, it is convenient, and for many purposes sufficient, to restrict our consideration to the normalized so-called chromaticity coordinates, x and y, defined by
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \frac{X}{X+Y+Z} \\[4pt] \frac{Y}{X+Y+Z} \end{pmatrix}$$
Clearly, these chromaticity coordinates satisfy 0 ≤ x, y ≤ 1. From the discussion of gamut in Section 2.3, however, one would be right to suspect that the legal chromaticity pairs fall within a smaller region. Recall from that section that the gamut is a convex region whose boundary is traced out by the chromaticities of pure spectral sources at each wavelength in the visible spectrum. This is the interpretation of the so-called chromaticity diagram which appears in Figure 5.

Figure 5: Gamut of the HVS.

3.2 Display Primaries, Gamut and Gamma

In this section we consider the synthesis of colour stimuli using a monitor or similar display device with three colour channels, m_1, m_2 and m_3. Let κ_1(λ), κ_2(λ) and κ_3(λ) denote the spectral power distributions associated with these three display channels (e.g. the three monitor phosphors). Suppose that we wish to induce a particular tri-stimulus response represented by the vector [x, y, z]^t; then the monitor signal vector [m_1, m_2, m_3]^t must satisfy
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \int t_x(\lambda)\kappa_1(\lambda)\,d\lambda & \int t_x(\lambda)\kappa_2(\lambda)\,d\lambda & \int t_x(\lambda)\kappa_3(\lambda)\,d\lambda \\ \int t_y(\lambda)\kappa_1(\lambda)\,d\lambda & \int t_y(\lambda)\kappa_2(\lambda)\,d\lambda & \int t_y(\lambda)\kappa_3(\lambda)\,d\lambda \\ \int t_z(\lambda)\kappa_1(\lambda)\,d\lambda & \int t_z(\lambda)\kappa_2(\lambda)\,d\lambda & \int t_z(\lambda)\kappa_3(\lambda)\,d\lambda \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \end{bmatrix} = \mathbf{M}_t \begin{bmatrix} m_1 \\ m_2 \\ m_3 \end{bmatrix} \tag{4}$$
This means that, provided the monitor transfer matrix M_t is non-singular, we can assign
$$\begin{bmatrix} m_1 \\ m_2 \\ m_3 \end{bmatrix} = \mathbf{M}_t^{-1} \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$
It should be noted, however, that the monitor signals must be non-negative. The constraint m_p ≥ 0 restricts the set of tri-stimulus responses which can feasibly be induced. To see that such a restriction is inevitable, consider once again the chromaticity diagram, which is reproduced in Figure 6. Let [x_1, y_1]^t, [x_2, y_2]^t and [x_3, y_3]^t denote the chromaticity vectors corresponding to the three monitor channels. That is,
$$\begin{pmatrix} x_p \\ y_p \end{pmatrix} = \frac{1}{\int \kappa_p(\lambda)\left[t_x(\lambda)+t_y(\lambda)+t_z(\lambda)\right]d\lambda} \begin{pmatrix} \int \kappa_p(\lambda)\,t_x(\lambda)\,d\lambda \\ \int \kappa_p(\lambda)\,t_y(\lambda)\,d\lambda \end{pmatrix}$$
Then the set of chromaticities which can be generated by adding different non-negative combinations of these three display spectral power densities is simply the set of all convex combinations of the three chromaticity vectors, i.e. the solid triangle having these three chromaticities as its vertices. Proof of this fact is left as an interesting exercise for the student, and may also be given in lectures. As a result, the set of all chromaticities which can be generated by any given monitor is limited to a triangular subset of the XYZ gamut, as shown in Figure 6.
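A minimal sketch of the drive-signal computation implied by equation (4). The transfer matrix M_t below is a made-up, well-conditioned example (loosely resembling a typical RGB-to-XYZ matrix), purely for illustration; a real one comes from the integrals of t_i(λ)κ_j(λ).

```python
import numpy as np

M_t = np.array([[0.41, 0.36, 0.18],
                [0.21, 0.72, 0.07],
                [0.02, 0.12, 0.95]])   # illustrative monitor transfer matrix

def drive_signals(xyz):
    m = np.linalg.solve(M_t, xyz)      # m = M_t^{-1} [x, y, z]^t
    in_gamut = np.all(m >= 0.0)        # any negative m_p means the requested
    return m, in_gamut                 # stimulus lies outside this display's gamut

m, ok = drive_signals(np.array([0.3, 0.4, 0.2]))
print(m, ok)
```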

Figure 6: Gamut of the XYZ representation, with a superimposed triangle corresponding to the gamut of some display.

Since the XYZ gamut is not itself triangular, it is physically impossible to reproduce the appearance of all visible colours using only three colour channels. One could add more colour channels in order to increase the gamut, but in every case the display gamut must be the convex hull of the chromaticities of the display channels, i.e. a polygon with the display chromaticities at its vertices. Since the XYZ gamut is not itself a polygon, no finite collection of display channels is capable of completely representing all visible colours. In practice, however, the periphery of the XYZ gamut corresponds to extreme colours which rarely occur in nature, so three colour channels with well positioned chromaticities are usually adequate.

We will not explicitly consider colour printing systems here, since their analysis is much more complex. Colour printing is approximately modeled as a subtractive, rather than an additive process, but it is not strictly linear. Consequently, printer gamuts are not generally polygons. They are usually determined empirically during the design of a colour printer, and large colour lookup tables are used to derive the best combination of colour ink/toner to reproduce a particular XYZ stimulus.

Before concluding our discussion of display devices, a number of additional comments must be made. Firstly, it is not necessary or customary for manufacturers to specify the power spectral distributions of the three colour channels directly. From equation (4), it is evident that at most nine numbers, $\int \kappa_p(\lambda)\,t_i(\lambda)\,d\lambda$ for $i \in \{x, y, z\}$, are sufficient to define the properties of a monitor or other display device which are relevant to the human viewer. In fact, these properties are usually specified somewhat indirectly, up to a scale factor, through four chromaticity vectors (eight parameters). These are the chromaticities [x_p, y_p]^t of each colour channel and the whitepoint, denoted [x_w, y_w]^t, which represents the relative strength of the three colour channels. The whitepoint is the chromaticity vector which results when all monitor signals are set equal, say to m_1 = m_2 = m_3 = 1. Thus,
$$\begin{pmatrix} x_w \\ y_w \end{pmatrix} = \frac{1}{\int \left[\kappa_1(\lambda)+\kappa_2(\lambda)+\kappa_3(\lambda)\right]\left[t_x(\lambda)+t_y(\lambda)+t_z(\lambda)\right]d\lambda} \begin{pmatrix} \int \left[\kappa_1(\lambda)+\kappa_2(\lambda)+\kappa_3(\lambda)\right]t_x(\lambda)\,d\lambda \\ \int \left[\kappa_1(\lambda)+\kappa_2(\lambda)+\kappa_3(\lambda)\right]t_y(\lambda)\,d\lambda \end{pmatrix}$$
After some algebra, the monitor transfer matrix M_t may be derived from these eight quantities, up to a scale factor.

The second point which must be made here is that the monitor channel strengths m_1, m_2 and m_3 used in the previous discussion are not linearly related to the channel drive voltages or the 8-bit values used to specify colour values in a 24-bit computer display. Instead, m_1, m_2 and m_3 represent the intensity of the light emitted by the three monitor phosphors.

To be more precise, let these intensity values be normalized to the interval [0, 1], so that the maximum intensity corresponds to m_p = 1. Also, let the corresponding channel drive voltages or, equivalently, the computer generated colour channel values, be denoted m′_1, m′_2 and m′_3, also normalized to the range [0, 1]. The relationship between m_p and m′_p is normally modeled by a gamma function,
$$m'_p = G_{\gamma,\beta}(m_p) = \begin{cases} g\, m_p & \text{if } 0 \le m_p \le \delta \\ (1+\beta)\,(m_p)^{1/\gamma} - \beta & \text{if } \delta < m_p \le 1 \end{cases} \tag{5}$$
where the values of γ and β are identical for all three channels, and the values of δ and g are given in terms of γ and β by
$$\delta = \left( \frac{\beta}{(1+\beta)\left(1 - \frac{1}{\gamma}\right)} \right)^{\gamma} \tag{6}$$
and
$$g = \frac{\beta}{(\gamma - 1)\,\delta} \tag{7}$$
A typical gamma function is shown in Figure 7. It consists of an initial linear segment with slope g and breakpoint δ, which transitions to an offset power-law portion with exponent 1/γ and offset β. The above relationships between γ, β, δ and g are derived in order to ensure that the function is continuous and has a continuous first derivative at the breakpoint between the linear and non-linear segments. The gamma function is often incorrectly defined as $m'_p = m_p^{1/\gamma}$, but the initial linear segment plays an important role in limiting the gradient for low amplitude signals, and hence ensuring that the function has a well conditioned inverse.

The gamma function approximately models the behaviour of real monitors. Perhaps even more importantly, however, standard colour image spaces are generally specified in terms of reference gamma corrected RGB channels. For example, the so-called standard RGB space (sRGB) adopts a reference gamma function with γ = 2.4 and β = 0.055. For more information on standard colour spaces, consult Section 4.
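A sketch of equations (5) through (7), using the sRGB-style parameters quoted above (γ = 2.4, β = 0.055), with a numerical check that the two segments join continuously at the breakpoint.

```python
import numpy as np

def gamma_params(gamma, beta):
    delta = (beta / ((1 + beta) * (1 - 1 / gamma))) ** gamma   # equation (6)
    g = beta / ((gamma - 1) * delta)                           # equation (7)
    return delta, g

def G(m, gamma=2.4, beta=0.055):
    """Forward gamma correction: linear intensity m in [0,1] -> drive value m'."""
    delta, g = gamma_params(gamma, beta)
    m = np.asarray(m, dtype=float)
    return np.where(m <= delta, g * m, (1 + beta) * m ** (1 / gamma) - beta)

delta, g = gamma_params(2.4, 0.055)
print(delta, g)     # breakpoint ~0.003, initial slope ~12.9
# Continuity at the breakpoint: both segments agree there
print(np.isclose(g * delta, (1 + 0.055) * delta ** (1 / 2.4) - 0.055))   # True
```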

Figure 7: Typical gamma function (m′_p versus m_p).

3.3 Dynamic Range and Weber's Law

In this section, we consider the sensitivity of the HVS to changes in light intensity (luminance). The HVS is able to operate within an extraordinarily large range of luminance conditions, spanning approximately 10 orders of magnitude. The bottom end of this range involves only the rods (scotopic vision), which are more sensitive but not involved in vision at higher light levels. We are interested here primarily in the cones (photopic vision).

Although the HVS is capable of operating over an enormous dynamic range, sensitivity is far from uniform over this range. If we were to build an A/D converter capable of a dynamic range of, say, 10⁶, then approximately 20-bit precision would be required, which cannot be achieved at the sampling rates required to digitize an image, let alone video. Although the HVS can operate over a comparable dynamic range, it achieves this by introducing a non-linearity, so that the number of actually discernible intensity levels is vastly smaller than 10⁶. In particular, the HVS responds approximately logarithmically to luminous intensity, which we might write as
$$y \mapsto \max\left\{0,\; \log\frac{y}{y_0}\right\}$$
where y_0 is the smallest detectable intensity.

Weber's law states that the smallest change in luminous intensity δy which can be detected against a background of intensity y is proportional to y. That is, δy/y is a constant. This makes sense in terms of the logarithmic model above: differentiating, a small change δy produces a change
$$\delta \log\frac{y}{y_0} = \frac{\delta y}{y}$$
in the logarithmic response, so Weber's law may be understood in terms of uniform sensitivity to changes in log y.

Weber's law suggests that maybe we should represent image intensities in the log domain, so that uniform changes in the digital representation correspond to approximately uniform changes in perceived intensity. In practice, this is rarely done; however, image intensities are normally represented in the gamma corrected domain, which possesses many of the same properties as the log representation. The gamma function has already been defined in equation (5). Typical gamma values for RGB representations range from γ = 2.2 to γ = 2.8. The Lab representation, which we will encounter shortly, is intended primarily for use in modeling visual sensitivity to changes in colour and intensity. It involves a gamma value of γ = 3, which tracks Weber's law reasonably closely over the most useful range of intensities.
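A quick numerical illustration of that last claim, under the same gamma model as equation (5): with γ = 3, uniform steps in the gamma corrected domain correspond to Weber fractions δy/y that vary only slowly over the mid-range of intensities, far more uniformly than uniform steps in the linear domain would.

```python
import numpy as np

gamma, beta = 3.0, 0.16
delta = (beta / ((1 + beta) * (1 - 1 / gamma))) ** gamma
g = beta / ((gamma - 1) * delta)

def G_inv(mp):
    """Inverse gamma function: corrected value m' -> linear intensity m."""
    mp = np.asarray(mp, dtype=float)
    return np.where(mp <= g * delta, mp / g, ((mp + beta) / (1 + beta)) ** gamma)

mp = np.linspace(0.3, 0.9, 7)    # uniform steps in the gamma corrected domain
y = G_inv(mp)
print(np.diff(y) / y[:-1])       # Weber fractions: slowly varying, roughly constant
```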

4 Standard Colour Image Representations

4.1 Linear XYZ

The XYZ representation has already been discussed. It is the canonical representation of the effect of colour stimuli on the HVS. Note that XYZ is a linear representation, so a large number of bits is usually required to represent the three colour coordinates; it is common to use floating point representations. Importantly, all of the standard colour spaces described in this section are essentially equivalent to XYZ, in that they are just different numerical representations of the effect of colour stimuli on the HVS. Thus, it is possible to convert between any of these representations with relative ease.

4.2 Linear RGB

Linear RGB spaces are derived from XYZ by means of a simple 3 × 3 matrix transform. The reason for adopting a linear RGB representation instead of XYZ is to prepare a colour image for a display device with a particular set of RGB primaries. As discussed in Section 3.2, the intensities of the three primaries, r′, g′ and b′, required to produce a given XYZ tri-stimulus response must satisfy
$$\begin{bmatrix} r' \\ g' \\ b' \end{bmatrix} = \mathbf{M} \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$
The transform matrix M may be determined from the chromaticity coordinates and whitepoint of these three primaries, up to a scale factor. The prime notation is common when referring to RGB coordinates which are linearly related to scene radiant intensity, as in this case. Consequently, these colour spaces are often denoted R′G′B′ spaces. This is to distinguish them from the more common use of the term RGB for a gamma corrected colour space. There are several standard sets of RGB primaries. We describe the most important one here:

Linear sRGB: This is the linear form of what has become the most popular choice for a standard colour space in digital imaging applications; the "s" in sRGB is supposed to stand for "standard". Regular sRGB is a gamma corrected space, so the linear form should perhaps be denoted sR′G′B′.⁴ The whitepoint and primary chromaticities are
$$\begin{bmatrix} x_w & x_r & x_g & x_b \\ y_w & y_r & y_g & y_b \end{bmatrix} = \begin{bmatrix} 0.3127 & 0.64 & 0.30 & 0.15 \\ 0.3290 & 0.33 & 0.60 & 0.06 \end{bmatrix}$$
The whitepoint for sRGB is identical to the colour (same chromaticity coordinates) of a white surface under the standard daylight D65 (i.e., the spectrum produced by the standard CIE daylight spectral model, at a colour temperature of 6500 Kelvin).

For reference, we note that XYZ itself may be understood as a linear RGB space described by the chromaticity and whitepoint coordinates
$$\begin{bmatrix} x_w & x_r & x_g & x_b \\ y_w & y_r & y_g & y_b \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} & 1 & 0 & 0 \\ \tfrac{1}{3} & 0 & 1 & 0 \end{bmatrix}$$

⁴ The linear form of sRGB is generally identified as "linear sRGB" instead.
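The derivation mentioned above (eight chromaticity/whitepoint parameters determine the transfer matrix up to scale) can be sketched as follows, using the sRGB numbers from the table. The whitepoint luminance is fixed at Y = 1 to pin down the free scale factor.

```python
import numpy as np

def xy_to_xyz(x, y):
    # XYZ coordinates of a chromaticity (x, y), scaled so that Y = 1
    return np.array([x / y, 1.0, (1 - x - y) / y])

prims = [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)]   # red, green, blue chromaticities
white = (0.3127, 0.3290)                             # D65 whitepoint chromaticity

C = np.column_stack([xy_to_xyz(*p) for p in prims])  # primary directions in XYZ
s = np.linalg.solve(C, xy_to_xyz(*white))            # channel scales: equal drive
M_t = C * s                                          # signals must map to the whitepoint
print(M_t)                # [x, y, z]^t = M_t [r', g', b']^t (linear channels)
print(np.linalg.inv(M_t)) # the matrix M of Section 4.2, up to the chosen scale
```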

4.3 Gamma Corrected RGB

Gamma corrected RGB spaces are derived from the relevant linear RGB colour space by applying a gamma correction function of the form indicated in equations (5), (6) and (7) to each of the linear colour channels. The gamma function is identical for each colour channel, and is completely specified by the two parameters γ and β, of which the former provides most of the information concerning the shape of the function. The standard sRGB colour space uses γ = 2.4 and β = 0.055.

4.4 YIQ, YUV and YCbCr

The so-called opponent colour spaces were developed initially for television broadcast, but have become fundamental to colour image compression applications, as well as a variety of colour image processing tasks, such as contrast enhancement. These colour spaces are linear mappings of gamma corrected RGB coordinates, in which one component represents intensity, while the other two chrominance components represent the colour properties through differences between the colour channels. The intensity component is often called the luminance component, although this should not be interpreted as a precise psychometric quantity representing perceived luminous intensity. In all standard opponent colour representations, each of the underlying gamma corrected RGB coordinates is represented as the sum of the intensity/luminance component and a linear combination of the two chrominance components. Three common examples are given here:

YIQ: This colour representation was chosen for the first colour television broadcast system, NTSC (now used in the U.S.A., Japan and a number of other countries). It is defined in terms of gamma corrected NTSC primaries, whose whitepoint, primary chromaticities and γ and β values will not be given here for the sake of brevity. The linear mapping is
$$\begin{bmatrix} y \\ i \\ q \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.59947 & -0.27589 & -0.32358 \\ 0.21389 & -0.52243 & 0.30854 \end{bmatrix} \begin{bmatrix} r \\ g \\ b \end{bmatrix}$$
The chrominance components, I and Q, have been defined in such a way as to ensure that the I axis corresponds roughly to the skin tones to which the HVS is particularly sensitive, while the Q axis is orthogonal to the I axis. For this reason, in the NTSC television standard the I channel is assigned more transmission spectrum than the Q signal. The naming convention arises from the fact that in the NTSC composite video standard, the I signal is modulated In-phase with the chrominance band sub-carrier, while the Q component is modulated with Quadrature phase.

YUV: This colour representation was chosen for the PAL television standard used in Europe, Australia and other parts of the world. YUV is a variant of the YIQ representation in which the chrominance channels, U and V, are obtained by rotating the I and Q axes. The relevant linear mapping is
$$\begin{bmatrix} y \\ u \\ v \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.14747 & -0.289391 & 0.436798 \\ 0.614777 & -0.514799 & -0.099978 \end{bmatrix} \begin{bmatrix} r \\ g \\ b \end{bmatrix}$$

YCbCr: This is perhaps the most popular colour space for digital image processing applications, because it has the property that if the underlying gamma corrected RGB space is represented by 8-bit numbers, then the Y, Cb and Cr values may also be represented by 8-bit numbers, spanning the full dynamic range. This property is introduced simply by scaling the U and V channels of the YUV representation given above. The relevant linear mapping is
$$\begin{bmatrix} y \\ c_b \\ c_r \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.16875 & -0.33126 & 0.5 \\ 0.5 & -0.41869 & -0.08131 \end{bmatrix} \begin{bmatrix} r \\ g \\ b \end{bmatrix}$$
Colour image compression is usually performed within a YCbCr representation of some underlying set of gamma corrected RGB coordinates.

Note that in each of the above colour spaces, the symbol Y is used for the intensity/luminance channel. This should not be confused with the Y channel of the XYZ representation, which is a more fundamental luminance quantity.
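A minimal sketch of the YCbCr mapping quoted above, applied to gamma corrected RGB values in [0, 1]. This is the unit-range form given in the text; the integer offsets used by particular 8-bit file formats are left out.

```python
import numpy as np

M = np.array([[ 0.299,    0.587,    0.114  ],
              [-0.16875, -0.33126,  0.5    ],
              [ 0.5,     -0.41869, -0.08131]])

def rgb_to_ycbcr(rgb):
    """Gamma corrected rgb in [0,1] -> (y, cb, cr); cb, cr lie in [-0.5, 0.5]."""
    return M @ np.asarray(rgb, dtype=float)

print(rgb_to_ycbcr([1.0, 1.0, 1.0]))   # white: y = 1, cb = cr = 0 (to rounding)
```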

4.5 CIE Lab

The CIE Lab (or just Lab) colour space was developed in order to reflect psychometric observations concerning the sensitivity of the HVS to changes in colour. Formally, CIE Lab may be understood as an opponent colour space, in the sense that it is a linear mapping of gamma corrected primaries, where the primaries are whitepoint normalized XYZ values, rather than monitor RGB primaries. Specifically, the Lab space is defined by
$$L = 100\, G_{\gamma,\beta}\!\left(\frac{Y}{Y_w}\right)$$
$$a = 431\left[ G_{\gamma,\beta}\!\left(\frac{X}{X_w}\right) - G_{\gamma,\beta}\!\left(\frac{Y}{Y_w}\right) \right]$$
$$b = 172.4\left[ G_{\gamma,\beta}\!\left(\frac{Y}{Y_w}\right) - G_{\gamma,\beta}\!\left(\frac{Z}{Z_w}\right) \right]$$
where G_{γ,β}(·) is the gamma function defined by equations (5), (6) and (7), with γ = 3 and β = 0.16, and X_w, Y_w, Z_w are the XYZ tri-stimulus values corresponding to the whitepoint of the illuminant to which the assumed viewer is adapted. Thus, Lab is actually a family of colour spaces, parametrized by the adaptation illuminant L_a(λ). The whitepoint is given by
$$\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = \begin{bmatrix} \int t_x(\lambda)\,L_a(\lambda)\,d\lambda \\ \int t_y(\lambda)\,L_a(\lambda)\,d\lambda \\ \int t_z(\lambda)\,L_a(\lambda)\,d\lambda \end{bmatrix}$$
Now the surface spectral reflectance s(λ) cannot be greater than 1, so the actual XYZ values given by equation (8) cannot be larger than the corresponding whitepoint coordinates. It follows that the quantities X/X_w, Y/Y_w and Z/Z_w should all lie in the range 0 to 1. Recall that the gamma function G_{γ,β}(·) is defined on the interval [0, 1].

As mentioned, the CIE Lab colour space is important because it can be used to express the perceptual significance of colour differences. Let [L_1, a_1, b_1]^t and [L_2, a_2, b_2]^t be two slightly different colours, represented in Lab space. The perceptual significance of this colour difference is expressed in terms of the CIE δE formula as
$$\delta E = \sqrt{(L_1 - L_2)^2 + (a_1 - a_2)^2 + (b_1 - b_2)^2}$$
In other words, the CIE Lab colour space has been designed so that the perceptual difference between similar colours can be approximated by the Euclidean distance between their Lab coordinates. Moreover, a value of δE = 1 corresponds to a just noticeable difference (JND) between the colours.
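A sketch of the Lab formulas above, with the gamma function reused at γ = 3, β = 0.16. The D65-like whitepoint vector is an assumed illustrative value, not derived from the integrals.

```python
import numpy as np

def G(m, gamma=3.0, beta=0.16):
    delta = (beta / ((1 + beta) * (1 - 1 / gamma))) ** gamma   # equation (6)
    g = beta / ((gamma - 1) * delta)                           # equation (7)
    m = np.asarray(m, dtype=float)
    return np.where(m <= delta, g * m, (1 + beta) * m ** (1 / gamma) - beta)

def xyz_to_lab(xyz, white):
    x, y, z = xyz / white                    # whitepoint-normalized coordinates in [0,1]
    L = 100.0 * G(y)
    a = 431.0 * (G(x) - G(y))
    b = 172.4 * (G(y) - G(z))
    return np.array([L, a, b])

def delta_E(lab1, lab2):
    return np.linalg.norm(lab1 - lab2)       # Euclidean distance in Lab coordinates

white = np.array([0.9505, 1.0, 1.089])       # assumed XYZ whitepoint (D65-like)
lab1 = xyz_to_lab(np.array([0.30, 0.40, 0.20]), white)
lab2 = xyz_to_lab(np.array([0.31, 0.40, 0.20]), white)
print(delta_E(lab1, lab2))                   # deltaE ~ 1 corresponds to a JND
```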

5 Colour and Illuminant Conversion

In Section 3.2, we considered the use of a display system to induce a specific response in the HVS. Although somewhat involved, that problem is well defined and can be tackled in a deterministic manner. In fact, all of the standard colour spaces are essentially related to XYZ through the specification of some hypothetical display system, as discussed in Section 4. In this section, we briefly consider the much more difficult and ill-posed problem of converting the response of an arbitrary colour imaging device into a colour image which is suitable for presentation to a human viewer. The idea is to introduce the reader to some of the relevant issues.

To appreciate the difficulty of the problem, recall from Section 2.2 that the metamers of an imaging system depend upon the P-dimensional function sub-space spanned by the spectral response functions, ρ_1(λ), ρ_2(λ), ..., ρ_P(λ). Two imaging devices have exactly the same metamers if and only if their spectral response functions are related by an invertible matrix A, according to equation (3). Both the HVS and the relevant digital imaging device may be interpreted as imaging systems in this sense, and they will have different sets of metamers, unless we happen to be able to design an imaging system whose spectral responses are linear combinations of the XYZ tri-stimulus functions; we cannot generally do this. Different sets of metamers means that some colours which appear different to the HVS will be indistinguishable to our man-made imaging device, and so perfect colour conversion is clearly impossible.

The complexity of the problem is augmented by the fact that the HVS perceives colour in a manner which is sensitive to the ambient viewing conditions. If we could capture an image which represented the entire field of view which could have an impact on a human observer, and then reproduce this image again over the entire field of view of the observer, say in a dark projection theatre, or through a visor, then this adaptation to ambient conditions could be discounted. However, we rarely have these luxuries. Images are often viewed on a monitor or as a printed image, within a restricted portion of the observer's field of vision. As a result, colours are perceived with respect to the context of the ambient lighting.

As an example, suppose we captured an image at sunset with a digital camera whose colour channels had exactly the same spectral responses as the cones of the HVS, so that we could in principle recreate the exact sensations which a human observer in the same scene would have received, by appropriately displaying the resulting image. This process would be effective only if the image was displayed (say on a monitor) to an observer who happened to be watching the sun go down at exactly the same time of day (i.e. the observer would need to be adapted to the same scene). If the image was projected to a viewer sitting in front of a computer terminal under fluorescent light, the sensation would be quite different.

Visual adaptation is a poorly understood phenomenon, and appears to be connected with another poorly understood phenomenon known as colour constancy, which refers to our remarkable ability to correctly identify the colour of different surfaces (e.g. a red jumper, a blue ball, etc.) under wildly different illuminant power spectral densities, ranging from incandescent or even candle lighting, which is strongly red, to broad daylight, which is quite blue.

In view of these complexities, approximate solutions to the colour conversion problem are often undertaken within the context of the following arguments. Suppose an image is captured using an imager with spectral response functions ρ_p(λ), under a single illuminant with power spectral density L_o(λ) (the "o" subscript is for "original"). Suppose then that the image is to be presented to a human viewer with the tri-stimulus response functions t_x(λ), t_y(λ) and t_z(λ), who is adapted to an ambient scene illuminant with power spectral density L_a(λ) (the "a" subscript is for "adapted"). Then we would like to stimulate the human viewer's cones in such a way as to produce the sensation of the original scene surfaces, viewed under the illuminant conditions to which he is adapted. That is, a scene surface with surface reflectance function s(λ) might be represented by
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \int t_x(\lambda)\,L_a(\lambda)\,s(\lambda)\,d\lambda \\ \int t_y(\lambda)\,L_a(\lambda)\,s(\lambda)\,d\lambda \\ \int t_z(\lambda)\,L_a(\lambda)\,s(\lambda)\,d\lambda \end{bmatrix} \tag{8}$$
Since all standard colour representations are ultimately based on the XYZ representation, our task is to find [x, y, z]^t from the colour image planes [v_1, v_2, ..., v_P]^t produced by the imaging device. These values satisfy
$$\begin{bmatrix} v_1 \\ \vdots \\ v_P \end{bmatrix} = \begin{bmatrix} \int \rho_1(\lambda)\,L_o(\lambda)\,s(\lambda)\,d\lambda \\ \vdots \\ \int \rho_P(\lambda)\,L_o(\lambda)\,s(\lambda)\,d\lambda \end{bmatrix} \tag{9}$$
Comparing equations (8) and (9), we see that the fundamental quantities of interest to the colour conversion problem are not the spectral sensitivity functions themselves, but the illuminant-modified functions, ρ_p(λ)L_o(λ) and t_p(λ)L_a(λ). If we are lucky enough to be able to find a matrix A such that
$$\begin{bmatrix} t_x(\lambda)\,L_a(\lambda) \\ t_y(\lambda)\,L_a(\lambda) \\ t_z(\lambda)\,L_a(\lambda) \end{bmatrix} = \mathbf{A} \begin{bmatrix} \rho_1(\lambda)\,L_o(\lambda) \\ \rho_2(\lambda)\,L_o(\lambda) \\ \vdots \\ \rho_P(\lambda)\,L_o(\lambda) \end{bmatrix}, \qquad \forall \lambda \tag{10}$$

then A is the required colour conversion matrix. Evidently, it is highly unlikely that such a matrix exists. Nevertheless, this motivates some practical approaches to the colour conversion problem, in which a colour conversion matrix A is selected in order to minimize the difference between the left and right hand sides of equation (10) in some suitable sense. For example, we might look for the matrix A which minimizes
$$E\left[\; \left\| \begin{bmatrix} \int t_x(\lambda)\,L_a(\lambda)\,s(\lambda)\,d\lambda \\ \int t_y(\lambda)\,L_a(\lambda)\,s(\lambda)\,d\lambda \\ \int t_z(\lambda)\,L_a(\lambda)\,s(\lambda)\,d\lambda \end{bmatrix} - \mathbf{A} \begin{bmatrix} \int \rho_1(\lambda)\,L_o(\lambda)\,s(\lambda)\,d\lambda \\ \int \rho_2(\lambda)\,L_o(\lambda)\,s(\lambda)\,d\lambda \\ \vdots \\ \int \rho_P(\lambda)\,L_o(\lambda)\,s(\lambda)\,d\lambda \end{bmatrix} \right\|^2 \;\right] \tag{11}$$
where the expectation is taken over the statistical distribution of typical surface reflectance functions s(λ). A number of researchers have studied and reported statistical properties of scene surface reflectance functions.

Simple squared error metrics of this form do not account for the fact that errors in certain colours are more important to the human viewer than errors in other colours. Of particular importance is the scene whitepoint. That is, white surfaces in the original scene should be reproduced as white surfaces under the illuminant to which the viewer is adapted. A white surface is a surface which reflects light uniformly over all wavelengths, i.e. s(λ) = 1 for all λ. Thus, it is particularly important to choose the colour conversion matrix A in such a way as to ensure that
$$\mathbf{w}_a = \begin{bmatrix} \int t_x(\lambda)\,L_a(\lambda)\,d\lambda \\ \int t_y(\lambda)\,L_a(\lambda)\,d\lambda \\ \int t_z(\lambda)\,L_a(\lambda)\,d\lambda \end{bmatrix} = \mathbf{A} \begin{bmatrix} \int \rho_1(\lambda)\,L_o(\lambda)\,d\lambda \\ \int \rho_2(\lambda)\,L_o(\lambda)\,d\lambda \\ \vdots \\ \int \rho_P(\lambda)\,L_o(\lambda)\,d\lambda \end{bmatrix} = \mathbf{A}\,\mathbf{w}_o \tag{12}$$
Here, w_o and w_a are the original scene whitepoint and the viewer's adapted whitepoint, respectively. The whitepoint constraint of equation (12) may be imposed on the minimization problem expressed in equation (11). Alternatively, when the number of colour channels is P = 3, it is quite common to further simplify the colour conversion task by setting A to be the diagonal matrix (i.e. a set of 3 multiplication factors, one for each colour channel) which equalizes the scene and adapted whitepoints. This is referred to as a white balancing operation. Unfortunately, white balancing is not generally sufficient unless the spectral sensitivities of the imaging device are already close to the HVS tri-stimulus functions, or a linear combination of them.

Before concluding this section, we note that it is often not possible to address colour conversion problems with the rigour suggested by the foregoing discussion, because neither the original scene illuminant L_o(λ), nor the adapted illuminant L_a(λ), is exactly known. It is common to assume that the adapted illuminant is equivalent to a particular daylight colour known as D65, although most computer monitors are set up by default for a whitepoint corresponding to a more blue daylight colour known as D93⁵. The numeric quantity in this labeling is the so-called colour temperature, which can be explicitly set on most colour monitors (D65 indicates a colour temperature of 6500 Kelvin). It is also common to assume that outdoor images are captured under a D50 illuminant, which is known as "photographic daylight". Under some conditions, the illuminant colour can be explicitly measured. There are also a variety of techniques for guessing the illuminant from the captured image. Finally, recall that most scenes contain non-uniform illumination, which further pushes the ideal of perfect colour conversion out of our grasp.

⁵ The principal reason for this is that it is easier to produce more illumination power in the blue channel of a monitor. Monitor manufacturers find that the brightness of the screen can be an important selling point, so they aim to maximize blue content in their default colour settings. D93 is about as far as you can go before the colour becomes too noticeably wrong, but if you are interested in getting colour right, you should generally select D65. If you are viewing the monitor output under a tungsten light source (common in homes), you should probably select the lowest available colour temperature.
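To make the white balancing operation of equation (12) concrete, here is a minimal sketch for a P = 3 camera: a diagonal A whose per-channel gains map the scene whitepoint w_o onto the adapted whitepoint w_a. The whitepoint vectors are illustrative numbers only, not derived from any particular illuminant data.

```python
import numpy as np

w_o = np.array([0.90, 1.00, 1.30])   # camera response to a white surface, scene illuminant
w_a = np.array([0.95, 1.00, 1.09])   # desired response under the adapted illuminant

A = np.diag(w_a / w_o)               # per-channel gains equalizing the two whitepoints

v = np.array([0.40, 0.50, 0.80])     # some captured colour vector
print(A @ v)                         # white-balanced colour
print(np.allclose(A @ w_o, w_a))     # the constraint of equation (12) holds: True
```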

c Taubman, 21 Elec4622: Colour Page 24 Before concluding this section, we note that it is often not possible to address colour conversion problems with the rigour suggested by the foregoing discussion, because neither the original scene illuminant L o (λ), northe adapted illuminant L a (λ), are exactly known. It is common to assume that the adapted illuminant is equivalent to a particular daylight colour known as D65, although most computer monitors are set up for a whitepoint corresponding to a more blue daylight colour known as D93 by default 5. The numeric quantity in this labeling is the so-called colour temperature, which can be explicitly set on most colour monitors (D65 indicates a colour temperature of 65 Kelvin). It is also common to assume that outdoor images are captured under a D5 illuminant, which is known as photographic daylight. Under some conditions, the illuminant colour can be explicitly measured. There are also a variety of techniques for guessing the illuminant from the captured image. Finally, recall that most scenes contain non-uniform illumination which further pushes the ideal of perfect colour conversion out of our grasp. 5 The principle reason for this is that it is easier to produce more illumination power in the blue channel of a monitor. Monitor manufacturers find that the brightness of the screen can be an important selling point so they aim to maximize blue content in their default colour settings. D93 is about as far as you can go before the colour becomes too noticeably wrong, but if you are interested in getting colour right, you should generally select D65. If you are viewing the monitor output under a tungsten light source (common in homes), you should probably select the lowest available colour temperature.