Image Intensity and Point Operations
Dr. Edmund Lam
Department of Electrical and Electronic Engineering
The University of Hong Kong
Digital Image Processing (Second Semester, 2016-17)
http://www.eee.hku.hk/elec4245

Motivation
A digital image is a matrix of numbers, each corresponding to a certain brightness.
The imaging chain is: sensor → A/D converter → digital image → D/A converter → display, with a finite dynamic range at the sensor, a finite representation for the digital image, and a finite dynamic range at the display.
These numbers are called intensity values, or gray levels. They
- must be nonnegative,
- must fall within a range of discrete values (the dynamic range), and
- are measured by the number of bits.

Intensity Levels
Psychovisual research indicates that:
- For a typical person, the eyes can adapt to a wide range of light intensity levels, on the order of 10^10 from the scotopic threshold to the glare limit.
- At a particular instant, the eyes respond to a much narrower range.
- We are often interested in the extent to which our eyes can detect changes in light intensity levels.
- The actual brightness perceived is a logarithmic function of the light intensity arriving at the eyes.
- Our eyes can also be tricked; the perceived brightness also depends on the surroundings.
Gray Levels
How many gray levels are enough to show intensity variations? Often, 8 bits: 2^8 = 256 levels.
- Your computer likes it: 8 bits = 1 byte.
- For an image of size X × Y, your computer can store it with XY bytes (each pixel needs 1 byte to store its intensity). In reality, we need much less, due to compression.
But other bit depths exist:
- Printing: we may only have 1 bit (ink or no ink at a specific location).
- High dynamic range (HDR) imaging: with better sensors and displays, we may record and show a wider range.
(Figure: the same image quantized to 8, 7, 6, 5, 4, 3, 2, and 1 bits.)

Gray-level mapping
We will focus on gray-level images, as the notation and concepts are much easier to understand. For color images, we can always perform such operations on the luminance channel (more about channels later).
Let the input image be represented by I_in(x, y). We process the image, and the output is represented by I_out(x, y). The simplest kind of processing is a point-wise operation:
I_out(x, y) = T{ I_in(x, y) }
where T
- can be a one-to-one mapping (reversible),
- can be a many-to-one mapping (irreversible),
- cannot be a one-to-many mapping.
For every pixel, we change the intensity from an input value to an output value.
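The bit-depth trade-off and the idea of a many-to-one point operation can both be seen in a short sketch (the helper name `reduce_bits` is illustrative, not from the lecture): requantizing a pixel to fewer bits is itself an irreversible pointwise mapping T.

```python
def reduce_bits(pixel, bits):
    """Requantize an 8-bit pixel value to the given number of bits,
    then scale back to the 0..255 range for display."""
    levels = 1 << bits            # 2**bits gray levels remain
    step = 256 // levels          # width of each quantization bin
    return (pixel // step) * step # keep only the top `bits` bits

# An 8-bit value keeps full precision; a 1-bit image is black/white.
print(reduce_bits(200, 8))  # -> 200
print(reduce_bits(200, 1))  # -> 128 (everything >= 128 maps together)
print(reduce_bits(100, 1))  # -> 0
```

Several input values collapse onto one output value, which is exactly why the mapping cannot be undone.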
Gray-level mapping
The algorithm can be represented by an input-output plot of output intensity against input intensity. It can usually be implemented as a look-up table (LUT) for maximum efficiency. A LUT is the most flexible representation, but conceptually, let us consider formulas:
- Threshold: I_out(x, y) = 0 if I_in(x, y) < T, and 255 if I_in(x, y) ≥ T  (1)
- Negative: I_out(x, y) = 255 - I_in(x, y)  (2)
- Logarithm: I_out(x, y) = c log[1 + I_in(x, y)]  (3)
- Power-law: I_out(x, y) = c [I_in(x, y)]^γ  (4)
For (3) and (4), pick c and γ so that I_out(x, y) stays within [0, 255].
(Figure: input-output plots of the threshold, negative, logarithm, and power-law mappings, together with original and modified example images.)
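A minimal LUT-based sketch of Eqs. (1)-(4), assuming 8-bit input and the illustrative parameter choices T = 128, c = 255/log(256) for the logarithm, and c = 255 with γ = 1.5 for the power-law:

```python
import math

def apply_lut(image, lut):
    """Apply a 256-entry look-up table to every pixel (image as list of rows)."""
    return [[lut[p] for p in row] for row in image]

threshold_lut = [0 if v < 128 else 255 for v in range(256)]        # Eq. (1), T = 128
negative_lut  = [255 - v for v in range(256)]                      # Eq. (2)
c_log = 255 / math.log(1 + 255)                                    # keeps output in [0, 255]
log_lut = [round(c_log * math.log(1 + v)) for v in range(256)]     # Eq. (3)
gamma = 1.5
power_lut = [round(255 * (v / 255) ** gamma) for v in range(256)]  # Eq. (4)

img = [[0, 100, 200]]
print(apply_lut(img, threshold_lut))  # -> [[0, 0, 255]]
print(apply_lut(img, negative_lut))   # -> [[255, 155, 55]]
```

Each mapping is precomputed once for all 256 input levels; applying it is then a single table lookup per pixel, which is why LUTs are the standard implementation.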
Threshold
- The output is a binary image.
- T can be set at the mid-point of the intensity range (i.e., 128), but any other number is also fine.
- Theoretically, we lose 7/8 of the total information! But surprisingly, we retain most of the useful information.
- Thresholding is often used as part of a computer vision process, e.g., in pattern recognition or defect detection.
(Figure: original and thresholded images.)

Negative
- Not used often; for ordinary images it would look funny.
- More useful for images we do not normally see, such as medical images.
(Figure: original and negative images.)

Logarithm
(Figure: original and log-transformed images.)

Power-law
Example (γ = 1.5): (Figure: original and power-law-transformed images.)
Power-law
Often, we apply this power-law transformation to correct for image displays, in a process known as gamma correction. In the early days of cathode ray tube (CRT) displays, the phosphors responded nonlinearly to the input voltage, roughly

B = V^γ  (5)

where B is brightness, V is input voltage, and γ ≈ 2.2 (also called the decoding gamma). To compensate for this, the input image is preprocessed with a gamma of 1/2.2 ≈ 0.45 (also called the encoding gamma).
- Gamma encoding also allows allocating more bits to preserve the relative differences in the darker tones.
- Typically: the camera encodes the image using a standard gamma, the display computer handles color management (including gamma correction) when writing data to video memory, and then the monitor performs its own gamma correction.

Bit-plane slicing
Represent each pixel value in binary, and then create a binary image for each bit. Each such image is called a bit-plane. For an 8-bit image, plane 8 is the most significant while plane 1 is the least significant. For example:
180 = 1 0 1 1 0 1 0 0
 53 = 0 0 1 1 0 1 0 1
(Figures: the individual bit-planes 8 down to 1 of an example image, and the cumulative reconstructions from bit 8 alone through bits 8-1.)
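A small sketch of the encoding/decoding gamma pair around Eq. (5), and of extracting bit-planes from a pixel. The function names are hypothetical, and intensities are normalized to [0, 1] for the gamma part.

```python
def encode_gamma(v, g=1 / 2.2):
    """Encoding gamma applied at the camera (v normalized to 0..1)."""
    return v ** g

def decode_gamma(v, g=2.2):
    """Decoding gamma of the display, B = V^gamma as in Eq. (5)."""
    return v ** g

# Encoding followed by decoding is (numerically, almost) the identity:
v = 0.25
print(abs(decode_gamma(encode_gamma(v)) - v) < 1e-9)  # -> True

def bit_planes(pixel, bits=8):
    """Return [plane 8, ..., plane 1]: most significant bit first."""
    return [(pixel >> b) & 1 for b in range(bits - 1, -1, -1)]

print(bit_planes(180))  # -> [1, 0, 1, 1, 0, 1, 0, 0]
print(bit_planes(53))   # -> [0, 0, 1, 1, 0, 1, 0, 1]
```

The two `bit_planes` calls reproduce the binary expansions of 180 and 53 shown above.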
Application: Watermarking
Replace bit-plane 1 with another binary image as a digital watermark.
(Figure: the reconstruction from bit-planes 8-2, plus a new bit-plane 1 carrying the watermark; slicing the watermarked image into bit-planes 8 through 1 reveals the hidden pattern in plane 1.)
- This is one form of digital watermarking: hiding information digitally.
- Often used for authentication: for example, to show that a certain picture is owned by you.
- A fancy word is steganography: the art or practice of concealing a message, image, or file within another message, image, or file.
- The method using bit-plane slicing is simple and easy to implement, and the watermark is easy to detect.
- Drawback: the watermark is not robust; it can easily be destroyed or replaced.
- There are much more sophisticated schemes.

Histogram
Each pixel has a value (intensity). By collecting all the pixels together, we can form a histogram.
- The spatial information is lost!
- The histogram can give us a rough idea of the intensity concentrations.
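The bit-plane-1 watermarking scheme above can be sketched on a toy 2 × 2 image (`embed_watermark` and `extract_watermark` are illustrative names, not from the lecture):

```python
def embed_watermark(image, mark):
    """Replace bit-plane 1 (the LSB) of each pixel with a binary mark."""
    return [[(p & ~1) | m for p, m in zip(prow, mrow)]
            for prow, mrow in zip(image, mark)]

def extract_watermark(image):
    """Read bit-plane 1 back out of the image."""
    return [[p & 1 for p in row] for row in image]

img  = [[180, 53], [77, 200]]
mark = [[1, 0], [0, 1]]
stego = embed_watermark(img, mark)
print(stego)                     # -> [[181, 52], [76, 201]]
print(extract_watermark(stego))  # -> [[1, 0], [0, 1]]
```

Each pixel changes by at most 1 gray level, which is invisible to the eye; conversely, any operation that disturbs the LSBs (compression, requantization) destroys the mark, illustrating the robustness drawback.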
Histogram
(Figure: example histograms of an original image, a too-dark version, a too-bright version, and an equalized version.)
The histogram can be helpful in providing the curve for gray-level mapping.
Histogram equalization: the output image has (roughly) the same number of pixels at each gray level (hence "equalized").
- Good: it makes use of all available gray levels to the maximum extent.
- Reality: the result is only approximate, because we are not allowed a one-to-many mapping (see the next example).
- Conceptually (for 8 bits): the lowest 1/256 fraction of all pixel intensities maps to intensity 0; the next 1/256 maps to intensity 1; the next to 2, and so on.
- It mainly works when the illumination conditions are problematic.

Histogram equalization
Example: 3-bit image, 64 × 64 pixels. Assume the following distribution:

gray level | number of pixels
0          | 790
1          | 1023
2          | 850
3          | 656
4          | 329
5          | 245
6          | 122
7          | 81

1. Gray levels: [0, ..., 7]; total 4096 pixels.
2. Proportion of input pixels at level 0: 790/4096 ≈ 0.19. We need to fill the entire range from 0 to 7, so such pixels should map to 0.19 × 7 ≈ 1.33. Rounding to the nearest integer, we map them to 1.
3. Proportion of input pixels at levels 0 and 1: (790 + 1023)/4096 ≈ 0.44. Level 1 should map to 0.44 × 7 ≈ 3.08 → 3.
4. Proportion of input pixels at levels 0 to 2: (790 + 1023 + 850)/4096 ≈ 0.65. Level 2 should map to 0.65 × 7 ≈ 4.55 → 5.
5. Similarly: level 3 → 6, level 4 → 6, level 5 → 7, level 6 → 7, level 7 → 7.
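The worked example above can be checked in a few lines (rounding is done with `int(s + 0.5)`, matching the round-to-nearest convention used in the steps):

```python
# Reproduce the 3-bit worked example: cumulative proportion of pixels,
# scaled by (L - 1), rounded to the nearest integer output level.
counts = [790, 1023, 850, 656, 329, 245, 122, 81]  # n_0 .. n_7
total = sum(counts)                                # 4096 pixels (64 x 64)
L = 8                                              # 3-bit image

mapping = []
cumulative = 0
for n in counts:
    cumulative += n
    s = (L - 1) * cumulative / total  # ideal (fractional) output level
    mapping.append(int(s + 0.5))      # quantize to the nearest integer

print(mapping)  # -> [1, 3, 5, 6, 6, 7, 7, 7], matching steps 2-5
```

Note that levels 3 and 4 both map to 6, and levels 5, 6, 7 all map to 7: a many-to-one mapping, which is why the equalized histogram is only approximately flat.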
Histogram equalization
Generally, assume L levels, with j = 0, ..., L-1, and an image of size M × N. Let n_j denote the number of pixels at level j. We compute, for each k,

s_k = [(L-1)/(MN)] Σ_{j=0}^{k} n_j,   k = 0, 1, ..., L-1  (6)

so each s_k is the ideal output level for an input level k. We are limited to integer output levels, so we quantize s_k.
(Figure: the resulting input-output mapping, together with the input and output histograms for the 3-bit example. Note that the output histogram is roughly flat, but not strictly.)

Adaptive histogram equalization
Research: what is undesirable about (global) histogram equalization, and how can the algorithm be improved?
Modification: adaptive histogram equalization.
- Perform histogram equalization based on a portion of the image, e.g., every non-overlapping 16 × 16 block (tile).
- Limit contrast expansion in flat regions by clipping values.
- Blend smoothly (with bilinear interpolation) between neighboring tiles.
(Figures: original vs. histogram-equalized image; original vs. global equalization vs. adaptive equalization.)
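Eq. (6) plus quantization gives a complete global equalization routine. The sketch below assumes plain Python lists rather than a real image container, and the function name is illustrative:

```python
def equalize(image, L=256):
    """Histogram-equalize an image (list of rows) following Eq. (6)."""
    flat = [p for row in image for p in row]
    mn = len(flat)                 # MN, the total pixel count
    hist = [0] * L                 # n_j for each level j
    for p in flat:
        hist[p] += 1
    # s_k = (L-1)/(MN) * sum_{j<=k} n_j, quantized to the nearest integer
    lut, cum = [], 0
    for n in hist:
        cum += n
        lut.append(int((L - 1) * cum / mn + 0.5))
    return [[lut[p] for p in row] for row in image]

# A low-contrast 4-level image gets stretched toward the full 0..3 range:
img = [[1, 1, 2], [1, 2, 2]]
print(equalize(img, L=4))  # -> [[2, 2, 3], [2, 3, 3]]
```

Adaptive equalization would run this same computation per tile and then blend the per-tile LUTs with bilinear interpolation; that extension is not shown here.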
Pointwise operations
We can perform point-by-point (also known as pointwise) operations to combine several images. Assume the images are of the same size:
- addition: I(x, y) = a(x, y) + b(x, y)
- subtraction: I(x, y) = a(x, y) - b(x, y)
- multiplication: I(x, y) = a(x, y) b(x, y)
- division: I(x, y) = a(x, y) / b(x, y)

Addition and averaging
Assume each image is corrupted by additive white Gaussian noise:

f_i(x, y) = g(x, y) + n_i(x, y)  (7)

- g(x, y) is the ideal noise-free image.
- f_i(x, y) is what we capture (the subscript i denotes the i-th capture).
- n_i(x, y) is the noise: every pixel of the noise follows a Gaussian distribution with mean zero and the same standard deviation σ.
The standard deviation (or the variance σ²) of the noise indicates how severely the image is corrupted. Using the expected value E:

E[n_i(x, y)] = 0  (8)
E[n_i²(x, y)] = σ²  (9)

We can average images to reduce the noise. (Figure: averages of 1, 8, and 32 noisy images.)
Addition and averaging
Example: (Figure: noisy images with σ² = 0.001 × 255², σ² = 0.01 × 255², and σ² = 0.1 × 255².)
Assume we now have N images, f_1(x, y), ..., f_N(x, y), and form the average

f~(x, y) = (1/N) Σ_{i=1}^{N} f_i(x, y)  (10)

Noise in one image:

E[(f_1(x, y) - g(x, y))²] = E[n_1²(x, y)] = σ²

Noise in the averaged image:

E[(f~(x, y) - g(x, y))²] = E[((1/N) Σ_{i=1}^{N} f_i(x, y) - g(x, y))²]  (11)
                         = E[((1/N) Σ_{i=1}^{N} n_i(x, y))²]  (12)
                         = (1/N²) Σ_{i=1}^{N} E[n_i²(x, y)]  (13)
                         = (1/N²) N σ² = σ²/N  (14)

Eq. (13) is valid provided E[n_i n_j] = 0 when i ≠ j.

Subtraction
Spot the difference: (Figure: two images of the same object, one with no defect, f_1(x, y), and one with a defect, f_2(x, y).)
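A quick simulation of Eq. (14) under the stated zero-mean, independent Gaussian noise assumption (the scene value, σ, N, and the trial count below are arbitrary choices for illustration):

```python
import random

random.seed(0)
g, sigma, N, trials = 100.0, 10.0, 16, 2000  # true value, noise std, images averaged

def mse(estimates):
    """Empirical mean squared error of a list of estimates of g."""
    return sum((e - g) ** 2 for e in estimates) / len(estimates)

# One noisy capture per trial vs. the average of N captures per trial:
single = [g + random.gauss(0, sigma) for _ in range(trials)]
averaged = [sum(g + random.gauss(0, sigma) for _ in range(N)) / N
            for _ in range(trials)]

print(round(mse(single)))    # close to sigma^2 = 100
print(round(mse(averaged)))  # close to sigma^2 / N = 6.25
```

The empirical error of the averaged estimates is roughly N times smaller, as Eq. (14) predicts.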
Subtraction
Take the difference: f_1(x, y) - f_2(x, y).
(Figure: the difference image with no alignment, and the difference when the images are properly aligned and thresholded.)
Research: how to align the images?

Multiplication
We can think about how an image is formed (the imaging process):

f(x, y) = i(x, y) r(x, y)  (15)

- i(x, y) is the illumination source: 0 < i(x, y) < ∞.
- r(x, y) is the reflectance: 0 < r(x, y) < 1.
- Some images are formed by transmission (e.g., x-ray); then r(x, y) is the transmissivity.
- The values f(x, y) are confined to the available dynamic range when captured by a detector.

Other combinations
Generally speaking, combining different images to form one image is known as image fusion. Possibilities include:
- Different types of camera, same object: multimodal medical image fusion (e.g., combining anatomical and functional imaging; photo credit: img.medicalexpo.com).
- Same type of camera, different aspects of the object: panoramic photography.
- Same type of camera, different times of capture: high dynamic range imaging.
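Defect detection by subtraction can be sketched as an absolute difference followed by a threshold (the function name and the threshold value of 30 are illustrative, and the two images are assumed already aligned):

```python
def defect_mask(reference, test, threshold=30):
    """Threshold the absolute pixel-wise difference of two aligned
    images; 255 marks a likely defect, 0 marks agreement."""
    return [[255 if abs(a - b) >= threshold else 0
             for a, b in zip(ra, rb)]
            for ra, rb in zip(reference, test)]

ref  = [[100, 100], [100, 100]]   # defect-free reference, f_1
test = [[102, 100], [100, 180]]   # one pixel differs strongly, f_2
print(defect_mask(ref, test))     # -> [[0, 0], [0, 255]]
```

The small 2-level difference is absorbed by the threshold (it could just be noise), while the large deviation is flagged; without alignment, nearly every pixel would be flagged, which is what the unaligned difference figure shows.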
Other combinations
(Figure: a panorama taken with a cell phone.)
Image stitching often has to take care of issues such as missing pixels, different exposures, alignment, moving objects, etc.
High dynamic range (HDR) imaging: combining images from different exposures.
Removing occlusion: another fusion application. (Source: Herley, "Automatic occlusion removal from minimum number of images," ICIP 2005.)

Summary
- We looked at image enhancement with one or more images as input.
- We considered each pixel location as unrelated to its neighbors.
- Next: image processing that involves the neighboring pixels.