Image Intensity and Point Operations
Dr. Edmund Lam
Department of Electrical and Electronic Engineering
The University of Hong Kong
ELEC4245: Digital Image Processing (Second Semester, 2015-16)
http://www.eee.hku.hk/elec4245

Motivation

A digital image is a matrix of numbers, each corresponding to a certain brightness.

The imaging chain: sensor -> A/D converter -> digital image -> D/A converter -> display. The sensor has a finite dynamic range, the digital image has a finite representation, and the display again has a finite dynamic range.

Gray Levels

These numbers are called intensity, or gray levels. They
- must be nonnegative
- must fall within a range of discrete values (the dynamic range)
- are measured by the number of bits.

How many gray levels are enough? Often, 8 bits: 2^8 = 256 levels. Your computer likes it: 8 bits = 1 byte. For an image of size X x Y, your computer can store it with XY bytes (each pixel needs 1 byte to store its intensity). In reality, we need much less, due to compression.

But other bit depths exist:
- Printing: we may only have 1 bit (ink or no ink at a specific location).
- High dynamic range (HDR) imaging: with better sensors and displays, we may record and show a wider range.

What is the limit? Our eyes can see about 14 orders of magnitude!
Gray Levels

[Figure: the same image quantized to 8, 7, 6, 5, 4, 3, 2, and 1 bits]

Gray-level mapping

We will focus on gray-level images, as the notation and concepts are much easier to understand. For color images, we can always perform such operations on the luminance channel. (more later)

Let the input image be represented by I_in(x, y). We process the image, and the output is represented by I_out(x, y). The simplest kind of processing is a point-wise operation:

    I_out(x, y) = T{ I_in(x, y) }

where T
- can be a one-to-one mapping (reversible)
- can be a many-to-one mapping (irreversible)
- cannot be a one-to-many mapping.

For every pixel, we change the intensity from the input value to the output value.

The algorithm can be represented by an input-output plot of output intensity against input intensity. It can usually be implemented as a look-up table (LUT) for maximum efficiency.
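A point-wise operation implemented as a look-up table, as described above, can be sketched in a few lines of NumPy. This is an illustrative sketch, not course code; the tiny test image is made up, and the negative transform is used as the example mapping T.

```python
import numpy as np

def apply_lut(image, lut):
    """Map every pixel intensity of an 8-bit image through a
    256-entry look-up table in one indexing operation."""
    return lut[image]

# Example LUT: the negative transform T(v) = 255 - v.
lut = np.array([255 - v for v in range(256)], dtype=np.uint8)

image = np.array([[0, 100], [200, 255]], dtype=np.uint8)
result = apply_lut(image, lut)
# result == [[255, 155], [55, 0]]
```

Building the table once and indexing with it is why a LUT gives maximum efficiency: the mapping T is evaluated 256 times, not once per pixel.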
Gray-level mapping

A LUT is most flexible. But conceptually, let's consider formulas:

    Threshold:  I_out(x, y) = 0 if I_in(x, y) < T;  255 if I_in(x, y) >= T    (1)

    Negative:   I_out(x, y) = 255 - I_in(x, y)                                (2)

    Logarithm:  I_out(x, y) = c log[ 1 + I_in(x, y) ]                         (3)

    Power-law:  I_out(x, y) = c [ I_in(x, y) ]^γ                              (4)

Pick c and γ so that I_out(x, y) is within [0, 255].

Threshold

The output is a binary image. T can be set at the mid-point of the intensity range (i.e., 128), but any other number is also fine. Theoretically, we lose 7/8 of the total information! But surprisingly, we retain most of the useful information. Thresholding is often used as part of a computer vision process, e.g., in pattern recognition or defect detection.
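The four mappings (1)-(4) can be sketched as NumPy functions. This is an illustrative sketch; the constants c are chosen, as stated above, so that the output stays within [0, 255], and outputs are rounded to integers since we work with discrete gray levels.

```python
import numpy as np

def threshold(img, T=128):
    # Eq. (1): 0 below T, 255 at or above T
    return np.where(img < T, 0, 255).astype(np.uint8)

def negative(img):
    # Eq. (2): flip the intensity scale
    return (255 - img).astype(np.uint8)

def log_transform(img):
    # Eq. (3): c chosen so that input 255 maps to output 255
    c = 255.0 / np.log(256.0)
    out = c * np.log1p(img.astype(np.float64))
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

def power_law(img, gamma=1.5):
    # Eq. (4): c chosen so that input 255 maps to output 255
    c = 255.0 / (255.0 ** gamma)
    out = c * img.astype(np.float64) ** gamma
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```

The logarithm stretches dark values apart while compressing bright ones; the power-law with γ > 1 does the opposite.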
Negative

Not used often; for ordinary images it would look funny. More useful for images we don't normally see, such as medical images.

Logarithm and Power-law

[Figure: images modified by the logarithm transform and by the power-law transform with γ = 1.5]

Bit-plane slicing

Represent each pixel value in binary, and then create a binary image for each bit. Each such image is called a bit-plane. For example:

    180 = 10110100 (binary)
     53 = 00110101 (binary)

Plane 8 (the leftmost bit) is the most significant, while plane 1 (the rightmost bit) is the least significant.
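Extracting a bit-plane is a shift and a mask. A minimal sketch, using the value 180 from the example above:

```python
import numpy as np

def bit_plane(image, k):
    """Bit-plane k of an 8-bit image: k = 1 is the least
    significant bit, k = 8 the most significant."""
    return (image >> (k - 1)) & 1

v = np.array([[180]], dtype=np.uint8)   # 180 = 10110100 in binary
planes = [int(bit_plane(v, k)[0, 0]) for k in range(8, 0, -1)]
# planes, listed from plane 8 down to plane 1: [1, 0, 1, 1, 0, 1, 0, 0]
```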
Bit-plane slicing

[Figure: the eight bit-planes of an image, from bit 8 down to bit 1]

Multiple bit-planes

[Figure: reconstructions using bit 8 only, bits 8-7, bits 8-6, and so on, down to bits 8-1]

Application: Watermarking

Replace bit-plane 1 with another binary image as a digital watermark:

    watermarked image = (bits 8-2 of the original) + (watermark in bit 1)

This is one form of digital watermarking: hiding information digitally. It is often used for authentication, for example, to show that a certain picture is owned by you. A fancy word is steganography: the art or practice of concealing a message, image, or file within another message, image, or file.

The method using bit-plane slicing is simple and easy to implement, and the watermark is easy to detect: slicing the watermarked image into bit-planes 8 through 1 reveals the hidden image in plane 1.

Drawback: the watermark is not robust; it can easily be destroyed or replaced. There are much more sophisticated schemes.
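Embedding a watermark in bit-plane 1 amounts to clearing the LSB of the cover image and writing the watermark bit into it. A minimal sketch (the two-pixel cover and watermark values are made up):

```python
import numpy as np

def embed_watermark(cover, mark):
    """Clear bit-plane 1 (the LSB) of the cover image, then
    write a binary watermark into it."""
    return ((cover & 0xFE) | (mark & 1)).astype(np.uint8)

def extract_watermark(image):
    """Read back bit-plane 1 as the binary watermark."""
    return (image & 1).astype(np.uint8)

cover = np.array([[180, 53]], dtype=np.uint8)
mark = np.array([[1, 0]], dtype=np.uint8)
wm = embed_watermark(cover, mark)
# 180 -> 181 (LSB set), 53 -> 52 (LSB cleared):
# only bit-plane 1 changes, so the image looks unchanged
```

Because only the least significant bit changes, each pixel moves by at most one gray level, which is visually imperceptible; that is also why the watermark is so easy to destroy.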
Histogram

Each pixel has a value (intensity). By collecting all the pixels together, we can form a histogram. The spatial information is lost! The histogram can give us a vague idea of the intensity concentrations.

[Figure: example images and their histograms: too dark, too bright, and equalized]

Histogram equalization

The histogram can be helpful to provide the curve for gray-level mapping. Histogram equalization: the output image has (roughly) the same number of pixels at each gray level (hence "equalized").
- Good thing: it makes use of all available gray levels to the maximum extent.
- Reality: this is only approximate, because we are not allowed a one-to-many mapping (see the next example).
- Conceptually (for 8-bit): the lowest 1/256 of all pixel intensities map to intensity 0; the next 1/256 map to intensity 1; the next map to 2, etc.
- It mainly works when the illumination condition has problems.
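Forming a histogram is just counting pixels per gray level; the (x, y) positions play no role. A minimal sketch with a made-up 2 x 2 image:

```python
import numpy as np

def histogram(image, levels=256):
    """Count how many pixels fall at each gray level. Spatial
    positions are discarded in the process."""
    counts = np.zeros(levels, dtype=np.int64)
    for v in image.ravel():
        counts[v] += 1
    return counts

image = np.array([[0, 0], [1, 255]], dtype=np.uint8)
h = histogram(image)
# h[0] == 2, h[1] == 1, h[255] == 1; h.sum() equals the pixel count
```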
Example: 3-bit image, 64 x 64 pixels. Assume the following:

    gray level | number of pixels
    0          | 790
    1          | 1023
    2          | 850
    3          | 656
    4          | 329
    5          | 245
    6          | 122
    7          | 81

1. Gray levels: [0, ..., 7], total 4096 pixels.
2. Proportion of input pixels at level 0: 790/4096 ≈ 0.19. We need to fill the entire range of 0 to 7, so such pixels should map to 0.19 x 7 ≈ 1.33. Rounding to the nearest integer, we map them to 1.
3. Proportion of input pixels at levels 0 and 1: (790 + 1023)/4096 ≈ 0.44. Level 1 should map to 0.44 x 7 ≈ 3.08 -> 3.
4. Proportion of input pixels at levels 0 to 2: (790 + 1023 + 850)/4096 ≈ 0.65. Level 2 should map to 0.65 x 7 ≈ 4.55 -> 5.
5. Similarly: level 3 -> 6, level 4 -> 6, level 5 -> 7, level 6 -> 7, level 7 -> 7.

Generally, assume L levels, j = 0, ..., L-1, and an image of size M x N. Let n_j denote the number of pixels at level j. We compute, for each k,

    s_k = (L-1)/(MN) x Σ_{j=0}^{k} n_j,    k = 0, 1, ..., L-1    (5)

so each s_k is the ideal output level for an input level k. We are limited to integer output levels, so we quantize s_k.

[Figure: input histogram, the input-to-output mapping curve, and the output histogram]

Note that the output histogram is roughly flat, but not strictly.
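Eq. (5), followed by rounding to the nearest integer, reproduces the worked example above. A minimal NumPy sketch:

```python
import numpy as np

def equalize_mapping(counts, L):
    """Quantized Eq. (5): s_k = round((L-1)/(M N) * sum_{j<=k} n_j)."""
    MN = counts.sum()
    cdf = np.cumsum(counts)                    # running total of n_j
    return np.round((L - 1) * cdf / MN).astype(int)

# Pixel counts at levels 0..7 from the 3-bit, 64 x 64 example.
counts = np.array([790, 1023, 850, 656, 329, 245, 122, 81])
mapping = equalize_mapping(counts, L=8)
# mapping == [1, 3, 5, 6, 6, 7, 7, 7], matching the example's result
```

Note that no input level maps to 0, 2, or 4 here: since one-to-many mappings are not allowed, some output levels simply go unused, which is why the output histogram is only roughly flat.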
Adaptive histogram equalization

Modification: adaptive histogram equalization, based on a portion of the image, e.g., every non-overlapping 16 x 16 block (tile).
- Limit contrast expansion in flat regions by clipping values.
- Smooth blending (bilinear interpolation) between neighboring tiles.

Research: What is undesirable, and how can the algorithm be improved?

[Figure: global equalization vs. adaptive equalization]

Pointwise operations

We can perform point-by-point (also known as pointwise) operations to combine several images. Assume the images are of the same size:

    addition:        I(x, y) = a(x, y) + b(x, y)
    subtraction:     I(x, y) = a(x, y) - b(x, y)
    multiplication:  I(x, y) = a(x, y) b(x, y)
    division:        I(x, y) = a(x, y) / b(x, y)
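The four pointwise combinations can be sketched in NumPy. One practical detail not visible in the formulas: computing directly in uint8 would wrap around on overflow, so the sketch works in floating point and clips the result back to the 8-bit range. The single-pixel test images are made up.

```python
import numpy as np

def combine(a, b, op):
    """Pointwise combination of two same-size images. Computation
    is done in float to avoid uint8 wrap-around, then clipped."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    if op == "add":
        r = a + b
    elif op == "sub":
        r = a - b
    elif op == "mul":
        r = a * b
    elif op == "div":
        r = a / np.maximum(b, 1e-12)   # guard against division by zero
    else:
        raise ValueError(op)
    return np.clip(r, 0, 255).astype(np.uint8)

a = np.array([[200]], dtype=np.uint8)
b = np.array([[100]], dtype=np.uint8)
# add clips 300 to 255; sub gives 100; mul clips 20000 to 255; div gives 2
```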
Addition and averaging

Assume each image is corrupted by additive white Gaussian noise:

    f_i(x, y) = g(x, y) + n_i(x, y)    (6)

- g(x, y) is the ideal noise-free image.
- f_i(x, y) is what we capture (the subscript i denotes the ith capture).
- n_i(x, y) is the noise. Every pixel of the noise follows a Gaussian distribution with mean zero and the same standard deviation σ.

The standard deviation σ (or variance σ^2) of the noise indicates how severely the image is corrupted. Using the expected value E,

    E[n_i(x, y)] = 0          (7)
    E[n_i^2(x, y)] = σ^2      (8)

Averaging images reduces noise. [Figure: averages of 1, 8, and 32 noisy images, for noise levels σ^2 = 0.001 x 255^2, 0.01 x 255^2, and 0.1 x 255^2]

Assume we now have N images, f_1(x, y), ..., f_N(x, y), and form the average

    f̃(x, y) = (1/N) Σ_{i=1}^{N} f_i(x, y)

The noise in one image is

    E[( f_1(x, y) - g(x, y) )^2] = E[n_1^2(x, y)] = σ^2    (9)
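The noise-reduction claim can be checked numerically by simulating Eq. (6). This is a synthetic sketch: the flat test image, σ = 10, and N = 32 are made-up parameters, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.full((64, 64), 100.0)   # ideal noise-free image g(x, y)
sigma = 10.0                   # noise standard deviation
N = 32                         # number of captured frames

# Capture N noisy frames f_i = g + n_i, then average them.
frames = [g + rng.normal(0.0, sigma, g.shape) for _ in range(N)]
f_avg = np.mean(frames, axis=0)

mse_single = np.mean((frames[0] - g) ** 2)   # estimates sigma^2
mse_avg = np.mean((f_avg - g) ** 2)          # estimates sigma^2 / N
```

Here mse_single should come out near σ^2 = 100, while mse_avg should be roughly N = 32 times smaller, matching the σ^2/N result derived next.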
Addition and averaging

The noise in the averaged image:

    E[( f̃(x, y) - g(x, y) )^2]
        = E[( (1/N) Σ_{i=1}^{N} ( f_i(x, y) - g(x, y) ) )^2]    (10)
        = E[( (1/N) Σ_{i=1}^{N} n_i(x, y) )^2]                  (11)
        = (1/N^2) Σ_{i=1}^{N} E[n_i^2(x, y)]                    (12)
        = (1/N^2) x N σ^2 = σ^2 / N                             (13)

Eq. (12) is valid provided E[n_i n_j] = 0 when i ≠ j.

Subtraction

Spot the difference: no defect, f_1(x, y); with defect, f_2(x, y). Take the difference f_1(x, y) - f_2(x, y).

[Figure: difference images with no alignment vs. properly aligned and thresholded]

Research: How to align?

Multiplication

We can think about how an image is formed (the imaging process):

    f(x, y) = i(x, y) r(x, y)    (14)

- i(x, y) is the illumination source: 0 < i(x, y) < ∞
- r(x, y) is the reflectance: 0 < r(x, y) < 1
- Some images are formed by transmission (e.g., x-ray); then r(x, y) is the transmissivity.
- The values f(x, y) are confined to the available dynamic range when captured by a detector.
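The subtract-and-threshold idea for defect detection can be sketched directly. This is an illustrative sketch that assumes the two images are already aligned; the 2 x 2 test images and the threshold T = 30 are made-up values.

```python
import numpy as np

def defect_map(reference, test, T=30):
    """Subtract the reference (defect-free) image from the test
    image, then threshold the absolute difference. Casting to a
    signed type avoids uint8 wrap-around in the subtraction."""
    diff = np.abs(test.astype(np.int16) - reference.astype(np.int16))
    return np.where(diff >= T, 255, 0).astype(np.uint8)

ref = np.array([[100, 100], [100, 100]], dtype=np.uint8)
test = np.array([[100, 100], [100, 200]], dtype=np.uint8)
# defect_map(ref, test) flags only the bottom-right pixel (value 255)
```

If the images are not aligned, nearly every edge shows up in the difference image, which is exactly the alignment problem the slide raises.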
Other combinations

High dynamic range (HDR) imaging: combining images from different exposures.

Removing occlusion (source: Herley, "Automatic occlusion removal from minimum number of images," ICIP 2005).

Summary

We looked at image enhancement with one or more images as input. We considered each pixel location as unrelated to its neighbors. Next: image processing that involves the neighboring pixels.