T Digital Image Processing P (5 cr)


1 T Digital Image Processing P (5 cr) Autumn 2015 Lecture slides: Jorma Laaksonen LECTURE SLIDES 2015 for self study

2 LECTURE # Course arrangements Overview Course materials Course assignments Exam Course feedback Slides as a reading guide to the book Course learning goals Introduction Fundamentals of the human visual system (2.1 2) Digital image formation (2.3) Representing digital images (2.4) Spatial relations between pixels (2.5) LECTURE # Basic image operations Arithmetic operations (2.6.3/3.4.1) Geometric spatial transformations (2.6.5/5.11)

3 4. Intensity transformations for enhancement Basic notation for spatial operations (3.1.1) Intensity transformation operations (3.2) Histogram processing (3.3) LECTURE # Spatial filtering for image enhancement Spatial mask operations (3.4/3.5) Lowpass filtering (3.5.1/3.6.1) Image sharpening with highpass filtering (3.6/3.7) Combining spatial enhancement methods (3.7/3.8). 81 LECTURE # Fourier transform and frequency domain F-transform in continuous case (4.2.4,4.5.2/4.2.1) Discrete Fourier transform, DFT (4.4.1/4.2.1) dimensional discrete Fourier transform (4.5.5/4.2.2) Properties and uses of 2D Fourier transform Frequency domain filtering (4.7.2/4.2.3) Lowpass filtering (4.8/4.3)

4 7.3 Highpass filtering (4.9/4.4) Homomorphic filtering (4.9.6/4.5) Implementing the 2D Fourier transform LECTURE # Image restoration Degradation model (5.1) Noise models (5.2.2) Restoration in the spatial domain (5.3) Adaptive filtering (5.3.3) Removal of periodic noise (5.4) Linear position invariant degradation process (5.5). 138 LECTURE # Estimating the degradation function (5.6) Inverse filtering (5.7) Wiener filtering (5.8) Constrained least squares filtering (5.9) Summary of frequency domain restoration (5.7 9) Vector matrix notation in image processing

5 LECTURE # Morphology Definitions and operations (9.1/9.1.1) Erosion (9.2.1/9.2.2) Dilation (9.2.2/9.2.1) Opening and closing (9.3) Hit-or-miss (9.4) Boundary extraction (9.5.1) Filling in regions (9.5.2) Extraction of connected components (9.5.3) Thinning (9.5.5) Thickening (9.5.6) Golay alphabets ( ) LECTURE # Wavelets and multiresolution processing Image pyramids and subband coding (7.1) Multiresolution expansions (7.2) One-dimensional wavelet transform (7.3)

6 11.4 Two-dimensional wavelet transform (7.5) LECTURE # Image compression Mathematical background (8.1) Image compression model (8.1.6/8.2) Compression methods (8.2/ ) LECTURE # Variable-length coding (8.2.1/8.4.1) Run-length coding (8.2.5/8.4.3) Bit-plane coding (8.2.7/8.4.3) Block transform coding (8.2.8/8.5.2) Kernel-based image transforms (8.2.8/8.5.2) Lossless predictive coding (8.2.9/8.4.4) Lossy predictive coding (8.2.9/8.5.1) LECTURE # Image segmentation Fundamentals of segmentation (10.1/10.4)

7 13.2 Detecting local changes (10.2/10.1) Edge linking and boundary detection (10.2.7/10.2) Hough transform (10.2.7/10.2.2) An example of Hough transform (10.2.7/10.2.2) Thresholding (10.3) Region-based segmentation (10.4) Use of motion in segmentation (10.6) LECTURE # Use of color in image processing Basics (6) Color fundamentals (6.1) Color models (6.2) Pseudocolor images (6.3) Color transforms (6.5) Color image smoothing and sharpening ( ) Color segmentation ( ) Color edge detection (6.7.3) Noise in color images (6.8) Example of noise in color images (6.8)

8 15. Exam requirements

LECTURE #1

Learning goals: After this lecture the student should be able to
understand the practical arrangements of the course
name some goals, methods and application areas of DIP
identify some connections between DIP and other fields of research
recognize some parts and characteristics of the human visual system
understand the subjectivity of the human image quality assessment
know the fundamentals of digital image formation
analyze the basic spatial relations between pixels

1. Course arrangements

1.1 Overview

Welcome
Welcome to the course! In 2015 Digital Image Processing is no longer lectured, but an opportunity is given to pass the course with self-study until the next academic year begins in September.

Enrollment
Please enroll yourself for the course and the exams in WebOodi.

Announcements
All announcements concerning the course are given in MyCourses.

Forms of activity and effort
The course is planned to consist of the following effort:
studying the lecture slides, 12 x 2 h
solving the exercise questions, 11 x 2 h
3 course assignments, 3 x 13 h
reading for the exam, 40 h
exam, 3 h
giving course feedback, 1 h
= total 135 h, equals 5 credits

12 1.2 Course materials Rafael C. Gonzalez & Richard E. Woods: Digital Image Processing, Third Edition Pearson Prentice Hall, 2008, ISBN (hardcover) OR Pearson International Edition, 2008, ISBN (softcover) around 65 EUR in book stores. sample book material.htm (Ch. 1 2) files V3/errata sheets.htm (errata) Chapters 1 10 will be studied. There is an examination copy of Chapters 3 10 of the book available in room B336 for overnight loans. Lecture slides and exercise solutions All materials will be available from the beginning of the course at 11

Course assignments
Three mandatory course assignments are included in the course fulfillment. The assignments will be published in and the solutions shall be submitted as PDF files in the MyCourses system. The assignments are graded as accepted/rejected; rejected ones will need to be corrected and resubmitted. Assignments that have been submitted before the soft deadline and accepted without reiteration will give an extra point for the exam.
(Table of assignment publication dates, soft deadlines and hard deadlines.)

Exam
There will be three exams: the first on Wednesday , the second on Monday and the third in the first exam period of autumn . Correct and timely course assignments give three extra points in the exam. After the exam, one can take the later exams only if the course assignments have been returned by and accepted!

1.5 Course feedback
Feedback from the course can be given anonymously after the exam. Those students who give course feedback will get an extra point for that exam.

Slides as a reading guide to the book
It's important that the students read the course book. The slides will serve only as an indication of the parts of the book that are essential for learning the course contents and passing the exam. The slide titles refer (in parentheses) to the sections of the book where the topic is presented in full detail. If there are two section numbers in parentheses, the first one refers to the third edition section number and the second one to that in the second edition book.

Course learning goals
After the course the students should be able to:
understand digital image formation and processing
analyze digital filters in spatial and frequency domains
understand image enhancement and restoration
use multiresolution image processing and wavelets
know fundamental lossless and lossy image compression methods
understand the working of morphological image operations
know some basic image segmentation techniques
understand the role of color in digital image processing

2. Introduction
Old adage: A picture is worth a thousand words. Approximately 75% of all human sensory information is based on visual perception. Automatic processing of image information is needed for improving the quality of images, storing them efficiently and analyzing their contents.

Goals and subfields (1.1)
The goals of digital image processing can be characterized as:
Enhancement of visual information for human interpretation.
image processing: image -> image
point operations, filtering, restoration

correction of image geometry
enhancement of lines and edges
registration of images
change detection and analysis
Processing of visual information for machine interpretation.
image analysis / computer vision: image -> something else
object recognition in an image
image explanation, scene analysis
robot vision, active machine vision
Image compression
3D reconstruction from 2D projections
The methods to be used depend heavily on the application.

19 Since the early 1970s digital image processing has been applied to satellite images, medical imaging and analysis methods, astronomical images, particle physics, industrial quality control, etc. 18 History (1.2) Transmission of newspaper illustrations with a submarine cable between London and New York in the early 1920s (5 levels of gray) Correction of distortions in images from lunar probes started in the early 1960s in the US.

20 19 Connections to other fields of research (1.2) Pattern recognition Signal processing Artificial intelligence Digital image processing Optics Perceptual psychology Graphics technology

21 20 Application areas (1.3) military applications surveillance graphics technology satellite remote sensing medical diagnostics industrial quality control robot vision image transmission and archiving archeology, physics, astronomy, biology, criminal investigation,...

Image acquisition methods (1.3)
gamma-ray imaging (10^5 eV): medicine, PET, astronomy
X-ray imaging (10^3 eV): medicine, CAT
ultraviolet imaging (10^1 eV): microscopy, astronomy
visible light (10^0 eV): satellite images, fingerprints
infrared imaging (10^-1 eV): satellite images
microwave imaging (10^-4 eV): radar images
radiowave imaging (10^-8 eV): medicine, MRI, astronomy
seismic imaging (100 Hz): geology, earth resources
hydrophone imaging: marine images
ultrasound imaging (5 MHz): manufacturing, medicine
electron microscopy ( ): TEM, SEM
fractals and other computational images

23 22 Fundamental steps in image processing (1.4) image acquisition preprocessing: image enhancement or restoration segmentation postprocessing, e.g. morphology representation and description classification, recognition

Fundamentals of the human visual system (2.1-2.2)

Structure of the human eye (2.1)
(Figure: cross-section of the eye showing the lens, retina, fovea, iris, visual axis, blind spot and nerve.)

Receptors in retina (2.1.1)
Cones:
photopic or bright-light vision
6-7 million, around the fovea (5 deg)
sensitive to colors: three types of cones
able to perceive small details
each cone receptor has its own nerve
able to see only a stable view
Rods:
scotopic or dim-light vision
million, distributed all over the retina (160 deg)
insensitive to color, only brightness information
create and maintain an overall picture
multiple rod receptors connected with one nerve
sensitive to changes in the view

Image formation geometry (2.1.2)
(Figure: an object 15 m high viewed from 100 m produces a 2.55 mm high image on the retina; the distance from the lens center to the retina is 17 mm.)

In bright light the Weber ratio is smaller and the eye's relative discrimination ability is better than in dim light.

Brightness discrimination (2.1.3)
(Figure: a background of intensity I with a central patch of intensity I + dI.)
I = background intensity
dI = intensity difference in the middle
dI_c = the smallest difference that is detected in 50% of experiments
dI_c / I = Weber ratio
dI_c / I small: small relative changes are detected, good discrimination
dI_c / I large: only large changes are detected, bad discrimination

28 27 Adaptation to brightness (2.1.3) The eye has an enormous capability to discriminate brightness levels: levels from scotopic threshold to glare limit. However, in any moment the eye is adapted to some range of brightness and only a limited number of brightness levels can be discriminated. The eye adapts to the average brightness it sees and in any specific image neighborhood approximately distinct intensity levels can be discriminated. Within an image, the adaptation can vary from place to place and consequently a larger variation of intensities can be distinguished. In typical images, somewhat more than 100 intensity levels are required to produce the sensation of continuous intensity values without false contouring.

Mach bands (2.1.3)
Constant intensity is perceived as varying due to a close-by step change. Step-wise changes are amplified due to under- and overshooting. Explanation: a Mexican hat function is convolved with the image in the retina.

30 Digital image formation (2.3) Image acquisition devices (2.3) silver halide film semiconductor sensors single sensors linear and circular strip sensors matrix sensors

31 Single sensors (2.3.1) 30

32 Sensor strips (2.3.2) 31

33 Matrix sensors (2.3.3) 32

Image formation model (2.3.4)
f(x, y) is the measured light energy at position (x, y), 0 < f(x, y) < infinity.
f(x, y) can be characterized by two components, the source illumination i(x, y) and the reflectance r(x, y):
f(x, y) = i(x, y) r(x, y), where 0 < i(x, y) < infinity and 0 < r(x, y) < 1.
The intensity (gray level) l of a monochrome image is often an integer value:
l = 0 corresponds to black
l = L - 1 corresponds to white
[0, L - 1] is the gray scale, l in [0, L - 1]

Representing digital images (2.4)

Coordinate systems (2.4.2)
A digital image is typically presented as a function of spatial x and y coordinates. The orientation of the coordinate axes varies from book to book (mathematical, conventional, and Gonzalez & Woods conventions).

Sampling and quantization (2.4.2)
Digitizing the spatial xy coordinates represents two-dimensional sampling, also known as spatial quantization. Digitizing the intensity amplitude is known as gray scale or intensity quantization. A digital image is presented as an M x N matrix:

f(x, y) =
[ f(0, 0)      f(0, 1)      ...  f(0, N-1)
  f(1, 0)      f(1, 1)      ...  f(1, N-1)
  ...
  f(M-1, 0)    f(M-1, 1)    ...  f(M-1, N-1) ]

One needs to set the spatial resolution M x N and the gray scale resolution L. The resolutions are often powers of two: M = 2^m, N = 2^n, L = 2^k. Storing an image will then require b bits, where b = M x N x k. The quality of a television image can be obtained when M = N = 512 and k = .
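As a minimal illustration of the storage formula b = M x N x k, the sketch below computes the bit count for a hypothetical resolution and bit depth (the numbers are not from the course material):

```python
import numpy as np

# Storage requirement b = M * N * k bits for an M x N image with L = 2**k gray levels.
M, N, k = 512, 512, 8          # illustrative resolution and bit depth
L = 2 ** k                     # number of gray levels
b = M * N * k                  # total bits
print(f"{M}x{N} image, {L} levels -> {b} bits = {b // 8} bytes")

# The same image as a NumPy array: one unsigned byte per pixel when k = 8.
f = np.zeros((M, N), dtype=np.uint8)
print(f.nbytes, "bytes in memory")   # equals b / 8
```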

37 Digital image after sampling and quantization (2.4.1) 36

(Figure: the same image stored with different spatial and intensity resolutions and the corresponding bit counts b.)

39 38 Subjective image quality (2.4.3) Image resolutions and bit counts do not match directly with the human subjective impression of the image quality. Subjective assessments can be studied using isopreference curves. Images that contain large even areas (low frequencies) require many intensity quantization levels to avoid false contouring. Images that contain many small details (high frequencies) require good spatial resolution.

Spatial relations between pixels (2.5)

Neighbors (2.5.1)
An image element or pixel p with coordinates (x, y) has four neighbors in the horizontal and vertical directions, in positions (x+1, y), (x-1, y), (x, y+1) and (x, y-1). These are called p's 4-neighbors and denoted N_4(p). p's four diagonal neighbors are (x+1, y+1), (x-1, y+1), (x+1, y-1) and (x-1, y-1), and they are denoted N_D(p). p's 8-neighborhood is the union of N_4(p) and N_D(p): N_8(p) = N_4(p) U N_D(p). On the outer image boundary the neighborhoods are not full.

(x-1, y-1)   (x, y-1)    (x+1, y-1)
(x-1, y)     p (x, y)    (x+1, y)
(x-1, y+1)   (x, y+1)    (x+1, y+1)
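The neighborhood definitions translate almost directly into code. The sketch below is only an illustration; the function names and the optional clipping to the image area (reflecting the remark that boundary neighborhoods are not full) are my own choices:

```python
def n4(x, y):
    """4-neighbors of pixel (x, y)."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(x, y):
    """Diagonal neighbors of pixel (x, y)."""
    return [(x + 1, y + 1), (x - 1, y + 1), (x + 1, y - 1), (x - 1, y - 1)]

def n8(x, y, width=None, height=None):
    """8-neighborhood N8 = N4 union ND; optionally clipped to the image area."""
    nbrs = n4(x, y) + nd(x, y)
    if width is not None and height is not None:
        nbrs = [(i, j) for (i, j) in nbrs if 0 <= i < width and 0 <= j < height]
    return nbrs

print(n8(0, 0, width=4, height=4))   # corner pixel: only 3 neighbors remain
```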

Adjacency (2.5.2)
The adjacency of pixels is important when object boundaries and paths from one pixel to another are defined. Two pixels are called adjacent if they are somehow neighbors and their gray values or other properties fulfill some condition. One can define that in a binary image a pixel p fulfills the condition if its value is z(p) = 1, i.e. z(p) in V = {1}; shortly this is written p in V.
We can define three distinct adjacency types for pixels p and q:
4-adjacency: p in V and q in V and q in N_4(p)
8-adjacency: p in V and q in V and q in N_8(p)
m-adjacency (mixed adjacency): p in V and q in V and (q in N_4(p), or q in N_D(p) and N_4(p) intersected with N_4(q) has no pixels in V)
Mixed adjacency eliminates the ambiguous paths that often follow from 8-adjacency.

Two image subsets (regions/segments/areas) S_1 and S_2 are adjacent if there exist pixels p and q such that p is in S_1, q is in S_2, and p and q are mutually adjacent. Otherwise the regions are disjoint.

LECTURE #2

Learning goals: After this lecture the student should be able to
analyze paths and distance measures in digital images
understand the nature of linear operators
describe the steps of geometric spatial transformations
understand intensity transformations as point operations
analyze images and their needed corrections based on histograms
implement a simple histogram equalization algorithm

3. Basic image operations

Paths and connectivity (2.5.2)
A path (or curve) from one pixel p with coordinates (x, y) to another pixel q with coordinates (s, t) is a pixel chain:
(x, y) = (x_0, y_0), (x_1, y_1), ..., (x_n, y_n) = (s, t)
In the chain each (x_i, y_i), i = 1, ..., n, is adjacent to (x_{i-1}, y_{i-1}). n is the length of the path. The chain can be defined based on 4-, 8- or m-adjacency.
Two pixels, p and q, that belong to a subset S of an image are connected in S iff there exists a path from p to q all of whose pixels belong to S. If p is an image element of S, all pixels connected to p in S form one connected component of S. All pixels of one connected component are connected to each other. An image subset is a connected set if it has only one connected component.
Two regions are said to be adjacent if their union forms a connected set.

Distance measures (2.5.3)
Let p, q and z be image elements whose coordinates are (x, y), (s, t) and (u, v), respectively. A distance function (metric) D fulfills the following:
D(p, q) = 0 iff p = q
D(p, q) > 0 when p and q are distinct
D(p, q) = D(q, p)
D(p, z) <= D(p, q) + D(q, z)
Commonly used distance measures include:
D_e(p, q) = sqrt((x - s)^2 + (y - t)^2)   Euclidean distance
D_4(p, q) = |x - s| + |y - t|   D_4 distance (city-block/Manhattan)
D_8(p, q) = max(|x - s|, |y - t|)   D_8 distance (chessboard)
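The three metrics are easy to compare numerically; a small sketch (function names and the example points are illustrative):

```python
import math

def d_euclidean(p, q):
    """Euclidean distance D_e."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d4(p, q):
    """City-block / Manhattan distance D_4."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    """Chessboard distance D_8."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)
print(d_euclidean(p, q), d4(p, q), d8(p, q))   # 5.0 7 4
```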

D_4 and D_8 distances are independent of any paths connecting the points. Pixels that have distance D_4 = 1 from a pixel p are that pixel's 4-neighbors; similarly pixels with D_8 = 1 are its 8-neighbors. The m-distance between two pixels corresponds to the length of the shortest m-path connecting the pixels. The course of m-paths depends on the values of the pixels.

Linear operations and operators (2.6.2/2.6)
A central concept in image processing is whether an operation is linear or not. Let's assume that an image operation H transforms an input image f to an output image g:
H[f(x, y)] = g(x, y)
The operator H is said to be linear iff
H[a_i f_i(x, y) + a_j f_j(x, y)] = a_i H[f_i(x, y)] + a_j H[f_j(x, y)]
A linear operator thus has two properties: additivity and homogeneity. An operation that is not linear is by definition nonlinear.

Arithmetic operations (2.6.3/3.4.1)
Standard arithmetic operations +, -, x and / can be defined between corresponding pixels in images of equal size. For example, the difference (or change or motion) between images f(x, y) and h(x, y) can be obtained by subtracting the gray values:
g(x, y) = f(x, y) - h(x, y)
Example: In mask mode radiography, X-ray images of vessels are taken before and after injecting an iodine medium into the bloodstream. Propagation of the medium can be seen in the difference image.

Averaging of noisy images (2.6.3/3.4.2)
If it is possible to take multiple identical images of the same target, additive noise present in the images can be reduced. Let's assume a noise model g(x, y) = f(x, y) + eta(x, y), where the additive noise eta(x, y) is uncorrelated and has zero mean. One can calculate the point-wise average of K images {g_i(x, y); i = 1, 2, ..., K}:
g_bar(x, y) = (1/K) sum_{i=1}^{K} g_i(x, y)
Now E{g_bar(x, y)} = f(x, y) and var{g_bar(x, y)} = (1/K) var{eta(x, y)}. When K increases, the variance of the pixel values decreases and g_bar(x, y) converges to f(x, y).
In practice, only few applications allow taking a series of identical images. Registering (or aligning) almost identical images can be difficult. Microscopic and astronomical images can often be averaged.
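A hedged numerical check of the 1/K variance reduction, using an artificial constant image and Gaussian noise (all parameters are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.full((64, 64), 100.0)                                     # hypothetical noise-free image
K = 50
noisy = [f + rng.normal(0.0, 20.0, f.shape) for _ in range(K)]   # g_i = f + eta_i

g_bar = np.mean(noisy, axis=0)                                   # point-wise average of K images

print(np.var(noisy[0] - f))   # roughly 400  (variance of eta)
print(np.var(g_bar - f))      # roughly 400 / K: variance shrinks as 1/K
```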

51 3.2 Geometric spatial transformations (2.6.5/5.11) Geometric transformations are needed in situations where image pixels original coordinate values need to be transformed (or corrected) to change the spatial relationships between the pixels. These transformations are often called rubber-sheet transformations. Let the pixel coordinates and values in the original image be f(v, w) and in the transformed one respectively g(x, y). The geometric coordinate transformation can be expressed: (x, y) = T {(v, w)} x = r(v, w) y = s(v, w) 50

Affine coordinate transformations (2.6.5/5.11)
Basic geometric transformations like scaling, rotation, translation or shear can be defined for the whole image area. These simple transformations can be expressed as an affine transformation:

[x y 1] = [v w 1] T = [v w 1] [ t_11  t_12  0
                                t_21  t_22  0
                                t_31  t_32  1 ]

For example, rotation by angle theta:

T = [  cos theta   sin theta   0
      -sin theta   cos theta   0
           0           0       1 ]

Spatial transformations for registration (2.6.5/5.11.1)
Spatial transformations can be used in image registration when aligning images of the same scene. Geometric transformation for image registration is normally implemented in two stages:
spatial transformation between coordinate pairs (v, w) and (x, y)
gray scale interpolation from pixel values f(v, w) to g(x, y)
The mapping of pixel coordinates is assumed to be known exactly for a set of tie points or control points that exist in three or four pairs between the images. The parametric model of r(v, w) and s(v, w) has to be selected so that the needed parameter values can be estimated from the tie points' coordinates.

54 53 Bilinear mapping (2.6.5/5.11.1) When four matching tie point pairs are used, one has four parallel equations for x and four equations for y. From these, a maximum of eight unknown parameters can be solved. One typically uses the bilinear form: x = r(v, w) = c 1 v + c 2 w + c 3 vw + c 4 y = s(v, w) = c 5 v + c 6 w + c 7 vw + c 8 It transforms tie points exactly from the input (v, w) image to the (x, y) output image and interpolates bilinearly in the area between the tie points. The transformation is solved separately inside each set of four tie point pairs. For each discrete g(x, y) pixel in the output image one will have in the input image a correspondence point f(v, w) where v and w are not integers. One still needs to interpolate the gray scale value for the f(v, w) point from the known gray values in its neighboring pixels.
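A sketch of how the eight coefficients c_1 ... c_8 could be solved from four tie-point pairs with a small linear system; the tie-point coordinates below are made up for illustration:

```python
import numpy as np

# Four tie points in the input image (v, w) and their known positions (x, y)
# in the output image (values are illustrative).
vw = np.array([(10, 10), (10, 50), (60, 10), (60, 50)], dtype=float)
xy = np.array([(12, 14), (11, 55), (65, 12), (63, 58)], dtype=float)

# Design matrix rows [v, w, v*w, 1]; x = c1 v + c2 w + c3 vw + c4, y likewise.
A = np.column_stack([vw[:, 0], vw[:, 1], vw[:, 0] * vw[:, 1], np.ones(4)])
cx = np.linalg.solve(A, xy[:, 0])   # c1..c4
cy = np.linalg.solve(A, xy[:, 1])   # c5..c8

def forward(v, w):
    """Map an input-image point (v, w) to the output image (x, y)."""
    return (cx @ [v, w, v * w, 1], cy @ [v, w, v * w, 1])

print(forward(10, 10))   # reproduces the first tie point (12, 14) exactly
```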

Geometric intensity transformation (2.4.4/5.11.2)
With the geometric intensity transformation one can calculate the intensity value for each discrete (x, y) point of g from its corresponding non-discrete (v, w) point of f. Any non-discrete (v, w) always resides between four neighboring discrete pixels of f. As the coordinates and intensity values of these four pixels are known, f(v, w) can be approximated by interpolation. The simplest interpolation is the use of the value from the nearest neighbor pixel, which is known as nearest neighbor or zeroth degree interpolation. Better continuity and less distortion can be obtained by using all four neighboring pixels in bilinear interpolation f(v, w) = av + bw + cvw + d.
The coefficients a, b, c and d can be solved from the coordinates and intensity values of the four neighbor pixels.

56 4. Intensity transformations for enhancement 4.1 Basic notation for spatial operations (3.1.1) The operations on pixels in the image plane or spatial domain can be denoted: g(x, y) = T [f(x, y)] f(x, y) is the original input image g(x, y) is the resulting output image T [ ] is the operation applied to f in the neighborhood of point (x, y) The operator T can be applied also to a set of aligned images or channels of a single image. In that case one could replace the scalar f(x, y) with a vectorial f(x, y). If T is applied to the (x, y) pixel alone, it is a point operation, otherwise it is a mask operation. 55

Intensity transformation operations (3.2)
Intensity transformation operations are a class of point operations where the input image f(x, y) is transformed to the output image g(x, y) by using a transform function s = T(r), where r = f(x, y) is the intensity of a specific point in the input and s = g(x, y) the corresponding value in the output image. Examples of intensity transformation operations include:
contrast stretching s = T(r)
binarization s = T(r)

image negation s = T(r)
dynamic range compression: log transformation s = c log(1 + r)
gamma correction s = c r^gamma

intensity-level slicing s = T(r)
bit-plane slicing

4.3 Histogram processing (3.3)
Histogram operations are an important class of intensity transformations. The histogram of an image is formed by counting how many times each intensity value appears in the image:
p(r_k) = n_k / (MN) is the estimated probability of r_k
r_k in [0, L-1] is the k:th discrete intensity value
n_k is the count of the k:th intensity value in the image
MN is the total count of pixels in the image
Based on the shape of the histogram, the image's appearance can be described and the needed enhancement operations can be planned.

Examples of the forms of intensity value histograms (3.3)
(Figure: histograms p(r_k) of a dark image, a light image, a weak-contrast image and a strong-contrast image.)
It is often most convenient to think that the intensity value r gets real values from the range [0, 1], where 0 corresponds to black and 1 to white.

Intensity histogram transformations (3.3)
Mappings that can be used for transforming intensity histograms are mostly of the form s = T(r), where
T(r) is unique and monotonically increasing in the range 0 <= r <= 1, which ensures that the ordering of intensity values does not change
0 <= T(r) <= 1 when 0 <= r <= 1, which ensures that the resulting intensity values remain within bounds
The inverse transformation r = T^{-1}(s) exhibits the same good properties. In the continuous case one can study the differentials:
p_s(s) = [ p_r(r) dr/ds ] evaluated at r = T^{-1}(s)
This shows that the transformed image's intensity histogram p_s(s) can be forced to any shape by a proper selection of T(r).

Intensity histogram equalization (3.3.1)
The most commonly used histogram operation aims at making all intensities equally probable. This can be obtained by selecting the transfer mapping:
s = T(r) = integral from 0 to r of p_r(w) dw,  0 <= r <= 1
The right side of the equation is r's cumulative distribution function (CDF). The CDF is monotonically increasing from 0 to 1. We can easily solve the derivative of s with respect to r:
ds/dr = p_r(r)
Its inverse dr/ds can be inserted in the previous equation:
p_s(s) = [ p_r(r) dr/ds ]_{r=T^{-1}(s)} = [ p_r(r) * 1/p_r(r) ]_{r=T^{-1}(s)} = 1,  0 <= s <= 1
This shows that s = T(r) produces an equalized histogram p_s(s).
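A minimal discrete histogram-equalization sketch along the lines above; the rounding to L-1 levels, the uint8 assumption and the toy image are implementation choices of mine, not prescribed by the slides:

```python
import numpy as np

def equalize(f, L=256):
    """Discrete histogram equalization of a uint8 image f with L gray levels."""
    hist = np.bincount(f.ravel(), minlength=L)      # n_k
    p = hist / f.size                               # p(r_k) = n_k / (M*N)
    cdf = np.cumsum(p)                              # T(r_k) = cumulative sum of p
    s = np.round((L - 1) * cdf).astype(np.uint8)    # map back to [0, L-1]
    return s[f]                                     # apply the mapping per pixel

# Toy low-contrast image: values concentrated in a narrow range.
rng = np.random.default_rng(1)
f = rng.integers(100, 140, size=(128, 128), dtype=np.uint8)
g = equalize(f)
print(f.min(), f.max(), "->", g.min(), g.max())     # intensities spread towards [0, 255]
```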

64 63 Examples of histogram equalization results (3.3.1) See Figures 3.16 and 3.20 on pages 121 and 129 and explain what has happened in the equalization process. Study what has happened to the histogram peaks horizontally and vertically. Identify potential sources of distortions and other problems.

Contrast enhancement with local image statistics (3.3.4)
Image contrast enhancement can also be based on other methods than histograms. A simple approach is to try to normalize the local intensity averages and variances. The intensity transformation function can be:
g(x, y) = (k M / sigma(x, y)) [ f(x, y) - m(x, y) ] + m(x, y),  where
g(x, y) = the new intensity value of pixel (x, y)
f(x, y) = the old intensity value of pixel (x, y)
m(x, y) = the average intensity value around (x, y)
sigma(x, y) = the standard deviation of intensity in the same area
M = the total intensity average of the original image f(x, y)
k = a constant, 0 < k < 1
The transformation strengthens local intensity variations. The standard deviation in the denominator causes areas of small variance to be changed the most.

LECTURE #3

Learning goals: After this lecture the student should be able to
explain mask operations as convolutions and inner products
analyze linear filters by their impulse response and filter transfer function shapes
identify properties of linear and median-based lowpass filters
understand the principles of image sharpening in the spatial domain
describe the properties of the Laplacian
use unsharp masking and other highboost sharpening methods
describe the properties of the Sobel gradient
analyze processing steps in a real-world image enhancement system

5. Spatial filtering for image enhancement

Spatial mask operations (3.4/3.5)
Many digital image processing techniques are based on performing arithmetic (or logic) operations in a small fixed neighborhood of each pixel. The operations are called mask operations, template operations, window operations, filtering operations, convolution operations, ...
Arithmetic neighborhood operations can be expressed by using the intensity values z_i of the image and the mask coefficients w_i:

z_1 z_2 z_3      w_1 w_2 w_3
z_4 z_5 z_6      w_4 w_5 w_6
z_7 z_8 z_9      w_7 w_8 w_9

As an example, a 3x3-sized mask w(x, y) for calculating the average:
z = (1/9)(z_1 + z_2 + ... + z_9) = (1/9) sum_{i=1}^{9} z_i

Linear filtering as spatial convolution (3.4.2/4.2.4)
Linear filtering operations can be interpreted as convolutions. Definition of a convolution:
w(x, y) * f(x, y) = sum_{s=-a}^{a} sum_{t=-b}^{b} w(s, t) f(x - s, y - t)
Convolution is commutative: w(x, y) * f(x, y) = f(x, y) * w(x, y)
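For illustration, the convolution view in code, assuming SciPy is available; the boundary mode and the random test image are arbitrary choices of mine:

```python
import numpy as np
from scipy.ndimage import convolve

f = np.random.default_rng(2).random((100, 100))

# 3x3 averaging mask: every output pixel is the mean of its 3x3 neighborhood.
w = np.ones((3, 3)) / 9.0
g = convolve(f, w, mode="nearest")   # boundary handling is an implementation choice

# The same result written as the weighted sum z = sum_i w_i * z_i at one interior pixel:
x, y = 50, 50
patch = f[x - 1:x + 2, y - 1:y + 2]
print(np.allclose(g[x, y], np.sum(w * patch)))   # True
```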

Linear filtering as an inner product (3.4.3/3.5)
The filtering operation can also be defined as a weighted sum calculated using the mask coefficients:
z = sum_{i=1}^{9} w_i z_i
This is equal to the vectors' inner product or dot product: z = w^T z, where w and z are vectors formed of the coefficients w(x, y) and the image pixel values f(x, y) around the point (x, y). All operations that are of the inner product form are linear.

Shapes of linear spatial filters
The impulse response or point spread function of a linear filter shows the spreading of a single white pixel on a black background. The shape of the impulse response equals the point reflection (central inversion) of the mask's weights. The Fourier transform of the impulse response is known as the filter transfer function (or simply filter function). Linear filters are commonly circularly symmetric in both the spatial and frequency domains. A 1D intersection of the impulse response gives information on the filter's frequency domain properties.
(Figure: lowpass, highpass and bandpass filter shapes in the frequency domain and the corresponding impulse responses in the spatial domain.)

Lowpass filtering (3.5.1/3.6.1)
Noise can be efficiently removed from images if we have a series of aligned images that can be averaged pixel-wise. If such a series does not exist, we need to reduce noise within one image. An image can be smoothed with a spatial filter that averages the pixel values under the mask. This linear lowpass filtering reduces the amount (variance) of uncorrelated additive noise. As a side effect of linear lowpass filtering, sharp changes such as edges and small details are blurred. The larger the mask, the stronger the blur. Blurring can be prevented by implementing a nonlinear thresholding rule:
g(x, y) = (1/M) sum_{(m,n) in S} f(m, n),  if |f(x, y) - (1/M) sum_{(m,n) in S} f(m, n)| < T
g(x, y) = f(x, y),  otherwise
Pixels whose intensity value would change more than the threshold remain unchanged. Strong changes such as edges and corners will be preserved as a result of the nonlinear rule.

72 71 Examples of linear smoothing (3.5.1/3.6.1) (500x500) 3x3 5x5 9x9 15x15 35x35

73 72 Order-statistics filters (3.5.2/3.6.2) If the new intensity value for the output image is obtained as something else than a linear combination of the intensity values in the neighborhood of the pixel in the input image, then the operation is nonlinear. Lowpass filtering can be implemented with nonlinear order-statistics filters. Order-statistics filters are based on the ranking of the intensity values under the mask. The operations include: median maximum minimum Nonlinear filters like the median filter do not have a known impulse response nor a transfer function. One may say that median filtering is unique for each image.

74 73 Noise removal with median filtering (3.5.2/3.6.2) Linear neighbor averaging as a noise removal method tends to blur details. This can to some extent be prevented by using median filtering. Also median filters destroy image details, but not as much as linear lowpass filters of same size. Median filtering is an optimal method for removing strong impulse noise, known also as salt and pepper noise.
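A small experiment contrasting a linear 3x3 average with a 3x3 median on impulse noise; the image, noise level and SciPy functions used are illustrative assumptions, not part of the course material:

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

rng = np.random.default_rng(3)
f = np.full((100, 100), 128, dtype=np.uint8)

# Add salt-and-pepper (impulse) noise to 5% of the pixels.
mask = rng.random(f.shape) < 0.05
f_noisy = f.copy()
f_noisy[mask] = rng.choice([0, 255], size=mask.sum()).astype(np.uint8)

g_lin = uniform_filter(f_noisy.astype(float), size=3)   # linear 3x3 average
g_med = median_filter(f_noisy, size=3)                  # 3x3 median

print(np.abs(g_lin - 128).max())                 # averaging smears impulses into neighbors
print(np.abs(g_med.astype(int) - 128).max())     # median removes isolated impulses
```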

Image sharpening with highpass filtering (3.6/3.7)
Image sharpening aims to enhance blurred details in images or to highlight transitions in intensity. Sharpening can be interpreted as the inverse operation of averaging. Sharpening is based on amplifying the intensity differences between pixels. Derivatives (or differences in the discrete case) are well suited for detecting interpixel changes.
The first-order derivative (actually difference) of a one-dimensional function:
df/dx = f(x + 1) - f(x)
The second-order derivative of a one-dimensional function:
d2f/dx2 = f(x + 1) + f(x - 1) - 2f(x)

76 Examples of types of image details (3.6.1/3.7.1) 75

The Laplacian (3.6.2/3.7.2)
For a continuous two-dimensional function, the Laplacian is defined as:
lap f(x, y) = d2f/dx2 + d2f/dy2
We can see that the Laplacian is a linear operator. Using the discrete second-order differences defined above in both directions:
d2f/dx2 = f(x + 1, y) + f(x - 1, y) - 2f(x, y)
d2f/dy2 = f(x, y + 1) + f(x, y - 1) - 2f(x, y)
lap f(x, y) = f(x + 1, y) + f(x - 1, y) + f(x, y + 1) + f(x, y - 1) - 4f(x, y)
In mask form this corresponds to the familiar 4-neighbor Laplacian mask (and its 8-neighbor variant).

Sharpening with the Laplacian (3.6.2/3.7.2)
The Laplacian enhances small details and vanishes (equals zero) in constant and linearly varying areas. The Laplacian-filtered image can be subtracted from the original one:
g(x, y) = f(x, y) - lap f(x, y)
        = 5f(x, y) - f(x + 1, y) - f(x - 1, y) - f(x, y + 1) - f(x, y - 1)
so the whole sharpening operation can also be written as a single 3x3 mask.
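A sketch verifying that subtracting the Laplacian-filtered image equals filtering once with the combined 3x3 sharpening mask; SciPy is assumed and the signs follow the slide:

```python
import numpy as np
from scipy.ndimage import convolve

# 4-neighbor Laplacian mask and the corresponding one-step sharpening mask g = f - lap(f).
lap = np.array([[0,  1, 0],
                [1, -4, 1],
                [0,  1, 0]], dtype=float)
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

f = np.random.default_rng(4).random((64, 64))
g1 = f - convolve(f, lap, mode="nearest")
g2 = convolve(f, sharpen, mode="nearest")
print(np.allclose(g1, g2))   # True: the two formulations are the same operation
```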

79 Sharpening with the Laplacian, an example (3.6.2/3.7.2) 78

Unsharp masking and highboost filtering (3.6.3/3.7.2)
Unsharp masking is an old sharpening trick used in silver halide film photography. A sharpened (highpass filtered) image can be formed by subtracting a blurred (lowpass filtered) version f_blur(x, y) from the original image:
g_mask(x, y) = f(x, y) - f_blur(x, y)
The result image is obtained as the sum of the original and the difference:
g(x, y) = f(x, y) + g_mask(x, y)
Or in a more general form with a multiplier k:
g(x, y) = f(x, y) + k g_mask(x, y)
When k > 1 this is called highboost filtering.
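An unsharp-masking/highboost sketch; a Gaussian blur stands in for the unspecified lowpass filter, and the values of k and sigma are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def highboost(f, k=1.0, sigma=2.0):
    """Unsharp masking (k = 1) / highboost filtering (k > 1).

    The blur is implemented with a Gaussian lowpass filter; any lowpass-filtered
    version of f would serve the same role.
    """
    blurred = gaussian_filter(f, sigma=sigma)
    g_mask = f - blurred                 # the 'unsharp mask'
    return f + k * g_mask

f = np.random.default_rng(5).random((64, 64))
g = highboost(f, k=2.0)                  # k > 1: highboost sharpening
print(g.shape)
```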

The gradient as a sharpening method (3.6.4/3.7.3)
The gradient vector is defined in the general form:
grad f = [g_x, g_y]^T = [df/dx, df/dy]^T
The length of the gradient vector (often called "the gradient") is:
mag(grad f) = sqrt(g_x^2 + g_y^2), often approximated as |g_x| + |g_y|
Roberts' cross-gradient: g_x = z_9 - z_5, g_y = z_8 - z_6.
Sobel operators:
G_x = [ -1 -2 -1        G_y = [ -1  0  1
         0  0  0                -2  0  2
         1  2  1 ]              -1  0  1 ]

Combining spatial enhancement methods (3.7/3.8)
Study book Section 3.7 and Figure 3.43 and 1) draw a block diagram that shows the processing steps, 2) identify which operations are linear and which are not, 3) assess the visual quality (sharpness and presence of noise) of the intermediate and final images.

LECTURE #4

Learning goals: After this lecture the student should be able to
know the exact form of the two-dimensional discrete Fourier transform
understand the Fourier transform of a 2-dimensional rectangle
calculate the Fourier transform of a spatial lowpass filter
analyze the translation, rotation and periodicity properties of the 2D Fourier transform
understand image convolution and the convolution theorem
define three families of frequency domain low- and highpass filters
describe the principle of homomorphic filtering
describe how a spatial filter mask can be formed from a frequency response
be familiar with basic building blocks of the 2D Fourier transform

84 6. Fourier transform and frequency domain 83 Discrete two-dimensional Fourier transform converts a digital image to the frequency domain where some filtering operations can be easily implemented. After the processing, the frequency domain image is inverse transformed back to the spatial domain. Frequency domain processing can be used for: enhancement restoration compression content description

F-transform in the continuous case (4.2.4, 4.5.2/4.2.1)
F{f(x)} = F(u) = integral over x of f(x) e^{-j2pi ux} dx
F^{-1}{F(u)} = f(x) = integral over u of F(u) e^{j2pi ux} du
F{f(x, y)} = F(u, v) = double integral of f(x, y) e^{-j2pi(ux + vy)} dx dy
F^{-1}{F(u, v)} = f(x, y) = double integral of F(u, v) e^{j2pi(ux + vy)} du dv

6.2 Discrete Fourier transform, DFT (4.4.1/4.2.1)
For a sequence {f(0), f(1), f(2), ..., f(M-1)} we define the forward and inverse Fourier transform pair:
F(u) = sum_{x=0}^{M-1} f(x) e^{-j2pi ux/M},  u = 0, 1, ..., M-1
f(x) = (1/M) sum_{u=0}^{M-1} F(u) e^{j2pi ux/M},  x = 0, 1, ..., M-1

Frequency domain properties
F(u) is complex: F(u) = R(u) + jI(u) = |F(u)| e^{j phi(u)}
|F(u)| = sqrt(R^2(u) + I^2(u))   Fourier (frequency) spectrum
phi(u) = arctan[I(u)/R(u)]   phase angle, phase spectrum
P(u) = |F(u)|^2 = R^2(u) + I^2(u)   power spectrum

Forming a discrete sample sequence (4.3, 4.4.2/4.2.1)
Sampling is the process where a continuous function is converted to a discrete (one- or two-dimensional) sample sequence. A continuous function f(x) can be discretized to a sequence with a constant sampling rate:
{f(x_0), f(x_0 + dx), f(x_0 + 2 dx), ..., f(x_0 + (M-1) dx)}
The notation can be simplified by using f(x) also for the non-continuous version:
f(x) stands for f(x_0 + x dx),  x = 0, 1, ..., M-1
F(u) stands for F(u du)
The spatial and frequency resolutions are related as: du = 1/(M dx)

Two-dimensional discrete Fourier transform (4.5.5/4.2.2)
In the two-dimensional case:
F(u, v) = sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x, y) e^{-j2pi(ux/M + vy/N)},  u = 0, 1, ..., M-1, v = 0, 1, ..., N-1
f(x, y) = (1/MN) sum_{u=0}^{M-1} sum_{v=0}^{N-1} F(u, v) e^{j2pi(ux/M + vy/N)},  x = 0, 1, ..., M-1, y = 0, 1, ..., N-1
Notice that the transform pair is unsymmetric with respect to the constant coefficients. In a symmetric formulation, both coefficients would be 1/sqrt(MN). In the case of a square image, M = N and each coefficient would equal 1/N.
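A quick check that NumPy's fft2 implements exactly the sum above, with the 1/MN constant on the inverse transform; the tiny random image is just for verification:

```python
import numpy as np

rng = np.random.default_rng(6)
f = rng.random((8, 8))

# Library 2D DFT versus the defining double sum.
F_lib = np.fft.fft2(f)

M, N = f.shape
x = np.arange(M).reshape(-1, 1)
y = np.arange(N).reshape(1, -1)
F_sum = np.empty((M, N), dtype=complex)
for u in range(M):
    for v in range(N):
        F_sum[u, v] = np.sum(f * np.exp(-2j * np.pi * (u * x / M + v * y / N)))

print(np.allclose(F_lib, F_sum))              # True
print(np.allclose(np.fft.ifft2(F_lib), f))    # the inverse transform recovers f
```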

Properties of the 2-dimensional Fourier transform (4.6.5/4.2.2)
F(u, v) is complex: F(u, v) = R(u, v) + jI(u, v) = |F(u, v)| e^{j phi(u,v)}
|F(u, v)| = sqrt(R^2(u, v) + I^2(u, v))   Fourier spectrum
phi(u, v) = arctan[I(u, v)/R(u, v)]   phase angle
P(u, v) = |F(u, v)|^2 = R^2(u, v) + I^2(u, v)   power spectrum
F(0, 0) = sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x, y) = MN times the average of f(x, y)
F(u, v) = F*(-u, -v)   conjugate symmetry
|F(u, v)| = |F(-u, -v)|   even spectrum symmetry
phi(u, v) = -phi(-u, -v)   odd phase symmetry

Fourier transform of a 2-dimensional image, an example (4.6.5/4.2.2)
The origin of the Fourier plane has been translated to the center of the space for visualization purposes. Most of the power spectrum is concentrated at the origin and along the axes. The form of the transform is proportional to (sin bu)/u * (sin cv)/v. The shape of the rectangle has been rotated by 90 degrees.

Fourier transform of a 2-dimensional mask, an example
Let a spatial filter mask be of the form:
h(x, y) = (1/5) ( delta(x, y) + delta(x - 1, y) + delta(x + 1, y) + delta(x, y - 1) + delta(x, y + 1) )
H(u, v) = sum_{x=0}^{M-1} sum_{y=0}^{N-1} h(x, y) e^{-j2pi(ux/M + vy/N)},  u = 0, 1, ..., M-1, v = 0, 1, ..., N-1
        = (1/5) ( 1 + e^{-j2pi u/M} + e^{j2pi u/M} + e^{-j2pi v/N} + e^{j2pi v/N} )
        = (1/5) ( 1 + 2 cos(2pi u/M) + 2 cos(2pi v/N) )
The transfer function is trigonometric, real-valued, positive near the origin, and not bounded to a finite region in (u, v).

Fourier transform of a 2-dimensional mask, an example continued
H(u, v) = (1/5) ( 1 + 2 cos(2pi u/M) + 2 cos(2pi v/N) )
        = (1/5) ( 3 + 2 cos(2pi u/M) )   when v = 0
        = (1/5) ( 1 + 4 cos(2pi u/M) )   when v = u (and M = N)

7. Properties and uses of the 2D Fourier transform

Translation (4.6.2/4.6.1)
f(x, y) e^{j2pi(u_0 x/M + v_0 y/N)}  <->  F(u - u_0, v - v_0)
f(x - x_0, y - y_0)  <->  F(u, v) e^{-j2pi(u x_0/M + v y_0/N)}
Translation in one domain corresponds to a change of phase angle in the other domain. Translation does not affect the Fourier or power spectrum. For visualization purposes the origin of the Fourier space is often translated from the top-left corner to the center, u_0 = M/2, v_0 = N/2:
e^{j2pi(u_0 x/M + v_0 y/N)} = e^{j pi (x+y)} = (-1)^{x+y}
f(x, y)(-1)^{x+y}  <->  F(u - M/2, v - N/2)
(The image (-1)^{x+y} corresponds to the 2D Nyquist frequency.)


Rotation (4.6.2/4.6.1)
In the polar coordinate form x = r cos theta, y = r sin theta, u = omega cos phi, v = omega sin phi, one can show that the two-dimensional Fourier transform satisfies:
f(r, theta + theta_0)  <->  F(omega, phi + theta_0)

Conjugate symmetry (4.6.4/4.6.1)
F(u, v) = F*(-u, -v)
|F(u, v)| = |F(-u, -v)|

Periodicity (4.6.3/4.6.1)
F(u, v) = F(u + M, v) = F(u, v + N) = F(u + M, v + N)
f(x, y) = f(x + M, y) = f(x, y + N) = f(x + M, y + N)
The transform result is always periodic in the direction of one coordinate axis with period M and in the other direction with period N.

Also the inverse transform is periodic, so actually also the input image is implicitly (but often falsely) assumed to be periodic.

Periodicity and padding (4.6.6, 3.4.2/4.6.3)
Because real images are seldom periodic, a wraparound error takes place in the processing results of non-periodic images. The same effect can be seen to happen also in the spatial domain when the filtering mask at the image boundary is partially outside the image. The wraparound error can be avoided by extending the image and the convolving filter with zeros into f_p(x, y) and h_p(x, y), so that their original sizes A x B and C x D are increased to P x Q, where P >= A + C - 1 and Q >= B + D - 1.
f_p(x, y) = f(x, y) for (x, y) in [0, A-1] x [0, B-1], and 0 for A <= x <= P or B <= y <= Q
h_p(x, y) = h(x, y) for (x, y) in [0, C-1] x [0, D-1], and 0 for C <= x <= P or D <= y <= Q

Convolution (4.6.6/4.2.4)
Linear filtering operations can be interpreted as convolutions. Definition of the convolution:
f(x, y) * h(x, y) = (1/MN) sum_{m=0}^{M-1} sum_{n=0}^{N-1} f(m, n) h(x - m, y - n)
Convolution is commutative: f(x, y) * h(x, y) = h(x, y) * f(x, y)
Convolution theorem:
f(x, y) * h(x, y)  <->  F(u, v) H(u, v)
f(x, y) h(x, y)  <->  F(u, v) * H(u, v)

7.1 Frequency domain filtering (4.7.2/4.2.3)
Filtering in the frequency domain is based on the convolution theorem: spatial convolution between the image and the mask corresponds to their product in the frequency domain:
g(x, y) = h(x, y) * f(x, y)  <->  G(u, v) = H(u, v) F(u, v)
Processing steps:
The 2D Fourier transform F(u, v) of the image f(x, y) is calculated
F(u, v) is multiplied with a transfer function H(u, v) to get G(u, v)
An enhanced image g(x, y) is created from G(u, v) with the inverse Fourier transform
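A sketch of the three processing steps with a Gaussian lowpass transfer function as the example H(u, v); the filter shape, D_0 and the test image are arbitrary choices of mine:

```python
import numpy as np

def gaussian_lowpass(f, D0=30.0):
    """Frequency-domain filtering: G = H * F, then inverse transform.

    A Gaussian transfer function is used as the example H(u, v); the general
    pipeline is the same for any filter shape.
    """
    M, N = f.shape
    F = np.fft.fft2(f)
    # Frequency distances D(u, v) from the (unshifted) origin, respecting periodicity.
    u = np.fft.fftfreq(M) * M
    v = np.fft.fftfreq(N) * N
    D2 = u[:, None] ** 2 + v[None, :] ** 2
    H = np.exp(-D2 / (2.0 * D0 ** 2))
    G = H * F
    return np.real(np.fft.ifft2(G))

f = np.random.default_rng(7).random((128, 128))
g = gaussian_lowpass(f, D0=15.0)
print(g.shape)
```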

99 An example of low- and highpass filtering (4.7.2/4.2.3) 98

Lowpass filtering (4.8/4.3)

Ideal lowpass filter (ILPF) (4.8.1/4.3.1)
The ideal lowpass filter has a response equal to one inside a circle of radius D_0 in the frequency domain and zero outside of it. D_0 is the cutoff frequency:
H(u, v) = 1, if D(u, v) <= D_0
H(u, v) = 0, if D(u, v) > D_0
D(u, v) = (u^2 + v^2)^{1/2}
H(u, v) is circularly symmetric around the origin. The blurring caused by a lowpass filter can be examined by inspecting the shape of its inverse Fourier transform, known as the impulse response or point spread function. The impulse response of the ideal lowpass filter has the form of rings.

101 100 Every original pixel spreads and is mixed with surrounding pixels. As a result, the rings present in the point spread function cause undesired ringing in the filtering results. Strong shapes like single pixels and edges create echoes seen as rings that replicate the shape of the object. The radii of the rings in h(x, y) are inversely proportional to the cutoff frequency D 0. Strong filtering with small D 0 causes strong ringing. An example of ideal lowpass filtering (4.8.1/4.3.1) D 0 : P %:

102 101 original D 0 = 5, -8% D 0 = 15, -5.4% D 0 = 30, -3.6% D 0 = 80, -2% D 0 = 230, -0.5%

Butterworth lowpass filter (4.8.2/4.3.2)
One of the most important forms of lowpass filters is the Butterworth filter:
H(u, v) = 1 / ( 1 + [D(u, v)/D_0]^{2n} )
n = order of the filter
D_0 = cutoff frequency
D(u, v) = (u^2 + v^2)^{1/2}
At the cutoff frequency: H(u, v) = 0.5
The Butterworth filter blurs the image less than the ideal lowpass filter because some portion of the high frequencies is preserved. Formation of rings is much weaker than with the ideal filter.

104 103 An example of Butterworth lowpass filtering (4.8.2/4.3.2) original D 0 = 5, -8% D 0 = 15, -5.4% D 0 = 30, -3.6% D 0 = 80, -2% D 0 = 230, -0.5%

Gaussian lowpass filter (4.8.3/4.3.3)
A lowpass filter can also be implemented by using the Gaussian function:
H(u, v) = e^{-D^2(u,v)/(2 D_0^2)}
D_0 = cutoff frequency
D^2(u, v) = u^2 + v^2
At the cutoff frequency: H(u, v) = e^{-1/2}, about 0.607
A special property of the Gaussian filter is that its point spread function is also Gaussian:
h(x, y) = 2 pi D_0^2 e^{-2 pi^2 D_0^2 (x^2 + y^2)}
Consequently, the Gaussian filter cannot cause the ringing effect in the image plane. Comparing H(u, v) and h(x, y) we notice that the role of D_0 is the opposite in the two: a wide frequency response corresponds to a narrow impulse response and vice versa.

106 105 An example of Gaussian lowpass filtering (4.8.3/4.3.3) original D 0 = 5, -8% D 0 = 15, -5.4% D 0 = 30, -3.6% D 0 = 80, -2% D 0 = 230, -0.5%

107 106 Application areas for lowpass filtering (4.8.4/4.3.4) Lowpass filtering can be used as a cosmetic or aesthetic operation to remove noise and some other degradations even though the sharpness of the image is sacrificed. For example, after digitizing text, impurities in character shapes can be reduced by lowpass filtering. If the image acquisition step generates e.g. horizontal lines, their visibility can be reduced with frequency domain lowpass filtering. Lowpass filtering is also needed when doing image resizing or reducing the amount of visual data e.g. in a feature extraction or dimensionality reduction step in image analysis.

Highpass filtering (4.9/4.4)
Highpass filtering can in general be seen as the complement of lowpass filtering:
H_HP(u, v) = 1 - H_LP(u, v)

Ideal highpass filter (4.9.1/4.4.1)
The ideal highpass filter is the complement of the ideal lowpass filter:
H(u, v) = 0, if D(u, v) <= D_0
H(u, v) = 1, if D(u, v) > D_0

Butterworth highpass filter (4.9.2/4.4.2)
Also highpass filtering can be implemented with the Butterworth structure. In that case:
H(u, v) = 1 / ( 1 + [D_0/D(u, v)]^{2n} )

Gaussian highpass filter (4.9.3/4.4.3)
H(u, v) = 1 - e^{-D^2(u,v)/(2 D_0^2)}
Gaussian highpass (and bandpass) filters can be implemented also as the difference of two Gaussian lowpass filters:
H(u, v) = e^{-D^2(u,v)/(2 D_1^2)} - e^{-D^2(u,v)/(2 D_2^2)}

Laplace operator in the frequency domain (4.9.4/4.4.4)
The Fourier transform of the Laplacian can be solved analytically in the continuous case:
lap f(x, y) = d2f/dx2 + d2f/dy2
F{lap f(x, y)} = -4 pi^2 (u^2 + v^2) F(u, v)
One can see from the Fourier transform that the Laplacian is a highpass filter with zero response at the origin of the frequency plane. From the inverse discrete Fourier transform of -4 pi^2 (u^2 + v^2) one gets approximately the familiar Laplace operator mask.

Other forms of highpass filtering (4.9.5/4.4.5)
Pure highpass filtering is needed in image analysis applications where edges and object boundaries are being searched for in order to segment objects out of the background. Images processed for human viewing are seldom purely highpass filtered. Instead, high-frequency emphasis or highboost filtering is typically used to enhance the small details in the images.
H_HP(u, v) = 1 - H_LP(u, v)
H_HB(u, v) = 1 + k [1 - H_LP(u, v)]
This can be written also in a more general form:
H_HFE(u, v) = k_1 + k_2 H_HP(u, v)

Homomorphic filtering (4.9.6/4.5)
Homomorphic filtering means (in the Gonzalez-Woods book) methods that first map some non-linear image degradation process to a linear one, then process the data with linear methods, and finally inverse-map the results back to the non-linear domain. As already presented, an image f(x, y) can be considered to be formed as the product of the illumination component i(x, y) and the reflectance component r(x, y):
f(x, y) = i(x, y) r(x, y)
This non-linear image formation can be linearized by applying the logarithm operator on both sides of the equation:
ln f(x, y) = ln i(x, y) + ln r(x, y)
In image formation it is natural to think that changes in the illumination component i(x, y) are slower than those in the reflectance component r(x, y). Consequently, unwanted variations can be reduced by highboost filtering the linearized image.

The linearized and processed image is returned to the original form by exponentiation. The whole process can be presented with the flow diagram:
f(x, y) -> ln -> DFT -> H(u, v) -> DFT^{-1} -> exp -> g(x, y)
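A homomorphic-filtering sketch following the flow diagram; the particular transfer function and the constants gamma_l and gamma_h are illustrative assumptions, and log1p/expm1 are used only to avoid ln(0):

```python
import numpy as np

def homomorphic(f, D0=30.0, gamma_l=0.5, gamma_h=2.0):
    """Homomorphic filtering sketch: ln -> DFT -> H(u, v) -> inverse DFT -> exp.

    The transfer function boosts high frequencies (reflectance) and attenuates
    low frequencies (illumination); gamma_l and gamma_h are example choices.
    """
    M, N = f.shape
    z = np.log1p(f.astype(float))                  # ln(1 + f) avoids ln(0)
    Z = np.fft.fft2(z)
    u = np.fft.fftfreq(M) * M
    v = np.fft.fftfreq(N) * N
    D2 = u[:, None] ** 2 + v[None, :] ** 2
    H = gamma_l + (gamma_h - gamma_l) * (1.0 - np.exp(-D2 / (2.0 * D0 ** 2)))
    s = np.real(np.fft.ifft2(H * Z))
    return np.expm1(s)                             # exp(s) - 1 undoes log1p

f = np.random.default_rng(8).random((128, 128)) * 255
g = homomorphic(f)
print(g.shape)
```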

8. Implementing the 2D Fourier transform

Separability (4.11.1/4.6.1)
F(u, v) = sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x, y) e^{-j2pi(ux/M + vy/N)}
        = sum_{x=0}^{M-1} e^{-j2pi ux/M} ( sum_{y=0}^{N-1} f(x, y) e^{-j2pi vy/N} )
        = sum_{x=0}^{M-1} e^{-j2pi ux/M} F_y{f(x, y)} = sum_{x=0}^{M-1} F(x, v) e^{-j2pi ux/M}
        = F_x{F_y{f(x, y)}}
The 2D transform can thus be implemented as two consecutive 1D transforms. The order of required operations is changed as O(N^4) -> O(2N^3).

Forming a spatial filter mask from a frequency response (4.11.4/4.6.7)
Small spatial masks can be faster to use than performing the filtering in the frequency domain. Filtering is, however, often more intuitive to define in the frequency domain. Implementing a frequency domain filter H(u, v) in the spatial domain requires in principle a mask h(x, y) of the full M x N image size:
h(x, y) = sum_{u=0}^{M-1} sum_{v=0}^{N-1} H(u, v) e^{j2pi(ux/M + vy/N)}
Let's look for an m x n-sized mask h_hat(x, y) that only approximates H(u, v) with H_hat(u, v):
h_hat(x, y) = sum_{u=0}^{M-1} sum_{v=0}^{N-1} H_hat(u, v) e^{j2pi(ux/M + vy/N)}
H_hat(u, v) = sum_{x=0}^{m-1} sum_{y=0}^{n-1} h_hat(x, y) e^{-j2pi(ux/M + vy/N)}
One can find the optimal solution for H_hat(u, v) in the least squared error sense by solving the vector-matrix equation with a pseudoinverse.

Separating a 2D spatial mask into two 1D filterings
For the sake of computational implementation, it is often useful to separate e.g. a 3x3-sized 2D spatial filter into two consecutive 3x1- and 1x3-sized 1D filterings. In general this is possible if the filter is of the outer product form:

[ ad ae af      [ a
  bd be bf   =    b   [ d e f ]
  cd ce cf ]      c ]

This can be interpreted as the 2D filter h_{3x3} being the result of the convolution between the 1D masks h_{1x3} and h_{3x1}, and then changing the association of operations:
g = h_{3x3} * f = (h_{1x3} * h_{3x1}) * f = h_{1x3} * (h_{3x1} * f)
If the 2D mask is not strictly of the outer product form, it may e.g. be interpreted as a sum of an impulse in the origin and an outer product. By the convolution theorem the Fourier transforms are obtained as products in the frequency domain:
G = H_{3x3} F = (H_{1x3} H_{3x1}) F

Building blocks of Fourier transforms

mask h(x, y)                              Fourier transform H(u, v)
impulse I                                 1
1-shift T_{+1}                            e^{j2pi u/M}
+-1-sum C_u = T_{-1} + T_{+1}             2 cos(2pi u/M)
3-block B_u = I + C_u                     1 + 2 cos(2pi u/M)
5-block                                   1 + 2 cos(2pi u/M) + 2 cos(4pi u/M)
3x3-box B = B_u B_v                       (1 + 2 cos(2pi u/M))(1 + 2 cos(2pi v/N))
8-Laplace B - 9I                          2 cos(2pi u/M) + 2 cos(2pi v/N) + 4 cos(2pi u/M) cos(2pi v/N) - 8

Building blocks of Fourier transforms, part 2

mask h(x, y)                              Fourier transform H(u, v)
1st difference D_{-1} = I - T_{-1}        1 - e^{-j2pi u/M}
1st difference D_{+1} = T_{+1} - I        e^{j2pi u/M} - 1
+-1-difference S_u = D_{+1} + D_{-1}      2j sin(2pi u/M)
Sobel mask G_u = S_u (2I + C_v)           4j sin(2pi u/M) (1 + cos(2pi v/N))
2nd difference L_u = D_{+1} D_{-1}        2 cos(2pi u/M) - 2
4-Laplace L_u + L_v                       2 cos(2pi u/M) + 2 cos(2pi v/N) - 4

h(x, y) = the 3x3 box mask with all coefficients equal to 1/9
H(u, v) = (1/9) ( 1 + e^{-j2pi u/M} + e^{j2pi u/M} + e^{-j2pi v/N} + e^{j2pi v/N}
          + e^{-j2pi u/M} e^{-j2pi v/N} + e^{-j2pi u/M} e^{j2pi v/N}
          + e^{j2pi u/M} e^{-j2pi v/N} + e^{j2pi u/M} e^{j2pi v/N} )
        = (1/9) ( 1 + 2 cos(2pi u/M) + 2 cos(2pi v/N) + 4 cos(2pi u/M) cos(2pi v/N) )
        = (1/9) ( 1 + 2 cos(2pi u/M) ) ( 1 + 2 cos(2pi v/N) )

LECTURE #5

Learning goals: After this lecture the student should be able to
understand the general image degradation process
name some noise models
recognize the effects of periodic noise in an image and to remove them
estimate parameters of the most common noise types
recognize some non-linear spatial averaging methods
use order-statistics filters and their adaptive variants
identify the basic properties and uses of bandreject, bandpass and notch filters
tell what optimum notch filtering means
explain the linear position invariant degradation model

120 Image restoration With image restoration one tries to improve the image quality by modelling and compensating the degradation process that has caused error in the image. In principle it can be possible to restore the original image quality. Restoration methods can operate both in image and frequency domains. There is always some mathematical optimization criterion. Restoration methods are often computationally heavy. The examples in the book start with a readily digitized image. One could also improve image acquisition, digitization and sampling. The main emphasis in the book is on additive noise. Restoration techniques do exist for more difficult noise types too.

Degradation model (5.1)
Let us assume the following degradation/restoration model:
f(x, y) -> [degradation process H] -> (+ noise source eta(x, y)) -> g(x, y) -> [restoration process R] -> f_hat(x, y)
where f(x, y) is the original image, g(x, y) the degraded image and f_hat(x, y) the restored image.
The process H is assumed to be linear and position invariant; the noise eta is assumed to be uncorrelated and additive. Therefore the process H can be interpreted as a convolution, and the convolution theorem leads to:
g(x, y) = h(x, y) * f(x, y) + eta(x, y)
G(u, v) = H(u, v) F(u, v) + N(u, v)
We will first assume that H = 1 and study only the effects of additive noise. Later we will study the degradation model in full.

9.2 Noise models (5.2.2)
Gaussian distribution: p(z) = (1/(sqrt(2 pi) sigma)) e^{-(z - mu)^2/(2 sigma^2)}
Rayleigh distribution: p(z) = (2/b)(z - a) e^{-(z - a)^2/b} for z >= a, and 0 for z < a
Erlang distribution: p(z) = (a^b z^{b-1}/(b-1)!) e^{-az} for z >= 0, and 0 for z < 0
Exponential distribution: p(z) = a e^{-az} for z >= 0, and 0 for z < 0
Uniform distribution: p(z) = 1/(b - a) for a <= z <= b, and 0 otherwise
Impulse noise: p(z) = P_a for z = a, P_b for z = b, and 0 otherwise

Examples of different noises (5.2.2)
One can see that only the impulse or salt-and-pepper noise is visually different from the others, and the exponentially distributed noise here seems to lead to a somewhat darker result image.

124 123 Periodic noise (5.2.3) In image acquisition or when images are transferred over analog transmission channels, periodic noise can appear due to interferences. Periodic noise is easiest to detect and remove in the frequency domain. In the example image, four sinusoidal interference patterns have been added and they appear as eight conjugate symmetric bright spots in the Fourier spectrum. We will later study how this kind of noise is removed.

125 124 Noise parameter estimation (5.2.4) Sometimes one can know the noise model and its parameters in advance, for example from the specifications of the image acquisition equipment. The noise properties can also be studied empirically by studying the images. The noise properties are most easily analyzed in small patches of constant background intensity. The observed intensity histogram of noise can be fit into the noise model by solving the noise parameters using eg. the maximum likelihood method or the method of moments.

Restoration in the spatial domain (5.3)
If the degradation is caused only by additive (non-periodic) noise, the restoration is simplest in the spatial domain. If the noise is periodic or the degradation model contains a true degradation process, then the restoration is easiest in the frequency domain. Noise can be removed in the image domain by averaging. In addition to the arithmetic (i.e. linear) average, there exist also other variants:
arithmetic mean: f_hat(x, y) = (1/mn) sum_{(s,t) in S_xy} g(s, t)
geometric mean: f_hat(x, y) = [ product_{(s,t) in S_xy} g(s, t) ]^{1/mn}
harmonic mean: f_hat(x, y) = mn / sum_{(s,t) in S_xy} 1/g(s, t)
contraharmonic mean: f_hat(x, y) = sum_{(s,t) in S_xy} g(s, t)^{Q+1} / sum_{(s,t) in S_xy} g(s, t)^{Q}
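Some of the means above can be written with windowed averages; a sketch assuming SciPy is available, with an invented noisy image, window size and value of Q:

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(9)
g = np.clip(rng.normal(128, 20, (64, 64)), 1, 255)   # hypothetical noisy image, > 0 everywhere

# Arithmetic mean over a 3x3 window.
f_arith = uniform_filter(g, size=3)

# Geometric mean: exponential of the windowed mean of logarithms.
f_geom = np.exp(uniform_filter(np.log(g), size=3))

# Contraharmonic mean of order Q (Q > 0 removes pepper noise, Q < 0 removes salt noise).
Q = 1.5
f_contra = uniform_filter(g ** (Q + 1), size=3) / uniform_filter(g ** Q, size=3)

print(f_arith.mean(), f_geom.mean(), f_contra.mean())
```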

127 Examples of averaging (5.3.1) original Gaussian noise pepper noise salt noise arithmetic mean geometric mean Q = 1.5 Q =

Spatial restorations based on order statistics (5.3.2)
The most important family of non-linear noise reduction methods is the order-statistics filters.
median filter: f_hat(x, y) = median over (s,t) in S_xy of g(s, t)
max filter: f_hat(x, y) = max over (s,t) in S_xy of g(s, t)
min filter: f_hat(x, y) = min over (s,t) in S_xy of g(s, t)
midpoint filter: f_hat(x, y) = (1/2) [ max over S_xy of g(s, t) + min over S_xy of g(s, t) ]
alpha-trimmed mean filter: f_hat(x, y) = (1/(mn - d)) sum_{(s,t) in S_xy} g_r(s, t)

129 Examples of order-statistics in restoration (5.3.2) salt and pepper median filtering 2nd median filtering 3rd median filtering 128

Adaptive filtering (5.3.3)
Noise can be effectively reduced with adaptive filtering that takes into account the local properties of the image. That will prevent blurring of edges while offering good noise reduction in flat areas.

Local noise removal filter (5.3.3)
One can tune the linear average filter so that the resulting intensity is between the original intensity value and its local neighborhood mean value. The tuning is based on knowing the noise variance in the whole image and measuring the local intensity variance. In flat areas the local average is used, whereas near edges the original values are retained.
f_hat(x, y) = (1 - sigma_eta^2/sigma_L^2) g(x, y) + (sigma_eta^2/sigma_L^2) m_L = g(x, y) - (sigma_eta^2/sigma_L^2) [g(x, y) - m_L]

131 An example of local noise removal filter (5.3.3) Gaussian noise arithmetic mean geometric mean adaptive noise removal 130

132 131 Adaptive median filtering (5.3.3) One can modify the size of the median filtering mask on the basis of the properties of the image under the mask. One increases the mask size if z med = z min or z med = z max, so that z med is free from the impulse noise. If finally z xy = z min or z xy = z max, then z med is used as the output, otherwise z xy. impulse noise 7 7 median adaptive, S max = 7.

Removal of periodic noise (5.4)
As stated earlier, interferences in image acquisition can cause periodic noise in images. If the noise peaks in the frequency domain reside at the same distance from the origin, the noise can be filtered out by using a bandreject filter in the frequency domain.
ideal: H_BR(u, v) = 1 if D(u, v) < D_0 - W/2; 0 if D_0 - W/2 <= D(u, v) <= D_0 + W/2; 1 if D(u, v) > D_0 + W/2
Butterworth: H_BR(u, v) = 1 / ( 1 + [ D(u, v) W / (D^2(u, v) - D_0^2) ]^{2n} )
Gaussian: H_BR(u, v) = 1 - e^{-(1/2) [ (D^2(u, v) - D_0^2) / (D(u, v) W) ]^2}

134 Example of periodic noise removal (5.4) periodic noise added frequency spectrum bandreject filter filtering result 133

135 Bandpass filtering (5.4.2) A bandpass filter can be obtained from the corresponding bandreject filter: $H_{BP}(u,v) = 1 - H_{BR}(u,v)$ Bandpass filtering is mostly needed for modelling and analysis of periodic noise. Notch filters (5.4.3) Ideal, Butterworth and Gaussian lowpass and highpass filters can be modified to corresponding notch pass or notch reject filters by moving the origin of the filter in the frequency domain. Symmetry about the origin has to be preserved. Notch pass and notch reject filters are related as: $H_{NP}(u,v) = 1 - H_{NR}(u,v)$

136 135 An example of notch filtering (5.4.3) frequency spectrum notch pass filter passed noise noise rejected

137 Optimum notch filtering (5.4.4) Interferences are seldom so regular that they could be filtered with a single linear operation from the whole image area. One can use a model with a weighting function w(x, y) that controls how much noise is subtracted at each pixel: $\hat{f}(x,y) = g(x,y) - w(x,y)\,\eta(x,y)$ where $\eta(x,y)$ is the interference noise image extracted with notch pass filters. w(x, y) is chosen point-wise so that in the local neighborhood of each point (x, y) the variance of $\hat{f}(x,y)$ is minimized: $w(x,y) = \frac{\overline{g(x,y)\,\eta(x,y)} - \overline{g}(x,y)\,\overline{\eta}(x,y)}{\overline{\eta^2}(x,y) - \overline{\eta}^2(x,y)}$ (Here $\overline{h}(x,y)$ denotes the average value of h(x, y) in the local neighborhood.)

138 137 An example of optimum notch filtering (5.4.4) (This seems like a real image, not a simulated one as before!) original image g(x, y) frequency spectrum result ˆf(x, y)

139 Linear position invariant degradation process (5.5) Let's study the degradation model more closely. $g(x,y) = H[f(x,y)] + \eta(x,y)$ We'll first assume that there is no noise $\eta(x,y)$. If H[ ] is linear, then $H[a f_1(x,y) + b f_2(x,y)] = a H[f_1(x,y)] + b H[f_2(x,y)]$ A linear operator is additive: $H[f_1(x,y) + f_2(x,y)] = H[f_1(x,y)] + H[f_2(x,y)]$ and homogeneous: $H[a f_1(x,y)] = a H[f_1(x,y)]$

140 An operator H for which H[f(x, y)] = g(x, y) is position invariant, iff $H[f(x-\alpha, y-\beta)] = g(x-\alpha, y-\beta)$ for all images f(x, y) and translations $(\alpha, \beta)$. In that case the response for an arbitrary image point depends only on the values of the image points, not on their location. Continuous-valued f(x, y) can be presented as an integral of the impulse function (aka Dirac delta function): $f(x,y) = \int\!\!\int f(\alpha,\beta)\,\delta(x-\alpha, y-\beta)\, d\alpha\, d\beta$ Consequently in the noiseless case: $g(x,y) = H[f(x,y)] = H\Big[ \int\!\!\int f(\alpha,\beta)\,\delta(x-\alpha, y-\beta)\, d\alpha\, d\beta \Big] = \int\!\!\int H[f(\alpha,\beta)\,\delta(x-\alpha, y-\beta)]\, d\alpha\, d\beta = \int\!\!\int f(\alpha,\beta)\, H[\delta(x-\alpha, y-\beta)]\, d\alpha\, d\beta$

141 By denoting the impulse response or point spread function as $h(x,\alpha,y,\beta) = H[\delta(x-\alpha, y-\beta)]$: $g(x,y) = \int\!\!\int f(\alpha,\beta)\, h(x,\alpha,y,\beta)\, d\alpha\, d\beta = \int\!\!\int f(\alpha,\beta)\, h(x-\alpha, y-\beta)\, d\alpha\, d\beta$ The first integral is the superposition or Fredholm integral. Because H is position invariant, $H[\delta(x-\alpha, y-\beta)] = h(x-\alpha, y-\beta)$ leads to the latter convolution integral. Additive noise $\eta(x,y)$ can be added: $g(x,y) = H[f(x,y)] + \eta(x,y) = \int\!\!\int f(\alpha,\beta)\, h(x-\alpha, y-\beta)\, d\alpha\, d\beta + \eta(x,y) = h(x,y) * f(x,y) + \eta(x,y)$ and in the frequency domain $G(u,v) = H(u,v)\,F(u,v) + N(u,v)$

142 LECTURE #6 141 Learning goals: After this lecture the student should be able to recognize methods for estimating degradation functions analyze the properties and problems of inverse filtering use Wiener filtering for image restoration understand the principle of constrained least squares filtering understand the principle of vector-matrix notation in image processing

143 Estimating the degradation function (5.6) For performing the deconvolution of g(x, y) the degradation function H[ ] can be estimated by observation, experimentation or modeling. Estimating degradation by observation (5.6.1) One takes from an image g(x, y) a piece g_s(x, y) where the signal is strong compared to the noise and models the ideal shape of the piece with $\hat{f}_s(x,y)$. Then: $H_s(u,v) = \frac{G_s(u,v)}{\hat{F}_s(u,v)}$ $H_s(u,v)$ is then used as an estimate for H(u, v).

144 Estimating degradation by experimentation (5.6.2) One picks in the image as small as possible a bright point of strength A, around which the point spread function is seen directly in g(x, y), and then: $H(u,v) = \frac{G(u,v)}{A}$ Estimating degradation by modeling (5.6.3) For example the blur caused by atmospheric turbulence can be modeled with a physical model: $H(u,v) = e^{-k(u^2+v^2)^{5/6}}$ (The figure shows results for different values of the parameter k.)

145 Degradation by uniform linear motion (5.6.3) One can model the case where the camera moves or shakes during the exposure if the time-dependent motion components x_0(t) and y_0(t) and the exposure time T are known: $g(x,y) = \int_0^T f[x - x_0(t),\, y - y_0(t)]\, dt$ Its Fourier transform is: $G(u,v) = \int\!\!\int g(x,y)\, e^{-j2\pi(ux+vy)}\, dx\, dy = \int\!\!\int \Big[ \int_0^T f[x - x_0(t),\, y - y_0(t)]\, dt \Big] e^{-j2\pi(ux+vy)}\, dx\, dy = \int_0^T \Big[ \int\!\!\int f[x - x_0(t),\, y - y_0(t)]\, e^{-j2\pi(ux+vy)}\, dx\, dy \Big] dt$

146 And further: $G(u,v) = \int_0^T F(u,v)\, e^{-j2\pi[u x_0(t) + v y_0(t)]}\, dt = F(u,v) \int_0^T e^{-j2\pi[u x_0(t) + v y_0(t)]}\, dt$ When the integral is denoted $H(u,v) = \int_0^T e^{-j2\pi[u x_0(t) + v y_0(t)]}\, dt$ we get $G(u,v) = H(u,v)\,F(u,v)$ Thus, if the motion functions x_0(t) and y_0(t) are known, the transfer function H(u, v) can be solved.

147 An example of degradation by uniform linear motion (5.6.3) Uniform linear motion in only the x-direction: $x_0(t) = at/T$, $y_0(t) = 0$. $H(u,v) = \int_0^T e^{-j2\pi u a t / T}\, dt = \frac{T}{\pi u a} \sin(\pi u a)\, e^{-j\pi u a}$ We can see that H(u, v) vanishes when u = n/a, where n is an integer value.
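A small numpy sketch of this transfer function for motion in both directions; it assumes plain (unshifted) FFT frequency indices, and the parameter names a, b, T follow the formulas above:

import numpy as np

def motion_blur_H(M, N, a=0.1, b=0.0, T=1.0):
    # Transfer function for uniform linear motion x0(t) = a t / T,
    # y0(t) = b t / T over exposure time T.
    u = np.fft.fftfreq(M) * M        # integer frequency indices (wrapped)
    v = np.fft.fftfreq(N) * N
    V, U = np.meshgrid(v, u)
    s = np.pi * (U * a + V * b)
    s[s == 0] = 1e-12                # sinc limit: H -> T as s -> 0
    return T * np.sin(s) / s * np.exp(-1j * s)

# Blurring an image f: G = motion_blur_H(*f.shape) * np.fft.fft2(f),
# g = np.real(np.fft.ifft2(G)).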

148 Inverse filtering (5.7) Inverse filtering is a typical example of unconstrained image restoration. In the Fourier domain one simply divides the image by the distortion function: $\hat{F}(u,v) = \frac{G(u,v)}{H(u,v)}$ By inserting the definition of G(u, v) one gets: $\hat{F}(u,v) = \frac{G(u,v)}{H(u,v)} = \frac{H(u,v)F(u,v) + N(u,v)}{H(u,v)} = F(u,v) + \frac{N(u,v)}{H(u,v)}$ Near the zeroes of H(u, v) the noise is amplified considerably and the N(u,v)/H(u,v) term dominates the restoration result. In practice H(u, v) attenuates faster than N(u, v) when the distance from the (u, v) origin is increased. Reasonable filtering results can therefore be obtained only for low frequencies.

149 An example of inverse filtering (5.7) whole H r = 40 r = 70 r =

150 9.9 Wiener filtering (5.8) Wiener filtering is a typical example of constrained image restoration. It tries to minimize the expected squared error $e^2 = E[(f - \hat{f})^2]$ between the true f and its estimate $\hat{f}$. Wiener filtering in the Fourier plane is: $\hat{F}(u,v) = \frac{H^*(u,v)}{|H(u,v)|^2 + \gamma S_\eta(u,v)/S_f(u,v)}\, G(u,v) = \frac{1}{H(u,v)}\, \frac{|H(u,v)|^2}{|H(u,v)|^2 + \gamma S_\eta(u,v)/S_f(u,v)}\, G(u,v)$ $S_\eta(u,v)$ and $S_f(u,v)$ are the power spectra of the noise $\eta$ and the image f. The coefficient $\gamma$ is a parameter that controls the restoration filtering result. When $\gamma = 0$ Wiener filtering reduces to inverse filtering. When $\gamma = 1$ Wiener filtering is optimal in the sense of $e^2$. If $S_f(u,v)$ and $S_\eta(u,v)$ cannot be known accurately, one can replace the noise term with a constant K: $\hat{F}(u,v) = \frac{1}{H(u,v)}\, \frac{|H(u,v)|^2}{|H(u,v)|^2 + K}\, G(u,v)$
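A minimal numpy sketch of the constant-K variant (the function name and the default K are illustrative; K = 0 reduces to the inverse filter):

import numpy as np

def wiener_restore(g, H, K=0.01):
    # H*(u,v) / (|H(u,v)|^2 + K) applied to the spectrum of g.
    G = np.fft.fft2(g)
    H = np.asarray(H, dtype=complex)
    F_hat = (np.conj(H) / (np.abs(H) ** 2 + K)) * G
    return np.real(np.fft.ifft2(F_hat))

# Usage: restore a blurred, noisy image with the H from the motion-blur
# sketch above; K is tuned by hand or from a noise estimate.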

151 150 An example of Wiener filtering (5.7) inverse filtering restricted inverse Wiener filtering

152 151 Another example of Wiener filtering (5.7) noisy image inverse filtering Wiener filtering

153 9.10 Constrained least squares filtering (5.9) Wiener filtering is based on the autocorrelation matrix of f and is optimal, but only in the sense of the expected value. Constrained least squares restoration is optimal for a given image when we assume the noise average and variance are known. The smoothness of the restoration result is optimized, because additive noise makes the image uneven and grainy. Image smoothness is modeled with the negative of the squared value of the second derivative, the discrete Laplacian, of the image. We denote: $\min C = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \big[ \nabla^2 \hat{f}(x,y) \big]^2$ and p(x, y) is the Laplacian mask $\begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$ The optimal value is then reached when: $\hat{F}(u,v) = \frac{H^*(u,v)}{|H(u,v)|^2 + \gamma |P(u,v)|^2}\, G(u,v)$

154 $\gamma$ is a Lagrange multiplier selected mathematically to satisfy the constraint $\| \mathbf{g} - \mathbf{H}\mathbf{f} \|^2 = \| \boldsymbol{\eta} \|^2$ or visually by iteration. Above, $\| \cdot \|^2$ is the squared Euclidean norm. $\mathbf{g}$, $\mathbf{f}$ and $\boldsymbol{\eta}$ are $MN \times 1$-sized image vectors, and $\mathbf{H}$ is an $MN \times MN$-sized degradation matrix. Following our previous notations we can now write also: $G(u,v) = H(u,v)\,F(u,v) + N(u,v)$, $g(x,y) = h(x,y) * f(x,y) + \eta(x,y)$, $\mathbf{g} = \mathbf{H}\mathbf{f} + \boldsymbol{\eta}$

155 Summary of frequency domain restoration (5.7-9) In unconstrained restoration no other entities but g and H are assumed to be known. One estimates $\hat{f}$ so that the expected variance of the noise $\eta$ is minimized. In constrained restoration one not only minimizes the noise term but also some other known error criterion defined by a linear operator. Different selections for the error criterion lead to different restorations compatible with the linear position invariant degradation model. All restorations were of the form: $\hat{F}(u,v) = \frac{H^*(u,v)}{|H(u,v)|^2 + X(u,v)}\, G(u,v)$ In inverse filtering X(u, v) = 0; in Wiener filtering $X(u,v) = \gamma S_\eta(u,v)/S_f(u,v)$ or K; in least squares restoration $X(u,v) = \gamma |P(u,v)|^2$.

156 9.12 Vector matrix notation in image processing All linear image processing operations can also be presented as vector matrix operations: $g(x,y) = h(x,y) * f(x,y) \;\Leftrightarrow\; \mathbf{g} = \mathbf{H}\mathbf{f}$ If f and g are sized $M \times N$ pixels, then $\mathbf{f}$ and $\mathbf{g}$ are $MN \times 1$-sized column vectors. The linear operation $\mathbf{H}$ is an $MN \times MN$-sized matrix. Assuming the pixels of f, $f = \begin{bmatrix} a & b & c & \dots \\ d & e & f & \dots \\ g & h & i & \dots \end{bmatrix}$, are stacked column-wise in $\mathbf{f}$: $\mathbf{f} = (a\; d\; g\; \dots\; b\; e\; h\; \dots\; c\; f\; i\; \dots)^T$ Exercise: study what $\mathbf{H}$ looks like for the Laplacian.

157 LECTURE #7 Learning goals: After this lecture the student should be able to use morphological operations like dilation, opening and thinning understand the duality of morphological operations know what Golay alphabets are name some application areas of digital image morphology understand morphological thinning and thickening

158 10. Morphology Logical processing of binary images, with generalizations to intensity images; the image is processed as a point set. Application areas of morphology: pre-processing (noise removal), enhancement of object shapes, qualitative description of objects. Morphological operations: erosion & dilation, opening & closing, hit-or-miss, thinning & thickening.

159 Definitions and operations (9.1/9.1.1) The universal set in principle is the Euclidean 2-dimensional space; in practice it is the discrete point set $Z^2$. Origin / reference point. Belongs in, subset, superset, intersection, union, empty set, complement $(\cdot)^c$, subtraction $A - B = A \cap B^c$. Symmetric set or transpose or reflection or mirroring: $\hat{B} = \{ w \mid w = -b, \text{ when } b \in B \}$ Translation or shift: $(B)_z = \{ c \mid c = b + z, \text{ when } b \in B \}$

160 10.2 Erosion (9.2.1/9.2.2) makes the image smaller, removes details. A is the set to be eroded, B is the erosion structuring element. $A \ominus B = \{ z \mid (B)_z \subseteq A \} = \{ z \mid (B)_z \cap A^c = \emptyset \} = \{ z \mid z + b \in A, \text{ for all } b \in B \} = \bigcap_{b \in B} (A)_{-b}$

161 10.3 Dilation (9.2.2/9.2.1) makes the image larger, fills in gaps. $A \oplus B = \{ z \mid (\hat{B})_z \cap A \neq \emptyset \} = \{ z \mid z = a + b, \text{ for some } a \in A \text{ and } b \in B \} = \bigcup_{b \in B} (A)_b$ comparable to convolution; in both the structuring element is reflected.
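A small numpy sketch of erosion and dilation for boolean images, following the set definitions above; padding the border with background and centering the structuring element are implementation choices of the sketch:

import numpy as np

def erode(A, B):
    # z belongs to the result if B, translated to z, fits entirely inside A.
    m, n = B.shape
    pm, pn = m // 2, n // 2
    Ap = np.pad(A, ((pm, pm), (pn, pn)), mode='constant')
    out = np.zeros_like(A, dtype=bool)
    for x in range(A.shape[0]):
        for y in range(A.shape[1]):
            out[x, y] = np.all(Ap[x:x + m, y:y + n][B])
    return out

def dilate(A, B):
    # Dilation via duality: (A + B) = (A^c eroded by reflected B)^c.
    return ~erode(~A, B[::-1, ::-1])

# A 3x3 cross-shaped structuring element:
B = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=bool)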

162 An example of dilation (9.2.2/9.2.1) 161

163 Duality of erosion and dilation (9.2.3/9.2.2) Dilation and erosion are actually the same operation, only the roles of the object and the background are interchanged: $(A \ominus B)^c = \{ z \mid (B)_z \subseteq A \}^c = \{ z \mid (B)_z \cap A^c = \emptyset \}^c = \{ z \mid (B)_z \cap A^c \neq \emptyset \} = A^c \oplus \hat{B}$, Q.E.D. Similarly: $(A \oplus B)^c = A^c \ominus \hat{B}$

164 10.4 Opening and closing (9.3) combinations of dilation and erosion; removal of separate points can be done with opening, filling in holes and gaps can be done with closing. Opening: $A \circ B = (A \ominus B) \oplus B \subseteq A$ Closing: $A \bullet B = (A \oplus B) \ominus B \supseteq A$ $A \circ B = \bigcup \{ (B)_z \mid (B)_z \subseteq A \}$ The operations are duals: $(A \bullet B)^c = A^c \circ \hat{B}$ Opening and closing are idempotent: $A \circ B = (A \circ B) \circ B$, $A \bullet B = (A \bullet B) \bullet B$ It can be said that A is open / closed with respect to B.

165 164 An example of opening (9.3) original after erosion after dilation

166 An example of successive opening and closing (9.3) 165

167 Hit-or-miss (9.4) a combined structuring element is an ordered pair $B = (B_1, B_2)$ $A \circledast B = (A \ominus B_1) \cap (A^c \ominus B_2) = (A \ominus B_1) - (A \oplus \hat{B}_2)$ 10.6 Boundary extraction (9.5.1) Boundary pixels of object A (with respect to some structuring element B) can be extracted: $\beta(A) = A - (A \ominus B)$

168 10.7 Filling in regions (9.5.2) A hollow object A can be filled (with respect to some structuring element B) starting from a seed point $p \in A^c$ inside it: $X_0 = \{p\}$, $X_k = (X_{k-1} \oplus B) \cap A^c$, k = 1, 2, 3, ... The iteration ends when $X_k = X_{k-1}$; the filled region is $X_k \cup A$.

169 10.8 Extraction of connected components (9.5.3) Connected components $Y_i$ of region A (with respect to a structuring element B) can be extracted starting from a seed point inside the component, $p_i \in Y_i$: $X_0 = \{p_i\}$, $X_k = (X_{k-1} \oplus B) \cap A$, k = 1, 2, 3, ... The iteration ends when $X_k = X_{k-1}$.

170 10.9 Thinning (9.5.5) Thinning of an object A by one side by the hit-or-miss operation with a combined structuring element B: $A \otimes B = A - (A \circledast B) = A \cap (A \circledast B)^c$ Thinning by all sides of the object can be implemented with a series of structuring elements or a Golay alphabet $\{B\} = \{B_1, B_2, \dots, B_n\}$: $A \otimes \{B\} = ((\dots((A \otimes B_1) \otimes B_2)\dots) \otimes B_n)$

171 Thickening (9.5.6) Thickening of an object A by one side by the hit-or-miss operation with a combined structuring element B: $A \odot B = A \cup (A \circledast B)$ Thickening by all sides of the object can be implemented with a Golay alphabet $\{B\}$: $A \odot \{B\} = ((\dots((A \odot B_1) \odot B_2)\dots) \odot B_n)$ Thinning and thickening are duals: $(A \otimes B)^c = A^c \odot B'$, where $B' = (B_2, B_1)$

172 Golay alphabets ( ) thinning with L element (4-neighbors) L (1) = , L (2) = thinning with E element (4-neighbors) E (1) = , E (2) = thinning with M element (4-neighbors) M (1) = , M (2) = ,,, thinning with D and thickening with D element (4-neighbors) D (1) = , D (2) = thickening with C element (4-neighbors) C (1) = , C (2) = ,,

173 An example of thinning (9.5.5) 172

174 173 An example of thickening (9.5.6) Thickening can be implemented as the complement of the thinning of the complement:

175 LECTURE #8 Learning goals: After this lecture the student should be able to understand the creation and use of image pyramids be familiar with the subband coding technique understand basic principles of multiresolution image processing know Haar's scaling and wavelet functions use one- and two-dimensional discrete wavelet transforms

176 11. Wavelets and multiresolution processing 175 Wavelets are sometimes a better method for image analysis than the Fourier transform. While the Fourier transform is a global method for the whole image, wavelets act more locally and reveal the location of structures that can be found in the image. One application area of wavelets is multiresolution processing of images, where one image is studied in varying sizes Image pyramids and subband coding (7.1) In the following examples this image will be used. In the different areas of the image there are very different intensity distributions and frequency contents.

177 Image pyramids (7.1.1) Image pyramids are a classical approach to multiresolution processing. The original image is typically lowpass filtered and downsampled (decimated) so that four original pixels are used to create one new pixel in the lower resolution image. An $N^2$-sized ($N = 2^J$) original image leads to an image pyramid with P + 1 levels containing the total number of pixels: $\sum_{j=0}^{P} \frac{N^2}{4^j} \le \frac{4}{3} N^2$

178 Prediction residual pyramid (upsampling, interpolation, difference, Laplacian) 177 An example of an image pyramid (7.1.1) Approximation pyramid (Gaussian lowpass, downsampling)

179 Subband coding (7.1.2) In subband coding a signal is decomposed into two subbands, one of which contains the low-frequency and the other the high-frequency components. The one-dimensional discrete input signal is x(n), and the analysis filters have impulse responses h_0(n) (low frequencies) and h_1(n) (high frequencies). Both filtering results are downsampled so that only one sample out of every two is retained, leading to signals y_0(n) and y_1(n). One tries to reconstruct the original signal by upsampling y_0(n) and y_1(n) by a factor of two, filtering the sequences with the impulse responses g_0(n) and g_1(n) of the synthesis filters, and summing the results to $\hat{x}(n)$.

180 Subband coding and the Z-transform ( /7.1.2) The subband coding process can be understood from the point of view of the Z-transform $X(z) = \sum_{n=-\infty}^{\infty} x(n)\, z^{-n}$ Then downsampling $x_{down}(n) = x(2n)$ gives $X_{down}(z) = \frac{1}{2}\big[ X(z^{1/2}) + X(-z^{1/2}) \big]$ and upsampling $x_{up}(n) = x(n/2)$ for n = 0, 2, 4, ... and 0 otherwise gives $X_{up}(z) = X(z^2)$ Downsampling of x(n) followed by upsampling leads to $\hat{X}(z) = \frac{1}{2}\big[ X(z) + X(-z) \big]$ The whole analysis synthesis chain is then: $\hat{X}(z) = \frac{1}{2} G_0(z) \big[ H_0(z)X(z) + H_0(-z)X(-z) \big] + \frac{1}{2} G_1(z) \big[ H_1(z)X(z) + H_1(-z)X(-z) \big]$

181 In order to get $\hat{X}(z) = X(z)$, the following has to be true: $H_0(-z)G_0(z) + H_1(-z)G_1(z) = 0$ and $H_0(z)G_0(z) + H_1(z)G_1(z) = 2$ The synthesis filters g_0(n) and g_1(n) can then be solved when the analysis filters h_0(n) and h_1(n) are known (or vice versa): $g_0(n) = (-1)^n h_1(n)$, $g_1(n) = (-1)^{n+1} h_0(n)$ or $g_0(n) = (-1)^{n+1} h_1(n)$, $g_1(n) = (-1)^n h_0(n)$ If the filters are additionally orthonormal and of length 2K, then: $g_1(n) = (-1)^n g_0(2K - 1 - n)$, $h_i(n) = g_i(2K - 1 - n)$, i = 0, 1

182 A numerical example of subband coding (7.1.2) Let the lowpass analysis filter be $h_0 = [\,1/2 \;\; 1/2\,]$ and the highpass analysis filter $h_1 = [\,1 \;\; -1\,]$, thus $y_0(n) = (x(n) + x(n-1))/2$ and $y_1(n) = x(n) - x(n-1)$. According to the upper equations of the previous slide, the lowpass synthesis filter is $g_0 = [\,(-1)^0 \cdot 1 \;\;\; (-1)^1 \cdot (-1)\,] = [\,1 \;\; 1\,]$ and the highpass synthesis filter is $g_1 = [\,(-1)^1 \cdot \tfrac{1}{2} \;\;\; (-1)^2 \cdot \tfrac{1}{2}\,] = [\,-1/2 \;\; 1/2\,]$ (The slide's table traces x(n), the downsampled y_0(n) and y_1(n), their upsampled and synthesis-filtered versions, and the reconstruction $\hat{x}(n)$.)

183 182 An example of real-world analysis synthesis filters (7.1.2) Daubechies orthonormal filters of length 8:

184 183 Two-dimensional subband coding (7.1.2) The original discrete image x(m, n) can be similarly decomposed first by rows and then by columns to one 2-D approximation subband a(m, n) and one vertical, one horizontal and one diagonal detail subbands d V (m, n), d H (m, n) and d D (m, n):

185 Multiresolution expansions (7.2) In multiresolution analysis the input signal is decomposed into approximation and detail parts by using a scaling function and the corresponding wavelet function. Unlike with image pyramids, the total number of pixels does not increase. Multiresolution analysis can also be interpreted as recursive subband coding. Series expansions (7.2.1) Let's assume in the continuous valued case that a function f(x) can be expressed as a linear combination of expansion functions $\{\varphi_k(x)\}$: $f(x) = \sum_k \alpha_k \varphi_k(x)$ The closed function space spanned by the expansion set is denoted: $V = \overline{\operatorname{Span}_k \{\varphi_k(x)\}}$

186 Scaling functions (7.2.2) Let's assume we have a real, square-integrable function $\varphi(x)$, whose integer translates k and power-of-two scalings j are used as the set of expansion functions $\{\varphi_{j,k}(x)\}$: $\varphi_{j,k}(x) = 2^{j/2}\, \varphi(2^j x - k)$ $\varphi(x)$ is called a scaling function, because the shape of $\varphi_{j,k}(x)$ is scaled by varying j. By a proper selection of $\varphi(x)$ one can span the whole function space $L^2(\mathbb{R})$. With a specific value of j, the spanned subspace of $L^2(\mathbb{R})$ is: $V_j = \overline{\operatorname{Span}_k \{\varphi_{j,k}(x)\}}$

187 An example of a scaling function: Haar (7.2.2) Haar's scaling function is $\varphi(x) = \begin{cases} 1 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$

188 Properties of scaling functions (7.2.2) In order for a function $\varphi(x)$ to work properly as a scaling function it has to fulfill the following requirements: The scaling function has to be orthogonal with its integer translates. The subspaces spanned by the scaling functions form a nested series: $V_{-\infty} \subset \dots \subset V_{-1} \subset V_0 \subset V_1 \subset \dots \subset V_{\infty}$ The only function that belongs to all $V_j$ is f(x) = 0, i.e. $V_{-\infty} = \{0\}$. Any function can be represented with arbitrary precision: $V_{\infty} = L^2(\mathbb{R})$.

189 When the requirements are fulfilled, all scaling functions of $V_j$ can be expressed as linear combinations of the scaling functions of $V_{j+1}$: $\varphi_{j,k}(x) = \sum_n h_\varphi(n)\, \varphi_{j+1,n}(x) = \sum_n h_\varphi(n)\, 2^{(j+1)/2} \varphi(2^{j+1} x - n)$ and when j = k = 0: $\varphi(x) = \sum_n h_\varphi(n)\, \sqrt{2}\, \varphi(2x - n)$ The $h_\varphi(n)$ values are scaling function coefficients and they form a scaling vector $h_\varphi$. The above equation is a fundamental one in multiresolution analysis and is known as the refinement equation or the dilation equation.

190 Wavelet functions (7.2.3) We saw above how the scaling function spanned an increasing set of function subspaces. The set differences in the series are spanned by the corresponding wavelet functions. Let's define a wavelet function $\psi(x)$ and its wavelet set $\{\psi_{j,k}(x)\}$: $\psi_{j,k}(x) = 2^{j/2}\, \psi(2^j x - k)$ that spans the function subspace $W_j = \overline{\operatorname{Span}_k \{\psi_{j,k}(x)\}}$.

191 We thus have $L^2(\mathbb{R}) = V_0 \oplus W_0 \oplus W_1 \oplus \dots$ or $L^2(\mathbb{R}) = V_1 \oplus W_1 \oplus W_2 \oplus \dots$ or $L^2(\mathbb{R}) = V_{j_0} \oplus W_{j_0} \oplus W_{j_0+1} \oplus \dots$ Also all wavelet functions of $W_j$ can be expressed as linear combinations of the scaling functions in $V_{j+1}$: $\psi(x) = \sum_n h_\psi(n)\, \sqrt{2}\, \varphi(2x - n)$ where $h_\psi(n)$ are wavelet function coefficients forming a wavelet vector $h_\psi$. $h_\psi(n)$ and $h_\varphi(n)$ are mutually related as: $h_\psi(n) = (-1)^n h_\varphi(1 - n)$

192 An example of wavelet functions: Haar (7.2.3) Haar's wavelet function is $\psi(x) = \begin{cases} 1 & 0 \le x < 0.5 \\ -1 & 0.5 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$

193 One-dimensional wavelet transform (7.3) Wavelet series expansion (7.3.1) A function f(x) of $L^2(\mathbb{R})$ can be expressed as a series expansion $f(x) = \sum_k c_{j_0}(k)\, \varphi_{j_0,k}(x) + \sum_{j=j_0}^{\infty} \sum_k d_j(k)\, \psi_{j,k}(x)$ $c_{j_0}(k)$ are approximation or scaling coefficients and $d_j(k)$ are detail or wavelet coefficients.

194 An example of wavelet transform: Haar (7.3.1) 193

195 Discrete wavelet transform (7.3.2) The forward discrete wavelet transform (DWT) is: $W_\varphi(j_0, k) = \frac{1}{\sqrt{M}} \sum_x f(x)\, \varphi_{j_0,k}(x)$ $W_\psi(j, k) = \frac{1}{\sqrt{M}} \sum_x f(x)\, \psi_{j,k}(x)$, when $j \ge j_0$ where f(x) is defined for x = 0, 1, ..., M-1 and $M = 2^J$. Similarly j gets values $j = j_0, j_0+1, \dots, J-1$ and k gets values $k = 0, 1, \dots, 2^j - 1$. $W_\varphi(j_0, k)$ are approximation or scaling coefficients and $W_\psi(j, k)$ are detail or wavelet coefficients. $\varphi_{j_0,k}(x)$ and $\psi_{j,k}(x)$ are sampled from the corresponding continuous-valued functions. f(x) can be reconstructed with the inverse transform: $f(x) = \frac{1}{\sqrt{M}} \sum_k W_\varphi(j_0, k)\, \varphi_{j_0,k}(x) + \frac{1}{\sqrt{M}} \sum_{j=j_0}^{J-1} \sum_k W_\psi(j, k)\, \psi_{j,k}(x)$

196 An example of a discrete one-dimensional wavelet transform (7.3.2) Let's transform the series f(0) = 1, f(1) = 4, f(2) = -3, f(3) = 0 with the DWT by using Haar's scaling and wavelet functions. We now have M = 4 and J = 2, and we can choose $j_0 = 0$ so that we will solve for the (j, k) pairs (0, 0), (1, 0) and (1, 1). By sampling $\varphi_{j_0,k}(x)$ and $\psi_{j,k}(x)$ we get the Haar transform matrix: $H_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ \sqrt{2} & -\sqrt{2} & 0 & 0 \\ 0 & 0 & \sqrt{2} & -\sqrt{2} \end{bmatrix}$ The coefficients can be solved with row-wise matrix-vector products: $W_\varphi(0,0) = \frac{1}{2}[1 + 4 - 3 + 0] = 1$ $W_\psi(0,0) = \frac{1}{2}[1 + 4 - 3 \cdot (-1) + 0 \cdot (-1)] = 4$ $W_\psi(1,0) = \frac{1}{2}[1 \cdot \sqrt{2} + 4 \cdot (-\sqrt{2})] = -1.5\sqrt{2}$ $W_\psi(1,1) = \frac{1}{2}[-3 \cdot \sqrt{2} + 0 \cdot (-\sqrt{2})] = -1.5\sqrt{2}$
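The same computation as a numpy sketch; the matrix is the sampled Haar basis above with the 1/sqrt(M) = 1/2 factor folded in, so its rows are orthonormal and the transpose inverts the transform:

import numpy as np

r2 = np.sqrt(2.0)
H4 = 0.5 * np.array([[1,   1,  1,   1],
                     [1,   1, -1,  -1],
                     [r2, -r2,  0,   0],
                     [0,   0,  r2, -r2]])

f = np.array([1.0, 4.0, -3.0, 0.0])
coeffs = H4 @ f
print(coeffs)          # [ 1.  4. -2.121 -2.121 ]
print(H4.T @ coeffs)   # inverse transform recovers [1, 4, -3, 0]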

197 Two-dimensional wavelet transform (7.5) Discrete two-dimensional wavelet transform can in principle be implemented by using two-dimensional scaling and wavelet functions. However, the discrete two-dimensional wavelet transform is separable, which means that the required two-dimensional functions can be expressed as products of one-dimensional scaling and wavelet functions: ϕ(x, y) = ϕ(x)ϕ(y) ψ H (x, y) = ψ(x)ϕ(y) ψ V (x, y) = ϕ(x)ψ(y) ψ D (x, y) = ψ(x)ψ(y) In practice, the two-dimensional transform is implemented as two consecutive one-dimensional transforms, first for columns, then for rows, or vice versa.

198 The block diagram of two-dimensional wavelet transform (7.5) 197

199 Haar transform (7.1.3) An $N \times N$-sized image f(x, y) is represented as a matrix F. The Haar transform is then $T = H F H^T$ where H is the $N \times N$-sized Haar transform matrix made of rows of scaling function values $\varphi_{j_0,k}(x)$ and rows of wavelet function values $\psi_{j,k}(x)$. Examples: $H_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$ and $H_4 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ \sqrt{2} & -\sqrt{2} & 0 & 0 \\ 0 & 0 & \sqrt{2} & -\sqrt{2} \end{bmatrix}$

200 An example of Haar function in wavelet transform (7.1.3) 199

201 Another example of two-dimensional wavelet transform (7.5) 200

202 An example of real-world wavelets: Symlet wavelets (7.5) 201

203 LECTURE #9 202 Learning goals: After this lecture the student should be able to analyze different types of redundancies in digital images describe properties of lossless and lossy compression methods calculate compression ratios and relative redundancies explain self-information of events and entropy of an information source understand Shannon s first theorem estimate the entropy of a digital image name image fidelity criteria describe the image compression model and its parts

204 12. Image compression Compression is made before storing or transmitting the image. The compressed image can later be restored with the inverse transformation. If the original image can be restored without losses, then the compression method is information preserving or lossless, otherwise lossy. Background (8) Image compression means reducing the amount of data needed for the presentation of an image. A distinction has to be made between the information in the image and the data needed for presenting it. Calculating the amount of the former is difficult, the latter can be measured in bits. Mathematically compression can be seen as an attempt to change a two-dimensional image matrix into a statistically uncorrelated data set. Correlated data is always redundant because it contains unnecessary repetition. Removing redundancy results in compression.

205 12.1 Mathematical background (8.1) The same amount of information can be transmitted with a varying amount of data. If the same information can be presented with a smaller amount of data, then the current presentation contains redundant data. Let b and b' be the counts of bits needed to present the same information in two different formats. The compression ratio when moving from the b-length representation to the b'-length one is: $C = \frac{b}{b'}$ The relative redundancy of the first representation compared to the latter is: $R = 1 - \frac{1}{C}$ When b = b', then C = 1 and R = 0. When $b \gg b'$, then $C \to \infty$ and $R \to 1$, so there is notable compression and the original data is redundant. When $b \ll b'$, then $C \to 0$ and $R \to -\infty$, so instead of compression there happens expansion of data.

206 Forms of redundancy (8.1) There exist three different and mutually independent types of redundancy in digital image compression: coding redundancy; spatial and temporal inter-pixel redundancy; irrelevant information or psycho-visual redundancy. Data compression can be obtained by eliminating fully or partially one or more of the redundancy forms.

207 Coding redundancy (8.1.1) The efficiency of image coding can be assessed by studying the intensity histogram of the image. By counting the number of occurrences $n_k$ of each intensity value $r_k$, k = 0, 1, ..., L-1, one can estimate the occurrence probability $p_r(r_k)$ of the intensity: $p_r(r_k) = \frac{n_k}{MN}$ where $M \times N$ is the size of the image in pixels. If presenting intensity value $r_k$ takes $l(r_k)$ bits, then on average $L_{avg} = \sum_{k=0}^{L-1} l(r_k)\, p_r(r_k)$ bits per pixel are needed. If each intensity value is represented with the same number of m bits, the average code length $L_{avg}$ is likewise m regardless of the probabilities. If the true probability distribution is not even, this kind of direct coding is redundant, as shown by the following example.

208 (The slide's table lists the intensity values $r_k$, their probabilities $p_r(r_k)$, and the code word lengths $l_1(r_k)$ of a fixed-length code and $l_2(r_k)$ of a variable-length code.) For the latter code $L_{avg}$ can be calculated to be 2.7 bits. Thus C = 1.11 and R = 0.099. The example demonstrated that more efficient coding could be obtained by using fewer bits for the most frequent symbols and more bits for the infrequent symbols. There will always be coding redundancy if direct coding is used for images whose intensity distribution is not even.

209 Inter-pixel spatial redundancy, an example (8.1.2) The two images are mutually quite similar, especially if we study their intensity histograms. Removing only coding redundancy would probably lead to equal compression ratios. Studying the autocorrelation functions of the two images reveals the difference in the images' contents: the horizontal periodicity of the rightmost image is visible as the periodicity of the autocorrelation function, whereas the autocorrelation function of the more random image is non-periodic.

210 Irrelevant information or psycho-visual redundancy (8.1.3) The sensitivity of the human eye to visual information varies: some information is more important than other. Information that is irrelevant or not useful for the human eye and brain is called psycho-visually redundant, and it can be eliminated without affecting too much the perceptual quality of the image. A human is unable to sense the intensities of single pixels; instead one sees larger entities such as edges, contours and textured areas that are then joined to even larger regions that can be recognized. The interpretation process uses the prior knowledge stored in the brain during the organ's lifetime. Removal of psycho-visually redundant data leads to quantitative loss of information, consequently it can be regarded as quantization. In quantization, a large number of input values are mapped to a smaller number of output values. The quantization process is irreversible.

211 Measuring the amount of information (8.1.4/8.3.1) If a random event E has the probability P(E), then E is said to contain information of the amount $I(E) = \log \frac{1}{P(E)} = -\log P(E)$ I(E) is the self-information of E. If P(E) = 1, then E happens always and contains no information: I(E) = 0. If P(E) = 0.99, then the occurrence of E contains only a small amount of information; the event that E does not happen contains more information because that outcome is more infrequent. If the base 2 logarithm is used, then I(E) is measured in bits. When P(E) = 0.5 then $I(E) = -\log_2 0.5 = 1$, or 1 bit of information. This is the amount of information in the outcome of tossing one coin.

212 Entropy and Shannon's first theorem (8.1.4) Let's assume that an information source produces symbols $a_j$, j = 1, ..., J, with probabilities $P(a_j)$, $\sum_{j=1}^{J} P(a_j) = 1$. The symbols form the source's symbol set A and the probabilities its probability vector z. A and z describe a memoryless information source fully. The entropy or average self-information of the source is $H(z) = -\sum_{j=1}^{J} P(a_j) \log P(a_j)$ The larger the entropy, the more bits are needed to code the symbols. Shannon's first theorem tells that it is possible to get the average symbol bit length close to but not below the entropy of the source. The efficiency of a coding is: $\eta = \frac{n H(z)}{L_{avg}} \in [0, 1]$ Here n specifies the extension of the source used in the coding. Generally a larger n leads to more efficient coding.
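A small Python sketch of the entropy formula and of a first-order entropy estimate from an image histogram (names are illustrative; the image is assumed to hold non-negative integer intensities):

import numpy as np

def entropy(probs):
    # Entropy in bits of a memoryless source with probability vector z.
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                        # 0 * log 0 is taken as 0
    return float(-(p * np.log2(p)).sum())

def image_entropy_estimate(img, levels=256):
    # First-order estimate from the intensity histogram.
    hist = np.bincount(img.ravel(), minlength=levels)
    return entropy(hist / hist.sum())

print(entropy([0.5, 0.5]))    # 1.0 bit/symbol
print(entropy([2/3, 1/3]))    # about 0.918 bit/symbol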

213 An example of source extension and coding efficiency Let the information source be $(A, z) = (\{a_1, a_2\}, [2/3 \;\; 1/3]^T)$. (The slide's table lists the extended symbols $\alpha_i$, their probabilities $P(\alpha_i)$, self-informations $I(\alpha_i)$ and code lengths $l(\alpha_i)$ for the first extension $\{a_1, a_2\}$ and for the second extension $\{a_1 a_1, a_1 a_2, a_2 a_1, a_2 a_2\}$ with probabilities 4/9, 2/9, 2/9 and 1/9.) Here $\alpha_i$ are n-symbol blocks created from the original symbols. The entropy of the source is H = 0.918 bit/symbol. When n = 1, the average code length is $L_{avg}$ = 1 bit/symbol and $\eta$ = 0.92. When n = 2, $L_{avg}$ = 1.89 bit per extended symbol and $\eta$ = 0.97.

214 Estimating the information content (8.1.4/8.3.4) In Shannon's first theorem it was assumed that the consecutive symbols from the information source are independent. With digital images this is normally not true. Therefore the naive calculation of the entropy will exaggerate the randomness and the number of bits needed for storing the image information. Let the image be the small 8-bit example image shown on the slide. If we assume that the image data is stored as 8-bit values, the maximum value for the entropy is $H_{max}$ = 8 bit/pixel.

215 A first degree estimate for the entropy can be obtained from the intensity value histogram: value 21, count 12, P = 3/8; value 95, count 4, P = 1/8; value 169, count 4, P = 1/8; value 243, count 12, P = 3/8. The entropy can be estimated to be $H_1$ = 1.81 bit/pixel. A second degree estimate is obtained from intensity value pairs: (21,21) count 8, P = 1/4; (21,95) count 4, P = 1/8; (95,169) count 4, P = 1/8; (169,243) count 4, P = 1/8; (243,243) count 8, P = 1/4; (243,21) count 4, P = 1/8. The entropy value is now estimated as $H_2$ = 1.25 bit/pixel.

216 For compression it is sometimes good to make a mapping where consecutive pixel values are subtracted from each other. The entropy can be estimated also from this difference representation. The value histogram of the differences gives us an estimate of $H_{diff}$ = 1.41 bit/pixel. This example showed that estimating the entropy of an image is far from trivial. Consequently, measuring the amount of information in an image is problematic.

217 Fidelity criteria (8.1.5/8.1.4) Image quality and its changes can be assessed with both objective and subjective fidelity criteria. Objective fidelity criteria are often based on some norm calculated from the error between a degraded version and the original image. The degradation can be caused by lossy image compression. The difference $\hat{f} - f$ can be interpreted to result from additive random noise e, and one can calculate the root-mean-square error or the mean-square signal-to-noise ratio: $e(x,y) = \hat{f}(x,y) - f(x,y)$ $e_{rms} = \Big[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} e(x,y)^2 \Big]^{1/2}$ $SNR_{ms} = \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x,y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} e(x,y)^2}$ Subjective fidelity criteria are often based on averaged human visual gradings. Subjective criteria can be given either on absolute scales or on relative scales by pair-wise comparison.

218 Image compression model (8.1.6/8.2) The encoder consists of a mapper, a quantizer and a symbol coder; the decoder of a symbol decoder and an inverse mapper: f(x, y) -> mapper -> quantizer -> symbol coder -> channel -> symbol decoder -> inverse mapper -> $\hat{f}(x,y)$. The three encoder blocks should remove as much as possible of the three types of redundancy in the image. Mapper: converts the image to a non-visual form; reduces the spatial inter-pixel redundancy; the operation is normally reversible; can reduce the amount of data (e.g. run-length coding); can implement an image transform whose outputs are easier to compress (e.g. cosine transform)

219 Quantizer: reduces the image quality within the limits of a chosen fidelity criterion; reduces irrelevant data and psycho-visual redundancy; the operation is normally irreversible; cannot be used in lossless compression. Symbol coder: creates code words for describing the outputs of the quantizer (or mapper); the code words may be of fixed or varying length; reduces coding redundancy; the operation is reversible.

220 Compression methods (8.2/ ) Some compression techniques will be presented in the following slides. The methods can be combined to reduce the three types of visual redundancy. Lossless compression methods can be reverted fully and the original image restored. Lossless compression is needed e.g. for storing medical images and in situations where image acquisition is more expensive than transfer and storage, as in inter-planetary space probes. Compression ratios obtainable with lossless methods range between 2 and 5. Lossy compression methods can easily reach considerably higher compression ratios without ruining the image quality. Lossy compression can cause such systematic errors in images that are negligible for humans but seriously affect automatic analysis of the image content. It is often difficult to find out whether an image has at some stage been compressed with a lossy compression method or not.

221 LECTURE # Learning goals: After this lecture the student should be able to create a variable-length coding with Huffman coding analyze properties of different implementations of run-length coding analyze the effect of the use of Gray code in bit-plane coding describe the steps and implementation of transform coding explain lossless and lossy predictive coding analyze the behavior of the Delta modulator understand DPCM coders and Lloyd-Max and optimum uniform quantizers

222 Variable-length coding (8.2.1/8.4.1) Variable-length coding is the main method for reducing coding redundancy. The best-known and in some sense optimal variable-length coding method is Huffman coding. The Huffman code is data dependent and created iteratively by combining the symbols with the smallest probabilities in a pair-wise manner until all symbols have been combined in a tree structure. The tree structure is then used to give variable-length codes to the symbols. The codes describe the path from the root node to the corresponding leaf symbol. The symbol tree has to be stored together with the coded data in order to be able to decode it.

223 An example of Huffman coding (8.2.1/8.4.1) (The slide's table lists six source symbols, their probabilities, and the Huffman codes obtained by the pair-wise merging procedure.) The entropy of the source can be calculated as H(z) = 2.14 bit/symbol. The average word length in the Huffman coding is 2.2 bit/symbol, so the efficiency of the coding is $\eta$ = 0.97.
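A compact Python sketch of the merging procedure. The six probabilities below are illustrative stand-ins for the slide's table, chosen so that the entropy and the average code length come out near the quoted 2.14 and 2.2 bit/symbol:

import heapq

def huffman_code(probs):
    # Repeatedly merge the two least probable groups; prefix '0' to one
    # group and '1' to the other. Returns a dict of bit strings.
    heap = [(p, i, {i: ''}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, i2, c2 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c1.items()}
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, i2, merged))
    return heap[0][2]

probs = [0.4, 0.3, 0.1, 0.1, 0.06, 0.04]   # hypothetical source
code = huffman_code(probs)
L_avg = sum(p * len(code[i]) for i, p in enumerate(probs))
print(code, L_avg)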

224 Arithmetic coding (8.2.3/8.4.1) Arithmetic coding is an example of how discrete symbols can be mapped to real values and vice versa. The real value range [0, 1] is divided into intervals whose lengths match the probabilities of the corresponding symbols. Each coded symbol chain corresponds to some subinterval of [0, 1]. The interval after the next coded symbol is solved within the previous interval. An example: Let's code the 5-tuple $a_1 a_2 a_3 a_3 a_4$, when the symbols $a_i$ have probabilities 0.2, 0.2, 0.4 and 0.2. (The slide's figure shows how the interval is narrowed down symbol by symbol.)

225 Run-length coding (8.2.5/8.4.3) Run-length coding is a classical method for coding both intensity and binary images. (An intensity image can also be interpreted as a stack of binary images made of bit layers, and each layer can be encoded separately.) Each code word contains information on the pixel intensity and on how many times that value is repeated in the run. In the binary case, black and white runs alternate and only the first value needs to be coded. Huffman coding can be used after run-length coding to reduce the coding redundancy of the run length values. If zero-length runs are allowed, then one can always start with a white run and encode long runs as combinations of shorter ones.

226 An example of run-length coding (8.2.5/8.1.2) row 100: (1,63) (0,87) (1,37) (0,5) (1,4) (0,556) (1,62) (0,210) The binary image has been run-length coded so that each run takes 1 + 10 bits. The compression ratio is thus 2.63 and the relative redundancy of the original presentation thus 0.62.

227 226 Two-dimensional run-length coding (8.2.5/8.4.3) In two-dimensional run-length coding, the starting positions of black and white runs are compared between the previous and the current row. Only the differences are coded. Inter-pixel redundancy can be reduced in two instead of only one direction.

228 Symbol-based coding (8.2.6/ ) In symbol-based coding, a set of small recurring subimages, called symbols, is first searched for in the image. Then the image is coded as being composed of those symbols. Symbol-based coding can be effective if the image contains only a little noise and the subimages can be identified reliably and efficiently. Symbol-based coding could be ideal for printed text if optical character recognition could be done with 100% accuracy.

229 Bit-plane coding (8.2.7/8.4.3) Bit-plane coding is a simple method for trying to remove inter-pixel redundancy. An image with 256 intensity levels is interpreted as 8 binary images each corresponding to one bit plane. The most significant bits correlate the most between adjacent pixels whereas the least significant bits correlate the least. Each bit-plane is coded separately e.g. with run-length coding. If coding of the binary plane does not compress but expands the data, then that bit plane is presented as it is. Coding redundancy can be reduced after bit-plane coding.

230 Gray code (8.2.7/8.4.3) Bit-plane coding is the more efficient the more there is correlation between adjacent bit values on many bit planes. Correlation between adjacent bits can be strengthened by changing the intensity coding from direct code to Gray code. In Gray code, there is always a difference in exactly one bit position between successive values. The code words can be produced by mirroring. Mathematically the Gray code can be generated from the direct code by using the exclusive OR operator $\oplus$: $g_{m-1} = a_{m-1}$, $g_i = a_i \oplus a_{i+1}$, $0 \le i \le m-2$
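The bitwise formula above is the familiar g = a XOR (a >> 1). A small Python sketch with its inverse:

def binary_to_gray(a):
    # g_i = a_i XOR a_{i+1}, with the top bit copied unchanged.
    return a ^ (a >> 1)

def gray_to_binary(g):
    # Inverse mapping by accumulating XORs from the top bit downwards.
    a = 0
    while g:
        a ^= g
        g >>= 1
    return a

print([binary_to_gray(v) for v in range(8)])   # [0, 1, 3, 2, 6, 7, 5, 4]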


232 Block transform coding (8.2.8/8.5.2) All previous compression methods have operated in the spatial image domain. It is also possible to implement compression in some transform domain. In transform compression one can use many different discrete transforms, including the Karhunen-Loève, Fourier, cosine, Walsh-Hadamard and wavelet transforms. The JPEG standard uses the discrete cosine transform. All block transform codings rely on the same principle: the original N x N image is split into n x n subimages, each subimage is transformed, the transform coefficients are quantized and symbol encoded, and the decoder reverses the symbol coding and the transform and combines the n x n subimages. The spatial inter-pixel redundancy is removed by the transform; many of the coefficients are quantized to zero or eliminated.

233 12.8 Kernel-based image transforms (8.2.8/8.5.2) The general form of a kernel-based image transform T(u, v) for an n x n-sized image g(x, y) is: $T(u,v) = \sum_{x=0}^{n-1} \sum_{y=0}^{n-1} g(x,y)\, r(x,y,u,v)$ $g(x,y) = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} T(u,v)\, s(x,y,u,v)$ r(x, y, u, v) is the kernel of the transform and s(x, y, u, v) the kernel of the inverse transform. A transform kernel is separable, iff $r(x,y,u,v) = r_1(x,u)\, r_2(y,v)$ A transform kernel is symmetric, iff r is separable and $r_1$ is functionally equivalent to $r_2$: $r(x,y,u,v) = r_1(x,u)\, r_1(y,v)$

234 Image transform in matrix form Iff the kernel r(x, y, u, v) is symmetric, the transform can be expressed as a matrix equation: $T = A G A^T$ where G is the n x n image matrix, A is the n x n symmetric transform matrix with $a_{ij} = r_1(i,j)$, and T is the n x n transform, u, v = 0, 1, ..., n-1. The inverse transform can be done with an inverse transform matrix B: $B T B^T = B A G A^T B^T$ Iff $B = A^{-1}$, one obtains $G = B T B^T$, so the original image G can be reconstructed losslessly from its transform T. If $B \neq A^{-1}$, one gets an approximation $\hat{G} = B A G A^T B^T$ Discrete Fourier, Walsh-Hadamard, Haar and discrete cosine transforms can be presented in the above formulation.
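A numpy sketch of the matrix formulation, using the orthonormal cosine transform kernel of the next slide as A; since A is orthogonal, B = A^T inverts the transform exactly (the 8 x 8 size and the random test block are illustrative):

import numpy as np

def dct_matrix(n=8):
    # A[u, j] = alpha(u) * cos((2j + 1) u pi / 2n)
    A = np.zeros((n, n))
    for u in range(n):
        alpha = np.sqrt(1.0 / n) if u == 0 else np.sqrt(2.0 / n)
        A[u, :] = alpha * np.cos((2 * np.arange(n) + 1) * u * np.pi / (2 * n))
    return A

G = np.random.default_rng(1).integers(0, 256, size=(8, 8)).astype(float)
A = dct_matrix(8)
T = A @ G @ A.T                 # forward transform T = A G A^T
G_back = A.T @ T @ A            # B = A^{-1} = A^T
print(np.allclose(G, G_back))   # True: lossless reconstruction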

235 Fourier, Walsh-Hadamard and cosine transforms (8.2.8/8.5.2) Discrete 2D Fourier transform: $r(x,y,u,v) = e^{-j2\pi(ux+vy)/n}$ and $s(x,y,u,v) = \frac{1}{n^2} e^{j2\pi(ux+vy)/n}$ Walsh-Hadamard transform: $r(x,y,u,v) = s(x,y,u,v) = \frac{1}{n} (-1)^{\sum_{i=0}^{m-1} [b_i(x) p_i(u) + b_i(y) p_i(v)]}$ where $b_i(z)$ is the ith bit of z's direct code from the right and $p_i(w)$ is the ith bit of w's Gray code from the left. Cosine transform: $r(x,y,u,v) = s(x,y,u,v) = \alpha(u)\alpha(v) \cos\frac{(2x+1)u\pi}{2n} \cos\frac{(2y+1)v\pi}{2n}$ with $\alpha(u) = \sqrt{\frac{1}{n}}$ when u = 0 and $\alpha(u) = \sqrt{\frac{2}{n}}$ when $u \neq 0$

236 235 The 2D kernels of Walsh-Hadamard and cosine transforms (8.2.8/8.5.2) Walsh-Hadamard cosine

237 236 An example of Fourier, Hadamard and cosine coding (8.2.8/8.5.2) Fourier Hadamard cosine

238 Subimage size selection (8.2.8/8.5.2) One can see that the discrete cosine transform produces a smaller squared error than the Fourier and Hadamard transforms, and that 8 x 8 seems to be a large enough subimage size, as the cosine transform's error does not decrease with larger subimages.

239 238 An example of subimage size selection (8.2.8/8.5.2) Too small subimages lead to visible blocks in the reconstructed image. original 8 8 original

240 239 Bit allocation (8.2.8/8.5.2) The last step in transform coding is the selection of the quantization strategy or bit allocation. One needs first to choose which coefficients are encoded at all, others will thus be quantized to zero. Next one needs to choose how many bits are used in the quantized coefficient values. The book describes two bit allocation strategies in detail: zonal coding and threshold coding.

241 Lossless predictive coding (8.2.9/8.4.4) Due to inter-pixel redundancy, the intensity values of neighboring pixels differ only a little. If one knows the previous pixel values, one can predict the next value. When the prediction errors are transmitted, the original image can be reconstructed losslessly. Coding prediction errors instead of the original intensity values generally leads to compression. By encoding the difference between the true pixel value $f_n$ and the prediction $\hat{f}_n$, one encodes only the new information of each pixel and thus reduces correlation and redundancy. In the encoder the rounded prediction is subtracted from the input and the error $e(n) = f(n) - \hat{f}(n)$ is encoded; in the decoder the same prediction is added back to the decoded error.
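A one-row numpy sketch of lossless first-order predictive encoding and decoding (alpha = 1, the names and the test row are illustrative):

import numpy as np

def predictive_encode(row, alpha=1.0):
    # e(n) = f(n) - round(alpha * f(n-1)); the first pixel is sent as is.
    f = row.astype(int)
    e = f.copy()
    e[1:] = f[1:] - np.round(alpha * f[:-1]).astype(int)
    return e

def predictive_decode(e, alpha=1.0):
    f = e.copy()
    for n in range(1, len(e)):
        f[n] = e[n] + int(round(alpha * f[n - 1]))
    return f

row = np.array([100, 102, 103, 103, 180, 181])
err = predictive_encode(row)
print(err)                                              # small except at the jump
print(np.array_equal(predictive_decode(err), row))      # True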

242 An example of lossless predictive coding (8.2.9/8.4.4) First degree linear predictor $\hat{f}(x,y) = \operatorname{round}[\alpha f(x, y-1)]$ The optimal value of $\alpha$ could be obtained from the image's autocorrelation function, but the value $\alpha$ = 1 is used for simplicity. The prediction coding reduces the standard deviation of the source, and the entropy changes similarly.

243 Lossy predictive coding (8.2.9/8.5.1) Prediction error based image compression can be made a lot more efficient by allowing the creation of compression errors. A quantization block is added to the encoder. The prediction error e is quantized to a limited number of output values $\dot{e}$. The quantization step determines the obtainable compression ratio and the amount of reconstruction error in the image. The prediction feedback loop in the encoder has to match that of the decoder so that both share the same inputs and state. The decoder reconstructs $\dot{f}(n) = \dot{e}(n) + \hat{f}(n)$

244 Delta modulation (DM) (8.2.9/8.5.1) Delta modulation is the simplest lossy prediction coder: $e(n) = f(n) - \hat{f}(n) = f(n) - \alpha \dot{f}(n-1)$ and $\dot{e}(n) = \begin{cases} +\zeta, & \text{when } e(n) > 0 \\ -\zeta, & \text{when } e(n) \le 0 \end{cases}$ so only 1 bit/pixel is needed! One can see two fundamental problems of DM: 1) it produces granular noise with constant input and 2) it cannot follow the input's largest changes due to slope overload. Both error types cause blurring in the reconstructed image.

245 Optimal predictors (8.2.9/8.5.1) A common assumption is that the quantization error $e(n) - \dot{e}(n)$ is independent of the prediction error e(n). With this assumption, one can devise the predictor by optimizing some quality criterion and neglecting the quantization error. Typically one tries to minimize the squared prediction error $\min E\{e(n)^2\} = E\{(f(n) - \hat{f}(n))^2\}$ with the constraint $\dot{f}(n) = \dot{e}(n) + \hat{f}(n) \approx e(n) + \hat{f}(n) = f(n)$ and using the predictor model $\hat{f}(n) = \sum_{i=1}^{m} \alpha_i f(n-i)$

246 This method is called differential pulse code modulation, DPCM. The optimization task is then to select the m prediction coefficients $\alpha_i$ so that $E\{e(n)^2\} = E\Big\{\Big(f(n) - \sum_{i=1}^{m} \alpha_i f(n-i)\Big)^2\Big\}$ is minimized. The optimal solution can be obtained for a given image f by solving the autocorrelation matrix of the image. In practice the benefit of using the true autocorrelation matrix is negligible compared to the effort needed, and simpler methods are used.

247 Markov model in prediction (8.2.9/8.5.1) A common approach is to model the image as a two-dimensional Markov model and to solve the prediction coefficients from it. In a two-dimensional Markov model: $E\{f(x,y)\, f(x-i, y-j)\} = \sigma^2 \rho_v^i \rho_h^j$ In a linear fourth-order predictor: $\hat{f}(x,y) = \alpha_1 f(x, y-1) + \alpha_2 f(x-1, y-1) + \alpha_3 f(x-1, y) + \alpha_4 f(x-1, y+1)$ The $\alpha_i$ can be solved from the parameters of the Markov model: $\alpha_1 = \rho_h$, $\alpha_2 = -\rho_v \rho_h$, $\alpha_3 = \rho_v$, $\alpha_4 = 0$

248 Prediction with a heuristic model (8.2.9/8.5.1) The prediction coefficients can be selected also heuristically or empirically by trial and error. Here an image prediction coded with four different models and the resulting prediction error images are shown. The three linear models are: $\hat{f}(x,y) = 0.97 f(x, y-1)$ $\hat{f}(x,y) = 0.5 f(x, y-1) + 0.5 f(x-1, y)$ $\hat{f}(x,y) = 0.75 f(x, y-1) + 0.75 f(x-1, y) - 0.5 f(x-1, y-1)$ When the degree of the model increases, the standard deviation of the prediction error goes down. The fourth, non-linear predictor ($\sigma_e$ = 4.1) is $\hat{f}(x,y) = \begin{cases} 0.97 f(x, y-1), & \text{if the local horizontal intensity change is at most the vertical one} \\ 0.97 f(x-1, y), & \text{otherwise} \end{cases}$

249 Optimal quantization (8.2.9/8.5.1) After selecting the predictor, one can devise the quantizer so that it meets some optimality criterion with the assumed distribution of prediction errors. In the quantizer, the range of input values $s \in [s_{i-1}, s_i)$ is mapped to one output value $t_i$. Typically there are $L = 2^n$ quantization levels $t_i$ and the distribution of the prediction errors is assumed to be symmetrical. Because small prediction errors s are more probable than large ones, the quantization levels $t_i$ are denser for small s (and placed symmetrically around zero). Due to symmetry: $s_{-i} = -s_i$ and $t_{-i} = -t_i$. The optimality criteria for minimizing the squared quantization error are: $\int_{s_{i-1}}^{s_i} (s - t_i)\, p(s)\, ds = 0$, $i = 1, 2, \dots, \frac{L}{2}$ and $s_i = \begin{cases} 0, & \text{when } i = 0 \\ \frac{t_i + t_{i+1}}{2}, & \text{when } i = 1, 2, \dots, \frac{L}{2} - 1 \\ \infty, & \text{when } i = \frac{L}{2} \end{cases}$

250 A quantizer satisfying the above conditions is called the L-level Lloyd-Max quantizer. Another option is to use a quantizer with uniform spacing, having the additional constraint $t_i - t_{i-1} = s_i - s_{i-1} = \theta$ and resulting in an optimum uniform quantizer. Here it is assumed that the prediction error follows the Laplace distribution with variance $\sigma_e^2 = 1$. Real quantization levels $t_i$ and limits $s_i$ can be obtained by multiplying with the true standard deviation $\sigma_e$. It is also possible to implement quantization where the quantization limits adapt locally to the spatial properties of the image.

251 250 An example of DPCM Lloyd-Max quantization (8.2.9/8.5.1) constant adaptive 1.0 b/pix b/pix 2.0 b/pix b/pix 3.0 b/pix b/pix

252 LECTURE # Learning goals: After this lecture the student should be able to understand the goals of segmentation process analyze discrete derivatives as image segmentation methods use basic Hough transform analyze threshold-based segmentation methods explain basic region-based segmentation techniques understand how motion can be used in segmentation

253 13. Image segmentation 252 Segmentation = splitting into parts Segmentation is used to extract objects or other meaningful entities or subparts from images. Subparts of the image can then be described and recognized. Segmentation methods are generally based on detecting discontinuities, typically the edges or borders between objects grouping similar areas, typically pixels that belong to same objects

254 Fundamentals of segmentation (10.1/10.4) The segmentation process can be formulated as: Let R be the whole image area that is to be segmented into n subregions $R_1, R_2, \dots, R_n$, such that: $\bigcup_{i=1}^{n} R_i = R$, so that the segmentation is complete; each $R_i$ is a connected set; $R_i \cap R_j = \emptyset$ for $i \neq j$, so that the subregions are disjoint; $Q(R_i)$ = TRUE, i = 1, 2, ..., n, so that the regions fulfill a homogeneity condition; $Q(R_i \cup R_j)$ = FALSE for adjacent $R_i$, $R_j$, so that adjacent regions cannot be combined.

255 Detecting local changes (10.2/10.1) Discontinuities in images like points, lines and edges can generally be found by convolving with a proper detector mask. Isolated point detection (10.2.2/10.1.1) Isolated points can be detected with masks that look like the Laplace operator. On constant background the response is zero. Large enough, meaningful values can be detected by thresholding.

256 Line detection (10.2.3/10.1.2) Lines of one pixel width can be detected in the four main directions with the following masks (and the responses then possibly combined):

257 256 Edge detection (10.2.4/10.1.3) Edge detection is the most important discontinuity detection task, because detection of isolated points or one-pixel lines is seldom needed in intensity and color images. Idealized edge models (10.2.4/10.1.3) step edge ramp edge

258 Derivatives of the ramp edge in the ideal case (10.2.4/10.1.3) We can see the ramp edge between the dark and bright areas in the one-dimensional continuous case. The shape of the edge is characteristic also in the first and second derivatives, as maximum values and zero crossings, respectively.

259 Derivatives of the ramp edge in noise (10.2.4/10.1.3) When noise is added, the ideal shapes of the ramp edge derivatives are severely corrupted. Especially the second derivative and its zero crossings become useless. The situation can be eased by first lowpass filtering the noisy image and then performing the derivation.

260 Gradient operators (10.2.5/10.1.3) Edge detection in real two-dimensional discrete images is much more difficult, but derivation can be applied to it too. The most important discrete derivatives are the gradient operators and the discrete Laplace operator. The most common gradient operators in image processing are the Roberts, Prewitt and Sobel masks. The value of a discrete gradient is a two-dimensional vector that indicates the rate of change in the horizontal and vertical directions. Combining the gradient to a scalar tells the absolute magnitude of the change, but not its direction: $\operatorname{mag}(\nabla f) = \sqrt{g_x^2 + g_y^2} \approx |g_x| + |g_y|$
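A direct numpy sketch of the Sobel gradient magnitude |g_x| + |g_y| (a plain double loop rather than an optimized convolution; names are illustrative):

import numpy as np

def sobel_gradient(f):
    gx_mask = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
    gy_mask = gx_mask.T
    fp = np.pad(f.astype(float), 1, mode='edge')
    gx = np.zeros_like(f, dtype=float)
    gy = np.zeros_like(f, dtype=float)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            w = fp[x:x + 3, y:y + 3]
            gx[x, y] = (w * gx_mask).sum()   # correlation; sign flip does not
            gy[x, y] = (w * gy_mask).sum()   # change the magnitude
    return np.abs(gx) + np.abs(gy)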

261 260 An example of gradient operators (10.2.5/10.1.3) original lowpass filtered =

262 Laplace operator and Marr-Hildreth detector (10.2.6/10.1.3) The Laplace operator approximates the two-dimensional second derivative. The Laplace operator can also be made larger than 3 x 3 pixels by combining it with lowpass filtering. This approach is called Laplacian of Gaussian or LoG filtering and it is used in the Marr-Hildreth edge detector: $G(r) = e^{-\frac{r^2}{2\sigma^2}}$, $\nabla^2 G(r) = \frac{r^2 - \sigma^2}{\sigma^4}\, e^{-\frac{r^2}{2\sigma^2}}$

263 262 An example of LoG filtering and zero crossings (10.2.6/10.1.3) original Sobel gradient lowpass Laplace mask LoG zero crossings

264 Edge linking and boundary detection (10.2.7/10.2) It is quite simple to find pixels that contain strong edges. It is much more difficult to link the edge pixels to make up chains of boundary pixels that separate the objects from each other and from the background. An edge means that there is a distinctive local intensity change in the image. The edge is assumed to continue in the neighboring pixels, and it is assumed to have a clearly defined direction that is perpendicular to the principal direction of the intensity change. A boundary means the (possibly imaginary or idealized) continuous path of pixels that separates the pixels belonging on one side to one object and on the other side to the other object. Linking the detected edge pixels to form boundaries can be based on both local and global operations.

265 Example of local processing of edge points (10.2.7/10.2.1) After edge pixels have been detected in the image, the following method based on local similarities between edge pixels can be used for edge linking: 1) Edge pixels are compared with their neighbors in a selected neighborhood area. 2) If neighboring pixels are similar enough, they are linked together. The neighborhood area can be 3 x 3 or 5 x 5 pixels. The similarity measure can be based on the magnitude and direction of the gradients. The obtained boundaries still need to be thinned to single pixel width with e.g. morphological thinning.

266 Hough transform (10.2.7/10.2.2) The Hough [haff] transform is the most important global method for detecting boundaries. The Hough transform uses as a linking criterion whether the edge pixels reside along a predefined parametric curve or not. All found edge pixels are used simultaneously and they are matched with different combinations of the curve parameter values. The edge pixels do not need to be adjacent; it suffices if they are on the same curve, typically on the same line. An edge pixel $(x_i, y_i)$ is known. All lines passing through it satisfy $y_i = a x_i + b$, or $b = -x_i a + y_i$. Thus $(x_i, y_i)$ defines one line in the ab parameter plane, and each point on that line corresponds to a different line passing through $(x_i, y_i)$ in the image.

267 Similarly another edge pixel $(x_j, y_j)$ is parameterized: $b = -x_j a + y_j$ The intersection point $(a', b')$ of these lines can be calculated, and it defines the one line passing through both $(x_i, y_i)$ and $(x_j, y_j)$. In practice the ab parameter space is quantized as a two-dimensional matrix of accumulator cells. The parameter lines are drawn in the accumulator matrix by incrementing the cell values by one. The maximum values in the accumulator matrix determine the most probable line parameters.

268 A severe problem with the linear parameterization are lines that have infinite slope, i.e., $a = \infty$ and $b = \infty$. This can be avoided by parameterizing the lines in the $\rho\theta$ plane instead of the ab plane: $x \cos\theta + y \sin\theta = \rho$ Then each image pixel corresponds to a piece of a trigonometric curve in the parameter plane. The line parameters are again found with the accumulator matrix. The Hough transform can be used for detecting also more complex curves than lines, e.g., circles with three parameter values.

269 An example of Hough transform (10.2.7/10.2.2) original thresholded gradient accumulation cells detected lines
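A compact numpy sketch of the rho-theta accumulator described above (the grid sizes, names and the peak search are illustrative):

import numpy as np

def hough_lines(edge_img, n_theta=180, n_rho=200):
    # Accumulate votes with x cos(theta) + y sin(theta) = rho for each
    # edge pixel, then read the most probable line from the largest cell.
    ys, xs = np.nonzero(edge_img)
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta, endpoint=False)
    rho_max = np.hypot(*edge_img.shape)
    rhos = np.linspace(-rho_max, rho_max, n_rho)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for x, y in zip(xs, ys):
        rho = x * np.cos(thetas) + y * np.sin(thetas)     # one rho per theta
        idx = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[idx, np.arange(n_theta)] += 1
    peak = np.unravel_index(np.argmax(acc), acc.shape)
    return acc, (rhos[peak[0]], thetas[peak[1]])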

270 Thresholding (10.3) Segmentation can be implemented by directly using the pixels' intensity values if the intensities of the object and the background are assumed to be different. A threshold value T is selected, and pixels with intensity larger/smaller than T are detected as belonging to the object and the others to the background. There can be more than one threshold; the image will then be segmented into more than two segments. Global thresholds can be selected based on the observed intensity value histogram or on assumed shapes of intensity distributions. Local thresholds are based on some local image properties such as the averages or variances of the intensity. Variable thresholds are based on local properties and a priori information on variations in different areas of the image.

271 Note that nothing ensures that especially global thresholding would produce connected regions, because thresholding does not take into account the two-dimensional ordering of the pixels and their neighboring intensity values. If global thresholding produces connected regions, that is an indication of a clear difference between the object and background intensities. [Figure: intensity histograms with a two-peak distribution and a three-peak distribution.]

272 An example of uneven lighting in thresholding (10.3.1/10.3.2) Uneven lighting causes the histogram peaks to be distorted. This can be explained by the illumination-reflectance model: f(x, y) = i(x, y) r(x, y). After taking logarithms: ln f(x, y) = ln i(x, y) + ln r(x, y). One can say that the histogram of ln f(x, y) is the histogram of ln r(x, y) convolved with the histogram of ln i(x, y).

273 An example of simple global thresholding (10.3.2/10.3.3) The intensity histogram reveals that the original image is almost binary. Consequently it is simple to set the global threshold T so that the shadows are eliminated. Segmentation tasks are in practice seldom this easy.

274 Iterative global thresholding (10.3.2/10.3.3) A global threshold T can also be set iteratively. We guess an initial value of T. We segment the image by thresholding based on T. We calculate the average intensity values of the object and the background: µ_1 and µ_2. We calculate a new threshold value T = (µ_1 + µ_2)/2. We iterate until T no longer changes significantly.
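
A minimal Python sketch of the iteration (assuming both classes stay non-empty and the intensities are given as a numpy array):

    import numpy as np

    def iterative_threshold(img, tol=0.5):
        T = img.mean()                        # initial guess
        while True:
            mu1 = img[img > T].mean()         # object mean
            mu2 = img[img <= T].mean()        # background mean
            T_new = 0.5 * (mu1 + mu2)
            if abs(T_new - T) < tol:
                return T_new
            T = T_new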

275 Optimal thresholding (10.3.3/10.3.5) Bayesian decision theory can be applied to thresholding if the shapes of the probability density functions of the object and background intensities are assumed to be known and their parameters are assumed known or can be estimated from data. Gaussian probability densities are often assumed due to their simple properties. The optimal threshold value T is such that P_1 p_1(T) = P_2 p_2(T), where P_1 and P_2 are the a priori probabilities of the object and the background and p_1(z) and p_2(z) are their probability density functions.
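
For two Gaussian densities the condition P_1 p_1(T) = P_2 p_2(T) reduces to a quadratic equation in T. The Python sketch below (derived from that condition, not copied from the slides) solves it; with equal variances it returns the single threshold T = (µ_1 + µ_2)/2 + σ²/(µ_1 − µ_2) ln(P_2/P_1).

    import numpy as np

    def optimal_threshold_gaussian(mu1, sig1, P1, mu2, sig2, P2):
        A = sig1**2 - sig2**2
        B = 2.0 * (mu1 * sig2**2 - mu2 * sig1**2)
        C = (sig1**2 * mu2**2 - sig2**2 * mu1**2
             + 2.0 * sig1**2 * sig2**2 * np.log(sig2 * P1 / (sig1 * P2)))
        if np.isclose(A, 0.0):                     # equal variances: one root
            return np.array([-C / B])
        disc = np.sqrt(B**2 - 4.0 * A * C)
        return np.array([(-B - disc) / (2 * A), (-B + disc) / (2 * A)])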

276 An example of variable thresholding (10.3.7/10.3.5) An X-ray image of a human heart is segmented by variable thresholding in non-overlapping regions. Based on the histogram shape, each region is decided to be either background (one peak), boundary area (two peaks) or heart (one peak). The optimal thresholds are selected in each region based on assumed Gaussian distributions and interpolated between the regions.

277 Region-based segmentation (10.4) Threshold-based segmentation typically leads to non-connected regions. Region-based segmentation methods take the connectedness of the regions as their primary goal. Regions are either merged into larger ones or split into smaller ones while maintaining their connectedness and homogeneity. Additional a priori information on the sizes, shapes and intensity value distributions of the regions can be made use of. Region-based segmentation starts either from the situation where all pixels are separate and are gradually merged, or from the situation where they all form one large segment that is gradually split.

278 Region merging (10.4.1/10.4.2) In region merging, pixels or small segments are merged together to make larger segments. Pixel aggregation is a simple technique that starts from seed points around which other similar enough pixels are merged, starting from their immediate neighbors. The similarity between pixels and segments is typically defined in terms of the average value and variance of their intensity, texture or color. A central problem is how to select the seed points automatically and how to define the similarity criterion. When selecting the seed points, one can make use of a priori knowledge on the intensity values or do some kind of clustering if better a priori information does not exist.
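
The following Python sketch (illustrative only) implements pixel aggregation from given seed points; a pixel joins a region if its intensity is within diff_thr of the running region mean.

    import numpy as np
    from collections import deque

    def region_grow(img, seeds, diff_thr):
        # seeds: list of (row, col) seed coordinates
        h, w = img.shape
        labels = np.zeros((h, w), dtype=np.int32)
        for label, (sy, sx) in enumerate(seeds, start=1):
            queue = deque([(sy, sx)])
            labels[sy, sx] = label
            total, count = float(img[sy, sx]), 1           # running region mean
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0 \
                            and abs(float(img[ny, nx]) - total / count) <= diff_thr:
                        labels[ny, nx] = label
                        total += float(img[ny, nx])
                        count += 1
                        queue.append((ny, nx))
        return labels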

279 An example of region merging (10.4.1/10.4.2) We know a priori that all pixels with intensity value 255 are part of the object, so let's use them as the seed points. [Figure panels: original image, seed points, segmentation, segment boundaries.]

280 Region splitting and merging (10.4.2/10.4.3) By starting from the assumption that the whole image is one non-homogeneous, undersegmented segment, one can derive segmentation methods that try to split the large non-homogeneous segments into smaller, more homogeneous ones. If the process leads to oversegmentation, small segments are merged again while making sure that the resulting segments are still homogeneous. In the split-and-merge methods the segments are typically represented as a quadtree whose non-homogeneous nodes are split into smaller nodes, after which neighboring mutually homogeneous nodes are merged into larger nodes.

281 Use of motion in segmentation (10.6) If we have a series of images of a moving object, the movement can be used for segmentation. The first image can be taken as a reference image to which the other images are compared. We calculate the thresholded binary difference image d_ij(x, y) = 1 if |f(x, y, t_i) − f(x, y, t_j)| > θ and d_ij(x, y) = 0 otherwise, where θ is a threshold dependent on the amount of noise. By summing up multiple difference images we can obtain the trace of the object in the image. The absolute accumulative difference image (AADI) is the sum of its positive (PADI) and negative (NADI) parts. Now PADI shows the approximate shape of the object and NADI its movement.
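
As a sketch (the exact counting convention is an assumption based on the description above), accumulative difference images can be formed as follows in Python:

    import numpy as np

    def accumulative_differences(frames, theta):
        # frames: sequence of grey scale images; the first one is the reference
        ref = frames[0].astype(float)
        aadi = np.zeros_like(ref)                 # absolute ADI
        padi = np.zeros_like(ref)                 # positive ADI
        nadi = np.zeros_like(ref)                 # negative ADI
        for f in frames[1:]:
            diff = ref - np.asarray(f, dtype=float)
            aadi += np.abs(diff) > theta          # count every significant change
            padi += diff > theta                  # reference brighter than current frame
            nadi += diff < -theta                 # reference darker than current frame
        return aadi, padi, nadi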

282 LECTURE # Learning goals: After this lecture the student should be able to understand the fundamentals of color sensing in the human eye and brain, explain tristimulus values and the chromaticity diagram, understand the RGB, CMYK, HSI and CIE L*a*b* color models, create pseudocolor images and implement color transforms, and know the basics of color image segmentation and filtering.

283 14. Use of color in image processing Basics (6) Color is an important source of information for the human eye and brain. The human eye can distinguish thousands of colors and their intensities, compared to the approximately one hundred perceivable grey scale intensities. In automatic image analysis colors help in segmenting and identifying objects. Color can be used for two main purposes: full-color images are real images obtained with color sensors, and pseudocolor images are originally grey scale images that have been colored for human viewing.

284 Light as a physical phenomenon (2.2, 6.1) Frequency ν, wavelength λ = c/ν, photon energy E = hν. Achromatic light is characterized by intensity only, comparable with black&white TV. Chromatic light considers the energy distribution of electromagnetic radiation in the visible band (roughly 400 to 700 nm). Radiance is the total energy of the radiation, measured in watts (W). Luminance measures the amount of energy perceived by a human, in lumens (lm); for example, infrared radiation has luminance close to zero. Brightness is a subjective measure.

285 Color fundamentals (6.1) The color spectrum consists of pure colors, i.e. colors that consist of radiation of one wavelength only. Six wide bands can be identified, in order of increasing wavelength: purple, blue, green, yellow, orange and red.

286 Color sensing (6.1) Cone sensors in the human eye's fovea exist in three categories, each sensitive to a different color band. All color sensations (including the pure colors of the color spectrum) consist of the three simultaneous perceptions and their joint effect. Stimuli similar to those of pure colors can be obtained with other combinations of wavelengths than those of the pure colors. Therefore different wavelength distributions can create the sensation of pure colors. Color sensing is based on the reflected or radiated color wavelengths. A surface that reflects all wavelengths uniformly is sensed as white.

287 Primary colors (6.1) Wavelengths that correspond to red (R), green (G) and blue (B) are considered the primary colors, because they match the human eye's cones and they can generate the widest range of combined colors. Not all colors can be produced from the RGB primaries. The primary colors have been standardized: blue 435.8 nm, green 546.1 nm and red 700 nm. These do not fully match the human eye's physiology. Color television and LCD and LED monitors use three subpixels whose colors match the above wavelengths and whose combined radiation creates the sensed color.

288 Secondary colors (6.1) Secondary colors mean the pigment colors: cyan, magenta and yellow. Each secondary color can be created as a sum of two primary colors: red + blue = magenta, green + blue = cyan, red + green = yellow. The secondary colors are defined based on the wavelengths absorbed. Combining all secondary pigment colors produces black. In printing, the colors are generally created with the three secondary colors and black.

289 Color sensing (6.1) Sensing of colors is based on three physiological quantities: brightness, hue and saturation. Hue characterizes the wavelength of the pure color matching the sensed color. Saturation characterizes the relative purity of the color, i.e. how much white/grey/black has been mixed with the pure color. The pure colors of the spectrum are fully saturated, so they do not contain any mixed white. The chromaticity of a color means the combination of hue and saturation.

290 Tristimulus values (6.1) The amounts of red, green and blue that are needed for creating the sensation of some pure color of the spectrum form the tristimulus values X, Y and Z. Trichromatic coefficients (6.1) The tristimulus values can be normalized to produce the trichromatic coefficients: x = X/(X + Y + Z), y = Y/(X + Y + Z), z = Z/(X + Y + Z), so that x + y + z = 1. Relations between sensed pure color wavelengths and the matching trichromatic coefficients have been tabulated empirically.

291 Chromaticity diagram (6.1) The trichromatic coefficients have only two degrees of freedom: when x and y are given, z can be solved as z = 1 − x − y. All colors can then be presented as combinations of red (x) and green (y) in the chromaticity diagram. The pure colors of the spectrum lie on the boundary of the tongue-shaped diagram. Mixed colors reside in the inner parts of the diagram. In the middle there is the point of equal energy that matches the standard white. The saturation of colors is always equal to one on the diagram boundary and approaches zero when moving towards the point of equal energy.

292 By a proper combination of any three colors, one can produce all shades of colors that reside within the triangle defined by the three colors. Due to the convexity of the chromaticity diagram, not all pure colors can be produced with any selection of three colors. The set of all producible colors is known as the gamut of the display. Color models (6.2) All color systems in regular use have three components, which is natural considering the human visual system. How the color axes are defined varies between the models.

293 RGB color model (6.2.1) In the RGB color model red, green and blue are mutually orthogonal color axes and each component has values between zero and one. Black is in the origin and the cube diagonal originating from it forms the grey scale or intensity axis to white at (1,1,1). [Figure: the RGB color cube with black (0,0,0), white (1,1,1), red (1,0,0), green (0,1,0), blue (0,0,1), cyan (0,1,1), magenta (1,0,1) and yellow (1,1,0) at the corners and the grey scale along the diagonal.] One can consider a color image to consist of three overlaid independent image planes or components. RGB is a good choice for color presentation, but not such a good choice for color image processing and color matching because it does not separate brightness from chromaticity.

294 CMY(K) color model (6.2.2) The additive RGB model's counterpart in printing technology is the subtractive CMY color model, where the mutually orthogonal axes match cyan, magenta and yellow. In the origin there is now white and the intensity scale ends at black in (1,1,1). [Figure: the CMY color cube with white at the origin, black at (1,1,1) and the primary and secondary colors at the remaining corners.] The conversion is C = 1 − R, M = 1 − G, Y = 1 − B. In actual printing, a four-component CMYK model is used, where the fourth component is black (K). Equal portions of C, M and Y are then replaced with an equal amount of K.
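
A minimal Python sketch of the conversion described above (RGB values assumed to be floats in [0, 1]; real printing pipelines use device-dependent profiles instead of this simple rule):

    import numpy as np

    def rgb_to_cmyk(rgb):
        cmy = 1.0 - np.asarray(rgb, dtype=float)       # C = 1 - R, M = 1 - G, Y = 1 - B
        k = cmy.min(axis=-1, keepdims=True)            # equal CMY portion replaced by black ink
        return np.concatenate([cmy - k, k], axis=-1)   # channels C, M, Y, K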

295 HSI color model (6.2.3) [Figure: the HSI color solid with the grey scale intensity axis from black to white and the hue-saturation plane containing red, yellow, green, cyan, blue and magenta.] For most color image processing the color model of choice is the HSI system, because intensity I (or value V) is separated from chromaticity HS, hue H is periodic, i.e. the ends of the spectrum (red and purple/magenta) reside close to each other, and saturation S expresses the mixing of the pure color with white. Transforms between the RGB and HSI models are nonlinear and therefore somewhat difficult to implement. Also the periodicity of hue H can cause some problems.

296 HSI system axes and color circle (6.2.3) The cross-section of the HSI solid at constant intensity can be drawn as a triangle, circle or hexagon. [Figure: the hue-saturation color circle and the intensity axis.]
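
For illustration, an RGB-to-HSI conversion following the standard textbook formulas (a sketch, not taken from the slides; H is returned in radians in [0, 2π)):

    import numpy as np

    def rgb_to_hsi(rgb, eps=1e-8):
        rgb = np.asarray(rgb, dtype=float)
        R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        I = (R + G + B) / 3.0
        S = 1.0 - np.minimum(np.minimum(R, G), B) / (I + eps)
        num = 0.5 * ((R - G) + (R - B))
        den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + eps
        theta = np.arccos(np.clip(num / den, -1.0, 1.0))
        H = np.where(B <= G, theta, 2.0 * np.pi - theta)   # hue is periodic
        return H, S, I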

297 CIE L*a*b* color model (6.5.4) The CIE L*a*b* color model derived from the tristimulus values X, Y and Z is the best choice for precise color matching because it separates intensity from chromaticity, it is colorimetric, i.e. close-by colors have close-by values and color dissimilarity can be measured with the Euclidean distance, and it is perceptually uniform, i.e. the differences in hue values are perceived uniformly. L* = 116 h(Y/Y_W) − 16, a* = 500 [h(X/X_W) − h(Y/Y_W)], b* = 200 [h(Y/Y_W) − h(Z/Z_W)], where (X_W, Y_W, Z_W) is the reference white and h(q) = q^(1/3) for q > 0.008856 and h(q) = 7.787 q + 16/116 otherwise.
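
A direct Python implementation of these formulas (the D65 reference white used as the default is an assumption; any reference white (X_W, Y_W, Z_W) can be passed in):

    import numpy as np

    def xyz_to_lab(X, Y, Z, white=(0.9505, 1.0, 1.089)):
        Xw, Yw, Zw = white

        def h(q):
            q = np.asarray(q, dtype=float)
            return np.where(q > 0.008856, np.cbrt(q), 7.787 * q + 16.0 / 116.0)

        L = 116.0 * h(Y / Yw) - 16.0
        a = 500.0 * (h(X / Xw) - h(Y / Yw))
        b = 200.0 * (h(Y / Yw) - h(Z / Zw))
        return L, a, b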

298 Pseudocolor images (6.3) Color images can be produced from originally intensity-only images for emphasizing some properties in the images. Intensity slicing (6.3.1) One may choose some intensity level above which the intensities are presented with a color or a continuum of colors. [Figure: original image and an intensity-sliced version with 8 pseudocolors.]

299 Intensity pseudocolor transforms (6.3.2) Each of the output color values I_R, I_G and I_B can be defined as a continuous function of the input intensity value. As a result each intensity will have its own color and close-by intensities will be mapped to close-by colors.

300 An example of intensity pseudocolor transform (6.3.2) Color components can be created e.g. with sinusoidal functions with different periods and phases. These can then be varied to produce sensations of different colors. The top/left pseudocolor transform succeeds in distinguishing the intensities that match the hidden explosive in the X-ray image.
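
A Python sketch of such a transform (the periods and phases are arbitrary example values, not those used in the slides):

    import numpy as np

    def sinusoid_pseudocolor(gray, periods=(60.0, 70.0, 80.0), phases=(0.0, 0.5, 1.0)):
        gray = np.asarray(gray, dtype=float)
        channels = [0.5 + 0.5 * np.sin(2.0 * np.pi * gray / p + ph)   # one sinusoid per channel
                    for p, ph in zip(periods, phases)]
        return np.stack(channels, axis=-1)                            # RGB values in [0, 1]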

301 Processing of parallel intensity images (6.3.2) Satellites and space probes often acquire images whose wavelengths do not match those of RGB color cameras. The visualizations of planets and their moons are therefore pseudocolor images. [Figure: blue, green, red and infrared bands of a Washington D.C. / Potomac river scene combined as R+G+B and as IR+G+B, and a pseudocolor image of Jupiter's moon Io.]

302 14.5 Color transforms (6.5) The general form of color transforms is the same as with spatial intensity images: g(x, y) = T[f(x, y)]. Now f(x, y) and g(x, y) are vector valued and T[·] is similar to that in the intensity transform operations where the neighborhood contains only the pixel itself: s_i = T_i(r_1, r_2, ..., r_n), i = 1, 2, ..., n, where s_i and r_i refer to the color components in the used color coordinate system, e.g. RGB, CMYK or HSI. Remember that H is periodic in HSI! An example: multiplication of the image intensity by a constant k, g(x, y) = k f(x, y): in HSI s_1 = r_1, s_2 = r_2, s_3 = k r_3; in RGB s_i = k r_i, i = 1, 2, 3; in CMY s_i = k r_i + (1 − k), i = 1, 2, 3.
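
The three variants of the intensity scaling example can be written out as a small Python sketch (components is an array with the three color components in its last dimension):

    import numpy as np

    def scale_intensity(components, k, model):
        r = np.asarray(components, dtype=float)
        if model == "RGB":
            return k * r                          # all three components scaled
        if model == "CMY":
            return k * r + (1.0 - k)              # subtractive model needs the offset
        if model == "HSI":
            s = r.copy()
            s[..., 2] = k * r[..., 2]             # only the intensity component changes
            return s
        raise ValueError("unknown color model")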

303 Color slicing (6.5.3) In color slicing one creates a pseudocolor image that emphasizes some colors and/or dims others. The preferred color is defined e.g. in the RGB space by its center (a_1, a_2, a_3) and the length l of the cube side or the radius R_0 of the sphere around it. In the sphere case: s_i = 0.5 if Σ_{j=1..n} (r_j − a_j)² > R_0², and s_i = r_i otherwise. [Figure: slicing neighborhoods shaped as a cube and as a sphere around the selected color.]
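
A Python sketch of the spherical slicing rule (RGB values assumed in [0, 1]; pixels outside the sphere are set to a neutral 0.5 grey):

    import numpy as np

    def color_slice_sphere(rgb, center, radius):
        rgb = np.asarray(rgb, dtype=float)
        dist2 = np.sum((rgb - np.asarray(center, dtype=float)) ** 2, axis=-1)
        out = rgb.copy()
        out[dist2 > radius ** 2] = 0.5            # suppress colors far from the chosen center
        return out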

304 Tone and color corrections (6.5.4) Incorrectly reproduced color tones can be corrected by applying tonal corrections to the color channels. Simple intensity corrections can be applied as an equal transform of the R, G and B values. The transforms typically need to be tuned or fine-tuned manually.

305 Histogram processing (6.5.5) Intensity histogram corrections such as equalization can also be applied to color images. Histogram equalization should never be applied to the R, G and B channels separately, but instead to the I component of the HSI color model!

306 Color image smoothing and sharpening ( ) The same spatial low- and high-pass filters used for intensity images can also be used for color images. Filtering can be implemented separately and equally for the R, G and B components or, more preferably, only for the I component of the HSI model so that the H, S and I components do not get mixed. [Figure panels: low-pass and high-pass filtering applied to the RGB components, to the I component only, and the difference RGB − I.]

307 Color segmentation ( ) Color-based segmentation can be implemented both in the HSI and RGB color spaces. Intensity thresholding is here replaced by distance measurements between colors. Compared to the simple color slicing principle presented earlier, a more general model can use the Mahalanobis distance between the pixel color z and the average color a to be found in the segmentation: D(z, a) = (z − a)^T C^(−1) (z − a), where C is the color covariance matrix estimated from a set of images.
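
A Python sketch of this idea (the sample pixels used to estimate a and C, and the threshold dist_thr, are assumptions of the example):

    import numpy as np

    def mahalanobis_segment(rgb, samples, dist_thr):
        samples = np.asarray(samples, dtype=float).reshape(-1, 3)
        a = samples.mean(axis=0)                          # average color to be found
        C_inv = np.linalg.inv(np.cov(samples, rowvar=False))
        z = np.asarray(rgb, dtype=float).reshape(-1, 3) - a
        d2 = np.einsum('ij,jk,ik->i', z, C_inv, z)        # (z - a)^T C^-1 (z - a) per pixel
        return (d2 <= dist_thr).reshape(np.asarray(rgb).shape[:-1])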

308 Color edge detection (6.7.3) A simple method for color edge detection is to search for edges in each color channel separately and then to sum the absolute values of the three gradients. This is not very accurate, as the gradient directions are neglected and they may be different in each channel. A better solution is to define a six-dimensional gradient in the RGB space spanned by the r, g and b unit vectors: u = R_x r + G_x g + B_x b and v = R_y r + G_y g + B_y b, where the subscripts denote the partial derivatives with respect to x and y.

309 With u and v one can derive three other values: g_xx = u·u = R_x² + G_x² + B_x², g_yy = v·v = R_y² + G_y² + B_y², g_xy = u·v = R_x R_y + G_x G_y + B_x B_y. And further the gradient direction in the image plane and its total magnitude: θ(x, y) = (1/2) arctan[2 g_xy / (g_xx − g_yy)], F_θ(x, y) = {(1/2) [g_xx + g_yy + (g_xx − g_yy) cos 2θ(x, y) + 2 g_xy sin 2θ(x, y)]}^(1/2).
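
These quantities map directly to a per-pixel computation; the Python sketch below uses np.gradient for the partial derivatives (an implementation choice, not specified in the slides):

    import numpy as np

    def color_gradient(rgb):
        rgb = np.asarray(rgb, dtype=float)
        dy, dx = np.gradient(rgb, axis=(0, 1))            # per-channel partial derivatives
        gxx = np.sum(dx * dx, axis=-1)
        gyy = np.sum(dy * dy, axis=-1)
        gxy = np.sum(dx * dy, axis=-1)
        theta = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy)
        F = np.sqrt(np.maximum(0.0, 0.5 * (gxx + gyy
                    + (gxx - gyy) * np.cos(2.0 * theta)
                    + 2.0 * gxy * np.sin(2.0 * theta))))
        return theta, F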

310 An example of color edge detection (6.7.3) [Figure panels: original image, the gradient computed from u and v, the per-channel gradient sum R + G + B, and the difference between the two.]

311 Noise in color images (6.8) Noise in color images is often modelled as three independent and identically distributed additive noise components in the RGB color space, which is quite accurate. Optical filters used in the camera lenses may emphasize noise in some components. When moving from the RGB to the HSI space the noise is no longer additive and identically distributed, especially not in the H and S channels.

312 Example of noise in color images (6.8) [Figure panels: a noisy RGB image and its H, S and I components.]
