
Chapter 11 Image Processing (Slide 517). Application area chosen because it has very good parallelism and interesting output.

Slide 518 Low-level Image Processing Operates directly on the stored image to improve/enhance it. The stored image consists of a two-dimensional array of pixels (picture elements), with the origin at (0, 0) and picture element (pixel) p(i, j) at row i, column j. Many low-level image-processing operations assume monochrome images and refer to pixels as having gray level values or intensities.

Slide 519 Computational Requirements Suppose a pixmap has 1024 x 1024 pixels and 8-bit pixels. The storage requirement is 2^20 bytes (1 Mbyte). Suppose each pixel must be operated upon just once. Then 2^20 operations are needed in the time of one frame. At 10^-8 second/operation (10 ns/operation), this would take 10 ms. In real-time applications, the speed of computation must be at the frame rate (typically 60-85 frames/second). All pixels in the image must be processed in the time of one frame; that is, in 12-16 ms. Typically, many high-complexity operations must be performed, not just one operation.

Slide 520 Point Processing Operations that produce output based upon the value of a single pixel. Thresholding Pixels with values above a predetermined threshold value are kept and the others are reduced to 0. Given a pixel, xi, the operation on each pixel is

if (xi < threshold) xi = 0; else xi = 1;
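As a sketch, the thresholding operation over a whole image might look like this in C (function and parameter names are assumptions, not from the slides):

```c
#include <stddef.h>

/* Threshold an image in place: pixels below the threshold become 0,
   the rest become 1, following the point operation on this slide.
   Illustrative sketch; the pixel type and names are assumptions. */
void threshold_image(unsigned char *p, size_t n, unsigned char threshold)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (p[i] < threshold) ? 0 : 1;
}
```

Since each output depends only on a single pixel, the loop iterations are fully independent, which is what makes point processing so easy to parallelize.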

Slide 521 Contrast Stretching The range of gray level values is extended to make details more visible. Given a pixel of value xi within the range xl to xh, the contrast is stretched to the range xL to xH by

xi' = (xi - xl) * ((xH - xL) / (xh - xl)) + xL

Gray Level Reduction The number of bits used to represent the gray level is reduced. A simple method would be to truncate the less significant bits.
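A minimal C sketch of the contrast-stretching formula (integer arithmetic; the function name is an assumption):

```c
/* Contrast stretching: map a pixel value x in [xl, xh] onto [xL, xH]
   using xi' = (xi - xl) * (xH - xL) / (xh - xl) + xL.
   The multiplication is done before the division to limit
   integer truncation error. Illustrative sketch only. */
int contrast_stretch(int x, int xl, int xh, int xL, int xH)
{
    return (x - xl) * (xH - xL) / (xh - xl) + xL;
}
```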

Slide 522 Histogram Shows the number of pixels in the image at each gray level. (Figure: plot of number of pixels versus gray level, 0-255.)

Slide 523 Sequential code

for (i = 0; i < height_max; i++)
  for (j = 0; j < width_max; j++)
    hist[p[i][j]] = hist[p[i][j]] + 1;

where the pixels are contained in the array p[][] and hist[k] will hold the number of pixels having the kth gray level. Similar to adding numbers to an accumulating sum, and similar parallel solutions can be used for computing histograms.
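One possible parallel variant of this loop, sketched with an OpenMP array reduction (the pragma and the 256-level assumption are mine; compiled without OpenMP the pragma is ignored and the loop simply runs sequentially):

```c
#include <string.h>

#define GRAY_LEVELS 256

/* Histogram of width*height 8-bit pixels. The OpenMP 4.5 array
   reduction gives each thread a private copy of hist that is summed
   at the end, mirroring the parallel accumulating-sum approach the
   slide alludes to. Sketch only; names are assumptions. */
void histogram(const unsigned char *p, int width, int height,
               long hist[GRAY_LEVELS])
{
    memset(hist, 0, GRAY_LEVELS * sizeof(long));
    #pragma omp parallel for reduction(+:hist[:GRAY_LEVELS])
    for (int i = 0; i < width * height; i++)
        hist[p[i]]++;
}
```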

Slide 524 Smoothing, Sharpening, and Noise Reduction Smoothing suppresses large fluctuations in intensity over the image area and can be achieved by reducing the high-frequency content. Sharpening accentuates the transitions, enhancing the detail. Noise reduction suppresses a noise signal present in the image.

Slide 525 Often requires a local operation with access to a group of pixels around the pixel to be updated. A common group size is 3 x 3:

x0 x1 x2
x3 x4 x5
x6 x7 x8

Slide 526 Mean A simple smoothing technique is to take the mean or average of a group of pixels as the new value of the central pixel. Given a 3 x 3 group, the computation is

x4' = (x0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8) / 9

where x4' is the new value for x4.
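A C sketch of the nine-point mean for one interior pixel (names and the border restriction are assumptions):

```c
/* New value of the center pixel as the mean of its 3x3 neighborhood,
   per the formula above. x is the image, stride its row width in
   pixels; (i, j) must not lie on the image border. Sketch only. */
int mean3x3(const unsigned char *x, int stride, int i, int j)
{
    int sum = 0;
    for (int di = -1; di <= 1; di++)        /* rows above/at/below */
        for (int dj = -1; dj <= 1; dj++)    /* cols left/at/right */
            sum += x[(i + di) * stride + (j + dj)];
    return sum / 9;
}
```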

Slide 527 Sequential Code Nine steps to compute the average for each pixel, or 9n steps for n pixels. A sequential time complexity of O(n).

Slide 528 Parallel Code The number of steps can be reduced by separating the computation into four data-transfer steps in lock-step, data-parallel fashion: Step 1, each pixel adds the pixel from its left; Step 2, each pixel adds the pixel from its right; Step 3, each pixel adds the pixel from above; Step 4, each pixel adds the pixel from below.

Slides 529-530 Parallel Mean Data Accumulation (figures, steps (a)-(d)): after Step 1 each position holds the sum of itself and its left neighbour (e.g., x3 + x4 at the centre); after Step 2, the full row sum (x3 + x4 + x5); Steps 3 and 4 then add the row sums from above and below, so that the centre position holds x0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8.

Slide 531 Median Sequential Code The median can be found by ordering the pixel values from smallest to largest and choosing the center pixel value (assuming an odd number of pixels). With a 3 x 3 group, suppose the values in ascending order are y0, y1, y2, y3, y4, y5, y6, y7, and y8. The median is y4. This suggests that all the values must first be sorted and then the fifth element taken. Using bubble sort, in which the lesser values are found first in order, sorting could in fact be terminated after the fifth lowest value is obtained. The number of steps is given by 8 + 7 + 6 + 5 + 4 = 30 steps, or 30n for n pixels.

Slide 532 Parallel Code An Approximate Sorting Algorithm First, a compare-and-exchange operation is performed on each of the rows, requiring three steps. For the ith row:

p(i,j-1) <-> p(i,j);  p(i,j) <-> p(i,j+1);  p(i,j-1) <-> p(i,j)

where <-> means compare and exchange if the left gray level is greater than the right gray level. Then the same is done on the columns:

p(i-1,j) <-> p(i,j);  p(i,j) <-> p(i+1,j);  p(i-1,j) <-> p(i,j)

The value in p(i,j) is taken to be the fifth largest pixel value. This does not always select the fifth largest value, but it is a reasonable approximation. Six steps in total.
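The six compare-and-exchange steps can be sketched for a single 3 x 3 group in C (in the parallel algorithm the exchanges run across the whole image in lock-step; this per-window version is only an illustration and, like the slide's method, does not always return the exact median):

```c
/* Compare-and-exchange: put the smaller value on the left. */
static void cmp_exch(int *a, int *b)
{
    if (*a > *b) { int t = *a; *a = *b; *b = t; }
}

/* Approximate median of a 3x3 group x[0..8] (row-major): three
   compare-and-exchange steps on each row, then three on each column;
   the resulting center value is taken as the approximate median. */
int approx_median3x3(int x[9])
{
    for (int r = 0; r < 3; r++) {        /* row steps */
        cmp_exch(&x[3*r],     &x[3*r + 1]);
        cmp_exch(&x[3*r + 1], &x[3*r + 2]);
        cmp_exch(&x[3*r],     &x[3*r + 1]);
    }
    for (int c = 0; c < 3; c++) {        /* column steps */
        cmp_exch(&x[c],     &x[c + 3]);
        cmp_exch(&x[c + 3], &x[c + 6]);
        cmp_exch(&x[c],     &x[c + 3]);
    }
    return x[4];                         /* center value */
}
```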

Slide 533 Approximate median algorithm requiring six steps (figure; labels: largest in row, next largest in row, next largest in column).

Slide 534 Weighted Masks The mean method can be described by a weighted 3 x 3 mask. Suppose the weights are w0, w1, w2, w3, w4, w5, w6, w7, and w8, and the pixel values are x0, x1, x2, x3, x4, x5, x6, x7, and x8. The new center pixel value, x4', is given by

x4' = (w0 x0 + w1 x1 + w2 x2 + w3 x3 + w4 x4 + w5 x5 + w6 x6 + w7 x7 + w8 x8) / k

The scale factor, 1/k, is set to maintain the correct grayscale balance. Often, k is given by w0 + w1 + w2 + w3 + w4 + w5 + w6 + w7 + w8.

Slide 535 Using a 3 x 3 Weighted Mask The mask of weights

w0 w1 w2
w3 w4 w5
w6 w7 w8

is applied to the pixels

x0 x1 x2
x3 x4 x5
x6 x7 x8

to produce the result x4'. The summation of products, sum of wi xi, from the two functions w and x is the (discrete) cross-correlation of f with w (written f (x) w).

Slide 536 Mask to compute the mean:

1 1 1
1 1 1        k = 9
1 1 1

x4' = (x0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8) / 9

A noise reduction mask:

1 1 1
1 8 1        k = 16
1 1 1

x4' = (8x4 + x0 + x1 + x2 + x3 + x5 + x6 + x7 + x8) / 16

A high-pass sharpening filter mask:

-1 -1 -1
-1  8 -1     k = 9
-1 -1 -1

x4' = (8x4 - x0 - x1 - x2 - x3 - x5 - x6 - x7 - x8) / 9
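The three masks above can all be applied with one generic routine. A minimal C sketch (the function name and the clamping to 0-255 are my assumptions, not from the slides):

```c
/* Apply a 3x3 weighted mask w (with scale factor 1/k) to the 3x3
   pixel group x, both row-major, as in the formula on Slide 534.
   Results are clamped to 0..255 since sharpening masks can
   overshoot the gray-level range. Illustrative sketch only. */
int apply_mask3x3(const int x[9], const int w[9], int k)
{
    int sum = 0;
    for (int i = 0; i < 9; i++)
        sum += w[i] * x[i];   /* sum of products wi * xi */
    sum /= k;                 /* scale factor 1/k */
    if (sum < 0)   sum = 0;
    if (sum > 255) sum = 255;
    return sum;
}
```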

Slide 537 Edge Detection Highlighting the edges of objects, where an edge is a significant change in gray level intensity. Gradient and Magnitude With a one-dimensional gray level function, f(x), the first derivative, df/dx, measures the gradient. An edge is recognized by a positive-going or negative-going spike at a transition.

Slide 538 Edge Detection using Differentiation (figure: an intensity transition, its first derivative, and its second derivative).

Slide 539 Image Function A two-dimensional discretized gray level function, f(x, y). Gradient (magnitude):

grad f = sqrt( (df/dx)^2 + (df/dy)^2 )

Gradient direction:

phi(x, y) = arctan( (df/dy) / (df/dx) )

where phi is the angle with respect to the y-axis. The gradient can be approximated, for reduced computational effort, as

grad f ~ |df/dx| + |df/dy|

Slide 540 Gray Level Gradient and Direction (figure: the image plane with axes x and y, the surface f(x, y), a line of constant intensity, and the gradient direction phi).

Slide 541 Edge Detection of Image Function The image is a discrete two-dimensional function. The derivative is approximated by differences: df/dx is the difference in the x-direction and df/dy is the difference in the y-direction.

Slide 542 Edge Detection Masks One might compute the approximate gradient using x5 and x3 (to get df/dx) and x7 and x1 (to get df/dy); i.e.,

df/dx ~ x5 - x3
df/dy ~ x7 - x1

so that

grad f ~ |x7 - x1| + |x5 - x3|

Slide 543 Two masks are needed, one to obtain x7 - x1 and one to obtain x5 - x3, and the absolute values of the results of each mask are added together:

0 -1  0        0  0  0
0  0  0       -1  0  1
0  1  0        0  0  0

Slide 544 Prewitt Operator The approximate gradient is obtained from

df/dy ~ (x6 - x0) + (x7 - x1) + (x8 - x2)
df/dx ~ (x2 - x0) + (x5 - x3) + (x8 - x6)

Then

grad f ~ |x6 - x0 + x7 - x1 + x8 - x2| + |x2 - x0 + x5 - x3 + x8 - x6|

which requires using the two 3 x 3 masks.

Slide 545 Prewitt operator masks:

-1 -1 -1       -1  0  1
 0  0  0       -1  0  1
 1  1  1       -1  0  1

Slide 546 Sobel Operator The derivatives are approximated to

df/dy ~ (x6 + 2x7 + x8) - (x0 + 2x1 + x2)
df/dx ~ (x2 + 2x5 + x8) - (x0 + 2x3 + x6)

Operators implementing first derivatives will tend to enhance noise. However, the Sobel operator also has a smoothing action.

Slide 547 Sobel operator masks:

-1 -2 -1       -1  0  1
 0  0  0       -2  0  2
 1  2  1       -1  0  1
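A C sketch combining the two Sobel masks with the |df/dx| + |df/dy| magnitude approximation for one 3 x 3 group (function name assumed):

```c
#include <stdlib.h>

/* Sobel gradient magnitude at the center of a 3x3 group x[0..8]
   (row-major), using the two masks above and the sum-of-absolute-
   values approximation of the magnitude. Illustrative sketch. */
int sobel3x3(const int x[9])
{
    int gy = (x[6] + 2*x[7] + x[8]) - (x[0] + 2*x[1] + x[2]);
    int gx = (x[2] + 2*x[5] + x[8]) - (x[0] + 2*x[3] + x[6]);
    return abs(gx) + abs(gy);
}
```

A uniform region gives magnitude 0; a vertical intensity step gives a large gx and zero gy.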

Edge Detection with Sobel Operator Slide 548 (a) Original image (Annabel) (b) Effect of Sobel operator

Slide 549 Laplace Operator The Laplace second-order derivative is defined as

grad^2 f = d^2 f/dx^2 + d^2 f/dy^2

approximated to

grad^2 f = 4x4 - (x1 + x3 + x5 + x7)

which can be obtained with the single mask:

 0 -1  0
-1  4 -1
 0 -1  0

Slide 550 Pixels used in the Laplace operator: the center pixel x4 with the upper pixel x1, left pixel x3, right pixel x5, and lower pixel x7.

Slide 551 Effect of Laplace operator (figure).

Slide 552 Hough Transform The purpose is to find the parameters of equations of lines that most likely fit sets of pixels in an image. A line is described by the equation y = ax + b, where the parameters a and b uniquely describe the particular line: a the slope and b the intercept on the y-axis. A search for those lines with the most pixels mapped onto them would be computationally prohibitively expensive [O(n^3)].

Slide 553 Suppose the equation of the line is rearranged as

b = -xa + y

Every point that lies on a specific line in the x-y space will map into the same point (a, b) in the a-b space (parameter space). (Figure: a pixel (x1, y1) on the line y = ax + b in the (x, y) plane maps to the line b = -x1 a + y1 in the parameter space; all pixels on the same image line intersect at the point (a, b).)

Slide 554 Finding the Most Likely Lines In the mapping process, discrete values are used to a coarse prescribed precision, and the computation is rounded to the nearest possible a-b coordinates. The mapping process is done for every point in the x-y space. A record is kept of those a-b points that have been obtained by incrementing the corresponding accumulator. Each accumulator holds the number of pixels that map into a single point in the parameter space. The points in the parameter space with locally maximum numbers of pixels are chosen as lines.

Slide 555 Unfortunately, this method will fail for vertical lines (i.e., with the slope, a, infinite and with the y intercept, b, infinite) and for lines that approach this extreme. To avoid the problem, the line equation is rearranged into polar coordinates:

r = x cos(theta) + y sin(theta)

where r is the perpendicular distance of the line to the origin in the original (x, y) coordinate system and theta is the angle between r and the x-axis. theta is, very conveniently, the gradient angle of the line (with respect to the x-axis).

Slide 556 (Figure: the line y = ax + b in the (x, y) plane, with r its perpendicular distance to the origin and theta the angle of r; the line maps to the single point (r, theta) in the (r, theta) plane.)

Slide 557 Implementation Assume the origin is at the top left corner (figure: x to the right, y downward, with r and theta measured from that origin).

Slide 558 The parameter space is divided into small rectangular regions, with one accumulator for each region. The accumulators of those regions that a pixel maps into are incremented. The process is done for all pixels in the image. If all values of theta were tried (i.e., incrementing theta through all its values), the computational effort would be given by the number of discrete values of theta, say k intervals. With n pixels the complexity is O(kn). The computational effort can be reduced significantly by limiting the range of lines for individual pixels using some criteria. A single value of theta could be selected based upon the gradient of the line.

Slide 559 Accumulators, acc[r][theta], for the Hough transform (figure: a two-dimensional grid of accumulator cells indexed by quantized values of r and theta).

Slide 560 Sequential Code Sequential code could be of the form

for (x = 0; x < xmax; x++)      /* for each pixel */
  for (y = 0; y < ymax; y++) {
    sobel(&x, &y, dx, dy);      /* find x and y gradients */
    magnitude = grad_mag(dx, dy);  /* find magnitude if needed */
    if (magnitude > threshold) {
      theta = grad_dir(dx, dy);    /* atan2() fn */
      theta = theta_quantize(theta);
      r = x * cos(theta) + y * sin(theta);
      r = r_quantize(r);
      acc[r][theta]++;             /* increment accumulator */
      append(r, theta, x, y);      /* append point to line */
    }
  }

Slide 561 Parallel Code Since the computation for each accumulator is independent of the other accumulations, it could be performed simultaneously, although each requires read access to the whole image. Left as an exercise.

Slide 562 Transformation into the Frequency Domain Fourier Transform Many applications in science and engineering. In image processing, the Fourier transform is used for image enhancement, restoration, and compression. An image is a two-dimensional discretized function, f(x, y), but we first start with the one-dimensional case. For completeness, let us first review the results of Fourier series and Fourier transform concepts from first principles.

Slide 563 Fourier Series The Fourier series is a summation of sine and cosine terms:

x(t) = a0/2 + sum over j = 1 to infinity of [ aj cos(2 pi j t / T) + bj sin(2 pi j t / T) ]

T is the period (1/T = f, where f is a frequency). By some mathematical manipulation:

x(t) = sum over j = -infinity to infinity of Xj e^(2 pi i j t / T)

where Xj is the jth Fourier coefficient in complex form and i = sqrt(-1). (The Fourier coefficients can also be computed from specific integrals.)

Slide 564 Fourier Transform Continuous Functions The previous summation is developed into an integral:

x(t) = integral from -infinity to infinity of X(f) e^(2 pi i f t) df

where X(f) is a continuous function of frequency. The function X(f) can be obtained from

X(f) = integral from -infinity to infinity of x(t) e^(-2 pi i f t) dt

X(f) is the spectrum of x(t), or the Fourier transform of x(t). The original function, x(t), can be obtained from X(f) using the first integral given, which is the inverse Fourier transform.

Slide 565 Discrete Functions For functions having a set of N discrete values, replace the integral with a summation, leading to the discrete Fourier transform (DFT):

X_k = (1/N) * sum over j = 0 to N-1 of x_j e^(-2 pi i j k / N)

and the inverse discrete Fourier transform:

x_k = sum over j = 0 to N-1 of X_j e^(2 pi i j k / N)

for 0 <= k <= N - 1. The N (real) input values, x_0, x_1, x_2, ..., x_(N-1), produce N (complex) transform values, X_0, X_1, X_2, ..., X_(N-1).

Slide 566 Fourier Transforms in Image Processing A two-dimensional Fourier transform is

X_lm = sum over j = 0 to N-1, sum over k = 0 to M-1 of x_jk e^(-2 pi i (jl/N + km/M))

where 0 <= j <= N - 1 and 0 <= k <= M - 1. Assume the image is square, where N = M.

Slide 567 The equation can be rearranged into

X_lm = sum over j = 0 to N-1 of [ sum over k = 0 to N-1 of x_jk e^(-2 pi i k m / N) ] e^(-2 pi i j l / N)

The inner summation is a one-dimensional DFT operating on the N points of a row to produce a transformed row. The outer summation is a one-dimensional DFT operating on the N points of a column. Hence the transform can be divided into two sequential phases, one operating on rows of elements and one operating on columns:

X_lm = sum over j = 0 to N-1 of X_jm e^(-2 pi i j l / N)

where X_jm is the result of the row transforms.

Slide 568 Two-Dimensional DFT (figure: transform the rows of x_jk to give X_jm, then transform the columns to give X_lm).

Slide 569 Applications Frequency filtering can be described by the convolution operation:

h(j, k) = g(j, k) * f(j, k)

where g(j, k) describes the weighted mask (filter) and f(j, k) the image. The Fourier transform of a convolution of functions is given by the product of the transforms of the individual functions. Hence, the convolution of two functions can be obtained by taking the Fourier transforms of each function, multiplying the transforms

H(j, k) = G(j, k) x F(j, k)

(element-by-element multiplication), where F(j, k) is the Fourier transform of f(j, k) and G(j, k) is the Fourier transform of g(j, k), and then taking the inverse transform to return the result to the spatial domain.

Slide 570 Convolution using Fourier Transforms (figure: (a) direct convolution of the image f(j, k) with the filter g(j, k) to give h(j, k); (b) transform f and g, multiply F(j, k) by G(j, k) element by element, and inverse-transform H(j, k) to obtain h(j, k)).

Slide 571 Parallelizing the Discrete Fourier Transform Starting from

X_k = (1/N) * sum over j = 0 to N-1 of x_j e^(-2 pi i j k / N)

and using the notation w = e^(-2 pi i / N):

X_k = (1/N) * sum over j = 0 to N-1 of x_j w^(jk)

The w terms are called twiddle factors; each input is multiplied by a twiddle factor. The inverse transform can be obtained by replacing w with w^(-1).

Slide 572 Sequential Code

for (k = 0; k < N; k++) {    /* for every point */
  X[k] = 0;
  for (j = 0; j < N; j++)    /* compute summation */
    X[k] = X[k] + w^(j*k) * x[j];
}

where X[k] is the kth transformed point, x[j] is the jth input, and w = e^(-2 pi i / N). The summation requires complex number arithmetic. Can be rewritten:

for (k = 0; k < N; k++) {
  X[k] = 0;
  a = 1;
  for (j = 0; j < N; j++) {
    X[k] = X[k] + a * x[j];
    a = a * w^k;
  }
}

where a is a temporary variable.

Slide 573 Elementary Master-Slave Implementation One slave process of N slave processes is assigned to produce one transformed value; i.e., the kth slave process produces X[k]. Parallel time complexity with N (slave) processes is O(N). (Figure: the master process sends w^0, w^1, ..., w^(N-1) to the N slave processes, which produce X[0], X[1], ..., X[N-1].)

Slide 574 Pipeline Implementation Unfolding the inner loop for X[k], we have

X[k] = 0;
a = 1;
X[k] = X[k] + a * x[0]; a = a * w^k;
X[k] = X[k] + a * x[1]; a = a * w^k;
X[k] = X[k] + a * x[2]; a = a * w^k;
X[k] = X[k] + a * x[3]; a = a * w^k;
...

Each pair of statements

X[k] = X[k] + a * x[j]; a = a * w^k;

could be performed by a separate pipeline stage.

Slide 575 One stage of a pipeline implementation of the DFT algorithm (figure: stage j receives X[k], a, and w^k, computes X[k] + a * x[j] and a * w^k, and passes the values on for the next iteration).

Slide 576 Discrete Fourier transform with a pipeline (figure: the initial values X[k] = 0 and a = 1 enter stage P0; stages P0, P1, P2, P3, ..., P(N-1) hold x[0], x[1], x[2], x[3], ..., x[N-1], and the output sequence X[0], X[1], X[2], X[3], ... emerges from the last stage).

Slide 577 Timing diagram (figure: pipeline stages P0 through P(N-1) against time; the transformed values X[0], X[1], X[2], ... emerge from the final stage in successive cycles).

Slide 578 DFT as a Matrix-Vector Product The kth element of the discrete Fourier transform is given by

X_k = (1/N) * (x_0 + x_1 w^k + x_2 w^(2k) + x_3 w^(3k) + ... + x_(N-1) w^((N-1)k))

so the whole transform can be described by a matrix-vector product X = (1/N) W x, where W is the N x N matrix whose (k, j) element is w^(jk):

W = [ 1  1        1          1          ...  1
      1  w        w^2        w^3        ...  w^(N-1)
      1  w^2      w^4        w^6        ...  w^(2(N-1))
      1  w^3      w^6        w^9        ...  w^(3(N-1))
      ...
      1  w^(N-1)  w^(2(N-1)) w^(3(N-1)) ...  w^((N-1)(N-1)) ]

(Note w^0 = 1.) Hence, the parallel methods for matrix-vector product as described in Ch. 10 can be used for the discrete Fourier transform.

Slide 579 Fast Fourier Transform A method of obtaining the discrete Fourier transform with a time complexity of O(N log N) instead of O(N^2). Let us start with the discrete Fourier transform equation:

X_k = (1/N) * sum over j = 0 to N-1 of x_j w^(jk)

where w = e^(-2 pi i / N).

Slide 580 The summation can be split into its even- and odd-indexed terms; each part is an N/2-point discrete Fourier transform operating on the N/2 even points and the N/2 odd points, respectively:

X_k = (1/2) * (X_even_k + w^k X_odd_k)

for k = 0, 1, ..., N - 1, where X_even is the N/2-point DFT of the numbers with even indices, x_0, x_2, x_4, ..., and X_odd is the N/2-point DFT of the numbers with odd indices, x_1, x_3, x_5, ....

Slide 581 Now, suppose k is limited to 0, 1, ..., N/2 - 1, the first N/2 values of the total N values. The complete sequence can be divided into two parts:

X_k = (1/2) * (X_even_k + w^k X_odd_k)

and

X_(k+N/2) = (1/2) * (X_even_k + w^(k+N/2) X_odd_k) = (1/2) * (X_even_k - w^k X_odd_k)

since w^(k+N/2) = -w^k, where 0 <= k < N/2. Hence, we can compute X_k and X_(k+N/2) using two N/2-point transforms:

Slide 582 Decomposition of an N-point DFT into two N/2-point DFTs (figure: the input sequence x_0, x_1, ..., x_(N-1) is split; the even-indexed points feed an N/2-point DFT producing X_even and the odd-indexed points an N/2-point DFT producing X_odd; X_odd is multiplied by w^k, and X_k and X_(k+N/2) are formed from the sum and difference, for k = 0, 1, ..., N/2 - 1).

Slide 583 Each of the N/2-point DFTs can be decomposed into two N/4-point DFTs and the decomposition could be continued until single points are to be transformed. A 1-point DFT is simply the value of the point.

Slide 584 The computation is often depicted as a butterfly diagram (figure: a four-point discrete Fourier transform, with inputs x0, x1, x2, x3 combined through two stages of add/subtract operations into outputs X0, X1, X2, X3).

Slide 585 Sequential Code The sequential time complexity is essentially O(N log N), since there are log N steps and each step requires a computation proportional to N, where there are N numbers. The algorithm can be implemented recursively or iteratively.

Slide 586 Parallelizing the FFT Algorithm Binary Exchange Algorithm (figure: the sixteen-point FFT computational flow, from inputs x0 ... x15 to outputs X0 ... X15).

Slide 587 Mapping processors onto the 16-point FFT computation (figure: with four processes P0-P3 and one row per element, each process is allocated four consecutive rows of the flow diagram, identified by the two most significant bits of the 4-bit row index; inputs x0 ... x15, outputs X0 ... X15).

16-point FFT without transpose on four processors, each processor holding a column. Two of the four phases involve non-local communication.

(Figure: 16-point FFT with the elements arranged in a 4 x 4 square across P0-P3; the communications in the four phases are local, local, non-local, non-local.)

FFT transpose on four processors. Apart from the transpose operation, all communication is local.

(Figure: FFT transpose algorithm on P0-P3; the communications in the successive phases are local, local, transpose, local, local.)

Slide 588 Transpose Algorithm If the processors are organized as a two-dimensional array, communications first take place between processors in each column, and then between processors in each row. (Figure: FFT using the transpose algorithm, first two steps, with P0-P3 each holding four of the elements x0 ... x15.)

Slide 589 During the first two steps, all communication is within a processor. During the last two steps, the communication is between processors. Between the first two steps and the last two steps, the array elements are transposed. (Figure: the elements after the transpose, distributed across P0-P3.)

Slide 590 After the transpose, the last two steps proceed, but now involve communication only within the processors. The only communication between processors is to transpose the array. (Figure: FFT using the transpose algorithm, last two steps; after the transpose, P0 holds x0, x4, x8, x12; P1 holds x1, x5, x9, x13; P2 holds x2, x6, x10, x14; P3 holds x3, x7, x11, x15.)