Application Report
Real-Time Image Mosaic Construction
Authors: The Jackson Three (Michael Bach, Jan Haenel, Jian Kun Shen), University of Central Lancashire
Advising Professor: Dr Bogdan J. Matuszewski

ABSTRACT
The aim of this project is to show one possible implementation of a real-time image mosaic algorithm for mass-market hand-held digital still cameras, in order to improve their functionality. Offline image mosaic construction software, some of which is also provided by camera manufacturers, works after the fact. This online mosaic construction instead has to run in real time and must be capable of finding the transformation parameters between previously taken pictures and the live stream of images visible on the camera display. The primary aim of the proposed image mosaic construction is to help the user with camera guidance and picture acquisition. Camera guidance means finding the relation between previously taken pictures and the live video frames displayed on the camera's LCD. This ensures that all parts of the scene the user is interested in are included in the captured photos, and that there is sufficient overlap between them to form a high-quality single picture of the whole scene. As a result of the adopted constraints and assumptions, a simple rigid transformation model is used for mosaic construction. Due to the large amount of data that has to be processed, a high-performance TMS320C6000 DSP platform from Texas Instruments (TI) is employed to perform the mosaic construction. To prove that online real-time mosaic construction is possible, a development environment for TMS320C6700 processors was used, consisting of a DSK6701 board and Code Composer Studio. Image data exchange between the host and the DSK6701 board was realized using DSP/BIOS. This document was an entry in the TI DSP Challenge 2000, an annual contest organized by TI to encourage students from around the world to find innovative ways to use DSPs.
For more information on the TI DSP Challenge 2000, see TI's World Wide Web site at www.ti.com/sc/dsp_challenge.

INTRODUCTION
Among the many applications of image mosaic construction, the most important and probably the most often used in practice are increasing a camera's angle of view and enhancing resolution. There are two ways to obtain an image mosaic. The first, a hardware-based method, uses parabolic mirrors or time-synchronized multiple-camera systems; due to their cost, these are not suitable for the mass market. The second, a software-based method, post-processes the images to find their correct alignment, usually on a planar, cylindrical or spherical surface. This process requires uncovering the correspondence between overlapping images and correcting perspective distortions. Given today's broad presence of home PCs and acceptable software costs, this method seems suitable for the mass market. Offline mosaic construction, however, has the disadvantage that users themselves have to ensure sufficient overlap between the images while acquiring them. An implementation of online image mosaic construction that guides picture acquisition should simplify this process.

A broad spectrum of methods for image mosaic construction has been reported in the literature. They differ mainly in the class of transformation used to align the images in the mosaic and in the method used to estimate the unknown parameters of this transformation. In general, mosaic construction can be divided into several steps:
1. Finding the correspondence between the overlapping areas of the images
2. Transforming the images to align them
3. Blending information from the different images in the overlapping area.
The mosaic construction method described in this report uses a rigid transformation (translation and rotation only), even though in most cases it does not accurately model the relationship between overlapping images. The rigid transformation model was selected for its simplicity, which is important in a real-time implementation.
Additionally, the mosaic is displayed on a low-resolution LCD, so its imperfections will not be visible. In the described application the mosaic is used to guide image acquisition; a high-quality, high-resolution mosaic can still be constructed offline. The algorithm has been implemented on the TMS320C6711 DSK platform from TI. Implementation details along with profiling results are presented in this report.

Contents
1. Mosaic construction algorithm
1.1 Similarity measures
1.2 Pyramid decomposition
1.3 Overlap detection
1.4 Translation estimation
1.5 Rotation estimation
1.6 Polar transformation
2. Implementation of mosaic construction on TMS320C6711 DSK
3. Results

1 Mosaic construction algorithm
The mosaic construction process starts when the user decides to take a picture: the camera recognises the user's request and stores the image in the camera's image memory. While the taken image is being stored and processed, the camera keeps transferring images from the acquisition system to the LCD screen. Copies of these frames are taken from the video stream and matched online with the latest image taken by the user. This defines two images: image 1 is always the latest picture taken by the user, and image 2 is always the latest image from the live video stream. The mosaic construction algorithm uses this pair to guide picture acquisition. This suggests a possible user interface: the camera's rear LCD can show the latest taken picture (or the mosaic constructed so far) in the centre of the display, with the very latest image from the live video stream matched against it. This process is illustrated in Figure 1.

[Figure 1: Representation of the mosaic-based user interface guidance system, panels (a) and (b); labels: latest image from the video stream, latest image taken by the user, already taken images presented as a mosaic.]

Using the transformation parameters, the user guidance system then has to determine whether the camera can be moved further or whether it is time to take the next picture, so that the mosaic image covers the complete area. The main function used for mosaic construction is a procedure that takes two shifted and rotated images and computes their position relative to each other; this is the parameter estimation for the rigid transformation. A rigid transformation is assumed to be sufficient to describe the alignment of two images, ignoring spatial distortion. Image translation and rotation are basic image transformations, which can be described by a simple vector equation:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\alpha & -\sin\alpha & t_x \\ \sin\alpha & \cos\alpha & t_y \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (1)$$

The rigid transformation is a very simple model. Although it does not address spatial distortion, it satisfies the requirements for creating a mosaic for picture acquisition guidance.

[Figure 2: Rigid transformation model for mosaic construction (image A in the X-Y frame and image A' in the X'-Y' frame, related by the rotation angle α and the translation t_x, t_y).]

Figure 3 shows the estimation process. Keep in mind that the entire algorithm is controlled by a large set of parameters. They are omitted from this flow graph because including them would make the illustration too complex to analyse. The flow graph only illustrates the rough idea behind the method. It shows clearly that several levels of resolution are examined and that rotation and translation estimation work together in an iterative process.

[Figure 3: Flow graph of the mosaic construction algorithm (image A: latest taken image; image B: from the video stream; pyramid decomposition into A1...An and B1...Bn; on-level rotation and translation estimation; the N best-fit translation vectors and their angles are passed to the next level, and the best fit at the last level is passed to the user guidance system).]

As stated before, this method is a compromise between quality and computation speed. The algorithm is controlled by several parameters that trade quality against calculation speed. Lower quality here means larger errors in individual translation vectors and rotation angles. To achieve a high-quality mosaic, a non-real-time post-processing procedure has to find the mosaic with the lowest error. This can be done by PC software based on a similar method, but using a different parameter set together with an error equalisation algorithm and non-linear warping (to correct perspective distortion).

[Figure 4: Simplified block diagram of the camera acquisition guidance system (image acquisition system with optical system and CCD element; frame grabber; video stream and LCD monitor controller; image memory with the first and last taken images; rotation and translation vector estimation; user guidance system processing the rotation and translation data).]

The rigid transformation parameter estimation starts with a procedure that reduces computation time: the pyramid decomposition. The idea behind pyramid decomposition is that image similarities can be found much faster at lower resolution, because the images are much smaller. The quality of this estimate is obviously limited, so the next level of the pyramid deals with images of higher resolution. In these images only the areas around the best-fit positions from the previous level are examined. This again saves calculation time, because areas that are not of interest are skipped. This leads to an iterative process. Once the best translation fit has been found for a location, the best rotation is searched for at that point. Since the rotation has changed, the translation is corrected again, then the rotation again, and so on. The iteration terminates when a certain threshold for rotation or translation is reached; in this project, the threshold was applied to the translation only. After all best fits from the previous level have been corrected to new best fits, these points are passed to the next level of the pyramid. After the translation vector and the rotation angle have been corrected at the top level of the pyramid, the highest correlation value determines the final translation vector with its assigned angle α.

1.1 Similarity measures
To find the location of a sub-image B within another image A, B needs to be compared with A at each position in A. This requires methods that measure the similarity between B and areas taken from A.
To compare image A with image B, both need to have the same size. If one is larger than the other, areas of the same size as the smaller image have to be cut out of the larger image.

The goal is to express the similarity of these two identically sized images with a single number. Several methods exist, but only two are used in this project, and the selection is not arbitrary. Again, one method is faster but gives lower-quality results, whereas the other has a higher computation cost but yields more accurate parameter estimates. The methods are the Sum of Squared Differences (SSD) and the Cross Correlation Coefficient (CCC). SSD is very simple: it takes the intensities at the same pixel location in A and B, subtracts them and squares the difference. This is done for all overlapping pixels in A and B, and the sum of all squared differences is the final number that gives the similarity between A and B. The functions that define SSD and CCC are:

$$c_{SSD} = \sum_i \sum_j \big( I_A(x_i, y_j) - I_B(x_i, y_j) \big)^2 \qquad (2)$$

$$c_{CCC} = \frac{\sum_i \sum_j \big( I_A(x_i, y_j) - \bar{I}_A \big)\big( I_B(x_i, y_j) - \bar{I}_B \big)}{\sqrt{\sum_i \sum_j \big( I_A(x_i, y_j) - \bar{I}_A \big)^2 \, \sum_i \sum_j \big( I_B(x_i, y_j) - \bar{I}_B \big)^2}} \qquad (3)$$

where $\bar{I}_A$ and $\bar{I}_B$ denote the mean intensities of the compared areas of A and B.

The numerical results delivered by SSD and CCC differ not only in quality; their ranges also differ substantially. For SSD, the best fit between A and B is indicated by zero; if A and B have different content, SSD gives a number greater than zero. The SSD result depends on the image size, so for different image sizes different numbers can indicate the same level of similarity. The CCC is size-independent and also gives results in a smaller range: the correlation coefficients range from -1 up to +1 for the best fit.

1.2 Pyramid decomposition
Pyramid decomposition is the creation of n new images $i_1$ to $i_n$ from the original image $i_0$. The creation rule for this sequence is that image $i_{x+1}$ has half the resolution of image $i_x$ in both the x and y directions. The starting point is image $i_0$ with a resolution of m by n pixels.
According to the rule, the next image $i_1$ has a resolution of m/2 by n/2 pixels, and so on. The actual downsizing takes the 4 pixels of a 2x2 area of the source image and calculates their average value, which becomes the intensity of one pixel in the new image. Figure 5 shows the principle of pyramid decomposition on a small original image $i_0$ of 8 by 8 pixels. The next image is 4 by 4 pixels, and the final image ends up with 2 by 2 pixels.
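The 2x2 averaging rule can be sketched in C. This illustrates the rule under stated assumptions and is not the project's DSP code: images are 8-bit grey levels stored row by row, odd trailing rows or columns are dropped, and the mean is rounded to the nearest integer. The helper pyramid_depth and its min_side parameter are hypothetical; they express the practical stopping rule of a smallest level of roughly 60 pixels.

```c
#include <stdint.h>

/* Produce the next pyramid level: each output pixel is the rounded mean of
   a 2x2 block of the input, so width and height are halved (an odd last
   row or column is ignored).  src is w x h; dst must hold (w/2)*(h/2). */
void pyramid_downsample(const uint8_t *src, int w, int h, uint8_t *dst)
{
    int ow = w / 2, oh = h / 2;
    for (int y = 0; y < oh; ++y) {
        const uint8_t *r0 = src + (2 * y) * w;   /* upper row of the pair */
        const uint8_t *r1 = r0 + w;              /* lower row of the pair */
        for (int x = 0; x < ow; ++x) {
            int sum = r0[2 * x] + r0[2 * x + 1] + r1[2 * x] + r1[2 * x + 1];
            dst[y * ow + x] = (uint8_t)((sum + 2) / 4);
        }
    }
}

/* Hypothetical stopping rule: count how many halvings can be applied while
   the shorter side of the next level stays at least min_side pixels. */
int pyramid_depth(int w, int h, int min_side)
{
    int n = 0;
    while ((w / 2 < h / 2 ? w / 2 : h / 2) >= min_side) {
        w /= 2;
        h /= 2;
        ++n;
    }
    return n;
}
```

With min_side = 60, a 640x480 image is halved three times, down to 80x60 pixels, which corresponds to stopping at image (d) in Figure 6.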

[Figure 5: Pyramid decomposition. Original image $i_0$, size 8x8; 1st downsized image $i_1$, size 4x4; 2nd downsized image $i_2$, size 2x2.]

An image of 2 by 2 pixels cannot contain much useful image information. In practice, the final downsizing yields an image $i_n$ with a resolution of about 60 by 60 pixels. Such images turned out to be small enough for fast computation while still containing sufficient image information. The small images in Figure 5 only illustrate the principle.

[Figure 6: Example of the pyramid decomposition, panels (a) to (f).]

The effect of downsampling on an image is shown in Figure 6(a)-(f). This sequence shows Mr. Jian Kun Shen and Mr. Jan Haenel after a long day in the laboratory. The resolution of the original image (a) is 640x480 pixels; it goes down to image (f) with 20x15 pixels. This low resolution only serves to stress the effect; for practical applications one would stop at image (d) with a resolution of 80x60 pixels.

1.3 Overlap detection
The aim of overlap detection is to find the size of the largest square around the centroid (centre of gravity) of the overlapping area that fits into the overlapping area, for the subsequent translation and rotation estimation (Figure 7).

Given the coordinates of the corners of both images, i.e. 8 points, the intersections of all possible segment combinations can be calculated with respect to the local transformation parameters $T_x$, $T_y$ and α. This leads to 16 intersection points, of which at most 8 lie on actual segments of both images, i.e. are valid corner points of the overlapping polygon. The algorithm shown in Figure 8 finds all polygon vertices of the overlapping area in counter-clockwise order. These vertices consist of segment intersections and actual image corner points.

[Figure 7: Estimation of the centroid (x_c, y_c) and the square area for subsequent processing (overlapping area of image 1 and image 2, minimum distance r_min, square size R).]

[Figure 8: Block diagram of the algorithm to find the polygon vertices (input: [Tx Ty], α and image size; transform the corners of image 2; calculate segment intersections; select an initial valid intersection; the sign of the cross product of the involved segments decides whether to follow the segment of image 1 or image 2; continue until the initial intersection is reached; output: corners of the overlapping polygon, counter-clockwise).]

Using the obtained polygon vertices, the centroid and its perpendicular distances to all polygon segments can be calculated. The minimum distance $r_{min}$ defines the size of the sub-images for subsequent processing.

1.4 Translation estimation
Translation estimation uses one of the similarity measures to determine the best position of a sub-image of image B within an area of image A. The centre of gravity of the overlapping area is needed to find sub-images that can actually be correlated. Figures 9(b) and 9(c) show images A and B with vectors $c_a$ and $c_b$ pointing to the centre of gravity in the respective images. Sub-image B' can now be taken from B around the location that $[c_{bx}, c_{by}]^T$ points to. The maximum size of B' is a user-definable parameter. Before a search can take place, sub-image B' has to be rotated into the same coordinate frame as image A. This is necessary because image B is never actually rotated as shown in Figure 9(a); it always remains a rectangular image in memory.

[Figure 9: Correspondence between the image centres and the centroid of the overlapping area, panels (a)-(c); labels: translation t_x, t_y; $[c_{ax}, c_{ay}]^T$ from the centre of image A; $[c_{bx}, c_{by}]^T$ from the centre of image B.]

After B is rotated, $c_b$ also needs to be corrected by α so that it points to the same image area in the rotated image. An area A', larger than B', is then taken from A around $c_a$.

[Figure 10: Sliding sub-image B' in the search region A', producing an array of correlation results.]

Figure 10 illustrates the correlation method for matrices of unequal size. The smaller matrix, in this case B', is positioned at the top left corner of A'. Then, in a so-called sliding window process, B' is slid over the entire area of A'. The correlation is computed for every position of B' in A'. The result is a two-dimensional array of correlation results whose size depends on the sizes of A' and B'. Both sizes are user-definable; for simplicity A' and B' are usually square. An odd size is preferable for B', because the centre of B' then has an integer location.

[Figure 11: SSD and CCC results for translation estimation, panels (a)-(d).]

Figure 11(a) shows Mr. Jan Haenel and Mr. Bach during one of the rare breaks in this project work. Figure 11(b) is a 47x47-pixel sub-image taken from (a). Panels (c) and (d) show the correlation results for the CCC and SSD algorithms respectively. Both results indicate a best fit at the expected position. Note: because the content of one image ((b) in this case) was extracted from the other image (a), both correlation results appear to give the same quality of similarity response. This is not the case in a practical application, where both images suffer from distortion and different lighting conditions.

Not all elements of the correlation array are useful. Only the N highest values (where N is a parameter of the algorithm) are passed on for further processing. A function therefore has to find the correlation values that are local maxima and return these values together with their coordinates in the correlation array. The maxima are sorted by correlation value, so that the correlation results and the corresponding transformations appear in descending order in a table, starting with the highest correlation value.
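The sliding window process of Figure 10 can be sketched in C. For brevity this sketch uses SSD as the similarity measure (a CCC version differs only in the inner computation) and assumes 8-bit images stored row by row. The function name is hypothetical. The output array has (width(A') - width(B') + 1) x (height(A') - height(B') + 1) entries.

```c
#include <stdint.h>

/* Slide the bw x bh template bp over the aw x ah search region ap and store
   the SSD for every placement in out, laid out row by row.  out must hold
   (aw - bw + 1) * (ah - bh + 1) values; entry (0, 0) corresponds to bp
   placed at the top-left corner of ap. */
void sliding_ssd(const uint8_t *ap, int aw, int ah,
                 const uint8_t *bp, int bw, int bh, double *out)
{
    int ow = aw - bw + 1, oh = ah - bh + 1;
    for (int oy = 0; oy < oh; ++oy)
        for (int ox = 0; ox < ow; ++ox) {
            double sum = 0.0;
            for (int y = 0; y < bh; ++y)
                for (int x = 0; x < bw; ++x) {
                    double d = (double)ap[(oy + y) * aw + ox + x]
                               - bp[y * bw + x];
                    sum += d * d;
                }
            out[oy * ow + ox] = sum;
        }
}
```

The cost grows with the product of the number of placements and the template size, which is why the pyramid restricts A' to small regions around previous best fits.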
Selecting more than one result for further processing prevents the method from being trapped by a wrongly estimated result. Misleading results that are passed upwards will sooner or later be eliminated by estimates with a better match.
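Selecting the N best local maxima and sorting them in descending order can be sketched in C. This is a minimal sketch that treats a value as a local maximum when it is strictly greater than all of its up to eight neighbours. The Peak type and function names are hypothetical. For SSD, where zero is the best fit, the array would be negated first so that better fits give larger values.

```c
#include <stdlib.h>

/* One row of the result table: correlation value and its array position. */
typedef struct { double value; int x, y; } Peak;

/* qsort comparator: larger correlation values first. */
static int by_value_desc(const void *pa, const void *pb)
{
    double a = ((const Peak *)pa)->value, b = ((const Peak *)pb)->value;
    return (a < b) - (a > b);
}

/* Find the local maxima of a w x h correlation array, sort them in
   descending order and copy at most n_best of them into peaks.
   Returns the number of peaks copied. */
int best_local_maxima(const double *c, int w, int h, Peak *peaks, int n_best)
{
    int count = 0;
    Peak *all = malloc(sizeof(Peak) * (size_t)(w * h));
    if (!all) return 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double v = c[y * w + x];
            int is_max = 1;
            for (int dy = -1; dy <= 1 && is_max; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if ((dx || dy) && nx >= 0 && nx < w && ny >= 0 && ny < h
                        && c[ny * w + nx] >= v) {
                        is_max = 0;          /* a neighbour is not smaller */
                        break;
                    }
                }
            if (is_max) {
                all[count].value = v;
                all[count].x = x;
                all[count].y = y;
                ++count;
            }
        }
    qsort(all, (size_t)count, sizeof(Peak), by_value_desc);
    int kept = count < n_best ? count : n_best;
    for (int i = 0; i < kept; ++i) peaks[i] = all[i];
    free(all);
    return kept;
}
```

Restricting the list to local maxima keeps neighbouring positions of one strong peak from filling all N slots, so the alternatives that survive are genuinely different candidate alignments.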

1.5 Rotation estimation
As with translation, the rotation angle can be estimated by measuring similarity. If both sub-images A and B are transformed into polar coordinates, the same measure as proposed in the previous section can be applied again. The result is a vector containing the cross correlation for each possible rotation angle, in steps of the angle step width δϕ of the polar transformation. Figure 12 shows the operation in polar coordinates (a) and the corresponding operation in Cartesian coordinates (b). As Figure 12 shows, joining the ends of both polar-transformed images together leads to a so-called circular correlation.

[Figure 12: Circular correlation for rotation estimation (polar sub-images 1 and 2 over angles 0 to 360 and 720 degrees; Cartesian sub-images 1 and 2), panels (a)-(c).]

A fast implementation of the circular correlation can be achieved using the Discrete Fourier Transform. Since the Discrete Fourier Transform of a digital signal is inherently circular, each row of the polar images is transformed into the frequency domain using a 1D FFT (Fast Fourier Transform) algorithm. Corresponding rows (with one of the transforms conjugated) are then multiplied and transformed back into the spatial domain using an IFFT (Inverse Fast Fourier Transform). Finally, all obtained values for each separate angle are added together to obtain a vector containing the correlation results. The entire method yields the so-called circular cross correlation.

[Figure 13: Method for circular correlation using FFT (1D FFT of each polar-transformed sub-image, multiplication by the conjugate, 1D IFFT, summation Σ giving α).]

1.6 Polar transformation
In a Cartesian coordinate system, the location of a point p is given by two coordinates x and y, which are its distances from the origin along the respective axes. The same point p in a polar coordinate system is described by a distance coordinate r and an angle coordinate ϕ. The relationship between the two systems is:

$$x = r\cos\varphi, \qquad y = r\sin\varphi, \qquad r = \sqrt{x^2 + y^2}, \qquad \varphi = \tan^{-1}\frac{y}{x} \qquad (4)$$

The advantage of representing an image in polar rather than Cartesian coordinates is that a rotation of the image in Cartesian coordinates becomes a translation of the corresponding image in polar coordinates. This becomes obvious on closer inspection of the transformation.

[Figure 14: Image polar coordinates (source image with its Cartesian X-Y coordinate system; polar coordinate system shown in an arbitrary position in the image; image area covered by the transformation between the inner radius r_1 and the outer radius r_2).]

The transformation can be seen as sampling the image in the angular and radial directions. The result is an image whose y-axis holds the samples along the radius, while the x-axis holds these radial samples for different angles. The inner and outer radius limits r_1 and r_2 are introduced to avoid unnecessary sampling for r < r_1, where the sampling density is too high, and for r > r_2, where it is too low. This raises the question of the optimal number of angle and radial steps. Both depend on the original image and can be set to the needs of the application. Fewer steps are obviously faster to compute, but they also reduce the estimation resolution. In general, the angle steps should be chosen so that every pixel on the outer circle is covered by the transformation at least once. The same rule should be applied to the radial steps: every pixel along the radius should be hit at least once.

[Figure 15: Change from rotation to translation using the polar transformation: (a) un-rotated square and its polar transform (axes r and ϕ); (b) square rotated by 30 degrees and its polar transform.]

Figure 15 shows the result of the transformation on two input images. Sequence (a) shows an un-rotated square and its polar transformation; every angle step in this example covers 2 degrees. In both cases the origin of the transformation is the centre of the image. Sequence (b) shows the square rotated by 30 degrees counter-clockwise. Its polar transformation looks similar to (a), but the humps are shifted along the angle axis.

2. Implementation of mosaic construction on TMS320C6711 DSK
Figure 16 shows the structure of the final software development environment. As in any C program, the entry point for program execution is the main function. In this project, main is only used to get information about image sizes and to allocate memory for the images that will be generated. After the commands of main have been executed, the DSP returns to the idle loop, owing to the multitasking capabilities of DSP/BIOS. Whenever no processes started by main are running, the processor runs in the idle loop, where it continuously checks for interrupts. When a user transfers a file to the DSP via a host channel, an interrupt is raised. An internal interrupt manager then sets a certain bit in a byte called the mailbox. Once the mailbox has received two bits from incoming file transfer interrupts, DSP/BIOS calls the function mosaic, which is defined in mosaic.c. The function mosaic is the core function of the estimation process.

[Figure 16: Structure of the software development environment, panels (a) and (b): DSP/BIOS with software interrupts and idle loop; mosaic.c with void main(void) initialising the algorithm and global parameters; Read_File(Channel, a0) reading .dat files #1 and #2; rigid transformation parameter estimation using arithmetic.c; Write_File(Channel, a0) writing .dat file #3 with t_x, t_y and α.]
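The triggering logic can be modelled in plain C to make the mechanism explicit. This is a hypothetical host-side model, not DSP/BIOS code. The flag names, the MosaicTrigger type and on_transfer_complete are illustrative. On the DSK, the equivalent bookkeeping is done by a DSP/BIOS software interrupt and its mailbox, as described in the DSP/BIOS documentation.

```c
/* One mailbox bit per input image; mosaic runs when both are set. */
enum { IMAGE1_READY = 0x1, IMAGE2_READY = 0x2, BOTH_READY = 0x3 };

typedef struct {
    unsigned mailbox;   /* bits set by completed file transfers          */
    int runs;           /* how many times the estimation has been started */
} MosaicTrigger;

/* Called from the (simulated) transfer-complete interrupt for one image.
   Setting an already set bit has no further effect, so a repeated
   transfer of the same image does not start the estimation. */
void on_transfer_complete(MosaicTrigger *t, unsigned image_bit)
{
    t->mailbox |= image_bit;
    if (t->mailbox == BOTH_READY) {
        t->runs++;          /* here the real code would call mosaic() */
        t->mailbox = 0;     /* re-arm for the next pair of images      */
    }
}
```

The point of the mailbox is that the estimation never starts on a half-delivered image pair, regardless of the order in which the two transfers complete.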

Profiling results for some of the developed functions are shown in Figures 17 to 23. The functions were tested as stand-alone functions, using a test bench that emulates an environment supplying them with the necessary data.

[Figure 17: Pyramid decomposition, millions of clock cycles vs. image size (128x96, 256x192, 512x384, 1024x768); non-optimized and optimized.]

[Figure 18: Polar transformation with 100 radial samples, millions of clock cycles vs. angle steps (45, 90, 180, 270, 360); non-optimized and optimized.]

[Figure 19: Polar transformation with 180 angle steps, millions of clock cycles vs. radial samples (32, 64, 128, 256); non-optimized and optimized.]

[Figure 20: Performance of rotation, millions of clock cycles vs. image size (128x96, 256x192, 512x384, 1024x768); non-optimized and optimized.]

SSD, one of the key functions, was further measured with optimized assembler coding:

[Figure 21: 8-bit data load without optimization, number of clock cycles vs. number of elements (40, 80, 120); non-optimized assembler and non-optimized C.]

[Figure 22: 8-bit data load with optimization, number of clock cycles vs. number of elements (40, 80, 120); optimized assembler and optimized C.]

[Figure 23: 32-bit data load, assembler only, number of clock cycles vs. number of elements (40, 80, 120); no optimization and optimized.]

3 Results
This chapter presents some results of image mosaic construction. Figure 24 shows a mosaic consisting of two images. Figure 25 shows a mosaic of a famous Rome landmark, computed from nine pictures.

[Figure 24: Mosaic computed from two images showing the car park outside our Department. (a)-(b) pictures taken in the standard single-image camera mode; (c) mosaic constructed from images (a)-(b).]

[Figure 25: Image mosaic showing the Forum in Rome, constructed from nine images. (a) first image in the sequence, taken in the single-picture camera mode; (b) ninth image in the sequence, taken in the single-picture camera mode; (c) mosaic.]