Image Frame Fusion using 3D Anisotropic Diffusion

Fatih Kahraman 1, C. Deniz Mendi 1, Muhittin Gökmen 2
1 TUBITAK Marmara Research Center, Informatics Institute, Kocaeli, Turkey
2 ITU Computer Engineering Department, 34469, Istanbul, Turkey
{fatih.kahraman, deniz.mendi}@bte.mam.gov.tr, gokmen@itu.edu.tr

Abstract
In this paper, a modified 3D anisotropic diffusion method is proposed to improve multi-frame image fusion performance. The multi-frame image sequence is assumed to be composed of aligned and warped images. The goal of this approach is to obtain a restored image from the aligned and warped image sequence in which both the alignment error and the Gaussian noise are reduced. The proposed method consists of a medium band stack filter and a tree-structured 3D diffusion filter.

I. INTRODUCTION
The use of surveillance cameras is increasing rapidly. For security purposes, it may be necessary to obtain a single image, improved either in resolution or in visual quality, from different scenes of the video data. In addition, when a poor-quality video is used to investigate a certain object, several images of the same scene can be extracted from it, and after these images are processed and enhanced, a better-quality picture can be obtained. Several recent studies focus on this issue [1-8]. The most successful spatio-temporal image restoration methods are increasingly used to enhance the quality of low-resolution video or image sequences. Such applications are crucial and provide an important add-on to forensic investigation as evidence [3]. In commercial applications such as face recognition, car plate identification or other object investigation, a common approach is to use the spatio-temporal information of the video sequence.
To increase the performance of such applications, the restored image can be derived from a multi-frame image sequence gathered from consecutive frames of different scenes. In this work, we focus on restoring the image from such a multi-frame image sequence. The objects in these frames are all aligned and warped to the same reference shape. Consequently, the sequence is expected to contain alignment artefacts and additive noise, which are removed by 3D anisotropic diffusion filtering. In Section 2, the warping algorithm, frame fusion and the 3D diffusion process are discussed. Section 3 is devoted to the proposed method. The results of the proposed method are presented in Section 4. Section 5 presents the conclusions and future plans.

II. MULTI-FRAME IMAGE RESTORATION
Multi-frame image restoration can be achieved in different ways. One is a data-driven approach in which several image frames of the same object are aligned to a reference frame. Restoration of the aligned image sequence can be performed after eliminating the variation stemming from pose and image perspective. In the current work, this elimination is carried out by (i) annotating prominent object features, i.e. face features (see Figure 1.a), and (ii) filtering out the effects stemming from affine variations (translation, rotation and scaling) by a piecewise affine warp onto a reference shape. After this step, the image can be restored from the multi-frame image sequence by using these shape-compensated images. Note that the multi-frame sequence is assumed to be composed of frames gathered from consecutive frames of different scenes, all warped to the same reference shape. All warped image frames are stacked to form an image volume (see Figure 1.c).

A. Image Warping
The object under investigation in a video stream is modelled by a triangle-based approach. The shape of the object is labelled by fiducial points (landmarks). Based on the landmarks, a triangulated mesh is produced for the reference position and orientation of the object. For each image in the sequence, a piecewise affine warp [8][9][10] is defined between corresponding triangles (see Figures 1.a and 1.b). Warping is a crucial step in this study, since the image volume cannot be created unless the individual frames are warped. In particular, a piecewise affine warp is defined for each pair of triangles: for every triangle in the mesh of an input frame, there exists a corresponding triangle in the reference mesh. The procedure can be summarized as follows: for any pixel of the reference mesh, determine the triangle in which it lies, and map it to the corresponding triangle of the input frame by means of that triangle pair's affine warp. The affine warps are therefore defined by the input image sequence. In a real implementation, the fiducial points may be shifted because of poor resolution or high noise in the image, which causes alignment errors in the warped images.
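The triangle-by-triangle procedure above can be illustrated with a minimal NumPy sketch. It assumes that the triangles of the reference mesh and of the input frame are already given as matching lists of vertex coordinates, uses barycentric coordinates to find the reference triangle that contains a pixel, and samples the input frame with nearest-neighbour interpolation. The helper names (`affine_from_triangles`, `barycentric`, `warp_points`, `warp_image`) are hypothetical and not taken from the paper.

```python
import numpy as np


def affine_from_triangles(src, dst):
    """2x3 matrix A such that A @ [x, y, 1] maps each vertex of src onto dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    P = np.hstack([src, np.ones((3, 1))])  # homogeneous source vertices, one per row
    return np.linalg.solve(P, dst).T       # solves P @ M = dst, so A = M.T


def barycentric(pt, tri):
    """Barycentric coordinates of pt with respect to the triangle tri (3x2)."""
    a, b, c = np.asarray(tri, dtype=float)
    T = np.column_stack([b - a, c - a])
    l1, l2 = np.linalg.solve(T, np.asarray(pt, dtype=float) - a)
    return np.array([1.0 - l1 - l2, l1, l2])


def warp_points(points, ref_tris, src_tris):
    """Map reference-mesh points into an input frame, triangle by triangle.

    Points outside every reference triangle are returned as NaN.
    """
    out = []
    for p in np.asarray(points, dtype=float):
        for ref, src in zip(ref_tris, src_tris):
            if np.all(barycentric(p, ref) >= -1e-9):  # p lies inside (or on) ref
                A = affine_from_triangles(ref, src)
                out.append(A @ np.append(p, 1.0))
                break
        else:
            out.append(np.array([np.nan, np.nan]))
    return np.array(out)


def warp_image(src_img, ref_tris, src_tris, out_shape):
    """Backward-map every reference pixel into src_img (assumption: nearest neighbour)."""
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)  # (x, y) pairs
    mapped = warp_points(pts, ref_tris, src_tris)
    out = np.zeros(H * W)
    valid = ~np.isnan(mapped).any(axis=1)
    cols = np.clip(np.rint(mapped[valid, 0]).astype(int), 0, src_img.shape[1] - 1)
    rows = np.clip(np.rint(mapped[valid, 1]).astype(int), 0, src_img.shape[0] - 1)
    out[valid] = src_img[rows, cols]
    return out.reshape(H, W)


# Usage: a square split into two triangles; the input frame is the same square shifted by (2, 3).
ref = [[[0, 0], [10, 0], [10, 10]], [[0, 0], [10, 10], [0, 10]]]
src = [[[2, 3], [12, 3], [12, 13]], [[2, 3], [12, 13], [2, 13]]]
print(warp_points([[1, 1], [9, 8]], ref, src))
```

Because every triangle pair has its own affine map, a misplaced landmark only distorts the triangles that share it, which is why annotation errors show up as local jitter along edges rather than as a global shift.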

B. Frame Averaging
Multi-frame image enhancement can be achieved by averaging the frames of the image sequence. This is a useful technique for reducing additive noise. However, frame fusion techniques based on averaging cause a smoothing effect, so a significant further improvement can be achieved by applying deblurring image restoration filters. The multi-frame averaging can thus be expressed as

$$\bar{f} = \frac{1}{N}\sum_{k=1}^{N} h\left(f_k\right) \qquad (1)$$

where $f_k$ denotes the k-th frame of the multi-frame sequence, N is the number of frames in the sequence, h is the spatial deblurring filter applied to each $f_k$, and $\bar{f}$ is the averaged frame. Here, h is taken as median filtering, Wiener filtering or 2D anisotropic diffusion filtering; unweighted averaging corresponds to taking h as the identity function. The results of these methods are discussed in Section 4, Experimental Results.

Figure 1: Generation of the image volume using warped image frames; a) landmarks annotated on the image frames, b) triangulated mesh based on the landmarks, c) image volume formed from the warped images. (The right-most figure in (c) shows half of the image volume.)

Frame averaging has the benefit of reducing additive noise, whereas the Wiener filter deblurs the image by means of the point spread function, which represents the blurring characteristics. Similarly, other non-linear denoising filters, such as median and anisotropic diffusion filters, are also implemented in order to enhance the high frequencies of the image. On the other hand, removing the additive noise by averaging also smooths out high spatial frequencies. If each frame is corrupted by zero-mean additive Gaussian noise with variance $\sigma^2$, the noise variance of the averaged estimate equals $\sigma^2/N$, where N is the number of frames. Decreasing the variance of the restored image by averaging brings a smoothing effect, whilst details are often enhanced by deblurring filters.
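Eq. (1) and the $\sigma^2/N$ noise reduction can be checked numerically with a short NumPy sketch. It assumes frames stored as an (N, height, width) array, treats h as an optional per-frame callable (None meaning the identity, i.e. plain averaging), and uses a simple 3x3 median as one example of h; the function names are illustrative only.

```python
import numpy as np


def median3x3(frame):
    """Example spatial filter h: 3x3 median with edge padding."""
    p = np.pad(frame, 1, mode="edge")
    H, W = frame.shape
    windows = [p[i:i + H, j:j + W] for i in range(3) for j in range(3)]
    return np.median(windows, axis=0)


def fuse_frames(frames, h=None):
    """Eq. (1): average h(f_k) over the N frames; h=None means the identity."""
    frames = np.asarray(frames, dtype=float)
    if h is None:
        return frames.mean(axis=0)
    return np.mean([h(f) for f in frames], axis=0)


# Zero-mean Gaussian noise of variance sigma2 on a flat (all-zero) frame.
rng = np.random.default_rng(0)
N, sigma2 = 100, 0.35
noisy = rng.normal(0.0, np.sqrt(sigma2), size=(N, 64, 64))

plain = fuse_frames(noisy)                 # residual variance should be close to sigma2 / N
filtered = fuse_frames(noisy, median3x3)   # per-frame median before averaging
print(noisy[0].var(), plain.var(), sigma2 / N)
```

On a flat ground truth the averaged frame has no detail to lose, so the sketch only illustrates the variance term; on real images the same averaging also blurs edges that are not perfectly aligned.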
The two-fold problem of reducing noise without losing detail becomes more complicated for aligned and warped multi-frame sequences because of incorrectly located fiducial marks and the geometric deformations caused by the warping methods. The later sections address these typical problems.

C. 3D Anisotropic Diffusion Filter
The anisotropic diffusion filter offers the flexibility of smoothing the image while keeping its high frequencies. Relying on this property, the anisotropic diffusion filter is applied to restore the misaligned multi-frame image sequence with additive noise. The diffusion process equilibrates concentration differences throughout the image or, similarly, throughout the volume. The nonlinear 3D anisotropic diffusion filter is expressed as

$$\frac{\partial I(x,y,z,t)}{\partial t} = \operatorname{div}\,\Phi(x,y,z,t) \qquad (2)$$

$$\Phi(x,y,z,t) = c(x,y,z,t)\,\nabla I(x,y,z,t) \qquad (3)$$

where I(x,y,z,t) is the multi-frame image sequence and t denotes the iteration (diffusion) time, not the temporal index of the frames. In the physical process of diffusion, Φ corresponds to the flow function. Φ is controlled by the conduction coefficient c and the gradient vector ∇I. The conduction coefficient regulates the flow rate in the diffusion equations through the flow constant K, such that the flow increases with the gradient strength |∇I| as long as |∇I| < K. This property of the diffusion process is crucial in image enhancement. The flow is strong in smooth regions, where |∇I| < K, whereas the edges of the image are preserved by limiting the flow where |∇I| > K. Therefore, K acts as the level below which the image is smoothed and above which edges are preserved. Meanwhile, noise is filtered according to the local structures of the image, which increases the signal-to-noise ratio without significant distortion of the edges. Anisotropic diffusion filtering also has some drawbacks. The diffusion process is essentially an averaging process over neighbouring pixels; consequently, the image becomes smoother as the diffusion time t in Eqs. (2) and (3), i.e. the number of iterations, increases.
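The role of K as a threshold can be made concrete. The text does not give the form of c, so the sketch below assumes the rational Perona-Malik edge-stopping function $c = 1/(1 + (|\nabla I|/K)^2)$; with this choice the flow magnitude $|\Phi| = c\,|\nabla I|$ grows while $|\nabla I| < K$, peaks at exactly $|\nabla I| = K$, and shrinks beyond it.

```python
import numpy as np


def conduction(grad_mag, K):
    """Assumed edge-stopping function c = 1 / (1 + (|grad I| / K)^2)."""
    return 1.0 / (1.0 + (grad_mag / K) ** 2)


def flow(grad_mag, K):
    """Magnitude of the flow Phi = c * grad I from Eq. (3)."""
    return conduction(grad_mag, K) * grad_mag


K = 0.2
s = np.linspace(0.0, 1.0, 1001)    # gradient magnitudes |grad I|
phi = flow(s, K)
s_peak = s[np.argmax(phi)]          # with this c, the flow is largest at |grad I| = K
print(s_peak)
```

Other common choices, such as the exponential Perona-Malik function, shift the peak to a fixed fraction of K but keep the same qualitative behaviour.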
In this paper, 3D nonlinear anisotropic diffusion is applied to the multi-frame image sequence. Each frame is warped from an original image, and the warped frames are aligned in a multi-frame sequence. No temporal motion is assumed between the frames. Instead, the edges are assumed to be shifted by warping errors, so jitter-type errors are most likely to appear around the edges of the image. Taking into account both the warping error and the additive Gaussian noise in the multi-frame sequence, a 3D anisotropic diffusion filter is proposed to derive a single restored image. Since the 3D diffusion filter reduces smoothing at the boundaries of the object, this behaviour can be exploited by assigning a different flow constant K to each Cartesian axis.
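A minimal explicit discretisation of Eqs. (2) and (3) with one flow constant per axis might look as follows. This is a sketch under stated assumptions, not the authors' filter: it uses only the 6 face neighbours instead of the 3×3×3 kernel with 26 directions described in Section III.B, assumes the rational conduction function, and imposes zero flow across the volume border so that the mean intensity is preserved. The volume is indexed (z, y, x) and K = (K_z, K_y, K_x).

```python
import numpy as np


def conduction(d, K):
    """Assumed edge-stopping function c = 1 / (1 + (d / K)^2)."""
    return 1.0 / (1.0 + (d / K) ** 2)


def diffuse3d(vol, K=(0.2, 0.2, 0.2), n_iter=5, dt=1.0 / 7.0):
    """Explicit 3D anisotropic diffusion on the 6-neighbourhood (simplification).

    vol is indexed (z, y, x); K holds one flow constant per axis, in the same order.
    Zero flow is imposed across the volume border, so the mean intensity is conserved.
    dt <= 1/6 keeps the explicit scheme stable because c <= 1.
    """
    I = np.asarray(vol, dtype=float).copy()
    for _ in range(n_iter):
        update = np.zeros_like(I)
        for axis, k in enumerate(K):
            d = np.diff(I, axis=axis)                      # differences between neighbours
            f = conduction(np.abs(d), k) * d               # flow Phi = c * grad I per interface
            pad = [(0, 0)] * I.ndim
            pad[axis] = (1, 1)                             # zero flow through the outer faces
            update += np.diff(np.pad(f, pad), axis=axis)   # discrete divergence
        I += dt * update
    return I


# A vertical step edge of height 1 along x: a small K_x preserves it, a large K_x blurs it.
vol = np.zeros((4, 8, 8))
vol[:, :, 4:] = 1.0
sharp = diffuse3d(vol, K=(1.0, 1.0, 0.05))
soft = diffuse3d(vol, K=(1.0, 1.0, 10.0))
print(sharp[0, 0, 4] - sharp[0, 0, 3], soft[0, 0, 4] - soft[0, 0, 3])
```

Choosing a large K_z and a moderate in-plane K is one way to read the per-axis idea in the text: intensities are averaged strongly across frames, while edges within each frame are still protected.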

III. TREE-STRUCTURED 3D DIFFUSION FILTERING
In this paper, the aligned image sequence is considered for image restoration. Each frame of the aligned image sequence contains an object that is warped to a certain position by means of fiducial points marked on the original image. Therefore, an incorrectly located fiducial point results in a wrong warp, and the geometry is deformed. As a result, the edges of the warped images along the x- and y-coordinates contain jitter errors. This is to be expected, because the fiducial marks are mostly located along the boundaries of the objects.

A. Medium Band Stack Filter
To improve the performance of the 3D diffusion filter, the image pixels are sorted along the z-coordinate of the multi-frame image sequence, and the frames at both ends of the pixel-wise reordered sequence are truncated. The authors name this approach the Medium Band Stack (MBS) filter. For a sequence of N frames of size X × Y, MBS is expressed as

$$\tilde{I}(x,y,z) = \operatorname{sort}_z\big(I(x,y,z)\big), \quad x = 1,\dots,X;\; y = 1,\dots,Y;\; z = 1,\dots,N \qquad (4)$$

$$I_c(x,y,z) = \tilde{I}(x,y,z+\alpha N), \quad z = 1,\dots,(1-2\alpha)N \qquad (5)$$

where the sort operation arranges values in ascending order. Through the sorting process, the one-dimensional vector of each pixel along the z-coordinate is sorted. In the following step, expressed in Eq. (5), αN slices are removed from each end of the image sequence, so the sequence is truncated to an X × Y × (1 − 2α)N volume. One advantage of medium band stack filtering as a pre-processing step before 3D diffusion filtering is the removal of impulsive errors from the image sequence. Because of the ascending sort, the cropped frames consist mostly of the darkest and lightest values, with insignificant object texture. Therefore, relatively uncritical information and the noisiest values are removed.
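Eqs. (4) and (5) amount to a per-pixel sort along the frame axis followed by trimming. The NumPy sketch below assumes a volume indexed (z, y, x) and rounds αN to the nearest integer number of slices, which the text does not specify.

```python
import numpy as np


def medium_band_stack(vol, alpha=0.2):
    """Medium band stack filter, Eqs. (4)-(5).

    vol is indexed (z, y, x) with N frames along z. Each pixel's N values are
    sorted in ascending order, then round(alpha * N) slices are dropped from
    each end (assumption: rounding), leaving about (1 - 2 * alpha) * N slices.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 0.5)")
    vol = np.asarray(vol, dtype=float)
    n = vol.shape[0]
    cut = int(round(alpha * n))
    stacked = np.sort(vol, axis=0)   # Eq. (4): per-pixel ascending sort along z
    return stacked[cut:n - cut]      # Eq. (5): keep the medium band


# Impulses in single frames end up in the trimmed slices.
vol = np.full((10, 4, 4), 0.5)
vol[0] = 1.0   # bright impulse in one frame
vol[9] = 0.0   # dark impulse in another frame
print(medium_band_stack(vol, alpha=0.2).shape)
```

Because the sort is done independently per pixel, the output slices are no longer frames of the video: slice z holds the z-th smallest value of every pixel, which is why the first retained slice of the checker board data looks dark and the last looks bright.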
Another advantage of the sorting operator is that neighbouring slices of $I_c$ hold the closest intensities along the z-coordinate, so the 3D diffusion process adapts to the monotonically increasing intensity levels rather than to intensity variations caused by alignment and warping errors.

B. Tree-Structured Diffusion Filtering
The three-dimensional diffusion process is applied to the cropped multi-frame image volume $I_c$. It has been observed empirically that partitioning the volume into sub-regions along the z-axis and carrying out the diffusion process in each region separately improves the enhancement quality in terms of signal-to-noise ratio. The diffusion filter is denoted as

$$D_i(R_j) \qquad (6)$$

which signifies the i-th diffusion process applied to region $R_j$, where $L_j$ is the number of frames in $R_j$. The 3D diffusion filter kernel is chosen as a 3×3×3 volume. The conduction coefficient c in Eq. (3) is defined separately for each of the 26 directions. The flow constant depends on direction: one value, $K_{xy}$, is used within the frame plane and another, $K_z$, along the z-axis, and the coefficients of all directions are controlled by these two values. Using the notation of Eq. (6), the tree-structured diffusion process consists of the following steps:

1. The truncated image sequence volume is divided into three sub-regions $R_1$, $R_2$ and $R_3$. The region boundaries are located where the second derivative of the image covariance vanishes. Each region is diffused, giving

$$R_j^{(1)} = D_1(R_j), \quad j = 1, 2, 3 \qquad (7)$$

In this step of the algorithm, $K_z$ is set to a high value in order to smooth along the z-axis.

2. The second step focuses on smoothing the side regions with respect to each other, so the side regions $R_1^{(1)}$ and $R_3^{(1)}$ are combined into one reordered volume (8), where $L_1$ and $L_3$ are the numbers of frames in regions $R_1$ and $R_3$, respectively. This diffusion step conducts flow between the far pixel values located at equal distances from the mid-point, so that they equilibrate each other.
Following this diffusion step, the frames are reordered back to their original positions. The outputs of this step are denoted $R_1^{(2)}$ and $R_3^{(2)}$.

3. In the last step of the algorithm, all processed regions are diffused together:

$$I^{(3)} = D_3\big([\,R_1^{(2)};\ R_2^{(1)};\ R_3^{(2)}\,]\big) \qquad (9)$$

where $[\,\cdot\,;\,\cdot\,;\,\cdot\,]$ denotes stacking the regions along the z-axis. This step aims to enhance the image sequence by merging the diffusion results of all regions.

The output of the tree-structured anisotropic diffusion filter is averaged to provide a single restored image. A benefit of the tree-structured algorithm is that it yields more than one solution: the averaged outputs of the individual sub-region diffusion filters can serve as alternative restored images. Therefore, the best restored image can be chosen from the candidates

$$\big\{\, A(I^{(3)}),\ A(R_1^{(1)}),\ A(R_2^{(1)}),\ A(R_3^{(1)}),\ A(R_1^{(2)}),\ A(R_3^{(2)}) \,\big\} \qquad (10)$$

where A denotes the averaging operation of Eq. (1). One drawback of the proposed method is that it requires at least 10 frames. It is nevertheless a useful technique for aligned images from a video sequence, where more samples of the object are potentially available. With a limited number of frames, the number of regions can be decreased; conversely, if enough frames are available, the tree can be extended by increasing the number of regions.
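The three steps can be strung together in a compact NumPy sketch. It is a hedged illustration, not the authors' implementation: the diffusion uses the 6-neighbourhood and an assumed rational conduction function; the region boundaries are placed at equal thirds instead of where the second derivative of the image covariance vanishes; and `mirror_order` is a hypothetical reading of the reordering in Eq. (8), interleaving slices of R1 and R3 that lie at equal distances from the mid-point of the volume. All function names and default parameters are illustrative.

```python
import numpy as np


def diffuse3d(vol, K, n_iter=3, dt=1.0 / 7.0):
    """Explicit 6-neighbour anisotropic diffusion; K = (K_z, K_y, K_x), assumed c = 1/(1+(d/K)^2)."""
    I = np.asarray(vol, dtype=float).copy()
    for _ in range(n_iter):
        update = np.zeros_like(I)
        for axis, k in enumerate(K):
            d = np.diff(I, axis=axis)
            f = d / (1.0 + (d / k) ** 2)                   # flow c * grad I on each interface
            pad = [(0, 0)] * I.ndim
            pad[axis] = (1, 1)                             # zero flow across the border
            update += np.diff(np.pad(f, pad), axis=axis)
        I += dt * update
    return I


def medium_band_stack(vol, alpha):
    """Eqs. (4)-(5): sort along z and keep the middle (1 - 2 alpha) N slices."""
    n = vol.shape[0]
    cut = int(round(alpha * n))
    return np.sort(vol, axis=0)[cut:n - cut]


def mirror_order(n1, n3):
    """Hypothetical ordering for Eq. (8): slices of R1 and R3 at equal distance
    from the mid-point of the volume become z-neighbours."""
    r1 = list(range(n1 - 1, -1, -1))   # R1 slices, inner edge first
    r3 = list(range(n1, n1 + n3))      # R3 slices (offset by n1), inner edge first
    order = []
    for k in range(max(n1, n3)):
        order += r1[k:k + 1] + r3[k:k + 1]
    return np.array(order)


def tree_structured(vol, alpha=0.2, K_xy=0.2, K_z=1.0, n_iter=3):
    """Three-region tree-structured diffusion; returns candidate images as in Eq. (10)."""
    K = (K_z, K_xy, K_xy)                  # high K_z smooths strongly along z
    stack = medium_band_stack(np.asarray(vol, dtype=float), alpha)
    L = stack.shape[0]
    b1, b2 = L // 3, 2 * L // 3            # assumption: equal thirds
    regions = [stack[:b1], stack[b1:b2], stack[b2:]]
    # Step 1, Eq. (7): diffuse each region separately.
    step1 = [diffuse3d(r, K, n_iter) for r in regions]
    # Step 2, Eq. (8): diffuse the reordered side regions, then restore their order.
    n1 = step1[0].shape[0]
    side = np.concatenate([step1[0], step1[2]])
    perm = mirror_order(n1, step1[2].shape[0])
    side = diffuse3d(side[perm], K, n_iter)[np.argsort(perm)]
    r1_2, r3_2 = side[:n1], side[n1:]
    # Step 3, Eq. (9): diffuse the re-assembled volume.
    step3 = diffuse3d(np.concatenate([r1_2, step1[1], r3_2]), K, n_iter)
    # Each candidate is an average over z, as in Eq. (1).
    return {
        "all": step3.mean(axis=0),
        "region1": step1[0].mean(axis=0),
        "region2": step1[1].mean(axis=0),
        "region3": step1[2].mean(axis=0),
    }


rng = np.random.default_rng(0)
vol = 0.5 + rng.normal(0.0, 0.3, size=(30, 16, 16))
candidates = tree_structured(vol)
print(sorted(candidates))
```

Returning a dictionary of candidates mirrors the point made above: the final image is chosen from several averaged outputs rather than fixed in advance.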

Figure 2: The checker board test sequence. a) and b) Two successive frames showing the alignment error; the crossings of the white and black squares are randomly shifted within ±2 pixels. c) A randomly selected frame of the test sequence, corrupted by Gaussian noise with a variance of 0.35. d) Average of 100 frames (SNR 5.56 dB). e) Median filtering (SNR 7.56 dB). f) Wiener filter (SNR 5.44 dB). g) Average of the 2D diffusion-filtered frames of the test sequence (SNR 5.56 dB). h) 3D diffusion filtering of the whole test sequence (SNR 5.51 dB).

Figure 3: Tree-structured 3D anisotropic diffusion filtering results; a) output derived from all regions (10.08 dB), b) output from Region 2 (11.43 dB).

IV. EXPERIMENTAL RESULTS
In this section, we compare the distortion values obtained in experiments on three sets of test data. The denoising performance of the proposed method is presented and discussed, and other image restoration techniques are included for comparison. Frame averaging, median filtering, Wiener filtering and 2D/3D anisotropic diffusion filtering are applied to the test data.

A. Test Data
a) Checker Board Data: Incorrect alignment is simulated by randomly shifting the boundaries of the squares by up to 2 pixels. The test data are corrupted by Gaussian noise with a variance of 0.35. There are 100 frames, each of 160×160 pixels. Figures 2.a and 2.b show the misalignment error, and Figure 2.c shows a sample frame of the checker board test video stream.
b) Pointer Data: This test data set demonstrates the performance improvement of the proposed method on a real video. The recorded video stream shows two triangles and a line of text from several viewing angles (Fig. 4.a).
The video frames are corrupted by Gaussian noise with a variance of 0.2. Using the landmarks shown in Figure 4.b, the object under investigation is aligned and warped. The incorrect alignment caused by annotation errors is visible in Figures 4.c, 4.d and 4.e as undulating and blurred edges. The test stream consists of 90 frames of 150×140 pixels.
c) Face Data: This test data set applies the proposed method to a real noisy video containing a more complex object. The person in the recorded video slowly turns his face by 45°. Gaussian noise with a variance of 0.15 is added to the test stream. The 73 landmarks shown in Figure 1 are used to annotate the face. The test stream contains 38 frames of 78×68 pixels (Fig. 5.b).

B. Performance Measurements
The performance of the proposed method is measured by the signal-to-noise ratio (SNR), defined as

$$\mathrm{SNR} = 10\log_{10}\frac{\sum x_{gt}^{2}}{\sum \left(x - x_{gt}\right)^{2}} \qquad (11)$$

where x is the observed image under evaluation, $x_{gt}$ is the ground-truth image, and the sums run over all pixels. The ground-truth checker board data contain no alignment error and no additive noise. For the face and pointer data sets, the ground-truth image is the image onto which each individual image of the test sequence is warped. Figures 4.a and 5.a show the ground-truth images for the pointer and face test data, respectively.

C. Results of Tree-Structured 3D Diffusion Filtering
Before diffusion filtering, MBS (medium band stack) filtering is applied to the checker board test data (see Figure 2). After the MBS step, the first frame of the checker board output volume is a very dark image with enlarged black squares; along the z-axis, the white squares grow and the pixel intensities increase towards the white level.
The noisy parts of the image frames are pushed to both ends of the frame sequence, so the less meaningful portions of the sorted image sequence are cropped away, as indicated by α in Eq. (5). The value of α is 0.2 for all test data. The proposed 3D diffusion method is organised as a tree, in which diffusion filtering is applied to each region, that is, at each level of the tree, with a separate set of parameters and flow constant vector K. The result of the proposed 3D diffusion filtering is not necessarily a unique solution: the highest-quality image can also be chosen from the sub-region diffusion filter outputs. This choice relies on the user's experience, which is usually available in forensic evidence evaluation or medical imaging applications.
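The checker board experiment and the SNR measure of Eq. (11) can be approximated with the following NumPy sketch. The generator is an assumption-laden stand-in for the paper's data: it uses a hypothetical square size of 20 pixels, which the text does not state, and simulates misalignment by moving each interior grid line independently by an integer offset in [-2, 2] per frame. The 100 frames, 160×160 size and noise variance of 0.35 follow Section IV.A.

```python
import numpy as np


def snr_db(x, x_gt):
    """Eq. (11): ratio of ground-truth energy to error energy, in dB."""
    x = np.asarray(x, dtype=float)
    x_gt = np.asarray(x_gt, dtype=float)
    return 10.0 * np.log10(np.sum(x_gt ** 2) / np.sum((x - x_gt) ** 2))


def checkerboard(size=160, square=20, shift=None):
    """Binary checkerboard; shift=(dx, dy) moves each interior grid line (square size is hypothetical)."""
    lines = np.arange(square, size, square)                   # interior grid-line positions
    lx, ly = (lines, lines) if shift is None else (lines + shift[0], lines + shift[1])
    cx = np.searchsorted(lx, np.arange(size), side="right")   # column index of each square
    cy = np.searchsorted(ly, np.arange(size), side="right")   # row index of each square
    return ((cy[:, None] + cx[None, :]) % 2).astype(float)


def make_sequence(n_frames=100, size=160, square=20, max_shift=2, noise_var=0.35, seed=0):
    """Misaligned, noisy checkerboard frames plus the clean ground truth."""
    rng = np.random.default_rng(seed)
    n_lines = len(np.arange(square, size, square))
    frames = [
        checkerboard(size, square, rng.integers(-max_shift, max_shift + 1, size=(2, n_lines)))
        for _ in range(n_frames)
    ]
    clean = np.stack(frames)
    noisy = clean + rng.normal(0.0, np.sqrt(noise_var), size=clean.shape)
    return noisy, checkerboard(size, square)


noisy, gt = make_sequence()
print(snr_db(noisy[0], gt), snr_db(noisy.mean(axis=0), gt))
```

The SNR values produced by this sketch depend on the assumed square size and shift model, so they should not be expected to match Table 1 exactly.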

Figure 4: a) Ground-truth image. b) Landmarks used for triangulation. c), d), e) Three samples of aligned images. f) The input test sequence used for the performance measurements; the noise variance is 0.2. g) Conventional 3D diffusion filtering (6.33 dB). h) Result of the proposed filtering method (7.26 dB). i) Region 2 result (7.53 dB).

The image restoration performance of frame averaging (Fig. 2.d), median filtering (Fig. 2.e) and Wiener filtering (Fig. 2.f) is shown in Figure 2. The median filter recovers the dark and light regions of the image; however, the edges are smoothed along the misaligned regions (Fig. 2.e). The Wiener filter and averaging smooth the whole image (Figs. 2.d and 2.f). Figures 2.g and 2.h show that averaging 2D diffusion outputs and 3D diffusion give quite similar results. According to Table 1, frame averaging and the Wiener filter outperform median filtering by roughly 1 to 2 dB on the face data, whereas median filtering is about 2 dB better on the checker board data; on the pointer data the three methods are within about 0.2 dB of each other.

Figure 5: Performance of the proposed method on the face data. a) Ground-truth image. b) Noise-corrupted frame (var = 0.15). c) Tree-structured 3D anisotropic diffusion filtering result. d) Region 2 result of the tree-structured 3D diffusion filtering. The nose and mouth are zoomed and presented in the same order.

The individual frames of the test sequences have low SNRs: between 0.99 and 2.21 dB for the checker board, between 1.02 and 3.12 dB for the pointer data, and between 5.12 and 12.15 dB for the face data. The well-known image restoration methods improve on these values considerably, typically by about 3 dB or more.
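The SNR gains quoted in this section are differences between Table 1 entries. The short Python sketch below copies a few rows of Table 1 (in dB, ordered as checker board, pointer, face) and recomputes the gains; the dictionary keys and the `gain` helper are illustrative only.

```python
# SNR values (dB) copied from Table 1: (checker board, pointer, face).
table1 = {
    "frame averaging": (5.56, 6.16, 14.98),
    "median filter": (7.56, 6.27, 13.05),
    "3-D diffusion, 5 iterations": (5.48, 6.30, 14.13),
    "tree-structured 3D": (10.08, 7.26, 16.13),
    "tree-structured 3D, region 2": (11.43, 7.53, 16.28),
}


def gain(method, baseline):
    """SNR gain of method over baseline for each data set, rounded to 0.01 dB."""
    return tuple(round(m - b, 2) for m, b in zip(table1[method], table1[baseline]))


print(gain("tree-structured 3D", "frame averaging"))
print(gain("tree-structured 3D, region 2", "3-D diffusion, 5 iterations"))
```

The checker board gains of the overall and Region 2 outputs over frame averaging come out as 4.52 dB and 5.87 dB, and the pointer-data gain of the Region 2 output over 5-iteration 3D diffusion as 1.23 dB, matching the figures discussed below.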
Table 1 lists the SNR values for the checker board, pointer and face data sets. The proposed 3D diffusion filtering method is tested on all three image sequences. The tree-structured filtering emphasises the property of diffusion filtering of smoothing within regions while preserving the boundaries between them. This can be observed on the checker board image shown in Figure 2.

Figure 6: a) SNR improvement for noise variances between 0.05 and 0.7; b) SNR improvement versus the length of the frame sequence. Triangles denote the proposed tree-structured 3D diffusion method, squares frame averaging and circles the conventional 3D diffusion method.

Figure 3.a shows the overall output of the tree-structured 3D diffusion, that is, the averaged output of the last step, Eq. (9), whereas Figure 3.b shows the averaged Region 2 output. Both figures have sharp edges, and the black and white region intensities are closer to their original levels. The SNR is 10.08 dB for the overall output and 11.43 dB for the Region 2 output, i.e. improvements of 4.52 dB and 5.87 dB, respectively, over the frame averaging method. For the pointer data set, the SNR is 7.26 dB for the overall output and 7.53 dB for the Region 2 output, while conventional 3D anisotropic diffusion reaches about 6.30 dB, very close to direct frame averaging. These measurements indicate that the tree-structured 3D diffusion method improves on the conventional approach by up to about 1.23 dB. The SNR improvement on the pointer data set is smaller than on the checker board data set. This is expected, since the pointer data contain more severe alignment errors. Figures 4.c, 4.d and 4.e show three aligned and warped image frames. Figure 4.g shows the conventional 3D diffusion filtering, and Figures 4.h and 4.i show the overall and Region 2 outputs of the proposed method. The edges are more pronounced, the homogeneous regions are smoothed, and the text line becomes readable. The overall and Region 2 outputs for the face data set are shown in Figures 5.c and 5.d, respectively. Both images are visually improved.
The SNR values are also improved: 16.13 dB for the overall output and 16.28 dB for the Region 2 output, compared with 14.98 dB for frame averaging. Visual inspection of the diffused images shows that the noise is filtered out while the details are kept in both images. The experimental results show that tree-structured 3D anisotropic diffusion combined with medium band stack filtering outperforms the other methods on multi-frame image sequences containing alignment artifacts and a significant amount of additive noise.

Table 1. SNR measurements obtained with the proposed 3D diffusion filtering and with other image restoration methods for the checker board, pointer and face test data sets.

Method | Checker Board | Pointer | Face
Min individual frame SNR within the test stream | 0.99 dB | 1.02 dB | 5.12 dB
Max individual frame SNR within the test stream | 2.21 dB | 3.12 dB | 12.15 dB
Frame fusion (average) | 5.56 dB | 6.16 dB | 14.98 dB
Median filter | 7.56 dB | 6.27 dB | 13.05 dB
Wiener filter (x = 0, σ = 0.5) | 5.44 dB | 6.37 dB | 13.91 dB
Wiener filter (x = 0, σ = 0.15) | 5.40 dB | 6.42 dB | 14.98 dB
2-D anisotropic diffusion (0.2, 1 iteration) | 5.56 dB | 6.37 dB | 14.87 dB
3-D anisotropic diffusion (0.2, 3 iterations) | 5.51 dB | 6.33 dB | 14.69 dB
3-D anisotropic diffusion (0.2, 5 iterations) | 5.48 dB | 6.30 dB | 14.13 dB
Tree-structured 3D anisotropic diffusion | 10.08 dB | 7.26 dB | 16.13 dB
Tree-structured 3D anisotropic diffusion (Region 2) | 11.43 dB | 7.53 dB | 16.28 dB

The effect of noise variance on the performance of tree-structured diffusion filtering is shown in Figure 6.a. When the noise variance exceeds 0.1, an additional SNR improvement of approximately 2 dB is observed relative to conventional methods such as frame averaging and 3D diffusion filtering without the region-based tree structure. Note that above a noise variance of 0.35, annotation becomes difficult because the landmarks are obscured. The SNR also improves as the length of the frame sequence increases, as shown in Figure 6.b: going from 20 to 90 frames yields an improvement of approximately 1 dB.
This outcome also indicates that the proposed method may give meaningful results with as few as 20 frames.

V. CONCLUSION AND FUTURE WORKS
We described a multi-frame image restoration method based on the anisotropic diffusion process. The multi-frame image sequence is assumed to be composed of warped and aligned images. We presented a tree-structured 3D diffusion filtering process combined with medium band stack filtering. The proposed method improves the image restoration quality for image sequences that are corrupted by additive noise and into which alignment artifacts are introduced. In future work, the proposed method should be enriched with an adaptive way of selecting the regions. Additionally, the image enhancement may be further improved by exploiting the image content. Nevertheless, the initial results show that the proposed method is a candidate for investigating a certain object when information is gathered from different video scenes.

ACKNOWLEDGMENT
We are grateful to Dr. Binnur Kurt for his helpful comments and for his early contributions to our ideas. This work is supported by the National Scientific and Research Council of Turkey, project no: 108G002.

REFERENCES
[1] S. John, M. A. Vorontsov, "Multi-frame Selective Information Fusion from Robust Error Estimation Theory," IEEE Trans. on Image Processing, Vol. 14 (5), pp. 577-584, 2005.
[2] F. Wheeler, X. Liu, P. Tu, "Multi-Frame Super-Resolution for Face Recognition," Proceedings of the IEEE Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 27-29, 2007.
[3] D. Thomas, K. W. Bowyer, P. J. Flynn, "Multi-frame Approaches To Improve Face Recognition," Proceedings of the IEEE Workshop on Motion and Video Computing (WMVC '07), p. 19, 2007.
[4] M. K. Ozkan, A. T. Erdem, M. I. Sezan, A. M. Tekalp, "Efficient multi-frame Wiener restoration of blurred and noisy image sequences," IEEE Trans. on Image Processing, Vol. 1 (4), pp. 453-476, 1992.
[5] E. Dubois, S. Sabri, "Noise reduction in image sequences using motion-compensated temporal filtering," IEEE Trans. on Communications, Vol. 32, pp. 826-831, 1984.
[6] B. K. Gunturk, Y. Altunbasak, R. M. Mersereau, "Multi-frame information fusion for gray-scale and spatial enhancement of images," Proceedings of ICIP 2003, Vol. 2, pp. 319-322, 2003.
[7] INTEL, "Video Image Reconstruction and Enhancement: A Terascale Computing Application," INTEL White Papers, 2007.
[8] F. Kahraman, B. Kurt, M. Gokmen, "Robust Face Alignment For Illumination and Pose Invariant Face Recognition," IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2007), Workshop on Biometrics, pp. 1-7, 2007.
[9] C. A. Glasbey, K. V. Mardia, "A review of image warping methods," Journal of Applied Statistics, Vol. 25 (2), pp. 155-171, 1998.
[10] M. B. Stegmann, B. K. Ersbøll, R. Larsen, "FAME - A Flexible Appearance Modeling Environment," IEEE Trans. on Medical Imaging, Vol. 22 (10), pp. 1319-1331, 2003.