GOPRO CAMERAS MATRIX AND DEPTH MAP IN COMPUTER VISION

Meghan Simon
5 years ago

Tutors: Mr. Yannick Berthoumieu, Mrs. Mireille El Gheche

Group 1: Delmi Elias, Kangou Ngoma Joseph, Le Goff Baptiste, Naji Mohammed Hamza, Maamri Kenza, Randriamanga Dimby, Marchetti Thibault, Renauld Vincent

Contents
1 Abstract
2 Problem Statement
2.1 From disparity to depth
2.2 From optical flow to depth
2.3 Additional problems
3 Estimation methods based on disparity
3.1 Computing the disparity
3.2 Inter-camera rectification
3.3 Illumination variation and occlusion
3.4 Achieved results
4 Estimation methods based on optical flow
4.1 Computing the optical flow
4.2 Computing the depth
4.3 Achieved results
5 Conclusion

1 Abstract

Depth information is at the heart of various 3D applications such as autonomous driving, augmented reality and navigation. The depth of an object can be estimated with different techniques. Stereo imaging with GoPro cameras was chosen for, among other reasons, its accessibility and cost-efficiency. This project is part of a broader one whose goal is to build a high-resolution depth map estimation system using a matrix of 9 GoPro cameras (a 3×3 matrix, 1920×1080 pixels per camera).

2 Problem Statement

2.1 From disparity to depth

A common method for extracting depth information from intensity images is to acquire a pair of images with two cameras displaced from each other by a known distance. Disparity refers to the difference in location of an object in corresponding images (left and right), as seen by the left and right eye. In a pair of images from stereo cameras, the apparent motion in pixels of every point can be measured, and an intensity image can be built from these measurements. This image, the disparity map, represents the apparent pixel difference or motion between a pair of stereo images.

Figure 1: From disparity to depth

2.2 From optical flow to depth

A second approach to obtaining depth information is to analyze the optical flow instead of computing the disparity. The optical flow is a displacement field. In stereo vision (two cameras), the optical flow is the displacement measured between corresponding pixels in the two pictures of the same scene. In contrast to the disparity calculation, the cameras do not need to be perfectly aligned on a plane. Once the optical flow is computed, two motions are extracted from it: the plane motion and, more importantly, the parallax motion, which contains the depth information.

2.3 Additional problems

When dealing with wide-angle cameras such as the GoPros used here, distortions appear.
These are called radial distortions and are caused by the large distance between the edge of the camera lens and the optical center. They curve straight lines: the further a line is from the optical center, the more curved it appears. To solve this problem, the intrinsic parameters of each GoPro camera must be estimated. The distortion coefficients are part of these parameters. Each camera has its own, and they can be estimated with a Matlab toolbox from pictures of a calibration pattern (a "sight") taken with the GoPro. Several acquisitions of the pattern, taken at different angles and positions, are required to obtain a satisfying estimate.
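To make the calibration step concrete, the following Python/NumPy sketch applies and inverts a two-coefficient polynomial radial model, $x_d = x\,(1 + k_1 r^2 + k_2 r^4)$, on normalized coordinates centered on the optical center. This is the radial part of the distortion model used by common calibration toolboxes. Whether the group's Matlab toolbox used exactly this model is an assumption, since very wide-angle lenses are sometimes handled with a fisheye model instead. The function names and coefficient values below are hypothetical, not estimated values.

```python
# Illustrative sketch; function names and coefficients are hypothetical.
import numpy as np


def distort(points, k1, k2):
    """Apply a two-coefficient polynomial radial distortion model.

    points: (N, 2) array of undistorted, normalized coordinates whose origin
    is the optical center. Returns the distorted coordinates.
    """
    r2 = np.sum(points ** 2, axis=1, keepdims=True)  # squared radius of each point
    return points * (1.0 + k1 * r2 + k2 * r2 ** 2)


def undistort(points_d, k1, k2, iterations=20):
    """Invert the model above numerically by fixed-point iteration."""
    points = points_d.copy()
    for _ in range(iterations):
        r2 = np.sum(points ** 2, axis=1, keepdims=True)
        points = points_d / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return points


# Hypothetical coefficients: k1 < 0 gives the barrel distortion typical of wide-angle lenses.
k1, k2 = -0.3, 0.1
line = np.stack([np.linspace(-0.5, 0.5, 5), np.full(5, 0.4)], axis=1)  # straight horizontal line
bent = distort(line, k1, k2)        # its y values now vary: the line is curved
restored = undistort(bent, k1, k2)  # back onto the straight line
```

Applied to a straight horizontal line, the model pulls the ends of the line toward the center, producing the curvature described above. Calibration amounts to estimating $k_1$ and $k_2$ from the pattern images, after which every pixel can be mapped back.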



Figure 2: Example of the calibration sight used

Following this process, the Matlab toolboxes were used again to rectify the image. An example can be seen in the following figure.

Figure 3: Correction of the distortion

3 Estimation methods based on disparity

3.1 Computing the disparity

Detecting conjugate pairs in stereo images can be extremely challenging, since for each point in the left image the corresponding point in the right image has to be found. To determine whether two points, one in each image, form a conjugate pair, their similarity must be measured. Thus, before stereo matching, features that can be matched must first be located. Points in a uniform region are obviously poor candidates for matching. The variance computed in one direction over all pixels of a window centered on a point is a good measure of the distinctness of that point along that direction. The directional variance is given by:

$$ I = \sum_{(x,y) \in S} \left[ f(x,y) - f(x+1,y) \right]^2 \qquad (1) $$

where S represents the pixels in the window.

To estimate the disparity of points between two images, two estimators were used. The first one is the minimum mean square error (MMSE):

$$ \mathrm{disparity}(x,y) = \arg\min_{d \in D} \sum_{(x,y) \in S} \left[ f(x,y) - g(x+d,y) \right]^2 \qquad (2) $$

where f and g are the two images and D is the range of possible disparity values. This method was not retained, because more accurate results were obtained with normalized cross-correlation (NCC), whose score is maximized:

$$ \mathrm{disparity}(x,y) = \arg\max_{d \in D} \frac{1}{n} \sum_{(x,y) \in S} \frac{\left[ f(x,y) - \bar{f} \right] \left[ g(x+d,y) - \bar{g} \right]}{\sigma_f \, \sigma_g} \qquad (3) $$

where n is the number of pixels in S, $\bar{f}$ and $\bar{g}$ are the means of the two sub-images, and $\sigma_f$ and $\sigma_g$ their standard deviations.

However, the map computed by NCC-based block matching is less accurate than desired. That is why, after a first map is computed, an iterative algorithm is used to smooth it, which divides the percentage of erroneous pixels by four.

Illumination variation is then taken into account. Two solutions were tested. The first uses histogram transfer to correct the illumination before computing the disparity map. The second computes disparity and illumination variation simultaneously, providing both a disparity map and an illumination map. After testing, the first method was selected.

In the multi-view case, three images are considered and the middle image is taken as the reference. Considering multiple images at once helps overcome issues such as occlusion, and the resulting disparity map turns out to be more reliable. To assess the reliability of a disparity map, it was compared with the ground-truth disparity map, and this comparison is quantified by the percentage of erroneous pixels.

3.2 Inter-camera rectification

Rectification is the transformation that must be applied to the images taken by the 3×3 camera matrix so that corresponding pixels in every image are aligned along the horizontal axis. The basic approach is to rectify the images pairwise. The approach of Vincent Nozick [4] instead uses all the images from the camera matrix at the same time. As Nozick showed, multi-view rectification transforms the images so that they all belong to a common image plane. In other terms, the rectification process consists in finding a transformation that can be applied to a group of cameras so that their focal planes become coplanar.
The particularity of this method is that no intrinsic parameter of the cameras needs to be known: it only requires matching points, also called key points, between the views. Key points are distinctive features that can be reliably matched between different views of an object. They are invariant to scale and rotation, and robust to added noise and illumination changes. The SIFT algorithm provides a clear method to detect these key points, and the RANSAC algorithm can then be used to match them robustly.

If $P_i = K_i [R_i \mid -R_i C_i]$ is the projection matrix of camera i, with $C_i$ its optical center, the purpose of the rectification is to find a transformation that makes all focal planes coplanar. The new projection matrix of each camera after rectification is $P'_i = K'_i [R'_i \mid -R'_i C_i]$. The transformation can be seen as a homography combining, first, a rotation around the optical center and, then, an update of the focal length. It can be expressed as:

$$ H_i = K'_i \bar{R}_i K_i^{-1}, \qquad \bar{R}_i = R'_i R_i^{-1} \qquad (4) $$

where $\bar{R}_i$ is the rotation applied to $P_i$ so that it has the same orientation as $P'_i$. The purpose is to find $K'_i$ and $R'_i$ such that the key points are aligned horizontally and vertically with their matches. In this context, two types of rectification were faced:

$$ \text{Horizontal rectification:} \quad (H_i x_i^k)_y - y^k = 0 \qquad (5) $$
$$ \text{Vertical rectification:} \quad (H_i x_i^k)_x - x^k = 0 \qquad (6) $$

where $x_i^k$ is key point k in view i, and $y^k$ (respectively $x^k$) represents the vertical (respectively horizontal) coordinate of the rectified point k, common to all views. A non-linear process was chosen: $K'_i$ and $R'_i$ are optimized to satisfy the horizontal rectification equation (5) by minimizing the residual error over all the corrected points, $e = (H_i x_i^k)_y - y^k$. The minimization is then simplified by reducing the unknown intrinsic parameters to the focal length alone. Hence, the internal parameter matrix of each camera is characterized by:



$$ \begin{pmatrix} f_i & 0 & w_i/2 \\ 0 & f_i & h_i/2 \\ 0 & 0 & 1 \end{pmatrix} $$

where $w_i$ and $h_i$ are respectively the width and height of the image in the i-th view, and $f_i = f_i^0 \cdot 3^{\alpha_i}$, with $f_i^0$ the initial value of the focal length and $\alpha_i$ the exponent setting the ratio between the current and initial focal lengths ($f_i / f_i^0 = 3^{\alpha_i}$).

3.3 Illumination variation and occlusion

Illumination can have a significant impact on the appearance of surfaces, as the patterns of shading, specularities and shadows change. For instance, images of a baby under different lighting conditions are shown in the following figure:

Figure 4: Lighting difference between two images

To estimate the illumination variation between two images, one image is divided pixel-wise by the other. The result is shown in the following figure:

Figure 5: Illumination variation between two images

Occlusion between two stereo images refers to the regions hidden by the baby, which are visible in one image but not in the other. Occlusion is handled with two constraints on the disparity map and the epipolar geometry:

Ordering constraint: pixels of the left image along an epipolar line must appear in the same order as their correspondents.
Uniqueness constraint: a pixel from one image must have at most one matched pixel in the other image.
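Both constraints can be checked on a single scanline with a few lines of NumPy. This sketch uses a hypothetical `occlusion_mask` helper and assumes integer disparities and the matching convention of Eq. (2), where pixel x of the reference image corresponds to pixel x + d of the other image. It flags every pixel involved in a violation rather than deciding which surface is in front, which a complete implementation would also have to do.

```python
# Illustrative sketch; occlusion_mask is a hypothetical helper.
import numpy as np


def occlusion_mask(disparity_row):
    """Flag pixels of one scanline that violate the uniqueness or ordering constraint.

    Following Eq. (2), pixel x of the reference image is assumed to match
    pixel x + d[x] of the other image (integer disparities).
    """
    d = np.asarray(disparity_row, dtype=int)
    targets = np.arange(d.size) + d          # matched position in the other image
    occluded = np.zeros(d.size, dtype=bool)

    # Uniqueness: a pixel of the other image may be matched at most once.
    values, counts = np.unique(targets, return_counts=True)
    occluded |= np.isin(targets, values[counts > 1])

    # Ordering: matches must keep the left-to-right order of the source pixels,
    # so a target lying left of an earlier pixel's target breaks the order.
    running_max = np.maximum.accumulate(targets)
    occluded[1:] |= targets[1:] < running_max[:-1]
    return occluded


print(occlusion_mask([0, 0, 0, 0]))    # no violation
print(occlusion_mask([0, 1, 0, 0]))    # pixels 1 and 2 share target 2
print(occlusion_mask([0, 3, -1, 0]))   # targets 0, 4, 1, 3: order broken after pixel 1
```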


3.4 Achieved results

Vincent Nozick's algorithm must be applied so that the cameras' focal planes are aligned along the horizontal axis. OpenCV on Matlab provides both the SIFT and RANSAC algorithms, which detect key points in a first image and match them with the corresponding key points in a second image. Once this step is completed, the "multi-rectification" algorithm provided by Vincent Nozick computes the transformation to be applied from the matching points. With a 3×3 camera matrix, each horizontal triplet is considered separately and its focal planes are aligned. Let us consider one of those triplets:

Figure 6: Initial triplet

First, characteristic points must be found in each of the 3 images and matched, which is the goal of the SIFT algorithm. The RANSAC algorithm is also needed to remove aberrant points. Working on the key points detected by SIFT, RANSAC keeps the pertinent points, called "inliers", and rejects the aberrant points, called "outliers". This separation improves the robustness of the key-point matching phase.

Figure 7: Matched points

The algorithms used were able to determine points of correspondence between the images. Finally, the homography computed from the matching points is applied to each image.
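The inlier/outlier separation performed by RANSAC can be sketched as follows. This is a generic RANSAC loop around a direct linear transform (DLT) homography fit, written in NumPy. It is not the OpenCV or Matlab implementation the group used, the model those tools fitted is not specified in the report, and the function names, threshold (2 px), iteration count and synthetic matches are hypothetical. Coordinate normalization, which improves DLT conditioning on real images, is omitted for brevity.

```python
# Illustrative sketch; function names, threshold and data are hypothetical.
import numpy as np


def fit_homography(src, dst):
    """Direct linear transform: 3x3 homography mapping src -> dst (N >= 4 points)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)          # (least-squares) null vector of the system
    return h / h[2, 2]


def apply_homography(h, pts):
    """Map (N, 2) points through h using homogeneous coordinates."""
    ph = np.column_stack([pts, np.ones(len(pts))]) @ h.T
    return ph[:, :2] / ph[:, 2:3]


def ransac_homography(src, dst, threshold=2.0, iterations=500, seed=0):
    """Keep the largest set of matches consistent with a single homography."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):
        for _ in range(iterations):
            sample = rng.choice(len(src), size=4, replace=False)
            h = fit_homography(src[sample], dst[sample])
            if not np.all(np.isfinite(h)):
                continue                   # degenerate sample
            error = np.linalg.norm(apply_homography(h, src) - dst, axis=1)
            inliers = error < threshold    # NaN errors compare as False
            if inliers.sum() > best.sum():
                best = inliers
    return fit_homography(src[best], dst[best]), best   # refit on all inliers


rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(50, 2))
h_true = np.array([[1.02, 0.01, 5.0], [-0.01, 0.98, -3.0], [1e-4, 0.0, 1.0]])
dst = apply_homography(h_true, src)
dst[:10] += rng.uniform(20, 50, size=(10, 2))    # 10 simulated aberrant matches
h_est, inliers = ransac_homography(src, dst)
print(inliers[:10].any(), inliers[10:].all())    # outliers rejected, inliers kept
```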

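The quantities of Eq. (4) and Eq. (5) can be illustrated with a small NumPy sketch. It builds the simplified intrinsic matrix with the focal length parameterized as $f_i = f_i^0 \cdot 3^{\alpha_i}$, forms $H_i = K'_i \bar{R}_i K_i^{-1}$, and evaluates the horizontal-rectification residuals for key points seen in several views. Several choices here are assumptions: taking $y^k$ as the mean rectified ordinate over the views, the Euler-angle parameterization of $\bar{R}_i$, the function names, and the numeric values. The non-linear optimizer that minimizes these residuals is not shown, and this is not Vincent Nozick's code.

```python
# Illustrative sketch; function names and numeric values are hypothetical.
import numpy as np


def intrinsics(f, w, h):
    """Simplified intrinsic matrix: square pixels, principal point at the image center."""
    return np.array([[f, 0.0, w / 2.0], [0.0, f, h / 2.0], [0.0, 0.0, 1.0]])


def focal(f0, alpha):
    """Focal length parameterized as f = f0 * 3**alpha."""
    return f0 * 3.0 ** alpha


def rotation(rx, ry, rz):
    """Rotation from angles (radians) about the x, y and z axes, composed as Rz @ Ry @ Rx."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    r_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    r_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    r_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return r_z @ r_y @ r_x


def rectifying_homography(k_old, k_new, r_bar):
    """Eq. (4): H_i = K'_i R_bar_i K_i^-1."""
    return k_new @ r_bar @ np.linalg.inv(k_old)


def vertical_residuals(homographies, points):
    """Eq. (5) residuals (H_i x_i^k)_y - y^k for every view i and key point k.

    points[i] is a (K, 2) array holding the K key points as seen in view i.
    y^k is taken here as the mean rectified y over all views (an assumption).
    """
    ys = []
    for hom, pts in zip(homographies, points):
        ph = np.column_stack([pts, np.ones(len(pts))]) @ hom.T
        ys.append(ph[:, 1] / ph[:, 2])
    ys = np.array(ys)
    return ys - ys.mean(axis=0)


width, height = 1920, 1080
k = intrinsics(focal(1000.0, 0.0), width, height)   # hypothetical initial focal length
views = [np.array([[100.0, 200.0], [300.0, 400.0]]) + [50.0 * i, 0.0] for i in range(3)]
identity = [rectifying_homography(k, k, rotation(0, 0, 0))] * 3
print(np.abs(vertical_residuals(identity, views)).max())   # already aligned: numerically zero
tilted = identity[:1] + [rectifying_homography(k, k, rotation(0, 0, 0.05))] + identity[2:]
print(np.abs(vertical_residuals(tilted, views)).max())     # a small roll misaligns the rows
```

A non-linear least-squares solver would then adjust the rotation angles and the $\alpha_i$ until these residuals vanish for every key point.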



Figure 8: Rectified points

Corresponding pixels are now horizontally aligned. Please refer to the attached articles for further information: Epipolar Rectification for Autostereoscopic Camera Setup [2], Multiple View Image Rectification [4], and Camera array image rectification and calibration for stereoscopic and autostereoscopic displays [3].

The illumination variation is corrected through histogram transfer: the histogram of the first image is transferred to the second image. The results are shown in the following image:

Figure 9: Illumination variation corrected with histogram transfer

The constraints have been applied to the theoretical and estimated disparity maps, giving the results in the following figure:

Figure 10: Theoretical, estimated disparity and occlusion maps
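The disparity pipeline retained in Section 3.1, histogram transfer followed by NCC block matching, can be sketched in NumPy as below. `histogram_transfer` maps the gray-level cumulative histogram of one image onto the other, `ncc_disparity` implements Eq. (3) and `ssd_disparity` the squared-error criterion of Eq. (2), both with the convention f(x, y) ↔ g(x + d, y). The function names, the window half-size and the synthetic images are hypothetical, and the iterative smoothing step is not included.

```python
# Illustrative sketch; function names, window size and data are hypothetical.
import numpy as np


def histogram_transfer(source, reference):
    """Remap the gray levels of `source` so that its histogram matches `reference`."""
    src = np.asarray(source)
    ref = np.asarray(reference)
    _, src_index, src_counts = np.unique(
        src.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(ref.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / src.size   # cumulative histogram of the source
    ref_cdf = np.cumsum(ref_counts) / ref.size   # cumulative histogram of the reference
    mapped = np.interp(src_cdf, ref_cdf, ref_values)  # gray level with the same CDF value
    return mapped[src_index].reshape(src.shape)


def window(img, x, y, half):
    """Square window of side 2*half+1 centered on (x, y), as floats."""
    return img[y - half:y + half + 1, x - half:x + half + 1].astype(float)


def ncc_disparity(f, g, x, y, disparities, half=3):
    """Eq. (3): disparity maximizing the NCC between f at (x, y) and g at (x + d, y)."""
    a = window(f, x, y, half)
    a = (a - a.mean()) / (a.std() + 1e-12)   # epsilon guards uniform windows
    scores = []
    for d in disparities:
        b = window(g, x + d, y, half)
        b = (b - b.mean()) / (b.std() + 1e-12)
        scores.append(np.mean(a * b))        # (1/n) * sum of normalized products
    return disparities[int(np.argmax(scores))]


def ssd_disparity(f, g, x, y, disparities, half=3):
    """Eq. (2): disparity minimizing the sum of squared differences."""
    a = window(f, x, y, half)
    costs = [np.sum((a - window(g, x + d, y, half)) ** 2) for d in disparities]
    return disparities[int(np.argmin(costs))]


rng = np.random.default_rng(0)
left = rng.random((40, 60))
right = 0.5 * np.roll(left, 4, axis=1) + 0.2   # true disparity 4, darker second view
corrected = histogram_transfer(right, left)     # histogram of the first image -> second image
print(ncc_disparity(left, right, 20, 20, list(range(10))))
print(ssd_disparity(left, corrected, 20, 20, list(range(10))))
```

NCC is insensitive to a gain and offset applied to a window, which may be one reason it gave better results than the squared-error criterion. The squared-error criterion, by contrast, relies on the illumination having been corrected first.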



4 Estimation methods based on optical flow

4.1 Computing the optical flow

The optical flow was estimated using Matlab's vision toolbox with OpenCV. This toolbox supplies functions that find the pixel blocks whose position differs between two images of the same scene. The result was then used to estimate the motion vectors Vx and Vy, which correspond to the displacement along the horizontal and vertical axes respectively.

Figure 11: Two different views of the same scene

The following figure represents the optical flow: the bluer the area, the larger the motion between the images, while red areas are those that did not move.

Figure 12: Optical flow

4.2 Computing the depth

In stereo vision, the positions of the two cameras can be related by a homography, composed of a rotation and a translation. The position of any pixel in the first image can be estimated by applying the homography to the corresponding pixel from the second image. The optical flow can be divided into a plane motion and a parallax motion (Eq. (7)), and the depth information is contained in the parallax motion:

$$ \omega = \mu + \omega_\pi \qquad (7) $$

where $\omega$ is the optical flow, $\omega_\pi$ the plane motion and $\mu$ the parallax motion. The plane motion corresponds to the displacement between a pixel of the first image and the same pixel in the second image once the rotational component R of the homography has been applied, so it can be computed with Eq. (8) and Eq. (9). Indeed, a pure rotation does not produce any parallax.
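The decomposition of Eq. (7) and the relative depth ratio can be sketched in NumPy. The sketch assumes the flow is $\omega = p' - p$ and the plane motion is $\omega_\pi = p_\omega - p$ with $p_\omega = H p$. It computes $\gamma_i / \gamma_1$ with the pairwise parallax formulation, in which the parallax of both pixels is projected on the direction perpendicular to $p_{\omega i} - p_{\omega 1}$. The function names, homography, epipole and depth values are hypothetical. The parallax model $\mu \propto \gamma\,(e - p_\omega)$ used to generate the test data is also an assumption.

```python
# Illustrative sketch; function names and all numeric values are hypothetical.
import numpy as np


def warp(h, pts):
    """Apply a 3x3 homography to (N, 2) pixel coordinates."""
    ph = np.column_stack([pts, np.ones(len(pts))]) @ h.T
    return ph[:, :2] / ph[:, 2:3]


def plane_parallax(p, p_second, h):
    """Split the flow omega = p' - p into plane motion and parallax, Eq. (7)."""
    p_w = warp(h, p)             # where each pixel lands under the homography alone
    omega = p_second - p         # total optical flow
    omega_pi = p_w - p           # plane motion
    mu = omega - omega_pi        # parallax motion (equals p' - p_w)
    return omega_pi, mu, p_w


def relative_structure(mu_i, mu_1, pw_i, pw_1):
    """gamma_i / gamma_1 from the parallax of pixel i and of a reference pixel 1."""
    delta = pw_i - pw_1
    perp = np.array([-delta[1], delta[0]])   # delta rotated by 90 degrees
    return float(mu_i @ perp) / float(mu_1 @ perp)


h = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]])  # hypothetical homography
p = np.array([[10.0, 20.0], [50.0, 80.0]])
epipole = np.array([300.0, 200.0])                 # hypothetical epipole
gamma = np.array([0.5, 2.0])                       # hypothetical projective depths
p_w = warp(h, p)
parallax = 0.1 * gamma[:, None] * (epipole - p_w)  # parallax directed toward the epipole
omega_pi, mu, p_w = plane_parallax(p, p_w + parallax, h)
print(relative_structure(mu[1], mu[0], p_w[1], p_w[0]))  # about 4.0 = gamma_2 / gamma_1
```

The ratio depends only on the parallax and the warped positions, not on the epipole itself, which is why a reference point makes relative depth easier to obtain than absolute depth. It becomes ill-conditioned when the two pixels are aligned with the epipole.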


$$ \omega_\pi = p_{\omega i} - p_i \qquad (8) $$
$$ p_{\omega i} = H p_i \qquad (9) $$

where $p_i$ is a pixel of the first image and $p_{\omega i}$ is the same pixel after the rotation R, i.e. its position in the second image.

Estimating absolute depth directly requires computing an epipole, which is difficult to obtain. However, the relative depth of a pixel with respect to a reference point of known depth is much easier to compute (Eq. (10)):

$$ \frac{\gamma_i}{\gamma_1} = \frac{\mu_i^T \, (p_{\omega i} - p_{\omega 1})_\perp}{\mu_1^T \, (p_{\omega i} - p_{\omega 1})_\perp} \qquad (10) $$

where $(\cdot)_\perp$ denotes the vector perpendicular to $(\cdot)$. Finally, adding a second reference point improves the precision (Eq. (11)):

Z_map = γ2 γ1 γi γ2   (11)

4.3 Achieved results

Using this method, decent estimates of the depth map were obtained, as shown in the figure below. However, the resolution is not optimal because of the approximations used to compute the results.

Figure 13: Input image and corresponding depth map

5 Conclusion

Exploring different methods to estimate depth allowed us to compare their results as well as their pros and cons. The depth map obtained with optical flow is less accurate than the one obtained from disparity. However, the optical flow procedure is easier to generalize to nine cameras, whereas with the disparity method it appears difficult to go beyond three cameras.

References

[1] Yann Dumortier and André Ducrot. Real-Time Quasi Dense Two-Frame Depth Map for Autonomous Guided Vehicles. In: (2013). DOI:

[2] Fabrice Boutarel and Vincent Nozick. Epipolar Rectification for Autostereoscopic Camera Setup. In: EAM Mechatronics, Yokohama, Japan, Nov. 2010, pp
[3] Vincent Nozick. Camera array image rectification and calibration for stereoscopic and autostereoscopic displays. In: Annals of Telecommunications (July 2013), pp
[4] Vincent Nozick. Multiple View Image Rectification. In: Proc. of IEEE-ISAS 2011, International Symposium on Access Spaces. E-ISBN: , print ISBN: . Yokohama, Japan, 17-19 June 2011, pp ISBN:
[5] Marc Pollefeys. Exploiting scene constraints. URL: www.cs.unc.edu/~marc/tutorial/node110.htm.
