A Visible-Light and Infrared Video Database for Performance Evaluation of Video/Image Fusion Methods


Andreas Ellmauthaler · Carla L. Pagliari · Eduardo A. B. da Silva · Jonathan N. Gois · Sergio R. Neves

Received: date / Accepted: date

Abstract In general, the fusion of visible-light and infrared images produces a composite representation where both data are pictured in a single image. The successful development of image/video fusion algorithms relies on realistic infrared/visible-light datasets. To the best of our knowledge, there is a particular shortage of databases with registered and synchronized videos from the infrared and visible-light spectra suitable for image/video fusion research. To address this need we recorded an image/video fusion database using infrared and visible-light cameras under varying illumination conditions. Moreover, different scenarios have been defined to better challenge the fusion methods, with various contexts and contents providing a wide variety of meaningful data for fusion purposes, including non-planar scenes, where objects appear on different depth planes. However, there are several difficulties in creating datasets for research in infrared/visible-light image fusion. Camera calibration, registration, and synchronization can be listed as important steps of this task. In particular, image registration between imagery from sensors of different spectral bands imposes additional difficulties, as it is very challenging to solve the correspondence problem between such images. Motivated by these challenges, this work introduces a novel spatiotemporal video registration method capable of generating registered and temporally aligned infrared/visible-light video sequences. The proposed workflow improves the registration accuracy when compared to the state of the art.

A. Ellmauthaler, Halliburton Technology Center, Rua Paulo Emidio Barbosa, Ilha da Cidade Universitaria, Rio de Janeiro, Brazil. andreas.ellmauthaler@halliburton.com
C. L. Pagliari, Instituto Militar de Engenharia, Rio de Janeiro, Brazil. carla@ime.eb.br
E. A. B. da Silva, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil. eduardo@smt.ufrj.br
J. N. Gois, Centro Federal de Educação Tecnológica Celso Suckow da Fonseca, Rio de Janeiro, Brazil. jonathan.gois@cefet-rj.br
S. R. Neves, Instituto de Pesquisas da Marinha, Rio de Janeiro, Brazil. sergio@ipqm.mar.mil.br

By applying the proposed methodology to the recorded database we have generated the VLIRVDIF (Visible-Light and Infrared Video Database for Image Fusion), a publicly available database to be used by the research community to test and benchmark fusion schemes.

Keywords Infrared/visible image/video database · image registration · image fusion · camera calibration

Introduction

Multi-camera setups are particularly effective in environments where a single camera is incapable of capturing the entire information available within the monitored scene. Scenes shot with visible-light cameras usually exhibit good textural information, whereas infrared (IR) imagery may exhibit objects that are invisible to visible-light sensors, since IR sensors measure heat emissions to form an image. An imaging system could produce two distinct images/videos of the same scene or a single image/video containing information from various cameras positioned close to each other. The latter is of special interest since it exceeds the physical limitations of a single sensor within a single image. In particular, it is used in spatial and temporal super-resolution frameworks as well as in image fusion applications for the purpose of increasing the overall depth of focus, the overall dynamic range and the overall spectral response of an imaging system [].

In order to develop image fusion algorithms one needs a large and diverse set of image pairs. A variety of high quality data is useful to validate any method, testing its overall ability to provide results with real images, under real-life conditions. Datasets for image fusion research should contain video footage from different scenarios, designed to challenge fusion algorithms. We created the VLIRVDIF (Visible-Light and Infrared Video Database for Image Fusion) to contribute registered and synchronized videos from the IR and visible-light spectra, aiming to aid researchers working in image/video fusion. The database contains registered IR/visible-light video sequence pairs, as well as their raw, unprocessed counterparts, and is freely available for download at []. By doing so, we hope to help alleviate the problem faced by most researchers in multimodal image fusion, namely the shortage of registered and synchronized videos for evaluation purposes.

A major concern when creating the VLIRVDIF database was to generate a dataset for image/video fusion purposes under varying lighting conditions and challenging environmental conditions. Therefore, a number of the sequences were captured under high temperatures, between °C and °C, causing the vegetation and objects to be warmer than or as warm as the human subjects in the scenes. In addition, while some sequences were shot at night with little or no illumination, others were acquired under controlled indoor illumination. The scenes present objects concealed under clothing or behind other objects, which are only captured by the IR sensors. In addition, the database was designed to provide non-planar scenes, imposing challenging conditions on image fusion algorithms. When properly combined, visible-light and IR footage can provide videos that are useful for many applications. To the best of our knowledge, there are only a handful of publicly available databases which are suitable for research in multisensor image and video fusion.
The generation of such databases has to tackle several problems, including the creation of challenging scenarios and the inherent difficulties in producing registered source images at sub-pixel accuracy originating from image sensors operating at different spectral bands. This latter problem motivated us to develop an algorithm to register pairs of spatio-temporally misaligned IR and visible-light videos of the same dynamic scene recorded from distinct yet stationary viewpoints.

Figure  Schematic diagram of the proposed IR/visible-light video registration framework (temporal alignment, rectification and registration stages). In the superimposed pseudo-color images on the right, the visible-light and IR images occupy the green and red channels, respectively.

Therefore, independent of the underlying application, it is of vital importance that the images are represented in a common reference coordinate frame. This can be achieved by jointly calibrating the employed cameras, that is, computing their optical properties (intrinsic parameters) as well as the relative positions of the individual cameras with respect to each other (extrinsic parameters). Based on these calibration parameters the images can subsequently be undistorted and rectified such that the pixel coordinates in one image sequence are in direct correspondence to pixel coordinates in the other image sequence. In the course of this work we will refer to this process as image registration. Fig.  shows the schematic workflow of the developed IR/visible-light video registration framework.

In general, both traditional [ ] and self-calibration [, ] methods are well-suited for registering image sequences originating from cameras operating in the same spectral band. However, they tend to face problems for sequences obtained by sensors of different modalities (such as IR and visible-light sensors). For self-calibration methods this is mainly due to the possible lack of mutual feature points or common scene characteristics within corresponding input images. These problems are less severe for traditional calibration methods. However, the construction of a calibration board, whose interest points appear likewise in the IR and visible-light spectrum and allow for an accurate calibration of the employed cameras, is not a trivial task. As a consequence, only a few approaches to IR/visible-light stereo camera calibration can be found in the literature [,,,]. A more detailed discussion on camera calibration is presented in Section .

The developed approach uses a planar calibration board equipped with miniature light bulbs to register an IR/visible-light image sequence pair misaligned in space and time. The large number of light bulbs makes the registration process more robust against the lack of mutual scene characteristics, a common source of problems when registering video sequences originating from different spectral modalities. The processing chain first determines the exact light bulb positions in the individual frames of an IR/visible-light video sequence and utilizes this information to estimate the temporal offset. This is followed by the camera calibration process which is used to rectify and remove distortion from the images. We show that the developed system is able to estimate the temporal offset with a high confidence level. Furthermore, the introduced calibration scheme leads to calibration results which exhibit significantly smaller MREs (mean reprojection errors) when compared to the state-of-the-art. Examples of the effectiveness of the developed framework for generating pairs suitable for image fusion, where co-registered images at sub-pixel accuracy are required [], can be found in [].

The remaining part of this paper is organized as follows. Section  presents the databases available in the public domain that are suitable for image and video fusion research, discusses the requirements that guided the creation of the VLIRVDIF dataset, and describes its design. Next, Section  reviews the necessary camera calibration steps, while the temporal alignment, already presented in [], is reviewed in Section . The developed IR/visible-light camera registration scheme is described in detail in Section . In Section  the experimental results, obtained when applying the proposed framework in order to build the VLIRVDIF from misaligned/unregistered IR/visible-light video pairs, are presented. Conclusions and future plans are given in Section .

Image Fusion Database

Image fusion can be summarized as the process of integrating complementary information from multiple images into a composite representation containing a better description of the underlying scene than any of the individual source images could provide [0]. These techniques are particularly necessary in environments where a single sensor type is not sufficient to expose the whole scene content. For instance, images captured in the visible spectrum usually exhibit good textural information but tend not to contain objects located in poorly illuminated regions or behind smoke screens. IR imagery, on the other hand, does not suffer from these shortcomings but generally lacks textural information []. Thus, the combination of visible and IR sensors may lead to a composite representation where both textural information (visible image) and complementary information from the IR spectrum are depicted in a single image. The effectiveness of such fusion systems has been demonstrated for many tasks such as target tracking, concealed weapon detection, remote sensing and robot navigation, among others [].

An invaluable quality of a video surveillance system is to be effective both at day and night, as well as under different illumination conditions. As algorithms that rely on two different electromagnetic spectrum bands can be highly effective, we created a database that may also be suitable for video surveillance and military applications, besides object tracking and video understanding. In order to address these requirements, we defined the context of the video (e.g. indoor, outdoor, sunny day) and its content. The content is an object in the scene (e.g., a structure, a person, a vehicle) or an event (e.g., people interacting, people walking). From this a concept list was defined by integrating the items from the context and content sets. The idea was to create concepts for different applications presenting different scenarios of interest. These include indoor and outdoor surveillance sequences, people interacting, people leaving a package with a concealed weapon, people walking, people hiding in the woods, people hiding behind a smoke screen, vehicles and boats, all under varying illumination and temperature conditions. Subsection  surveys datasets that are available to the research community, and Subsection  describes the VLIRVDIF designed and generated in this work.

Public IR/visible databases

In [] the authors used a single-axis camera setup combining an IR and a visible-light camera to capture 00 image pairs of indoor and outdoor scenes. The cameras' alignment was achieved by a beam-splitter and the visible-light images had to be scaled down and further aligned

using a manually computed homography. No videos were produced and the unregistered image pairs are available at []. The test images used in [] were published in [] and []. The latter can be downloaded from []. When using this database, all image pairs have to be registered before the application of any fusion method.

The image data published in [] is not publicly available. It was created for human silhouette extraction purposes, where color and IR cameras recorded people walking in indoor environments.

The Object Tracking and Classification Beyond the Visible Spectrum (OTCBVS) Benchmark Dataset Collection [] provides benchmark datasets for testing and evaluating computer vision algorithms using images beyond the visible spectrum. These datasets comprise: person detection in thermal imagery (only IR images); unregistered thermal and visible face images under different conditions (unregistered IR/visible image pairs); color/thermal manually registered images; faces for facial analysis with thermal imagery; different scenarios with thermal imagery for detection and tracking purposes, weapon detection and weapon discharge detection with thermal imagery; and face images (visible-light only) illuminated by IR lights to overcome the problem of illumination variation for face recognition applications [].

The Ohio State University (OSU) Color and Thermal Database provides sequences with color/thermal images registered using homographies with manually selected points. This dataset, referenced in [], was created for the development of pedestrian tracking algorithms that require thermal and visible streams to be co-registered. It belongs to the OTCBVS collection [].

The Eden project multi-sensor dataset [], reported in [0], provides multi-sensor videos for target tracking purposes. The IR and visible-light videos were manually registered, with the correspondences being selected throughout each video sequence. Whenever possible, the videos from the two spectral bands were synchronized with the help of an omnidirectional flash, visible in both spectral bands, activated at each scene shot. Only one scenario is available at [] in visible, IR, side-by-side IR/visible and fused versions.

The International Society of Image Fusion [] provides links to websites containing databases of interest to information fusion researchers. However, there are neither registered nor synchronized IR/visible-light image sequences.

An imaging system for computing sparse depth maps from multispectral images is proposed in [] along with a visible-light and IR image dataset [] together with its associated ground truth. In [] the accuracy of similarity measures for thermal-visible image registration of human silhouettes is investigated using a registered and synchronized video dataset that is part of the LITIV dataset []. In the LITIV dataset humans are walking in a scene in various depth planes. Another part of the LITIV dataset was introduced in [], addressing the issues of IR/visible video registration, sensor fusion, and people tracking for far-range videos.

A dataset for maritime imagery recognition was proposed in []. It is publicly available at [] and comprises unregistered visible and infrared ship imagery.

The National Optics Institute (INO) Video Analytics Dataset [] comprises visible-light and IR registered sequence pairs, where sequences are available at 0 frames per second (fps) and one at  fps. The videos include parking lots, vehicles and people.
The Visual Analysis of People Lab created the Stereo Thermal Dataset [], containing video sequences from two synchronized thermal cameras, with 0 0 pixels at 0 fps, captured under sunny conditions with a temperature of approximately 0 °C. The videos exhibit pedestrians in scenarios that present a high degree of occlusion. The dataset has no visible-light spectrum data.

The VLIRVDIF (Visible-Light and Infrared Video Database for Image Fusion)

There are many challenges involved in jointly processing the IR and visible-light spectral bands. Therefore, when creating a database for IR/visible-light image/video fusion purposes one needs to take these into consideration by imposing several conditions on the scenes. Usually, low contrast levels in the visible-light spectrum severely affect the performance of image/video processing methods; the lower the contrast of a scene, the worse the performance of the image/video processing algorithm. Therefore, some scenarios should include different illumination conditions such as bright sunlight and nighttime footage, as well as illumination changes across the video. Additionally, smoke screens and objects moving into shaded areas could be introduced to stress object tracking in the visible band. Also, people wearing camouflage uniforms hiding behind vegetation increase the difficulty on the visible-light side. Challenges for the IR sensor are imposed by high outdoor temperatures (very close to the average human body temperature) under bright sunlight and by controlled indoor illumination variation. Moreover, people transporting objects that are concealed from the visible-band sensor (e.g. behind newspapers and vegetation or inside bags) should also be part of the content. Therefore, both the video context (e.g. indoor, outdoor, sunny day) and the content (e.g. structures, persons, hidden objects) offer challenging scenarios for testing video registration and fusion algorithms.

Taking these into account, we designed the VLIRVDIF, consisting of different video sequences, manually recorded at distinct locations. Table  gives an overview of the main properties of the recorded video sequences, including a rough summary of the scene contents as well as the prevailing environmental conditions. Selected scene thumbnails can be seen in Fig. .

The four outdoor sets of video sequences were shot under different conditions. The seven Camouflage sequences were acquired under bright sunlight and high temperatures. The sequences exhibit people wearing both civilian and camouflage outfits passing through and/or hiding behind vegetation. In addition, some sequences present smoke screens transparent to the infrared wavelengths. In general, smoke screens are used to obscure visible and infrared radiation as electro-optical countermeasures. The smoke screen present in some Camouflage sequences conceals objects in the visible region only. In some Camouflage sequences there are people carrying weapons who are hiding behind vegetation and/or the smoke screen. The Patio sequence, shot at twilight, displays one person concealed in the vegetation with a background consisting of several people passing by a corridor. A particular feature of the Camouflage and Patio sequences is given by their scene planes. The Camouflage sequence exhibits two dominant scene planes at distances of 0m and 00m, respectively, whereas the Patio sequence features an inclined scene plane with distances varying in the range of -0m. This poses a significant challenge to image fusion algorithms. The outdoor sequence Trees comprises four different scenes, all shot under bright sunlight, with some scenes acquired under backlighting conditions. One of the Trees scenes displays a person concealed under the shade of trees, while the others show two people crossing a lawn, either hiding in a shaded area or emerging from a shaded area.
There is also a scene with a car that stops and picks up one of the hidden persons. The nighttime sequences Guanabara Bay disclose a view of Guanabara Bay and the Rio de Janeiro-Niterói bridge, with vehicles crossing and vessels navigating the bay.

The two sets of indoor video sequences, Lab and Hangar, were shot under artificial light and controlled temperature. The five Lab sequences display two people carrying bags with weapons that are concealed from the visible wavelengths only. Moreover, there are scenes where objects are hidden behind newspapers, as well as scenes where objects concealed in bags are left unattended.

Table  Overview of the sequences from the VLIRVDIF database [].

- Camouflage: outdoor scenes; people hiding behind vegetation and/or smoke screen; two dominant scene planes at distances of approx. 0m and 00m. Environmental conditions: bright sunlight, °C.
- Lab: indoor scenes; people walking around, hiding weapon-like items within bags and behind newspapers; distance to scene plane approx. m. Environmental conditions: artificial light, °C.
- Patio: outdoor scenes; several people passing by a corridor; one person hiding behind vegetation; varying distance to scene plane (m-0m). Environmental conditions: twilight, °C.
- Trees: outdoor scenes; persons crossing a lawn and hiding in a shaded area; crossing car; one person concealed under the shade of trees; distance to scene plane approx. 0m. Environmental conditions: bright sunlight, °C.
- Hangar: indoor scenes; several people crossing a dimly lit corridor; distance to scene plane approx. 0m. Environmental conditions: artificial light, °C.
- Guanabara Bay: outdoor scenes; view of Guanabara Bay and the Rio de Janeiro-Niterói bridge; vehicles crossing the bridge; ships on Guanabara Bay; distance to scene plane approx. 00m. Environmental conditions: nighttime, °C.

Figure  Employed calibration board consisting of light bulbs arranged in a matrix, in the (a) visible-light and (b) IR spectrum. The depicted images were taken from an IR/visible-light image sequence after temporal alignment.

Several people crossing a dimly lit corridor are the main subject of the Hangar sequence. In one of the sequences a person carries a concealed weapon in a bag, while others start a conversation. The light fades in and out and occasionally blinks.

Independent of the scene content, each IR/visible-light video pair starts off by exhibiting different poses of the calibration board shown in Fig. . These poses include translational and rotational movements of the calibration board and were chosen in such a way that both temporal and spatial alignment can be performed simultaneously using the same calibration footage (see Sections ,  and ). The employed test setup consisted of a portable tripod (Fig. ) on which an IR and a visible-light camera were rigidly mounted side-by-side. The viewing angle and the zoom

Figure  Test setup consisting of an IR (left) and a visible-light camera (right) mounted side-by-side.

of the employed cameras were manually adjusted in such a way that the observed overlap between the fields-of-view of both cameras was as large as possible. The IR video sequences were obtained by recording the analogue NTSC video output of a FLIR Prism DS camera, operating in a spectral range of .µm to µm (mid-wavelength IR). In order to convert the analogue video stream to digital video, a Pinnacle Dazzle Digital Video Creator 0 video capturing device was utilized. In accordance with the NTSC standard, the resultant video exhibits a resolution of 0 0 pixels (which differs from the native 0 pixel resolution of the employed IR camera). As for the visible-light video sequences, a Panasonic HDC-TM00 camera was employed. These videos were recorded at a resolution of 0 00 and subsequently downsampled and cropped to match the IR video resolution of 0 0 pixels. Both IR and visible-light video sequences were recorded at a rate of 0 frames per second (0 fps).

The VLIRVDIF is publicly available at []. It contains both raw, unprocessed visible-light and IR video sequences, as well as their registered and synchronized counterparts. Moreover, the scenes were recorded while varying different parameters, such as the distance of the camera pair to the scene plane, illumination, camera rotation, zoom, movement of the targets in the scene, and occlusions, among others. The idea behind the variation of these parameters is twofold: to stress the registration method and to obtain a larger diversity in the database.

Camera calibration

Although the calibration procedure employed in this work has already been published in [0], we review the necessary mathematical concepts involving camera calibration and calibration point localization to allow a proper understanding of the registration process. For this purpose, we start by describing how 3D scene points can be accurately mapped onto a 2D image plane and derive the corresponding camera model. Next, based on the single camera model we review the epipolar geometry of two views and address the question of how the knowledge of the position of an image point in one view constrains the position of the corresponding point in the other view.

In the course of this work the following notation is used: homogeneous 3D coordinates X = [X Y Z 1]^T are represented by bold, capital letters whereas homogeneous 2D coordinates x = [x y 1]^T are represented in boldface, lowercase letters. Their inhomogeneous counterparts are denoted by X = [X Y Z]^T and x = [x y]^T, respectively.

As for stereo camera calibration, we use the superscript ′ to indicate entities associated with the second view.

Single Camera Calibration

In the basic pinhole camera model an image point in 2D is represented by the homogeneous vector x and its counterpart in the 3D world coordinate system by the homogeneous vector X. The general mapping given by the pinhole camera can be expressed by []

\mu \mathbf{x} = \mathbf{K}\,[\mathbf{R}\;\mathbf{t}]\,\mathbf{X}, \qquad \text{with} \quad \mathbf{K} = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix},

where µ is an arbitrary scale factor, R and t are the extrinsic camera parameters and K is called the intrinsic camera matrix [] or camera calibration matrix []. The parameters of the rotation matrix R and the translation vector t represent the placement of the world coordinate system with respect to the camera coordinate system, whereas K contains the internal camera parameters in terms of pixel dimensions. These are the focal lengths (α_x, α_y) and the principal point (x_0, y_0) of the camera in the x and y directions, respectively, as well as the parameter s which describes the skewness of the two image axes. In this work we focus on finite cameras corresponding to the set of homogeneous matrices P = K[R t] for which the left-hand submatrix KR is non-singular.

When using a calibration device we can assume, without loss of generality, that the calibration pattern is located on the plane Z = 0 in the world coordinate system. Thus, we can rewrite eq. () such that

\mu \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \mathbf{K}\,[\mathbf{r}_1\;\mathbf{r}_2\;\mathbf{r}_3\;\mathbf{t}] \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix} = \mathbf{K}\,[\mathbf{r}_1\;\mathbf{r}_2\;\mathbf{t}] \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} = \mathbf{H}\,\mathbf{X},

where R is given by [r_1 r_2 r_3], H = K[r_1 r_2 t] is called a homography matrix and X = [X Y 1]^T.

Lens distortion can be incorporated using the following expression [, ]

F(\tilde{\mathbf{x}}_c, \mathcal{K}, \mathcal{P}) = \begin{bmatrix} x_c \left( k_1 r^2 + k_2 r^4 + \dots \right) + \left( 2 p_1 x_c y_c + p_2 (r^2 + 2 x_c^2) \right) \\ y_c \left( k_1 r^2 + k_2 r^4 + \dots \right) + \left( p_1 (r^2 + 2 y_c^2) + 2 p_2 x_c y_c \right) \end{bmatrix},

where \tilde{\mathbf{x}}_c = [x_c \; y_c]^T are the (non-observable) distortion-free, normalized points in the camera coordinate system before applying the camera calibration matrix K, \mathcal{K} = \{k_1, k_2, \dots\} and \mathcal{P} = \{p_1, p_2\} are the coefficients of the radial and tangential distortion, respectively, and r^2 = x_c^2 + y_c^2. The (observable) distorted, normalized points \tilde{\mathbf{x}}_d are then approximated by

\tilde{\mathbf{x}}_d = \tilde{\mathbf{x}}_c + F(\tilde{\mathbf{x}}_c, \mathcal{K}, \mathcal{P})

and the final image points are given by x = K \tilde{\mathbf{x}}_d. In this work a 2nd-order radial distortion model with tangential distortion is used, such that \mathcal{K} = \{k_1\} and \mathcal{P} = \{p_1, p_2\}.
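To make the forward model above concrete, the following minimal NumPy sketch projects planar calibration points through a pinhole camera and then applies the 2nd-order radial plus tangential distortion before mapping to pixel coordinates with K. All numerical values (intrinsics, pose, distortion coefficients, board points) are invented for illustration and do not correspond to the cameras used in this work.

```python
import numpy as np

def project_points(X_w, K, R, t, k1=0.0, p1=0.0, p2=0.0):
    """Pinhole projection followed by 2nd-order radial and tangential distortion."""
    X_c = (R @ X_w.T + t.reshape(3, 1)).T             # world -> camera coordinates (N x 3)
    x_c = X_c[:, 0] / X_c[:, 2]                       # normalized, distortion-free points
    y_c = X_c[:, 1] / X_c[:, 2]
    r2 = x_c**2 + y_c**2
    # distorted, normalized points: x_d = x_c + F(x_c)
    x_d = x_c * (1 + k1 * r2) + 2 * p1 * x_c * y_c + p2 * (r2 + 2 * x_c**2)
    y_d = y_c * (1 + k1 * r2) + p1 * (r2 + 2 * y_c**2) + 2 * p2 * x_c * y_c
    # final image points: x = K x_d
    uv = (K @ np.stack([x_d, y_d, np.ones_like(x_d)]))[:2].T
    return uv                                         # N x 2 pixel coordinates

if __name__ == "__main__":
    K = np.array([[800.0, 0.0, 320.0],                # hypothetical intrinsics
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.array([0.0, 0.0, 5.0])       # board 5 units in front of the camera
    X_w = np.array([[0.0, 0.0, 0.0],                  # calibration points on the plane Z = 0
                    [0.1, 0.0, 0.0],
                    [0.0, 0.1, 0.0]])
    print(project_points(X_w, K, R, t, k1=-0.2, p1=1e-3, p2=1e-3))
```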

With all this in mind, a final global optimization step is incorporated which estimates the complete set of parameters using the previously obtained calibration parameters as an initial guess. This optimization is done iteratively by minimizing the following functional []

\sum_i \sum_j \left\| \mathbf{x}_{ij} - \hat{\mathbf{x}}\left( \mathbf{K}, \mathcal{K}, \mathcal{P}, \mathbf{R}_i, \mathbf{t}_i, \mathbf{X}_j \right) \right\|^2,

where x_{ij} is the sub-pixel position of the j-th calibration point in the i-th calibration image, and \hat{\mathbf{x}}(\mathbf{K}, \mathcal{K}, \mathcal{P}, \mathbf{R}_i, \mathbf{t}_i, \mathbf{X}_j) is the projection of the corresponding calibration point X_j from the 3D world coordinate system. Given the calibration point positions in the real-world and camera coordinate systems, various off-the-shelf solutions for camera calibration exist. Among them, the OpenCV Camera Calibration Toolbox [] as well as the Camera Calibration Toolbox for Matlab [] are predominantly used.

Stereo Camera Calibration

In this subsection we formally define the epipolar geometry between a pair of images. As before, we start with the basic pinhole camera model which does not assume lens distortion. Suppose a 3D scene point X is imaged at the point x in the first view and at x′ in the second view. Then, corresponding points x ↔ x′ satisfy the epipolar constraint []

\mathbf{x}'^{\,T} \mathbf{F} \mathbf{x} = 0,

where F is called the fundamental matrix of the camera pair. An important property of the fundamental matrix is that it is of rank 2. Hence, F does not provide point-to-point correspondences. Instead it specifies a map x ↦ l′ from a point in one image to its corresponding epipolar line in the other image []. Assuming that both cameras, represented by the matrices P and P′, have been calibrated according to the pinhole camera model such that

\mathbf{P} = \mathbf{K}\,[\mathbf{I}\;\mathbf{0}], \qquad \mathbf{P}' = \mathbf{K}'\,[\mathbf{R}\;\mathbf{t}],

where, without loss of generality, we choose the world origin to coincide with the first camera P, then the fundamental matrix can be expressed by []

\mathbf{F} = [\mathbf{K}'\mathbf{t}]_{\times}\, \mathbf{K}' \mathbf{R} \mathbf{K}^{-1},

where [\mathbf{K}'\mathbf{t}]_{\times} denotes the skew-symmetric matrix built from the 3-vector K′t, defined such that the vector product satisfies a × b = [a]_× b, and R and t describe the relative rotation and displacement of the two cameras, respectively.

Due to the linearity of eq. (), the fundamental matrix provides a simple and computationally friendly solution to compute point-to-line correspondences within a stereo camera setup. However, for real cameras employing optical lenses such a linear mapping is no longer valid.
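As a small numerical illustration of the relations above, the sketch below assembles F = [K′t]×K′RK⁻¹ for a hypothetical calibrated camera pair and verifies the epipolar constraint on a synthetic scene point. The intrinsics, rotation and baseline are invented for the example; they are not the parameters estimated in this work.

```python
import numpy as np

def skew(a):
    """Skew-symmetric matrix [a]_x such that a x b = [a]_x b."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def fundamental_matrix(K1, K2, R, t):
    """F = [K' t]_x K' R K^{-1} for P = K[I | 0] and P' = K'[R | t]."""
    return skew(K2 @ t) @ K2 @ R @ np.linalg.inv(K1)

if __name__ == "__main__":
    # Hypothetical IR (view 1) and visible-light (view 2) intrinsics
    K_ir = np.array([[520.0, 0.0, 160.0], [0.0, 520.0, 120.0], [0.0, 0.0, 1.0]])
    K_vis = np.array([[780.0, 0.0, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
    R = np.eye(3)                              # cameras mounted (nearly) parallel
    t = np.array([0.3, 0.0, 0.0])              # ~30 cm horizontal baseline

    F = fundamental_matrix(K_ir, K_vis, R, t)

    X = np.array([0.5, -0.2, 4.0])             # synthetic 3D point in the first camera frame
    x1 = K_ir @ X; x1 /= x1[2]                 # projection in view 1
    x2 = K_vis @ (R @ X + t); x2 /= x2[2]      # projection in view 2
    print("epipolar constraint x2^T F x1 =", float(x2 @ F @ x1))   # ~0 up to numerical precision
```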

To this end, the mapping of image points from the first view to the second view in the presence of lens distortion can be summarized as follows. First, apply the inverse camera calibration matrix to the 2D image points in the first view, \tilde{\mathbf{x}}_d = \mathbf{K}^{-1} \mathbf{x}. Next, in order to obtain the distortion-free, normalized points \tilde{\mathbf{x}}_c, the inverse distortion model of eq. () needs to be applied to \tilde{\mathbf{x}}_d. However, this is not straightforward since no analytic solution for the inverse exists. One way to bypass this problem is to approximate the inverse distortion model recursively [, ]:

\tilde{\mathbf{x}}_c \approx \tilde{\mathbf{x}}_d - F(\tilde{\mathbf{x}}_d, \mathcal{K}, \mathcal{P})
\tilde{\mathbf{x}}_c \approx \tilde{\mathbf{x}}_d - F\!\left(\tilde{\mathbf{x}}_d - F(\tilde{\mathbf{x}}_d, \mathcal{K}, \mathcal{P}), \mathcal{K}, \mathcal{P}\right)
\tilde{\mathbf{x}}_c \approx \tilde{\mathbf{x}}_d - F\!\left(\tilde{\mathbf{x}}_d - F\!\left(\tilde{\mathbf{x}}_d - F(\tilde{\mathbf{x}}_d, \mathcal{K}, \mathcal{P}), \mathcal{K}, \mathcal{P}\right), \mathcal{K}, \mathcal{P}\right)
\vdots

where F is defined in eq. (). By doing so, the error introduced when substituting \tilde{\mathbf{x}}_d with \tilde{\mathbf{x}}_c on the right-hand side gets smaller at each iteration. As was shown in [, ], three to four iterations are sufficient to compensate strong lens distortions. Next, the undistorted points \tilde{\mathbf{x}}_c are mapped from the first camera coordinate system through the plane at infinity [] to the camera coordinate system of the second camera [] (\tilde{\mathbf{x}}'_c = \mathbf{R}\tilde{\mathbf{x}}_c) and lens distortion is added using the forward lens distortion model of eq. () such that \tilde{\mathbf{x}}'_d = \tilde{\mathbf{x}}'_c + F(\tilde{\mathbf{x}}'_c, \mathcal{K}', \mathcal{P}'). Finally, by applying the camera calibration matrix (\mathbf{x}' = \mathbf{K}' \tilde{\mathbf{x}}'_d), a potential match of x in the second view is found. Please note that, as a consequence of lens distortion, the previously established point-to-line correspondences no longer hold. Instead, if points x and x′ correspond, then x′ lies on a curved epipolar line controlled by the polynomial distortion function of eq. ().

Besides the chosen camera model, the overall accuracy of camera calibration depends to a great extent on the ability to localize a set of calibration points within the provided calibration footage. In the next section we introduce the calibration point detection scheme for IR/visible-light imagery from [0]. It is capable of localizing corresponding calibration point pairs in the IR and visible-light spectra with high accuracy.

Calibration Point Detection

Due to the different spectral sensitivities of IR and visible-light cameras, the construction of a calibration board whose interest points appear both in the visible-light and IR spectra is not a trivial task. For example, existing camera calibration approaches based on black/white calibration patterns cannot be employed straightforwardly since, in most cases, such calibration devices do not appear in the IR image. The calibration board chosen in this work uses miniature light bulbs, equidistantly mounted on a planar board [, 0]. This configuration is of special interest since, when energized, heat and light are simultaneously emitted by the light bulbs, causing the calibration pattern to appear in both the visible-light and IR modalities. This is demonstrated in Fig. , where the employed calibration board, consisting of light bulbs arranged in a matrix, is shown in the visible-light and IR spectra, respectively. Other approaches are presented in [], [] and [].

The main advantages of the chosen calibration board include its versatility (e.g. the calibration board can be used for daytime and nighttime recordings), its fast operational readiness (plug & play) and its portability. Moreover, since the same physical entities (light bulbs) are used as calibration points in the IR and visible-light images, occasional imperfections of the calibration board (e.g. a loose contact in one of the light bulbs) can be compensated more easily. Nevertheless, when observing Fig.  some challenges associated with the chosen calibration board can be identified. Due to the use of cheap, off-the-shelf light bulbs, the emitted radiation pattern tends to differ from light bulb to light bulb, a problem which is further aggravated when tilting the calibration board. In extreme cases, this may even lead to the fading of some light bulbs.
In addition, the visibility of the light bulbs in the visible-light image depends to a large extent on the surrounding lighting conditions. For example, for outdoor sequences recorded in bright daylight the calibration points are less noticeable than for indoor scenes, where the lighting conditions can be controlled.

In order to cope with these challenges, a series of steps is proposed in [0] to robustly extract the sub-pixel positions of the miniature light bulbs along all video frames exhibiting the calibration board of Fig. .

Calibration point localization

In order to compute the exact sub-pixel positions of the miniature light bulbs along all IR/visible-light video frames exhibiting the calibration board of Fig. , we first have to separate the light bulb regions from the background. Ideally, this would be accomplished by applying a static threshold to the calibration images, labeling all pixels above the threshold as belonging to a potential light bulb region. However, due to the varying appearance of the light bulbs, no global threshold is capable of reliably producing a binary image that contains all light bulbs whilst suppressing wrongly extracted background regions. Thus, the approach adopted in this work does not rely on a single global threshold but tries to extract the exact light bulb positions by iteratively determining the optimal threshold for each calibration image.

For this purpose, we first choose an initial threshold (either manually or by means of some adaptive thresholding scheme such as the one in []) which is subsequently used to binarize the calibration image. After the thresholding operation, the extracted light bulb regions are expected to exhibit ellipse-like patterns in the binarized image. Based on this assumption, we post-process the binary image by removing all regions which appear with arbitrary shape and do not resemble the expected ellipsoidal radiation pattern. This is accomplished by fitting an ellipse to the boundary pixels of each region and discarding those for which the committed error (defined as the sum of squares of the distances between the boundary pixels of the region and the fitted ellipse) is above some threshold. Furthermore, we also remove those regions corresponding to ellipses with large eccentricity (a measure of how much the ellipse deviates from being circular), since it is assumed that the ellipses corresponding to light bulb regions closely resemble a circle. In our implementation the ellipse fitting is performed by employing the algorithm of []. The described procedure mitigates the problems of fast movements, hand shaking and occasional distortions of the light bulb regions. A first estimate of the calibration point positions is obtained by substituting the original light bulb regions with the area of the computed ellipses and calculating their centroids within the original calibration images.

If the number of computed calibration points is below the overall number of light bulbs, we repeat the above procedure using the next lower threshold. If, on the other hand, the number of extracted calibration points is larger than the number of light bulbs, a potential solution is to randomly choose a subset of calibration points from the complete set and to compute the corresponding homography using the DLT algorithm. If the correct set is chosen, mapping the calibration points from the 3D world coordinate system to the calibration image results in a small MRE, defined as

\mathrm{MRE} = \frac{1}{N} \sum_{i} \left\| \mathbf{x}_i - \mathbf{H}\mathbf{X}_i \right\|.
Here, N is the total number of light bulbs, x_i is the estimated position of the i-th calibration point within the calibration image and X_i represents the position of the corresponding calibration point in the world coordinate system. If, on the contrary, the MRE is high, we have

strong evidence that the chosen subset does not correspond to the true light bulb positions and another subset needs to be chosen. Even though this procedure was found to be robust, it is computationally expensive when the number of extracted regions is much larger than the actual number of light bulb regions. In fact, in a scenario with k light bulbs and n extracted regions with n > k, the combinatorial complexity of this approach corresponds to \binom{n}{k} = \frac{n!}{k!\,(n-k)!} different combinations. It is easy to verify that the number of possible combinations grows exponentially with the number of extracted regions. For instance, for the case of  light bulbs and ,  and  extracted regions, respectively, the overall number of combinations is , 0 and . Thus, in situations where the ratio of extracted calibration points to light bulbs renders the above-mentioned method impracticable, a preliminary step for outlier removal is needed. This is done by exploiting the available information about the light bulb distribution on the calibration board. In more detail, assuming that the distances between pairs of adjacent light bulbs are approximately constant within the calibration images, we iteratively eliminate the calibration points whose mean distances to their closest neighbors differ most from the median of distances, calculated over the whole set of extracted regions. This procedure is repeated until the combinatorial complexity of the aforementioned method is reduced to an acceptable degree, such that it can be used to remove all remaining wrongly extracted calibration points without a high computational overhead.

If the number of extracted calibration points matches the number of light bulbs, and the corresponding MRE is below a pre-defined threshold, then the final calibration point positions can be computed. Since our goal is to obtain calibration points for which the MRE is as small as possible, we refine the corresponding homography H by minimizing the functional

\min_{\mathbf{H}} \sum_{i} \left\| \mathbf{x}_i - \mathbf{H}\mathbf{X}_i \right\|^2.

The final calibration point positions are computed by applying the refined homography to the calibration point positions in the world coordinate system. Figs. a and b show the resulting calibration point positions for the visible-light and IR calibration images of Figs. a and b, respectively.

After the exclusion of the outliers, the first estimates of the calibration points are the centers of gravity of the selected regions. However, due to reflections, plate inclination and binarization threshold adjustment, the centers of gravity do not always represent the best calibration points. The minimization can be described by the following steps: (i) the calibration points are defined as the centers of gravity; (ii) the homography is calculated using the DLT algorithm [] (note that the world coordinate system points are known by construction of the board); (iii) the homography is applied to the real-world points and the positions of the calibration points are updated; (iv) steps (i)-(iii) are repeated until the MRE is acceptable or does not change (a minimal sketch is given at the end of this subsection).

The next section shows that, by means of the extracted calibration point positions, the time-shift between two unsynchronized IR and visible-light sequences can be successfully determined. A preliminary version of this temporal alignment was published in [].
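A minimal sketch of the homography fit and MRE computation behind steps (i)-(iii) is given below, using OpenCV's findHomography with method 0 (a plain least-squares/DLT fit over all points). The light-bulb grid geometry, the simulated centroid detections and the noise level are invented for illustration, and the alternation with re-detection of the centroids (step (iv)) is omitted, so this shows a single refinement pass rather than the full procedure.

```python
import numpy as np
import cv2

def snap_to_homography(world_pts, detected_pts):
    """Fit H between the board plane and the image, report the mean reprojection
    error (MRE) and snap the calibration points onto the projected grid H X_i."""
    H, _ = cv2.findHomography(world_pts, detected_pts, method=0)   # least-squares DLT
    proj = cv2.perspectiveTransform(world_pts.reshape(-1, 1, 2), H).reshape(-1, 2)
    mre = float(np.mean(np.linalg.norm(detected_pts - proj, axis=1)))
    return proj, H, mre

if __name__ == "__main__":
    # Hypothetical 4 x 5 light-bulb grid with 5 cm spacing (illustrative only)
    gx, gy = np.meshgrid(np.arange(5) * 0.05, np.arange(4) * 0.05)
    world = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(np.float64)
    # Simulated centroid detections: a projective warp of the grid plus localization noise
    H_true = np.array([[900.0, 20.0, 100.0], [-15.0, 950.0, 80.0], [0.05, 0.02, 1.0]])
    detected = cv2.perspectiveTransform(world.reshape(-1, 1, 2), H_true).reshape(-1, 2)
    detected += np.random.default_rng(0).normal(scale=0.3, size=detected.shape)
    refined, H, mre = snap_to_homography(world, detected)
    print("MRE of the fitted homography (pixels):", mre)
```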

Figure  Results of the calibration point detection for the (a) visible-light and (b) IR calibration images of Fig.  (zoomed version).

Figure  Global movement of all calibration points along a (a) visible-light and (b) IR video sequence. Each line represents the vertical movement of a single calibration point. Bright pixel values indicate an upward movement whereas dark pixel values represent a downward movement of the calibration board.

Temporal Alignment

Let S_V and S_I be two video sequences N_V and N_I frames long, recorded at the same frame rate by a visible-light and an IR camera, respectively, exhibiting different poses of the calibration board of Fig. . Finding the temporal offset \hat{t} between the two video sequences S_V and S_I is equivalent to maximizing a similarity measure s(\cdot) over a set of potential temporal offset candidates t, such that

\hat{t} = \arg\max_{t} \; s(S_V, S_I, t).

The temporal alignment approach proposed in [] starts off by recording alternating translational movements of the calibration board in the downward and upward directions. This is followed by the extraction of the calibration point positions in each frame of the IR and visible-light video sequences, as elaborated in Section . Based on the extracted calibration point positions, we determine the vertical component of the speed of each calibration point along the video sequences. This is accomplished by subtracting the y-coordinates of the calibration point positions between two successive video frames. Fig.  depicts the global movement of all calibration points, with each line representing the overall vertical movement of a single calibration point. In both images, brighter pixel values indicate the displacement of the calibration board in the upward direction whereas darker pixel values suggest a downward movement of the calibration pattern.

Based on Fig. , the temporal offset between the two video sequences can be determined in a straightforward manner. It simply corresponds to the horizontal displacement between the two images for which their horizontal cross-correlation is maximized. More specifically, given a temporal offset candidate t, the similarity between the visible-light sequence S_V and the IR sequence S_I is given by

s(S_V, S_I, t) = \frac{\displaystyle\sum_{m=1}^{M} \sum_{n \in N} M_V(m, n-t)\, M_I(m, n)}{\sqrt{\left( \displaystyle\sum_{m=1}^{M} \sum_{n \in N} M_V(m, n-t)^2 \right) \left( \displaystyle\sum_{k=1}^{M} \sum_{l \in N} M_I(k, l)^2 \right)}},

where the matrices M_V(m, n) and M_I(m, n) respectively express the displacement of the m-th calibration point between two consecutive visible-light and IR frames at time instant n, M is the number of calibration points, N_V and N_I are, respectively, the number of visible-light and infrared frames considered, and N = \{ n \mid 1 \le (n - t) \le N_V \;\wedge\; 1 \le n \le N_I \}. Please note that the similarity measure of eq. () is restricted to the interval [-1, 1]. The two video sequences are considered to have coincident movements if the similarity measure is 1 and opposite movements if the result is -1. A result of 0 implies that no similarities between the two sequences could be found. As expressed in eq. (), the best estimate of the temporal offset \hat{t} between the IR and visible-light video sequences is the one for which eq. () is maximized.

Fig.  shows the result of the temporal alignment for two IR/visible-light video sequences corresponding to Fig. . The highest similarity (according to eq. ()) is obtained for a temporal offset t of  frames. This result corresponds well with Fig. , which, when subjectively assessed, suggests a time-shift of approximately 00 frames between the two sequences.

Figure  Result of the temporal alignment for the two IR and visible-light video sequences corresponding to Fig. . The highest similarity (according to eq. ()) between the two video sequences is obtained for t =  frames.
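A compact sketch of this temporal-offset search is given below: given the displacement matrices M_V and M_I (one row per calibration point, one column per frame transition), it evaluates the normalized cross-correlation for each candidate offset and keeps the maximizer. The synthetic board motion and the offset of 140 frames are made up for the example; real displacement matrices would come from the calibration point tracking described above.

```python
import numpy as np

def similarity(M_V, M_I, t):
    """Normalized cross-correlation between the displacement matrices
    M_V (M x N_V) and M_I (M x N_I) for a temporal offset candidate t."""
    N_V, N_I = M_V.shape[1], M_I.shape[1]
    n = np.arange(N_I)
    valid = (n - t >= 0) & (n - t < N_V)       # IR frames n whose counterpart n - t exists
    if not np.any(valid):
        return 0.0
    v = M_V[:, n[valid] - t]
    i = M_I[:, n[valid]]
    denom = np.sqrt(np.sum(v**2) * np.sum(i**2))
    return float(np.sum(v * i) / denom) if denom > 0 else 0.0

def estimate_offset(M_V, M_I, max_offset=300):
    """t_hat = argmax_t s(S_V, S_I, t) over a range of candidate offsets."""
    candidates = range(-max_offset, max_offset + 1)
    scores = [similarity(M_V, M_I, t) for t in candidates]
    return candidates[int(np.argmax(scores))], max(scores)

if __name__ == "__main__":
    # Synthetic example: 20 calibration points, alternating up/down board motion,
    # IR sequence lagging the visible one by 140 frames (values are illustrative).
    rng = np.random.default_rng(0)
    motion = np.sin(np.arange(600) / 15.0)                 # shared vertical motion pattern
    M_V = np.tile(motion[:500], (20, 1)) + 0.05 * rng.standard_normal((20, 500))
    M_I = np.tile(np.r_[np.zeros(140), motion[:460]], (20, 1)) \
          + 0.05 * rng.standard_normal((20, 600))
    print(estimate_offset(M_V, M_I))                       # expected offset ~140
```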

Image Registration

Once the IR/visible-light video sequence pair is synchronized, the individual and joint camera parameters of the IR/visible-light camera pair can be estimated. This is accomplished by choosing N temporally aligned calibration images and following the calibration procedure outlined in Section . Please note that in the current implementation the calibration images were chosen manually, such that a large variety of different poses of the calibration board is incorporated into the calibration process. However, this process can be automated by extracting the pose information directly from the homography matrices [, ].

A potential drawback of the proposed approach is that the calibration point localization, as described in Section , is performed using non-fronto-parallel calibration images which suffer from nonlinear distortions due to the camera optics. In order to improve calibration results, it is therefore beneficial to first map the calibration images onto an undistorted fronto-parallel view (see Fig. ) and determine the exact calibration point positions within these canonical images. However, in order to do so, full knowledge of the calibration parameters would be necessary, information that is usually not available at this point. One possible solution to this problem is presented in [], where the authors advocate an iterative refinement approach, using alternating mappings of the calibration images onto a canonical fronto-parallel view and back.

In this work we follow a similar approach. After computing a first preliminary version of the calibration parameters we remove the radial and tangential distortion from the calibration images and map them onto a canonical fronto-parallel plane in the world coordinate system. Within this fronto-parallel view we then localize the calibration points using the processing chain of Section . Finally, these new calibration points are remapped onto the original image plane and the camera parameters are recomputed using the updated calibration point positions. This process is repeated until convergence, where in each new iteration the mapping onto the fronto-parallel plane is performed using the camera parameters from the previous iteration. Fig.  shows the undistorted equivalents of Fig.  in the fronto-parallel view. As shown in Section , the calibration parameters obtained by means of this iterative calibration point refinement result in reprojection accuracies exceeding those of traditional IR/visible-light camera calibration approaches [,, ], as well as the novel model proposed in [].

Figure  Undistorted views of the calibration boards of Fig.  in the fronto-parallel plane. (a) Visible-light image. (b) IR image.

After completing the individual calibration procedures for the IR and visible-light cameras we jointly calibrate them as described in Section . By doing so, we gain knowledge of the relative displacement of the two cameras, consequently enabling us to map points from one view to the other. As previously pointed out, due to lens distortion this mapping is not linear in the sense that a point in one view does not correspond to a line in the other view. Instead, a curved line is generated on which the corresponding points in the second view reside. This is demonstrated in Fig. , where the epipolar curves resulting from mapping the IR calibration points of Fig. b to the visible-light calibration image of Fig. a are highlighted. It can be observed that the distances between the epipolar curves and the corresponding calibration points are very small, indicating the high accuracy of the stereo calibration results.
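The epipolar curves of Fig.  can be reproduced with a point-transfer sketch like the one below: a pixel from the first view is undistorted (cv2.undistortPoints performs the iterative inverse of the distortion model), back-projected to a chosen depth, moved into the second camera's frame, and re-projected with the second camera's distortion and intrinsics; sweeping the depth traces the curved epipolar line, with very large depths approaching the plane-at-infinity transfer described earlier. All camera parameters below are placeholders invented for the example, not the values estimated in this work.

```python
import numpy as np
import cv2

def transfer_point(x_pix, K1, dist1, K2, dist2, R, t, depth):
    """Map a pixel from view 1 to its potential match in view 2 at a given depth."""
    # 1) Undistorted, normalized coordinates in camera 1 (iterative inverse distortion)
    xc = cv2.undistortPoints(np.array([[x_pix]], np.float64), K1, dist1).reshape(2)
    # 2) Back-project to the chosen depth and move into camera 2's frame
    X1 = depth * np.array([xc[0], xc[1], 1.0])
    X2 = R @ X1 + t
    # 3) Re-apply camera 2's distortion and intrinsics (forward model)
    img2, _ = cv2.projectPoints(X2.reshape(1, 1, 3), np.zeros(3), np.zeros(3), K2, dist2)
    return img2.reshape(2)

if __name__ == "__main__":
    # Hypothetical IR/visible intrinsics, distortion and relative pose (illustration only)
    K_ir = np.array([[520.0, 0.0, 160.0], [0.0, 520.0, 120.0], [0.0, 0.0, 1.0]])
    K_vis = np.array([[780.0, 0.0, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
    d_ir = np.array([-0.25, 0.0, 1e-3, 1e-3, 0.0])    # k1, k2, p1, p2, k3
    d_vis = np.array([-0.10, 0.0, 5e-4, 5e-4, 0.0])
    R = cv2.Rodrigues(np.array([0.0, 0.02, 0.0]))[0]  # slight relative yaw
    t = np.array([0.30, 0.0, 0.0])                    # ~30 cm baseline

    # Sweeping the depth traces the (curved) epipolar line of one IR pixel
    x_ir = np.array([200.0, 140.0])
    curve = [transfer_point(x_ir, K_ir, d_ir, K_vis, d_vis, R, t, z)
             for z in np.linspace(2.0, 200.0, 10)]
    print(np.round(np.array(curve), 1))
```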

Figure  Result of stereo calibration when mapping the IR calibration points of Fig. b to the visible-light calibration image of Fig. a. Note that due to lens distortion this mapping is not linear, resulting in curved epipolar lines.

Next, based on the obtained epipolar geometry we rectify the IR/visible-light image pairs [, ], resulting in image correspondences where the epipolar curves are linearized and run parallel to the x-axis. By doing so, disparities between the IR and the visible-light images occur in the x-direction only. In this work rectification is achieved by undistorting both image sequences using eq. () and applying two rectifying homographies H_R and H′_R to the undistorted IR and visible-light images, respectively. Thus, after rectification, point correspondences are given by []

\mathbf{x}'^{\,T}\, \mathbf{H}_R'^{\,T}\, \bar{\mathbf{F}}\, \mathbf{H}_R\, \mathbf{x} = 0, \qquad \text{where} \quad \bar{\mathbf{F}} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}

and x and x′ represent two corresponding image points taken from an undistorted IR/visible-light image pair. As a consequence, the epipoles e and e′, corresponding to the right and left null spaces of the fundamental matrix F, are mapped to the point p = [1 0 0]^T at infinity. Since all epipolar lines must pass through their corresponding epipoles it is easy to verify that all epipolar lines run parallel to the x-axis and, in effect, all corresponding image points have identical y-coordinates.

Fig.  shows the result of rectification for an arbitrary IR/visible-light image pair from the sequence Trees. Notice that, due to the different fields-of-view of the employed IR/visible-light camera pair, after rectification the visible-light image is completely contained within the corresponding IR image. Moreover, Fig.  also illustrates the effect of distortion removal. This is particularly apparent when observing the boundaries of the IR image which, after distortion removal, appear curved.

Upon completion of the rectification process, we manually displace the rectified images horizontally until the principal scene planes in the two views appear spatially aligned. Then, we crop the overlapping areas and resample the resulting image portions such that the final image resolution matches the native spatial resolution of the IR/visible-light video pair. Fig.  presents the registration result for the Trees sequence.

Note that this displacement process could be made automatic by identifying a region of interest in the images and corresponding points within it. Such a region of interest would correspond to a given scene depth. However, such a method would still have the limitation of not being able to perfectly register pairs having objects at very different depths.
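For readers who want to reproduce a comparable rectification with off-the-shelf tools, the sketch below uses OpenCV's stereoRectify and initUndistortRectifyMap, followed by a simple horizontal shift standing in for the manual plane alignment, crop and resampling steps described above. The calibration values and file names are placeholders, so this is an approximation of the workflow rather than the implementation used to build the VLIRVDIF.

```python
import numpy as np
import cv2

def rectify_pair(img_ir, img_vis, K_ir, d_ir, K_vis, d_vis, R, t, shift_px=0):
    """Undistort and rectify an IR/visible-light frame pair, then apply a
    horizontal shift so that the principal scene plane is aligned."""
    size = (img_ir.shape[1], img_ir.shape[0])        # both frames share the IR resolution
    R1, R2, P1, P2, _, _, _ = cv2.stereoRectify(K_ir, d_ir, K_vis, d_vis, size, R, t)
    map_ir = cv2.initUndistortRectifyMap(K_ir, d_ir, R1, P1, size, cv2.CV_32FC1)
    map_vis = cv2.initUndistortRectifyMap(K_vis, d_vis, R2, P2, size, cv2.CV_32FC1)
    rect_ir = cv2.remap(img_ir, *map_ir, cv2.INTER_LINEAR)
    rect_vis = cv2.remap(img_vis, *map_vis, cv2.INTER_LINEAR)
    # Manual horizontal displacement aligning the dominant scene plane (disparity is x-only)
    rect_vis = np.roll(rect_vis, shift_px, axis=1)
    return rect_ir, rect_vis

if __name__ == "__main__":
    # Placeholder calibration values and file names (illustrative only)
    K_ir = np.array([[520.0, 0.0, 160.0], [0.0, 520.0, 120.0], [0.0, 0.0, 1.0]])
    K_vis = np.array([[780.0, 0.0, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
    d_ir = np.array([-0.25, 0.0, 1e-3, 1e-3, 0.0])
    d_vis = np.array([-0.10, 0.0, 5e-4, 5e-4, 0.0])
    R, t = np.eye(3), np.array([0.30, 0.0, 0.0])
    ir = cv2.imread("frame_ir.png", cv2.IMREAD_GRAYSCALE)
    vis = cv2.imread("frame_vis.png", cv2.IMREAD_GRAYSCALE)
    rect_ir, rect_vis = rectify_pair(ir, vis, K_ir, d_ir, K_vis, d_vis, R, t, shift_px=12)
    # Pseudo-color overlay as in the figures: visible-light in red, IR in green (BGR order)
    overlay = cv2.merge([np.zeros_like(rect_ir), rect_ir, rect_vis])
    cv2.imwrite("overlay.png", overlay)
```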

Figure  Result of image rectification for a sample IR/visible-light image pair from the sequence Trees. For visualization purposes, the two images were overlaid on top of each other and occupy the red (visible-light) and green (IR) channels in the depicted RGB pseudo-color image.

Figure  Final registration results for an IR/visible-light image pair from the Trees image sequence. (a) Registered visible-light image. (b) Registered IR image. (c) RGB pseudo-color image where the registered visible-light and IR images of (a) and (b) occupy the red and green color channels, respectively.

A similar problem is faced in [], a work that presents an almost automatic video registration method that relies on correspondences found via shape contour matching. Although out of the scope of our work, an option would be, given a set of corresponding salient points in the two images, to compute their depths and use depth-based image rendering [0] on one of the images to perform registration at all depths. However, such a method would still have to deal with the problem of areas occluded between the two cameras. In [] the occlusion problem is addressed using a video registration technique based on RANSAC trajectory-to-trajectory matching for far-range videos. It estimates an affine transformation matrix that maximizes the overlap of IR and visible foreground pixels. The method assumes that there is an intersection of the fields of view of the thermal and visible cameras and it does not employ any camera calibration method. A registration method for IR and

visible-light stereo videos based on local self-similarity is presented in [], which also treats the problem of occluded regions in a scene.

Results

In this section the effectiveness of the developed IR/visible-light video registration framework is demonstrated. For the sake of brevity we constrain our discussion to the registration results of six temporally and spatially misaligned IR/visible-light video sequence pairs, each originating from a distinct recording location with varying scene content and lighting conditions (see Table  for more details). The results for the remaining sequences constituting the VLIRVDIF are available at []. Representative scene thumbnails of the selected IR/visible-light video sequences (before registration) are illustrated in Fig. .

Figure  Selected IR/visible-light scene thumbnails from the video sequences used for evaluation purposes. The top row consists of visible-light images, whereas the bottom row represents the corresponding IR images.

The VLIRVDIF challenges

The IR/visible-light video pairs are naturally non-overlapping due to their different spectral bands. This poses difficulties for the calibration, registration, temporal alignment and fusion processes. To that end, the VLIRVDIF is designed to accentuate characteristics that impose different degrees of difficulty to be tackled by these processes. For example, the VLIRVDIF is composed of video pairs with several IR/visible-light frames presenting different degrees of occlusion, as well as regions within the frames that have absolutely no similarities with their associated visible-light/IR frame pairs. The VLIRVDIF also provides non-planar scenes, where objects appear on different depth planes. Some methods may need rectified video frames, which are not available when dealing with non-planar scenes. As the fundamental idea of registration is finding correspondences between video frame pairs to allow scenes and objects to be represented in a common coordinate system, these characteristics impose a significant challenge on registration algorithms. In addition, some sequences were shot under high temperature levels, which can hamper the efficiency of fusion methods.

Temporal Alignment Results

The estimated temporal offsets t for the selected video sequences (see Fig. ) together with the corresponding similarity measures of eq. () are given in Table . The attained similarity is very close to 1 for all six assessed video sequences. This implies that after

Table  Results of the temporal offset estimation for the six different IR/visible-light video sequence pairs corresponding to the scenes depicted in Fig. , from left to right.

                  1st pair   2nd pair   3rd pair   4th pair   5th pair   6th pair
Temporal Offset
Similarity

Figure  Five calibration frames from the Lab image sequence (a) before and (b) after temporal alignment.

temporal alignment the movements of the calibration board are almost identical between the IR and visible-light video sequences. However, it is worth noting that the overall similarity measure depends, to a certain extent, on the movements performed with the calibration board. Thus, a small similarity does not necessarily imply a poor estimation of the temporal offset. In addition, the curve pictured in Fig.  exhibits a single distinct peak corresponding to the position of the correct temporal offset, suggesting the robustness of the proposed approach.

In order to visually demonstrate the effectiveness of the proposed temporal alignment scheme, Fig.  shows five calibration frames from the second IR/visible-light video sequence pair of Table  before and after temporal alignment. It can be noted that the unsynchronized video frames (Fig. a) display a significant misalignment in time. This is particularly evident when observing the four IR video frames to the right, which appear to lag considerably behind the visible-light frames. As for the synchronized video frames (Fig. b), both IR and visible-light frames exhibit similar poses of the alignment board, thus indicating the correct temporal alignment of the IR/visible-light video sequence pair.
