A multi-camera positioning system for steering of a THz stand-off scanner


Maria Axelsson, Mikael Karlsson and Staffan Rudner
Swedish Defence Research Agency, Box 1165, SE-581 11 Linköping, SWEDEN

ABSTRACT

Stand-off THz imaging to detect concealed threats is an emerging technique for security applications. A THz sensor can provide high resolution 3D imagery of a scene. However, efficient scene scanning and management of the THz sensor is a challenging task due to the limited field of view of the sensor and physical scanning limitations. In this paper we discuss the requirements on a scene scanning solution and present a scene scanning technique using a multi-camera system with 3D positioning capabilities. A visual hull method is used to position subjects in the scene. The presented technique relaxes the requirements on the scanning speed of the THz sensor and facilitates an efficient scene scanning solution.

Keywords: Imaging, screening, stand-off detection, terahertz, visual hull

1. INTRODUCTION

Detection of concealed threats at stand-off distance is desired in many security applications. Upcoming techniques for stand-off detection use sub-millimeter-wave imaging systems which can provide high resolution 3D imagery of a scene. In such 3D imagery, hidden threats can be detected using manual or automated methods, and security staff can be alerted before the threat is close. The THz sensor systems developed for stand-off detection usually have a limited spatial coverage due to the instantaneous field of view, the scanning rate, and the manageable data rate, which puts a demand on methods for efficient scene scanning. A scene scanning system is needed to control the THz sensor and steer it to the 3D position where a subject or a specific part of a subject is located. Valuable imaging time can be saved using accurate 3D positioning of each subject to avoid scanning empty areas.
Subjects can be tracked and positioned in the scene, and at each time step the sensor can be steered to the next point of interest. In complex scenarios where subjects are allowed to walk or move freely, the system can also be used to track individual body parts and obtain full coverage of the body incrementally, as different body parts of a person become visible to the THz sensor system. In addition to positioning, a scene scanning system can also provide an estimate of scan completeness, or information from the scanning system can be used to merge the high resolution volume data from different scans, e.g., frontal and rear scans of a person, to verify that full coverage of the body is obtained.

In this paper we discuss scene scanning methods using multi-sensor approaches. We also present our experimental results from an investigation of a scene scanning technique using a multi-camera system with 3D positioning capabilities. Multi-sensor positioning systems are investigated since their ability to detect and position subjects accurately in 3D is greatly improved compared to using only a single sensor. The 3D positioning is demonstrated on real data acquired from seven HD video cameras. We use a visual hull method where each camera view provides support for the presence of interesting foreground objects in each part of the scene, based on an adaptive Gaussian background model. The presented technique relaxes the requirements on the scanning speed of the THz sensor and facilitates an efficient scene scanning solution.

Correspondence: maria.axelsson@foi.se

Passive Millimeter-Wave Imaging Technology XIV, edited by David A. Wikner, Arttu R. Luukanen, Proc. of SPIE Vol. 8022, 80220L. © 2011 SPIE. CCC code: 0277-786X/11/$18. doi: 10.1117/12.883394

Figure 1. Example scene where a single subject is scanned from three sides while walking along an appointed path.

2. SCANNING SCENARIOS

Several scenarios can be imagined at a security checkpoint. The problem complexity increases with the number of subjects in the scene and with their allowed variation in pose and position. The simplest case is where a single subject stands in an appointed pose while the THz system scans one side. The subject then turns and the other side is scanned. As the pose and position are roughly known, the scene scanning system has much a priori information. If the high resolution sensor is slow in image acquisition compared to the time it is possible to stand still, it becomes necessary to track body parts and keep track of the scanned volumes.

The problem complexity increases when the subject is allowed to move more freely. Figure 1 shows an example scene setup for a case where the subject is moving along a path and scans can be obtained from three different views during motion. The position and pose of a subject walking through the security checkpoint need to be tracked to ensure that all parts of the body are scanned. Depending on the time constraints set by the speed of the THz scanning system, both the position and orientation of the person and the positions and orientations of the arms and legs might be needed.

In an environment where one person at a time passes through the scanning area, there is no risk of inter-person occlusions. Additionally, the scanning system does not need to shift its focus between multiple persons. This relaxes the requirement on fast sensor movements. It is desirable to maximize the flow through a security checkpoint, and hence unnecessary restrictions should be avoided. If multiple subjects are allowed to walk along the designated path simultaneously, there is of course an increase in complexity, and such scenarios are further restricted by the image acquisition time.
In an environment where subjects can move entirely freely, occlusions are likely to preclude complete scanning of all persons. The highest complexity level is scanning a scene with a free flowing crowd. This is far beyond the achievable horizon at this time.

3. CAMERA CONFIGURATION FOR A SCENE SCANNING SYSTEM

Information about the position and pose of all subjects in a scene can be obtained using many different sensor configurations. We have briefly looked into different sensors, e.g., visual cameras, infrared cameras, and range sensors. However, as many image processing methods are already available for visual cameras, we decided to start our experiments with a multi-camera setup using visual cameras. Infrared cameras can also be added to aid detection of subjects.

A camera configuration should contain multiple cameras to facilitate positioning of the individuals that need to be screened. When multiple cameras are used, positioning and tracking performance is improved compared to using a single sensor. Both occlusions between subjects and subject self-occlusion can be handled better with a multi-camera setup. With multiple cameras, distance measurements can be made using triangulation, dense stereo maps can be calculated from pairs of cameras, making it easier to identify individuals who are partly occluded, and 3D positioning methods such as the visual hull can be used. The visual hull method is further described in Section 4.

The camera positions must be known to be able to calculate stereo maps and triangulate distances to objects in the world. This requires that both the intrinsic and extrinsic camera parameters are known. An example of practical camera calibration is described in Section 5. The cameras should be set up in a configuration around the scene where they cover a common field of view. In our experiment, described in Section 5, we used seven cameras in a circle to be able to investigate methods for pose estimation and positioning using a multiple-view setup. However, in a final scanning system, fewer cameras in a half circle, where one is positioned close to the high resolution sensor, may be enough.
This will be determined by the demands from the application on the scene scanning and the requirements on the tracking and positioning algorithms.

4. IMAGE PROCESSING METHODS

Image processing methods can be used for 3D positioning, shape extraction, and pose estimation using the camera data. In addition, the data can be used to extract 3D models of each subject to be matched with the high resolution THz data to gain information about the completeness of the scans. Some of the available image processing methods are described in the following paragraphs.

First, detection of individuals in the scene is needed. This can be achieved by using a background model and extracting the foreground in each frame. Then potential targets can be detected using a tailored detector, e.g., a head detector. Accurate and consistent detections in complex scenes with many subjects or many moving objects are usually difficult to obtain. Therefore, detections are often not used by themselves if a robust positioning method is needed. Instead, detections can be passed on to a target tracker which uses a model of the possible movements to select feasible tracks. If the scanning system is fast and can capture the interesting part of a body before the subject has moved too far, this sort of tracking together with a silhouette image from the position of the THz sensor may be enough in a scene scanning system.

If more precise 3D positioning is needed, full body volume detection and tracking of body parts can be used. An example method for 3D positioning and estimation of shape is extraction of the visual hull. Here too a background model is used: the foreground objects, such as individuals, are segmented in each view, and the segmentations from all camera views are used to reconstruct the 3D bounding volume of the foreground objects. This is in some sense similar to tomographic reconstruction, except that we only have binary images with object and background classes.
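The per-pixel background modelling described above can be sketched as follows. This is a minimal single-Gaussian illustration, not the implementation used in the paper; the adaptation rate `alpha` and the sensitivity parameter `k` are hypothetical values chosen for the example.

```python
import numpy as np

def update_background(mean, var, frame, alpha=0.02):
    """Running per-pixel Gaussian background model.

    mean, var : per-pixel background mean and variance (float arrays)
    frame     : current grayscale frame
    alpha     : adaptation rate for gradual scene changes
    """
    mean = (1 - alpha) * mean + alpha * frame
    var = (1 - alpha) * var + alpha * (frame - mean) ** 2
    return mean, np.maximum(var, 1e-6)  # floor the variance to avoid division by zero

def foreground_probability(mean, var, frame, k=2.5):
    """Soft foreground mask: squared distance from the background model,
    in units of the per-pixel standard deviation, squashed to [0, 1)."""
    d2 = (frame - mean) ** 2 / var
    return 1.0 - np.exp(-d2 / (2 * k ** 2))
```

Pixels close to the background model get a probability near zero, while large deviations saturate towards one; the adaptive update lets the model follow gradual illumination changes.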
The visual hull was introduced by Laurentini [1] as the volume which completely encloses an object in a scene given a set of silhouette images. It is widely used to produce three-dimensional models from multiple views. In multi-person scenarios, persons often occlude each other in some camera views, and therefore this type of method often needs many views, which may be impractical. To relax the requirements on the segmentation of the subjects against the background, range sensors can be used to aid the extraction of the visual hull.

In addition to positioning subjects, pose estimation is important if subjects are allowed to move in the scene or if the scanning is very slow and subjects cannot stand still. In such scenarios, data from the high resolution imagery must be registered to the correct body parts on a model to identify regions that are not yet scanned. Pose estimation can, e.g., be performed by extracting silhouettes from all camera images and matching them to possible poses, selecting the pose which best matches the silhouettes. Pose can also be obtained using dense stereo maps or range sensors [2].

Figure 2. The actors in the field experiment showing the variation of clothes.

5. EXPERIMENTAL SETUP

A field trial was carried out in the summer of 2010. The aim of the trial was to collect data sets to assess scene scanning methods and evaluate specific image processing methods. In the experiment, seven Panasonic HDC SD-700 camcorders recording at 50 fps were used. A fictive circular checkpoint approximately 12 meters in diameter was built. Cameras and lighting were rigged in a circle, with the cameras mounted 2.5-3.0 meters above the ground facing a common scene center.

Measurements were collected during two days. On the first day, our three main experiments were carried out, where four actors walked along three pre-defined paths. The actors wore more or less concealing clothes to get data suitable for evaluating the difficulty of pose estimation. Figure 2 shows the actors and three sets of clothes. On the second day, a smaller experiment was carried out where four different actors in regular office clothes stood in different simple poses, to be used for initial experiments with pose tracking. Camera calibration scenes were also recorded on both days.

First the video sequences were time synchronized. An example frame is shown in Figure 3. The synchronization was performed by detecting a specific event in all views, such as a clap or a flashing light. The frame rate is high, which makes accurate time synchronization easier.
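The event-based synchronization can be illustrated with a simple sketch: if a flash produces a sharp jump in brightness, the frame index of the largest jump in each sequence yields the relative offsets between cameras. Reducing each frame to a single mean-brightness value is an assumption made for this example, not the authors' exact procedure.

```python
import numpy as np

def flash_frame(brightness):
    """Index of the frame with the largest frame-to-frame brightness jump
    (assumed to be the synchronization flash)."""
    return int(np.argmax(np.diff(brightness))) + 1

def frame_offsets(sequences):
    """Per-camera frame offsets that align every sequence to the first one.

    sequences : list of per-frame mean-brightness traces, one per camera
    """
    events = [flash_frame(np.asarray(b, dtype=float)) for b in sequences]
    return [e - events[0] for e in events]
```

At 50 fps, locating the flash to the nearest frame aligns the cameras to within 20 ms, which is what makes the high frame rate helpful here.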
The intrinsic camera parameters (focal length, principal point, and lens distortion) and the extrinsic camera parameters (camera position and orientation) of the seven cameras must be determined to be able to obtain accurate measurements from the video sequences. The intrinsic parameters are calibrated for each camera individually using the camera calibration toolbox for Matlab [3]. A checkerboard pattern is moved around to cover the entire field of view of the camera. The corners on the checkerboard are found automatically [4] and extracted with sub-pixel precision. A wrapper is used for the main calibration function in the toolbox to automate the intrinsic camera calibration.

Figure 3. Seven cameras are synchronized and calibrated to obtain measurements of a common scene.

The extrinsic camera parameters were determined using reference markers on the ground. The coordinates of the markers on the ground, the coordinates of the markers in each camera view, and the intrinsic camera parameters are then used together to calculate the position and orientation of each camera. The current extrinsic calibration method has an accuracy within a couple of centimeters at a distance of five to six meters (in the scene center). The manual steps in the current calibration method can be reduced and the accuracy can be improved. We are currently investigating automated self-calibration of the rig to facilitate fast deployment of the scene scanning system [5].

6. RESULTS FROM THE EXPERIMENTS

The measurements from the field trial have been used to evaluate a method for scene scanning using extraction of the visual hull of the subjects in the scene. The visual hull is the 3D space which is bounded by the projections of the subject in the camera views. It can be refined when more cameras are added to the scene. The method we use to extract the visual hull is based on silhouette images of the subject. Each of the seven camera views in a frame is segmented into foreground and background using a Gaussian background model. All pixels in the image are given a probability of belonging to the foreground. The background model is also adaptive to gradual changes over time. An example image and the probability of each pixel belonging to the foreground are shown in Figure 4. The visual hull is extracted by projecting the foreground pixels in each camera to the common world coordinates in the scene.
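The projection of world coordinates through a calibrated camera, and the triangulation mentioned in Section 3, can be sketched with standard pinhole geometry (distortion omitted). The intrinsic matrix and poses below are invented for illustration; the paper's own calibration used the Matlab toolbox cited above.

```python
import numpy as np

def projection_matrix(K, R, t):
    """3x4 pinhole projection P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D world point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, x1, P2, x2):
    """Linear (DLT) triangulation of one point from two calibrated views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]              # null vector of A, homogeneous world point
    return X[:3] / X[3]
```

The same projection is what maps a ground marker or a world cell into a camera view; triangulating known ground markers back from two or more views is one way to check the extrinsic calibration accuracy quoted above.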
The world is divided into cells, and each cell gains support from each camera that indicates there is an object of interest occupying the cell. If many (often all) cameras agree that there is something in the cell, it is set as object. The visual hull extraction is calculated in levels from the ground and upwards, and the resolution of the cells can be set differently for the three dimensions. The resulting visual hull is represented as cells which are either set or not set. It is also possible to assign each cell a probability of containing an object. A visualization of the extracted visual hull is shown in Figure 5. As can be seen in the figure, shadows can affect the extracted visual hull. Pixels misclassified as foreground or background also affect the result.
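The cell-voting scheme above can be sketched as follows, assuming binary silhouette images and known 3x4 projection matrices per camera; the `min_votes` threshold is a stand-in for the "many (often all) cameras agree" criterion.

```python
import numpy as np

def visual_hull(cell_centers, silhouettes, projections, min_votes):
    """Occupancy of world cells from binary silhouette images.

    cell_centers : (N, 3) world coordinates of the cell centers
    silhouettes  : list of (H, W) boolean foreground masks, one per camera
    projections  : list of 3x4 camera projection matrices
    min_votes    : number of cameras that must agree before a cell is set
    """
    n = len(cell_centers)
    votes = np.zeros(n, dtype=int)
    Xh = np.hstack([cell_centers, np.ones((n, 1))])  # homogeneous coordinates
    for sil, P in zip(silhouettes, projections):
        x = P @ Xh.T                                 # 3xN homogeneous pixels
        u = np.round(x[0] / x[2]).astype(int)
        v = np.round(x[1] / x[2]).astype(int)
        h, w = sil.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (x[2] > 0)
        hit = np.zeros(n, dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]      # cell projects onto foreground
        votes += hit
    return votes >= min_votes
```

Replacing the binary masks with the per-pixel foreground probabilities would instead accumulate a probability per cell, as mentioned in the text.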

Figure 4. (Left) Image frame from one of the cameras. (Right) Foreground segmentation using the Gaussian background model.

Figure 5. (Left) Visualization of the extracted visual hull and its position on the floor in 3D. (Right) A part of a camera frame used in the reconstruction of the visual hull.

These effects can be reduced using an improved model for background extraction. Our example here is used for demonstration purposes, and the segmentation parameters have not been tuned to our specific conditions at this time. Further visualization examples of the visual hull are shown in Figure 6. All examples of the visual hull shown here are extracted using a cell size of 0.01 meters. Such a small cell size may not be required for an efficient scene scanning method; however, this needs further investigation.

A method for scene scanning using the visual hull may be used in combination with target tracking to find the interesting part of the scene where accurate 3D positioning is needed. This can reduce the processing time required for the scene scanning. The time requirements on the scene scanning system to output new scanning positions will depend on the scanning speed of the THz system. If the scanning is fast, the requirement on tracking body parts is somewhat relaxed, and if the scanning is slow, there will be more time left to process the images and track subjects and their body parts.

Figure 6. The extracted visual hull of a subject from four different views.

7. CONCLUSION AND FUTURE WORK

In this paper we have discussed some of the available techniques for scene scanning. With the data from the field trial we have demonstrated a method for 3D positioning of subjects in a scene using a visual hull method. This type of method is feasible for scenes with few, well separated individuals, since occlusion between subjects may create ghost detections. If the scanning using the THz system is rapid, scenarios like a single subject standing in any pose may be solvable in combination with a merging of data from, e.g., the front and back side of the subject. To handle more complex scenes, e.g., where a single subject moves along an appointed path, the visual hull method needs to be extended with tracking functionality. Other methods, such as pose estimation from silhouettes, may be possible to use directly without extracting the full visual hull as an intermediate step. Pose estimation can also be used to assess scan completeness. Further investigations are needed to evaluate the methods suggested for scene scanning with respect to performance and speed.

ACKNOWLEDGMENTS

MSB (Swedish Civil Contingencies Agency) and FMV (Swedish Defence Materiel Administration) are acknowledged for funding this study.

REFERENCES

[1] A. Laurentini, "The visual hull concept for silhouette-based image understanding," IEEE Trans. Pattern Anal. Mach. Intell. 16, pp. 150-162, February 1994.
[2] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, "Real-time human pose recognition in parts from single depth images," in IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[3] J.-Y. Bouguet, "Camera calibration toolbox for Matlab," http://www.vision.caltech.edu/bouguetj/calib_doc/.
[4] M. Axelsson, P. Follo, and C. Grönwall, "Camera calibration using automated identification of checkerboard patterns," in Proceedings of SSBA 2010, Symposium on Image Analysis, Uppsala, 2010.
[5] M. Axelsson, "Automatic calibration of a camera positioning system using estimation of the essential matrix," in Proceedings of SSBA 2011, Symposium on Image Analysis, Linköping, 2011.