Measurement of Pedestrian Groups Using Subtraction Stereo

Kenji Terabayashi, Yuki Hashimoto, and Kazunori Umeda
Chuo University / CREST, JST, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
terabayashi@mech.chuo-u.ac.jp

G. Bebis et al. (Eds.): ISVC 2009, Part II, LNCS 5876, pp. 538-549, 2009. © Springer-Verlag Berlin Heidelberg 2009

Abstract. In this paper, detection of pedestrian groups and counting of the number of pedestrians in each group using subtraction stereo are discussed. Subtraction stereo is a stereo vision method that focuses on the movement of objects to make a stereo camera robust, and it produces range images for moving regions. Pedestrian groups are detected with standard labeling, and three-dimensional (3D) features of pedestrian groups are measured from range images obtained by subtraction stereo. Then a method to count the number of pedestrians in a group is proposed. The basic algorithm of subtraction stereo is implemented on a commercially available stereo camera, and the effectiveness of the method to count the number of pedestrians is verified by experiments using the stereo camera.

1 Introduction

A huge number of studies on stereo vision have been carried out [1,2,3,4], and several practical stereo vision systems have been reported in recent years. Some studies realize real-time acquisition of range images using personal computers (PCs) because CPUs and Graphics Processing Units (GPUs) are now fast enough [5,6]. In some studies, a Field Programmable Gate Array (FPGA) is used instead of a PC to acquire range images [7]. Some stereo cameras that connect to a PC are commercially available [8] and widely used. There are also stereo cameras in practical use in automobiles [9].

We aim to develop a practical stereo camera for applications such as surveillance, in which detection of anomalies or measurement of moving people is required. Several systems have been proposed for this kind of surveillance using a single camera [10,11]. However, a single camera may not be sufficient, since the size of a target cannot be obtained without restrictions such as a known camera pose and a target that stands on the ground. Stereo cameras are more appropriate, because size information can be obtained directly and the same system can be used across different scenes. It becomes possible to distinguish whether the target is human-size, larger, or smaller, or to know whether the target person is an adult or a child. For example, it is possible to avoid falsely detecting a bird flying near the camera.

As explained above, stereo cameras have already reached a practical level. However, stereo vision has the weakness that stereo matching is inevitably not robust for weak textures or recurrent patterns, which is called the correspondence problem. Many studies have aimed to solve this problem. Multi-baseline stereo [12], which uses multiple cameras to make the stereo matching more robust, is well known. Another approach is to project a pattern, such as a random dot pattern, to give texture to the scene, though this approach does not work when targets are far away. We focus on the movement of objects to make a stereo camera robust and have proposed subtraction stereo [13]. In this paper, we take up the problem of detecting and measuring pedestrian groups and apply subtraction stereo to it.

This paper is organized as follows. In section 2, we outline subtraction stereo. In section 3, we discuss the detection and measurement of pedestrian groups. In section 4, we propose a method to estimate the number of pedestrians in a group. Experimental results of measuring pedestrians using the stereo camera with the subtraction stereo algorithm are given in section 5. The paper is concluded in section 6.

2 Basic Algorithm of Subtraction Stereo

Fig. 1 shows the basic algorithm of subtraction stereo. In standard stereo vision, two images captured with the right and left cameras are matched and disparities are obtained at each pixel. Subtraction stereo adds a step that extracts moving regions in the images of each camera, and then applies stereo matching to the extracted moving regions. The extraction of moving regions is realized by a subtraction method, the simplest of which is background subtraction. In exchange for restricting the pixels at which a disparity is obtained to moving regions, subtraction stereo makes the stereo matching robust by restricting the search space for matching and by using motion information as well as the original image itself.

Fig. 2 shows an example of the result of subtraction stereo. Fig. 2(b) is the disparity image obtained by subtraction stereo for the scene in Fig. 2(a). Color represents the disparity; bluer color indicates larger disparity, i.e. smaller distance. In contrast to the disparity image obtained by standard stereo matching in Fig. 2(c), appropriate disparity images are obtained for moving objects only.

3 Detection and 3D Measurement of Pedestrian Group Regions

In this section, we explain the measurement of features of pedestrians with subtraction stereo. We assume a fixed parallel stereo camera in this paper.

3.1 Labeling

In subtraction stereo, the obtained disparity image is restricted to moving regions from the outset. Therefore, pedestrian group regions can be obtained with the standard labeling technique.
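The moving-region extraction of section 2 and the labeling of section 3.1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grayscale list-of-lists image layout, the threshold value, and the choice of 4-connectivity are all assumptions, and the function names `moving_mask` and `label_regions` are hypothetical.

```python
from collections import deque


def moving_mask(frame, background, threshold=20):
    """Background subtraction: mark pixels that differ from the background.

    frame and background are equally sized lists of rows of gray levels.
    The threshold of 20 gray levels is an assumed value, not from the paper.
    """
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def label_regions(mask):
    """Label 4-connected regions of True pixels with a breadth-first search.

    Returns (labels, count); labels holds 0 for background pixels and
    1..count for the regions, numbered in raster-scan order.
    """
    height = len(mask)
    width = len(mask[0]) if mask else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x] != 0:
                continue
            # Found an unlabeled foreground pixel: start a new region.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

In the paper the labeling is applied to the disparity image, where only moving regions carry valid disparities; the same `label_regions` function would then take a mask of valid-disparity pixels instead of the raw subtraction mask.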

Fig. 1. Flow of subtraction stereo: the image from each of the two cameras passes through detection of moving regions (subtraction); the extracted regions are matched, and disparities and ranges are measured, which yields moving regions with range information.

3.2 Measurement of 3D Position

When the disparity of a point is given, the corresponding distance along the optical axis and the 3D position of the point are calculated. Hereafter, "distance" means the distance along the optical axis. Let the disparity be k pixels and the distance be z m. The distance z is calculated by

$$ z = \frac{b f}{k p} \tag{1} $$

where b is the baseline length, f is the focal length of the lens, and p is the width of each pixel of the image. As is well known, the distance is inversely proportional to the disparity. Furthermore, the 3D position of a measured point is obtained from the distance z and the image coordinates (u, v) of the point in the image (see Fig. 3). Assuming that there is no skew and that the aspect ratio of each pixel is 1, the 3D position x is given as

$$ \mathbf{x} = z \left[ \frac{p}{f}(u - u_0) \quad \frac{p}{f}(v - v_0) \quad 1 \right]^{T} \tag{2} $$

where (u_0, v_0) are the image coordinates of the image center.

3.3 Coordinate Transformation

When the position and orientation of the camera are given, the 3D position x_w of the point in the world coordinate system is calculated as

$$ \mathbf{x}_w = R\,\mathbf{x} + \mathbf{t} \tag{3} $$

where t and R are the camera's position and orientation, respectively. t is a 3D vector and R is a 3 × 3 rotation matrix.
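Equations (1) to (3) chain together as in the following sketch. The function names are hypothetical, and all lengths are assumed to be in meters (so f and p must be converted from mm and μm before use); the rotation matrix is a plain nested list to keep the example free of dependencies.

```python
def distance_from_disparity(k, b, f, p):
    """Eq. (1): distance along the optical axis from a disparity of k pixels.

    b is the baseline length, f the focal length, p the pixel width (meters).
    """
    if k <= 0:
        raise ValueError("disparity must be positive")
    return b * f / (k * p)


def camera_point(z, u, v, u0, v0, f, p):
    """Eq. (2): 3D position in the camera frame, assuming no skew and
    square pixels, for image coordinates (u, v) and image center (u0, v0)."""
    scale = p / f
    return (z * scale * (u - u0), z * scale * (v - v0), z)


def to_world(x, R, t):
    """Eq. (3): x_w = R x + t, with R a 3x3 nested list and t a 3-vector."""
    return tuple(sum(R[i][j] * x[j] for j in range(3)) + t[i]
                 for i in range(3))
```

For instance, with illustrative (not measured) values b = 0.1 m, f = 4 mm, and p = 10 μm, a disparity of 20 pixels corresponds to a distance of 2 m, and a pixel 10 columns to the right of the image center then lies 0.05 m off the optical axis.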

Fig. 2. An example of subtraction stereo for an indoor scene: (a) scene, (b) subtraction stereo, (c) standard stereo.

Fig. 3. Definition of the image coordinate system, with axes u and v and image center (u_0, v_0).

As 3D information of the scene is obtained with a stereo camera, its position and orientation parameters can be measured by observing a scene. When a plane (the ground) is observed, its height is measured as a position parameter, and its tilt and roll angles are measured as orientation parameters. It is also possible to obtain the positions of pedestrians and use those points instead of the ground points to calculate the position and orientation parameters of the camera.

3.4 Features of Pedestrian Group Region

The 3D position of every point in a pedestrian group region can be calculated using the procedure above. Because the measurement errors of stereo vision are rather large (they are proportional to the square of the distance), we average the distance to each point of the region and use the value z_c as the distance to the region. By substituting z_c and the image coordinates (u_c, v_c) of the centroid of the region into eq. (2), the 3D position x_c that represents the region is obtained. At the same time, we adopt the area of the region (i.e., the number of pixels) S as a feature of the region.

4 Estimation of the Number of Pedestrians

In this section, we introduce the method to estimate the number of pedestrians in a detected pedestrian group region.

4.1 Method to Estimate the Number of Pedestrians

The area S and the distance z theoretically satisfy

$$ c = S z^{2} \tag{4} $$

where c is a constant. Let c_1 be the value of c obtained from eq. (4) for one person. If persons do not overlap, the number of pedestrians n can be estimated with the following equation.

$$ n = \frac{c}{c_1} \tag{5} $$

c_1 can be estimated by calculation, given the size of a person, or it can be estimated by measuring a person.
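The region features of section 3.4 and the count estimate of eqs. (4) and (5) can be sketched as below. The input layout (a list of `(u, v, z)` tuples, one per pixel of a labeled region) and the function names are assumptions made for illustration. The result is left as a real number, since the text does not state how the estimate is rounded.

```python
def region_features(points):
    """Features of one labeled region.

    points is a non-empty list of (u, v, z) tuples: image coordinates and
    distance (meters) of each pixel. Returns the area S in pixels, the mean
    distance z_c, and the centroid (u_c, v_c).
    """
    area = len(points)
    z_c = sum(z for _, _, z in points) / area
    u_c = sum(u for u, _, _ in points) / area
    v_c = sum(v for _, v, _ in points) / area
    return area, z_c, (u_c, v_c)


def estimate_count(area, z_c, c1):
    """Eqs. (4) and (5): n = S * z^2 / c1, valid when persons do not overlap.

    c1 is the constant for one person, obtained by calculation or by
    measuring a single person.
    """
    if c1 <= 0:
        raise ValueError("c1 must be positive")
    return area * z_c ** 2 / c1
```

Averaging z over the region before squaring, rather than averaging per-pixel values of z squared, follows the choice in section 3.4 of using a single representative distance z_c for the whole region.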

4.2 Compensation of the Camera Orientation

The area S is affected by the camera orientation. The area is largest when the optical axis of the camera is parallel to the ground. A camera is often fixed at a high position with a downward tilt so that it looks down on the pedestrians, as illustrated in Fig. 4. In such a case, the area S becomes smaller. When the tilt angle is θ, the area S should be rectified to

$$ S' = \frac{S}{\cos\theta} \tag{6} $$

Fig. 4. The relation of the camera and pedestrians: the optical axis of the stereo camera is tilted downward by θ toward the pedestrians; the world axes are Xw, Yw, and Zw.

5 Experiments

In this section, we show experimental results that evaluate the proposed method for measuring pedestrian groups.

5.1 Implementation of Basic Algorithm Using a Commercially Available Stereo Camera

We implemented the basic algorithm of subtraction stereo on a commercially available stereo camera, the Point Grey Research Bumblebee2 (color, f = 3.8 mm). We set the image size to 320 × 240 pixels. In this implementation, simple background subtraction is employed to extract moving objects. The stereo matching procedure of the Bumblebee2 library is applied to the subtraction images of the right and left cameras to obtain a disparity image. Disparity images are obtained at about 37 fps on a PC with a Core 2 Duo T9300 (2.50 GHz).
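The orientation compensation of section 4.2 can be sketched as follows. The sketch reads eq. (6) as S' = S / cos θ, which is an interpretation: it is the form consistent with the statement that a downward tilt makes the observed area smaller, so the rectified area must be larger. The function name is hypothetical.

```python
import math


def compensate_area(area, tilt_deg):
    """Rectify a region area for a camera tilted downward by tilt_deg degrees.

    Implements S' = S / cos(theta), the assumed reading of eq. (6).
    """
    cos_theta = math.cos(math.radians(tilt_deg))
    if cos_theta <= 0:
        raise ValueError("tilt must be below 90 degrees")
    return area / cos_theta
```

With no tilt the area is unchanged, and the correction grows with the tilt angle; the compensated area would then replace S in eq. (4) before the count is estimated.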

5.2 Experiment to Evaluate the Accuracy of the Method to Estimate the Number of Pedestrians

We evaluated the accuracy of the method to estimate the number of pedestrians. The stereo camera was installed on a building so that it looked down on the pedestrians, as shown in Fig. 5. Fig. 6 shows the experimental setup. The camera was set at a height of 8.3 m with a 50 deg downward tilt.

Fig. 5. Experimental scene.

Fig. 6. Experimental setup: the stereo camera (axes x, y, and z, where z is the optical axis) is mounted 8.3 m above the ground with a 50 deg downward tilt; the world axes are x_w, y_w, and z_w.

Estimation of the constant. To estimate the constant c_1, we investigated the change of c when a person with a height of 180 cm walked from the front to the back of the scene, almost vertically with respect to the camera. Fig. 7 shows the result. The constant c stays around 140,000 regardless of distance, and we therefore set the constant c_1 to 140,000. If the distance z and the width p per pixel are given, the number of pixels of a person region in the image coordinate system is obtained from eq. (2). This means that when the physical area of a person is given, the theoretical value of the constant c is obtained. The pixel width p of the stereo camera is 7.4 μm. When the theoretical value of the constant c is set to 140,000, it corresponds to an area of 0.52 m². This area corresponds to an average width of 29 cm for a height of 180 cm.

Fig. 7. Time-series change of c for one person (horizontal axis: frame number, 1 to 251; vertical axis: c, 0 to 200,000).

Examples of estimating the number of pedestrians. We carried out experiments for scenes with different numbers of pedestrians. Figs. 8, 9, and 10 show the results. In each figure, (a) shows the color image of the scene captured with the stereo camera, and (b) shows the disparity image obtained with subtraction stereo, together with the detected pedestrian groups and the number of pedestrians measured for each group from the disparity image. The red rectangles surround detected pedestrian regions, and the numbers indicate the estimated number of pedestrians.

Fig. 8. An example of detecting pedestrian groups (several groups): (a) color image, (b) result.
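The relation between the constant c and a physical area, described in the paragraph on estimating the constant, can be sketched as follows. By eq. (2), one pixel at distance z spans z·p/f meters on a side, so a region of S pixels covers roughly A = S·(z·p/f)² = c·(p/f)². This derivation assumes a surface facing the camera and is an illustration, not a formula stated in the paper; the function name is hypothetical.

```python
def area_from_constant(c, f, p):
    """Physical area (m^2) that corresponds to c = S * z^2.

    f is the focal length and p the pixel width, both in meters. Uses the
    assumed relation A = c * (p / f)^2 derived from eq. (2).
    """
    if f <= 0 or p <= 0:
        raise ValueError("f and p must be positive")
    return c * (p / f) ** 2


# Values quoted in the text: c = 140,000, f = 3.8 mm, p = 7.4 micrometers.
person_area = area_from_constant(140000, 3.8e-3, 7.4e-6)
# Average width of a 180 cm tall person with that frontal area.
person_width = person_area / 1.8
```

With these values the sketch gives an area of about 0.53 m² and a width of about 29 cm, close to the 0.52 m² and 29 cm quoted in the text; the small difference in area may come from rounding or from details that the text does not state.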

Fig. 9. An example of detecting pedestrian groups (12 persons): (a) color image, (b) result.

Fig. 10. An example of detecting pedestrian groups (25 persons): (a) color image, (b) result.

Fig. 11. Disparity image of the experimental scene: (a) experimental scene 1 (2 persons), (b) experimental scene 2 (10 persons).

The results show that the proposed method works well for several conditions with small groups (1 person at minimum) or rather large groups (25 persons at maximum); the pedestrian groups are detected appropriately, and the estimated numbers are almost accurate. A video showing the detection of pedestrian groups and the estimation of the number of pedestrians is attached. As the video shows, the process runs appropriately in real time.

Fig. 12. Experimental result of counting the number of pedestrians (horizontal axis: frame number, 1 to 251; vertical axis: number of people): (a) results for experimental scene 1, (b) results for experimental scene 2, (c) results for experimental scene 3.

Evaluation of the accuracy of estimating the number of pedestrians. We carried out experiments to evaluate the accuracy of estimating the number of pedestrians for different numbers of pedestrians. We applied the proposed method to scenes in which 2, 10, and 25 pedestrians walk from right to left, and estimated the number of pedestrians. Fig. 11(a) and (b) show disparity images of 2 and 10 pedestrians. Again, the red rectangles surround detected pedestrian regions.

A disparity image of 25 pedestrians is given in Fig. 10(b). Fig. 12(a), (b), and (c) show the results of estimating the number of pedestrians for the scenes with 2, 10, and 25 persons, respectively. In each scene, a pedestrian group appeared from the right edge of the image and disappeared at the left edge. The range between the two red lines shows the period during which every pedestrian was captured by the camera. The experimental results show that the proposed method estimates the number of pedestrians with a certain level of accuracy. Specifically, the errors of the estimated numbers while every pedestrian was captured are as follows: from -51 to 13 % in Fig. 12(a), from -11 to 26 % in Fig. 12(b), and from -8 to 18 % in Fig. 12(c). Outside the two red lines in Fig. 12, the estimated numbers change almost linearly, which is the appropriate behavior.

6 Conclusions

In this paper, we discussed the measurement of pedestrians using subtraction stereo. A method to detect pedestrian groups and estimate the number of pedestrians in each group was proposed. The basic algorithm of subtraction stereo was implemented on a commercially available stereo camera, and the effectiveness of the proposed method was verified by experiments using the stereo camera. When the overlap between pedestrians seen from the camera increases, the area becomes smaller, and consequently the estimated number of pedestrians becomes less than the actual value. Additionally, as the proposed method uses the area, the estimated numbers are affected by shadows. We proposed a simple technique to remove shadows in [14], but it is applicable only to separated persons and not to a pedestrian group. Our future work includes dealing with these issues. In particular, development of a method to remove shadows in real time is important.
We will also construct a stereo camera with a built-in subtraction stereo function and apply it to surveillance applications.

References

1. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge Univ. Press, Cambridge (2000)
2. Hebert, M.: Active and passive range sensing for robotics. In: Proc. of ICRA 2000, vol. 1, pp. 102-110 (2000)
3. Brown, M., Burschka, D., Hager, G.: Advances in computational stereo. IEEE Trans. Pattern Analysis and Machine Intelligence 25, 993-1008 (2003)
4. Seitz, S., et al.: A comparison and evaluation of multi-view stereo reconstruction algorithms. In: Proc. of CVPR 2006, pp. 519-528 (2006)
5. Kagami, S., Okada, K., Inaba, M., Inoue, H.: Realtime 3d depth flow generation and its application to track to walking human being. In: Proc. of 15th International Conference on Pattern Recognition (ICPR 2000), vol. 4, pp. 197-200 (2000)
6. Ueshiba, T.: An efficient implementation technique of bidirectional matching for real-time trinocular stereo vision. In: Proc. of 18th International Conference on Pattern Recognition (ICPR 2006), vol. 1, pp. 1076-1079 (2006)
7. Hariyama, M., Kobayashi, Y., Sasaki, H., Kameyama, M.: FPGA implementation of a stereo matching processor based on window-parallel-and-pixel-parallel architecture. IEICE Trans. Fundamentals, 3516-3522 (2005)
8. Point Grey Research, http://www.ptgrey.com/
9. Hanawa, K., Sogawa, Y.: Development of stereo image recognition system for ada. In: Proc. IEEE Intelligent Vehicle Symposium 2001 (2001)
10. Collins, R., et al.: A system for video surveillance and monitoring. Tech. report CMU-RI-TR-00-12 (2000)
11. Haga, T., Sumi, K., Yagi, Y.: Human detection in outdoor scene using spatiotemporal motion analysis. In: Proc. of 17th International Conference on Pattern Recognition (ICPR 2004), vol. 4, pp. 331-334 (2004)
12. Okutomi, M., Kanade, T.: A multiple-baseline stereo. IEEE Trans. Pattern Analysis and Machine Intelligence 15, 353-363 (1993)
13. Umeda, K., Nakanishi, T., Hashimoto, Y., Irie, K., Terabayashi, K.: Subtraction stereo - a stereo camera system that focuses on moving regions. In: Proc. of SPIE 3D Imaging Metrology, vol. 7239 (2009)
14. Hashimoto, Y., Matsuki, Y., Nakanishi, T., Umeda, K., Suzuki, K., Takashio, K.: Detection of pedestrians using subtraction stereo. In: Proc. of 2nd International Workshop on SensorWebs, Databases and Mining in Networked Sensing Systems (SWDMNSS 2008), pp. 165-168 (2008)