Learning the Three Factors of a Non-overlapping Multi-camera Network Topology

Similar documents
2003 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes

Multi-Camera Target Tracking in Blind Regions of Cameras with Non-overlapping Fields of View

Research Article Continuous Learning of a Multilayered Network Topology in avideocameranetwork

Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos

Tracking objects across cameras by incrementally learning inter-camera colour calibration and patterns of activity

Graph Matching Iris Image Blocks with Local Binary Pattern

Practical Camera Auto-Calibration Based on Object Appearance and Motion for Traffic Scene Visual Surveillance

Pedestrian counting in video sequences using optical flow clustering

Automatic Shadow Removal by Illuminance in HSV Color Space

Research on Recognition and Classification of Moving Objects in Mixed Traffic Based on Video Detection

A Background Modeling Approach Based on Visual Background Extractor Taotao Liu1, a, Lin Qi2, b and Guichi Liu2, c

Sequences Modeling and Analysis Based on Complex Network

Robust color segmentation algorithms in illumination variation conditions

Iterative Removing Salt and Pepper Noise based on Neighbourhood Information

Research on the Checkpoint Server Selection Strategy Based on the Mobile Prediction in Autonomous Vehicular Cloud

An Automatic Timestamp Replanting Algorithm for Panorama Video Surveillance *

12/12 A Chinese Words Detection Method in Camera Based Images Qingmin Chen, Yi Zhou, Kai Chen, Li Song, Xiaokang Yang Institute of Image Communication

An Edge-Based Approach to Motion Detection*

Pairwise Threshold for Gaussian Mixture Classification and its Application on Human Tracking Enhancement

Research on Evaluation Method of Video Stabilization

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information

Statistical Inference of Motion in the Invisible

Story Unit Segmentation with Friendly Acoustic Perception *

Summarization of Egocentric Moving Videos for Generating Walking Route Guidance

Face Alignment Under Various Poses and Expressions

Fundamental Matrices from Moving Objects Using Line Motion Barcodes

arxiv: v1 [cs.cv] 20 Dec 2016

Clustering-Based Distributed Precomputation for Quality-of-Service Routing*

A Growth Measuring Approach for Maize Based on Computer Vision

Graph-based High Level Motion Segmentation using Normalized Cuts

A Linear Approximation Based Method for Noise-Robust and Illumination-Invariant Image Change Detection

Spatial Latent Dirichlet Allocation

PEOPLE IN SEATS COUNTING VIA SEAT DETECTION FOR MEETING SURVEILLANCE

Person Re-identification for Improved Multi-person Multi-camera Tracking by Continuous Entity Association

Minimal Test Cost Feature Selection with Positive Region Constraint

Moving cameras Multiple cameras

Multi-Camera Calibration, Object Tracking and Query Generation

Embedded Palmprint Recognition System on Mobile Devices

A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization

K-coverage prediction optimization for non-uniform motion objects in wireless video sensor networks

New structural similarity measure for image comparison

Evaluation of Moving Object Tracking Techniques for Video Surveillance Applications

Research on QR Code Image Pre-processing Algorithm under Complex Background

Real-time Vehicle Matching for Multi-camera Tunnel Surveillance

Measurement of Pedestrian Groups Using Subtraction Stereo

A Real Time Human Detection System Based on Far Infrared Vision

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

View Registration Using Interesting Segments of Planar Trajectories

An Objective Evaluation Methodology for Handwritten Image Document Binarization Techniques

A New Algorithm for Detecting Text Line in Handwritten Documents

A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

Spatio-Temporal LBP based Moving Object Segmentation in Compressed Domain

A Joint Histogram - 2D Correlation Measure for Incomplete Image Similarity

Inter-camera Association of Multi-target Tracks by On-Line Learned Appearance Affinity Models

RSRN: Rich Side-output Residual Network for Medial Axis Detection

Temporal Filtering of Depth Images using Optical Flow

Real-time target tracking using a Pan and Tilt platform

A multiple hypothesis tracking method with fragmentation handling

Accurate 3D Face and Body Modeling from a Single Fixed Kinect

EE795: Computer Vision and Intelligent Systems

ФУНДАМЕНТАЛЬНЫЕ НАУКИ. Информатика 9 ИНФОРМАТИКА MOTION DETECTION IN VIDEO STREAM BASED ON BACKGROUND SUBTRACTION AND TARGET TRACKING

Estimation Of Number Of People In Crowded Scenes Using Amid And Pdc

Reversible Image Data Hiding with Local Adaptive Contrast Enhancement

Multi-camera Pedestrian Tracking using Group Structure

Robust Fingertip Tracking with Improved Kalman Filter

Optical flow and tracking

Available online at ScienceDirect. Procedia Computer Science 96 (2016 )

Spatial Outlier Detection

Logical Templates for Feature Extraction in Fingerprint Images

2 Proposed Methodology

Available online at ScienceDirect. Procedia Computer Science 22 (2013 )

Trajectory Analysis and Semantic Region Modeling Using A Nonparametric Bayesian Model

arxiv: v1 [cond-mat.dis-nn] 30 Dec 2018

Detecting Anomalous Trajectories and Traffic Services

A Modified Mean Shift Algorithm for Visual Object Tracking

Fingerprint Ridge Distance Estimation: Algorithms and the Performance*

Multiple View Geometry of Projector-Camera Systems from Virtual Mutual Projection

Correcting User Guided Image Segmentation

Connected Component Analysis and Change Detection for Images

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Cluster Validation. Ke Chen. Reading: [25.1.2, KPM], [Wang et al., 2009], [Yang & Chen, 2011] COMP24111 Machine Learning

DYNAMIC BACKGROUND SUBTRACTION BASED ON SPATIAL EXTENDED CENTER-SYMMETRIC LOCAL BINARY PATTERN. Gengjian Xue, Jun Sun, Li Song

Robot localization method based on visual features and their geometric relationship

Human Gait Recognition Using Bezier Curves

A PMU-Based Three-Step Controlled Separation with Transient Stability Considerations

A Boosting-Based Framework for Self-Similar and Non-linear Internet Traffic Prediction

Automatic visual recognition for metro surveillance

Image Interpolation using Collaborative Filtering

CIS UDEL Working Notes on ImageCLEF 2015: Compound figure detection task

Path Recovery of a Disappearing Target in a Large Network of Cameras using Particle Filter

Simultaneous Vanishing Point Detection and Camera Calibration from Single Images

Figure 1 shows unstructured data when plotted on the co-ordinate axis

Person identification from spatio-temporal 3D gait

Image segmentation based on gray-level spatial correlation maximum between-cluster variance

A Heat-Map-based Algorithm for Recognizing Group Activities in Videos

A Novel Extreme Point Selection Algorithm in SIFT

Motion and Tracking. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE)

To be Bernoulli or to be Gaussian, for a Restricted Boltzmann Machine

Transcription:

Learning the Three Factors of a Non-overlapping Multi-camera Network Topology Xiaotang Chen, Kaiqi Huang, and Tieniu Tan National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences {xtchen,kqhuang,tnt}@nlpr.ia.ac.cn Abstract. In this paper, we propose an unsupervised approach for learning the three factors of the topology of a non-overlapping multicamera network, which are nodes, links, and transition time distributions. It is a cross-correlation based method. Different from previous methods, the proposed method can deal with large amounts of data without considering the size of time window. The connectivity between nodes is estimated based on the N-neighbor accumulated cross-correlations, as well as the transition time distribution for each link. Furthermore, integrated with similarity cues, the proposed method can be extended into weighted cross-correlation models for better performance. Experimental results both on simulated and real-life datasets demonstrate the effectiveness of the proposed method. Keywords: Topology recovering, Transition time distribution, Camera network, Non-overlapping views. 1 Introduction As the number of cameras used in the wide area video surveillance increases, multi-camera object tracking plays a more important role in understanding and analyzing the scenes. It is a challenging problem. Especially when cameras have non-overlapping views among them, the observations of the same object under different cameras are often widely separated in time and space. Thus lack of spatio-temporal cues between cameras makes it different from single camera object tracking or overlapping multi-camera object tracking. To compensate for lack of spatio-temporal cues across cameras, various strategies are proposed to recover the topology graph of the non-overlapping multi-camera network. Figure 1 shows a topology graph of a non-overlapping multi-camera network. The topology graph usually has three main factors: firstly, the nodes, from which objects enter or exit; secondly, the links between nodes, indicating the connectivity of each two nodes and corresponding to the real paths in the environment which can be followed by objects; thirdly, the transition time distribution for each link across cameras, demonstrating the probability of transition time of an object moving from one node to another. If we observe an object C.-L. Liu, C. Zhang, and L. Wang (Eds.): CCPR 2012, CCIS 321, pp. 104 112, 2012. c Springer-Verlag Berlin Heidelberg 2012

Topology Recovering of a Non-overlapping Multi-camera Network 105 Fig. 1. The topology graph of a non-overlapping multi-camera network. Nodes are entry/exit zones labeled by different numbers. The green solid arrows denote visible paths within the field of view (FoV) of each camera, which can be detected by single camera tracking. The red dotted arrows represent valid links between nodes across cameras, which depend on methods of recovering the topology to estimate the existence and corresponding transition time distributions. leaves the field of view of a camera at a moment, we can predict the object s re-appearance after some time under certain cameras using the knowledge of topology. 2 Related Work and Contributions Generally, the nodes are defined as entry/exit zones in the FoVs of cameras, which can be learned by clustering the starting or ending points of trajectories observed by single camera tracking [1 3], or defined as single cameras [4]. To estimate the existence of link between two nodes, and the transition time distribution for each link, the methods can be put into two categories. The first one is based on solving the correspondence problem [2, 5, 6] or object tracking [7]. Javed, O. et al [5] use Parzen windows to estimate the inter-camera spacetime probabilities from training data, assuming the correspondences are known. These methods usually have good estimations of the transition time distributions, however, solving the problem of correspondences or object tracking itself is complicated and challenging. The second one does not require establishing correspondences between observations or object tracking [1, 3, 8, 9]. Makris, D. et al [8] calculate a crosscorrelation function of two signals which represent the arrival event sequence observed at one node and the departure event sequence observed at the other node in a time window. Ideally if a link exists, then the cross-correlation has a clear peak around the most popular transition time. However, in most cases, the peak is not so clear due to the large variance of transition time of true correspondences and a large number of false correspondences which result from a large traffic flow or a long time window. To make the peak sharp, methods [3, 9] add similarity in appearance to weight the cross-correlation model. These

106 X. Chen, K. Huang, and T. Tan methods are usually easy to be implemented, however, few of them consider the estimation of transition time distributions. To estimate the transition time distribution, Zou, X. et al [9] fit K Gaussian functions to a normalized crosscorrelation using the EM algorithm, which is not proper for considering both true and false correspondences. Based on cross-correlation function, our method focuses on decreasing the large variance of transition time of true correspondences, which can compensate for the influence caused by large-scale false correspondences to a certain degree. Thus, the proposed method can deal with large amounts of data or a long time window. Based on an iteration, our method can estimate the transition time distribution for each valid link, which is different from other cross-correlation based methods. In addition, the proposed method avoids solving the problem of establishing correspondences between non-overlapping views, making it easy to be implemented. 3 Proposed Method In this paper, we represent the entry/exit zones in the FoVs of cameras as nodes in the topology, and estimate the locations of nodes by clustering the starting or ending points of trajectories observed by single camera tracking, similar to the node estimation in [1 3]. As we mentioned before, directly estimating the topology from cross-correlations suffers from large-scale false correspondences and the large variance of transition time of true correspondences. We assume that the transition time of true correspondences is within a restricted range of some popular transition time while transition time caused by false correspondences is irregular and widespread in the whole time axis. Based on this reasonable assumption, we compute an N-neighbor accumulated cross-correlation function for every two nodes across cameras to reduce the influence caused by large variance of transition time of true correspondences, which makes the peak clear and sharp. Given node i and node j from two cameras, we observe objects departing at node i and arriving at node j within a time window which is long enough. Let D i (t) anda j (t) denotes the departure time sequence at node i and arrival time sequence at node j respectively. Then the N-neighbor accumulated crosscorrelation function of D i (t) anda j (t) is computed as follows: R i,j n (τ n )= = = τ n+n τ 0=τ n n τ n+n τ 0=τ n n τ n+n R i,j 0 (τ 0) τ 0=τ n n t= E[D i (t) A j (t + τ 0 )] + D i (t) A j (t + τ 0 ),τ n n (1) where R i,j 0 (τ 0) represents the cross-correlation function of D i (t) anda j (t).

Topology Recovering of a Non-overlapping Multi-camera Network 107 Algorithm 1. The estimation of valid links and transition time distributions 1: input: D i(t), A j(t) 2: Initialize Link = False, MaxList(τ) =0,whereτ 0. 3: for n from 1 to M 1 do 4: Compute Rn i,j (τ n) according to Eq.1. 5: find τn = argmaxrn i,j (τ n) τ n MaxList(τn) MaxList(τ n)+1 6: if n M 2 then 7: StepMaxList(τ )= τ +T 2 MaxList(τ),τ T 2 τ=τ T 2 8: ratio =max(stepmaxlist)/σ(maxlist) 9: if ratio T 1 then 10: Link = True, break 11: end if 12: end if 13: end for 14: output: N(τn,n), when Link = True Considering the application that tracking pedestrians across non-overlapping cameras, we only deal with one traffic pattern (pedestrians), assuming that the transition time between each two nodes across cameras follows a normal distribution. We process an iteration to estimate the connectivity for each pair of nodes, and the parameters of the transition time distribution as well. The main idea of this iteration is to find the most steady and frequent peak in Rn i,j (τ n ) rather than a very clear peak in the cross-correlation. The details of the proposed method are summarized in Algorithm 1, where M 1 and M 2 set the upper and lower limits of n respectively. T 1 is a threshold above which a valid link is believed to exist. The parameter T 2 allows a small fluctuation of the average transition time. N(τn,n) is the estimated transition time distribution for a valid link, where τn demonstrates the average transition time. Figure 2 gives an example using the proposed method to detect a valid link under the condition that the pedestrian flow is large and the time window is very long (far longer than the average transition time 885), while general cross correlation [8] fails. There are two clear peaks in the cross-correlation using the method [8], while neither of them is around actual average transition time. Estimating transition time distributions by the EM algorithm based on the cross-correlations is improper because of so much noise, while using the proposed method, the transition time distribution of this link is estimated to be N(898, 139), very close to the ground truth N(885, 148). Combined with similarity cues, the proposed method can be applied to the weighted cross-correlation model [3, 9], by transforming the computation of R i,j n (τ n) from Eq.1 to Eq.2: R i,j n (τ n )= τ n+n + τ 0=τ n n t= Sim(O i (t),o j (t + τ 0 )),τ n n (2)

108 X. Chen, K. Huang, and T. Tan Fig. 2. The estimated cross-correlations. By the method in [8]; By our method without similarity cues (R 2,3 n (τ n), n = 139). where O i (t) ando j (t) denotes the departure object sequence at node i and arrival object sequence at node j respectively. Sim(, ) measures the similarity between each object in O i (t) and each object in O j (t). For this similarity measurement, any effective feature can be used. 4 Experimental Results We evaluate the performance of the proposed method both on simulated data and real-life data. 4.1 Simulated Experiments The simulation is based on a multi-camera network shown in Figure 3. In the network, the nodes in a closed dotted curve belong to the same camera. The departure time of 1000 moving objects follows a uniform distribution U(0, 1500), and the transition time between nodes follows a normal distribution N(300, 20). Each object is equally likely to arrive at any connected node after leaving any node(in the same camera or a different camera). M 1 and M 2 is set to 100 and 10 respectively in this case. The threshold T 1 is set to 0.98, and T 2 is set to 20. First, the proposed method is compared with a benchmark method [8]. In our experiments, the parameter ω in method [8] is set to 2, which controls the peak detection threshold. Then, we extend the proposed method into the weighted

Topology Recovering of a Non-overlapping Multi-camera Network 109 Fig. 3. Experimental setups. The network topology; The Gaussian models of similarity. The green area demonstrates the possibility of failures in object matching, which is true to fact. cross-correlation model according to Eq.2, and compare it with the method [3]. The similarity between the same objects and between two different objects follows N(0.7, 0.1) and N(0.4, 0.1) respectively, shown in Figure 3. Figure 4 shows the recovered topology graphs using different methods. Since we focus on estimating the connectivity between every two nodes across cameras, the links within the same FoVs are neglected. Previous methods [3, 8] have bad performance, due to the large amounts of traffic data and a long time window. Extended to the weighted cross-correlation model, the proposed method fully recovers the topology of the simulated network, as shown in Figure 4 (d). Our method also recovers the transition time distribution, not only an average transition time for each valid link. Estimated cross-correlations for the link from node 2 to node 1 are shown in Figure 5. Only our method successfully detects this valid link, and returns the average transition time after the nth iteration. N(267,12) N(300,10) N(293,13) N(304,22) N(308,15) N(270,26) N(284,16) N(304,12) (c) (d) Fig. 4. The recovered topology graphs. By the method in [8]; By our method without similarity cues; (c) By the method in [3]; (d) By our method with similarity cues. 4.2 Real-Life Experiments The real-life experimental setup of the network is shown in Figure 6. The network has three non-overlapping cameras, containing two, two and four entry/exit zones respectively. We use two-hour-long videos to recover the topology of the network. Gaussian Mixture Model is used to detect every pedestrian entering or leaving each node and the corresponding arrival time or departure time is also recorded.

110 X. Chen, K. Huang, and T. Tan (c) (d) Fig. 5. The estimated cross-correlations. By the method in [8]; By our method without similarity cues (Rn 2,1 (τ n), n = 22); (c) By the method in [3]; (d) By our method with similarity cues (Rn 2,1 (τ n), n = 15). Fig. 6. The layout of a non-overlapping multi-camera network; The corresponding topology graph N(898,139) N(885,148) N(870,118) N(890,124) N(990,140) N(974,177) Fig. 7. The recovered topology graph. The ground truth (red) is estimated by the EM algorithm based on only true correspondences labeled by hand.

Topology Recovering of a Non-overlapping Multi-camera Network 111 We do not set limits for the size of time window, so all the departure events and arrival events happened in the videos are used to recover the topology. M 1 and M 2 is set to 1000 and 10 respectively in this case. The threshold T 1 is set to 0.9, and T 2 is set to 50. The recovered topology is shown in Figure 7. The learned transition time distributions (black) provide good estimates of the ground truth. Although there is a real path from node 5 to node 4, the proposed method fails to detect this valid link because there are only two pedestrians walking from node 5 to node 4 among the 53 departure events detected in node 5 and 100 arrival events detected in node 4 in the whole videos. It is very difficult to detect this valid link based on so few true correspondences. Estimated cross-correlations for the link from node 2 to node 3 using the proposed method and the method [8] are shown in Figure 2. 5 Conclusions In this paper, we have presented a solution for automatically recovering the three factors of the topology of a non-overlapping camera network. Unlike previous cross-correlation based work, the proposed method can deal with large amounts of data without considering the size of time window. The connectivity for each pair of nodes is estimated based on the stability and frequency of peaks in the N- neighbor accumulated cross-correlations, which is more robust. Combined with similarity cues, the proposed method canbeappliedtoweightedcross-correlation models, which improves the performance. Future work will focus on extending the proposed method to large-scale multi-camera networks. Acknowledgement. This work is funded by National Natural Science Foundation of China (Grant No.61175007,61175002), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No.XDA06030300), the National Basic Research Program of China (Grant No.2012CB316302), the Tsinghua National Lab for Information Science and Technology Cross-discipline Foundation (Grant No.Y2U10 11MC1). References 1. Ellis, T.J., Makris, D., Black, J.K.: Learning a Multi-camera Topology. In: Joint IEEE Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), pp. 165 171 (2003) 2. Cai, Y., Huang, K., Tan, T., Pietikäinen, M.: Recovering the Topology of Multiple Cameras by Finding Continuous Paths in a Trellis. In: ICPR, pp. 3541 3544 (2010) 3. Niu, C., Grimson, E.: Recovering Non-overlapping Network Topology using Far-field Vehicle Tracking Data. In: ICPR, pp. 944 949 (2006) 4. Marinakis, D., Giguère, P., Dudek, G.: Learning network topology from simple sensor data. In: Kobti, Z., Wu, D. (eds.) Canadian AI 2007. LNCS (LNAI), vol. 4509, pp. 417 428. Springer, Heidelberg (2007)

112 X. Chen, K. Huang, and T. Tan 5. Javed, O., Rasheed, Z., Shafique, K., Shah, M.: Tracking across Multiple Cameras with Disjoint Views. In: ICCV, pp. 952 957 (2003) 6. Tieu, K., Dalley, G., Grimson, W.E.L.: Inference of Non-overlapping Camera Network Topology by Measuring Statistical Dependence. In: ICCV, pp. 1842 1849 (2005) 7. Nam, Y., Ryu, J., Choi, Y., Cho, W.: Learning Spatio-temporal Topology of a Multicamera Network by Tracking Multiple People. Proceedings of World Academy of Science, Engineering and Technology 24, 175 180 (2007) 8. Makris, D., Ellis, T., Black, J.: Bridging the Gaps between Cameras. In: CVPR, vol. 2, pp. 205 210 (2004) 9. Zou, X., Bhanu, B., Roy-Chowdhury, A.: Continuous Learning of a Multilayered Network Topology in a Video Camera Network. EURASIP Journal on Image and Video Processing (2009)