Improved Vehicle Classification in Long Traffic Video by Cooperating Tracker and Classifier Modules


Brendan Morris and Mohan Trivedi
University of California, San Diego
San Diego, CA 92093
{b1morris, trivedi}@ucsd.edu

Abstract

Visual surveillance systems intend to extract meaning from a scene. Two initial steps for this extraction are the detection and tracking of objects, followed by the classification of these objects. Often these are viewed as separate problems, where each is solved by an individual module. These tasks should not be done in isolation because they can help one another. This paper demonstrates the benefit gained in both tracking and classification through communication between these individual modules. This is shown on a real-time system monitoring highway traffic. The system retrieves online video at 10 frames/sec and conducts tracking and classification simultaneously. Results show an improvement from 74% to 88% classification accuracy.

1. Introduction

Video surveillance has prompted a wide variety of research, with tracking being one of the foremost topics [7]. Accurate tracking is possible even through many difficult situations such as changing lighting conditions, occlusion, or adverse weather. There is also the object recognition camp that seeks to determine the identity of an object visually [1]. Usually these are seen as two different problems, but in a scene where the objects of interest are in motion they are in fact complementary tasks [9]. Tracking and classification should both be implemented in a visual surveillance system because they are inherently linked in many higher level analyses. Accurate vehicle classification can be used for structural health monitoring [4], environmental studies on impacts from emissions [3], and road management and traffic planning. In a more general setting, tracking with classification can be particularly useful for re-identification [8] of vehicles through larger video networks with non-overlapping views or without time synchronization. Classification can also be used to provide context to systems that learn normal and abnormal behavior patterns [6]. (In a highway application one expects large trucks to travel in the slower lanes.) This paper demonstrates the benefit gained in both tracking and classification results through communication between the individual modules, demonstrated with a real-time system monitoring highway traffic. The system retrieves online video at 10 frames/sec and conducts tracking and classification simultaneously. Results show an improvement from 74% to 88% classification accuracy.

[Figure 1. Standard output frame from the system. Text accompanying each detection gives the detection number d, track number t, classification c (detection class, track class), and velocity v, displayed as {d# t# c# c# v#}.]

2. System Overview

The system presented in this paper is a general tracking system to be used as a utility for lab experiments. The goal was to develop tracking and classification software that can be used as a front end for higher level analyses.

[Figure 2. System block diagram with interconnects between classification and tracking.]

The experimental test bed consists of 10 cameras situated around campus, offering a wide variety of scenes from highway to foot traffic. Video is streamed via the internet using Axis video servers at 10 frames a second. This software can be run in real time for long periods (data for this paper was collected over 24 hours), collecting data and statistics that can be stored for future investigation [2]. A block diagram for this system, with its four main blocks, the Object Detection, Detection Classification, Tracking, and Track Classification modules, is shown in Fig. 2. The Object Detection module locates potential object pixels by constructing a background model and performing background subtraction. The Detection Classification module takes measurements on connected-component object blobs to classify the object type. The Tracking module tracks blobs using a Kalman filter and the object measurements. Finally, the Track Classification module uses the tracking information to refine the object class estimate from the Detection Classification block. A typical output frame from this system is shown in Fig. 1. The labels above each vehicle are of the form {d# t# c# c# v#}, with d being the detection number, t the track number, c the classification (detection class, track class), and finally a rough velocity in mph.

3. Object Detection

The Object Detection module quickly determines foreground pixels using an adaptive background subtraction scheme. The background model is composed of two parameters: µ, a time-averaged background image of the scene, and σ, a measure of the variability in the scene. The background model is adaptively updated as each new video frame is received by computing a running average, where the contribution of the newest frame, I_t, is controlled by the parameter α ∈ [0, 1],

    µ_t = (1 − α) µ_{t−1} + α I_t,                          (1)
    σ_t² = (1 − α) σ_{t−1}² + α (I_t − µ_t)².               (2)

The foreground pixels are extracted by background subtraction and thresholding, where the threshold is determined by the past deviations of a pixel (σ_0 is a small constant to suppress noise),

    I_foreground = (I_t − µ_t) > T (σ_t + σ_0).             (3)

The foreground is further processed to fill in any holes with morphological operations. Each blob is then labeled by connected-component analysis, generating a unique identifier for further processing.
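The adaptive background model of Eqs. (1)-(3) can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the authors' code: the function names, the threshold T, and the noise floor sigma0 are assumptions, and the morphological hole filling and connected-component labeling steps are omitted.

```python
import numpy as np

def update_background(mu, var, frame, alpha=0.01):
    """Running-average background update, Eqs. (1)-(2)."""
    mu = (1 - alpha) * mu + alpha * frame
    var = (1 - alpha) * var + alpha * (frame - mu) ** 2
    return mu, var

def foreground_mask(mu, var, frame, T=3.0, sigma0=5.0):
    """Threshold on past pixel deviations, Eq. (3)."""
    return (frame - mu) > T * (np.sqrt(var) + sigma0)

# Hypothetical usage on a grayscale float32 frame stream:
# mu, var = first_frame.astype(np.float32), np.full_like(first_frame, 25.0)
# for frame in frames:
#     mask = foreground_mask(mu, var, frame)
#     mu, var = update_background(mu, var, frame)
```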
4. Detection Classification

The Detection Classification module takes measurements of each foreground blob. The measurements are intended to characterize an object by providing a unique signature of any potential scene object. The measurement vector used here is composed of 17 simple blob features {area, breadth, compactness, elongation, perimeter, convex hull perimeter, length, long and short axis of fitted ellipse, roughness, centroid, 5 image moments}, x = [m_0, ..., m_16]^T. The object class is determined by transforming x and comparing the transformed vector with a set of training examples. The classifier is trained by collecting measurement samples and performing linear discriminant analysis (LDA) [5] to project the data onto a lower dimensional space better suited for classification. The objects are then compared in this projection space using a weighted K nearest neighbor (wKNN) classifier. The training set is chosen to have the same number of examples of each class to maintain comparison fairness. The training set is made up of prototype measurement vectors learned by clustering with fuzzy C means (FCM). The details of the classification scheme are given in the following sections.

4.1. LDA

Classification is performed in a lower dimensional space constructed using linear discriminant analysis. LDA designs a space by transforming the features in a training set to maximize the distance between classes. Let D_c = {x_1, ..., x_{N_c}} be the set of N_c training vectors for class c, each of dimension d, with mean µ_c = (1/N_c) Σ_i x_i. The full training set, D = {D_1, ..., D_C}, is composed of the training samples from all classes and has mean µ = (1/N) Σ_i x_i, where N = Σ_c N_c. The LDA projection is found by the maximization problem

    P_LDA = argmax_w (w^T S_B w) / (w^T S_W w),                        (4)

where S_B is the between-class scatter matrix and S_W is the within-class scatter matrix, given by

    S_B = Σ_{i=1}^{C} N_i (µ_i − µ)(µ_i − µ)^T,                        (5)
    S_W = Σ_{i=1}^{C} Σ_{x_k ∈ D_i} (x_k − µ_i)(x_k − µ_i)^T.          (6)

The solution to this maximization leads to the generalized eigenproblem S_B w = λ S_W w. The top M eigenvectors are retained to obtain the LDA projection matrix,

    x_LDA = P_LDA x = [w_1, ..., w_i, ..., w_M] x.                     (7)

The detection measurements are transformed by projecting them onto the LDA space using P_LDA, where classification can occur using weighted K nearest neighbors.
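A minimal sketch of the LDA projection in Eqs. (4)-(7), using NumPy. The function name and the pseudo-inverse route to the generalized eigenproblem are choices made for this illustration, not details taken from the paper.

```python
import numpy as np

def lda_projection(X, y, M=5):
    """Return the top-M LDA directions for data X (n x d) with labels y."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mu_c - mu, mu_c - mu)   # Eq. (5)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)                # Eq. (6)
    # Generalized eigenproblem S_B w = lambda S_W w, solved via pinv(S_W) S_B
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)
    return evecs[:, order[:M]].real.T                    # M x d projection matrix

# x_lda = lda_projection(X_train, y_train) @ x  projects one measurement, Eq. (7)
```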

4.2. wKNN

The wKNN rule [10] is a modification of the nearest neighbor (NN) classifier. The advantage of wKNN is that each sample is assigned a weight for every class, while NN only gives a binary indication of class membership. This class weight is a soft membership to each class, which builds robustness to noise and outliers. The weight for class c, w_c, is determined by adding the similarity of the K closest training samples with label c. The similarity is defined as the inverse of the Euclidean distance between vectors. The label of an individual detection, L_D, is the class that has the highest weight,

    w_c = Σ_{x_i ∈ K_c} 1 / ||x_i − x_test||,                          (8)
    L_D = argmax_c w_c,                                                (9)

where K_c denotes the K closest training samples with label c.

4.3. FCM

Using a NN derivative makes classification inherently dependent on the training set. The training set must be diverse enough to capture all desired classes and contain ample variability to distinguish between these classes. When collecting samples, the training set will be biased toward the most often occurring class. (The number of sedans far exceeds the number of semi trucks in highway surveillance.) Fairness is introduced to the wKNN classifier by normalizing each class to have the same number of training samples (N_p). These prototype training vectors are learned using Fuzzy C Means [11] to iteratively minimize the loss function

    Q = Σ_{i=1}^{N_p} Σ_{k=1}^{N} u_ik^m ||x_k − v_i||²,               (10)

with the membership constraint

    Σ_{i=1}^{N_p} u_ik = 1.                                            (11)

Here x_k is a data point, v_i a cluster prototype, u_ik ∈ [0, 1] is the membership of sample k to prototype i, and m > 1 is a fuzzification factor. This problem is solved by minimizing the objective function (10) subject to the constraint (11) using the method of Lagrange multipliers. The minimization leads to the following updates for the prototype vectors v_i and memberships u_ik,

    v_i = ( Σ_{k=1}^{N} u_ik^m x_k ) / ( Σ_{k=1}^{N} u_ik^m ),                    (12)
    u_ik = [ Σ_{j=1}^{N_p} ( ||x_k − v_i|| / ||x_k − v_j|| )^{2/(m−1)} ]^{−1}.    (13)

The prototype vectors are used as the training set for wKNN. (The training set can be adapted to new samples by using the membership score, v_j = u_ij x_i + (1 − u_ij) v_j, but this has not been implemented.)
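A compact sketch of the FCM update loop of Eqs. (12)-(13). The convergence tolerance, iteration cap, random initialization, and function name are assumptions made for this illustration rather than details from the paper.

```python
import numpy as np

def fcm_prototypes(X, n_prototypes, m=2.0, iters=100, tol=1e-5, seed=0):
    """Fuzzy C Means: return prototype vectors V (n_prototypes x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((n_prototypes, len(X)))
    U /= U.sum(axis=0, keepdims=True)                 # membership constraint, Eq. (11)
    for _ in range(iters):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # prototype update, Eq. (12)
        dist = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        U_new = dist ** (-2.0 / (m - 1.0))
        U_new /= U_new.sum(axis=0, keepdims=True)     # membership update, Eq. (13)
        if np.abs(U_new - U).max() < tol:
            return V
        U = U_new
    return V

# One balanced prototype set per class, e.g. 149 prototypes each as in the paper:
# prototypes_by_class = {c: fcm_prototypes(X_lda[y == c], 149) for c in classes}
```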

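Using the per-class prototype sets produced above, the wKNN rule of Section 4.2 (Eqs. (8)-(9)) can be sketched as below; the dictionary-of-prototypes data layout is an implementation choice for this illustration.

```python
import numpy as np

def wknn_weights(x_test, prototypes_by_class, K=5):
    """Class weight w_c: sum of inverse distances to the K closest prototypes of class c, Eq. (8)."""
    weights = {}
    for c, P in prototypes_by_class.items():
        d = np.sort(np.linalg.norm(P - x_test, axis=1))[:K]
        weights[c] = float(np.sum(1.0 / (d + 1e-12)))
    return weights

def classify_detection(x_test, prototypes_by_class, K=5):
    """Detection label L_D = argmax_c w_c, Eq. (9); also return the weights for later use."""
    w = wknn_weights(x_test, prototypes_by_class, K)
    return max(w, key=w.get), w
```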
5. Tracking

The Tracking module is based on the centroid of detected blobs. The centroid of each blob is tracked using a constant velocity model Kalman filter. The state of the filter is the centroid location and velocity, s = [c_x, c_y, v_x, v_y]^T, and the measurement is an estimate of this entire state, y = ŝ = [ĉ_x, ĉ_y, v̂_x, v̂_y]^T. The data association problem between multiple blobs is solved by comparing the predicted centroid location with the centroids of the detections in the current frame. The blob with centroid closest to the predicted location is chosen as a match for the track. In addition to the Kalman filter, each track maintains a history of the measurements of detections belonging to the track. When a new detection is associated to a track, the track history is updated,

    x_t^track = (1 − α) x_{t−1}^track + α x_t^detection.               (14)

Similar to the background update, α ∈ [0, 1], but here it controls how similar measurements from successive detections must be along the track. A larger α is used when objects have larger variability along a track. The track measurement history is used to enforce consistency between a potential detection and a track. In addition to being in the predicted location, a matched object must also have similar measurements (S_meas > T_S). The similarity between a track and a test detection is defined as

    S_meas = [ (x_track − x_test)^T Σ^{−1} (x_track − x_test) ]^{−1},  (15)

where Σ is a diagonal matrix with entries equal to the variance of each particular measurement, learned during training. Fig. 3 shows a track correctly being split into 2 new tracks because the measurement constraint was violated. Even with the measurement constraint there are still cases where tracks are difficult to disambiguate, as seen in Fig. 5. When the merged sedans are split into 3 new tracks, the Kalman filter has not had time to initialize a velocity before tracking makes the mistake of linking the wrong vehicle. This incorrect linking actually occurs twice in the last 2 frames as the middle car gets associated with the track actually belonging to the bottom sedan.

[Figure 3. Example of the track measurement consistency constraint; 3(a) and 3(b) show a track being split. (a) Frame 4: two vehicles merged from background detection. (b) Frame 5: two new tracks are instantiated because the track measurement constraint was violated.]

[Figure 5. Difficulties tracking even with measurement constraints. After the split all three of the vehicles appear quite similar, confusing the track correspondences and causing multiple track splits.]
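A minimal constant-velocity Kalman filter over the state s = [c_x, c_y, v_x, v_y]^T, with the measurement being an estimate of the full state as described above. The noise covariances Q and R are illustrative values, not taken from the paper.

```python
import numpy as np

DT = 0.1                                  # 10 frames/sec
F = np.array([[1, 0, DT, 0],              # constant-velocity transition
              [0, 1, 0, DT],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], float)
H = np.eye(4)                             # measurement estimates the full state
Q = 0.1 * np.eye(4)                       # process noise (assumed)
R = 4.0 * np.eye(4)                       # measurement noise (assumed)

def kf_predict(s, P):
    return F @ s, F @ P @ F.T + Q

def kf_update(s, P, y):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    s = s + K @ (y - H @ s)
    P = (np.eye(4) - K @ H) @ P
    return s, P
```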

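The per-frame association step and the track-history update of Eqs. (14)-(15) might then look roughly like the sketch below. The Track fields, the detection dictionaries, and the greedy nearest-centroid matching are assumptions for this illustration; the paper does not give implementation details beyond what is described in Section 5.

```python
import numpy as np

class Track:
    def __init__(self, centroid, measurement):
        self.predicted = np.asarray(centroid, float)   # from the Kalman predict step
        self.history = np.asarray(measurement, float)  # running feature history x_track

def associate_and_update(tracks, detections, meas_var, alpha=0.3, T_S=1.0):
    """Greedy nearest-centroid association gated by the measurement-consistency test."""
    inv_var = 1.0 / meas_var                           # diagonal Sigma^{-1}
    for track in tracks:
        if not detections:
            break
        # nearest detection centroid to the Kalman prediction
        i = min(range(len(detections)),
                key=lambda j: np.linalg.norm(detections[j]["centroid"] - track.predicted))
        diff = track.history - detections[i]["features"]
        s_meas = 1.0 / float(diff @ (inv_var * diff))  # Eq. (15)
        if s_meas > T_S:                               # consistent: update history, Eq. (14)
            track.history = (1 - alpha) * track.history + alpha * detections[i]["features"]
            detections.pop(i)
        # otherwise the detection is left to seed a new (split) track
    return tracks, detections
```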
6. Track Classification

Tracking gives a record of an object while it is in the camera field of view. Each time instant along a track is an example of the object, giving us T examples over the course of a track (T does not have to be the end of a track). Given these T samples, the Track Classification module generates the object class by maximum likelihood estimation,

    L_T = argmax_c Σ_{t=1}^{T} ln p_c(x_t)                             (16)
        = argmax_c Σ_{t=1}^{T} ln ( w_c / Σ_{c'} w_{c'} ).             (17)

The likelihood p_c(x_t) of class c is approximated by normalizing (8) to be a valid probability. The track class is refined each frame as the track is updated. The track label takes into account all the evidence throughout the entire track to make a decision on class type, rather than a single frame measurement that could potentially be corrupted by many sorts of noise. The final track label is the class assigned last, before the track ends. Fig. 4 gives examples of the track classifier overcoming incorrect detection classification results. In Fig. 4(a) the Detection Classification is 2 (SUV) but the Track Classification is 4 (Van). Fig. 4(b) is the opposite case, where the detection label is incorrect (4 - Van) but the tracking label maintains the true vehicle identity (2 - SUV).

[Figure 4. Examples of track classification correcting misclassified detections. (a) Track 40: Van misclassified as SUV (2) by the Detection Classifier but correctly labeled by the Track Classifier as Van (4). (b) Track 58: SUV misclassified as Van (4) by the Detection Classifier but correctly labeled by the Track Classifier as SUV (2).]
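The running track-level decision of Eqs. (16)-(17) can be sketched by accumulating the log of normalized wKNN weights frame by frame; reusing the hypothetical wknn_weights helper from the earlier sketch is an assumption of this illustration.

```python
import math

def update_track_class(log_likelihood, frame_weights):
    """Accumulate the log of normalized per-class weights, Eqs. (16)-(17)."""
    total = sum(frame_weights.values()) + 1e-12
    for c, w in frame_weights.items():
        p = max(w / total, 1e-12)          # p_c(x_t) from normalizing Eq. (8)
        log_likelihood[c] = log_likelihood.get(c, 0.0) + math.log(p)
    return log_likelihood

def track_label(log_likelihood):
    """Current track class L_T: argmax of the accumulated sum."""
    return max(log_likelihood, key=log_likelihood.get)

# Per track: ll = {}; each frame:
# ll = update_track_class(ll, wknn_weights(x_t, prototypes_by_class))
```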

7. Results

The proposed tracking + classification system was run for 24 hours to test the improvements of track-based classification. The video stream analyzed was a highway scene streamed over the internet and processed at between 10-15 frames/sec at 352x240 resolution. Since this system is able to run for long periods of time, it is not feasible to store all the video results. Instead, 5 minute output clips were saved every hour for evaluation. The training data consisted of 1700 vehicles divided into 7 classes. The 7 different vehicle classes were 0 - Sedan, 1 - Truck, 2 - SUV, 3 - Semi, 4 - Van, 5 - Truck+SUV+Van (TSV), and 6 - Moving Truck (MT). The LDA projection was found by retaining the top M = 5 eigenvectors of (7). Using FCM, 1043 training prototypes were generated (149 for each class). All classification results were then computed using wKNN with K = 5 with respect to these prototypes.

Tables 1 and 2 give the classification accuracy after hand labeling the true vehicle classes for two of the videos.

Table 1. Detection classification accuracy results
Class   0-Sedan  1-Truck  2-SUV  3-Semi  4-Van  5-TSV  6-MT   Total
%       81.7     76.2     63.3   62.5    62.2   62.2   100    74.4

Table 2. Track classification accuracy results
Class   0-Sedan  1-Truck  2-SUV  3-Semi  4-Van  5-TSV  6-MT   Total
%       94.3     87.5     75.0   100     90.5   0      85.71  88.4

The lower rates seen for the Detection Classifier are because of the similarity between vehicle classes. Vans and SUVs are quite similar, as are the Semi and Moving Trucks. Note that none of the TSV vehicles were properly classified after tracking. This is because the TSV class was a wrapper class for Truck, SUV, and Van, which were previously found to be the most often confused vehicles [9]. This label was used sparingly, for the rare occurrence of a vehicle that even a human could not distinguish, making it a class of hard examples. Because of its rarity and strong similarity to 3 other classes, the Track Classifier chose to label all such vehicles with a less general label: the TSV examples were placed into either the Truck, SUV, or Van class. Even with the difficulties disambiguating classes based on single frame detections, the Track Classifier performs quite well, with a total improvement of over 10%. Unfortunately this classifier did not work well at all times of the day. At night the classifier was useless because low light conditions produced incomplete detections.

8. Conclusions

Separately, tracking and classification are two important tasks of any surveillance system. Performing both operations in conjunction delivers improved performance in both. This paper demonstrates this improvement through experiments run on live video. Data was captured and processed in real time over long periods. Analysis of the system output showed an improvement of 10% over single frame classification using a track based classifier, as well as more consistent vehicle tracks.

References

[1] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Machine Intell., 19(7):711-720, July 1997.
[2] S. Bhonsle, M. Trivedi, and A. Gupta. Database-centered architecture for traffic incident detection, management, and analysis. In Proc. IEEE Conf. on Intell. Transport. Syst., pages 149-154, Dearborn, Michigan, Oct. 2000.
[3] C. Cardelino. Daily variability of motor vehicle emissions derived from traffic counter data. Journal of the Air and Waste Management Association, 48(7), July 1998.
[4] R. Chang, T. Gandhi, and M. M. Trivedi. Vision modules for a multi-sensory bridge monitoring approach. In Proc. IEEE Conf. on Intell. Transport. Syst., pages 971-976, Oct. 2004.
[5] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience, New York, NY, second edition, 2001.
[6] W. Hu, X. Xiao, D. Xie, T. Tan, and S. Maybank. Traffic accident prediction using 3-D model-based vehicle tracking. IEEE Trans. Veh. Technol., 53(3):677-694, May 2004.
[7] V. Kastrinaki, M. Zervakis, and K. Kalaitzakis. A survey of video processing techniques for traffic applications. Image and Vision Computing, 21(4):359-381, Apr. 2003.
[8] G. T. Kogut and M. M. Trivedi. Maintaining the identity of multiple vehicles as they travel through a video network. In Proc. IEEE Conf. on Intell. Transport. Syst., pages 756-761, Oakland, California, Aug. 2001.
[9] B. T. Morris and M. M. Trivedi. Robust classification and tracking of vehicles in traffic video streams. In Proc. IEEE Conf. on Intell. Transport. Syst., Toronto, Canada, Sept. 2006. To be published.
[10] O. Hasegawa and T. Kanade. Type classification, color estimation, and specific target detection of moving targets on public streets. Machine Vision and Applications, 16:116-121, Feb. 2005.
[11] W. Pedrycz. Knowledge-Based Clustering: From Data to Information Granules. John Wiley, Hoboken, New Jersey, 2005.