Detecting Errors and Imputing Missing Data for Single-Loop Surveillance Systems


Transportation Research Record 1855, Paper No. 3-357

Chao Chen, Jaimyoung Kwon, John Rice, Alexander Skabardonis, and Pravin Varaiya

Single-loop detectors provide the most abundant source of traffic data in California, but loop data samples are often missing or invalid. A method is described that detects bad data samples and imputes missing or bad samples to form a complete grid of clean data, in real time. The diagnostics algorithm and the imputation algorithm that implement this method are operational on 14,871 loops in six districts of the California Department of Transportation. The diagnostics algorithm detects bad (malfunctioning) single-loop detectors from their volume and occupancy measurements. Its novelty is its use of time series of many samples, instead of basing decisions on single samples, as in previous approaches. The imputation algorithm models the relationship between neighboring loops as linear and uses linear regression to estimate the value of missing or bad samples. This gives a better estimate than previous methods because it uses historical data to learn how pairs of neighboring loops behave. Detection of bad loops and imputation of loop data are important because they allow algorithms that use loop data to perform analysis without requiring them to compensate for missing or incorrect data samples.

Loop detectors are the best source of real-time freeway traffic data today. In California, these detectors cover most urban freeways. But loop detector data contain many holes (missing values) or bad (incorrect) values and require careful cleaning to produce reliable results. Bad or missing samples present problems for any algorithm that uses the data for analysis. Therefore, one must both detect when data are bad and throw them out and then fill holes in the data with imputed values. The goal is to produce a complete grid of reliable data. Analyses that use such a complete data set can be trusted. Bad data must be detected from the measurements themselves.
The problem has been studied by FHWA, the Washington State Department of Transportation, and others. Existing algorithms usually work on the raw 20-s or 30-s data and produce a diagnosis for each sample. But it is difficult to tell if a single 20-s sample is good or bad unless it is very abnormal. Fortunately, loop detectors do not just give random errors: some loops produce reasonable data all the time, while others produce suspect data all the time. By examining a time series of measurements, one can readily distinguish bad behavior from good. The diagnostics algorithm presented here examines a day's worth of samples together, producing convincing results.

Once bad samples are thrown out, the resulting holes in the data must be filled with imputed values. Imputation with time series analysis has been suggested, but these imputations are effective only for short periods of missing data; linear interpolation and neighborhood averages are natural imputation methods, but they do not use all the relevant data that are available. The imputation algorithm presented here estimates values at a detector by using data from its neighbors. The algorithm models each pair of neighbors linearly and fits its parameters on historical data. It is robust and performs better than other methods.

C. Chen and P. Varaiya, Department of Electrical Engineering and Computer Science, and J. Kwon and J. Rice, Department of Statistics, University of California, Berkeley, CA 94720. A. Skabardonis, Institute of Transportation Studies, University of California, Berkeley, CA 94720-1720.

DESCRIPTION OF DATA

The freeway performance measurement system (PeMS) collects, stores, and analyzes data from thousands of loop detectors in six districts of the California Department of Transportation (Caltrans) (Transacct.eecs.berkeley.edu, 1). The PeMS database currently has 1 terabyte of data online, and it collects more than 2 GB of data per day. PeMS uses the data to compute freeway usage and congestion delays, measure and predict travel time, evaluate ramp-metering methods, and validate traffic theories.
There are 14,871 mainline loops in the PeMS database from six Caltrans districts. The results presented here are for mainline loops. Each loop reports the volume $q_i(t)$, the number of vehicles that cross the loop detector during a 30-s interval $t$, and occupancy $k_i(t)$, the fraction of this interval during which there is a vehicle above the loop. Each pair of volume and occupancy observations is called a sample. The number of total possible samples in 1 day from mainline loops in PeMS is therefore (14,871 loops) × (2,880 samples per loop per day) ≈ 42.8 million samples. In reality, however, PeMS never receives all the samples. For example, Los Angeles has a missing sample rate of about 15%. While it is clear when samples are missed, it is harder to tell when a received sample is bad or incorrect. A diagnostics test needs to accept or reject samples on the basis of an assumption of what good and bad samples look like.

EXISTING DATA-RELIABILITY TESTS

Loop data errors have plagued their effective use for a long time. In 1976, Payne et al. identified five types of detector error and presented several methods to detect them from 20-s and 5-min volume and occupancy measurements (2). These methods place thresholds on minimum and maximum flow, density, and speed and declare a sample to be invalid if it fails any of the tests. Later, Jacobson et al. defined an acceptable region in the k-q plane and declared samples to be good only if they fell inside the region (3). This is called the Washington algorithm in this paper. The boundaries of the acceptable region are defined by a set of parameters, which are calibrated from historical data or derived from traffic theory. Existing detection algorithms (2-4) try to catch the errors described by Payne et al. (2). For example, chattering and pulse breakup cause

$q$ to be high, so a threshold on $q$ can catch these errors. But some errors cannot be caught this way, such as a detector stuck in the off ($q = 0$, $k = 0$) position. Payne's algorithm would identify this as a bad point, but good detectors will also report (0, 0) when there are no vehicles in the detection period. Eliminating all (0, 0) points introduces a positive bias in the data. On the other hand, the Washington algorithm accepts the (0, 0) point, but doing so makes it unable to detect the stuck type of error. A threshold on occupancy is similarly hard to set. An occupancy value of 0.5 for one 30-s period should not indicate an error, but a large number of 30-s samples with occupancies of 0.5, especially during nonpeak periods, points to a malfunction.

The Washington algorithm was implemented in Matlab and tested on 30-s data from two loops in Los Angeles for 1 day on August 7, 2001. The acceptable region is taken from Jacobson et al. (3). The data and their diagnoses are shown in Figure 1. Visually, Loop 1 looks good (Figure 1b), and Loop 2 looks bad (Figure 1d). Loop 2 looks bad because there are many samples with $k = 70\%$ and $q = 0$ as well as many samples with occupancies that appear too high, even during nonpeak periods and when Loop 1 shows low occupancy. The Washington algorithm, however, does not make the correct diagnosis. Of 2,875 samples, it declared 1,138 samples to be bad for Loop 1 and 883 bad for Loop 2. In both loops, there were many false alarms. This is because the maximum acceptable slope of $q/k$ was exceeded by many samples in free flow. This suggests that the algorithm is very sensitive to thresholds and needs to be calibrated for California. Calibration is impractical because each loop will need a separate acceptable region, and ground truth would be difficult to get. There are also false negatives: many samples from Loop 2 appear to be bad because they have high occupancies during off-peak times, but they were not detected by the Washington algorithm.
This illustrates a difficulty with the threshold method: the acceptable region has to be very large, because there are many possible traffic states within a 30-s period. On the other hand, much more information can be gained by looking at how a detector behaves over many sample times. This is why Loop 1 is easily recognized as good and Loop 2 as bad by looking at their $k(t)$ plots, and this is a key insight that led to the diagnostics algorithm.

FIGURE 1 Washington algorithm on two loops: Loop 1 (a) volume versus occupancy and (b) occupancy; Loop 2 (c) volume versus occupancy and (d) occupancy. Occupancy is in percent. Loops are in Los Angeles, Interstate 5 North, postmile 7.8, Lanes 1 and 2.

PROPOSED DETECTOR DIAGNOSTICS ALGORITHM

Design

The algorithm for loop-error detection uses the time series of flow and occupancy measurements instead of making a decision based on an individual sample. It is based on the empirical observation that good and bad detectors behave very differently over time. For example, at any given instant, the flow and occupancy at a detector location can have a wide range of values, and one cannot rule out most of them; but over a day, most detectors show a similar pattern: flow and occupancy are high in the rush hours and low late at night. Figures 2a and 2b show typical 30-s flow and occupancy measurements at Vehicle Detector Station 759531. Most loops have outputs that look like this, but some loops behave quite differently. Figures 2c and 2d give an example of a bad loop (Vehicle Detector Station 759518). This loop has zero flow and an occupancy value of 0.7 for several hours during the evening peak period; clearly, these values must be incorrect.

Four types of abnormal time series behavior were found, and these are given in Table 1. Types 1 and 4 are self-explanatory;

Types 2 and 3 are illustrated in Figures 2c, 2d, and 1d. The errors in Table 1 are not mutually exclusive. For example, a loop with all zero occupancy values exhibits both Type 1 and Type 4 errors. A loop is declared bad if it is in any of these categories.

FIGURE 2 (a, c) Typical and abnormal 30-s flow; (b, d) occupancy measurements (lane 4 volume and lane 4 occupancy).

No significant number was found of loops with chatter or pulse breakup, which would produce abnormally high volumes. Therefore, the current form of the detection algorithm does not check for this condition. However, a fifth error type and error check can easily be added to the algorithm to flag loops with consistently high counts.

The daily statistics algorithm (DSA) was developed to recognize error Types 1 through 4. The input to the algorithm is the time series of 30-s measurements $q_i(d, t)$ and $k_i(d, t)$, where $d$ is the index of the day and $t = 0, 1, 2, \ldots, 2{,}879$ is the 30-s sample number; the output is the diagnosis $\Delta_i(d)$ for the $d$th day: $\Delta_i(d) = 0$ if the loop is good and $\Delta_i(d) = 1$ if the loop is bad. In contrast to existing algorithms that operate on each sample, DSA produces one diagnosis for all the samples of a loop on each day.

Only samples between 5:00 a.m. and 10:00 p.m. were used for the diagnostics, because outside this period, it is more difficult to tell the difference between good and bad loops. There are 2,041 30-s samples in this period; therefore, the algorithm is a function of 2,041 × 2 = 4,082 variables. Thus the diagnostic $\Delta_i(d)$ on day $d$ is a function,

$$\Delta_i(d) = f\big(q_i(d, a), q_i(d, a+1), \ldots, q_i(d, b),\ k_i(d, a), k_i(d, a+1), \ldots, k_i(d, b)\big),$$

where $a = 5 \times 120 = 600$ is the sample number at 5:00 a.m. and $b = 22 \times 120 = 2{,}640$ is the last sample number, at 10:00 p.m. To deal with the large number of variables, first reduce them to four statistics, $S_1, \ldots, S_4$, which are appropriate summaries of the time series. Their definitions are given in Table 2, where $S_j(i, d)$ is the $j$th statistic computed for the $i$th loop on the $d$th day.
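The index arithmetic for the diagnostic window can be checked with a short sketch. The helper names `sample_index` and `diagnostic_window` are hypothetical and used only for illustration; the sketch assumes a full day is stored as a list of 2,880 values indexed from 0.

```python
SAMPLE_SECONDS = 30
SAMPLES_PER_HOUR = 3600 // SAMPLE_SECONDS   # 120 samples per hour
SAMPLES_PER_DAY = 24 * SAMPLES_PER_HOUR     # 2,880 samples per day


def sample_index(hour, minute=0):
    """Sample number of the 30-s interval starting at hour:minute (hypothetical helper)."""
    return hour * SAMPLES_PER_HOUR + (minute * 60) // SAMPLE_SECONDS


a = sample_index(5)     # 5:00 a.m.
b = sample_index(22)    # 10:00 p.m.
n_window = b - a + 1    # both endpoints are included


def diagnostic_window(series):
    """Return samples a..b (inclusive) of a full-day series of 2,880 values."""
    return series[a:b + 1]
```

With these definitions, `a`, `b`, and `n_window` reproduce the values 600, 2,640, and 2,041 quoted in the text.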
The decision becomes a function of these four variables. For the $i$th loop and $d$th day, the decision whether the loop is bad or good is determined according to the rule

$$\Delta_i(d) = \begin{cases} 1 & \text{if } S_1(i,d) > s_1^* \text{ or } S_2(i,d) > s_2^* \text{ or } S_3(i,d) > s_3^* \text{ or } S_4(i,d) < s_4^* \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

where $s_j^*$ are thresholds on each statistic.

TABLE 1 Loop Detector Data Error

| Error Type | Description | Likely Cause | Fraction of Loops in District 12 |
|---|---|---|---|
| 1 | Occupancy and flow are mostly zero | Stuck off | 5.6% |
| 2 | Nonzero occupancy and zero flow (see Figures 2c and 2d) | Hanging on | 5.5% |
| 3 | Very high occupancy (see Figure 1d) | Hanging on | 9.6% |
| 4 | Constant occupancy and flow | Stuck on or off | 11.2% |
| All errors | | | 16% |

TABLE 2 Statistics for Diagnostics

| Name | Definition | Description |
|---|---|---|
| $S_1(i,d)$ | $\sum_{a \le t \le b} \mathbf{1}\big(k_i(d,t) = 0\big)$ | Number of samples that have occupancy = 0 |
| $S_2(i,d)$ | $\sum_{a \le t \le b} \mathbf{1}\big(k_i(d,t) > 0\big)\,\mathbf{1}\big(q_i(d,t) = 0\big)$ | Number of samples that have occupancy > 0 and flow = 0 |
| $S_3(i,d)$ | $\sum_{a \le t \le b} \mathbf{1}\big(k_i(d,t) > k^*\big)$, $k^* = 0.35$ | Number of samples that have occupancy > $k^*$ (= 0.35) |
| $S_4(i,d)$ | $-\sum_{x:\,\hat p(x) > 0} \hat p(x) \log \hat p(x)$, with $\hat p(x) = \dfrac{\sum_{a \le t \le b} \mathbf{1}\big(k_i(d,t) = x\big)}{\sum_{a \le t \le b} 1}$ | Entropy of occupancy samples, a well-known measure of the randomness of a random variable. If $k_i(d,t)$ is constant in $t$, for example, its entropy is zero. |

These four statistics summarize the daily measurements well because they are good indicators of the four types of loop failure given in Table 1. This is seen in the histogram of each statistic displayed in Figure 3. The data were collected from Los Angeles on April 24, 2002. The distribution of each statistic shows two distinct populations. In $S_1$, for example, there are two peaks, at 0 and 2,041. This shows that there are two groups of loops: one group of about 4,700 loops has very few samples that report zero occupancy, and another group of about 300 reports almost all zeros. The second group is bad, because its loops have Type 1 error. Since all four distributions are strongly bimodal, Equation 1 is not very sensitive to the thresholds $s_j^*$, which just have to be able to separate the two peaks in the four histograms in Figure 3. The default thresholds are given in Table 3. The only other parameters for this model are the time ranges and the definition of $S_3$, where an occupancy threshold of 0.35 is specified. The DSA uses a total of seven parameters, listed in Table 3. They work well in all six Caltrans districts.

Performance

The DSA algorithm is implemented and run on PeMS data. The last column in Table 1 shows the distribution of the four types of error in District 12 (Orange County) for 31 days in October 2001.
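The four daily statistics and the decision rule of Equation 1 can be sketched in a few lines. This is a minimal illustration, not the PeMS implementation: the function names are hypothetical, the threshold constants are assumed defaults chosen for the example, and the natural logarithm is assumed for the entropy because the text does not state the base.

```python
import math
from collections import Counter

# Assumed default parameters for illustration only.
K_STAR = 0.35    # occupancy level counted as "very high"
S1_MAX = 1200    # max allowed samples with zero occupancy
S2_MAX = 50      # max allowed samples with occupancy > 0 and flow = 0
S3_MAX = 200     # max allowed samples with occupancy > K_STAR
S4_MIN = 4.0     # min allowed entropy of the occupancy samples


def daily_statistics(q, k, k_star=K_STAR):
    """Compute (S1, S2, S3, S4) from one day's windowed flow q and occupancy k."""
    s1 = sum(1 for occ in k if occ == 0)
    s2 = sum(1 for vol, occ in zip(q, k) if occ > 0 and vol == 0)
    s3 = sum(1 for occ in k if occ > k_star)
    # Empirical entropy of the occupancy values (natural log is an assumption).
    n = len(k)
    s4 = -sum((c / n) * math.log(c / n) for c in Counter(k).values())
    return s1, s2, s3, s4


def diagnose(q, k):
    """Return 1 if the loop is declared bad for the day, 0 if good (Equation 1)."""
    s1, s2, s3, s4 = daily_statistics(q, k)
    bad = s1 > S1_MAX or s2 > S2_MAX or s3 > S3_MAX or s4 < S4_MIN
    return 1 if bad else 0
```

A loop that reports all zeros fails on both $S_1$ and $S_4$, which matches the remark that the error types are not mutually exclusive. A loop stuck at occupancy 0.7 with zero flow fails on $S_2$, $S_3$, and $S_4$.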
Because the ground truth of which detectors are actually bad is not available, the performance of this algorithm must be verified visually. Fortunately, this is easy for most cases, because the time series show distinctly different patterns for good and bad detectors. A visual test was performed on loops in Los Angeles, on data from August 7, 2001. There are 662 loops on Interstate 5 and Interstate 210, of which 142 (21%) were declared to be bad by the algorithm. The plots of occupancy were then checked manually to verify these results. Fourteen loops were found that were declared good, but their plots suggested they could be bad. This suggests a false negative rate of 14/(662 − 142) = 2.7%. There were no false positives. This suggests that the algorithm performs very well.

FIGURE 3 Histograms (frequency) of (a) $S_1$, (b) $S_2$, (c) $S_3$, and (d) $S_4$.

Real-Time Operation

The described detection algorithm gives a diagnosis on samples from an entire day, but real-time detection (the validity of each sample as it is received) is also of interest. Therefore, a decision $\hat\Delta_i(d, t)$, where $d$ is the current day and $t$ is the current sample time, is wanted. Use the simple approximation

$$\hat\Delta_i(d, t) = \Delta_i(d - 1) \tag{2}$$

where $\Delta_i$ is defined in Equation 1. Equation 2 has two consequences. First, a loop is declared good or bad for an entire day. As a result, some flexibility is lost because good data from a partially bad loop may be thrown away (this point is discussed in the conclusions section). Second, there is a 1-day lag in the diagnosis, which introduces a small error. The probability of loop failure given the loop status on the previous day was estimated, and Equation 2 was found to be true 98% of the time. Therefore, it is a good approximation.

TABLE 3 Parameters of Daily Statistics Algorithm and Default Settings

| Parameter | Value |
|---|---|
| $k^*$ | 0.35 |
| $s_1^*$ | 1,200 |
| $s_2^*$ | 50 |
| $s_3^*$ | 200 |
| $s_4^*$ | 4 |
| $a$ | 5 a.m. |
| $b$ | 10 p.m. |

IMPUTATION OF MISSING AND BAD SAMPLES

Need for Imputation

The measurement of each detector is modeled as either the actual value or an error value, depending on the status $\Delta_i$:

$$\begin{aligned} q_{\text{meas},i}(d,t) &= q_{\text{real},i}(d,t)\,[1 - \Delta_i(d)] + \varepsilon_i(d,t)\,\Delta_i(d) \\ k_{\text{meas},i}(d,t) &= k_{\text{real},i}(d,t)\,[1 - \Delta_i(d)] + \phi_i(d,t)\,\Delta_i(d) \end{aligned} \qquad 0 \le t \le 2{,}879 \tag{3}$$

where $q_{\text{meas},i}$ and $k_{\text{meas},i}$ are the measured values, $q_{\text{real},i}$ and $k_{\text{real},i}$ are the true values, and $\varepsilon_i$ and $\phi_i$ are error values that are independent of $q_{\text{real},i}$ and $k_{\text{real},i}$. An estimate of the loop status $\Delta_i$ was obtained in Equation 2. It says to discard the samples from detectors that are declared bad. This leaves holes in the data, in addition to the originally missing samples. This is a common problem: at each sample time, the user must determine whether the sample is good. An application that analyzes the data must deal with both possibilities.

An approach to missing data is to predict them by using time series analysis. Nihan modeled occupancy and flow time series as autoregressive moving average processes and predicted values in the near future (5); Dailey presented a method for prediction from neighbor loops by using a Kalman filter (6). In the case here, the errors do not occur randomly but persist for many hours and days. Time series predictions become invalid very quickly and are inappropriate in such situations.

An imputation scheme was developed that uses information from good neighbor loops at only the current sample time. This is a natural way to deal with missing data and is used by traditional imputation methods. For example, to find the total volume of a freeway location with four lanes and only three working loops, one may reasonably use the average of the three lanes and multiply it by four. This imputes the missing value by using the average of its neighbors. Linear interpolation is another example. Suppose detector $i$ is bad and is located between detectors $j$ and $k$, which are good. Let $x_i$, $x_j$, $x_k$ be their locations and $x_j < x_i < x_k$; then

$$\hat q_i(t) = \frac{(x_i - x_j)\,q_k(t) + (x_k - x_i)\,q_j(t)}{x_k - x_j} \tag{4}$$

is the linear interpolation imputation. While these traditional imputation methods are intuitive, they make naive assumptions about the data. The proposed algorithm, on the other hand, models the behavior of neighbor loops better because it uses historical data.

Linear Model of Neighbor Detectors

A linear regression algorithm for imputation is proposed that models the behavior of neighbor loops by using historical data. It was found that occupancies and volumes of detectors in nearby locations are highly correlated. Therefore, measurements from one location can be used to estimate quantities at other locations, and a more accurate estimate can be formed if all the neighboring loops are used in the estimation. Two loops are defined as neighbors if they are in the same location in different lanes or if they are in adjacent locations. Figure 4 shows a typical neighborhood. Both volume and occupancy from neighboring locations are strongly correlated. Figure 5 shows two pairs of neighbors with linearly related flow and occupancies. Figure 6 plots the distribution of the correlation coefficients between all neighbors in Los Angeles. It shows that most neighbor pairs have high correlations in both flow and occupancy.

FIGURE 4 Example of neighboring loops.

The high correlation among neighbor loop measurements means that linear regression is a good way to predict one from the other. It is also easy to implement and fast to run. The following pairwise linear model relates the measurements from neighbor loops:

$$\begin{aligned} q_i(t) &= \alpha_0(i,j) + \alpha_1(i,j)\,q_j(t) + \text{noise} \\ k_i(t) &= \beta_0(i,j) + \beta_1(i,j)\,k_j(t) + \text{noise} \end{aligned} \tag{5}$$
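For comparison with the pairwise model, the two traditional imputations described in this section (lane averaging and the linear interpolation of Equation 4) can be sketched as follows. The function names and the use of `None` to mark a missing lane are assumptions made for this example.

```python
def lane_average_total(lane_volumes):
    """Estimate the total volume at a location when some lanes are missing.

    lane_volumes holds one entry per lane; None marks a lane without a
    working loop. The average of the working lanes is scaled up to all lanes.
    """
    good = [v for v in lane_volumes if v is not None]
    if not good:
        raise ValueError("no working loops at this location")
    return len(lane_volumes) * sum(good) / len(good)


def interpolate(x_i, x_j, q_j, x_k, q_k):
    """Equation 4: estimate the value at x_i from good detectors at x_j < x_i < x_k."""
    if not (x_j < x_i < x_k):
        raise ValueError("detector i must lie strictly between j and k")
    return ((x_i - x_j) * q_k + (x_k - x_i) * q_j) / (x_k - x_j)
```

Neither function looks at historical data: both assume that neighboring detectors carry the same traffic up to a geometric weighting, which is the naive assumption the regression model is meant to relax.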

FIGURE 5 Scatter plot of occupancies and flows from two pairs of neighbors: (a) Loop 197 occupancy versus Loop 198 occupancy; (b) Loop 194 flow rate versus Loop 197 flow rate (vph).

For each pair of neighbors $(i, j)$, the parameters $\alpha_0(i, j)$, $\alpha_1(i, j)$, $\beta_0(i, j)$, $\beta_1(i, j)$ are estimated by using 5 days of historical data. Let $q_i(t)$, $q_j(t)$, $t = 1, 2, \ldots, n$ be the historical measurements of volume; then

$$\alpha_0(i,j),\ \alpha_1(i,j) = \arg\min_{\alpha_0,\,\alpha_1} \frac{1}{n} \sum_{t=1}^{n} \big[q_i(t) - \alpha_0 - \alpha_1\,q_j(t)\big]^2 \tag{6}$$

The parameters for density are fitted the same way. Parameters can be found for all pairs of loops that report data in the historical database, but some loops never report any data. For them, a set of global parameters $\alpha_0^*(\delta, l_1, l_2)$, $\alpha_1^*(\delta, l_1, l_2)$, $\beta_0^*(\delta, l_1, l_2)$, $\beta_1^*(\delta, l_1, l_2)$ is used that generalizes the relationship between pairs of loops in different configurations. For each combination of (relative location, lane of Loop 1, lane of Loop 2), the linear model is as follows:

$$\begin{aligned} q_i(t) &= \alpha_0^*(\delta, l_i, l_j) + \alpha_1^*(\delta, l_i, l_j)\,q_j(t) + \text{noise} \\ k_i(t) &= \beta_0^*(\delta, l_i, l_j) + \beta_1^*(\delta, l_i, l_j)\,k_j(t) + \text{noise} \end{aligned} \tag{7}$$

where

$\delta$ = 0 if $i$ and $j$ are in the same location on the freeway, 1 otherwise;
$l_i$ = lane number of loop $i$;
$l_j$ = lane number of loop $j$; and
$l_i, l_j = 1, 2, 3, \ldots, 8$.

The global parameters are fitted to data in the same way as the local parameters. In Los Angeles, there are 60,760 pairs of neighbors $(i, j)$ for 5,377 loops; in San Bernardino, there are 3,896 pairs for 466 loops. The four parameters for each pair are computed for these two districts and stored in database tables.

FIGURE 6 Cumulative distribution of correlation coefficients between neighbors (occupancy and volume).

When values are imputed for loop $i$ by using its neighbors, each neighbor provides an estimate, and the final estimate is taken as the median of the pairwise estimates. Both volume and occupancy imputation are performed the same way.
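The pairwise fit of Equation 6 is an ordinary least-squares regression with one predictor, so it has a closed-form solution. The sketch below assumes the historical measurements of the two loops are given as equal-length lists aligned in time; the function name `fit_pair` is hypothetical.

```python
def fit_pair(x, y):
    """Least-squares fit of y ≈ a0 + a1 * x for one neighbor pair.

    x: historical measurements of neighbor loop j
    y: historical measurements of loop i at the same sample times
    Returns (a0, a1), the minimizer of the mean squared residual.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two aligned series with at least two samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("neighbor series is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1
```

The same function serves for volume and occupancy: passing flows yields the $\alpha$ parameters, and passing occupancies yields the $\beta$ parameters.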
The imputation for volume is

$$\begin{aligned} \hat q_{ij}(d,t) &= \alpha_0(i,j) + \alpha_1(i,j)\,q_j(d,t) \\ \hat q_i(d,t) &= \operatorname{median}\big\{\hat q_{ij}(d,t) : j \text{ is a neighbor of } i,\ \hat\Delta_j(d,t) = 0\big\} \end{aligned} \tag{8}$$

Here, $\hat\Delta_j(d,t)$, obtained from Equation 2, is the diagnosis of the $j$th loop; only estimates from good neighbors are used in the imputation. Equation 8 is a way to combine information from multiple neighbors. While this method is suboptimal compared with those with joint probability models, such as multiple regression, it is more robust. Multiple regression models all neighbors jointly, unlike the pairwise model adopted here. Dailey presented an estimation method based on all neighbors jointly (6), but here, the pairwise model was chosen for its robustness: it generates an estimate as long as there is one good neighbor. In contrast, multiple regression needs values at each sample time from all the neighbors. Robustness is also increased by use of the median of $\hat q_{ij}$ instead of the mean, which is affected by outliers and errors in $\hat\Delta_j$.

After one iteration, the imputation algorithm generates estimates for all the bad loops that have at least one good neighbor. Something remains to be done for the bad loops that do not have good neighbors. A scheme for this has not been chosen, but there are several alternatives. The current implementation simply iterates the imputation process. After the first iteration, a subset of the bad loops is filled with imputed values; these are the loops with good neighbors. In the second iteration, the set of good loops grows to include those that were imputed in the previous iteration, so some of the remaining bad loops now have good neighbors. This process continues until all loops are filled or until all the remaining bad loops do not have any good neighbors. The problem with this method is that the imputation becomes less accurate with each succeeding iteration. Fortunately, most of the bad loops are filled in the first iteration. In District 7 on April 24, 2002, for example, the percentages of filled loops in the first four iterations are 90%, 5%, 1%, and 1%; the entire grid is filled after eight iterations.
Another alternative is to use the current imputation only for the first $n$ iterations. After that, if there are still loops without values, another method can be used, such as the historical mean. In any case, an alternative imputation scheme is required for sample times when there are no good data for any loop.
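The median combination of Equation 8 and the iterated filling described above can be sketched for a single sample time. This is an illustration under stated assumptions rather than the PeMS code: the data structures (dictionaries keyed by loop identifier and by ordered pair) and the function names are hypothetical, and `max_iter` stands in for the cap on the number of iterations mentioned as an alternative.

```python
from statistics import median


def impute_once(values, good, neighbors, params):
    """One pass of Equation 8.

    values:    dict loop -> value for loops that are good or already imputed
    good:      set of loops whose values may be used as predictors
    neighbors: dict loop -> list of neighboring loops
    params:    dict (i, j) -> (a0, a1) fitted for predicting loop i from loop j
    Returns a dict of newly imputed loop -> estimate.
    """
    new = {}
    for i, nbrs in neighbors.items():
        if i in good:
            continue
        estimates = [params[(i, j)][0] + params[(i, j)][1] * values[j]
                     for j in nbrs if j in good and (i, j) in params]
        if estimates:
            new[i] = median(estimates)   # robust to a single outlying neighbor
    return new


def impute_iteratively(values, good, neighbors, params, max_iter=None):
    """Repeat impute_once, treating loops filled in earlier passes as good."""
    values, good = dict(values), set(good)
    iterations = 0
    while max_iter is None or iterations < max_iter:
        new = impute_once(values, good, neighbors, params)
        if not new:
            break                         # nothing left that has a good neighbor
        values.update(new)
        good.update(new)
        iterations += 1
    return values, good, iterations
```

Because the good set is updated only between passes, a loop whose sole neighbor is itself bad is filled one pass later, from an imputed value; this is the source of the loss of accuracy with each succeeding iteration noted in the text.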

Performance

The performance of this algorithm was evaluated for data from April 24, 2002. To run this test, 189 loops were found that are themselves good and also had good neighbors. From each loop, the measured flows and occupancies $q_i(t)$ and $k_i(t)$ were collected; the algorithm was then run to compute the estimated values $\hat q_i(t)$ and $\hat k_i(t)$, based on neighbors. From these, the root-mean-square errors for each loop were found; see Table 4. This table shows that the estimates are unbiased, as they should be. The standard deviation of imputation error is small compared to the mean and standard deviation of the measurements. Figure 7 compares the estimated and original values for one loop. They show good agreement.

TABLE 4 Performance of Imputation

| Quantity | Mean | Standard Deviation | Mean Absolute Error | Standard Deviation of Error | Mean Error |
|---|---|---|---|---|---|
| Occupancy | .85 | .61 | .13 | .21 | .1 |
| Volume (vph) | 122 | 527 | 132 | 21 | 6 |

The performance of the algorithm was also compared against that of linear interpolation. Fifteen triplets of good loops were chosen for this test. Ten of the triplets are loops in the same lane, different locations, while five triplets have their loops in the same location, across three lanes. In each triplet, two loops were used to predict the volume and occupancy of the third loop by using linear interpolation. In every case, the neighborhood method produced a lower error in occupancy estimates; it produced smaller errors in flow estimates in 10 of 15 locations. Overall, the neighborhood method performed better in the mean and median, as expected.

CONCLUSIONS

Algorithms were presented to detect bad loop detectors from their outputs and to impute missing data from neighboring good loops. Existing methods of detection evaluate each 20-s sample to determine if it represents a plausible traffic state, but it was found that there is much more information in how detectors behave over time. The presented algorithm makes diagnoses based on the sequence of measurements from each detector over a whole day.
Visually, bad data are much easier to detect when viewed as a time series. The algorithm found almost all the bad detectors that could be found visually.

The imputation algorithm estimates the true values at locations with bad or missing data. This is an important functionality, because almost any algorithm that uses the data needs a complete grid of data. Traditionally, the way to deal with missing data is to interpolate from nearby loops. The presented algorithm performs better than interpolation because it uses historical information on how the measurements from neighbor detectors are related. The volume and occupancy between neighbor loops were modeled linearly, and the linear regression coefficients of each neighbor pair were found from historical data. This algorithm is simple and robust.

There remain many possibilities for improvements to the algorithms described here. The detection algorithm has a time lag. To address this, a truly real-time detection algorithm is being developed that will incorporate neighbor loop measurements as well as the past day's statistics. While the linear model describes most neighbor pairs, some pairs have nonlinear relationships, so a more general model may be better. Another area for improvement is the handling of entire blocks of missing data. The current imputation algorithm needs a large number of good loops to impute the rest, but it does not work if most or all the loops are bad for a sample time. A method is needed for addressing this situation.

Single-loop data diagnostics is an important area of research. While loop detectors are the most abundant source of traffic information, the data are sometimes bad or missing. The algorithms presented construct a complete grid of clean data in real time. They simplify the design of upper-level algorithms and improve the accuracy of analysis based on loop data.

FIGURE 7 Original and estimated (a) occupancies and (b) flows for a good loop. (Both panels plot original and estimated values against time of day; vertical axes are occupancy in (a) and vehicles per hour in (b).)
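The neighbor-based imputation summarized in the conclusions, together with the error statistics of the kind reported in Table 4, can be illustrated with a short sketch. It is not the authors' code: the function names `fit_pair`, `impute_from_neighbors`, and `error_summary` are hypothetical, ordinary least squares via `numpy.polyfit` stands in for whatever fitting procedure was used, combining the pairwise predictions by their median is an assumption, and the error is taken here as estimate minus measurement.

```python
import numpy as np

def fit_pair(x_hist, y_hist):
    """Fit y ~ a0 + a1 * x for one neighbor pair from historical samples."""
    # np.polyfit returns coefficients from highest degree to lowest.
    a1, a0 = np.polyfit(x_hist, y_hist, deg=1)
    return float(a0), float(a1)

def impute_from_neighbors(neighbor_values, coefs):
    """Estimate one loop's value at one sample time from its neighbors.

    neighbor_values : current measurements of the neighbors (NaN if bad)
    coefs           : matching list of (a0, a1) pairs from fit_pair
    """
    preds = [a0 + a1 * x
             for x, (a0, a1) in zip(neighbor_values, coefs)
             if not np.isnan(x)]
    # Assumption: pairwise predictions are combined by their median.
    return float(np.median(preds)) if preds else float("nan")

def error_summary(measured, estimated):
    """Mean error, mean absolute error, and standard deviation of error."""
    err = np.asarray(estimated, dtype=float) - np.asarray(measured, dtype=float)
    return {
        "mean_error": float(err.mean()),
        "mean_abs_error": float(np.abs(err).mean()),
        "std_error": float(err.std(ddof=1)),
    }
```

A mean error near zero is what "unbiased" means for the estimates in Table 4, while the mean absolute error and the standard deviation of error describe how far individual estimates stray from the measurements.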

ACKNOWLEDGMENTS

This study is part of the PeMS project, which is supported by grants from Caltrans to the California PATH Program. The authors thank the engineers from Caltrans Districts 3, 4, 7, 8, 11, and 12 and from Caltrans headquarters for their encouragement, understanding, and patience.

REFERENCES

1. Chen, C., K. Petty, A. Skabardonis, P. Varaiya, and Z. Jia. Freeway Performance Measurement System: Mining Loop Detector Data. In Transportation Research Record: Journal of the Transportation Research Board, No. 1748, TRB, National Research Council, Washington, D.C., 2001, pp. 96-102.
2. Payne, H. J., E. D. Helfenbein, and H. C. Knobel. Development and Testing of Incident Detection Algorithms. FHWA-RD-76-2. FHWA, U.S. Department of Transportation, 1976.
3. Jacobson, L. N., N. L. Nihan, and J. D. Bender. Detecting Erroneous Loop Detector Data in a Freeway Traffic Management System. In Transportation Research Record 1287, TRB, National Research Council, Washington, D.C., 1990, pp. 151-166.
4. Cleghorn, D., F. L. Hall, and D. Garbuio. Improved Data Screening Techniques for Freeway Traffic Management Systems. In Transportation Research Record 1320, TRB, National Research Council, Washington, D.C., 1991, pp. 17-23.
5. Nihan, N. Aid to Determining Freeway Metering Rates and Detecting Loop Errors. Journal of Transportation Engineering, Vol. 123, No. 6, 1997, pp. 454-458.
6. Dailey, D. J. Improved Error Detection for Inductive Loop Sensors. WA-RD 31. Washington State Department of Transportation, Olympia, 1993.

The contents of this paper reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented. The contents do not necessarily reflect the official views or policy of the California Department of Transportation. This paper does not constitute a standard, specification, or regulation.

Publication of this paper sponsored by Committee on Highway Traffic Monitoring.