Proximity Analysis of Vulnerable Facilities to Infrastructure Facilities Using Density Estimation: California Facilities

DRAFT REPORT, May 6, 2008

Rae Zimmerman, Jeffrey S. Simonoff, Zvia Segal Naphtali, Carlos E. Restrepo and Henry Willis

Introduction

Public and private decision-makers continue to seek risk-based approaches to allocating funds to secure facilities that are potentially vulnerable to terrorist attacks. An important input into decisions about allocating funds for security is the density of critical infrastructure around facilities that could be potential targets of an attack. A terrorist attack on a vulnerable facility located in the vicinity of a critical infrastructure system could disable that system, and thus have cascading impacts on other infrastructure systems and ultimately on the economy.

Objective

The objective of this work is to develop a methodology to help decision-makers and policymakers rank potentially vulnerable facilities (i.e., facilities vulnerable to terrorist attack) by developing and applying a measure of their proximity to infrastructure facilities. High infrastructure density around a vulnerable facility may exacerbate overall vulnerability. The work presented in this report uses California as a case area.

Measures of infrastructure density around facilities considered vulnerable would help decision-makers allocate scarce financial resources for security among those facilities. Financial resources are assumed to be distributed at the county level; thus, counties with the highest number of vulnerable facilities close to infrastructure facilities would receive more resources for security. Hence, the methodology presented in this report can be used as an input to prioritize facilities according to the infrastructure density around them.

This work was initiated after a discussion among Rae Zimmerman, Carlos E. Restrepo and Henry Willis about interesting policy research issues and approaches related to resource allocation for security involving infrastructure.

Methodology and Approach

The approach adopted is to obtain distances between the set of vulnerable facilities provided by CREATE (indicated as red dots in Figure 1) and selected infrastructure facilities from publicly available datasets. Where databases were only available in the form of addresses, the data were geocoded to obtain a consistently defined and analyzable database. The data were then put into a spreadsheet. A statistical analysis was then undertaken to provide a density measure of the infrastructure around vulnerable facilities. The infrastructure density measure developed is based on an estimate of the probability density of infrastructure distances from each vulnerable facility ("red dots").

Data

Vulnerable facilities are the facilities in the state of California on the list provided by CREATE (referred to as "red dots" in this paper). The identity of these facilities has not been revealed by CREATE, other than the fact that they belong to categories of important facilities that are at risk of being attacked by terrorists. Infrastructure facilities are the facilities in the NYU-Wagner ICIS infrastructure databases, which were obtained from publicly available sources and include facilities such as dams, bridges, airports, ports, major highways, etc.

Using GIS to Map Facilities and to Estimate Distances

As an initial exercise the team tested the methodologies with one set of infrastructure data: the location of dams relative to the vulnerable facilities provided by CREATE. Zvia Segal Naphtali produced some initial GIS maps to examine the data. Figure 1 shows the vulnerable facilities ("red dots") and dams ("yellow dots") for the state of California.

Figure 1. Vulnerable Facilities and Dams in California

As a second step the team decided to narrow down the dams database to include only high hazard dams, as these would be the dams that, if damaged, would result in the largest effects on the surrounding population. Naphtali used ArcGIS to estimate the distance between vulnerable facilities and high hazard dams. Table 1 shows a sample of the resulting data. Each column represents a vulnerable facility (178 facilities) and each row represents a high hazard dam. Distances are in kilometers. The same procedure was followed to estimate distances between each vulnerable facility and all the Bay Area Rapid Transit (BART) stations.
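The distance table described above can be illustrated with a minimal sketch. The report used ArcGIS; the sketch below swaps in the haversine great-circle formula on a spherical Earth, which is an assumption and may differ slightly from distances computed in ArcGIS. The function names and all coordinates are hypothetical and are not taken from the CREATE or ICIS data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_table(dams, facilities):
    """Rows are dams, columns are vulnerable facilities (the layout described for Table 1)."""
    return [[haversine_km(d[0], d[1], f[0], f[1]) for f in facilities] for d in dams]


# Hypothetical (latitude, longitude) pairs, for illustration only
dams = [(37.0, -120.0), (34.0, -118.0)]
facilities = [(37.0, -120.0), (38.0, -120.0), (34.05, -118.25)]

table = distance_table(dams, facilities)
for row in table:
    print(["%.1f" % km for km in row])
```

Each column of the resulting table is the sample of distances for one vulnerable facility, which is the input to the density estimates discussed in the rest of the report.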

Table 1. Distance Estimated Between Vulnerable Facilities and High Hazard Dams in California

These distance estimates were then used as inputs into an analysis of infrastructure density around vulnerable facilities that uses smoothing methods. Jeffrey S. Simonoff developed two separate sets of analyses for infrastructure density measures, one using high hazard dams and one using Bay Area Rapid Transit (BART) stations. These analyses are presented below.

Measures of Infrastructure Density

Using Dams for the Infrastructure Measure

The objective of this work is to provide a measure of the potential impact of an attack on a vulnerable facility ("red dots" in Figure 1) on other infrastructure ("yellow dots" in Figure 1). The data consist of the distances (in kilometers) from each vulnerable facility to each high hazard dam. The analytical tool used is an estimate of the probability density of infrastructure distances from the vulnerable facility. The underlying model is that infrastructure is placed randomly relative to the vulnerable facility according to some probability distribution that depends on both the vulnerable facility and the infrastructure. This is not technically true, of course, but as a working model it allows for measures of the density of closeness of infrastructure to vulnerable facilities.

Consider a sample $\{x_1, \ldots, x_n\}$ of observations from a density $f(x)$; in our case the $x$ values are distances from the vulnerable facility, and $f(\cdot)$ is the underlying probability density function for the distribution of infrastructure distances from the vulnerable facility. The measures used are based on a log-linear local likelihood density estimator (Hjort and Jones, 1996; Loader, 1996). The idea behind this density estimator is to approximate the log-density locally with a straight line. Specifically, the density estimate at any point $x$ is $\hat{f}(x) = \exp(\hat{a}_0)$, where $\hat{a}_0$ is the constant term of the maximizer over $(a_0, a_1)$ of the local log-likelihood

\[
\sum_{i=1}^{n} K\!\left(\frac{x_i - x}{h}\right)\left[a_0 + a_1 (x_i - x)\right] \;-\; n \int K\!\left(\frac{u - x}{h}\right) \exp\left[a_0 + a_1 (u - x)\right] du ,
\]

where $K$ is a kernel function and $h$ is the smoothing parameter.
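As an illustration of how such an estimate can be computed at a single point, here is a minimal sketch. It is not the authors' implementation. It assumes a Gaussian kernel and takes the integral over $u \ge 0$ (distances cannot be negative); both are assumptions, as are the function names and the sample distances. Under these assumptions the integral has a closed form, $a_0$ can be profiled out, and $a_1$ solves a one-dimensional monotone equation, which the sketch solves by bisection.

```python
import math


def _log_Phi_and_ratio(z):
    """Return (log Phi(z), phi(z)/Phi(z)) for the standard normal, stable for very negative z."""
    if z < -30.0:
        ratio = -z - 1.0 / z  # asymptotic approximation for the far left tail
        log_phi = -0.5 * z * z - 0.5 * math.log(2 * math.pi)
        return log_phi - math.log(ratio), ratio
    Phi = 0.5 * math.erfc(-z / math.sqrt(2.0))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return math.log(Phi), phi / Phi


def local_loglinear_density(x, data, h):
    """Log-linear local likelihood density estimate at x >= 0 (Gaussian kernel, support u >= 0).

    Assumes at least one observation is strictly positive.
    """
    n = len(data)
    w = [math.exp(-0.5 * ((xi - x) / h) ** 2) for xi in data]  # kernel weights
    s0 = sum(w)
    if s0 == 0.0:
        return 0.0  # no data anywhere near x at this bandwidth
    m = sum(wi * (xi - x) for wi, xi in zip(w, data)) / s0  # weighted mean of x_i - x

    def tilted_mean(a1):
        # Mean of u - x under the density proportional to K((u - x)/h) * exp(a1 * (u - x)), u >= 0
        _, ratio = _log_Phi_and_ratio(x / h + a1 * h)
        return a1 * h * h + h * ratio

    # The score equation for a1 is tilted_mean(a1) = m; tilted_mean is increasing in a1.
    lo, hi = -1.0 / h, 1.0 / h
    for _ in range(60):
        if tilted_mean(lo) < m:
            break
        lo *= 2.0
    for _ in range(60):
        if tilted_mean(hi) > m:
            break
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < m:
            lo = mid
        else:
            hi = mid
    a1 = 0.5 * (lo + hi)

    # Score equation for a0: exp(a0) = s0 / (n * integral of K * exp(a1 * (u - x)))
    log_Phi, _ = _log_Phi_and_ratio(x / h + a1 * h)
    log_integral = 0.5 * (a1 * h) ** 2 + math.log(h * math.sqrt(2 * math.pi)) + log_Phi
    return math.exp(math.log(s0 / n) - log_integral)


# Hypothetical distances (km) from one facility to a set of dams
distances = [12.0, 35.0, 48.0, 190.0, 210.0, 230.0, 650.0]
for x in (0.0, 50.0, 200.0):
    print(x, local_loglinear_density(x, distances, h=40.0))
```

Far from the boundary at zero, and when the locally weighted data are balanced around $x$, the fitted slope is essentially zero and the value coincides with the ordinary Gaussian kernel density estimate; near zero the slope term adjusts for the boundary.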

The standard kernel density estimator corresponds to this estimator with $a_1$ taken to be equal to 0, but the more general version has the advantage of automatically adjusting for the tendency of the kernel estimator to be biased downwards at the boundaries [1]. The smoothness of the density estimate is determined by the smoothing parameter $h$, with too small a value resulting in an undersmoothed estimate that has too many local bumps and dips and too large a value resulting in an oversmoothed estimate that potentially smooths over important local structure in the underlying density. The smoothing parameter can be chosen by eye in a particular case, but it is useful to have an automatic choice (particularly since we are estimating almost 200 densities for each type of infrastructure here). A method that has been shown to be useful for local likelihood density estimation is choosing $h$ based on $\mathrm{AIC}_C$, the corrected Akaike Information Criterion (Hurvich, Simonoff, and Tsai, 1998; Simonoff, 1998).

Once the density estimate is formed, it can be used to construct different measures of infrastructure density around the vulnerable facility. The simplest measure is $\hat{f}(0)$, which can be interpreted as the estimated probability of infrastructure occurring within one kilometer of the vulnerable facility. Similarly, the measure

\[
I(d) = \int_0^d \hat{f}(u)\, du
\]

is an estimate of the probability of infrastructure occurring within $d$ kilometers of the vulnerable facility (this is the area under the estimated density curve from 0 to $d$); values of $d$ such as 50 or 100 would seem reasonable to represent distances within which infrastructure could be affected by an attack on the vulnerable facility. Note that the first proposed measure, $\hat{f}(0)$, is labeled I(0); while technically the probability of infrastructure occurring within one kilometer would be I(1) rather than I(0), these values will be virtually identical to each other.
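A minimal sketch of the measure $I(d)$ follows, computing the area under a density curve from 0 to $d$ with the trapezoidal rule. The stand-in density (an exponential with mean 200 km) is purely an assumption used so that the result can be checked against a known answer; in practice the estimated density $\hat{f}$ would be passed in instead. The function names are hypothetical.

```python
import math


def proximity_measure(density, d, steps=2000):
    """Approximate I(d), the integral of density(u) over [0, d], by the trapezoidal rule."""
    width = d / steps
    total = 0.5 * (density(0.0) + density(d))
    for k in range(1, steps):
        total += density(k * width)
    return total * width


def stand_in_density(u):
    """Stand-in for an estimated density: exponential with mean 200 km (illustrative assumption)."""
    return math.exp(-u / 200.0) / 200.0


measures = {d: proximity_measure(stand_in_density, d) for d in (1, 50, 100)}
print("density at 0:", stand_in_density(0.0))
for d, value in measures.items():
    print("I(%d) = %.6f" % (d, value))
```

For this stand-in density the value at zero and the integral from 0 to 1 differ only in the fifth decimal place, which illustrates why the report treats the density value at zero, labeled I(0), and I(1) as interchangeable.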
Examples of infrastructure density estimation

Four vulnerable facilities were initially examined (numbers 0, 11, 44, and 136 in the database). These were chosen because they reflect different patterns in the infrastructure density, and cover reasonably well the different patterns that seem to be in the data. The smoothing parameters for the density estimates have been chosen here by eye, rather than using an automatic criterion.

[1] See Simonoff (1996, Section 3.4) for more discussion of the advantages of local likelihood estimation over kernel estimation.

Vulnerable facility 0

[Figure: density estimate for this facility; plot title "Sensitive facility"]

The distribution is bimodal, with a smaller cluster of infrastructure centered roughly 250 kilometers away, and a larger cluster centered roughly 700 kilometers away. Since Los Angeles and San Francisco are roughly 600 kilometers apart, it is reasonable to think that the two clusters correspond to centers of dam infrastructure in those two areas. The measures of infrastructure density for this vulnerable facility are I(0)= , I(50)= , and I(100)=

Vulnerable facility 11

[Figure: density estimate for this facility; plot title "Sensitive facility"]

The distribution is trimodal, with a tight cluster roughly 50 kilometers away, a larger cluster centered roughly 200 kilometers away, and a smaller cluster centered 600 kilometers away. The measures of infrastructure density for this vulnerable facility are I(0)= , I(50)= , and I(100)=

Vulnerable facility 44

[Figure: density estimate for this facility; plot title "Sensitive facility"]

The distribution is bimodal, with a large cluster roughly 200 kilometers away, and a smaller cluster centered 500 kilometers away. The measures of infrastructure density for this vulnerable facility are I(0)= , I(50)= , and I(100)=

Vulnerable facility 136

[Figure: density estimate for this facility; plot title "Sensitive facility"]

The distribution is unimodal, with infrastructure centered roughly 300 kilometers away. The measures of infrastructure density for this vulnerable facility are I(0)= , I(50)= , and I(100)=

The different measures imply different orderings of infrastructure density (and hence, potentially different priority levels for resource allocation) for the four vulnerable facilities (ordered from highest to lowest), but all agree that facilities 11 and 44 are closer to dam infrastructure, while facilities 0 and 136 are farther away:

I(0) I(50) I(100)
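How different measures can lead to different orderings may be sketched as follows. The measure values below are made up for illustration and are not the report's results; only the facility numbers and the measure names come from the text. The helper name is hypothetical.

```python
# Made-up measure values for illustration only; they are NOT the values from the report.
measures = {
    0:   {"I(0)": 1.0e-5, "I(50)": 0.002, "I(100)": 0.010},
    11:  {"I(0)": 9.0e-4, "I(50)": 0.060, "I(100)": 0.110},
    44:  {"I(0)": 4.0e-4, "I(50)": 0.050, "I(100)": 0.130},
    136: {"I(0)": 2.0e-5, "I(50)": 0.001, "I(100)": 0.008},
}


def rank_facilities(measures, name):
    """Facility numbers ordered from highest to lowest value of the named measure."""
    return sorted(measures, key=lambda facility: measures[facility][name], reverse=True)


orderings = {name: rank_facilities(measures, name) for name in ("I(0)", "I(50)", "I(100)")}
for name, order in orderings.items():
    print(name, order)
```

With these illustrative numbers the three orderings differ in detail while agreeing on which two facilities rank highest, the same kind of pattern the text describes for the dam data.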

Using BART stations for the Infrastructure Measure

This section uses the same approach as the previous section to measure infrastructure density around a vulnerable facility, but with Bay Area Rapid Transit District (BART) stations rather than dams as the infrastructure. The situation here is less complex than that relating to dams, since there are only 43 BART stations, and they are concentrated in the San Francisco Bay Area (San Francisco, and Contra Costa and Alameda counties). Four infrastructure density measures are shown below, adding I(150) (the probability of infrastructure within 150 kilometers) to the earlier I(0), I(50), and I(100).

Examples of infrastructure density estimation

Six vulnerable facilities (numbers 0, 9, 11, 15, 28, and 34) are examined. The smoothing parameters for the density estimates have been chosen here by eye, rather than using an automatic criterion.

Vulnerable facility 0

[Figure: density estimate for this facility; plot title "Sensitive facility"]

The distribution is unimodal, centered roughly 690 kilometers away. Obviously this facility is nowhere near the Bay Area. The measures of infrastructure density for this vulnerable facility are all virtually zero, as would be expected: I(0)= 4.6 × 10^-22, I(50)= 1.9 × 10^-19, I(100)= 5.4 × 10^-18, and I(150)= 1.5 ×

Vulnerable facility 9

[Figure: density estimate for this facility; plot title "Sensitive facility"]

The distribution is unimodal, with a peak centered 140 kilometers away. The measures of infrastructure density for this vulnerable facility are I(0)= 3.7 × 10^-7, I(50)= , I(100)= , and I(150)=

Vulnerable facility 11

[Figure: density estimate for this facility; plot title "Sensitive facility"]

The distribution is bimodal, with a large cluster roughly 15 kilometers away, and a smaller bump centered 35 kilometers away. The measures of infrastructure density for this vulnerable facility are I(0)= , I(50)= , I(100)= , and I(150)=

Vulnerable facility 15

[Figure: density estimate for this facility; plot title "Sensitive facility"]

The distribution is bimodal, with infrastructure centered roughly 10 and 20 kilometers away. The measures of infrastructure density for this vulnerable facility are I(0)= , I(50)= , I(100)= , and I(150)=

Vulnerable facility 28

[Figure: density estimate for this facility; plot title "Sensitive facility"]

The distribution is unimodal, with a sharp peak centered 80 kilometers away (and a slight hint of a bump 90 kilometers away). The measures of infrastructure density for this vulnerable facility are I(0)= 3.5 × 10^-9, I(50)= , I(100)= , and I(150)=

Vulnerable facility 34

[Figure: density estimate for this facility; plot title "Sensitive facility"]

The distribution is bimodal, with infrastructure centered roughly 245 and 255 kilometers away. The measures of infrastructure density for this vulnerable facility are I(0)= 4.2 × 10^-9, I(50)= , I(100)= , and I(150)=

The different measures imply different orderings of infrastructure density for the six vulnerable facilities (ordered from highest to lowest), but all agree that facilities 11 and 15 are closest to BART infrastructure, facilities 9, 28, and 34 are a moderate distance away, while facility 0 is very far away:

I(0) I(50) I(100) I(150)

Future Research Directions

The exploratory analyses shown in this report refer to measures of infrastructure density using two sets of infrastructure data separately: high hazard dams and BART stations. The next step will be to extend these analyses to include all vulnerable facilities, and then connect the density estimates to funding priorities at (for example) the county level. Such analyses will also be applied to other publicly available infrastructure databases, such as those for Amtrak stations, airports, ports, closest distance to major highways, and others. We will also explore how the analyses for different types of infrastructure can be combined to provide an overall risk measure for collateral damage to infrastructure from an attack on a vulnerable facility; an important challenge will be to decide whether different infrastructure types should be weighted differently in the analyses.

References

Hjort, N.L. and Jones, M.C. (1996), Locally parametric density estimation, Annals of Statistics, 24,

Hurvich, C.M., Simonoff, J.S., and Tsai, C.-L. (1998), Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion, Journal of the Royal Statistical Society, Ser. B, 60,

Loader, C.R. (1996), Local likelihood density estimation, Annals of Statistics, 24,

Simonoff, J.S. (1996), Smoothing Methods in Statistics, Springer-Verlag, New York.

Simonoff, J.S. (1998), Three sides of smoothing: Categorical data smoothing, nonparametric regression, and density estimation, International Statistical Review, 66,
