An Extreme Value Approach to Information Technology Security Investment


Association for Information Systems
AIS Electronic Library (AISeL)
ICIS 2005 Proceedings, International Conference on Information Systems (ICIS), December 2005

An Extreme Value Approach to Information Technology Security Investment
Jingguo Wang, State University of New York, Buffalo
Abhijit Chaudhury, Bryant University
Raghav Rao, State University of New York, Buffalo

Follow this and additional works at: http://aisel.aisnet.org/icis2005

Recommended Citation
Wang, Jingguo; Chaudhury, Abhijit; and Rao, Raghav, "An Extreme Value Approach to Information Technology Security Investment" (2005). ICIS 2005 Proceedings. 29. http://aisel.aisnet.org/icis2005/29

This material is brought to you by the International Conference on Information Systems (ICIS) at AIS Electronic Library (AISeL). It has been accepted for inclusion in ICIS 2005 Proceedings by an authorized administrator of AIS Electronic Library (AISeL). For more information, please contact elibrary@aisnet.org.

AN EXTREME VALUE APPROACH TO INFORMATION TECHNOLOGY SECURITY INVESTMENT

Jingguo Wang
School of Management, State University of New York, Buffalo, Buffalo, NY U.S.A.
wang7@buffalo.edu

Aby Chaudhury
CIS Department, Bryant University, Smithfield, RI U.S.A.
achadh@bryant.edu

H. Raghav Rao
School of Management, State University of New York, Buffalo, Buffalo, NY U.S.A.
mgmtrao@buffalo.edu

Abstract

Information technology security investment has received increasing attention in recent years. Various methods have been proposed to determine the effective level of security investment. In this paper, we introduce an extreme value approach to address the issues of effective budgeting and investing in IT security. In our model, the security status of a system depends on two factors: system security level, which is measured by the level of security investment, and system attack level, which reflects the security risk with which the system is confronted. Security investment level is endogenous to the system, while attack level is exogenous. Extreme value analysis is used to characterize the stochastic behavior of high-level attacks based on historical data and to make inferences on future attacks. Based on these inferences, we determine the effective security solutions and the level of security investment to modulate the likelihood of system failure. For illustration purposes, we use an extreme value approach to analyze a set of traffic data collected from a regional bank.

Keywords: Information assurance, security investment, two-factor model, extreme value theory, denial of service (DoS)

Introduction

The importance of effective management of information technology security has increased in recent years due to the increasing frequency and cost of security breaches (Gordon et al. 2005). While high-risk organizations may adopt security at any price, most commercial organizations have to consider the cost-benefit tradeoff for such an investment. How to invest efficiently in IT security is a big challenge.
In the Ernst & Young Global Information Security Survey (Ernst & Young 2003, 2004), budget constraints are listed as one of the main obstacles to effective information security. Quantification tools, if applied prudently, can assist in the anticipation and control of direct and indirect computer security costs (Geer et al. 2003; Mercuri 2003).

In this paper, we propose an approach based on extreme value theory (Gumbel 1958) for IT security investment. In our model the security status of a system depends on two factors: system security level, which is measured by the level of security investment, and system attack level, which reflects the security risk with which the system is confronted. Attack level is treated as an exogenous variable that causes system failure, while security investment level is endogenous, prevents system failure, and is determined by the organization. The difference between the security level and the attack level measures the vulnerability of the system.

2005 Twenty-Sixth International Conference on Information Systems 347

Security and Assurance

Instead of calculating the expected loss value, we apply extreme value theory to study extreme attack behavior and address the issues of effective budgeting and investing in IT security. Extreme value theory is one of the most important statistical disciplines for the applied sciences, and has found applications in engineering (Castillo 1988), insurance and finance (Embrechts et al. 1997), and management strategy (Dahan and Mendelson 2001), as well as in environmental and biomedical research. Extreme value theory quantifies the stochastic behavior of a process at unusually large (or small) levels. It is concerned with probabilistic and statistical questions related to those extreme events. To our knowledge, this is the first paper to apply extreme value theory to security investment decisions.

With the application of extreme value theory, we attempt to address the following issues:

1. What is the probability distribution of high-level attacks (i.e., what is the probability that an attack over a given level will occur during a given year)?
2. What security investment is needed so that the probability of potential system failure is below a certain threshold?
3. What are the factors affecting the behavior of high-level attacks? Are the nature and causes of high-level attacks changing over time? Is it a seasonal phenomenon?

By answering these questions, we make inferences on future attacks, thus determining the effective security solutions and investment level to modulate the likelihood of system failure.

Consider defending against denial of service (DoS) as an example. A DoS attack is an incident in which an organization is deprived of the services of a resource it would normally expect to have. A Web site can occasionally be forced to temporarily cease operation when accessed by millions of people. High-level traffic is always regarded as a signal of forthcoming DoS attacks.
Suppose that, as part of our design criteria to defend against DoS, the Web server is required to be able to serve all traffic that it is likely to experience within its projected life span, say 1 year (or more). Daily traffic is monitored and historical data might be available for the last 2 years. The challenge is to estimate the traffic level that might occur over the next year given the 2-year history. Extreme value theory provides a framework enabling such extrapolations. Using extreme value theory, we may not only estimate the distribution of high-level traffic and the probability that traffic exceeds a given level during a given year, but also answer such questions as what level of traffic will be exceeded with probability 1/365 on a given day. In addition, we may identify factors that influence the behavior of high-level traffic with proper regression analyses, thus helping us predict the trend of traffic as the environment changes over time. Based on the characterization of extremely heavy traffic, we make inferences on future attacks and determine a proper security solution and the level of investment.

The organization of this paper is as follows. In the next section, after a literature review, we introduce our two-factor security model. We then present extreme value theory. We apply extreme value analysis to a set of daily internal traffic data collected from a regional bank for illustrative purposes. Finally, we summarize our study and discuss ideas for future research.

Security Risk and Security Investment

Related Literature

Several models have been proposed to determine the effective level of security investment. There are basically two approaches (Cavusoglu 2004).

1. Using a traditional risk or decision analysis framework. Generally these models apply a standard result in optimal-control theoretic certainty equivalence, which implies that only the mean values (probability-weighted average outcomes) of target variables matter for an optimal policy setting.
Gordon and Loeb (2002) proposed an expected benefits of investment in information security (EBIS) model. Hoo (2000) used a decision analysis approach to evaluate different policies for IT security. Longstaff et al. (2000) proposed a hierarchical holographic model (HHM) to assess security risks and provide a model for assessing the efficacy of risk management.

2. Using game theory to model the strategic interactions between organizations and attackers. Some researchers argue that IT security can be treated as a kind of game between organizations and attackers. While organizations try to cover vulnerabilities in their systems, attackers race to exploit them. Security investments not only prevent security breaches by reducing the vulnerabilities that attackers can exploit but also act as a deterrent by making attacks less attractive (Schechter and Smith 2003). Longstaff et al. (2000) argued that investment in system risk assessment can reduce the likelihood of intrusions, which yields benefits much higher than the investment. Cavusoglu et al. (2004) constructed a game tree to describe the interaction between organization and hackers.

Wang et al./IT Security Investment

There are two main issues in these models.

1. The expected loss value or benefit value cannot fully characterize security failures. Usually security failures are low-probability events, but once realized, failures can bring huge losses. The loss may be intangible and not amenable to accurate estimation.
2. The rationality of hackers is hard to capture, as they may be motivated by a different value system. They may be rational, but not in our terms. They may be driven by motivations other than money. It is hard for us to know their cost function for attacking the system.

A Two-Factor Security Model

Security risk assessment determines the level of security risk that exists within the organization. Farahmand et al. (2005) presented a subjective analysis and probability assessment with a damage evaluation of information security incidents. Geer et al. (2003) introduced a technique called business-adjusted risk (BAR) for classifying security defects by their vulnerability type, degree of risk, and potential business impact.

In this paper, we define attack level $a$ as a metric that reflects the threats an organization confronts, in the same manner as the temperature reflects the relative warmth or cold of a day and the Dow Jones Index reflects the health of the stock market. Attack level may be evaluated daily or monthly based on information about hackers, worms, viruses, and other attack incidents. A similar idea is used in the Homeland Security Advisory System (http://www.dhs.gov/dhspublic/display?theme=29). Based on daily monitoring and analysis of threat information, the government may issue a threat level to reflect the current situation (severe = red; high = orange; elevated = yellow; guarded = blue; and low = green).
In defending against DoS, organizations may monitor the daily traffic and regard the level of traffic as the attack level on their systems.

We define security level $s$ as the ability of an organization to defend its IT systems from failure resulting from a security attack, such as the capability that an organization has to defend the system against a DoS attack. The system's security level is converted from the organization's security investment $i$. By investing in IT security (training security staff; buying new technologies such as an intrusion detection system, a firewall, etc.; timely installation of software patches; and increasing the system capacity), the organization improves its system security level. For simplicity in our discussion, we assume that the level of IT security investment is equivalent to the system security level in our model.

Schechter (2004) argued that when attacking a software system is only as difficult as obtaining a vulnerability to exploit, the security strength of that system is equivalent to the market price of such a vulnerability. He suggested that the strength of a system's security should be quantified from the viewpoint of the attacker rather than the defender, and introduced an approach in which security strength is measured using a market mechanism.

In our paper, we use the difference between the security level and the attack level, the term $i - a$, to measure the vulnerability of the system. We assume that both the investment level $i$ and the attack level $a$ are continuous. The security status of information systems is affected only by these two factors. We define a system survival function (the probability function of system failure) $F$ depending on the probability that $i - a < L$, where $L$ is a certain threshold of vulnerability; that is,

$$F = \text{probability of failure} = \Pr(i - a < L) \tag{1.1}$$

where $i$ is the investment level and $a$ is the attack level. $F$ increases when $a$ increases, and decreases when $i$ increases. When $(i - a)$ increases, $F$ decreases.
If we assume $L = 0$, the probability of system failure depends on the probability that $i < a$ (i.e., the probability that the investment level is less than the attack level). The attack level $a$ is exogenous in the function $F$, while the investment level $i$ is endogenous. An organization follows a dynamic investment strategy, in which it makes investment decisions based on the attack level $a$ and the status of the system $F$; that is,

$$i = i(a, F) \tag{1.2}$$
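As a minimal sketch of equation (1.1), the failure probability can be estimated empirically as the share of observed attack levels for which $i - a < L$. The function name `failure_probability` and the attack values below are hypothetical and chosen only for illustration; they are not data from the paper.

```python
def failure_probability(investment, attack_levels, threshold=0.0):
    """Empirical estimate of F = Pr(i - a < L) from observed attack levels."""
    if not attack_levels:
        raise ValueError("need at least one observed attack level")
    # Count the observations whose margin i - a falls below the threshold L.
    failures = sum(1 for a in attack_levels if investment - a < threshold)
    return failures / len(attack_levels)


# Hypothetical attack levels, in the same units as the investment level.
attacks = [2.0, 3.5, 1.0, 4.2, 2.8, 5.1, 3.0, 2.2]

# With L = 0, raising the investment level lowers the estimated F.
print(failure_probability(3.0, attacks))
print(failure_probability(5.0, attacks))
```

An empirical share like this says nothing about attack levels beyond the largest one observed, which is the gap the extreme value analysis is meant to fill.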

Table 1. Notations

Two-Factor Security Model
  $F$: System survival function (the probability function of system failure)
  $i$: Security investment level
  $a$: System attack level

Extreme Value Theory
  $X$: Random observations; $x$ is one observation from $X$
  $F$: Common cumulative distribution function of random observations
  $M_n$: Block maxima, $M_n = \max\{X_1, X_2, \ldots, X_n\}$, where $n$ is the number of observations
  $H(x)$: The limiting distribution of extrema
  $\mu, \sigma, \alpha$: Distribution parameters of Fréchet, Weibull, Gumbel, as well as generalized extreme value (GEV) distribution
  $\xi$: Shape parameter of GEV
  $x_p$: Return level
  $\hat{x}_p$: Estimated return level
  $1/p$: Return period; $p$ is a probability
  $u$: High threshold
  $Y$: $Y = X - u > 0$; $y$ is an observation from $Y$
  $G$: Generalized Pareto distribution (GPD)
  $\xi, \sigma, \psi$: Distribution parameters of GPD
  $Z_t$: The maximum level in time period $t$

In the example of defending against DoS, the system status depends on the system capacity and the daily traffic experienced by the system. The probability of system failure is determined by the probability that the system capacity is less than the daily traffic. Based on the observed daily traffic, the organization determines proper security solutions.

One of the key requirements for such a dynamic investment strategy is to accurately capture and model the dynamic behavior of attacks. With our two-factor security model, it is important for us to know the behavior of extreme attacks for an effective investment. To defend against DoS, we need to understand the behavior of high-level traffic so that we can make inferences on future attacks and design proper defense solutions to prevent the system failure caused by extremely heavy traffic.

In the following section, we introduce extreme value theory, which we use to characterize the stochastic behavior of high-level attacks and to identify the factors (including time) that may influence high-level attacks. Table 1 lists the notations we use in the analysis.
Extreme Value Analysis

Classic Extreme Value Theory

The principal results of extreme value theory concern the limiting distribution of sample extrema (maxima or minima). Since in our model the probability of system failure depends on the probability that the attack exceeds the investment, and we are concerned with the behavior of extremely large attacks, such as the distribution of high-level traffic, we will only discuss sample maxima here.

Suppose that $X_1, X_2, \ldots, X_n$ is a sequence of independent, identically distributed observations, such as $n$ days of daily traffic in DoS, with a common cumulative distribution function $F$, which is not necessarily known. Let the sample maximum be denoted by $M_n = \max\{X_1, X_2, \ldots, X_n\}$ ($M_n$ is also referred to as the block maximum). We are interested in the stochastic behavior of $M_n$. We know

$$\Pr\{M_n \le x\} = F(x)^n \tag{1.3}$$

Result (1.3) is of no immediate interest, since it simply says that for any fixed $x$ for which $F(x) < 1$, we have $\Pr\{M_n \le x\} \to 0$ as $n$ goes to infinity. For nontrivial limit results we must renormalize: find $a_n > 0$, $b_n$ such that

$$\Pr\left\{\frac{M_n - b_n}{a_n} \le x\right\} = F^n(a_n x + b_n) \to H(x) \tag{1.4}$$

where $H(x)$ is the limiting distribution of $F^n(a_n x + b_n)$. The fundamental theorem of extreme value theory provides three possible distributions for $H(x)$, as follows:

Theorem 1 (Fisher and Tippett 1928): The only three types of non-degenerate distributions $H(x)$ satisfying Equation (1.4) are

$$\text{Fréchet:}\quad H(x) = \begin{cases} \exp\left\{-\left(\dfrac{x - \mu}{\sigma}\right)^{-\alpha}\right\} & \text{if } x > \mu \\ 0 & \text{otherwise} \end{cases} \tag{1.5}$$

$$\text{Weibull:}\quad H(x) = \begin{cases} 1 & \text{if } x \ge \mu \\ \exp\left\{-\left(\dfrac{\mu - x}{\sigma}\right)^{\alpha}\right\} & \text{otherwise} \end{cases} \tag{1.6}$$

$$\text{Gumbel:}\quad H(x) = \exp\left\{-\exp\left(-\dfrac{x - \mu}{\sigma}\right)\right\}, \quad -\infty < x < \infty \text{ and } \sigma > 0 \tag{1.7}$$

The three extreme-value distributions, normalized to zero mean and unit variance, are shown in Figure 1 ($\alpha = 3.9$ for Fréchet, and $\alpha = -2$ for Weibull). The stars indicate the 99th percentile of the respective distributions.

Fréchet: This is the long-tailed case. The underlying attack-level distribution $H(x)$ for the Fréchet distribution has a fat tail (e.g., $1 - H(x)$ declines as $x^{-\alpha}$). The attack level confronted by organizations has great upside uncertainty.

Gumbel: This is the medium-tailed case, for which $1 - H(x)$ decreases exponentially for large $x$. For this type of attack distribution, there are no specific limits on the attack level, but the attack level is not likely to be too high or too low. Most attack levels are distributed in a central range.

Weibull: This is the short-tailed case, in which the distribution has a finite endpoint. Organizations face predictably finite attack levels.

Figure 1. Densities for the Three Extreme Value Distributions ($\mu = 0$, $\sigma^2 = 1$)
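The renormalization in equation (1.4) can be checked numerically in one textbook case that is not taken from the paper: for unit exponential observations, the choice $a_n = 1$, $b_n = \log n$ gives a Gumbel limit. The sketch below uses only the standard library; the function names and the evaluation grid are illustrative assumptions.

```python
import math


def exp_cdf(x):
    """CDF of the unit exponential distribution, F(x) = 1 - exp(-x)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def max_cdf(x, n):
    """Equation (1.3): Pr{M_n <= x} = F(x)^n."""
    return exp_cdf(x) ** n


def normalized_max_cdf(x, n):
    """Equation (1.4) with a_n = 1 and b_n = log(n): F(x + log n)^n."""
    return exp_cdf(x + math.log(n)) ** n


def gumbel_cdf(x):
    """Standard Gumbel distribution (mu = 0, sigma = 1), equation (1.7)."""
    return math.exp(-math.exp(-x))


# Grid of evaluation points from -3.0 to 6.0 in steps of 0.1.
GRID = [k / 10 for k in range(-30, 61)]


def max_gap(n):
    """Largest absolute gap between the normalized maximum and the Gumbel limit."""
    return max(abs(normalized_max_cdf(x, n) - gumbel_cdf(x)) for x in GRID)


for n in (10, 100, 10_000):
    # The unnormalized probability at a fixed level collapses toward 0,
    # while the normalized one approaches the Gumbel curve.
    print(n, max_cdf(5.0, n), max_gap(n))
```

The second printed column illustrates why (1.3) alone is uninformative, and the third shows the gap to the Gumbel limit shrinking as $n$ grows.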

There are several reasons why the probability distribution of high-level attacks may differ. First, it may be due to the nature of digital assets: the perceived value of digital assets and their criticality to the public and organizations. Terrorist or engineered attacks are more likely to target high-valued digital assets, or digital assets that can cause high-level damage to the public and/or organizations. Second, different attack types may have different distributions. For example, a violation that is initiated from a finite number of internal users in an organization is likely to differ from threats from viruses or worms, which can originate anywhere in the world. Third, the exposure range of the digital assets may also result in different distributions of the attack level. Digital assets connected to the Internet are more likely to come under high-level attack, while application systems with limited access in an isolated environment are less likely to be exposed to the same level of attack. Fourth, due to the negative externality of attacks (Camp and Wolfram 2004), the size of the organization and its network becomes a factor. A larger organization is more likely to suffer an attack than a smaller one. However, empirical exploration of these hypotheses is needed.

The three types may be combined into a single generalized extreme value (GEV) distribution (Coles 2001a, p. 48),

$$H(x) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\} \tag{1.8}$$

defined on $\{x : 1 + \xi(x - \mu)/\sigma > 0\}$, where $\mu$ is a location parameter, $\sigma > 0$ is a scale parameter, and $\xi$ is a shape parameter. The limit $\xi \to 0$ corresponds to the Gumbel distribution, $\xi > 0$ to the Fréchet distribution with $\alpha = 1/\xi$, and $\xi < 0$ to the Weibull distribution with $\alpha = -1/\xi$. By inverting equation (1.8), we obtain

$$x_p = \begin{cases} \mu - \dfrac{\sigma}{\xi}\left\{1 - \left[-\log(1 - p)\right]^{-\xi}\right\}, & \text{for } \xi \ne 0 \\ \mu - \sigma \log\left\{-\log(1 - p)\right\}, & \text{for } \xi = 0 \end{cases}$$

where $H(x_p) = 1 - p$.
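The GEV distribution function (1.8) and its inverse can be written as a short sketch, which also lets the relation $H(x_p) = 1 - p$ be checked by a round trip. The function names and the parameter values in the loop are hypothetical, not estimates from the paper.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function H(x) of equation (1.8); xi = 0 is the Gumbel limit."""
    if xi == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(p, mu, sigma, xi):
    """Level x_p with H(x_p) = 1 - p, i.e. the (1/p)-period return level."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical parameters: the 100-period return level for three tail shapes.
for xi in (-0.2, 0.0, 0.3):
    level = gev_return_level(0.01, mu=0.0, sigma=1.0, xi=xi)
    print(xi, level, gev_cdf(level, 0.0, 1.0, xi))
```

With location and scale held fixed, a larger shape parameter gives a higher return level for the same $p$, which is the upside uncertainty attributed to the long-tailed case.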
In common terminology, $x_p$ is the return level associated with the return period $1/p$, since, to a reasonable degree of accuracy, the level $x_p$ is expected to be exceeded once every $1/p$ periods. In other words, $x_p$ is exceeded by the period maximum in any particular period with probability $p$.

Exceedances over Thresholds

Extremes are scarce, so model estimates based on block maxima have a large variance (Coles 2001a, p. 66). Modeling block maxima is a wasteful approach to extreme value analysis, especially if one block happens to contain more extreme observations than another. If an entire time series of, say, hourly or daily observations is available, the data may be better used by avoiding the blocking procedure. Exceedances over thresholds provide an alternative way to model extreme values by characterizing an observation as extreme if it exceeds a high threshold $u$.

Theorem 2 (Coles 2001a, p. 75; Smith 2003): Consider the distribution of $X$ conditional on exceeding some high threshold $u$, and let $Y = X - u$, with $Y > 0$. We know

$$F_u(y) = \Pr\{Y \le y \mid Y > 0\} = \frac{F(u + y) - F(u)}{1 - F(u)}$$

As $u \to \omega_F = \sup\{x : F(x) < 1\}$, we find a limiting distribution,

$$F_u(y) \approx G(y; \sigma_u, \xi)$$

where $G$ is the generalized Pareto distribution (GPD)

$$G(y; \sigma, \xi) = 1 - \left(1 + \xi\frac{y}{\sigma}\right)^{-1/\xi} \tag{1.9}$$

defined on $\{y : y > 0 \text{ and } (1 + \xi y/\sigma) > 0\}$.

The rigorous connection between exceedances over thresholds and classic extreme value theory was established by Pickands (1975). As with the GEV, the GPD has three cases depending on the value of the parameter $\xi$:

The case $\xi > 0$ is the long-tailed case, for which $1 - G(x)$ decays at the same rate as $x^{-1/\xi}$ for large $x$. This is reminiscent of the usual Pareto distribution, $G(x) = 1 - cx^{-1/\xi}$.

For $\xi = 0$, we have the exponential distribution with mean $\sigma$ as the limit,

$$G(y; \sigma, 0) = 1 - \exp\left(-\frac{y}{\sigma}\right)$$

For $\xi < 0$, the distribution has a finite upper endpoint at $-\sigma/\xi$.

Substituting $Y = X - u$ into (1.9), we have

$$\Pr\{X > x \mid X > u\} = \left[1 + \xi\left(\frac{x - u}{\sigma}\right)\right]^{-1/\xi} \tag{1.10}$$

It follows that

$$\Pr\{X > x\} = \zeta_u\left[1 + \xi\left(\frac{x - u}{\sigma}\right)\right]^{-1/\xi} \tag{1.11}$$

where $\zeta_u = \Pr\{X > u\}$. By inverting equation (1.11), we obtain

$$x_p = \begin{cases} u + \dfrac{\sigma}{\xi}\left[\left(\dfrac{\zeta_u}{p}\right)^{\xi} - 1\right], & \text{for } \xi \ne 0 \\ u + \sigma \log\left(\dfrac{\zeta_u}{p}\right), & \text{for } \xi = 0 \end{cases} \tag{1.12}$$

$x_p$ is the $(1/p)$-observation return level. In other words, the level $x_p$ is expected to be exceeded once every $1/p$ observations to a reasonable degree of accuracy, or the probability that an observation exceeds $x_p$ is $p$. Suppose that we have one observation for each day. Then a 365-observation return level is the same as a 1-year (or 365-day) return level, which is the level expected to be exceeded once every 365 observations (or once a year).

Factor Analysis

In the above discussion, we do not consider that high-level attacks may change systematically through time, or be influenced by changes in other environmental factors. In the context of DoS, the network traffic or server load may increase over time, because the Internet is expanding and e-business is maturing. The organization's internal traffic may be affected significantly by the number of employees and the number of enterprise applications. The activities of worms, viruses, or hackers may vary seasonally. In the following discussion, we introduce the models that capture these changes and influences.

Let $\mathrm{GEV}(\mu, \sigma, \xi)$ denote the GEV distribution with parameters $\mu$, $\sigma$, and $\xi$. Let $Z_t$ denote the maximum attack level in time period $t$. To examine whether the maximum attack level changes linearly over the observation periods, a suitable model for $Z_t$ is (Coles 2001a, p. 107)

$$Z_t \sim \mathrm{GEV}(\mu(t), \sigma, \xi)$$

where

$$\mu(t) = \beta_0 + \beta_1 t$$

for coefficients $\beta_0$ and $\beta_1$.

To identify other factors that might have a significant impact on the maximum attack levels, the model can be extended into a general form

$$\mu(t) = [1, z_1(t), \ldots, z_n(t)] \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{bmatrix}$$

where $z_i(t)$ are the factors to be examined (e.g., the number of employees and the number of enterprise applications in different time periods).

The seasonal model with $k$ seasons $s_1, s_2, \ldots, s_k$ takes the form

$$\mu(t) = [I_1(t), I_2(t), \ldots, I_k(t)] \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}$$

where $I_j(t)$ is the dummy variable

$$I_j(t) = \begin{cases} 1, & \text{if } s(t) = s_j \\ 0, & \text{otherwise} \end{cases}, \quad j = 1, \ldots, k$$

Using these regression models, we are able to identify whether high-level attacks are changing over time, and/or whether there is any seasonal effect. We can also identify factors that may influence the maximum attack level. Following the same logic, we can test the factors that might have an impact on the parameters $\alpha$ and $\xi$. This information helps us understand the trend of attacks and thus invest in IT security more effectively as the environment changes.

An Empirical Analysis

Malicious traffic from self-propagating worms and denial of service attacks constantly threatens the everyday operation of an organization's Internet systems.
Defending networks from these threats demands appropriate tools to conduct comprehensive vulnerability assessments of networked systems (Sommers et al. 2004). High-level traffic above a certain threshold is perceived as a signal of attack on IT systems. High-level traffic causes network outage and denial of service. In this section, we analyze daily internal traffic collected from a large regional bank situated in New York state. With 1 year as the return period, we estimate the return level of traffic. The return level of traffic provides valuable information, enabling us to design proper defense strategies and adjust the investment level to prevent network outage and denial of service.

We record the internal traffic daily from January 16, 2004, to March 20, 2005 (see Figure 2). The traffic comprises a number of activities, including employee login/logout, file/printer access, or any other activity done at the network level; inter-server communication (most of which happens automatically or is scheduled); and application access information (only some applications are monitored).

Since a series of daily data is available, exceedances over thresholds are employed in our extreme value analysis. We use the maximum-likelihood method to estimate the distribution parameters of the generalized Pareto distribution (GPD) with the S-PLUS functions obtained from the website (Coles 2001b).

To use exceedances over thresholds, a proper threshold $u$ must be selected. The mean residual life plot (Coles 2001a, p. 78) is a diagnostic plot drawn before fitting any model and gives guidance about what threshold to use. Figure 3 shows the mean residual life plot with approximate 95 percent confidence intervals for the daily traffic. The plot is initially linear, but shows substantial curvature in the range $1.1 \times 10^6 < u < 1.3 \times 10^6$. For $u > 1.3 \times 10^6$, the plot is reasonably linear when judged relative to the confidence intervals, suggesting we set $u = 1.3 \times 10^6$. (We also did a sensitivity analysis with $u = 1.5 \times 10^6$; the results do not differ much.)
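The mean residual life plot rests on the sample mean excess over each candidate threshold; for data whose excesses follow a GPD with $\xi < 1$, the mean excess is linear in $u$, which is why approximate linearity is the selection criterion. The sketch below computes the plotted quantity on synthetic unit exponential data (where the mean excess stays near 1 at every threshold). The function names and the synthetic sample are assumptions for illustration; the bank's traffic data are not reproduced here, and the confidence bands shown in Figure 3 are omitted.

```python
import random


def mean_excess(data, u):
    """Sample mean of (x - u) over the observations that exceed threshold u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        return None  # no exceedances, so the mean excess is undefined
    return sum(excesses) / len(excesses)


def mean_residual_life(data, thresholds):
    """Points (u, mean excess at u) that a mean residual life plot would draw."""
    return [(u, mean_excess(data, u)) for u in thresholds]


# Synthetic stand-in for a daily series: unit exponential draws, fixed seed.
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(20_000)]

for u, e in mean_residual_life(sample, [0.0, 0.5, 1.0, 1.5, 2.0]):
    print(f"u = {u:.1f}  mean excess = {e:.3f}")
```

In practice the estimates become noisy at high thresholds because few exceedances remain, so linearity has to be judged relative to the confidence intervals, as the text does.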
The choice of $u = 1.3 \times 10^6$ leads to 149 exceedances in the series of length 430. Thus $\hat{\zeta}_u = 149/430 = 0.347$ with $\mathrm{var}(\hat{\zeta}_u) = 5.27 \times 10^{-4}$. The maximum likelihood estimates of the GPD parameters are $(\hat{\sigma}, \hat{\xi}) = (281972.2, 0.09)$, with standard errors 33901.08 and 0.09, respectively. The 95 percent confidence interval for $\xi$ is [-0.09, 0.26] (Figure 4). Therefore, the maximum likelihood estimate corresponds to an unbounded distribution, although the evidence is not overwhelming, since 0 lies inside the 95 percent confidence interval. Diagnostic plots for the fitted GPD are shown in Figure 5. Both the probability plot and the quantile plot are nearly linear, supporting the validity of the fitted model. The return level curve asymptotes to an infinite level. The corresponding density estimate is roughly consistent with the histogram of the data, but not perfectly so.

Figure 2. Daily Traffic from January 16, 2004, to March 20, 2005 (vertical axis: Traffic; horizontal axis: Event Date)
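The exceedance rate and its variance reported above can be checked by hand. The sketch below assumes the usual binomial variance for a sample proportion, $\hat{\zeta}_u (1 - \hat{\zeta}_u)/n$; the paper does not state which formula it used, and the variable names are illustrative.

```python
import math

n_days = 430     # length of the daily series, as reported in the text
n_exceed = 149   # exceedances of the threshold u = 1.3e6, as reported in the text

# Sample proportion of days on which the threshold is exceeded.
zeta_hat = n_exceed / n_days

# Binomial variance of a sample proportion (assumed formula, not quoted from the paper).
var_zeta = zeta_hat * (1.0 - zeta_hat) / n_days
se_zeta = math.sqrt(var_zeta)

print(f"zeta_hat      = {zeta_hat:.3f}")
print(f"var(zeta_hat) = {var_zeta:.2e}")
print(f"se(zeta_hat)  = {se_zeta:.4f}")
```

Under that assumption the computed values agree with the reported 0.347 and $5.27 \times 10^{-4}$ to the precision given.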

Figure 3. Mean Residual Life Plot for Daily Traffic (vertical axis: Mean Excess)

Figure 4. Profile Likelihood for Shape Parameter $\xi$ in Threshold Excess Model of Daily Traffic (vertical axis: Profile Log-likelihood; horizontal axis: Shape Parameter)

Figure 5. Model Diagnostic Plots for Threshold Excess Model Fitted to Daily Traffic Pattern (panels: Probability Plot, Quantile Plot, Return Level Plot, Density Plot)

Figure 6. Profile Likelihood for 1-Year Return Level in the Threshold Excess Model (vertical axis: Profile Log-likelihood; horizontal axis: Return Level)
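A point estimate of the 1-year return level can be reproduced from the fitted values reported earlier. Equation (1.12) is not shown in this section, so the sketch below assumes the standard threshold-excess form from Coles (2001a), $x_m = u + (\sigma/\xi)\,[(m\,\zeta_u)^{\xi} - 1]$ for $\xi \neq 0$ and $x_m = u + \sigma \log(m\,\zeta_u)$ for $\xi = 0$, with $m = 365$ daily observations. The function name `gpd_return_level` is hypothetical.

```python
import math


def gpd_return_level(u, sigma, xi, zeta_u, m):
    """m-observation return level of a threshold excess (GPD) model.

    Assumes the standard form from Coles (2001a); the paper's equation (1.12)
    is not reproduced in this section.
    """
    if abs(xi) < 1e-12:
        # Limiting case xi -> 0 (exponential tail).
        return u + sigma * math.log(m * zeta_u)
    return u + (sigma / xi) * ((m * zeta_u) ** xi - 1.0)


# Fitted values as reported in the text (xi_hat is rounded to two decimals there).
u = 1.3e6
sigma_hat = 281972.2
xi_hat = 0.09
zeta_hat = 149 / 430

x_1yr = gpd_return_level(u, sigma_hat, xi_hat, zeta_hat, m=365)
print(f"1-year return level: {x_1yr:.3e}")
```

With these inputs the result is approximately $3.01 \times 10^6$. Because the reported shape estimate is rounded, small differences in later digits are to be expected. The confidence interval in Figure 6 comes from the profile likelihood and is not reproduced by this point calculation.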

The GPD model provides a direct method for risk estimation using the return level. In our analysis, we use 1 year as our return period. Since we have one observation for each day, the 1-year return level corresponds to the 365-observation return level with $p = 1/365$. As $\hat{\xi} > 0$, we use the first part of equation (1.12) to calculate the return level and obtain $\hat{x}_p = 3.01 \times 10^6$. Figure 6 plots the profile log-likelihood for the 1-year return level. The 95 percent confidence interval is $[2.6 \times 10^6, 4.0 \times 10^6]$. The return level of traffic shows the level of investment we should match for the next year such that the probability of traffic exceeding $\hat{x}_p = 3.01 \times 10^6$ in the next year is less than 1/365 with 95% confidence.

In our model, we have (i - a) as a measure of vulnerability, where i is the investment factor and a is the attack factor. In the data analysis, a is the packet traffic rate, while i is the capacity rate in packets per second that the system can handle. The estimated return level of traffic $\hat{x}_p$ in the data analysis provides an extremal value estimate of the packet traffic rate to which the system is subject, which in turn gives the value of the attack factor a. Taking this value as the system capacity i, and assuming that the security system is a simple serial system of three elements (connection to the Internet, firewall/router, and the server), we can then specify the bandwidth of the pipe that connects to the firewall, the firewall/router's filtration capacity, and the capacity of the server operating system to handle that TCP/IP traffic, all in terms of packets per unit of time. Given these capacities, we can estimate what the cost of such a security system is likely to be, and decide whether to increase or decrease investment by comparing with current configurations. We may also introduce intrusion detection systems (IDS) or reconfigure the existing IDS to properly protect the system.

Conclusion

In this paper, we introduce an extreme value approach for security investment.
Compared with other methods for determining the effective level of security investment, our model does not need to calculate the expected loss due to system failure, nor make assumptions relating to hackers' behavior. It is a dynamic strategy for security investment. With extreme value analysis, the distribution of high-level attacks is estimated. We may then determine the return level of attacks for a certain return period. The return level of attacks provides important information for designing a proper defense capability and making an investment decision. Using the daily traffic data collected from a large regional bank, we examine the distribution of high-level traffic. Using one year as the return period, we estimate the return level of traffic.

The methodology provides many avenues for future research. First, using the extreme value approach, we can examine whether the distribution of high-level attacks differs across types of attacks, such as denial of service, malicious code, etc., as well as across initiators, such as internal employees, hackers, or competitors. Second, the time effect on the attack level can be examined empirically. With the extreme value approach, we can answer whether the maximum attack level is changing over time, and whether there is any seasonal effect. Further, we can identify factors that influence the maximum attack level. This information will help to make strategic investment in IT security more effective. A similar analysis can also be done for spam e-mail, where we can estimate the capacity the e-mail system needs to handle the surge in traffic due to spam. There too, we will get a system size in terms of packet handling capacity, which in turn will suggest a dollar amount of investment. In future research, we propose to show how our methodology can help size a security system in terms of packet handling capacity systematically and thereby help estimate the dollar investment that may be required.
There are certain limitations of our paper. First, we focus only on using extreme value theory to characterize the behavior of attacks. We do not look at how to operationally decide a corresponding security investment level, nor do we convert it into a real protection level for a system through the combination of various technologies and security policies. This is an interesting topic that needs further exploration. Second, in extreme value analysis we view the system attack as an exogenous variable. The causal issues of the attack are not explored. Third, extreme value analysis, being a statistical approach based on past data, has limited application where the security scenario is evolving such that past data are no longer a reliable indicator of what future situations may entail.

Acknowledgements

The authors would like to thank the track chair, the associate editor, and the two referees for their comments, which have greatly improved the paper. This research has been funded in part by NSF under grant #0402388. The usual disclaimer applies.

References

Camp, L. J., and Wolfram, C. "Pricing Security," in Economics of Information Security, L. J. Camp and S. Lewis (Ed.), Kluwer Academic Publishers, Boston, 2004, pp. 17-34.
Castillo, E. Extreme Value Theory in Engineering, Academic Press, San Diego, 1988.
Cavusoglu, H. "Economics of IT Security Management," in Economics of Information Security, L. J. Camp and S. Lewis (Ed.), Kluwer Academic Publishers, Boston, 2004, pp. 71-83.
Cavusoglu, H., Mishra, B., and Raghunathan, S. "A Model For Evaluating IT Security Investments," Communications of the ACM (47:7), 2004, pp. 87-92.
Coles, S. An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London, 2001a.
Coles, S. "How to Use the S-PLUS Functions and Datasets," June 2001b (available online at http://www.maths.bris.ac.uk/~masgc/ismev/summary.html).
Dahan, E., and Mendelson, H. "An Extreme-Value Model of Concept Testing," Management Science (47:1), 2001, pp. 102-116.
Embrechts, P., Klüppelberg, C., and Mikosch, T. Modeling Extremal Events for Insurance and Finance, Springer, New York, 1997.
Ernst & Young. Global Information Security Survey 2003, Ernst & Young LLP, 2003 (available online at http://www.deloitte.com/dtt/cda/doc/content/global%20security%20survey%202003.pdf).
Ernst & Young. Global Information Security Survey 2004, Ernst & Young LLP, 2004 (available online at http://www.deloitte.com/dtt/cda/doc/content/dtt_financialservices_securitysurvey2004_051704.pdf).
Farahmand, F., Navathe, S. B., Sharp, G. P., and Enslow, P. H. "A Management Perspective on Risk of Security Threats to Information Systems," Information Technology and Management (6:2-3), 2005, pp. 203-255.
Fisher, R. A., and Tippett, L. H. C. "Limiting Forms of The Frequency Distributions of The Largest or Smallest Member of a Sample," in Proceedings of the Cambridge Philosophical Society (24), Cambridge University Press, London, 1928, pp. 189-190.
Geer, D., Hoo, K. S., and Jaquith, A. "Information Security: Why the Future Belongs to the Quants," IEEE Security & Privacy (1:4), July/August 2003, pp. 32-40.
Gordon, L. A., and Loeb, M. P. "The Economics of Information Security Investment," ACM Transactions on Information and Systems Security (5:4), 2002, pp. 438-457.
Gordon, L. A., Loeb, M. P., Lucyshyn, W., and Richardson, R. 2005 CSI/FBI Computer Crime and Security Survey, Computer Security Institute, 2005 (available online at http://www.cpppe.umd.edu/bookstore/documents/2005csisurvey.pdf).
Gumbel, E. J. Statistics of Extremes, Columbia University, New York, 1958.
Hoo, K. J. S. "How Much is Enough? A Risk-Management Approach to Computer Security," CISAC Working Paper, Stanford University, August 2000 (available online at http://cisac.stanford.edu/publications/11900/).
Longstaff, T. A., Chittister, C., Pethia, R., and Haimes, Y. Y. "Are We Forgetting the Risks of Information Technology?" IEEE Computer (33:12), 2000, pp. 43-51.
Mercuri, R. T. "Analyzing Security Costs," Communications of the ACM (46:6), 2003, pp. 15-18.
Pickands, J. "Statistical Inference Using Extreme Order Statistics," Annals of Statistics (3), 1975, pp. 119-131.
Schechter, S. E. Computer Security Strength and Risk: A Quantitative Approach, unpublished Ph.D. dissertation, Harvard University, 2004.
Schechter, S. E., and Smith, M. D. "How Much Security is Enough to Stop a Thief? The Economics of Outsider Theft via Computer Systems Networks," in Proceedings of the 7th Financial Cryptography Conferences, R. N. Wright (Ed.), Guadeloupe, French West Indies, January 27-30, 2003.
Smith, R. L. "Statistics of Extremes, with Applications in Environment, Insurance, and Finance," unpublished manuscript, Department of Statistics, University of North Carolina, 2003.
Sommers, J., Yegneswaran, V., and Barford, P. "A Framework for Malicious Workload Generation," in Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement, J. Kurose (Ed.), Taormina, Sicily, Italy, 2004, pp. 82-87.