Incremental Sensor Placement Optimization on Water Network

Xiaomin Xu¹, Yiqi Lu¹, Yanghua Xiao¹, Sheng Huang², and Wei Wang¹

¹ Department of Computing and Information Technology, Fudan University, Shanghai 200433, People's Republic of China
² IBM Research, Shanghai 200433, People's Republic of China
{asuka19982006,luyiqi}@gmail.com, shawyh@fudan.edu.cn, huangssh@cn.ibm.com, weiwang1@fudan.edu.cn

Abstract. As cities develop and expand, their public water distribution systems are enlarged as a consequence. For the monitoring of these systems, the strategic placement of additional sensors throughout the distribution network therefore has a great impact on the performance of real-time early warning systems (EWSs). At the same time, constraints on the cost of modifying the original placement strategy must be taken into consideration. In this paper, we reduce the incremental sensor deployment optimization problem to a set cover problem. Because the objective function is submodular, we design algorithms based on a greedy heuristic with provable performance guarantees, and we further optimize them with local search approaches such as simulated annealing. Detailed experimental results on two benchmark distribution networks demonstrate the effectiveness and efficiency of our algorithms.

1 Introduction

In the design of early warning systems (EWSs), the general goal of sensor placement optimization is to place a limited number of sensors in a water distribution network such that the public health impact of an accidental or intentional injection of contaminant is minimized. A large body of work [1-14] addresses the optimization of water sensor network placement. [1] gives an overview of the literature on optimizing sensor networks for contaminant detection. Most of these approaches can only handle small networks of up to about 500 nodes.
Many approaches cannot provide provable performance guarantees because they rely on heuristics. To name a few, [1] equates the placement problem with a p-median problem and draws on the large toolset of existing algorithms for it, considering two algorithms: mixed integer programming (MIP) and a fast heuristic (GRASP). [13] uses genetic algorithms and [8] adopts cross-entropy selection. The first algorithm with a performance bound is that of [14], which proposes the CELF algorithm. By exploiting the submodularity property [15,16], it achieves at least a fraction $(1 - 1/e)$ of the optimal solution and can handle networks of up to 10000 nodes. None of the algorithms mentioned above considers the sensor placement problem incrementally.

As cities develop and expand, their public water distribution systems are enlarged as a consequence. For the monitoring of these systems, the strategic placement of additional sensors throughout the distribution network therefore plays a vital role in real-time early warning systems (EWSs). On the other hand, changes to the structure of the original water distribution system may greatly influence the efficiency of EWSs and may be costly. Sometimes we are not given a fixed number of sensors with which to optimize a given objective. Instead, the requirement is: given an expected upper bound on an objective criterion, use as few sensors as possible to detect as many contamination scenarios as possible. This kind of requirement is common, because as the water distribution system of a city expands, a larger area must be protected from contamination. The question for the city government is whether there exists a global expected upper bound on a criterion such as detection time under which the influence of contamination can be kept under control. It is therefore a great challenge to design new strategies that deploy sensors incrementally while guaranteeing efficiency and limiting the alteration cost on the original sensor placement.

In this paper, we reduce the sensor placement optimization problem, under incremental placement and a cost limit on placement modification, to the set cover problem. We also present an optimization algorithm based on a greedy heuristic, whose solution quality is guaranteed by the submodularity of the objective function we define. Our contributions are:

1. We consider sensor placement incrementally, taking into account both the dynamic change of the water distribution system and the cost limit on placement modification.
2. We propose an algorithm for the problem with a provable bound on the quality of the solution it returns. We also briefly discuss the different modification strategies used in the algorithm to determine which is better.
3. Detailed experimental results on two benchmark distribution networks demonstrate the robustness of the solutions found and the efficiency of our algorithms.

2 Preliminaries

2.1 MIP model

In the MIP model, a water distribution system is modeled as an undirected graph $G = (V, E)$. Vertices in $V$ represent junctions, tanks, and other resources in the system; edges in $E$ represent pipes, pumps, and valves. $A$ denotes the set of contamination scenarios, each of which consists of several individual contamination events. An event is characterized by a quadruple $(v_x, t_s, t_f, X)$, where $v_x \in V$ is the origin of the contamination event, $t_s$ and $t_f$ are its start and stop times, and $X$ is the contamination event profile.

For a scenario $a$, $t_s^a$ and $t_b^a$ denote the start and stop times of the contamination event. $d_a(t)$ denotes the total network-wide impact of scenario $a$ at any time $t \ge t_s^a$. $\gamma_{aj}$ denotes the earliest time $t$ at which a hypothetical sensor at junction $v_j$ can detect contaminant due to scenario $a$, and $d_{aj} = d_a(\gamma_{aj})$ is the total impact of scenario $a$ if the contaminant is first detected by a sensor at $v_j$. $t^*$ denotes the stop time imposed on the water quality simulations; if no contaminant ever reaches $v_j$, then $\gamma_{aj} = t^*$. Let $q$ denote a dummy location that corresponds to failed detection of scenario $a$, so that $d_{aq}$ is the total impact of scenario $a$ if it is not detected before $t^*$.

The MIP model describes the placement of $p$ sensors on a set $L \subseteq V$, with the objective of minimizing the expected impact of a set $A$ of contamination scenarios. Each scenario $a \in A$ has a likelihood $\alpha_a$ such that $\sum_{a \in A} \alpha_a = 1$. Let $L_a \subseteq L \cup \{q\}$ be the subset of vertices in $L \cup \{q\}$ that could possibly be contaminated by scenario $a$. $x_{ai}$ is an indicator variable with value 1 if location $i$ raises the alarm for scenario $a$ and 0 otherwise. The binary decision variable $s_i$, for each potential sensor location $i \in L$, equals 1 if a sensor is placed at location $i$ and 0 otherwise. The design objective is to minimize $\sum_{a \in A} \alpha_a \sum_{i \in L_a} d_{ai} x_{ai}$.
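As a rough illustration of the design objective, the sketch below evaluates $\sum_{a} \alpha_a \sum_{i} d_{ai} x_{ai}$ for a fixed placement. It assumes that, once the sensors are fixed, each scenario is attributed to the lowest-impact location in $L_a$ that holds a sensor, with the dummy location always available. The names `expected_impact` and `DUMMY` and the toy numbers are hypothetical and do not come from the paper.

```python
from typing import Dict, Set

DUMMY = "q"  # hypothetical label for the dummy "undetected" location q


def expected_impact(
    sensors: Set[str],
    alpha: Dict[str, float],
    impact: Dict[str, Dict[str, float]],
) -> float:
    """Evaluate sum_a alpha_a * sum_i d_ai * x_ai for a fixed placement.

    impact[a][i] is d_ai for every i in L_a; impact[a][DUMMY] is d_aq.
    Assumption: for fixed sensors, x_a puts all weight on the lowest-impact
    location in L_a that holds a sensor (the dummy is always available).
    """
    total = 0.0
    for a, likelihood in alpha.items():
        candidates = [d for i, d in impact[a].items() if i == DUMMY or i in sensors]
        total += likelihood * min(candidates)
    return total


# Toy data, invented for illustration only.
alpha = {"a1": 0.5, "a2": 0.5}
impact = {
    "a1": {"v1": 10.0, "v2": 40.0, DUMMY: 100.0},
    "a2": {"v2": 20.0, DUMMY: 80.0},
}
no_sensor = expected_impact(set(), alpha, impact)
one_sensor = expected_impact({"v2"}, alpha, impact)
```

Comparing `no_sensor` with `one_sensor` shows how a single well-placed sensor lowers the expected impact in this toy instance.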
The MIP formulation of the sensor placement problem is then:

$$
\begin{aligned}
(\mathrm{DSP})\quad \text{minimize}\quad & \sum_{a \in A} \alpha_a \sum_{i \in L_a} d_{ai} x_{ai} \\
\text{where}\quad & \sum_{i \in L_a} x_{ai} = 1, && \forall a \in A \\
& x_{ai} \le s_i, && \forall a \in A,\ i \in L_a \\
& \sum_{i \in L} s_i \le p \\
& s_i \in \{0, 1\}, && \forall i \in L \\
& 0 \le x_{ai} \le 1, && \forall a \in A,\ i \in L_a
\end{aligned}
\tag{1}
$$

2.2 Modification on MIP model

We modify the original MIP model to meet the requirements of the additional sensor placement planning problem, which is defined as follows. We are given the current water distribution system $G' = (V', E')$, the original version of the system $G = (V, E)$ (assuming $V \subseteq V'$), and the current sensor placement vector $S$ on $G$. We want to place $u$ additional sensors on $G'$. $c_1$ and $c_2$ denote the cost of installing and uninstalling a sensor, respectively, and $C$ is the upper bound on the expected cost of altering the sensor placement. The optimization objective is identical to that of the original problem.

The vertex set $L_1 = \{i \in V \mid S_i = 1\}$ is the set of vertices in $V$ at which a sensor was placed previously. Our optimization problem is formalized as:

$$
\begin{aligned}
\text{minimize}\quad & \sum_{a \in A} \alpha_a \sum_{i \in L_a} d_{ai} x_{ai} \\
\text{where}\quad & \sum_{i \in L_a} x_{ai} = 1, && \forall a \in A \\
& x_{ai} \le s_i, && \forall a \in A,\ i \in L_a \\
& \sum_{i \in L} s_i \le \sum_{i \in L} S_i + u \\
& s_i \in \{0, 1\}, && \forall i \in L \\
& 0 \le x_{ai} \le 1, && \forall a \in A,\ i \in L_a \\
& (c_1 + c_2) \sum_{i \in L_1} (S_i - s_i) + u\, c_2 \le C
\end{aligned}
\tag{2}
$$

3 Incremental sensor placement optimization problem

The placement performance requirement discussed in the previous section can be defined for both the incremental and the non-incremental sensor placement problem.

Definition 1 (covered scenario). If contamination scenario $a$ is detected under sensor placement strategy $S$ and the total impact of scenario $a$, denoted $d_a$, is at most $M$, then we say that scenario $a$ is covered by sensor placement strategy $S$ with contaminative impact at most $M$.

Definition 2 (non-incremental sensor placement). Given the expected upper bound $M$ on the criterion considered, with all other parameters defined as in the MIP model, find a sensor placement strategy $S$ such that the number of scenarios covered by $S$ is maximized.

For the incremental sensor placement problem, the cost of installing and uninstalling sensors must be taken into account. Hence, in the following definition, we fix a constant upper bound on the number of original sensor placements that may be modified.

Definition 3 (incremental sensor placement). Given the expected upper bound $M$ on the criterion considered, the number $p$ of sensors to be placed incrementally, the upper bound $o$ on the number of modifiable original sensor placements, and the original sensor placement strategy $S_O$, with all other parameters defined as in the MIP model, find a sensor placement strategy $S$ such that the number of scenarios covered by $S$ is maximized.
3.1 Reduction to set cover problem

The associated scenario set of a vertex $v \in V$ contains all scenarios that can be detected at $v$ with contaminative impact at most the constant $M$. It is denoted $R_{v,M} = \{a \in A \mid d_{av} \le M\}$, where $d_{av}$ is the contaminative impact of scenario $a$ when detected at vertex $v$, as in the MIP model. To solve the sensor placement problem under our definitions, we reduce it to the Maximizing-aimed Set-Cover Problem.

Definition 4 (Maximizing-aimed Set-Cover Problem). Given a universe $E = \{e_1, \dots, e_n\}$ and a collection $C = \{c_1, \dots, c_m\}$ such that $c_i \subseteq E$ for each $c_i \in C$, find a subset $C' \subseteq C$ of cardinality $p$ such that the cardinality $\left| \bigcup_{c_i \in C'} c_i \right|$ is maximized.

We can then reduce the incremental sensor placement problem to this set-cover problem.

Definition 5 (Evaluation Function). For any sensor placement strategy $S \subseteq V$ and the expected upper bound $M$ on the criterion considered, the evaluation function $F(S)$ is defined as:

$$F(S) = \left| \bigcup_{v \in S} R_{v,M} \right|$$

Definition 6 (incremental sensor placement based on set cover). Let the scenario set $A = \{a_1, \dots, a_n\}$ be the universe, let $M$ be the expected upper bound on the criterion considered, and let $G(V, E)$ represent the water distribution network. Given the collection $C = \{R_{v,M} \mid v \in V\}$, the number $p$ of sensors to be placed incrementally, the upper bound $o$ on the number of modifiable original sensors, and the original sensor placement strategy $S_O$, denote a sensor placement strategy as a vertex set in which each vertex is chosen to hold a sensor. Our goal is to find a sensor placement strategy $C'$ with $|C'| = p + |S_O|$ and $|S_O \setminus C'| \le o$ such that the cardinality $\left| \bigcup_{v \in C'} R_{v,M} \right|$ is maximized.

3.2 Submodularity of evaluation function

The evaluation function has several intuitive properties. It is nonnegative and $F(\emptyset) = 0$, i.e., if we place no sensors, the evaluation function is 0. It is also nondecreasing: for placement strategies $A \subseteq B \subseteq V$, it holds that $F(A) \le F(B)$, so the evaluation function can only increase if we place more sensors. There is one more intuitive property: if we add a sensor to a large deployment, we expect a smaller gain in the objective function than if we add the same sensor to a small deployment.
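The evaluation function and its diminishing-returns behaviour can be illustrated on a toy instance. The sketch below is a minimal example, not the authors' implementation: the sets in `R` stand in for $R_{v,M}$ and are invented for illustration, and the helper names `evaluate` and `marginal_gain` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set

ScenarioSets = Dict[Hashable, Set[Hashable]]


def evaluate(placement: Iterable[Hashable], R: ScenarioSets) -> int:
    """F(S): number of scenarios covered by at least one chosen vertex."""
    covered: Set[Hashable] = set()
    for v in placement:
        covered |= R[v]
    return len(covered)


def marginal_gain(s: Hashable, placement: Set[Hashable], R: ScenarioSets) -> int:
    """F(S ∪ {s}) - F(S): extra scenarios covered by adding a sensor at s."""
    return evaluate(placement | {s}, R) - evaluate(placement, R)


# Toy instance: R[v] plays the role of R_{v,M} (values are made up).
R = {
    "v1": {"a1", "a2"},
    "v2": {"a2", "a3"},
    "v3": {"a3", "a4", "a5"},
}
small, large = {"v1"}, {"v1", "v2"}
gain_small = marginal_gain("v3", small, R)  # gain when added to the smaller set
gain_large = marginal_gain("v3", large, R)  # gain when added to the larger set
```

In this instance the gain of adding `v3` to the smaller deployment is at least as large as the gain of adding it to the larger one, which is the behaviour described above.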
The diminishing-returns property is formalized by the combinatorial concept of submodularity [15]: a set function $F$ is called submodular if for all subsets $A \subseteq B \subseteq V$ and all elements $s \in V \setminus B$, it holds that $F(A \cup \{s\}) - F(A) \ge F(B \cup \{s\}) - F(B)$. In other words, adding $s$ to the smaller set $A$ helps at least as much as adding it to the larger set $B$. In fact, we can prove that the evaluation function defined earlier is submodular.

Theorem 1. The evaluation function $F$ is submodular.

Proof. For any sensor placement strategy $A \subseteq V$ and any location $i \in V$, write $R_A = \bigcup_{v \in A} R_{v,M}$. Then

$$F(A \cup \{i\}) = \left| R_A \cup R_{i,M} \right|, \qquad F(A) = \left| R_A \right|,$$

so that

$$F(A \cup \{i\}) - F(A) = \left| R_{i,M} \setminus R_A \right| = \left| R_{i,M} \cap \overline{R_A} \right|.$$

The same identity holds for any $B \subseteq A$ (here $B$ plays the role of the smaller set). By elementary set operations, $B \subseteq A$ implies $R_B \subseteq R_A$, hence $\overline{R_A} \subseteq \overline{R_B}$ and

$$R_{i,M} \cap \overline{R_A} \subseteq R_{i,M} \cap \overline{R_B}.$$

Therefore

$$F(A \cup \{i\}) - F(A) \le F(B \cup \{i\}) - F(B).$$

4 Algorithms for Optimization

In the last section, we showed that the objective function of the sensor placement optimization is submodular. Let us first consider the non-incremental sensor placement problem, in which $p$ sensors are placed among $|V|$ places. There are $\binom{|V|}{p}$ possible deployment strategies, far too many to search exhaustively for the globally optimal deployment in a large water distribution network. For the incremental sensor placement problem, which places $p$ additional sensors and modifies the placement of at most $o$ of the original sensors, the search space has size $\binom{|S_O|}{o}\binom{|V| - o}{p}$, where $S_O$ is the original sensor placement strategy. This search space is even larger than that of the non-incremental problem. As a result, various heuristic approaches have been proposed that try to find high-quality solutions through a search procedure, e.g., genetic algorithms [9] and mixed integer programming (MIP) solutions [2]. However, many of these heuristic approaches provide no provable guarantee on the quality of the solution they return or on their running time.

The greedy heuristic is an exception: it has been proven to provide a specific bound on the quality of the solution it returns. Hence, in the following sections, we base our algorithm on the greedy heuristic because of its strong theoretical guarantees. For simplicity, we first discuss the basic greedy algorithm for the non-incremental sensor placement problem. We then discuss the modifications needed to make the greedy heuristic suitable for the incremental sensor placement problem.

4.1 Basic Greedy Heuristic Approach

The basic greedy heuristic algorithm starts with the empty placement $S = \emptyset$ and proceeds iteratively. In each iteration, the place $v \in V$ that increases the evaluation function the most is chosen as the next place to deploy a sensor. A fundamental result by [15] shows that this intuitive procedure is near optimal for the class of nondecreasing submodular functions: the greedy algorithm always returns a set $S_G$ such that $F(S_G) \ge (1 - 1/e) F(S^*)$, where $S^*$ denotes the globally optimal placement of $p$ sensors. Hence, the greedy solution achieves at least $1 - 1/e$ times the value of the globally optimal solution.

The $(1 - 1/e) \approx 63\%$ bound on the greedy algorithm is an offline bound. The submodularity of the evaluation function $F$ also guarantees a tighter, online bound on the solution the greedy algorithm finds. Let $S$ denote an arbitrary solution to the sensor placement problem, and for each place $a \notin S$, let $\delta_a = F(S \cup \{a\}) - F(S)$ be the improvement gained when a sensor is placed at $a$ (for $a \in S$ the gain is 0). Assume $p$ sensors are to be placed, let $S^*$ be the optimal solution for deploying $p$ sensors, and let $s_1, \dots, s_p$ be the $p$ places with the largest improvement gains. Then the following online bound holds for the arbitrary solution $S$: $F(S^*) \le F(S) + \sum_{j=1}^{p} \delta_{s_j}$.

Theorem 2. $F(S^*) \le F(S) + \sum_{j=1}^{p} \delta_{s_j}$.

Proof. Because $F$ is nondecreasing, we have

$$F(S^*) \le F(S^* \cup S).$$

Next, write $S^* = \{s^*_1, \dots, s^*_p\}$ and define $S^*_i = \bigcup_{t=1}^{i} \{s^*_t\}$ for $i = 1, \dots, p$, with $S^*_0 = \emptyset$. It holds that $S \cup S^*_i = (S \cup S^*_{i-1}) \cup \{s^*_i\}$ and $S \cup S^*_{i-1} \supseteq S$. Hence, by submodularity, for $i = 1, \dots, p$:

$$F(S \cup S^*_i) - F(S \cup S^*_{i-1}) \le F(S \cup \{s^*_i\}) - F(S).$$

Summing over $i$ gives

$$\sum_{i=1}^{p} \left[ F(S \cup S^*_i) - F(S \cup S^*_{i-1}) \right] \le \sum_{i=1}^{p} \left[ F(S \cup \{s^*_i\}) - F(S) \right].$$

Since the left-hand side telescopes, this is equivalent to:

$$F(S \cup S^*) - F(S) \le \sum_{i=1}^{p} \left[ F(S \cup \{s^*_i\}) - F(S) \right] = \sum_{j=1}^{p} \delta_{s^*_j}.$$

Because $s_1, \dots, s_p$ are by definition the $p$ places with the largest improvement gains, we have

$$\sum_{j=1}^{p} \delta_{s^*_j} \le \sum_{j=1}^{p} \delta_{s_j}.$$

Therefore,

$$F(S^*) - F(S) \le F(S \cup S^*) - F(S) \le \sum_{j=1}^{p} \delta_{s_j}.$$

Running time of the greedy algorithm. The running time of the algorithm depends on the number of locations $|V| = n$, the number $m$ of contamination scenarios considered, and the time taken to compute the evaluation function $F$, which in this case is a union operation of complexity $O(m)$. In each iteration, $O(n)$ places must be tested with the evaluation function $F$, so the total running time is $O(pnm)$.

Optimization on the basic greedy algorithm. Our implementation modifies the basic greedy algorithm to improve its efficiency (see Algorithm 1). The key idea is that the set union operation costs less as the cardinality of the sets decreases. Thus, instead of recalculating $\delta_v$ for every currently unselected place $v$ in every iteration, we update the set $R_{v,M}$ of each currently unselected place $v$ at the end of each iteration by eliminating the scenarios covered by the place chosen in that iteration. With this update, $\delta_v = |R_{v,M}|$ holds in every iteration, and as the algorithm proceeds, the cardinality of most $R_{v,M}$ decreases dramatically.

Algorithm 1 Optimized Greedy Heuristic
Input: p, M, R_{v,M} for v ∈ V
Output: S_G
    iter ← 1
    S_G ← ∅
    while iter ≤ p do
        v_c ← argmax_{v ∈ V \ S_G} |R_{v,M}|
        S_G ← S_G ∪ {v_c}
        for each v ∈ V \ S_G do
            R_{v,M} ← R_{v,M} \ R_{v_c,M}
        end for
        iter ← iter + 1
    end while
    return S_G
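Algorithm 1 and the online bound of Theorem 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: `R` maps each vertex to its set $R_{v,M}$, ties in the argmax are broken by dictionary order, and the function names `optimized_greedy` and `online_bound` as well as the toy data are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Set


def optimized_greedy(p: int, R: Dict[Hashable, Set[Hashable]]) -> List[Hashable]:
    """Pick up to p vertices, shrinking the remaining sets after each pick."""
    remaining = {v: set(scenarios) for v, scenarios in R.items()}  # work on a copy
    chosen: List[Hashable] = []
    for _ in range(min(p, len(remaining))):
        # After the updates below, len(remaining[v]) equals the gain delta_v.
        v_c = max(remaining, key=lambda v: len(remaining[v]))
        newly_covered = remaining.pop(v_c)
        chosen.append(v_c)
        for v in remaining:
            remaining[v] -= newly_covered  # drop scenarios that are now covered
    return chosen


def online_bound(placement: Sequence[Hashable], p: int,
                 R: Dict[Hashable, Set[Hashable]]) -> int:
    """F(S) plus the p largest marginal gains: an upper bound on F(S*)."""
    covered: Set[Hashable] = set().union(*(R[v] for v in placement))
    gains = sorted(
        (len(R[v] - covered) for v in R if v not in placement), reverse=True
    )
    return len(covered) + sum(gains[:p])


# Toy scenario sets, invented for illustration.
R = {"v1": {1, 2, 3}, "v2": {3, 4}, "v3": {4, 5, 6, 7}, "v4": {1}}
S_G = optimized_greedy(2, R)
bound = online_bound(S_G, 2, R)
```

When `online_bound` equals the coverage of the greedy solution, as it does for this toy instance, the greedy solution is provably optimal for that instance.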

4.2 Algorithm for Incremental Sensor Placement Problem

Above, we discussed the greedy heuristic algorithms for non-incremental sensor placement optimization. In this section, we apply the greedy heuristic to the incremental sensor placement problem. We use DecisionFunction to decide which part of the original sensor placement strategy is preserved; the different strategies adopted in DecisionFunction and their performance are discussed in detail later. According to our definition of the incremental sensor placement problem, at least $|S_O| - o$ sensors remain unchanged in the new sensor placement strategy. The algorithm can be divided into 3 parts. The first part is DecisionFunction, which chooses the $|S_O| - o$ sensors in $S_O$ to be preserved in the final placement strategy (denoted $S_r$ in Algorithm 2). The second part updates $R_{v,M}$ for every place $v$ at which no sensor is deployed, eliminating the scenarios already covered by the $|S_O| - o$ preserved sensors. The third part is identical to Algorithm 1 and places $p + o$ sensors by the greedy heuristic.

Strategies Applied in DecisionFunction. DecisionFunction is responsible for choosing the $|S_O| - o$ sensors to be preserved in the final sensor placement strategy. This decision has a significant impact on the quality of the final sensor placement strategy found by Algorithm 2. In this paper, we consider the following strategies.

1. Randomized Heuristic (RH). The randomized heuristic simply chooses $|S_O| - o$ sensors at random from the original sensor placement strategy $S_O$.

2. Greedy Heuristic (GH). The greedy heuristic strategy is identical to Algorithm 1: it starts with the empty placement $S = \emptyset$ and proceeds for $|S_O| - o$ iterations. In each iteration, the place $v \in S_O$ that increases the evaluation function the most is chosen as the next place to deploy a sensor.

3. Simulated Annealing Optimization of Randomized Heuristic (SA). First, we use the randomized heuristic to choose $|S_O| - o$ sensors, denoted $S_{RH}$. Then a local search heuristic, simulated annealing, is applied to this initial solution to reach a locally optimal solution. Let $S_{cur}$ denote the solution to be optimized in the current iteration. In each round, the simulated annealing process proposes an exchange of a selected location $s \in S_{cur}$ and an unselected location $s' \in S_O \setminus S_{cur}$, and computes the gain in the evaluation function due to the exchange, $\delta = F(S_{cur} \cup \{s'\} \setminus \{s\}) - F(S_{cur})$. If $\delta$ is positive (i.e., the exchange improves the previous solution), the proposal is accepted. Otherwise, the proposal is accepted with probability $\exp(\delta / \vartheta_t)$, where $\vartheta_t$ is the annealing temperature at round $t$. We use an exponential decay schedule, $\vartheta_t = C q^t$, for some large constant $C$ and a constant $q$ with $0 < q < 1$.
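The SA strategy can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the default values of `C`, `q`, and `rounds`, the seeded random generator, and the choice to return the best subset seen (instead of the last accepted one) are assumptions made for the example, and the names `coverage` and `anneal_preserved` are hypothetical.

```python
import math
import random
from typing import Dict, Hashable, Iterable, List, Set


def coverage(placement: Iterable[Hashable], R: Dict[Hashable, Set[Hashable]]) -> int:
    """F(S): number of scenarios covered by the placement."""
    return len(set().union(*(R[v] for v in placement)))


def anneal_preserved(
    S_O: List[Hashable],
    keep: int,
    R: Dict[Hashable, Set[Hashable]],
    rounds: int = 1000,
    C: float = 100.0,
    q: float = 0.99,
    seed: int = 0,
) -> Set[Hashable]:
    """Choose `keep` sensors of S_O to preserve, starting from a random subset."""
    rng = random.Random(seed)
    current = set(rng.sample(S_O, keep))          # Randomized Heuristic start (S_RH)
    best = set(current)
    for t in range(1, rounds + 1):
        outside = [v for v in S_O if v not in current]
        if not current or not outside:
            break                                 # no exchange is possible
        s = rng.choice(sorted(current, key=str))  # selected location to drop
        s_new = rng.choice(outside)               # unselected location to add
        proposal = (current - {s}) | {s_new}
        delta = coverage(proposal, R) - coverage(current, R)
        temperature = C * q ** t                  # exponential decay: C * q^t
        accept_worse = temperature > 0 and rng.random() < math.exp(delta / temperature)
        if delta > 0 or accept_worse:
            current = proposal
            if coverage(current, R) > coverage(best, R):
                best = set(current)               # assumption: remember the best seen
    return best


# Toy instance, invented for illustration.
R = {"v1": {1, 2}, "v2": {2, 3}, "v3": {4, 5, 6}, "v4": {1}}
S_O = ["v1", "v2", "v3", "v4"]
preserved = anneal_preserved(S_O, keep=2, R=R)
```

Because only exchanges within $S_O$ are proposed, the returned set always contains exactly `keep` of the original sensors.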

Algorithm 2 Optimized Greedy Heuristic for Incremental Sensor Placement
Input: p, M, R_{v,M} for v ∈ V, S_O, o
Output: S_G
    S_r ← DecisionFunction(S_O, |S_O| − o)
    for each v' ∈ S_r do
        for each v ∈ V \ S_r do
            R_{v,M} ← R_{v,M} \ R_{v',M}
        end for
    end for
    S_G ← OptimizedGreedyHeuristic(p + o, M, V \ S_r)
    return S_G ∪ S_r

[Figure 1: two panels plotting DetectRatio against the number of sensors placed incrementally, for three strategies: no modification, all modifiable, and limited modification. Panel (a): BWSN1. Panel (b): BWSN2.]

Fig. 1. (a) Detect ratio for BWSN1, assuming 5 sensors have been deployed on the network originally, o = 2. (b) Detect ratio for BWSN2, assuming 10 sensors have been deployed on the network originally, o = 4.
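The three parts of Algorithm 2 can be sketched in Python as below. This is an illustrative sketch under assumptions: the decision function is passed in as a callable that also receives the scenario sets `R` (the paper's DecisionFunction takes only $S_O$ and $|S_O| - o$), only the GH strategy is shown, and the names `greedy`, `incremental_greedy`, and `greedy_decision` and the toy data are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Set

Sets = Dict[Hashable, Set[Hashable]]


def greedy(p: int, R: Sets) -> List[Hashable]:
    """Algorithm 1 restricted to the candidate sets in R."""
    remaining = {v: set(r) for v, r in R.items()}
    chosen: List[Hashable] = []
    for _ in range(min(p, len(remaining))):
        v_c = max(remaining, key=lambda v: len(remaining[v]))
        gained = remaining.pop(v_c)
        chosen.append(v_c)
        for v in remaining:
            remaining[v] -= gained
    return chosen


def incremental_greedy(
    p: int,
    o: int,
    S_O: List[Hashable],
    R: Sets,
    decision_function: Callable[[List[Hashable], int, Sets], List[Hashable]],
) -> Set[Hashable]:
    """Preserve |S_O| - o original sensors, then place p + o more greedily."""
    S_r = decision_function(S_O, len(S_O) - o, R)               # part 1
    covered = set().union(*(R[v] for v in S_r))
    residual = {v: R[v] - covered for v in R if v not in S_r}   # part 2
    return set(S_r) | set(greedy(p + o, residual))              # part 3


def greedy_decision(S_O: List[Hashable], keep: int, R: Sets) -> List[Hashable]:
    """GH strategy: run the greedy heuristic inside S_O only."""
    return greedy(keep, {v: R[v] for v in S_O})


# Toy instance, invented for illustration.
R = {"v1": {1, 2}, "v2": {2}, "v3": {3, 4, 5}, "v4": {6}, "v5": {1, 6}}
S_O = ["v1", "v2"]
placement = incremental_greedy(p=1, o=1, S_O=S_O, R=R,
                               decision_function=greedy_decision)
```

The result has $p + |S_O|$ sensors and differs from $S_O$ in at most $o$ of the original locations, matching the constraints of Definition 6.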

5 Experimental Study

5.1 Network Analyzed and Simulation Setting

We considered both a small network of 129 nodes (BWSN1) and a large, realistic distribution network of 12,527 nodes (BWSN2), both provided as part of the BWSN challenge [17]. We ran the EPANET hydraulic simulation and water quality simulation for the two water distribution networks.

As criteria for evaluating contamination impact, the Battle of the Water Sensor Networks (BWSN) challenge [17], held in August 2006, proposed four realistic objective functions: the time until an intrusion is detected ($Z_1$), the expected population affected by an intrusion ($Z_2$), the expected amount of contaminated water consumed ($Z_3$), and the likelihood of detection ($Z_4$). The criteria $Z_2$, $Z_3$, and $Z_4$ can be considered functions of elapsed time, so we adopt $Z_1$ as the evaluation criterion in our experiments.

For network BWSN1, 516 contamination scenarios were generated, one for each node in the water distribution network at each of 4 different attack times: 6 A.M., 12 A.M., 6 P.M., and 12 P.M. Each contamination scenario features a 96-hour injection of a fictional contaminant at a strength of 1000 mg/min (using EPANET's MASS injection type) and has a total duration of 96 hours. For network BWSN2, 4000 contamination scenarios were generated for 1000 randomly selected nodes in the water distribution network; the other settings are the same as for BWSN1. In the following experiments, we fix $M = 120$ min for BWSN1 and $M = 150$ min for BWSN2. For the water quality simulation of both networks, we assume that a sensor raises an alarm when the concentration of contaminant at the place where it is deployed exceeds 10 mg/L. The quality of a sensor placement strategy $S$ is measured by the detect ratio, defined as $F(S) / |A|$, where $|A|$ denotes the total number of contamination scenarios considered.
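To make the detect ratio concrete, the sketch below builds the sets $R_{v,M}$ from a table of per-scenario detection times and computes $F(S) / |A|$. It is an illustration only: the detection times are hypothetical values, not EPANET output, and the names `scenario_sets` and `detect_ratio` are assumptions.

```python
from typing import Dict, Hashable, Iterable, Set


def scenario_sets(
    d: Dict[Hashable, Dict[Hashable, float]], M: float
) -> Dict[Hashable, Set[Hashable]]:
    """R_{v,M} = {a : d_av <= M}, built from d[a][v] for the pairs with a detection."""
    R: Dict[Hashable, Set[Hashable]] = {}
    for a, per_vertex in d.items():
        for v, value in per_vertex.items():
            R.setdefault(v, set())
            if value <= M:
                R[v].add(a)
    return R


def detect_ratio(
    placement: Iterable[Hashable],
    R: Dict[Hashable, Set[Hashable]],
    n_scenarios: int,
) -> float:
    """F(S) / |A|: fraction of scenarios covered within the bound M."""
    covered = set().union(*(R.get(v, set()) for v in placement))
    return len(covered) / n_scenarios


# Hypothetical detection times in minutes (criterion Z1), not simulation output.
d = {
    "a1": {"v1": 30.0, "v2": 200.0},
    "a2": {"v2": 90.0},
    "a3": {"v1": 130.0, "v3": 110.0},
    "a4": {"v3": 500.0},
}
R = scenario_sets(d, M=120.0)
ratio = detect_ratio({"v1", "v3"}, R, n_scenarios=len(d))
```

Scenario `a4` is never detected within the bound in this toy table, so no placement can reach a detect ratio of 1 here.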
5.2 Incremental vs non-incremental Sensor Placement

In this section, we show why the modification cost on the original sensor deployment strategy must be taken into consideration when optimizing the performance of sensor placement in the incremental case. Three modification strategies are compared: no modification, limited modification, and complete modification of the original placement. They are compared on both performance quality and the modification cost spent on the original sensor placement. For each network, we generate an original sensor placement at random, with $|S_O| = 5$ for network BWSN1 and $|S_O| = 10$ for network BWSN2. In the incremental case study, $o$ is set to 2 for BWSN1 and 5 for BWSN2.

Limited modification clearly serves as a trade-off between the other two strategies. In Figure 1, its detect ratio is only slightly lower than that of the complete modification strategy and higher than that of the no modification strategy. At the same time, its modification cost is bounded above by $o = 2, 5$, whereas that of the complete modification strategy is much higher on average (the number of sensors to be replaced is 5 and 10 in BWSN1 and BWSN2, respectively).

5.3 Analysis of Different Strategies Applied in DecisionFunction

In Section 4.2, we proposed three different strategies for DecisionFunction to choose the sensors of the original placement that are preserved in the final sensor placement. In this section, we compare the three strategies on the quality of the final sensor placement. In Figure 2, we set |S_O| = 5 for BWSN1 and |S_O| = 10 for BWSN2, and o to 2 and 5 respectively for the incremental case study. We also set the number of iterations to 1000 for the simulated annealing strategy. According to Figure 2(a), the simulated annealing (SA) and random heuristic (RH) strategies achieve a higher detect ratio than the greedy heuristic (GH). In Figure 2(b), however, the three strategies perform relatively close to each other, because BWSN2 is much larger, which makes it harder for a local search strategy to escape from a locally optimal solution.

[Figure 2: detect ratio versus number of sensors placed incrementally for the RH, SA and GH strategies. (a) DecisionFunction for BWSN1; (b) DecisionFunction for BWSN2.]

Fig. 2. (a) DecisionFunction for BWSN1 with 3 different strategies to preserve a given number of sensors (o = 2) in the original sensor placement (5 sensors). (b) DecisionFunction for BWSN2 with 3 different strategies to preserve a given number of sensors (o = 5) in the original sensor placement (10 sensors).

5.4 Robustness against Contamination Scenarios

The possible contamination scenarios cannot be enumerated, even in theory. Can a sensor placement strategy with a high detect ratio on the scenario set considered therefore remain reliable when unknown contamination scenarios occur?
There is no guarantee that a placement strategy with a high detect ratio on the training contamination scenario set will also perform well on other, unknown contamination scenarios. In this section, we therefore illustrate that the sensor placements found by our algorithms are robust against unknown scenarios. In our experiments, we set |S_O| = 5, 10 and o = 2, 5 for BWSN1 and BWSN2 respectively, and use the simulated annealing strategy with 1000 iterations in DecisionFunction. In Figure 3, we vary the number of sensors placed incrementally and compute the detect ratio on each scenario set considered. We randomly generate 4 scenario sets as test cases for each of BWSN1 and BWSN2, each containing 100 scenarios, and compute the detect ratio for each test set. The figure compares the detect ratio on the training set with the average detect ratio over the test sets. As the number of sensors placed varies, the average detect ratio on the test sets is almost the same as that on the training set, which shows that the sensor placement strategy is robust against unknown contamination scenarios.

[Figure 3: detect ratio versus number of sensors placed incrementally, with series "Actual" and "Expected". (a) Robustness on BWSN1; (b) Robustness on BWSN2.]

Fig. 3. (a) Robustness on BWSN1, averaged over 4 test case sets. (b) Robustness on BWSN2, averaged over 4 test case sets.

5.5 Running Time Analysis

In this section, we compare the running time of exhaustive search, the greedy algorithm and the fast greedy algorithm on BWSN1 and BWSN2. We fix |S_O| = 5 for BWSN1 and |S_O| = 10 for BWSN2. Figure 4 shows that the greedy algorithms are much faster than exhaustive search. Moreover, the running time of the greedy algorithms is linear in the number of sensors to be placed when the training contamination scenario set is fixed, as proved previously.
In Figure 4(b), the optimized greedy algorithm is more efficient than the original version on the large network BWSN2. A fraction of the scenarios is associated with many vertices in the network; once these scenarios are detected and removed, the cardinality of R_{v,M} decreases for many vertices v, so the set operations become faster.
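The effect described above can be sketched as follows. This is one plausible reading of the optimization rather than the paper's actual implementation: the baseline recomputes each marginal gain against the full sets, while the shrinking variant deletes newly detected scenarios from every remaining set, so later iterations operate on smaller sets. The dictionary `R` stands in for the sets R_{v,M}, and all names and data are hypothetical.

```python
from typing import Dict, List, Set

Coverage = Dict[str, Set[int]]  # vertex v -> scenarios it detects within M (assumed)


def greedy_original(R: Coverage, k: int) -> List[str]:
    """Baseline greedy: every step recomputes R[v] - covered on the full sets."""
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(k):
        candidates = [v for v in R if v not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda v: len(R[v] - covered))
        if not R[best] - covered:
            break  # no remaining vertex detects a new scenario
        chosen.append(best)
        covered |= R[best]
    return chosen


def greedy_shrinking(R: Coverage, k: int) -> List[str]:
    """Greedy that removes detected scenarios from all remaining sets."""
    remaining = {v: set(s) for v, s in R.items()}  # work on copies
    chosen: List[str] = []
    for _ in range(k):
        if not remaining:
            break
        best = max(remaining, key=lambda v: len(remaining[v]))
        if not remaining[best]:
            break  # every remaining set is empty
        newly_detected = remaining.pop(best)
        chosen.append(best)
        # Shrink the other sets; later gains are just their sizes.
        for scenarios in remaining.values():
            scenarios.difference_update(newly_detected)
    return chosen


toy_R: Coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
print(greedy_original(toy_R, 2))   # ['c', 'a']
print(greedy_shrinking(toy_R, 2))  # ['c', 'a']
```

Both variants select the same vertices here; the difference lies only in how much set work each iteration performs once widely shared scenarios have been removed.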

[Figure 4: running time (sec, log scale) versus number of sensors placed incrementally, with series "Optimized Greedy", "Original Greedy" and "Exhaustive Search (expected)". (a) Running time on BWSN1; (b) Running time on BWSN2.]

Fig. 4. (a) Running time on BWSN1 in log scale. (b) Running time on BWSN2 in log scale.

6 Conclusion

In this paper, we consider the sensor placement optimization problem under incremental placement and a limit on the cost of modifying the existing placement, a setting that is realistic in practice. We also propose a reduction of the problem, based on our definition, to the set-cover problem. In this way, a universal bound on the contamination detection time can be guaranteed. We propose an optimization algorithm based on a greedy heuristic, whose solution quality is guaranteed by the submodularity of the objective function defined. Optimizations of the proposed algorithm are also discussed briefly, both theoretically and experimentally.

References

1. Berry, J., Hart, W. E., Phillips, C. A., Uber, J. G., and Watson, J.-P. Sensor placement in municipal water networks with temporal integer programming models. In J. Water Resour. Plann. Manage., 132(4), 218-224. 2006.
2. Berry, J. W., Fleischer, L., Hart, W. E., Phillips, C. A., and Watson, J.-P. Sensor placement in municipal water networks. In J. Water Resour. Plann. Manage., 131(3), 237-243. 2005.
3. Kessler, A., Ostfeld, A., and Sinai, G. Detecting accidental contaminations in municipal water networks. In J. Water Resour. Plann. Manage., 124(4), 192-198. 1998.
4. Kumar, A., Kansal, M. L., and Arora, G. Identification of monitoring stations in water distribution system. In J. Environ. Eng., 123(8), 746-752. 1997.
5. Watson, J.-P., Greenberg, H. J., and Hart, W. E. A multiple-objective analysis of sensor placement optimization in water networks. In Proc., World Water and Environmental Resources Conf. 2004.
6. Ostfeld, A., and Salomons, E. Sensor network design proposal for the battle of the water sensor networks (BWSN). In 8th Annual Symp. on Water Distribution Systems Analysis. 2006.

7. Wu, Z. Y., and Walski, T. Multi objective optimization of sensor placement in water distribution systems. In 8th Annual Symp. on Water Distribution Systems Analysis. 2006.
8. Dorini, G., et al. An efficient algorithm for sensor placement in water distribution systems. In 8th Annual Symp. on Water Distribution Systems Analysis. 2006.
9. Guan, J., Aral, M. M., Maslia, M. L., and Grayman, W. M. Optimization model and algorithms for design of water sensor placement in water distribution systems. In 8th Annual Symp. on Water Distribution Systems Analysis. 2006.
10. Berry, J., Hart, W., Phillips, C. A., and Watson, J.-P. A facility location approach to sensor placement optimization. In 8th Annual Symp. on Water Distribution Systems Analysis. 2006.
11. Huang, J. J., McBean, E. A., and James, W. Multiobjective optimization for monitoring sensor placement in water distribution systems. In 8th Annual Symp. on Water Distribution Systems Analysis. 2006.
12. Preis, A., and Ostfeld, A. Multiobjective sensor design for water distribution systems security. In 8th Annual Symp. on Water Distribution Systems Analysis. 2006.
13. Ostfeld, A., and Salomons, E. Optimal layout of early warning detection stations for water distribution systems security. In J. Water Resour. Plann. Manage., 130(5), 377-385. 2004.
14. Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., and Glance, N. Cost-effective outbreak detection in networks. In KDD. 2007.
15. Nemhauser, G., Wolsey, L., and Fisher, M. An analysis of the approximations for maximizing submodular set functions. In Mathematical Programming. 1978.
16. Sviridenko, M. A note on maximizing a submodular set function subject to knapsack constraint. In Operations Research Letters. 2004.
17. Ostfeld, A., Uber, J. G., and Salomons, E. Battle of water sensor networks: A design challenge for engineers and algorithms. In WDSA. 2006.
18. Technical Report of Incremental Sensor Placement Optimization. http://gdm.fudan.edu.cn/attach/wsnd/report.pdf