Point-Based Value Iteration for Partially-Observed Boolean Dynamical Systems with Finite Observation Space


Mahdi Imani and Ulisses Braga-Neto*

Abstract— This paper is concerned with obtaining the infinite-horizon control policy for partially-observed Boolean dynamical systems (POBDS) whose measurements take values in a finite observation space, with application to Boolean gene regulatory networks. The goal of control is to reduce the steady-state mass of undesirable states, which may be associated with disease. The idea behind the proposed method is to map the partially-observed Boolean states into a continuous, fully-observed state space known as the belief space, and then to employ the well-known value iteration method in the form of Point-Based Value Iteration (PBVI). The performance of the method is investigated using a Boolean network model constructed from melanoma gene-expression data observed through Bernoulli noise.

*The authors acknowledge the support of the National Science Foundation through NSF award CCF. M. Imani and U. M. Braga-Neto are with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA (m.imani88@tamu.edu, ulisses@ece.tamu.edu).

I. INTRODUCTION

A fundamental problem in genomic signal processing is to obtain appropriate intervention strategies in gene regulatory networks that beneficially alter network dynamics. The goal of control is to shift steady-state mass away from undesirable states, such as cell-proliferation states, which may be associated with cancer [1]. Boolean networks [2] provide an effective model of the dynamical behavior of gene regulatory networks consisting of genes in activated/inactivated states, the relationships among which are governed by logical rules updated at discrete time intervals [3], [4]. To date, modeling approaches such as Probabilistic Boolean Networks (PBNs) [3], S-systems [5], and Bayesian networks [6] have been proposed in the literature to mathematically capture the behavior of genetic regulatory networks, and various intervention approaches [1], [7] have been developed. These all assume that the system states are directly measurable, and are mostly set in the framework of PBNs. In contrast, this paper employs the partially-observed Boolean dynamical system (POBDS) model, first introduced in [4]. Several tools have been designed for this signal model in recent years, such as the optimal filtering and smoothing methods based on the MMSE criterion, called the Boolean Kalman Filter (BKF) [4] and the Boolean Kalman Smoother (BKS) [8], respectively. In addition, schemes for simultaneous state and parameter estimation, network inference, and fault detection for POBDS were introduced in [9]–[11]. In [12], a state-feedback controller for POBDS was designed based on optimal infinite-horizon control of the underlying Boolean dynamical system, using the BKF as the state observer.

The focus of this paper is on output-feedback control of POBDS models with a finite observation space. For the special case of a gene regulatory network observed through finite noisy measurements, a method for partitioning the observation space into a finite set is proposed. Basically, the unobserved Boolean states of the system are transformed into an observed continuous state space known as the belief space. The belief space is a high-dimensional simplex, which represents the probability distribution of the state given the history of observations and control inputs.
Considering a system with $d$ Boolean variables, the size of the Boolean state space is $2^d$, and the belief space is a subset of $\mathbb{R}^{2^d-1}$. We obtain the infinite-horizon control policy for POBDS in the belief space by employing a point-based value iteration method [13]. The performance of the method is investigated using a Boolean network model constructed from melanoma gene-expression data observed through Bernoulli noise.

II. PARTIALLY-OBSERVED BOOLEAN DYNAMICAL SYSTEMS

Deterministic Boolean dynamical system models are unable to cope with (1) uncertainty in the state transitions due to system noise and the effect of unmodeled variables, and (2) the fact that the Boolean states of a system are never observed directly. This motivates the partially-observed Boolean dynamical system (POBDS) model, first proposed in [4], which is briefly described next.

We assume that the system is described by a state process $\{X_k;\ k = 0, 1, \ldots\}$, where $X_k \in \{0,1\}^d$ is a Boolean vector of size $d$. The state is affected by a deterministic sequence of control inputs $\{u_k;\ k = 0, 1, \ldots\}$, where $u_k \in U$ represents a purposeful control input into the system state. The sequence of states is observed indirectly through the observation process $\{Y_k;\ k = 1, 2, \ldots\}$, where $Y_k$ is a vector of measurements. The states are assumed to be updated and observed at each discrete time through the following nonlinear signal model:

$$X_k = f(X_{k-1}, u_{k-1}) \oplus n_k \quad \text{(state model)}$$
$$Y_k = h(X_k, v_k) \quad \text{(observation model)}$$

for $k = 1, 2, \ldots$, where $f: \{0,1\}^d \times U \rightarrow \{0,1\}^d$ is a controlled network function, $\{n_k;\ k = 1, 2, \ldots\}$ is Boolean transition noise, $\oplus$ indicates component-wise modulo-2 addition, and $h$ is a general function mapping the current state and observation noise $v_k$ into the finite measurement space. The noise process $\{n_k;\ k = 1, 2, \ldots\}$ is assumed to be white, in the sense that $n_k$ and $n_l$ are independent for $k \neq l$. In addition, the noise process is assumed to be uncorrelated with the state process and control input. The previous assumptions imply that $\{X_k;\ k = 0, 1, \ldots\}$ is a controlled Markov chain. If $(\mathbf{x}^1, \ldots, \mathbf{x}^{2^d})$ is a list of all Boolean states of the system, then the transition matrix $M_k(u)$ of this Markov chain, of size $2^d \times 2^d$ and known as the prediction matrix, is given by:

$$(M_k(u))_{ij} = P(X_k = \mathbf{x}^i \mid X_{k-1} = \mathbf{x}^j,\ u_{k-1} = u) \quad (1)$$
$$= P(n_k = \mathbf{x}^i \oplus f(\mathbf{x}^j, u)), \quad (2)$$

for $i, j = 1, \ldots, 2^d$. On the other hand, given a measurement $\mathbf{y}$ at time $k$, the update matrix $T_k(\mathbf{y})$, also of size $2^d \times 2^d$, is a diagonal matrix defined by:

$$(T_k(\mathbf{y}))_{ii} = P(Y_k = \mathbf{y} \mid X_k = \mathbf{x}^i), \quad (3)$$

for $i = 1, \ldots, 2^d$. These definitions are used in the following sections to obtain the control policy for a POBDS.

III. BELIEF STATES AND INFINITE-HORIZON CONTROL

In this section, the infinite-horizon control problem for partially-observed Boolean dynamical systems is formulated. The goal of control in this paper is to select the appropriate external input $u_k \in U$ at each time $k$ so that the network spends the least amount of time, on average, in undesirable states; e.g., states associated with cell proliferation, which may be associated with cancer [1]. We assume here that the measurements and control inputs at each time point belong to finite sets $Y$ and $U$, respectively. Furthermore, we assume that the prediction matrix $M_k(u)$ and update matrix $T_k(\mathbf{y})$ depend on time only through the control input $u$ or measurement $\mathbf{y}$, respectively. We will therefore drop the index $k$ and write simply $M(u)$ and $T(\mathbf{y})$. Since the state of the system is not observed directly, all that is available for decision making at each time step is the sequence of observations up to the current time $(\mathbf{y}_1, \ldots, \mathbf{y}_k)$ and the control inputs applied up to the previous time step $(u_1, \ldots, u_{k-1})$. Instead of storing the history of observations and control inputs, one can use the probability distribution of the states given that information at each time step. This distribution is known as the belief state. The belief state is denoted by a vector $\mathbf{b}$, where $0 \le b(i) \le 1$ for $i = 1, \ldots, 2^d$ and $\sum_{i=1}^{2^d} b(i) = 1$, such that

$$\mathbf{b}_k = P(X_k \mid u_0, \mathbf{y}_1, \ldots, u_{k-1}, \mathbf{y}_k). \quad (4)$$

Equation (4) implies that the initial belief state is the initial state distribution, $b_0(i) = P(X_0 = \mathbf{x}^i)$, for $i = 1, \ldots, 2^d$. It can be shown that the belief state is a sufficient statistic for the history [14].
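For concreteness, the two matrices can be assembled directly from these definitions. The following is a minimal NumPy sketch, assuming Bernoulli transition noise with intensity $p$ (independent across genes) and the Bernoulli observation model used in Section V, in which each gene's value is flipped independently with probability $q$; the function names and the signature of the network function `f` are illustrative, not from the paper.

```python
import itertools
import numpy as np

def prediction_matrix(f, u, d, p):
    """Prediction matrix M(u) of size 2^d x 2^d, following Eqs. (1)-(2):
    (M(u))_ij = P(n_k = x^i XOR f(x^j, u)) for Bernoulli(p) transition noise.
    `f(x, u)` is the controlled network function, returning a 0/1 array."""
    states = [np.array(s) for s in itertools.product([0, 1], repeat=d)]
    M = np.zeros((2**d, 2**d))
    for j, xj in enumerate(states):
        fx = f(xj, u)
        for i, xi in enumerate(states):
            flips = int(np.sum(xi ^ fx))          # components perturbed by noise
            M[i, j] = p**flips * (1 - p)**(d - flips)
    return M

def update_matrix(y, d, q):
    """Diagonal update matrix T(y) of Eq. (3), assuming each gene is observed
    through independent Bernoulli(q) flipping noise, so Y also lies in {0,1}^d."""
    states = [np.array(s) for s in itertools.product([0, 1], repeat=d)]
    y = np.asarray(y)
    diag = []
    for x in states:
        flips = int(np.sum(y ^ x))                # observed components that disagree
        diag.append(q**flips * (1 - q)**(d - flips))
    return np.diag(diag)
```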
Assuming $\mathbf{b}$ is the current belief state of the system, if the control input $u$ is applied and the observation $\mathbf{y}$ is made, the new belief state can be obtained by Bayes' rule as:

$$\mathbf{b}^{\mathbf{y},u} = \frac{T(\mathbf{y})\, M(u)\, \mathbf{b}}{\|T(\mathbf{y})\, M(u)\, \mathbf{b}\|_1}, \quad (5)$$

where $\|\cdot\|_1$ denotes the L1-norm of a vector. Thus, by using the concept of belief state, a POBDS with state transition matrix $M$ can be transformed into a Markov decision process (MDP) with a new state transition probability in the belief space, which is a simplex in $\mathbb{R}^{2^d-1}$ and is directly observable. Therefore, one can use strategies developed for systems with directly observed states (e.g., the value iteration method introduced in [15]) to obtain the optimal control policy in the belief space. To do so, the transition probability in the belief space must be defined. The probability of starting at belief $\mathbf{b}$, applying control input $u$, and reaching the new belief $\mathbf{b}'$ is given by:

$$P(\mathbf{b}' \mid \mathbf{b}, u) = \begin{cases} \|T(\mathbf{y})\, M(u)\, \mathbf{b}\|_1, & \text{if } \mathbf{b}' = \mathbf{b}^{\mathbf{y},u}, \\ 0, & \text{if } \mathbf{b}' \neq \mathbf{b}^{\mathbf{y},u}, \end{cases} \quad (6)$$

for $\mathbf{y} \in Y$. It should be noted that although $\mathbf{b}'$ belongs to a continuous space, each belief point leads to at most $|Y|\,|U|$ new belief points. One can define the space of currently existing beliefs as $B$, where all $\mathbf{b} \in B$.
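Under the same assumptions, the belief update of equation (5) is a few lines of code; returning the normalizing constant is convenient because, by equation (6), it equals the probability of the observed transition.

```python
def belief_update(b, u, y, f, d, p, q):
    """Bayes-rule belief update of Eq. (5): b' = T(y) M(u) b / ||T(y) M(u) b||_1.
    Returns the new belief and the normalizer, which by Eq. (6) is P(b' | b, u)."""
    v = update_matrix(y, d, q) @ prediction_matrix(f, u, d, p) @ b
    norm = v.sum()          # L1 norm, since all entries are nonnegative
    return v / norm, norm
```

Recomputing $M(u)$ and $T(\mathbf{y})$ on every call, as done here for brevity, would be wasteful in practice; since $U$ and $Y$ are finite, these matrices would normally be precomputed once.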

Let $g(\mathbf{x}^i, u)$ be a bounded cost of control for state $\mathbf{x}^i$ and control input $u$, for $i = 1, \ldots, 2^d$. Collect all these costs in a vector $\mathbf{g}(u)$ of size $2^d$. The costs can be transformed to the belief space as follows:

$$g(\mathbf{b}, u) = \sum_{i=1}^{2^d} g(\mathbf{x}^i, u)\, b(i) = \mathbf{g}(u)^T \mathbf{b}. \quad (7)$$

The classical results proved in [15] for MDPs can be used here in the belief space. The Bellman operator for the belief space is:

$$T[J](\mathbf{b}) = \min_{u \in U} \Big[\, g(\mathbf{b}, u) + \gamma \sum_{\mathbf{b}' \in B} P(\mathbf{b}' \mid \mathbf{b}, u)\, J(\mathbf{b}') \,\Big] = \min_{u \in U} \Big[\, \mathbf{g}(u)^T \mathbf{b} + \gamma \sum_{\mathbf{y} \in Y} \|T(\mathbf{y})\, M(u)\, \mathbf{b}\|_1\, J(\mathbf{b}^{\mathbf{y},u}) \,\Big]. \quad (8)$$

As stated before, the belief space is a simplex in $\mathbb{R}^{2^d-1}$, but it has been shown in [16] that, in both finite- and infinite-horizon control, the cost function can be modeled by the lower envelope of a finite set of linear functions. These linear functions are described by so-called $\alpha$-vectors (row vectors); let $\Lambda$ contain all possible $\alpha$-vectors. Thus, given the set $\Lambda$, the cost function at a given belief point $\mathbf{b}$ is:

$$J(\mathbf{b}) = \min_{\alpha \in \Lambda} \alpha\, \mathbf{b}, \quad (9)$$

where the multiplication of each $\alpha$-vector (a row vector) by $\mathbf{b}$ (a column vector) yields a scalar, the expected cost at that belief point. Therefore, value iteration can be performed by computing a new set of $\alpha$-vectors instead of working with belief states in a continuous space. However, the exponential growth of the number of $\alpha$-vectors prevents the application of exact value iteration in most real problems. Several studies have focused on shrinking $\Lambda$ to the minimal set able to represent the cost function, but they have not succeeded in handling problems with a large number of states. (For more information about these methods, the reader is referred to [17].)

IV. POINT-BASED VALUE ITERATION FOR POBDS

The point-based value iteration algorithm was developed to find approximately optimal control policies in large systems. For POBDS, one can rewrite equation (8) in terms of $\alpha$-vectors as follows:

$$T[J](\mathbf{b}) = \min_{u \in U} \Big[\, \mathbf{g}(u)^T \mathbf{b} + \gamma \sum_{\mathbf{y} \in Y} \|T(\mathbf{y}) M(u) \mathbf{b}\|_1 \Big( \min_{\alpha \in \Lambda} \alpha\, \mathbf{b}^{\mathbf{y},u} \Big) \Big]$$
$$= \min_{u \in U} \Big[\, \mathbf{g}(u)^T \mathbf{b} + \gamma \sum_{\mathbf{y} \in Y} \|T(\mathbf{y}) M(u) \mathbf{b}\|_1 \Big( \min_{\alpha \in \Lambda} \frac{\alpha\, T(\mathbf{y})\, M(u)\, \mathbf{b}}{\|T(\mathbf{y})\, M(u)\, \mathbf{b}\|_1} \Big) \Big]$$
$$= \min_{u \in U} \Big[\, \mathbf{g}(u)^T \mathbf{b} + \gamma \sum_{\mathbf{y} \in Y} \min_{\alpha \in \Lambda} \alpha\, T(\mathbf{y})\, M(u)\, \mathbf{b} \,\Big]$$
$$= \min_{u \in U} \Big[\, \mathbf{g}(u)^T \mathbf{b} + \gamma \sum_{\mathbf{y} \in Y} \min_{\alpha^{\mathbf{y},u}} \alpha^{\mathbf{y},u}\, \mathbf{b} \,\Big], \quad (10)$$

where

$$\alpha^{\mathbf{y},u} = \alpha\, T(\mathbf{y})\, M(u). \quad (11)$$

Now, a compact backup operation that generates a new $\alpha$-vector for a specific belief $\mathbf{b}$ can be written as:

$$\text{backup}(\Lambda, \mathbf{b}) = \operatorname*{argmin}_{\alpha_u^{\mathbf{b}}:\ u \in U} \alpha_u^{\mathbf{b}}\, \mathbf{b}, \qquad \alpha_u^{\mathbf{b}} = \mathbf{g}(u)^T + \gamma \sum_{\mathbf{y} \in Y} \operatorname*{argmin}_{\alpha^{\mathbf{y},u}:\ \alpha \in \Lambda} \alpha^{\mathbf{y},u}\, \mathbf{b}, \qquad \alpha^{\mathbf{y},u} = \alpha\, T(\mathbf{y})\, M(u). \quad (12)$$

Equation (12) implies that, to update the $\alpha$-vectors in each step of value iteration, one can compute $\alpha^{\mathbf{y},u}$ for all $\mathbf{y} \in Y$ and $u \in U$ — these are independent of $\mathbf{b}$ — and use them to perform the backup for all belief points in $B$.
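A sketch of the backup operation of equation (12), continuing the toy NumPy setup above; the dictionary-based interface is an assumption of this sketch, not the paper's notation. Storing the minimizing control input alongside each returned $\alpha$-vector is what later allows the policy to be executed by a simple lookup.

```python
def backup(Lambda, b, g, M, T, Ys, Us, gamma):
    """Point-based Bellman backup of Eq. (12) for a single belief point b.

    Lambda : list of alpha-vectors (1-D arrays of length 2^d)
    g      : dict mapping u -> cost vector g(u) of length 2^d
    M, T   : dicts mapping u -> M(u) and y -> T(y) (precomputed matrices)
    Returns (alpha, u): the new alpha-vector and its associated control."""
    best_alpha, best_u, best_val = None, None, np.inf
    for u in Us:
        acc = np.zeros_like(b, dtype=float)
        for y in Ys:
            TyMu = T[y] @ M[u]
            # alpha^{y,u} = alpha T(y) M(u), Eq. (11); keep the minimizer at b
            acc += min((a @ TyMu for a in Lambda), key=lambda v: v @ b)
        alpha_u = g[u] + gamma * acc          # alpha_u^b of Eq. (12)
        val = alpha_u @ b
        if val < best_val:
            best_alpha, best_u, best_val = alpha_u, u, val
    return best_alpha, best_u
```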
Suppose that $B_{m-1}$ contains the belief points at the $(m-1)$-th iteration; then $B_m$ contains at most $|B_{m-1}|\,|U|\,|Y|$ new belief points, and as a result $|B_{m-1}|\,|U|\,|Y|$ $\alpha$-vectors must be computed. Therefore, the computational complexity of exact value iteration grows exponentially. The point-based value iteration (PBVI) method was first presented in [13] to bound the size of the cost-function representation by computing the cost only at a finite set of reachable belief points. In this paper, we employ the point-based value iteration method to find the infinite-horizon control policy for partially-observed Boolean dynamical systems. The method consists of two main steps:

1) PBVI Update Step: The method starts with initial $B_0 = \{\mathbf{b}_0\}$ and $\Lambda_0 = \{\alpha_0\}$. In the $m$-th iteration, for each belief point $\mathbf{b} \in B_{m-1}$, the point-based Bellman backup (equation (12)) is performed based on the $\alpha$-vectors in $\Lambda_{m-1}$. In contrast to the exact value iteration method, the set of belief points is not expanded at this step; the current belief points in $B_{m-1}$ stay the same and only the set of $\alpha$-vectors is updated.

The process continues until the difference between the cost function $J$ in two consecutive iterations becomes smaller than a prespecified threshold $\beta$ for all $\mathbf{b} \in B_{m-1}$. The last set of obtained $\alpha$-vectors forms the set $\Lambda_m$ required for the next iteration. Note that the order in which the backup operation is performed over the belief points in $B_{m-1}$ is arbitrary; in addition, one common choice for the initial $\alpha$-vector is $\alpha_0(i) = \frac{1}{1-\gamma} \max_{j=1,\ldots,2^d,\ u \in U} g(\mathbf{x}^j, u)$ for $i = 1, \ldots, 2^d$.

2) Belief Expansion Step: As mentioned before, the belief space is a simplex in $\mathbb{R}^{2^d-1}$; in order to achieve better coverage of this continuous space, the belief set must be expanded. This is done by expanding the belief set based on the points in $B_{m-1}$. As discussed before, each belief point in $B_{m-1}$ leads to at most $|U|\,|Y|$ different successor belief points; the idea is that, instead of keeping all of them, we keep only one of the $|U|\,|Y|$ successors for each belief point in $B_{m-1}$. Since the goal in choosing new points is to increase coverage of the belief space, it is logical to add the successor that is farthest from the current belief set $B_{m-1}$ (a code sketch of this step follows Algorithm 1). By following the above steps, the size of the belief set doubles at each iteration of the proposed method. The stopping criterion for the method can therefore be defined as the size of the belief set reaching a prespecified number. The complete point-based value iteration method for partially-observed Boolean dynamical systems is presented in Algorithm 1. To execute the obtained control policy efficiently, one can store the optimal control input associated with each $\alpha$-vector and use these pairs for decision making. Finally, regarding convergence, it has been shown that the error between the cost function obtained at the $m$-th iteration of the method and the optimal cost is bounded [13].

Algorithm 1 Point-Based Value Iteration for Partially-Observed Boolean Dynamical Systems
1: $B_0 \leftarrow \{\mathbf{b}_0\}$
2: $\Lambda_0 \leftarrow \{\alpha_0\}$
3: $m \leftarrow 0$
4: while size of $B_m$ is smaller than a prespecified number do
5:   $m \leftarrow m + 1$
6:   $\Lambda_m \leftarrow$ PBVI-UPDATE($B_{m-1}$, $\Lambda_{m-1}$)
7:   $B_m \leftarrow$ BELIEF-EXPANSION($B_{m-1}$)
8: end while

1: function PBVI-UPDATE($B_{m-1}$, $\Lambda_{m-1}$)
2:   $\Upsilon \leftarrow \Lambda_{m-1}$
3:   $J'(\mathbf{b}) \leftarrow \min_{\alpha \in \Upsilon} \alpha\, \mathbf{b}$, for all $\mathbf{b} \in B_{m-1}$
4:   repeat
5:     $J \leftarrow J'$
6:     $\alpha^{\mathbf{y},u} \leftarrow \alpha\, T(\mathbf{y})\, M(u)$, for all $\alpha \in \Upsilon$, $u \in U$, $\mathbf{y} \in Y$
7:     $\Upsilon' \leftarrow \emptyset$
8:     for each $\mathbf{b} \in B_{m-1}$ do
9:       for each $u \in U$ do
10:        $\alpha_u^{\mathbf{b}} \leftarrow \mathbf{g}(u) + \gamma \sum_{\mathbf{y} \in Y} \operatorname{argmin}_{\alpha^{\mathbf{y},u}:\ \alpha \in \Upsilon} \alpha^{\mathbf{y},u}\, \mathbf{b}$
11:      end for
12:      $\alpha^{\mathbf{b}} \leftarrow \operatorname{argmin}_{\alpha_u^{\mathbf{b}}:\ u \in U} \alpha_u^{\mathbf{b}}\, \mathbf{b}$
13:      $\Upsilon' \leftarrow \Upsilon' \cup \{\alpha^{\mathbf{b}}\}$
14:      $J'(\mathbf{b}) \leftarrow \alpha^{\mathbf{b}}\, \mathbf{b}$
15:    end for
16:    $\Upsilon \leftarrow \Upsilon'$
17:  until $\max_{\mathbf{b} \in B_{m-1}} |J(\mathbf{b}) - J'(\mathbf{b})| \le \beta$
18:  $\Lambda_m \leftarrow \Upsilon$
19:  return $\Lambda_m$
20: end function

1: function BELIEF-EXPANSION($B_{m-1}$)
2:   $B_m \leftarrow B_{m-1}$
3:   for each $\mathbf{b} \in B_{m-1}$ do
4:     $\mathbf{b}^{\mathbf{y},u} \leftarrow \dfrac{T(\mathbf{y})\, M(u)\, \mathbf{b}}{\|T(\mathbf{y})\, M(u)\, \mathbf{b}\|_1}$, for $u \in U$, $\mathbf{y} \in Y$
5:     $B_m \leftarrow B_m \cup \big\{ \operatorname{argmax}_{\mathbf{b}^{\mathbf{y},u}:\ u \in U,\ \mathbf{y} \in Y}\ \min_{\tilde{\mathbf{b}} \in B_{m-1}} \|\mathbf{b}^{\mathbf{y},u} - \tilde{\mathbf{b}}\|_1 \big\}$
6:   end for
7:   return $B_m$
8: end function
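The belief expansion step might look as follows in the same NumPy sketch: for each current belief, all successors are generated via equation (5), and only the one farthest (in L1 distance) from the current belief set is kept, so each call doubles the size of the belief set.

```python
def belief_expansion(B, f, d, p, q, Us, Ys):
    """Belief expansion step of Algorithm 1: for each b in B, keep the single
    successor b^{y,u} that maximizes the L1 distance to the current set B."""
    new_points = []
    for b in B:
        succs = [belief_update(b, u, y, f, d, p, q)[0] for u in Us for y in Ys]
        farthest = max(succs,
                       key=lambda s: min(np.sum(np.abs(s - bb)) for bb in B))
        new_points.append(farthest)
    return B + new_points
```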
V. NUMERICAL EXPERIMENT

In this section, we conduct a numerical experiment using a Boolean network for metastatic melanoma [18]. The network contains 7 genes: WNT5A, pirin, S100P, RET1, MART1, HADHB, and STC2. The regulatory relationships for this network are presented in Table I. Each output binary string lists the output values of the corresponding gene for all combinations of its input gene values, enumerated in binary order. For example, the last row of Table I specifies the value of STC2 at the current time step $k$ for the different pairs of (pirin, STC2) values at the previous time step $k-1$:

(pirin = 0, STC2 = 0)$_{k-1}$ → STC2$_k$ = 1
(pirin = 0, STC2 = 1)$_{k-1}$ → STC2$_k$ = 1
(pirin = 1, STC2 = 0)$_{k-1}$ → STC2$_k$ = 0
(pirin = 1, STC2 = 1)$_{k-1}$ → STC2$_k$ = 1

The goal of control is to prevent the WNT5A gene from being upregulated; for the biological rationale, the reader is referred to [18]. The control input consists of flipping or not flipping the state of the control gene. The cost function is defined as follows:

$$g(\mathbf{x}^j, u) = \begin{cases} 5 + O(u), & \text{if WNT5A} = 1, \\ O(u), & \text{if WNT5A} = 0, \end{cases}$$

for $j = 1, \ldots, 2^d$, where $O(u)$ is 0 for no control input and 1 when a control input is applied.
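As a sketch, this cost can be assembled into the vectors $\mathbf{g}(u)$ required by the backup operation. The encoding of "no control" and the position of WNT5A within the state vector are assumptions made here for illustration.

```python
def melanoma_cost(u, d=7, wnt5a_index=0):
    """Cost vector g(u): 5 + O(u) if WNT5A = 1 in state x^j, else O(u),
    where O(u) = 1 when a control input is applied and 0 otherwise.
    u = None encodes "no control"; wnt5a_index is an assumed bit position."""
    O_u = 0 if u is None else 1
    states = list(itertools.product([0, 1], repeat=d))
    return np.array([(5 + O_u) if s[wnt5a_index] == 1 else O_u for s in states])
```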

TABLE I: Boolean functions for the melanoma Boolean network.

Gene     Input Gene(s)           Output
WNT5A    HADHB                   10
pirin    pirin, RET1, HADHB
S100P    S100P, RET1, STC2
RET1     RET1, HADHB, STC2
MART1    pirin, MART1, STC2
HADHB    pirin, S100P, RET1
STC2     pirin, STC2             1101

The process noise is assumed to have independent components distributed as Bernoulli with intensity $p$, so that all genes are perturbed with a small probability. We assume the states are observed through i.i.d. Bernoulli noise with parameter $q$, the same for all genes. The discount factor $\gamma$ is set to 0.95, and the internal loop in Algorithm 1 stops when the difference between the costs of all belief points in two consecutive iterations becomes smaller than the threshold $\beta$. The outer loop is performed 10 times, which results in 1024 belief points.

In Figure 1, the results of the proposed method under control of the RET1 gene are presented, together with the results of the Value Iteration (VI) method (with parameters $\gamma = 0.95$ and $\beta = 10^{-8}$ [12]), which yields the optimal control policy when the states are directly observed, and the results of the method introduced in [12] (which we call VI-BKF), which applies the policy obtained by Value Iteration to the state estimates produced by the Boolean Kalman Filter (BKF) [4]. All results are obtained over a long run of the system (10000 time steps), starting from 0 for all genes in the first step. It can be seen that the system is well controlled under the RET1 control input. In the presence of small process and observation noise, the results of PBVI-POBDS and VI-BKF are close to those of VI with directly observed states. As the noise increases, the fraction of observed desirable states under the control policies obtained by PBVI-POBDS and VI-BKF decreases. This reduction is more visible for VI-BKF; the reason is that the performance of state estimation by the BKF degrades in the presence of larger process and observation noise, which increases the rate of wrong estimates and therefore decreases control performance.

Fig. 1: Fraction of observed desirable states under control input of the RET1 gene ($p$ and $q$ denote the intensities of the process and observation noise, respectively).

The fraction of observed desirable states under different control inputs is shown in Figure 2. The process and observation noise are both taken to be small ($p = 0.01$, $q = 0.01$). It is clear that the system is well controlled under the WNT5A, S100P, and RET1 control inputs; on the other hand, pirin, MART1, HADHB, and STC2 are not good options for avoiding undesirable states.

Fig. 2: Fraction of desirable states under different control inputs.

The average fraction of observed desirable states for PBVI-POBDS as a function of the number of belief points is presented in Figure 3. It can be seen that increasing the number of belief points improves the performance of the PBVI-POBDS method, especially when the number of belief points is small. Finally, the activation of the WNT5A gene under RET1 control and without control is shown in Figure 4 for 100 time steps. WNT5A is mostly upregulated in the system without control, but it is downregulated under the control policy obtained by PBVI-POBDS.

Fig. 3: Average fraction of desirable states under RET1 control as a function of the number of belief points N.

Fig. 4: Sample system evolution over 100 time steps for the WNT5A gene, under control of RET1 and without control.
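A closed-loop run of the kind used to produce these figures could be sketched as follows: filter the belief online with equation (5) and, at each step, apply the control stored with the minimizing $\alpha$-vector of equation (9). This reuses the helpers sketched earlier and assumes the Bernoulli state and observation noise of this section; it is an illustration, not the authors' code.

```python
def run_policy(Lambda, controls, b0, f, d, p, q, n_steps, rng):
    """Closed-loop execution sketch: `controls[i]` is the control input stored
    with `Lambda[i]` by the backup operation. Yields (state, control) pairs."""
    states = [np.array(s) for s in itertools.product([0, 1], repeat=d)]
    x = states[rng.choice(2**d, p=b0)]                 # sample the initial state
    b = np.asarray(b0, dtype=float)
    for _ in range(n_steps):
        i = min(range(len(Lambda)), key=lambda k: Lambda[k] @ b)
        u = controls[i]                                # policy lookup via Eq. (9)
        x = f(x, u) ^ (rng.random(d) < p).astype(int)  # state model with noise n_k
        y = x ^ (rng.random(d) < q).astype(int)        # Bernoulli(q) observation
        b, _ = belief_update(b, u, y, f, d, p, q)      # Eq. (5)
        yield x, u

# Example driver: rng = np.random.default_rng(0); then iterate over
# run_policy(...) and count steps with the WNT5A bit equal to 0.
```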

VI. CONCLUSION

In this paper, we proposed a method to find the infinite-horizon control policy for partially-observed Boolean dynamical systems with a finite observation space. The application of the proposed method was discussed in the context of a Boolean network of melanoma observed through finite noisy observations.

REFERENCES

[1] A. Datta, A. Choudhary, M. L. Bittner, and E. R. Dougherty, "External control in Markovian genetic regulatory networks," Machine Learning, vol. 52, no. 1-2, 2003.
[2] S. A. Kauffman, "Metabolic stability and epigenesis in randomly constructed genetic nets," Journal of Theoretical Biology, vol. 22, no. 3, 1969.
[3] I. Shmulevich, E. R. Dougherty, and W. Zhang, "From Boolean to probabilistic Boolean networks as models of genetic regulatory networks," Proceedings of the IEEE, vol. 90, no. 11, 2002.
[4] U. Braga-Neto, "Optimal state estimation for Boolean dynamical systems," in Conference Record of the Forty-Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), IEEE, 2011.
[5] E. O. Voit and J. Almeida, "Decoupling dynamical systems for pathway identification from metabolic profiles," Bioinformatics, vol. 20, no. 11, 2004.
[6] N. Friedman, M. Linial, I. Nachman, and D. Pe'er, "Using Bayesian networks to analyze expression data," Journal of Computational Biology, vol. 7, no. 3-4, 2000.
[7] R. Pal, A. Datta, and E. R. Dougherty, "Optimal infinite-horizon control for probabilistic Boolean networks," IEEE Transactions on Signal Processing, vol. 54, no. 6, 2006.
[8] M. Imani and U. Braga-Neto, "Optimal state estimation for Boolean dynamical systems using a Boolean Kalman smoother," in 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE, 2015.
[9] M. Imani and U. Braga-Neto, "Maximum-likelihood adaptive filter for partially-observed Boolean dynamical systems," IEEE Transactions on Signal Processing, accepted for publication.
[10] M. Imani and U. Braga-Neto, "Optimal gene regulatory network inference using the Boolean Kalman filter and multiple model adaptive estimation," in Asilomar Conference on Signals, Systems and Computers, IEEE.
[11] A. Bahadorinejad and U. Braga-Neto, "Optimal fault detection and diagnosis in transcriptional circuits using next-generation sequencing."
[12] M. Imani and U. Braga-Neto, "State-feedback control of partially-observed Boolean dynamical systems using RNA-seq time series data," in 2016 American Control Conference (ACC2016), IEEE, 2016.
[13] J. Pineau, G. Gordon, S. Thrun, et al., "Point-based value iteration: An anytime algorithm for POMDPs," in IJCAI, vol. 3, 2003.
[14] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, "Planning and acting in partially observable stochastic domains," Artificial Intelligence, vol. 101, no. 1, 1998.
[15] D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont, MA.
[16] E. J. Sondik, "The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs," Operations Research, vol. 26, no. 2, 1978.
[17] A. Cassandra, M. L. Littman, and N. L. Zhang, "Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes," in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1997.
[18] E. R. Dougherty, R. Pal, X. Qian, M. L. Bittner, and A. Datta, "Stationary and structural control in gene regulatory networks: basic concepts," International Journal of Systems Science, vol. 41, no. 1, pp. 5–16, 2010.
