Discretized Approximations for POMDP with Average Cost


Huizhen Yu
Lab for Information and Decisions, EECS Dept., MIT, Cambridge, MA 02139

Dimitri P. Bertsekas
Lab for Information and Decisions, EECS Dept., MIT, Cambridge, MA 02139

Abstract

In this paper, we propose a new lower approximation scheme for POMDP with discounted and average cost criterion. The approximating functions are determined by their values at a finite number of belief points, and can be computed efficiently using value iteration algorithms for finite-state MDP. While for discounted problems several lower approximation schemes have been proposed earlier, ours seems to be the first of its kind for average cost problems. We focus primarily on the average cost case, and we show that the corresponding approximation can be computed efficiently using multi-chain algorithms for finite-state MDP. We give a preliminary analysis showing that regardless of the existence of the optimal average cost $J^*$ in the POMDP, the approximation obtained is a lower bound of the liminf optimal average cost function, and can also be used to calculate an upper bound on the limsup optimal average cost function, as well as bounds on the cost of executing the stationary policy associated with the approximation. We show the convergence of the cost approximation, when the optimal average cost is constant and the optimal differential cost is continuous.

1 INTRODUCTION

We consider discrete-time infinite horizon partially observable Markov decision processes (POMDP) with the state space $S$, the observation space $Z$ and the control space $U$ all being finite. Let $X$ be the set of probability distributions on $S$, called the belief space, and let $g_u(s)$ be the per stage cost function. With the average cost criterion, we minimize over the policies $\pi$ the average expected cost

$$\frac{1}{N} E^\pi \Big\{ \sum_{t=0}^{N-1} g_{u_t}(s_t) \,\Big|\, s_0 \sim x \Big\},$$

as $N$ goes to infinity, when the initial state $s_0$ follows the distribution $x$. POMDPs with average cost criterion are substantially more difficult to analyze than with discounted cost.
Although there are optimality equations whose solution provides the optimal average cost function and a stationary optimal policy, in general there is no guarantee that a solution exists, and there are no finite computation algorithms to obtain it. Therefore, discretized approximations are computationally appealing as approximate solutions for average cost POMDP, since the problem of finite-state MDPs with average cost is well understood and can be solved with several commonly used algorithms.

We note that a discretization scheme for discounted POMDP that gives a lower approximation was first proposed by (Lovejoy, 1991). It was later improved by (Zhou and Hansen, 2001). There have been no proposals of discretization schemes for average cost POMDP, to our knowledge.

A conceptually different alternative for approximately solving average cost POMDP is the finite memory approach (Aberdeen and Baxter, 2002). In this approach, one seeks a policy that is average cost optimal within a class of finite state controllers. The advantage of the finite memory approach is that a suboptimal policy can be learned in a model-free fashion, i.e., with a simulator rather than an explicit transition probability model of the system. By contrast the discretization approaches of Lovejoy, and Zhou and Hansen, as well as ours, require an exact mechanism for generating beliefs/conditional state distributions, as the system is operating.

We have recently become aware of the related work by (Ormoneit and Glynn, 2002) on MDP with continuous state space and average cost. Our POMDP scheme can be viewed as a special case of their general approximation scheme. However, the lower approximation property is special to POMDP, and the corresponding asymptotic convergence results are also different in the two works.

The starting point for our discretization methodology is the discounted problem, for which we introduce a new lower approximation scheme, based on a fictitious optimistic controller that receives extra information about the hidden states. The cost of this controller, a lower bound to the optimal cost, can be calculated using finite-state MDP methods, and can be used as an approximate cost-to-go function for a one-step lookahead scheme.

We extend our approach to the average cost criterion, where the discretized problem can be solved by multi-chain algorithms for finite-state MDP. We show that the corresponding approximate cost is a lower bound to the optimal liminf average cost function, and can be used to obtain an upper bound to the optimal limsup average cost function, as well as bounds on the cost of the stationary policy associated with the approximation. We show asymptotic convergence of the cost approximation of the discretization scheme, assuming that the optimal average cost is constant and the optimal differential cost is continuous.

The paper is organized as follows. In Section 2, we consider discretized approximations in the discounted case, and introduce a new approximation scheme. We prove asymptotic convergence for two main discretization schemes. In Section 3, we extend discretized approximations to the average cost case, and give an analysis of error bounds and asymptotic convergence. Finally in Section 4, we present experimental results. Due to space limitations, some of the proofs have been omitted. They can be found in an expanded version of this paper (Yu and Bertsekas, 2004), which also addresses some additional topics, including a general framework for deriving upper and lower approximation schemes for POMDP.

2 DISCOUNTED CASE

We introduce a new approximation scheme and summarize known discretized lower approximation schemes for the discounted case. The belief MDPs associated with them will be the basis for the lower approximation schemes in the average cost case.
The results obtained here will also be useful there.

In the discounted case, we minimize the discounted cost

$$E^\pi \Big\{ \sum_{t=0}^{\infty} \alpha^t g_{u_t}(s_t) \,\Big|\, s_0 \sim x \Big\}$$

for a fixed $\alpha \in [0, 1)$. The optimal cost function $J_\alpha^*(x)$ satisfies the Bellman equation

$$J_\alpha^*(x) = (T J_\alpha^*)(x), \qquad (TJ)(x) = \min_{u \in U} \Big[ x' g_u + \alpha E_z \big\{ J\big( \phi_u(x, z) \big) \big\} \Big],$$

where $'$ denotes transpose, $g_u$ denotes the per stage cost vector, and $\phi_u(x, z)$ denotes the conditional distribution of the state after applying control $u$ and observing $z$.

A few notations for expectations will be used throughout the text. At places where emphasis of the distribution is necessary, we use the symbol $E_{z|x,u}\{\ldots\}$, which should be read as $\sum_z p(z \mid x, u) \ldots$, and is equivalent to the conditional expectation $E_z\{\ldots \mid x, u\}$.

2.1 A NEW INEQUALITY

The optimal cost $J_\alpha^*(\cdot)$ is concave, i.e., for any convex combination $\bar x = \sum_i \gamma_i(\bar x) x_i$, with $\gamma_i(\bar x) \ge 0$ and $\sum_i \gamma_i(\bar x) = 1$,

$$J_\alpha^*(\bar x) \ge \sum_i \gamma_i(\bar x) J_\alpha^*(x_i).$$

Using this property with $\bar x = \phi_u(x, z)$ in the Bellman equation, we have the following inequality that was proposed by (Zhou and Hansen, 2001) for a discretized cost approximation:

$$J_\alpha^*(x) \ge \min_u \Big[ x' g_u + \alpha E_{z|x,u} \Big\{ \sum_i \gamma_i\big( \phi_u(x, z) \big) J_\alpha^*(x_i) \Big\} \Big]. \tag{1}$$

We introduce a new inequality, which follows from concavity of $E_{z|x,u}\{ J_\alpha^*(\phi_u(x, z)) \}$ in $x$.

Proposition 1 For all $x \in X$, $x_i \in X$ and $\gamma_i(x) \ge 0$ such that $x = \sum_i \gamma_i(x) x_i$, $\sum_i \gamma_i(x) = 1$, the optimal cost $J_\alpha^*(x)$ satisfies

$$J_\alpha^*(x) \ge \min_u \Big[ x' g_u + \alpha \sum_i \gamma_i(x) E_{z|x_i,u} \big\{ J_\alpha^*\big( \phi_u(x_i, z) \big) \big\} \Big]. \tag{2}$$

We present here, however, an alternative proof that uses the interpretation of a modified process in which there is additional information about the randomness of the initial distribution. This argument has the same spirit as region-observable-POMDP (Zhang and Liu, 1997), and can be generalized (Yu and Bertsekas, 2004). Since Prop. 1 implies concavity of $J_\alpha^*(\cdot)$ [footnote 1], which is not used in the proof, it can also be used to establish concavity of $J_\alpha^*(\cdot)$ without an induction argument.
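As a concrete reading of the notation above, the belief update $\phi_u(x, z)$, the observation probabilities $p(z \mid x, u)$ and one application of the mapping $T$ can be sketched in Python. This is only an illustrative sketch and not code from the paper: the array layout (`P[u][s, s2] = p(s2 | s, u)`, `O[u][s2, z] = p(z | s2, u)`, `g[u][s]` for the per stage cost) and the function names `belief_update` and `bellman_backup` are assumptions made here.

```python
import numpy as np

def belief_update(x, P_u, O_u):
    """Return p(z | x, u) and the updated beliefs phi_u(x, z) for every z.

    Assumed (hypothetical) layout:
      x   : belief over states, shape (S,)
      P_u : P_u[s, s2] = p(s2 | s, u)
      O_u : O_u[s2, z] = p(z | s2, u)
    """
    pred = x @ P_u                        # predicted next-state distribution
    joint = pred[:, None] * O_u           # joint[s2, z] = p(s2, z | x, u)
    pz = joint.sum(axis=0)                # p(z | x, u)
    phi = np.zeros((len(pz), len(x)))     # phi[z] = phi_u(x, z)
    seen = pz > 0                         # beliefs for impossible z stay zero
    phi[seen] = (joint[:, seen] / pz[seen]).T
    return pz, phi

def bellman_backup(J, x, P, O, g, alpha):
    """(TJ)(x) = min_u [ x'g_u + alpha * E_{z|x,u}{ J(phi_u(x, z)) } ].

    J is any callable mapping a belief vector to a scalar.
    """
    values = []
    for P_u, O_u, g_u in zip(P, O, g):
        pz, phi = belief_update(x, P_u, O_u)
        expect = sum(p * J(b) for p, b in zip(pz, phi) if p > 0)
        values.append(x @ g_u + alpha * expect)
    return min(values)
```

For a linear `J` the expectation over $z$ collapses to `J` evaluated at the predicted next-state distribution, which gives a quick sanity check of the belief update.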
Proof: Consider a new process $P$, otherwise identical to the original POMDP, except that the initial distribution of $s_0$ is generated by a mixture of $m$ distributions $x_i$, marginally identical to $x$. By this we mean that there is a random variable $q$ taking values from $1$ to $m$ with

$$p(q = k \mid x) = \gamma_k(x), \qquad p(s_0 \mid q = k) = x_k(s_0).$$

Assume $q$ is not accessible to the controller. The optimal cost for this new process equals $J_\alpha^*(x)$, and is achieved by the policy $\pi^*$ that is optimal in the original POMDP. Denote its action at $x$ by $a$. We have

$$J_\alpha^*(x) = x' g_a + E \Big\{ E^{\pi^*} \Big\{ \sum_{t=1}^{\infty} \alpha^t g(s_t, u_t) \,\Big|\, x, a, z, q \Big\} \,\Big|\, x, a \Big\}.$$

Let $\tilde\phi_a(x, (z, q))$ be the distribution $p(s_1 \mid x, a, (z, q))$. As $q$ and hence $\tilde\phi_a(x, (z, q))$ are inaccessible to $\pi^*$, by the optimality of $J_\alpha^*(\cdot)$, we have that in the last equation

$$E^{\pi^*} \Big\{ \sum_{t=1}^{\infty} \alpha^t g(s_t, u_t) \,\Big|\, x, a, z, q \Big\} \ge \alpha J_\alpha^*\big( \tilde\phi_a(x, (z, q)) \big).$$

Since $\tilde\phi_a(x, (z, q)) = \phi_a(x_i, z)$ given $q = i$, it follows that

$$J_\alpha^*(x) \ge x' g_a + \alpha E_{(z,q)|x,a} \big\{ J_\alpha^*\big( \tilde\phi_a(x, (z, q)) \big) \big\} = x' g_a + \alpha \sum_i \gamma_i(x) E_{z|x_i,a} \big\{ J_\alpha^*\big( \phi_a(x_i, z) \big) \big\} \ge \min_u \Big[ x' g_u + \alpha \sum_i \gamma_i(x) E_{z|x_i,u} \big\{ J_\alpha^*\big( \phi_u(x_i, z) \big) \big\} \Big].$$

Footnote 1 (to the remark that Prop. 1 implies concavity): This is so because $x' g_u = \sum_i \gamma_i(x) x_i' g_u$ and $\min_u \sum_i \ge \sum_i \min_u$.

2.2 DISCRETIZED APPROXIMATIONS

We first summarize known lower approximation schemes, and then prove asymptotic convergence for two main schemes corresponding to the inequalities (1) and (2).

2.2.1 Approximation Schemes

Let $G = \{x_i\}$ be a finite set of beliefs such that their convex hull is $X$. A simple choice is to discretize $X$ into a regular grid, so we refer to $x_i$ as grid points. By choosing different $x_i$ and $\gamma_i(\cdot)$ in the inequalities (1) and (2), we obtain lower cost approximations that are functionally determined by their values at a finite number of beliefs.

Definition 1 ($\epsilon$-Discretization Scheme) Call $(G, \gamma)$ an $\epsilon$-discretization scheme where $G = \{x_i\}$ is a set of $n$ beliefs, $\gamma = (\gamma_1(\cdot), \ldots, \gamma_n(\cdot))$ is a convex representation scheme such that $x = \sum_i \gamma_i(x) x_i$ for all $x \in X$, and $\epsilon$ is a scalar characterizing the fineness of the discretization, defined by

$$\epsilon = \max_{x \in X} \; \max_{x_i \in G,\ \gamma_i(x) > 0} \| x - x_i \|.$$

Given $(G, \gamma)$, let $\tilde T_{D_i}$, $i = 1, 2$, be the associated mappings corresponding to the right-hand sides of inequalities (1) and (2), respectively:

$$(\tilde T_{D_1} J)(x) = \min_u \Big[ x' g_u + \alpha \sum_i E_{z|x,u} \big\{ \gamma_i\big( \phi_u(x, z) \big) \big\} J(x_i) \Big], \tag{3}$$

$$(\tilde T_{D_2} J)(x) = \min_u \Big[ x' g_u + \alpha \sum_i \gamma_i(x) E_{z|x_i,u} \big\{ J\big( \phi_u(x_i, z) \big) \big\} \Big]. \tag{4}$$

Associated with these mappings are their unique belief MDPs on the continuous belief space $X$, which we will refer to as the modified belief MDPs.
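To make Eq. (3) concrete, the following Python sketch builds the finite-state MDP that $\tilde T_{D_1}$ induces on the grid points, solves it by plain value iteration, and evaluates the resulting lower approximation at an arbitrary belief. It is an illustration under assumptions rather than the authors' implementation: the array layout (`P[u][s, s2] = p(s2 | s, u)`, `O[u][s2, z] = p(z | s2, u)`, `g[u][s]`), the callable `gamma` returning the weights $\gamma_i(\cdot)$, the fixed iteration count, and all function names are hypothetical.

```python
import numpy as np

def belief_update(x, P_u, O_u):
    """p(z | x, u) and phi_u(x, z) for each z (None where p(z | x, u) = 0)."""
    joint = (x @ P_u)[:, None] * O_u
    pz = joint.sum(axis=0)
    phi = [joint[:, z] / pz[z] if pz[z] > 0 else None for z in range(len(pz))]
    return pz, phi

def grid_mdp_D1(grid, gamma, P, O, g):
    """Finite-state MDP on the grid induced by Eq. (3).

    cost[u, i]     = x_i' g_u
    trans[u, i, j] = E_{z|x_i,u}{ gamma_j(phi_u(x_i, z)) }
    """
    n, U = len(grid), len(P)
    cost = np.array([[x @ g[u] for x in grid] for u in range(U)])
    trans = np.zeros((U, n, n))
    for u in range(U):
        for i, x in enumerate(grid):
            pz, phi = belief_update(x, P[u], O[u])
            for p, b in zip(pz, phi):
                if p > 0:
                    trans[u, i] += p * gamma(b)
    return cost, trans

def value_iteration(cost, trans, alpha, iters=2000):
    """Discounted value iteration on the grid MDP (fixed iteration count)."""
    J = np.zeros(cost.shape[1])
    for _ in range(iters):
        J = (cost + alpha * trans @ J).min(axis=0)
    return J

def lower_bound_D1(x, J_grid, gamma, P, O, g, alpha):
    """Evaluate the fixed point of Eq. (3) at an arbitrary belief x."""
    vals = []
    for u in range(len(P)):
        pz, phi = belief_update(x, P[u], O[u])
        nxt = sum(p * (gamma(b) @ J_grid) for p, b in zip(pz, phi) if p > 0)
        vals.append(x @ g[u] + alpha * nxt)
    return min(vals)
```

With the grid taken to be the vertices of the simplex and `gamma = lambda b: b`, the transition array coincides with the state transition matrices, which is a convenient check of the construction.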
The optimal cost functions $\tilde J_i$ in these modified belief MDPs satisfy, respectively,

$$(\tilde T_{D_i} \tilde J_i)(x) = \tilde J_i(x) \le J_\alpha^*(x), \qquad \forall x \in X,\ i = 1, 2.$$

Both $\tilde J_i$ are functionally determined by their values at a finite number of beliefs, which will be called supporting points, and whose set is denoted by $C$. In particular, the function $\tilde J_1$ can be computed by solving a corresponding finite-state MDP on $C = G = \{x_i\}$, and the function $\tilde J_2$ can be computed by solving a corresponding finite-state MDP on $C = \{\phi_u(x_i, z) \mid x_i \in G, u \in U, z \in Z\}$ [footnote 2]. The computation can thus be done efficiently by variants of value iteration methods, or linear programming.

Footnote 2: More precisely, $C = \{\phi_u(x_i, z) \mid x_i \in G, u \in U, z \in Z, \text{ such that } p(z \mid x_i, u) > 0\}$.

Usually $X$ is partitioned into convex regions and beliefs in a region are represented as the convex combinations of its vertices. The function $\tilde J_1$ is then piecewise linear on each region, and the function $\tilde J_2$ is piecewise linear and concave on each region. To see the latter, let $q(x_i, u) = E_{z|x_i,u}\{ \tilde J_2(\phi_u(x_i, z)) \}$, and we have

$$\tilde J_2(x) = \min_u \Big[ x' g_u + \alpha \sum_i \gamma_i(x) q(x_i, u) \Big].$$

The simplest case for both mappings is when $G$ consists of vertices of the belief simplex, i.e., $G = \{e_s \mid s \in S\}$, where $e_s(s) = 1$ and $e_s(s') = 0$, $\forall s' \ne s$, $s, s' \in S$. Denote the corresponding mappings by $\tilde T_{D_i^0}$, $i = 1, 2$, respectively, i.e.,

$$(\tilde T_{D_1^0} J)(x) = \min_u \Big[ x' g_u + \alpha \sum_s p(s \mid x, u) J(e_s) \Big], \tag{5}$$

$$(\tilde T_{D_2^0} J)(x) = \min_u \Big[ x' g_u + \alpha \sum_s x(s) E_{z|s,u} \big\{ J\big( \phi_u(e_s, z) \big) \big\} \Big]. \tag{6}$$

The mapping $\tilde T_{D_1^0}$ is the QMDP approximation, suggested by (Littman, Cassandra, and Kaelbling, 1995), who have shown good results for certain applications. In the belief MDP associated with $\tilde T_{D_1^0}$, the states will be observable after the initial step. In the belief MDP associated with $\tilde T_{D_2^0}$, the previous state will be revealed at each stage. One can show that $\tilde T_{D_2^0}$ gives a better approximation than $\tilde T_{D_1^0}$ in both discounted and average cost cases. For the comparison of $\tilde T_{D_i}$ in general, by concavity of $J_\alpha^*$, one can relax the inequality $J_\alpha^* \ge \tilde T_{D_2} J_\alpha^*$ to obtain an inequality of the same form as the inequality $J_\alpha^* \ge \tilde T_{D_1} J_\alpha^*$. See (Yu and Bertsekas, 2004) for these details.

By concatenating mappings we obtain other discretized lower approximations. For example,

$$T \circ \tilde T_{D_i},\ i = 1, 2; \qquad \tilde T_I \circ \tilde T_{D_2}, \tag{7}$$

where $\tilde T_I$ denotes a region-observable-POMDP type of mapping (Zhang and Liu, 1997). In the concatenated mapping $\tilde T_I \circ \tilde T_{D_2}$ we only need grid points to be on lower dimensional spaces.

Let $\tilde T$ be any of the above mappings. Its associated modified belief MDP is not necessarily a POMDP model. It is straightforward to show the following [footnote 3], by comparing the $N$-stage optimal cost of the modified MDP to that of the original POMDP. This result also holds for $\alpha = 1$.

Proposition 2 Let $J_0$ be a concave function on $X$. For any $\alpha \in [0, 1]$,

$$(\tilde T^N J_0)(x) \le (T^N J_0)(x), \qquad \forall x \in X,\ \forall N.$$

Footnote 3: Use induction and concavity, or alternatively an argument similar to the proof of Prop. 1.

2.2.2 Asymptotic Convergence

We will now provide a limiting theorem for $\tilde T_{D_1}$ and $\tilde T_{D_2}$ using the uniform continuity property of $J_\alpha^*(\cdot)$. We first give some conventional notations related to policies, to be used throughout the paper. Let $\mu$ be a stationary policy, and $J_\mu$ be its cost. We define the mapping $T_\mu$ by

$$(T_\mu J)(x) = x' g_{\mu(x)} + \alpha E_{z|x,\mu(x)} \big\{ J\big( \phi_{\mu(x)}(x, z) \big) \big\},$$

and similarly for any control $u$, we define $T_u$ to be the mapping that has the same single control $u$ in place of $\mu(x)$ in $T_\mu$. Let $\tilde T$ be either $\tilde T_{D_1}$ or $\tilde T_{D_2}$, and similarly let $\tilde T_\mu$ and $\tilde T_u$ correspond to a policy $\mu$ and control $u$, respectively.

The function $J_\alpha^*(x)$ is continuous on $X$. For any continuous function $v(\cdot)$, $E_{z|x,u}\{ v(\phi_u(x, z)) \}$ is also continuous on $X$. As $X$ is compact, by the uniform continuity of corresponding functions, we have the following lemma.

Lemma 1 Let $v(\cdot)$ be a continuous function on $X$. For any $\delta > 0$, there exists $\bar\epsilon > 0$ such that for any $\epsilon$-discretization scheme $(G, \gamma)$ with $\epsilon \le \bar\epsilon$,

$$\big| (T_u v)(x) - (\tilde T_u v)(x) \big| \le \delta, \qquad \forall x \in X,\ \forall u \in U,$$

where $\tilde T_u$ corresponds to either $\tilde T_{D_1}$ or $\tilde T_{D_2}$ associated with $(G, \gamma)$.

By Lemma 1, and the standard error bounds

$$\| \tilde J - J \|_\infty \le \frac{\| \tilde T J - J \|_\infty}{1 - \alpha} \qquad \text{and} \qquad \| J_\mu - J \|_\infty \le \frac{\| T_\mu J - J \|_\infty}{1 - \alpha},$$

which hold for any bounded function $J$, with $\tilde J$ the fixed point of $\tilde T$ (see e.g., (Bertsekas, 2001)), we have the following limiting theorem, which states that the lower approximation and the cost of its look-ahead policy, as well as the cost of the optimal policy with respect to the modified belief MDP, all converge to the optimal cost of the original POMDP.

Theorem 1 Let $(G_k, \gamma_k)$ be a sequence of $\epsilon_k$-discretization schemes with $\epsilon_k \to 0$ as $k \to \infty$. Let $\tilde J_k$, $\mu_k$ and $\tilde\mu_k$ be such that

$$\tilde J_k = \tilde T_k \tilde J_k = \tilde T_{k, \tilde\mu_k} \tilde J_k, \qquad T_{\mu_k} \tilde J_k = T \tilde J_k,$$

where $\tilde T_k$ is either $\tilde T_{D_1}$ or $\tilde T_{D_2}$ associated with $(G_k, \gamma_k)$. Then for any fixed $\alpha \in [0, 1)$,

$$\tilde J_k \to J_\alpha^*, \qquad J_{\mu_k} \to J_\alpha^*, \qquad J_{\tilde\mu_k} \to J_\alpha^*, \qquad \text{as } k \to \infty.$$

3 DISCRETIZED APPROXIMATIONS FOR AVERAGE COST CRITERION

In average cost POMDP, the objective is to minimize the average cost

$$\frac{1}{N} E^\pi \Big\{ \sum_{t=0}^{N-1} g(s_t, u_t) \,\Big|\, s_0 \sim x_0 \Big\},$$

as $N$ goes to infinity. For POMDP with average cost, in order that a stationary optimal policy exists, it is sufficient that the following functional equations, in the belief MDP notation,

$$\begin{aligned}
J(x) &= \min_u E_{\tilde x|x,u} \{ J(\tilde x) \}, \\
J(x) + h(x) &= \min_{u \in U(x)} \Big[ x' g_u + E_{\tilde x|x,u} \{ h(\tilde x) \} \Big], \qquad U(x) = \arg\min_u E_{\tilde x|x,u} \{ J(\tilde x) \},
\end{aligned} \tag{8}$$

admit a bounded solution $(J^*(\cdot), h^*(\cdot))$. The stationary policy that obtains the minimum is then optimal with its average cost being $J^*(x)$. However, there are no finite computation algorithms to obtain it. (For a general analysis of POMDP with average cost, see (Fernández-Gaucherand, Arapostathis, and Marcus, 1991) or the survey by (Arapostathis et al., 1993).)

We now extend the application of the discretized approximations to the average cost case. First, note that solving the corresponding average cost problem in the discretized approach is much easier. Let $\tilde T$ be any of the mappings from Eq. (3)-(7) in Section 2.2.1. For its associated modified belief MDP, writing $\bar g_u(x)$ for the cost per stage, we have the following average cost optimality equations:

$$\begin{aligned}
J(x) &= \min_u \tilde E_{\tilde x|x,u} \{ J(\tilde x) \}, \\
J(x) + h(x) &= \min_{u \in U(x)} \Big[ \bar g_u(x) + \tilde E_{\tilde x|x,u} \{ h(\tilde x) \} \Big], \qquad U(x) = \arg\min_u \tilde E_{\tilde x|x,u} \{ J(\tilde x) \},
\end{aligned} \tag{9}$$

where we use $\tilde E$ to indicate that the expectation is taken with respect to the distributions $\tilde p(\tilde x \mid x, u)$ of the modified MDP, which satisfy $\tilde p(\tilde x \mid x, u) = 0$, $\forall (x, u)$, $\tilde x \notin C$, with $C$ being the finite set of supporting beliefs.

There are bounded solutions $(\tilde J(\cdot), \tilde h(\cdot))$ to the optimality equations (9) for the following reason: every finite-state MDP problem admits a solution to its average cost optimality equations. Furthermore if $x \notin C$, $x$ is transient and unreachable from $C$, and the next belief $\tilde x$ belongs to $C$ under any control $u$ in the modified MDP. It follows that the optimality equations (9) restricted on $\{x\} \cup C$ are the optimality equations for the finite-state MDP with $|C| + 1$ states, so the solution $(\tilde J(\tilde x), \tilde h(\tilde x))$ exists for $\tilde x \in \{x\} \cup C$ with their values on $C$ independent of $x$. This is essentially the algorithm to solve $\tilde J(\cdot)$ and $\tilde h(\cdot)$ in two stages, and obtain an optimal stationary policy for the modified MDP.

Concerns arise, however, about using any optimal policy for the modified MDP as suboptimal control in the original POMDP. Although all average cost optimal policies behave equally optimally in the asymptotic sense, they do so in the modified MDP, in which all the states $x \notin C$ are transient. As an illustration, suppose that for the completely observable MDP the optimal average cost is constant over all states; then at any belief $x \notin C$ any control will have the same asymptotic average cost in the modified MDP corresponding to the QMDP approximation scheme. The situation worsens if even the completely observable MDP itself has a large number of states that are transient under its optimal policies. We therefore speculate that for the modified MDP, we should aim to compute policies with additional optimality guarantees, relating to their finite-stage behaviors. Fortunately for finite-state MDPs, there are efficient algorithms for computing such policies.
In the following we present the algorithm, after a brief review of the related results for finite-state MDP, and give preliminary analysis of error bounds and asymptotic convergence. We show sufficient conditions for the convergence of cost approximation, assuming that the optimal average cost of the POMDP is constant.

3.1 ALGORITHM

We first briefly review related results for finite-state MDPs. Since average cost measures the asymptotic behavior of a policy, given two policies having the same average cost, one can incur significantly larger cost in finite steps than the other. The concept of $n$-discount optimality is useful for differentiating between such policies. It is also closely related to Blackwell optimality. A policy $\pi^*$ is $n$-discount optimal if its costs in the discounted problems satisfy

$$\limsup_{\alpha \to 1} \, (1 - \alpha)^{-n} \big( J_\alpha^{\pi^*}(s) - J_\alpha^{\pi}(s) \big) \le 0, \qquad \forall s,\ \forall \pi.$$

By definition an $(n+1)$-discount optimal policy is also $k$-discount optimal for $k = -1, 0, \ldots, n$. A policy is called Blackwell optimal if it is optimal for all the discounted problems with discount factor $\alpha \in [\bar\alpha, 1)$ for some $\bar\alpha < 1$. For finite-state MDPs, a policy is Blackwell optimal if and only if it is $\infty$-discount optimal. By contrast, any $(-1)$-discount optimal policy is average cost optimal.

For any finite-state MDP, there exist stationary average cost optimal policies and furthermore, stationary $n$-discount optimal and Blackwell optimal policies. In particular, there exist functions $J(\cdot)$, $h(\cdot)$ and $w_k(\cdot)$, $k = 0, \ldots, n+1$, with $w_0 = h$ such that they satisfy the following nested equations:

$$\begin{aligned}
J(s) &= \min_{u \in U(s)} E_{\tilde s|s,u} \{ J(\tilde s) \}, \\
J(s) + h(s) &= \min_{u \in U_{-1}(s)} \Big[ g_u(s) + E_{\tilde s|s,u} \{ h(\tilde s) \} \Big], \\
w_{k-1}(s) + w_k(s) &= \min_{u \in U_{k-1}(s)} E_{\tilde s|s,u} \{ w_k(\tilde s) \}, \qquad 1 \le k \le n+1,
\end{aligned} \tag{10}$$

where

$$\begin{aligned}
U_{-1}(s) &= \arg\min_{u \in U(s)} E_{\tilde s|s,u} \{ J(\tilde s) \}, \\
U_0(s) &= \arg\min_{u \in U_{-1}(s)} \Big[ g_u(s) + E_{\tilde s|s,u} \{ h(\tilde s) \} \Big], \\
U_k(s) &= \arg\min_{u \in U_{k-1}(s)} E_{\tilde s|s,u} \{ w_k(\tilde s) \}.
\end{aligned}$$

Any stationary policy that attains the minimum in the right-hand sides of the equations in (10) is an $n$-discount optimal policy.
For finite-state MDPs, a stationary $n$-discount optimal policy not only exists, but can also be efficiently computed by multi-chain algorithms. Furthermore, in order to obtain a Blackwell optimal policy, which is $\infty$-discount optimal, it is sufficient to compute an $(N-2)$-discount optimal policy, where $N$ is the number of states in the finite-state MDP. We refer readers to (Puterman, 1994), Chapter 10, especially Section 10.3, for details of the algorithm as well as theoretical analysis.
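The link between the discounted and average cost criteria that underlies $(-1)$-discount optimality can be checked numerically on a small example: for a fixed stationary policy on a finite state space, the scaled discounted cost $(1 - \alpha) J_\alpha^\pi(s)$ approaches the average cost as $\alpha \to 1$. The Python sketch below is illustrative only. It assumes the chain induced by the policy is irreducible, so that the average cost is the same from every state, and the function names are hypothetical.

```python
import numpy as np

def discounted_cost(P, g, alpha):
    """Solve J = g + alpha * P J for a fixed stationary policy.

    P[s, s2] is the transition matrix under the policy, g[s] its stage cost.
    """
    return np.linalg.solve(np.eye(len(g)) - alpha * P, g)

def average_cost(P, g):
    """Average cost of an irreducible chain (assumption): pi @ g,
    where pi is the stationary distribution solving pi P = pi, sum(pi) = 1."""
    S = len(g)
    A = np.vstack([P.T - np.eye(S), np.ones(S)])
    b = np.append(np.zeros(S), 1.0)
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return pi @ g

if __name__ == "__main__":
    P = np.array([[0.9, 0.1], [0.5, 0.5]])
    g = np.array([1.0, 3.0])
    lam = average_cost(P, g)
    for alpha in (0.9, 0.99, 0.999):
        # The gap to lam shrinks roughly in proportion to (1 - alpha).
        print(alpha, (1 - alpha) * discounted_cost(P, g, alpha), lam)
```

Two policies with the same average cost produce the same limit here, which is why the higher-order terms $w_k$ of the nested equations are needed to distinguish them.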

This leads to the following algorithm for computing an $n$-discount optimal policy for the modified MDP defined on the continuous belief space. We first solve the average cost problem on $C$, then determine optimal controls on transient states $x \notin C$. Note there are no conditions (such as unichain) at all on this modified belief MDP.

The algorithm solving the modified MDP

1. Compute an $n$-discount optimal solution for the finite-state MDP problem associated with $C$. Let $\tilde J(x_i)$, $\tilde h(x_i)$, and $\tilde w_k(x_i)$, $k = 1, \ldots, n+1$, with $x_i \in C$, be the corresponding functions obtained that satisfy Eq. (10) on $C$.

2. For any belief $x$, let the control set $U_{n+1}$ be computed at the last step of the sequence of optimizations:

$$\begin{aligned}
U_{-1} &= \arg\min_u \tilde E_{x_i|x,u} \{ \tilde J(x_i) \}, \\
U_0 &= \arg\min_{u \in U_{-1}} \Big[ \bar g_u(x) + \tilde E_{x_i|x,u} \{ \tilde h(x_i) \} \Big], \\
U_k &= \arg\min_{u \in U_{k-1}} \tilde E_{x_i|x,u} \{ \tilde w_k(x_i) \}, \qquad 1 \le k \le n+1.
\end{aligned}$$

Let $u$ be any control in $U_{n+1}$, and let $\tilde\mu(x) = u$. Also if $x \notin C$, define

$$\tilde J(x) = \tilde E_{x_i|x,u} \{ \tilde J(x_i) \}, \qquad \tilde h(x) = \bar g_u(x) + \tilde E_{x_i|x,u} \{ \tilde h(x_i) \} - \tilde J(x).$$

With the above algorithm we obtain an $n$-discount optimal policy for the modified MDP. When $n = |C| - 1$, we obtain an $\infty$-discount optimal policy for the modified MDP [footnote 4], since the algorithm essentially computes a Blackwell optimal policy for every finite-state MDP restricted on $\{x\} \cup C$, for all $x$. Thus, for the modified MDP, for any other policy $\pi$, and any $x \in X$,

$$\limsup_{\alpha \to 1} \, (1 - \alpha)^{-n} \big( \tilde J_\alpha^{\tilde\mu}(x) - \tilde J_\alpha^{\pi}(x) \big) \le 0, \qquad \forall n \ge -1.$$

It is also straightforward to see that

$$\tilde J(x) = \lim_{\alpha \to 1} (1 - \alpha) \tilde J_\alpha^*(x), \qquad \forall x \in X, \tag{11}$$

where $\tilde J_\alpha^*(x)$ are the optimal discounted costs for the modified MDP, and the convergence is uniform over $X$, since $\tilde J_\alpha^*(x)$ and $\tilde J(x)$ are piecewise linear interpolations of the function values on a finite set of beliefs.

Footnote 4: Note that $\infty$-discount optimality and Blackwell optimality are equivalent for finite-state MDPs; however, they are not equivalent in the case of a continuous state space. In the modified MDP, although for each $x$ there exists an $\alpha(x) \in (0, 1)$ such that $\tilde\mu(x)$ is optimal for all $\alpha$-discounted problems with $\alpha(x) \le \alpha < 1$, we may have $\sup_x \alpha(x) = 1$ due to the continuity of the belief space.
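The first two stages of Step 2 can be sketched as follows for a single belief $x$, given the quantities computed on $C$ in Step 1. This Python fragment is a sketch under assumptions, not the implementation used in the paper: the inputs `cost_x` (the values $\bar g_u(x)$) and `trans_x` (the modified transition probabilities from $x$ to the supporting points) are assumed to be precomputed, the numerical tolerance `tol` used to form the argmin sets is a choice made here, and the later stages involving $\tilde w_k$ would repeat the same pattern.

```python
import numpy as np

def select_controls(cost_x, trans_x, J_C, h_C, tol=1e-9):
    """Compute U_{-1}, U_0 and the values J(x), h(x) at one belief x.

    cost_x[u]     : per stage cost g_bar_u(x)
    trans_x[u, i] : modified-MDP probability of moving from x to x_i in C
    J_C, h_C      : average and differential costs on C from Step 1
    tol           : tolerance for ties in the argmin sets (assumption)
    """
    EJ = trans_x @ J_C                               # E~{ J(x_i) } per control
    U_m1 = np.flatnonzero(EJ <= EJ.min() + tol)      # U_{-1}
    q = cost_x[U_m1] + trans_x[U_m1] @ h_C           # g_bar + E~{ h(x_i) }
    U_0 = U_m1[q <= q.min() + tol]                   # U_0, a subset of U_{-1}
    J_x = EJ.min()
    h_x = q.min() - J_x
    return U_m1, U_0, J_x, h_x
```

When the average cost on $C$ is constant, every control belongs to `U_m1`, and the choice is made entirely by the differential cost, which illustrates why the average cost stage alone does not discriminate between controls at transient beliefs.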
3.2 ANALYSIS OF ERROR BOUNDS

We now show how to bound the optimal average cost of the original POMDP, and how to bound the cost of executing, in the original POMDP, the suboptimal policy that is optimal for the modified MDP.

Let $V_N^\pi(x) = E^\pi \{ \sum_{t=0}^{N-1} \bar g_{u_t}(x_t) \mid x_0 = x \}$ be the $N$-stage cost of a non-randomized policy $\pi$, which can be non-stationary, in the original POMDP. Let

$$J_-^*(x) = \inf_\pi \liminf_{N \to \infty} \frac{1}{N} V_N^\pi(x), \qquad J_+^*(x) = \inf_\pi \limsup_{N \to \infty} \frac{1}{N} V_N^\pi(x).$$

It is straightforward to show [footnote 5] that $\tilde J(x) \le J_+^*(x)$, $\forall x \in X$. We now show that $\tilde J(x) \le J_-^*(x)$, $\forall x \in X$.

Footnote 5: Since in the discounted case the corresponding lower approximation satisfies $\tilde J_\alpha^*(x) \le J_\alpha^*(x)$, by Eq. (11) and a Tauberian theorem, we have for the approximate average cost

$$\tilde J(x) = \lim_{\alpha \to 1} (1 - \alpha) \tilde J_\alpha^*(x) \le \liminf_{\alpha \to 1} (1 - \alpha) J_\alpha^*(x) \le \inf_\pi \limsup_{N \to \infty} \frac{1}{N} V_N^\pi(x) = J_+^*(x).$$

Proposition 3 The optimal average cost function $\tilde J(x)$ of the modified MDP satisfies

$$\tilde J(x) \le J_-^*(x), \qquad \forall x \in X.$$

Proof: Let $V_N^*(x)$ and $\tilde V_N^*(x)$ be the optimal $N$-stage cost functions of the original POMDP and of the modified belief MDP, respectively. By Prop. 2 in Section 2.2.1, we have $\tilde V_N^*(x) \le V_N^*(x)$, $\forall N$. Thus

$$\tilde J(x) = \liminf_{N \to \infty} \frac{1}{N} \tilde V_N^*(x) \le \liminf_{N \to \infty} \frac{1}{N} V_N^*(x) \le J_-^*(x).$$

Next we give a simple upper bound on $J_+^*(\cdot)$.

Theorem 2 The optimal liminf and limsup average cost functions satisfy

$$\tilde J(x) \le J_-^*(x) \le J_+^*(x) \le \max_{\tilde x \in C} \tilde J(\tilde x) + \delta,$$

where

$$\delta = \max_{x \in X} \Big[ (T \tilde h)(x) - \tilde J(x) - \tilde h(x) \Big],$$

and $\tilde J(x)$, $\tilde h(x)$ and $C$ are defined as in the modified MDP.

This statement is a consequence of the following lemma, whose proof, omitted here, follows by bounding the expected cost per stage in the summation of the $N$-stage cost.

Lemma 2 Let $J(x)$ and $h(x)$ be any bounded functions on $X$, and $\mu$ be any stationary policy. Define constants $\delta^+$ and $\delta^-$ by

$$\delta^+ = \max_{x \in X} \Big[ \bar g_{\mu(x)}(x) + E_{\tilde x|x,\mu(x)} \{ h(\tilde x) \} - J(x) - h(x) \Big],$$

$$\delta^- = \min_{x \in X} \Big[ \bar g_{\mu(x)}(x) + E_{\tilde x|x,\mu(x)} \{ h(\tilde x) \} - J(x) - h(x) \Big].$$

Then $V_N^\mu(x)$, the $N$-stage cost of executing policy $\mu$, satisfies

$$\beta^-(x) + \delta^- \le \liminf_{N \to \infty} \frac{1}{N} V_N^\mu(x) \le \limsup_{N \to \infty} \frac{1}{N} V_N^\mu(x) \le \beta^+(x) + \delta^+, \qquad \forall x \in X,$$

where $\beta^+(x)$ and $\beta^-(x)$ are defined by

$$\beta^+(x) = \max_{\tilde x \in D_x^\mu} J(\tilde x), \qquad \beta^-(x) = \min_{\tilde x \in D_x^\mu} J(\tilde x),$$

and $D_x^\mu$ denotes the set of beliefs reachable under policy $\mu$ from $x$.

Let $\tilde\mu$ be the stationary policy that is optimal for the modified MDP. We can use Lemma 2 to bound the liminf and limsup average cost of $\tilde\mu$ in the original POMDP. For example, if the optimal average cost $J_{MDP}^*$ of the completely observable MDP problem equals the constant $\lambda$ over all states, then we also have $\tilde J(x) = \lambda$, $\forall x \in X$, for this modified MDP. The cost of executing the policy $\tilde\mu$ in the original POMDP can therefore be bounded by

$$\lambda + \delta^- \le \liminf_{N \to \infty} \frac{1}{N} V_N^{\tilde\mu}(x) \le \limsup_{N \to \infty} \frac{1}{N} V_N^{\tilde\mu}(x) \le \lambda + \delta^+.$$

The quantities $\delta^+$ and $\delta^-$ can be hard to calculate exactly in general, since $\tilde J(\cdot)$ and $\tilde h(\cdot)$ obtained from the modified MDP are piecewise linear functions. The bounds may also be loose. On the other hand, these functions may indicate the structure of the original problem, and help us to refine the discretization scheme in the approximation.

3.3 ANALYSIS OF ASYMPTOTIC CONVERGENCE

Let $(G, \gamma)$ be an $\epsilon$-discretization scheme, and let $\tilde J_\epsilon$ and $\tilde J_{\alpha,\epsilon}$ be the optimal average cost and discounted cost, respectively, in the modified MDP associated with $(G, \gamma)$ and either $\tilde T_{D_1}$ or $\tilde T_{D_2}$. Recall that in the discounted case (Theorem 1) for a fixed discount factor $\alpha$, we have asymptotic convergence to optimality:

$$\lim_{\epsilon \to 0} \tilde J_{\alpha,\epsilon}(x) = J_\alpha^*(x).$$

We now address the question whether $\tilde J_\epsilon(x) \to J^*(x)$, as $\epsilon \to 0$, when $J^*(x) = J_-^*(x) = J_+^*(x)$ exists. This question of asymptotic convergence under the average cost criterion is hard to tackle for a couple of reasons. First of all, it is not clear when $J^*(x)$ exists.
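(Fernández-Gaucherand, Arapostathis, and Marcus, 1991) have shown that under certain conditions (such as the condition that $|J_\alpha^*(x) - J_\alpha^*(\bar x)|$ is bounded for all $\alpha \in [0, 1)$, and its relaxed variants), the optimal average cost $J^*(x)$ exists and equals a constant $\lambda^*$ over $X$, and furthermore

$$\lambda^* = \lim_{\alpha \to 1} (1 - \alpha) J_\alpha^*(x), \qquad \forall x \in X. \tag{12}$$

However, even when Eq. (12) holds, in general we have

$$\lim_{\epsilon \to 0} \tilde J_\epsilon(x) = \lim_{\epsilon \to 0} \lim_{\alpha \to 1} (1 - \alpha) \tilde J_{\alpha,\epsilon}(x) \ne \lim_{\alpha \to 1} \lim_{\epsilon \to 0} (1 - \alpha) \tilde J_{\alpha,\epsilon}(x) = \lambda^*.$$

To ensure that $\tilde J_\epsilon \to \lambda^*$, we therefore need stronger conditions than those that guarantee the existence of $\lambda^*$. We now show that a sufficient condition is the continuity of the optimal differential cost $h^*(\cdot)$.

Theorem 3 Suppose the average cost optimality equations (8) admit a bounded solution $(J^*(x), h^*(x))$ with $J^*(x)$ equal to a constant $\lambda^*$. Then, if the differential cost $h^*(x)$ is continuous on $X$, we have

$$\lim_{\epsilon \to 0} \tilde J_\epsilon(x) = \lambda^*, \qquad \forall x \in X,$$

and the convergence is uniform, where $\tilde J_\epsilon$ is the optimal average cost function for the modified MDP corresponding to either $\tilde T_{D_1}$ or $\tilde T_{D_2}$ with an associated $\epsilon$-discretization scheme $(G, \gamma)$.

Proof: Let $\tilde\mu$ be the optimal policy for the modified MDP associated with an $\epsilon$-discretization scheme. Let $\tilde T$ be the mapping corresponding to the modified MDP, defined by $(\tilde T v)(x) = \min_u [\bar g_u(x) + \tilde E_{\tilde x|x,u}\{ v(\tilde x) \}]$. Since $h^*(x)$ is continuous on $X$, by Lemma 1 in Section 2.2.2, we have that for any $\delta > 0$, there exists $\bar\epsilon > 0$ such that for all $\epsilon$-discretization schemes with $\epsilon < \bar\epsilon$,

$$\big| (T_{\tilde\mu} h^*)(x) - (\tilde T_{\tilde\mu} h^*)(x) \big| \le \delta, \qquad \forall x \in X. \tag{13}$$

We now apply the result of Lemma 2 in the modified MDP with $J = \lambda^*$, $h = h^*$, and $\mu = \tilde\mu$. That is, by the same argument as in Lemma 2, we have

$$\tilde J_\epsilon(x) = \liminf_{N \to \infty} \frac{1}{N} \tilde V_N^{\tilde\mu}(x) \ge \lambda^* + \eta, \qquad \forall x \in X,$$

where

$$\eta = \min_{x \in X} \Big[ (\tilde T_{\tilde\mu} h^*)(x) - \lambda^* - h^*(x) \Big].$$

Since $\lambda^* + h^*(x) = (T h^*)(x) \le (T_{\tilde\mu} h^*)(x)$, and $|(T_{\tilde\mu} h^*)(x) - (\tilde T_{\tilde\mu} h^*)(x)| \le \delta$ by Eq. (13), we have

$$(\tilde T_{\tilde\mu} h^*)(x) - \lambda^* - h^*(x) \ge -\delta.$$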

Hence $\eta \ge -\delta$, and $\tilde J_\epsilon(x) \ge \lambda^* - \delta$ for all $\epsilon < \bar\epsilon$ and $x \in X$, which proves the uniform convergence of $\tilde J_\epsilon$ to $\lambda^*$.

Note that the inequality $\tilde J_\epsilon \le J^*$ is crucial in the preceding proof. Note also that the proof does not generalize to the case when $J^*(x)$ is not constant. A fairly strong sufficient condition that guarantees the existence of a constant $J^*$ and a continuous $h^*$ is that $J_\alpha^*(x)$ is equicontinuous on $X$ for all $\alpha \in [0, 1)$. (For a proof see (Ross, 1968) or Theorem 6.3 (iv) in (Arapostathis et al., 1993).)

4 PRELIMINARY EXPERIMENTS

We demonstrate our approach on a set of toy problems: Paint, Bridge-repair, and Shuttle. The sizes of the problems are summarized in Table 1. Their descriptions and parameters are as specified in A. Cassandra's POMDP File Repository, and we define costs to be negative rewards when a problem has a reward model.

Table 1: Sizes of Problems

Problem    |S|    |U|    |Z|
Paint
Bridge
Shuttle

We used some simple grid patterns. One pattern, referred to as k-E, consists of k grid points on each edge, in addition to the vertices of the belief simplex. Another pattern, referred to as n-R, consists of n randomly chosen grid points, in addition to the vertices of the simplex. The combined pattern is referred to as k-E+n-R. Thus the grid pattern for the QMDP approximation is 0-E, for instance, and 2-E+10-R is a combined pattern. The grid pattern then induces a partition of the belief space and a convex representation (interpolation) scheme, which we kept implicit and computed by linear programming on-line.

The algorithm for solving the modified finite-state MDP was implemented by solving a system of linear equations for each policy iteration. This may not be the most efficient way. No higher than 5-discount optimal policies were computed when the number of supporting points became large.

Figure 1 shows the average cost approximation of $\tilde T_{D_1}$ and $\tilde T_{D_2}$ with a few grid patterns for the problem Paint. In all cases we obtained a constant average cost for the modified MDP.
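The k-E and n-R grid patterns described above could be generated along the following lines. This Python sketch rests on two assumptions that the text does not state: the k points on each edge are taken to be evenly spaced in the interior of the edge, and the random points are drawn uniformly from the simplex (a Dirichlet distribution with all parameters equal to one). The function names and the `seed` argument are hypothetical.

```python
import itertools
import numpy as np

def edge_grid(S, k):
    """k-E pattern: simplex vertices plus k points on each edge.

    Assumption: the k points are evenly spaced in the interior of the edge.
    """
    pts = [np.eye(S)[s] for s in range(S)]
    for a, b in itertools.combinations(range(S), 2):
        for j in range(1, k + 1):
            t = j / (k + 1)
            x = np.zeros(S)
            x[a], x[b] = 1 - t, t
            pts.append(x)
    return np.array(pts)

def random_grid(S, n, seed=0):
    """n-R pattern: simplex vertices plus n random beliefs.

    Assumption: beliefs are sampled uniformly from the simplex.
    """
    rng = np.random.default_rng(seed)
    return np.vstack([np.eye(S), rng.dirichlet(np.ones(S), size=n)])

def combined_grid(S, k, n, seed=0):
    """k-E+n-R pattern: edge points followed by the n random points."""
    return np.vstack([edge_grid(S, k), random_grid(S, n, seed)[S:]])
```

Under these assumptions `edge_grid(S, 0)` returns only the vertices, which corresponds to the 0-E pattern of the QMDP approximation.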
The horizontal axis is labeled by the grid pattern, and the vertical axis is the approximate cost. The red curve is obtained by TD, and the blue curve by TD2. As will be shown below, the approximation obtained by TD2 with 3-E is already near optimal. The policies generated by TD2 are not always better, however. We also notice, as indicated by the drop in the curves when using grid pattern 4-E, that the improvement of the cost approximation depends not solely on the number of grid points, but also on how they are positioned.

Figure: Average Cost Approximation for Problem Paint Using Various Grid Patterns. Blue: TD2, Red: TD.

In Table 2 we summarize the cost approximations obtained (column LB) and the simulated cost of the policies (column S. Policy) for the three problems. The approximation schemes obtaining the LB values in Table 2, as well as the policies simulated, are listed in Table 3. The column N. UB shows the numerically computed upper bound of the optimal: we calculate δ in Theorem 2 by sampling the values of $(Th)(x) - h(x) - J(x)$ at hundreds of beliefs generated randomly and taking the maximum over them. Thus the N. UB values are under-estimates of the exact upper bound.

For both Paint and Shuttle the number of trajectories simulated is 60, and for Bridge 000. Each trajectory has 500 steps starting from the same belief. The first number in S. Policy in Table 2 is the mean over the average cost of simulated trajectories, and the standard error listed as the second number is estimated from bootstrap samples: we created 00 pseudo-random samples by sampling from the empirical distribution of the original sample and computed the standard deviation of the mean estimator over these 00 pseudo-random samples.

Table 2: Average Cost Approximations and Simulated Average Cost of Policies

Problem    LB    N. UB    S. Policy
Paint                     ±0.002
Bridge                    ±.258
Shuttle                   ±0.007

Table 3: Approximation Schemes in LB and Simulated Policies in Table 2

Problem    LB             S. Policy
Paint      TD2 w/ 3-E     TD w/ -E
Bridge     TD2 w/ 0-E     TD2 w/ 0-E
Shuttle    TD,2 w/ 2-E    TD w/ 2-E

As shown in Table 2, we find that some policy from the discretized approximation with very coarse grids can already be comparable to the optimal. This is verified by simulating the policy (S. Policy) and comparing its average cost against the lower bound of the optimal (LB), which in turn shows that the lower approximation is near optimal.

We find that in some cases the upper bounds may be too loose to be informative. For example, in the problem Paint we know that there is a simple policy achieving zero average cost; therefore a near-zero upper bound does not tell much about the optimal. In the experiments we also observe that an approximation scheme with more grid points does not necessarily provide a better upper bound of the optimal.

5 CONCLUSION

In this paper we have proposed a discretized lower approximation approach for POMDP with average cost. We have shown that the approximations can be computed efficiently using multi-chain algorithms for finite-state MDP, and that they can be used for bounding the optimal liminf and limsup average cost functions, as well as for generating suboptimal policies. Thus, like the finite state controller approach, our approach also bypasses difficult analytic questions such as the existence of bounded solutions to the average cost optimality equations. We have also introduced a new lower approximation scheme for both discounted and average cost cases, and shown asymptotic convergence of two main approximation schemes in the average cost case under certain conditions.

Acknowledgements

This work is supported by NSF Grant ECS. We thank Leslie Kaelbling for helpful discussions.

References

Aberdeen, D. and J. Baxter (2002). Internal-state policy-gradient algorithms for infinite-horizon POMDPs. Technical report, RSISE, Australian National University.
Arapostathis, A., V. S. Borkar, E. Fernández-Gaucherand, M. K. Ghosh, and S. I. Marcus (1993). Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control and Optimization 31(2).
Bertsekas, D. P. (2001). Dynamic Programming and Optimal Control, Vols. I, II. Athena Scientific, second edition.
Fernández-Gaucherand, E., A. Arapostathis, and S. I. Marcus (1991). On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes. Ann. Operations Research 29.
Littman, M. L., A. R. Cassandra, and L. P. Kaelbling (1995). Learning policies for partially observable environments: Scaling up. In Int. Conf. Machine Learning.
Lovejoy, W. S. (1991). Computationally feasible bounds for partially observed Markov decision processes. Operations Research 39(1).
Ormoneit, D. and P. Glynn (2002). Kernel-based reinforcement learning in average-cost problems. IEEE Trans. Automatic Control 47(10).
Puterman, M. L. (1994). Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, Inc.
Ross, S. M. (1968). Arbitrary state Markovian decision processes. Ann. Mathematical Statistics 39(6).
Yu, H. and D. P. Bertsekas (2004). Inequalities and their applications in value approximation for discounted and average cost POMDP. LIDS tech. report, MIT. To appear.
Zhang, N. L. and W. Liu (1997). A model approximation scheme for planning in partially observable stochastic domains. J. Artificial Intelligence Research 7.
Zhou, R. and E. A. Hansen (2001). An improved grid-based approximation algorithm for POMDPs. In Int. J. Conf. Artificial Intelligence.
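
As an illustration of the bootstrap procedure used for the standard errors in the S. Policy column, the following is a minimal sketch rather than the code used in the experiments. It resamples the per-trajectory average costs with replacement, recomputes the mean for each resample, and reports the standard deviation of those means. The function name, the default of 100 resamples, the seed, and the cost values are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_standard_error(costs, num_resamples=100, seed=0):
    """Estimate the standard error of the sample mean by bootstrap resampling.

    `costs` holds one average cost per simulated trajectory. The resample
    count and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    n = len(costs)
    resampled_means = []
    for _ in range(num_resamples):
        # Draw n values with replacement from the empirical distribution.
        resample = rng.choices(costs, k=n)
        resampled_means.append(statistics.fmean(resample))
    # Standard deviation of the mean estimator across the resamples.
    return statistics.stdev(resampled_means)


# Hypothetical per-trajectory average costs (illustrative values only).
trajectory_costs = [0.31, 0.28, 0.35, 0.30, 0.27, 0.33, 0.29, 0.32]
mean_cost = statistics.fmean(trajectory_costs)
std_error = bootstrap_standard_error(trajectory_costs)
print(f"{mean_cost:.3f} ±{std_error:.3f}")
```

The printed pair has the same form as an S. Policy entry: the mean over trajectories followed by its estimated standard error.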


More information

Networks An introduction to microcomputer networking concepts

Networks An introduction to microcomputer networking concepts Behavior Research Methods& Instrmentation 1978, Vol 10 (4),522-526 Networks An introdction to microcompter networking concepts RALPH WALLACE and RICHARD N. JOHNSON GA TX, Chicago, Illinois60648 and JAMES

More information

Fast Ray Tetrahedron Intersection using Plücker Coordinates

Fast Ray Tetrahedron Intersection using Plücker Coordinates Fast Ray Tetrahedron Intersection sing Plücker Coordinates Nikos Platis and Theoharis Theoharis Department of Informatics & Telecommnications University of Athens Panepistemiopolis, GR 157 84 Ilissia,

More information

On Plane Constrained Bounded-Degree Spanners

On Plane Constrained Bounded-Degree Spanners Algorithmica manscript No. (ill be inserted by the editor) 1 On Plane Constrained Bonded-Degree Spanners 2 3 Prosenjit Bose Rolf Fagerberg André an Renssen Sander Verdonschot 4 5 Receied: date / Accepted:

More information

Method to build an initial adaptive Neuro-Fuzzy controller for joints control of a legged robot

Method to build an initial adaptive Neuro-Fuzzy controller for joints control of a legged robot Method to bild an initial adaptive Nero-Fzzy controller for joints control of a legged robot J-C Habmremyi, P. ool and Y. Badoin Royal Military Academy-Free University of Brssels 08 Hobbema str, box:mrm,

More information

On total regularity of the join of two interval valued fuzzy graphs

On total regularity of the join of two interval valued fuzzy graphs International Jornal of Scientific and Research Pblications, Volme 6, Isse 12, December 2016 45 On total reglarity of the join of two interval valed fzzy graphs Soriar Sebastian 1 and Ann Mary Philip 2

More information

EUCLIDEAN SKELETONS USING CLOSEST POINTS. Songting Luo. Leonidas J. Guibas. Hong-Kai Zhao. (Communicated by the associate editor name)

EUCLIDEAN SKELETONS USING CLOSEST POINTS. Songting Luo. Leonidas J. Guibas. Hong-Kai Zhao. (Communicated by the associate editor name) Volme X, No. 0X, 200X, X XX Web site: http://www.aimsciences.org EUCLIDEAN SKELETONS USING CLOSEST POINTS Songting Lo Department of Mathematics, University of California, Irvine Irvine, CA 92697-3875,

More information

The Intersection of Two Ringed Surfaces and Some Related Problems

The Intersection of Two Ringed Surfaces and Some Related Problems Graphical Models 63, 8 44 001) doi:10.1006/gmod.001.0553, available online at http://www.idealibrary.com on The Intersection of Two Ringed Srfaces and Some Related Problems Hee-Seok Heo and Sng Je Hong

More information

Appearance Based Tracking with Background Subtraction

Appearance Based Tracking with Background Subtraction The 8th International Conference on Compter Science & Edcation (ICCSE 213) April 26-28, 213. Colombo, Sri Lanka SD1.4 Appearance Based Tracking with Backgrond Sbtraction Dileepa Joseph Jayamanne Electronic

More information