Morfismos, Vol 12, No 2, 2008


VOLUME 12, NUMBER 2, JULY TO DECEMBER 2008, ISSN: 1870-6525


Morfismos, Student Communications, Department of Mathematics, Cinvestav

Managing Editors • Isidoro Gitler • Jesús González

Editorial Board • Luis Carrera • Samuel Gitler • Onésimo Hernández-Lerma • Hector Jasso Fuentes • Miguel Maldonado • Raúl Quiroga Barranco • Enrique Ramírez de Arellano • Enrique Reyes • Armando Sánchez • Martín Solis • Leticia Zárate

Associate Editors • Ricardo Berlanga • Emilio Lluis Puebla • Isaías López • Guillermo Pastor • Víctor Pérez Abreu • Carlos Prieto • Carlos Rentería • Luis Verde

Technical Secretaries • Roxana Martínez • Laura Valencia

ISSN: 1870-6525. Morfismos can be consulted electronically under “Revista Morfismos” at http://www.math.cinvestav.mx. For further information, call 57 47 38 71. All correspondence should be addressed to Mrs. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or sent by e-mail to laura@math.cinvestav.mx.





Information for Authors

The Editorial Board of Morfismos, Comunicaciones Estudiantiles del Departamento de Matemáticas del CINVESTAV, invites undergraduate and graduate students to submit articles for publication in this journal under the following guidelines:

• All articles will be sent to specialists for refereeing. Nevertheless, articles will be regarded only as preliminary versions and may therefore be published in other specialized journals.
• The author's name must be accompanied by his or her academic level and the institution where he or she studies or works.
• The article must begin with an abstract that states briefly and concisely the main result to be communicated.
• It is recommended that submitted articles be written in LaTeX and sent electronically. Interested authors may obtain the LaTeX 2ε format used by Morfismos under “Revista Morfismos” at the web address http://www.math.cinvestav.mx, or directly from the Department of Mathematics of CINVESTAV. Using this format will help the article to be published promptly.
• If the article contains illustrations or figures, these must be presented so that they meet the reproduction quality of Morfismos.
• Authors will receive a total of 15 offprints for each published article.
• Articles should be addressed to Mrs. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or to the e-mail address laura@math.cinvestav.mx

Author Information

Morfismos, the student journal of the Mathematics Department of the Cinvestav, invites undergraduate and graduate students to submit manuscripts to be published under the following guidelines:

• All manuscripts will be refereed by specialists. However, accepted papers will be considered to be “preliminary versions” in that authors may republish their papers in other journals, in the same or similar form.
• In addition to his/her affiliation, the author must state his/her academic status (student, professor, ...).
• Each manuscript should begin with an abstract summarizing the main results.
• Morfismos encourages electronically submitted manuscripts prepared in LaTeX. Authors may retrieve the LaTeX 2ε macros used for Morfismos through the web site http://www.math.cinvestav.mx, at “Revista Morfismos”, or by direct request to the Mathematics Department of Cinvestav. The use of these macros will help in the production process and also to minimize publishing costs.
• All illustrations must be of professional quality.
• 15 offprints of each article will be provided free of charge.
• Manuscripts submitted for publication should be sent to Mrs. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or to the e-mail address: laura@math.cinvestav.mx


Editorial Guidelines

“Morfismos” is the twice-yearly journal of the students of the Department of Mathematics of CINVESTAV. One of its main objectives is for students to gain experience in writing up mathematical results. Publication is not restricted to CINVESTAV students; we also wish to encourage the participation of students in Mexico and abroad, as well as invited contributions from researchers. Mathematical research reports and summaries of bachelor's, master's or doctoral theses may be published in Morfismos. The articles that appear will be original, either in their results or in their methods. To judge this, the Editorial Board will appoint referees of recognized standing who are experienced in the clear communication of mathematical ideas and concepts. Although Morfismos is a refereed journal, papers will be regarded as preliminary versions that may later appear in other specialized journals. If you have any suggestion about the journal, let the editors know and we will gladly study the possibility of implementing it. We hope that this publication will foster, as a first experience, the development of a correct style of writing mathematics.


Editorial Guidelines “Morfismos” is the journal of the students of the Mathematics Department of CINVESTAV. One of its main objectives is for students to acquire experience in writing mathematics. Morfismos appears twice a year. Publication of papers is not restricted to students of CINVESTAV; we want to encourage students in Mexico and abroad to submit papers. Mathematics research reports or summaries of bachelor, master and Ph.D. theses will be considered for publication, as well as invited contributed papers by researchers. Papers submitted should be original, either in the results or in the methods. The Editors will assign as referees well–established mathematicians. Even though Morfismos is a refereed journal, the papers will be considered as preliminary versions which could later appear in other mathematical journals. If you have any suggestions about the journal, let the Editors know and we will gladly study the possibility of implementing them. We expect this journal to foster, as a preliminary experience, the development of a correct style of writing mathematics.



Contents

The vanishing discount approach to average reward optimality: the strongly and the weakly continuous cases
Tomás Prieto-Rumeau and Onésimo Hernández-Lerma

Vértices simpliciales y escalonabilidad de grafos (Simplicial vertices and shellability of graphs)
Roberto Cruz and Mario Estrada

Asymptotic normality of average cost Markov control processes
Armando F. Mendoza-Pérez

Errata



Morfismos, Vol. 12, No. 2, 2008, pp. 1–15

The vanishing discount approach to average reward optimality: the strongly and the weakly continuous cases ∗

Tomás Prieto-Rumeau

Onésimo Hernández-Lerma

Abstract We consider a discrete-time stochastic dynamic programming model and we propose conditions under which the limit of discount optimal policies, as the discount factor converges to one, is average optimal. We prove this result under strong and weak continuity conditions and, moreover, we relax the usual value boundedness condition on the relative values of the optimal discounted reward.

2000 Mathematics Subject Classification: 93E20, 90C40. Keywords and phrases: dynamic programming, vanishing discount, average optimality.

∗ This research was partially supported by CONACyT Grant 45693-F.

1 Introduction

The basic problem dealt with in this paper is the existence of control policies π that maximize the long-run expected average reward

\[ v(x,\pi) := \liminf_{T\to\infty} E_x^{\pi}\Big[ \frac{1}{T} \sum_{t=0}^{T-1} r(x_t,\pi(x_t)) \Big] \tag{1} \]

for every initial state $x_0 = x$. (The underlying controlled system is a fairly general discrete-time stochastic control process described in Section 2; see (6).) Among the several known techniques to analyze this problem, the most common is the vanishing discount approach, which can be traced back to Taylor [16]. It is so named because it is based on the convergence as ρ ↑ 1 (0 < ρ < 1) of ρ-discounted optimal reward policies. To state this more precisely, we need some notation. For each discount factor ρ ∈ (0, 1), let

\[ v_\rho(x,\pi) := E_x^{\pi}\Big[ \sum_{t=0}^{\infty} \rho^t\, r(x_t,\pi(x_t)) \Big] \tag{2} \]

be the expected discounted reward of the admissible control policy π ∈ Π (see Section 2) when the initial state is $x_0 = x$. The optimal ρ-discounted reward function is defined as

\[ v_\rho(x) := \sup_{\pi\in\Pi} v_\rho(x,\pi) \tag{3} \]

for every state x. For a given fixed state x′, consider the relative value function

\[ u_\rho(x) := v_\rho(x) - v_\rho(x'). \]

This function is one of the key tools in the vanishing discount approach. To obtain the convergence of ρ-discount optimal policies to average optimal policies as ρ ↑ 1, it was assumed in [16] that $u_\rho$ was uniformly bounded, that is, there exists a constant L such that $|u_\rho(x)| \le L$ for every state x and 0 < ρ < 1. This condition was later relaxed to the following weaker value boundedness condition: there exist a constant L and a function m such that

\[ -m(x) \le u_\rho(x) \le L \tag{4} \]

for every state x and 0 < ρ < 1; see, e.g., [2, Assumption A1], [5, Assumption 4.1], [12, Definition 2.1] or [15]. In this paper, we further relax (4) and assume the existence of a function m (satisfying appropriate hypotheses) such that

\[ -m(x) \le u_\rho(x) \le m(x) \tag{5} \]

for every x and 0 < ρ < 1. Such a condition can also be found in, e.g., [3, Lemma 4.5], [4, Assumption 3.3] or [7, Lemma 10.4.2]. Relaxing (4) to (5) is indeed a relevant issue because (4) is, in fact, a fairly restrictive condition. For instance, to obtain (4), it is assumed in [12] that the reward function r is bounded. Moreover, condition (4) excludes the case of an unbounded utility function (see the comment after Assumption 5.3 in [12, p. 1423]). Also, in Section 4 of this paper, we describe a control model for which (5) holds, whereas (4) does not.

Summarizing, the goal of this paper is to give conditions on the controlled process that, together with condition (5), ensure that the limit of ρ-discount optimal policies, as ρ ↑ 1, is average optimal. The basic control model is described in Section 2. In Section 3, we consider two different sets of hypotheses, namely, strong and weak continuity conditions, depending on the corresponding strong or weak continuity of the control system's transition function. Also in Section 3, we state our main results: Theorem 3.10 and Corollary 3.12, in which we mention several particular cases of interest. Finally, we present an example in Section 4, and our conclusions are stated in Section 5.
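The quantities in (2) and (3) and the relative value function can be made concrete on a toy model. The following is a minimal sketch, not the authors' construction: it assumes a hypothetical two-state, two-action deterministic model (the tables `REWARD` and `NEXT_STATE` and all function names are illustrative), computes the optimal discounted values by value iteration, and prints the scaled value at the reference state together with the relative values as ρ approaches one.

```python
# Hypothetical toy model (not from the paper): states 0 and 1,
# actions 0 = "stay" and 1 = "switch", deterministic transitions.
REWARD = [[1.0, 0.0],      # r(0, stay), r(0, switch)
          [2.0, 0.0]]      # r(1, stay), r(1, switch)
NEXT_STATE = [[0, 1],      # next state from state 0 under stay / switch
              [1, 0]]      # next state from state 1 under stay / switch


def q_value(state, action, rho, v):
    """One-step reward plus discounted value of the next state."""
    return REWARD[state][action] + rho * v[NEXT_STATE[state][action]]


def discounted_values(rho, tol=1e-10, max_iter=1_000_000):
    """Approximate the optimal rho-discounted reward by value iteration."""
    v = [0.0, 0.0]
    for _ in range(max_iter):
        new_v = [max(q_value(s, a, rho, v) for a in (0, 1)) for s in (0, 1)]
        if max(abs(p - q) for p, q in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def greedy_policy(rho, v):
    """Action attaining the maximum of the one-step lookahead in each state."""
    return [max((0, 1), key=lambda a: q_value(s, a, rho, v)) for s in (0, 1)]


if __name__ == "__main__":
    x_ref = 1  # plays the role of the fixed state x'
    for rho in (0.9, 0.99, 0.999):
        v = discounted_values(rho)
        u = [v[s] - v[x_ref] for s in (0, 1)]  # relative value function
        print(rho, (1 - rho) * v[x_ref], u, greedy_policy(rho, v))
```

In this toy model the best long-run average reward is 2 (move to state 1 and stay there). The scaled values at the reference state equal 2 up to numerical error, and the relative value at state 0 stays close to −2 for every ρ above 1/2, which is the kind of boundedness that conditions (4) and (5) are meant to capture.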

2 The control model

The formulation of the controlled process and the notation are mainly drawn from [12]. We assume that the state space S and the action set A are Borel spaces (that is, measurable subsets of complete and separable metric spaces). Let Γ be a nonempty set-valued function from S to A. For each x ∈ S, the corresponding set of feasible control actions is Γ(x) ⊆ A. The family of feasible state-action pairs is denoted by K, i.e.,

\[ K := \{(x,a) \in S \times A : a \in \Gamma(x)\}, \]

which is assumed to be a measurable subset of S × A. (In this paper, measurability always refers to the Borel σ-algebra.)

We consider a sequence $\{\xi_t\}_{t\ge 0}$ of i.i.d. random variables from a given probability space $(\Omega, \mathcal{F}, P)$ to $(Z, \mathcal{Z})$ with common distribution ν. Let h : K × Z → S be a measurable function. We assume that the state of the system is updated according to the function h, meaning that if the action a ∈ Γ(x) is chosen at x ∈ S and the value of the random perturbation is ξ, then the next state of the system is h(x, a, ξ) ∈ S. We suppose that the reward function is the measurable real-valued mapping r : K → R.

Let Π be the family of measurable functions π : S → A such that π(x) ∈ Γ(x) for every x ∈ S. (We suppose that Π is nonempty.) We call π ∈ Π a deterministic stationary policy. For each π ∈ Π and every initial state $x_0 \in S$ independent of $\{\xi_t\}_{t\ge 0}$,

\[ x_{t+1} = h(x_t, \pi(x_t), \xi_t) \quad \text{for } t = 0, 1, \ldots \tag{6} \]

is a Markov process and it stands for the state of the system under the policy π. The corresponding expectation operator is denoted by $E^{\pi}_{x_0}$. Although larger classes of policies may be considered, it is well known that, for the control problem we are dealing with, Π is a “sufficient” class of policies; see [6, Chapter 4] or [7, Chapter 8], for instance.

Given an admissible policy π ∈ Π and an initial state x ∈ S, the corresponding long-run average reward and expected discounted reward are defined as in (1) and (2), respectively. Given a discount factor 0 < ρ < 1, we say that π ∈ Π is ρ-discount optimal if $v_\rho(x,\pi) = v_\rho(x)$ for every x ∈ S (recall (3)). Similarly, π∗ ∈ Π is average reward optimal if

\[ v(x,\pi^*) = \sup_{\pi\in\Pi} v(x,\pi) \quad \forall\, x \in S. \]
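The recursion (6) and the average reward (1) translate directly into a simulation loop. The sketch below is illustrative only: the linear dynamics `h_linear`, the reward `r_quadratic`, the policy `cancel_drift` and the standard normal noise are hypothetical choices that do not come from the paper, and a single long sample path is used as a stand-in for the expectation in (1), which is only reasonable when the controlled process is ergodic.

```python
import random


def average_reward(h, r, policy, x0, horizon, draw_noise):
    """Sample-path estimate of the average reward (1) over `horizon` steps."""
    x, total = x0, 0.0
    for _ in range(horizon):
        a = policy(x)
        total += r(x, a)
        x = h(x, a, draw_noise())  # state update as in (6)
    return total / horizon


# Hypothetical model, chosen only for illustration:
# S = A = Z = R, h(x, a, xi) = 0.5 x + a + xi, r(x, a) = -x^2,
# with feasible actions assumed to be Gamma(x) = [-|x| - 1, |x| + 1].
def h_linear(x, a, xi):
    return 0.5 * x + a + xi


def r_quadratic(x, a):
    return -x * x


def cancel_drift(x):
    """Stationary policy pi(x) = -x/2, which lies in the assumed Gamma(x)."""
    return -0.5 * x


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = average_reward(h_linear, r_quadratic, cancel_drift,
                              3.0, 200_000, lambda: rng.gauss(0.0, 1.0))
    print(estimate)
```

Under this policy the next state is just the noise term, so the long-run average reward is minus the second moment of the noise, that is, −1 for standard normal noise; the printed estimate should be close to that value.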

3 Main results

As already mentioned, we will consider two different sets of hypotheses, which we label as strong and weak continuity assumptions.

The strongly continuous case

We state the assumptions we make on our control model. First, we have the following Lyapunov-like condition.

Assumption 3.1 There exist a measurable function w : S → [1, ∞) and constants 0 < β < 1 and b > 0 such that

\[ \int_Z w(h(x,a,\xi))\,\nu(d\xi) \le \beta w(x) + b \quad \forall\, (x,a) \in K. \]

The next assumption introduces some usual continuity and compactness requirements. We note that the function w in Assumptions 3.2 and 3.4 is taken from Assumption 3.1.

Assumption 3.2
(i) For every x ∈ S, the set Γ(x) is compact.
(ii) The reward function r(x, a) is upper semicontinuous on Γ(x) for every x ∈ S. In addition, there exists a constant M such that

\[ |r(x,a)| \le M w(x) \quad \forall\, (x,a) \in K. \]


(iii) The function

\[ (x,a) \mapsto \int_Z w(h(x,a,\xi))\,\nu(d\xi) \]

is continuous on Γ(x) for every x ∈ S.
(iv) Strong continuity. For every bounded and measurable ζ : S → R, the function

\[ (x,a) \mapsto \int_Z \zeta(h(x,a,\xi))\,\nu(d\xi) \]

is continuous on Γ(x) for every x ∈ S.

Remark 3.3 (The additive-noise case) The strong continuity condition is satisfied, for instance, when S = Z = R, h(x, a, ξ) = g(x, a) + ξ, where g is continuous on Γ(x) for each fixed x ∈ S, and, in addition, ν has an almost everywhere continuous bounded density with respect to the Lebesgue measure. This includes, of course, the linear case in which $g(x,a) = k_1 x + k_2 a$ for some constants $k_1, k_2$.

Finally, we state the value boundedness condition.

Assumption 3.4 There exist a state x′ ∈ S and a constant M′ > 0 such that

\[ \sup_{0<\rho<1} |v_\rho(x) - v_\rho(x')| \le M' w(x) \quad \forall\, x \in S. \]

The weakly continuous case

Among the hypotheses made so far on the control model, the most restrictive one is the strong continuity condition in Assumption 3.2(iv). Under additional appropriate conditions, strong continuity can be relaxed to weak continuity. To this end, first, the “measurability” of w in Assumption 3.1 is replaced with “continuity”.

Assumption 3.5 There exist a continuous function w : S → [1, ∞) and constants 0 < β < 1 and b > 0 such that

\[ \int_Z w(h(x,a,\xi))\,\nu(d\xi) \le \beta w(x) + b \quad \forall\, (x,a) \in K. \]



In Assumptions 3.6 and 3.8 below, the function w is taken from Assumption 3.5.

Assumption 3.6
(i) The function $\Gamma : S \to 2^A$ is upper semicontinuous and compact-valued.
(ii) The reward function r is upper semicontinuous on K and, moreover, there exists a constant M > 0 such that

\[ |r(x,a)| \le M w(x) \quad \forall\, (x,a) \in K. \]

(iii) The function

\[ (x,a) \mapsto \int_Z w(h(x,a,\xi))\,\nu(d\xi) \]

is continuous on K.
(iv) Weak continuity. The function

\[ (x,a) \mapsto \int_Z \zeta(h(x,a,\xi))\,\nu(d\xi) \]

is continuous on K for every bounded and continuous ζ : S → R.

Remark 3.7 The weak continuity assumption is satisfied, for instance, if the function h(x, a, ξ) is continuous on K for each ξ ∈ Z.

We introduce some notation. Let $B_w(S)$ be the family of measurable functions ζ : S → R with finite w-norm, that is,

\[ \|\zeta\|_w := \sup_{x\in S} \{ |\zeta(x)| / w(x) \} < \infty. \]

Assumption 3.8 The controlled process is w-uniformly ergodic on Π; that is, for each π ∈ Π, the Markov process (6) has a unique invariant probability measure $\mu_\pi$ on S and, in addition, there exist constants R > 0 and 0 < α < 1 such that for every x ∈ S, $\zeta \in B_w(S)$ and t ≥ 0

\[ \sup_{\pi\in\Pi} \Big| E_x^{\pi}[\zeta(x_t)] - \int_S \zeta(y)\,\mu_\pi(dy) \Big| \le w(x)\,\|\zeta\|_w\, R\,\alpha^t. \]

In the weakly continuous case, we do not need to impose a value boundedness condition because, in fact, Assumption 3.8 implies Assumption 3.4 (the proof is easy; see, e.g., Lemma 4.5 in [3] or Lemma 10.4.2 in [7]). A sufficient condition for Assumption 3.8 is proposed in [7, Proposition 10.2.5].

In what follows, we will suppose that either Assumptions 3.1, 3.2 and 3.4 or Assumptions 3.5, 3.6 and 3.8 hold. In either case, we know from the results in [7, Chapter 8] that the optimal ρ-discounted reward is the unique solution in $B_w(S)$ of the discounted reward optimality equation:

\[ v_\rho(x) = \max_{a\in\Gamma(x)} \Big\{ r(x,a) + \rho \int_Z v_\rho(h(x,a,\xi))\,\nu(d\xi) \Big\} \quad \forall\, x \in S. \tag{7} \]

In addition, a policy π∗ ∈ Π is ρ-discount optimal if and only if π∗(x) attains the maximum in (7) for every x ∈ S, i.e.,

\[ v_\rho(x) = r(x,\pi^*(x)) + \rho \int_Z v_\rho(h(x,\pi^*(x),\xi))\,\nu(d\xi) \quad \forall\, x \in S. \tag{8} \]

The vanishing discount approach to average reward optimality is related to the following definition of limit and accumulation policies.

Definition 3.9 Given a policy π∗ ∈ Π and a sequence $\{\pi_k\}_{k\in N}$ in Π, we say that
(i) $\{\pi_k\}_{k\in N}$ converges to π∗ if $\lim_k \pi_k(x) = \pi^*(x)$ for every x ∈ S;
(ii) π∗ is an accumulation policy of $\{\pi_k\}_{k\in N}$ if, for every x ∈ S, there exists a subsequence $\{k_x\}$ such that $\pi_{k_x}(x) \to \pi^*(x)$;
(iii) $\{\pi_k\}_{k\in N}$ converges continuously to π∗ if $\lim_k \pi_k(x_k) = \pi^*(x)$ for every x ∈ S and every sequence $x_k \to x$.

The concept of accumulation policy in Definition 3.9(ii) comes from [13]. Continuous convergence and its applications to stochastic dynamic programming are analyzed in [10].

Next, we prove our main result, which states the relation between average reward optimal policies and the limit of discount optimal policies. The proof of this result, Theorem 3.10, follows the same arguments needed to obtain the so-called average reward optimality inequality [7, Theorem 10.3.1], although the proof is focused on the analysis of the limit of discount optimal policies.

Theorem 3.10 Let $\{\rho_k\}_{k\in N}$, with $\rho_k \uparrow 1$, be a sequence of discount factors, and let $\pi_k \in \Pi$, for every k ∈ N, be a $\rho_k$-discount optimal policy. Then the following hold:
(i) If Assumptions 3.1, 3.2 and 3.4 are satisfied and $\{\pi_k\}$ converges to π∗ ∈ Π, then π∗ is an average reward optimal policy;
(ii) If Assumptions 3.5, 3.6 and 3.8 are satisfied and $\{\pi_k\}$ converges continuously to π∗ ∈ Π, then π∗ is an average reward optimal policy.

Proof: From Assumption 3.1 or 3.5, an induction argument (see, e.g., [7, Lemma 10.4.1]) gives

\[ E_x^{\pi}[w(x_t)] \le \beta^t w(x) + \frac{1-\beta^t}{1-\beta}\, b \quad \forall\, \pi \in \Pi,\ x \in S,\ t \ge 0. \tag{9} \]

Therefore, by Assumption 3.2(ii) or 3.6(ii), we have

\[ E_x^{\pi}|r(x_t,\pi(x_t))| \le M\beta^t w(x) + \frac{M(1-\beta^t)}{1-\beta}\, b, \]

so that $\sup_{\rho\in(0,1)} |(1-\rho)v_\rho(x')|$ is finite, with x′ ∈ S as in Assumption 3.4. Thus

\[ g := \liminf_{k\to\infty}\, (1-\rho_k)\, v_{\rho_k}(x') \]

is well defined. Our proof now proceeds in two steps. In step one, we prove that

\[ g \ge \sup_{\pi\in\Pi} v(x,\pi) \quad \forall\, x \in S. \]

In step two, we show that π∗ satisfies g ≤ v(x, π∗) for all x ∈ S. Average reward optimality of π∗ will then follow.

Step one. By definition of $u_\rho$ (in Section 1), a simple calculation shows that the discounted reward optimality equation (7) can be written in the equivalent form

\[ (1-\rho)v_\rho(x') + u_\rho(x) = \max_{a\in\Gamma(x)} \Big\{ r(x,a) + \rho \int_Z u_\rho(h(x,a,\xi))\,\nu(d\xi) \Big\} \tag{10} \]

for every x ∈ S. Consider now a subsequence {k′} of {k} such that

\[ \lim_{k'\to\infty} (1-\rho_{k'})\, v_{\rho_{k'}}(x') = g. \]


Let $u := \liminf_{k'} u_{\rho_{k'}}$ and note that u is in $B_w(S)$. Now, by (10), for the sequence $\{\rho_{k'}\}$ and every (x, a) ∈ K we have

\[ (1-\rho_{k'})\, v_{\rho_{k'}}(x') + u_{\rho_{k'}}(x) \ge r(x,a) + \rho_{k'} \int_Z u_{\rho_{k'}}(h(x,a,\xi))\,\nu(d\xi). \]

Taking the lim inf as k′ → ∞ in this inequality and using Fatou's lemma (which indeed applies as a consequence of our assumptions), we obtain

\[ g + u(x) \ge r(x,a) + \int_Z u(h(x,a,\xi))\,\nu(d\xi) \quad \forall\, (x,a) \in K. \tag{11} \]

Iteration of (11) yields that, for every initial state x ∈ S, any policy π ∈ Π and t ≥ 0,

\[ g \ge E_x^{\pi}[r(x_t,\pi(x_t))] + E_x^{\pi}[u(x_{t+1}) - u(x_t)]. \]

Summing up these inequalities for t = 0, . . . , T − 1 and then dividing by T yields

\[ g \ge E_x^{\pi}\Big[ \frac{1}{T} \sum_{t=0}^{T-1} r(x_t,\pi(x_t)) \Big] + \frac{E_x^{\pi}[u(x_T)] - u(x)}{T}. \]

Letting T → ∞, recalling that $u \in B_w(S)$ and using (9), we obtain g ≥ v(x, π) and, therefore,

\[ g \ge \sup_{\pi\in\Pi} v(x,\pi) \quad \forall\, x \in S. \tag{12} \]

This completes step one.

Step two. Since $\pi_k$ is a $\rho_k$-discount optimal policy, from (8) and (10) we have

\[ (1-\rho_k)\, v_{\rho_k}(x') + u_{\rho_k}(x) = r(x,\pi_k(x)) + \rho_k \int_Z u_{\rho_k}(h(x,\pi_k(x),\xi))\,\nu(d\xi) \]

for every k ∈ N and x ∈ S. Consequently, for every ε > 0 and large enough k, we have

\[ g - \varepsilon + u_{\rho_k}(x) \le r(x,\pi_k(x)) + \rho_k \int_Z u_{\rho_k}(h(x,\pi_k(x),\xi))\,\nu(d\xi) \tag{13} \]

for every x ∈ S.


Suppose now that Assumptions 3.1, 3.2 and 3.4 are satisfied. Then, taking the lim sup in (13), recalling that r(x, ·) is upper semicontinuous and by the extension of Fatou's lemma [7, Lemma 8.3.7], we obtain

\[ g - \varepsilon + u(x) \le r(x,\pi^*(x)) + \int_Z u(h(x,\pi^*(x),\xi))\,\nu(d\xi), \]

where $u := \limsup_k u_{\rho_k} \in B_w(S)$. But ε > 0 being arbitrary, the same arguments as in the proof of step one yield that g ≤ v(x, π∗) for all x ∈ S, which combined with (12) shows that π∗ is an average reward optimal policy and, besides, that g is the (constant) optimal average reward. This completes the proof of statement (i), that is, under the hypotheses in the strongly continuous case.

We now consider the weakly continuous case, which consists of Assumptions 3.5, 3.6 and 3.8. Following [8], we define the generalized lim sup of the sequence $u_{\rho_k}$ as

\[ u^*(x) := \sup\Big\{ \limsup_{k\to\infty} u_{\rho_k}(x_k) \Big\}, \]

where the supremum is taken over the family of sequences $\{x_k\} \subseteq S$ such that $x_k \to x$. Let us now go back to (13) and take the lim sup through a sequence $x_k \to x$ such that $\limsup_k u_{\rho_k}(x_k) \ge u^*(x) - \varepsilon$, so that

\[ g - 2\varepsilon + u^*(x) \le \limsup_{k\to\infty} r(x_k,\pi_k(x_k)) + \limsup_{k\to\infty} \int_Z u_{\rho_k}(h(x_k,\pi_k(x_k),\xi))\,\nu(d\xi). \]

Then we proceed as in the proof for the strongly continuous case, but this time we take into account that both r and the multifunction Γ are upper semicontinuous. Finally, we apply the Fatou lemma for a generalized lim sup (see [8, Lemma 5] and also [14, Lemma 2.3]) to obtain

\[ g - 2\varepsilon + u^*(x) \le r(x,\pi^*(x)) + \int_Z u^*(h(x,\pi^*(x),\xi))\,\nu(d\xi) \quad \forall\, x \in S. \tag{14} \]

This implies, by standard arguments, that v(x, π∗) ≥ g for every x ∈ S. The proof of Theorem 3.10 is complete. □
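The bound (9) invoked at the start of the proof is cited from [7, Lemma 10.4.1]. A sketch of the induction, for a fixed π ∈ Π and under the assumption that the noise at time t is independent of the current state (as in the i.i.d. setting of Section 2), might read as follows; the intermediate steps are a reconstruction, not the authors' text.

```latex
\begin{align*}
E_x^{\pi}[w(x_{t+1})]
  &= E_x^{\pi}\Big[\int_Z w\big(h(x_t,\pi(x_t),\xi)\big)\,\nu(d\xi)\Big]
   \le \beta\,E_x^{\pi}[w(x_t)] + b
   && \text{(Assumption 3.1 or 3.5)},\\
E_x^{\pi}[w(x_t)]
  &\le \beta^{t} w(x) + b\sum_{j=0}^{t-1}\beta^{j}
   = \beta^{t} w(x) + \frac{1-\beta^{t}}{1-\beta}\,b
   && \text{(iterating from } x_0 = x\text{)}.
\end{align*}
```

Since 0 < β < 1, the right-hand side is bounded in t, which is what makes the term involving u(x_T) vanish after division by T in step one.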


Remark 3.11 The second step in the proof of Theorem 3.10 relies on the application of a Fatou-like lemma. For instance, when the usual value boundedness condition holds, then we use the standard Fatou lemma because the relative value function $u_\rho$ is bounded above; see (4). Under the strong continuity assumptions, we use the Fatou lemma in [7, Lemma 8.3.7], while if the weak continuity conditions hold, then we use the Fatou lemma for a generalized lim sup in [14, Lemma 2.3]. Therefore, the assumptions we make on the control model heavily depend on the hypotheses needed for the corresponding Fatou lemma and, similarly, the kind of results we reach (statements (i) and (ii) in Theorem 3.10) also depend on the kind of Fatou lemma that is applied.

We specialize Theorem 3.10 to the following important particular cases.

Corollary 3.12 Suppose that $\{\rho_k\}_{k\in N}$ is a sequence of discount factors such that $\rho_k \uparrow 1$ and let $\pi_k \in \Pi$, for every k ∈ N, be a $\rho_k$-discount optimal policy.
(i) Under the strong continuity conditions (Assumptions 3.1, 3.2 and 3.4), if for every x ∈ S the function $\rho \mapsto u_\rho(x)$ is monotone (either increasing or decreasing), then any accumulation policy of $\{\pi_k\}_{k\in N}$ is average reward optimal.
(ii) If the state space S is denumerable, then under either the strong or the weak continuity conditions, any accumulation policy of $\{\pi_k\}$ is average reward optimal.

The condition in Corollary 3.12(i) can be interpreted as follows: the expected discounted reward grows faster for any x ∈ S than for x′ ∈ S as ρ ↑ 1, and it is satisfied, for instance, in the consumption-investment model in [6, Section 3.6]; see also [1].

4 An example

In this section we give an example of a control model that satisfies (5) but does not satisfy the value boundedness condition (4). The following inventory system with permitted backlog is based on the model analyzed in [17]. The state space and the action set are S = A = R. The distribution ν is supported on [0, ∞), it satisfies the conditions in Remark 3.3, and we assume that its expectation equals one. Furthermore, we suppose that there exists some δ > 0 such that

\[ \int_0^{\infty} e^{\delta\xi}\,\nu(d\xi) < \infty. \]

(Note that, for instance, the mean-one exponential distribution satisfies these hypotheses.) Fix a constant K > 1/2 and let

\[ 0 < \lambda < -\frac{1}{\delta} \log \int_0^{\infty} e^{-\delta\xi}\,\nu(d\xi). \]

The action sets Γ(x) are the intervals

\[ \Gamma(x) = [-x,\ \max\{-2x,\ -x+K\}] \ \text{ for } x \le 0 \quad\text{and}\quad \Gamma(x) = [-x,\ \max\{\lambda,\ -x+K\}] \ \text{ for } x > 0. \]

The system's transition function h is given by h(x, a, ξ) = x + a − ξ. The cost function is $c(x,a) = (x+a)^2 - a$ (cf. [17, Equation (3.1)]). Finally, let $w(x) = e^{\delta|x|}$ for x ∈ R. This control model satisfies Assumptions 3.1 and 3.2. Given a discount factor 0 < ρ < 1, a direct calculation shows that the optimal ρ-discounted cost function (recall that we are minimizing a cost) is

\[ v_\rho(x) = x - \frac{(\rho+1)^2}{4(1-\rho)} \quad \forall\, x \in R, \]

and the optimal ρ-discount policy is

\[ \pi_\rho(x) = -x + \frac{1}{2}(1-\rho) \quad \forall\, x \in R. \]

Hence, the value boundedness condition (4) does not hold, whereas (5) (or Assumption 3.4) is satisfied. Indeed, the relative value function is $u_\rho(x) = x - x'$, which is unbounded in both directions but satisfies $|u_\rho(x)| \le M' w(x)$ for a suitable constant M′. Moreover, for every x ∈ R, $\pi_\rho(x)$ converges to −x as ρ ↑ 1. Therefore, by Theorem 3.10(i), the policy π(x) = −x, for x ∈ R, is average cost optimal. Further, from the proof of Theorem 3.10 we also obtain that the minimal average cost is

\[ -1 = \lim_{\rho\uparrow 1}\, (1-\rho)\, v_\rho(x). \]
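The closed-form expressions in this example can be checked numerically. The sketch below is a sanity check rather than part of the paper's argument. It uses the fact that the optimal discounted cost is affine in the state and that the noise has mean one, so the expected next-step value equals the value at the expected next state. The value K = 1.0, the grid size and all function names are assumptions made for illustration, and the grid search is restricted to the sub-interval [−x, −x + K], which is contained in Γ(x) for every x.

```python
def cost(x, a):
    """One-step cost c(x, a) = (x + a)^2 - a of the inventory example."""
    return (x + a) ** 2 - a


def v_rho(x, rho):
    """Closed-form optimal rho-discounted cost stated in the example."""
    return x - (rho + 1) ** 2 / (4 * (1 - rho))


def pi_rho(x, rho):
    """Closed-form rho-discount optimal policy stated in the example."""
    return -x + 0.5 * (1 - rho)


def q_cost(x, a, rho):
    # h(x, a, xi) = x + a - xi with E[xi] = 1; because v_rho is affine,
    # E[v_rho(x + a - xi)] = v_rho(x + a - 1).
    return cost(x, a) + rho * v_rho(x + a - 1.0, rho)


def grid_minimum(x, rho, K=1.0, points=2001):
    """Minimum of q_cost over a grid on [-x, -x + K] (K = 1.0 is an assumed value)."""
    step = K / (points - 1)
    return min(q_cost(x, -x + i * step, rho) for i in range(points))


if __name__ == "__main__":
    for rho in (0.5, 0.9, 0.99):
        for x in (-3.0, 0.0, 2.5):
            print(rho, x, v_rho(x, rho),
                  q_cost(x, pi_rho(x, rho), rho), grid_minimum(x, rho))
    # Scaled value for a discount factor close to one.
    print((1 - 0.9999) * v_rho(2.0, 0.9999))
```

For each pair (ρ, x) the first two printed values should agree, the grid minimum should not fall below them, and the last line should be close to the minimal average cost −1 reported above.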


5 Concluding remarks

In the previous sections, we have considered a fairly general discrete-time stochastic control model and, under two different sets of hypotheses (strong and weak continuity), we have proved that the limit of ρ-discount optimal policies, as the discount factor ρ ↑ 1, is a long-run average reward optimal policy. The main contribution of this paper is to relax the usual value boundedness assumption (4) on the relative value function and, instead, assume the weaker condition (5). We have illustrated our results with the generalized inventory system in Section 4.

Some important issues, however, remain open. In Theorem 3.10(i) it is assumed that the discount optimal policies $\{\pi_k\}$ converge to some π∗, and then it is proved that π∗ is average reward optimal. It would be interesting to know whether this convergence requirement can be relaxed, so as to obtain a result like that in Corollary 3.12(i) under general assumptions. To this end, results on the existence of measurable selectors would be involved. Also, it would be interesting to check whether the continuous convergence in Theorem 3.10(ii) can be relaxed to (usual) convergence, perhaps by strengthening the hypotheses on the control model.

Tomás Prieto-Rumeau
Departamento de Estadística, Facultad de Ciencias, UNED, Senda del Rey 9, 28040, Madrid, Spain, tprieto@ccia.uned.es

Onésimo Hernández-Lerma
Departamento de Matemáticas, CINVESTAV-IPN, 14-470, México D.F. 07000, México, ohernand@math.cinvestav.mx

References

[1] Cruz-Suárez H. D., A stochastic consumption-investment problem with unbounded utility function, Morfismos 4 (2000), 19–30.
[2] Dutta P. K., What do discounted optima converge to? A theory of discount rate asymptotics in economic models, J. Econom. Theory 55 (1991), 64–94.
[3] Gordienko E.; Hernández-Lerma O., Average cost Markov control processes with weighted norms: existence of canonical policies, Appl. Math. (Warsaw) 23 (1995), 199–218.
[4] Guo X. P.; Zhu Q. X., Average optimality for Markov decision processes in Borel spaces: a new condition and approach, J. Appl. Prob. 43 (2006), 318–334.
[5] Hernández-Lerma O.; Lasserre J. B., Average cost optimal policies for Markov control processes with Borel state space and unbounded costs, Systems Control Lett. 15 (1990), 349–356.
[6] Hernández-Lerma O.; Lasserre J. B., Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer, New York, 1996.
[7] Hernández-Lerma O.; Lasserre J. B., Further Topics on Discrete-Time Markov Control Processes, Springer, New York, 1999.
[8] Jaśkiewicz A.; Nowak A. S., On the optimality equation for average cost Markov control processes with Feller transition probabilities, J. Math. Anal. Appl. 316 (2006), 495–509.
[9] Kawaguchi K.; Morimoto H., Long-run average welfare in a pollution accumulation model, J. Econom. Dynam. Control 31 (2007), 703–720.
[10] Langen H. J., Convergence of dynamic programming models, Math. Oper. Res. 6 (1981), 493–512.
[11] Morimoto H.; Fujita Y., Ergodic control in stochastic manufacturing systems with constant demand, J. Math. Anal. Appl. 243 (2000), 228–248.
[12] Nishimura K.; Stachurski J., Stochastic optimal policies when the discount rate vanishes, J. Econom. Dynam. Control 31 (2007), 1416–1430.
[13] Schäl M., Conditions for optimality and for the limit of n-stage optimal policies to be optimal, Z. Wahrsch. verw. Gebiete 32 (1975), 179–196.
[14] Schäl M., Average optimality in dynamic programming with general state space, Math. Oper. Res. 18 (1993), 163–172.
[15] Sennott L. I., A new condition for the existence of optimal stationary policies in average cost Markov decision processes, Oper. Res. Lett. 5 (1986), 17–23.
[16] Taylor H. M., Markovian sequential replacement processes, Ann. Math. Stat. 36 (1965), 1677–1694.
[17] Vega-Amaya O.; Montes-de-Oca R., Application of average dynamic programming to inventory systems, Math. Methods Oper. Res. 47 (1998), 451–471.



Morfismos, Vol. 12, No. 2, 2008, pp. 17–32

Vértices simpliciales y escalonabilidad de grafos (Simplicial vertices and shellability of graphs)

Roberto Cruz

Mario Estrada

Abstract

Given a simple undirected graph G, one associates with it a simplicial complex $\Delta_G$ whose faces correspond to the independent sets of G. Van Tuyl and Villarreal defined a graph G to be shellable if the associated simplicial complex $\Delta_G$ is shellable in the nonpure sense of Björner and Wachs. These authors proved that all triangulated (chordal) graphs are shellable and that the shellable bipartite graphs are precisely the sequentially Cohen-Macaulay bipartite graphs. In the present article we show that the concept of a simplicial vertex of a graph makes it possible not only to prove these results but also to give other necessary and sufficient conditions for the shellability of a graph. In addition, we prove that every simplicial graph is shellable and that every circular-arc graph containing at least one simplicial vertex is shellable.

2000 Mathematics Subject Classification: 13F55, 13D02, 05C38, 05C75. Keywords and phrases: shellable graphs, simplicial vertices, sequentially Cohen-Macaulay, simplicial graphs, circular-arc graphs.

1 Introduction

Let $G = (V_G, E_G)$ be a simple (no loops or multiple edges) undirected graph, with vertex set $V_G = \{x_1, \dots, x_n\}$ and edge set $E_G$. Identifying each vertex $x_i$ with the variable $x_i$ of the polynomial ring $R = k[x_1, \dots, x_n]$ over a field $k$, one associates to $G$ the squarefree monomial ideal $I(G) = (\{x_i x_j \mid \{x_i, x_j\} \in E_G\})$. The ideal $I(G)$ is called the edge ideal of the graph $G$. Via the Stanley-Reisner correspondence, one associates to $G$ the simplicial complex $\Delta_G$ such that $I_{\Delta_G} = I(G)$; that is, the Stanley-Reisner ideal of the simplicial complex coincides with the edge ideal of the graph.



In this case, the facets of $\Delta_G$ are the maximal independent (or stable) sets of $G$. The graph $G$ is said to be shellable if its associated simplicial complex $\Delta_G$ is shellable. This definition was introduced by Van Tuyl and Villarreal [15]; it uses the notion of nonpure shellability introduced by Björner and Wachs [1]. For graphs, the natural generalization of the Cohen-Macaulay property is that of being sequentially Cohen-Macaulay. A theorem of Stanley [13] states that shellability implies the sequentially Cohen-Macaulay property. Van Tuyl and Villarreal [15] prove the following theorems:

Theorem 1.1.1 [15, Theorem 2.12] Let $G$ be a chordal graph. Then $G$ is shellable.

Theorem 1.1.2 [15, Theorem 3.8] Let $G$ be a bipartite graph. Then $G$ is shellable if and only if $G$ is sequentially Cohen-Macaulay.

The central argument in the proof of Theorem 1.1.1 is the existence, in a chordal graph $G$, of a vertex $x$ whose neighborhood induces a complete subgraph [15, Lemma 2.11]. The proof of Theorem 1.1.2, on the other hand, rests on the fact that every connected, bipartite, sequentially Cohen-Macaulay graph has a vertex of degree 1, and on the following statement, which gives necessary and sufficient conditions for the shellability of a graph containing a vertex of degree 1:

Theorem 1.1.3 [15, Theorem 2.9] Let $G$ be a graph and let $x_1, y_1$ be two adjacent vertices of $G$ with $\deg(x_1) = 1$. Let $G_1 = G \setminus (\{x_1\} \cup N_G(x_1))$ and $G_2 = G \setminus (\{y_1\} \cup N_G(y_1))$. Then $G$ is shellable if and only if $G_1$ and $G_2$ are shellable.

The notion of a simplicial vertex makes it possible to replace the hypotheses of the preceding theorem by the more general condition that $G$ contain a simplicial vertex. A vertex $x$ of a graph $G$ is called simplicial if its neighborhood $N_G(x)$ induces a complete subgraph.
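The two notions just introduced, the facets of $\Delta_G$ and simplicial vertices, are easy to experiment with on small graphs. The following is a minimal sketch, not part of the original paper: the adjacency-dictionary representation, the function names and the example path are all assumptions made for illustration, and the brute-force enumeration is only practical for very small graphs.

```python
from itertools import combinations

def is_independent(adj, subset):
    """True if no two vertices of `subset` are adjacent in the graph `adj`."""
    return all(v not in adj[u] for u, v in combinations(subset, 2))

def facets(adj):
    """Maximal independent sets of the graph, i.e. the facets of Delta_G.

    Brute force over all vertex subsets; intended for very small graphs only.
    """
    vertices = sorted(adj)
    independent = [frozenset(s)
                   for r in range(len(vertices) + 1)
                   for s in combinations(vertices, r)
                   if is_independent(adj, s)]
    # Keep the independent sets that are not properly contained in another one.
    return [s for s in independent if not any(s < t for t in independent)]

def is_simplicial_vertex(adj, x):
    """True if the neighborhood N_G(x) induces a complete subgraph."""
    return all(v in adj[u] for u, v in combinations(adj[x], 2))

# Hypothetical example: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(facets(path))
print([v for v in sorted(path) if is_simplicial_vertex(path, v)])
```

In this example the end vertices of the path have degree 1 and are therefore simplicial, while the middle vertex is not, since its two neighbors are not adjacent.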
Section 2 proves the following theorem, which generalizes Theorem 1.1.3 of Van Tuyl and Villarreal.



Theorem 1.1.4 (Theorem 2.1.13) Let $G$ be a graph, $x_1$ a simplicial vertex of $G$, $N_G(x_1) = \{x_2, \dots, x_r\}$, and $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$ for $i = 1, \dots, r$. Then $G$ is shellable if and only if $G_i$ is shellable for all $i = 1, \dots, r$.

This result is well suited to establishing the shellability of graphs that have at least one simplicial vertex. It also gives another proof of the theorem of Van Tuyl and Villarreal on the shellability of chordal graphs, and of their theorem on the equivalence, for bipartite graphs, between shellability and the sequentially Cohen-Macaulay property. The latter theorem extends to graphs containing at least one simplicial vertex.

Theorem 1.1.5 (Corollary 2.1.15) Let $G$ be a graph containing a simplicial vertex. Then $G$ is shellable if and only if it is sequentially Cohen-Macaulay.

Section 3 applies the multiplication of simplicial vertices to obtain new necessary and sufficient conditions for the shellability of a graph. Given a graph $G$ and a simplicial vertex $x$ of $G$, the graph $G \circ x$ is obtained by multiplication of the vertex $x$, that is, by adding a new vertex $x'$ joined to every vertex in the neighborhood of $x$. We prove the following:

Theorem 1.1.6 (Theorem 3.1.17) Let $G$ be a graph, $x$ a simplicial vertex of $G$ and $G \circ x$ the graph obtained by multiplication of the vertex $x$. Then $G$ is shellable if and only if $G \circ x$ is shellable.

Section 4, the last one, establishes the shellability of simplicial graphs and of circular-arc graphs that have a simplicial vertex. In a simplicial graph every vertex is either simplicial or adjacent to a simplicial vertex.

Theorem 1.1.7 (Theorem 4.1.24) Let $G$ be a simplicial graph. Then $G$ is shellable.
Finally, we prove the following theorem on the shellability of circular-arc graphs:

Theorem 1.1.8 (Theorem 4.1.30) Let $G$ be a circular-arc graph that has at least one simplicial vertex. Then $G$ is shellable.



2 Shellability of graphs containing simplicial vertices

This section generalizes the theorem of Van Tuyl and Villarreal [15, Theorem 2.9] on necessary and sufficient conditions for the shellability of a graph, replacing the hypothesis that a vertex of degree 1 exists by the existence of a simplicial vertex.

Definition 2.1.9 A simplicial complex $\Delta$ is said to be shellable if its facets can be ordered $F_1, \dots, F_s$ in such a way that for all $1 \le i < j \le s$ there exist a vertex $v \in F_j \setminus F_i$ and an index $l \in \{1, \dots, j-1\}$ such that $F_j \setminus F_l = \{v\}$. The sequence $F_1, \dots, F_s$ is called a shelling of $\Delta$.

Here we use the definition of 'nonpure' shellability introduced by Björner and Wachs [1]. We say that $\Delta$ is pure shellable if it is shellable and all its facets have the same dimension.

Definition 2.1.10 Let $G$ be a simple undirected graph and $\Delta_G$ its associated simplicial complex. The graph $G$ is said to be shellable if $\Delta_G$ is a shellable simplicial complex.

This definition was introduced by Van Tuyl and Villarreal [15], who show that every chordal graph is shellable [15, Theorem 2.12]. A graph is called chordal (or triangulated) if every cycle of length strictly greater than 3 has a chord, that is, an edge joining two non-consecutive vertices of the cycle. The proof relies on Dirac's lemma [4], which guarantees that every chordal graph has a vertex, called simplicial, whose neighborhood induces a complete subgraph, or clique.

Given a subset $S \subset V_G$, we denote by $G \setminus S$ the graph obtained from $G$ by deleting all vertices of $S$ and all edges incident to a vertex of $S$. If $x$ is a vertex of $G$, we denote by $N_G(x)$ the neighborhood of $x$, that is, the set of all vertices of $G$ adjacent to $x$.

Definition 2.1.11 Let $G$ be a simple undirected graph. A vertex $x$ of $G$ is called simplicial if its neighborhood $N_G(x)$ induces a complete subgraph of $G$.
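The shelling condition of Definition 2.1.9 can be checked mechanically for a given ordered list of facets. The sketch below is only an illustration under stated assumptions: the function name `is_shelling` and the two small examples are hypothetical and do not come from the paper.

```python
def is_shelling(order):
    """Check the condition of Definition 2.1.9 for an ordered list of facets."""
    facets = [frozenset(f) for f in order]
    for j in range(1, len(facets)):
        # Vertices v such that F_j \ F_l = {v} for some earlier facet F_l.
        pivots = set()
        for l in range(j):
            diff = facets[j] - facets[l]
            if len(diff) == 1:
                pivots |= diff
        # For every earlier facet F_i, some such v must lie in F_j \ F_i.
        for i in range(j):
            if not (pivots & (facets[j] - facets[i])):
                return False
    return True

# Hypothetical examples.
path_facets = [{"a", "c"}, {"b"}]   # Delta_G for the path a - b - c
cycle_facets = [{1, 3}, {2, 4}]     # Delta_G for the 4-cycle 1 - 2 - 3 - 4
print(is_shelling(path_facets))
print(is_shelling(cycle_facets), is_shelling(cycle_facets[::-1]))
```

For the 4-cycle both possible orders fail, because the two facets are disjoint and each has two vertices, so no difference $F_j \setminus F_l$ is a single vertex.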
Given a graph $G$ and $S \subset V_G$, we denote by $\langle S \rangle$ the subgraph induced by the vertex set $S$. Note that if $x$ is a simplicial vertex of $G$, the induced subgraph $\langle \{x\} \cup N_G(x) \rangle$ is a maximal clique; moreover, it is the unique maximal clique containing $x$. The following theorem of Dirac states that every chordal graph has a simplicial vertex.

Theorem 2.1.12 (Dirac, [4]) Every chordal graph $G$ has a simplicial vertex. Moreover, if $G$ is not a clique, then it has two simplicial vertices that are not adjacent to each other.

In Theorem 2.9 of [15] the vertex $x_1$, being of degree 1, is a simplicial vertex: together with its neighborhood it induces a maximal complete subgraph, which is moreover the only one containing $x_1$ (the edge $\{x_1, y_1\}$). This fact, and the use of simplicial vertices in the proof of the shellability of chordal graphs, suggest the following generalization:

Theorem 2.1.13 Let $G$ be a graph, $x_1$ a simplicial vertex of $G$ and $N_G(x_1) = \{x_2, \dots, x_r\}$ its neighborhood. Let $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$ for $i = 1, \dots, r$. Then $G$ is shellable if and only if $G_i$ is shellable for all $i = 1, \dots, r$.

Proof: Suppose $G$ is shellable. Theorem 2.6 of Van Tuyl and Villarreal [15] states that if $G$ is shellable and $x$ is any vertex of $G$, then the graph $G' = G \setminus (\{x\} \cup N_G(x))$ is shellable. Hence the graphs $G_i$ are shellable.

The proof of the converse is almost identical to the proof of Theorem 2.9 of [15] on the shellability of chordal graphs. Suppose each $G_i$ is shellable and let $F_{i1}, \dots, F_{is_i}$ be a shelling of $\Delta_{G_i}$ for each $i = 1, \dots, r$. The subgraph $\langle \{x_1\} \cup N_G(x_1) \rangle = \langle \{x_1, \dots, x_r\} \rangle$ is the unique maximal clique containing $x_1$. Moreover, every facet of $\Delta_G$, that is, every maximal independent set of $G$, meets $\{x_1, \dots, x_r\}$ in exactly one vertex: in at most one because $\{x_1, \dots, x_r\}$ is a clique, and in at least one because otherwise $x_1$ could be added to the set. Consequently, the complete list of facets of $\Delta_G$ is

$$F_{11} \cup \{x_1\}, \dots, F_{1s_1} \cup \{x_1\};\ \dots;\ F_{r1} \cup \{x_r\}, \dots, F_{rs_r} \cup \{x_r\}.$$

We show that this list, in this linear order, is a shelling of $\Delta_G$.
We consider two cases:

1. $F' = F_{ik} \cup \{x_i\}$, $F = F_{jt} \cup \{x_j\}$, $i < j$. Then $x_j \in F \setminus F'$. Moreover, the set $F_{jt} \cup \{x_1\}$ is an independent set of $G$, so it is contained in one of the facets of $\Delta_G$ that contain $x_1$; that is, there exists $l$, $1 \le l \le s_1$, such that $F_{jt} \cup \{x_1\} \subset F_{1l} \cup \{x_1\}$.



Writing $F'' = F_{1l} \cup \{x_1\}$, we have $\{x_j\} = F \setminus F''$, and $F''$ precedes $F$.

2. $F' = F_{ik} \cup \{x_i\}$, $F = F_{it} \cup \{x_i\}$, $k < t$. This case follows from the shellability of the graph $G_i$. □

The preceding theorem generalizes Theorem 2.9 of [15], since every vertex of degree 1 is a simplicial vertex. The result can also be used to give another proof that chordal graphs are shellable [15, Theorem 2.12]. Every induced subgraph of a chordal graph is chordal, and by Dirac's lemma (Theorem 2.1.12) every chordal graph either is a clique or contains two simplicial vertices. Arguing by induction on $n = |V_G|$ and assuming that the vertex $x_1$ of $G$ is simplicial, the subgraphs $G_i$ are chordal, being induced subgraphs of $G$, and are shellable by the induction hypothesis. By Theorem 2.1.13 the graph $G$ is shellable.

Similarly, Theorem 2.1.13 can be used in the proof that a sequentially Cohen-Macaulay bipartite graph is shellable [15, Theorem 3.8]. Assuming that $G$ is bipartite and sequentially Cohen-Macaulay, and arguing by induction on the number of vertices, Lemma 3.7 of [15] guarantees the existence in $G$ of a vertex $x_1$ of degree 1 (that is, a simplicial vertex). By Theorem 3.3 of the same paper the subgraphs $G_1 = G \setminus (\{x_1\} \cup N_G(x_1))$ and $G_2 = G \setminus (\{y_1\} \cup N_G(y_1))$, where $y_1$ is the vertex adjacent to $x_1$, are sequentially Cohen-Macaulay. By the induction hypothesis these graphs are shellable, and by Theorem 2.1.13 $G$ is shellable.

Theorem 2.1.13 can also be used to decide the shellability of graphs that have simplicial vertices. One takes a simplicial vertex $x_1$ and finds the subgraphs $G_i$; if any of them is not shellable, then the original graph is not shellable.
If all of them are shellable, then the original graph is shellable and a shelling of it can be built from the shellings of the subgraphs $G_i$.

[Figure: the graphs $G$ (right) and $H$ (left), on the vertices $a, b, c, d, e, f, g$.]



Example 2.1.14 Let $G$ and $H$ be the graphs shown in the preceding figure. The vertex $g$ of the graph $G$ is simplicial and its neighborhood is $N_G(g) = \{c, d\}$. The graphs $G_g$, $G_c$, $G_d$ are shellable, with shellings

$\Delta_{G_g} = \langle \{a, e\}, \{a, f\}, \{b, e\}, \{b, f\} \rangle$; $\Delta_{G_c} = \langle \{b, f\} \rangle$; $\Delta_{G_d} = \langle \{a, e\} \rangle$.

By Theorem 2.1.13, $G$ is shellable and

$\Delta_G = \langle \{a, e, g\}, \{a, f, g\}, \{b, e, g\}, \{b, f, g\}, \{b, f, c\}, \{a, e, d\} \rangle$

is a shelling of $\Delta_G$. On the other hand, the vertex $a$ is a simplicial vertex of the graph $H$ and its neighborhood is $N_H(a) = \{b, c\}$. The graph $H_a = H \setminus (\{a\} \cup N_H(a))$ is not shellable and so, by Theorem 2.1.13, the graph $H$ is not shellable.

Van Tuyl and Villarreal proved the equivalence between shellability and the sequentially Cohen-Macaulay property for bipartite graphs [15, Theorem 3.8]. As a consequence of Theorem 2.1.13, an analogous result holds for graphs containing at least one simplicial vertex.

Corollary 2.1.15 Let $G$ be a graph containing a simplicial vertex. Then $G$ is shellable if and only if it is sequentially Cohen-Macaulay.

Proof: If $G$ is shellable then it is sequentially Cohen-Macaulay, by a result of Stanley [13]. Now let $G$ be sequentially Cohen-Macaulay and suppose that every sequentially Cohen-Macaulay graph with fewer vertices is shellable. Let $x_1$ be a simplicial vertex of $G$ and $N_G(x_1) = \{x_2, \dots, x_r\}$. The graphs $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$, for $i = 1, \dots, r$, are sequentially Cohen-Macaulay [15, Theorem 3.3] and, by the induction hypothesis, shellable. Theorem 2.1.13 then guarantees the shellability of the graph $G$. □
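The construction in the proof of Theorem 2.1.13 can be replayed on the graph $G$ of Example 2.1.14. The sketch below is illustrative only. Since the figure is not reproduced here, the edge list is an assumption: it is the graph on $a, \dots, g$ determined by the neighborhood $N_G(g) = \{c, d\}$ and the facets of $\Delta_{G_g}$, $\Delta_{G_c}$, $\Delta_{G_d}$ listed in the example. All function names are hypothetical.

```python
from itertools import combinations

# Assumed edge list for G, reconstructed from the data of Example 2.1.14.
EDGES = ["gc", "gd", "cd", "ca", "ce", "db", "df", "ab", "ef"]
G = {v: set() for v in "abcdefg"}
for u, v in EDGES:
    G[u].add(v)
    G[v].add(u)

def delete_closed_neighborhood(adj, v):
    """Induced subgraph obtained by deleting v together with its neighbors."""
    removed = {v} | adj[v]
    return {u: adj[u] - removed for u in adj if u not in removed}

def facets(adj):
    """Maximal independent sets (brute force, small graphs only)."""
    vs = sorted(adj)
    ind = [frozenset(s) for r in range(len(vs) + 1) for s in combinations(vs, r)
           if all(b not in adj[a] for a, b in combinations(s, 2))]
    return [s for s in ind if not any(s < t for t in ind)]

def is_shelling(order):
    """Condition of Definition 2.1.9 for an ordered list of facets."""
    fs = [frozenset(f) for f in order]
    for j in range(1, len(fs)):
        pivots = set()
        for l in range(j):
            if len(fs[j] - fs[l]) == 1:
                pivots |= fs[j] - fs[l]
        if any(not (pivots & (fs[j] - fs[i])) for i in range(j)):
            return False
    return True

# Shellings of the complexes of G_g, G_c, G_d, in the order given in the example.
sub_shellings = {
    "g": [{"a", "e"}, {"a", "f"}, {"b", "e"}, {"b", "f"}],
    "c": [{"b", "f"}],
    "d": [{"a", "e"}],
}
# Order used in the proof of Theorem 2.1.13: facets through g, then c, then d.
order = [frozenset(F) | {x} for x in "gcd" for F in sub_shellings[x]]

sub_ok = all(set(map(frozenset, sub_shellings[x])) ==
             set(facets(delete_closed_neighborhood(G, x))) for x in "gcd")
print(sub_ok, set(order) == set(facets(G)), is_shelling(order))
```

Under the assumed edge list, the concatenated list coincides with the set of facets of $\Delta_G$ and satisfies the shelling condition, in agreement with the example.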

3 Multiplication of simplicial vertices

This section applies the multiplication of simplicial vertices to shellable graphs in order to obtain new shellable graphs. Vertex multiplication is the key to Lovász's proof [7] of the perfect graph theorem, which states that a graph is perfect if and only if its complement is. A graph $G$ is perfect if, for every induced subgraph, the chromatic number equals the clique number. Both bipartite graphs and chordal graphs are perfect. In this section we use the definition of vertex multiplication given by Golumbic [6].

Definition 3.1.16 [6] Let $G$ be a graph and $x$ a vertex of $G$. The graph $G \circ x$ is obtained from $G$ by adding a new vertex $x'$ joined to all the vertices of $N_G(x)$. In this case the graph $G \circ x$ is said to be obtained by multiplication of the vertex $x$.

Theorem 3.1.17 Let $G$ be a graph, $x$ a simplicial vertex of $G$ and $G \circ x$ the graph obtained by multiplication of the vertex $x$. Then $G$ is shellable if and only if $G \circ x$ is shellable.

Proof: Let $x$ be a simplicial vertex of $G$, $x'$ the new vertex joined to all the vertices of $N_G(x)$, $G' = G \circ x$, and $N_G(x) = \{x_1, \dots, x_r\} = N_{G'}(x')$. The vertex $x'$ is simplicial in $G'$. For $i = 1, \dots, r$ let $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$ and $G'_i = G' \setminus (\{x_i\} \cup N_{G'}(x_i))$; then $G_i = G'_i$, because $x' \in N_{G'}(x_i)$.

The graphs obtained by removing the vertices $x'$ and $x$, together with their neighborhoods, from $G'$ and $G$ respectively satisfy

$$G'_{x'} = G' \setminus (\{x'\} \cup N_{G'}(x')) = \big(G \setminus (\{x\} \cup N_G(x))\big) \cup \{x\} = G_x \cup \{x\},$$

that is, $G'_{x'}$ is the graph $G_x$ with the isolated vertex $x$ added.

Suppose $G$ is shellable. By Theorem 2.1.13, the graphs $G_x, G_1, \dots, G_r$ are shellable. The graph $G_x \cup \{x\}$ is also shellable: it suffices to add the vertex $x$ to every facet of $\Delta_{G_x}$. Thus the graphs $G'_{x'}, G'_1, \dots, G'_r$ are shellable and, by Theorem 2.1.13, $G' = G \circ x$ is shellable.

Now suppose $G'$ is shellable. The graphs $G'_{x'}, G'_1, \dots, G'_r$ are shellable by Theorem 2.1.13. Note that if $G'_{x'} = G_x \cup \{x\}$ is shellable, then $G_x$ is shellable: it suffices to remove the vertex $x$ from every facet of $\Delta_{G'_{x'}}$, since $x$ is isolated. Hence the graphs $G_x, G_1, \dots, G_r$ are shellable and, by Theorem 2.1.13, the graph $G$ is shellable. □

If $G$ is a graph with two non-adjacent simplicial vertices that have the same neighborhood, one of them can be regarded as a multiplication of the other; it can therefore be deleted from the graph, and shellability can be analyzed on the reduced graph.
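The identities used in the proof of Theorem 3.1.17 can be checked on a small instance. The following sketch is an illustration under stated assumptions: the adjacency-dictionary representation, the function names, the label `"x'"` for the new vertex and the example graph (a triangle $x, u, v$ with a pendant vertex $w$ attached to $v$) are all hypothetical.

```python
from itertools import combinations

def multiply_vertex(adj, x, new):
    """Return G o x: a copy of G with a vertex `new` joined to all of N_G(x)."""
    if new in adj:
        raise ValueError("the new vertex must not already belong to the graph")
    result = {v: set(nbrs) for v, nbrs in adj.items()}
    result[new] = set(adj[x])
    for v in adj[x]:
        result[v].add(new)
    return result

def delete_closed_neighborhood(adj, v):
    """Induced subgraph obtained by deleting v together with its neighbors."""
    removed = {v} | adj[v]
    return {u: adj[u] - removed for u in adj if u not in removed}

def is_simplicial_vertex(adj, x):
    """True if the neighborhood of x induces a complete subgraph."""
    return all(b in adj[a] for a, b in combinations(adj[x], 2))

# Hypothetical example: triangle x, u, v plus a pendant vertex w attached to v.
G = {"x": {"u", "v"}, "u": {"x", "v"}, "v": {"x", "u", "w"}, "w": {"v"}}
Gp = multiply_vertex(G, "x", "x'")

# x' is simplicial in G' = G o x.
print(is_simplicial_vertex(Gp, "x'"))
# G'_{x'} is G_x together with the isolated vertex x.
G_x = delete_closed_neighborhood(G, "x")
print(delete_closed_neighborhood(Gp, "x'") == {**G_x, "x": set()})
# G_i = G'_i for every neighbor x_i of x.
print(all(delete_closed_neighborhood(G, y) == delete_closed_neighborhood(Gp, y)
          for y in G["x"]))
```

The function returns a new graph rather than modifying its argument, so that $G$ and $G \circ x$ can be compared side by side.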



The multiplication of simplicial vertices can be generalized by adding more than one vertex for each simplicial vertex.

Definition 3.1.18 Let $G$ be a graph, $S = \{x_1, \dots, x_r\} \subset V_G$ a set of simplicial vertices such that $N_G(x_i) \ne N_G(x_j)$ for $i \ne j$, and $h = (h_1, \dots, h_r)$ a vector of positive integers. The graph $H = G \circ h$ is obtained from $G$ by multiplication of the vertices of $S$ if, for each simplicial vertex $x_i$, $i = 1, \dots, r$, one adds to $G$ $h_i$ new vertices $x_i^1, \dots, x_i^{h_i}$, each joined to all the vertices of $N_G(x_i)$.

Corollary 3.1.19 Let $G$ be a graph, $S = \{x_1, \dots, x_r\} \subset V_G$ a set of simplicial vertices such that $N_G(x_i) \ne N_G(x_j)$ for $i \ne j$, and $h = (h_1, \dots, h_r)$ a vector of positive integers. Then $G$ is shellable if and only if the graph $H = G \circ h$ is shellable.

Proof: For each vertex $x_i$ of $S$, apply Theorem 3.1.17 $h_i$ times. □

Note 3.1.20 Given a shellable graph $G$ containing several simplicial vertices, the preceding corollary yields new shellable graphs by multiplying each of the simplicial vertices of $G$. If a shelling of $G$ is available, it is convenient to have a simple procedure for building a shelling of the multiplied graph. The proof of Theorem 2.1.13 guarantees that if $x$ is a simplicial vertex, one can construct a shelling $F_1, \dots, F_s, F'_1, \dots, F'_r$ of $\Delta_G$ in which the facets $F_1, \dots, F_s$ that contain $x$ occupy the first positions. In turn, the proof of Theorem 3.1.17 guarantees that the graph $G \circ x$ has a shelling obtained by adding the new vertex to the facets $F_1, \dots, F_s$. However, when the shellable graph $H$ results from multiplying more than one simplicial vertex of the shellable graph $G$, and one starts from a shelling of $G$ and adds the new vertices to the facets containing the corresponding simplicial vertices, the resulting list of facets need not be a shelling of $H$, as Example 3.1.21 shows.

Let $G$ be a shellable graph, $x_1, \dots, x_r$ simplicial vertices of $G$, $h = (h_1, \dots, h_r)$ a vector of positive integers, and $F_1, \dots, F_s$ a shelling of $\Delta_G$. A shelling of the shellable graph $H = G \circ h$ can be obtained as follows. Let $F_{t_1}, \dots, F_{t_p}$ be the subsequence of facets that contain $x_1$ and let $F_{r_1}, \dots, F_{r_q}$ be the subsequence of facets that do not contain $x_1$, where $p + q = s$. The proof of Theorem 2.1.13 guarantees that $F_{t_1}, \dots, F_{t_p}, F_{r_1}, \dots, F_{r_q}$ is a shelling of $\Delta_G$. Now let $F'_{t_1}, \dots, F'_{t_p}$ be the facets obtained by adding to $F_{t_1}, \dots, F_{t_p}$ the $h_1$ new vertices corresponding to $x_1$; the proof of Theorem 3.1.17 guarantees that $F'_{t_1}, \dots, F'_{t_p}, F_{r_1}, \dots, F_{r_q}$ is a shelling of the shellable graph $H_1 = G \circ (h_1, 0, \dots, 0)$. Applying this procedure successively to the graphs

$$H_2 = H_1 \circ (0, h_2, 0, \dots, 0),\quad H_3 = H_2 \circ (0, 0, h_3, 0, \dots, 0),\quad \dots,\quad H = H_r = H_{r-1} \circ (0, 0, \dots, 0, h_r)$$

yields the desired shelling.

Example 3.1.21 Consider the graphs $G$ and $H = G \circ h$, where $S = \{x_1, y_1\}$ and $h = (1, 2)$.

[Figure: the graphs $G$ (on the vertices $x_1, y_1, a, b, c, d$) and $H$ (with the additional vertices $x_2, y_2, y_3$).]

It is easy to see that the graph $G$ is shellable and that

$\Delta_G = \langle \{x_1, c, y_1\}, \{x_1, d\}, \{a, c, y_1\}, \{b, y_1\}, \{b, d\} \rangle$

is a shelling of $\Delta_G$. The vertex $x_2$ and the vertices $y_2, y_3$ of the graph $H$ result from the multiplication of the vertices $x_1$ and $y_1$, respectively. By the preceding corollary the graph $H$ is shellable. However, if in the shelling above we add the vertex $x_2$ to the facets containing $x_1$ and the vertices $y_2, y_3$ to the facets containing $y_1$, it is easy to see that the resulting list

$\langle \{x_1, x_2, c, y_1, y_2, y_3\}, \{x_1, x_2, d\}, \{a, c, y_1, y_2, y_3\}, \{b, y_1, y_2, y_3\}, \{b, d\} \rangle$

is not a shelling of $\Delta_H$. Since the facets containing $x_1$ occupy the first positions, following the proof of Theorem 3.1.17 we can add the vertex $x_2$ to these facets to obtain a shelling of the graph $H_1 = G \circ (1, 0)$, the result of multiplying the vertex $x_1$. The shelling obtained is:

$\Delta_{H_1} = \langle \{x_1, x_2, c, y_1\}, \{x_1, x_2, d\}, \{a, c, y_1\}, \{b, y_1\}, \{b, d\} \rangle$.

This shelling can be rearranged by taking all the facets containing the simplicial vertex $y_1$, in their order, and placing them in the first positions:

$\Delta_{H_1} = \langle \{x_1, x_2, c, y_1\}, \{a, c, y_1\}, \{b, y_1\}, \{x_1, x_2, d\}, \{b, d\} \rangle$.

Adding now the vertices $y_2, y_3$ to these facets gives a shelling of the simplicial complex associated to $H = H_1 \circ (0, 2) = G \circ (1, 2)$:

$\Delta_H = \langle \{x_1, x_2, c, y_1, y_2, y_3\}, \{a, c, y_1, y_2, y_3\}, \{b, y_1, y_2, y_3\}, \{x_1, x_2, d\}, \{b, d\} \rangle$.
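The step-by-step procedure of Note 3.1.20 (move the facets containing the simplicial vertex to the front, keeping their relative order, then add the new vertices to them) is short enough to write down. The sketch below is a hedged illustration: the function names are hypothetical, the input vertex is assumed to be simplicial (this is not checked), and the correctness of the procedure rests on the proofs of Theorems 2.1.13 and 3.1.17 rather than on the code.

```python
def multiply_in_shelling(shelling, x, new_vertices):
    """One step of the procedure of Note 3.1.20 for a simplicial vertex x.

    Facets containing x are moved to the front (relative order preserved)
    and the new vertices are added to exactly those facets.
    """
    with_x = [set(F) | set(new_vertices) for F in shelling if x in F]
    without_x = [set(F) for F in shelling if x not in F]
    return with_x + without_x

def is_shelling(order):
    """Condition of Definition 2.1.9 for an ordered list of facets."""
    fs = [frozenset(f) for f in order]
    for j in range(1, len(fs)):
        pivots = set()
        for l in range(j):
            if len(fs[j] - fs[l]) == 1:
                pivots |= fs[j] - fs[l]
        if any(not (pivots & (fs[j] - fs[i])) for i in range(j)):
            return False
    return True

# Shelling of Delta_G as listed in Example 3.1.21.
shelling_G = [{"x1", "c", "y1"}, {"x1", "d"}, {"a", "c", "y1"},
              {"b", "y1"}, {"b", "d"}]

shelling_H1 = multiply_in_shelling(shelling_G, "x1", ["x2"])        # G o (1, 0)
shelling_H = multiply_in_shelling(shelling_H1, "y1", ["y2", "y3"])  # G o (1, 2)

for facet in shelling_H:
    print(sorted(facet))
print(is_shelling(shelling_H1), is_shelling(shelling_H))
```

Applied to the data of the example, the two steps reproduce the lists given for $\Delta_{H_1}$ and $\Delta_H$, and both pass the check of Definition 2.1.9.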

4 Simplicial and circular-arc graphs

This section establishes the shellability of simplicial graphs and of circular-arc graphs containing at least one simplicial vertex. Simplicial graphs were introduced in [2]; [3] studies several properties of these graphs that can be decided by polynomial-time algorithms. In a simplicial graph every vertex is simplicial or adjacent to a simplicial vertex.

Definition 4.1.22 Given a graph $G$, a clique of $G$ is called a simplex if it contains one or more simplicial vertices. The graph $G$ is called simplicial if every vertex is contained in a simplex, that is, every vertex is simplicial or belongs to the neighborhood of a simplicial vertex.

Lemma 4.1.23 Let $G$ be a simplicial graph. For any vertex $v$ of $G$, the graph $G_v = G \setminus (\{v\} \cup N_G(v))$ is simplicial.

Proof: Note first that if $G$ is a simplicial graph, $x$ is a simplicial vertex of $G$ and $v \ne x$ is a vertex of $G$ such that $x \notin N_G(v)$, then $x$ is simplicial in the subgraph $G_v = G \setminus (\{v\} \cup N_G(v))$. Indeed, removing the vertex $v$ and its neighborhood from $G$ may delete some vertices of the simplex containing $x$; nevertheless, $x$ and the neighbors of $x$ that remain in $G_v$ induce a simplex in $G_v$.

Suppose that for some vertex $v$ of $G$ the graph $G_v = G \setminus (\{v\} \cup N_G(v))$ is not simplicial. Then there exists a vertex $u$ of $G_v$ that is not simplicial in $G_v$ and is not adjacent to any simplicial vertex of $G_v$. Since $G$ is simplicial, there are two cases.

(Case 1) $u$ is simplicial in $G$. Since $u \notin N_G(v)$, the observation above shows that $u$ is simplicial in $G_v$, a contradiction.

(Case 2) $u$ is adjacent to a vertex $x$ that is simplicial in $G$. Then $x \ne v$, because $u$ is adjacent to $x$ but not to $v$, and $x \notin N_G(v)$: otherwise $v$ would belong to the clique $\{x\} \cup N_G(x)$, which also contains $u$, and so $u$ would belong to the neighborhood of $v$. By the observation above, $x$ is simplicial in $G_v$, a contradiction. □

Lemma 4.1.23, together with Theorem 2.1.13, establishes the shellability of simplicial graphs.

Theorem 4.1.24 Let $G$ be a simplicial graph. Then $G$ is shellable.

Proof: The proof is by induction on the number of vertices. Let $G$ be a simplicial graph and suppose that every simplicial graph with fewer vertices is shellable. Let $x_1$ be a simplicial vertex and $N_G(x_1) = \{x_2, \dots, x_r\}$. The graphs $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$, $i = 1, \dots, r$, are simplicial by Lemma 4.1.23. By the induction hypothesis these graphs are shellable. By Theorem 2.1.13 the graph $G$ is shellable. □

If a graph is shellable, then it is sequentially Cohen-Macaulay, by the result of Stanley [13]. The preceding theorem therefore implies that simplicial graphs are sequentially Cohen-Macaulay. If, moreover, a simplicial graph $G$ is unmixed, that is, all minimal vertex covers of $G$ have the same cardinality, then Theorem 4.1.24 implies that $G$ is pure shellable and hence Cohen-Macaulay.

Given a graph $G$ and $S \subset V_G$, consider the graph $G \cup W(S)$ obtained by adding new vertices $\{y_i \mid x_i \in S\}$ and new edges $\{\{x_i, y_i\} \mid x_i \in S\}$, called whiskers.
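The whisker construction and Definition 4.1.22 can be tried out on a small case. The sketch below is illustrative and makes several assumptions: graphs are adjacency dictionaries, the new vertex $y_i$ attached to $x_i$ is encoded as the tuple `("y", x_i)`, and the example graph is a 4-cycle chosen for this illustration.

```python
from itertools import combinations

def add_whiskers(adj, S):
    """Return G with whiskers on S: one new pendant vertex attached to each x in S."""
    result = {v: set(nbrs) for v, nbrs in adj.items()}
    for x in S:
        y = ("y", x)  # hypothetical label for the new vertex y_i
        result[y] = {x}
        result[x].add(y)
    return result

def is_simplicial_vertex(adj, x):
    """True if the neighborhood of x induces a complete subgraph."""
    return all(b in adj[a] for a, b in combinations(adj[x], 2))

def is_simplicial_graph(adj):
    """Definition 4.1.22: every vertex is simplicial or adjacent to a simplicial vertex."""
    simplicial = {v for v in adj if is_simplicial_vertex(adj, v)}
    return all(v in simplicial or bool(adj[v] & simplicial) for v in adj)

# Hypothetical example: the 4-cycle 1 - 2 - 3 - 4 has no simplicial vertex.
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
whiskered = add_whiskers(cycle4, cycle4.keys())

print(is_simplicial_graph(cycle4))
print(is_simplicial_graph(whiskered))
```

Every added vertex has degree 1 and is therefore simplicial, and every original vertex is adjacent to one of them, so the whiskered graph satisfies Definition 4.1.22 even though the 4-cycle itself does not.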
A corollary of Theorem 4.1.24 is the following theorem of Villarreal [14]; see also [12].

Corollary 4.1.25 [12, Theorem 2.1] Let $G$ be a simple graph and $V_G$ its vertex set. Then the graph $G \cup W(V_G)$ is Cohen-Macaulay.



Proof: Each of the added vertices is simplicial, so $G \cup W(V_G)$ is a simplicial graph and, by Theorem 4.1.24, it is shellable. Since $G \cup W(V_G)$ is unmixed, it is Cohen-Macaulay. □

Well-covered graphs were introduced in [8] and have been studied extensively [9].

Definition 4.1.26 A graph $G$ is well-covered if every maximal independent set is a maximum independent set.

The class of well-covered graphs coincides with the class of unmixed graphs: the minimal vertex covers are exactly the complements of the maximal independent sets, so all maximal independent sets of a graph have the same cardinality if and only if all minimal vertex covers do. In [10], Prisner et al. characterize the simplicial graphs and the chordal graphs that are well-covered.

Theorem 4.1.27 [10, Theorem 1] A graph $G$ is simplicial and well-covered if and only if every vertex $v$ of $G$ belongs to exactly one simplex.

Theorem 4.1.28 [10, Theorem 2] Let $G$ be a chordal graph. Then $G$ is well-covered if and only if every vertex $v$ of $G$ belongs to exactly one simplex.

Although the class of simplicial graphs and the class of chordal graphs are not comparable, that is, neither is a subclass of the other, Theorem 4.1.28 implies that well-covered (unmixed) chordal graphs are well-covered simplicial graphs. Thus the fact that unmixed chordal graphs are Cohen-Macaulay follows from the fact that unmixed simplicial graphs are Cohen-Macaulay.

Circular-arc graphs [6] are a class of graphs that generalize interval graphs. An interval graph is the intersection graph of a set of intervals on the real line. Interval graphs are chordal and therefore shellable and sequentially Cohen-Macaulay.
Definition 4.1.29 A graph $G$ is a circular-arc graph if its vertices can be put in one-to-one correspondence with a set of arcs on a circle in such a way that two vertices of $G$ are adjacent if and only if the corresponding arcs intersect.



Circular-arc graphs are in general not chordal, since all cycles are circular-arc graphs. Nor are they shellable in general, since even cycles are not shellable. The following theorem shows that a circular-arc graph containing at least one simplicial vertex is shellable.

Theorem 4.1.30 Let $G$ be a circular-arc graph that has at least one simplicial vertex. Then $G$ is shellable.

Proof: Let $G$ be a circular-arc graph and $v$ any vertex of $G$. Then the graph $G_v = G \setminus (\{v\} \cup N_G(v))$ is an interval graph. Indeed, removing the vertex $v$ and its neighborhood from $G$ amounts to removing from the circle the corresponding arc and all the arcs that intersect it. The remaining arcs lie in the part of the circle not covered by the arc of $v$, so they can be put in one-to-one correspondence with a set of intervals on the real line; that is, $G_v$ is an interval graph.

Suppose that $G$ is not complete (otherwise the graph is obviously shellable), that $x_1$ is a simplicial vertex of $G$ and that $N_G(x_1) = \{x_2, \dots, x_r\}$. The graphs $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$, for $i = 1, \dots, r$, are interval graphs, hence chordal and shellable. By Theorem 2.1.13 the graph $G$ is shellable. □

The preceding theorem implies that all circular-arc graphs that have at least one simplicial vertex are sequentially Cohen-Macaulay. If $G$ is an unmixed circular-arc graph with at least one simplicial vertex, then $G$ is Cohen-Macaulay.
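A small experiment illustrates the contrast between a 4-cycle and a circular-arc graph with a simplicial vertex. The sketch below works under explicit assumptions that are not in the paper: arcs are modeled discretely as closed arcs with integer endpoints on a circle with `n` marked points, encoded as `(start, length)`; shellability is decided by brute force over all orderings of the facets, which is feasible only for tiny graphs; all names are hypothetical.

```python
from itertools import combinations, permutations

def arc_points(start, length, n):
    """Marked points covered by the closed arc from `start` spanning `length` steps."""
    return {(start + k) % n for k in range(length + 1)}

def circular_arc_graph(arcs, n):
    """Intersection graph of arcs given as (start, length) pairs, with length < n."""
    pts = [arc_points(s, l, n) for s, l in arcs]
    adj = {i: set() for i in range(len(arcs))}
    for i, j in combinations(range(len(arcs)), 2):
        if pts[i] & pts[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def facets(adj):
    """Maximal independent sets (brute force, small graphs only)."""
    vs = sorted(adj)
    ind = [frozenset(s) for r in range(len(vs) + 1) for s in combinations(vs, r)
           if all(b not in adj[a] for a, b in combinations(s, 2))]
    return [s for s in ind if not any(s < t for t in ind)]

def is_shelling(order):
    """Condition of Definition 2.1.9 for an ordered list of facets."""
    for j in range(1, len(order)):
        pivots = set()
        for l in range(j):
            if len(order[j] - order[l]) == 1:
                pivots |= order[j] - order[l]
        if any(not (pivots & (order[j] - order[i])) for i in range(j)):
            return False
    return True

def is_shellable(adj):
    """Try every ordering of the facets; exponential, for tiny graphs only."""
    return any(is_shelling(p) for p in permutations(facets(adj)))

# Four arcs on a circle with 8 marked points whose intersection graph is a 4-cycle.
c4 = circular_arc_graph([(0, 2), (2, 2), (4, 2), (6, 2)], n=8)
# Adding the one-point arc {1} creates a pendant (hence simplicial) vertex 4.
c4_plus = circular_arc_graph([(0, 2), (2, 2), (4, 2), (6, 2), (1, 0)], n=8)

print(is_shellable(c4), is_shellable(c4_plus))
```

In this instance the 4-cycle is not shellable, while the graph obtained by adding one arc that creates a simplicial vertex is, which is consistent with Theorem 4.1.30.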

Acknowledgments This work was funded by the research project "Álgebra conmutativa combinatoria, álgebras monomiales y grafos químicos", E01250, Universidad de Antioquia. The first author also thanks the Associates Programme of the International Centre for Theoretical Physics (ICTP).

Roberto Cruz Rodes, Departamento de Matemáticas, Universidad de Antioquia, Calle 67 No. 53-108, A.A. 1226, Medellín, Colombia, rcruz@matematicas.udea.edu.co

Mario Estrada Valdés, Departamento de Matemáticas, Universidad de Antioquia, Calle 67 No. 53-108, A.A. 1226, Medellín, Colombia, mestrada@matematicas.udea.edu.co



References
[1] Björner A. and Wachs M., Shellable nonpure complexes and posets. I, Trans. Amer. Math. Soc. 348 (1996), 1299–1327.
[2] Cheston G. A., Hare E. O., Hedetniemi S. T. and Laskar R. C., Simplicial graphs, Congressus Numerantium 67 (1988), 241–258.
[3] Cheston G. A. and Jap T. S., A survey of the algorithmic properties of simplicial, upper bound and middle graphs, Journal of Graph Algorithms and Applications 10 (2006), 159–190.
[4] Dirac G. A., On rigid circuit graphs, Abh. Math. Sem. Univ. Hamburg 25 (1961), 71–76.
[5] Fulkerson D. R. and Gross O. A., Incidence matrices and interval graphs, Pacific J. Math. 15 (1965), 835–855.
[6] Golumbic M. C., Algorithmic graph theory and perfect graphs, second edition, Elsevier, 2004.
[7] Lovász L., A characterization of perfect graphs, J. Combin. Theory B 13 (1972), 253–267.
[8] Plummer M. D., Some covering concepts in graphs, J. Combin. Theory 8 (1970), 91–98.
[9] Plummer M. D., Well covered graphs: a survey, Quaest. Math. 16 (1993), 253–287.
[10] Prisner E., Topp J. and Vestergaard P. D., Well covered simplicial, chordal and circular arc graphs, J. of Graph Theory 21 (1996), 113–119.
[11] Rose D. J., Tarjan R. E. and Lueker G. S., Algorithmic aspects of vertex elimination on graphs, SIAM J. Comput. 5 (1976), 266–283.
[12] Simis A., Vasconcelos W. and Villarreal R., On the ideal theory of graphs, J. Algebra 167 (1994), 389–416.
[13] Stanley R. P., Combinatorics and Commutative Algebra, second edition, Progress in Mathematics 41, Birkhäuser Boston, Inc., Boston, MA, 1996.



[14] Villarreal R. H., Cohen-Macaulay graphs, Manuscripta Math. 66 (1990), 277–293.
[15] Van Tuyl A. and Villarreal R. H., Shellable graphs and sequentially Cohen-Macaulay bipartite graphs, preprint (2007), math.CO/0701296v1.


Morfismos, Vol. 12, No. 2, 2008, pp. 33–52

Asymptotic normality of average cost Markov control processes ∗

Armando F. Mendoza-Pérez

Abstract This paper studies asymptotic normality of Markov control processes (MCPs) in Borel spaces with unbounded cost. Under suitable hypotheses we show that within the class of canonical policies there exists one where the cost is asymptotically normal.

2000 Mathematics Subject Classification: 93E20, 90C40. Keywords and phrases: (discrete-time) Markov control processes, average cost criteria, expected average cost, average variance, asymptotic normality.

1 Introduction.

We study the asymptotic normality of discrete-time MCPs in Borel spaces with possibly unbounded cost. Under suitable hypotheses we show that, within the class of so-called canonical policies, those that minimize the limiting average variance exhibit asymptotic normality, that is, a certain distribution associated with the cost is asymptotically normal. Asymptotic normality is very useful in adaptive control problems. The only works on the variance minimization problem in MCPs are those by Mandl [7, 9, 10], Hernández-Lerma et al. [5], Prieto-Rumeau and Hernández-Lerma [11] and Zhu and Guo [15]. There are far fewer works on the asymptotic behavior of MCPs; we mention, for instance, the paper by Mandl [8] on finite-state MCPs.

This paper is part of the author’s Doctoral Thesis written at the Departamento de Matem´ aticas, CINVESTAV-IPN.


To obtain our results we combine two approaches. First, to obtain canonical policies with minimum average variance, we use the W-uniform ergodicity assumptions in [5]. Second, we follow Mandl’s approach [8] to extend asymptotic normality to MCPs in Borel spaces. The remainder of the paper is organized as follows. Section 2 contains a brief description of the Markov control model of interest. In Section 3 we introduce our hypotheses and state our main result, Theorem 3.7, which is proved in Section 4. Finally, an LQ system in Section 5 illustrates our results.

2 The control model.

Let (X, A, {A(x) : x ∈ X}, Q, C) be a discrete-time Markov control model with state space X and control (or action) set A, both assumed to be Borel spaces with σ-algebras B(X) and B(A), respectively. For each x ∈ X there is a nonempty Borel set A(x) in B(A) which represents the set of feasible actions in the state x. The set K := {(x, a) : x ∈ X, a ∈ A(x)} is assumed to be a Borel subset of X × A. The transition law Q is a stochastic kernel on X given K and the one-stage cost C is a real-valued measurable function on K. The class of measurable functions f : X → A such that f(x) is in A(x) for every x ∈ X is denoted by F, and we suppose that it is nonempty.

Control policies. For every n = 0, 1, . . ., let $H_n$ be the family of admissible histories up to time n; that is, $H_0 := X$, and $H_n := K^n \times X$ if n ≥ 1. A control policy is a sequence $\pi = \{\pi_n\}$ of stochastic kernels $\pi_n$ on A given $H_n$ such that $\pi_n(A(x_n)|h_n) = 1$ for every n-history $h_n = (x_0, a_0, \dots, x_{n-1}, a_{n-1}, x_n)$ in $H_n$. The class of all policies is denoted by Π. A policy $\pi = \{\pi_n\}$ is said to be a (deterministic) stationary policy if there exists f ∈ F such that $\pi_n(\cdot|h_n)$ is the Dirac measure at $f(x_n) \in A(x_n)$ for all $h_n \in H_n$ and n = 0, 1, . . .. Following a standard convention, we identify F with the class of stationary policies. For notational ease we write
\[ C_f(x) := C(x, f(x)) \quad \text{and} \quad Q_f(\cdot|x) := Q(\cdot|x, f(x)) \quad \forall x \in X \tag{1} \]
for every stationary policy f in F.
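As an informal illustration of the notation in (1), the following sketch builds $C_f$ and $Q_f$ for a small finite model. The two-state model, its action names and all numerical values are hypothetical and serve only to make the notation concrete; the paper itself works in general Borel spaces.

```python
# Hypothetical finite control model (not from the paper): states 0, 1.
X = [0, 1]
A = {0: ["a", "b"], 1: ["a"]}                        # feasible action sets A(x)
C = {(0, "a"): 1.0, (0, "b"): 2.0, (1, "a"): 0.5}    # one-stage cost on K
Q = {(0, "a"): [0.9, 0.1],                           # Q(.|x, a) as a row of probabilities
     (0, "b"): [0.2, 0.8],
     (1, "a"): [0.5, 0.5]}


def restrict(f):
    """Return (C_f, Q_f) as in (1) for a stationary policy f: X -> A."""
    for x in X:
        if f[x] not in A[x]:
            raise ValueError("f(x) must belong to A(x)")
    C_f = [C[(x, f[x])] for x in X]
    Q_f = [Q[(x, f[x])] for x in X]
    return C_f, Q_f


f = {0: "b", 1: "a"}          # a deterministic stationary policy
C_f, Q_f = restrict(f)
```

Under these assumptions, `Q_f` is the transition matrix of the Markov chain induced by f and `C_f` is its cost vector.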



Let (Ω, F) be the (canonical) measurable space consisting of the sample space $\Omega := (X \times A)^\infty$ and its product σ-algebra F. Then, for each policy π and “initial state” x ∈ X, a stochastic process $\{(x_n, a_n)\}$ and a probability measure $P_x^\pi$ are defined on (Ω, F) in a canonical way, where $x_n$ and $a_n$ represent the state and control at time n, n = 0, 1, . . .. The expectation operator with respect to $P_x^\pi$ is denoted by $E_x^\pi$.

Average cost criteria. For each n = 1, 2, . . ., let
\[ J_n(\pi, x) := E_x^\pi \sum_{t=0}^{n-1} C(x_t, a_t) \]
be the n-stage expected cost when using the policy π, given the initial state x ∈ X. The long-run expected average cost (EAC) is then defined as
\[ J(\pi, x) := \limsup_{n\to\infty} \frac{1}{n} J_n(\pi, x). \tag{2} \]
Definition 2.1 (a) A policy $\pi^*$ is said to be EAC-optimal if
\[ J(\pi^*, x) = \inf_{\pi \in \Pi} J(\pi, x) =: J^*(x) \quad \forall x \in X. \tag{3} \]

(b) A stationary policy $f_* \in F$ is called canonical if there exist a constant $\rho^*$ and a measurable function $h_1 : X \to R$ such that
\[ \rho^* + h_1(x) = \min_{a \in A(x)} \Big[ C(x, a) + \int_X h_1(y) Q(dy|x, a) \Big] \quad \forall x \in X, \tag{4} \]
and $f_*(x) \in A(x)$ attains the minimum on the right-hand side of (4) for every x ∈ X, i.e.,
\[ \rho^* + h_1(x) = C_{f_*}(x) + \int_X h_1(y) Q_{f_*}(dy|x) \quad \forall x \in X. \tag{5} \]

If (4) and (5) are satisfied, then $(\rho^*, h_1, f_*)$ is said to be a canonical triplet (see [1, 2, 14]).
Remark 2.2 (See [2, Section 5.2].) If $(\rho^*, h_1, f_*)$ is a canonical triplet and in addition $h_1$ satisfies
\[ \lim_{n\to\infty} \frac{1}{n} E_x^\pi h_1(x_n) = 0 \quad \forall \pi \in \Pi, \ x \in X, \tag{6} \]
then $f_*$ is EAC-optimal and $\rho^*$ is the optimal expected average cost, that is,
\[ J(f_*, x) = J^*(x) = \rho^* \quad \forall x \in X. \tag{7} \]



Hence we have
\[ F_{cp} \subset F_{eac}, \tag{8} \]
where $F_{cp}$ is the class of canonical policies and $F_{eac} \subset F$ is the class of stationary EAC-optimal policies. For each n = 1, 2, . . ., let
\[ S_n(f, x) := \sum_{t=0}^{n-1} C(x_t, a_t) \tag{9} \]
be the n-stage pathwise (or sample-path) cost when using the policy f ∈ F, given the initial state x ∈ X.
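To make the n-stage expected cost $J_n$ (the expectation of $S_n$) and the average cost (2) concrete, here is a minimal sketch for a finite chain induced by a fixed stationary policy. The transition matrix `P` and cost vector `c` are hypothetical stand-ins for $Q_f$ and $C_f$; the recursion $J_k = C_f + Q_f J_{k-1}$ with $J_0 = 0$ is the usual finite-state way of computing $J_n$, not a construction taken from the paper.

```python
def expected_cost(P, c, n):
    """J_n(f, x) for every initial state x, via J_k = c + P J_{k-1}, J_0 = 0."""
    k = len(c)
    J = [0.0] * k
    for _ in range(n):
        J = [c[x] + sum(P[x][y] * J[y] for y in range(k)) for x in range(k)]
    return J


# Hypothetical two-state chain: P plays the role of Q_f, c the role of C_f.
P = [[0.9, 0.1], [0.4, 0.6]]
c = [1.0, 3.0]

n = 2000
average = [j / n for j in expected_cost(P, c, n)]   # J_n(f, x) / n

# Invariant distribution of this chain and the stationary mean cost mu_f(C_f).
mu = [0.8, 0.2]
stationary_cost = sum(m * ci for m, ci in zip(mu, c))
```

For this chain the averages $J_n(f, x)/n$ approach the same constant for both initial states, which is the behavior that ergodicity assumptions such as Assumption 3.2 are meant to guarantee in general.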

Definition 2.3 (a) For each f ∈ F and x ∈ X, define the limiting average variance
\[ V(f, x) := \limsup_{n\to\infty} \frac{1}{n} E_x^f \big[ S_n(f, x) - J_n(f, x) \big]^2. \tag{10} \]
(b) A stationary policy $\hat f$ is called variance-minimal if
\[ V(\hat f, x) = \inf_{f \in F_{eac}} V(f, x) \quad \forall x \in X. \tag{11} \]

3 Assumptions and main result.

In this section we introduce conditions to study asymptotic normality. We shall first introduce two sets of hypotheses. The first one, Assumption 3.1, consists of standard continuity-compactness conditions (see, for instance, [1, 3, 5, 12]) together with a growth condition on the one-step cost C.

Assumption 3.1 For every state x ∈ X:
(a) A(x) is a compact subset of A;
(b) C(x, a) is lower semicontinuous in a ∈ A(x);
(c) the function $a \mapsto \int_X u(y) Q(dy|x, a)$ is continuous on A(x) for every bounded measurable function u on X;
(d) there exist a measurable function W ≥ 1, a bounded measurable function b ≥ 0, and nonnegative constants $r_1$ and β with β < 1, such that
(d1) $|C(x, a)| \le r_1 W(x)$ ∀(x, a) ∈ K;
(d2) $\int_X W(y) Q(dy|x, a)$ is continuous in a ∈ A(x); and
(d3) $\int_X W(y) Q(dy|x, a) \le \beta W(x) + b(x)$ for every x ∈ X.

To state our second set of hypotheses, let us first introduce the following notation: $B_W(X)$ denotes the normed linear space of measurable functions u on X with finite W-norm $\|u\|_W$, which is defined as
\[ \|u\|_W := \sup_{x \in X} |u(x)|/W(x). \tag{12} \]
In this case we say that u is W-bounded. Let µ(·) be a measure on X. We write
\[ \mu(u) := \int_X u(y) \mu(dy) \tag{13} \]

whenever the integral is well-defined.

Assumption 3.2 For each stationary policy f ∈ F:
(a) (W-geometric ergodicity) There exists a probability measure $\mu_f$ on X such that
\[ \Big| \int_X u(y) Q_f^t(dy|x) - \mu_f(u) \Big| \le \|u\|_W R \rho^t W(x) \tag{14} \]
for every t = 0, 1, . . ., u in $B_W(X)$ and x ∈ X, where R > 0 and 0 < ρ < 1 are constants independent of f.
(b) (Irreducibility) There exists a σ-finite measure λ on B(X) with respect to which $Q_f$ is λ-irreducible.

Remark 3.3 (See [4, Theorem 3.5], [13, Theorem 4.5.3], [3, Theorem 10.3.6].) Under Assumptions 3.1 and 3.2, there exists a canonical triplet $(\rho^*, h_1, f_*)$; see Definition 2.1.

To obtain asymptotic normality we need to strengthen the growth condition on the cost function C in Assumption 3.1(d1).

Assumption 3.4 There exists a positive constant $r_2$ such that
\[ C^4(x, a) \le r_2 W(x) \quad \forall (x, a) \in K. \tag{15} \]



Remark 3.5 (a) Because W ≥ 1, Assumption 3.4 implies Assumption 3.1(d1). Moreover, we have that $C^2(x, a) \le r_2^{1/2} W(x)$ for every (x, a) in K (Assumption 3.6 in [5]), a condition which is necessary to obtain optimal policies with minimal average variance. (b) Under Assumptions 3.1, 3.2 and 3.4, the function $h_1$ satisfying (4) and (5) above is such that $h_1^2$ and $h_1^4$ belong to $B_W(X)$. (See Lemma 4.3 below.)

By Remark 3.5(b), the function Λ(·, ·) on K defined as
\[ \Lambda(x, a) := \int_X h_1^2(y) Q(dy|x, a) - \Big[ \int_X h_1(y) Q(dy|x, a) \Big]^2 \tag{16} \]

is finite-valued. This function is used to state the following variance-minimization result, in which $\Lambda_f(x) := \Lambda(x, f(x))$, in analogy with (1).

Proposition 3.6 (See [5, Theorem 3.8] or [3, Theorem 11.3.8].) Under Assumptions 3.1, 3.2 and 3.4, there exist a constant $\sigma_*^2 \ge 0$, a deterministic canonical policy $f_* \in F_{cp}$, and a function $h_2$ in $B_W(X)$ such that, for each x ∈ X,
\[ \sigma_*^2 + h_2(x) = \Lambda_{f_*}(x) + \int_X h_2(y) Q_{f_*}(dy|x). \tag{17} \]
Furthermore, $f_*$ satisfies (11) and $V(f_*, \cdot) = \sigma_*^2$; in fact
\[ V(f_*, x) = \mu_{f_*}(\Lambda_{f_*}) = \sigma_*^2 \quad \forall x \in X \tag{18} \]
and
\[ \sigma_*^2 \le V(f, x) \quad \forall f \in F_{eac}, \ x \in X. \tag{19} \]

Hence, (19) states that $\sigma_*^2$ is the minimal average variance. We can now state our main result, which is proved in Section 4.

Theorem 3.7 Suppose that Assumptions 3.1, 3.2 and 3.4 hold. Let $f_* \in F_{cp}$ be a canonical policy satisfying Proposition 3.6, and $\rho^*$ the optimal average cost as in (7). Then for every initial state x ∈ X,
\[ \frac{S_n(f_*, x) - n\rho^*}{\sqrt{n}} \tag{20} \]
has asymptotically a normal distribution $N(0, \sigma_*^2)$ as n → ∞, with $S_n(f_*, x)$ as in (9).
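The variance appearing in (10), (18) and (20) can be checked by hand in the simplest setting, an uncontrolled two-state chain (a single action in every state). The sketch below is only an illustration under that assumption: the chain, its cost and the closed-form solution of the Poisson equation for two states are choices made here, not material from the paper. It compares the exact value of $E[S_n - J_n]^2/n$ with $\mu(\Lambda)$.

```python
def moments(P, c, n):
    """Return (J_n, M_n): first and second moments of S_n for each initial state.

    With S_n = c(x_0) + S'_{n-1}, where S'_{n-1} is the cost accumulated from x_1 on:
      J_n(x) = c(x) + sum_y P[x][y] J_{n-1}(y),
      M_n(x) = c(x)^2 + 2 c(x) sum_y P[x][y] J_{n-1}(y) + sum_y P[x][y] M_{n-1}(y).
    """
    k = len(c)
    J, M = [0.0] * k, [0.0] * k
    for _ in range(n):
        PJ = [sum(P[x][y] * J[y] for y in range(k)) for x in range(k)]
        PM = [sum(P[x][y] * M[y] for y in range(k)) for x in range(k)]
        J, M = ([c[x] + PJ[x] for x in range(k)],
                [c[x] ** 2 + 2 * c[x] * PJ[x] + PM[x] for x in range(k)])
    return J, M


# Hypothetical two-state chain with jump probabilities p (0 -> 1) and q (1 -> 0).
p, q = 0.1, 0.4
P = [[1 - p, p], [q, 1 - q]]
c = [1.0, 3.0]

# For two states the Poisson equation gives h(0) - h(1) = (c0 - c1) / (p + q),
# and Lambda(x) in (16) is the conditional variance of h(x_1) given x_0 = x.
d = (c[0] - c[1]) / (p + q)
mu = [q / (p + q), p / (p + q)]
Lam = [p * (1 - p) * d * d, q * (1 - q) * d * d]
sigma2 = mu[0] * Lam[0] + mu[1] * Lam[1]          # mu(Lambda), cf. (18)

n = 5000
J, M = moments(P, c, n)
V_n = [(M[x] - J[x] ** 2) / n for x in range(2)]  # E[S_n - J_n]^2 / n, cf. (10)
```

In this example both entries of `V_n` are close to `sigma2`, independently of the initial state, in line with (18).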


4 Proof of Theorem 3.7.

In the remainder of this paper we suppose that Assumptions 3.1, 3.2 and 3.4 hold. To prove Theorem 3.7 we need some preliminary results, which are stated as Lemmas 4.1, 4.2 and 4.3. The following lemma summarizes some well-known results, which are stated here for ease of reference.

Lemma 4.1 Let f ∈ F be a deterministic stationary policy and $\{x_t\}$ the Markov chain induced by f. Then
(a) [3, Lemma 10.4.1] For each x ∈ X and t = 1, 2, . . .
\[ E_x^f W(x_t) \le [1 + b/(1 - \beta)] W(x), \tag{21} \]
with the constant $b := \sup_{x \in X} |b(x)|$. Moreover, for every function u in $B_W(X)$ the following limit holds:
\[ \lim_{n\to\infty} \frac{1}{n^p} E_x^f u(x_n) = 0 \tag{22} \]
with p > 0.
(b) [3, Proposition 10.2.3] $|J_n(f, x) - nJ_f| \le r_1 R W(x)/(1 - \rho)$ ∀x ∈ X, n = 1, 2, . . ., where $J_f := \mu_f(C_f)$. Hence:
(c) $J(f, x) = \lim_{n\to\infty} J_n(f, x)/n = J_f$ ∀x ∈ X.
(d) [3, Proposition 10.2.3] The function
\[ h_f(x) := \lim_{n\to\infty} [J_n(f, x) - nJ_f] = \sum_{t=0}^{\infty} E_x^f [C_f(x_t) - J_f] \tag{23} \]
belongs to $B_W(X)$ and is called the “bias of f”. Moreover, by (b), we have
\[ \|h_f\|_W \le r_1 R/(1 - \rho). \tag{24} \]
(e) [3, Theorem 10.3.6] The pair $(J_f, h_f)$ is the unique solution of the Poisson equation
\[ J_f + h_f(x) = C_f(x) + \int_X h_f(y) Q_f(dy|x) \quad \forall x \in X, \tag{25} \]
that satisfies the condition $\mu_f(h_f) = 0$.
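For a finite chain, parts (d) and (e) can be illustrated directly: truncating the series (23) gives an approximation of the bias, which can then be substituted into the Poisson equation (25). The sketch below assumes a hypothetical two-state chain (the matrix `P` and cost `c` are illustrative) and uses plain power iteration for the invariant distribution; it is not meant as a general-purpose solver.

```python
def matvec(P, v):
    """Apply the transition matrix to a function on the state space: (P v)(x)."""
    return [sum(P[x][y] * v[y] for y in range(len(v))) for x in range(len(v))]


def stationary(P, iters=500):
    """Invariant distribution obtained by iterating mu <- mu P from the uniform law."""
    k = len(P)
    mu = [1.0 / k] * k
    for _ in range(iters):
        mu = [sum(mu[x] * P[x][y] for x in range(k)) for y in range(k)]
    return mu


def bias(P, c, terms=500):
    """Truncation of the series (23): h = sum_{t < terms} P^t (c - J)."""
    mu = stationary(P)
    J = sum(m * ci for m, ci in zip(mu, c))
    v = [ci - J for ci in c]            # P^0 (c - J)
    h = [0.0] * len(c)
    for _ in range(terms):
        h = [hi + vi for hi, vi in zip(h, v)]
        v = matvec(P, v)
    return J, h, mu


# Hypothetical two-state chain standing in for Q_f, with cost C_f.
P = [[0.9, 0.1], [0.4, 0.6]]
c = [1.0, 3.0]
J, h, mu = bias(P, c)

Ph = matvec(P, h)
residual = [J + h[x] - c[x] - Ph[x] for x in range(2)]   # left minus right side of (25)
mu_h = sum(m * hi for m, hi in zip(mu, h))                # normalization mu_f(h_f)
```

Here the residual of (25) and the value of $\mu_f(h_f)$ both vanish up to rounding, as Lemma 4.1(e) predicts for the bias.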



(f) [3, Theorem 10.3.7] If f is a canonical policy in $F_{cp}$, the corresponding solution $(J_f, h_f) = (\rho^*, h_f)$ to the Poisson equation (25) is such that $h_f$ coincides with the function $h_1$ in (4) and (5) up to an additive constant, that is, $h_f(\cdot) = h_1(\cdot) + k_f$ for some constant $k_f$.

The following lemma states a stronger version of (14) and Lemma 4.1(e).

Lemma 4.2 Let $w(x) := W(x)^{1/m}$ with m = 2 or m = 4. For each stationary policy f ∈ F:
(a) The Markov chain $\{x_n\}$ induced by f is w-geometrically ergodic, that is,
\[ \Big| \int_X u(y) Q_f^t(dy|x) - \mu_f(u) \Big| \le \|u\|_w R_0 \rho_0^t w(x) \tag{26} \]
for all x ∈ X and t = 0, 1, . . ., where $\rho_0 = \rho^{1/m} < 1$ and $R_0 := R^{1/m}$;
(b) The unique solution $(J_f, h_f)$ of the Poisson equation (25) is such that $h_f$ is w-bounded.

Proof. (a) This part follows from [3, Lemma 11.3.9]. (b) Case m = 4: Note that (15) and part (a) of this lemma yield the $W^{1/4}$-analogue of Lemma 4.1(d). Hence $h_f$ is $W^{1/4}$-bounded. Case m = 2: Assumption 3.4 and the fact that W ≥ 1 imply that
\[ |C(x, a)| \le r_2^{1/4} W(x)^{1/4} \le r_2^{1/4} W(x)^{1/2} \quad \forall (x, a) \in K. \tag{27} \]
Part (a) (with m = 2) and (27) yield the $W^{1/2}$-analogue of Lemma 4.1(d), that is, $h_f$ is $W^{1/2}$-bounded. ✷

Lemma 4.3 (a) The function $h_1(\cdot)$ satisfying (4) and (5) is $W^{1/4}$-bounded. (b) The function $h_2(\cdot)$ satisfying (17) is $W^{1/2}$-bounded.



Proof. (a) By Lemma 4.1(f), $h_1$ coincides with $h_f$ except for an additive constant, with f a canonical policy. From Lemma 4.2(b), $h_f$ is $W^{1/4}$-bounded; therefore $h_1$ is also $W^{1/4}$-bounded. (b) From the proof of Proposition 3.6 (see, for instance, [5, Theorem 3.8] or [3, Theorem 11.3.8]) we consider the new Markov control model
\[ (X, A, \{A^*(x) : x \in X\}, Q, \Lambda), \tag{28} \]
with $A^*(x)$ an appropriate compact subset of A(x) for every x, and Λ(x, a) as in (16). From part (a) of this lemma, $h_1$ is $W^{1/4}$-bounded. Hence Λ satisfies the growth condition
\[ \Lambda^2(x, a) \le r_3 W(x) \quad \forall (x, a) \in K, \tag{29} \]
where $r_3$ is a positive constant. Observe that (29) yields the $W^{1/2}$-analogue of Assumption 3.1(d1); hence, by Lemma 4.2(a), the control model (28) is $W^{1/2}$-geometrically ergodic. Then from Lemma 4.1 applied to the control model (28) with $W^{1/2}$ instead of W, and $h_2$ instead of $h_1$, it follows that $h_2$ is $W^{1/2}$-bounded. ✷

We are finally ready for the proof of Theorem 3.7.

Proof of Theorem 3.7. Let $(\rho^*, h_1, f_*)$ be a canonical triplet as in Definition 2.1. Moreover, let $(\sigma_*^2, h_2, f_*)$ be as in Proposition 3.6. We define
\[ \tau_1(x, a) := \int_X h_1(y) Q(dy|x, a) - h_1(x) + C(x, a) - \rho^* \]
and
\[ \tau_2(x, a) := \int_X h_2(y) Q(dy|x, a) - h_2(x) + \Lambda(x, a) - \sigma_*^2 \]
for all (x, a) ∈ K. For l = 1, 2 and (x, a) ∈ K, let
\[ \psi_l(x, a) := \int_X h_l(y) Q(dy|x, a) - h_l(x), \]
and consider the characteristic functions $\chi_n(u) := \exp\{iu(S_n(f_*, x) - n\rho^*)\}$ for n = 1, 2, . . .; u ∈ R, with $\chi_0(u) := 1$. Let
\[ e_1(z) := \exp\{iz\} - iz - 1, \tag{30} \]
\[ e_2(z) := \exp\{iz\} + \frac{z^2}{2} - iz - 1. \tag{31} \]



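Observe that
\[ \tau_1(x, a) = \psi_1(x, a) + C(x, a) - \rho^* \tag{32} \]
and
\[ \tau_2(x, a) = \psi_2(x, a) + \Lambda(x, a) - \sigma_*^2 \tag{33} \]
for all (x, a) ∈ K. To prove the theorem we have to verify that
\[ \lim_{n\to\infty} E_x^{f_*} \chi_n\Big(\frac{u}{\sqrt{n}}\Big) = \exp\Big\{-\frac{1}{2}\sigma_*^2 u^2\Big\}. \tag{34} \]
To this end, first notice that $\psi_l(x_m, a_m)$, for l = 1, 2, is the conditional expectation of $h_l(x_{m+1}) - h_l(x_m)$ given $x_m, a_m$, that is, $\psi_l(x_m, a_m) = E_x^{f_*}[h_l(x_{m+1}) - h_l(x_m) \,|\, x_m, a_m]$. This yields for l = 1, 2, with $\chi_m := \chi_m(u)$ and $\psi_l := \psi_l(x_m, a_m)$, the equations
\[ 0 = iu E_x^{f_*} \Big[ \sum_{m=0}^{n-1} \chi_m \psi_1 - \sum_{m=0}^{n-1} \chi_m \big( h_1(x_{m+1}) - h_1(x_m) \big) \Big] \tag{35} \]
and
\[ 0 = \frac{u^2}{2} E_x^{f_*} \Big[ \sum_{m=0}^{n-1} \chi_m \big( h_2(x_{m+1}) - h_2(x_m) \big) - \sum_{m=0}^{n-1} \chi_m \psi_2 \Big]. \tag{36} \]
To simplify the notation, let $C := C(x_m, a_m)$, $e_1 := e_1\big(u(C - \rho^*)\big)$ and $e_2 := e_2\big(u(C - \rho^*)\big)$. Moreover, notice that
\[ \chi_{m+1} - \chi_m = \big[ \exp\{iu(C - \rho^*)\} - 1 \big] \chi_m. \tag{37} \]
From (30), (31) and (37) we have
\[ E_x^{f_*} \chi_n - 1 = E_x^{f_*} \sum_{m=0}^{n-1} (\chi_{m+1} - \chi_m) = E_x^{f_*} \sum_{m=0}^{n-1} \Big[ iu(C - \rho^*) - \frac{1}{2} u^2 (C - \rho^*)^2 + e_2 \Big] \chi_m, \tag{38} \]
and
\[ \begin{aligned} -iu E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \big( h_1(x_{m+1}) - h_1(x_m) \big) &= iu E_x^{f_*} \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} h_1(x_{m+1}) (\chi_{m+1} - \chi_m) \Big] \\ &= iu E_x^{f_*} \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} h_1(x_{m+1}) \big[ iu(C - \rho^*) + e_1 \big] \chi_m \Big]. \end{aligned} \tag{39} \]
Similarly,
\[ \begin{aligned} -\frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \big( h_2(x_{m+1}) - h_2(x_m) \big) &= \frac{u^2}{2} E_x^{f_*} \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) (\chi_{m+1} - \chi_m) \Big] \\ &= \frac{u^2}{2} E_x^{f_*} \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) \big[ \exp\{iu(C - \rho^*)\} - 1 \big] \chi_m \Big]. \end{aligned} \tag{40} \]
Adding (35)-(40) and using (32),
\[ \begin{aligned} E_x^{f_*} \chi_n - 1 = {} & iu E_x^{f_*} \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_1(x_m, a_m) + \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big] \\ & - \frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \big[ \psi_2 + 2 h_1(x_{m+1})(C - \rho^*) + (C - \rho^*)^2 \big] \\ & - \frac{u^2}{2} E_x^{f_*} \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) \big[ \exp\{iu(C - \rho^*)\} - 1 \big] \chi_m \Big] \\ & + E_x^{f_*} \sum_{m=0}^{n-1} e_2 \chi_m. \end{aligned} \]
Hence
\[ E_x^{f_*} \chi_n - 1 = \kappa''(n, u) - \frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \big[ \psi_2 + 2 h_1(x_{m+1})(C - \rho^*) + (C - \rho^*)^2 \big] \tag{41} \]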



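with
\[ \begin{aligned} \kappa''(n, u) = {} & iu E_x^{f_*} \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_1(x_m, a_m) + \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big] \\ & - \frac{u^2}{2} E_x^{f_*} \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) \big[ \exp\{iu(C - \rho^*)\} - 1 \big] \chi_m \Big] \\ & + E_x^{f_*} \sum_{m=0}^{n-1} e_2 \chi_m. \end{aligned} \tag{42} \]
Observing that
\[ \Lambda(x_m, a_m) = E_x^{f_*}[h_1^2(x_{m+1}) \,|\, x_m, a_m] - \big( E_x^{f_*}[h_1(x_{m+1}) \,|\, x_m, a_m] \big)^2 \]
and in view of (33), we can express (41) as
\[ \begin{aligned} E_x^{f_*} \chi_n - 1 &= \kappa''(n, u) - \frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \Big\{ \sigma_*^2 + \tau_2(x_m, a_m) - h_1^2(x_{m+1}) + \big( E_x^{f_*}[h_1(x_{m+1}) \,|\, x_m, a_m] + C(x_m, a_m) - \rho^* \big)^2 \Big\} \\ &= \kappa''(n, u) - \frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \Big\{ \sigma_*^2 + \tau_2(x_m, a_m) - h_1^2(x_{m+1}) + \Big( \int_X h_1(y) Q(dy|x_m, a_m) + C(x_m, a_m) - \rho^* \Big)^2 \Big\}. \end{aligned} \]
Since $f_*$ is a canonical policy, it satisfies
\[ h_1(x_m) = \int_X h_1(y) Q(dy|x_m, a_m) + C(x_m, a_m) - \rho^*. \]
Then, from (37), we have
\[ \begin{aligned} E_x^{f_*} \chi_n - 1 &= \kappa''(n, u) - \frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \big\{ \sigma_*^2 + \tau_2(x_m, a_m) - h_1^2(x_{m+1}) + h_1^2(x_m) \big\} \\ &= \kappa''(n, u) - \frac{u^2 \sigma_*^2}{2} \sum_{m=0}^{n-1} E_x^{f_*} \chi_m - \frac{u^2}{2} E_x^{f_*} \Big[ h_1^2(x_0) - \chi_n h_1^2(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_2(x_m, a_m) + \sum_{m=0}^{n-1} h_1^2(x_{m+1}) (\chi_{m+1} - \chi_m) \Big] \\ &= \kappa''(n, u) - \frac{u^2 \sigma_*^2}{2} \sum_{m=0}^{n-1} E_x^{f_*} \chi_m - \frac{u^2}{2} E_x^{f_*} \Big[ h_1^2(x_0) - \chi_n h_1^2(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_2(x_m, a_m) + \sum_{m=0}^{n-1} h_1^2(x_{m+1}) \big[ \exp\{iu(C - \rho^*)\} - 1 \big] \chi_m \Big]. \end{aligned} \]
Hence
\[ E_x^{f_*} \chi_n = 1 - \frac{u^2 \sigma_*^2}{2} \sum_{m=0}^{n-1} E_x^{f_*} \chi_m + \kappa'(n, u) \tag{43} \]
with
\[ \kappa'(n, u) = \kappa''(n, u) - \frac{u^2}{2} E_x^{f_*} \Big[ h_1^2(x_0) - \chi_n h_1^2(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_2(x_m, a_m) + \sum_{m=0}^{n-1} h_1^2(x_{m+1}) \big[ \exp\{iu(C - \rho^*)\} - 1 \big] \chi_m \Big]. \tag{44} \]
Let us rewrite (43) as
\[ E_x^{f_*} \chi_n = 1 + \Big[ \exp\Big\{-\frac{u^2 \sigma_*^2}{2}\Big\} - 1 \Big] \sum_{m=0}^{n-1} E_x^{f_*} \chi_m + \kappa(n, u), \tag{45} \]
with
\[ \kappa(n, u) := \kappa'(n, u) + \Big[ 1 - \frac{u^2 \sigma_*^2}{2} - \exp\Big\{-\frac{u^2 \sigma_*^2}{2}\Big\} \Big] \sum_{m=0}^{n-1} E_x^{f_*} \chi_m. \tag{46} \]
From (45), an induction argument gives
\[ E_x^{f_*} \chi_n(u) = \exp\Big\{-\frac{n \sigma_*^2 u^2}{2}\Big\} + \Big[ \exp\Big\{-\frac{\sigma_*^2 u^2}{2}\Big\} - 1 \Big] \sum_{m=0}^{n-1} \exp\Big\{-\frac{\sigma_*^2 u^2}{2}(n - 1 - m)\Big\} \kappa(m, u) + \kappa(n, u). \tag{47} \]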
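The induction step from (45) to (47) is purely algebraic, so it can be sanity-checked numerically. In the sketch below the sequence `kappa` is arbitrary test data (with `kappa[0] = 0`, which is what (45) forces for n = 0 since $\chi_0 = 1$), and `a` stands for the number $\sigma_*^2 u^2/2$; neither is computed from an actual control model.

```python
import cmath
import math


def check_induction(a, kappa):
    """Largest gap between the recursion (45) and the closed form (47).

    a     : stands for sigma_*^2 u^2 / 2 (a hypothetical value here)
    kappa : sequence with kappa[0] = 0 and arbitrary kappa[1], ..., kappa[N]
    """
    q = math.exp(-a)
    y = [1.0 + 0j]                                   # y[0] = E chi_0 = 1
    for n in range(1, len(kappa)):
        y.append(1 + (q - 1) * sum(y[:n]) + kappa[n])            # recursion (45)
    closed = [
        q ** n
        + (q - 1) * sum(q ** (n - 1 - m) * kappa[m] for m in range(n))
        + kappa[n]
        for n in range(len(kappa))
    ]                                                              # closed form (47)
    return max(abs(u - v) for u, v in zip(y, closed))


# Arbitrary complex test sequence playing the role of kappa(n, u).
kappa = [0j] + [cmath.exp(1j * n) / (n + 1) for n in range(1, 30)]
err = check_induction(0.3, kappa)
```

The two expressions agree up to rounding error for any such sequence, which is all the induction argument asserts.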



Observe that the proof of the limit (34), and consequently of Theorem 3.7, follows from (47) if we show
\[ \max_{1 \le m \le n} \Big| \kappa\Big(m, \frac{u}{\sqrt{n}}\Big) \Big| \to 0 \quad \text{as } n \to \infty. \tag{48} \]
This relation is obtained by an inspection of the different terms of $\kappa(m, u/\sqrt{n})$. We will do this in the following six steps.

(i) Since $f_*$ is a canonical policy satisfying (5), we have $\tau_1(x_m, a_m) = 0$ for m = 0, 1, . . . in (42). Similarly, by (17), $\tau_2(x_m, a_m) = 0$ in (44).

(ii) From (22) we have that
\[ \lim_{n\to\infty} \frac{1}{\sqrt{n}} E_x^{f_*} h(x_n) = 0 \quad \text{and} \quad \lim_{n\to\infty} \frac{1}{n} E_x^{f_*} h(x_n) = 0 \]
for every h in $B_W(X)$. These limits appear in (42) and (44) when we replace u by $u/\sqrt{n}$.

(iii) In this part we prove the limit (see (42))
\[ \lim_{n\to\infty} \frac{1}{\sqrt{n}} E_x^{f_*} \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m = 0. \]
From the fact that $|e_1(z)| \le z^2/2$ for all z in R, we obtain
\[ \begin{aligned} \Big| \frac{1}{\sqrt{n}} E_x^{f_*} \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big| &\le \frac{1}{\sqrt{n}} \frac{u^2}{2n} E_x^{f_*} \sum_{m=0}^{n-1} |h_1(x_{m+1})| (C(x_m, a_m) - \rho^*)^2 \\ &= \frac{u^2}{2n^{3/2}} E_x^{f_*} \sum_{m=0}^{n-1} \int_X |h_1(y)| Q_{f_*}(dy|x_m) \, (C_{f_*}(x_m) - \rho^*)^2. \end{aligned} \]
By Lemma 4.3(a), $h_1(\cdot)$ is $W^{1/4}$-bounded; in particular, $h_1(\cdot)$ is $\sqrt{W}$-bounded. Hence the function $\int_X |h_1(y)| Q_{f_*}(dy|\cdot)$ is $\sqrt{W}$-bounded. On the other hand, by Assumption 3.4, $(C_{f_*}(x) - \rho^*)^2$ is also $\sqrt{W}$-bounded. Therefore
\[ \Big| \frac{1}{\sqrt{n}} E_x^{f_*} \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big| \le \frac{\lambda u^2}{2n^{3/2}} E_x^{f_*} \sum_{m=0}^{n-1} W(x_m), \]
where λ is a constant depending on $h_1$ and C. By (21) we obtain
\[ \Big| \frac{1}{\sqrt{n}} E_x^{f_*} \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big| \le \frac{\lambda u^2}{2n^{3/2}} \, n [1 + b/(1 - \beta)] W(x), \]
which converges to zero as n → ∞.

(iv) We shall next prove
\[ \lim_{n\to\infty} \frac{1}{n} E_x^{f_*} \sum_{m=0}^{n-1} e_2 \chi_m = 0. \]
This limit appears in (42) when we replace u by $u/\sqrt{n}$. Observe that $|e_2(z)| \le |z|^3/6$ for all z in R. So, by Assumptions 3.1(d) and 3.4, together with (21),
\[ \begin{aligned} \Big| \frac{1}{n} E_x^{f_*} \sum_{m=0}^{n-1} e_2 \chi_m \Big| &\le \frac{|u|^3}{6n^{5/2}} E_x^{f_*} \sum_{m=0}^{n-1} |C_{f_*}(x_m) - \rho^*|^3 \\ &\le \frac{k^3 |u|^3}{6n^{5/2}} E_x^{f_*} \sum_{m=0}^{n-1} W(x_m)^{3/4} \\ &\le \frac{k^3 |u|^3}{6n^{5/2}} E_x^{f_*} \sum_{m=0}^{n-1} W(x_m) \\ &\le \frac{k^3 |u|^3}{6n^{3/2}} [1 + b/(1 - \beta)] W(x), \end{aligned} \]
which converges to zero as n → ∞, with k a constant.

(v) Let h be a $\sqrt{W}$-bounded function on X. Then
\[ \lim_{n\to\infty} \frac{1}{n} E_x^{f_*} \sum_{m=0}^{n-1} h(x_{m+1}) \Big[ \exp\Big\{ i \frac{u}{\sqrt{n}} (C - \rho^*) \Big\} - 1 \Big] \chi_m = 0. \]
This limit appears in (42) and (44) when u is replaced by $u/\sqrt{n}$. It follows from the relation $e_1(z) = \exp\{iz\} - iz - 1$ that
\[ \exp\Big\{ i \frac{u}{\sqrt{n}} (C - \rho^*) \Big\} - 1 = i \frac{u}{\sqrt{n}} (C - \rho^*) + e_1\Big( \frac{u}{\sqrt{n}} (C - \rho^*) \Big). \]
So
\[ \Big| \frac{1}{n} E_x^{f_*} \sum_{m=0}^{n-1} h(x_{m+1}) \Big[ \exp\Big\{ i \frac{u}{\sqrt{n}} (C - \rho^*) \Big\} - 1 \Big] \chi_m \Big| \le \frac{|u|}{n^{3/2}} E_x^{f_*} \sum_{m=0}^{n-1} |h(x_{m+1})| \, |C_{f_*}(x_m) - \rho^*| + \frac{1}{n} E_x^{f_*} \sum_{m=0}^{n-1} |h(x_{m+1})| \, |e_1|. \]
This gives the desired conclusion by arguments similar to those in (iii).



(vi) The absolute value of the expression within brackets in (46) is majorized by $\sigma_*^4 u^4/8$; hence, when u is replaced by $u/\sqrt{n}$, it is majorized by $\sigma_*^4 u^4/(8n^2)$. Since $|\chi_m| \le 1$, the corresponding term in $\kappa(m, u/\sqrt{n})$, m ≤ n, is then majorized by $\sigma_*^4 u^4/(8n)$.

The statements (i)-(vi) imply (48) and consequently prove the theorem. ✷

Remark 4.4 Taking A as a single-point set (singleton) we obtain the Central Limit Theorem for (noncontrolled) Markov chains.
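The elementary bounds used in steps (iii), (iv) and (vi) of the proof of Theorem 3.7 can be checked numerically on a grid. The sketch below is only a sanity check; the grid and the small rounding tolerance are arbitrary choices made here.

```python
import cmath
import math


def e1(z):
    """e_1(z) = exp{iz} - iz - 1, as in (30)."""
    return cmath.exp(1j * z) - 1j * z - 1


def e2(z):
    """e_2(z) = exp{iz} + z^2/2 - iz - 1, as in (31)."""
    return cmath.exp(1j * z) + z * z / 2 - 1j * z - 1


TOL = 1e-12                                         # allowance for rounding error
grid = [k / 100.0 for k in range(-2000, 2001)]      # z in [-20, 20]
nonneg = [k / 100.0 for k in range(0, 2001)]        # a in [0, 20]

# |e_1(z)| <= z^2 / 2, used in step (iii)
ok_e1 = all(abs(e1(z)) <= z * z / 2 + TOL for z in grid)
# |e_2(z)| <= |z|^3 / 6, used in step (iv)
ok_e2 = all(abs(e2(z)) <= abs(z) ** 3 / 6 + TOL for z in grid)
# |1 - a - exp(-a)| <= a^2 / 2 for a >= 0; with a = sigma_*^2 u^2 / 2 this is
# the bound sigma_*^4 u^4 / 8 on the bracket in (46), used in step (vi)
ok_bracket = all(abs(1 - a - math.exp(-a)) <= a * a / 2 + TOL for a in nonneg)
```

All three flags evaluate to `True` on this grid, consistent with the Taylor-remainder estimates behind these inequalities.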

5 An example: an LQ system

Consider the linear system
\[ x_{t+1} = k_1 x_t + k_2 a_t + z_t, \quad t = 0, 1, \dots, \tag{49} \]
with state space X := R and positive coefficients $k_1, k_2$. The control set is A := R, and the set of admissible controls in each state x is the interval
\[ A(x) := [-k_1|x|/k_2, \ k_1|x|/k_2]. \tag{50} \]
The disturbances $z_t$ are i.i.d. random variables with values in Z := R, zero mean and finite variance, that is,
\[ E(z_t) = 0, \quad \sigma^2 := E(z_t^2) < \infty. \tag{51} \]
To complete the description of our control model we introduce the quadratic cost-per-stage function
\[ C(x, a) := c_1 x^2 + c_2 a^2 \quad \forall (x, a) \in K, \tag{52} \]
with positive coefficients $c_1, c_2$. We also define
\[ W(x) := \exp[\gamma|x|] \quad \text{for all } x \in X, \tag{53} \]
with γ ≥ 4. Clearly, Assumption 3.4 holds. Moreover, let $\hat s > 0$ be such that $\gamma \hat s < \log(\gamma/2 + 1)$, which implies
\[ \beta := \frac{2}{\gamma} (\exp[\gamma \hat s] - 1) < 1. \tag{54} \]

Throughout the rest of this section, we suppose the following assumptions, taken from [6, Section 5]:

Assumption 5.1 $0 < k_1 < 1/2$.

Assumption 5.2 The i.i.d. disturbances $z_t$ have a common density g, which is a continuous bounded function supported on the interval $S := [-\hat s, \hat s]$. Moreover, there exists a positive number ε such that g(s) ≥ ε for all s ∈ S.

Assumptions 5.1 and 5.2 imply that Assumptions 3.1 and 3.2 hold (see, for instance, [6, Propositions 6, 23 and 24]). On the other hand, in [6] it is proved that there exists a unique canonical policy given by
\[ f_*(x) = -f_0 x \quad \forall x \in X, \tag{55} \]
satisfying (4) and (5), with
\[ f_0 := \frac{v_0 k_1 k_2}{c_2 + v_0 k_2^2}, \]
where $v_0$ is the unique positive solution to the quadratic (so-called Riccati) equation
\[ k_2^2 v_0^2 + (c_2 - c_1 k_2^2 - c_2 k_1^2) v_0 - c_1 c_2 = 0. \]
In this case, the corresponding function $h_1(\cdot)$ is given by
\[ h_1(x) = v_0 x^2 \quad \forall x \in X, \tag{56} \]
and the optimal value is
\[ \rho^* = v_0 \sigma^2, \tag{57} \]
with $\sigma^2$ as in (51). Thus $(\rho^*, h_1, f_*)$ is a canonical triplet for our linear quadratic Markov control model. Since $f_*$ in (55) is the unique canonical policy, by Proposition 3.6 this policy also minimizes the limiting average variance. In particular, the optimal value for the variance is
\[ \sigma_*^2 = V(f_*, x) = \lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} E_x^{f_*} \Lambda_{f_*}(x_t). \tag{58} \]
We next calculate the limit in (58) and find the value of the optimal variance. To this end, let $\hat k := k_1 - k_2 f_0$, $B := \int_R z^3 g(z)\,dz$ and $D := \int_R z^4 g(z)\,dz$. Then by (16), (55) and (56), we have
\[ E_x^{f_*} \Lambda_{f_*}(x_t) = v_0^2 \big[ 4\hat k^2 \sigma^2 E_x^{f_*}(x_t^2) + 4\hat k B E_x^{f_*}(x_t) + D - \sigma^4 \big]. \tag{59} \]
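The formulas for $v_0$, $f_0$, $\hat k$ and $\rho^*$ are easy to evaluate. The sketch below does so for hypothetical parameter values (chosen here to satisfy Assumption 5.1 and, with γ = 4, the condition on $\hat s$ before (54)) with uniform noise on $[-\hat s, \hat s]$, and checks that the triplet satisfies the Poisson-type equation (5). It does not verify the minimization in (4), which is established in [6].

```python
import math


def lq_canonical(k1, k2, c1, c2, sigma2):
    """Return (v0, f0, k_hat, rho) for the LQ example of Section 5."""
    # Positive root of k2^2 v^2 + (c2 - c1 k2^2 - c2 k1^2) v - c1 c2 = 0.
    a = k2 ** 2
    b = c2 - c1 * k2 ** 2 - c2 * k1 ** 2
    c = -c1 * c2
    v0 = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    f0 = v0 * k1 * k2 / (c2 + v0 * k2 ** 2)
    k_hat = k1 - k2 * f0              # closed-loop coefficient
    rho = v0 * sigma2                 # optimal average cost (57)
    return v0, f0, k_hat, rho


# Hypothetical parameter values (not taken from the paper).
k1, k2, c1, c2 = 0.4, 1.0, 1.0, 1.0
s_hat = 0.25
sigma2 = s_hat ** 2 / 3               # variance of the uniform law on [-s_hat, s_hat]
v0, f0, k_hat, rho = lq_canonical(k1, k2, c1, c2, sigma2)


def poisson_residual(x):
    """rho + h1(x) - C_f(x) - E h1(k_hat x + z), with h1(y) = v0 y^2 as in (56)."""
    cost = (c1 + c2 * f0 ** 2) * x ** 2
    expected_h1 = v0 * (k_hat ** 2 * x ** 2 + sigma2)
    return rho + v0 * x ** 2 - cost - expected_h1
```

With these values the residual of (5) vanishes up to rounding for every x, the feedback gain satisfies $0 \le f_0 \le k_1/k_2$ (so that $f_*(x) \in A(x)$), and $|\hat k| < 1$.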



Replacing $a_t$ in (49) with $a_t := f_*(x_t) = -f_0 x_t$, we obtain
\[ x_t = (k_1 - k_2 f_0) x_{t-1} + z_{t-1} = \hat k x_{t-1} + z_{t-1} \quad \forall t = 1, 2, \dots. \]
By (50) and Assumption 5.1, we can check that $|\hat k| < 1$. By an induction procedure, for all t = 1, 2, . . .,
\[ x_t = \hat k^t x_0 + \sum_{j=0}^{t-1} \hat k^j z_{t-1-j}. \]
From this relation, we obtain
\[ E_x^{f_*}(x_t) = \hat k^t x \tag{60} \]
and
\[ E_x^{f_*}(x_t^2) = \hat k^{2t} x^2 + \sigma^2 (1 - \hat k^{2t})/(1 - \hat k^2). \tag{61} \]
The relations (60) and (61) imply the limits
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} E_x^{f_*}(x_t) = 0 \quad \text{and} \quad \lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} E_x^{f_*}(x_t^2) = \sigma^2/(1 - \hat k^2). \tag{62} \]
Hence, by (59) and (62) we obtain
\[ \sigma_*^2 = \lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} E_x^{f_*} \Lambda_{f_*}(x_t) = v_0^2 \Big[ \frac{5\hat k^2 - 1}{1 - \hat k^2} \sigma^4 + \int_R z^4 g(z)\,dz \Big] \ge 0. \tag{63} \]
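The passage from (59)-(62) to (63) can be reproduced numerically. The following sketch assumes uniform noise on $[-\hat s, \hat s]$, for which $\sigma^2 = \hat s^2/3$, B = 0 and $D = \hat s^4/5$; the values of `v0`, `k_hat` and `s_hat` are hypothetical. It compares the Cesàro average of (59), computed from the closed forms (60) and (61), with the right-hand side of (63).

```python
def second_moment(k_hat, sigma2, x, t):
    """E x_t^2 from the closed form (61)."""
    return k_hat ** (2 * t) * x ** 2 + sigma2 * (1 - k_hat ** (2 * t)) / (1 - k_hat ** 2)


def sigma_star2(v0, k_hat, sigma2, D):
    """Optimal variance from (63)."""
    return v0 ** 2 * ((5 * k_hat ** 2 - 1) / (1 - k_hat ** 2) * sigma2 ** 2 + D)


# Hypothetical values; the noise is assumed uniform on [-s_hat, s_hat].
v0, k_hat, s_hat = 1.0832, 0.192, 0.25
sigma2 = s_hat ** 2 / 3          # E z^2
B = 0.0                          # E z^3 (symmetric density)
D = s_hat ** 4 / 5               # E z^4


def mean_Lambda(x, t):
    """E Lambda_{f*}(x_t) as in (59), using (60) and (61)."""
    m1 = k_hat ** t * x
    m2 = second_moment(k_hat, sigma2, x, t)
    return v0 ** 2 * (4 * k_hat ** 2 * sigma2 * m2 + 4 * k_hat * B * m1 + D - sigma2 ** 2)


n, x = 100000, 1.0
cesaro = sum(mean_Lambda(x, t) for t in range(n)) / n
limit = sigma_star2(v0, k_hat, sigma2, D)
```

In this example the two quantities agree to within O(1/n) and the limit is positive, consistent with (63).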

Finally, by Theorem 3.7 and considering (57), we obtain that for every initial state x ∈ X, as n → ∞, the normalized cost
\[ \frac{\sum_{t=0}^{n-1} C_{f_*}(x_t) - n v_0 \sigma^2}{\sqrt{n}} \]
has an asymptotic normal distribution $N(0, \sigma_*^2)$ with $\sigma_*^2$ as in (63). By (5), we obtain $v_0 (1 - \hat k^2) = c_1 + c_2 f_0^2$. Hence, $C_{f_*}(x) = (c_1 + c_2 f_0^2) x^2 = v_0 (1 - \hat k^2) x^2$ for all x. This implies that for every initial state x, as n → ∞,
\[ \frac{\sum_{t=0}^{n-1} x_t^2 - n \sigma^2/(1 - \hat k^2)}{\sqrt{n}} \]
has an asymptotic normal distribution $N(0, s^2)$, where
\[ s^2 = \Big[ \frac{5\hat k^2 - 1}{1 - \hat k^2} \sigma^4 + \int_R z^4 g(z)\,dz \Big] \Big/ (1 - \hat k^2)^2. \]
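The last statement lends itself to a quick Monte Carlo illustration. The sketch below simulates the closed-loop system $x_t = \hat k x_{t-1} + z_{t-1}$ with uniform noise on $[-\hat s, \hat s]$ and compares the empirical variance of the normalized sum of squares with $s^2$. The values of `k_hat` and `s_hat`, the sample sizes and the random seed are all hypothetical choices, and the agreement is only approximate because of sampling error.

```python
import math
import random


def simulate_normalized_sum(k_hat, s_hat, n, rng, x0=0.0):
    """One sample of (sum_{t<n} x_t^2 - n sigma^2 / (1 - k_hat^2)) / sqrt(n)."""
    sigma2 = s_hat ** 2 / 3
    x, total = x0, 0.0
    for _ in range(n):
        total += x * x
        x = k_hat * x + rng.uniform(-s_hat, s_hat)   # closed-loop dynamics
    return (total - n * sigma2 / (1 - k_hat ** 2)) / math.sqrt(n)


# Hypothetical closed-loop coefficient and noise half-width.
k_hat, s_hat = 0.192, 0.25
sigma2, D = s_hat ** 2 / 3, s_hat ** 4 / 5           # E z^2 and E z^4 for uniform noise
s2 = ((5 * k_hat ** 2 - 1) / (1 - k_hat ** 2) * sigma2 ** 2 + D) / (1 - k_hat ** 2) ** 2

rng = random.Random(0)
samples = [simulate_normalized_sum(k_hat, s_hat, 1000, rng) for _ in range(500)]
mean = sum(samples) / len(samples)
var = sum((y - mean) ** 2 for y in samples) / (len(samples) - 1)
```

With a few hundred replications the empirical mean is close to zero and the empirical variance is within sampling error of `s2`; a histogram of `samples` would be the natural next step for inspecting the normal shape.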

Acknowledgement

The author wishes to thank Professor Onésimo Hernández-Lerma for his valuable comments and suggestions.

Armando F. Mendoza-Pérez
Universidad Politécnica de Chiapas,
Calle Eduardo J. Selvas S/N,
Tuxtla Gutiérrez, Chiapas.
mepa680127@hotmail.com

References
[1] Gordienko E. and Hernández-Lerma O., Average cost Markov control processes with weighted norms: existence of canonical policies, Appl. Math. (Warsaw), 23 (1995), 199-218.
[2] Hernández-Lerma O. and Lasserre J.B., Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, New York, (1996).
[3] Hernández-Lerma O. and Lasserre J.B., Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, (1999).
[4] Hernández-Lerma O. and Vega-Amaya O., Infinite-horizon Markov control processes with undiscounted cost criteria: From average to overtaking optimality, Appl. Math. (Warsaw), 25 (1998), 153-178.
[5] Hernández-Lerma O., Vega-Amaya O. and Carrasco G., Sample-path optimality and variance-minimization of average cost Markov control processes, SIAM J. Control Optim., 38(1) (1999), 79-93.
[6] Hilgert N. and Hernández-Lerma O., Bias optimality versus strong 0-discount optimality in Markov control processes with unbounded costs, Acta Appl. Math., 77 (2003), 215-235.
[7] Mandl P., On the variance in controlled Markov chains, Kybernetika (Prague), 7 (1971), 1-12.
[8] Mandl P., On the asymptotic normality of the reward in a controlled Markov chain, Colloquia Mathematica Societatis János Bolyai, 9. European Meeting of Statisticians, Budapest (Hungary), (1972).
[9] Mandl P., A connection between controlled Markov chains and martingales, Kybernetika (Prague), 9 (1973), 237-241.
[10] Mandl P., Estimation and control in Markov chains, Adv. Appl. Probab., 6 (1974), 40-60.
[11] Prieto-Rumeau T. and Hernández-Lerma O., Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains, to appear in Math. Meth. Oper. Res.
[12] Puterman M.L., Markov Decision Processes, Wiley, New York, (1994).
[13] Vega-Amaya O., Markov control processes in Borel spaces: Undiscounted criteria, Doctoral thesis, UAM-Iztapalapa, México, 1998 (in Spanish).
[14] Yushkevich A.A., On a class of strategies in general Markov decision models, Theory Probab. Appl., 18 (1973), 777-779.
[15] Zhu Q.X. and Guo X.P., Markov decision processes with variance minimization: A new condition and approach, Stoch. Anal. Appl., 25 (2007), 577-592.


Morfismos, Vol. 12, No. 2, 2008

Errata

By an involuntary error, formula (14) was omitted at the bottom of page 11 in the December 2005 printed issue of Morfismos (Vol. 9, No. 2). The last two lines on that page should have been:

... B = 1.10555. He used this to show that, for x large,
\[ 0.89 \frac{x}{\log x} < \pi(x) < 1.11 \frac{x}{\log x} \tag{14} \]




Morfismos, Comunicaciones Estudiantiles del Departamento de Matemáticas del CINVESTAV, was printed in March 2009 at the reproduction workshop of the same department, located at Av. IPN 2508, Col. San Pedro Zacatenco, México, D.F. 07300. The print run consists of 500 copies on imported 36-kilogram opaline paper, 34 × 25.5 cm, with green Tintoreto covers.

Technical support: Omar Hernández Orozco.


Contents

The vanishing discount approach to average reward optimality: the strongly and the weakly continuous cases
Tomás Prieto-Rumeau and Onésimo Hernández-Lerma . . . 1

Vértices simpliciales y escalonabilidad de grafos
Roberto Cruz y Mario Estrada . . . 17

Asymptotic normality of average cost Markov control processes
Armando F. Mendoza-Pérez . . . 33

Errata . . . 53

