Morfismos, Vol 12, No 2, 2008 by Morfismos, Department of Mathematics, Cinvestav

VOLUME 12, NUMBER 2, JULY TO DECEMBER 2008. ISSN: 1870-6525

Morfismos. Comunicaciones Estudiantiles. Departamento de Matemáticas, Cinvestav

Managing Editors • Isidoro Gitler • Jesús González

Editorial Board • Luis Carrera • Samuel Gitler • Onésimo Hernández-Lerma • Hector Jasso Fuentes • Miguel Maldonado • Raúl Quiroga Barranco • Enrique Ramírez de Arellano • Enrique Reyes • Armando Sánchez • Martín Solis • Leticia Zárate

Associate Editors • Ricardo Berlanga • Emilio Lluis Puebla • Isaías López • Guillermo Pastor • Víctor Pérez Abreu • Carlos Prieto • Carlos Rentería • Luis Verde

Technical Secretaries • Roxana Martínez • Laura Valencia

ISSN: 1870-6525. Morfismos can be consulted electronically under “Revista Morfismos” at http://www.math.cinvestav.mx. For further information, call 57 47 38 71. All correspondence should be addressed to Mrs. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or sent by e-mail to laura@math.cinvestav.mx.


Information for Authors

The Editorial Board of Morfismos, Comunicaciones Estudiantiles del Departamento de Matemáticas del CINVESTAV, invites undergraduate and graduate students to submit articles for publication in this journal under the following guidelines:

• All articles will be sent to specialists for refereeing. Nevertheless, articles will be regarded only as preliminary versions and may therefore be published in other specialized journals.

• The author's name must be accompanied by his or her academic level and the institution where he or she studies or works.

• The article must begin with an abstract that states, briefly and concisely, the main result being communicated.

• It is recommended that submitted articles be written in LaTeX and sent electronically. Interested authors can obtain the LaTeX 2ε format used by Morfismos under “Revista Morfismos” at the web address http://www.math.cinvestav.mx, or directly from the Departamento de Matemáticas del CINVESTAV. Using this format will help speed up publication of the article.

• If the article contains illustrations or figures, these must be presented in a form suited to the reproduction quality of Morfismos.

• Authors will receive a total of 15 offprints for each published article.

• Articles should be addressed to Mrs. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or to the e-mail address laura@math.cinvestav.mx

Author Information

Morfismos, the student journal of the Mathematics Department of the Cinvestav, invites undergraduate and graduate students to submit manuscripts to be published under the following guidelines:

• All manuscripts will be refereed by specialists. However, accepted papers will be considered to be “preliminary versions” in that authors may republish their papers in other journals, in the same or similar form.

• In addition to his/her affiliation, the author must state his/her academic status (student, professor,...).

• Each manuscript should begin with an abstract summarizing the main results.

• Morfismos encourages electronically submitted manuscripts prepared in LaTeX. Authors may retrieve the LaTeX 2ε macros used for Morfismos through the web site http://www.math.cinvestav.mx, at “Revista Morfismos”, or by direct request to the Mathematics Department of Cinvestav. The use of these macros will help in the production process and also to minimize publishing costs.

• All illustrations must be of professional quality.

• 15 offprints of each article will be provided free of charge.

• Manuscripts submitted for publication should be sent to Mrs. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or to the e-mail address: laura@math.cinvestav.mx

Editorial Guidelines

“Morfismos” is the semiannual journal of the students of the Departamento de Matemáticas del CINVESTAV. One of its main objectives is for students to gain experience in writing up mathematical results. Publication is not restricted to CINVESTAV students; we also wish to encourage the participation of students in Mexico and abroad, as well as invited contributions from researchers. Reports on mathematical research and summaries of bachelor's, master's or doctoral theses may be published in Morfismos. The articles that appear will be original, either in their results or in their methods. To judge this, the Editorial Board will appoint referees of recognized standing with experience in the clear communication of mathematical ideas and concepts. Although Morfismos is a refereed journal, papers will be regarded as preliminary versions that may later be published in other specialized journals. If you have any suggestion about the journal, let the editors know and we will gladly consider implementing it. We hope that this publication will foster, as a first experience, the development of a correct style of writing mathematics.

Morfismos

Editorial Guidelines “Morfismos” is the journal of the students of the Mathematics Department of CINVESTAV. One of its main objectives is for students to acquire experience in writing mathematics. Morfismos appears twice a year. Publication of papers is not restricted to students of CINVESTAV; we want to encourage students in Mexico and abroad to submit papers. Mathematics research reports or summaries of bachelor, master and Ph.D. theses will be considered for publication, as well as invited contributed papers by researchers. Papers submitted should be original, either in the results or in the methods. The Editors will assign as referees well–established mathematicians. Even though Morfismos is a refereed journal, the papers will be considered as preliminary versions which could later appear in other mathematical journals. If you have any suggestions about the journal, let the Editors know and we will gladly study the possibility of implementing them. We expect this journal to foster, as a preliminary experience, the development of a correct style of writing mathematics.

Morfismos

Contents

The vanishing discount approach to average reward optimality: the strongly and the weakly continuous cases. Tomás Prieto-Rumeau and Onésimo Hernández-Lerma

Vértices simpliciales y escalonabilidad de grafos (Simplicial vertices and shellability of graphs). Roberto Cruz and Mario Estrada

Asymptotic normality of average cost Markov control processes. Armando F. Mendoza-Pérez

Errata

Morfismos, Vol. 12, No. 2, 2008, pp. 1–15

The vanishing discount approach to average reward optimality: the strongly and the weakly continuous cases ∗

Tomás Prieto-Rumeau

Onésimo Hernández-Lerma

Abstract We consider a discrete-time stochastic dynamic programming model and we propose conditions under which the limit of discount optimal policies, as the discount factor converges to one, is average optimal. We prove this result under strong and weak continuity conditions and, moreover, we relax the usual value boundedness condition on the relative values of the optimal discounted reward.

2000 Mathematics Subject Classification: 93E20, 90C40. Keywords and phrases: dynamic programming, vanishing discount, average optimality.

Introduction

The basic problem dealt with in this paper is the existence of control policies $\pi$ that maximize the long-run expected average reward

\[ v(x,\pi) := \liminf_{T\to\infty} E^{\pi}_{x}\left[\frac{1}{T}\sum_{t=0}^{T-1} r(x_t,\pi(x_t))\right] \tag{1} \]

for every initial state $x_0 = x$. (The underlying controlled system is a fairly general discrete-time stochastic control process described in Section 2; see (6).) Among the several known techniques to analyze this problem, the most common is the vanishing discount approach, which can be traced back to Taylor [16]. It is so named because it is based on the convergence as $\rho \uparrow 1$ ($0 < \rho < 1$) of $\rho$-discount optimal policies. To state this more precisely, we need some notation. For each discount factor $\rho \in (0,1)$, let

\[ v_\rho(x,\pi) := E^{\pi}_{x}\left[\sum_{t=0}^{\infty} \rho^t\, r(x_t,\pi(x_t))\right] \tag{2} \]

be the expected discounted reward of the admissible control policy $\pi \in \Pi$ (see Section 2) when the initial state is $x_0 = x$. The optimal $\rho$-discounted reward function is defined as

\[ v_\rho(x) := \sup_{\pi\in\Pi} v_\rho(x,\pi) \tag{3} \]

for every state $x$. For a given fixed state $x'$, consider the relative value function

\[ u_\rho(x) := v_\rho(x) - v_\rho(x'). \]

This function is one of the key tools in the vanishing discount approach. To obtain the convergence of $\rho$-discount optimal policies to average optimal policies as $\rho \uparrow 1$, it was assumed in [16] that $u_\rho$ was uniformly bounded, that is, that there exists a constant $L$ such that $|u_\rho(x)| \le L$ for every state $x$ and $0 < \rho < 1$. This condition was later relaxed to the following weaker value boundedness condition: there exist a constant $L$ and a function $m$ such that

\[ -m(x) \le u_\rho(x) \le L \tag{4} \]

for every state $x$ and $0 < \rho < 1$; see, e.g., [2, Assumption A1], [5, Assumption 4.1], [12, Definition 2.1] or [15]. In this paper, we further relax (4) and assume the existence of a function $m$ (satisfying appropriate hypotheses) such that

\[ -m(x) \le u_\rho(x) \le m(x) \tag{5} \]

for every $x$ and $0 < \rho < 1$. Such a condition can also be found in, e.g., [3, Lemma 4.5], [4, Assumption 3.3] or [7, Lemma 10.4.2].

Relaxing (4) to (5) is relevant because (4) is a fairly restrictive condition. For instance, to obtain (4), it is assumed in [12] that the reward function $r$ is bounded. Moreover, condition (4) excludes the case of an unbounded utility function (see the comment after Assumption 5.3 in [12, p. 1423]). Also, in Section 4 of this paper, we describe a control model for which (5) holds, whereas (4) does not.

Summarizing, the goal of this paper is to give conditions on the controlled process that, together with condition (5), ensure that the limit of $\rho$-discount optimal policies, as $\rho \uparrow 1$, is average optimal. The basic control model is described in Section 2. In Section 3, we consider two different sets of hypotheses, namely, strong and weak continuity conditions, depending on the corresponding strong or weak continuity of the control system's transition function. Also in Section 3, we state our main results: Theorem 3.10 and Corollary 3.12, in which we mention several particular cases of interest. Finally, we present an example in Section 4, and our conclusions are stated in Section 5.

∗ This research was partially supported by CONACyT Grant 45693-F.
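The vanishing discount idea can be tried out on a toy model. The sketch below uses a hypothetical two-state, two-action model (the numbers in `P1` and `R` are illustrative assumptions, not taken from the paper). For each discount factor it computes the optimal discounted value by enumerating the four stationary policies, then prints the maximizing policy, the normalized value $(1-\rho)v_\rho(x')$ and the relative value $u_\rho$ with reference state $x' = 0$.

```python
from itertools import product

# Hypothetical two-state, two-action model (illustrative numbers, not from the paper).
# P1[a][x]: probability of jumping to state 1 from state x under action a.
P1 = {0: [0.2, 0.6], 1: [0.7, 0.3]}
# R[a][x]: one-step reward in state x under action a.
R = {0: [1.0, 0.0], 1: [0.5, 2.0]}
# A stationary policy is a pair (action in state 0, action in state 1).
POLICIES = list(product((0, 1), repeat=2))


def chain(policy):
    """Jump-to-1 probabilities and rewards induced by a stationary policy."""
    p = [P1[policy[x]][x] for x in (0, 1)]
    r = [R[policy[x]][x] for x in (0, 1)]
    return p, r


def discounted_value(policy, rho):
    """Solve (I - rho * P_pi) v = r_pi for the 2x2 case by Cramer's rule."""
    p, r = chain(policy)
    a11, a12 = 1 - rho * (1 - p[0]), -rho * p[0]
    a21, a22 = -rho * (1 - p[1]), 1 - rho * p[1]
    det = a11 * a22 - a12 * a21
    return [(r[0] * a22 - a12 * r[1]) / det, (a11 * r[1] - a21 * r[0]) / det]


def average_reward(policy):
    """Long-run average reward via the stationary distribution of the chain."""
    p, r = chain(policy)
    mu1 = p[0] / (p[0] + 1 - p[1])  # stationary mass of state 1
    return (1 - mu1) * r[0] + mu1 * r[1]


def optimal_discounted(rho):
    """Enumerate the stationary policies; return (v_rho, maximizing policy)."""
    best = max(POLICIES, key=lambda pi: sum(discounted_value(pi, rho)))
    return discounted_value(best, rho), best


for rho in (0.5, 0.9, 0.99, 0.999):
    v, pi = optimal_discounted(rho)
    u = v[1] - v[0]  # relative value u_rho(1) with reference state x' = 0
    print(rho, pi, round((1 - rho) * v[0], 4), round(u, 4))
print("best average reward:", max(average_reward(pi) for pi in POLICIES))
```

In this finite toy model the relative value stays bounded, so it does not exercise the unbounded case that motivates condition (5); it only illustrates the mechanics of letting $\rho \uparrow 1$.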

The control model

The formulation of the controlled process and the notation are mainly drawn from [12]. We assume that the state space $S$ and the action set $A$ are Borel spaces (that is, measurable subsets of complete and separable metric spaces). Let $\Gamma$ be a set-valued function from $S$ to $A$ with nonempty values. For each $x \in S$, the corresponding set of feasible control actions is $\Gamma(x) \subseteq A$. The family of feasible state-action pairs is denoted by $K$, i.e.,

\[ K := \{(x,a) \in S \times A : a \in \Gamma(x)\}, \]

which is assumed to be a measurable subset of $S \times A$. (In this paper, measurability always refers to the Borel $\sigma$-algebra.)

We consider a sequence $\{\xi_t\}_{t\ge 0}$ of i.i.d. random variables from a given probability space $(\Omega, \mathcal{F}, P)$ to $(Z, \mathcal{Z})$ with common distribution $\nu$. Let $h : K \times Z \to S$ be a measurable function. We assume that the state of the system is updated according to the function $h$, meaning that if the action $a \in \Gamma(x)$ is chosen at $x \in S$ and the value of the random perturbation is $\xi$, then the next state of the system is $h(x,a,\xi) \in S$. We suppose that the reward function is a measurable real-valued mapping $r : K \to \mathbb{R}$.

Let $\Pi$ be the family of measurable functions $\pi : S \to A$ such that $\pi(x) \in \Gamma(x)$ for every $x \in S$. (We suppose that $\Pi$ is nonempty.) We call $\pi \in \Pi$ a deterministic stationary policy. For each $\pi \in \Pi$ and every initial state $x_0 \in S$ independent of $\{\xi_t\}_{t\ge 0}$,

\[ x_{t+1} = h(x_t, \pi(x_t), \xi_t) \quad \text{for } t = 0, 1, \ldots \tag{6} \]

is a Markov process, and it stands for the state of the system under the policy $\pi$. The corresponding expectation operator is denoted by $E^{\pi}_{x_0}$. Although larger classes of policies may be considered, it is well known that, for the control problem we are dealing with, $\Pi$ is a “sufficient” class of policies; see [6, Chapter 4] or [7, Chapter 8], for instance.

Given an admissible policy $\pi \in \Pi$ and an initial state $x \in S$, the corresponding long-run average reward and expected discounted reward are defined as in (1) and (2), respectively. Given a discount factor $0 < \rho < 1$, we say that $\pi \in \Pi$ is $\rho$-discount optimal if $v_\rho(x,\pi) = v_\rho(x)$ for every $x \in S$ (recall (3)). Similarly, $\pi^* \in \Pi$ is average reward optimal if

\[ v(x,\pi^*) = \sup_{\pi\in\Pi} v(x,\pi) \quad \forall\, x \in S. \]
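The recursion (6) and the average reward (1) can be estimated by simulation. The sketch below is a minimal Monte Carlo estimator. The function name `simulate_average_reward` and the linear model used to exercise it ($h(x,a,\xi) = 0.5x + a + \xi$, $r(x,a) = -(x^2 + a^2)$, standard normal noise, policy $\pi(x) = -0.5x$) are hypothetical choices made only for illustration. Under this policy $x_{t+1} = \xi_t$, so the estimate should be close to $-1.25\,E[\xi^2] = -1.25$.

```python
import random


def simulate_average_reward(h, r, policy, sample_noise, x0, T):
    """Run x_{t+1} = h(x_t, pi(x_t), xi_t) for T steps; return the average reward."""
    x, total = x0, 0.0
    for _ in range(T):
        a = policy(x)
        total += r(x, a)
        x = h(x, a, sample_noise())
    return total / T


# Hypothetical additive-noise model, used only to exercise the simulator.
def h(x, a, xi):
    return 0.5 * x + a + xi


def r(x, a):
    return -(x * x + a * a)


def policy(x):
    return -0.5 * x  # cancels the drift, so the next state equals the noise


rng = random.Random(0)  # fixed seed for reproducibility
estimate = simulate_average_reward(
    h, r, policy, lambda: rng.gauss(0.0, 1.0), x0=0.0, T=100_000
)
print(estimate)
```

A single sample path is used here; in an ergodic model the time average along one path approximates the expected average in (1), but that is a property of the model, not of the simulator.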

Main results

As already mentioned, we will consider two different sets of hypotheses, which we label as strong and weak continuity assumptions.

The strongly continuous case

We state the assumptions we make on our control model. First, we have the following Lyapunov-like condition.

Assumption 3.1 There exist a measurable function $w : S \to [1,\infty)$ and constants $0 < \beta < 1$ and $b > 0$ such that

\[ \int_Z w(h(x,a,\xi))\,\nu(d\xi) \le \beta w(x) + b \quad \forall\, (x,a) \in K. \]

The next assumption introduces some usual continuity and compactness requirements. We note that the function $w$ in Assumptions 3.2 and 3.4 is taken from Assumption 3.1.

Assumption 3.2
(i) For every $x \in S$, the set $\Gamma(x)$ is compact.
(ii) The reward function $r(x,a)$ is upper semicontinuous on $\Gamma(x)$ for every $x \in S$. In addition, there exists a constant $M$ such that $|r(x,a)| \le M w(x)$ for all $(x,a) \in K$.
(iii) The function $(x,a) \mapsto \int_Z w(h(x,a,\xi))\,\nu(d\xi)$ is continuous on $\Gamma(x)$ for every $x \in S$.
(iv) Strong continuity. For every bounded and measurable $\zeta : S \to \mathbb{R}$, the function $(x,a) \mapsto \int_Z \zeta(h(x,a,\xi))\,\nu(d\xi)$ is continuous on $\Gamma(x)$ for every $x \in S$.

Remark 3.3 (The additive-noise case) The strong continuity condition is satisfied, for instance, when $S = Z = \mathbb{R}$, $h(x,a,\xi) = g(x,a) + \xi$, where $g$ is continuous on $\Gamma(x)$ for each fixed $x \in S$, and, in addition, $\nu$ has an almost everywhere continuous bounded density with respect to the Lebesgue measure. This includes, of course, the linear case in which $g(x,a) = k_1 x + k_2 a$ for some constants $k_1, k_2$.

Finally, we state the value boundedness condition.

Assumption 3.4 There exist a state $x' \in S$ and a constant $M' > 0$ such that

\[ \sup_{0<\rho<1} |v_\rho(x) - v_\rho(x')| \le M' w(x) \quad \forall\, x \in S. \]

The weakly continuous case

Among the hypotheses made so far on the control model, the most restrictive one is the strong continuity condition in Assumption 3.2(iv). Under additional appropriate conditions, strong continuity can be relaxed to weak continuity. To this end, first, the “measurability” of $w$ in Assumption 3.1 is replaced with “continuity”.

Assumption 3.5 There exist a continuous function $w : S \to [1,\infty)$ and constants $0 < \beta < 1$ and $b > 0$ such that

\[ \int_Z w(h(x,a,\xi))\,\nu(d\xi) \le \beta w(x) + b \quad \forall\, (x,a) \in K. \]

In Assumptions 3.6 and 3.8 below, the function $w$ is taken from Assumption 3.5.

Assumption 3.6
(i) The function $\Gamma : S \to 2^A$ is upper semicontinuous and compact-valued.
(ii) The reward function $r$ is upper semicontinuous on $K$ and, moreover, there exists a constant $M > 0$ such that $|r(x,a)| \le M w(x)$ for all $(x,a) \in K$.
(iii) The function $(x,a) \mapsto \int_Z w(h(x,a,\xi))\,\nu(d\xi)$ is continuous on $K$.
(iv) Weak continuity. The function $(x,a) \mapsto \int_Z \zeta(h(x,a,\xi))\,\nu(d\xi)$ is continuous on $K$ for every bounded and continuous $\zeta : S \to \mathbb{R}$.

Remark 3.7 The weak continuity assumption is satisfied, for instance, if the function $h(x,a,\xi)$ is continuous on $K$ for each $\xi \in Z$.

We introduce some notation. Let $B_w(S)$ be the family of measurable functions $\zeta : S \to \mathbb{R}$ with finite $w$-norm, that is,

\[ \|\zeta\|_w := \sup_{x\in S} \{|\zeta(x)|/w(x)\} < \infty. \]

Assumption 3.8 The controlled process is $w$-uniformly ergodic on $\Pi$; that is, for each $\pi \in \Pi$, the Markov process (6) has a unique invariant probability measure $\mu_\pi$ on $S$ and, in addition, there exist constants $R > 0$ and $0 < \alpha < 1$ such that, for every $x \in S$, $\zeta \in B_w(S)$ and $t \ge 0$,

\[ \sup_{\pi\in\Pi} \left| E^{\pi}_{x}[\zeta(x_t)] - \int_S \zeta(y)\,\mu_\pi(dy) \right| \le w(x)\,\|\zeta\|_w\, R\,\alpha^t. \]

In the weakly continuous case, we do not need to impose a value boundedness condition because, in fact, Assumption 3.8 implies Assumption 3.4 (the proof is easy; see, e.g., Lemma 4.5 in [3] or Lemma 10.4.2 in [7]). A sufficient condition for Assumption 3.8 is proposed in [7, Proposition 10.2.5].

In what follows, we will suppose that either Assumptions 3.1, 3.2 and 3.4 or Assumptions 3.5, 3.6 and 3.8 hold. In either case, we know from the results in [7, Chapter 8] that the optimal $\rho$-discounted reward $v_\rho$ is the unique solution in $B_w(S)$ of the discounted reward optimality equation

\[ v_\rho(x) = \max_{a\in\Gamma(x)} \left\{ r(x,a) + \rho \int_Z v_\rho(h(x,a,\xi))\,\nu(d\xi) \right\} \quad \forall\, x \in S. \tag{7} \]

In addition, a policy $\pi^* \in \Pi$ is $\rho$-discount optimal if and only if $\pi^*(x)$ attains the maximum in (7) for every $x \in S$, i.e.,

\[ v_\rho(x) = r(x,\pi^*(x)) + \rho \int_Z v_\rho(h(x,\pi^*(x),\xi))\,\nu(d\xi) \quad \forall\, x \in S. \tag{8} \]

The vanishing discount approach to average reward optimality is related to the following definition of limit and accumulation policies.

Definition 3.9 Given a policy $\pi^* \in \Pi$ and a sequence $\{\pi_k\}_{k\in\mathbb{N}}$ in $\Pi$, we say that
(i) $\{\pi_k\}_{k\in\mathbb{N}}$ converges to $\pi^*$ if $\lim_k \pi_k(x) = \pi^*(x)$ for every $x \in S$;
(ii) $\pi^*$ is an accumulation policy of $\{\pi_k\}_{k\in\mathbb{N}}$ if, for every $x \in S$, there exists a subsequence $\{k_x\}$ such that $\pi_{k_x}(x) \to \pi^*(x)$;
(iii) $\{\pi_k\}_{k\in\mathbb{N}}$ converges continuously to $\pi^*$ if $\lim_k \pi_k(x_k) = \pi^*(x)$ for every $x \in S$ and every sequence $x_k \to x$.

The concept of accumulation policy in Definition 3.9(ii) comes from [13]. Continuous convergence and its applications to stochastic dynamic programming are analyzed in [10].

Next, we prove our main result, which states the relation between average reward optimal policies and the limit of discount optimal policies. The proof of this result, Theorem 3.10, follows the same arguments needed to obtain the so-called average reward optimality inequality [7, Theorem 10.3.1], although the proof is focused on the analysis of the limit of discount optimal policies.

Theorem 3.10 Let $\{\rho_k\}_{k\in\mathbb{N}}$, with $\rho_k \uparrow 1$, be a sequence of discount factors, and let $\pi_k \in \Pi$, for every $k \in \mathbb{N}$, be a $\rho_k$-discount optimal policy. Then the following holds:
(i) If Assumptions 3.1, 3.2 and 3.4 are satisfied and $\{\pi_k\}$ converges to $\pi^* \in \Pi$, then $\pi^*$ is an average reward optimal policy;
(ii) If Assumptions 3.5, 3.6 and 3.8 are satisfied and $\{\pi_k\}$ converges continuously to $\pi^* \in \Pi$, then $\pi^*$ is an average reward optimal policy.

Proof: From Assumption 3.1 or 3.5, an induction argument (see, e.g., [7, Lemma 10.4.1]) gives

\[ E^{\pi}_{x}[w(x_t)] \le \beta^t w(x) + \frac{b\,(1-\beta^t)}{1-\beta} \quad \forall\, \pi \in \Pi,\ x \in S,\ t \ge 0. \tag{9} \]

Therefore, by Assumption 3.2(ii) or 3.6(ii), we have

\[ E^{\pi}_{x}|r(x_t,\pi(x_t))| \le M \beta^t w(x) + \frac{M b\,(1-\beta^t)}{1-\beta}, \]

so that $\sup_{\rho\in(0,1)} |(1-\rho)v_\rho(x')|$ is finite, with $x' \in S$ as in Assumption 3.4. Thus

\[ g := \liminf_{k\to\infty}\, (1-\rho_k)\, v_{\rho_k}(x') \]

is well defined. Our proof now proceeds in two steps. In step one, we prove that

\[ g \ge \sup_{\pi\in\Pi} v(x,\pi) \quad \forall\, x \in S. \]

In step two, we show that $\pi^*$ satisfies

\[ g \le v(x,\pi^*) \quad \forall\, x \in S. \]

Average reward optimality of $\pi^*$ will then follow.

Step one. By definition of $u_\rho$ (in Section 1), a simple calculation shows that the discounted reward optimality equation (7) can be written in the equivalent form

\[ (1-\rho)\,v_\rho(x') + u_\rho(x) = \max_{a\in\Gamma(x)} \left\{ r(x,a) + \rho \int_Z u_\rho(h(x,a,\xi))\,\nu(d\xi) \right\} \tag{10} \]

for every $x \in S$. Consider now a subsequence $\{k'\}$ of $\{k\}$ such that

\[ \lim_{k'\to\infty} (1-\rho_{k'})\,v_{\rho_{k'}}(x') = g. \]

Let $u := \liminf_{k'} u_{\rho_{k'}}$ and note that $u$ is in $B_w(S)$. Now, by (10), for the sequence $\{\rho_{k'}\}$ and every $(x,a) \in K$ we have

\[ (1-\rho_{k'})\,v_{\rho_{k'}}(x') + u_{\rho_{k'}}(x) \ge r(x,a) + \rho_{k'} \int_Z u_{\rho_{k'}}(h(x,a,\xi))\,\nu(d\xi). \]

Taking the $\liminf_{k'\to\infty}$ in this inequality and using Fatou's lemma (which indeed applies as a consequence of our assumptions), we obtain

\[ g + u(x) \ge r(x,a) + \int_Z u(h(x,a,\xi))\,\nu(d\xi) \quad \forall\, (x,a) \in K. \tag{11} \]

Iteration of (11) yields that, for every initial state $x \in S$, any policy $\pi \in \Pi$ and $t \ge 0$,

\[ g \ge E^{\pi}_{x}[r(x_t,\pi(x_t))] + E^{\pi}_{x}[u(x_{t+1}) - u(x_t)]. \]

Summing up these inequalities for $t = 0, \ldots, T-1$ and then dividing by $T$ yields

\[ g \ge E^{\pi}_{x}\left[\frac{1}{T}\sum_{t=0}^{T-1} r(x_t,\pi(x_t))\right] + \frac{E^{\pi}_{x}[u(x_T)] - u(x)}{T}. \]

Letting $T \to \infty$, recalling that $u \in B_w(S)$ and using (9), we obtain $g \ge v(x,\pi)$ and, therefore,

\[ g \ge \sup_{\pi\in\Pi} v(x,\pi) \quad \forall\, x \in S. \tag{12} \]

This completes step one.

Step two. Since $\pi_k$ is a $\rho_k$-discount optimal policy, from (8) and (10) we have

\[ (1-\rho_k)\,v_{\rho_k}(x') + u_{\rho_k}(x) = r(x,\pi_k(x)) + \rho_k \int_Z u_{\rho_k}(h(x,\pi_k(x),\xi))\,\nu(d\xi) \]

for every $k \in \mathbb{N}$ and $x \in S$. Consequently, for every $\varepsilon > 0$ and large enough $k$, we have

\[ g - \varepsilon + u_{\rho_k}(x) \le r(x,\pi_k(x)) + \rho_k \int_Z u_{\rho_k}(h(x,\pi_k(x),\xi))\,\nu(d\xi) \tag{13} \]

for every $x \in S$.

Suppose now that Assumptions 3.1, 3.2 and 3.4 are satisfied. Then, taking the $\limsup$ in (13), recalling that $r(x,\cdot)$ is upper semicontinuous and using the extension of Fatou's lemma [7, Lemma 8.3.7], we obtain

\[ g - \varepsilon + u(x) \le r(x,\pi^*(x)) + \int_Z u(h(x,\pi^*(x),\xi))\,\nu(d\xi), \]

where $u := \limsup_k u_{\rho_k} \in B_w(S)$. But $\varepsilon > 0$ being arbitrary, the same arguments as in the proof of step one yield that

\[ g \le v(x,\pi^*) \quad \forall\, x \in S, \]

which combined with (12) shows that $\pi^*$ is an average reward optimal policy and, besides, that $g$ is the (constant) optimal average reward. This completes the proof of statement (i), that is, under the hypotheses in the strongly continuous case.

We now consider the weakly continuous case, which consists of Assumptions 3.5, 3.6 and 3.8. Following [8], we define the generalized $\limsup$ of the sequence $u_{\rho_k}$ as

\[ u^*(x) := \sup\left\{ \limsup_{k\to\infty} u_{\rho_k}(x_k) \right\}, \]

where the supremum is taken over the family of sequences $\{x_k\} \subseteq S$ such that $x_k \to x$. Let us now go back to (13) and take the $\limsup$ through a sequence $x_k \to x$ such that $\limsup_k u_{\rho_k}(x_k) \ge u^*(x) - \varepsilon$, so that

\[ g - 2\varepsilon + u^*(x) \le \limsup_{k\to\infty} r(x_k,\pi_k(x_k)) + \limsup_{k\to\infty} \int_Z u_{\rho_k}(h(x_k,\pi_k(x_k),\xi))\,\nu(d\xi). \]

Then we proceed as in the proof for the strongly continuous case, but this time we take into account that both $r$ and the multifunction $\Gamma$ are upper semicontinuous. Finally, we apply the Fatou lemma for a generalized $\limsup$ (see [8, Lemma 5] and also [14, Lemma 2.3]) to obtain

\[ g - 2\varepsilon + u^*(x) \le r(x,\pi^*(x)) + \int_Z u^*(h(x,\pi^*(x),\xi))\,\nu(d\xi) \quad \forall\, x \in S. \tag{14} \]

This implies, by standard arguments, that $v(x,\pi^*) \ge g$ for every $x \in S$. The proof of Theorem 3.10 is complete. □
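The bound (9) used at the start of the proof comes from iterating the drift inequality of Assumption 3.1. A minimal numerical sketch, assuming the worst case in which the inequality holds with equality at every step, is given below; the function names and the values of $\beta$, $b$ and $w(x)$ are hypothetical. It compares the iterated recursion $m_{s+1} = \beta m_s + b$, $m_0 = w(x)$, with the closed form $\beta^t w(x) + b(1-\beta^t)/(1-\beta)$.

```python
def lyapunov_bound(beta, b, w0, t):
    """Closed form beta^t * w(x) + b * (1 - beta^t) / (1 - beta)."""
    return beta**t * w0 + b * (1 - beta**t) / (1 - beta)


def iterate_drift(beta, b, w0, t):
    """Iterate m_{s+1} = beta * m_s + b from m_0 = w(x), t times."""
    m = w0
    for _ in range(t):
        m = beta * m + b
    return m


# Hypothetical constants, chosen only for illustration.
beta, b, w0 = 0.9, 2.0, 5.0
for t in (0, 1, 5, 50):
    print(t, iterate_drift(beta, b, w0, t), lyapunov_bound(beta, b, w0, t))
```

Since $\beta^t \le 1$, the closed form is bounded by $w(x) + b/(1-\beta)$ uniformly in $t$, which is the kind of uniform bound the proof relies on when letting $T \to \infty$.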


Remark 3.11 The second step in the proof of Theorem 3.10 relies on the application of a Fatou-like lemma. For instance, when the usual value boundedness condition holds, then we use the standard Fatou lemma because the relative value function $u_\rho$ is bounded above; see (4). Under the strong continuity assumptions, we use the Fatou lemma in [7, Lemma 8.3.7], while if the weak continuity conditions hold, then we use the Fatou lemma for a generalized $\limsup$ in [14, Lemma 2.3]. Therefore, the assumptions we make on the control model heavily depend on the hypotheses needed for the corresponding Fatou lemma and, similarly, the kind of results we reach (statements (i) and (ii) in Theorem 3.10) also depend on the kind of Fatou lemma that is applied.

We specialize Theorem 3.10 to the following important particular cases.

Corollary 3.12 Suppose that $\{\rho_k\}_{k\in\mathbb{N}}$ is a sequence of discount factors such that $\rho_k \uparrow 1$ and let $\pi_k \in \Pi$, for every $k \in \mathbb{N}$, be a $\rho_k$-discount optimal policy.
(i) Under the strong continuity conditions (Assumptions 3.1, 3.2 and 3.4), if for every $x \in S$ the function $\rho \mapsto u_\rho(x)$ is monotone (either increasing or decreasing), then any accumulation policy of $\{\pi_k\}_{k\in\mathbb{N}}$ is average reward optimal.
(ii) If the state space $S$ is denumerable, then under either the strong or the weak continuity conditions, any accumulation policy of $\{\pi_k\}$ is average reward optimal.

The condition in Corollary 3.12(i) can be interpreted as follows: the expected discounted reward grows faster for any $x \in S$ than for $x' \in S$ as $\rho \uparrow 1$, and it is satisfied, for instance, in the consumption-investment model in [6, Section 3.6]; see also [1].

An example

In this section we give an example of a control model that satisfies (5) but does not satisfy the value boundedness condition (4). The following inventory system with permitted backlog is based on the model analyzed in [17].

The state space and the action set are $S = A = \mathbb{R}$. The distribution $\nu$ is supported on $[0,\infty)$, it satisfies the conditions in Remark 3.3, and we assume that its expectation equals one. Furthermore, we suppose that there exists some $\delta > 0$ such that

\[ \int_0^{\infty} e^{\delta\xi}\,\nu(d\xi) < \infty. \]

(Note that, for instance, the mean one exponential distribution satisfies these hypotheses.) Fix a constant $K > 1/2$ and let

\[ 0 < \lambda < -\frac{1}{\delta} \log \int_0^{\infty} e^{-\delta\xi}\,\nu(d\xi). \]

The action sets $\Gamma(x)$ are the intervals

\[ [-x, \max\{-2x, -x+K\}] \ \text{ for } x \le 0 \quad \text{and} \quad [-x, \max\{\lambda, -x+K\}] \ \text{ for } x > 0. \]

The system's transition function $h$ is given by $h(x,a,\xi) = x + a - \xi$. The cost function is $c(x,a) = (x+a)^2 - a$ (cf. [17, Equation (3.1)]). Finally, let $w(x) = e^{\delta|x|}$ for $x \in \mathbb{R}$. This control model satisfies Assumptions 3.1 and 3.2.

Given a discount factor $0 < \rho < 1$, a direct calculation shows that the optimal $\rho$-discounted cost function (recall that we are minimizing a cost) is

\[ v_\rho(x) = x - \frac{(\rho+1)^2}{4(1-\rho)} \quad \forall\, x \in \mathbb{R}, \]

and the optimal $\rho$-discount policy is

\[ \pi_\rho(x) = -x + \frac{1}{2}(1-\rho) \quad \forall\, x \in \mathbb{R}. \]

Hence, the value boundedness condition (4) does not hold, whereas (5) (or Assumption 3.4) is satisfied. Moreover, for every $x \in \mathbb{R}$, $\pi_\rho(x)$ converges to $-x$ as $\rho \uparrow 1$. Therefore, by Theorem 3.10(i), the policy $\pi(x) = -x$, for $x \in \mathbb{R}$, is average cost optimal. Further, from the proof of Theorem 3.10 we also obtain that the minimal average cost is

\[ -1 = \lim_{\rho\uparrow 1} (1-\rho)\,v_\rho(x). \]
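The closed-form discounted cost in this example can be checked numerically against the optimality equation. The sketch below is a hedged illustration, not the authors' computation. It assumes $K = 1$ (any $K > 1/2$ would do) and restricts the search to the subinterval $[-x, -x+K]$ of $\Gamma(x)$, which contains the stated minimizer; because the objective is convex in $a$, this restriction should not change the minimum. Since $v_\rho$ is affine, the expectation over $\xi$ only needs $E[\xi] = 1$. The function names are hypothetical.

```python
import math


def v_closed_form(x, rho):
    """Optimal discounted cost stated in the example: x - (rho+1)^2 / (4(1-rho))."""
    return x - (rho + 1) ** 2 / (4 * (1 - rho))


def bellman_rhs(x, rho, K=1.0, grid=20001):
    """Grid minimum over a in [-x, -x+K] of c(x,a) + rho * E[v_rho(x + a - xi)]."""
    best = math.inf
    for i in range(grid):
        a = -x + K * i / (grid - 1)
        cost = (x + a) ** 2 - a
        # v_rho is affine, so E[v_rho(x + a - xi)] = v_rho(x + a - E[xi]) with E[xi] = 1
        continuation = v_closed_form(x + a - 1.0, rho)
        best = min(best, cost + rho * continuation)
    return best


for rho in (0.5, 0.9, 0.99):
    x = 3.0
    print(rho, bellman_rhs(x, rho), v_closed_form(x, rho))

# Normalized value (1 - rho) * v_rho(x) for rho close to one
print((1 - 0.9999) * v_closed_form(3.0, 0.9999))
```

The two printed columns should agree up to the grid resolution, and the last line should be close to the minimal average cost $-1$.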


Concluding remarks

In the previous sections, we have considered a fairly general discrete-time stochastic control model and, under two different sets of hypotheses (strong and weak continuity), we have proved that the limit of $\rho$-discount optimal policies, as the discount factor $\rho \uparrow 1$, is a long-run average reward optimal policy. The main contribution of this paper is to relax the usual value boundedness assumption (4) on the relative value function and, instead, assume the weaker condition (5). We have illustrated our results with the generalized inventory system in Section 4.

Some important issues, however, remain open. In Theorem 3.10(i) it is assumed that the discount optimal policies $\{\pi_k\}$ converge to some $\pi^*$, and then it is proved that $\pi^*$ is average reward optimal. It would be interesting to know whether this convergence can be relaxed, and thus obtain a result like that in Corollary 3.12(i) under general assumptions. To this end, results on the existence of measurable selectors would be involved. Also, it would be interesting to check whether the continuous convergence in Theorem 3.10(ii) can be relaxed to (usual) convergence, perhaps by strengthening the hypotheses on the control model.

Tomás Prieto-Rumeau, Departamento de Estadística, Facultad de Ciencias, UNED, Senda del Rey 9, 28040, Madrid, Spain, tprieto@ccia.uned.es

Onésimo Hernández-Lerma, Departamento de Matemáticas, CINVESTAV-IPN, 14-740, México D.F. 07000, México, ohernand@math.cinvestav.mx

References

[1] Cruz-Suárez H. D., A stochastic consumption-investment problem with unbounded utility function, Morfismos 4 (2000), 19–30.
[2] Dutta P. K., What do discounted optima converge to? A theory of discount rate asymptotics in economic models, J. Econom. Theory 55 (1991), 64–94.
[3] Gordienko E.; Hernández-Lerma O., Average cost Markov control processes with weighted norms: existence of canonical policies, Appl. Math. (Warsaw) 23 (1995), 199–218.
[4] Guo X. P.; Zhu Q. X., Average optimality for Markov decision processes in Borel spaces: a new condition and approach, J. Appl. Prob. 43 (2006), 318–334.
[5] Hernández-Lerma O.; Lasserre J. B., Average cost optimal policies for Markov control processes with Borel state space and unbounded costs, Systems Control Lett. 15 (1990), 349–356.
[6] Hernández-Lerma O.; Lasserre J. B., Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer, New York, 1996.
[7] Hernández-Lerma O.; Lasserre J. B., Further Topics on Discrete-Time Markov Control Processes, Springer, New York, 1999.
[8] Jaśkiewicz A.; Nowak A. S., On the optimality equation for average cost Markov control processes with Feller transition probabilities, J. Math. Anal. Appl. 316 (2006), 495–509.
[9] Kawaguchi K.; Morimoto H., Long-run average welfare in a pollution accumulation model, J. Econom. Dynam. Control 31 (2007), 703–720.
[10] Langen H. J., Convergence of dynamic programming models, Math. Oper. Res. 6 (1981), 493–512.
[11] Morimoto H.; Fujita Y., Ergodic control in stochastic manufacturing systems with constant demand, J. Math. Anal. Appl. 243 (2000), 228–248.
[12] Nishimura K.; Stachurski J., Stochastic optimal policies when the discount rate vanishes, J. Econom. Dynam. Control 31 (2007), 1416–1430.
[13] Schäl M., Conditions for optimality and for the limit of n-stage optimal policies to be optimal, Z. Wahrsch. verw. Geb. 32 (1975), 179–196.
[14] Schäl M., Average optimality in dynamic programming with general state space, Math. Oper. Res. 18 (1993), 163–172.
[15] Sennott L. I., A new condition for the existence of optimal stationary policies in average cost Markov decision processes, Oper. Res. Lett. 5 (1986), 17–23.
[16] Taylor H. M., Markovian sequential replacement processes, Ann. Math. Stat. 36 (1965), 1677–1694.
[17] Vega-Amaya O.; Montes-de-Oca R., Application of average dynamic programming to inventory systems, Math. Methods Oper. Res. 47 (1998), 451–471.

Morfismos, Vol. 12, No. 2, 2008, pp. 17–32

Vértices simpliciales y escalonabilidad de grafos (Simplicial vertices and shellability of graphs)

Roberto Cruz

Mario Estrada

Abstract Given a simple undirected graph $G$, one associates to it a simplicial complex $\Delta_G$ whose faces correspond to the independent sets of $G$. Van Tuyl and Villarreal defined a graph $G$ to be shellable if the associated simplicial complex $\Delta_G$ is shellable in the nonpure sense of Björner and Wachs. These authors proved that all chordal (triangulated) graphs are shellable and that the shellable bipartite graphs are precisely the sequentially Cohen-Macaulay bipartite graphs. In this paper we show that the notion of a simplicial vertex of a graph makes it possible not only to prove these results, but also to give other necessary and sufficient conditions for the shellability of a graph. We also prove that every simplicial graph is shellable and that every circular-arc graph containing at least one simplicial vertex is shellable.

2000 Mathematics Subject Classification: 13F55, 13D02, 05C38, 05C75. Keywords and phrases: shellable graphs, simplicial vertices, sequentially Cohen-Macaulay, simplicial graphs, circular-arc graphs.

Introduction

Let G = (V_G, E_G) be a simple (no loops or multiple edges) undirected graph, with vertex set V_G = {x_1, ..., x_n} and edge set E_G. Identifying each vertex x_i with the variable x_i of the polynomial ring R = k[x_1, ..., x_n] over the field k, one associates with G the squarefree monomial ideal I(G) = ({x_i x_j | {x_i, x_j} ∈ E_G}). The ideal I(G) is called the edge ideal of the graph G. Via the Stanley-Reisner correspondence, one associates with G the simplicial complex Δ_G such that I_{Δ_G} = I(G), that is, the Stanley-Reisner ideal of the simplicial complex coincides with the edge ideal of the graph.


In this case, the facets of Δ_G are the maximal independent (or stable) sets of G. The graph G is said to be shellable if its associated simplicial complex Δ_G is shellable. This definition was introduced by Van Tuyl and Villarreal [15]; it uses the notion of nonpure shellability introduced by Björner and Wachs [1]. For graphs, the natural generalization of the Cohen-Macaulay property is that of being sequentially Cohen-Macaulay. A theorem of Stanley [13] states that shellability implies the sequentially Cohen-Macaulay property. In the aforementioned work of Van Tuyl and Villarreal [15] the following theorems are proved:

Theorem 1.1.1 [15, Theorem 2.12] Let G be a chordal graph. Then G is shellable.

Theorem 1.1.2 [15, Theorem 3.8] Let G be a bipartite graph. Then G is shellable if and only if G is sequentially Cohen-Macaulay.

The central argument in the proof of Theorem 1.1.1 is the existence, in a chordal graph G, of a vertex x whose neighborhood induces a complete subgraph [15, Lemma 2.11]. On the other hand, the proof of Theorem 1.1.2 relies on the fact that every connected, bipartite, sequentially Cohen-Macaulay graph has a vertex of degree 1, and on the following statement, which gives necessary and sufficient conditions for the shellability of a graph containing a vertex of degree 1:

Theorem 1.1.3 [15, Theorem 2.9] Let G be a graph and let x_1, y_1 be two adjacent vertices of G with deg(x_1) = 1. Let G_1 = G \ ({x_1} ∪ N_G(x_1)) and G_2 = G \ ({y_1} ∪ N_G(y_1)). Then G is shellable if and only if G_1 and G_2 are shellable.

The notion of a simplicial vertex makes it possible to replace the hypotheses of the preceding theorem with the more general condition that the graph G contains a simplicial vertex. A vertex x of a graph G is called simplicial if its neighborhood N_G(x) induces a complete subgraph.

In Section 2 we prove the following theorem, which generalizes Theorem 1.1.3 of Van Tuyl and Villarreal.


Theorem 1.1.4 (Theorem 2.1.13) Let G be a graph, x_1 a simplicial vertex, N_G(x_1) = {x_2, ..., x_r} and G_i = G \ ({x_i} ∪ N_G(x_i)) for i = 1, ..., r. Then G is shellable if and only if G_i is shellable for every i = 1, ..., r.

This result is well suited to establishing the shellability of graphs that have at least one simplicial vertex. It also yields another proof of the theorem of Van Tuyl and Villarreal on the shellability of chordal graphs, and of the theorem by the same authors on the equivalence, for bipartite graphs, between shellability and the sequentially Cohen-Macaulay condition. The latter theorem extends to graphs containing at least one simplicial vertex.

Theorem 1.1.5 (Corollary 2.1.15) Let G be a graph containing a simplicial vertex. Then G is shellable if and only if it is sequentially Cohen-Macaulay.

In Section 3, multiplication of simplicial vertices is used to obtain new necessary and sufficient conditions for the shellability of a graph. Given a graph G and a simplicial vertex x of G, the graph G ∘ x is obtained by multiplication of the vertex x, that is, by adding a new vertex x′ joined to every vertex in the neighborhood of x. We prove the following:

Theorem 1.1.6 (Theorem 3.1.17) Let G be a graph, x a simplicial vertex of G and G ∘ x the graph obtained by multiplication of the vertex x. Then G is shellable if and only if G ∘ x is shellable.

In Section 4, the last one, we establish the shellability of simplicial graphs and of circular-arc graphs that have a simplicial vertex. In a simplicial graph every vertex is simplicial or adjacent to a simplicial vertex.

Theorem 1.1.7 (Theorem 4.1.24) Let G be a simplicial graph. Then G is shellable.

Finally, we prove the following theorem on the shellability of circular-arc graphs:

Theorem 1.1.8 (Theorem 4.1.30) Let G be a circular-arc graph that has at least one simplicial vertex. Then G is shellable.


Shellability of graphs containing simplicial vertices

In this section we generalize the theorem of Van Tuyl and Villarreal [15, Theorem 2.9] giving necessary and sufficient conditions for the shellability of a graph, replacing the hypothesis on the existence of a vertex of degree 1 by the existence of a simplicial vertex.

Definition 2.1.9 A simplicial complex Δ is said to be shellable if its facets can be ordered F_1, ..., F_s in such a way that for all 1 ≤ i < j ≤ s there exist a vertex v ∈ F_j \ F_i and an index l ∈ {1, ..., j − 1} such that F_j \ F_l = {v}. The sequence F_1, ..., F_s is called a shelling of Δ.

Here we use the definition of 'nonpure' shellability introduced by Björner and Wachs [1]. We say that Δ is pure shellable if, in addition, all its facets have the same dimension.

Definition 2.1.10 Let G be a simple undirected graph and Δ_G its associated simplicial complex. We say that G is a shellable graph if Δ_G is a shellable simplicial complex.

The preceding definition was introduced by Van Tuyl and Villarreal [15]. In that paper it is proved that every chordal graph is shellable [15, Theorem 2.12]. A graph is called chordal (or triangulated) if every cycle of length strictly greater than 3 has a chord, that is, an edge joining two non-consecutive vertices of the cycle. The proof relies on Dirac's lemma [4], which guarantees that every chordal graph has a vertex, called simplicial, whose neighborhood induces a complete subgraph, or clique. Given a subset S ⊂ V_G, we denote by G \ S the graph obtained from G by deleting all vertices of S and all edges incident to a vertex of S. If x is a vertex of G, we denote by N_G(x) the neighborhood of x, that is, the set of all vertices of G adjacent to x.

Definition 2.1.11 Let G be a simple undirected graph. A vertex x of G is called simplicial if its neighborhood N_G(x) induces a complete subgraph of G.
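Definitions 2.1.9 to 2.1.11 can be checked mechanically on small graphs. The following is a minimal brute-force sketch, not part of the original paper: the function names and the example graph (a path on four vertices) are illustrative assumptions, and the enumeration is exponential, so it is only meant for very small graphs.

```python
from itertools import combinations

def facets_of_independence_complex(vertices, edges):
    # Brute force: list all independent sets, keep the maximal ones.
    edge_set = {frozenset(e) for e in edges}
    indep = [frozenset(s)
             for k in range(len(vertices) + 1)
             for s in combinations(vertices, k)
             if all(frozenset(p) not in edge_set for p in combinations(s, 2))]
    return [s for s in indep if not any(s < t for t in indep)]

def is_shelling(facets):
    # Definition 2.1.9: for all i < j there must be a vertex v in F_j - F_i
    # and an index l < j such that F_j - F_l = {v}.
    for j in range(1, len(facets)):
        singles = set()
        for l in range(j):
            diff = facets[j] - facets[l]
            if len(diff) == 1:
                singles |= diff
        if any(not ((facets[j] - facets[i]) & singles) for i in range(j)):
            return False
    return True

def is_simplicial(x, vertices, edges):
    # Definition 2.1.11: the neighbors of x are pairwise adjacent.
    edge_set = {frozenset(e) for e in edges}
    nbrs = [v for v in vertices if frozenset((x, v)) in edge_set]
    return all(frozenset(p) in edge_set for p in combinations(nbrs, 2))

# Hypothetical example (not from the paper): the path a - b - c - d.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d")]

good = [frozenset("ac"), frozenset("ad"), frozenset("bd")]  # a shelling
bad = [frozenset("ac"), frozenset("bd"), frozenset("ad")]   # same facets, bad order
print(set(facets_of_independence_complex(V, E)) == set(good))
print(is_shelling(good), is_shelling(bad))
print(is_simplicial("a", V, E), is_simplicial("b", V, E))
```

The example illustrates that shellability is a property of an ordering: the same three facets form a shelling in one order and not in another.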
Given a graph G and S ⊂ V_G, we denote by ⟨S⟩ the subgraph induced by the vertex set S. Note that if x is a simplicial vertex of G, the induced subgraph ⟨{x} ∪ N_G(x)⟩ is a maximal clique; moreover, it is the unique maximal clique containing x. The following theorem of Dirac states that every chordal graph has a simplicial vertex.

Theorem 2.1.12 (Dirac, [4]) Every chordal graph G has a simplicial vertex. Moreover, if G is not a clique, then it has two nonadjacent simplicial vertices.

In Theorem 2.9 of [15] the vertex x_1, having degree 1, is a simplicial vertex, since this vertex together with its neighborhood induces a maximal complete subgraph which is, moreover, the only one containing x_1 (the edge {x_1, y_1}). This fact, together with the use of simplicial vertices in the proof of the shellability of chordal graphs, suggests the following generalization:

Theorem 2.1.13 Let G be a graph, x_1 a simplicial vertex of G with neighborhood N_G(x_1) = {x_2, ..., x_r}. Let G_i = G \ ({x_i} ∪ N_G(x_i)) for i = 1, ..., r. Then G is shellable if and only if G_i is shellable for every i = 1, ..., r.

Proof: Suppose G is shellable. Theorem 2.6 of Van Tuyl and Villarreal [15] guarantees that if G is shellable and x is any vertex of G, then the graph G′ = G \ ({x} ∪ N_G(x)) is shellable. Hence the graphs G_i are shellable. The proof in the other direction is practically identical to the proof of Theorem 2.12 of [15] on the shellability of chordal graphs. Suppose each G_i is shellable and let F_{i1}, ..., F_{i s_i} be a shelling of Δ_{G_i} for each i = 1, ..., r. The subgraph ⟨{x_1} ∪ N_G(x_1)⟩ = ⟨{x_1, ..., x_r}⟩ is the unique maximal clique containing x_1. Moreover, each facet of Δ_G, that is, each maximal independent set of G, meets {x_1, ..., x_r} in exactly one vertex. Consequently, the complete list of facets of Δ_G is

F_{11} ∪ {x_1}, ..., F_{1 s_1} ∪ {x_1}; ...; F_{r1} ∪ {x_r}, ..., F_{r s_r} ∪ {x_r}.

We show that this list, in this linear order, is a shelling of Δ_G.
We consider two cases:

1. F′ = F_{ik} ∪ {x_i}, F = F_{jt} ∪ {x_j}, i < j. We have x_j ∈ F \ F′. Moreover, the set F_{jt} ∪ {x_1} is an independent set of G, so it is contained in one of the facets of Δ_G containing x_1; that is, there exists l, 1 ≤ l ≤ s_1, such that F_{jt} ∪ {x_1} ⊂ F_{1l} ∪ {x_1}. Writing F′′ = F_{1l} ∪ {x_1}, we have {x_j} = F \ F′′ and F′′ precedes F.

2. F′ = F_{ik} ∪ {x_i}, F = F_{it} ∪ {x_i}, k < t. This case follows from the shellability of the graph G_i. ✷

The preceding theorem generalizes Theorem 2.9 of [15], since every vertex of degree 1 is a simplicial vertex. This result can also be used to give another proof that chordal graphs are shellable [15, Theorem 2.12]. Every induced subgraph of a chordal graph is chordal; moreover, by Dirac's lemma (Theorem 2.1.12), every chordal graph either is a clique or contains two simplicial vertices. Arguing by induction on n = |V_G| and assuming that the vertex x_1 of G is simplicial, the subgraphs G_i are chordal, being induced subgraphs of G, and are shellable by the induction hypothesis. By Theorem 2.1.13 the graph G is shellable.

Similarly, Theorem 2.1.13 can be used in the proof that a bipartite graph that is sequentially Cohen-Macaulay is shellable [15, Theorem 3.8]. Assuming that G is bipartite and sequentially Cohen-Macaulay and arguing by induction on the number of vertices, Lemma 3.7 of [15] guarantees the existence in G of a vertex x_1 of degree 1 (that is, a simplicial vertex). By Theorem 3.3 of the same paper, the subgraphs G_1 = G \ ({x_1} ∪ N_G(x_1)) and G_2 = G \ ({y_1} ∪ N_G(y_1)), where y_1 is the vertex adjacent to x_1, are sequentially Cohen-Macaulay. By the induction hypothesis these graphs are shellable, and by Theorem 2.1.13 G is shellable.

Theorem 2.1.13 can also be used to decide the shellability of graphs that have simplicial vertices. One takes the simplicial vertex x_1 and computes the subgraphs G_i; if any of them is not shellable, then the original graph is not shellable. If all of them are shellable, then the original graph is shellable and a shelling of it can be built from the shellings of the subgraphs G_i.

[Figure: the graphs G (right) and H (left).]


Example 2.1.14 Let G and H be the graphs shown in the preceding figure. The vertex g of the graph G is simplicial and its neighborhood is N_G(g) = {c, d}. The graphs G_g, G_c, G_d are shellable, with shellings Δ_{G_g} = ⟨{a, e}, {a, f}, {b, e}, {b, f}⟩; Δ_{G_c} = ⟨{b, f}⟩; Δ_{G_d} = ⟨{a, e}⟩. By Theorem 2.1.13, G is shellable and Δ_G = ⟨{a, e, g}, {a, f, g}, {b, e, g}, {b, f, g}, {b, f, c}, {a, e, d}⟩ is a shelling of Δ_G. On the other hand, the vertex a is a simplicial vertex of the graph H and its neighborhood is N_H(a) = {b, c}. The graph H_a = H \ ({a} ∪ N_H(a)) is not shellable and, by Theorem 2.1.13, the graph H is not shellable.

Van Tuyl and Villarreal proved the equivalence between shellability and the sequentially Cohen-Macaulay property for bipartite graphs [15, Theorem 3.8]. As a consequence of Theorem 2.1.13, an analogous result can be obtained for graphs containing at least one simplicial vertex.

Corollary 2.1.15 Let G be a graph containing a simplicial vertex. Then G is shellable if and only if it is sequentially Cohen-Macaulay.

Proof: If G is shellable then it is sequentially Cohen-Macaulay, as follows from a result of Stanley [13]. Now let G be sequentially Cohen-Macaulay and suppose that every sequentially Cohen-Macaulay graph with fewer vertices is shellable. Let x_1 be a simplicial vertex of G and N_G(x_1) = {x_2, ..., x_r}. The graphs G_i = G \ ({x_i} ∪ N_G(x_i)), for i = 1, ..., r, are sequentially Cohen-Macaulay [15, Theorem 3.3] and, by the induction hypothesis, shellable. Theorem 2.1.13 then guarantees the shellability of the graph G. ✷
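The construction in the proof of Theorem 2.1.13 can be sketched as a recursion and tried on the graph G of Example 2.1.14. This is an illustrative sketch under stated assumptions: the edge list below is not taken from the (lost) figure but recovered from the facet list in the example, joining two vertices exactly when no listed facet contains both; the function names are hypothetical; and the recursion simply gives up when some subgraph it meets has no simplicial vertex.

```python
from itertools import combinations

def neighbors(x, vertices, edges):
    # Neighbors of x inside the induced subgraph on `vertices`.
    return [v for v in vertices if frozenset((x, v)) in edges]

def is_simplicial(x, vertices, edges):
    nbrs = neighbors(x, vertices, edges)
    return all(frozenset(p) in edges for p in combinations(nbrs, 2))

def shelling_from_simplicial(vertices, edges):
    # Orders the facets as in the proof of Theorem 2.1.13:
    # facets through x_1 first, then those through x_2, ..., x_r.
    if not vertices:
        return [frozenset()]
    x1 = next((v for v in vertices if is_simplicial(v, vertices, edges)), None)
    if x1 is None:
        raise ValueError("no simplicial vertex: this sketch cannot proceed")
    result = []
    for xi in [x1] + neighbors(x1, vertices, edges):
        closed = {xi, *neighbors(xi, vertices, edges)}
        rest = [v for v in vertices if v not in closed]  # vertex set of G_i
        result += [f | {xi} for f in shelling_from_simplicial(rest, edges)]
    return result

def is_shelling(facets):
    # Definition 2.1.9, checked directly.
    for j in range(1, len(facets)):
        singles = set()
        for l in range(j):
            diff = facets[j] - facets[l]
            if len(diff) == 1:
                singles |= diff
        if any(not ((facets[j] - facets[i]) & singles) for i in range(j)):
            return False
    return True

# Assumed reconstruction of G from the facets listed in Example 2.1.14.
V = list("abcdefg")
E = {frozenset(p) for p in ["ab", "ac", "bd", "cd", "ce", "cg", "dg", "df", "ef"]}

shelling = shelling_from_simplicial(V, E)
print([sorted(f) for f in shelling])
print(is_shelling(shelling))
```

With the vertices listed in alphabetical order, the recursion picks g as the simplicial vertex and produces the facets in the same order as the shelling stated in the example.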

Multiplication of simplicial vertices

In this section we apply multiplication of simplicial vertices to shellable graphs in order to obtain new shellable graphs. Vertex multiplication is the key to Lovász's proof [7] of the perfect graph theorem, which states that a graph is perfect if and only if its complement is. A graph G is perfect if, for every induced subgraph, the chromatic number equals the clique number. Both bipartite graphs and chordal graphs are perfect. In this section we use the definition of vertex multiplication given by Golumbic [6].

Definition 3.1.16 [6] Let G be a graph and x a vertex of G. The graph G ∘ x is obtained from G by adding a new vertex x′ joined to all the vertices of N_G(x). In this case we say that the graph G ∘ x is obtained by multiplication of the vertex x.

Theorem 3.1.17 Let G be a graph, x a simplicial vertex of G and G ∘ x the graph obtained by multiplication of the vertex x. Then G is shellable if and only if G ∘ x is shellable.

Proof: Let x be a simplicial vertex of G, x′ the new vertex joined to all the vertices of N_G(x), G′ = G ∘ x and N_G(x) = {x_1, ..., x_r} = N_{G′}(x′). The vertex x′ is simplicial in G′. For i = 1, ..., r let

G_i = G \ ({x_i} ∪ N_G(x_i)) = G′_i = G′ \ ({x_i} ∪ N_{G′}(x_i)).

The graphs obtained by removing the vertices x′ and x, together with their neighborhoods, from the graphs G′ and G respectively satisfy the relation

G′_{x′} = G′ \ ({x′} ∪ N_{G′}(x′)) = (G \ ({x} ∪ N_G(x))) ∪ {x} = G_x ∪ {x},

that is, the graph G′_{x′} is the graph G_x with the isolated vertex x added. Suppose G is shellable. By Theorem 2.1.13, the graphs G_x, G_1, ..., G_r are shellable. The graph G_x ∪ {x} is also shellable: it suffices to add the vertex x to every facet of Δ_{G_x}. This means that the graphs G′_{x′}, G′_1, ..., G′_r are shellable and, by Theorem 2.1.13, G′ = G ∘ x is shellable. Conversely, suppose G′ is shellable. The graphs G′_{x′}, G′_1, ..., G′_r are shellable by Theorem 2.1.13. Note that if G′_{x′} = G_x ∪ {x} is shellable, then G_x is shellable: it suffices to remove the vertex x from every facet of Δ_{G′_{x′}}, since x is isolated. Hence the graphs G_x, G_1, ..., G_r are shellable and, by Theorem 2.1.13, the graph G is shellable. ✷

If G is a graph having two nonadjacent simplicial vertices with the same neighborhood, one of these vertices can be regarded as a multiplication of the other; hence we may delete it from the graph and analyze the shellability of the reduced graph.
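Definition 3.1.16 is easy to experiment with. The sketch below, which is not from the paper, multiplies a vertex and compares the facets of the two independence complexes; the function names, the new vertex label "a2" and the three-vertex path used as input are all illustrative assumptions.

```python
from itertools import combinations

def multiply_vertex(vertices, edges, x, new):
    # G∘x: add the vertex `new`, adjacent to every neighbor of x (not to x).
    nbrs = [v for v in vertices if frozenset((x, v)) in edges]
    return vertices + [new], edges | {frozenset((new, v)) for v in nbrs}

def facets_of_independence_complex(vertices, edges):
    # Brute force: list all independent sets, keep the maximal ones.
    indep = [frozenset(s)
             for k in range(len(vertices) + 1)
             for s in combinations(vertices, k)
             if all(frozenset(p) not in edges for p in combinations(s, 2))]
    return [s for s in indep if not any(s < t for t in indep)]

# Hypothetical example: the path a - b - c, multiplying the degree-1 vertex a.
V = ["a", "b", "c"]
E = {frozenset("ab"), frozenset("bc")}
VH, EH = multiply_vertex(V, E, "a", "a2")

facets_G = set(facets_of_independence_complex(V, E))
facets_H = set(facets_of_independence_complex(VH, EH))

# The facets of G∘x are the facets of G, with the copy added to those containing x.
expected_H = {f | {"a2"} if "a" in f else f for f in facets_G}
print(facets_H == expected_H)
```

In this small case the facets containing a simply gain the copy a2, which is the relation between the facets of Δ_G and Δ_{G∘x} that the proof above relies on.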


Multiplication of simplicial vertices can be generalized by adding more than one new vertex for each simplicial vertex.

Definition 3.1.18 Let G be a graph, S = {x_1, ..., x_r} ⊂ V_G a set of simplicial vertices such that N_G(x_i) ≠ N_G(x_j) for i ≠ j, and let h = (h_1, ..., h_r) be a vector of positive integers. The graph H = G ∘ h is obtained from G by multiplication of the vertices of S if, for each simplicial vertex x_i, i = 1, ..., r, we add to G h_i new vertices x_i^1, ..., x_i^{h_i}, each of which is joined to all the vertices of N_G(x_i).

Corollary 3.1.19 Let G be a graph, S = {x_1, ..., x_r} ⊂ V_G a set of simplicial vertices such that N_G(x_i) ≠ N_G(x_j) for i ≠ j, and let h = (h_1, ..., h_r) be a vector of positive integers. Then G is shellable if and only if the graph H = G ∘ h is shellable.

Proof: For each vertex x_i of S, apply Theorem 3.1.17 h_i times. ✷

Remark 3.1.20 Given a shellable graph G containing several simplicial vertices, the preceding corollary yields new shellable graphs by multiplying each of the simplicial vertices of G. Given a shelling of G, it would be convenient to have a simple procedure for constructing a shelling of the multiplied graph. The proof of Theorem 2.1.13 guarantees that if x is a simplicial vertex, one can construct a shelling of Δ_G, F_1, ..., F_s, F′_1, ..., F′_r, in which the facets F_1, ..., F_s containing x occupy the first positions. On the other hand, the proof of Theorem 3.1.17 guarantees that the graph G ∘ x has a shelling obtained by adding the new vertex to the facets F_1, ..., F_s. However, when the shellable graph H results from multiplying more than one simplicial vertex of the shellable graph G and, starting from a shelling of G, the new vertices are added to the facets containing the corresponding simplicial vertices, the resulting list of facets may fail to be a shelling of H, as Example 3.1.21 shows.

Let G be a shellable graph, x_1, ..., x_r simplicial vertices of G, h = (h_1, ..., h_r) a vector of positive integers and F_1, ..., F_s a shelling of Δ_G. A shelling of the shellable graph H = G ∘ h can be obtained as follows. Let F_{t_1}, ..., F_{t_p} be the subsequence of facets that contain x_1 and let F_{r_1}, ..., F_{r_q} be the subsequence of facets that do not contain x_1, where p + q = s. The proof of Theorem 2.1.13 guarantees that F_{t_1}, ..., F_{t_p}, F_{r_1}, ..., F_{r_q} is a shelling of Δ_G. Now let F′_{t_1}, ..., F′_{t_p} be the facets obtained by adding to F_{t_1}, ..., F_{t_p} the h_1 new vertices corresponding to x_1; the proof of Theorem 3.1.17 guarantees that F′_{t_1}, ..., F′_{t_p}, F_{r_1}, ..., F_{r_q} is a shelling of the shellable graph H_1 = G ∘ (h_1, 0, ..., 0). Applying this procedure successively to the graphs

H_2 = H_1 ∘ (0, h_2, 0, ..., 0), H_3 = H_2 ∘ (0, 0, h_3, 0, ..., 0), ..., H = H_r = H_{r−1} ∘ (0, 0, ..., 0, h_r)

we obtain the desired shelling.

Example 3.1.21 Consider the graphs G and H = G ∘ h, where S = {x_1, y_1} and h = (1, 2).

[Figure: the graphs G and H = G ∘ h.]

It is easy to see that the graph G is shellable and that Δ_G = ⟨{x_1, c, y_1}, {x_1, d}, {a, c, y_1}, {b, y_1}, {b, d}⟩ is a shelling of Δ_G. The vertex x_2 and the vertices y_2, y_3 of the graph H result from multiplying the vertices x_1 and y_1, respectively. By the preceding corollary the graph H is shellable; however, if in the preceding shelling we add the vertex x_2 to the facets containing x_1 and the vertices y_2, y_3 to the facets containing y_1, it is easy to see that the resulting list ⟨{x_1, x_2, c, y_1, y_2, y_3}, {x_1, x_2, d}, {a, c, y_1, y_2, y_3}, {b, y_1, y_2, y_3}, {b, d}⟩ is not a shelling of Δ_G.

Since the facets containing x_1 occupy the first positions, following the proof of Theorem 3.1.17 we can add the vertex x_2 to these facets to obtain a shelling of the graph H_1 = G ∘ (1, 0), the result of multiplying the vertex x_1. The shelling obtained is:

Δ_{H_1} = ⟨{x_1, x_2, c, y_1}, {x_1, x_2, d}, {a, c, y_1}, {b, y_1}, {b, d}⟩.

This shelling can be rearranged by taking all the facets containing the simplicial vertex y_1, in their order, and placing them in the first positions:

Δ_{H_1} = ⟨{x_1, x_2, c, y_1}, {a, c, y_1}, {b, y_1}, {x_1, x_2, d}, {b, d}⟩.

Adding now the vertices y_2, y_3 to these facets, we obtain a shelling of the simplicial complex associated with H = H_1 ∘ (0, 2) = G ∘ (1, 2):

Δ_H = ⟨{x_1, x_2, c, y_1, y_2, y_3}, {a, c, y_1, y_2, y_3}, {b, y_1, y_2, y_3}, {x_1, x_2, d}, {b, d}⟩.
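The step-by-step procedure of Remark 3.1.20 (move the facets containing the simplicial vertex to the front, keeping their relative order, then add the new vertices to them) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, facets are represented as Python frozensets, and the input is the shelling of Δ_G quoted in Example 3.1.21.

```python
def multiply_in_shelling(shelling, x, new_vertices):
    # Facets containing x go first (in their original order) and receive
    # the copies of x; the remaining facets follow unchanged.
    with_x = [f | set(new_vertices) for f in shelling if x in f]
    without_x = [f for f in shelling if x not in f]
    return with_x + without_x

def is_shelling(facets):
    # Definition 2.1.9: for all i < j there must be a vertex v in F_j - F_i
    # and an index l < j such that F_j - F_l = {v}.
    for j in range(1, len(facets)):
        singles = set()
        for l in range(j):
            diff = facets[j] - facets[l]
            if len(diff) == 1:
                singles |= diff
        if any(not ((facets[j] - facets[i]) & singles) for i in range(j)):
            return False
    return True

# Shelling of Δ_G from Example 3.1.21.
shelling_G = [frozenset(f) for f in (
    {"x1", "c", "y1"}, {"x1", "d"}, {"a", "c", "y1"}, {"b", "y1"}, {"b", "d"})]

# H_1 = G∘(1, 0): multiply x1 once; then H = H_1∘(0, 2): multiply y1 twice.
shelling_H1 = multiply_in_shelling(shelling_G, "x1", ["x2"])
shelling_H = multiply_in_shelling(shelling_H1, "y1", ["y2", "y3"])

for facet in shelling_H:
    print(sorted(facet))
print(is_shelling(shelling_H))
```

Applied to the example, the two steps reproduce the lists Δ_{H_1} and Δ_H given above, in the same order.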

Simplicial and circular-arc graphs.

In this section we establish the shellability of simplicial graphs and of circular-arc graphs containing at least one simplicial vertex. Simplicial graphs were introduced in [2], and [3] studies several properties of these graphs that can be decided with polynomial-time algorithms. In a simplicial graph every vertex is simplicial or adjacent to a simplicial vertex.

Definition 4.1.22 Given a graph G, a clique of G is called a simplex if it contains one or more simplicial vertices. The graph G is called simplicial if every vertex is contained in a simplex, that is, every vertex is simplicial or belongs to the neighborhood of a simplicial vertex.

Lemma 4.1.23 Let G be a simplicial graph. For any vertex v of G, the graph G_v = G \ ({v} ∪ N_G(v)) is simplicial.

Proof: Note that if G is a simplicial graph, x a simplicial vertex of G and v a vertex of G such that x ∉ {v} ∪ N_G(v), then x is simplicial in the subgraph G_v = G \ ({v} ∪ N_G(v)). Indeed, removing from G the vertex v and its neighborhood may delete some vertices of the simplex containing x; nevertheless, x and the neighbors of x that remain in G_v induce a simplex in G_v. Suppose that for some vertex v of G the graph G_v = G \ ({v} ∪ N_G(v)) is not simplicial. Then there exists a vertex u of G_v such that u is not simplicial in G_v and is not adjacent to a simplicial vertex in G_v. Since G is simplicial, two cases can occur:

(Case 1) u is simplicial in G. Since u ∉ N_G(v), by the preceding observation u is simplicial in G_v, which is a contradiction.

(Case 2) u is adjacent to a vertex x that is simplicial in G. Then x ≠ v, since u ∉ N_G(v), and x ∉ N_G(v); otherwise N_G(x) ⊂ {v} ∪ N_G(v), and u would belong to the neighborhood of v. By the preceding observation, x is simplicial in G_v and adjacent to u, which is a contradiction. ✷

Lemma 4.1.23, together with Theorem 2.1.13, allows us to establish the shellability of simplicial graphs.

Theorem 4.1.24 Let G be a simplicial graph. Then G is shellable.

Proof: The proof is by induction on the number of vertices. Let G be a simplicial graph and suppose that every simplicial graph with fewer vertices is shellable. Let x_1 be a simplicial vertex and N_G(x_1) = {x_2, ..., x_r}. The graphs G_i = G \ ({x_i} ∪ N_G(x_i)) are simplicial by Lemma 4.1.23. By the induction hypothesis these graphs are shellable. By Theorem 2.1.13 the graph G is shellable. ✷

If a graph is shellable, then it is sequentially Cohen-Macaulay, as follows from the result of Stanley [13]. The preceding theorem therefore implies that simplicial graphs are sequentially Cohen-Macaulay. If, in addition, a simplicial graph G is unmixed, that is, all minimal vertex covers of G have the same cardinality, then Theorem 4.1.24 implies that G is pure shellable and hence Cohen-Macaulay. Given a graph G and S ⊂ V_G, consider the graph G ∪ W(S) obtained by adding new vertices {y_i | x_i ∈ S} and new edges {{x_i, y_i} | x_i ∈ S}, called whiskers.

A corollary of the preceding theorem is the following theorem of Villarreal [14]; see also [12].

Corollary 4.1.25 [12, Theorem 2.1] Let G be a simple graph and V_G its vertex set. Then the graph G ∪ W(V_G) is Cohen-Macaulay.


Proof: Each of the added vertices is simplicial, so G ∪ W(V_G) is a simplicial graph and, by Theorem 4.1.24, it is shellable. Since G ∪ W(V_G) is unmixed, G ∪ W(V_G) is Cohen-Macaulay. ✷

Well-covered graphs were introduced in [8] and have been studied extensively [9].

Definition 4.1.26 A graph G is well-covered if every maximal independent set is a maximum independent set.

The class of well-covered graphs coincides with the class of unmixed graphs, since if all maximal independent sets of a graph have the same cardinality, then the minimal vertex covers also have the same cardinality. In [10], Prisner et al. characterize the simplicial graphs and the chordal graphs that are well-covered.

Theorem 4.1.27 [10, Theorem 1] A graph G is simplicial and well-covered if and only if each vertex v of G belongs to exactly one simplex.

Theorem 4.1.28 [10, Theorem 2] Let G be a chordal graph. Then G is well-covered if and only if each vertex v of G belongs to exactly one simplex.

Although the class of simplicial graphs and the class of chordal graphs are not comparable, that is, neither class is a subclass of the other, Theorem 4.1.28 implies that well-covered (unmixed) chordal graphs are well-covered simplicial graphs. Thus the Cohen-Macaulay property of unmixed chordal graphs is a consequence of the Cohen-Macaulay property of unmixed simplicial graphs.

Circular-arc graphs [6] are a class of graphs that generalize interval graphs. An interval graph is the intersection graph of a set of intervals on the real line. Interval graphs are chordal and therefore shellable and sequentially Cohen-Macaulay.
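Returning to Definition 4.1.22, Corollary 4.1.25 and Definition 4.1.26, the whisker construction can be tried on a small case. The sketch below is illustrative and not from the paper: the function names, the labels ("y", v) for the whisker vertices and the choice of the 4-cycle as starting graph are assumptions, and the well-covered check is a brute-force enumeration.

```python
from itertools import combinations

def neighbors(x, vertices, edges):
    return [v for v in vertices if frozenset((x, v)) in edges]

def is_simplicial(x, vertices, edges):
    nbrs = neighbors(x, vertices, edges)
    return all(frozenset(p) in edges for p in combinations(nbrs, 2))

def is_simplicial_graph(vertices, edges):
    # Definition 4.1.22: every vertex is simplicial or adjacent to a simplicial vertex.
    simp = {v for v in vertices if is_simplicial(v, vertices, edges)}
    return all(v in simp or any(u in simp for u in neighbors(v, vertices, edges))
               for v in vertices)

def add_whiskers(vertices, edges):
    # G ∪ W(V_G): one new pendant vertex ("y", v) for each vertex v.
    pendant = [("y", v) for v in vertices]
    return vertices + pendant, edges | {frozenset((v, ("y", v))) for v in vertices}

def facets_of_independence_complex(vertices, edges):
    indep = [frozenset(s)
             for k in range(len(vertices) + 1)
             for s in combinations(vertices, k)
             if all(frozenset(p) not in edges for p in combinations(s, 2))]
    return [s for s in indep if not any(s < t for t in indep)]

# Hypothetical example: the 4-cycle 1 - 2 - 3 - 4 - 1.
V = [1, 2, 3, 4]
E = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 1)]}
W, EW = add_whiskers(V, E)

print(is_simplicial_graph(V, E), is_simplicial_graph(W, EW))
sizes = {len(f) for f in facets_of_independence_complex(W, EW)}
print(sizes)  # a single size means the graph is well-covered (unmixed)
```

The 4-cycle itself has no simplicial vertex, while after adding whiskers every vertex is simplicial or adjacent to one and all maximal independent sets have the same size, which are the two hypotheses used in the proof of Corollary 4.1.25.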
Definition 4.1.29 A graph G is a circular-arc graph if its vertices can be put in one-to-one correspondence with a set of arcs of a circle in such a way that two vertices of G are adjacent if and only if their associated arcs intersect.


Circular-arc graphs are not chordal in general, since all cycles are circular-arc graphs. Nor are they shellable in general, since even cycles are not shellable. In the following theorem we show that if a circular-arc graph contains at least one simplicial vertex, then it is shellable.

Theorem 4.1.30 Let G be a circular-arc graph that has at least one simplicial vertex. Then G is shellable.

Proof: Let G be a circular-arc graph and v any vertex of G. Then the graph G_v = G \ ({v} ∪ N_G(v)) is an interval graph. Indeed, removing from G the vertex v and its neighborhood is equivalent to removing from the circle the corresponding arc and all the arcs that intersect it. The remaining arcs can then be put in one-to-one correspondence with a set of intervals on the real line, that is, the graph G_v is an interval graph. Suppose that G is not complete (otherwise the graph is obviously shellable), that x_1 is a simplicial vertex of G and that N_G(x_1) = {x_2, ..., x_r}. The graphs G_i = G \ ({x_i} ∪ N_G(x_i)), for i = 1, ..., r, are interval graphs, hence chordal and shellable. By Theorem 2.1.13 the graph G is shellable. ✷

The preceding theorem implies that all circular-arc graphs that have at least one simplicial vertex are sequentially Cohen-Macaulay. If G is an unmixed circular-arc graph with at least one simplicial vertex, then G is Cohen-Macaulay.
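The remark that even cycles are not shellable can be confirmed by brute force for the smallest cases. The sketch below is illustrative and not from the paper: the function names are assumptions, and the search tries every ordering of the facets, which is only feasible for a handful of facets. The 5-cycle is included purely as a contrast.

```python
from itertools import combinations, permutations

def facets_of_independence_complex(vertices, edges):
    edge_set = {frozenset(e) for e in edges}
    indep = [frozenset(s)
             for k in range(len(vertices) + 1)
             for s in combinations(vertices, k)
             if all(frozenset(p) not in edge_set for p in combinations(s, 2))]
    return [s for s in indep if not any(s < t for t in indep)]

def is_shelling(facets):
    # Definition 2.1.9: for all i < j there must be a vertex v in F_j - F_i
    # and an index l < j such that F_j - F_l = {v}.
    for j in range(1, len(facets)):
        singles = set()
        for l in range(j):
            diff = facets[j] - facets[l]
            if len(diff) == 1:
                singles |= diff
        if any(not ((facets[j] - facets[i]) & singles) for i in range(j)):
            return False
    return True

def has_shelling(facets):
    # Exhaustive search over all orderings of the facets.
    return any(is_shelling(list(p)) for p in permutations(facets))

def cycle(n):
    vertices = list(range(n))
    edges = [(i, (i + 1) % n) for i in range(n)]
    return vertices, edges

for n in (4, 5, 6):
    print(n, has_shelling(facets_of_independence_complex(*cycle(n))))
```

For the cycles on 4 and 6 vertices no ordering of the facets is a shelling, while the 5-cycle admits one; none of these cycles has a simplicial vertex, so Theorem 4.1.30 does not apply to them.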

Acknowledgments

This work is funded by the research project "Álgebra conmutativa combinatoria, álgebras monomiales y grafos químicos", E01250, Universidad de Antioquia. The first author also thanks the Associates Programme of the International Centre for Theoretical Physics (ICTP).

Roberto Cruz Rodes
Departamento de Matemáticas, Universidad de Antioquia, Calle 67 N 53108 - A. A. 1226, Medellín, Colombia
rcruz@matematicas.udea.edu.co

Mario Estrada Valdés
Departamento de Matemáticas, Universidad de Antioquia, Calle 67 N 53108 - A. A. 1226, Medellín, Colombia
mestrada@matematicas.udea.edu.co


References

[1] Björner A. and Wachs M., Shellable nonpure complexes and posets. I, Trans. Amer. Math. Soc. 348 (1996), 1299-1327.
[2] Cheston G. A., Hare E. O., Hedetniemi S. T. and Laskar R. C., Simplicial graphs, Congressus Numerantium 67 (1988), 241-258.
[3] Cheston G. A. and Jap T. S., A survey of the algorithmic properties of simplicial, upper bound and middle graphs, Journal of Graph Algorithms and Applications 10 (2006), 159-190.
[4] Dirac G. A., On rigid circuit graphs, Abh. Math. Sem. Univ. Hamburg 25 (1961), 71-76.
[5] Fulkerson D. R. and Gross O. A., Incidence matrices and interval graphs, Pacific J. Math. 15 (1965), 835-855.
[6] Golumbic M. C., Algorithmic graph theory and perfect graphs, Second edition, Elsevier, 2004.
[7] Lovász L., A characterization of perfect graphs, J. Combin. Theory B 13 (1972), 253-267.
[8] Plummer M. D., Some covering concepts in graphs, J. Combin. Theory 8 (1970), 91-98.
[9] Plummer M. D., Well covered graphs: a survey, Quaest. Math. 16 (1993), 253-287.
[10] Prisner E., Topp J. and Vestergaard P. D., Well covered simplicial, chordal and circular arc graphs, J. of Graph Theory 21 (1996), 113-119.
[11] Rose D. J., Tarjan R. E. and Lueker G. S., Algorithmic aspects of vertex elimination on graphs, SIAM J. Comput. 5 (1976), 266-283.
[12] Simis A., Vasconcelos W. and Villarreal R., On the ideal theory of graphs, J. Algebra 167 (1994), 389-416.
[13] Stanley R. P., Combinatorics and Commutative Algebra, Second Edition, Progress in Mathematics 41, Birkhäuser Boston, Inc., Boston, MA, 1996.
[14] Villarreal R. H., Cohen-Macaulay graphs, Manuscripta Math. 66 (1990), 277-293.
[15] Van Tuyl A. and Villarreal R. H., Shellable graphs and sequentially Cohen-Macaulay bipartite graphs, (2007) Preprint. math CO/0701296v1.

Morfismos, Vol. 12, No. 2, 2008, pp. 33–52

Asymptotic normality of average cost Markov control processes ∗

Armando F. Mendoza-Pérez

Abstract This paper studies asymptotic normality of Markov control processes (MCPs) in Borel spaces with unbounded cost. Under suitable hypotheses we show that within the class of canonical policies there exists one where the cost is asymptotically normal.

2000 Mathematics Subject Classification: 93E20, 90C40. Keywords and phrases: (discrete-time) Markov control processes, average cost criteria, expected average cost, average variance, asymptotic normality.

Introduction.

We study the asymptotic normality of discrete-time MCPs in Borel spaces with possibly unbounded cost. Under suitable hypotheses we show that, within the class of so-called canonical policies, those that minimize the limiting average variance exhibit asymptotic normality, that is, a certain distribution of the cost is asymptotically normal. Asymptotic normality is very useful in adaptive control problems. The only works on the variance minimization problem in MCPs are those by Mandl [7, 9, 10], Hernández-Lerma et al. [5], Prieto-Rumeau and Hernández-Lerma [11] and Zhu and Guo [15]. For the asymptotic behavior of MCPs there are far fewer works; for instance, we should mention the paper by Mandl [8] for finite-state MCPs.

∗ This paper is part of the author's Doctoral Thesis written at the Departamento de Matemáticas, CINVESTAV-IPN.


To obtain our results we combine two approaches. First, to obtain canonical policies with minimum average variance, we use the W-uniform ergodicity assumptions in [5]. Second, we follow Mandl's approach [8] to extend asymptotic normality to MCPs in Borel spaces. The remainder of the paper is organized as follows. Section 2 contains a brief description of the Markov control model of interest. In Section 3 we introduce our hypotheses and state our main result, Theorem 3.7, which is proved in Section 4. Finally, an LQ system in Section 5 illustrates our results.

2. The control model.

Let $(X, A, \{A(x) : x \in X\}, Q, C)$ be a discrete-time Markov control model with state space $X$ and control (or action) set $A$, both assumed to be Borel spaces with $\sigma$-algebras $\mathcal{B}(X)$ and $\mathcal{B}(A)$, respectively. For each $x \in X$ there is a nonempty Borel set $A(x)$ in $\mathcal{B}(A)$ which represents the set of feasible actions in the state $x$. The set $K := \{(x, a) : x \in X, a \in A(x)\}$ is assumed to be a Borel subset of $X \times A$. The transition law $Q$ is a stochastic kernel on $X$ given $K$, and the one-stage cost $C$ is a real-valued measurable function on $K$. The class of measurable functions $f : X \to A$ such that $f(x)$ is in $A(x)$ for every $x \in X$ is denoted by $F$, and we suppose that it is nonempty.

Control policies. For every $n = 0, 1, \dots$, let $H_n$ be the family of admissible histories up to time $n$; that is, $H_0 := X$, and $H_n := K^n \times X$ if $n \ge 1$. A control policy is a sequence $\pi = \{\pi_n\}$ of stochastic kernels $\pi_n$ on $A$ given $H_n$ such that $\pi_n(A(x_n)|h_n) = 1$ for every $n$-history $h_n = (x_0, a_0, \dots, x_{n-1}, a_{n-1}, x_n)$ in $H_n$. The class of all policies is denoted by $\Pi$. A policy $\pi = \{\pi_n\}$ is said to be a (deterministic) stationary policy if there exists $f \in F$ such that $\pi_n(\cdot|h_n)$ is the Dirac measure at $f(x_n) \in A(x_n)$ for all $h_n \in H_n$ and $n = 0, 1, \dots$. Following a standard convention, we identify $F$ with the class of stationary policies. For notational ease we write

\[ C_f(x) := C(x, f(x)) \quad \text{and} \quad Q_f(\cdot|x) := Q(\cdot|x, f(x)) \quad \forall x \in X \tag{1} \]

for every stationary policy $f$ in $F$.

Asymptotic normality of MCPs

Let $(\Omega, \mathcal{F})$ be the (canonical) measurable space consisting of the sample space $\Omega := (X \times A)^\infty$ and its product $\sigma$-algebra $\mathcal{F}$. Then, for each policy $\pi$ and "initial state" $x \in X$, a stochastic process $\{(x_n, a_n)\}$ and a probability measure $P_x^\pi$ are defined on $(\Omega, \mathcal{F})$ in a canonical way, where $x_n$ and $a_n$ represent the state and control at time $n$, $n = 0, 1, \dots$. The expectation operator with respect to $P_x^\pi$ is denoted by $E_x^\pi$.

Average cost criteria. For each $n = 1, 2, \dots$, let

\[ J_n(\pi, x) := E_x^\pi \Big[ \sum_{t=0}^{n-1} C(x_t, a_t) \Big] \]

be the $n$-stage expected cost when using the policy $\pi$, given the initial state $x \in X$. The long-run expected average cost (EAC) is then defined as

\[ J(\pi, x) := \limsup_{n \to \infty} \frac{1}{n} J_n(\pi, x). \tag{2} \]

Definition 2.1 (a) A policy $\pi^*$ is said to be EAC-optimal if

\[ J(\pi^*, x) = \inf_{\pi \in \Pi} J(\pi, x) =: J^*(x) \quad \forall x \in X. \tag{3} \]

(b) A stationary policy $f_* \in F$ is called canonical if there exist a constant $\rho_*$ and a measurable function $h_1 : X \to \mathbb{R}$ such that

\[ \rho_* + h_1(x) = \min_{a \in A(x)} \Big[ C(x, a) + \int_X h_1(y) Q(dy|x, a) \Big] \quad \forall x \in X, \tag{4} \]

and $f_*(x) \in A(x)$ attains the minimum on the right-hand side of (4) for every $x \in X$, i.e.,

\[ \rho_* + h_1(x) = C_{f_*}(x) + \int_X h_1(y) Q_{f_*}(dy|x) \quad \forall x \in X. \tag{5} \]

If (4) and (5) are satisfied, then $(\rho_*, h_1, f_*)$ is said to be a canonical triplet (see [1, 2, 14]).

Remark 2.2 (See [2, Section 5.2].) If $(\rho_*, h_1, f_*)$ is a canonical triplet and, in addition, $h_1$ satisfies

\[ \lim_{n \to \infty} \frac{1}{n} E_x^\pi h_1(x_n) = 0 \quad \forall \pi \in \Pi, \ x \in X, \tag{6} \]

then $f_*$ is EAC-optimal and $\rho_*$ is the optimal expected average cost, that is,

\[ J(f_*, x) = J^*(x) = \rho_* \quad \forall x \in X. \tag{7} \]

Hence we have

\[ F_{cp} \subset F_{eac}, \tag{8} \]

where $F_{cp}$ is the class of canonical policies and $F_{eac} \subset F$ is the class of stationary EAC-optimal policies.

For each $n = 1, 2, \dots$, let

\[ S_n(f, x) := \sum_{t=0}^{n-1} C(x_t, a_t) \tag{9} \]

be the $n$-stage pathwise (or sample-path) cost when using the policy $f \in F$, given the initial state $x \in X$.
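For intuition about the canonical triplet of Definition 2.1(b), the following is a minimal sketch that computes one by relative value iteration. It assumes finite state and action spaces, which is far more restrictive than the Borel setting of this paper; the two-state model, its numbers, and the names `COST`, `TRANS`, `bellman` and `canonical_triplet` are hypothetical and serve only as an illustration.

```python
# Hypothetical 2-state, 2-action model (illustrative numbers, not from the paper).
COST = [[1.0, 3.0], [4.0, 2.0]]            # COST[x][a] = C(x, a)
TRANS = [[[0.9, 0.1], [0.2, 0.8]],         # TRANS[x][a][y] = Q({y} | x, a)
         [[0.5, 0.5], [0.1, 0.9]]]


def bellman(h):
    """Evaluate the right-hand side of (4) and a minimizing action for each state."""
    values, policy = [], []
    for x in range(len(COST)):
        q = [COST[x][a] + sum(p * hy for p, hy in zip(TRANS[x][a], h))
             for a in range(len(COST[x]))]
        best = min(range(len(q)), key=q.__getitem__)
        values.append(q[best])
        policy.append(best)
    return values, policy


def canonical_triplet(iterations=500):
    """Relative value iteration, normalized so that h(0) = 0."""
    h = [0.0] * len(COST)
    rho = 0.0
    for _ in range(iterations):
        values, _ = bellman(h)
        rho = values[0]                    # current estimate of rho_*
        h = [v - rho for v in values]      # subtract it so that h(0) = 0
    _, policy = bellman(h)                 # actions attaining the minimum in (4)
    return rho, h, policy


rho_star, h1, f_star = canonical_triplet()
values, _ = bellman(h1)
# Largest violation of equation (4) over all states; should be numerically zero.
residual = max(abs(rho_star + h1[x] - values[x]) for x in range(len(COST)))
print(rho_star, h1, f_star, residual)
```

In this finite sketch the function $h_1$ is bounded, so condition (6) holds trivially and the computed $\rho_*$ is the optimal average cost; in the unbounded-cost setting of the paper this step requires the assumptions of Section 3.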

Definition 2.3 (a) For each $f \in F$ and $x \in X$, define the limiting average variance

\[ V(f, x) := \limsup_{n \to \infty} \frac{1}{n} E_x^f \big[ S_n(f, x) - J_n(f, x) \big]^2. \tag{10} \]

(b) A stationary policy $\hat{f}$ is called variance-minimal if

\[ V(\hat{f}, x) = \inf_{f \in F_{eac}} V(f, x) \quad \forall x \in X. \tag{11} \]

3. Assumptions and main result.

In this section we introduce conditions to study asymptotic normality. We shall first introduce two sets of hypotheses. The first one, Assumption 3.1, consists of standard continuity-compactness conditions (see, for instance, [1, 3, 5, 12]) together with a growth condition on the one-step cost $C$.

Assumption 3.1 For every state $x \in X$:

(a) $A(x)$ is a compact subset of $A$;

(b) $C(x, a)$ is lower semicontinuous in $a \in A(x)$;

(c) the function $a \mapsto \int_X u(y) Q(dy|x, a)$ is continuous on $A(x)$ for every bounded measurable function $u$ on $X$;

(d) there exist a measurable function $W \ge 1$, a bounded measurable function $b \ge 0$, and nonnegative constants $r_1$ and $\beta$ with $\beta < 1$, such that

(d1) $|C(x, a)| \le r_1 W(x)$ for all $(x, a) \in K$;

(d2) $\int_X W(y) Q(dy|x, a)$ is continuous in $a \in A(x)$; and

(d3) $\int_X W(y) Q(dy|x, a) \le \beta W(x) + b(x)$ for every $x \in X$.

To state our second set of hypotheses, let us first introduce the following notation: $B_W(X)$ denotes the normed linear space of measurable functions $u$ on $X$ with finite $W$-norm $\|u\|_W$, which is defined as

\[ \|u\|_W := \sup_{x \in X} |u(x)|/W(x). \tag{12} \]

In this case we say that $u$ is $W$-bounded. Let $\mu(\cdot)$ be a measure on $X$. We write

\[ \mu(u) := \int_X u(y) \mu(dy) \tag{13} \]

whenever the integral is well-defined.

Assumption 3.2 For each stationary policy $f \in F$:

(a) ($W$-geometric ergodicity) There exists a probability measure $\mu_f$ on $X$ such that

\[ \Big| \int_X u(y) Q_f^t(dy|x) - \mu_f(u) \Big| \le \|u\|_W R \rho^t W(x) \tag{14} \]

for every $t = 0, 1, \dots$, $u$ in $B_W(X)$ and $x \in X$, where $R > 0$ and $0 < \rho < 1$ are constants independent of $f$.

(b) (Irreducibility) There exists a $\sigma$-finite measure $\lambda$ on $\mathcal{B}(X)$ with respect to which $Q_f$ is $\lambda$-irreducible.

Remark 3.3 (See [4, Theorem 3.5], [13, Theorem 4.5.3], [3, Theorem 10.3.6].) Under Assumptions 3.1 and 3.2, there exists a canonical triplet $(\rho_*, h_1, f_*)$; see Definition 2.1.

To obtain asymptotic normality we need to strengthen the growth condition on the cost function $C$ in Assumption 3.1(d1).

Assumption 3.4 There exists a positive constant $r_2$ such that

\[ C^4(x, a) \le r_2 W(x) \quad \forall (x, a) \in K. \tag{15} \]


Remark 3.5 (a) Because $W \ge 1$, Assumption 3.4 implies Assumption 3.1(d1). Moreover, $C^2(x, a) \le r_2^{1/2} W(x)$ for every $(x, a)$ in $K$ (this is Assumption 3.6 in [5]), a condition that is needed to obtain optimal policies with minimal average variance.

(b) Under Assumptions 3.1, 3.2 and 3.4, the function $h_1$ satisfying (4) and (5) above is such that $h_1^2$ and $h_1^4$ belong to $B_W(X)$. (See Lemma 4.3 below.)

By Remark 3.5(b), the function $\Lambda(\cdot, \cdot)$ on $K$ defined as

\[ \Lambda(x, a) := \int_X h_1^2(y) Q(dy|x, a) - \Big[ \int_X h_1(y) Q(dy|x, a) \Big]^2 \tag{16} \]

is finite-valued. This function is used to state the following variance-minimization result.

Proposition 3.6 (See [5, Theorem 3.8] or [3, Theorem 11.3.8].) Under Assumptions 3.1, 3.2 and 3.4, there exist a constant $\sigma_*^2 \ge 0$, a deterministic canonical policy $f_* \in F_{cp}$, and a function $h_2$ in $B_W(X)$ such that, for each $x \in X$,

\[ \sigma_*^2 + h_2(x) = \Lambda_{f_*}(x) + \int_X h_2(y) Q_{f_*}(dy|x). \tag{17} \]

Furthermore, $f_*$ satisfies (11) and $V(f_*, \cdot) = \sigma_*^2$; in fact

\[ V(f_*, x) = \mu_{f_*}(\Lambda_{f_*}) = \sigma_*^2 \quad \forall x \in X \tag{18} \]

and

\[ \sigma_*^2 \le V(f, x) \quad \forall f \in F_{eac}, \ x \in X. \tag{19} \]

Hence, (19) states that $\sigma_*^2$ is the minimal average variance. We can now state our main result, which is proved in Section 4.

Theorem 3.7 Suppose that Assumptions 3.1, 3.2 and 3.4 hold. Let $f_* \in F_{cp}$ be a canonical policy as in Proposition 3.6, and $\rho_*$ the optimal average cost as in (7). Then for every initial state $x \in X$,

\[ \frac{S_n(f_*, x) - n\rho_*}{\sqrt{n}} \tag{20} \]

has asymptotically a normal distribution $N(0, \sigma_*^2)$ as $n \to \infty$, with $S_n(f_*, x)$ as in (9).
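To make the quantities in Proposition 3.6 and Theorem 3.7 concrete, the following sketch evaluates the limiting variance $\mu_f(\Lambda_f)$ for a finite Markov chain with a single action per state, so that the only policy is trivially canonical and variance-minimal. The three-state chain, its costs, and all function names are hypothetical; the bias is obtained from a truncated version of the series (23) in Lemma 4.1, which is an implementation choice made here and not a construction taken from the paper.

```python
# Hypothetical 3-state chain under a fixed policy f (illustrative numbers).
P = [[0.7, 0.2, 0.1],      # P[x][y] = Q_f({y} | x)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
COST = [1.0, 0.0, 3.0]     # COST[x] = C_f(x)


def apply_kernel(u):
    """Return the function x -> sum_y P[x][y] * u[y]."""
    return [sum(p * uy for p, uy in zip(row, u)) for row in P]


def stationary_distribution(iterations=500):
    """Invariant probability mu_f, obtained by power iteration."""
    mu = [1.0 / len(P)] * len(P)
    for _ in range(iterations):
        mu = [sum(mu[x] * P[x][y] for x in range(len(P))) for y in range(len(P))]
    return mu


def bias(avg, terms=500):
    """Truncated series (23): h_f(x) = sum_t E_x[C_f(x_t) - J_f]."""
    h = [0.0] * len(P)
    term = [c - avg for c in COST]         # t = 0 term
    for _ in range(terms):
        h = [hx + tx for hx, tx in zip(h, term)]
        term = apply_kernel(term)          # next term of the series
    return h


mu = stationary_distribution()
avg = sum(m * c for m, c in zip(mu, COST))                      # J_f = mu_f(C_f)
h = bias(avg)
Ph = apply_kernel(h)
Ph2 = apply_kernel([v * v for v in h])
Lam = [second - first ** 2 for first, second in zip(Ph, Ph2)]   # Lambda_f as in (16)
sigma2 = sum(m * l for m, l in zip(mu, Lam))                    # mu_f(Lambda_f), cf. (18)
# Largest violation of the Poisson equation (25) over all states.
poisson_residual = max(abs(avg + h[x] - COST[x] - Ph[x]) for x in range(len(P)))
print(avg, sigma2, poisson_residual)
```

For such a chain, `sigma2` coincides with the classical asymptotic variance of the Markov chain central limit theorem, that is, the stationary variance of the cost plus twice the sum of its stationary autocovariances.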


4. Proof of Theorem 3.7.

In the remainder of this paper we suppose that Assumptions 3.1, 3.2 and 3.4 hold. To prove Theorem 3.7 we need some preliminary results, which are stated as Lemmas 4.1, 4.2 and 4.3. The following lemma summarizes some well-known results, which are stated here for ease of reference.

Lemma 4.1 Let $f \in F$ be a deterministic stationary policy and $\{x_t\}$ the Markov chain induced by $f$. Then:

(a) [3, Lemma 10.4.1] For each $x \in X$ and $t = 1, 2, \dots$,

\[ E_x^f W(x_t) \le [1 + b/(1 - \beta)] W(x), \tag{21} \]

where, abusing notation, $b := \sup_{x \in X} |b(x)|$. Moreover, for every function $u$ in $B_W(X)$ the following limit holds:

\[ \lim_{n \to \infty} \frac{1}{n^p} E_x^f u(x_n) = 0 \tag{22} \]

for every $p > 0$.

(b) [3, Proposition 10.2.3] $|J_n(f, x) - nJ_f| \le r_1 R W(x)/(1 - \rho)$ for all $x \in X$ and $n = 1, 2, \dots$, where $J_f := \mu_f(C_f)$. Hence:

(c) $J(f, x) = \lim_{n \to \infty} J_n(f, x)/n = J_f$ for all $x \in X$.

(d) [3, Proposition 10.2.3] The function

\[ h_f(x) := \lim_{n \to \infty} [J_n(f, x) - nJ_f] = \sum_{t=0}^{\infty} E_x^f [C_f(x_t) - J_f] \tag{23} \]

belongs to $B_W(X)$ and is called the "bias of $f$". Moreover, by (b), we have

\[ \|h_f\|_W \le r_1 R/(1 - \rho). \tag{24} \]

(e) [3, Theorem 10.3.6] The pair $(J_f, h_f)$ is the unique solution of the Poisson equation

\[ J_f + h_f(x) = C_f(x) + \int_X h_f(y) Q_f(dy|x) \quad \forall x \in X, \tag{25} \]

that satisfies the condition $\mu_f(h_f) = 0$.


(f) [3, Theorem 10.3.7] If $f$ is a canonical policy in $F_{cp}$, the corresponding solution $(J_f, h_f) = (\rho_*, h_f)$ to the Poisson equation (25) is such that $h_f$ coincides with the function $h_1$ in (4) and (5) up to an additive constant, that is, $h_f(\cdot) = h_1(\cdot) + k_f$ for some constant $k_f$.

The following lemma states a stronger version of (14) and Lemma 4.1(e).

Lemma 4.2 Let $w(x) := W(x)^{1/m}$ with $m = 2$ or $m = 4$. For each stationary policy $f \in F$:

(a) The Markov chain $\{x_n\}$ induced by $f$ is $w$-geometrically ergodic, that is,

\[ \Big| \int_X u(y) Q_f^t(dy|x) - \mu_f(u) \Big| \le \|u\|_w R_0 \rho_0^t w(x) \tag{26} \]

for all $x \in X$ and $t = 0, 1, \dots$, where $\rho_0 = \rho^{1/m} < 1$ and $R_0 := R^{1/m}$;

(b) The unique solution $(J_f, h_f)$ of the Poisson equation (25) is such that $h_f$ is $w$-bounded.

Proof. (a) This part follows from [3, Lemma 11.3.9].

(b) Case $m = 4$: Note that (15) and part (a) of this lemma yield the $W^{1/4}$-analogue of Lemma 4.1(d). Hence $h_f$ is $W^{1/4}$-bounded.

Case $m = 2$: Assumption 3.4 and the fact that $W \ge 1$ imply that

\[ |C(x, a)| \le r_2^{1/4} W(x)^{1/4} \le r_2^{1/4} W(x)^{1/2} \quad \forall (x, a) \in K. \tag{27} \]

Part (a) (with $m = 2$) and (27) yield the $W^{1/2}$-analogue of Lemma 4.1(d), that is, $h_f$ is $W^{1/2}$-bounded. ✷

Lemma 4.3 (a) The function $h_1(\cdot)$ satisfying (4) and (5) is $W^{1/4}$-bounded.

(b) The function $h_2(\cdot)$ satisfying (17) is $W^{1/2}$-bounded.


Proof. (a) By Lemma 4.1(f), $h_1$ coincides with $h_f$ up to an additive constant, where $f$ is a canonical policy. By Lemma 4.2(b), $h_f$ is $W^{1/4}$-bounded; therefore $h_1$ is also $W^{1/4}$-bounded.

(b) Following the proof of Proposition 3.6 (see, for instance, [5, Theorem 3.8] or [3, Theorem 11.3.8]), we consider the new Markov control model

\[ (X, A, \{A^*(x) : x \in X\}, Q, \Lambda), \tag{28} \]

with $A^*(x)$ an appropriate compact subset of $A(x)$ for every $x$, and $\Lambda(x, a)$ as in (16). From part (a) of this lemma, $h_1$ is $W^{1/4}$-bounded. Hence $\Lambda$ satisfies the growth condition

\[ \Lambda^2(x, a) \le r_3 W(x) \quad \forall (x, a) \in K, \tag{29} \]

where $r_3$ is a positive constant. Observe that (29) yields the $W^{1/2}$-analogue of Assumption 3.1(d1); hence, by Lemma 4.2(a), the control model (28) is $W^{1/2}$-geometrically ergodic. Then from Lemma 4.1 applied to the control model (28) with $W^{1/2}$ instead of $W$, and $h_2$ instead of $h_1$, it follows that $h_2$ is $W^{1/2}$-bounded. ✷

We are finally ready for the proof of Theorem 3.7.

Proof of Theorem 3.7. Let $(\rho_*, h_1, f_*)$ be a canonical triplet as in Definition 2.1. Moreover, let $(\sigma_*^2, h_2, f_*)$ be as in Proposition 3.6. We define

\[ \tau_1(x, a) := \int_X h_1(y) Q(dy|x, a) - h_1(x) + C(x, a) - \rho_* \]

and

\[ \tau_2(x, a) := \int_X h_2(y) Q(dy|x, a) - h_2(x) + \Lambda(x, a) - \sigma_*^2 \]

for all $(x, a) \in K$. For $l = 1, 2$ and $(x, a) \in K$, let

\[ \psi_l(x, a) := \int_X h_l(y) Q(dy|x, a) - h_l(x), \]

and consider the functions $\chi_n(u) := \exp\{iu(S_n(f_*, x) - n\rho_*)\}$ for $n = 1, 2, \dots$ and $u \in \mathbb{R}$, with $\chi_0(u) := 1$; the expectation $E_x^{f_*} \chi_n(u)$ is the characteristic function of $S_n(f_*, x) - n\rho_*$. Let

\[ e_1(z) := \exp\{iz\} - iz - 1, \tag{30} \]

\[ e_2(z) := \exp\{iz\} + \frac{z^2}{2} - iz - 1. \tag{31} \]


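Observe that

\[ \tau_1(x, a) = \psi_1(x, a) + C(x, a) - \rho_* \tag{32} \]

and

\[ \tau_2(x, a) = \psi_2(x, a) + \Lambda(x, a) - \sigma_*^2 \tag{33} \]

for all $(x, a) \in K$. To prove the theorem we have to verify that

\[ \lim_{n \to \infty} E_x^{f_*} \chi_n\Big( \frac{u}{\sqrt{n}} \Big) = \exp\Big\{ -\frac{1}{2} \sigma_*^2 u^2 \Big\}. \tag{34} \]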

To this end, first notice that $\psi_l(x_m, a_m)$, for $l = 1, 2$, is the conditional expectation of $h_l(x_{m+1}) - h_l(x_m)$ given $x_m, a_m$, that is, $\psi_l(x_m, a_m) = E_x^{f_*}[h_l(x_{m+1}) - h_l(x_m) \mid x_m, a_m]$. With $\chi_m := \chi_m(u)$ and $\psi_l := \psi_l(x_m, a_m)$, this yields, for $l = 1, 2$, the equations

\[ 0 = iu E_x^{f_*} \Big[ \sum_{m=0}^{n-1} \chi_m \psi_1 - \sum_{m=0}^{n-1} \chi_m \big( h_1(x_{m+1}) - h_1(x_m) \big) \Big] \tag{35} \]

and

\[ 0 = \frac{u^2}{2} E_x^{f_*} \Big[ \sum_{m=0}^{n-1} \chi_m \big( h_2(x_{m+1}) - h_2(x_m) \big) - \sum_{m=0}^{n-1} \chi_m \psi_2 \Big]. \tag{36} \]

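To simplify the notation, let $C := C(x_m, a_m)$, $e_1 := e_1\big(u(C - \rho_*)\big)$ and $e_2 := e_2\big(u(C - \rho_*)\big)$. Moreover, notice that

\[ \chi_{m+1} - \chi_m = \big[ \exp\{iu(C - \rho_*)\} - 1 \big] \chi_m. \tag{37} \]

From (30), (31) and (37) we have

\[ E_x^{f_*} \chi_n - 1 = E_x^{f_*} \sum_{m=0}^{n-1} (\chi_{m+1} - \chi_m) = E_x^{f_*} \sum_{m=0}^{n-1} \Big[ iu(C - \rho_*) - \frac{1}{2} u^2 (C - \rho_*)^2 + e_2 \Big] \chi_m, \tag{38} \]

and

\begin{align*}
-iu E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \big( h_1(x_{m+1}) - h_1(x_m) \big)
&= iu E_x^{f_*} \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} h_1(x_{m+1}) (\chi_{m+1} - \chi_m) \Big] \\
&= iu E_x^{f_*} \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} h_1(x_{m+1}) \big( iu(C - \rho_*) + e_1 \big) \chi_m \Big]. \tag{39}
\end{align*}

Similarly,

\begin{align*}
\frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \big( h_2(x_{m+1}) - h_2(x_m) \big)
&= -\frac{u^2}{2} E_x^{f_*} \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) (\chi_{m+1} - \chi_m) \Big] \\
&= -\frac{u^2}{2} E_x^{f_*} \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) \big( \exp\{iu(C - \rho_*)\} - 1 \big) \chi_m \Big]. \tag{40}
\end{align*}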

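Adding (35)-(40) and using (32), we obtain

\begin{align*}
E_x^{f_*} \chi_n - 1
&= iu E_x^{f_*} \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_1(x_m, a_m) + \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big] \\
&\quad - \frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \Big[ \psi_2 + 2 h_1(x_{m+1})(C - \rho_*) + (C - \rho_*)^2 \Big] \\
&\quad - \frac{u^2}{2} E_x^{f_*} \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) \big( \exp\{iu(C - \rho_*)\} - 1 \big) \chi_m \Big] \\
&\quad + E_x^{f_*} \sum_{m=0}^{n-1} e_2 \chi_m.
\end{align*}

Hence

\[ E_x^{f_*} \chi_n - 1 = \kappa''(n, u) - \frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \Big[ \psi_2 + 2 h_1(x_{m+1})(C - \rho_*) + (C - \rho_*)^2 \Big] \tag{41} \]

with

\begin{align*}
\kappa''(n, u) &= iu E_x^{f_*} \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_1(x_m, a_m) + \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big] \\
&\quad - \frac{u^2}{2} E_x^{f_*} \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) \big( \exp\{iu(C - \rho_*)\} - 1 \big) \chi_m \Big] \\
&\quad + E_x^{f_*} \sum_{m=0}^{n-1} e_2 \chi_m. \tag{42}
\end{align*}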

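Observing that

\[ \Lambda(x_m, a_m) = E_x^{f_*}[h_1^2(x_{m+1}) \mid x_m, a_m] - \big( E_x^{f_*}[h_1(x_{m+1}) \mid x_m, a_m] \big)^2 \]

and in view of (33), we can express (41) as

\begin{align*}
E_x^{f_*} \chi_n - 1
&= \kappa''(n, u) - \frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \Big[ \sigma_*^2 + \tau_2(x_m, a_m) - h_1^2(x_{m+1}) + \big( E_x^{f_*}[h_1(x_{m+1}) \mid x_m, a_m] + C(x_m, a_m) - \rho_* \big)^2 \Big] \\
&= \kappa''(n, u) - \frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \Big[ \sigma_*^2 + \tau_2(x_m, a_m) - h_1^2(x_{m+1}) + \Big( \int_X h_1(y) Q(dy|x_m, a_m) + C(x_m, a_m) - \rho_* \Big)^2 \Big].
\end{align*}

Since $f_*$ is a canonical policy, it satisfies

\[ h_1(x_m) = \int_X h_1(y) Q(dy|x_m, a_m) + C(x_m, a_m) - \rho_*. \]

Then, from (37), we have

\begin{align*}
E_x^{f_*} \chi_n - 1
&= \kappa''(n, u) - \frac{u^2}{2} E_x^{f_*} \sum_{m=0}^{n-1} \chi_m \Big[ \sigma_*^2 + \tau_2(x_m, a_m) - h_1^2(x_{m+1}) + h_1^2(x_m) \Big] \\
&= \kappa''(n, u) - \frac{u^2 \sigma_*^2}{2} \sum_{m=0}^{n-1} E_x^{f_*} \chi_m - \frac{u^2}{2} E_x^{f_*} \Big[ h_1^2(x_0) - \chi_n h_1^2(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_2(x_m, a_m) + \sum_{m=0}^{n-1} h_1^2(x_{m+1}) (\chi_{m+1} - \chi_m) \Big] \\
&= \kappa''(n, u) - \frac{u^2 \sigma_*^2}{2} \sum_{m=0}^{n-1} E_x^{f_*} \chi_m - \frac{u^2}{2} E_x^{f_*} \Big[ h_1^2(x_0) - \chi_n h_1^2(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_2(x_m, a_m) + \sum_{m=0}^{n-1} h_1^2(x_{m+1}) \big( \exp\{iu(C - \rho_*)\} - 1 \big) \chi_m \Big].
\end{align*}

Hence

\[ E_x^{f_*} \chi_n = 1 - \frac{u^2 \sigma_*^2}{2} \sum_{m=0}^{n-1} E_x^{f_*} \chi_m + \kappa'(n, u) \tag{43} \]

with

\[ \kappa'(n, u) = \kappa''(n, u) - \frac{u^2}{2} E_x^{f_*} \Big[ h_1^2(x_0) - \chi_n h_1^2(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_2(x_m, a_m) + \sum_{m=0}^{n-1} h_1^2(x_{m+1}) \big( \exp\{iu(C - \rho_*)\} - 1 \big) \chi_m \Big]. \tag{44} \]

Let us rewrite (43) as

\[ E_x^{f_*} \chi_n = 1 + \Big( \exp\Big\{ -\frac{u^2 \sigma_*^2}{2} \Big\} - 1 \Big) \sum_{m=0}^{n-1} E_x^{f_*} \chi_m + \kappa(n, u), \tag{45} \]

with

\[ \kappa(n, u) := \kappa'(n, u) + \Big[ 1 - \frac{u^2 \sigma_*^2}{2} - \exp\Big\{ -\frac{u^2 \sigma_*^2}{2} \Big\} \Big] \sum_{m=0}^{n-1} E_x^{f_*} \chi_m. \tag{46} \]

From (45), an induction argument gives

\[ E_x^{f_*} \chi_n(u) = \exp\Big\{ -\frac{n \sigma_*^2 u^2}{2} \Big\} + \Big( \exp\Big\{ -\frac{\sigma_*^2 u^2}{2} \Big\} - 1 \Big) \sum_{m=0}^{n-1} \exp\Big\{ -\frac{\sigma_*^2 u^2}{2} (n - 1 - m) \Big\} \kappa(m, u) + \kappa(n, u). \tag{47} \]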
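The induction leading from (45) to (47) can be checked numerically. The sketch below treats $y_n := E_x^{f_*}\chi_n$ as the solution of the linear recursion (45) for an arbitrary real sequence $\kappa(n, u)$ with $\kappa(0, u) = 0$ (which is what (45) gives at $n = 0$, since $\chi_0 = 1$), and compares it with the right-hand side of (47). The random sequence, the values of $\sigma_*^2$ and $u$, and the function names are illustrative assumptions.

```python
import math
import random


def by_recursion(a, kappa):
    """y_n = 1 + (a - 1) * sum_{m<n} y_m + kappa[n], i.e. (45) with y_n = E chi_n."""
    y = []
    for k in kappa:
        y.append(1.0 + (a - 1.0) * sum(y) + k)   # sum(y) runs over m < n
    return y


def by_closed_form(a, kappa):
    """Right-hand side of (47), where a = exp(-sigma^2 u^2 / 2)."""
    return [a ** n
            + (a - 1.0) * sum(a ** (n - 1 - m) * kappa[m] for m in range(n))
            + kappa[n]
            for n in range(len(kappa))]


rng = random.Random(0)
a = math.exp(-0.5 * 1.3 * 0.7 ** 2)              # arbitrary sigma^2 = 1.3 and u = 0.7
kappa = [0.0] + [rng.uniform(-1.0, 1.0) for _ in range(30)]   # kappa(0, u) = 0
gap = max(abs(p - q) for p, q in zip(by_recursion(a, kappa), by_closed_form(a, kappa)))
print(gap)   # largest discrepancy between the two expressions
```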


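Observe that the limit (34), and consequently Theorem 3.7, follows from (47) if we show that

\[ \max_{1 \le m \le n} \Big| \kappa\Big( m, \frac{u}{\sqrt{n}} \Big) \Big| \to 0 \quad \text{as } n \to \infty. \tag{48} \]

This relation is obtained by inspecting the different terms of $\kappa(m, u/\sqrt{n})$. We will do this in the following six steps.

(i) Since $f_*$ is a canonical policy satisfying (5), we have $\tau_1(x_m, a_m) = 0$ for $m = 0, 1, \dots$ in (42). Similarly, by (17), $\tau_2(x_m, a_m) = 0$ in (44).

(ii) From (22) we have that

\[ \lim_{n \to \infty} \frac{1}{\sqrt{n}} E_x^{f_*} h(x_n) = 0 \quad \text{and} \quad \lim_{n \to \infty} \frac{1}{n} E_x^{f_*} h(x_n) = 0 \]

for every $h$ in $B_W(X)$. These limits appear in (42) and (44) when we replace $u$ by $u/\sqrt{n}$.

(iii) In this part we prove the limit (see (42))

\[ \lim_{n \to \infty} \frac{1}{\sqrt{n}} E_x^{f_*} \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m = 0, \]

where $e_1$ is evaluated at $u(C - \rho_*)/\sqrt{n}$. From the fact that $|e_1(z)| \le z^2/2$ for all $z$ in $\mathbb{R}$, we obtain

\begin{align*}
\Big| \frac{1}{\sqrt{n}} E_x^{f_*} \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big|
&\le \frac{1}{\sqrt{n}} E_x^{f_*} \sum_{m=0}^{n-1} \frac{u^2}{2n} |h_1(x_{m+1})| \big( C(x_m, a_m) - \rho_* \big)^2 \\
&= \frac{u^2}{2 n^{3/2}} E_x^{f_*} \sum_{m=0}^{n-1} \int_X |h_1(y)| Q_{f_*}(dy|x_m) \, \big( C_{f_*}(x_m) - \rho_* \big)^2.
\end{align*}

By Lemma 4.3(a), $h_1(\cdot)$ is $W^{1/4}$-bounded; in particular, $h_1(\cdot)$ is $\sqrt{W}$-bounded. Hence the function $\int_X |h_1(y)| Q_{f_*}(dy|\cdot)$ is $\sqrt{W}$-bounded. On the other hand, by Assumption 3.4, $(C_{f_*}(x) - \rho_*)^2$ is also $\sqrt{W}$-bounded. Therefore

\[ \Big| \frac{1}{\sqrt{n}} E_x^{f_*} \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big| \le \frac{\lambda u^2}{n^{3/2}} E_x^{f_*} \sum_{m=0}^{n-1} W(x_m), \]

where $\lambda$ is a constant depending on $h_1$ and $C$. By (21) we obtain

\[ \Big| \frac{1}{\sqrt{n}} E_x^{f_*} \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big| \le \frac{\lambda u^2}{n^{3/2}} \, n [1 + b/(1 - \beta)] W(x), \]

which converges to zero as $n \to \infty$.

(iv) We shall next prove

\[ \lim_{n \to \infty} \frac{1}{n} E_x^{f_*} \sum_{m=0}^{n-1} e_2 \chi_m = 0. \]

This limit appears in (42) when we replace $u$ by $u/\sqrt{n}$. Observe that $|e_2(z)| \le |z|^3/6$ for all $z$ in $\mathbb{R}$. So, by Assumptions 3.1(d) and 3.4, together with (21),

\begin{align*}
\Big| \frac{1}{n} E_x^{f_*} \sum_{m=0}^{n-1} e_2 \chi_m \Big|
&\le \frac{|u|^3}{6 n^{5/2}} E_x^{f_*} \sum_{m=0}^{n-1} |C_{f_*}(x_m) - \rho_*|^3 \\
&\le \frac{k^3 |u|^3}{6 n^{5/2}} E_x^{f_*} \sum_{m=0}^{n-1} W(x_m)^{3/4} \\
&\le \frac{k^3 |u|^3}{6 n^{5/2}} E_x^{f_*} \sum_{m=0}^{n-1} W(x_m) \\
&\le \frac{k^3 |u|^3}{6 n^{3/2}} [1 + b/(1 - \beta)] W(x),
\end{align*}

which converges to zero as $n \to \infty$, with $k$ a constant.

(v) Let $h$ be a $\sqrt{W}$-bounded function on $X$. Then

\[ \lim_{n \to \infty} \frac{1}{n} E_x^{f_*} \sum_{m=0}^{n-1} h(x_{m+1}) \Big[ \exp\Big\{ i \frac{u}{\sqrt{n}} (C - \rho_*) \Big\} - 1 \Big] \chi_m = 0. \]

This limit appears in (42) and (44) when $u$ is replaced by $u/\sqrt{n}$. It follows from the relation $e_1(z) = \exp\{iz\} - iz - 1$ that

\[ \exp\Big\{ i \frac{u}{\sqrt{n}} (C - \rho_*) \Big\} - 1 = i \frac{u}{\sqrt{n}} (C - \rho_*) + e_1\Big( \frac{u}{\sqrt{n}} (C - \rho_*) \Big). \]

So

\begin{align*}
\Big| \frac{1}{n} E_x^{f_*} \sum_{m=0}^{n-1} h(x_{m+1}) \Big[ \exp\Big\{ i \frac{u}{\sqrt{n}} (C - \rho_*) \Big\} - 1 \Big] \chi_m \Big|
&\le \frac{|u|}{n^{3/2}} E_x^{f_*} \sum_{m=0}^{n-1} |h(x_{m+1})| \, |C_{f_*}(x_m) - \rho_*| \\
&\quad + \frac{1}{n} E_x^{f_*} \sum_{m=0}^{n-1} |h(x_{m+1})| \, |e_1|.
\end{align*}

This gives the desired conclusion by arguments similar to those in (iii).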


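(vi) Since $|1 - t - e^{-t}| \le t^2/2$ for $t \ge 0$, the absolute value of the expression within brackets in (46) is majorized by $\sigma_*^4 u^4/8$. When $u$ is replaced by $u/\sqrt{n}$ this bound becomes $\sigma_*^4 u^4/(8n^2)$, and since $\big| \sum_{j=0}^{m-1} E_x^{f_*} \chi_j \big| \le m \le n$, the corresponding term in $\kappa(m, u/\sqrt{n})$ is majorized by $\sigma_*^4 u^4/(8n)$ for every $1 \le m \le n$.

The statements (i)-(vi) imply (48) and consequently prove the theorem. ✷

Remark 4.4 Taking $A$ as a single-point set (singleton) we obtain the Central Limit Theorem for (noncontrolled) Markov chains.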
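The three elementary bounds used in steps (iii), (iv) and (vi), namely $|e_1(z)| \le z^2/2$, $|e_2(z)| \le |z|^3/6$ and $|1 - t - e^{-t}| \le t^2/2$ for $t \ge 0$, can be spot-checked on a grid. The sketch below is only a numerical sanity check, not a proof; the grid and the helper `worst_ratio` are arbitrary choices made for this illustration.

```python
import cmath
import math


def e1(z):
    """e_1(z) as in (30)."""
    return cmath.exp(1j * z) - 1j * z - 1.0


def e2(z):
    """e_2(z) as in (31)."""
    return cmath.exp(1j * z) + z * z / 2.0 - 1j * z - 1.0


def worst_ratio(value, bound, grid):
    """Largest value(z) / bound(z) over grid points with bound(z) > 0."""
    return max(value(z) / bound(z) for z in grid if bound(z) > 0.0)


grid = [k / 20.0 for k in range(-400, 401)]          # z in [-20, 20], step 0.05
ratio_e1 = worst_ratio(lambda z: abs(e1(z)), lambda z: z * z / 2.0, grid)
ratio_e2 = worst_ratio(lambda z: abs(e2(z)), lambda z: abs(z) ** 3 / 6.0, grid)
ratio_exp = worst_ratio(lambda t: abs(1.0 - t - math.exp(-t)),
                        lambda t: t * t / 2.0,
                        [t for t in grid if t > 0.0])
print(ratio_e1, ratio_e2, ratio_exp)   # each ratio should not exceed 1
```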

5. An example: an LQ system

Consider the linear system

\[ x_{t+1} = k_1 x_t + k_2 a_t + z_t, \quad t = 0, 1, \dots, \tag{49} \]

with state space $X := \mathbb{R}$ and positive coefficients $k_1, k_2$. The control set is $A := \mathbb{R}$, and the set of admissible controls in each state $x$ is the interval

\[ A(x) := [-k_1 |x|/k_2, \ k_1 |x|/k_2]. \tag{50} \]

The disturbances $z_t$ are i.i.d. random variables with values in $Z := \mathbb{R}$, zero mean and finite variance, that is,

\[ E(z_t) = 0, \quad \sigma^2 := E(z_t^2) < \infty. \tag{51} \]

To complete the description of our control model we introduce the quadratic cost-per-stage function

\[ C(x, a) := c_1 x^2 + c_2 a^2 \quad \forall (x, a) \in K, \tag{52} \]

with positive coefficients $c_1, c_2$. We also define

\[ W(x) := \exp[\gamma |x|] \quad \text{for all } x \in X, \tag{53} \]

with $\gamma \ge 4$. Clearly, Assumption 3.4 holds: since $|a| \le k_1 |x|/k_2$ on $K$, we have $C(x, a) \le (c_1 + c_2 k_1^2/k_2^2) x^2$, and $x^8$ is dominated by a constant multiple of $\exp[\gamma |x|]$. Moreover, let $\hat{s} > 0$ be such that $\gamma \hat{s} < \log(\gamma/2 + 1)$, which implies

\[ \beta := \frac{2}{\gamma} (\exp[\gamma \hat{s}] - 1) < 1. \tag{54} \]

Throughout the rest of this section, we suppose that the following assumptions, taken from [6, Section 5], hold:


Assumption 5.1 $0 < k_1 < 1/2$.

Assumption 5.2 The i.i.d. disturbances $z_t$ have a common density $g$, which is a continuous bounded function supported on the interval $S := [-\hat{s}, \hat{s}]$. Moreover, there exists a positive number $\varepsilon$ such that $g(s) \ge \varepsilon$ for all $s \in S$.

Assumptions 5.1 and 5.2 imply that Assumptions 3.1 and 3.2 hold (see, for instance, [6, Propositions 6, 23 and 24]). On the other hand, in [6] it is proved that there exists a unique canonical policy given by

\[ f_*(x) = -f_0 x \quad \forall x \in X, \tag{55} \]

satisfying (4) and (5), with

\[ f_0 := \frac{v_0 k_1 k_2}{c_2 + v_0 k_2^2}, \]

where $v_0$ is the unique positive solution to the quadratic (so-called Riccati) equation

\[ k_2^2 v_0^2 + (c_2 - c_1 k_2^2 - c_2 k_1^2) v_0 - c_1 c_2 = 0. \]

In this case, the corresponding function $h_1(\cdot)$ is given by

\[ h_1(x) = v_0 x^2 \quad \forall x \in X, \tag{56} \]

and the optimal value is

\[ \rho_* = v_0 \sigma^2, \tag{57} \]

with $\sigma^2$ as in (51). Thus $(\rho_*, h_1, f_*)$ is a canonical triplet for our linear quadratic Markov control model. Since $f_*$ in (55) is the unique canonical policy, by Proposition 3.6 this policy also minimizes the limiting average variance. In particular, the optimal value for the variance is

\[ \sigma_*^2 = V(f_*, x) = \lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} E_x^{f_*} \Lambda_{f_*}(x_t). \tag{58} \]
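As a sanity check on (55)-(57), the following sketch solves the Riccati equation for one set of coefficients and verifies numerically that $(\rho_*, h_1, f_*)$ satisfies (4) and (5). The coefficient values, the uniform noise law used to obtain $\sigma^2$, and the grid search over $A(x)$ are assumptions made for this illustration only.

```python
import math

K1, K2, C1, C2 = 0.4, 1.0, 1.0, 1.0      # illustrative coefficients, 0 < K1 < 1/2
SIGMA2 = 0.25 ** 2 / 3.0                  # variance of a uniform law on [-0.25, 0.25]

# Positive root v0 of k2^2 v^2 + (c2 - c1 k2^2 - c2 k1^2) v - c1 c2 = 0.
_b = C2 - C1 * K2 ** 2 - C2 * K1 ** 2
V0 = (-_b + math.sqrt(_b ** 2 + 4.0 * K2 ** 2 * C1 * C2)) / (2.0 * K2 ** 2)
F0 = V0 * K1 * K2 / (C2 + V0 * K2 ** 2)   # gain of the canonical policy (55)
RHO = V0 * SIGMA2                          # optimal average cost (57)


def rhs(x, a):
    """C(x, a) + E[h1(k1 x + k2 a + z)] with h1(y) = V0 y^2 and E z = 0."""
    mean = K1 * x + K2 * a
    return C1 * x * x + C2 * a * a + V0 * (mean * mean + SIGMA2)


def check(x, points=2001):
    """Return (grid minimum over A(x), value at f_*(x), left-hand side of (4))."""
    bound = K1 * abs(x) / K2
    actions = [-bound + 2.0 * bound * j / (points - 1) for j in range(points)]
    return min(rhs(x, a) for a in actions), rhs(x, -F0 * x), RHO + V0 * x * x


for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(x, check(x))   # the three numbers should agree up to the grid resolution
```

The check relies on the fact that the right-hand side of (4) is a convex quadratic in $a$ whose unconstrained minimizer $-f_0 x$ lies inside $A(x)$, because $f_0 \le k_1/k_2$.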

We next calculate the limit in (58) and find the value of the optimal variance. To this end, let $\tilde{k} := k_1 - k_2 f_0$, $B := \int_{\mathbb{R}} z^3 g(z) \, dz$ and $D := \int_{\mathbb{R}} z^4 g(z) \, dz$. Then by (16), (55) and (56), we have

\[ E_x^{f_*} \Lambda_{f_*}(x_t) = v_0^2 \Big[ 4 \tilde{k}^2 \sigma^2 E_x^{f_*}(x_t^2) + 4 \tilde{k} B E_x^{f_*}(x_t) + D - \sigma^4 \Big]. \tag{59} \]


Replacing $a_t$ in (49) with $a_t := f_*(x_t) = -f_0 x_t$, we obtain

\[ x_t = (k_1 - k_2 f_0) x_{t-1} + z_{t-1} = \tilde{k} x_{t-1} + z_{t-1} \quad \forall t = 1, 2, \dots. \]

By (50) and Assumption 5.1, we can check that $|\tilde{k}| < 1$. By an induction procedure, for all $t = 1, 2, \dots$,

\[ x_t = \tilde{k}^t x_0 + \sum_{j=0}^{t-1} \tilde{k}^j z_{t-1-j}. \]

From this relation, we obtain

\[ E_x^{f_*}(x_t) = \tilde{k}^t x \tag{60} \]

and

\[ E_x^{f_*}(x_t^2) = \tilde{k}^{2t} x^2 + \sigma^2 (1 - \tilde{k}^{2t})/(1 - \tilde{k}^2). \tag{61} \]

The relations (60) and (61) imply the limits

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} E_x^{f_*}(x_t) = 0 \quad \text{and} \quad \lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} E_x^{f_*}(x_t^2) = \sigma^2/(1 - \tilde{k}^2). \tag{62} \]

Hence, by (59) and (62) we obtain

\[ \sigma_*^2 = \lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} E_x^{f_*} \Lambda_{f_*}(x_t) = v_0^2 \Big[ \frac{5 \tilde{k}^2 - 1}{1 - \tilde{k}^2} \sigma^4 + \int_{\mathbb{R}} z^4 g(z) \, dz \Big] \ge 0. \tag{63} \]

Finally, by Theorem 3.7 and considering (57), we obtain that for every initial state $x \in X$, as $n \to \infty$, the normalized cost

\[ \frac{\sum_{t=0}^{n-1} C_{f_*}(x_t) - n v_0 \sigma^2}{\sqrt{n}} \]

has an asymptotic normal distribution $N(0, \sigma_*^2)$ with $\sigma_*^2$ as in (63). By (5), we obtain $v_0 (1 - \tilde{k}^2) = c_1 + c_2 f_0^2$. Hence, $C_{f_*}(x) = (c_1 + c_2 f_0^2) x^2 = v_0 (1 - \tilde{k}^2) x^2$ for all $x$. This implies that for every initial state $x$, as $n \to \infty$,

\[ \frac{\sum_{t=0}^{n-1} x_t^2 - n \sigma^2/(1 - \tilde{k}^2)}{\sqrt{n}} \]

has an asymptotic normal distribution $N(0, s^2)$, where

\[ s^2 = \Big[ \frac{5 \tilde{k}^2 - 1}{1 - \tilde{k}^2} \sigma^4 + \int_{\mathbb{R}} z^4 g(z) \, dz \Big] \Big/ (1 - \tilde{k}^2)^2. \]
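The conclusion of this section can be illustrated by simulation. The following Monte Carlo sketch assumes disturbances that are uniform on $[-\hat{s}, \hat{s}]$ with $\hat{s} = 0.25$ (so that $\gamma \hat{s} < \log(\gamma/2 + 1)$ holds for $\gamma = 4$), and it compares the sample variance of the normalized cost with $\sigma_*^2$ from (63). The coefficients, the noise law, the horizon, the number of runs and the seed are all arbitrary choices for illustration; the agreement is only approximate because both the horizon and the sample are finite.

```python
import math
import random

K1, K2, C1, C2 = 0.4, 1.0, 1.0, 1.0      # illustrative coefficients, 0 < K1 < 1/2
S_HAT = 0.25                               # noise uniform on [-S_HAT, S_HAT] (assumed law)
SIGMA2 = S_HAT ** 2 / 3.0                  # E z^2 for that law
FOURTH = S_HAT ** 4 / 5.0                  # E z^4 for that law

# Riccati root, gain (55), closed-loop coefficient and optimal average cost (57).
_b = C2 - C1 * K2 ** 2 - C2 * K1 ** 2
V0 = (-_b + math.sqrt(_b ** 2 + 4.0 * K2 ** 2 * C1 * C2)) / (2.0 * K2 ** 2)
F0 = V0 * K1 * K2 / (C2 + V0 * K2 ** 2)
K_TILDE = K1 - K2 * F0
RHO = V0 * SIGMA2
# Optimal variance (63).
SIGMA_STAR2 = V0 ** 2 * ((5.0 * K_TILDE ** 2 - 1.0) / (1.0 - K_TILDE ** 2) * SIGMA2 ** 2
                         + FOURTH)


def normalized_cost(n, x0, rng):
    """One sample of (S_n(f_*, x0) - n rho_*) / sqrt(n)."""
    x, total = x0, 0.0
    for _ in range(n):
        a = -F0 * x                                        # canonical policy (55)
        total += C1 * x * x + C2 * a * a                   # one-stage cost (52)
        x = K1 * x + K2 * a + rng.uniform(-S_HAT, S_HAT)   # dynamics (49)
    return (total - n * RHO) / math.sqrt(n)


def simulate(n=400, runs=1500, x0=0.0, seed=1):
    """Sample mean and unbiased sample variance of the normalized cost."""
    rng = random.Random(seed)
    samples = [normalized_cost(n, x0, rng) for _ in range(runs)]
    mean = sum(samples) / runs
    var = sum((s - mean) ** 2 for s in samples) / (runs - 1)
    return mean, var


sample_mean, sample_var = simulate()
print(sample_mean, sample_var, SIGMA_STAR2)
```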

Acknowledgement

The author wishes to thank Professor Onésimo Hernández-Lerma for his valuable comments and suggestions.

Armando F. Mendoza-Pérez
Universidad Politécnica de Chiapas, Calle Eduardo J. Selvas S/N, Tuxtla Gutiérrez, Chiapas.
mepa680127@hotmail.com

References

[1] Gordienko E. and Hernández-Lerma O., Average cost Markov control processes with weighted norms: existence of canonical policies, Appl. Math. (Warsaw), 23 (1995), 199-218.

[2] Hernández-Lerma O. and Lasserre J.B., Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, New York, (1996).

[3] Hernández-Lerma O. and Lasserre J.B., Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, (1999).

[4] Hernández-Lerma O. and Vega-Amaya O., Infinite-horizon Markov control processes with undiscounted cost criteria: From average to overtaking optimality, Appl. Math. (Warsaw), 25 (1998), 153-178.

[5] Hernández-Lerma O., Vega-Amaya O. and Carrasco G., Sample-path optimality and variance-minimization of average cost Markov control processes, SIAM J. Control Optim., 38(1) (1999), 79-93.

[6] Hilgert N. and Hernández-Lerma O., Bias optimality versus strong 0-discount optimality in Markov control processes with unbounded costs, Acta Appl. Math., 77 (2003), 215-235.

[7] Mandl P., On the variance in controlled Markov chains, Kybernetika (Prague), 7 (1971), 1-12.

[8] Mandl P., On the asymptotic normality of the reward in a controlled Markov chain, Colloquia Mathematica Societatis János Bolyai, 9. European Meeting of Statisticians, Budapest (Hungary), (1972).

[9] Mandl P., A connection between controlled Markov chains and martingales, Kybernetika (Prague), 9 (1973), 237-241.

[10] Mandl P., Estimation and control in Markov chains, Adv. Appl. Probab., 6 (1974), 40-60.

[11] Prieto-Rumeau T. and Hernández-Lerma O., Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains, to appear in Math. Meth. Oper. Res.

[12] Puterman M.L., Markov Decision Processes, Wiley, New York, (1994).

[13] Vega-Amaya O., Markov control processes in Borel spaces: Undiscounted criteria, Doctoral thesis, UAM-Iztapalapa, México, 1998 (in Spanish).

[14] Yushkevich A.A., On a class of strategies in general Markov decision models, Theory Probab. Appl., 18 (1973), 777-779.

[15] Zhu Q.X. and Guo X.P., Markov decision processes with variance minimization: A new condition and approach, Stoch. Anal. Appl., 25 (2007), 577-592.

Morfismos, Vol. 12, No. 2, 2008

Errata

By an involuntary error, formula (14) was omitted at the bottom of page 11 in the December 2005 printed issue of Morfismos (Vol. 9, No. 2). The last two lines on that page should have been:

... B = 1.10555. He used this to show that, for x large,

\[ 0.89 \, \frac{x}{\log x} < \pi(x) < 1.11 \, \frac{x}{\log x}. \tag{14} \]

Morfismos, Comunicaciones Estudiantiles del Departamento de Matemáticas del CINVESTAV, was printed in March 2009 at the reproduction workshop of the same department, located at Av. IPN 2508, Col. San Pedro Zacatenco, México, D.F. 07300. The print run consists of 500 copies on imported 36-kilogram opaline paper, 34 × 25.5 cm, with green Tintoreto covers.

Technical support: Omar Hernández Orozco.

Vértices simpliciales y escalonabilidad de grafos. Roberto Cruz y Mario Estrada

Asymptotic normality of average cost Markov control processes. Armando F. Mendoza-Pérez

Errata