Mathematical Finance, Vol. 21, No. 4 (October 2011), 723–742

GREEKS FORMULAS FOR AN ASSET PRICE MODEL WITH GAMMA PROCESSES

REIICHIRO KAWAI, University of Leicester
ATSUSHI TAKEUCHI, Osaka City University

Greeks formulas of Delta, Rho, Vega, and Gamma are derived in closed form for asset price dynamics described by gamma processes and Brownian motions time-changed by a gamma process. The model considered here includes many well-known models of practical interest, such as the variance gamma model and the Black–Scholes model. Our approach is based upon the Malliavin calculus for jump processes, making full use of a scaling property of gamma processes with respect to the Girsanov transform. The existence of the variance of the resulting estimators is investigated. Numerical results are provided to illustrate that the derived Greeks formulas have a faster rate of convergence than the finite difference method.

KEY WORDS: Bismut–Elworthy–Li type formulas, gamma processes, Girsanov transform, Malliavin calculus, time-changed Brownian motion, variance gamma processes.

The authors are grateful to three anonymous referees for their very careful reading and a number of valuable suggestions. This work is supported in part by JSPS Grant-in-Aid for Scientific Research 20740059 and was largely carried out while RK was based at the Center for the Study of Finance and Insurance, Osaka University, Japan. Manuscript received April 2008; final revision received January 2010. Address correspondence to Reiichiro Kawai, Department of Mathematics, University of Leicester, Leicester LE1 7RH, UK; e-mail: reiichiro.kawai@gmail.com. DOI: 10.1111/j.1467-9965.2010.00452.x. © 2010 Wiley Periodicals, Inc.

1. INTRODUCTION

It has been almost 35 years since Malliavin (1976) initiated the stochastic calculus of variations, often called the Malliavin calculus, to prove the existence of a smooth density for the solution of a stochastic differential equation. Later, Bismut (1981) developed a martingale approach to the Malliavin calculus, based on a measure change, to establish an integration-by-parts formula over Gaussian probability space, and revealed the relation between the Hörmander condition on the hypoelliptic problem and the existence of a smooth density for the solution of a stochastic differential equation. Furthermore, in the book (Bismut 1984), he studied the logarithmic gradient of the fundamental solution to the heat equation on a Riemannian manifold. The set of formulas derived in the book is often called the Bismut formula in recognition of his contribution. Elworthy and Li (1994) also attacked the same problem via a different approach, and succeeded in extending the Bismut formula to a more general class of stochastic differential equations on a Riemannian manifold. The logarithmic derivative turns out to be essentially equivalent to the Greeks in mathematical finance. In Fournié et al. (1999, 2001), the Malliavin calculus on the Wiener space was applied to the sensitivity analysis for an asset price model. As a different approach from the Malliavin calculus, the martingale structure in diffusion processes was applied in Gobet (2004) and Gobet and Munos (2005) to derive Greeks formulas. All those results only concern diffusion processes without random jumps.

It is a natural question whether approaches similar to those for diffusion processes can be taken in the Greeks computations for jump processes. There have been some results on the formulation of the Malliavin calculus on the Poisson space, or the Wiener–Poisson space. (See, e.g., Bichteler, Gravereaux, and Jacod 1987; Di Nunno, Øksendal, and Proske 2009.) It was first found by Bismut (1983) that the measure change by the Girsanov transform for jump processes offsets a perturbation of the Poissonian path space and enables us to obtain the existence of a smooth density for the solution of a stochastic differential equation with jumps under Hörmander type conditions. See, for example, Komatsu and Takeuchi (2001) and Léandre (1985). (See also Bichteler et al. 1987; León et al. 2002; Picard 1996 for yet different approaches to the Malliavin calculus for jump processes.)

Recently, as in the Gaussian case, the Greeks computations have also been investigated in the context of jump-type finance models. Davis and Johansson (2006) and Cass and Friz (2007) studied jump diffusion processes, although their approaches do not take into account effects from the jump component and end up with formulas very similar to the case of processes without random jumps. El-Khatib and Privault (2004) applied the calculus focused on the Poisson arrival times developed in Carlen and Pardoux (1990), whereas Bally, Bavouzet, and Messaoud (2007) took a unified approach considering the derivatives with respect to both the Poisson arrival times and the amplitudes of the jumps.
Kawai and Kohatsu-Higa (2010) applied the Malliavin calculus on the Wiener space to pure-jump processes described by the time-changed Brownian motion conditionally on its time-changing process. Takeuchi (2010) studied the same problem for solutions to stochastic differential equations with jumps via the martingale approach based upon the Kolmogorov backward equation of the integro-differential operator associated with the stochastic differential equation, in a similar manner to Elworthy and Li (1994).

In this paper, we study the computation of the Greeks for asset price dynamics described by gamma processes and Brownian motions time-changed by a gamma process, a class that includes many well-known models, such as the Black–Scholes model and the variance gamma model. Our approach is based upon the Malliavin calculus for jump processes of Bismut (1983), making use of a scaling property of gamma processes with respect to the Girsanov transform. Our results differ from Cass and Friz (2007) and Davis and Johansson (2006) in the sense that our model can be of a pure-jump type, while improving the results of El-Khatib and Privault (2004) and Bally et al. (2007) in that our model is formulated with Lévy processes of a more realistic infinite activity type.

The paper is organized as follows. Section 2 is devoted to recalling some basic results on gamma processes and describing the scaling property of the gamma process and the Brownian motion with respect to the Girsanov transform. In Section 3, the sensitivities of our model are derived and the existence of their variance is investigated. Finally, numerical results are presented to provide support for the effectiveness of our formulas in the Monte Carlo estimation.

2. PRELIMINARIES

Let us begin with some general notation that will be used throughout the paper. We denote by $\mathbb{N}$ the set of positive integers. We let $\mathbb{R}^d$ be the $d$-dimensional Euclidean space with the norm $\|\cdot\|$ and the inner product $\langle\cdot,\cdot\rangle$. We denote by $E_P[\cdot]$ and by $\mathrm{Var}_P(\cdot)$, respectively, the expectation and the variance taken under a probability measure $P$. The restriction of a probability measure $P$ to the $\sigma$-field $\mathcal{F}$ is denoted by $P|_{\mathcal{F}}$. Henceforth, we fix $(\Omega, \mathcal{F}, P)$ as our underlying probability space. We denote by $\stackrel{\mathcal{L}}{=}$ the identity in law under a suitable probability measure. For $k \in \mathbb{N}$, $\partial_k$ indicates the partial derivative with respect to the $k$th argument. Finally, $C_b^k$ denotes the class of $k$-times continuously differentiable functions with bounded derivatives, whereas the subscript $K$ of $C_K$ indicates compact support and $\|\cdot\|_\infty$ indicates the supremum norm of continuous functions.

We next recall the definition and some key properties of gamma (Lévy) processes. The gamma process $\{Y_t : t \ge 0\}$ is a one-sided pure-jump Lévy process in $[0,+\infty)$ whose Lévy measure is given in the form

\[
\nu(dz) = a\,\frac{e^{-bz}}{z}\,dz, \qquad z \in (0,+\infty),
\]

where $a > 0$ and $b > 0$. For each (time) $t > 0$, its marginal has the gamma distribution with the characteristic function

\[
E_P\big[e^{iyY_t}\big] = \exp\Big[t\int_{(0,+\infty)}\big(e^{iyz}-1\big)\,\nu(dz)\Big] = \Big(1-\frac{iy}{b}\Big)^{-at}, \tag{2.1}
\]

while the marginal density $f_t^P$ at time $t$ under $P$ is given in closed form by

\[
f_t^P(y) = \frac{b^{at}}{\Gamma(at)}\,y^{at-1}e^{-by}, \qquad y \in [0,+\infty). \tag{2.2}
\]
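As a quick sanity check of (2.1) against (2.2), the following minimal sketch integrates $e^{iyz}$ against the gamma density numerically and compares the result with the closed-form characteristic function. The parameter values and the helper names `gamma_pdf` and `simpson` are illustrative assumptions, not part of the paper.

```python
import cmath
import math


def gamma_pdf(y, shape, rate):
    """Density (2.2) with shape = a*t and rate = b."""
    if y <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(y)
               - rate * y - math.lgamma(shape))
    return math.exp(log_pdf)


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


# Illustrative (assumed) parameter values; a*t = 3 keeps the density smooth at 0.
a, b, t = 2.0, 3.0, 1.5
u = 0.7  # argument of the characteristic function

# Left-hand side of (2.1): E_P[exp(i*u*Y_t)] by quadrature against (2.2).
cf_quadrature = simpson(lambda y: cmath.exp(1j * u * y) * gamma_pdf(y, a * t, b),
                        0.0, 30.0)
# Right-hand side of (2.1): closed form.
cf_closed_form = (1.0 - 1j * u / b) ** (-a * t)
```

The upper integration limit 30 is an assumption that is harmless here because the density decays like $e^{-by}$.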

Throughout this paper, we fix $T > 0$. Define

\[
\Lambda := \big\{\lambda \in \mathbb{R} : E_P\big[e^{\lambda Y_1}\big] < +\infty\big\} = (-\infty, b),
\]

and for $\lambda \in \Lambda$,

\[
\varphi_T(\lambda) := Ta\,\ln\frac{b}{b-\lambda}.
\]

Fix $\xi \ge 0$ and fix $\eta := (\eta_1, \eta_2) \in [0,+\infty)\times[0,+\infty)$. Let $G := (G_1, G_2)$ be a standard normal random vector in $\mathbb{R}^2$, independent of $\{Y_t : t \in [0,T]\}$. For each $t \in [0,T]$, we denote by $\mathcal{F}_t$ the minimal $\sigma$-field generated by $\sigma(G)$ and $\sigma\{Y_s \mid s \in [0,t]\}$. Then, define the probability measure $Q_\lambda$, equivalent to $P$ on $\mathcal{F}_T$, via the Radon–Nikodym derivative

\[
\frac{dQ_\lambda}{dP}\Big|_{\mathcal{F}_T} := \frac{e^{\xi\lambda Y_T}}{E_P\big[e^{\xi\lambda Y_T}\big]}\,\frac{e^{\lambda\langle\eta,G\rangle}}{E_P\big[e^{\lambda\langle\eta,G\rangle}\big]} = \exp\Big[\xi\lambda Y_T - \varphi_T(\xi\lambda) + \lambda\langle\eta,G\rangle - \frac{\lambda^2}{2}\|\eta\|^2\Big], \tag{2.3}
\]

that is, the product of two Esscher transforms. (For the use of the Esscher transform for option valuation, see, e.g., Gerber and Shiu 1994, Chan 1999, and Elliott, Chan, and Siu 2005.) The Esscher parameters $\xi$ and $\eta$ are dummy variables in the sense that we can set, for example, $\eta = 0$ to leave the distribution of the Gaussian component untransformed, while they may also serve as control parameters for the intensity of the density transform among the three components. (We suppress the parameters $\xi$ and $\eta$ in the notation $Q_\lambda$ for simplicity, because both $\xi$ and $\eta$ are fixed throughout.) Under the probability measure $Q_\lambda$, the random vector $G$ is again Gaussian with mean $\lambda\eta$ and with identity variance–covariance matrix, while the stochastic process $\{Y_t : t \in [0,T]\}$ is again a gamma process with the characteristic function

\[
E_{Q_\lambda}\big[e^{iyY_t}\big] = \exp\Big[t\int_{(0,+\infty)}\big(e^{iyz}-1\big)\,\nu_\lambda(dz)\Big] = \Big(1-\frac{iy}{b-\xi\lambda}\Big)^{-at},
\]

where

\[
\nu_\lambda(dz) := a\,\frac{e^{-(b-\xi\lambda)z}}{z}\,dz, \qquad z \in (0,+\infty).
\]

In addition, the independence of $G$ and $\{Y_t : t \in [0,T]\}$ remains true under $Q_\lambda$. (This can be justified immediately through the characteristic function.) The following is what we call the scaling property, which is the key tool for our entire argument.

LEMMA 2.1. Fix $\xi \ge 0$ and $\lambda$ such that $\xi\lambda \in \Lambda$. Let $Q_\lambda$ be defined by (2.3). For $x, y \in \mathbb{R}$, $z \in [0,+\infty)$ and $t \in [0,T]$, we have

\[
Q_\lambda\big(G_1 \le x,\ G_2 \le y,\ Y_t \le z\big) = P\big(G_1 + \lambda\eta_1 \le x\big)\,P\big(G_2 + \lambda\eta_2 \le y\big)\,P\Big(\frac{b}{b-\xi\lambda}\,Y_t \le z\Big).
\]

Proof. Obvious by the equality

\[
E_{Q_\lambda}\big[e^{iyY_t}\big] = \Big(1-\frac{iy}{b-\xi\lambda}\Big)^{-at} = E_P\Big[e^{i\frac{b}{b-\xi\lambda}yY_t}\Big],
\]

and by the independence of $G$ and $\{Y_t : t \in [0,T]\}$. □
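The gamma part of the scaling property can be checked directly on densities: the Esscher-tilted density $e^{\xi\lambda z} f_t^P(z) / E_P[e^{\xi\lambda Y_t}]$ should coincide with the density of $\frac{b}{b-\xi\lambda} Y_t$ under $P$. The sketch below compares the two pointwise; the parameter values and function names are assumptions chosen only for illustration.

```python
import math


def gamma_pdf(y, shape, rate):
    """Gamma density with shape = a*t and rate = b, as in (2.2)."""
    if y <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(y)
               - rate * y - math.lgamma(shape))
    return math.exp(log_pdf)


def tilted_pdf(z, a, b, t, xi_lam):
    """Density of Y_t under Q_lambda: Esscher tilt of the P-density."""
    mgf = (b / (b - xi_lam)) ** (a * t)  # E_P[exp(xi*lambda*Y_t)], needs xi*lambda < b
    return math.exp(xi_lam * z) * gamma_pdf(z, a * t, b) / mgf


def scaled_pdf(z, a, b, t, xi_lam):
    """Density under P of the deterministically rescaled variable b/(b - xi*lambda) * Y_t."""
    scale = b / (b - xi_lam)
    return gamma_pdf(z / scale, a * t, b) / scale


# Illustrative (assumed) values with xi*lambda = 0.5 < b.
a, b, t, xi_lam = 2.0, 3.0, 1.5, 0.5
grid = [0.1 * k for k in range(1, 51)]
pairs = [(tilted_pdf(z, a, b, t, xi_lam), scaled_pdf(z, a, b, t, xi_lam)) for z in grid]
densities_agree = all(math.isclose(p, q, rel_tol=1e-9) for p, q in pairs)
```

In simulation terms, this is why a sample of $Y_t$ under $Q_\lambda$ can be produced by drawing $Y_t$ under $P$ once and multiplying by the constant $b/(b-\xi\lambda)$.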

REMARK 2.2. By “scaling property” we mean that, for a fixed $t$, the transformed version (under $Q_\lambda$) can be generated from the original one (under the physical measure $P$) by a separate deterministic modification. (Concerning the denomination, we follow Yor 2007, while in the Brownian motion case, it should rather be called a drift change than a scaling.) Such a deterministic modification property is not shared by most other Lévy processes. For example, even in the case of the Poisson process $\{N_t : t \ge 0\}$ with intensity $\theta > 0$, we have

\[
\frac{E_P\big[e^{\lambda N_t}e^{iyN_t}\big]}{E_P\big[e^{\lambda N_t}\big]} = \exp\big[te^{\lambda}\theta\big(e^{iy}-1\big)\big].
\]

This implies that the Girsanov transform may act either as the time change $t \mapsto e^{\lambda}t$ or as the change in the intensity $\theta \mapsto e^{\lambda}\theta$, while neither can be treated as a deterministic change in simulation. In Carlen and Pardoux (1990), a differential calculus is developed on the Poisson space by looking at the Poisson arrival times $T_1, T_2, \ldots$ rather than the number of arrivals $N_T$. See also Elliott and Tsoi (1993). Let us give another example, of Lévy processes of tempered stable type. For simplicity, we consider only the case of positive jump sizes. Set $\nu(dz) := e^{-bz}z^{-1-\alpha}\,dz$, $z > 0$, for $b > 0$ and $\alpha \in (0,1)$, and set $X_t := \int_0^t\int_0^{+\infty} z\,\mu(dz, ds)$, where $\mu$ is a Poisson random measure with $E_P[\mu(B,[0,t])] = t\nu(B)$ for any Borel set $B$ in $(0,+\infty)$. Then, it follows from the equation

\[
\frac{E_P\big[e^{\lambda X_t}e^{iyX_t}\big]}{E_P\big[e^{\lambda X_t}\big]} = \exp\big[t\,\Gamma(-\alpha)\big((b-\lambda-iy)^{\alpha} - (b-\lambda)^{\alpha}\big)\big]
\]

that the parameter $\lambda$ cannot be absorbed solely in $y$.

We now introduce the asset price dynamics. Let $\{W_t : t \ge 0\}$ and $\{B_t : t \ge 0\}$ be one-dimensional standard Brownian motions, independent of one another. Moreover, let $\{Z_t : t \ge 0\}$ be a stochastic process in $\mathbb{R}$ independent of $\{Y_t : t \ge 0\}$, $\{W_t : t \ge 0\}$, and $\{B_t : t \ge 0\}$. Then, our asset price model is the stochastic process $\{S_t : t \in [0,T]\}$ under the physical measure $P$ defined by

\[
S_t := S_0\exp\big[\theta Y_t + \sigma W_{Y_t} + \tau B_t + Z_t + c(\theta,\sigma,\tau)t\big],
\]

where $S_0 > 0$, $\theta \in \mathbb{R}$, $\sigma, \tau \in [0,+\infty)$, and $c : \mathbb{R}\times[0,+\infty)\times[0,+\infty) \to \mathbb{R}$ with $\partial_k c(\theta,\sigma,\tau)$ being well defined for $k = 1, 2, 3$. The parameters $\sigma$ and $\tau$ act as volatility indices, whereas $c(\theta,\sigma,\tau)$ is a constant drift.

REMARK 2.3. Our asset dynamics model includes the variance gamma model of Madan, Carr, and Chang (1998). It has attracted much attention among market practitioners, and thus appears often in the computational finance literature, for example, Carr and Madan (1999) and Fu (2007). We can obtain the (geometric) variance gamma process simply by setting the parameters $a = b =: \kappa^{-1}$ for the gamma process $\{Y_t : t \in [0,T]\}$ in the model $\{\theta Y_t + \sigma W_{Y_t} : t \in [0,T]\}$. Moreover, it is also shown in Madan, Carr, and Chang (1998) that the variance gamma process can be expressed as the difference of two independent gamma processes,

\[
\{\theta Y_t + \sigma W_{Y_t} : t \ge 0\} \stackrel{\mathcal{L}}{=} \{Y_{t,p} - Y_{t,n} : t \ge 0\},
\]

where the gamma processes $\{Y_{t,p} : t \ge 0\}$ and $\{Y_{t,n} : t \ge 0\}$ are characterized by $(a,b) = (\kappa^{-1}, (\mu_p\kappa)^{-1})$ and $(a,b) = (\kappa^{-1}, (\mu_n\kappa)^{-1})$ for some $\kappa > 0$, respectively, where

\[
\mu_p = \frac{1}{2}\sqrt{\theta^2 + \frac{2\sigma^2}{\kappa}} + \frac{\theta}{2} \quad\text{and}\quad \mu_n = \frac{1}{2}\sqrt{\theta^2 + \frac{2\sigma^2}{\kappa}} - \frac{\theta}{2}.
\]

Using this approach, we may also form a variance gamma process by setting $\{Z_t : t \ge 0\}$ to be a gamma process with suitable parameters.
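The decomposition in Remark 2.3 can be verified at the level of characteristic functions. The sketch below, under assumed illustrative parameter values, compares the characteristic function of $\theta Y_t + \sigma W_{Y_t}$ (obtained by conditioning on $Y_t$ and applying (2.1) with $a = b = \kappa^{-1}$) with the product of the characteristic functions of $Y_{t,p}$ and $-Y_{t,n}$. The function names are hypothetical.

```python
import math


def gamma_cf(u, t, a, b):
    """Characteristic function (2.1) of a gamma process with parameters (a, b)."""
    return (1.0 - 1j * u / b) ** (-a * t)


def vg_cf(u, t, theta, sigma, kappa):
    """E[exp(iu(theta*Y_t + sigma*W_{Y_t}))] for a = b = 1/kappa.

    Conditioning on Y_t gives E[exp((iu*theta - sigma^2 u^2 / 2) Y_t)],
    which is the gamma Laplace transform evaluated at a complex argument.
    """
    return (1.0 - 1j * u * theta * kappa
            + 0.5 * sigma ** 2 * kappa * u ** 2) ** (-t / kappa)


def difference_cf(u, t, theta, sigma, kappa):
    """Characteristic function of Y_{t,p} - Y_{t,n} from Remark 2.3."""
    root = math.sqrt(theta ** 2 + 2.0 * sigma ** 2 / kappa)
    mu_p = 0.5 * root + 0.5 * theta
    mu_n = 0.5 * root - 0.5 * theta
    cf_p = gamma_cf(u, t, 1.0 / kappa, 1.0 / (mu_p * kappa))
    cf_n = gamma_cf(-u, t, 1.0 / kappa, 1.0 / (mu_n * kappa))  # minus sign: -Y_{t,n}
    return cf_p * cf_n


# Illustrative (assumed) variance gamma parameters.
theta, sigma, kappa, t = -0.14, 0.12, 0.2, 1.0
max_gap = max(abs(vg_cf(u, t, theta, sigma, kappa)
                  - difference_cf(u, t, theta, sigma, kappa))
              for u in [-5.0, -1.0, 0.3, 2.0, 10.0])
```

The agreement rests on the identities $\mu_p - \mu_n = \theta$ and $\mu_p\mu_n = \sigma^2/(2\kappa)$, which follow from the formulas for $\mu_p$ and $\mu_n$ above.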

3. MAIN RESULTS

To begin the main discussion, observe first that for each $t \in [0,T]$,

\[
S_t \stackrel{\mathcal{L}}{=} S_0\exp\big[\theta Y_t + \sigma\sqrt{Y_t}\,G_1 + \tau\sqrt{t}\,G_2 + Z_t + c(\theta,\sigma,\tau)t\big],
\]

and define, for $\xi\lambda \in \Lambda$,

\[
S_t^{(\lambda)} := S_0\exp\Big[\theta\frac{b}{b-\xi\lambda}Y_t + \sigma\sqrt{\frac{b}{b-\xi\lambda}Y_t}\,(G_1+\eta_1\lambda) + \tau\sqrt{t}\,(G_2+\eta_2\lambda) + Z_t + c(\theta,\sigma,\tau)t\Big],
\]

which is what the marginal $S_t$ behaves like under $Q_\lambda$. To avoid lengthy expressions, let us prepare some auxiliary notation. Define

\[
H_T^{(\lambda)} := \frac{\partial}{\partial\delta}\ln\frac{S_T^{(\delta)}}{S_0}\Big|_{\delta=\lambda}, \tag{3.1}
\]

with $H_T := H_T^{(0)}$, that is, the derivative of the exponent of $S_t^{(\lambda)}$, and define

\[
F_t^{(\lambda)} := \xi\Big(\theta\frac{b}{b-\xi\lambda}Y_t + \frac{\sigma}{2}\sqrt{\frac{b}{b-\xi\lambda}Y_t}\,(G_1+\eta_1\lambda)\Big) + \eta_1 b\sigma\sqrt{\frac{b}{b-\xi\lambda}Y_t} + \eta_2 b\tau\sqrt{t}, \tag{3.2}
\]

with $F_t := F_t^{(0)}$. A little algebra shows that for each $\omega \in \Omega$, $H_t^{(\lambda)}(\omega) \to F_t(\omega)/b$ as $|\lambda| \downarrow 0$. Finally, define $K_T^{(\lambda)} := (\partial/\partial\delta)F_T^{(\delta)}|_{\delta=\lambda}$, with $K_T := K_T^{(0)}$. We do not omit the superscript $(0)$ of $S_t^{(0)}$ since $S_t^{(0)}(\omega) \ne S_t(\omega)$ although $\mathcal{L}(S_t^{(0)}) = \mathcal{L}(S_t)$.

LEMMA 3.1. Choose $\gamma > 1$, arbitrarily close to 1. Assume that $2\gamma\theta + 2\gamma^2\sigma^2 < b$ and $E_P[e^{2\gamma Z_T}] < +\infty$. Fix $\lambda \in (0, 1\wedge((b\vee(b-2\gamma\theta))/\xi))$, and suppose that one of the following conditions holds:

(a) $\sigma = 0$, $\xi > 0$, $\eta_2 = 0$, $\theta \ne 0$,
(b) $\sigma = 0$, $\xi > 0$, $\eta_2 > 0$, $\tau > 0$, $\theta \ge 0$,
(c) $\sigma > 0$, $\xi = 0$, $\eta_1 > 0$, $\eta_2 = 0$,
(d) $\sigma > 0$, $\xi = 0$, $\eta_2 > 0$, $\tau > 0$.

Then, we have the following:

(i) Let $p > 0$. If $aT > 2p$ for (a), or $aT > p$ for (c), then it holds that $\sup_{|\varepsilon|\le\lambda}E_P[(F_T^{(\varepsilon)})^{-2p}] < +\infty$.
(ii) Assume that $aT > 2\gamma$ for (a). For each $|\varepsilon| \le \lambda$, the random variables $S_T^{(\varepsilon)}H_T^{(\varepsilon)}$, $S_T^{(\varepsilon)}H_T^{(\varepsilon)}/F_T^{(\varepsilon)}$ and $K_T^{(\varepsilon)}/(F_T^{(\varepsilon)})^2$ are all in $L^{2\gamma}(\Omega, P)$. Moreover, they converge in $L^2(\Omega, P)$ respectively to $S_T^{(0)}H_T$, $S_T^{(0)}H_T/F_T$ and $K_T/(F_T)^2$, as $|\varepsilon| \downarrow 0$.

Proof. See Appendix. □

The above condition sets (a)–(d) correspond to various asset price models of practical interest, such as the variance gamma model and the Black–Scholes model. Let us defer this discussion to Remark 3.5 after the derivation of the Greeks formulas.

ASSUMPTION 3.2. We fix the constant $\gamma$ of Lemma 3.1, which is strictly greater than, but can be chosen arbitrarily close to, 1. Assume that $2\gamma\theta + 2\gamma^2\sigma^2 < b$ and $E_P[e^{2\gamma Z_T}] < +\infty$.

In what follows, we impose Assumption 3.2, so that Lemma 3.1 (ii) holds, unless stated otherwise. In particular, it then holds that $E_P[S_T^{2\gamma}] < +\infty$. The following result is due to the Girsanov transform argument of Bismut (1983) and serves as the starting point of our discussions to follow.

PROPOSITION 3.3. Let $\Phi$ be in $C_b^1(\mathbb{R}_+;\mathbb{R})$. It holds that

\[
E_P\big[\Phi'(S_T^{(0)})\,S_T^{(0)}F_T\big] = E_P\big[\big(\xi(bY_T - aT) + b\langle\eta,G\rangle\big)\,\Phi(S_T^{(0)})\big]. \tag{3.3}
\]
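To make (3.3) concrete, the following sketch checks it by numerical quadrature in the simplest purely Gaussian configuration, assumed here only for illustration: $\xi = 0$, $\sigma = 0$, $\theta = 0$, $Z \equiv 0$ and $\eta = (0, \eta_2)$. Then $F_T = \eta_2 b\tau\sqrt{T}$ is deterministic and (3.3) reduces to the Gaussian integration by parts $E[\Phi'(S_T)S_T\tau\sqrt{T}] = E[G_2\Phi(S_T)]$. The payoff $\Phi(x) = x/(1+x)$ and all numerical values are hypothetical choices.

```python
import math


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


# Assumed illustrative parameters (xi = 0, sigma = 0, theta = 0, Z = 0).
S0, tau, T, c = 1.0, 0.3, 1.0, 0.02
b, eta2 = 2.0, 1.5


def payoff(x):
    """A smooth payoff with bounded derivative, i.e. an element of C_b^1."""
    return x / (1.0 + x)


def payoff_prime(x):
    return 1.0 / (1.0 + x) ** 2


def normal_pdf(g):
    return math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi)


def terminal_price(g):
    """S_T^{(0)} as a function of the value g of G_2."""
    return S0 * math.exp(tau * math.sqrt(T) * g + c * T)


F_T = eta2 * b * tau * math.sqrt(T)  # (3.2) at lambda = 0 in this configuration

# Left-hand side of (3.3): E[Phi'(S) S F_T].
lhs = simpson(lambda g: payoff_prime(terminal_price(g)) * terminal_price(g)
              * F_T * normal_pdf(g), -10.0, 10.0)
# Right-hand side of (3.3): E[(xi(bY_T - aT) + b<eta, G>) Phi(S)] with xi = 0.
rhs = simpson(lambda g: b * eta2 * g * payoff(terminal_price(g)) * normal_pdf(g),
              -10.0, 10.0)
```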


Proof. Note first that $E_P[|\Phi(S_T)|^2] < +\infty$ holds since $\Phi \in C_b^1(\mathbb{R}_+;\mathbb{R})$,

\[
E_P\big[|\Phi(S_T)|^2\big] \le 2E_P\big[|\Phi(S_T)-\Phi(S_0)|^2\big] + 2E_P\big[|\Phi(S_0)|^2\big] \le 2\int_0^1 E_P\big[|\Phi'(\delta S_T + (1-\delta)S_0)|^2\,|S_T - S_0|^2\big]\,d\delta + 2E_P\big[|\Phi(S_0)|^2\big],
\]

and Assumption 3.2. By performing the Girsanov transform (2.3) and applying Lemma 2.1, we get

\[
E_P\big[\Phi(S_T^{(\lambda)})\big] = E_P\Big[\frac{dQ_\lambda}{dP}\Big|_{\mathcal{F}_T}\Phi(S_T^{(0)})\Big]. \tag{3.4}
\]

First, we prove that $(\partial/\partial\lambda)E_P[\Phi(S_T^{(\lambda)})]|_{\lambda=0}$ is equal, after multiplication by $b$, to the left-hand side of (3.3). From the boundedness and the continuity in $\varepsilon$ of $\Phi'(S_T^{(\varepsilon)})$, it immediately follows that $\sup_{|\varepsilon|\le\lambda}E_P[|\Phi'(S_T^{(\varepsilon)})|^2] < +\infty$ and that $\sup_{|\varepsilon|\le\lambda}E_P[|\Phi'(S_T^{(\varepsilon)}) - \Phi'(S_T^{(0)})|^2]$ tends to zero, as $\lambda \downarrow 0$. Hence, for each $|\varepsilon| \le \lambda$, where $\lambda \in (0, 1\wedge((b\vee(b-2\gamma\theta))/\xi))$, by a Taylor expansion

\[
\frac{\Phi(S_T^{(\varepsilon)}) - \Phi(S_T^{(0)})}{\varepsilon}(\omega) = \int_0^1\Big(\Phi'(S_T^{(\delta\varepsilon)})\,S_T^{(\delta\varepsilon)}H_T^{(\delta\varepsilon)}\Big)(\omega)\,d\delta, \qquad \omega \in \Omega, \tag{3.5}
\]

and by the Cauchy–Schwarz inequality, we get

\[
\begin{aligned}
&\Big|E_P\Big[\varepsilon^{-1}\big(\Phi(S_T^{(\varepsilon)}) - \Phi(S_T^{(0)})\big) - \Phi'(S_T^{(0)})S_T^{(0)}F_T/b\Big]\Big| \\
&\quad\le \int_0^1\Big|E_P\Big[\Phi'(S_T^{(\delta\varepsilon)})S_T^{(\delta\varepsilon)}H_T^{(\delta\varepsilon)} - \Phi'(S_T^{(0)})S_T^{(0)}F_T/b\Big]\Big|\,d\delta \\
&\quad\le \int_0^1 E_P\big[|\Phi'(S_T^{(\delta\varepsilon)})|^2\big]^{1/2}E_P\big[|S_T^{(\delta\varepsilon)}H_T^{(\delta\varepsilon)} - S_T^{(0)}F_T/b|^2\big]^{1/2}\,d\delta \\
&\qquad+ \int_0^1 E_P\big[|\Phi'(S_T^{(\delta\varepsilon)}) - \Phi'(S_T^{(0)})|^2\big]^{1/2}E_P\big[|S_T^{(0)}F_T/b|^2\big]^{1/2}\,d\delta.
\end{aligned}
\]

(Note that thanks to the scaling property, the random variables can be defined on a single probability space at each step.) In view of Lemma 3.1, the first claim holds by taking the supremum on both sides over $|\varepsilon| \le \lambda$ and then letting $\lambda \downarrow 0$.

It remains to show that $(\partial/\partial\lambda)E_P[(dQ_\lambda/dP|_{\mathcal{F}_T})\Phi(S_T^{(0)})]|_{\lambda=0}$ is equal, after multiplication by $b$, to the right-hand side of (3.3). This is straightforward by the Cauchy–Schwarz inequality with the result $E_P[|\Phi(S_T)|^2] < +\infty$ and with $\lambda^{-1}\big(e^{\xi\lambda Y_T - \varphi_T(\xi\lambda) + \lambda\langle\eta,G\rangle - \lambda^2\|\eta\|^2/2} - 1\big) \to \xi(Y_T - aT/b) + \langle\eta,G\rangle$ in $L^2(\Omega)$. □

We are now in a position to present the main result of this paper. Our approach is based upon an appropriate modification of the identity (3.4) so as to derive Greeks formulas of interest. Note that the conditions (a)–(d) come mainly from the denominator of the estimators, as investigated in Lemma 3.1. For the reader’s convenience, let us recall the role of the parameters appearing in the conditions: $\sigma$ is the (gamma) volatility, $\xi$ and $\eta_2$ are the dummy variables in the Girsanov transform for the gamma process and for the second Gaussian component, respectively, while $\theta$ is the (gamma) drift. Moreover, recall also that $F_T^{(\lambda)}$ and $H_T^{(\lambda)}$ are random variables defined respectively in (3.2) and (3.1). To


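keep the presentation as concise as possible, we write

\[
L_T^{(\lambda)} := \frac{\xi\big(b\frac{b}{b-\xi\lambda}Y_T - aT + 1\big) + b\langle\eta, G+\eta\lambda\rangle}{F_T^{(\lambda)}} - \xi\,\frac{\xi\frac{\sigma}{4}\sqrt{\frac{b}{b-\xi\lambda}Y_T}\,(G_1+\eta_1\lambda) + \eta_2 b\tau\sqrt{T}}{\big(F_T^{(\lambda)}\big)^2},
\]

and

\[
L_T := L_T^{(0)} = \frac{\xi(bY_T - aT + 1) + b\langle\eta,G\rangle}{F_T} - \xi\,\frac{\xi\frac{\sigma}{4}\sqrt{Y_T}\,G_1 + \eta_2 b\tau\sqrt{T}}{F_T^2},
\]

where

\[
F_T = F_T^{(0)} = \xi\Big(\theta Y_T + \frac{\sigma}{2}\sqrt{Y_T}\,G_1\Big) + \eta_1 b\sigma\sqrt{Y_T} + \eta_2 b\tau\sqrt{T}.
\]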

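Recall that $\gamma$ is the constant in Assumption 3.2, which is strictly greater than, but can be chosen arbitrarily close to, 1.

THEOREM 3.4. Let $\Phi$ be in $C_b^1(\mathbb{R}_+;\mathbb{R})$. Assume that one of the following holds:

(a) $\sigma = 0$, $\xi > 0$, $\eta_2 = 0$, $\theta \ne 0$,
(b) $\sigma = 0$, $\xi > 0$, $\eta_2 > 0$, $\tau > 0$, $\theta \ge 0$,
(c) $\sigma > 0$, $\xi = 0$, $\eta_1 > 0$, $\eta_2 = 0$,
(d) $\sigma > 0$, $\xi = 0$, $\eta_2 > 0$, $\tau > 0$.

(i) Assume that $aT > 2\gamma$ in the condition (a), or $aT > 1$ in (c). Then, it holds that

\[
E_P\big[\Phi'(S_T)S_T\big] = E_P\big[\Phi(S_T^{(0)})L_T\big] =: E_P\big[\Phi(S_T^{(0)})J_1\big].
\]

(ii) It holds that

\[
E_P\big[\Phi'(S_T)S_TY_T\big] = E_P\Big[\Phi(S_T^{(0)})\,Y_T\Big(L_T - \frac{\xi}{F_T}\Big)\Big] =: E_P\big[\Phi(S_T^{(0)})J_2\big].
\]

(iii) It holds that

\[
E_P\big[\Phi'(S_T)S_TW_{Y_T}\big] = E_P\Big[\Phi(S_T^{(0)})\sqrt{Y_T}\Big(G_1\Big(L_T - \frac{\xi}{2F_T}\Big) - \frac{\eta_1 b}{F_T}\Big)\Big] =: E_P\big[\Phi(S_T^{(0)})J_3\big].
\]

(iv) Assume that $aT > 2\gamma$ in the condition (a) or that $aT > 1$ in (c). Then, it holds that

\[
E_P\big[\Phi'(S_T)S_TB_T\big] = E_P\Big[\Phi(S_T^{(0)})\sqrt{T}\Big(G_2L_T - \frac{\eta_2 b}{F_T}\Big)\Big] =: E_P\big[\Phi(S_T^{(0)})J_4\big].
\]

(v) Let $\Phi$ be in $C_b^2(\mathbb{R}_+;\mathbb{R})$, and assume that $aT > 4\gamma$ in the condition (a) or that $aT > 2$ in (c). Then, it holds that

\[
\begin{aligned}
&E_P\big[\Phi''(S_T)S_T^2\big] + E_P\big[\Phi'(S_T)S_T\big] \\
&\quad= E_P\Big[\Phi(S_T^{(0)})\Big(L_T^2 - \frac{\xi^2 bY_T + \|\eta\|^2b^2}{F_T^2} + \frac{\xi^2\sigma\sqrt{Y_T}\,(\xi G_1/2 + b\eta_1)}{4F_T^3} \\
&\qquad\qquad+ \Big(2L_T - \frac{\xi(bY_T - aT + 1) + b\langle\eta,G\rangle}{F_T}\Big)\Big(L_T - \frac{\xi(bY_T - aT) + b\langle\eta,G\rangle}{F_T}\Big)\Big)\Big] \\
&\quad=: E_P\big[\Phi(S_T^{(0)})J_5\big].
\end{aligned}
\]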
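As an illustration of formula (i), consider condition set (a) in its simplest form, assumed here for the sketch: $\xi = 1$, $\eta = 0$, $\tau = 0$ and $Z \equiv 0$, so that $F_T = \theta Y_T$ and $J_1 = L_T = (bY_T - aT + 1)/(\theta Y_T)$. For a call payoff $\Phi(x) = (x-K)^+$ (not in $C_b^1$, but covered by the extension to piecewise linear-growth payoffs stated later in the paper), the sketch below compares the pathwise quantity $E_P[\Phi'(S_T)S_T]$ with the weighted expectation $E_P[\Phi(S_T)J_1]$ by quadrature against the gamma density. All numerical values and helper names are hypothetical.

```python
import math


def gamma_pdf(y, shape, rate):
    """Gamma density (2.2) with shape = a*T and rate = b."""
    if y <= 0.0:
        return 0.0
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(y)
                    - rate * y - math.lgamma(shape))


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


# Assumed illustrative parameters for condition set (a): sigma = 0, theta != 0.
a, b, T = 5.0, 5.0, 1.0          # aT = 5 > 2*gamma for gamma close to 1
theta, drift = 0.3, 0.0          # drift stands for c(theta, 0, 0), a hypothetical value
S0, K = 100.0, 105.0


def price_at(y):
    """S_T as a function of the value y of Y_T (tau = 0, Z = 0)."""
    return S0 * math.exp(theta * y + drift * T)


def weight_j1(y):
    """J_1 = L_T under (a) with xi = 1 and eta = 0, where F_T = theta * Y_T."""
    return (b * y - a * T + 1.0) / (theta * y)


# The call is in the money exactly when Y_T exceeds y_star.
y_star = (math.log(K / S0) - drift * T) / theta
upper = 20.0  # truncation point; the integrands decay like exp(-(b - theta) y)

# E[Phi'(S_T) S_T] = E[1{S_T > K} S_T]; dividing by S0 gives the Delta.
pathwise = simpson(lambda y: price_at(y) * gamma_pdf(y, a * T, b), y_star, upper)
# E[Phi(S_T) J_1], the right-hand side of Theorem 3.4 (i).
weighted = simpson(lambda y: (price_at(y) - K) * weight_j1(y) * gamma_pdf(y, a * T, b),
                   y_star, upper)
delta = weighted / S0
```

The weighted form needs only payoff values, not payoff derivatives, which is what makes it usable for discontinuous payoffs where the pathwise expression is unavailable.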


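Proof. Recall first that we have shown $E_P[|\Phi(S_T)|^2] < +\infty$ in the proof of Proposition 3.3.

(i) Instead of (3.4), we start with the identity

\[
E_P\Big[\frac{\Phi(S_T^{(\lambda)})}{F_T^{(\lambda)}}\Big] = E_P\Big[\frac{dQ_\lambda}{dP}\Big|_{\mathcal{F}_T}\frac{\Phi(S_T^{(0)})}{F_T}\Big],
\]

and a little algebra shows that differentiating both sides at $\lambda = 0$ yields the formula. Concerning the right-hand side, its existence, with $\lambda$ being arbitrarily close to the origin, can be shown by the Cauchy–Schwarz inequality with Lemma 3.1 (i), while the rest is as straightforward as in Proposition 3.3. The interchange of the derivative and the expectation on the left-hand side can be justified by the dominated convergence theorem, in a similar manner to the proof of Proposition 3.3, as follows. Fix $\lambda \in (0, 1\wedge((b\vee(b-2\gamma\theta))/\xi))$. For each $|\varepsilon| \le \lambda$, in view of the Taylor expansion (3.5), we get

\[
\begin{aligned}
E_P\big[|\Phi(S_T^{(\varepsilon)})|^2\big] &\le 2E_P\big[|\Phi(S_T^{(\varepsilon)}) - \Phi(S_T^{(0)})|^2\big] + 2E_P\big[|\Phi(S_T^{(0)})|^2\big] \\
&\le 2\varepsilon^2\int_0^1 E_P\big[|\Phi'(S_T^{(\delta\varepsilon)})S_T^{(\delta\varepsilon)}H_T^{(\delta\varepsilon)}|^2\big]\,d\delta + 2E_P\big[|\Phi(S_T)|^2\big] \\
&\le 2\varepsilon^2\sup_{x\in\mathbb{R}}|\Phi'(x)|^2\sup_{|\delta|\le\lambda}E_P\big[|S_T^{(\delta)}H_T^{(\delta)}|^2\big] + 2E_P\big[|\Phi(S_T)|^2\big] < +\infty,
\end{aligned}
\]

where the Cauchy–Schwarz inequality and the Fubini theorem (for a nonnegative integrand) are used for the second inequality, and Lemma 3.1 for the last finiteness. Note that this also implies that $\sup_{|\varepsilon|\le\lambda}E_P[|\Phi(S_T^{(\varepsilon)}) - \Phi(S_T^{(0)})|^2]$ converges to zero, as $\lambda \downarrow 0$. As seen in the proof of Proposition 3.3, it holds that $\sup_{|\varepsilon|\le\lambda}E_P[|\Phi'(S_T^{(\varepsilon)}) - \Phi'(S_T^{(0)})|^2]$ goes to zero, as $\lambda \downarrow 0$. For each $|\varepsilon| \le \lambda$, by a Taylor expansion

\[
\frac{1}{\varepsilon}\Big(\frac{\Phi(S_T^{(\varepsilon)})}{F_T^{(\varepsilon)}} - \frac{\Phi(S_T^{(0)})}{F_T}\Big)(\omega) = \int_0^1\Big(\frac{\Phi'(S_T^{(\delta\varepsilon)})S_T^{(\delta\varepsilon)}H_T^{(\delta\varepsilon)}F_T^{(\delta\varepsilon)} - \Phi(S_T^{(\delta\varepsilon)})K_T^{(\delta\varepsilon)}}{\big(F_T^{(\delta\varepsilon)}\big)^2}\Big)(\omega)\,d\delta, \qquad \omega \in \Omega,
\]

and by the Cauchy–Schwarz inequality, we get

\[
\begin{aligned}
&\Big|E_P\Big[\frac{1}{\varepsilon}\Big(\frac{\Phi(S_T^{(\varepsilon)})}{F_T^{(\varepsilon)}} - \frac{\Phi(S_T^{(0)})}{F_T}\Big) - \frac{\Phi'(S_T^{(0)})S_T^{(0)}H_TF_T - \Phi(S_T^{(0)})K_T}{F_T^2}\Big]\Big| \\
&\quad\le \int_0^1 E_P\big[|\Phi'(S_T^{(\delta\varepsilon)})|^2\big]^{1/2}E_P\Big[\Big|\frac{S_T^{(\delta\varepsilon)}H_T^{(\delta\varepsilon)}}{F_T^{(\delta\varepsilon)}} - \frac{S_T^{(0)}H_T}{F_T}\Big|^2\Big]^{1/2}d\delta \\
&\qquad+ \int_0^1 E_P\big[|\Phi'(S_T^{(\delta\varepsilon)}) - \Phi'(S_T^{(0)})|^2\big]^{1/2}E_P\Big[\Big|\frac{S_T^{(0)}H_T}{F_T}\Big|^2\Big]^{1/2}d\delta \\
&\qquad+ \int_0^1 E_P\big[|\Phi(S_T^{(\delta\varepsilon)})|^2\big]^{1/2}E_P\Big[\Big|\frac{K_T^{(\delta\varepsilon)}}{\big(F_T^{(\delta\varepsilon)}\big)^2} - \frac{K_T}{F_T^2}\Big|^2\Big]^{1/2}d\delta \\
&\qquad+ \int_0^1 E_P\big[|\Phi(S_T^{(\delta\varepsilon)}) - \Phi(S_T^{(0)})|^2\big]^{1/2}E_P\big[|K_T/F_T^2|^2\big]^{1/2}d\delta,
\end{aligned}
\]

whose supremum taken over $|\varepsilon| \le \lambda$ converges, as $\lambda \downarrow 0$, to zero by Lemma 3.1.

For (ii), (iii), and (iv), differentiating both sides at $\lambda = 0$ of the following identities

\[
\begin{aligned}
E_P\Big[\frac{\Phi(S_T^{(\lambda)})}{F_T^{(\lambda)}}\,\frac{b}{b-\xi\lambda}Y_T\Big] &= E_P\Big[\frac{dQ_\lambda}{dP}\Big|_{\mathcal{F}_T}\frac{\Phi(S_T^{(0)})}{F_T}\,Y_T\Big], \\
E_P\Big[\frac{\Phi(S_T^{(\lambda)})}{F_T^{(\lambda)}}\sqrt{\frac{b}{b-\xi\lambda}Y_T}\,(G_1+\eta_1\lambda)\Big] &= E_P\Big[\frac{dQ_\lambda}{dP}\Big|_{\mathcal{F}_T}\frac{\Phi(S_T^{(0)})}{F_T}\sqrt{Y_T}\,G_1\Big], \\
E_P\Big[\frac{\Phi(S_T^{(\lambda)})}{F_T^{(\lambda)}}\sqrt{T}\,(G_2+\eta_2\lambda)\Big] &= E_P\Big[\frac{dQ_\lambda}{dP}\Big|_{\mathcal{F}_T}\frac{\Phi(S_T^{(0)})}{F_T}\sqrt{T}\,G_2\Big],
\end{aligned}
\]

yields each expression. The existence of the corresponding expectations can also be proved in a similar manner to the proof of (i).

Finally, for (v), define $\Psi(x) = \Phi'(x)x$, and by the result of (i), we have

\[
E_P\big[\Phi''(S_T)S_T^2\big] = E_P\big[\big(\Psi'(S_T) - \Phi'(S_T)\big)S_T\big] = E_P\big[\Psi(S_T^{(0)})L_T\big] - E_P\big[\Phi'(S_T)S_T\big].
\]

Hence, we obtain the desired equation by differentiating at $\lambda = 0$ the identity

\[
E_P\Big[\Phi(S_T^{(\lambda)})\frac{L_T^{(\lambda)}}{F_T^{(\lambda)}}\Big] = E_P\Big[\frac{dQ_\lambda}{dP}\Big|_{\mathcal{F}_T}\Phi(S_T^{(0)})\frac{L_T}{F_T}\Big].
\]

The dominated convergence and the existence of the expectation under the given conditions can be justified in a similar manner to the previous cases. □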

REMARK 3.5. Let us here provide some intuition on the condition sets (a)–(d).

(a) “$\sigma = 0$, $\xi > 0$, $\eta_2 = 0$, $\theta \ne 0$”: The exponent of $S_T$ reduces to $\theta Y_T + Z_T$. We may still form the variance gamma process by setting $\{Z_t : t \in [0,T]\}$ to be a suitable gamma process, as described in Remark 2.3. Our derivation here seems new in the sense that this model can be of a pure-jump type and we do not rely on a nondegeneracy assumption on the diffusion component, as is done, for instance, in Cass and Friz (2007). (For the advantages of considering pure-jump processes in financial modeling, we refer to the introduction of Elliott and Osakwe 2006.)

(b) “$\sigma = 0$, $\xi > 0$, $\eta_2 > 0$, $\tau > 0$, $\theta \ge 0$”: If $\theta = 0$, then our model reduces to the Black–Scholes one. (See also Remark 3.9.) Meanwhile, the condition $\theta > 0$ is again not very restrictive, since we may still form the variance gamma process by a suitable setting of $\{Z_t : t \in [0,T]\}$. This model fairly generalizes the case (a) to a superposition of a variance gamma process and an independent Brownian motion.

(c) “$\sigma > 0$, $\xi = 0$, $\eta_1 > 0$, $\eta_2 = 0$”: This case corresponds to a superposition of a symmetric variance gamma process $\{W_{Y_t} : t \in [0,T]\}$ and an independent process $\{Z_t : t \in [0,T]\}$. It is remarkable that, as discussed for the case (a) above, this model can also be of a pure-jump type. This setting is studied in Kawai and Kohatsu-Higa (2010) without specifying the structure of the time-changing process $\{Y_t : t \in [0,T]\}$.

(d) “$\sigma > 0$, $\xi = 0$, $\eta_2 > 0$, $\tau > 0$”: This is a generalization of the case (c), further adding an independent Brownian motion. As in the case (b), we may control the variance by the two parameters $\eta_1$ and $\eta_2$.


Practical sensitivity formulas are presented in the following corollary. Recall again that $\gamma$ is the constant fixed in Assumption 3.2. Define two classes of functions:

\[
C_L(\mathbb{R}_+;\mathbb{R}) := \big\{f \in C(\mathbb{R}_+;\mathbb{R}) : |f(x)| \le C(1+|x|) \text{ for some } C > 0\big\},
\]

and

\[
\mathcal{J}(\mathbb{R}_+;\mathbb{R}) := \Big\{f : \mathbb{R}_+ \to \mathbb{R} : f = \sum_{k=1}^n c_kf_k\mathbf{1}_{A_k},\ n \in \mathbb{N},\ c_k \in \mathbb{R},\ f_k \in C_L(\mathbb{R}_+;\mathbb{R}),\ A_k \text{ intervals of } \mathbb{R}_+\Big\}.
\]

COROLLARY 3.6 (Sensitivity). Let $\Phi \in \mathcal{J}(\mathbb{R}_+;\mathbb{R})$. Assume that the marginal law of $Z_T$ is absolutely continuous with respect to the Lebesgue measure. Assume that one of the following conditions holds:

(a) $\sigma = 0$, $\xi > 0$, $\eta_2 = 0$, $\theta \ne 0$, $aT > 2\gamma$,
(b) $\sigma = 0$, $\xi > 0$, $\eta_2 > 0$, $\tau > 0$, $\theta \ge 0$,
(c) $\sigma > 0$, $\xi = 0$, $\eta_1 > 0$, $\eta_2 = 0$, $aT > 1$,
(d) $\sigma > 0$, $\xi = 0$, $\eta_2 > 0$, $\tau > 0$.

Let $J_1$, $J_2$, $J_3$, $J_4$, and $J_5$ be the random variables defined in the formulas of Theorem 3.4. Then, we have the following.

(i) [Sensitivity with respect to $S_0$]

\[
\frac{\partial}{\partial S_0}E_P[\Phi(S_T)] = \frac{1}{S_0}E_P\big[\Phi(S_T^{(0)})J_1\big].
\]

(ii) [Sensitivity with respect to $\theta$]

\[
\frac{\partial}{\partial\theta}E_P[\Phi(S_T)] = E_P\big[\Phi(S_T^{(0)})J_2\big] + \partial_1c(\theta,\sigma,\tau)\,TS_0\frac{\partial}{\partial S_0}E_P[\Phi(S_T)].
\]

(iii) [Sensitivity with respect to $\sigma$]

\[
\frac{\partial}{\partial\sigma}E_P[\Phi(S_T)] = E_P\big[\Phi(S_T^{(0)})J_3\big] + \partial_2c(\theta,\sigma,\tau)\,TS_0\frac{\partial}{\partial S_0}E_P[\Phi(S_T)].
\]

(iv) [Sensitivity with respect to $\tau$]

\[
\frac{\partial}{\partial\tau}E_P[\Phi(S_T)] = E_P\big[\Phi(S_T^{(0)})J_4\big] + \partial_3c(\theta,\sigma,\tau)\,TS_0\frac{\partial}{\partial S_0}E_P[\Phi(S_T)].
\]

(v) [Second derivative with respect to $S_0$] Assume that $aT > 4\gamma$ in the condition (a) or that $aT > 2$ in (c). Then, we have

\[
\frac{\partial^2}{\partial S_0^2}E_P[\Phi(S_T)] = \frac{1}{S_0^2}E_P\big[\Phi(S_T^{(0)})J_5\big] - \frac{1}{S_0}\frac{\partial}{\partial S_0}E_P[\Phi(S_T)].
\]

Proof. For $\Phi \in C_b^2(\mathbb{R}_+;\mathbb{R})$, all the above formulas are direct consequences of Theorem 3.4. Let us now remove the regularity conditions on payoff functions imposed in Theorem 3.4 and extend to the class $\mathcal{J}(\mathbb{R}_+;\mathbb{R})$. To this end, we proceed in four steps. In


summary, to approximate the function $\Phi \in E_1$ by $\{\Phi_n\}_{n\in\mathbb{N}} \subset E_2$, where $E_2 \subset E_1$, it suffices to show that for each compact set $H \subset \mathbb{R}_+$,

\[
\sup_{S_0\in H}\big|E_P[\Phi_n(S_T)] - E_P[\Phi(S_T)]\big| \to 0, \tag{3.6}
\]

and

\[
\sup_{S_0\in H}\Big|\frac{\partial}{\partial S_0}E_P[\Phi_n(S_T)] - \frac{1}{S_0}E_P\big[\Phi(S_T^{(0)})J_1\big]\Big|^2 \to 0, \tag{3.7}
\]

both as $n \uparrow \infty$.

Step 1: First, we proceed from $C_b^2(\mathbb{R}_+;\mathbb{R})$ to $C_K(\mathbb{R}_+;\mathbb{R})$. Clearly, $\Phi \in C_K(\mathbb{R}_+;\mathbb{R})$ can be approximated uniformly and boundedly by a sequence $\{\Phi_n\}_{n\in\mathbb{N}}$ in $C_K^\infty(\mathbb{R}_+;\mathbb{R})$. Hence,

\[
\big|E_P[\Phi_n(S_T)] - E_P[\Phi(S_T)]\big| \le \sup_{x\in\mathbb{R}_+}|\Phi_n(x) - \Phi(x)|, \tag{3.8}
\]

which tends to zero as $n \uparrow +\infty$. On the other hand, by the Cauchy–Schwarz inequality, it holds that

\[
\sup_{S_0\in H}\Big|\frac{\partial}{\partial S_0}E_P[\Phi_n(S_T)] - \frac{1}{S_0}E_P\big[\Phi(S_T^{(0)})J_1\big]\Big|^2 \le E_P\big[J_1^2\big]\sup_{S_0\in H}S_0^{-2}\sup_{S_0\in H}E_P\big[|\Phi_n(S_T) - \Phi(S_T)|^2\big], \tag{3.9}
\]

which tends to zero as $n \uparrow +\infty$, again by (3.8). Therefore, both criteria (3.6) and (3.7) hold.

Step 2: We further extend to the class $\Phi \in C_b(\mathbb{R}_+;\mathbb{R})$ of bounded continuous functions. To this end, for a fixed $\varepsilon \in (0,1)$, we can always find a sequence $\{\Phi_n\}_{n\in\mathbb{N}}$ of continuous functions such that

\[
\Phi_n(x) = \begin{cases}\Phi(x), & \text{if } x \in (0, n-\varepsilon], \\ 0, & \text{if } x \in [n+\varepsilon, +\infty),\end{cases}
\]

and for each $x \in (n-\varepsilon, n+\varepsilon)$, $\Phi_n(x) \in [0, \Phi(x)]$, where an improper interval such as $[0,-1]$ should be understood as $[-1,0]$. Clearly, for each $n \in \mathbb{N}$, $\Phi_n \in C_K(\mathbb{R}_+;\mathbb{R})$, and $\sup_{n\in\mathbb{N}}\|\Phi_n\|_\infty = \|\Phi\|_\infty$. First, from the boundedness, (3.6) immediately follows. Next, observe that

\[
E_P\big[|\Phi_n(S_T) - \Phi(S_T)|^2\big] \le E_P\big[|\Phi(S_T)|^2\mathbf{1}(|S_T| > n-\varepsilon)\big] \le \|\Phi\|_\infty^2P(|S_T| > n-\varepsilon) \le \frac{\|\Phi\|_\infty^2E_P\big[S_T^{\gamma}\big]}{(n-\varepsilon)^{\gamma}},
\]

which tends to zero as $n \uparrow +\infty$, where the last inequality holds by the Chebyshev inequality with the constant $\gamma > 1$. Hence, using the inequality (3.9), we conclude that both criteria (3.6) and (3.7) hold.

Step 3: We next consider the class of finite linear combinations of indicator functions on an interval of $\mathbb{R}_+$. To this end, it is sufficient to study the case $\Phi = \mathbf{1}_{[\alpha,+\infty)}$ or $\Phi = \mathbf{1}_{(\alpha,+\infty)}$ for some positive $\alpha$. For convenience, let us write $B_{x,n} := (x - 1/n, x + 1/n)$ for $x \ge 1/n$. Then, we can always find a sequence $\{\Phi_n\}_{n\in\mathbb{N}}$ of continuous functions such that

\[
\Phi_n(x) = \begin{cases}\Phi(x), & \text{if } x \in [\alpha + C/n, +\infty), \\ 0, & \text{if } x \in (0, \alpha - C/n],\end{cases}
\]

where $C$ is a positive constant satisfying $C < \alpha n$, and $\Phi_n(x) \in [0, \Phi(x)]$ for $x \in B_{\alpha,n}$. Clearly, for each $n \in \mathbb{N}$, $\Phi_n \in C_b(\mathbb{R}_+;\mathbb{R})$, and $\sup_{n\in\mathbb{N}}\|\Phi_n\|_\infty \le 1$. Moreover, due to the boundedness, the first criterion (3.6) immediately holds. For (a), the density of $Y_T$ is bounded due to $aT > 1$. Hence, we get

\[
P(S_T \in B_{\alpha,n}) = E_P\Big[P\Big(\theta Y_T + \tau\sqrt{T}\,G_2 + Z_T + c(\theta,0,\tau)T \in \ln(B_{\alpha,n}/S_0)\,\Big|\,G_2, Z_T\Big)\Big] \le \frac{b(aT-1)^{aT-1}}{\theta\,\Gamma(aT)}\,e^{-(aT-1)}\ln\frac{\alpha + 1/n}{\alpha - 1/n}.
\]

By taking a similar approach to (b)–(d), also with the help of the boundedness of the Gaussian density, we can prove that there exists a constant $C$ independent of $\alpha$, $n$, and $S_0$, such that

\[
P(S_T \in B_{\alpha,n}) \le C\ln\frac{\alpha + 1/n}{\alpha - 1/n},
\]

which tends to zero as $n \uparrow +\infty$. Therefore, we obtain

\[
E_P\big[|\Phi_n(S_T) - \Phi(S_T)|^2\big] = E_P\big[|\Phi_n(S_T) - \Phi(S_T)|^2;\ S_T \in B_{\alpha,n}\big] \le 4P(S_T \in B_{\alpha,n}),
\]

which shows the criterion (3.7), again using the inequality (3.9).

Step 4: Finally, we extend to the class of functions $\Phi = \Psi\times\mathbf{1}_A$, where $\Psi \in C_L(\mathbb{R}_+;\mathbb{R})$ and $A$ is an interval of $\mathbb{R}_+$. For a sequence $\{\Psi_n\}_{n\in\mathbb{N}}$ in $C_b(\mathbb{R}_+;\mathbb{R})$ such that

\[
\Psi_n(x) := \begin{cases}\Psi(x), & \text{if } |\Psi(x)| \le n, \\ n, & \text{if } \Psi(x) > n, \\ -n, & \text{if } \Psi(x) < -n,\end{cases}
\]

we define $\Phi_n := \Psi_n\times\mathbf{1}_A$. Clearly, $\Phi_n \to \Phi$ pointwise. Then, we obtain

\[
E_P\big[|\Phi_n(S_T) - \Phi(S_T)|^2\big] \le E_P\big[|\Psi_n(S_T) - \Psi(S_T)|^2;\ |\Psi(S_T)| > n\big] \le 4E_P\big[|\Psi(S_T)|^2;\ |\Psi(S_T)| > n\big] \le 4n^{2-2\gamma}E_P\big[|\Psi(S_T)|^{2\gamma}\big] \le Cn^{2-2\gamma}E_P\big[(1+S_T)^{2\gamma}\big],
\]

which tends to zero as $n \uparrow +\infty$, again with $\gamma > 1$. This shows both (3.6) and (3.7) with the help of the Cauchy–Schwarz inequality and the inequality (3.9), respectively. It is now immediate to extend to the class $\mathcal{J}(\mathbb{R}_+;\mathbb{R})$. The regularity conditions in the formulas (ii)–(v) can be relaxed in a similar manner. □

In the formulas above, the choice of the dummy parameters $(\xi, \eta_1, \eta_2)$ is of great importance in practice. A natural criterion is to choose those parameters so as to make the Monte Carlo variance as small as possible. We defer the discussion to Proposition 3.10 and Remark 3.11.


REMARK 3.7. The class $J(\mathbb{R}_+;\mathbb{R})$ is not as general as that of measurable functions $\Phi$ such that $E_P[|\Phi(S_T)|^2] < +\infty$. This is so because for any $\Psi \in C_L(\mathbb{R}_+;\mathbb{R})$, it holds $E_P[|\Psi(S_T)|^2] < +\infty$, due to the linear growth condition and Assumption 3.2. Meanwhile, the class $J(\mathbb{R}_+;\mathbb{R})$ is rich enough and is likely to include all the European payoffs of interest, either continuous or discontinuous. Moreover, this class is broader than the one suggested in Cass and Friz (2007) in the sense that this class can directly deal with functions without compact support.

REMARK 3.8. The formulas under the condition set (a) can also be derived via the direct use of the density function of the gamma distribution. To simplify the notation, write $S_y(\omega) := S_0 \exp[\theta y + \tau\sqrt{T}\,U_2(\omega) + Z_T(\omega) + c(\theta, 0, \tau)T]$. Concerning the Delta formula, by conditioning on $U_2$ and $Z_T$, and by performing the (ordinary) integration-by-parts formula, we get
\[ \begin{aligned} E_P\big[\Phi'(S_T) S_T\big] &= \int_\Omega \int_0^{+\infty} \Phi'(S_y(\omega))\, S_y(\omega)\, f_T^P(y)\, dy\, P(d\omega) \\ &= \int_\Omega \int_0^{+\infty} \partial_y \Phi(S_y(\omega))\, \theta^{-1} f_T^P(y)\, dy\, P(d\omega) \\ &= \int_\Omega \Big[ \Phi(S_y(\omega))\, \theta^{-1} f_T^P(y) \Big]_0^{+\infty} P(d\omega) - \int_\Omega \int_0^{+\infty} \Phi(S_y(\omega))\, \frac{aT - 1 - by}{\theta y}\, f_T^P(y)\, dy\, P(d\omega). \end{aligned} \]

By $E_P[|\Phi(S_T)|^2] < +\infty$ and $aT > 2\gamma$, we arrive at the desired expression (i). The formulas (ii), (iii), and (iv) can be derived in a similar manner.

REMARK 3.9. The formulas for the cases (c) and (d) can be derived via the Malliavin calculus on a Wiener space. (See Nualart 2006 for details of the Malliavin calculus on a Wiener space.) Let $(\Omega, \mathcal{F}, P)$ be the product space of the Wiener spaces $(\Omega_k, \mathcal{F}_k, P_k)$ ($k = 1, 2$). For $\omega := (\omega_1, \omega_2) \in \Omega$, we shall regard the standard normal random vector $U := (U_1, U_2)$ as $U(\omega) := (U_1(\omega_1), U_2(\omega_2))$. Denote the Malliavin-Shigekawa $U_k$-directional derivative by $D^{(k)} := \{D_s^{(k)} : s \in [0, 1]\}$, and its adjoint operator by $\delta^{(k)}$. Note that the standard normal random variable $U_k(\omega_k)$ can be regarded as the current state at time 1 of a standard Brownian motion over each Wiener space $(\Omega_k, \mathcal{F}_k, P_k)$. Then, the chain rule yields
\[ D_s^{(k)} \Phi(S_T^{(0)}) = \Phi'(S_T^{(0)})\, D_s^{(k)} S_T^{(0)} = \begin{cases} \Phi'(S_T^{(0)})\, S_T^{(0)}\, \sigma \sqrt{Y_T}\, 1(s \le 1), & \text{if } k = 1, \\ \Phi'(S_T^{(0)})\, S_T^{(0)}\, \tau \sqrt{T}\, 1(s \le 1), & \text{if } k = 2. \end{cases} \]
Let $\eta_1 > 0$, $\eta_2 > 0$, and let $\mathcal{G}_k$ be the union of $\mathcal{F}_k$ and the $\sigma$-field generated by $Y_T$ and $Z_T$. Taking the integration, we get
\[ \Phi'(S_T^{(0)})\, S_T^{(0)} = \frac{1}{\eta_1 \sigma \sqrt{Y_T} + \eta_2 \tau \sqrt{T}} \left( \eta_1 \int_0^1 D_s^{(1)} \Phi(S_T^{(0)})\, ds + \eta_2 \int_0^1 D_s^{(2)} \Phi(S_T^{(0)})\, ds \right). \]


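Hence, the integration by parts formula over each Wiener space yields that
\[ \begin{aligned} E_P\big[\Phi'(S_T) S_T\big] &= E_P\left[ \frac{ \eta_1 E_P\big[ \int_0^1 D_s^{(1)} \Phi(S_T^{(0)})\, ds \,\big|\, \mathcal{G}_2 \big] + \eta_2 E_P\big[ \int_0^1 D_s^{(2)} \Phi(S_T^{(0)})\, ds \,\big|\, \mathcal{G}_1 \big] }{ \eta_1 \sigma \sqrt{Y_T} + \eta_2 \tau \sqrt{T} } \right] \\ &= E_P\left[ \frac{ \eta_1 E_P\big[ \Phi(S_T^{(0)})\, \delta^{(1)}(1) \,\big|\, \mathcal{G}_2 \big] + \eta_2 E_P\big[ \Phi(S_T^{(0)})\, \delta^{(2)}(1) \,\big|\, \mathcal{G}_1 \big] }{ \eta_1 \sigma \sqrt{Y_T} + \eta_2 \tau \sqrt{T} } \right] \\ &= E_P\left[ \Phi(S_T^{(0)})\, \frac{ \langle \eta, U \rangle }{ \eta_1 \sigma \sqrt{Y_T} + \eta_2 \tau \sqrt{T} } \right], \end{aligned} \]
where $U = (U_1, U_2)$ is the standard normal random vector of this remark and $\langle \eta, U \rangle = \eta_1 U_1 + \eta_2 U_2$.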
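The identity at the end of this derivation can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes $Z_T \equiv 0$ and $c = 0$ (the identity does not depend on $c$), a call payoff $\Phi(x) = \max(x - K, 0)$, and illustrative parameter values that are not taken from the paper. The function name `wiener_weight_check` is hypothetical. It compares the pathwise quantity $\Phi'(S_T) S_T$ with the weighted payoff on the same samples and reports the standard error of the paired difference.

```python
import math
import random


def wiener_weight_check(n_samples=200_000, seed=1):
    """Compare E[Phi'(S_T) S_T] with E[Phi(S_T) <eta, U> / (eta1*sigma*sqrt(Y_T) + eta2*tau*sqrt(T))].

    Returns (pathwise_mean, weighted_mean, stderr_of_paired_difference).
    """
    rng = random.Random(seed)
    # Illustrative parameters (assumptions, not the paper's experiment).
    s0, strike, horizon = 100.0, 100.0, 1.0
    a, b = 6.0, 6.0                      # Y_T ~ Gamma(shape a*T, rate b)
    theta, sigma, tau = 0.1, 0.2, 0.1
    eta1, eta2 = 1.0, 1.0                # both dummy parameters positive

    sum_path = sum_weighted = sum_diff_sq = 0.0
    for _ in range(n_samples):
        y = rng.gammavariate(a * horizon, 1.0 / b)   # second argument is the scale 1/b
        u1 = rng.gauss(0.0, 1.0)
        u2 = rng.gauss(0.0, 1.0)
        # Assumption: Z_T = 0 and c = 0.
        s_t = s0 * math.exp(theta * y + sigma * math.sqrt(y) * u1
                            + tau * math.sqrt(horizon) * u2)
        # For the call payoff, Phi'(x) * x equals x on {x > K} and 0 otherwise.
        pathwise = s_t if s_t > strike else 0.0
        weight = (eta1 * u1 + eta2 * u2) / (
            eta1 * sigma * math.sqrt(y) + eta2 * tau * math.sqrt(horizon))
        weighted = max(s_t - strike, 0.0) * weight
        sum_path += pathwise
        sum_weighted += weighted
        sum_diff_sq += (pathwise - weighted) ** 2

    mean_path = sum_path / n_samples
    mean_weighted = sum_weighted / n_samples
    mean_diff = mean_path - mean_weighted
    var_diff = sum_diff_sq / n_samples - mean_diff ** 2
    return mean_path, mean_weighted, math.sqrt(var_diff / n_samples)
```

Calling `wiener_weight_check()` returns two sample means that should differ by no more than a few standard errors. The weighted estimator does not need $\Phi'$, which is what makes it usable for discontinuous payoffs.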

The formulas (ii), (iii), and (iv) can be derived in a similar manner.

We next investigate when our formulas have a finite variance, with a view towards simulation. Recall that $J_1$, $J_2$, $J_3$, $J_4$, and $J_5$ are the random variables defined in the formulas of Theorem 3.4, and that $\gamma$ is the constant fixed in Assumption 3.2.

PROPOSITION 3.10 (Variance of the estimators). In the setting of Corollary 3.6, assume that there exist conjugate exponents $(q, r)$ such that $E_P[|\Phi(S_T)|^{2q}] < +\infty$.
(i) Assume that $aT > 2(r \vee \gamma)$ in the condition (a) or that $aT > r$ in (c). Then, $\mathrm{Var}_P(\Phi(S_T^{(0)}) J_1) < +\infty$.
(ii) $\mathrm{Var}_P(\Phi(S_T^{(0)}) J_2) < +\infty$.
(iii) Assume that $aT > r$ in the condition (a). Then, $\mathrm{Var}_P(\Phi(S_T^{(0)}) J_3) < +\infty$.
(iv) Assume that $aT > 2(r \vee \gamma)$ in the condition (a) or that $aT > r$ in (c). Then, $\mathrm{Var}_P(\Phi(S_T^{(0)}) J_4) < +\infty$.
(v) Assume that $aT > 4(r \vee \gamma)$ in the condition (a) or that $aT > 2r$ in (c). Then, $\mathrm{Var}_P(\Phi(S_T^{(0)}) J_5) < +\infty$.

Proof. See Appendix.

REMARK 3.11. As previously mentioned, for example, in the formulas of the case (d), two dummy parameters $\eta_1$ and $\eta_2$ are non-zero. Let $\beta := \eta_2/\eta_1 \in [0, +\infty)$ and denote the variance associated with $J_1$ by

\[ V(\beta) := \mathrm{Var}_P\left( \Phi(S_T^{(0)})\, \frac{U_1 + \beta U_2}{\sigma \sqrt{Y_T} + \beta \tau \sqrt{T}} \right), \]
where $(U_1, U_2)$ is the standard normal random vector of Remark 3.9. It is not clear how to find the global minimum $\mathrm{argmin}_{\beta \in [0, +\infty)} V(\beta)$ since the function $V$ is in general not convex with respect to $\beta$. Unfortunately, this is also the case for the formulas in the case (b). Nevertheless, it would still be an interesting problem to search for local minimizers with a view towards simulation. This will be investigated in a subsequent paper.

Finally, we provide some numerical examples to illustrate the performance of the derived Greeks formulas. We only consider the case (a) since the case (b) is essentially the same as (a), while the cases (c) and (d) are based upon the Malliavin calculus on the Wiener space, as discussed in Remark 3.9. (Numerical experiments for the case (c) will be presented in Kawai and Kohatsu-Higa 2010.) We fix $S_0 = 100$, $K = 110$, $T = 1$, $a = b = 6$, $Z_t \equiv 0$ for $t \in [0, T]$, $\theta = 0.1$, $\sigma = 0$, $\tau = 0$, $(\xi, \eta_1, \eta_2) = (1, 0, 0)$, and $c(\theta, \sigma, \tau) = a \ln(1 - \theta/b)$, which makes the model risk-neutral. We consider a digital payoff $\Phi(S_T) := 1(e^{rT} S_T > K)$, with $r = 0.05$. Figure 3.1 presents the results of $\partial/\partial S_0$, $\partial/\partial\theta$, and $\partial^2/\partial S_0^2$ of $e^{-rT} E_P[\Phi(S_T)]$. For clear comparison, the results of the finite difference are also provided along with the Monte Carlo convergence of our Greeks formulas. The figures and the variance ratios evidently indicate a faster convergence of our Greeks formulas. Note also that the finite difference approximation is a biased estimation method.

FIGURE 3.1. [Three panels, each showing the MC and FD estimates on a horizontal axis running from 0 to 2,000,000: $\frac{\partial}{\partial S_0} e^{-rT} E_P[\Phi(S_T)]$ (vratio = 36, $\varepsilon$ = 5e-4); $\frac{\partial}{\partial \theta} e^{-rT} E_P[\Phi(S_T)]$ (vratio = 42, $\varepsilon$ = 5e-4); $\frac{\partial^2}{\partial S_0^2} e^{-rT} E_P[\Phi(S_T)]$ (vratio = 105, $\varepsilon$ = 1e-2).] "vratio" indicates the variance ratio Var(Finite Difference)/Var(Malliavin Calculus). The quantity $\varepsilon$ is the increment in the central-difference estimation $\frac{X((1+\varepsilon)\theta) - X((1-\varepsilon)\theta)}{2\varepsilon\theta}$ or $\frac{X((1+\varepsilon)\theta) - 2X(\theta) + X((1-\varepsilon)\theta)}{(\varepsilon\theta)^2}$.
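The Delta experiment in this setting can be sketched in a few lines. The following is an illustration under stated assumptions, not the authors' code: the weight $(bY_T - aT + 1)/(\theta Y_T)$ is the one read off from the integration by parts in Remark 3.8 with $(\xi, \eta_1, \eta_2) = (1, 0, 0)$, and the closed form follows from writing the price as $e^{-rT} P(Y_T > y^*)$, which requires $\theta > 0$. The function names are hypothetical.

```python
import math
import random


def gamma_pdf(y, shape, rate):
    """Density of the gamma law with the given shape and rate, evaluated at y > 0."""
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(y)
               - rate * y - math.lgamma(shape))
    return math.exp(log_pdf)


def delta_closed_form(s0, strike, horizon, a, b, theta, r):
    """Delta of the digital payoff 1(exp(rT) S_T > K) when S_T = S0 exp(theta Y_T + cT)."""
    c = a * math.log(1.0 - theta / b)
    # The option pays when Y_T exceeds y_star (valid for theta > 0).
    y_star = (math.log(strike / s0) - r * horizon - c * horizon) / theta
    return math.exp(-r * horizon) * gamma_pdf(y_star, a * horizon, b) / (theta * s0)


def delta_weighted_mc(s0, strike, horizon, a, b, theta, r, n_samples, seed):
    """Monte Carlo Delta using payoff times weight; returns (estimate, standard error)."""
    rng = random.Random(seed)
    c = a * math.log(1.0 - theta / b)
    total = total_sq = 0.0
    for _ in range(n_samples):
        y = rng.gammavariate(a * horizon, 1.0 / b)   # scale parameter is 1/b
        s_t = s0 * math.exp(theta * y + c * horizon)
        payoff = 1.0 if math.exp(r * horizon) * s_t > strike else 0.0
        weight = (b * y - a * horizon + 1.0) / (theta * y)
        sample = math.exp(-r * horizon) * payoff * weight / s0
        total += sample
        total_sq += sample * sample
    mean = total / n_samples
    variance = total_sq / n_samples - mean * mean
    return mean, math.sqrt(variance / n_samples)
```

With the parameters fixed above, `delta_closed_form(100.0, 110.0, 1.0, 6.0, 6.0, 0.1, 0.05)` gives a reference value against which `delta_weighted_mc` can be compared. No derivative of the discontinuous payoff is needed.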

4. CONCLUDING REMARKS

In this paper, we have derived the Greeks formulas for derivative securities with both continuous and discontinuous European payoff structures under a general model for asset price dynamics described by gamma processes. Our model setting includes various important asset price models of practical interest. We have adopted the Malliavin calculus for jump processes by making use of a scaling property of gamma processes with respect to the Girsanov transform. Our approach allows the asset price process to be of a pure-jump type and to incorporate infinite jump activities. We have also provided numerical results to illustrate that our Greeks formulas have a faster rate of convergence relative to the finite difference method. As future research, it would be interesting to investigate the application of the proposed approach to payoff functions with path dependent structure. Moreover, from both theoretical and practical points of view, it would be worthwhile extending the class of underlying Lévy processes to, for example, stable and tempered stable processes. These will be investigated in subsequent papers (Kawai and Takeuchi 2010a, 2010b).

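APPENDIX: PROOFS

Proof of Lemma 3.1. (i) Since $\xi\sigma = 0$, we have
\[ E_P\Big[ \big(F_T^{(\lambda)}\big)^{-2p} \Big] = \int_0^{+\infty} \left( \xi\theta\, \frac{b}{b - \xi\lambda}\, y + \eta_1 b\sigma \sqrt{\frac{b}{b - \xi\lambda}\, y} + \eta_2 b\tau \sqrt{T} \right)^{-2p} f_T^P(y)\, dy, \]
where $f_T^P$ is the gamma density given by (2.2). The moment is uniformly bounded as, under the condition (a),
\[ E_P\Big[ \big(F_T^{(\lambda)}\big)^{-2p} \Big] = \frac{\Gamma(aT - 2p)}{\Gamma(aT)} \left( \frac{b - \xi\lambda}{\xi\theta} \right)^{2p} \le \frac{\Gamma(aT - 2p)}{\Gamma(aT)} \left( \frac{b + \xi\varepsilon}{\xi\theta} \right)^{2p}, \]
while under the condition (b),
\[ E_P\Big[ \big(F_T^{(\lambda)}\big)^{-2p} \Big] = \int_0^{+\infty} \left( \xi\theta\, \frac{b}{b - \xi\lambda}\, y + \eta_2 b\tau \sqrt{T} \right)^{-2p} f_T^P(y)\, dy \le \big( \eta_2 b\tau \sqrt{T} \big)^{-2p}. \]
Next, under the condition (c),
\[ E_P\Big[ \big(F_T^{(\lambda)}\big)^{-2p} \Big] = \frac{\Gamma(aT - p)}{\Gamma(aT)} \left( \frac{\sqrt{b - \xi\lambda}}{\eta_1 b\sigma} \right)^{2p} \le \frac{\Gamma(aT - p)}{\Gamma(aT)} \left( \frac{\sqrt{b + \xi\varepsilon}}{\eta_1 b\sigma} \right)^{2p}, \]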


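and finally, under the condition (d),
\[ E_P\Big[ \big(F_T^{(\lambda)}\big)^{-2p} \Big] = \int_0^{+\infty} \left( \eta_1 b\sigma \sqrt{\frac{b}{b - \xi\lambda}\, y} + \eta_2 b\tau \sqrt{T} \right)^{-2p} f_T^P(y)\, dy \le \big( \eta_2 b\tau \sqrt{T} \big)^{-2p}. \]

(ii) Due to
\[ \frac{H_T^{(\varepsilon)}}{F_T^{(\varepsilon)}} \begin{cases} = \dfrac{1}{b - \xi\varepsilon}, & \text{if (a)}, \\[2mm] \le \dfrac{1}{b} + \dfrac{\theta\xi\, Y_T}{\tau \sqrt{T}\, \eta_2 (b - \xi\varepsilon)^2}, & \text{if (b)}, \\[2mm] = \dfrac{1}{b}, & \text{if (c) or (d)}, \end{cases} \]
and the exponential decays of the gamma and Gaussian densities, the $L^{2\gamma}(\Omega, P)$-integrabilities of $S_T^{(\varepsilon)} H_T^{(\varepsilon)}$ and of $S_T^{(\varepsilon)} H_T^{(\varepsilon)} / F_T^{(\varepsilon)}$ follow directly from that of $Y_T \exp(\theta b Y_T/(b - \xi\lambda))$ for (a)–(b), or that of $\sqrt{Y_T} \exp(\theta Y_T + \sigma \sqrt{Y_T}\, U_1)$ for (c)–(d), where $U_1$ denotes the standard normal random variable multiplying $\sigma\sqrt{Y_T}$. Observe next that
\[ \frac{K_T^{(\varepsilon)}}{\big(F_T^{(\varepsilon)}\big)^2} \begin{cases} = \dfrac{\xi}{(b - \xi\varepsilon) F_T^{(\varepsilon)}}, & \text{if (a)}, \\[2mm] \le \dfrac{\theta \xi^2}{b \big( (b - \xi\varepsilon) \tau \eta_2 \sqrt{T} \big)^2}\, Y_T, & \text{if (b)}, \\[2mm] = 0, & \text{if (c) or (d)}. \end{cases} \]
Its $L^{2\gamma}(\Omega, P)$-integrability thus follows from (i) for (a), while it is obvious for (b)–(d). Finally, by the definition of $S_T^{(\varepsilon)}$, $F_T^{(\varepsilon)}$, $H_T^{(\varepsilon)}$ and $K_T^{(\varepsilon)}$, and by the $L^{2\gamma}(\Omega, P)$-integrability of $S_T^{(\varepsilon)} H_T^{(\varepsilon)}$, $S_T^{(\varepsilon)} H_T^{(\varepsilon)}/F_T^{(\varepsilon)}$, and $K_T^{(\varepsilon)}/(F_T^{(\varepsilon)})^2$ for each $|\varepsilon| \le \lambda$, their squares are all uniformly integrable, due to $\gamma > 1$. Therefore, the respective convergences in $L^2(\Omega, P)$ hold true.

Proof of Proposition 3.10. It suffices to check the square integrability of the estimators. For the variance of $J_1$, by the Hölder inequality with $E_P[|\Phi(S_T)|^{2q}] < +\infty$, it suffices to check the existence of the integrals
\[ \int_{\mathbb{R}} \int_{\mathbb{R}} \int_0^{+\infty} \left| \frac{ \xi (by - aT + 1) + b(\eta_1 x_1 + \eta_2 x_2) }{ \xi \big( \theta y + \frac{\sigma}{2} \sqrt{y}\, x_1 \big) + \eta_1 b\sigma \sqrt{y} + \eta_2 b\tau \sqrt{T} } \right|^{\frac{2q}{q-1}} f_T^P(y)\, dy\; e^{-\frac{1}{2}(x_1^2 + x_2^2)}\, dx_1\, dx_2, \]
and
\[ \int_{\mathbb{R}} \int_0^{+\infty} \frac{ \big| \xi \frac{\sigma}{4} \sqrt{y}\, x_1 + \eta_2 b\tau \sqrt{T} \big|^{\frac{2q}{q-1}} }{ \big| \xi \big( \theta y + \frac{\sigma}{2} \sqrt{y}\, x_1 \big) + \eta_1 b\sigma \sqrt{y} + \eta_2 b\tau \sqrt{T} \big|^{\frac{4q}{q-1}} }\, f_T^P(y)\, dy\; e^{-\frac{1}{2} x_1^2}\, dx_1. \]
We can also check in a similar manner that the square integrability of the random variables $J_2$, $J_3$, $J_4$, and $J_5$ is guaranteed under the corresponding conditions.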


REFERENCES

BALLY, V., M. P. BAVOUZET, and M. MESSAOUD (2007): Integration by Parts Formula for Locally Smooth Laws and Applications to Sensitivity Computations, Ann. Appl. Probab. 17(1), 33–66.
BICHTELER, K., J. B. GRAVEREAUX, and J. JACOD (1987): Malliavin Calculus for Processes with Jumps, New York: Gordon and Breach Science Publishers.
BISMUT, J. M. (1981): Martingales, the Malliavin Calculus and Hypoellipticity under General Hörmander’s Conditions, Z. Wahrsch. Verw. Gebiete 56(4), 469–505.
BISMUT, J. M. (1983): Calcul des Variations Stochastique et Processus de Sauts, Z. Wahrsch. Verw. Gebiete 63(2), 147–235.
BISMUT, J. M. (1984): Large Deviations and the Malliavin Calculus, Boston: Birkhäuser.
CARLEN, E., and E. PARDOUX (1990): Differential Calculus and Integration by Parts on Poisson Space, in Stochastics, Algebra and Analysis in Classical and Quantum Dynamics, S. Albeverio, Ph. Blanchard, and D. Testard, eds. Dordrecht: Kluwer Academic Publications, pp. 63–73.
CARR, P., and D. MADAN (1999): Option Pricing and the Fast Fourier Transform, J. Computat. Finance 2(4), 61–73.
CASS, T. R., and P. K. FRIZ (2007): The Bismut-Elworthy-Li Formula for Jump-diffusions and Applications to Monte Carlo Methods in Finance, Preprint arXiv:math/0604311v3.
CHAN, T. (1999): Pricing Contingent Claims on Stocks Driven by Lévy Processes, Ann. Appl. Probab. 9(2), 504–528.
DAVIS, M. H. A., and M. P. JOHANSSON (2006): Malliavin Monte Carlo Greeks for Jump Diffusions, Stoch. Processes Appl. 116(1), 101–129.
DI NUNNO, G., B. ØKSENDAL, and F. PROSKE (2009): Malliavin Calculus for Lévy Processes with Applications to Finance, Berlin: Springer-Verlag.
EL-KHATIB, Y., and N. PRIVAULT (2004): Computations of Greeks in a Market with Jumps via the Malliavin Calculus, Finance Stoch. 8(2), 161–179.
ELLIOTT, R. J., and A. H. TSOI (1993): Integration by Parts for Poisson Processes, J. Multivariate Anal. 44(2), 179–190.
ELLIOTT, R. J., and C.-J. U. OSAKWE (2006): Option Pricing for Pure Jump Processes with Markov Switching Compensators, Finance Stoch. 10(2), 250–275.
ELLIOTT, R. J., L. L. CHAN, and T. K. SIU (2005): Option Pricing and Esscher Transform under Regime Switching, Ann. Finance 1(4), 423–432.
ELWORTHY, K. D., and X. M. LI (1994): Formulas for the Derivatives of Heat Semigroups, J. Funct. Anal. 125(1), 252–286.
FOURNIÉ, E., J. M. LASRY, J. LEBUCHOUX, P. L. LIONS, and N. TOUZI (1999): Applications of Malliavin Calculus to Monte Carlo Methods in Finance, Finance Stoch. 3(4), 391–412.
FOURNIÉ, E., J. M. LASRY, J. LEBUCHOUX, and P. L. LIONS (2001): Applications of Malliavin Calculus to Monte Carlo Methods in Finance II, Finance Stoch. 5(2), 201–236.
FU, M. C. (2007): Variance-gamma and Monte Carlo, in Advances in Mathematical Finance, M. C. Fu, R. A. Jarrow, J.-Y. J. Yen, and R. J. Elliott, eds. Boston, MA: Birkhäuser, pp. 21–35.
GERBER, H., and E. S. W. SHIU (1994): Option Pricing by Esscher Transform, Trans. Soc. Actuaries 46, 99–191.
GOBET, E. (2004): Revisiting the Greeks for European and American Options, in Stochastic Processes and Applications to Mathematical Finance, J. Akahori, S. Ogawa, and S. Watanabe, eds. River Edge, NJ: World Science Publications, pp. 53–71.
GOBET, E., and R. MUNOS (2005): Sensitivity Analysis Using Itô-Malliavin Calculus and Martingales. Application to Stochastic Optimal Control, SIAM J. Control Optimizat. 43(5), 1676–1713.
KAWAI, R., and A. KOHATSU-HIGA (2010): Computation of Greeks and Multidimensional Density Estimation for Asset Price Models with Time-changed Brownian Motion, Appl. Math. Finance 17(4), 301–321.
KAWAI, R., and A. TAKEUCHI (2010a): Computation of Greeks for Asset Price Dynamics Driven by Stable and Tempered Stable Processes, submitted.
KAWAI, R., and A. TAKEUCHI (2010b): Sensitivity Analysis for Averaged Asset Price Dynamics with Gamma Processes, Stat. Probab. Lett. 80(1), 42–49.
KOMATSU, T., and A. TAKEUCHI (2001): On the Smoothness of PDF of Solutions to SDE of Jump Type, Int. J. Differ. Equ. Appl. 2(2), 141–197.
LÉANDRE, R. (1985): Régularité de Processus de Sauts Dégénérés II, Ann. Inst. H. Poincaré Probab. Statist. 24(2), 209–236.
LEON, J. A., J. L. SOLÉ, F. UTZET, and J. VIVES (2002): On Lévy Processes, Malliavin Calculus and Market Models with Jumps, Finance Stoch. 6(2), 197–225.
MADAN, D., P. CARR, and E. CHANG (1998): The Variance Gamma Process and Option Pricing, Eur. Finance Rev. 2(1), 79–105.
MALLIAVIN, P. (1976): Stochastic Calculus of Variations and Hypoelliptic Operators, in Proceedings of the International Symposium on Stochastic Differential Equations, Kyoto 1976, Wiley 1978, pp. 195–263.
NUALART, D. (2006): The Malliavin Calculus and Related Topics, 2nd ed., Berlin: Springer-Verlag.
PICARD, J. (1996): On the Existence of Smooth Densities for Jump Processes, Probab. Theory Relat. Fields 105(4), 481–511.
TAKEUCHI, A. (2010): The Bismut-Elworthy-Li Type Formulas for Stochastic Differential Equations with Jumps, J. Theor. Probab. 23, 576–604.
YOR, M. (2007): Some Remarkable Properties of Gamma Processes, in Advances in Mathematical Finance, M. C. Fu, R. A. Jarrow, J.-Y. J. Yen, and R. J. Elliott, eds. Boston, MA: Birkhäuser, pp. 37–47.

Mathematical Finance, Vol. 21, No. 4 (October 2011), 573–593

NO-ARMAGEDDON MEASURE FOR ARBITRAGE-FREE PRICING OF INDEX OPTIONS IN A CREDIT CRISIS MASSIMO MORINI Banca IMI, Intesa-SanPaolo, and Bocconi University DAMIANO BRIGO Department of Mathematics, King’s College London

In this work, we consider three problems of the standard market approach to credit index options pricing: the deﬁnition of the index spread is not valid in general, the considered payoff leads to a pricing which is not always deﬁned, and the candidate numeraire for deﬁning a pricing measure is not strictly positive, which leads to a nonequivalent pricing measure. We give a solution to the three problems, based on modeling the ﬂow of information through a suitable subﬁltration. With this we consistently take into account the possibility of default of all names in the portfolio, that is neglected in the standard market approach. We show on market inputs that, while the pricing difference can be negligible in normal market conditions, it can become highly relevant in stressed market conditions, like the situation caused by the credit crunch. KEY WORDS: credit index options, subﬁltrations, credit crunch, default correlation, market models, arbitrage.

1. INTRODUCTION The credit crisis that has followed the problems in the subprime market in 2007 has reduced dramatically the liquidity for the majority of multiname credit derivatives. One exception has been the Credit Index Option market, supported by the sharp increase in historical and implied volatility. During the credit crisis, Payer options have protected investors from the rise of the credit indices, and such options give important trading opportunities also in a context of rapidly declining credit spreads as observed in March– May 2009. The Credit Index Option gives to the investor the possibility to enter a Forward Credit Index at a prespeciﬁed spread K, and to receive upon exercise a Front End Protection corresponding to index losses from option inception to option expiry. The current market approach to the pricing of Credit Index Options is based on the use of a Black formula to price the option as a call on a spread adjusted to account for the Front End Protection.

We thank Tomasz Bielecki, Paolo Longato, and Lutz Schloegl for helpful comments and discussion. Tomasz Bielecki also signalled us that a related work on credit index options is being developed independently in Armstrong and Rutkowski (2007). This paper expresses the views of its authors and does not represent the opinion of Banca IMI or FitchRatings, and neither organization is responsible for any use which may be made of its contents. Manuscript received May 2008; ﬁnal revision received September 2009. Address correspondence to Damiano Brigo, Department of Mathematics, King’s College London, Strand, London WC2R 2LS, UK; e-mail: damiano.brigo@kcl.ac.uk. DOI: 10.1111/j.1467-9965.2010.00444.x C 2010 Wiley Periodicals, Inc.


In this paper, we show that this market approach presents three problems that make it not arbitrage-free. First, there is one scenario in which the Black formula on the protection-adjusted Index Spread does not correctly take into account the Front End Protection. In this scenario the pricing formula does not give the correct price of the option, while instead this could be computed in closed-form. A related second problem is that the Index Spread used in market practice is not defined in all states of the world. The third problem regards the theoretical justification of the use of a Black formula in this context. According to the Fundamental Theorem of Asset Pricing, a rigorous derivation of a Black formula for the pricing of a Credit Index Option requires the definition of an appropriate numeraire for change to the pricing measure under which the underlying spread is a martingale (see Delbaen and Schachermayer 1994; Geman, El Karoui, and Rochet 1995). The problem here is that, since we are dealing with portfolios or indices of defaultable names, the quantity that appears to be the natural choice for a numeraire to simplify pricing is not strictly positive. This makes the standard change of numeraire theory, based on strictly positive numeraires, inapplicable. One approach to circumvent this problem is to extend to a multiname setting the concept of survival measure, as in Schönbucher (2004) for single name products, based on a numeraire which is not strictly positive. This is a popular and very efficient approach; however, the survival measure is not equivalent to the standard risk neutral, forward, and swap measures used in mathematical finance, but only absolutely continuous with respect to them.
In some situations, this can be practically undesirable, since in such a case one can describe the dynamics of credit spreads under the survival measure, but cannot use the standard Girsanov Theorem to see the dynamics of credit spreads under the standard measures used in mathematical ﬁnance. Thus one could not, for example, extend a standard implementation of a Libor Market Model (Brace, Gatarek, and Musiela 1997) for default-free forward rates, usually done under a forward measure, to include also forward credit spreads. Here we show that the deﬁnition of an index spread valid in general, the correct description of the market payoff leading to a price deﬁned in all states of the world, and the use of a valid strictly positive quantity to deﬁne an equivalent pricing measure can be given a general mathematical solution. The solution is based on the deﬁnition of appropriate subﬁltrations, and leads to a fully arbitrage-free formula which is slightly different from the one used in market practice, and in particular it introduces a dependence on default interdependence. In the end we show that, while the mispricing associated to the market formula can be negligible when using inputs consistent with normal credit market conditions, like credit spreads and correlations as from 2007 data before the credit crunch, it can become highly relevant for stressed market conditions, as implied by the level of credit spreads and default correlations in 2008. In Section 2, we introduce the setting and describe the Credit Forward Index and the Credit Index Option Payoff. Then we describe the previous literature and the market practice on the pricing of Index options. In Section 3, we show the problems in the market approach and we introduce the main technical instrument that will allow us to solve them: subﬁltrations. In particular, we introduce a new subﬁltration apt to credit portfolio products. 
Then we use this instrument to compute a consistent and arbitrage-free definition of the underlying Index spread. In Section 4, we introduce a new pricing measure and prove the main result of the paper, the formula for no-arbitrage pricing of Index Options, different from the standard market approach. In Section 5, we compare the market formula and our arbitrage-free formula using market inputs before and after the summer 2007 subprime crisis. Our findings are dramatically different in the two market situations.

2. CREDIT INDEX AND INDEX OPTION

2.1. Probabilistic Setting and Basic Definitions

We are in a complete filtered probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$, where the filtration $\mathcal{F}_t$ satisfies the usual hypothesis and we set $\mathcal{F}_0 = (\Omega, \emptyset)$ and $\mathcal{F} = \mathcal{F}_{\bar{T}}$ for a terminal date $0 < \bar{T} < \infty$. We assume the existence of a $\mathcal{F}_t$-measurable bank-account numeraire with positive price process $B_t$ and an associated risk neutral martingale measure $Q \sim P$. The discount factor from $T$ to $t$ is $D(t, T) = B_t / B_T$. To indicate expectation under the risk neutral probability measure $Q$ we often use the notation $\Pi(X_t) = E[X_t \mid \mathcal{F}_t]$ for a $Q$-integrable $X_t$. The quantity $P(t, T) = E[D(t, T) \mid \mathcal{F}_t] = \Pi(D(t, T))$ is the default-free zero-coupon bond. We consider $n$ defaultable issuers in a portfolio, and we indicate by $\tau_i$, $i = 1, \ldots, n$, a positive random time defined on the above probability space. We interpret $\tau_i$ as the time of default of the $i$th issuer. For a unit of portfolio notional each name has a notional of $1/n$ and a recovery rate $R$, that we interpret as a deterministic quantity, constant in time and across issuers. The cumulated loss at $t$ is the $\mathcal{F}_t$-measurable, discontinuous process
\[ L(t) = \frac{(1 - R)}{n} \sum_{i=1}^{n} 1_{\{\tau_i \le t\}}. \]
At time $t$ the Outstanding Notional is
\[ N(t) = 1 - \frac{L(t)}{(1 - R)}. \]
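As a small illustration of these two definitions, the sketch below evaluates the cumulated loss and the outstanding notional for one scenario of default times. The function names and the sample inputs are hypothetical; names that never default are represented by an infinite default time.

```python
import math


def cumulated_loss(default_times, t, recovery):
    """L(t) = (1 - R)/n * number of names with default time <= t."""
    n_names = len(default_times)
    n_defaults = sum(1 for tau in default_times if tau <= t)
    return (1.0 - recovery) * n_defaults / n_names


def outstanding_notional(default_times, t, recovery):
    """N(t) = 1 - L(t)/(1 - R), i.e. the fraction of names still alive at t."""
    return 1.0 - cumulated_loss(default_times, t, recovery) / (1.0 - recovery)


# One scenario with four names: one default before t = 1, one after, two survivors.
scenario = [0.5, 2.0, math.inf, math.inf]
loss_at_1 = cumulated_loss(scenario, 1.0, recovery=0.4)
notional_at_1 = outstanding_notional(scenario, 1.0, recovery=0.4)
```

Note that $N(t)$ reaches zero exactly when all $n$ names have defaulted, whatever the recovery rate.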

The measurability assumptions on the quantities introduced in this section will be integrated in Section 3.2 where we introduce a subﬁltration structure on (, F, P, Ft ).

2.2. Index and Index Option. Definitions and Problems

DEFINITION 2.1 (Protection and Premium Leg of a Credit Index). The Protection Leg is defined as a contingent claim that pays, at default times $\tau_i$, the corresponding loss $dL(\tau_i)$. The discounted payout of the Protection Leg is
\[ \Phi_t := \int_{T_A}^{T_M} D(t, u)\, dL(u), \]
often approximated in the marketplace by
\[ \Phi_t \approx \sum_{j=A+1}^{M} D(t, T_j) \big[ L(T_j) - L(T_{j-1}) \big]. \tag{2.1} \]
The Premium Leg is defined as a contingent claim that pays, at times $T_j$, $j = A + 1, \ldots, M$ or until all names have defaulted, a premium $K$ (deterministic and constant) on the average of the outstanding notional $N(t)$ for $t \in (T_{j-1}, T_j]$. The discounted payout of the Premium Leg is
\[ \Psi_t(K) := \left\{ \sum_{j=A+1}^{M} D(t, T_j) \int_{T_{j-1}}^{T_j} N(t)\, dt \right\} K, \]
often approximated in the marketplace by
\[ \Psi_t(K) \approx \left\{ \sum_{j=A+1}^{M} D(t, T_j)\, \alpha_j \left( 1 - \frac{L(T_j)}{(1 - R)} \right) \right\} K, \tag{2.2} \]
where $\alpha_j = T_j - T_{j-1}$. The Payer Forward Index starting at $T_A$ and lasting until $T_M$ is a contract where the investor pays the Premium Leg in return for the Protection Leg, and has a payoff discounted at $t$ given by
\[ I_t(K) = \Phi_t - \Psi_t(K), \]
while its price is given by
\[ \Pi(I_t(K)) = E\big[ \Phi_t - \Psi_t(K) \mid \mathcal{F}_t \big]. \]
Forward Index quotes $\Pi(I_t(K))$ are not directly provided by the market, since only spot indices (for which $T_A = t$) are quoted. Forward quotes are extracted from spot quotes according to a few modeling assumptions. First the Protection and Premium legs are approximated via (2.1) and (2.2), respectively. The other two modeling assumptions are independence of interest rates and default and portfolio homogeneity. The latter corresponds to assuming that all names have the same credit risk, and consequently the same default probability. This default probability common to all names is expressed through intensity modeling (Lando 1998; Yu 2007), see also Section 5.1. Another important definition is the following.

DEFINITION 2.2 (Index Annuity). The quantity
\[ \Pi(\gamma_t) := E[\gamma_t \mid \mathcal{F}_t], \qquad \gamma_t := \sum_{j=A+1}^{M} D(t, T_j) \int_{T_{j-1}}^{T_j} N(t)\, dt \approx \sum_{j=A+1}^{M} D(t, T_j)\, \alpha_j \left( 1 - \frac{L(T_j)}{(1 - R)} \right), \]
is the price of a portfolio of defaultable assets and is called the Index Annuity.

DEFINITION 2.3 (Payer or Put Credit-Index Option). A Payer Credit Index Option with strike $K$ and exercise date $T_A$, written on an index with maturity $T_M$, is a contingent claim giving the right and no obligation to enter at $T_A$ into an Index with final payment at $T_M$, as a protection buyer paying a fixed rate $K$, thus entitled to receive protection from losses in the period between $T_A$ and $T_M$. In addition, the protection buyer receives, upon exercise, also the so-called Front End Protection covering the losses from the option inception to the exercise date $T_A$. The discounted payout of this claim at $T_A$ is defined as
\[ D(t, T_A) \big( F_{T_A} + \Pi(I_{T_A}(K)) \big)^+, \tag{2.3} \]
where the term $F_{T_A}$ is the claim component in $T_A$ called front end protection, whose discounted payout is defined as
\[ F_t = D(t, T_A)\, L(T_A). \]
This option is called a Put Credit Index Option in market practice, since it allows to take a short position on credit risk.

In the following formulas referring to the pricing of Index Options or the computation of the related equilibrium spread, the time $t$ is assumed to satisfy $t \le T_A$, and the spread refers to an exercise date $T_A$ and a final payment at $T_M$, although this is not explicitly indicated. Analogously the front end protection refers to an exercise date $T_A$. In the initial market approach, the Index Option is priced as a call option on an Index spread defined as
\[ S_t := \frac{\Pi(\Phi_t)}{\Pi(\gamma_t)}, \]
and then the value of the front end protection is added to the option price
\[ \Pi(\gamma_t)\, \mathrm{Black}\big( S_t, K, \sigma \sqrt{T_A - t} \big) + \Pi(F_t), \tag{2.4} \]
where $\sigma$ is the volatility of the forward spread and
\[ \mathrm{Black}(S, K, \sigma) = S\, N(d_1) - K\, N(d_2), \qquad d_1 = \frac{\ln\frac{S}{K} + \frac{1}{2}\sigma^2}{\sigma}, \qquad d_2 = \frac{\ln\frac{S}{K} - \frac{1}{2}\sigma^2}{\sigma}. \]
This rough market approach has several flaws. Its main flaw, as noticed by Pedersen (2003), is that it fails to implement that the front end protection is received only upon exercise. This observation led to an improved Black Formula approach, described for example in Doctor and Goulden (2007). First define the

DEFINITION 2.4 (Loss-Adjusted Index Payoff).
\[ \tilde{I}_t(K) = I_t(K) + F_t = \Phi_t - \Psi_t(K) + F_t. \tag{2.5} \]
This leads to a new spread definition,

DEFINITION 2.5 (Loss-Adjusted Market Index Spread). It is the value of $K$ setting to zero $\Pi(\tilde{I}_t(K))$ rather than $\Pi(I_t(K))$,
\[ \tilde{S}_t = \frac{\Pi(\Phi_t) + \Pi(F_t)}{\Pi(\gamma_t)}. \tag{2.6} \]
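The market approximations (2.1) and (2.2), the annuity of Definition 2.2, the front end protection, and the loss-adjusted payoff (2.5) can all be evaluated scenario by scenario. The sketch below does this for one set of default times. It is an illustration only: the flat continuously compounded rate used for $D(t, T)$, the function names, and the dictionary keys are assumptions, and prices would require averaging such payouts under a default model that is not specified here.

```python
import math


def cumulated_loss(default_times, t, recovery):
    """L(t) = (1 - R)/n * number of names with default time <= t."""
    n_defaults = sum(1 for tau in default_times if tau <= t)
    return (1.0 - recovery) * n_defaults / len(default_times)


def index_payouts(default_times, dates, strike, recovery, rate=0.0, t=0.0):
    """Discounted payouts for dates = [T_A, T_{A+1}, ..., T_M] in one default scenario."""

    def discount(maturity):
        # Assumption: deterministic flat rate, D(t, T) = exp(-rate * (T - t)).
        return math.exp(-rate * (maturity - t))

    protection = 0.0
    annuity = 0.0
    for previous, current in zip(dates, dates[1:]):
        loss_now = cumulated_loss(default_times, current, recovery)
        loss_before = cumulated_loss(default_times, previous, recovery)
        protection += discount(current) * (loss_now - loss_before)          # (2.1)
        annuity += (discount(current) * (current - previous)
                    * (1.0 - loss_now / (1.0 - recovery)))                  # Definition 2.2
    premium = strike * annuity                                              # (2.2)
    front_end = discount(dates[0]) * cumulated_loss(default_times, dates[0], recovery)
    return {
        "protection": protection,
        "annuity": annuity,
        "premium": premium,
        "front_end": front_end,
        "adjusted_payoff": protection - premium + front_end,                # (2.5)
    }


# Four names, exercise at T_A = 1, semiannual payments until T_M = 2, zero rates.
example = index_payouts([0.5, 1.2, math.inf, math.inf], [1.0, 1.5, 2.0],
                        strike=0.01, recovery=0.4)
```

In a scenario where every name defaults before $T_A$, the annuity payout computed this way is zero while the front end protection equals the discounted value of $1 - R$.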


We may now rewrite (2.3) as
\[ D(t, T_A) \big( \Pi(\gamma_{T_A}) (\tilde{S}_{T_A} - K) \big)^+. \tag{2.7} \]
One can think of taking $\Pi(\gamma_t)$ as numeraire and $\tilde{S}_t$ as lognormal underlying variable so as to price the option with the following

DEFINITION 2.6 (Market Payer Credit Index Option Formula).
\[ \Pi(\gamma_t)\, \mathrm{Black}\big( \tilde{S}_t, K, \tilde{\sigma} \sqrt{T_A - t} \big), \tag{2.8} \]
where $\tilde{\sigma}$ is the volatility of the forward spread.

We close this section with the following

DEFINITION 2.7 (Receiver or Call Credit Index Option). A receiver Credit Index Option with the same contract specifications as the payer above, is a contract giving the right to enter at $T_A$ into the same Index with final payment at $T_M$, as a protection seller receiving a fixed rate $K$. This option is called a Call Credit Index Option in market practice, since it allows to take a long position on credit risk. Following the same steps as above, the payoff is
\[ \big( -F_{T_A} - \Pi(I_{T_A}(K)) \big)^+, \]
and the Market Receiver Credit Index Option Formula is
\[ \Pi(\gamma_t) \big( K\, N(-d_2) - \tilde{S}_t\, N(-d_1) \big). \]
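The market formulas of this section reduce to the Black function applied to the loss-adjusted spread and scaled by the annuity. A minimal sketch follows; the function names are hypothetical, the annuity and spread are taken as given inputs, and the third argument of the Black functions is the total volatility $\tilde{\sigma}\sqrt{T_A - t}$ as in (2.8).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _d1_d2(spread, strike, total_vol):
    d1 = (math.log(spread / strike) + 0.5 * total_vol ** 2) / total_vol
    d2 = d1 - total_vol
    return d1, d2


def black_payer(spread, strike, total_vol):
    """Black(S, K, v) = S N(d1) - K N(d2), with v the total volatility."""
    d1, d2 = _d1_d2(spread, strike, total_vol)
    return spread * _STD_NORMAL.cdf(d1) - strike * _STD_NORMAL.cdf(d2)


def black_receiver(spread, strike, total_vol):
    """Receiver counterpart: K N(-d2) - S N(-d1)."""
    d1, d2 = _d1_d2(spread, strike, total_vol)
    return strike * _STD_NORMAL.cdf(-d2) - spread * _STD_NORMAL.cdf(-d1)


def market_payer_price(annuity, adjusted_spread, strike, vol, time_to_expiry):
    """Formula (2.8): annuity times Black on the loss-adjusted spread."""
    return annuity * black_payer(adjusted_spread, strike, vol * math.sqrt(time_to_expiry))


def market_receiver_price(annuity, adjusted_spread, strike, vol, time_to_expiry):
    """Market receiver formula of Definition 2.7."""
    return annuity * black_receiver(adjusted_spread, strike, vol * math.sqrt(time_to_expiry))
```

The two Black functions satisfy payer minus receiver equals $S - K$, which gives a quick consistency check. Both formulas divide by nothing explicitly, but the adjusted spread passed in is itself a ratio with the annuity in the denominator, so a vanishing annuity leaves the inputs undefined.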

3. SUBFILTRATIONS FOR A CONSISTENT INDEX SPREAD

3.1. Problems of the Market Formula

A Black Formula approach is important for building a solid and standardized option market. In this approach the complexity of the no-arbitrage dynamics of the underlying asset is consistently transferred to a correct numeraire, through change of measure. For interest rate swap options this was justified for example by the swap market model framework introduced by Jamshidian (1997). The market option formula (2.8) does not represent yet a consistent extension of this approach to Credit Index Options. In fact, it presents the following problems.

PROBLEM 3.1. The definition of the spread $\tilde{S}_t$ is not valid globally, but only when the denominator
\[ \Pi(\gamma_t) = E\left[ \sum_{j=A+1}^{M} D(t, T_j) \int_{T_{j-1}}^{T_j} N(t)\, dt \,\Big|\, \mathcal{F}_t \right] \]
is different from zero. Since $\Pi(\gamma_t)$ is the price of a portfolio of defaultable assets, this quantity may vanish: it is not bounded away from zero in all states of the world having positive probability.


PROBLEM 3.2. When $\Pi(\gamma_t) = 0$ the pricing formula (2.8) is undefined, while instead we will see that the option price is known exactly in such a scenario. We will also see that in this scenario the above spread $\tilde{S}_t$ does not set the value of the adjusted index to zero.

PROBLEM 3.3. Since it is not strictly positive, taking $\Pi(\gamma_t)$ as a numeraire would lead to the definition of a pricing measure not equivalent to the standard risk-neutral measure.

To the best of our knowledge, the current literature does not solve these problems. The only partial exception is Jackson (2005), that deals with the second problem while not considering the first and the last one (in particular he uses a numeraire which is not strictly positive). Instead, here we give a solution to the above problems based on the definition of appropriate subfiltrations. If one wants to reach a consistent definition of the Index spread and a safely arbitrage-free valuation formula, this requires a specific treatment of the information sets involved, compared to the default-free case or even to the single-name default case.

3.2. Subfiltration in Multiname Credit Option Modeling

In the single name default case, problems similar to those of (2.8) appear in the pricing of an option on a Credit Default Swap (CDS). In the initial market approach the denominator in the definition of the CDS fair spread and the numeraire for the change of measure in option pricing are both the single name defaultable annuity, which has zero value in all states of the world where the underlying name defaults before the option maturity, a set with positive measure. These problems are dealt with by Jamshidian (2004) and Brigo (2005, 2008) making use of a subfiltration structure. Set

\[ \mathcal{F}_t = \mathcal{J}_t^i \vee \mathcal{H}_t^i, \qquad \mathcal{J}_t^i = \sigma(\{\tau_i > u\},\ u \le t), \tag{3.1} \]

where $\mathcal{J}_t^i$ is the natural filtration of $\tau_i$, while $\mathcal{H}_t^i$ represents the $\tau_i$-default-free information (see Lando 1998; Jeanblanc and Rutkowski 2000; Jamshidian 2004), in the sense that usually one assumes $\mathbb{Q}(\tau_i > t \mid \mathcal{H}_t^i)$ to be strictly positive a.s. In this context subfiltrations allow one to define pricing formulas in terms of a positive conditional survival probability. In fact, since the CDS is a defaultable payoff, one can use a result dating back to Dellacherie (1972), which in most financial applications reads: a defaultable payoff with maturity $T$ discounted to $t$,

\[ Y_t^T = \mathbf{1}_{\{\tau_i > T\}}\, Y_t^T, \tag{3.2} \]

can be priced via

\[ \Pi(Y_t^T) = \mathbb{E}[Y_t^T \mid \mathcal{F}_t] = \frac{\mathbf{1}_{\{\tau_i > t\}}}{\mathbb{Q}(\tau_i > t \mid \mathcal{H}_t^i)}\, \mathbb{E}[Y_t^T \mid \mathcal{H}_t^i]. \tag{3.3} \]

The CDS annuity and the CDS itself are defaultable payoffs, so we can price them with (3.3), confining ourselves to making assumptions on the stochastic dynamics of the credit spread only on the $\mathcal{H}_t^i$ subfiltration. The same applies to single name Credit Options because they are knock-out options, namely their value is zero after default.

The situation in a multiname setting is different, since we have a plurality of reference entities that can default, so the above subfiltration setting gives us a plurality of possible subfiltrations $\mathcal{J}_t^i$, $\mathcal{H}_t^i$. However, the solution to the above three problems requires the introduction of a new subfiltration. We define a subfiltration that only excludes information on the event that sets $\Pi(\gamma_t)$ to zero. It is useful to introduce now the notation $\tau^{(k)}$ for the $k$th default time, $\tau^{(1)} \le \tau^{(2)} \le \cdots \le \tau^{(n-1)} \le \tau^{(n)}$.

DEFINITION 3.4 (No-armageddon subfiltration). Define the $\mathcal{F}_t$-stopping time

\[ \hat{\tau} := \tau^{(n)} = \max(\tau_1, \tau_2, \ldots, \tau_n) \]

corresponding to the time of a portfolio "armageddon event," namely the default of all names in a portfolio. We assume we have an auxiliary filtration $\hat{\mathcal{H}}_t$ such that, for every $t$,

\[ \mathcal{F}_t = \hat{\mathcal{J}}_t \vee \hat{\mathcal{H}}_t, \qquad \hat{\mathcal{J}}_t = \sigma(\{\hat{\tau} > u\},\ u \le t). \]

We assume that $\hat{\mathcal{H}}_t$ excludes, from the total flow of market information, the information $\hat{\mathcal{J}}_t$ on the happening of the armageddon event, in the sense that $\hat{\tau}$ is not an $\hat{\mathcal{H}}_t$-stopping time and

\[ \mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t) > 0 \quad \text{a.s. for any } t > 0. \]

REMARK 3.5 (No-Armageddon Subfiltration and the Market Model). This definition in general does not specify uniquely the filtration $\hat{\mathcal{H}}_t$. It rather sets some constraints on the possible candidates for $\hat{\mathcal{H}}_t$, and also on the possible models of the market. In fact under some (not unreasonable) model assumptions a filtration with the features we require for $\hat{\mathcal{H}}_t$ may not exist or may be meaningless. In Section 5 we consider the standard market Gaussian copula model aggregating reduced form, deterministic intensity models for single names, where it appears possible to find a non-trivial definition of $\hat{\mathcal{H}}_t$. And yet one must be careful: in the market model with $\rho < 1$, as it happens in our examples, it is harmless to construct $\hat{\mathcal{H}}_t$ as $\mathcal{H}_t \vee \bigvee_{k=1}^{n-1} \mathcal{J}_t^{(k)}$, where $\mathcal{H}_t = \bigvee_{i=1}^{n} \mathcal{H}_t^i$ is a filtration excluding, in the above sense, the information on every default time and $\mathcal{J}_t^{(k)}$ is the filtration generated by the $k$th default time. When however $\rho = 1$, the only solution is $\hat{\mathcal{H}}_t = \mathcal{H}_t$, since if any $\tau_i$ is an $\hat{\mathcal{H}}_t$-stopping time then all $\tau_i$, including $\hat{\tau}$, are $\hat{\mathcal{H}}_t$-stopping times. However this choice is not particularly awkward when besides $\rho = 1$ one has a homogeneous pool (all default intensities are the same, a common assumption in the market). In this case, in fact, $\tau_1 = \tau_2 = \cdots = \tau_n$.¹ The Gaussian Copula model is explained in some detail in Section 5.1.

REMARK 3.6 (The Money-Market Account and the Armageddon). In the following we assume that $B_t$ is $\hat{\mathcal{H}}_t$-measurable. In the literature about single name credit models, see Jeanblanc and Rutkowski (2000) or Lando (1998), $B_t$ is usually adapted to $\mathcal{H}_t^i$, where $\mathcal{H}_t^i$ ideally excludes information on the default of the unique defaultable entity $i$ considered in the single name model. We may regard our assumption of $\hat{\mathcal{H}}_t$-measurability of $B_t$ as a consequence of this assumption extended to a multiname context. Since $B_t$ is not affected by any defaults, neither will it be affected by the last default in a pool.
1 Although a model with this property can appear bizarre, this was the implication of the use of the market standard homogeneous pool Gaussian Copula in 2008, when often quotes for senior tranches showed ρ = 1.


However in the following we need $\hat{\mathcal{H}}_t$-measurability only, rather than $\mathcal{H}_t^i$-measurability for all $i$. In some models it could make sense to assume that $B_t$ jumps at defaults from the first to the last-but-one default in a portfolio. For example the monetary authorities may lower interest rates at the default of one name to ease the credit conditions of the surviving companies and avoid contagion. When the portfolio is representative of the entire market, at the last default time $\hat{\tau}$ there are no surviving companies and so one assumes no effect on $B_t$.

Exploiting that $\gamma_t = \mathbf{1}_{\{\hat{\tau} > T_A\}} \gamma_t$, define

\[ \hat{\Pi}(\gamma_t) := \mathbb{E}[\gamma_t \mid \hat{\mathcal{H}}_t] \]

so as to write

\[ \Pi(\gamma_t) = \mathbb{E}[\gamma_t \mid \mathcal{F}_t] = \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}[\gamma_t \mid \hat{\mathcal{H}}_t] = \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \hat{\Pi}(\gamma_t). \]

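The quantity $\hat{\Pi}(\gamma_t)$ will induce an effective definition of the Index Spread and of an equivalent pricing measure for Index Options. In fact $\hat{\Pi}(\gamma_t)$ is never null. This can be easily seen in the simplified market representation of $\hat{\Pi}(\gamma_t)$,

\[
\begin{aligned}
\hat{\Pi}(\gamma_t) &= \mathbb{E}\Big[ \sum_{j=A+1}^{M} D(t, T_j)\, \alpha_j\, \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{\tau^{(i)} > T_j\}} \,\Big|\, \hat{\mathcal{H}}_t \Big] \\
&= \sum_{j=A+1}^{M} \frac{1}{n} \sum_{i=1}^{n} \alpha_j\, \mathbb{E}\big[ \mathbf{1}_{\{\tau^{(i)} > t\}}\, D(t, T_j)\, \mathbf{1}_{\{\tau^{(i)} > T_j\}} \,\big|\, \hat{\mathcal{H}}_t \big] \\
&= \sum_{j=A+1}^{M} \frac{1}{n} \sum_{i=1}^{n-1} \alpha_j\, \mathbf{1}_{\{\tau^{(i)} > t\}}\, \mathbb{E}\big[ D(t, T_j)\, \mathbf{1}_{\{\tau^{(i)} > T_j\}} \,\big|\, \hat{\mathcal{H}}_t \big] + \sum_{j=A+1}^{M} \frac{1}{n}\, \alpha_j\, \mathbb{E}\big[ D(t, T_j)\, \mathbf{1}_{\{\hat{\tau} > T_j\}} \,\big|\, \hat{\mathcal{H}}_t \big],
\end{aligned}
\]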

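where

\[ \mathbb{E}[D(t, T_j)\, \mathbf{1}_{\{\hat{\tau} > T_j\}} \mid \hat{\mathcal{H}}_t] = \mathbb{E}\big[ D(t, T_j)\, \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > T_j\}} \mid \hat{\mathcal{H}}_{T_j}] \,\big|\, \hat{\mathcal{H}}_t \big] = \mathbb{E}[D(t, T_j)\, \mathbb{Q}(\hat{\tau} > T_j \mid \hat{\mathcal{H}}_{T_j}) \mid \hat{\mathcal{H}}_t] \]

is strictly positive given reasonably mild assumptions on the last-to-default intensity.

For reaching a valid definition of the spread, we need to price the Loss-Adjusted index (2.5) through the $\hat{\mathcal{H}}_t$ subfiltration. This leads to the following decomposition of the index price which, together with the consequent spread definition (3.7) and the related Proposition 4.1, represents the first main result of the paper.

PROPOSITION 3.7 (Decomposition of the Loss-Adjusted Index Price Under Subfiltrations). The arbitrage-free price of the Loss-Adjusted Credit Index of Definition 2.4 under the $\hat{\mathcal{H}}_t$ subfiltration structure is

\[ \Pi(\tilde{I}_t(K)) = \hat{I}_1 + \hat{I}_2 + \hat{I}_3, \tag{3.4} \]

where

\[
\begin{aligned}
\hat{I}_1 &= \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)} \big\{ \hat{\Pi}(\Phi_t) - \hat{\Pi}(\Psi_t(K)) + \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > T_A\}} F_t \mid \hat{\mathcal{H}}_t] \big\}, \\
\hat{I}_2 &= \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, (1 - R)\, \mathbb{E}[\mathbf{1}_{\{t < \hat{\tau} \le T_A\}} D(t, T_A) \mid \hat{\mathcal{H}}_t], \\
\hat{I}_3 &= \mathbf{1}_{\{\hat{\tau} \le t\}}\, (1 - R)\, P(t, T_A).
\end{aligned}
\]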

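Proof. Since the Loss-Adjusted index is never null, considering that even in case $\hat{\tau} \le T_A$ we receive front end protection, we need to generalize (3.3) as follows:

\[
\begin{aligned}
Y_t^T &= \mathbf{1}_{\{\hat{\tau} > t\}} Y_t^T + \mathbf{1}_{\{\hat{\tau} \le t\}} Y_t^T, \\
\mathbb{E}[Y_t^T \mid \mathcal{F}_t] &= \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > t\}} Y_t^T \mid \mathcal{F}_t] + \mathbb{E}[\mathbf{1}_{\{\hat{\tau} \le t\}} Y_t^T \mid \mathcal{F}_t].
\end{aligned}
\]

The first component can be computed through (3.3), while we follow Bielecki and Rutkowski (2001) for the second component, which is

\[ \mathbb{E}[\mathbf{1}_{\{\hat{\tau} \le t\}} Y_t^T \mid \mathcal{F}_t] = \mathbb{E}[\mathbf{1}_{\{\hat{\tau} \le t\}} Y_t^T \mid \hat{\mathcal{J}}_t \vee \hat{\mathcal{H}}_t] = \mathbf{1}_{\{\hat{\tau} \le t\}}\, \mathbb{E}[Y_t^T \mid \sigma(\hat{\tau}) \vee \hat{\mathcal{H}}_t], \tag{3.5} \]

leading to

\[ \mathbb{E}[Y_t^T \mid \mathcal{F}_t] = \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > t\}} Y_t^T \mid \hat{\mathcal{H}}_t] + \mathbf{1}_{\{\hat{\tau} \le t\}}\, \mathbb{E}[Y_t^T \mid \sigma(\hat{\tau}) \vee \hat{\mathcal{H}}_t]. \tag{3.6} \]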

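Applying (3.6) to $\tilde{I}_t(K)$,

\[
\begin{aligned}
\Pi(\tilde{I}_t(K)) &= \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > t\}} \Phi_t - \mathbf{1}_{\{\hat{\tau} > t\}} \Psi_t(K) + \mathbf{1}_{\{\hat{\tau} > t\}} F_t \mid \hat{\mathcal{H}}_t] \\
&\quad + \mathbf{1}_{\{\hat{\tau} \le t\}}\, \mathbb{E}[\Phi_t - \Psi_t(K) + F_t \mid \sigma(\hat{\tau}) \vee \hat{\mathcal{H}}_t] =: I_1 + I_2.
\end{aligned}
\]

We analyze first $I_1$. Since

\[ \mathbf{1}_{\{\hat{\tau} > t\}} \Phi_t = \Phi_t, \qquad \mathbf{1}_{\{\hat{\tau} > t\}} \Psi_t(K) = \Psi_t(K), \]

we have

\[ I_1 = \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)} \big\{ \hat{\Pi}(\Phi_t) - \hat{\Pi}(\Psi_t(K)) + \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > t\}} F_t \mid \hat{\mathcal{H}}_t] \big\}. \]

It is convenient to perform the following decomposition:

\[
\begin{aligned}
\mathbb{E}[\mathbf{1}_{\{\hat{\tau} > t\}} F_t \mid \hat{\mathcal{H}}_t] &= \mathbb{E}[\mathbf{1}_{\{t < \hat{\tau} \le T_A\}} F_t \mid \hat{\mathcal{H}}_t] + \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > T_A\}} F_t \mid \hat{\mathcal{H}}_t] \\
&= (1 - R)\, \mathbb{E}[\mathbf{1}_{\{t < \hat{\tau} \le T_A\}} D(t, T_A) \mid \hat{\mathcal{H}}_t] + \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > T_A\}} F_t \mid \hat{\mathcal{H}}_t].
\end{aligned}
\]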


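We now analyze $I_2$:

\[
\begin{aligned}
\mathbf{1}_{\{\hat{\tau} \le t\}}\, \mathbb{E}[\Phi_t \mid \sigma(\hat{\tau}) \vee \hat{\mathcal{H}}_t] &= 0, \\
\mathbf{1}_{\{\hat{\tau} \le t\}}\, \mathbb{E}[\Psi_t(K) \mid \sigma(\hat{\tau}) \vee \hat{\mathcal{H}}_t] &= 0, \\
\mathbf{1}_{\{\hat{\tau} \le t\}}\, \mathbb{E}[F_t \mid \sigma(\hat{\tau}) \vee \hat{\mathcal{H}}_t] &= \mathbf{1}_{\{\hat{\tau} \le t\}}\, \mathbb{E}[D(t, T_A) L(T_A) \mid \sigma(\hat{\tau}) \vee \hat{\mathcal{H}}_t] \\
&= \mathbf{1}_{\{\hat{\tau} \le t\}}\, (1 - R)\, \mathbb{E}[D(t, T_A) \mid \sigma(\hat{\tau}) \vee \hat{\mathcal{H}}_t] \\
&= \mathbf{1}_{\{\hat{\tau} \le t\}}\, (1 - R)\, P(t, T_A),
\end{aligned}
\]

from which we conclude.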

Notice that we are able to use (3.6) for pricing the Loss-Adjusted index since we could compute the second component (3.5) without using assumptions on the dynamics of the forward spread, which, analogously to the single name case, will be given only for the $\hat{\mathcal{H}}_t$ filtration. The second component (3.5) corresponds to the value of the payoff when we know that all defaults have already happened and we know the exact "armageddon" time. In the case of the Loss-Adjusted index it corresponds to the expectation of front end protection conditional on the knowledge that all names have already defaulted, so it is simply the entire notional diminished by the recovery.

Formula (3.4) shows that the last two components of the Loss-Adjusted index value refer to scenarios where no names survive until maturity. We define the Loss-Adjusted index spread as the level of $K$ setting the Index value to zero only in scenarios where some names survive until maturity, leaving out scenarios where no names survive and the payoff has no dependence on $K$ whatsoever (it corresponds to setting to zero only $\hat{I}_1$, which is the price of an armageddon-knock out tradable asset). In this sense, the definition appears financially meaningful. Besides the financial interpretation, this definition is regular since the denominator is bounded away from zero, and the spread is a martingale under a natural pricing measure, as we see in the next section.

DEFINITION 3.8 (Equilibrium and Arbitrage-Free Credit Index Spread).

\[ \hat{S}_t = \frac{\hat{\Pi}(\Phi_t) + \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > T_A\}} F_t \mid \hat{\mathcal{H}}_t]}{\hat{\Pi}(\gamma_t)}. \tag{3.7} \]

4. ARBITRAGE-FREE PRICING OF INDEX OPTIONS

4.1. General Index Option Pricing Formula under a Subfiltration Structure

The use of subfiltrations and pricing formula (3.6) has induced a redefinition of the Spread and of the value of the Index at Maturity. This implies a redefinition of the Index Option payoff leading to the following general formula for the price of the Credit Index Option introduced in Definition 2.3.

PROPOSITION 4.1 (Index Option Price Decomposition under Subfiltrations).

\[ \Pi\big( D(t, T_A)\, (\Pi(\tilde{I}_{T_A}(K)))^{+} \big) = \hat{O}_1 + \hat{O}_2 + \hat{O}_3, \tag{4.1} \]

where

\[
\begin{aligned}
\hat{O}_1 &= \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}[D(t, T_A)\, \hat{\Pi}(\gamma_{T_A})\, (\hat{S}_{T_A} - K)^{+} \mid \hat{\mathcal{H}}_t], \\
\hat{O}_2 &= \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}[D(t, T_A)\, \mathbf{1}_{\{t < \hat{\tau} \le T_A\}}\, (1 - R) \mid \hat{\mathcal{H}}_t], \\
\hat{O}_3 &= \mathbf{1}_{\{\hat{\tau} \le t\}}\, (1 - R)\, P(t, T_A).
\end{aligned}
\]

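Proof. From (3.4) and (3.7),

\[ \Pi(\tilde{I}_{T_A}(K)) = \frac{\mathbf{1}_{\{\hat{\tau} > T_A\}}}{\mathbb{Q}(\hat{\tau} > T_A \mid \hat{\mathcal{H}}_{T_A})}\, \hat{\Pi}(\gamma_{T_A})\, (\hat{S}_{T_A} - K) + \mathbf{1}_{\{\hat{\tau} \le T_A\}}\, (1 - R). \]

Indicating with $Y_t(K)$ the option discounted payoff, we have

\[
\begin{aligned}
Y_t(K) &:= D(t, T_A)\, \big( \Pi(\tilde{I}_{T_A}(K)) \big)^{+} \\
&= D(t, T_A) \Big[ \frac{\mathbf{1}_{\{\hat{\tau} > T_A\}}}{\mathbb{Q}(\hat{\tau} > T_A \mid \hat{\mathcal{H}}_{T_A})}\, \hat{\Pi}(\gamma_{T_A})\, (\hat{S}_{T_A} - K) + \mathbf{1}_{\{\hat{\tau} \le T_A\}}\, (1 - R) \Big]^{+} \\
&= D(t, T_A) \Big[ \frac{\mathbf{1}_{\{\hat{\tau} > T_A\}}}{\mathbb{Q}(\hat{\tau} > T_A \mid \hat{\mathcal{H}}_{T_A})}\, \hat{\Pi}(\gamma_{T_A})\, (\hat{S}_{T_A} - K)^{+} + \mathbf{1}_{\{\hat{\tau} \le T_A\}}\, (1 - R) \Big],
\end{aligned}
\]

where the last passage follows from the properties of indicators. Using again pricing formula (3.6),

\[
\begin{aligned}
\Pi(Y_t(K)) &= \mathbf{1}_{\{\hat{\tau} \le t\}}\, \mathbb{E}[Y_t(K) \mid \sigma(\hat{\tau}) \vee \hat{\mathcal{H}}_t] + \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > t\}} Y_t(K) \mid \hat{\mathcal{H}}_t] = O_1 + O_2, \\
O_1 &= \mathbf{1}_{\{\hat{\tau} \le t\}}\, \mathbf{1}_{\{\hat{\tau} \le T_A\}}\, \mathbb{E}[D(t, T_A)\, (1 - R) \mid \sigma(\hat{\tau}) \vee \hat{\mathcal{H}}_t] = \mathbf{1}_{\{\hat{\tau} \le t\}}\, (1 - R)\, P(t, T_A), \\
O_2 &= \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}\Big[ D(t, T_A)\, \frac{\mathbf{1}_{\{\hat{\tau} > T_A\}}}{\mathbb{Q}(\hat{\tau} > T_A \mid \hat{\mathcal{H}}_{T_A})}\, \hat{\Pi}(\gamma_{T_A})\, (\hat{S}_{T_A} - K)^{+} \,\Big|\, \hat{\mathcal{H}}_t \Big] \\
&\quad + \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}[D(t, T_A)\, \mathbf{1}_{\{t < \hat{\tau} \le T_A\}}\, (1 - R) \mid \hat{\mathcal{H}}_t],
\end{aligned}
\]

where we used the assumption that $B_t$ is $\hat{\mathcal{H}}_t$-measurable, stated in Remark 3.6. We can simplify further by noticing

\[
\begin{aligned}
&\frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}\Big[ D(t, T_A)\, \frac{\mathbf{1}_{\{\hat{\tau} > T_A\}}}{\mathbb{Q}(\hat{\tau} > T_A \mid \hat{\mathcal{H}}_{T_A})}\, \hat{\Pi}(\gamma_{T_A})\, (\hat{S}_{T_A} - K)^{+} \,\Big|\, \hat{\mathcal{H}}_t \Big] \\
&\quad = \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}\Big[ D(t, T_A)\, \frac{1}{\mathbb{Q}(\hat{\tau} > T_A \mid \hat{\mathcal{H}}_{T_A})}\, \hat{\Pi}(\gamma_{T_A})\, (\hat{S}_{T_A} - K)^{+}\, \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > T_A\}} \mid \hat{\mathcal{H}}_{T_A}] \,\Big|\, \hat{\mathcal{H}}_t \Big] \\
&\quad = \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \mathbb{E}[D(t, T_A)\, \hat{\Pi}(\gamma_{T_A})\, (\hat{S}_{T_A} - K)^{+} \mid \hat{\mathcal{H}}_t].
\end{aligned}
\]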


Notice that the correct payoff $Y_t(K)$ is split in two parts and, differently from (2.7), it does not lead to Problem 3.2 of Section 3; instead it represents the actual payoff in all states of the world. The option formula of Proposition 4.1 shows the different components of the option value, and allows us to compute them in a convenient way. The third part $\hat{O}_3$ is just the present value of the option when a portfolio "armageddon event" happens before $t$. The second part $\hat{O}_2$ takes correctly into account the probability of such an event between now and the option expiry. The first part $\hat{O}_1$, which was the only one considered in the simpler formula (2.8), is the part that we will compute through a closed-form standard option formula.

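4.2. An Equivalent Measure for Credit Index Options

Expressing the first term $\hat{O}_1$ in a tractable way requires the definition of a viable change of measure, and therefore it means solving also Problem 3.3. The solution is given by the following Theorem, which, together with Corollary 4.3, is the second main result of the paper.

THEOREM 4.2 (Credit Index Option Price via the No-Armageddon Pricing Measure). Define the $T_A, T_M$-no-armageddon pricing measure $\hat{\mathbb{Q}}$ on $(\Omega, \hat{\mathcal{H}}_{T_A})$ through the Radon–Nikodym derivative

\[ Z_{T_A} = \frac{d\hat{\mathbb{Q}}}{d\mathbb{Q}}\Big|_{\hat{\mathcal{H}}_{T_A}} = \frac{B_0\, \hat{\Pi}(\gamma_{T_A})}{\hat{\Pi}(\gamma_0)\, B_{T_A}}. \tag{4.2} \]

Then $\hat{\mathbb{Q}}$ is equivalent to the risk neutral measure and the first term $\hat{O}_1$ of the Credit Index Option pricing formula given in Proposition 4.1 can be computed as

\[ \hat{O}_1 = \Pi(\gamma_t)\, \hat{\mathbb{E}}^{T_A, T_M}\big[ (\hat{S}_{T_A} - K)^{+} \mid \hat{\mathcal{H}}_t \big], \]

where $\hat{S}_t$ is an $\hat{\mathcal{H}}_t$-martingale under the equivalent pricing measure $\hat{\mathbb{Q}}$.

Proof. We take the strictly positive quantity $\hat{\Pi}(\gamma_t)$ to define the equivalent $T_A, T_M$-no-armageddon pricing measure $\hat{\mathbb{Q}}$ on $(\Omega, \hat{\mathcal{H}}_{T_A})$ through the definition of the Radon–Nikodym derivative of this measure with respect to $\mathbb{Q}$, which is $Z_{T_A}$ defined in (4.2). We define the restriction of $\hat{\mathbb{Q}}$ to $\hat{\mathcal{H}}_t$, $t \le T_A$, via the $\hat{\mathcal{H}}_t$-martingale $Z_t$, $t \le T_A$,

\[ Z_t = \mathbb{E}\Big[ \frac{d\hat{\mathbb{Q}}}{d\mathbb{Q}}\Big|_{\hat{\mathcal{H}}_{T_A}} \,\Big|\, \hat{\mathcal{H}}_t \Big], \]

where $Z_t$ can be expressed in closed form through market quantities; in fact

\[
\begin{aligned}
Z_t &= \mathbb{E}\Big[ \mathbb{E}\Big[ \frac{B_0}{\hat{\Pi}(\gamma_0)} \sum_{j=A+1}^{M} \frac{1}{B_{T_j}} \int_{T_{j-1}}^{T_j} N(t)\, dt \,\Big|\, \hat{\mathcal{H}}_{T_A} \Big] \,\Big|\, \hat{\mathcal{H}}_t \Big] \\
&= \frac{B_0}{\hat{\Pi}(\gamma_0)\, B_t}\, \mathbb{E}\Big[ \sum_{j=A+1}^{M} \frac{B_t}{B_{T_j}} \int_{T_{j-1}}^{T_j} N(t)\, dt \,\Big|\, \hat{\mathcal{H}}_t \Big] = \frac{B_0\, \hat{\Pi}(\gamma_t)}{\hat{\Pi}(\gamma_0)\, B_t}.
\end{aligned}
\]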


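In our problem we have

\[ \hat{O}_1 = \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \hat{\Pi}(\gamma_t)\, \mathbb{E}\left[ \frac{\mathbb{E}\big[ \frac{d\hat{\mathbb{Q}}}{d\mathbb{Q}} \,\big|\, \hat{\mathcal{H}}_{T_A} \big]}{\mathbb{E}\big[ \frac{d\hat{\mathbb{Q}}}{d\mathbb{Q}} \,\big|\, \hat{\mathcal{H}}_t \big]}\, (\hat{S}_{T_A} - K)^{+} \,\Big|\, \hat{\mathcal{H}}_t \right]. \]

Since $(\hat{S}_{T_A} - K)^{+}$ is $\hat{\mathcal{H}}_{T_A}$-measurable, the well-known Bayes rule for conditional change of measure yields the following result:

\[
\begin{aligned}
\hat{O}_1 &= \frac{\mathbf{1}_{\{\hat{\tau} > t\}}}{\mathbb{Q}(\hat{\tau} > t \mid \hat{\mathcal{H}}_t)}\, \hat{\Pi}(\gamma_t)\, \hat{\mathbb{E}}^{T_A, T_M}[(\hat{S}_{T_A} - K)^{+} \mid \hat{\mathcal{H}}_t] \\
&= \Pi(\gamma_t)\, \hat{\mathbb{E}}^{T_A, T_M}[(\hat{S}_{T_A} - K)^{+} \mid \hat{\mathcal{H}}_t].
\end{aligned}
\tag{4.3}
\]

It is not difficult to show that $\hat{S}_t$ is an $\hat{\mathcal{H}}_t$-martingale under $\hat{\mathbb{Q}}$:

\[
\begin{aligned}
\hat{\mathbb{E}}^{T_A, T_M}[\hat{S}_{T_A} \mid \hat{\mathcal{H}}_t] &= \mathbb{E}\Big[ \hat{S}_{T_A}\, \frac{B_t\, \hat{\Pi}(\gamma_{T_A})}{B_{T_A}\, \hat{\Pi}(\gamma_t)} \,\Big|\, \hat{\mathcal{H}}_t \Big] \\
&= \frac{\mathbb{E}\big[ \mathbb{E}[\Phi_t + \mathbf{1}_{\{\hat{\tau} > T_A\}} D(t, T_A) L(T_A) \mid \hat{\mathcal{H}}_{T_A}] \,\big|\, \hat{\mathcal{H}}_t \big]}{\hat{\Pi}(\gamma_t)} \\
&= \frac{\hat{\Pi}(\Phi_t) + \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > T_A\}} F_t \mid \hat{\mathcal{H}}_t]}{\hat{\Pi}(\gamma_t)} = \hat{S}_t.
\end{aligned}
\]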

COROLLARY 4.3 (Black-Like Credit Index Option Pricing Formula). If we assume

\[ d\hat{S}_t = \hat{\sigma}\, \hat{S}_t\, dV_t^{T_A, T_M}, \qquad t \le T_A, \]

where $V^{T_A, T_M}$ is a standard Brownian motion under $\hat{\mathbb{Q}}$ and $\hat{\sigma}$ is the instantaneous volatility, we have that the first term in the credit index option pricing formula of Proposition 4.1 becomes

\[ \hat{O}_1 = \Pi(\gamma_t)\, \mathrm{Black}\big( \hat{S}_t, K, \hat{\sigma} \sqrt{T_A - t} \big). \tag{4.4} \]
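As an illustration of (4.4), the following minimal Python sketch evaluates the Black term. It assumes that Black(F, K, v) denotes the standard undiscounted Black call formula with total volatility v, since the definition used in (2.8) is not restated in this section; the function names `black_call` and `first_term` are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def black_call(forward, strike, total_vol):
    """Undiscounted Black call on a lognormal forward (assumed convention).

    total_vol is sigma * sqrt(time to expiry); discounting is carried by the
    annuity factor that multiplies the result.
    """
    if total_vol <= 0.0:
        return max(forward - strike, 0.0)  # zero-volatility limit
    d1 = (log(forward / strike) + 0.5 * total_vol ** 2) / total_vol
    d2 = d1 - total_vol
    cdf = NormalDist().cdf
    return forward * cdf(d1) - strike * cdf(d2)


def first_term(annuity, spread, strike, sigma, time_to_expiry):
    """Annuity times Black(spread, strike, sigma * sqrt(T_A - t)), as in (4.4)."""
    return annuity * black_call(spread, strike, sigma * sqrt(time_to_expiry))
```

Here `annuity` plays the role of $\Pi(\gamma_t)$ and `spread` the role of $\hat{S}_t$; both must be supplied by the user, since the sketch does not compute them.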

REMARK 4.4. Note that we might have selected a different martingale dynamics since our results are more general. In particular a dynamics involving jumps would be more appropriate for the Loss-Adjusted spread. However the market quotation standard assumes lognormality for the Loss-Adjusted spread, as in Corollary 4.3. Since in the next section we want to show the possible mispricing in the standard market formula due only to the neglect of the portfolio armageddon scenario, we need to stick to the standard lognormal assumption and compare (4.4) with (2.8).


5. COMPARISON WITH THE MARKET FORMULA IN TWO MARKET CASES

In the previous sections we have provided a rigorous theoretical framework for the pricing of Index options, which was lacking in the market approach. This has led to replacing the market option formula (2.8) with the arbitrage-free option formula (4.1, 4.4). In this section we show that the improvement of (4.1, 4.4) over (2.8) is not limited to the theoretical justification. We compare prices computed with (4.1, 4.4) and prices computed with (2.8), first using inputs in line with market conditions at the beginning of 2007, before the so-called credit crunch, and then using more recent 2008 market data. This will show that, while with the first set of data the difference between the two formulas is negligible, with the second set of inputs the arbitrage-free option formula (4.1, 4.4) yields a result materially different from (2.8). In fact, as a consequence of the crisis that struck credit markets in summer 2007, the correct accounting of the portfolio armageddon risk allowed only by (4.1, 4.4) has become important, while in normal market conditions it is an event with negligible probability.

A feature to notice is that (4.1, 4.4) introduces an explicit dependence on the interdependence of default times, which instead does not enter directly the market option formula (2.8). In fact (4.1, 4.4) depends on the conditional $\hat{\mathcal{H}}_t$-probability of a portfolio armageddon in $(t, T_A]$. Since we want to test the formulas on inputs in line with market information, we also have to consider market information about default interdependency.

5.1. Default Correlation in the Pricing of Index Options

The most important market products depending on default interdependence are the $[a, b)$ Index Tranches, where the Protection buyer receives the changes of the tranched loss,

\[ L_{a,b}(t) := \frac{(L(t) - a)\, \mathbf{1}_{\{a \le L(t) < b\}} + (b - a)\, \mathbf{1}_{\{L(t) \ge b\}}}{b - a} = \frac{b\, L_{0,b}(t) - a\, L_{0,a}(t)}{b - a}, \qquad a, b \in [0, 1],\ b > a. \]

The last passage shows the decomposition of a tranche $[a, b)$ into two equity tranches, which are $[0, x)$ tranches, $x \in X$. These equity tranches are fundamental for the market quotation system, where the prices of tranches $[a, b)$, $a, b \in X$, are computed from prices of $[0, a)$ and $[0, b)$ tranches. In the Euro market tranches are written on the i-Traxx Europe Main index, including the most liquid 125 European investment grade entities, and we have $X_I = \{0.03, 0.06, 0.09, 0.12, 0.22\}$. In the United States, market tranches are written on the CDX Investment Grade index, including the most liquid 125 American investment grade entities, and we have $X_C = \{0.03, 0.07, 0.1, 0.15, 0.3\}$.

In market practice the tranche quotes are expressed using the Gaussian Copula Model, for which we refer to Glasserman, Kang, and Shahabuddin (2007), Yu (2007), and references therein. When used as a model for default times, the fundamental assumption is $\tau_i = \inf\{t : \int_0^t \lambda_s^i\, ds \ge \varepsilon_i\}$, as in Lando (1998), with the joint CDF of $(U_1, \ldots, U_n)$, where $U_i = \exp(-\varepsilon_i)$, given by $C_{1,\ldots,n}$, the Gaussian Copula function with correlation matrix $\Sigma$. The positive deterministic process $\lambda_s^i$, $s \ge 0$, is called the default intensity of the name $i$. In particular, it is market standard to assume that $\Sigma$ is a matrix with $\Sigma_{i,j} = \rho$, $i \ne j$. This is an extension to default interdependence of the homogeneous pool assumption mentioned in Section 3. A different correlation value $\rho_x$ is used for pricing different standardized equity tranches $[0, x)$. The map $x \mapsto \rho_x$ is the correlation skew, with $x$ called the detachment. The procedure of changing correlation for different tranches makes it possible to overcome some limitations of the Gaussian Copula framework, in particular the lack of asymptotic tail dependency when correlation is lower than one. Glasserman, Kang, and Shahabuddin (2007) explain that in market quotes tranches with higher $x$ have higher correlation, and relate it to the fact that wider tranches are sensitive to the defaults of the most highly rated underlying credits, those with lower default intensity $\lambda_s^i$. In fact higher correlation can offset the diminished interdependence of defaults when intensities are very low, a finding from Glasserman, Kang, and Shahabuddin (2007).

For applying the arbitrage-free option formula (4.1, 4.4) we need to compute the risk-neutral probability of a portfolio armageddon by expiry $T_A$,

\[ \mathbb{E}[\mathbf{1}_{\{\hat{\tau} \le T_A\}} \mid \hat{\mathcal{H}}_0 = \mathcal{F}_0]. \]

We notice that this coincides with

\[ \mathbb{E}[L_{a,b}(T_A) \mid \hat{\mathcal{H}}_0 = \mathcal{F}_0], \qquad a = \frac{(n - 1)(1 - R)}{n}, \quad b = (1 - R). \]

This indicates that knowing $\rho_a$ and $\rho_b$ we can compute this probability as implied by market quotations. Unfortunately, quotes are not provided for tranches so wide. If we consider an index with $n = 125$ and $R = 0.4$, typical features of i-Traxx Main and CDX market standards, we have $a = 0.5952$, $b = 0.6$, while market quotations are provided on i-Traxx only up to $x = 0.22$ and on CDX up to $x = 0.3$. As explained above, correlation increases with the size of the tranche, and in fact in market practice correlations for higher tranches are obtained by extrapolating higher correlations from the increasing $x \mapsto \rho_x$ map. Thus, besides expecting $\rho_a \approx \rho_b$ because $b - a$ is extremely small, we also expect this correlation level to be even higher than the level $\rho_{0.3}$ observable at $x = 0.3$ on the CDX market. However, in the following tests we do not increase correlation further, but we consider for $\rho_a = \rho_b$ a range of equally spaced correlations in-between the highest correlations of i-Traxx ($\rho_{0.22}$) and CDX ($\rho_{0.3}$). Thus, while extrapolation would suggest increasing correlation beyond any quoted $\rho_x$, we limit this correlation to the market quoted CDX $\rho_{0.3}$. This has two advantages. First, we limit our tests to values which are observable in the market. Second, this choice tends to underestimate the probability of the armageddon event $\{\hat{\tau} \le T_A\}$. This implies that, in the following tests, the difference between our arbitrage-free pricing of Index Options and the standard Market pricing is more likely to be underestimated, rather than overestimated.
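The link between the armageddon event and a very senior, very thin tranche can be checked numerically. The sketch below is illustrative only: it assumes a homogeneous pool in which each default adds $(1 - R)/n$ to the portfolio loss, and the function names are hypothetical. It shows the tranche decomposition into two equity tranches and that, with $a = (n-1)(1-R)/n$ and $b = 1 - R$, the tranched loss equals 1 only after the $n$th default.

```python
def equity_loss(loss, x):
    """Loss of the equity tranche [0, x), as a fraction of its width x > 0."""
    return min(loss, x) / x


def tranched_loss(loss, a, b):
    """L_{a,b}: portfolio loss capped to [a, b) and rescaled to [0, 1]."""
    return min(max(loss - a, 0.0), b - a) / (b - a)


def tranched_loss_from_equity(loss, a, b):
    """Same quantity via (b L_{0,b} - a L_{0,a}) / (b - a), for a > 0."""
    return (b * equity_loss(loss, b) - a * equity_loss(loss, a)) / (b - a)


def armageddon_tranche(n, recovery):
    """Attachment and detachment of the tranche hit only by the n-th default."""
    b = 1.0 - recovery
    a = (n - 1) * b / n
    return a, b
```

With `n = 125` and `recovery = 0.4`, `armageddon_tranche` gives the attachment and detachment points quoted in the text, and `tranched_loss(k * 0.6 / 125, a, b)` acts as the indicator of `k == 125`.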

5.2. Market Tests

The arbitrage-free option formula (4.1, 4.4) is implemented in practice as follows. At time 0, under independence of default risk and interest rates,

\[ \Pi(Y_0(K)) = \Pi(\gamma_0)\, \mathrm{Black}\Big( \tilde{S}_0 - \frac{(1 - R)\, P(0, T_A)\, \mathbb{Q}(\hat{\tau} \le T_A)}{\Pi(\gamma_0)},\ K,\ \hat{\sigma} \sqrt{T_A} \Big) + (1 - R)\, P(0, T_A)\, \mathbb{Q}(\hat{\tau} \le T_A), \]

based on

\[
\begin{aligned}
\hat{S}_0 &= \frac{\hat{\Pi}(\Phi_0) + \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > T_A\}} F_0 \mid \hat{\mathcal{H}}_0 = \mathcal{F}_0]}{\hat{\Pi}(\gamma_0)} = \frac{\Pi(\Phi_0) + \mathbb{E}[\mathbf{1}_{\{\hat{\tau} > T_A\}} F_0]}{\Pi(\gamma_0)} \\
&= \frac{\Pi(\Phi_0) + \mathbb{E}[F_0] - \mathbb{E}[\mathbf{1}_{\{\hat{\tau} \le T_A\}} F_0]}{\Pi(\gamma_0)} = \tilde{S}_0 - \frac{(1 - R)\, P(0, T_A)\, \mathbb{Q}(\hat{\tau} \le T_A)}{\Pi(\gamma_0)}.
\end{aligned}
\]

TABLE 5.1
Market Inputs: March 9, 2007 and March 11, 2008

Input                                                         March 9, 2007    March 11, 2008
Spot spread 5y: $S_0^{0,5y}$                                  22.50 bp         154.50 bp
Forward Spread Adjusted 9m–5y: $\tilde{S}_0^{9m,5y}$          23.67 bp         163.60 bp
Implied Volatility, $K = \tilde{S}_0^{9m,5y} \times 0.9$      52%              108%
Implied Volatility, $K = \tilde{S}_0^{9m,5y} \times 1.1$      54%              113%
Correlation 22% i-Traxx Main: $\rho^{I}_{0.22}$               0.545            0.912
Correlation 30% CDX IG: $\rho^{C}_{0.3}$                      0.701            0.999
Annuity 9m–5y: $\Pi(\gamma_0^{9m,5y})$                        3.993            3.912

The main inputs we use are given in Table 5.1. They are provided by JPM and based on the homogeneous pool assumption, which we also follow for the price computations. Inputs include Index spreads for the i-Traxx Main and implied volatilities for two Credit Index Options on that Index. The options considered have expiry in 9 months and allow one to enter the index with maturity in 5 years. We consider one in-the-money option and one out-of-the-money option. Inputs include the annuity and the default correlations which, together with default probabilities, allow one to compute $\mathbb{Q}(\hat{\tau} \le T_A)$. Notice that $S_t^{T_A, T_M}$ indicates the Index spread $S_t$ for an index starting at $T_A$ and lasting until $T_M$.

We price options using first market inputs as of March 9, 2007, and then using market inputs as of March 11, 2008, about 1 year later and after the subprime crisis initiated in July 2007. Looking at Table 5.1 one sees that the situation is very different on the two dates. The reference market spreads have increased by a factor of more than six. Implied volatility has more than doubled. Correlations have also increased dramatically with the increased perceived systemic risk.

The results for March 9, 2007 inputs are shown in Table 5.2. In the first row is the price of a standard Index Option using the market option formula (2.8). Below is the price computed with the arbitrage-free option formula (4.1, 4.4), using the same implied volatility provided by market quotations, and with the different correlation values considered. The differences are negligible, considering also that on March 9, 2007 the bid-offer spread, the threshold over which a pricing difference starts to be financially relevant, was higher than 1 bp.

The results for March 11, 2008 inputs are shown in Table 5.3. Here the difference between the price computed with the market option formula (2.8) and the price with the arbitrage-free option formula (4.1, 4.4) is much larger.
For out-of-the-money Call options differences range from 9.45 bp to 81.45 bp. The differences appear particularly relevant from a financial point of view since the bid-offer spread on March 11, 2008 was in the range 5–8 bp. Recalling that the main difference between (4.1, 4.4) and (2.8) is the relevance of $\mathbb{Q}(\hat{\tau} \le T_A)$ in the arbitrage-free option formula (4.1, 4.4), it appears that perceived higher systemic risk, triggered by the subprime crisis, has made the risk-neutral probability of an armageddon event no longer negligible. While we understand that market agents tend not to look at the correlation market when pricing Credit Options, the above results show that taking correctly into account the possibility of portfolio armageddon is not only an issue that allows the definition of the spread and of the pricing measure to be regular from a mathematical point of view, but can also be of financial relevance in peculiar market situations, such as the credit crisis initiated in July 2007.

TABLE 5.2
March 9, 2007. Options on i-Traxx 5y, Maturity 9m

                             Call                  Put
Strike                       21        26          21        26
Market Formula               23.289    11.619      13.840    21.069
No-Arb. Form. ρ = 0.545      23.289    11.619      13.840    21.069
No-Arb. Form. ρ = 0.597      23.289    11.619      13.840    21.069
No-Arb. Form. ρ = 0.649      23.289    11.618      13.840    21.069
No-Arb. Form. ρ = 0.701      23.286    11.614      13.843    21.071

TABLE 5.3
March 11, 2008. Options on i-Traxx 5y, Maturity 9m

                             Call                  Put
Strike                       147       180         147       180
Market Formula               286.241   189.076     222.242   253.076
No-Arb. Form. ρ = 0.912      277.668   179.624     226.871   256.826
No-Arb. Form. ρ = 0.941      271.460   172.769     230.326   259.634
No-Arb. Form. ρ = 0.970      258.887   158.862     237.606   265.580
No-Arb. Form. ρ = 0.999      212.867   107.630     268.186   290.948
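To get a feel for how the armageddon probability enters the time-0 formula of this section, the following sketch computes the spread adjustment $(1 - R) P(0, T_A) \mathbb{Q}(\hat{\tau} \le T_A) / \Pi(\gamma_0)$ in basis points. It is an illustration under stated assumptions: the annuity is taken per unit notional and per unit spread, and the discount factor and armageddon probability passed in are hypothetical inputs, not values from the tables. The function names are also hypothetical.

```python
def armageddon_spread_shift_bp(annuity, recovery, discount, q_armageddon):
    """(1 - R) P(0, T_A) Q(tau_hat <= T_A) / Pi(gamma_0), in basis points."""
    return 1e4 * (1.0 - recovery) * discount * q_armageddon / annuity


def no_armageddon_spread_bp(adj_spread_bp, annuity, recovery, discount, q_armageddon):
    """S_hat_0 = S_tilde_0 minus the armageddon shift, both in basis points."""
    shift = armageddon_spread_shift_bp(annuity, recovery, discount, q_armageddon)
    return adj_spread_bp - shift
```

For instance, with the March 11, 2008 annuity of 3.912, $R = 0.4$, and purely hypothetical values `discount = 1.0` and `q_armageddon = 0.001`, the shift is about 1.53 bp, so an armageddon probability of a few per mille is already comparable with a bid-offer spread of a few basis points.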

5.3. Portfolio Armageddon Risk-Neutral Probability

In order to give the reader a more precise indication about the effect of the different inputs on the above results, we show in Figure 5.1 the relation between the portfolio armageddon risk-neutral probability and the level of default correlation. We do it both for a level of the homogeneous index spread consistent with March 9, 2007 and for a level consistent with March 11, 2008.

[Figure 5.1: armageddon probability over a 9-month horizon, $\Pr(\hat{\tau} < T)$, plotted against the default correlation $\rho \in [0.5, 1]$, for an index spread of 154.5 bp (March 11, 2008) and of 22.5 bp (March 9, 2007).]

FIGURE 5.1. Probability of armageddon in 9 months.

We see that in both cases we have an exponentially increasing relationship, indicating that it is mainly the increase of correlation which can make the probability of a portfolio armageddon become nonnegligible. However we clearly see that the correlation level at which this happens depends on the level of the default probabilities, expressed by the index spread. This can be related to the results of Glasserman, Kang, and Shahabuddin (2007), showing that defaults become nearly independent when probabilities decrease. This also confirms that the relevant difference between the price with the market option formula (2.8) and the price with the arbitrage-free option formula (4.1, 4.4), depending on the size of the risk-neutral armageddon probability, is associated with both features typical of a systemic credit crisis: higher correlations and higher default probabilities.

We also mention that, although the historical probability of total portfolio default appears clearly negligible when we are considering large portfolios of investment grade issuers (such as the i-Traxx 125-name Main Index), here we are looking at risk-neutral, not historical, probabilities. We also add that there is evidence in the literature that the armageddon risk is priced also in normal market conditions, so that its risk-neutral probability is lower than in the above March 11, 2008 results, but can still be detected. Brigo, Pallavicini, and Torresetti (2007a,b) and Longstaff and Rajan (2008) show that dynamic Loss models need to include a jump process associated with an armageddon event so as to price correctly the tranches written on the i-Traxx Main and CDX IG indices.
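The qualitative dependence of the armageddon probability on correlation and on default probabilities can be explored with a small numerical sketch. It rests on assumptions that go beyond the text: a homogeneous pool, the one-factor representation of the flat-correlation Gaussian copula, a flat intensity approximated by spread divided by $(1 - R)$, and a plain trapezoidal rule for the integral over the common factor. Function names and grid parameters are hypothetical, and the output is not claimed to reproduce Figure 5.1.

```python
from math import exp, sqrt
from statistics import NormalDist

_STD = NormalDist()


def default_prob(spread, recovery, horizon):
    """Single-name default probability, assuming flat intensity spread / (1 - R)."""
    return 1.0 - exp(-spread / (1.0 - recovery) * horizon)


def armageddon_prob(p, rho, n, grid=4001, width=8.0):
    """P(all n names default) in a one-factor Gaussian copula, 0 < rho < 1.

    Conditional on the common factor m, defaults are independent with
    probability Phi((Phi^{-1}(p) - sqrt(rho) m) / sqrt(1 - rho)); the n-th
    power is integrated against the standard normal density of m.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    c = _STD.inv_cdf(p)
    step = 2.0 * width / (grid - 1)
    total = 0.0
    for k in range(grid):
        m = -width + k * step
        cond = _STD.cdf((c - sqrt(rho) * m) / sqrt(1.0 - rho))
        weight = 0.5 if k in (0, grid - 1) else 1.0  # trapezoid end points
        total += weight * cond ** n * _STD.pdf(m)
    return total * step
```

Evaluating `armageddon_prob(default_prob(0.01545, 0.4, 0.75), rho, 125)` for increasing `rho`, and repeating with the spread 0.00225, gives a rough way to see how both higher correlation and higher default probabilities raise the armageddon probability.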
Finally, there is a category of credit portfolio options for which the probability of the whole portfolio being wiped out by defaults is never negligible, even in normal market conditions: it is the case of Tranche Options. For such options, which are often embedded in cancelable tranches, the Tranched Loss $L_{x,y}(t)$ replaces the Loss $L(t)$ and $\hat{\tau}$ is the first time when $L_{x,y}(t) = 1$. For tranches where $x$ is low and $y - x$ is narrow, $\hat{\tau}$ can be lower than the option maturity with a probability which is never negligible. Using a formula analogous to the arbitrage-free option formula (4.1, 4.4) is fundamental for the correct pricing of such products.

6. CONCLUSION

In this work, we have considered three problems of the standard market approach to the pricing of portfolio credit options: the definition of the index spread is not valid in general, the considered payoff leads to a pricing which is not defined in all states of the world, and the candidate numeraire to define a pricing measure is not strictly positive, which would lead to a nonequivalent pricing measure. We have given a solution to the three problems based on modeling the flow of information through a suitable subfiltration. Using this subfiltration, we take into account consistently the possibility of default of all names in the portfolio, which is neglected in the standard market approach. We have shown that, while this mispricing can be negligible for standard options in normal market conditions, it can become highly relevant in stressed market conditions. In particular, we show with inputs from 2008 market data that in the credit crunch crisis the difference between the market formula price and the price from the no-arbitrage formula we propose has become financially relevant, while it is negligible if computed with inputs from 1 year earlier.

The results apply not only to Credit Index options, but to all market cases in which a spread defined with reference to a portfolio of defaultable entities must be modeled in an arbitrage-free way. The definition of an equivalent pricing measure through subfiltrations, and the related results on the dynamics of a well-defined portfolio credit spread, lay the foundations of an extension to a multiname credit setting of the no-arbitrage approach known as Market Models, dating back to Jamshidian (1997) and Brace, Gatarek, and Musiela (1997). Such an extension is considered by Jamshidian (2004), Brigo (2005), and Brigo and Morini (2005) for the case of single name credit products. The extension to a multiname setting is technically different and also shows a wider market applicability.

In fact the Market Models approach allows precise arbitrage-free consistency in the modeling of market rates and spreads, and requires information on volatilities and correlations of such rates and spreads. These features have made the approach very successful in the interest rate derivatives market, but they hardly hold for single name credit derivatives, which are split among different reference entities with rare options. Instead the Credit Indices, even during crises like the credit crunch, absorb most of the liquidity of the credit derivatives market, providing a reference market from which information for modeling market quantities can be extracted.

REFERENCES

ARMSTRONG, A., and M. RUTKOWSKI (2007): Valuation of Credit Default Index Swaps and Swaptions. Working Paper, UNSW.
BIELECKI, T., and M. RUTKOWSKI (2001): Credit Risk: Modeling, Valuation and Hedging, New York: Springer Verlag.
BRACE, A., D. GATAREK, and M. MUSIELA (1997): The Market Model of Interest Rate Dynamics, Math. Finance 7, 127–154.
BRIGO, D. (2005): Market Models for CDS Options and Callable Floaters, Risk Mag., January issue, 89–94.
BRIGO, D. (2008): CDS Options through Candidate Market Models and the CDS-Calibrated CIR++ Stochastic Intensity Model, in Credit Risk: Models, Derivatives and Management, N. Wagner, ed. Chapman & Hall/CRC Financial Mathematics Series, pp. 393–426.
BRIGO, D., and M. MORINI (2005): CDS Market Models and Formulas, in Proceedings of the 18th Annual Warwick Option Conference, September 30, 2005, Warwick University, UK.
BRIGO, D., A. PALLAVICINI, and R. TORRESETTI (2007a): Calibration of CDO Tranches with the Dynamical Generalized-Poisson Loss Model, Risk Mag., May issue.
BRIGO, D., A. PALLAVICINI, and R. TORRESETTI (2007b): Cluster-Based Extension of the Generalized Poisson Loss Dynamics and Consistency with Single Names, Int. J. Theor. Appl. Finance, 10(4). (Also in A. LIPTON and A. RENNIE, eds. (2007): Credit Correlation: Life After Copulas, World Scientific.)
DELBAEN, F., and W. SCHACHERMAYER (1994): A General Version of the Fundamental Theorem of Asset Pricing, Mathematische Annalen 300, 463–520.
DELLACHERIE, C. (1972): Capacités et processus stochastiques, Springer-Verlag, Berlin.
DOCTOR, S., and J. GOULDEN (2007): An Introduction to Credit Index Options and Credit Volatility, JP Morgan, Credit Derivat. Res., Vol. 2007.
GEMAN, H., N. EL KAROUI, and J. C. ROCHET (1995): Changes of Numeraire, Changes of Probability Measure and Pricing of Options, J. Appl. Probab. 32, 443–458.
GLASSERMAN, P., W. KANG, and P. SHAHABUDDIN (2007): Large Deviations in Multifactor Portfolio Credit Risk, Math. Finance 17(3), 345–379.
JACKSON, A. (2005): A New Method for Pricing Index Default Swaptions, Proceedings of the 18th Annual Warwick Option Conference, September 30, 2005, Warwick University, UK.
JAMSHIDIAN, F. (1997): LIBOR and Swap Market Models and Measures, Finance Stoch. 1(4), 293–330.
JAMSHIDIAN, F. (2004): Valuation of Credit Default Swaps and Swaptions, Finance Stoch. 8, 343–371.
JEANBLANC, M., and M. RUTKOWSKI (2000): Default Risk and Hazard Process, in Mathematical Finance Bachelier Congress 2000, H. Geman, D. Madan, S. R. Pliska, and T. Vorst, eds. Heidelberg: Springer Verlag.
LANDO, D. (1998): On Cox Processes and Credit Risky Securities, Derivat. Res. 2, 99–120.
LONGSTAFF, F. L., and A. RAJAN (2008): An Empirical Analysis of the Pricing of Collateralized Debt Obligations, J. Finance 63(2), 529–563.
PEDERSEN, C. M. (2003): Valuation of Portfolio Credit Default Swaptions, Lehman Brothers Quantit. Credit Res., Vol. 2003.
SCHÖNBUCHER, P. (2004): A Measure of Survival, Risk Mag., January issue.
YU, F. (2007): Correlated Defaults in Intensity-Based Models, Math. Finance 17(2), 155–173.

Mathematical Finance, Vol. 21, No. 4 (October 2011), 681–701

OPTIMAL TRADE EXECUTION IN ILLIQUID MARKETS

ERHAN BAYRAKTAR, University of Michigan
MICHAEL LUDKOVSKI, University of California

We study optimal trade execution strategies in financial markets with discrete order flow. The agent has a finite liquidation horizon and must minimize price impact given a random number of incoming trade counterparties. Assuming that the order flow N is given by a Poisson process, we give a full analysis of the properties and computation of the optimal dynamic execution strategy. Extensions, whereby N is a Markov-modulated compound Poisson process, are also considered. We derive and compare the properties of the various cases and illustrate our results with computational examples.

KEY WORDS: optimal order execution, liquidity modeling, dark pools, Markov-modulated Poisson process.

1. INTRODUCTION

One of the most important problems faced by a stock trader is how to unwind large block orders of security shares. Liquidation of a large position in a security is a challenge due to two factors: (a) possible lack of a counterparty; and (b) price impact that depresses prices by increasing supply. This occurs because the immediate market resiliency is limited and a single large order may exhaust all current buyers, bringing about dramatic price declines. Price impact implies that it is generally beneficial to split the order into several smaller blocks and sell each subblock separately. Presence of counterparties is less of a concern in traditional limit order book markets where a market maker is always quoting a price. However, trading in such markets may be disadvantageous due to information leak/privacy concerns. Indeed, by examining the order book, other participants may recognize the large trader and move against her, even if she attempts to split her trades. Thus, a recent trend involves trading in dark pool markets where there is no order book and buyers/sellers are matched up electronically without revealing any information. Such dark trades minimize information leakage and dramatically reduce risk of adverse price movement compared to conventional limit book trading. However, liquidity becomes a major concern as there is no market maker and no counterparty may be forthcoming.

This work was initiated at the NSF-CBMS Regional Conference on Convex Duality Method in Mathematical Finance at UC Santa Barbara. We thank the organizers for their hospitality and Alexander Schied for his stimulating talk that provided the original impetus for our analysis. We are also grateful to the two referees whose comments helped us improve our paper. E. Bayraktar is supported in part by the National Science Foundation, under Grant NSF-DMS-09-06257. Manuscript received February 2009; final revision received October 2009. Address correspondence to Michael Ludkovski, Department of Statistics and Applied Probability, University of California, Santa Barbara, CA 93106-3110, USA; e-mail: ludkovski@pstat.ucsb.edu. DOI: 10.1111/j.1467-9965.2010.00446.x © 2010 Wiley Periodicals, Inc.


We refer to trade publications such as QPL Newsletter (2008) for more information on the evolving marketplace of dark pools and their numerous specification variations. In this paper, we propose a new framework that explicitly takes into account such liquidity features of large order trades. Thus, we replace the classical continuous trading environment with a discrete order book. In our model, incoming buy orders are represented by a Poisson process which encodes the order arrival times. To capture the empirical feature of splitting large orders into smaller pieces, we will focus on price impact and eschew consideration of actual prices. Larger trades involve a volume discount and therefore tend to carry higher spread versus the current quoted market order price. Also, smaller trades are desirable in order to maintain anonymity and mitigate information leaks. Subject to the constraint that trades are only possible at order times, the objective of the agent is to execute her large order trade within a specified time window while minimizing execution costs.

Most of the existing analysis of optimal execution has focused on limit order book markets (see for example Almgren 2003; Almgren and Lorenz 2006; Obizhaeva and Wang 2006; Alfonsi, Fruth, and Schied 2010; Schied and Schöneborn 2007, 2009). Since a market maker is always present, all cited models assume a continuous-time trading environment, with the asset price usually represented by a diffusion price process. The price impact is decomposed into temporary and permanent effects and execution strategies are specified in terms of liquidation rates per unit time. The overall problem is then translated into a continuous or singular stochastic control formulation. Conversely, in our approach all trades are discrete and therefore an execution strategy corresponds to an impulse control setting.
Also, in the above literature the optimal liquidation strategies turn out to be deterministic and can be sometimes explicitly determined. In contrast, our optimal strategies are intrinsically path-dependent and will be affected by the stochastic order flow. Finally, while the above papers typically consider an infinite horizon, we assume that the agent has a hard deadline to liquidate her large block. Thus, time-to-maturity is a crucial variable in our setup; it can also be used to express time-dependencies, where, for example, the opening and closing hours have more active trading than midday. To sum up, our contribution is a new approach to modeling dark pool liquidity in terms of point processes. As we show below, our models are flexible, allow for a quick implementation, and admit fruitful probabilistic analysis.

Let us now outline the basic ingredients of our model. We assume that the order book is a Poisson process N with arrival times $\sigma_i$ which denote the time-stamp of the ith order. In our base model we postulate that N is a simple Poisson process with constant intensity λ on a stochastic basis $(\Omega, \mathcal{F}, P)$. Suppose the agent has k shares (or units) to sell and an execution horizon of T time epochs. We postulate that at terminal date T all unsold units are immediately disposed of as one large trade, for example, through the traditional limit order book. Thus, effectively there is always one more matching order arriving at T. The price impact is represented in terms of a strictly increasing and strictly convex market depth function F, where F(a) represents the cost of placing a trade of size a (F(a) could also represent the average cost of a random price impact, assuming this randomness is independent of everything else in the model). Let $\mathbb{F} = (\mathcal{F}_t)$, $\mathcal{F}_t = \sigma(N_s : 0 \le s \le t)$, be the filtration generated by the observation process. Then the optimization problem of the agent can be written as

$$v(k, T) = \inf_{\xi \in \mathcal{A}_k} E\Big[ \sum_{i : \sigma_i \le T} F(\xi_{\sigma_i-} - \xi_{\sigma_i}) + F(\xi_T) \Big] =: \inf_{\xi \in \mathcal{A}_k} v^{\xi}(k, T), \qquad k \in \mathbb{N}_+,\ T \in \mathbb{R}_+, \tag{1.1}$$


where $\mathcal{A}_k$ is the set of all $\mathbb{F}$-adapted, integer-valued, positive, and nonincreasing processes whose values change only at the time of jumps of the Poisson process N, with $\xi_0 = k$. The convexity of F is interpreted as the limited market resiliency and encourages the agent to split the large k-order into smaller pieces. However, placing a smaller trade now is risky as no more orders might come in and the trader will be left with a large leftover at T (which will carry a large associated penalty). Thus, the convexity of F also represents the impatience of the agent in terms of current versus future trading and is formally similar to the risk-aversion level in Schied and Schöneborn (2007, 2009). In terms of the stochastic control formulation, (1.1) is related to best choice problems with Poisson processes (see for example Cowan and Zabczyk 1978; Bruss 1987). In particular, Stadje (1987, 1990) studied a similar problem for a Poisson process in the context of multi-item dynamic pricing.

The mathematical problem in (1.1) is a compromise between a tractable analytical model and real markets. In general, the execution problem with illiquid trading is not so well studied and a big challenge is to develop parsimonious models that will prescribe reasonable optimal liquidation policies. The use of a Poisson process for N allows for a comprehensive analysis of (1.1) in Section 2; however, it is clearly not rich enough to capture all the intricacies of real order books. Accordingly, we consider in Section 3 several extensions to address such issues. Our base model allowed arbitrary trade sizes; in practice the agent is only able to trade up to the order size, which is the second dimension of the order flow. Accordingly, in Section 3.1 we take N to be a compound Poisson process, consisting of pairs $(\sigma_i, Y_i)$ of (order times, order sizes). Correspondingly, the original problem (1.1) is modified to constrain $\xi_{\sigma_i-} - \xi_{\sigma_i} \le Y_i$.

In Section 3.2 we proceed to relax the assumption that the intensity of N is constant throughout the problem horizon. Given widespread evidence that real markets experience different liquidity regimes, we extend our model to the case where N is a Markov-modulated Poisson process. Finally, in Section 3.3 we consider the limiting case where trade amounts can be continuous, which allows a dimension reduction for a certain class of depth functions F. To illustrate the different models above, Section 4 presents several computational examples; finally Section 5 concludes and points to possible future extensions.

2. ANALYSIS OF THE OPTIMAL LIQUIDATION PROBLEM

In this section we analyze the properties of v(k, T) as defined in (1.1). The treatment below gives clear insight into the structure of v(k, T) and leads to a particularly simple algorithm to compute v and the associated optimal strategy, see Remark 2.7. Our first observation is that v(k, T) satisfies the following dynamic programing equation:

$$v(k, T) = E\Big[ \min_{a \in \{1,\dots,k\}} \{ v(k - a, T - \sigma_1) + F(a) \} \cdot 1_{\{\sigma_1 < T\}} + F(k) \cdot 1_{\{\sigma_1 \ge T\}} \Big]. \tag{2.1}$$

A more general version of this dynamic programing principle is proved in Proposition 3.2. Let a(k, T) be the optimal order size to place when an order arrives given that one has k units remaining and T epochs until the terminal date. It follows from (2.1) that

$$a(k, T) = \operatorname*{arg\,min}_{a \in \{1,\dots,k\}} \{ v(k - a, T) + F(a) \}. \tag{2.2}$$


The above equation is simply the dynamic programing principle that says that the best immediate action is to sell a units, such that the sum of the current cost F(a) and expected future costs as represented by the value function v(k − a, T) is minimized. To avoid ambiguity, we will assume that if the minimizer in (2.2) is not unique, then a(k, T) is the smallest minimizer.

Before proceeding, let us make further note of the Markovian properties of our system. In parallel with the original formulation in (1.1) in terms of dynamic controls ξ, one may also describe Markov control strategies as $\{b(k, T) : k \in \mathbb{N},\ T \in \mathbb{R}_+\}$, specifying the trading amount conditional on still having k units left with time horizon of T periods. Given such {b(k, T)}, the corresponding dynamic unit inventory process is denoted by $\xi_t^{(b,k,T)}$ and satisfies

$$d\xi_t^{(b,k,T)} = -b\big(\xi_{t-}^{(b,k,T)},\, T - t\big)\, dN_t, \qquad \xi_0^{(b,k,T)} = k. \tag{2.3}$$

Economically, $\xi_t^{(b,k,T)}$ represents the remaining number of units at date t when employing the execution strategy b. Using a to denote the strategy characterized by (2.2), it follows that an optimal inventory process for (1.1) is of the Markovian feedback type and is given by $\xi^* \equiv \xi^{(a,k,T)}$. This result is similar to the derivation of the optimal impulse control for piecewise deterministic Markov processes (see, for example, Davis 1993).
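As an illustration of the feedback dynamics (2.3), the following Python sketch simulates the inventory process under a given Markov strategy b and estimates its expected cost by Monte Carlo. It is only a sketch under stated assumptions, not the authors' implementation: the function names `simulate_cost` and `estimate_value`, the sample size, and the seed are hypothetical choices made for illustration, and the order flow is assumed to be a Poisson process with constant intensity `lam`.

```python
import random


def simulate_cost(b, F, k, T, lam, rng):
    """Pathwise cost of the Markov strategy b along one simulated order flow.

    b(inventory, time_left) returns the size to trade at an arrival;
    F is the depth function. Names and signature are illustrative assumptions.
    """
    t, inventory, cost = 0.0, k, 0.0
    while inventory > 0:
        t += rng.expovariate(lam)  # waiting time to the next Poisson arrival
        if t >= T:
            break
        size = min(b(inventory, T - t), inventory)  # jump of the inventory, cf. (2.3)
        cost += F(size)
        inventory -= size
    if inventory > 0:
        cost += F(inventory)  # unsold units are disposed of in one trade at T
    return cost


def estimate_value(b, F, k, T, lam, n_paths=20000, seed=0):
    """Monte Carlo estimate of the expected cost of strategy b."""
    rng = random.Random(seed)
    total = sum(simulate_cost(b, F, k, T, lam, rng) for _ in range(n_paths))
    return total / n_paths
```

For instance, with the hypothetical inputs F(a) = a²/2, λ = 1, T = 1 and the strategy that always trades one unit, `estimate_value(lambda k, T: 1, lambda a: a * a / 2, 2, 1.0, 1.0)` should be close to the closed-form value of v(2, T) derived in Section 2.1, up to Monte Carlo error.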

2.1. Computing v(k, T)

To illustrate the problem, let us compute v(k, T) for k ≤ 4. Clearly v(1, T) = F(1). For k = 2 and k = 3, due to the convexity of F, it is always optimal to place orders of size one. For k = 2, this will be possible as long as there is an arrival before date T, that is, N(T) ≥ 1 (recall that the remainder, if any, is disposed of at T). Applying (2.1) and recalling the properties of the Poisson process yields

$$v(2, T) = 2F(1) \cdot (1 - e^{-\lambda T}) + F(2) \cdot e^{-\lambda T}.$$

Similar logic implies

$$\begin{aligned} v(3, T) &= F(3) P(\sigma_1 > T) + E\big[ (F(1) + v(2, T - \sigma_1)) 1_{\{\sigma_1 < T\}} \big] \\ &= F(3) e^{-\lambda T} + \int_0^T (F(1) + v(2, T - s)) \lambda e^{-\lambda s}\, ds \\ &= e^{-\lambda T} F(3) + \lambda T e^{-\lambda T} F(2) + (3 - 3e^{-\lambda T} - 2\lambda T e^{-\lambda T}) F(1). \end{aligned}$$

The above analysis shows that a(1, T) = a(2, T) = a(3, T) = 1 for all T > 0. When k = 4, the order size is based on whether $v(3, T - \sigma_1) + F(1) \ge v(2, T - \sigma_1) + F(2)$ at the first arrival time $\sigma_1$. If the latter inequality is true, then one is better off selling two units, otherwise a single unit is optimal to trade. Using the above formulas for v(2, ·) and v(3, ·), it can be seen that there exists a critical threshold $t^{(4,2)}$, such that $a(4, T) = 1 + 1_{\{T < t^{(4,2)}\}}$. Hence, if little time until expiration is left, the agent becomes impatient and trades more.

We conclude this section with upper and lower bounds for v(k, T). The next lemma gives an easy way to compute a lower bound for the value function. Below we extend the domain of F to the whole positive real line such that $F : \mathbb{R}_+ \to \mathbb{R}_+$ is still strictly convex and increasing.


LEMMA 2.1. We have

$$v(k, T) \ge \sum_{n < k-1} (n + 1) F\Big( \frac{k}{n+1} \Big) \cdot P(N(T) = n) + k F(1) \cdot P(N(T) \ge k - 1) =: \underline{v}(k, T). \tag{2.4}$$

Proof. Consider a genie who is affected by the randomness but for each state of the world can tell how many arrivals there will be and is further allowed to divide up her orders into nonintegral bits of size ≥ 1. The right-hand side of (2.4) is the genie's solution to (1.1).

As counterpart to Lemma 2.1, we have the following tight upper bound to the value function.

LEMMA 2.2. We have

$$v(k, T) \le \min_{c \in \mathbb{N}_+} \bar{v}^{c}(k, T) \triangleq \min_{c \in \mathbb{N}_+} \Bigg\{ \Big( \Big\lfloor \frac{k}{c} \Big\rfloor \cdot F(c) + F\Big( k - c \cdot \Big\lfloor \frac{k}{c} \Big\rfloor \Big) \Big) \cdot P\Big( N(T) \ge \Big\lfloor \frac{k}{c} \Big\rfloor \Big) + \sum_{n=0}^{\lfloor k/c \rfloor - 1} \big( n \cdot F(c) + F(k - nc) \big) \cdot P(N(T) = n) \Bigg\}, \tag{2.5}$$

in which $\lfloor x/c \rfloor$ is the largest integer not exceeding x/c.

Proof. The right-hand side of (2.5) is the cost of a constant c-strategy. This is the strategy where the agent insists on trading c units at each arrival time until terminal date T, whence the remainder is liquidated. Although she originally optimizes over c, clearly this is a suboptimal strategy.

The bound in (2.5) becomes tight as T → ∞: the liquidity risk vanishes, and the optimal strategy is to always trade a single unit, $c^* = 1$.
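The bounds (2.4) and (2.5) are straightforward to evaluate numerically. The Python sketch below computes both and places them next to the closed-form expressions for v(2, T) and v(3, T) obtained earlier in this section. It is a hedged illustration rather than the authors' code: all function names are hypothetical, the intensity λ is assumed constant so that N(T) is Poisson with mean λT, and F(0) = 0 is assumed so that a fully liquidated position carries no terminal cost.

```python
import math


def poisson_pmf(n, mean):
    """P(N = n) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def lower_bound(F, k, T, lam):
    """Right-hand side of (2.4)."""
    mean = lam * T
    head = sum((n + 1) * F(k / (n + 1)) * poisson_pmf(n, mean) for n in range(k - 1))
    tail = 1.0 - sum(poisson_pmf(n, mean) for n in range(k - 1))  # P(N(T) >= k - 1)
    return head + k * F(1) * tail


def constant_strategy_cost(F, k, T, lam, c):
    """Expected cost of always trading c units: the expression inside the min in (2.5)."""
    mean = lam * T
    m = k // c  # floor(k / c), the number of full trades of size c
    partial = sum((n * F(c) + F(k - n * c)) * poisson_pmf(n, mean) for n in range(m))
    tail = 1.0 - sum(poisson_pmf(n, mean) for n in range(m))  # P(N(T) >= m)
    return (m * F(c) + F(k - c * m)) * tail + partial


def upper_bound(F, k, T, lam):
    """Minimum over c of the constant-strategy cost; c > k cannot improve on c = k."""
    return min(constant_strategy_cost(F, k, T, lam, c) for c in range(1, k + 1))


def v2_closed_form(F, T, lam):
    """v(2, T) from Section 2.1."""
    return 2 * F(1) * (1 - math.exp(-lam * T)) + F(2) * math.exp(-lam * T)


def v3_closed_form(F, T, lam):
    """v(3, T) from Section 2.1."""
    e = math.exp(-lam * T)
    return e * F(3) + lam * T * e * F(2) + (3 - 3 * e - 2 * lam * T * e) * F(1)
```

With a quadratic depth function such as `F = lambda a: a * a / 2` (an assumed example), one can check that `lower_bound(F, k, T, lam)` does not exceed the closed form and that `upper_bound(F, k, T, lam)` is not below it for k = 2 and k = 3.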

2.2. Properties of the Value Function

Below, we study the dependence of v(k, T) and a(k, T) on the (integer-valued) number of units to sell k and the (continuous) time to expiration T. Intuitively, we expect that more units cost more for the agent to liquidate, while longer time horizon reduces costs. These observations are in fact model-free in the sense that they depend solely on the convexity of F and not on any properties of the arrival process N. Below we state them as a basic lemma.

LEMMA 2.3. The map k → v(k, T) is increasing and the map T → v(k, T) is decreasing.

Proof. Let ξ be any admissible control for v(k, T). Then ξ is also admissible for $v(\ell, T)$ for any $\ell \ge k$, which immediately establishes the first part of the lemma. Conversely, for any $T' > T$, define a control $\xi'$ via $\xi'_t = \xi_t$ for $t \le T$ and $\xi'_{\sigma_i-} - \xi'_{\sigma_i} = 1_{\{\xi'_{\sigma_i-} > 0\}}$ for $T < \sigma_i \le T'$. Then $\xi'$ is an admissible control for $v(k, T')$. Moreover, due to the convexity of F, the pathwise cost of $\xi'$ is at most the pathwise cost of ξ,

$$\sum_{i : \sigma_i \le T} F(\xi'_{\sigma_i-} - \xi'_{\sigma_i}) + \sum_{j : T < \sigma_j \le T'} F(1) + F(\xi'_{T'}) \le \sum_{i : \sigma_i \le T} F(\xi_{\sigma_i-} - \xi_{\sigma_i}) + F(\xi_T) \qquad P\text{-a.s.},$$

with strict inequality if $\xi_T > 1$ and $N(T') - N(T) > 0$. It follows that $v(k, T'; \xi') \le v(k, T; \xi)$ with strict inequality as long as $P(N(T) = 0,\ N(T') - N(T) > 0) > 0$. Note that the last statement is satisfied for N a Poisson process and any $T < T'$.

A more careful analysis is furnished by the following two Propositions 2.4 and 2.5 whose proofs are deferred to the Appendix.

PROPOSITION 2.4. The dependence of v and a on the number of units to sell k is as follows:
(i) k → v(k, T) is nondecreasing, "convex" (see A.8), and v(k + 1, T) − v(k, T) < F(k + 1) − F(k) for all k, T.
(ii) k → a(k, T) is nondecreasing and increases by jumps of size 1 only.

Proposition 2.4(i) shows that v inherits the convexity of F but in a milder form. Proposition 2.4(ii) shows that facing more units to sell, the agent will trade at least as much; moreover an addition of one more unit will increase the trading amount by at most 1. Define

$$G(k, T) \triangleq v(k, T) - \min_{a \in \{0,1,\dots,k\}} [v(k - a, T) + F(a)]. \tag{2.6}$$

The quantity G(k, T) in (2.6) represents the maximal gain from an immediate impending trade. As shown in our next result, G is also related to the time-derivative of v.

PROPOSITION 2.5. The dependence of v and a on the time to maturity T is as follows:
(i) the derivative of v with respect to time to maturity is

$$\partial_T v(k, T) = -\lambda G(k, T) < 0. \tag{2.7}$$

Moreover, $\partial_T v(k, T)$ is increasing in k;
(ii) there exist distinct thresholds $t^{(k,i)}$ such that

$$a(k, T) = i, \qquad \text{when } t^{(k,i+1)} < T \le t^{(k,i)}. \tag{2.8}$$

Thus, T → a(k, T) is decreasing and right-continuous with $a(k, 0+) = \lfloor k/2 \rfloor$ and $\lim_{T \to \infty} a(k, T) = 1$.
(iii) T → v(k, T) is convex and $\partial_T^2 v(k, T)$ is continuous except at $T \in \{t^{(k,i)} : i = 1, \dots, \lfloor k/2 \rfloor\}$.

Proposition 2.5(i) provides an implicit formula for the time-derivative of v; item (iii) further refines this result by studying the second derivative. Proposition 2.5(ii) shows that as time to expiration approaches, the agent trades in larger and larger amounts; conversely when facing a very long time horizon, it is optimal to trade just one unit at a time.

In the latter sections we will relax the assumption of a constant arrival intensity λ of N. To understand its effect, we state the following lemma.


LEMMA 2.6. The value function v(k, T) and optimal action a(k, T) are decreasing in λ.

Note that we have the scaling property v(k, T; λ) = v(k, αT; λ/α) for any α > 0 since the main parameter is intensity of arrivals per effective horizon. Thus, dependence of v (and a) on λ is equivalent to its inverse dependence on time horizon. More generally, suppose that the arrival intensity is a deterministic function of time λ(t). Then defining the strictly increasing function $\tau(t) \triangleq \int_0^t \lambda(s)\, ds$ it follows that

$$v(k, T; \lambda(\cdot)) = v(k, \tau(T); 1), \tag{2.9}$$

and $a(k, T; \lambda(\cdot)) = a(k, \tau(T); 1)$. Below we give a second proof of this result using the concept of coupling.

Proof. Consider two Poisson processes $N_1, N_2$ with intensities $\lambda_1(t) > \lambda_2(t)$. Then one may construct a probability space $(\Omega', \mathcal{F}', P')$ and random variables $\sigma_k^{(i)}$, i = 1, 2, k = 1, 2, . . . such that $P'(\sigma_k^{(i)} > t) = \exp\big(-\int_0^t \lambda_i(s)\, ds\big)$ and $\sigma_k^{(1)} \le \sigma_k^{(2)}$ $P'$-almost surely. Letting $N'_i(t) = \max\big(k : \sum_{j=1}^k \sigma_j^{(i)} \le t\big)$ we obtain two coupled copies $N'_1, N'_2$ of $N_1, N_2$, such that $P'(N'_1(t) \ge N'_2(t)\ \forall t) = 1$. Now it is fairly obvious that $v^{\lambda_1}(k, T) \le v^{\lambda_2}(k, T)$ since working under $P'$, the first case has almost surely more arrivals than the second case. Formally, let us define a deterministic time-change by $\nu(t) = \tau_2^{-1}(\tau_1(t))$. Since $\tau_1(t) > \tau_2(t)$, $\nu(t) > t$. Then $P'(\sigma_k^{(1)} \in dt) = P'(\sigma_k^{(2)} \in \nu(dt))$, which implies $P(N_1(t) \le j) = P(N_2(\nu(t)) \le j)$ for all j and therefore $v^{\lambda_1}(k, T) = v^{\lambda_2}(k, \nu(T))$ (map any control $\xi_t$ for $v^{\lambda_1}(k, T)$ into a control $\xi_{\nu(t)}$ for $v^{\lambda_2}(k, \nu(T))$). Now, since $\nu(T) > T$ it follows that $v^{\lambda_2}(k, T) > v^{\lambda_2}(k, \nu(T)) = v^{\lambda_1}(k, T)$.

REMARK 2.7. A word on the computation of the value function and the optimal action. Using the above results, one may easily compute v(k, T) for any depth function F(·) by using the coupled family of first-order ordinary differential equations (2.7) over a time grid. Note that given v(k, T), a(k, T), finding the minimum in the definition of G(k, T + h) requires just one comparison since a(k, T + h) ∈ {a(k, T), a(k, T) − 1}. Given v(k, T) and a(k, T) an optimal trading strategy is straightforwardly implemented using (2.3).
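The computation described in Remark 2.7 can be sketched in a few lines of Python. The version below is an illustration under stated assumptions, not the authors' algorithm: it uses an explicit Euler discretization of (2.7) with the initial condition v(k, 0) = F(k) (all units are sold in one block at the deadline), assumes F(0) = 0 so that the a = 0 term in (2.6) contributes a gain of zero, and for simplicity searches over all order sizes at each step rather than using the single comparison noted in the remark. The names `best_action` and `solve_value_function` are hypothetical.

```python
import math


def best_action(v, F, k):
    """Smallest minimizer of v(k - a, T) + F(a) over a in {1, ..., k}, as in (2.2)."""
    # min() returns the first minimal element, hence the smallest minimizer on ties
    return min(range(1, k + 1), key=lambda a: v[k - a] + F(a))


def solve_value_function(F, k_max, T_max, lam=1.0, n_steps=2000):
    """Explicit Euler scheme for dv/dT = -lam * G(k, T), started from v(k, 0) = F(k).

    Returns (v, actions), lists indexed by k = 0, ..., k_max, evaluated at T_max.
    """
    h = T_max / n_steps
    v = [0.0] + [float(F(k)) for k in range(1, k_max + 1)]
    for _ in range(n_steps):
        new_v = v[:]
        for k in range(1, k_max + 1):
            a = best_action(v, F, k)
            gain = max(0.0, v[k] - (v[k - a] + F(a)))  # G(k, T) of (2.6)
            new_v[k] = v[k] - h * lam * gain
        v = new_v
    actions = [0] + [best_action(v, F, k) for k in range(1, k_max + 1)]
    return v, actions


def v2_closed_form(F, T, lam=1.0):
    """v(2, T) from Section 2.1, used here only as a reference value."""
    return 2 * F(1) * (1 - math.exp(-lam * T)) + F(2) * math.exp(-lam * T)
```

As a usage example, `solve_value_function(lambda a: a * a / 2, 4, 1.0)` returns approximations of v(k, 1) and a(k, 1) for k ≤ 4; the entry for k = 2 can be compared with `v2_closed_form`. A finer grid (larger `n_steps`) reduces the first-order discretization error.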

3. EXTENSIONS Using the analysis of Section 2 as a starting point, we now consider several progressively more sophisticated versions of the original model (1.1) so as to better express the complexities of real markets.

3.1. Constrained Trading

In this section we consider the modified model whereby N is a compound Poisson process with characteristics (λ, ν) and the agent is constrained to trade only up to the order size $Y_i$ (which have distribution ν). To wit, we look at the constrained value function

$$\tilde{v}(k, T) = \inf_{\xi \in \tilde{\mathcal{A}}_k} E\Big[ \sum_{i : \sigma_i \le T} F(\xi_{\sigma_i-} - \xi_{\sigma_i}) + F(\xi_T) \Big], \qquad k \in \mathbb{N}_+,\ T \in \mathbb{R}_+,$$


where $\tilde{\mathcal{A}}_k \subset \mathcal{A}_k$ has the additional constraint that $0 \le \xi_{\sigma_i-} - \xi_{\sigma_i} \le Y_i$. As a first remark, note that we trivially have the bound $\tilde{v}(k, T) \ge v(k, T)$. In counterpart to the dynamic programing equation (2.1), the constrained value function $\tilde{v}$ is the unique fixed point of the following functional operator $\tilde{L}$:

$$\tilde{L}\tilde{v}(k, T) = E\Big[ F(k) 1_{\{\sigma_1 > T\}} + \min_{a \in \{1,2,\dots,Y_1 \wedge k\}} \big( F(a) + \tilde{v}(k - a, T - \sigma_1) \big) 1_{\{\sigma_1 \le T\}} \Big]. \tag{3.1}$$

The proof of (3.1), as well as of the fact that $\tilde{L}$ has a unique fixed point, is identical to that of Proposition 3.2 below and therefore deferred. Let us now define

$$\tilde{a}(k, T) = \operatorname*{arg\,min}_{a \in \{1,\dots,k\}} \{ \tilde{v}(k - a, T) + F(a) \}. \tag{3.2}$$

The next proposition is analogous to Propositions 2.4 and 2.5. However, it is complicated by the fact that without proving the convexity results regarding $\tilde{v}(\cdot, T)$ it is not clear that

$$\min_{a \in \{1,2,\dots,Y_1 \wedge k\}} \big( F(a) + \tilde{v}(k - a, T - \sigma_1) \big) = F(\tilde{a}(k, T) \wedge Y_1) + \tilde{v}\big( k - (\tilde{a}(k, T) \wedge Y_1),\ T - \sigma_1 \big).$$

The above statement implies that an optimal liquidation strategy consists of placing trades of size $\tilde{a}(k, T)$ and then letting them be filled to the maximum extent by the matching incoming orders.

PROPOSITION 3.1. The following hold:
(i) $k \to \tilde{v}(k, T)$ is nondecreasing, "convex," and $\tilde{v}(k + 1, T) - \tilde{v}(k, T) < F(k + 1) - F(k)$;
(ii) $T \to \tilde{v}(k, T)$ is decreasing and convex;
(iii) denote by $\nu[\tilde{a}(k, T), \infty) = P(Y_1 > \tilde{a}(k, T))$. Then

$$\partial_T \tilde{v}(k, T) = -\lambda \Bigg\{ \tilde{v}(k, T) - \Big( \big[ \tilde{v}(k - \tilde{a}(k, T), T) + F(\tilde{a}(k, T)) \big]\, \nu[\tilde{a}(k, T), \infty) + \sum_{y=1}^{\tilde{a}(k,T)} \nu(y) \big[ \tilde{v}(k - y, T) + F(y) \big] \Big) \Bigg\} < 0; \tag{3.3}$$

moreover, $\partial_T \tilde{v}(k, T)$ is increasing in k;
(iv) $k \to \tilde{a}(k, T)$ is nondecreasing and increases by jumps of size 1 only;
(v) $T \to \tilde{a}(k, T)$ is nonincreasing and right continuous with $\tilde{a}(k, 0) = \lfloor k/2 \rfloor$ and $\lim_{T \to \infty} \tilde{a}(k, T) = 1$. Moreover, its jumps are of size 1. The jumps of $T \to \tilde{a}(k, T)$ occur at the discontinuity points of $T \to \partial_T^2 \tilde{v}(k, T)$.

Proof. We will first consider an auxiliary control problem in which the agent has to submit her sell orders before seeing the size of the incoming buy orders. (This parallels real markets where once an order is placed, it will be maximally partially filled against any incoming matching order.) Let us call the corresponding value function by V(k, T). Again, a dynamic programing principle implies that this value function is the unique fixed point of an operator L that is defined by

$$LV(k, T) = E\Big[ F(k) 1_{\{\sigma_1 > T\}} + \big( F(\alpha(k, T) \wedge Y_1) + V(k - (\alpha(k, T) \wedge Y_1),\ T - \sigma_1) \big) 1_{\{\sigma_1 \le T\}} \Big],$$


in which

$$\alpha(k, T) = \operatorname*{arg\,min}_{a \in \{1,\dots,k\}} \big( V(k - a, T) + F(a) \big).$$

All the proofs in Section 2 now go through to show that the pair (V, α) satisfies (i)–(v) of Proposition 3.1. Now since V(·, T) is convex, it follows that V(k − a, T) + F(a) is monotone on the set {a ≤ α(k, T)} and therefore the action of L and $\tilde{L}$ from (3.1) against V is the same. Since V is a fixed point of L, $V = LV = \tilde{L}V$. But $\tilde{L}$ has a unique fixed point, so that $\tilde{v}(\cdot, \cdot) = V(\cdot, \cdot)$ and $\alpha(\cdot, \cdot) = \tilde{a}(\cdot, \cdot)$, and the proof is complete.

3.2. Stochastic Liquidity

The model in Section 2 assumed a constant level of trade activity over the full time horizon. However, as practitioners know, real-life order flows experience multiple regime changes. For instance, a common intraday pattern features high level of activity in the beginning and end of the trading session and a lower trade intensity during midday. Alternatively, markets may experience liquidity crises, whereby order flow abruptly slows down. To capture such stylized features, in this section we assume that N is a regime-switching (doubly-stochastic) compound Poisson process, modulated by the market state variable M. M represents the market liquidity; namely the order frequency and order sizes in the order flow book are driven by M. Formally, let $N^{(1)}, \dots, N^{(m)}$ be m independent compound Poisson processes with intensities and jump distributions $(\lambda_1, \nu_1), \dots, (\lambda_m, \nu_m)$. We assume that M forms an independent finite state Markov chain with state space E = {1, 2, . . . , m} and infinitesimal generator $Q = (q_{ij})$. Then the observed order flow is given by

$$N_t = \sum_{i \in E} \int_0^t 1_{\{M_s = i\}}\, dN_s^{(i)}, \qquad t \ge 0. \tag{3.4}$$

By construction, the increments of N are independent conditioned on M. Let v(k, T; i) represent the minimal execution costs conditional on $M_0 = i$. Note that the lower and upper bounds derived in Lemmas 2.1 and 2.2 also bound the value function in the regime switching case. The next proposition establishes the dynamic programing equation for v(k, T; i).

PROPOSITION 3.2. The value function v satisfies the dynamic programing equation v = Lv, in which L is the first jump operator given by

$$Lv(k, T; i) = E^i\Big[ F(k) 1_{\{\sigma_1 > T\}} + \min_{a \in \{1,2,\dots,Y_1 \wedge k\}} \big( F(a) + v(k - a, T - \sigma_1; M_{\sigma_1}) \big) 1_{\{\sigma_1 \le T\}} \Big]. \tag{3.5}$$

In fact, v is the unique fixed point of L.

Proof. Let us introduce

$$u_0(k, T; i) = F(k), \qquad u_n(k, T; i) \triangleq L u_{n-1}(k, T; i), \quad n \ge 1. \tag{3.6}$$


Following the logic of the proof of proposition 3.1 in Bayraktar and Ludkovski (2009b), we can show that

$$u_n(k, T; i) = v_n(k, T; i) \triangleq \inf_{\xi \in \mathcal{A}_n} E^i\Big[ \sum_{k \le n;\ \sigma_k \le T} F(\xi_{\sigma_k-} - \xi_{\sigma_k}) + F(\xi_T) \Big], \tag{3.7}$$

which denotes the value function under the constraint that the agent only trades during the first n orders (and makes zero-trades thereafter until the close T). On the other hand $v_n(k, T; i) = v(k, T; i)$ for n ≥ k, since at most k trades are needed to liquidate a position of size k. Now, thanks to (3.6),

$$v(k, T; i) = v_{k+1}(k, T; i) = L v_k(k, T; i) = L v(k, T; i).$$

The fact that v is the unique fixed point of L, which is an increasing, continuous, and concave operator, follows from standard results in optimal control, see, for example, Zabczyk (1983) or the proof of theorem 3.1 in Bayraktar and Ludkovski (2010).

The Hamilton-Jacobi-Bellman equation for v(k, T; i) is given by the following lemma, also compare with (2.7).

LEMMA 3.3. Let us denote

$$G(k, T; i) := v(k, T; i) - \min_a [v(k - a, T; i) + F(a)].$$

Then the derivative of v with respect to its second variable is

$$\partial_T v(k, T; i) = -\lambda_i G(k, T; i) + \sum_{j \in E \setminus \{i\}} q_{ij} \big( v(k, T; j) - v(k, T; i) \big). \tag{3.8}$$
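The coupled system (3.8) can be integrated in the same way as (2.7). The Python sketch below is an assumption-laden illustration rather than the authors' method: it uses an explicit Euler scheme with v(k, 0; i) = F(k) in every regime, takes the minimum in G over a ∈ {1, ..., k} (that is, it ignores any order-size constraint), and truncates G at zero, which corresponds to including a = 0 with F(0) = 0. The function name `solve_regime_switching` and its argument layout are hypothetical.

```python
def solve_regime_switching(F, k_max, T_max, lams, Q, n_steps=2000):
    """Explicit Euler scheme for the regime-switching system (3.8).

    lams[i] is the arrival intensity in regime i and Q[i][j] the generator entry
    q_ij. Returns v with v[i][k] approximating v(k, T_max; i).
    """
    m = len(lams)
    h = T_max / n_steps
    # at T = 0 the whole position is sold in one block, whatever the regime
    v = [[0.0] + [float(F(k)) for k in range(1, k_max + 1)] for _ in range(m)]
    for _ in range(n_steps):
        new_v = [row[:] for row in v]
        for i in range(m):
            for k in range(1, k_max + 1):
                best = min(v[i][k - a] + F(a) for a in range(1, k + 1))
                gain = max(0.0, v[i][k] - best)  # G(k, T; i) of Lemma 3.3
                switch = sum(Q[i][j] * (v[j][k] - v[i][k]) for j in range(m) if j != i)
                new_v[i][k] = v[i][k] + h * (-lams[i] * gain + switch)
        v = new_v
    return v
```

A simple consistency check is to give every regime the same intensity: the switching term then vanishes and each `v[i]` should reproduce the single-regime value function of Section 2.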

Since the proof of Proposition 2.4 is pathwise, it is unaffected by the regime-switching setting and therefore we have

COROLLARY 3.4. For a fixed i, k → v(k, T; i) is convex, and the optimal actions are nondecreasing in k, a(k, T; i) ≤ a(k + 1, T; i).

A further possibility is to assume that the market liquidity variable M is not observed. This is a good proxy for real markets where market participants do not know the full liquidity state. Instead, agents infer current liquidity based on observed trades. Thus, decreased frequency of trades may point to an impending liquidity crisis and force agents to preemptively place larger trades. A related model with continuous trading was considered in Almgren and Lorenz (2006). Such a model can be tackled within the framework of stochastic control with partially observable Poisson processes, investigated by the authors in previous papers (Bayraktar and Ludkovski 2009b, c). Namely, one postulates a Bayesian setting whereby the agent dynamically updates her beliefs about M. The conditional probability process $\Pi(t) \triangleq (\Pi_1(t), \dots, \Pi_m(t))$ with

$$\Pi_i(t) = P^{\pi}\{ M_t = i \mid \mathcal{F}_t^N \}, \qquad \text{for } i \in E, \text{ and } t \ge 0, \tag{3.9}$$

becomes a new hyperstate and the partially observed execution problem can be stated as

$$v(k, T, \pi) = \inf_{\xi \in \mathcal{A}_k^p} E^{\pi}\Big[ \sum_{i : \sigma_i \le T} F(\xi(\sigma_i-) - \xi(\sigma_i)) + F(\xi(T)) \Big], \tag{3.10}$$

where the minimization is over all $\mathbb{F}^N$-adapted admissible controls ξ with $\xi_0 = k$. The model (3.10) can be approached using the methods in Bayraktar and Ludkovski (2009b, c), formulating a dynamic programing equation and characterizing the optimal strategy; we refer to Bayraktar and Ludkovski (2009a) for further details. Note that the optimal execution strategies are now more complex since they depend on the beliefs $\Pi(t)$.

3.3. Continuous Sale Amounts A related limiting model is obtained when we allow the sale amounts to be arbitrary real numbers, rather than integers. The corresponding problem becomes ⎤ ⎡ (3.11) F(ξσi − − ξσi ) + F(ξT )⎦ x ∈ R+ , T ∈ R+ , u(x, ˆ T) = infc E ⎣ ξ ∈Ax

i :σi ≤T

where Ack ⊇ Ak is now the set of all F-adapted, nonincreasing processes whose values change only at the time of jumps of the Poisson process N with ξ0 = x. The value function when continuous sales are allowed is easier to work with. For instance, because the new set of admissible strategies Ack is convex (which was not true under integer-constraints), it immediately follows that uˆ is convex in its ﬁrst argument x (compare to Proposition 2.4(i)). A further advantage is that the value function uˆ satisﬁes a scaling property whenever F does, which helps to reduce the dimension of the problem. LEMMA 3.5. Suppose the depth function F admits the following scaling property, F(xβ)/F(β) = H(x), for some function H and all β > 0. Then u(x, ˆ T) = H(x)u(T) in which u is the unique solution of (3.12) u(T) = E H(1)1{σ1 >T} + min (H(a) + H(1 − a) · u (T − σ1 )) 1{σ1 ≤T} . a∈[0,1]

In particular, suppose that F(x) = x^γ (i.e., H(x) = x^γ), γ > 1. Then the function u in (3.12) satisfies the following nonlinear ordinary differential equation (ODE):

(3.13)
\partial_T u(T) = \lambda u(T) \Big( \frac{1}{[1 + u(T)^{1/(\gamma-1)}]^{\gamma-1}} - 1 \Big), \quad u(0) = 1;
\partial_T a(T) = \frac{\lambda}{\gamma-1}\, a(T)(1 - a(T)) \big( (1 - a(T))^{\gamma-1} - 1 \big) < 0, \quad a(0) = 1/2.

Proof. Using (3.11) and the assumption on F we can see that û(x, T) = H(x)û(1, T), since if ξ is a strategy for û(x, T) then ξ/x is a strategy for û(1, T). With the latter scaling property, the dynamic programing equation (3.12) is just the counterpart of the original (2.1).


In the case where H(x) = x^γ, the dynamic programing equation (3.12) leads to the integral equation (note H(1) = 1)

u(T) = e^{-\lambda T} + \int_0^T \min_{a \in [0,1]} \big( a^{\gamma} + (1-a)^{\gamma} u(T-s) \big) \lambda e^{-\lambda s}\, ds
     = e^{-\lambda T} \Big( 1 + \int_0^T \min_{a \in [0,1]} \big( a^{\gamma} + (1-a)^{\gamma} u(s) \big) \lambda e^{\lambda s}\, ds \Big).

The optimal action evidently satisfies

(3.14)    a(T) = \frac{u(T)^{1/(\gamma-1)}}{1 + u(T)^{1/(\gamma-1)}}.

If we let f(T) = e^{λT}u(T), it can be shown that ∂_T f(T) = λ f(T) · [1 + (f(T)e^{−λT})^{1/(γ−1)}]^{1−γ}, from which we can derive the ODE for u in (3.13). Finally, the ODE for a in (3.13) follows using (3.14) and the ODE for u. Since a(T) ≤ 1/2, by inspection the right-hand side of the ODE for a in (3.13) is negative, and it can also be shown that ∂_T^2 a(T) > 0.

We find that for a power depth function, Lemma 3.5 provides an excellent approximation even for moderate values k ≥ 20. Thus, when the scaling property of F is satisfied, we obtain a very fast method to compute v(k, T) ≈ H(k)u(T) and a(k, T) ≈ k · a(T), with u and a as defined in (3.13).
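As a rough illustration of how the scaling result might be used numerically, the following sketch integrates the ODE for u in (3.13) with a classical Runge-Kutta step and recovers the optimal fraction from (3.14). The function names, the step count, and the default parameters λ = 1, γ = 2 are assumptions made for this example only; they are not taken from the paper.

```python
import math


def solve_scaling_ode(T, lam=1.0, gamma=2.0, steps=2000):
    """Integrate u' = lam*u*((1 + u**(1/(gamma-1)))**(1-gamma) - 1), u(0) = 1.

    Uses a fixed-step RK4 scheme (an illustrative choice, not the paper's method).
    """
    def rhs(u):
        w = u ** (1.0 / (gamma - 1.0))
        return lam * u * ((1.0 + w) ** (1.0 - gamma) - 1.0)

    h = T / steps
    u = 1.0
    for _ in range(steps):
        k1 = rhs(u)
        k2 = rhs(u + 0.5 * h * k1)
        k3 = rhs(u + 0.5 * h * k2)
        k4 = rhs(u + h * k3)
        u += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return u


def optimal_fraction(u, gamma=2.0):
    """Optimal fraction a(T) from (3.14), given the value u = u(T)."""
    w = u ** (1.0 / (gamma - 1.0))
    return w / (1.0 + w)


if __name__ == "__main__":
    u1 = solve_scaling_ode(1.0)
    print("u(1) =", u1, " a(1) =", optimal_fraction(u1))
```

Under the scaling property with F(x) = x^γ, one would then approximate v(k, T) by k^γ · u(T) and a(k, T) by k · a(T). For γ = 2 the ODE reduces to u' = −λu²/(1 + u), which gives the implicit relation 1/u − log u = 1 + λT and can serve as a sanity check on the integrator.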

4. NUMERICAL ILLUSTRATIONS

In this section we illustrate the results of our analysis with some computational examples. We begin with the base model where we take without loss of generality λ = 1. We also take a quadratic depth function F(a) = a²/2. Solving for a(k, T) using Remark 2.7 we obtain Figure 4.1. As shown in Proposition 2.5, a(k, ·) decreases by steps of size 1; at the same time, as shown in Proposition 2.4, a(·, T) increases by steps of size 1. This surface is used in conjunction with (2.3) to react to the arrivals of orders in an optimal way.

We then proceed to study the more complex extensions of Section 3. Thus, we assume that several liquidity regimes are possible; to be concrete, we fix the liquidity regime-switching model as M_t ∈ E = {High, Med, Low} ≡ {1, 2, 3} with infinitesimal generator

Q = \begin{pmatrix} -2 & 2 & 0 \\ 1 & -4 & 3 \\ 0 & 2 & -2 \end{pmatrix}.

The intensity of orders is λ(M_t) with λ = [3, 3, 1], and order sizes have the strictly positive Poisson distributions

\nu_i(y) = \frac{\exp(-\mu_i)}{1 - \exp(-\mu_i)} \frac{(\mu_i)^y}{y!}, \quad y = 1, 2, \ldots,

with "mean" sizes μ = [8, 4, 4]. The value function v(k, T, i) is easily computed by solving the corresponding system of ODEs in (3.8). In this context, Figure 4.2 shows the effect of constraints on optimal strategy and optimal execution cost. We observe that constraints play the largest role at medium time horizons, as on long time horizons the agent has plenty of opportunities to trade, while with very short deadlines the convexity of F is the determining factor. Also, as expected, the agent responds to constraints by preemptively placing marginally larger orders in the hope they will be filled.
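The regime-switching inputs above can be written down directly. The sketch below encodes the generator Q, the intensities λ, and the zero-truncated Poisson order-size distribution ν_i, and computes its true mean μ_i/(1 − e^{−μ_i}), which is slightly larger than the parameter μ_i (a plausible reading of why the text puts "mean" in quotation marks). All identifiers are hypothetical names chosen for this illustration.

```python
import math

# Generator of the liquidity chain on {High, Med, Low}, indexed 0, 1, 2 here.
Q = [[-2.0, 2.0, 0.0],
     [1.0, -4.0, 3.0],
     [0.0, 2.0, -2.0]]
LAMBDA = [3.0, 3.0, 1.0]   # order intensities per regime
MU = [8.0, 4.0, 4.0]       # Poisson parameters of the order-size laws


def truncated_poisson_pmf(y, mu):
    """nu(y) = exp(-mu) / (1 - exp(-mu)) * mu**y / y!  for y = 1, 2, ..."""
    if y < 1:
        return 0.0
    return math.exp(-mu) / (1.0 - math.exp(-mu)) * mu ** y / math.factorial(y)


def truncated_poisson_mean(mu):
    """Exact mean of the zero-truncated Poisson law with parameter mu."""
    return mu / (1.0 - math.exp(-mu))


if __name__ == "__main__":
    for i, mu in enumerate(MU, start=1):
        print("regime", i, "mean order size", truncated_poisson_mean(mu))
```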

FIGURE 4.1. Optimal sale amounts a(k, T) as a function of current holding k and time to maturity T. We take λ = 1, F(a) = a²/2, and the model (1.1).

FIGURE 4.2. The effect of constraints in the regime-switching setting of Section 3.2. Left panel shows the difference between v(k, T; i) and ṽ(k, T; i) for a fixed i = 1 and k = 20; right panel plots the difference between a(k, T; i) (circular markers) and ã(k, T; i) (dashed line) for the same i = 1, k = 20.

To compare the different models of Section 3, Table 4.1 presents a summary of the various value functions. Namely, we compare the effect of stochastic liquidity, and also of constraints. Finally, we also show the accuracy of upper and lower bounds of Lemmas 2.1 and 2.2 for this case. We see that these bounds are quite tight (relative difference of about 10–15%) and can be used to give a quick idea about v. The bounds are easily computed

TABLE 4.1
We Consider the Regime-Switching Case with T = 1, k = 20, F(a) = a²/2. The Lower Bound v̲ Is Computed Using Lemma 2.1 and the Upper Bound v̄ Is Computed Using Lemma 2.2. We also Compare the Constrained ṽ to the Basic v

Regime switching liquidity

Initial regime i    v̲(k, T; i)    v(k, T; i)    ṽ(k, T; i)    v̄(k, T; i)
1                   73.16          77.80          83.54          83.31
2                   84.26          88.36          98.97          93.50
3                   98.94          102.22         114.25         107.11

via a Monte Carlo simulation: one first simulates paths of the Markov chain M; the conditional distribution of N(T) follows using the fact that if M_s = j for s ∈ [T_1, T_2] then N(T_2) − N(T_1) ∼ Poisson(λ_j (T_2 − T_1)). Since the formulas in Lemmas 2.2 and 2.1 are for the base case without constraints, the constrained value functions ṽ are typically larger than the upper bound v̄. One could compute an adjusted v̄ that takes into account constraints, but no simple formulas like in Lemma 2.2 appear to be forthcoming.
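The simulation step described above can be sketched as follows: simulate the sojourn times of M from its generator, and on each sojourn in regime j draw the number of arrivals as a Poisson count with rate λ_j. The generator and intensities are those of this section; the function names, the 0-based regime indexing, and the use of exponential interarrival times to draw Poisson counts are choices made for this example, not details given in the paper.

```python
import random

Q = [[-2.0, 2.0, 0.0],
     [1.0, -4.0, 3.0],
     [0.0, 2.0, -2.0]]
LAMBDA = [3.0, 3.0, 1.0]


def poisson_count(rate, length, rng):
    """Number of points of a rate-`rate` Poisson process in an interval of given length."""
    count = 0
    s = rng.expovariate(rate)
    while s <= length:
        count += 1
        s += rng.expovariate(rate)
    return count


def simulate_order_count(T, i0, rng):
    """Simulate one path of M on [0, T] from regime index i0 and return N(T)."""
    t, state, count = 0.0, i0, 0
    while True:
        remaining = T - t
        sojourn = rng.expovariate(-Q[state][state])
        # Arrivals while the chain sits in `state`, capped at the horizon.
        count += poisson_count(LAMBDA[state], min(sojourn, remaining), rng)
        if sojourn >= remaining:
            return count
        t += sojourn
        # Jump to a new regime with probabilities proportional to off-diagonal rates.
        weights = [Q[state][j] if j != state else 0.0 for j in range(len(Q))]
        state = rng.choices(range(len(Q)), weights=weights)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [simulate_order_count(1.0, 0, rng) for _ in range(5000)]
    print("estimated E[N(1)] starting in High:", sum(draws) / len(draws))
```

Averaging a bound formula over such draws of N(T) would give the Monte Carlo estimate; the bound formulas of Lemmas 2.1 and 2.2 themselves are not reproduced here.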

5. CONCLUSION In this paper we have proposed a new model for studying the optimal trade execution problem in ﬁnancial markets. Our model is directly based on a discrete order ﬂow and therefore is specially suited to capture the features of trading in dark pools where orders are executed only when matched with a crossing counterparty. To simplify our presentation, our analysis assumed a simple compound Poisson representation of the order ﬂow. However, the obtained dynamic programing equations and most of the stylized properties of the value function and optimal strategy are expected to hold in much more general setups. These could include time-dependent parameters (such as price impact, order intensity, and size distribution) or further constraints on optimal strategy. Realistic dark pool trading involves simultaneous execution on several exchanges. In particular, the trader will place trades both in the dark pool and on the regular limit order book in order to optimize the trade-off between liquidity, minimal price impact, and information content (dark pool prices are often delayed compared to the limit book). In the case where the order ﬂows of different exchanges are independent, the problem still ﬁts into our framework, since superposition of independent Poisson processes is another Poisson process. The only modiﬁcation is that orders will now carry the tag of the associated exchange and therefore the depth function F will depend on order type. A second direction for extensions is to introduce a price dimension. Thus, each crossing trade carries a price tag and the agent maximizes total revenue (rather than minimizing slippage costs). This would extend the framework of Bertsimas and Lo (1998) to include asynchronous buy arrivals and counterparty risk. Explicit prices would allow consideration of proportional depth functions F. Also, if the agent herself uses limit orders, then her control is a two-dimensional (price, quantity)-pair and the probability of a


matching trade being accepted could be taken to depend on both of these variables. Such multidimensional versions of our model will be taken up in future work.

APPENDIX

A.1. Proof of Proposition 2.4

The first preparatory lemma below shows that the slope of v is smaller than that of F.

LEMMA A.1. For any k_1 > k_2 and T we have v(k_1, T) − v(k_2, T) < F(k_1) − F(k_2). Alternatively, F(k) − v(k, T) is increasing in k.

Proof. Let ξ^{k_2} denote ξ^{(a,k_2,T)}. Recall that v^ξ(k, T) denotes the expected performance of any control ξ. Interpreting ξ^{k_2} as a suboptimal control for v(k_1, T) (which disposes of the extra k_1 − k_2 units at maturity), we have

v(k_1, T) - v(k_2, T) \le v^{\xi^{k_2}}(k_1, T) - v^{\xi^{k_2}}(k_2, T)
  = E\Big[ \sum_{i=0}^{k_2} \big( F(i + k_1 - k_2) - F(i) \big) 1_{\{\xi^{k_2}_T = i\}} \Big]
  < \sum_{i=0}^{k_2} E\big[ (F(k_1) - F(k_2)) 1_{\{\xi^{k_2}_T = i\}} \big] = F(k_1) - F(k_2),

where the second inequality is due to the convexity of F, whereby y → F(a + y) − F(y) is increasing.

The next lemma shows that if one starts with more units initially and sells them in an optimal way, then one will always have more units at any later point in time.

LEMMA A.2. Let ξ^k_t denote ξ^{(a,k,T)}_t, k ∈ N_+. Then for ℓ ≥ k we have that ξ^ℓ_t ≥ ξ^k_t for all t ∈ [0, T].

Proof. First note that if at any date s ≤ t we would have ξ^ℓ_s = ξ^k_s, then it follows from (2.3) and the Markov nature of a(k, T) that for all s' ≥ s we will have ξ^ℓ_{s'} = ξ^k_{s'} as well. Thus, to have ξ^ℓ_s < ξ^k_s on a set A of strictly positive probability there necessarily must be an arrival σ_j such that d_ℓ := ξ^ℓ_{σ_j−} > ξ^k_{σ_j−} =: d_k and b_ℓ := ξ^ℓ_{σ_j} < ξ^k_{σ_j} =: b_k on A. By construction, b_ℓ = d_ℓ − a(d_ℓ, T − σ_j) and b_k = d_k − a(d_k, T − σ_j). Moreover,

(A.1)
a(d_\ell, T - \sigma_j) =: a_\ell = \arg\min_a \{ v(d_\ell - a, T - \sigma_j) + F(a) \};
a(d_k, T - \sigma_j) =: a_k = \arg\min_a \{ v(d_k - a, T - \sigma_j) + F(a) \}.

Define c_ℓ = d_ℓ − d_k + a_k > a_k, and c_k = d_k − d_ℓ + a_ℓ < a_ℓ. Therefore from (A.1) (and recalling that a_ℓ is the smallest minimizer, while a_ℓ > c_ℓ)

v(d_\ell - a_\ell, T - \sigma_j) + F(a_\ell) < v(d_\ell - c_\ell, T - \sigma_j) + F(c_\ell),
v(d_k - a_k, T - \sigma_j) + F(a_k) \le v(d_k - c_k, T - \sigma_j) + F(c_k).

Rearranging, we obtain

(A.2)
v(d_\ell - c_\ell, T - \sigma_j) - v(d_\ell - a_\ell, T - \sigma_j) > F(a_\ell) - F(c_\ell),
v(d_k - a_k, T - \sigma_j) - v(d_k - c_k, T - \sigma_j) \le F(c_k) - F(a_k).

However, the left-hand sides of both equations in (A.2) are the same by construction and are in fact equal to v(b_k, T − σ_j) − v(b_ℓ, T − σ_j) > 0. On the other hand, since a_ℓ > c_k and c_ℓ > a_k, while a_ℓ − c_ℓ = c_k − a_k = a_ℓ − a_k + d_k − d_ℓ = b_k − b_ℓ > 0, by the convexity of F we must have F(a_ℓ) − F(c_ℓ) ≥ F(c_k) − F(a_k), contradicting (A.2).

To keep the corresponding inventory processes ordered correctly it follows that for any T ∈ R_+ and ℓ ≤ k, we have

(A.3)    a(k, T) - a(\ell, T) \le k - \ell.

In particular, a(k + 1, T) ≤ a(k, T) + 1.

LEMMA A.3. We have v(k, T) is "convex" in k, that is, for any k ∈ N_+

(A.4)    v(k, T) - v(\ell, T) \ge v(k - n, T) - v(\ell - n, T), \quad \forall \ell \in \{1, \ldots, k\},\ \forall n \in \{1, \ldots, \ell\}.

Also, for any T ∈ R_+ and ℓ ≤ k,

(A.5)    a(\ell, T) \le a(k, T).

Proof. We will prove both of the above statements together by induction. Note that (A.4) holds when k = 1 since v(0, T) = 0. Also a(1, T) ≥ a(0, T) = 0. Suppose that (A.4) and (A.5) hold for some k ≥ 1. We will show that they are also true when k is replaced by k + 1. It is enough to prove that

(A.6)    v(k + 1, T) - v(k, T) \ge v(k, T) - v(k - 1, T),

and that a(k + 1, T) ≥ a(k, T). First, by definition a(k + 1, T) = argmin_a {v(k + 1 − a, T) + F(a)}. Now suppose that a(k, T) > b ≥ 1. This implies that

v(k - b, T) + F(b) > v(k - a(k, T), T) + F(a(k, T))
  \iff v(k - b, T) - v(k - a(k, T), T) > F(a(k, T)) - F(b)
  \implies v(k + 1 - b, T) - v(k + 1 - a(k, T), T) > F(a(k, T)) - F(b) > 0
  \implies a(k + 1, T) \ne b,

since the sale of b shares is less preferable than selling a(k, T) shares. The third line follows from the induction hypothesis since k + 1 − b ≤ k. Since a(k + 1, T) ≠ b for all b < a(k, T), we obtain a(k + 1, T) ≥ a(k, T).


Thanks to the fact that a(k + 1, T) ≥ a(k, T) for all T ∈ R_+, the induction hypothesis on a, and the dynamics of ξ^i ≡ ξ^{(a,i,T)} given in (2.3), we have that

\xi^{k+1}_{\sigma_n-} - \xi^{k+1}_{\sigma_n} = \xi^{k}_{\sigma_n-} - \xi^{k}_{\sigma_n} + \Delta_{\sigma_n},

where Δ_{σ_n} ∈ {0, 1} (Δ_{σ_n} ≤ 1 due to the bound in (A.3)). The process Δ should be thought of as the "additional" action needed to sell one more unit starting with k units. Now, the left-hand side of (A.6) becomes

(A.7)
v(k + 1, T) - v(k, T)
  = E\Big[ \sum_{n:\sigma_n \le T} 1_{\{\Delta_{\sigma_n} > 0\}} \big( F(\xi^{k+1}_{\sigma_n-} - \xi^{k+1}_{\sigma_n}) - F(\xi^{k}_{\sigma_n-} - \xi^{k}_{\sigma_n}) \big) + 1_{\{\sum_n \Delta_{\sigma_n} = 0\}} \big( F(\xi^{k+1}_T) - F(\xi^{k}_T) \big) \Big]
  = E\Big[ \sum_{n:\sigma_n \le T} 1_{\{\Delta_{\sigma_n} > 0\}} \big( F(\xi^{k}_{\sigma_n-} - \xi^{k}_{\sigma_n} + 1) - F(\xi^{k}_{\sigma_n-} - \xi^{k}_{\sigma_n}) \big) + 1_{\{\sum_n \Delta_{\sigma_n} = 0\}} \big( F(\xi^{k}_T + 1) - F(\xi^{k}_T) \big) \Big].

Let us analyze the right-hand side of (A.6). Define the control ξ' by ξ'_0 = k and ξ'_{σ_n−} − ξ'_{σ_n} = ξ^{k−1}_{σ_n−} − ξ^{k−1}_{σ_n} + Δ_{σ_n}. This is an admissible control for selling k units. Then,

v(k, T) - v(k - 1, T)
  \le E\Big[ \sum_{n:\sigma_n \le T} F(\xi'_{\sigma_n-} - \xi'_{\sigma_n}) + F(\xi'_T) \Big] - E\Big[ \sum_{n:\sigma_n \le T} F(\xi^{k-1}_{\sigma_n-} - \xi^{k-1}_{\sigma_n}) + F(\xi^{k-1}_T) \Big]
  = E\Big[ \sum_{n:\sigma_n \le T} 1_{\{\Delta_{\sigma_n} > 0\}} \big( F(\xi^{k-1}_{\sigma_n-} - \xi^{k-1}_{\sigma_n} + 1) - F(\xi^{k-1}_{\sigma_n-} - \xi^{k-1}_{\sigma_n}) \big) + 1_{\{\sum_n \Delta_{\sigma_n} = 0\}} \big( F(\xi^{k-1}_T + 1) - F(\xi^{k-1}_T) \big) \Big]
  \le E\Big[ \sum_{n:\sigma_n \le T} 1_{\{\Delta_{\sigma_n} > 0\}} \big( F(\xi^{k}_{\sigma_n-} - \xi^{k}_{\sigma_n} + 1) - F(\xi^{k}_{\sigma_n-} - \xi^{k}_{\sigma_n}) \big) + 1_{\{\sum_n \Delta_{\sigma_n} = 0\}} \big( F(\xi^{k}_T + 1) - F(\xi^{k}_T) \big) \Big]
  = v(k + 1, T) - v(k, T).

The last inequality is by the convexity of F and the induction hypothesis on a, from which it follows that ξ^{k−1}_{σ_n−} − ξ^{k−1}_{σ_n} ≤ ξ^{k}_{σ_n−} − ξ^{k}_{σ_n}. The last equality is from (A.7). This completes the proof.


To conclude the proof of Proposition 2.4(i), we now show by induction that

(A.8)    v(k - a - 1, T) \le \alpha v(k - b, T) + (1 - \alpha) v(k - a, T),

for any a < k and any b ∈ N_+ with a < b ≤ k and α = 1/(b − a). Note that (A.8), or equivalently

(A.9)    v(k - a, T) - v(k - b, T) \le (b - a) [ v(k - a, T) - v(k - a - 1, T) ],

holds for b = a + 1. Let us assume that (A.9) holds for b = a + n (in which n is such that a + n + 1 ≤ k), that is, v(k − a, T) − v(k − a − n, T) ≤ n[v(k − a, T) − v(k − a − 1, T)]. On the other hand, v(k − a − n, T) − v(k − a − n − 1, T) ≤ v(k − a, T) − v(k − a − 1, T), thanks to Lemma A.3. Adding the last two inequalities, we obtain (A.9) for b = a + n + 1.

The proof of Proposition 2.4(ii) follows from (A.3) and (A.5).

A.2. Proof of Proposition 2.5

We start with a preliminary Lemma A.4 that shows that the more units the agent has, the more eager she is to sell and so the benefit of a matching order is larger.

LEMMA A.4. The map k → G(k, T) is nondecreasing for all T ∈ R_+.

Proof. Let k ≥ ℓ. Using a(ℓ, T) ≤ a(k, T) and Lemma A.3 on the second line,

G(k, T) \ge v(k, T) - \big( v(k - a(\ell, T), T) + F(a(\ell, T)) \big)
        \ge v(\ell, T) - \big( v(\ell - a(\ell, T), T) + F(a(\ell, T)) \big) = G(\ell, T).

We now prove Proposition 2.5(ii): For h > 0, let A = {σ_1 > h}, B = {σ_1 < h, σ_2 > h}, and C = (A ∪ B)^c. We have that P(A) = e^{−λh}, P(B) = λhe^{−λh}, and P(C) = o(h). Using the dynamic programing principle, we can write

v(k, T + h) = E[ v(k, T) 1_A + (v(k, T) - G(k, T)) 1_B + X 1_C ],

in which X is a bounded random variable. Then sending h → 0 we obtain

(A.10)
\lim_{h \to 0} \frac{v(k, T + h) - v(k, T)}{h}
  = \lim_{h \to 0} \frac{E[ v(k, T) 1_{A \cup B} - G(k, T) 1_B ] - v(k, T) + o(h)}{h}
  = \lim_{h \to 0} \frac{-\lambda h\, G(k, T) + o(h)}{h} = -\lambda G(k, T).
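The relation ∂_T v(k, T) = −λG(k, T) in (A.10) suggests a simple way to tabulate the value function numerically. The sketch below is only an illustration under stated assumptions: it takes G(k, T) = v(k, T) − min_a {v(k − a, T) + F(a)} with a ranging over {0, . . . , k}, the initial condition v(k, 0) = F(k), the quadratic depth F(a) = a²/2 from Section 4, and an explicit Euler step. The function names and the step size are hypothetical, and the paper's own numerical procedure (Remark 2.7) is not reproduced here.

```python
def depth_cost(a):
    """Quadratic depth function F(a) = a^2 / 2 used in the numerical section."""
    return 0.5 * a * a


def solve_value(k_max, T, lam=1.0, F=depth_cost, steps=1000):
    """Explicit Euler scheme for dv/dT = -lam * G(k, T), with v(k, 0) = F(k)."""
    v = [F(k) for k in range(k_max + 1)]
    h = T / steps
    for _ in range(steps):
        new_v = v[:]
        for k in range(1, k_max + 1):
            # Cost right after an arrival, minimized over the amount sold.
            best = min(v[k - a] + F(a) for a in range(0, k + 1))
            gain = v[k] - best           # G(k, T) >= 0 since a = 0 is allowed
            new_v[k] = v[k] - lam * h * gain
        v = new_v
    return v


def optimal_action(v, k, F=depth_cost):
    """Smallest minimizer of v(k - a) + F(a) over a in {1, ..., k}."""
    costs = [v[k - a] + F(a) for a in range(1, k + 1)]
    return 1 + costs.index(min(costs))


if __name__ == "__main__":
    for horizon in (0.0, 1.0, 5.0, 30.0):
        v = solve_value(6, horizon, steps=max(1, int(100 * horizon)))
        print(horizon, round(v[6], 4), optimal_action(v, 6))
```

In this sketch the computed v(k, T) decreases in T toward kF(1) and the action shrinks toward a single unit, in line with Lemmas A.5 and A.6.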

Next, we return to studying the properties of a(k, T).

LEMMA A.5. The optimal trading amount decreases as the horizon becomes longer: a(k, S) ≤ a(k, T), ∀k ∈ N_+, ∀S > T.

Proof. For any b > a(k, T),

v(k - b, T) + F(b) > v(k - a(k, T), T) + F(a(k, T))
  \iff v(k - a(k, T), T) - v(k - b, T) < F(b) - F(a(k, T)).

We have that ∂_T v(k, T) ≤ ∂_T v(ℓ, T) for ℓ ≤ k, due to Lemma A.4 and (A.10). Therefore,

v(k - a(k, T), S) - v(k - b, S) \le v(k - a(k, T), T) - v(k - b, T) < F(b) - F(a(k, T)),

which implies that a(k, T) performs strictly better than action b for the minimization problem min_{a∈{0,...,k}} {v(k − a, S) + F(a)}, so that b ≠ a(k, S), the smallest minimizer for this problem. Since this holds for any b > a(k, T), we necessarily have that a(k, T) ≥ a(k, S).

In the next lemma we shall see that T → a(k, T) decreases to 1. By construction, we also have that a(k, 0+) = k/2.

LEMMA A.6. lim_{T→∞} v(k, T) = kF(1) and lim_{T→∞} a(k, T) = 1.

Proof. Recall from Lemma 2.2 that v(k, T) ≤ v^1(k, T), where v^1 denotes the performance of a constant 1-strategy that always sells a single unit. Since

v^1(k, T) = kF(1) P(N(T) \ge k) + \sum_{n=0}^{k-1} \big( nF(1) + F(k - n) \big) P(N(T) = n) \to kF(1), \quad \text{as } T \to \infty,

while v(k, T) ≥ kF(1) ∀T, the first statement of the lemma follows.

Let us choose a positive 0 < δ < F(2) − 2F(1). Fix k > 0; by the above, for large enough T', we have that v(a, T) ≤ aF(1) + δ for all a ∈ {1, . . . , k} and T ≥ T'. Then by convexity of F

v(k - 1, T) + F(1) \le (k - 1)F(1) + \delta + F(1) < (k - c)F(1) + F(c) \le v(k - c, T) + F(c),

for any T ≥ T' and any 1 < c < k. Comparing with the definition of a(k, T) in (2.2), we conclude that a(k, T) = 1 for T ≥ T'.

The existence of threshold t^{(k,i)} of (2.8) follows from Lemma A.5. It remains to show that the thresholds are distinct, that is, t^{(k,i)} < t^{(k,i−1)}, so that as a function of T, a(k, T) experiences jumps of size 1 only. Toward a contradiction, suppose that there exists T and level k such that a(k, T−) − a(k, T) > 1. Let a = a(k, T) and b = a(k, T−) > a + 1. Since 1 ≤ a(k, ·) ≤ k/2 is nonincreasing and has at most k/2 − 1 jumps, ∃δ > 0 such that b = a(k, T − s) for all s < δ. By optimality of b we have that

v(k - b, T - s) + F(b) = v(k - a(k, T - s), T - s) + F(a(k, T - s)) \le v(k - a, T - s) + F(a) \quad \forall s < \delta.

Therefore, by continuity of the value function in T, and optimality of a at T, we must have

(A.11)    v(k - a, T) + F(a) = v(k - b, T) + F(b).


Let α = 1/(b − a) ∈ (0, 1). By the strict convexity of F we have that F(a + 1) < αF(b) + (1 − α)F(a). Similarly, by Lemma A.3, we have that v(k − a − 1, T) ≤ αv(k − b, T) + (1 − α)v(k − a, T). Adding the two latter equations together we obtain

v(k - a - 1, T) + F(a + 1) < \alpha \big( v(k - b, T) + F(b) \big) + (1 - \alpha) \big( v(k - a, T) + F(a) \big) = v(k - a, T) + F(a), \quad \text{by (A.11)},

which contradicts the optimality of a.

Finally, we may obtain the properties of the second time-derivative of v. By (2.7), ∂_T^2 v(k, T) = −λ∂_T G(k, T). For any T ≠ t^{(k,i)} we have from combining (2.6) and (2.7) that

(A.12)    \partial_T G(k, T) = -\lambda \big( G(k, T) - G(k - a(k, T), T) \big),

since a(k, T) is constant in a neighborhood of T thanks to (2.8). By Lemma A.4, the right-hand side of (A.12) is nonpositive, so ∂_T G(k, T) ≤ 0 and therefore ∂_T^2 v(k, T) ≥ 0. When T = t^{(k,i)}, the right derivative of G is still equal to (A.12) since T → a(k, T) is right continuous. However, ∂_T G(k, T−) = −λ(G(k, T) − G(k − a(k, T) − 1, T)) < ∂_T G(k, T+).

A.3. Proof of Lemma 3.3

Denote by τ_k the kth transition time of M. For h > 0, let A = {σ_1 > h, τ_1 > h}, B_N = {σ_1 < h, σ_2 > h, τ_1 > h}, B_j = {σ_1 > h, τ_1 < h, τ_2 > h, M_{τ_1} = j}, j ∈ E \ {i}, and C = (A ∪ B_N ∪ ⋃_j B_j)^c. By conditional independence of N and M we have that P_i(A) = P(A | M_0 = i) = e^{−(λ−q_{ii})h}, P_i(B_N) = λhe^{−(λ−q_{ii})h}, P_i(B_j) = q_{ij}he^{−(λ−q_{ii})h}, and P_i(C) = o(h). Using the dynamic programing principle, we can write

v(k, T + h; i) = E_i\Big[ v(k, T; i) 1_A + \big( v(k, T; i) - G(k, T; i) \big) 1_{B_N} + \sum_{j \in E \setminus \{i\}} v(k, T; j) 1_{B_j} + X 1_C \Big],

in which X is a bounded random variable. Taking the limit h → 0 we obtain

\lim_{h \to 0} \frac{v(k, T + h; i) - v(k, T; i)}{h}
  = \lim_{h \to 0} \frac{E\Big[ v(k, T; i) 1_{A \cup B_N \cup \bigcup_j B_j} - G(k, T; i) 1_{B_N} + \sum_{j \in E \setminus \{i\}} \big( v(k, T; j) - v(k, T; i) \big) 1_{B_j} \Big] - v(k, T; i) + o(h)}{h}
  = \lim_{h \to 0} \frac{-\lambda h\, G(k, T; i) + \sum_{j \in E \setminus \{i\}} q_{ij} h \big( v(k, T; j) - v(k, T; i) \big) + o(h)}{h},

and (3.8) follows.

REFERENCES

ALFONSI, A., A. FRUTH, and A. SCHIED (2010): Optimal Execution Strategies in Limit Order Books with General Shape Functions, Quant. Finance 10(2), 143–157.
ALMGREN, R. (2003): Optimal Execution with Nonlinear Impact Functions and Trading-Enhanced Risk, Appl. Math. Finance 10, 1–18.
ALMGREN, R., and J. LORENZ (2006): Bayesian Adaptive Trading with a Daily Cycle, J. Trading 1(4), 38–46. URL http://cims.nyu.edu/~almgren/.
BAYRAKTAR, E., and M. LUDKOVSKI (2009a): Optimal Trade Execution in Illiquid Markets. Technical report, University of Michigan and University of California at Santa Barbara. Available at http://arxiv.org/abs/0902.2516, accessed August 25, 2010.
BAYRAKTAR, E., and M. LUDKOVSKI (2009b): Sequential Tracking of a Hidden Markov Chain Using Point Process Observations, Stoch. Process. Appl. 119(6), 1792–1822.
BAYRAKTAR, E., and M. LUDKOVSKI (2010): Inventory Management with Partially Observed Nonstationary Demand, Ann. Oper. Res. 176, 7–39.
BERTSIMAS, D., and A. LO (1998): Optimal Control of Execution Costs, J. Financ. Markets 1, 1–50.
BRUSS, F. T. (1987): On an Optimal Selection Problem of Cowan and Zabczyk, J. Appl. Probab. 24(4), 918–928. ISSN 0021-9002.
COWAN, R., and J. ZABCZYK (1978): An Optimal Selection Problem Associated with the Poisson Process, Teor. Veroyatnost. i Primenen. 23(3), 606–614. ISSN 0040-361X.
DAVIS, M. H. A. (1993): Markov Models and Optimization, Vol. 49, Monographs on Statistics and Applied Probability, London: Chapman & Hall, ISBN 0-412-31410-X.
OBIZHAEVA, A., and J. WANG (2006): Optimal Trading Policy and Demand/Supply Dynamics. Technical report, MIT, 2006. Available at http://web.mit.edu/wangj/www/pap/OW_060408.pdf, accessed August 25, 2010.
QPL NEWSLETTER (2008): Introduction to Dark Pools. Technical report, Deutsche Bank Quantitative Products Laboratory, Oct 2008. Available at http://www.qpl.db.com/EN/binaer_view.asp?BinaerNr=86, accessed August 25, 2010.
SCHIED, A., and T. SCHÖNEBORN (2007): Optimal Basket Liquidation for CARA Investors. Technical report, TU Munich, 2007. Available at http://www.alexschied.de/publications.html, accessed August 25, 2010.
SCHIED, A., and T. SCHÖNEBORN (2009): Risk Aversion and the Dynamics of Optimal Liquidation Strategies in Illiquid Markets, Finance Stoch. 13(2), 181–204. ISSN 0949-2984.
STADJE, W. (1987): An Optimal k-Stopping Problem for the Poisson Process. In Mathematical Statistics and Probability Theory, Vol. B, Reidel, Dordrecht: Bad Tatzmannsdorf, 1986, pp. 231–244.
STADJE, W. (1990): A Full Information Pricing Problem for the Sale of Several Identical Commodities, Z. Oper. Res. 34(3), 161–181. ISSN 0340-9422.
ZABCZYK, J. (1983): Stopping Problems in Stochastic Control. Proceedings of the International Congress of Mathematicians, pp. 1425–1437.

Mathematical Finance, Vol. 21, No. 4 (October 2011), 627–641

PRICING OPTIONS ON VARIANCE IN AFFINE STOCHASTIC VOLATILITY MODELS

JAN KALLSEN, Christian-Albrechts-Universität zu Kiel
JOHANNES MUHLE-KARBE, Universität Wien
MORITZ VOß, Université Pierre et Marie Curie

We consider the pricing of options written on the quadratic variation of a given stock price process. Using the Laplace transform approach, we determine semi-explicit formulas in general afﬁne models allowing for jumps, stochastic volatility, and the leverage effect. Moreover, we show that the joint dynamics of the underlying stock and a corresponding variance swap again are of afﬁne form. Finally, we present a numerical example for the Barndorff-Nielsen and Shephard model with leverage. In particular, we study the effect of approximating the quadratic variation with its predictable compensator. KEY WORDS: quadratic variation, realized variance, volatility swap, afﬁne process, stochastic volatility, leverage effect, Laplace transform approach.

1. INTRODUCTION

Because of growing trading volume, the valuation of options written on the realized variance

(1.1)    \sum_{n=1}^{N} \log(S_{t_n}/S_{t_{n-1}})^2 = \sum_{n=1}^{N} (X_{t_n} - X_{t_{n-1}})^2, \quad 0 = t_0 < \cdots < t_N = T,

of a stock S = S_0 exp(X) has been studied increasingly (cf. Sepp 2008 and the list of references therein). Most work has focused on the continuous time approximation of (1.1) by the quadratic variation [X, X] of the log-price X, since the latter is considerably more tractable from a mathematical point of view. This approach is justified by the fact that the realized variance (1.1) converges to [X, X]_T in probability as the mesh size sup_{k=1,...,N} |t_k − t_{k−1}| tends to zero (see, e.g., Jacod and Shiryaev 2003, I.4.47). Broadie and Jain (2008) and Sepp (2008) confirm that the approximation works quite well for daily fixings t_n. Consequently, we will only deal with the pricing of options written on the quadratic variation [X, X] of some log-price X.

One can broadly divide the existing literature into two distinct categories. If the underlying stock price process is modeled as a continuous semimartingale, Carr and Lee (2008, 2007) as well as Friz and Gatheral (2005) propose model-free valuation approaches based on a replicating portfolio of European options. In the presence of jumps or correlation between stock returns and volatility, one instead has to specify a parametric model. Previous work in this area includes Carr et al. (2005), considering Lévy processes, Benth, Groth, and Kufakunesu (2007), dealing with the model of Barndorff-Nielsen and Shephard (henceforth BNS; 2001) without leverage, as well as Broadie and Jain (2008) and Sepp (2008), which use the Bates model (Bates 1996) resp. the Heston model augmented by specific jumps in stock and volatility. In related work, Schoutens (2005) and Itkin and Carr (2007) price log-contracts resp. options written on the predictable quadratic variation ⟨X, X⟩ in time-changed Lévy models with Cox-Ingersoll-Ross type activity process.

In this paper we study the valuation of options written on quadratic variation in the unifying framework of affine stochastic volatility models. More specifically, we suppose that the stochastic volatility v and the log-price X are modeled as a bivariate affine process in the sense of Duffie, Filipović, and Schachermayer (2003). This class of models encompasses the specific models that have been considered in the context of options on variance so far. Moreover, it includes most other option pricing models that have been proposed in the literature, as for example, the BNS model with leverage and its generalization to time-changed Lévy models by Carr et al. (2003) and Carr and Wu (2004). We show that the affine structure of the stochastic volatility model (v, X) is passed on to (v, X, [X, X]). This in turn allows us to compute the corresponding characteristic function by solving some generalized Riccati equations. With the characteristic function at hand, we can proceed to price options on quadratic variation using Laplace resp. Fourier transform methods as proposed by Carr and Madan (1999) and Raible (2000).

Manuscript received May 2009; final revision received October 2009. Address correspondence to Jan Kallsen, Mathematisches Seminar, Christian-Albrechts-Universität zu Kiel, Westring 383, 24098 Kiel, Germany; e-mail: kallsen@math.uni-kiel.de. DOI: 10.1111/j.1467-9965.2010.00447.x © 2010 Wiley Periodicals, Inc.
Moreover, we value options written on the predictable compensator ⟨X, X⟩ of [X, X], which differs from the quadratic variation in the presence of jumps. Afterwards, we also determine the joint dynamics of the market spanned by the stock and a variance swap. Combined with the stochastic volatility process, this market again turns out to be an affine process. This opens the door to variance-optimal hedging of options on quadratic variation in models with jumps, which is subject to current research. Finally, we present a numerical example for the BNS model with leverage. Here, we investigate to what extent the predictable quadratic variation ⟨X, X⟩ can be used as a proxy for the quadratic variation [X, X] in the context of option pricing.

This paper is organized as follows. We start by recalling the notion of semimartingale characteristics. In Section 3 we introduce our general affine stochastic volatility model. Subsequently, we study the quadratic variation process [X, X] of the log-price X. In Section 5 we likewise investigate the properties of the predictable quadratic variation ⟨X, X⟩ in affine models. We then turn to the pricing of options written on quadratic variation before studying the market consisting of the stock and a variance swap in Section 7. We conclude with a numerical example.

As for stochastic background and terminology, we refer to the monograph of Jacod and Shiryaev (2003). We write C_− for the complex numbers with nonpositive real part. For an R^d-valued Lévy process Y with Lévy-Khintchine triplet (β, γ, κ), we denote by

\psi^Y(u) = u^{\top}\beta + \frac{1}{2} u^{\top}\gamma u + \int \big( e^{u^{\top}x} - 1 - (h(x_1), \ldots, h(x_d))\, u \big)\, \kappa(dx),

the corresponding Lévy exponent, that is, the continuous function ψ^Y : iR^d → C such that E(e^{u^⊤Y_t}) = exp(tψ^Y(u)). Here, h denotes a truncation function on R, as for example, h(x) = x1_{{|x|≤1}}. Likewise, for a process Y which is affine relative to Lévy-Khintchine triplets (β_i, γ_i, κ_i), i = 0, . . . , d, we write ψ^Y_j for the Lévy exponent corresponding to the jth triplet (β_j, γ_j, κ_j).

2. DIFFERENTIAL CHARACTERISTICS

This paper uses semimartingale characteristics to describe the behavior of stochastic processes. For the convenience of the reader we recall a few of the basic notions here. For a more thorough introduction, we refer to Kallsen (2006) and Jacod and Shiryaev (2003).

To any R^d-valued semimartingale Y there is associated a triplet (B, C, ν) of characteristics, where B resp. C denote R^d- resp. R^{d×d}-valued predictable processes and ν a predictable random measure on R_+ × R^d. The first characteristic B depends on a truncation function, as for example, h(x) = x1_{{|x|≤1}}, which is chosen a priori. The characteristics of most processes in applications are absolutely continuous in time, that is, they can be written as

B_t = \int_0^t b_s\, ds, \quad C_t = \int_0^t c_s\, ds, \quad \nu([0, t] \times G) = \int_0^t K_s(G)\, ds \quad \forall G \in \mathcal{B}^d,

with predictable processes b, c and a transition kernel K from (Ω × R_+, P) into (R^d, B^d). In this case we call (b, c, K) the differential or local characteristics of Y. We implicitly assume that (b, c, K) is a good version in the sense that the values of c are nonnegative symmetric matrices, K_s({0}) = 0 and ∫(1 ∧ |x|²) K_s(dx) < ∞.

From an intuitive viewpoint one can interpret differential characteristics as a local Lévy-Khintchine triplet. Very loosely speaking, a semimartingale with differential characteristics (b, c, K) resembles locally after t a Lévy process with triplet (b, c, K)(ω, t), that is, with drift rate b, diffusion matrix c, and jump measure K. Indeed, Y is a Lévy process if and only if the differential characteristics are deterministic and constant, cf. Jacod and Shiryaev (2003, II.4.19).

Affine processes generalize Lévy processes by moving from constant differential characteristics (b, c, K) to affine functions of Y_{t−} in the following sense:

b_t(\omega) = \beta_0 + \sum_{i=1}^{d} Y^i_{t-}(\omega)\beta_i, \quad c_t(\omega) = \gamma_0 + \sum_{i=1}^{d} Y^i_{t-}(\omega)\gamma_i,
K_t(\omega, G) = \kappa_0(G) + \sum_{i=1}^{d} Y^i_{t-}(\omega)\kappa_i(G) \quad \forall G \in \mathcal{B}^d.

Here, (β_i, γ_i, κ_i) denote d + 1 given Lévy-Khintchine triplets on R^d. In order to ensure the existence of a semimartingale Y with the specified characteristics, these triplets cannot be chosen arbitrarily. Rather, they have to satisfy certain admissibility conditions to ensure, for example, that the matrix c remains nonnegative definite. This issue has been investigated in full generality by Duffie, Filipović, and Schachermayer (2003), which shows that given admissibility of the respective triplets, the characteristic function of the respective affine process can be computed by solving some generalized Riccati equations. A reformulation in terms of semimartingale calculus can be found in Kallsen (2006). For our stochastic volatility model in Section 3, admissibility is ensured by the conditions required there.


3. AFFINE STOCHASTIC VOLATILITY MODELS

Our mathematical framework for a frictionless market model is as follows. Fix a terminal time T > 0 and a filtered probability space (Ω, F, (F_t)_{t∈[0,T]}, P). For simplicity, we assume zero interest rates on a risk-free asset S^0 with price S^0_t = 1 for all t ∈ [0, T]. Furthermore, we suppose that the stochastic volatility v and the logarithm of a stock price S = S_0 exp(X) are modeled as a bivariate affine process. This means that the differential characteristics (b, c, K) of the R_+ × R-valued process (v, X) relative to some truncation function (x_1, x_2) → (h(x_1), h(x_2)) on R² are of the form

b = \begin{pmatrix} \beta_0^1 + \beta_1^1 v_- \\ \beta_0^2 + \beta_1^2 v_- \end{pmatrix}, \quad
c = \begin{pmatrix} \gamma_1^{11} v_- & \gamma_1^{12} v_- \\ \gamma_1^{12} v_- & \gamma_0^{22} + \gamma_1^{22} v_- \end{pmatrix},
K(G) = \kappa_0(G) + \kappa_1(G) v_- \quad \forall G \in \mathcal{B}^2.

Here, $(\beta_i, \gamma_i, \kappa_i)$, i = 0, 1, are given Lévy-Khintchine triplets on $\mathbb{R}^2$ which are admissible in the sense that the Lévy measures $\kappa_0, \kappa_1$ are supported on $\mathbb{R}_+ \times \mathbb{R}$ and $\beta_0^1 - \int h(x_1)\,\kappa_0(dx)$ is well defined and positive. Moreover, we assume that $\int_{\{x_1 > 1\}} x_1\,\kappa_1(dx) < \infty$, which ensures that (v, X) does not explode in finite time and hence is a semimartingale in the usual sense (cf. Duffie, Filipović, and Schachermayer 2003, lemma 9.2, theorem 2.12). Finally, we suppose without loss of generality that $X_0$ is normalized to zero.

EXAMPLE 3.1. This class of models includes many specifications that have been proposed in the option pricing literature, for example:

1. Lévy processes X with Lévy-Khintchine triplet (b, c, K), in which case $(\beta_1, \gamma_1, \kappa_1) = 0$ and

\[
\beta_0 = \begin{pmatrix} 0 \\ b \end{pmatrix},
\qquad
\gamma_0 = \begin{pmatrix} 0 & 0 \\ 0 & c \end{pmatrix},
\qquad
\kappa_0(G) = \int 1_G(0, x)\, K(dx) \quad \forall G \in \mathcal{B}^2 .
\]

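2. CIR-time-change models of the form

\[
X_t = L_{\int_0^t v_s\,ds} + \varrho\,(v_t - v_0) + \mu t,
\qquad
dv_t = (\eta - \lambda v_t)\,dt + \sigma\sqrt{v_t}\,dZ_t,
\]

corresponding to

\[
(\beta_0, \gamma_0, \kappa_0) = \left( \begin{pmatrix} \eta \\ \mu + \varrho\eta \end{pmatrix}, 0, 0 \right),
\qquad
\beta_1 = \begin{pmatrix} -\lambda \\ b^L - \varrho\lambda \end{pmatrix},
\qquad
\gamma_1 = \begin{pmatrix} \sigma^2 & \sigma^2\varrho \\ \sigma^2\varrho & \sigma^2\varrho^2 + c^L \end{pmatrix},
\qquad
\kappa_1(G) = \int 1_G(0, x)\, K^L(dx),
\]

for all $G \in \mathcal{B}^2$. Here, η ≥ 0, μ, $\varrho$, λ, σ are constants, L denotes a Lévy process with triplet $(b^L, c^L, K^L)$, and Z an independent Wiener process. Note that we recover the dynamics of the Heston model if L is chosen to be a Brownian motion with drift (cf. Kallsen 2006).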


3. OU-time-change models of the form

\[
X_t = L_{\int_0^t v_s\,ds} + \varrho Z_t + \mu t,
\qquad
dv_t = -\lambda v_{t-}\,dt + dZ_t,
\]

which correspond to

\[
\beta_0 = \begin{pmatrix} b^Z \\ \mu + \varrho b^Z \end{pmatrix},
\qquad
\gamma_0 = 0,
\qquad
\kappa_0(G) = \int 1_G(z, \varrho z)\, K^Z(dz),
\]
\[
\beta_1 = \begin{pmatrix} -\lambda \\ b^L \end{pmatrix},
\qquad
\gamma_1 = \begin{pmatrix} 0 & 0 \\ 0 & c^L \end{pmatrix},
\qquad
\kappa_1(G) = \int 1_G(0, x)\, K^L(dx),
\]

for all $G \in \mathcal{B}^2$. Here, μ, $\varrho$, λ are constants and L resp. Z denote a Lévy process with triplet $(b^L, c^L, K^L)$ resp. an independent subordinator with triplet $(b^Z, 0, K^Z)$. Observe that we obtain the dynamics of the BNS model if L is chosen to be a Brownian motion with drift (cf. Kallsen 2006).
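To make the CIR-time-change specification of Example 3.1 concrete, the following minimal Python sketch assembles the drift vector b and the covariance matrix c of (v, X) at a given volatility level and checks that $\gamma_1$ is nonnegative definite. It assumes that L has no jumps ($K^L = 0$); the function names and all parameter values used with it are hypothetical illustrations, not part of the paper.

```python
def cir_gamma1(sigma, rho, c_L):
    """2x2 matrix gamma_1 of the CIR-time-change model; c = gamma_1 * v."""
    s2 = sigma ** 2
    return [[s2, s2 * rho],
            [s2 * rho, s2 * rho ** 2 + c_L]]


def is_nonnegative_definite(m, tol=1e-12):
    """Check a symmetric 2x2 matrix via its diagonal entries and determinant."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] >= -tol and m[1][1] >= -tol and det >= -tol


def cir_characteristics(v, eta, lam, mu, sigma, rho, b_L, c_L):
    """Drift b and covariance c of (v, X) at volatility level v (assumes K^L = 0)."""
    beta0 = (eta, mu + rho * eta)
    beta1 = (-lam, b_L - rho * lam)
    gamma1 = cir_gamma1(sigma, rho, c_L)
    b = tuple(b0 + b1 * v for b0, b1 in zip(beta0, beta1))
    c = [[entry * v for entry in row] for row in gamma1]
    return b, c
```

The determinant of $\gamma_1$ equals $\sigma^2 c^L$, so the matrix is nonnegative definite for every value of the leverage parameter as long as $c^L \ge 0$.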

4. QUADRATIC VARIATION

In this section, we characterize the stochastic volatility v, the log-stock price X, and the corresponding quadratic variation [X, X] as a trivariate affine process. This in turn leads quickly to the characteristic function and the conditional expectation of [X, X].

LEMMA 4.1. (v, X, [X, X]) is affine w.r.t. the triplets $(\hat\beta_i, \hat\gamma_i, \hat\kappa_i)$, i = 0, 1, given by

\[
\hat\beta_0 = \begin{pmatrix} \beta_0^1 \\ \beta_0^2 \\ \gamma_0^{22} \end{pmatrix},
\qquad
\hat\gamma_0 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \gamma_0^{22} & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
\hat\kappa_0(G) = \int 1_G(x_1, x_2, x_2^2)\,\kappa_0(dx) \quad \forall G \in \mathcal{B}^3,
\]
\[
\hat\beta_1 = \begin{pmatrix} \beta_1^1 \\ \beta_1^2 \\ \gamma_1^{22} \end{pmatrix},
\qquad
\hat\gamma_1 = \begin{pmatrix} \gamma_1^{11} & \gamma_1^{12} & 0 \\ \gamma_1^{12} & \gamma_1^{22} & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
\hat\kappa_1(G) = \int 1_G(x_1, x_2, x_2^2)\,\kappa_1(dx) \quad \forall G \in \mathcal{B}^3,
\]

relative to the truncation function $(x_1, x_2, x_3) \mapsto (h(x_1), h(x_2), 0)$ on $\mathbb{R}^3$.

Proof. By definition, we have $[X, X] = X^2 - X_0^2 - 2X_- \bullet X$. The joint characteristics of (v, X, [X, X]) can therefore easily be derived using Kallsen (2006, propositions 2 and 3).

Since the process (v, X, [X, X]) is affine, its characteristic function can be determined by solving some generalized Riccati equations. To simplify notation and since it suffices for our purposes here, we only compute the characteristic function of [X, X] and leave the analogous derivation of its counterpart for (v, X, [X, X]) to the interested reader.

LEMMA 4.2. For $u \in \mathbb{C}_-$, we have

\[
E\bigl(e^{u[X,X]_T} \,\big|\, \mathcal{F}_t\bigr) = \exp\bigl(\Psi_0(T-t, u) + \Psi_1(T-t, u)\,v_t + u[X,X]_t\bigr),
\]


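where $\Psi_1(\cdot, u)$ is the unique solution to the initial value problem

\[
\frac{\partial}{\partial t}\Psi_1(t,u) = \frac{\gamma_1^{11}}{2}\,\Psi_1^2(t,u) + \beta_1^1\,\Psi_1(t,u) + \gamma_1^{22}\,u + \int \bigl(e^{\Psi_1(t,u)x_1 + u x_2^2} - 1 - \Psi_1(t,u)\,h(x_1)\bigr)\,\kappa_1(dx),
\qquad \Psi_1(0,u) = 0,
\]

and

\[
\Psi_0(t,u) = \int_0^t \Bigl( \beta_0^1\,\Psi_1(s,u) + \gamma_0^{22}\,u + \int \bigl(e^{\Psi_1(s,u)x_1 + u x_2^2} - 1 - \Psi_1(s,u)\,h(x_1)\bigr)\,\kappa_0(dx) \Bigr)\,ds .
\]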

Proof. For $u \in i\mathbb{R}$, the assertion follows from Kallsen (2006, theorem 3.1). The extension to $u \in \mathbb{C}_-$ is a consequence of Duffie, Filipović, and Schachermayer (2003, proposition 6.4).

EXAMPLE 4.3.

1. For a Lévy process X with triplet (b, c, K), we have $\Psi_1(t,u) = 0$ and

\[
\Psi_0(t,u) = t\Bigl(cu + \int (e^{ux^2} - 1)\,K(dx)\Bigr).
\]

This recovers the formula obtained in Carr et al. (2005), where the integral w.r.t. the Lévy measure is computed in closed form for the CGMY Lévy process (cf., e.g., Schoutens 2003). Consequently, the characteristic function of $[X,X]_T$ is known explicitly in this case.

2. For CIR-time-change models, $\Psi_1$ is given as the solution to the classical Riccati ODE

\[
\frac{\partial}{\partial t}\Psi_1(t,u) = \frac{\sigma^2}{2}\,\Psi_1^2(t,u) - \lambda\,\Psi_1(t,u) + (\sigma^2\varrho^2 + c^L)\,u + \int (e^{ux^2} - 1)\,K^L(dx),
\]

with initial condition $\Psi_1(0,u) = 0$, and $\Psi_0(t,u) = \eta\int_0^t \Psi_1(s,u)\,ds$. Disregarding the trivial cases where λ = 0, σ = 0, or L is deterministic (Černý and Kallsen 2008, lemma A.1), straightforward calculations show

\[
\Psi_1(t,u) = \frac{2g(u)\,(e^{f(u)t} - 1)}{f(u) - \lambda + e^{f(u)t}(f(u) + \lambda)},
\qquad
\Psi_0(t,u) = \frac{2\eta}{\sigma^2}\,\log\!\left(\frac{2f(u)\,e^{t(f(u)+\lambda)/2}}{f(u) - \lambda + e^{f(u)t}(f(u) + \lambda)}\right).
\]

Here, log denotes the distinguished logarithm in the sense of Sato (1999, lemma 7.6) and

\[
g(u) := (\sigma^2\varrho^2 + c^L)\,u + \int (e^{ux^2} - 1)\,K^L(dx),
\qquad
f(u) := \sqrt{\lambda^2 - 2\sigma^2 g(u)} .
\]

The square root represents the principal branch with branch cut along the negative real line. Note that this extends the representations from the proof of Lamberton and Lapeyre (1996, proposition 6.2.5) to $\mathbb{C}_-$. If L is chosen to be a Brownian motion with drift (i.e., in the Heston model) or a CGMY Lévy process, all expressions can be evaluated in closed form.


3. For OU-time-change models the situation is more involved because of simultaneous jumps of v and [X, X] for $\varrho \neq 0$. We obtain

\[
\Psi_1(t,u) = \frac{1 - e^{-\lambda t}}{\lambda}\Bigl(c^L u + \int (e^{ux^2} - 1)\,K^L(dx)\Bigr)
\]

as the solution to a linear ODE and

\[
\Psi_0(t,u) = \int_0^t \Bigl( b^Z\,\Psi_1(s,u) + \int \bigl(e^{\Psi_1(s,u)z + u\varrho^2 z^2} - 1 - \Psi_1(s,u)\,h(z)\bigr)\,K^Z(dz) \Bigr)\,ds .
\]

Hence, $\Psi_1$ is known in closed form if L is a Brownian motion with drift (i.e., in the BNS model) or a CGMY Lévy process. Evaluation of the function $\Psi_0$, on the other hand, involves one numerical integration even in the BNS model with leverage. Without leverage, that is, for $\varrho = 0$, we obtain $\Psi_0(t,u) = \int_0^t \psi^Z(\Psi_1(s,u))\,ds$. This integral can be computed in closed form if v is chosen to be, for example, a Gamma-OU or IG-OU process (see, e.g., Nicolato and Venardos 2003 for more details).

By differentiating the characteristic function, we can compute expectations of $[X,X]_T$.

LEMMA 4.4. If $\int_0^T E(v_s)\,ds < \infty$ and $\int x_2^2\,\kappa_i(dx) < \infty$ for i = 0, 1, then $[X,X]_T$ is integrable and

\[
E([X,X]_T \mid \mathcal{F}_t) = \Phi_0(t) + \Phi_1(t)\,v_t + [X,X]_t,
\qquad\text{where}\qquad
\Phi_i(t) := \frac{\partial}{\partial u}\Psi_i(T-t, u)\Big|_{u=0}, \quad i = 0, 1.
\]

Proof. Under the stated assumptions, Lemma 4.1 and Jacod and Shiryaev (2003, II.2.29a, II.2.38) imply that the process [X, X] is a special semimartingale with canonical decomposition

\[
[X,X]_T = x^2 * (\mu^X - \nu^X)_T + \Bigl(\gamma_0^{22} + \int x_2^2\,\kappa_0(dx)\Bigr) T + \Bigl(\gamma_1^{22} + \int x_2^2\,\kappa_1(dx)\Bigr) \int_0^T v_s\,ds .
\]

Here, $\mu^X$ and $\nu^X$ denote the random measure of jumps of X and its compensator (cf. Jacod and Shiryaev 2003 for more details). Since

\[
E\bigl(|x^2 * (\mu^X - \nu^X)_T|\bigr) \le 2E\bigl(x^2 * \nu^X_T\bigr) = 2\Bigl( T\int x_2^2\,\kappa_0(dx) + \int x_2^2\,\kappa_1(dx) \int_0^T E(v_s)\,ds \Bigr),
\]

by Jacod and Shiryaev (2003, II.1.8) and Fubini's theorem, $[X,X]_T$ is integrable. By exchanging integration and differentiation as in the proof of Bauer (2002, Satz 25.2), the second assertion follows.

From the representation in Lemma 4.4, one can infer that $(v_t, X_t, E([X,X]_T \mid \mathcal{F}_t))_{t\in[0,T]}$ is again an affine process, albeit with time-dependent triplets. This is expounded on in Proposition 7.1 below.


EXAMPLE 4.5. In our set of concrete specifications, Lemma 4.4 leads to the following results by applying Bauer (1992, lemma 16.2) to exchange the order of integration and differentiation.

1. Let X be a Lévy process with triplet (b, c, K) and $\int x^2\,K(dx) < \infty$. Then Lemma 4.4 is applicable with $\Phi_1(t) = 0$ and

\[
\Phi_0(t) = (T - t)\Bigl(c + \int x^2\,K(dx)\Bigr).
\]

2. For CIR-time-change models satisfying $\int x^2\,K^L(dx) < \infty$, we obtain

\[
\Phi_1(t) = \frac{1 - e^{-\lambda(T-t)}}{\lambda}\Bigl(\sigma^2\varrho^2 + c^L + \int x^2\,K^L(dx)\Bigr),
\qquad
\Phi_0(t) = \eta\,\frac{e^{-\lambda(T-t)} - 1 + \lambda(T-t)}{\lambda^2}\Bigl(\sigma^2\varrho^2 + c^L + \int x^2\,K^L(dx)\Bigr).
\]

3. Finally, for OU-time-change models with $\int z^2\,K^Z(dz) < \infty$ and $\int x^2\,K^L(dx) < \infty$,

\[
\Phi_1(t) = \frac{1 - e^{-\lambda(T-t)}}{\lambda}\Bigl(c^L + \int x^2\,K^L(dx)\Bigr),
\]
\[
\Phi_0(t) = \frac{e^{-\lambda(T-t)} - 1 + \lambda(T-t)}{\lambda^2}\Bigl(c^L + \int x^2\,K^L(dx)\Bigr)\Bigl(b^Z + \int (z - h(z))\,K^Z(dz)\Bigr) + (T-t)\,\varrho^2\int z^2\,K^Z(dz).
\]
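The relation $\Phi_i(t) = \partial_u \Psi_i(T-t,u)|_{u=0}$ of Lemma 4.4 can be checked numerically for the CIR-time-change model. The Python sketch below evaluates the closed-form expressions of Example 4.3(2) for real u, differentiates them by central differences at u = 0, and compares the result with Example 4.5(2). It assumes the Heston-type case $K^L = 0$; the function names, the step size, and the parameter values in the usage note are hypothetical choices made for illustration.

```python
import math


def psi_cir(t, u, eta, lam, sigma, rho, c_L):
    """Closed-form (Psi_0, Psi_1) for the CIR model, real u, assuming K^L = 0."""
    g = (sigma ** 2 * rho ** 2 + c_L) * u
    f = math.sqrt(lam ** 2 - 2.0 * sigma ** 2 * g)
    denom = f - lam + math.exp(f * t) * (f + lam)
    psi1 = 2.0 * g * (math.exp(f * t) - 1.0) / denom
    psi0 = (2.0 * eta / sigma ** 2) * math.log(
        2.0 * f * math.exp(t * (f + lam) / 2.0) / denom)
    return psi0, psi1


def phi_cir(tau, eta, lam, sigma, rho, c_L):
    """(Phi_0, Phi_1) as functions of time to maturity tau = T - t, K^L = 0."""
    slope = sigma ** 2 * rho ** 2 + c_L
    phi1 = (1.0 - math.exp(-lam * tau)) / lam * slope
    phi0 = eta * (math.exp(-lam * tau) - 1.0 + lam * tau) / lam ** 2 * slope
    return phi0, phi1


def phi_by_finite_difference(tau, h=1e-5, **params):
    """Central difference of (Psi_0, Psi_1) in u at u = 0."""
    up0, up1 = psi_cir(tau, h, **params)
    dn0, dn1 = psi_cir(tau, -h, **params)
    return (up0 - dn0) / (2.0 * h), (up1 - dn1) / (2.0 * h)
```

For instance, with the illustrative values eta=0.08, lam=2.0, sigma=0.3, rho=-0.7, c_L=1.0 and tau=1.0, the two routes should agree up to the discretization error of the finite difference.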

5. PREDICTABLE QUADRATIC VARIATION

For continuous stochastic processes X, the quadratic variation [X, X] and its predictable compensator $\langle X, X\rangle$ coincide. For processes with jumps, the two notions differ, even though they have the same expected value under the assumptions of Lemma 4.4 (cf. Jacod and Shiryaev 2003, I.4.50, I.4.2). For affine models as in Section 3 above, we have the following:

LEMMA 5.1. Suppose $\int x_2^2\,\kappa_i(dx) < \infty$ for i = 0, 1. Then $\langle X, X\rangle$ is well defined and given by

\[
\langle X, X\rangle_t = \Bigl(\gamma_0^{22} + \int x_2^2\,\kappa_0(dx)\Bigr) t + \Bigl(\gamma_1^{22} + \int x_2^2\,\kappa_1(dx)\Bigr) \int_0^t v_s\,ds .
\]

Moreover, the conditional characteristic function of $\langle X, X\rangle_T$ is given by

\[
E\bigl(e^{u\langle X,X\rangle_T} \,\big|\, \mathcal{F}_t\bigr) = \exp\bigl(\Upsilon_0(T-t,u) + \Upsilon_1(T-t,u)\,v_t + u\langle X,X\rangle_t\bigr),
\qquad u \in \mathbb{C}_-,
\]

where $\Upsilon_1(\cdot, u)$ is the unique solution to the initial value problem

\[
\frac{\partial}{\partial t}\Upsilon_1(t,u) = \psi_1^v(\Upsilon_1(t,u)) + \Bigl(\gamma_1^{22} + \int x_2^2\,\kappa_1(dx)\Bigr) u,
\qquad \Upsilon_1(0,u) = 0,
\]

and

\[
\Upsilon_0(t,u) = \int_0^t \psi_0^v(\Upsilon_1(s,u))\,ds + \Bigl(\gamma_0^{22} + \int x_2^2\,\kappa_0(dx)\Bigr) ut .
\]

Proof. The first part of the assertion follows from Jacod and Shiryaev (2003, I.4.52 and II.2.6). As for the second, notice that an application of Kallsen (2006, proposition 2) allows one to compute the joint differential characteristics of $(v, X, \langle X, X\rangle)$. Since this process turns out to be affine, Kallsen (2006, theorem 3.1) yields that the characteristic function of $\langle X, X\rangle$ is of the proposed form.

Notice that $\Upsilon_0, \Upsilon_1$ coincide with $\Psi_0, \Psi_1$ for continuous v and X, that is, for $\kappa_0 = \kappa_1 = 0$. This reflects that $[X,X] = \langle X,X\rangle$ in this case. In the presence of jumps, $\Upsilon_0, \Upsilon_1$ represent a kind of first-order approximation of $\Psi_0, \Psi_1$ for small jumps, since they can be obtained from $\Psi_0, \Psi_1$ by suitable first-order Taylor expansions of the integrands of the Lévy measures $\kappa_0, \kappa_1$. Whether this approximation works well or leads to a significant error depends on the specific model and its parameters. A numerical example is presented in Section 8.

EXAMPLE 5.2. The generalized Riccati ODEs for the characteristic function of $\langle X, X\rangle$ have been thoroughly studied in the context of interest rate theory. In particular, we have the following:

1. $\Upsilon_1(t,u) = 0$ and $\Upsilon_0(t,u) = \bigl(c + \int x^2\,K(dx)\bigr)ut$, if X is a Lévy process with triplet (b, c, K) satisfying $\int x^2\,K(dx) < \infty$.

2. For CIR time-change models with $\int x^2\,K^L(dx) < \infty$, Černý and Kallsen (2008, lemma A.1) yields

\[
\Upsilon_1(t,u) = \frac{2q(u)\,(e^{p(u)t} - 1)}{p(u) - \lambda + e^{p(u)t}(p(u) + \lambda)},
\qquad
\Upsilon_0(t,u) = \frac{2\eta}{\sigma^2}\,\log\!\left(\frac{2p(u)\,e^{t(p(u)+\lambda)/2}}{p(u) - \lambda + e^{p(u)t}(p(u) + \lambda)}\right),
\]

for $q(u) = \bigl(\sigma^2\varrho^2 + c^L + \int x^2\,K^L(dx)\bigr)u$ and $p(u) = \sqrt{\lambda^2 - 2\sigma^2 q(u)}$.

3. In OU time-change models, the ODE for $\Upsilon_1$ is once again linear. We obtain

\[
\Upsilon_1(t,u) = \frac{1 - e^{-\lambda t}}{\lambda}\,u\,\Bigl(c^L + \int x^2\,K^L(dx)\Bigr),
\qquad
\Upsilon_0(t,u) = \int_0^t \psi^Z(\Upsilon_1(s,u))\,ds + \varrho^2 \int z^2\,K^Z(dz)\,ut,
\]

provided that $\int x^2\,K^L(dx) < \infty$ and $\int z^2\,K^Z(dz) < \infty$. Note that if the Lévy exponent $\psi^Z$ of Z is of a suitable form as, for example, for Gamma-OU or IG-OU processes, $\Upsilon_0$ can be evaluated in closed form (cf. Nicolato and Venardos 2003), unlike for quadratic variation.

The characteristic function of the predictable quadratic variation $\langle X, X\rangle$ is considerably easier to compute than its counterpart for the quadratic variation [X, X]. However, in the presence of jumps realized variance converges to the latter rather than the former by Jacod and Shiryaev (2003, I.4.47). We will therefore study the effect of approximating [X, X] with $\langle X, X\rangle$ in the context of option pricing in Section 8 below.

6. PRICING OPTIONS ON QUADRATIC VARIATION

We now turn to the valuation of options written on the quadratic variation of the log-price X. To this end, we henceforth assume that the dynamics of the process (v, X) are modeled directly under a pricing measure Q. We first consider variance swaps with payoff $[X,X]_T - K_{var}$ at time T. Here, the variance swap rate $K_{var}$ is chosen so as to set the initial value $E_Q([X,X]_T - K_{var})$ of the swap equal to zero. Thus we have the following immediate consequence of Lemma 4.4.

COROLLARY 6.1 (Variance Swap). Suppose the conditions of Lemma 4.4 hold. Then

\[
K_{var} = E_Q([X,X]_T) = \Phi_0(0) + \Phi_1(0)\,v_0,
\]

and the price at time t of the variance swap is given by

\[
E_Q([X,X]_T - K_{var} \mid \mathcal{F}_t) = \Phi_0(t) + \Phi_1(t)\,v_t + [X,X]_t - K_{var},
\]

where the functions $\Phi_0, \Phi_1$ are defined as in Lemma 4.4.

Next, we turn to volatility swaps with terminal payoff $\sqrt{[X,X]_T} - K_{vol}$ at time T. As above, the volatility swap rate $K_{vol}$ is chosen so as to set the initial value $E_Q(\sqrt{[X,X]_T} - K_{vol})$ of the contract equal to zero. Here, the nonlinearity introduced by the square root function can be dealt with using the well-known integral representation

\[
\sqrt{x} = \frac{1}{2\sqrt{\pi}} \int_0^\infty \frac{1 - e^{-ux}}{u^{3/2}}\,du, \qquad x \ge 0,
\]

and Fubini's theorem (cf., e.g., Schürger 2002). Combined with Lemma 4.2 this leads to the following:

LEMMA 6.2 (Volatility Swap). Suppose the conditions of Lemma 4.4 hold. Then

\[
K_{vol} = E_Q\bigl(\sqrt{[X,X]_T}\bigr) = \frac{1}{2\sqrt{\pi}} \int_0^\infty \frac{1 - \exp\bigl(\Psi_0(T,-u) + \Psi_1(T,-u)\,v_0\bigr)}{u^{3/2}}\,du,
\]

and the price of the volatility swap at time t satisfies

\[
E_Q\bigl(\sqrt{[X,X]_T} - K_{vol} \,\big|\, \mathcal{F}_t\bigr) = \frac{1}{2\sqrt{\pi}} \int_0^\infty \frac{1 - \exp\bigl(\Psi_0(T-t,-u) + \Psi_1(T-t,-u)\,v_t - u[X,X]_t\bigr)}{u^{3/2}}\,du - K_{vol} .
\]

Next, we turn to puts on variance, which can be evaluated in semi-explicit form using the characteristic function of $[X,X]_T$, Fubini's theorem and the integral representation

\[
(K - x)^+ = \frac{1}{2\pi i} \int_{R - i\infty}^{R + i\infty} e^{zx}\,\frac{e^{-Kz}}{z^2}\,dz = \frac{1}{\pi} \int_0^\infty \mathrm{Re}\!\left( e^{(R+iy)x}\,\frac{e^{-K(R+iy)}}{(R+iy)^2} \right) dy,
\]

for x ≥ 0 and any R < 0 (cf., e.g., Carr and Lee 2008, corollary 7.8 and Rudin 1987, theorem 9.2).
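The integral representation of the square root that underlies Lemma 6.2 can be verified numerically. The Python sketch below is only an illustration: it substitutes $u = s^2$ to remove the singularity at the origin, applies Simpson's rule on a truncated range, and adds the analytic tail $\int_{s_{max}}^\infty 2/s^2\,ds$. The function name, the truncation point, and the grid size are hypothetical choices.

```python
import math


def sqrt_via_laplace(x, s_max=200.0, n=20000):
    """Approximate sqrt(x) by (1/(2 sqrt(pi))) * int_0^inf (1 - exp(-u x)) / u^(3/2) du.

    With u = s^2 the integrand becomes 2 (1 - exp(-s^2 x)) / s^2, bounded at s = 0.
    """
    def integrand(s):
        if s == 0.0:
            return 2.0 * x  # limit of the transformed integrand as s -> 0
        return 2.0 * (-math.expm1(-s * s * x)) / (s * s)

    h = s_max / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(s_max)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    body = total * h / 3.0
    tail = 2.0 / s_max  # valid once exp(-s_max^2 x) is negligible
    return (body + tail) / (2.0 * math.sqrt(math.pi))
```

Replacing $e^{-ux}$ by the conditional Laplace transform of $[X,X]_T$ from Lemma 4.2 under the same quadrature would give an approximation of the volatility swap rate, at the cost of evaluating $\Psi_0$ and $\Psi_1$ at each node.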


LEMMA 6.3 (Variance Put). Let K > 0 and fix R < 0. Then at time t ≤ T, the price of a variance put with payoff $(K - [X,X]_T)^+$ at time T is given by

\[
E_Q\bigl((K - [X,X]_T)^+ \,\big|\, \mathcal{F}_t\bigr) = \frac{1}{\pi} \int_0^\infty \mathrm{Re}\!\left( e^{\Psi_0(T-t,R+iy) + \Psi_1(T-t,R+iy)\,v_t + (R+iy)[X,X]_t}\,\frac{e^{-K(R+iy)}}{(R+iy)^2} \right) dy .
\]

The corresponding formula for variance calls can immediately be obtained using the put-call parity $(x - K)^+ = x - K + (K - x)^+$ combined with Lemmas 4.4 and 6.3. Furthermore, the price formulas for options written on the predictable quadratic variation are easily obtained by inserting $\Upsilon_0$ and $\Upsilon_1$ from Lemma 5.1 in place of $\Psi_0$ resp. $\Psi_1$. Other European options can be dealt with analogously, provided that their payoff admits a suitable integral representation (cf. Carr and Lee 2008 for more details).
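A direct quadrature of the formula in Lemma 6.3 can be sketched as follows. The Python code truncates the integral at a finite upper limit and applies Simpson's rule; the caller supplies the exponent $\Psi_0(T-t,z) + \Psi_1(T-t,z)v_t + z[X,X]_t$ as a function of complex z. As a sanity check it uses the Brownian case without jumps, where $[X,X]_T = cT$ is deterministic and the put is worth $(K - cT)^+$. The function names, the damping R = -1, the truncation point, and the units (strike of order one) are hypothetical choices, not taken from the paper.

```python
import cmath
import math


def variance_put_price(log_cf, strike, R=-1.0, y_max=2000.0, n=40000):
    """Truncated Simpson approximation of the variance put formula.

    log_cf(z) must return Psi_0(T-t, z) + Psi_1(T-t, z) v_t + z [X,X]_t
    for complex z with real part R < 0.
    """
    def integrand(y):
        z = complex(R, y)
        return (cmath.exp(log_cf(z) - strike * z) / (z * z)).real

    h = y_max / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(y_max)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / (3.0 * math.pi)


def deterministic_log_cf(total_variance):
    """Brownian case without jumps: [X,X]_T equals the constant total_variance."""
    return lambda z: z * total_variance
```

The integrand decays only like $1/y^2$, so the truncation error is of order $1/y_{max}$ at worst; in practice R and the truncation point have to be adapted to the scale of the strike.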

7. JOINT DYNAMICS OF STOCK AND VARIANCE SWAP

We now show that the dynamics of the process $(v_t, X_t, E_Q([X,X]_T \mid \mathcal{F}_t))_{t\in[0,T]}$ again turn out to be affine. This opens the door to the computation of hedging strategies trading both the underlying stock and a suitable variance swap.

PROPOSITION 7.1. Suppose the prerequisites of Lemma 4.4 hold and the characteristics of (v, X) are given relative to the truncation function h(x) = x on $\mathbb{R}^2$. Then $(v_t, X_t, E_Q([X,X]_T - K_{var} \mid \mathcal{F}_t))_{t\in[0,T]}$ is affine in v relative to the time-inhomogeneous triplets $(\bar\beta_i(t), \bar\gamma_i(t), \bar\kappa_i(t))$, i = 0, 1, t ∈ [0, T], given by

\[
\bar\beta_0(t) = \begin{pmatrix} \beta_0^1 \\ \beta_0^2 \\ \Phi_0'(t) + \Phi_1(t)\beta_0^1 + \gamma_0^{22} + \int x_2^2\,\kappa_0(dx) \end{pmatrix},
\qquad
\bar\gamma_0(t) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \gamma_0^{22} & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]
\[
\bar\kappa_0(t, G) = \int 1_G\bigl(x_1, x_2, \Phi_1(t)x_1 + x_2^2\bigr)\,\kappa_0(dx) \quad \forall G \in \mathcal{B}^3,
\]
\[
\bar\beta_1(t) = \begin{pmatrix} \beta_1^1 \\ \beta_1^2 \\ \Phi_1'(t) + \Phi_1(t)\beta_1^1 + \gamma_1^{22} + \int x_2^2\,\kappa_1(dx) \end{pmatrix},
\qquad
\bar\gamma_1(t) = \begin{pmatrix} \gamma_1^{11} & \gamma_1^{12} & \Phi_1(t)\gamma_1^{11} \\ \gamma_1^{12} & \gamma_1^{22} & \Phi_1(t)\gamma_1^{12} \\ \Phi_1(t)\gamma_1^{11} & \Phi_1(t)\gamma_1^{12} & \Phi_1^2(t)\gamma_1^{11} \end{pmatrix},
\]
\[
\bar\kappa_1(t, G) = \int 1_G\bigl(x_1, x_2, \Phi_1(t)x_1 + x_2^2\bigr)\,\kappa_1(dx) \quad \forall G \in \mathcal{B}^3,
\]

with respect to the truncation function h(x) = x on $\mathbb{R}^3$.


Proof. This follows from Corollary 6.1, Lemma 4.1, and Itô's formula for semimartingale characteristics (Kallsen 2006, proposition 3). Notice that h(x) = x can be used as the truncation function, since $(v_t, X_t, E_Q([X,X]_T - K_{var} \mid \mathcal{F}_t))_{t\in[0,T]}$ is a special semimartingale by Jacod and Shiryaev (2003, II.2.29a).
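As a consistency check (a formal sketch, not part of the original argument), one can verify that the drift of the third component in Proposition 7.1 vanishes, as it must because $t \mapsto E_Q([X,X]_T \mid \mathcal{F}_t)$ is a martingale. Writing $\Psi_0, \Psi_1$ for the functions of Lemma 4.2 and $\Phi_i(t) = \partial_u \Psi_i(T-t,u)|_{u=0}$ as in Lemma 4.4, differentiating the equations of Lemma 4.2 with respect to u at u = 0, where $\Psi_1(\cdot, 0) = 0$, and using h(x) = x gives, for i = 0, 1,

\[
\frac{\partial}{\partial \tau}\,\partial_u \Psi_i(\tau, 0) = \beta_i^1\,\partial_u \Psi_1(\tau, 0) + \gamma_i^{22} + \int x_2^2\,\kappa_i(dx).
\]

Setting $\tau = T - t$ and using $\Phi_i'(t) = -\partial_\tau\,\partial_u \Psi_i(\tau, 0)|_{\tau = T-t}$ yields

\[
\Phi_i'(t) + \Phi_1(t)\,\beta_i^1 + \gamma_i^{22} + \int x_2^2\,\kappa_i(dx) = 0, \qquad i = 0, 1,
\]

so the drift of the third component is zero for every level of v. The interchange of differentiation and integration is assumed here without further justification.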

8. NUMERICAL ILLUSTRATION

We now show how to apply our results to the BNS-Gamma-OU model with leverage. This means that the Lévy process L in the OU-time-change model is chosen to be a Brownian motion, which implies in particular that $c^L = 1$ and $K^L = 0$. Moreover, the background driving Lévy process Z is assumed to be a compound Poisson process with exponentially distributed jumps, that is,

\[
K^Z(dz) = 1_{(0,\infty)}(z)\,ab\,e^{-bz}\,dz,
\]

for constants a, b > 0, and $b^Z = 0$ relative to h = 0, because Z is constant between jumps. In view of Corollary 6.1 and Example 4.5, it follows that the variance swap rate $K_{var}$ is given by

\[
K_{var} = \frac{1 - e^{-\lambda T}}{\lambda}\,v_0 + \frac{e^{-\lambda T} - 1 + \lambda T}{\lambda^2}\,\frac{a}{b} + \varrho^2\,\frac{2a}{b^2}\,T .
\]

Moreover, Example 4.3 yields

\[
\Psi_1(t,u) = \frac{1 - e^{-\lambda t}}{\lambda}\,u
\]

as well as

\[
\Psi_0(t,u) = ab \int_0^t \int_0^\infty \bigl(e^{\Psi_1(s,u)z + u\varrho^2 z^2} - 1\bigr)\,e^{-bz}\,dz\,ds
= \frac{ab}{2\sqrt{-\varrho^2 u}} \int_0^t U\!\left(\frac{1}{2}, \frac{1}{2}, \frac{(b - \Psi_1(s,u))^2}{-4\varrho^2 u}\right) ds - at,
\]

where U denotes the hypergeometric U function. By Lemmas 6.2 and 6.3, the volatility swap rate $K_{vol}$ and prices of puts on quadratic variation can therefore be computed by numerically performing a nested integration.

For comparison, we also consider the value of a volatility swap resp. a put option written on the predictable quadratic variation. For the Gamma-OU process, we have $\psi^Z(u) = au/(b - u)$ by, for example, Schoutens (2003, 5.5.1). Insertion into Lemma 5.1 yields

\[
\Upsilon_1(t,u) = \frac{1 - e^{-\lambda t}}{\lambda}\,u,
\]
\[
\Upsilon_0(t,u) = \int_0^t \frac{a\,\Upsilon_1(s,u)}{b - \Upsilon_1(s,u)}\,ds + \varrho^2 \int z^2\,K^Z(dz)\,ut
= \frac{a}{b\lambda - u}\left( b \log\!\left(\frac{b - \Upsilon_1(t,u)}{b}\right) + ut \right) + \varrho^2\,\frac{2a}{b^2}\,ut,
\]

after an elementary integration. Here, log denotes the distinguished logarithm in the sense of Sato (1999, lemma 7.6). Consequently, the swap rate for the volatility swap and prices of puts on the predictable quadratic variation can be computed by performing a single numerical integration. As for parameters, we use

a = 1.4338, b = 11.6641, λ = 0.5783, $\varrho$ = −1.2606, $v_0$ = 0.0145,

obtained by Schoutens (2003) by calibrating the BNS-Gamma-OU model to a set of 75 call options on the S&P500. The results are shown in Figures 8.1 and 8.2. The predictable volatility swap rate consistently overestimates the unpredictable one. Similarly, the prices of puts on predictable quadratic variation are always smaller than those for its unpredictable counterpart in our setup. Hence, one should be wary about using $\langle X, X\rangle$ as a proxy for [X, X] to obtain simpler formulas.

FIGURE 8.1. Swap rates in the BNS-Gamma-OU model with leverage. [Plot: variance swap, volatility swap, and predictable volatility swap rates, in volatility points, against maturity T in years.]

FIGURE 8.2. Put prices in the BNS-Gamma-OU model with leverage for strike K = 50 (top), 40 (middle), 30 (bottom) volatility points. [Plot: variance put prices and predictable variance put prices, in volatility points, against maturity T in years.]
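The closed-form quantities of this section are easy to evaluate. The following Python sketch computes the variance swap rate $K_{var}$ for the quoted parameter set and checks the closed-form expression for $\Upsilon_0$ against a direct Simpson quadrature of $\int_0^t \psi^Z(\Upsilon_1(s,u))\,ds + \varrho^2 (2a/b^2)\,ut$ for real u < 0. The function names, the grid size, and the restriction to real arguments are illustrative assumptions; the code does not attempt to reproduce the figures.

```python
import math

# Parameters quoted in the text (calibration reported by Schoutens 2003).
A, B, LAM, RHO, V0 = 1.4338, 11.6641, 0.5783, -1.2606, 0.0145


def variance_swap_rate(T, a=A, b=B, lam=LAM, rho=RHO, v0=V0):
    """K_var = Phi_1(0) v_0 + Phi_0(0) in the BNS-Gamma-OU model with leverage."""
    decay = math.exp(-lam * T)
    return ((1.0 - decay) / lam * v0
            + (decay - 1.0 + lam * T) / lam ** 2 * a / b
            + rho ** 2 * 2.0 * a / b ** 2 * T)


def upsilon1(t, u, lam=LAM):
    return (1.0 - math.exp(-lam * t)) / lam * u


def upsilon0_closed(t, u, a=A, b=B, lam=LAM, rho=RHO):
    """Closed-form Upsilon_0 for real u < 0."""
    y = upsilon1(t, u, lam)
    return (a / (b * lam - u) * (b * math.log((b - y) / b) + u * t)
            + rho ** 2 * 2.0 * a / b ** 2 * u * t)


def upsilon0_quadrature(t, u, n=2000, a=A, b=B, lam=LAM, rho=RHO):
    """Simpson quadrature of psi^Z(Upsilon_1(s, u)) plus the leverage term."""
    def integrand(s):
        y = upsilon1(s, u, lam)
        return a * y / (b - y) + rho ** 2 * 2.0 * a / b ** 2 * u

    h = t / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(t)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0
```

Calling `variance_swap_rate(T)` for a range of maturities gives the variance swap rate as a function of T, and inserting `upsilon0_closed` and `upsilon1` into the integral of Lemma 6.2 would give the predictable volatility swap rate with a single numerical integration.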


REFERENCES

BARNDORFF-NIELSEN, O., and N. SHEPHARD (2001): Non-Gaussian Ornstein-Uhlenbeck-Based Models and Some of Their Uses in Financial Economics, J. R. Stat. Soc., Ser. B 63, 167–241.
BATES, D. (1996): Jumps and Stochastic Volatility: Exchange Rate Processes Implicit in Deutsche Mark Options, Rev. Financial Stud. 9, 69–107.
BAUER, H. (1992): Maß- und Integrationstheorie, 2nd ed., Berlin: de Gruyter.
BAUER, H. (2002): Wahrscheinlichkeitstheorie, 5th ed., Berlin: de Gruyter.
BENTH, F., M. GROTH, and R. KUFAKUNESU (2007): Valuing Volatility and Variance Swaps for a Non-Gaussian Ornstein-Uhlenbeck Stochastic Volatility Model, Appl. Math. Finance 14, 347–363.
BROADIE, M., and A. JAIN (2008): The Effects of Jumps and Discrete Sampling on Volatility and Variance Swaps, Int. J. Theor. Appl. Finance 11, 761–797.
CARR, P., H. GEMAN, D. MADAN, and M. YOR (2003): Stochastic Volatility for Lévy Processes, Math. Finance 13, 345–382.
CARR, P., H. GEMAN, D. MADAN, and M. YOR (2005): Pricing Options on Realized Variance, Finance Stoch. 9, 453–475.
CARR, P., and R. LEE (2007): Realized Volatility and Variance: Options via Swaps, RISK 20, 76–83.
CARR, P., and R. LEE (2008): Robust Replication of Volatility Derivatives, Preprint.
CARR, P., and D. MADAN (1999): Option Valuation Using the Fast Fourier Transform, J. Comp. Finance 2, 61–73.
CARR, P., and L. WU (2004): Time-Changed Lévy Processes and Option Pricing, J. Finan. Econ. 71, 113–141.
ČERNÝ, A., and J. KALLSEN (2008): Mean-Variance Hedging and Optimal Investment in Heston's Model with Correlation, Math. Finance 18, 473–492.
DUFFIE, D., D. FILIPOVIĆ, and W. SCHACHERMAYER (2003): Affine Processes and Applications in Finance, Annal. Appl. Prob. 13, 984–1053.
FRIZ, P., and J. GATHERAL (2005): Valuation of Volatility Derivatives as an Inverse Problem, Quant. Finance 5, 531–542.
ITKIN, A., and P. CARR (2007): Pricing Swaps and Options on Quadratic Variation under Stochastic Time Change Models, in Presentation at the 14th Annual CAP Workshop, Columbia University.
JACOD, J., and A. SHIRYAEV (2003): Limit Theorems for Stochastic Processes, 2nd ed., Berlin: Springer.
KALLSEN, J. (2006): A Didactic Note on Affine Stochastic Volatility Models, in From Stochastic Calculus to Mathematical Finance, Y. Kabanov, R. Liptser, and J. Stoyanov, eds. Berlin: Springer, pp. 343–368.
LAMBERTON, D., and B. LAPEYRE (1996): Stochastic Calculus Applied to Finance, London: Chapman & Hall.
NICOLATO, E., and E. VENARDOS (2003): Option Pricing in Stochastic Volatility Models of the Ornstein-Uhlenbeck Type, Math. Finance 13, 445–466.
RAIBLE, S. (2000): Lévy Processes in Finance: Theory, Numerics, and Empirical Facts, Dissertation Universität Freiburg i. Br.
RUDIN, W. (1987): Real and Complex Analysis, 3rd ed., New York: McGraw-Hill.
SATO, K. (1999): Lévy Processes and Infinitely Divisible Distributions, Cambridge: Cambridge University Press.
SCHOUTENS, W. (2003): Lévy Processes in Finance, New York: Wiley.
SCHOUTENS, W. (2005): Moment Swaps, Quant. Finance 5, 525–530.
SCHÜRGER, K. (2002): Laplace Transforms and Suprema of Stochastic Processes, in Advances in Finance and Stochastics, Berlin: Springer, pp. 285–294.
SEPP, A. (2008): Pricing Options on Realized Variance in Heston Model with Jumps in Returns and Variance, J. Comp. Finance 11, 33–70.

Mathematical Finance, Vol. 21, No. 4 (October 2011), 703–722

RELAXED UTILITY MAXIMIZATION IN COMPLETE MARKETS SARA BIAGINI University of Pisa PAOLO GUASONI Boston University

For a relaxed investor—one whose relative risk aversion vanishes as wealth becomes large—the utility maximization problem may not have a solution in the classical sense of an optimal payoff represented by a random variable. This nonexistence puzzle was discovered by Kramkov and Schachermayer (1999), who introduced the reasonable asymptotic elasticity condition to exclude such situations. Utility maximization becomes well posed again representing payoffs as measures on the sample space, including those allocations singular with respect to the physical probability. The expected utility of such allocations is understood as the maximal utility of its approximations with classical payoffs—the relaxed expected utility. This paper decomposes relaxed expected utility into its classical and singular parts, represents the singular part in integral form, and proves the existence of optimal solutions for the utility maximization problem, without conditions on the asymptotic elasticity. Key to this result is the Polish space structure assumed on the sample space. KEY WORDS: utility maximization, asymptotic elasticity, integral representation.

1. INTRODUCTION

The problem of maximizing expected utility from a set of payoffs of price x:

\[
\max_{p(X) = x} E_P[U(X)] \tag{UM}
\]

is central to asset pricing and portfolio choice. If the market is complete (i.e., $p(X) = E_Q[X]$ for some pricing measure Q), the typical solution starts from the Euler equation:

\[
U'(X) = y\,\frac{dQ}{dP}, \tag{1.1}
\]

which aligns marginal utility with the state price density dQ/dP. If the Lagrange multiplier y satisfies the saturation condition $E_Q[X] = x$, then the payoff $X^*(y) = (U')^{-1}(y\,dQ/dP)$ is optimal for the problem (UM). This argument is so common that passing from a solution of (1.1) to a solution of (UM) is considered almost automatic.

We thank Alberto Abbondandolo, Luigi Ambrosio, and Giuseppe Buttazzo for stimulating discussions, and the two anonymous referees for their careful reading of the paper. This work was supported by the National Science Foundation under grants DMS-0532390 and DMS-0807994. Manuscript received May 2009; final revision received January 2010. Address correspondence to Paolo Guasoni, Department of Mathematics and Statistics, Boston University, 111 Cummington St, Boston, MA 02215, USA; e-mail: guasoni@bu.edu. DOI: 10.1111/j.1467-9965.2010.00451.x © 2010 Wiley Periodicals, Inc.


S. BIAGINI AND P. GUASONI

Checking the condition $E_Q[X] = x$ seems a formality, to be skipped if the actual value of y is not required. Yet, the argument may fail. For certain combinations of the utility function U, the state price density dQ/dP, and the initial capital x, none of the payoffs $X^*(y)$ satisfies $E_Q[X^*(y)] = x$, and the problem (UM) has no solution, a phenomenon first discovered by Kramkov and Schachermayer (1999, example 5.2). Indeed, they show the existence of a solution under the asymptotic elasticity condition:

\[
AE(U) = \limsup_{x \uparrow \infty} \frac{x\,U'(x)}{U(x)} < 1,
\]

which has a clear interpretation in terms of asymptotic relative risk aversion:

\[
ARRA(U) = \lim_{x \uparrow \infty} -\frac{x\,U''(x)}{U'(x)} .
\]

When this limit exists, de l'Hôpital's rule implies that the condition AE(U) < 1 is equivalent to ARRA(U) > 0, that is, relative risk aversion is bounded away from zero for arbitrarily large wealth levels. Thus, optimal payoffs may not exist for utility functions which are asymptotically relatively risk neutral, that is, ARRA(U) = 0. It is tempting to dismiss such examples as mathematical pathologies without economic substance. After all, common utility functions such as the logarithmic, power, and exponential utilities, and in general the HARA (Hyperbolic Absolute Risk Aversion) class, do satisfy ARRA(U) > 0. These utility functions are ubiquitous in finance, and a condition violated by them seems of little interest. However, power utilities themselves lead to a utility function satisfying ARRA(U) = 0 in heterogeneous preferences equilibria. In a model with several agents with individual constant relative risk aversion (i.e., power utility), Benninga and Mayshar (2000) and Cvitanic and Malamud (2008) show that the utility function of the representative agent has decreasing relative risk aversion, which converges, for large levels of wealth, to the value of the least risk-averse agent. Thus, the presence of agents with arbitrarily low relative risk aversion implies that ARRA(U) = 0. This paper studies the utility maximization problem for complete markets, relaxing the assumption AE(U) < 1. The central idea is that the topological structure on the sample space makes it possible to obtain a solution even in the critical case AE(U) = 1. In all models of interest, the sample space is already endowed with such a topology, but the classical theory of utility maximization discards topological information, focusing on the measurable structure alone. This loss of information is inconsequential if AE(U) < 1: then a random variable X that maximizes expected utility always exists.
But if AE(U) = 1 and the initial capital x exceeds some critical value $x^*$, then the agent may achieve maximal utility by concentrating capital on events of arbitrarily small probability. Thus, the candidate optimum would allocate finite capital on a set of probability zero. Alas, expected utility neglects null sets, and cannot account for such singular allocation. The topology on Ω resolves this problem by identifying available payoffs with Radon measures μ of mass less than or equal to x, the space of relaxed payoffs. Then the contribution to expected utility of $\mu = \mu_a + \mu_s$ splits into two parts. The classical expected utility $E_P[U(X)]$ accounts for the component $d\mu_a = X\,dQ$, absolutely continuous with respect to Q. The component $\mu_s$, singular with respect to Q, leads to the novel term $\int \varphi\,d\mu_s$, which credits the concentration of capital on null sets for its contribution to


expected utility. The “singular utility” ϕ depends on both the utility function U and on the pricing measure Q. This paper contributes to mathematical finance by resolving the nonexistence puzzle of Kramkov and Schachermayer (1999) in complete markets, proving the existence of a solution in a larger space of payoffs, and it clarifies the structure of the expected utility and its maximizers. Mathematically, the main result is Theorem 2.3, which can be read as an integral representation of the utility functional. In comparison to similar results in the literature, applications to mathematical finance require more flexibility on the sample space Ω, which is assumed to be Polish, but not necessarily locally compact. The rest of the paper is organized as follows: Section 2 summarizes the assumptions and the main results, discussing their significance. Section 3 proves the integral representation result, and is probably the most technical part of the paper. The utility maximization result is proved in Section 4, whereas the last section contains examples and counterexamples which show the relevance of the results and their assumptions.
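The asymptotic elasticity and the asymptotic relative risk aversion discussed in this introduction can be illustrated for the logarithmic and power utilities mentioned above. The Python sketch below evaluates $xU'(x)/U(x)$ and $-xU''(x)/U'(x)$ at a large wealth level; the function names, the exponent p = 0.5, and the choice of wealth level are hypothetical, and a single evaluation point only suggests the limit rather than proving it.

```python
import math


def elasticity(U, dU, x):
    """x U'(x) / U(x); its limsup as x grows is AE(U)."""
    return x * dU(x) / U(x)


def relative_risk_aversion(dU, d2U, x):
    """-x U''(x) / U'(x); its limit as x grows is ARRA(U)."""
    return -x * d2U(x) / dU(x)


P_EXPONENT = 0.5  # hypothetical power-utility exponent in (0, 1)

# Each tuple holds (U, U', U'').
POWER = (lambda x: x ** P_EXPONENT / P_EXPONENT,
         lambda x: x ** (P_EXPONENT - 1.0),
         lambda x: (P_EXPONENT - 1.0) * x ** (P_EXPONENT - 2.0))
LOGARITHMIC = (math.log,
               lambda x: 1.0 / x,
               lambda x: -1.0 / x ** 2)
```

For the power utility the elasticity equals p and the relative risk aversion equals 1 − p at every wealth level, while for the logarithmic utility the elasticity $1/\log x$ decays to zero and the relative risk aversion is identically one; both cases therefore satisfy AE(U) < 1 and ARRA(U) > 0.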

2. SUMMARY OF RESULTS

Let (Ω, T) be a Polish space, and P a Borel-regular probability on the Borel σ-field F. The set of payoffs C(x) available with initial capital x is defined in terms of the pricing measure Q, implying that the market is complete:

\[
C(x) := \{X \in L^0_+ \mid E_Q[X] \le x\} \quad \text{for } x > 0,
\]

where Q is equivalent to P. The paper makes the following assumptions:

ASSUMPTION 2.1.
(i) The utility function U : (0, +∞) → (−∞, +∞) is strictly increasing, strictly concave, continuously differentiable, and satisfies the Inada conditions $U'(0^+) = +\infty$ and $U'(+\infty) = 0$.
(ii) $\sup_{X \in C(x)} E_P[U(X)] < U(\infty)$.
(iii) P (and hence Q) has full support, that is, P(G) > 0 for any open set G.

Condition (i) means that marginal utility spans the whole range (0, ∞): appetites change smoothly. Condition (ii) is a well-posedness condition: bliss utility cannot be reached. Condition (iii) means that Ω includes only relevant events. It does not restrict generality, in that (iii) always holds after replacing Ω with the support of P. The pricing probability Q makes it possible to identify each classical payoff X with the finite Borel measure $d\mu_X = X\,dQ$, defined by $\mu_X(A) = E_Q[X 1_A]$. With this identification, the expected utility map $X \mapsto I_U(X)$ has the expression:

\[
I_U(X) := E_P[U(X)] = \int U\!\left(\frac{d\mu_X}{dQ}(\omega)\right) dP(\omega) = \int U\!\left(\frac{d\mu_X}{dQ}(\omega)\right) \frac{dP}{dQ}(\omega)\,dQ(\omega).
\]

Kramkov and Schachermayer (1999) show with counterexamples that the original problem (UM) may not have a solution if AE(U) = 1. In a complete market, they show that maximizing sequences $(X_n)_{n \ge 1}$ may concentrate capital on “cheap” Arrow-Debreu securities, on which $\frac{dQ}{dP}(\omega) \approx 0$. Such securities, which yield a large payoff X on an event of tiny probability, seem superficially irrelevant for utility maximization, as the marginal utility U'(X) decreases to zero for large payoffs. However, because the contribution to expected utility is driven by $U'(X(\omega))\,\frac{dP}{dQ}(\omega)$, it may still remain positive on those events where $\frac{dP}{dQ}(\omega)$ is unbounded. See Schachermayer (2002) for a further discussion of this phenomenon. This scenario baffles the existing mathematical theory in two ways. First, the utility map loses its upper semicontinuity with respect to maximizing sequences, as the utilities of maximizing payoffs are no longer uniformly integrable. Second, the purely measure theoretic setting (Ω, F, P) becomes inadequate to represent singular capital allocations. If a maximizing sequence $(X_n)_{n \ge 0} \subset L^1(Q)$ converges to a Dirac delta on some ω, this delta is a natural candidate for a maximizer. On the other hand, if P(ω) = 0, removing ω from the original Ω leads to an equivalent model where no such candidate exists. Thus, a solution may or may not exist, depending on the initial choice of the sample space Ω. This paper starts from the observation that in most models the sample space is already equipped with a topological structure. For example, in diffusion models Ω is the Wiener space endowed with the uniform topology, while discontinuous models lead to the Skorokhod space. Furthermore, these topologies are compatible with a complete separable metric, so they are Polish spaces. The Polish space structure makes it possible to identify payoffs as measures. This perspective is economically straightforward, thinking of Ω as a roulette table, and of a payoff as a distribution of chips on the various numbers. The payoffs $\mu_X$ of the form $d\mu_X = X\,dQ$ are a subclass of the norm dual space $(C_b(\Omega))^*$, which is isometric to rba(Ω), the space of Borel regular, finitely additive signed measures on Ω (Dunford and Schwartz 1988, IV.6). Each element μ ∈ rba(Ω) admits the unique three-way decomposition:

(2.1)

μ = μa + μs + μ p ,

where $\mu_a$ and $\mu_s$ are countably additive measures, respectively absolutely continuous and singular with respect to Q (and P), and $\mu_p$ is a purely finitely additive measure. All three components are Borel regular. Because $rba(\Omega)$ is the dual of a Banach space, its bounded sets, including sequences of available payoffs in C(x), are relatively weak star compact. This property is crucial, as it yields limits to maximizing sequences.

DEFINITION 2.2. A relaxed payoff is an element of D(x), the weak star $\sigma(rba(\Omega), C_b(\Omega))$ closed set $\{\mu \in rba(\Omega)_+ \mid \mu(\Omega) \le x\}$.

The disadvantage of D(x) is that it includes purely finitely additive measures, which have a dubious interpretation as payoffs. By contrast, countably additive measures, including those singular with respect to Q, allow the usual Arrow-Debreu interpretation of bets paying off in certain states of nature. This paper resolves this issue by allowing a priori all relaxed payoffs, including finitely additive ones. Then, an additional assumption (Assumption 2.4) implies a posteriori that the optimal payoff is countably additive.

The relaxed utility map $I_U : rba(\Omega) \to [-\infty, +\infty)$, defined on $rba(\Omega)$, is the upper semicontinuous envelope of the original $I_U$:

$$I_U(\mu) = \inf\{G(\mu) \mid G : rba(\Omega) \to [-\infty, +\infty),\ G \text{ weak}^* \text{ u.s.c.},\ G \ge I_U \text{ on } L^1(Q)\}.$$

Because the relaxed utility map $I_U$ is weak star upper semicontinuous by definition, and the space of relaxed payoffs D(x) is weak star compact, the relaxed utility maximization problem:

$$\max_{\mu \in D(x)} I_U(\mu) \tag{RUM}$$

RELAXED UTILITY MAXIMIZATION IN COMPLETE MARKETS


admits a solution by construction. In addition, as shown below, the problems (UM) and (RUM) have the same value:

$$\sup_{X \in C(x)} I_U(X) = \max_{\mu \in D(x)} I_U(\mu).$$

Then, the challenge is to find a "concrete" representation for $I_U$, that is an explicit formula for the relaxed utility map. This task, accomplished in Section 3, involves two additional concepts: the singular utility ϕ and the sup-convolution W. These concepts in turn rely on the convex conjugate of the utility function U, which is now discussed.

The convex conjugate of U is the function $V : \mathbb{R} \to (-\infty, +\infty]$ defined as $V(y) = \sup_{x>0}(U(x) - xy)$, so that $V(y) = +\infty$ for $y < 0$. The nonnegative function ϕ is defined as:

$$\varphi(\omega) = \inf\Big\{ g(\omega) \;\Big|\; g \in C_b(\Omega),\ E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big] < \infty \Big\}, \tag{2.2}$$

which is upper semi-continuous, because it is the infimum of a family of continuous functions. Assumption 2.1 (ii) implies that ϕ is finite valued. Indeed, Kramkov and Schachermayer (1999, theorem 2.0 (i)) show that this assumption is equivalent to the existence of some $y > 0$ such that $E_P[V(y\frac{dQ}{dP})] < +\infty$. Thus, $\varphi \le y$. Then $W : \Omega \times \mathbb{R}_+ \to \mathbb{R}$ is defined as the pointwise sup-convolution of the utility function U and of the random function $x \mapsto x\varphi(\omega)\frac{dQ}{dP}(\omega)$:

$$W(\omega, x) := \sup_{z \le x}\Big( U(z) + (x - z)\,\varphi(\omega)\frac{dQ}{dP}(\omega)\Big). \tag{2.3}$$

The main result on integral representation is then:

THEOREM 2.3. Let $\mu \in rba(\Omega)_+$, and Q be a probability fully supported on $\Omega$ and equivalent to P.

(i) In general,

$$I_U(\mu) = E_P\Big[W\Big(\cdot, \frac{d\mu_a}{dQ}\Big)\Big] + \int \varphi\, d\mu_s + \inf_{f \in C_b(\Omega),\ E_P[V(f\frac{dQ}{dP})] < \infty} \mu_p(f). \tag{2.4}$$

(ii) If $\varphi = 0$ P-a.s., then

$$I_U(\mu) = E_P\Big[U\Big(\frac{d\mu_a}{dQ}\Big)\Big] + \int \varphi\, d\mu_s + \inf_{f \in C_b(\Omega),\ E_P[V(f\frac{dQ}{dP})] < \infty} \mu_p(f). \tag{2.5}$$

(iii) If $\limsup_{x\uparrow\infty} \frac{xU'(x)}{U(x)} < 1$, then $\{\varphi = 0\} = \Omega$ and

$$I_U(\mu) = E_P\Big[U\Big(\frac{d\mu_a}{dQ}\Big)\Big]. \tag{2.6}$$

This result is understood as follows. The general formula (i) holds for any $\mu \in rba(\Omega)_+$, but does not have a sound economic interpretation, because it involves the finitely additive part $\mu_p$ and the sup-convolution W, which differs from the original utility function U. Formula (ii) resolves the second issue, showing that W boils down to U if ϕ is almost surely null. Example 5.3 in Section 5 shows with a counterexample that U and W may differ without this additional assumption.

Then the relaxed utility is the sum of three parts: the usual expected utility $E[U(X)]$, where $X = \frac{d\mu_a}{dQ}$; the contribution of the purely finitely additive part $\mu_p$; and the term $\int \varphi\, d\mu_s$, which allows the interpretation of singular utility, because it accounts for the utility from the concentration of wealth on P-null events, in that $\varphi(\omega)$ represents the maximal expected utility from a Dirac delta concentrated at ω. Indeed, ϕ vanishes at each ω where $dP/dQ$ is locally bounded (i.e., bounded in a neighborhood of ω), because concentrating wealth is suboptimal if the odds are finite. On the other hand, concentration of wealth may yield a positive utility $\varphi(\omega)$ at those ω where $dP/dQ$ is unbounded, that is, if the odds are arbitrarily good. The value of $\varphi(\omega)$ depends on the speed at which $dP/dQ$ explodes near ω.

Finally, formula (iii) reconciles the theorem with the result of Kramkov and Schachermayer (1999), who show the existence of a classical solution under the asymptotic elasticity assumption $AE(U) = \limsup_{x\uparrow\infty} \frac{xU'(x)}{U(x)} < 1$. Indeed, this assumption implies that ϕ is zero everywhere (and not merely almost surely), whence the additional terms vanish, and the expected utility function depends only on $X = \frac{d\mu_a}{dQ}$.

If AE(U) = 1, the condition $\varphi = 0$ P-a.s. and Assumption 2.1 are not sufficient to guarantee that any optimizer $\mu^*$ of (RUM) is a measure, i.e., $\mu_p^* = 0$. Example 5.4 makes this point with a counterexample. This problem is resolved by the next assumption, which rules out the purely finitely additive part. Mathematically, it is simply a coercivity condition on the singular utility ϕ. From an economic viewpoint, it ensures that exceptionally favorable states (i.e., those with high ϕ) do not disperse outside the compact sets of $\Omega$. The assumption trivially holds if $\Omega$ is compact (and not merely Polish), but compactness is too stringent an assumption to encompass typical models.

ASSUMPTION 2.4.
Denoting by $y_0 = \sup_{\omega\in\Omega} \varphi(\omega)$, assume that either $y_0 = 0$, or there exist $\varepsilon > 0$ and $g \in C_b(\Omega)$ such that the closed set $K = \{g \ge y_0 - \varepsilon\}$ is compact and $E_P[V(g\frac{dQ}{dP})] < \infty$.

To state the main result on utility maximization, define u as the value function of the utility maximization problem (UM)

$$u(x) = \sup\{E_P[U(X)] \mid E_Q[X] \le x\}$$

and let v be its conjugate: $v(y) = \sup_{x>0}\{u(x) - xy\}$. Finally, set $x_0 = \lim_{y\downarrow y_0} -v'(y) = -v'_+(y_0)$. Then, $x_0 \in (0, +\infty]$ is the capital threshold above which the optimal payoff includes a singular component.

THEOREM 2.5. If Assumptions 2.1 and 2.4 hold, and $\varphi = 0$ a.s., it follows that:

(i) $u(x) = \max_{\mu\in D(x)} I_U(\mu)$.

(ii) $\mu^* = \mu_a^* + \mu_s^*$, and

$$u(x) = E[U(X^*)] + \int \varphi\, d\mu_s^*,$$

where $X^* = \frac{d\mu_a^*}{dQ}$. $X^*$ is unique, and the budget constraint is binding: $\mu^*(\Omega) = E_Q[X^*] + \mu_s^*(\Omega) = x$. ϕ attains its maximum, and the support of any $\mu_s^*$ satisfies $\mathrm{supp}(\mu_s^*) \subseteq \mathrm{argmax}(\varphi)$.

(iii) Optimizers depend on the initial capital x as follows:

(a) $x \le x_0$ (Kramkov and Schachermayer 1999, theorem 2.0). The unique solution $\mu^*$ is absolutely continuous with respect to Q, and

$$X^*(x) = \frac{d\mu_a^*}{dQ} = (U')^{-1}\Big(y(x)\frac{dQ}{dP}\Big),$$

where $y(x) = (v')^{-1}(-x)$.

(b) $x > x_0$. Any solution has the form $\mu^* = \mu_a^* + \mu_s^*$, where $X^*(x) = \frac{d\mu_a^*}{dQ} = X^*(x_0) = (U')^{-1}(y_0\frac{dQ}{dP})$ and $\mu_s^*(\Omega) = x - x_0$. Therefore, $u(x) = u(x_0) + (x - x_0)\max_\omega \varphi(\omega) = u(x_0) + (x - x_0)y_0$.

The novelty of this theorem is the existence of optimal solutions, and their description in the singular case: when $x_0$ is finite and $x > x_0$, it is optimal to invest the residual capital $x - x_0$ in a very unlikely, but also very favorable, bet $\mu_s^*$. Such a bet is not unique in general, because its contribution to expected utility is linear and therefore multiple solutions arise as soon as $\mathrm{argmax}(\varphi)$ has more than one element.

3. REPRESENTATION OF RELAXED UTILITY

This section proves Theorem 2.3, the representation formula for the relaxed utility map $I_U$. The argument proceeds in three steps: (i) separate in $I_U$ the countably additive part from the purely finitely additive part (Lemma 3.2); (ii) find an integral representation for the countably additive part, separating the absolutely continuous and the singular components with respect to Q (Proposition 3.6); (iii) identify the absolutely continuous part as the original expected utility map, and the singular part as an "asymptotic utility" (Lemmas 3.4 and 3.9).

The convex conjugate $J_V : C_b(\Omega) \to (-\infty, +\infty]$ of the expected utility map $I_U$ is

$$J_V(g) := \sup_{X \in L^1(Q)} \big( I_U(X) - E_Q[gX] \big). \tag{3.1}$$

The proper domain of $J_V$ is defined as $\mathrm{Dom}\,J_V = \{g \in C_b(\Omega) : E[V(g\,dQ/dP)] < \infty\}$. The next lemma collects some properties of the conjugate functional $J_V$.

LEMMA 3.1.
(i) $J_V(g) = E_P[V(g\frac{dQ}{dP})]$;
(ii) $\mathrm{Dom}(J_V) = \{g \in C_b(\Omega) \mid E_P[V(g\frac{dQ}{dP})] < +\infty\}$ is not empty, is contained in $C_b(\Omega)_+$, and is directed downward;
(iii) $\varphi(\omega) = \inf_{g\in \mathrm{Dom}(J_V)} g(\omega)$ defines a random variable, which is positive, bounded, and upper semicontinuous. In addition, there exists a decreasing sequence $(g_k)_{k\ge 1} \subset \mathrm{Dom}(J_V)$ such that $g_k(\omega) \downarrow \varphi(\omega)$ for all ω.

Proof. $L^1(Q)$ is decomposable (i.e., $f 1_A + g 1_{\Omega\setminus A} \in L^1(Q)$ for any $f, g \in L^1(Q)$ and $A \in \mathcal{F}$), therefore (i) follows from Rockafellar (1974, theorem 21, part a). Because $\mathrm{Dom}(V) \subseteq \mathbb{R}_+$, then $\mathrm{Dom}(J_V) \subset C_b(\Omega)_+$ and it is not empty by Assumption 2.1 (ii) (as


already noted in the discussion after equation (2.2)). Thus, the pointwise infimum ϕ of the family of continuous, bounded, nonnegative functions $\mathrm{Dom}(J_V)$ is well-defined, nonnegative, bounded, and upper semicontinuous. Also, $\mathrm{Dom}(J_V)$ is directed downward, because $g \wedge f \in \mathrm{Dom}(J_V)$ if $g, f \in \mathrm{Dom}(J_V)$:

$$E_P\Big[V\Big((g\wedge f)\frac{dQ}{dP}\Big)\Big] = E_P\Big[V\Big(g\frac{dQ}{dP}\Big)1_{\{g\le f\}}\Big] + E_P\Big[V\Big(f\frac{dQ}{dP}\Big)1_{\{f<g\}}\Big] < +\infty.$$

Moreover, the space $C_b(\Omega)$ has the countable supremum property (Aliprantis and Border 2006, theorem 8.22). This, combined with the directed-downward property, implies the existence of a monotone sequence $(g_k)_{k\ge 1}$ in $\mathrm{Dom}(J_V)$ such that $g_k \ge \varphi$ and $g_k \downarrow \varphi$ pointwise.

An application of the Hahn–Banach separation theorem (see, e.g., Borwein and Lewis 2006, theorem 4.2.8) ensures that the relaxation $I_U$ coincides with the biconjugate functional $(I_U)^{**} : rba(\Omega) \to [-\infty, +\infty)$, which is defined as:

$$(I_U)^{**}(\mu) = \inf_{g\in C_b(\Omega)} \Big( \mu(g) + E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big] \Big). \tag{3.2}$$

The infimum over $C_b(\Omega)$ in this formula can be replaced by the infimum over $\mathrm{Dom}(J_V)$. As $(I_U)^{**} = I_U = -\infty$ whenever μ is not positive, the results in the rest of the section are stated only for $\mu \in rba(\Omega)_+$. The following lemma proves the first part of Theorem 2.3, which states that the relaxation is additive across the Yosida and Hewitt (1952) decomposition $\mu = \mu_c + \mu_p$ in terms of the countably additive part $\mu_c = \mu_a + \mu_s$, and the purely finitely additive part $\mu_p$. Because $\Omega$ is a Polish space, any measure $\mu = \mu_c \in rba(\Omega)_+$ is a Radon measure, that is compact-inner regular (Aliprantis and Border 2006, theorem 12.7). By contrast, any purely finitely additive $\mu = \mu_p$ vanishes on compact sets (Aliprantis and Border 2006, theorem 12.4). This contrasting behavior makes it possible to separate the contributions of $\mu_c$ and $\mu_p$ in the relaxation (3.3).

LEMMA 3.2. Let $\mu \in rba(\Omega)_+$. Then

$$I_U(\mu) = I_U(\mu_c) + \inf_{f\in \mathrm{Dom}(J_V)} \mu_p(f). \tag{3.3}$$

Proof. The inequality ≥ follows from $I_U = (I_U)^{**}$ and from the inequality

$$E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big] + \mu(g) \ge E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big] + \mu_c(g) + \inf_{f\in \mathrm{Dom}(J_V)} \mu_p(f) \ge (I_U)^{**}(\mu_c) + \inf_{f\in \mathrm{Dom}(J_V)} \mu_p(f).$$

For the opposite inequality, note that $(P + \mu_c)$ is a Radon measure. Hence there exists an increasing sequence of compact sets $K^n$ such that $(P + \mu_c)(\Omega\setminus K^n) < \frac{1}{n}$. By contrast, $\mu_p(K^n) = 0$ for all n because $\mu_p$ is purely finitely additive. Thus, $\mu_p$ is concentrated on $\Omega\setminus K^n$. The Borel-regularity of $\mu_p$ implies the existence of closed sets $C^n \subseteq \Omega\setminus K^n$ such that

$$\mu_p(\Omega\setminus C^n) < \frac{1}{n}.$$

In the Polish space $\Omega$, disjoint closed sets can be separated by continuous functions. That is, there exists a continuous function $\alpha^n : \Omega \to [0, 1]$ which is equal to 1 on $K^n$ and 0 on $C^n$. In fact, if d is a distance that induces the topology $\mathcal{T}$ on $\Omega$, one such function is

$$\alpha^n(\omega) = \frac{d(\omega, C^n)}{d(\omega, C^n) + d(\omega, K^n)}.$$
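The separating function can be tried out numerically. The following is a minimal sketch, assuming for illustration that the space is the real line with distance |ω − ω'| and that K and C are small finite (hence closed) disjoint sets; the names `dist`, `alpha`, `K` and `C` are hypothetical and not taken from the paper.

```python
def dist(w, S):
    """Distance from the point w to a finite nonempty set S of reals."""
    return min(abs(w - s) for s in S)


def alpha(w, K, C):
    """Evaluate d(w, C) / (d(w, C) + d(w, K)) for disjoint closed sets K and C.

    The denominator never vanishes because no point lies in both sets.
    """
    d_c = dist(w, C)
    d_k = dist(w, K)
    return d_c / (d_c + d_k)


# Illustrative disjoint sets (assumptions, chosen only for the example).
K = [0.0, 0.5]
C = [2.0, 3.0]
```

On these sets `alpha` equals 1 at every point of `K`, equals 0 at every point of `C`, and takes values strictly between 0 and 1 elsewhere, which is the behavior the proof relies on.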

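Up to a subsequence, $\alpha^n$ converges to 1 $(P + \mu_c)$-a.s. Fix some $f, g \in \mathrm{Dom}(J_V)$, and set $h^n = \alpha^n g + (1 - \alpha^n) f$. Convexity of V and boundedness of $\alpha^n$ imply $h^n \in \mathrm{Dom}(J_V)$, because

$$E_P\Big[V\Big(h^n\frac{dQ}{dP}\Big)\Big] \le E_P\Big[\alpha^n V\Big(g\frac{dQ}{dP}\Big)\Big] + E_P\Big[(1-\alpha^n) V\Big(f\frac{dQ}{dP}\Big)\Big] < +\infty.$$

Also, because $h^n - f = \alpha^n(g - f)$, and $0 \le \alpha^n \le 1$,

$$\mu_p(h^n) \le \frac{1}{n}\|g - f\|_\infty + \mu_p(f).$$

It follows that

$$(I_U)^{**}(\mu) \le E_P\Big[V\Big(h^n\frac{dQ}{dP}\Big)\Big] + \mu(h^n) \le E_P\Big[\alpha^n V\Big(g\frac{dQ}{dP}\Big)\Big] + E_P\Big[(1-\alpha^n) V\Big(f\frac{dQ}{dP}\Big)\Big] + \mu_c(h^n) + \frac{1}{n}\|g - f\|_\infty + \mu_p(f)$$

and passing to the liminf,

$$\begin{aligned}
(I_U)^{**}(\mu) &\le \liminf_{n\uparrow\infty}\Big( E_P\Big[V\Big(h^n\frac{dQ}{dP}\Big)\Big] + \mu(h^n)\Big)\\
&\le \lim_{n\uparrow\infty}\Big( E_P\Big[\alpha^n V\Big(g\frac{dQ}{dP}\Big)\Big] + E_P\Big[(1-\alpha^n) V\Big(f\frac{dQ}{dP}\Big)\Big] + \mu_c(h^n) + \frac{1}{n}\|g - f\|_\infty + \mu_p(f)\Big)\\
&= E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big] + \mu_c(g) + \mu_p(f),
\end{aligned}$$

where the liminf in the second line becomes a limit, because $\alpha^n$ is bounded and converges to 1 $(P + \mu_c)$-a.s., hence the dominated convergence theorem applies. Thus,

$$(I_U)^{**}(\mu) \le \inf_{f,g\in \mathrm{Dom}(J_V)} \Big( E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big] + \mu_c(g) + \mu_p(f)\Big) = (I_U)^{**}(\mu_c) + \inf_{f\in \mathrm{Dom}(J_V)} \mu_p(f),$$

which completes the proof.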

REMARK 3.3. It is tempting to replace the expression $\inf_{f\in \mathrm{Dom}(J_V)} \mu_p(f)$ with the simpler $\mu_p(\varphi)$, that is, exchange the infimum and the integral. However, because $\mu_p$ is not countably additive, only the inequality $\mu_p(\varphi) \le \inf_{f\in \mathrm{Dom}(J_V)} \mu_p(f)$ holds in general. Example 5.4 shows a situation where $\varphi = 0$ but $\inf_{f\in \mathrm{Dom}(J_V)} \mu_p(f) > 0$.

Denote the countably additive elements of $rba(\Omega)_+$ simply by $M_+$, the subset of positive Radon measures. The next step is to prove an integral representation for $I_U(\mu)$ when $\mu \in M_+$. This result extends in part the work of Bouchitté and Valadier (1988), who consider a locally compact space $\Omega$. Relaxing this assumption is central in mathematical finance where sample spaces are typically infinite-dimensional. Recall the definition of $W : \Omega\times\mathbb{R}_+ \to \mathbb{R}$, the ω-wise sup-convolution of the utility function U and of the random function $(\omega, x) \mapsto x\varphi(\omega)\frac{dQ}{dP}(\omega)$:

$$W(\omega, x) := \sup_{z\le x}\Big( U(z) + (x - z)\,\varphi(\omega)\frac{dQ}{dP}(\omega)\Big). \tag{3.4}$$

The sup-convolution W may differ from U only on the event $\{\varphi\frac{dQ}{dP} > 0\}$.

LEMMA 3.4. $\{\omega \mid W(\omega, x) = U(x) \text{ for all } x > 0\} = \{\omega \mid \varphi(\omega)\frac{dQ}{dP}(\omega) = 0\}$.

Proof. If $\varphi(\omega)\frac{dQ}{dP}(\omega) = 0$, then $W(\omega, x) = U(x)$ from the definition of W. Vice versa, observe that if $W(\omega, x) = U(x)$ for all $x > 0$, then

$$U'(z) - \varphi(\omega)\frac{dQ}{dP}(\omega) \ge 0 \quad \text{for all } z > 0,$$

and the claim follows from the Inada condition $U'(\infty) = 0$:

$$0 = \lim_{z\to+\infty} U'(z) \ge \varphi(\omega)\frac{dQ}{dP}(\omega).$$

LEMMA 3.5. If $\mu \in M_+$, then

$$I_U(\mu) = \sup_{X_n \overset{*}{\to} \mu}\ \limsup_{n\uparrow\infty} I_U(X_n),$$

where the supremum is taken over all sequences $(X_n)_n$ that weak star converge to μ.

Proof. The relaxation $I_U$ is defined as the upper semicontinuous envelope of $I_U$, hence (cf. Buttazzo 1989, proposition 1.3.1):

$$I_U(\mu) = \sup_{X_\alpha \overset{*}{\to} \mu}\ \limsup_{\alpha} I_U(X_\alpha),$$

where the supremum is taken over all nets $(X_\alpha)_{\alpha\in I}$ converging in weak star sense to μ. Because the trace of the weak star topology on norm bounded subsets of $M_+$ is metrizable (e.g., by the Dudley distance, cf. Ambrosio, Gigli, and Savaré 2008, section 5.1), nets can be replaced by sequences for $\mu \in M_+$.

PROPOSITION 3.6. Let $\mu \in M_+$, so that $\mu = \mu_c = \mu_a + \mu_s$. Then

$$I_U(\mu) = E_P\Big[W\Big(\cdot, \frac{d\mu_a}{dQ}\Big)\Big] + \int \varphi\, d\mu_s. \tag{3.5}$$

Proof. By (3.2) and Lemma 3.5, the relaxed functional satisfies

$$I_U(\mu) = \sup_{X_n \overset{*}{\to} \mu}\ \limsup_{n\uparrow\infty} I_U(X_n) = (I_U)^{**}(\mu) = \inf_{g\in C_b(\Omega)} \Big( \mu(g) + E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big] \Big).$$

Consider a maximizing sequence $(X_n)_{n\ge 1}$ for $I_U(\mu)$. As $d\mu_n := X_n\,dQ$ converges to μ in the weak star topology, $(X_n)_{n\ge 1}$ is bounded in $L^1(Q)$. Up to a sequence of convex combinations, which preserves the maximizing property by concavity of $I_U$, the Komlós Theorem implies that $(X_n)_{n\ge 1}$ converges Q-a.s. to some positive random variable Z. Lemma 3.7 implies that $Z \le \frac{d\mu_a}{dQ}$. For any $g \in \mathrm{Dom}(J_V)$, the pointwise Fenchel inequality $U(x) \le xy + V(y)$ yields

$$U(X_n) - X_n g\frac{dQ}{dP} \le V\Big(g\frac{dQ}{dP}\Big).$$

Passing to the limsup of the expectations, Fatou's Lemma implies that

$$\limsup_{n\uparrow\infty} E_P\Big[U(X_n) - X_n g\frac{dQ}{dP}\Big] \le E_P\Big[U(Z) - Zg\frac{dQ}{dP}\Big] \le E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big].$$

Because $(X_n)_{n\ge 1}$ is maximizing, and $E[X_n g\frac{dQ}{dP}] = E_Q[X_n g]$ converges to $\mu(g)$,

$$I_U(\mu) - \mu(g) \le E_P\Big[U(Z) - Zg\frac{dQ}{dP}\Big] \le E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big].$$

Adding $\mu(g)$ to all members above and decomposing $\mu = \mu_a + \mu_s$ gives

$$I_U(\mu) \le E_P\Big[U(Z) + \Big(\frac{d\mu_a}{dQ} - Z\Big) g\frac{dQ}{dP}\Big] + \mu_s(g) \le E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big] + \mu(g),$$

which holds for any $g \in \mathrm{Dom}(J_V)$. Take now the infimum on g in the above chain. Lemma 3.1 (iii) and the Monotone Convergence Theorem ensure that the infimum can be taken within the expectation signs in the middle term. Then

$$I_U(\mu) \le E_P\Big[U(Z) + \Big(\frac{d\mu_a}{dQ} - Z\Big)\varphi\frac{dQ}{dP}\Big] + \mu_s(\varphi) \le (I_U)^{**}(\mu), \tag{3.6}$$

which implies that both inequalities are in fact equalities. Thus, it remains to prove that

$$E_P\Big[U(Z) + \Big(\frac{d\mu_a}{dQ} - Z\Big)\varphi\frac{dQ}{dP}\Big] = E_P\Big[W\Big(\cdot, \frac{d\mu_a}{dQ}\Big)\Big].$$

For any $g \in \mathrm{Dom}(J_V)$ and $0 \le z \le x$

$$U(z) + (x - z)\varphi\frac{dQ}{dP} \le U(z) + (x - z) g\frac{dQ}{dP} \le V\Big(g\frac{dQ}{dP}\Big) + x g\frac{dQ}{dP},$$

where the first inequality is due to $\varphi \le g$ and the second one is an application of the Fenchel inequality, $U(z) - zy \le V(y)$. Therefore,

$$U(z) + (x - z)\varphi\frac{dQ}{dP} \le W(\omega, x) = \sup_{z\le x}\Big( U(z) + (x - z)\varphi\frac{dQ}{dP}\Big) \le V\Big(g\frac{dQ}{dP}\Big) + x g\frac{dQ}{dP}.$$

Substituting x with $\frac{d\mu_a}{dQ}$, z with Z in the first term on the left, and taking expectations

$$E_P\Big[U(Z) + \Big(\frac{d\mu_a}{dQ} - Z\Big)\varphi\frac{dQ}{dP}\Big] \le E_P\Big[W\Big(\cdot, \frac{d\mu_a}{dQ}\Big)\Big] \le E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big] + \mu_a(g). \tag{3.7}$$

Thus, combining (3.7) with (3.6), the following holds for any $g \in \mathrm{Dom}(J_V)$:

$$I_U(\mu) = E_P\Big[U(Z) + \Big(\frac{d\mu_a}{dQ} - Z\Big)\varphi\frac{dQ}{dP}\Big] + \mu_s(\varphi) \le E_P\Big[W\Big(\cdot, \frac{d\mu_a}{dQ}\Big)\Big] + \mu_s(\varphi) \le E_P\Big[V\Big(g\frac{dQ}{dP}\Big)\Big] + \mu_a(g) + \mu_s(g),$$

whence the conclusion (3.5). Moreover, $U(Z) + (\frac{d\mu_a}{dQ} - Z)\varphi\frac{dQ}{dP} = W(\cdot, \frac{d\mu_a}{dQ})$ almost surely, whence the pointwise limit Z of the maximizing $(X_n)_n$ verifies

$$Z = \frac{d\mu_a}{dQ} \wedge (U')^{-1}\Big(\varphi\frac{dQ}{dP}\Big) \quad \text{a.s.}$$

LEMMA 3.7. Let $(X_n)_{n\ge 1}$ be a bounded sequence in $L^1_+(Q)$, such that $X_n$ converges to X almost surely, and weak star to $\mu \in rba(\Omega)$. Then $X \le \frac{d\mu_a}{dQ}$ almost surely.

Proof. Note first that $\mu \ge 0$, $X \ge 0$ and $X \in L^1(Q)$ by Fatou's Lemma. By the compact-inner regularity of the measure $\mu_a + \mu_s$, it suffices to show that

$$E_Q[I_K X] \le (\mu_a + \mu_s)(K) \quad \text{for all compact sets } K.$$

Indeed, because the inequality holds for all compact sets K, it also holds for all Borel sets B, whence $E_Q[I_B X] \le (\mu_a + \mu_s)(B)$, and in particular $X \le \frac{d\mu_a}{dQ}$ Q-a.s. To this end, proceed similarly to the proof of Lemma 3.2. Consider a compact K. For any $h \ge 1$ there exists a closed set $C_h \subseteq K^c$ with $\mu_p(C_h) \ge \mu_p(\Omega) - \frac{1}{h}$. Consider now a continuous function $g_h^K$ such that $0 \le g_h^K \le 1$, $g_h^K = 1$ on K, $g_h^K = 0$ on $C_h$ and $g_h^K \to 1_K$ pointwise. Then, for all $h \ge 1$

$$E_Q[I_K X] \le E_Q[g_h^K X] \le \lim_{n\uparrow\infty} E_Q[g_h^K X_n] = \mu(g_h^K),$$

where the second inequality is a consequence of Fatou's Lemma, while the equality follows from the weak star convergence of $(X_n)_{n\ge 1} \subset L^1(Q)$ to μ. By construction, $\mu_p(g_h^K) \le \mu_p(\Omega\setminus C_h) \le \frac{1}{h}$, whence

$$E_Q[I_K X] \le (\mu_a + \mu_s)(g_h^K) + \frac{1}{h},$$

and the conclusion follows passing to the limit as $h\uparrow\infty$.

REMARK 3.8. The inequality $X \le \frac{d\mu_a}{dQ}$ can be strict. Ball and Murat (1989, example 2) give an example in which $X = 0$ and $\frac{d\mu_a}{dQ} = 1$.

It only remains now to put the pieces together.


Proof of Theorem 2.3. (i) follows from Lemma 3.2 and Proposition 3.6. Also, $\varphi = 0$ a.s. implies that $W(\omega, x) = U(x)$ almost surely, whence (ii) follows from (i) and Lemma 3.4. To show (iii), recall that if AE(U) < 1 holds, $E_P[V(y\,dQ/dP)] < +\infty$ for all constants $y > 0$ (Kramkov and Schachermayer 2003, note 2). Then $\varphi = 0$ everywhere on $\Omega$, hence both $\int \varphi\, d\mu_s$ and $\inf_{f\in \mathrm{Dom}(J_V)} \mu_p(f)$ vanish:

$$\inf_{f\in \mathrm{Dom}(J_V)} \mu_p(f) \le \inf_{y>0} \mu_p(y) = \mu_p(\Omega)\inf_{y>0} y = 0.$$

Denote by $F \subset \Omega$ the set where $\frac{dP}{dQ}$ is essentially locally bounded (i.e., bounded in a neighborhood)

$$F := \Big\{\omega \;\Big|\; \operatorname{ess\,sup}_{\omega'\in U} \frac{dP}{dQ}(\omega') < \infty \text{ for some open } U \ni \omega\Big\}.$$

The complementary set $F^c := \Omega\setminus F$ is the set of the poles of $\frac{dP}{dQ}$, the points at which $\frac{dP}{dQ}$ is unbounded. By definition, F is open, so $F^c$ is closed. The following lemma shows that ϕ may be positive only on poles.

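LEMMA 3.9. $F \subset \{\varphi = 0\}$, hence $\{\varphi > 0\} \subset F^c$.

Proof. If $\omega^* \in F$, there exists an open ball $B(\omega^*, \varepsilon) \subset F$ such that $\frac{dP}{dQ} \le m$ a.s. on $B(\omega^*, \varepsilon)$. Consider $\bar y > 0$ large enough, so that $E_P[V(\bar y\frac{dQ}{dP})] < \infty$, and for any $y \in (0, \bar y)$ consider the continuous bounded function $g_y = y\alpha + \bar y(1 - \alpha)$, where

$$\alpha(\omega) = \frac{d(\omega, \Omega\setminus B(\omega^*, \varepsilon))}{d(\omega, B(\omega^*, \frac{\varepsilon}{2})) + d(\omega, \Omega\setminus B(\omega^*, \varepsilon))}. \tag{3.8}$$

Because $\alpha \in [0, 1]$ by construction, $y \le g_y \le \bar y$. In addition, $g_y(\omega) = y$ for $\omega \in B(\omega^*, \varepsilon/2)$ and $g_y(\omega) = \bar y$ for $\omega \in \Omega\setminus B(\omega^*, \varepsilon)$. To prove that $g_y \in \mathrm{Dom}(J_V)$, split the integral $J_V(g_y) = E[V(g_y\frac{dQ}{dP})]$ as

$$E\Big[V\Big(g_y\frac{dQ}{dP}\Big) I_{B(\omega^*,\varepsilon)}\Big] + E\Big[V\Big(g_y\frac{dQ}{dP}\Big) I_{\Omega\setminus B(\omega^*,\varepsilon)}\Big] \le V\Big(\frac{y}{m}\Big) P(B(\omega^*, \varepsilon)) + E\Big[V\Big(\bar y\frac{dQ}{dP}\Big) I_{\Omega\setminus B(\omega^*,\varepsilon)}\Big] < \infty,$$

where the inequality holds because V is decreasing and $g_y\frac{dQ}{dP} \ge \frac{y}{m}$ on $B(\omega^*, \varepsilon)$, while $g_y = \bar y$ outside the ball and $\bar y$ was chosen so that $E_P[V(\bar y\frac{dQ}{dP})] < \infty$. By definition of ϕ

$$\varphi(\omega^*) = \inf_{g\in \mathrm{Dom}(J_V)} g(\omega^*),$$

and from $g_y(\omega^*) = y$ for every $y \in (0, \bar y)$, the conclusion $\varphi(\omega^*) = 0$ follows.

COROLLARY 3.10. If $P(F^c) = 0$, then $\varphi = 0$ a.s.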

4. PROOF OF THEOREM 2.5

The dual value function v was defined after Assumption 2.4 as the convex conjugate of the value function u(x). Kramkov and Schachermayer (1999) show that v(y) coincides with $E[V(y\frac{dQ}{dP})]$, and is therefore the restriction of $J_V$ to the constant functions. v is also differentiable for $y > y_0$. Recall that $x_0 = \lim_{y\downarrow y_0} -v'(y) = -v'_+(y_0)$. The next lemma shows an alternative characterization of u(x).

LEMMA 4.1.

$$\inf_{g\in C_b(\Omega)} \big(J_V(g) + \|g\|_\infty x\big) = \inf_{y>0} \big(xy + v(y)\big) = u(x).$$

Proof. Only the left equality needs a proof, the other one following from Kramkov and Schachermayer (1999). The inequality ≤ is obvious. To see the reverse inequality, observe that $\mathrm{Dom}(J_V) \subseteq C_b(\Omega)_+$ and V is decreasing. Thus, for all $g \ge 0$

$$J_V(g) + x\|g\|_\infty \ge v(\|g\|_\infty) + x\|g\|_\infty, \tag{4.1}$$

which completes the proof.

LEMMA 4.2. Let $D(x) = \{\mu \in rba(\Omega)_+ \mid \mu(\Omega) \le x\}$ as in Definition 2.2, and let $\delta_{D(x)}$ be the indicator of D(x). Then its conjugate and biconjugate satisfy

$$(\delta_{D(x)})^*(g) = \sup_{\mu\in rba(\Omega)} \big(\mu(g) - \delta_{D(x)}(\mu)\big) = \|g^+\|_\infty\, x,$$

$$(\delta_{D(x)})^{**}(\mu) = \delta_{D(x)}(\mu).$$

Proof. As $\mu(g) \le \mu(g^+)$ for any positive μ, the supremum in the formula for the conjugate is reached on the μ in D(x) with support contained in $\{g \ge 0\}$. Thus, without loss of generality suppose $g \ge 0$. The inequality $(\delta_{D(x)})^*(g) \le \|g\|_\infty x$ follows from the definition of D(x). To show that equality holds, fix an arbitrary $\epsilon > 0$. The upper level set $A = \{g > \|g\|_\infty - \epsilon\}$ is open. Because Q has full support, $Q(A) > 0$. Then $\tilde\mu$ defined by $d\tilde\mu = x\frac{1_A}{Q(A)}\,dQ$ is in $C(x) \subseteq D(x)$ and $\tilde\mu(g) > x(\|g\|_\infty - \epsilon)$, whence $(\delta_{D(x)})^*(g) \ge x(\|g\|_\infty - \epsilon)$ for all $\epsilon$. The original convex functional $\delta_{D(x)}$ is already weak star lower semicontinuous, because D(x) is weak star closed. Therefore, it coincides with its lower semicontinuous envelope $(\delta_{D(x)})^{**}$.

Assumption 2.4 is used for the first time in Lemma 4.3.

LEMMA 4.3. If Assumption 2.4 holds, then $\mathrm{argmax}\,\varphi$ is compact, and

$$y_0 = \max_{\omega\in\Omega} \varphi(\omega). \tag{4.2}$$

Proof. Set $c = \inf_k \|g_k\|_\infty$, where $(g_k)_{k\ge 1}$ is a sequence decreasing to ϕ, which exists by Lemma 3.1 (iii). As shown in (4.1), $\|g_k\|_\infty \in \mathrm{Dom}(J_V)$ for all k, so

$$c \ge y_0 \ge \sup_{\omega\in\Omega} \varphi(\omega),$$

where the last inequality follows from the definitions of $y_0$ and ϕ. To prove (4.2), we show that $c = \max \varphi$. Up to replacing $(g_k)_{k\ge 1}$ with $(g\wedge g_k)_{k\ge 1}$, assume that the g in Assumption 2.4 is one of the $g_k$, say $g_{k^*}$. Then there is a compact upper level of $g_{k^*}$ of the form $K = \{g_{k^*} \ge y_0 - \varepsilon^*\}$. As $c \ge y_0$, K contains the closed set $K^* = \{g_{k^*} \ge c - \varepsilon^*\}$, which is in turn compact. Outside $K^*$

$$\varphi \le g_{k^*} < c - \varepsilon^*. \tag{4.3}$$

As $K^*$ is compact and ϕ is u.s.c., it attains its maximum on $K^*$. $K^*$ contains all the non-empty, closed sets with the finite intersection property:

$$V_{k,\epsilon} = \{g_k \ge c - \epsilon\} \quad \text{for all } k \ge k^*,\ \epsilon < \varepsilon^*.$$

Therefore, their intersection $Y := \bigcap_{k,\epsilon} V_{k,\epsilon}$ is not empty, compact, and consists of all the points $\omega^*$ where $\lim_k g_k(\omega^*) = c$. But $\lim_k g_k(\omega^*) = \varphi(\omega^*)$, so $c = y_0 = \max_{\omega\in\Omega} \varphi$ and $Y = \mathrm{argmax}\,\varphi$.

Proof of Theorem 2.5. (i) It suffices to show that

$$\inf_{g\in C_b(\Omega)} \big(J_V(g) + \|g\|_\infty x\big) = \max_{\mu\in D(x)} I_U(\mu).$$

Then the claim follows from Lemma 4.1 and the duality formula $u(x) = \inf_{y>0}(xy + v(y))$ (cf. Kramkov and Schachermayer 1999, theorem 2.0). Because $\mathrm{Dom}(J_V) \subseteq C_b(\Omega)_+$,

$$\inf_{g\in C_b(\Omega)} \big(J_V(g) + \|g\|_\infty x\big) = \inf_{g\in C_b(\Omega)} \big(J_V(g) + \|g^+\|_\infty x\big) = \inf_{g\in C_b(\Omega)} \big(J_V(g) + (\delta_{D(x)})^*(g)\big),$$

where the last equality follows by Lemma 4.2. This lemma and the Fenchel Theorem (Brezis 1983, chapter 1) yield the identity:

$$\inf_{g\in C_b(\Omega)} \big(J_V(g) + (\delta_{D(x)})^*(g)\big) = \max_{\mu\in D(x)} I_U(\mu).$$

In fact, the Fenchel Theorem implies that

$$\inf_{g\in C_b(\Omega)} \big(J_V(g) + (\delta_{D(x)})^*(g)\big) = \max_{\mu\in rba(\Omega)} \big(-(J_V)^*(-\mu) - (\delta_{D(x)})^{**}(\mu)\big).$$

Now, by definition $(J_V)^*(\mu) = \sup_{g\in C_b(\Omega)}\{\mu(g) - J_V(g)\}$ and thus $-(J_V)^*(-\mu) = (I_U)^{**}(\mu) = I_U(\mu)$, while $(\delta_{D(x)})^{**}(\mu) = \delta_{D(x)}(\mu)$.

(ii) The constraint is binding because $I_U$ is monotone. To prove that any optimal $\mu^*$ must be a measure, consider the formula

$$I_U(\mu^*) = E_P\Big[U\Big(\frac{d\mu_a^*}{dQ}\Big)\Big] + \int \varphi\, d\mu_s^* + \inf_{f\in \mathrm{Dom}(J_V)} \mu_p^*(f).$$

Suppose that $\mu_p^* \ne 0$, say $0 < \mu_p^*(\Omega) = \tilde x \le x$. Using (4.3), the contribution of the purely finitely additive $\mu_p^*$ to the (optimal) value $I_U(\mu^*)$ is bounded above by

$$\inf_{f\in \mathrm{Dom}(J_V)} \mu_p^*(f) \le \mu_p^*(g_{k^*}) = \mu_p^*(g_{k^*} I_{\Omega\setminus K^*}) \le (y_0 - \varepsilon^*)\tilde x.$$

Thus, a redistribution of capital, for example, the measure $\tilde\mu = \mu_a^* + \mu_s^* + \tilde x\,\nu_s$, where $\nu_s$ is any probability with support contained in the set $\mathrm{argmax}\,\varphi$, gives a higher utility

$$I_U(\tilde\mu) = E_P\Big[U\Big(\frac{d\mu_a^*}{dQ}\Big)\Big] + \int \varphi\, d(\mu_s^* + \tilde x\,\nu_s) = E_P\Big[U\Big(\frac{d\mu_a^*}{dQ}\Big)\Big] + \int \varphi\, d\mu_s^* + y_0\tilde x \ge I_U(\mu^*) + \varepsilon^*\tilde x > I_U(\mu^*),$$

which is a contradiction. Also, $X^*(x) = \frac{d\mu_a^*}{dQ}$ is unique because U is strictly concave. Finally, a monotonicity argument shows that the support of any optimal $\mu_s^*$ is contained in $\mathrm{argmax}\,\varphi$.

(iii) The dual problem $\inf_{y>0}(v(y) + xy)$ admits a unique minimizer y(x) for all fixed $x > 0$.

(a) $x \le x_0$. y(x) is the unique solution of the equation $-v'(y) = x$, that is, $-E_P[\frac{dQ}{dP}V'(y(x)\frac{dQ}{dP})] = x$. Setting $X^*(x) = -V'(y(x)\frac{dQ}{dP}) = (U')^{-1}(y(x)\frac{dQ}{dP})$, the Fenchel equality yields

$$U(X^*(x)) = V\Big(y(x)\frac{dQ}{dP}\Big) + y(x)\frac{dQ}{dP}X^*(x),$$

whence $E[U(X^*(x))] = v(y(x)) + xy(x)$. From $u(x) = \inf_{y>0}\{v(y) + xy\}$, $X^*(x) \in C(x)$ is the unique optimal payoff.

(b) $x > x_0$. The minimizer of the dual problem is constant, $y(x) = y(x_0) = y_0$. Setting $X^*(x) = X^*(x_0) = -V'(y_0\frac{dQ}{dP}) = (U')^{-1}(y_0\frac{dQ}{dP})$, now $E_Q[X^*(x)] = x_0 \le x$. An application of the Fenchel equality again yields

$$U(X^*(x)) = U(X^*(x_0)) = V\Big(y_0\frac{dQ}{dP}\Big) + y_0\frac{dQ}{dP}X^*(x_0).$$

Taking expectations, $E_P[U(X^*(x))] = v(y_0) + x_0y_0$. Then

$$u(x) = \inf_{y>0}(v(y) + xy) = v(y_0) + xy_0 = E[U(X^*(x))] + y_0(x - x_0).$$

By (ii) above, any optimal $\mu_s^*$ must satisfy $\mu_s^*(\Omega) = x - x_0$.

COROLLARY 4.4. If $v(y) < +\infty$ for all $y > 0$ (in particular if AE(U) < 1), then $y_0 = 0$ and $x_0 = -v'_+(0) = +\infty$. So the optimal solution is of the form $d\mu = X\,dQ$ for all $x > 0$.

Proof. The Inada condition $U'(+\infty) = 0$ implies that $V'(0) = -\infty$, whence

$$x_0 = \lim_{y\downarrow 0} -v'(y) = \lim_{y\downarrow 0} -E_P\Big[\frac{dQ}{dP}V'\Big(y\frac{dQ}{dP}\Big)\Big] = +\infty.$$

The thesis follows from Theorem 2.5 (iii).

REMARK 4.5. The corollary shows that y0 = 0 implies that x0 = +∞. However, the reverse implication fails, see Example 5.2 where x0 = +∞ but y0 > 0.


5. EXAMPLES AND COUNTEREXAMPLES

The examples below explain the role of the singular utility function ϕ, and the role of the condition $\varphi = 0$ a.s. and Assumption 2.4. The utility function U used is the one defined implicitly by its conjugate: $V(y) = e^{1/y}$ for $y > 0$ and $+\infty$ otherwise. Thus, $U(x) = \inf_{y>0}(V(y) + xy) = V(\hat y) + x\hat y$, where $\hat y$ is the unique solution to the equation $V'(y) = -x$, that is, $e^{1/y}/y^2 = x$. Because this is a transcendental equation, U does not admit a simple expression in terms of elementary functions. Nevertheless, U satisfies the Inada conditions because $V'(0) = -\infty$ and $V'(\infty) = 0$. Similarly, $U(0) = 1$ and $U(\infty) = \infty$ because $V(0) = \infty$ and $V(\infty) = 1$. Finally, U has asymptotic elasticity equal to 1:

$$\lim_{x\uparrow\infty} \frac{xU'(x)}{U(x)} = \lim_{y\downarrow 0} -\frac{V'(y)\,y}{V(y) - yV'(y)} = 1, \tag{5.1}$$

and therefore it violates the assumptions of Kramkov and Schachermayer (1999). Because it is also twice-differentiable, de l'Hôpital's rule implies that U is asymptotically risk-neutral, that is

$$\lim_{x\uparrow\infty} -\frac{xU''(x)}{U'(x)} = 0. \tag{5.2}$$
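Although U has no elementary closed form, it is easy to evaluate numerically from its conjugate. The following is a minimal sketch, assuming only $V(y) = e^{1/y}$ as above; the function names `y_hat`, `U` and `elasticity` and the bisection scheme are illustrative choices, not part of the paper. It solves $e^{1/y}/y^2 = x$ in logarithmic form, $1/y - 2\log y = \log x$, whose left side is strictly decreasing in y.

```python
import math


def y_hat(x):
    """Solve exp(1/y) / y**2 = x for y > 0, given x > 0.

    Works with the equivalent equation 1/y - 2*log(y) = log(x), whose left
    side h(y) is strictly decreasing, so bisection applies.
    """
    target = math.log(x)

    def h(y):
        return 1.0 / y - 2.0 * math.log(y)

    lo, hi = 1e-3, 1e3
    while h(lo) < target:      # widen the bracket until h(lo) >= target
        lo /= 2.0
    while h(hi) > target:      # widen the bracket until h(hi) <= target
        hi *= 2.0
    for _ in range(200):       # bisection on a geometric scale
        mid = math.sqrt(lo * hi)
        if h(mid) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def U(x):
    """U(x) = V(y_hat) + x * y_hat with V(y) = exp(1/y)."""
    y = y_hat(x)
    return math.exp(1.0 / y) + x * y


def elasticity(x):
    """x * U'(x) / U(x), using U'(x) = y_hat(x) (envelope theorem)."""
    return x * y_hat(x) / U(x)
```

Substituting $x = e^{1/\hat y}/\hat y^2$ shows that the elasticity equals $1/(1 + \hat y)$, so it stays below 1 for every finite x and approaches 1 only as $\hat y \downarrow 0$, in line with (5.1).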

EXAMPLE 5.1 (INFINITELY MANY PRIMAL SOLUTIONS). Consider a bounded double sequence $(\omega_k)_{k\in\mathbb{Z}_0} \subseteq \mathbb{R}$, with downward limit $\omega_{-\infty}$ and upward limit $\omega_{+\infty}$, and set $\Omega = (\omega_k)_{k\in\mathbb{Z}_0} \cup \{\omega_{-\infty}, \omega_{+\infty}\}$. Endowed with the Euclidean topology, $\Omega$ is a compact Polish space. Define P by $P(\omega_k) = c_1|k|^{-3}e^{-|k|}$ and $P(\omega_{-\infty}) = P(\omega_{+\infty}) = 0$, and set $\frac{dQ}{dP}(\omega_k) = c_2/|k|$, where $c_1, c_2$ are normalizing constants.

A simple calculation shows that $v(y) = E[V(y\frac{dQ}{dP})]$ is finite iff $y \ge 1/c_2 = y_0$, and that $v'_+(y_0)$ is also finite and equal to $-2c_1c_2\sum_{n\ge 1} 1/n^2$. In particular, the no-bliss condition $\sup_{X\in C(x)} E_P[U(X)] < \infty$ is satisfied. Moreover, because $\frac{dP}{dQ}$ is finite at every $\omega_k$, $k \in \mathbb{Z}_0$, and any such $\omega_k$ is an isolated point, it follows that $\varphi(\omega_k) = 0$ for all $k \in \mathbb{Z}_0$, while $\varphi(\omega_{-\infty}) = \varphi(\omega_{+\infty}) = y_0$. Thus, $\{\varphi > 0\} = \{\omega_{-\infty}, \omega_{+\infty}\}$ is a negligible set and $\varphi = 0$ a.s. holds. Assumption 2.4 also holds because $\Omega$ is compact.

For $x \le x_0 = -v'_+(y_0)$, the problem admits a classical solution $X^*(\omega_k) = \frac{d\mu_a^*}{dQ}(\omega_k)$, identified by the system

$$U'(X^*(\omega_k)) = y\frac{dQ}{dP}(\omega_k), \quad k \in \mathbb{Z}_0,$$

$$\sum_{k\in\mathbb{Z}_0} X^*(\omega_k)Q(\omega_k) = x.$$

When $x > x_0$, the above system no longer admits a solution, because the second equality cannot be achieved for any choice of the Lagrange multiplier y. The singular utility closes this gap, replacing the previous system by the relaxed system

$$U'\Big(\frac{d\mu_a^*}{dQ}(\omega_k)\Big) = y\frac{dQ}{dP}(\omega_k), \quad k \in \mathbb{Z}_0,$$

$$\mu_s(\omega_{-\infty}) + \mu_s(\omega_{+\infty}) + \sum_{k\in\mathbb{Z}_0} \frac{d\mu_a^*}{dQ}(\omega_k)Q(\omega_k) = x,$$

which contains the two additional unknowns $\mu_s(\omega_{-\infty})$ and $\mu_s(\omega_{+\infty})$. The solution to the relaxed system is obtained by choosing $y = y_0$. The value of $\mu_s(\omega_{-\infty}) + \mu_s(\omega_{+\infty})$ is thus determined from the second equation, but the two individual values $\mu_s(\omega_{-\infty})$ and $\mu_s(\omega_{+\infty})$ remain free. Indeed, because the singular utility term is $\int \varphi\, d\mu_s$, and $\varphi(\omega_{-\infty}) = \varphi(\omega_{+\infty})$, any measure of the form

$$d\mu^*(x) = X^*(x_0)\,dQ + (x - x_0)\big(t\delta_{\omega_{-\infty}} + (1 - t)\delta_{\omega_{+\infty}}\big) \tag{5.3}$$

for any $t \in [0, 1]$ is an optimal solution.

EXAMPLE 5.2 ($x_0 = \infty$, BUT $y_0 > 0$). Consider a bounded sequence $(\omega_n)_{n\ge 1} \subset \mathbb{R}$ decreasing to $\omega_\infty$, and define $\Omega$ as $(\omega_n)_{n\ge 1} \cup \{\omega_\infty\}$, endowed with the Euclidean topology, under which it is Polish compact. Define P by $P(\omega_n) = p_n = e^{-n}/(e - 1)$ and $P(\omega_\infty) = p_\infty = 0$. The payoff set is defined as $C(x) = \{X \mid E_Q[X] \le x\}$, where Q is given by $\frac{dQ}{dP}(\omega_n) = c_1/n$, where $c_1 > 1$ is a normalizing constant, and the value at $\omega_\infty$ is irrelevant.

As in the previous example, a simple calculation shows that $v(y) = E_P[V(y\,dQ/dP)]$ is finite iff $y > 1/c_1 := y_0 > 0$. Thus, the no-bliss condition $\sup_{X\in C(x)} E_P[U(X)] < \infty$ is satisfied, and $\varphi(\omega_n) = 0$ for $n \ge 1$ and $\varphi(\omega_\infty) = \frac{1}{c_1} = y_0$. The condition $\varphi = 0$ a.s. and Assumption 2.4 hold because $\{\varphi > 0\} = \{\omega_\infty\}$ is a P-negligible set and $\Omega$ is compact.

In this model, $x_0 = -v'_+(y_0) = +\infty$ and therefore the optimal payoff $X^*$ is classical for any $x > 0$, obtained as the unique solution to the system of equations:

$$U'(X^*(\omega_n)) = y\frac{dQ}{dP}(\omega_n),$$

$$\sum_{n\ge 1} X^*(\omega_n)q_n = x.$$
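The "simple calculation" behind Example 5.2 can be sketched as follows, using $V(y) = e^{1/y}$ and writing $p_n = \kappa e^{-n}$, where $\kappa > 0$ stands for the constant factor in $p_n$ (the symbol κ is introduced here only for illustration and plays no role in the finiteness argument).

```latex
\begin{align*}
v(y) &= \sum_{n\ge 1} p_n\, V\!\Big(\frac{y c_1}{n}\Big)
      = \kappa \sum_{n\ge 1} e^{-n}\, e^{n/(y c_1)}
      = \kappa \sum_{n\ge 1} \exp\!\Big(-n\Big(1 - \frac{1}{y c_1}\Big)\Big),\\
-v'(y) &= \frac{\kappa}{y^{2} c_1} \sum_{n\ge 1} n\,
      \exp\!\Big(-n\Big(1 - \frac{1}{y c_1}\Big)\Big).
\end{align*}
```

The first series is geometric and converges exactly when $y > 1/c_1$, which gives $y_0 = 1/c_1$. As $y \downarrow y_0$ the terms of the second series increase to n, so $-v'(y) \to +\infty$ by monotone convergence, consistent with $x_0 = +\infty$.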

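EXAMPLE 5.3 (DROPPING $\varphi = 0$ A.S.). Let $\Omega$ be as in the previous example, but modify P so that $P(\omega_\infty) > 0$. More precisely, fix $\delta \in (0, 1)$, and define P by $p_n = P(\omega_n) = (1 - \delta)e^{-n}/(e - 1)$ and $p_\infty = P(\omega_\infty) = \delta$. Likewise, define Q by $\frac{dQ}{dP}(\omega_n) = 1/n$ and $Q(\omega_\infty) = (1 - \sum_{n\ge 1} Q(\omega_n)) > 0$. Now, v(y) is finite iff $y > 1$, so $y_0 = 1$. Because the continuous function

$$g_k(\omega_n) = \begin{cases} \frac{1}{k} & \text{if } n \le k,\\ 1 + \frac{1}{k} & \text{if } +\infty \ge n > k \end{cases}$$

is in $\mathrm{Dom}(J_V)$ for all $k \ge 1$, $\varphi(\omega_n) = 0$ for all $1 \le n < +\infty$, while $\varphi(\omega_\infty) = 1$. For $\omega_\infty$, the following holds:

$$W(\omega_\infty, x) = \max_{z\le x}\Big( U(z) + (x - z)\,\varphi(\omega_\infty)\frac{dQ}{dP}(\omega_\infty)\Big).$$

To locate the maximum, consider the derivative of the objective with respect to z,

$$U'(z) - \varphi(\omega_\infty)\frac{dQ}{dP}(\omega_\infty).$$

If $x > x^* = (U')^{-1}(\varphi(\omega_\infty)\frac{dQ}{dP}(\omega_\infty))$, the maximum is attained at $z = x^*$, so that

$$W(\omega_\infty, x) = \begin{cases} U(x) & \text{if } x \le x^*,\\ U(x^*) + (x - x^*)\,\varphi(\omega_\infty)\frac{dQ}{dP}(\omega_\infty) & \text{if } x > x^*. \end{cases}$$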
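The piecewise form of the sup-convolution can be checked numerically. The following is a minimal sketch under loud assumptions: it replaces the implicitly defined utility of this section by the hypothetical $U(z) = 2\sqrt{z}$, for which $(U')^{-1}(c) = 1/c^2$ is explicit, and it treats the product $c = \varphi(\omega_\infty)\frac{dQ}{dP}(\omega_\infty)$ as a given positive number. The names `W_closed` and `W_grid` are illustrative.

```python
import math


def U(z):
    """Hypothetical utility used only for illustration: U(z) = 2*sqrt(z)."""
    return 2.0 * math.sqrt(z)


def W_closed(x, c):
    """Piecewise formula for sup over 0 < z <= x of U(z) + (x - z) * c, c > 0."""
    x_star = 1.0 / c ** 2            # (U')^{-1}(c), since U'(z) = 1/sqrt(z)
    if x <= x_star:
        return U(x)                  # objective is increasing on (0, x]
    return U(x_star) + (x - x_star) * c


def W_grid(x, c, n=20_000):
    """Brute-force approximation of the same supremum on a uniform grid."""
    best = -math.inf
    for i in range(1, n + 1):
        z = x * i / n
        best = max(best, U(z) + (x - z) * c)
    return best
```

With `c = 0.5` the threshold is `x_star = 4`: below it `W_closed` coincides with `U`, and above it `W_closed` grows linearly with slope `c` and exceeds `U`, which is how W and U come to differ at $\omega_\infty$ in this example.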

EXAMPLE 5.4 (NECESSITY OF ASSUMPTION 2.4). The setup is the same as in Example 5.1, except that the points $\omega_{-\infty}$, $\omega_{+\infty}$ are removed. The resulting $\Omega$ is no longer compact, but still Polish with the Euclidean topology. As $\frac{dP}{dQ}$ is now finite everywhere and the topology is discrete, ϕ is identically null. However, the value function u is the same as in Example 5.1, so in particular for $x > x_0$

$$u(x) = \sup_{X\in C(x)} E[U(X)] = E[U(X^*(x_0))] + y_0(x - x_0),$$

and the extra contribution cannot be given by a singular measure $\mu_s^*$, only by a purely finitely additive $\mu_p^*$ with $\inf_{f\in \mathrm{Dom}(J_V)} \mu_p^*(f) = y_0(x - x_0) > 0$. The maximizing sequences in C(x) for the value u(x) are the same as in Example 5.1, but this time the sequences have a weak star cluster point in $D(x)\setminus M_+$.

1 But the complete distance inducing the topology cannot be the Euclidean metric.

Mathematical Finance, Vol. 21, No. 4 (October 2011), 743–774

RISK MEASURES: RATIONALITY AND DIVERSIFICATION SIMONE CERREIA-VIOGLIO Department of Decision Sciences and IGIER, Università Bocconi FABIO MACCHERONI AND MASSIMO MARINACCI Department of Decision Sciences, Dondena, and IGIER, Università Bocconi LUIGI MONTRUCCHIO Collegio Carlo Alberto, Università di Torino

When there is uncertainty about interest rates (typically due to either illiquidity or defaultability of zero coupon bonds) the cash-additivity assumption on risk measures becomes problematic. When this assumption is weakened, to cash-subadditivity for example, the equivalence between convexity and the diversiﬁcation principle no longer holds. In fact, this principle only implies (and it is implied by) quasiconvexity. For this reason, in this paper quasiconvex risk measures are studied. We provide a dual characterization of quasiconvex cash-subadditive risk measures and we establish necessary and sufﬁcient conditions for their law invariance. As a byproduct, we obtain an alternative characterization of the actuarial mean value premium principle. KEY WORDS: risk measures, diversiﬁcation, cash-subadditivity, quasiconvexity, law-invariance, mean value premium principle.

We thank Damir Filipović, Marco Frittelli, Michael Kupper, Frank Riedel, Walter Schachermayer, and, especially, Dilip Madan, an associate editor, and two anonymous referees for stimulating discussions and helpful comments. Part of this research was done while the first two authors were visiting the Collegio Carlo Alberto, which they thank for its hospitality. The financial support of ERC (advanced grant BRSCDPTEA) is gratefully acknowledged. Simone Cerreia-Vioglio thanks Università Bocconi for summer support and Columbia University where part of this research was done during his Ph.D. studies. Manuscript received January 2009; final revision received December 2009. Address correspondence to Fabio Maccheroni, Department of Decision Sciences, Università Bocconi, via Sarfatti 25, 20136, Milano, Italy; e-mail: fabio.maccheroni@unibocconi.it. DOI: 10.1111/j.1467-9965.2010.00450.x © 2010 Wiley Periodicals, Inc.

1. INTRODUCTION

Risk assessment is a fundamental activity for both regulators and agents in financial markets. The problem of a formal definition of a risk measure and of the economic and mathematical properties that it should satisfy has been heating the debate since the seminal papers of Artzner et al. (1997, 1999) on coherent risk measures. In the last 10 years there has been a flourishing of methodological proposals, mathematical extensions, and variations on this topic. The convex monetary risk measures of Föllmer and Schied (2002, 2004) and Frittelli and Rosazza Gianin (2002) are especially interesting in terms of economic content and mathematical tractability among the generalizations of coherent risk measures. Moreover, these measures naturally appear in pricing and hedging problems in incomplete markets, as shown, for example, by El Karoui and Quenez (1997), Carr, Geman, and Madan (2001),


Frittelli and Rosazza Gianin (2004), Staum (2004), Filipović and Kupper (2008), and Jouini, Schachermayer, and Touzi (2008).

A risk measure is a decreasing function ρ that associates to a future risky position X the minimal reserve amount ρ(X) that should be collected today to cover risk X. The leading examples are solvency capital requirements imposed by supervising agencies on insurance companies and financial institutions. Decreasing monotonicity is a minimal rationality requirement imposed on the agencies: higher losses require higher reserves. Convex monetary risk measures have the additional requirement of being convex and cash-additive.1 As pointed out by El Karoui and Ravanelli (2009), cash-additivity fails as soon as there is any form of uncertainty about interest rates; for example, when the risk-free asset is illiquid or nonexistent.2 For this reason, they suggest replacing cash-additivity with cash-subadditivity, and, maintaining convexity, they provide a representation result for convex cash-subadditive risk measures, together with several examples arising from applications.

This paper starts from the observation that once cash-additivity is replaced with the economically sounder assumption of cash-subadditivity, convexity should be replaced by quasiconvexity in order to maintain the original interpretation in terms of diversification. Although convexity is generally regarded as the mathematical translation of the fundamental principle “diversification cannot increase risk,” literally this principle means “if positions X and Y are less risky than Z, so is any diversified position λX + (1 − λ)Y with λ in (0, 1).”

Using a measure of risk ρ, this statement translates into “ρ(X), ρ(Y) ≤ ρ(Z) implies ρ(λX + (1 − λ)Y) ≤ ρ(Z) for all λ in (0, 1),” which is equivalent to convexity under the cash-additivity assumption, while in general (also under cash-subadditivity) it only corresponds to quasiconvexity.3

From a financial viewpoint, the passage from convexity to quasiconvexity is conceptually very important. It allows a complete disentangling between the diversification principle, which is arguably the central pillar of risk management, and the assumption of liquidity of the riskless asset, which is an abstract (still very useful and popular) simplification. The economic counterpart of quasiconvexity of risk measures is quasiconcavity of utility functions (that is, convexity of preferences), which is classically associated with uncertainty aversion in the economics of uncertainty (see, e.g., Debreu 1959; Schmeidler 1989). Uncertainty aversion, namely “if X and Y are preferred to Z, so is any mixture λX + (1 − λ)Y with λ in (0, 1),” is one of the soundest empirical findings in situations where agents ignore the probabilistic model that underlies the economic phenomenon they are facing (for example, it has been recently indicated by Caballero and Krishnamurthy 2008, as one of the possible causes behind the 2008 crisis).

1 See Section 2 for details and formal definitions.
2 Black (1972) is one of the first contributions that cast doubt on the liquidity and existence of riskless assets.
3 See Föllmer and Schied (2010), Proposition 2.1, and Example 2.2.


For this reason, in this paper we study quasiconvex cash-subadditive risk measures on an L∞ space.4 We show in Section 3 that these measures take the form

$$\rho(X) = \max_{Q \in \mathcal{M}_{1,f}} R(E_Q(-X), Q), \tag{1.1}$$

where M1,f is the set of (finitely additive) probabilities and R : ℝ × M1,f → [−∞, ∞] is an upper semicontinuous quasiconcave function that is increasing and nonexpansive in the first component and such that inf_{t∈ℝ} R(t, ·) is constant. The function R is unique. Convex monetary risk measures correspond to the separable specification

$$R(t, Q) = Dt - \alpha(DQ) \tag{1.2}$$

for some constant D ∈ (0, 1], while convex cash-subadditive risk measures correspond to

$$R(t, Q) = \sup_{c \in [0,1]} \bigl(ct - \alpha(cQ)\bigr), \tag{1.3}$$

where α(·) is the Fenchel conjugate of ρ(−·).5 Representation (1.1) is not only general enough to capture most of the risk measures introduced in the literature, but it also has a very natural interpretation: R(t, Q) is the reserve amount required today, under the probabilistic scenario Q, to cover an expected loss t in the future. Since there is uncertainty about probabilistic scenarios, the supervising agency follows the most cautious approach, that is, it requires the maximum reserve. The evaluations R(t, Q) take two factors into account: the expected loss t and the plausibility of scenario Q, as assessed by the supervising agency. As the special cases (1.2) and (1.3) show (see again the discussion in El Karoui and Ravanelli 2009), the separability of these two risk factors is lost as soon as risky positions and reserve amounts cannot be expressed in the same numeraire in an unambiguous way. This loss of separability becomes even clearer if, inspired by (1.2), one sets a(t, Q) = t − R(t, Q) and rewrites (1.1) as

$$\rho(X) = \max_{Q \in \mathcal{M}_{1,f}} \bigl\{ E_Q(-X) - a(E_Q(-X), Q) \bigr\}, \tag{1.4}$$

where the “penalty function” a(E_Q(−X), Q) now depends both on the probabilistic scenario and on the expected loss of the position (rather than on the probabilistic scenario alone). It is important to observe that, while the results on convex and cash-subadditive measures build on classic convex duality, our results build on the quasiconvex monotone duality developed in Cerreia-Vioglio et al. (2008b). Specifically, the techniques developed there are the main tool for our analysis in the L∞ case. We extend them in the Appendix where the general L^p case (1 ≤ p ≤ ∞) is considered. As a consequence, there is also a substantial difference between the mathematics that underlies our results and that used in the study of convex risk measures.

4 The extension to the general L^p case is studied in the Appendix.
5 More generally, there is a natural correspondence between the properties of ρ and those of R, as shown in Section 4.

In view of the importance of law-invariance with respect to a given probability measure P, in Section 5 we characterize quasiconvex risk measures that satisfy this property and we show that in this case the quantile representation

$$\rho(X) = \max_{Q \in \mathcal{M}_1} R\left( \int_0^1 q_{-X}(s)\, q_{\frac{dQ}{dP}}(s)\, ds,\; Q \right) \tag{1.5}$$


holds. This result extends those of Chong and Rice (1971), Kusuoka (2001), Föllmer and Schied (2004), Dana (2005), Frittelli and Rosazza Gianin (2005), and Leitner (2005) from the domain of convex analysis to that of quasiconvex analysis. Technically speaking, this is one of the most substantive contributions of the present paper. As a byproduct, in Section 5.1 we characterize the risk measures that agree with the actuarial mean value premium principle (see Rotar 2007), that is, the measures of the form

$$\rho(X) = \ell^{-1}\bigl(E_P(\ell(-X))\bigr), \tag{1.6}$$

where ℓ is a strictly increasing and convex loss function. Though in a static setting, this result is in the spirit of a very recent one of Kupper and Schachermayer (2008) and it builds on the classic Nagumo-Kolmogorov-de Finetti Theorem.6 Interestingly, Proposition 5.3 shows that for this class of functions

$$R(t, Q) = t - L(-t; Q, P), \tag{1.7}$$

where L(−t; Q, P) is the generalized distance between probability measures induced by ℓ, introduced by Bellini and Frittelli (2002) in the context of minimax martingale measures. The closing Section 5.2 proposes maxima of risk measures of the form (1.6) as emerging from the agreement of a group of supervising agencies, and studies their properties.

We conclude by observing that, as it happens with coherent risk measures and maxmin expected utility preferences (Gilboa and Schmeidler 1989), or convex monetary risk measures and variational preferences (Maccheroni, Marinacci, and Rustichini 2006), quasiconvex risk measures also have a decision theoretic foundation: the uncertainty averse preferences we recently studied in Cerreia-Vioglio et al. (2008a), to which we refer the interested reader for details.
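The mean value premium principle (1.6) can be tried out on a finite sample space. The sketch below is only an illustration under stated assumptions: the three-state probability vector `P` and the two loss functions (exponential and softplus) are hypothetical choices that do not come from the paper; they are merely strictly increasing and convex with closed-form inverses. The sampled check concerns quasiconvexity only.

```python
import math
import random

P = [0.2, 0.5, 0.3]  # hypothetical reference probabilities on three states


def mean_value_premium(x, loss, loss_inv, p=P):
    """rho(X) = loss_inv(E_P[loss(-X)]) for a position x given state by state."""
    return loss_inv(sum(pi * loss(-xi) for pi, xi in zip(p, x)))


# Two strictly increasing, convex loss functions with closed-form inverses
# (illustrative choices, not taken from the paper).
def exp_loss(t):
    return math.exp(t)


def exp_loss_inv(u):
    return math.log(u)


def softplus_loss(t):
    return math.log1p(math.exp(t))


def softplus_loss_inv(u):
    return math.log(math.expm1(u))


def quasiconvexity_violations(loss, loss_inv, trials=2000, seed=0):
    """Count samples with rho(lam*X + (1-lam)*Y) > max(rho(X), rho(Y))."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        x = [rng.uniform(-3.0, 3.0) for _ in P]
        y = [rng.uniform(-3.0, 3.0) for _ in P]
        lam = rng.random()
        mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x, y)]
        lhs = mean_value_premium(mixed, loss, loss_inv)
        rhs = max(mean_value_premium(x, loss, loss_inv),
                  mean_value_premium(y, loss, loss_inv))
        if lhs > rhs + 1e-9:  # small tolerance for floating-point noise
            violations += 1
    return violations
```

Calling `quasiconvexity_violations(softplus_loss, softplus_loss_inv)` should find no violations, in line with the quasiconvexity of measures of the form (1.6); a sure position paying c in every state is assigned the reserve −c by construction.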

2. PRELIMINARIES

Let (Ω, A, P) be a probability space and L∞(Ω, A, P) be the space of bounded random variables.7 Its topological dual L∞(Ω, A, P)∗ is isometrically isomorphic to the space of all bounded finitely additive set functions on A that are absolutely continuous with respect to P (see, e.g., Yosida 1980, chapter IV.9). The positive unit ball of L∞(Ω, A, P)∗ is denoted by M1,f(Ω, A, P) and coincides with the set of finitely additive probabilities that are absolutely continuous with respect to P; in particular, M1(Ω, A, P) is the subset of M1,f(Ω, A, P) consisting of all its countably additive elements. For this reason, given X ∈ L∞(Ω, A, P) and μ ∈ L∞(Ω, A, P)∗, we indifferently write: μ(X), ∫X dμ, or even Eμ(X) if μ ∈ M1,f(Ω, A, P). The specification of the probability space (Ω, A, P) is often omitted and we just write L∞ and M1,f. Unless otherwise stated, L∞(Ω, A, P) is endowed with its norm topology, L∞(Ω, A, P)∗ is endowed with its weak∗ topology, and its subsets with the relative weak∗ topology. Product spaces are endowed with the product topology.

We consider one period of uncertainty {0, T}. The elements of L∞ represent payoffs at time T of financial positions held at time 0. A risk measure is a decreasing function ρ: L∞ → [−∞, ∞].

6 See Nagumo (1930), Kolmogorov (1930), de Finetti (1931), as well as Hardy, Littlewood, and Pólya (1934).
7 Equalities and inequalities among random variables hold almost surely with respect to P.


As anticipated in the introduction, ρ(X) is interpreted as the minimal reserve amount that should be collected today to cover future risk X. Decreasing monotonicity is justified by the fact that smaller losses cannot require greater reserves. Given a (deterministic) discount factor D ∈ (0, 1], the function ρ is a monetary risk measure if, in addition, it satisfies:

Cash-additivity. ρ(X − m) = ρ(X) + Dm for all X ∈ L∞ and m ∈ ℝ.

This condition is interpreted in the following way: “when m dollars are subtracted from the future position the present capital requirement is augmented by the same discounted amount Dm.” In fact, investing Dm in a risk-free manner offsets the certain future loss m. Cash-additivity is a controversial assumption both from a theoretical and practical viewpoint. Indeed, D is the price of a nondefaultable zero coupon bond available on the market at time 0 with maturity T and face value 1: existence and liquidity of such an asset is not an innocuous assumption and, as observed by El Karoui and Ravanelli (2009), any form of uncertainty in interest rates is sufficient to make the cash-additivity assumption too stringent. For example, in case of illiquidity, D may well depend on the amount m of purchased assets. These considerations lead to the following relaxed version of cash-additivity, which only takes into account the time value of money:

Cash-subadditivity. ρ(X − m) ≤ ρ(X) + m for all X ∈ L∞ and m ∈ ℝ+.

The meaning of this condition is “when m dollars are subtracted from a future position the present capital requirement cannot be augmented by more than m dollars.” This is a much more compelling assumption than cash-additivity since it just relies on the fact that an additional reserve of m dollars surely covers the additional loss of the same amount.8 As discussed in the introduction, the risk diminishing effect of diversification is usually translated by:

Convexity. ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) for all X, Y ∈ L∞ and λ ∈ (0, 1).

But it actually corresponds to the much weaker:

Quasiconvexity. ρ(λX + (1 − λ)Y) ≤ max {ρ(X), ρ(Y)} for all X, Y ∈ L∞ and λ ∈ (0, 1).

The next simple proposition shows that convexity is equivalent to quasiconvexity for monetary risk measures.9 Clearly, this is not the case for cash-subadditive risk measures.10 In reading the result, recall that a function ρ: L∞ → [−∞, ∞] is nonexpansive (Lipschitz continuous with constant 1) if ρ(Y) ≤ ρ(X) + ‖X − Y‖ for all X, Y ∈ L∞.

PROPOSITION 2.1. Let ρ be a risk measure. (a) If ρ is cash-additive, then it is convex if and only if it is quasiconvex. (b) ρ is cash-subadditive if and only if it is nonexpansive. In both cases, ρ is either finite valued or identically ±∞.

8 Notice that cash-subadditivity is equivalent to requiring ρ(X + m) ≥ ρ(X) − m for all X ∈ L∞ and all m ∈ ℝ+. In fact, it implies ρ(X) = ρ(X + m − m) ≤ ρ(X + m) + m, and the converse is proved in the same way. In particular, our definition is equivalent to that of El Karoui and Ravanelli (2009).
9 Clearly, the above definition of quasiconvexity and the one we reported in the introduction are equivalent.
10 See Example 2.2.


Proof. (a) is essentially known (see, e.g., Gilboa and Schmeidler 1989, lemma 3.3, or Marinacci and Montrucchio 2004, corollary 4.2). Next we prove (b). If ρ : L∞ → ℝ is nonexpansive then ρ(X − m) ≤ ρ(X) + ‖X − (X − m)‖ for all X ∈ L∞ and all m ∈ ℝ+, that is, ρ(X − m) ≤ ρ(X) + m. Conversely, for all X, Y ∈ L∞, X − Y ≤ ‖X − Y‖, then X − ‖X − Y‖ ≤ Y, and monotonicity and cash-subadditivity imply

$$\rho(Y) \le \rho(X - \|X - Y\|) \le \rho(X) + \|X - Y\|,$$

as wanted.

The next example shows how the illiquidity of the risk-free asset naturally generates quasiconvex cash-subadditive risk measures that are neither convex nor cash-additive.

EXAMPLE 2.2. Let ∅ ≠ C ⊊ L∞ be the set of future positions considered acceptable by the supervising agency, and assume that C is convex and C + L∞₊ ⊆ C. For all m ∈ ℝ denote by v(m) the price at time 0 of m dollars at time T and define, as in Artzner et al. (1999),

$$\rho_{C,v}(X) = \inf\{v(m) : X + m \in C\} \qquad \forall X \in L^\infty.$$

If v(m) = Dm with D ∈ (0, 1] then ρC,v is a (finite valued) convex monetary risk measure.11 The linearity of v is precisely the assumption that fails when zero coupon bonds with maturity T are illiquid. Still it remains sensible to assume that v : ℝ → (−∞, ∞] is increasing and v(0) = 0. Provided v is also upper semicontinuous, we have

$$\rho_{C,v}(X) = v\bigl(\inf\{m \in \mathbb{R} : X + m \in C\}\bigr) = v(\rho_{C,\mathrm{id}}(X)) \qquad \forall X \in L^\infty,$$

where id : ℝ → (−∞, ∞] is the identity. Moreover, since ρC,id is a convex monetary risk measure, then for any nonexpansive v that is not convex, ρC,v is a quasiconvex cash-subadditive risk measure that is neither convex nor cash-additive.

Finally, R0(ℝ × M1,f) denotes the class of functions R : ℝ × M1,f → [−∞, ∞] that are upper semicontinuous, quasiconcave, increasing in the first component, with inf_{t∈ℝ} R(t, Q) = inf_{t∈ℝ} R(t, Q′) for all Q, Q′ ∈ M1,f. Moreover, R1(ℝ × M1,f) is the subset of R0(ℝ × M1,f) consisting of functions R that are nonexpansive in the first component, that is, R(t′, Q) ≤ R(t, Q) + |t − t′| for all t, t′ ∈ ℝ and all Q ∈ M1,f.
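Example 2.2 can be made concrete on a finite sample space. The following sketch rests on assumptions that are not in the paper: the acceptance set is taken to be C = {X ≥ 0}, so that ρC,id is the worst-case loss, and the price function `v` is a hypothetical piecewise linear map (slope 1 for m ≤ 0, slope 1/2 for m > 0) that is increasing, nonexpansive, continuous, and concave rather than convex.

```python
def rho_id(x):
    """rho_{C,id}(X) = inf{m : X + m >= 0}, the worst-case loss, for C = {X >= 0}."""
    return max(-xi for xi in x)


def v(m):
    """Hypothetical time-0 price of m dollars at T: increasing, v(0) = 0,
    nonexpansive, but concave instead of convex."""
    return m if m <= 0 else 0.5 * m


def rho_v(x):
    """rho_{C,v}(X) = v(rho_{C,id}(X)), valid here because v is continuous and increasing."""
    return v(rho_id(x))


def mix(x, y, lam):
    return [lam * a + (1.0 - lam) * b for a, b in zip(x, y)]


def shift(x, m):
    """The position X - m."""
    return [xi - m for xi in x]


X = [-1.0, -1.0]      # sure loss of 1
Y = [1.0, 1.0]        # sure gain of 1
Z = mix(X, Y, 0.5)    # the zero position

convex_bound = 0.5 * rho_v(X) + 0.5 * rho_v(Y)   # 0.5 * 0.5 + 0.5 * (-1.0)
quasi_bound = max(rho_v(X), rho_v(Y))
```

Here `rho_v(Z)` exceeds `convex_bound` but not `quasi_bound`, so convexity fails on this pair while the quasiconvexity inequality holds. Subtracting 2 dollars raises the requirement by 1 for `X` and by 1.5 for `Y`: both increments stay below 2, as cash-subadditivity demands, yet no single discount factor D fits both, so this ρC,v is not cash-additive.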

3. REPRESENTATION

We are now ready to state and prove our first representation result.

THEOREM 3.1. A function ρ: L∞ → [−∞, ∞] is a quasiconvex cash-subadditive risk measure if and only if there exists R ∈ R1(ℝ × M1,f) such that

$$\rho(X) = \max_{Q \in \mathcal{M}_{1,f}} R(E_Q(-X), Q) \qquad \forall X \in L^\infty. \tag{3.1}$$

The function R ∈ R1(ℝ × M1,f) for which (3.1) holds is unique and satisfies

$$R(t, Q) = \inf\{\rho(X) : E_Q(-X) = t\} \qquad \forall (t, Q) \in \mathbb{R} \times \mathcal{M}_{1,f}. \tag{3.2}$$

11 See, for example, Föllmer and Schied (2004, chapter 4).


Recall that ρ is a quasiconvex cash-subadditive risk measure if and only if it is a quasiconvex and nonexpansive risk measure. The next lemma characterizes quasiconvex and upper semicontinuous risk measures.

LEMMA 3.2. A function ρ: L∞ → [−∞, ∞] is a quasiconvex upper semicontinuous risk measure if and only if there exists R ∈ R0(ℝ × M1,f) such that

$$\rho(X) = \max_{Q \in \mathcal{M}_{1,f}} R(E_Q(-X), Q) \qquad \forall X \in L^\infty. \tag{3.3}$$

The function R ∈ R0(ℝ × M1,f) for which (3.3) holds is unique and satisfies

$$R(t, Q) = \inf\{\rho(X) : E_Q(-X) = t\} \qquad \forall (t, Q) \in \mathbb{R} \times \mathcal{M}_{1,f}. \tag{3.4}$$

Proof. Notice that L∞ is a normed Riesz space with order unit I_Ω, M1,f is the positive unit ball of its topological dual, and −ρ is a quasiconcave, lower semicontinuous, and monotone increasing function. The statement then follows from lemma 8 and theorem 3 of Cerreia-Vioglio et al. (2008b).

Proof of Theorem 3.1. It only remains to show that ρ is cash-subadditive if and only if R is nonexpansive in the first component. Suppose ρ is cash-subadditive, then, for all (t, Q) ∈ ℝ × M1,f and m ∈ ℝ+,

$$\begin{aligned}
R(t + m, Q) &= \inf\{\rho(X) : E_Q(-X) = t + m\} = \inf\{\rho(X) : E_Q(-(X + m)) = t\} \\
&= \inf\{\rho(Y - m) : E_Q(-Y) = t\} \le \inf\{\rho(Y) + m : E_Q(-Y) = t\} = R(t, Q) + m.
\end{aligned}$$

Therefore, for all t, t′ ∈ ℝ and Q ∈ M1,f, t′ ≤ t + |t − t′| and monotonicity of R in the first component imply

$$R(t', Q) \le R(t + |t - t'|, Q) \le R(t, Q) + |t - t'|,$$

as wanted. Conversely, if R is nonexpansive in the first component, then, for all (t, Q) ∈ ℝ × M1,f and m ∈ ℝ+,

$$R(t + m, Q) \le R(t, Q) + |t - (t + m)| = R(t, Q) + m.$$

Moreover, for all X ∈ L∞, there is Q′ ∈ M1,f such that ρ(X − m) = R(E_{Q′}(−(X − m)), Q′). Therefore,

$$\begin{aligned}
\rho(X - m) &= R(E_{Q'}(-(X - m)), Q') = R(E_{Q'}(-X) + m, Q') \le R(E_{Q'}(-X), Q') + m \\
&\le \max_{Q \in \mathcal{M}_{1,f}} R(E_Q(-X), Q) + m = \rho(X) + m,
\end{aligned}$$

as wanted.
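Formulas (3.1) and (3.2) can be checked numerically in a case where R is known in closed form. The sketch below is an illustration under assumptions not made in the paper: a two-point sample space with hypothetical probabilities `P`, and the entropic risk measure with a hypothetical parameter `THETA`, for which the standard Gibbs variational formula gives R(t, Q) = t − H(Q|P)/θ (the separable form (1.2) with D = 1). The infimum in (3.2) is approximated by a golden-section search, and the maximum in (3.1) by a grid over countably additive Q.

```python
import math

P = (0.4, 0.6)  # hypothetical reference probability on a two-point sample space
THETA = 1.5     # hypothetical risk-aversion parameter


def rho(x):
    """Entropic risk measure: (1/theta) * log E_P[exp(-theta * X)]."""
    return math.log(sum(p * math.exp(-THETA * xi) for p, xi in zip(P, x))) / THETA


def relative_entropy(q):
    """H(Q|P) = sum_i q_i * log(q_i / p_i), with the convention 0 log 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, P) if qi > 0.0)


def r_closed_form(t, q):
    """Closed form for the entropic case: R(t, Q) = t - H(Q|P) / theta."""
    return t - relative_entropy(q) / THETA


def r_numeric(t, q, lo=-20.0, hi=20.0, iters=200):
    """Approximate inf{rho(X) : E_Q(-X) = t} as in (3.2).

    On two states the constraint pins x2 once x1 is chosen, and the objective
    is convex in x1, so a golden-section search suffices (requires q[1] > 0).
    """
    def objective(x1):
        x2 = (-t - q[0] * x1) / q[1]
        return rho((x1, x2))

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if objective(c) < objective(d):
            b = d
        else:
            a = c
    return objective(0.5 * (a + b))


def rho_via_representation(x, grid=2000):
    """Grid approximation of max_Q R(E_Q(-X), Q) as in (3.1), with Q = (q1, 1 - q1)."""
    best = -math.inf
    for k in range(1, grid):
        q = (k / grid, 1.0 - k / grid)
        t = sum(qi * (-xi) for qi, xi in zip(q, x))
        best = max(best, r_closed_form(t, q))
    return best
```

For instance, `r_numeric(0.5, (0.3, 0.7))` should agree with `r_closed_form(0.5, (0.3, 0.7))` up to the search tolerance, and `rho_via_representation((0.5, -0.5))` should approach `rho((0.5, -0.5))` from below as the grid is refined.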


In particular, denoting by α(·) the Fenchel conjugate of ρ(−·),12 a quasiconvex cash-subadditive risk measure ρ is convex if and only if

$$R(t, Q) = \sup_{c \in [0,1]} \bigl(ct - \alpha(cQ)\bigr) \qquad \forall (t, Q) \in \mathbb{R} \times \mathcal{M}_{1,f},$$

thus obtaining the result of El Karoui and Ravanelli (2009). Moreover, ρ is cash-additive if and only if

$$R(t, Q) = Dt - \alpha(DQ) \qquad \forall (t, Q) \in \mathbb{R} \times \mathcal{M}_{1,f},$$

which corresponds to the well-known characterization of convex monetary risk measures.13 Maintaining the interpretation of R(t, Q) as the reserve amount required today to cover an expected loss t in the future under the probabilistic scenario Q, the above relations corroborate the claim of El Karoui and Ravanelli (2009) that the passage to cash-subadditivity is the most parsimonious way of taking into account interest rate uncertainty and a supervising agency that is averse to such uncertainty.

4. ADDITIONAL PROPERTIES In this section, we further investigate the correspondence between the properties of ρ and those of R.

4.1. Subadditivity, Positive Homogeneity, and Star-Shapedness

Here we analyze some of the most common properties that risk measures, directly or indirectly, have been required to satisfy. We already discussed cash-additivity and convexity as well as their consequences. The first property, introduced by Artzner et al. (1999), is very natural and suggests that the overall risk is controlled by controlling the risk of each position:

Subadditivity. ρ(X + Y) ≤ ρ(X) + ρ(Y) for all X, Y ∈ L∞.

The second property, also introduced by the same authors, is:

Positive Homogeneity. ρ(λX) = λρ(X) for all λ ∈ (0, ∞).

Positive homogeneity was recognized early on to be controversial in terms of liquidity.14 Finally, we consider:

Star-Shapedness. ρ(λX) ≥ λρ(X) for all X ∈ L∞ and λ ∈ [1, ∞).

This property seems to be the sensible weakening of positive homogeneity imposing that capital requirements increase more than linearly if the position is magnified by a factor greater than 1.15

12 That is, α(μ) = sup_{X∈L∞(Ω,A,P)} (μ(X) − ρ(−X)) for all μ ∈ L∞(Ω, A, P)∗.
13 For more details on the relations between convex duality and quasiconvex monotone duality, see Cerreia-Vioglio et al. (2008b).
14 The fact that an additional liquidity risk may arise if a position is multiplied by a large factor is indeed one of the motivations leading to the introduction of convex risk measures (see Föllmer and Schied 2002).
15 Notice that this property is equivalent to: ρ(λX) ≤ λρ(X) for all λ ∈ (0, 1].


The next proposition characterizes the above properties of ρ in terms of properties of R. From a financial perspective it confirms the intuition that R represents scenario-dependent reserves while ρ represents their synthesis. In order to avoid indeterminacies, real valued risk measures are considered.

PROPOSITION 4.1. Let ρ : L∞ → ℝ be an upper semicontinuous and quasiconvex risk measure. Then: (a) ρ is subadditive if and only if R(·, Q) is subadditive for all Q ∈ M1,f. (b) ρ is positively homogeneous if and only if R(·, Q) is positively homogeneous for all Q ∈ M1,f. (c) ρ is star-shaped if and only if R(·, Q) is star-shaped for all Q ∈ M1,f.

Proof. (a) Suppose ρ is subadditive, and let t, t′ ∈ ℝ and Q ∈ M1,f. Then,

$$\begin{aligned}
R(t + t', Q) &= \inf\{\rho(X) : E_Q(-X) = t + t'\} \\
&\le \inf\{\rho(Y + Z) : E_Q(-Y) = t \text{ and } E_Q(-Z) = t'\} \\
&\le \inf\{\rho(Y) + \rho(Z) : E_Q(-Y) = t \text{ and } E_Q(-Z) = t'\} \\
&= \inf\{\rho(Y) : E_Q(-Y) = t\} + \inf\{\rho(Z) : E_Q(-Z) = t'\} = R(t, Q) + R(t', Q).
\end{aligned}$$

Conversely, assume R(·, Q) is subadditive for all Q ∈ M1,f, and let X, Y ∈ L∞. Then, by (3.3), there exists Q̄ ∈ M1,f such that

$$\begin{aligned}
\rho(X + Y) &= R(E_{\bar Q}(-X - Y), \bar Q) \le R(E_{\bar Q}(-X), \bar Q) + R(E_{\bar Q}(-Y), \bar Q) \\
&\le \max_{Q \in \mathcal{M}_{1,f}} R(E_Q(-X), Q) + \max_{Q \in \mathcal{M}_{1,f}} R(E_Q(-Y), Q) = \rho(X) + \rho(Y).
\end{aligned}$$

(b) Suppose ρ is positively homogeneous, and let t ∈ ℝ, λ ∈ (0, ∞), and Q ∈ M1,f. Then,

$$\begin{aligned}
R(\lambda t, Q) &= \inf\{\rho(X) : E_Q(-X) = \lambda t\} = \inf\left\{\rho(X) : E_Q\left(-\frac{X}{\lambda}\right) = t\right\} \\
&= \inf\{\rho(\lambda Y) : E_Q(-Y) = t\} = \inf\{\lambda\rho(Y) : E_Q(-Y) = t\} \\
&= \lambda \inf\{\rho(Y) : E_Q(-Y) = t\} = \lambda R(t, Q).
\end{aligned}$$

Conversely, assume that R(·, Q) is positively homogeneous for all Q ∈ M1,f, and let X ∈ L∞ and λ ∈ (0, ∞). Then,

$$\rho(\lambda X) = \max_{Q \in \mathcal{M}_{1,f}} R(E_Q(-\lambda X), Q) = \lambda \max_{Q \in \mathcal{M}_{1,f}} R(E_Q(-X), Q) = \lambda\rho(X).$$


(c) Suppose ρ is star-shaped, and let t ∈ ℝ, λ ∈ [1, ∞), and Q ∈ M1,f. Then,

$$\begin{aligned}
R(\lambda t, Q) &= \inf\{\rho(X) : E_Q(-X) = \lambda t\} = \inf\left\{\rho(X) : E_Q\left(-\frac{X}{\lambda}\right) = t\right\} \\
&= \inf\{\rho(\lambda Y) : E_Q(-Y) = t\} \ge \inf\{\lambda\rho(Y) : E_Q(-Y) = t\} \\
&= \lambda \inf\{\rho(Y) : E_Q(-Y) = t\} = \lambda R(t, Q).
\end{aligned}$$

Conversely, assume that R(·, Q) is star-shaped for all Q ∈ M1,f, and let X ∈ L∞ and λ ∈ [1, ∞). Then,

$$\rho(\lambda X) = \max_{Q \in \mathcal{M}_{1,f}} R(E_Q(-\lambda X), Q) \ge \lambda \max_{Q \in \mathcal{M}_{1,f}} R(E_Q(-X), Q) = \lambda\rho(X).$$

Similar considerations hold for cash-additivity and convexity.16 We conclude this section by providing an alternative characterization of coherent risk measures (i.e., risk measures that are cash-additive, subadditive, and positively homogeneous). The following proposition can also be proved by standard convex analysis arguments, but the proof presented here seems shorter and more elegant.

16 Cash-additivity of ρ translates into R(t − m, Q) = R(t, Q) − Dm for all t ∈ ℝ, m ∈ ℝ, and Q ∈ M1,f. Convexity of ρ corresponds to convexity of R in the first component; see corollary 5 of Cerreia-Vioglio et al. (2008b).

PROPOSITION 4.2. Let ρ : L∞ → ℝ be a risk measure. The following conditions are equivalent: (i) ρ is coherent; (ii) ρ is subadditive, star-shaped, and ρ(−I_Ω) ≤ −ρ(I_Ω) = D ∈ (0, 1].

Proof. (i) implies (ii) is trivial. Conversely, assume (ii). First, by subadditivity ρ(0) ≥ 0. Then, from 0 ≥ ρ(I_Ω) + ρ(−I_Ω) ≥ ρ(I_Ω − I_Ω) = ρ(0) ≥ 0, it follows that ρ(0) = 0 and ρ(−I_Ω) = −ρ(I_Ω) = D ∈ (0, 1]. For all X ∈ L∞, the function ρX : ℝ+ → ℝ defined by ρX(t) = ρ(tX), for all t ∈ ℝ+, is star-shaped (since ρ is), hence it is superadditive, and it is subadditive (since ρ is), thus it is additive. Moreover, since ρX is star-shaped, ρX(t)/t is increasing on (0, ∞), and ρX(t) = t · (ρX(t)/t) has at most countably many discontinuity points on (0, ∞). By corollary 9 of Aczél and Dhombres (1989, chapter 2), ρX is linear, therefore ρ is positively homogeneous. Thus, ρ is sublinear and Lipschitz continuous of rank D. Finally, we prove cash-additivity. By Proposition 4.1 and since ρ is quasiconvex, R(·, Q) is subadditive and positively homogeneous for all Q ∈ M1,f. Now, let 𝒬 = {Q ∈ M1,f : R(1, Q) > −∞}. If Q ∈ 𝒬, by subadditivity R(1, Q) + R(−1, Q) ≥ R(0, Q) = 0 (where the latter equality descends from monotonicity and upper semicontinuity of R(·, Q)). This, in turn, implies


that R(−1, Q) > −∞. By positive homogeneity of R(·, Q), it follows that R(·, Q) is finite. Moreover, since ρ(X) ≥ R(E_Q(−X), Q) for all X ∈ L∞,

$$D = \rho(-I_\Omega) \ge R(1, Q) \ge -R(-1, Q) \ge -\rho(I_\Omega) = D$$

and positive homogeneity of R(·, Q) yields R(t, Q) = Dt for all t ∈ ℝ. If Q ∉ 𝒬 then R(1, Q) = −∞. By monotonicity and positive homogeneity, it follows that R(t, Q) = −∞ for all t ∈ ℝ. Lemma 3.2 guarantees ρ(X) = D max_{Q∈𝒬} E_Q(−X) for all X ∈ L∞, proving the statement.
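The closing formula of the proof, ρ(X) = D max over 𝒬 of E_Q(−X), is easy to experiment with when the scenario set is finite. The sketch below is an illustration only: the discount factor `D`, the three-state sample space, and the scenario list `SCENARIOS` are hypothetical choices, and the checks are run on random samples rather than proved.

```python
import random

D = 0.95  # hypothetical discount factor in (0, 1]

# Hypothetical finite scenario set on three states (each row sums to 1).
SCENARIOS = [
    (0.5, 0.3, 0.2),
    (0.2, 0.3, 0.5),
    (0.25, 0.5, 0.25),
]


def expected_loss(x, q):
    """E_Q(-X) for a position x and a scenario q."""
    return sum(qi * (-xi) for qi, xi in zip(q, x))


def rho(x):
    """rho(X) = D * max over the scenario set of E_Q(-X)."""
    return D * max(expected_loss(x, q) for q in SCENARIOS)


def check_coherence(trials=1000, seed=1, tol=1e-9):
    """Return True if all sampled positions satisfy subadditivity,
    positive homogeneity, and cash-additivity (up to tol)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in range(3)]
        y = [rng.uniform(-5.0, 5.0) for _ in range(3)]
        lam = rng.uniform(0.1, 10.0)
        m = rng.uniform(-5.0, 5.0)
        subadditive = rho([a + b for a, b in zip(x, y)]) <= rho(x) + rho(y) + tol
        homogeneous = abs(rho([lam * a for a in x]) - lam * rho(x)) <= tol
        cash_additive = abs(rho([a - m for a in x]) - (rho(x) + D * m)) <= tol
        if not (subadditive and homogeneous and cash_additive):
            return False
    return True
```

With these choices `rho([1.0, 1.0, 1.0])` equals −D and `rho([-1.0, -1.0, -1.0])` equals D, which matches the normalization ρ(−I_Ω) = −ρ(I_Ω) = D in condition (ii) of Proposition 4.2.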

4.2. Continuity from Below (and Above) Next proposition shows that, as in the special cases of convex cash-additive and cashsubadditive risk measures, the possibility of replacing ďŹ nitely additive probabilities with countably additive probabilities in the variational representation (3.1), and indeed in (3.3), corresponds to the following continuity requirement: Continuity from below. Xn X implies Ď (Xn ) â†’ Ď (X) for all Xn , X âˆˆ Lâˆž . PROPOSITION 4.3. Let Ď : Lâˆž â†’ [âˆ’âˆž, âˆž] be a quasiconvex upper semicontinuous risk measure. The following conditions are equivalent: (i) Ď is continuous from below; (ii) R(t, Q) = inf XâˆˆLâˆž Ď (X) for all (t, Q) âˆˆ RĂ— M1, f \ M1 . In this case, (4.5)

max R(E Q (âˆ’X), Q) = max R(E Q (âˆ’X), Q)

QâˆˆM1, f

QâˆˆM1

âˆ€X âˆˆ Lâˆž .

Proof . DeďŹ ne inf Lâˆž Ď = inf XâˆˆLâˆž Ď (X). Consider the following condition: (iii) Q âˆˆ M1, f : R(t, Q) â‰Ľ m âŠ† M1 for all m âˆˆ (inf Lâˆž Ď , +âˆž] and all t âˆˆ R. We show that (i) =â‡’ (iii) =â‡’ (ii) =â‡’ (i). (i) implies (iii). Let t âˆˆ R, m âˆˆ (inf Lâˆž Ď , +âˆž], and Q âˆˆ Q âˆˆ M1, f : R(t, Q) â‰Ľ m . Since m > inf Lâˆž Ď , there exist X âˆˆ Lâˆž such that Ď (X) < m and x â‰Ľ X in R such that Ď (x) â‰¤ Ď (X) < m. If En âˆ… in A,17 then x âˆ’ kI En x in Lâˆž for each k > 0. Continuity from below guarantees that there exists Nk âˆˆ N such that for all n â‰Ľ Nk m > Ď (x âˆ’ kI En ) = max R E Q (kI En âˆ’ x) , Q = max R (kQ (En ) âˆ’ x, Q) . QâˆˆM1, f

QâˆˆM1, f

If kQ (En ) âˆ’ x â‰Ľ t for some n â‰Ľ Nk , since R is increasing, it follows that max R (kQ (En ) âˆ’ x, Q) â‰Ľ R kQ (En ) âˆ’ x, Q â‰Ľ R(t, Q ) â‰Ľ m,

QâˆˆM1, f

17

That is En is a decreasing and vanishing sequence of elements of A.


which is absurd. Then kQ′(E_n) − x < t for all n ≥ N_k, hence

\[ Q'(E_n) < \frac{x + t}{k} \qquad \forall n \ge N_k, \]

thus lim_{n→∞} Q′(E_n) ≤ k^{−1}(x + t). Since this is the case for each k > 0, then lim_{n→∞} Q′(E_n) = 0 and Q′ ∈ M_1.

(iii) implies (ii). Clearly, for all (t, Q) ∈ ℝ × M_{1,f}, R(t, Q) = inf{ρ(X) : E_Q(−X) = t} ≥ inf_{L∞} ρ. If, per contra, there exists (t_0, Q_0) ∈ ℝ × (M_{1,f} \ M_1) such that R(t_0, Q_0) > inf_{L∞} ρ, then, setting m_0 = R(t_0, Q_0), by (iii) it follows that Q_0 ∈ {Q ∈ M_{1,f} : R(t_0, Q) ≥ m_0} ⊆ M_1, a contradiction.

(ii) implies (i). Let {X_n}_{n≥1} be a sequence in L∞ such that X_n ↗ X_0 ∈ L∞. For each n ≥ 0, define γ_n : M_{1,f} → [−∞, +∞] by γ_n(Q) = R(E_Q(−X_n), Q) for all Q ∈ M_{1,f}. Each γ_n is weak* upper semicontinuous, and the sequence {γ_n}_{n∈ℕ} is decreasing. If Q ∈ M_1, then E_Q(−X_n) ↘ E_Q(−X_0), by the Levi Monotone Convergence Theorem, and so, since R(·, Q) is upper semicontinuous and increasing on ℝ, lim_{n→∞} R(E_Q(−X_n), Q) = R(E_Q(−X_0), Q); else if Q ∉ M_1, then R(E_Q(−X_n), Q) = inf_{L∞} ρ for all n ≥ 0. Conclude that {−γ_n}_{n∈ℕ} pointwise converges and so Γ-converges to −γ_0 (see, e.g., Dal Maso 1993, remark 5.5). By theorem 7.4 of Dal Maso (1993), min_{Q∈M_{1,f}} −γ_n(Q) → min_{Q∈M_{1,f}} −γ_0(Q), that is −ρ(X_n) → −ρ(X_0).

Finally, we show that (ii) implies (4.5). If X is such that ρ(X) = inf_{L∞} ρ, then for all Q ∈ M_{1,f}, by Lemma 3.2, ρ(X) ≥ R(E_Q(−X), Q) = inf{ρ(Y) : E_Q(−Y) = E_Q(−X)} ≥ inf_{L∞} ρ = ρ(X). Therefore the maximum in (3.3) is attained at each Q in M_{1,f}, thus at each Q in M_1. Else if ρ(X) > inf_{L∞} ρ, by (ii), the maximum in (3.3) cannot be attained on M_{1,f} \ M_1, thus it is attained on M_1.

Next, notice that continuity from below implies norm upper semicontinuity for a risk measure ρ. Indeed:

PROPOSITION 4.4. A risk measure ρ is continuous from below (resp., above) if and only if it is upper (resp., lower) semicontinuous with respect to bounded pointwise convergence.

Proof. Let {X_n}_{n∈ℕ} be a bounded sequence in L∞ that pointwise converges to X. Set Y_n = inf_{k≥n} X_k for all n ∈ ℕ. Then, we have that X_n ≥ Y_n for all n ∈ ℕ and Y_n ↗ X. Monotonicity and continuity from below imply lim sup_n ρ(X_n) ≤ lim_{n→∞} ρ(Y_n) = ρ(X). Conversely, if X_n ↗ X, then monotonicity of ρ delivers ρ(X) ≤ lim inf_n ρ(X_n), while upper semicontinuity with respect to bounded pointwise convergence delivers lim sup_n ρ(X_n) ≤ ρ(X).

Moreover, continuity from below and norm lower semicontinuity imply continuity with respect to bounded pointwise convergence, provided ρ is quasiconvex. Formally:


PROPOSITION 4.5. Let ρ : L∞ → [−∞, ∞] be a quasiconvex risk measure. The following conditions are equivalent: (i) ρ is continuous from below and norm lower semicontinuous; (ii) ρ is continuous with respect to bounded pointwise convergence.

Proof. Clearly, (ii) and Proposition 4.4 yield (i). By Proposition 4.4, to prove the converse it is sufficient to show that ρ is continuous from above. Let X_n ↘ X. By monotonicity, ρ(X_n) is increasing and lim_{n→∞} ρ(X_n) ≤ ρ(X). Assume, per contra, that strict inequality holds. Then {X_n}_{n∈ℕ} is contained in {ρ < c} for some c < ρ(X). The assumptions on ρ guarantee that {ρ ≤ c} is nonempty, convex, norm closed, and

\[ \{\rho \le c\} \subseteq \bigcap_{i \in I} [Q_i \ge b_i], \]

where {(b_i, Q_i) : i ∈ I} = {(b, Q) ∈ ℝ × M_1 : [Q ≥ b] ⊇ {ρ ≤ c}}. As to the converse inclusion, let Y ∉ {ρ ≤ c}. By a Separating Hyperplane Theorem, there exist b ∈ ℝ, ε > 0, and Q ∈ L∞(Ω, A, P)* \ {0} such that {ρ ≤ c} ⊆ [Q ≥ b] and Y ∈ [Q < b − ε]. Monotonicity allows us to assume Q ∈ M_{1,f}.18 If Q ∉ M_1, then R(t, Q) = inf_{X∈L∞} ρ(X) ≤ ρ(X_1) < c for all t ∈ ℝ. For t = −b + ε, this implies c > R(−b + ε, Q) = inf{ρ(Z) : E_Q(Z) = b − ε}. Then ρ(Z′) < c for some Z′ ∈ [Q = b − ε], which is absurd since {ρ ≤ c} ⊆ [Q ≥ b]. Summing up, if Y ∉ {ρ ≤ c} there are b ∈ ℝ and Q ∈ M_1 such that [Q ≥ b] ⊇ {ρ ≤ c} and Y ∉ [Q ≥ b]. Thus, {ρ ≤ c}^c ⊆ (⋂_{i∈I} [Q_i ≥ b_i])^c.

Finally, {X_n}_{n∈ℕ} ⊆ {ρ ≤ c} implies E_{Q_i}(X_n) ≥ b_i for all n ∈ ℕ and i ∈ I. By the Monotone Convergence Theorem, E_{Q_i}(X) ≥ b_i for all i ∈ I, then ρ(X) ≤ c, which contradicts c < ρ(X).

5. LAW-INVARIANCE

In this section, we consider a continuous from below quasiconvex risk measure

\[ \rho(X) = \max_{Q \in M_1} R(E_Q(-X), Q) \qquad \forall X \in L^\infty. \]

In the study of law-invariance it is useful to consider some important stochastic orders. The convex order ≽cx is defined on L1 by X ≽cx Y if and only if E_P(φ(X)) ≥ E_P(φ(Y)) for all convex φ : ℝ → ℝ. The increasing convex order ≽icx and second order stochastic dominance ≽ssd are defined analogously by replacing convex functions by increasing convex functions and increasing concave functions, respectively. Notice that X ≽icx Y if and only if −X ≼ssd −Y and that the three preorders share the same symmetric part ∼, which is the relation of being identically distributed with respect to P.19 As widely discussed in the literature (see, e.g., the classic Rothschild and Stiglitz 1970; Marshall and Olkin 1979), X ≽cx Y intuitively means that the values of X are more dispersed than those of Y, while X ≽ssd Y is the standard formalization of the statement “X is less risky than Y,” provided P is the unanimously accepted model for uncertainty. The convex order naturally induces a relation on M_1 by Q ≽cx Q′ if and only if

\[ \frac{dQ}{dP} \succeq_{cx} \frac{dQ'}{dP}. \]

18 If Z ∈ L∞_+ then X_1 + nZ ∈ {ρ ≤ c} for all n ∈ ℕ, and Q(X_1) + nQ(Z) ≥ b delivers Q(Z) ≥ 0. Then Q is a nonzero positive linear functional, and if Q ∉ M_{1,f} it is sufficient to normalize it.

The intuition is the same: the probability masses dQ(ω) are more scattered with respect to dP(ω) than the masses dQ′(ω).

An extended real valued function γ defined on a subset of L1 is law-invariant (or rearrangement invariant) if and only if X ∼ Y implies γ(X) = γ(Y); while γ is Schur concave if and only if X ≽cx Y implies γ(X) ≤ γ(Y). Finally, γ preserves second order stochastic dominance if and only if X ≽ssd Y implies γ(X) ≤ γ(Y). Clearly, the latter property is desirable for a risk measure, under the assumption that all the agents agree on P. If X is considered to be less risky than Y, it is difficult for the supervising agency to require a higher reserve amount for X than for Y.

THEOREM 5.1. Let ρ be a quasiconvex and continuous from below risk measure. The following conditions are equivalent: (i) ρ preserves second order stochastic dominance; (ii) R(t, ·) is Schur concave on M_1 for all t ∈ ℝ. In this case,

\[ \rho(X) = \max_{Q \in M_1} R\left( \int_0^1 q_{-X}(s)\, q_{dQ/dP}(s)\, ds,\; Q \right) \qquad \forall X \in L^\infty \tag{5.1} \]

and

\[ R(t, Q) = \inf\left\{ \rho(Y) : \int_0^1 q_{dQ/dP}(s)\, q_Y(1 - s)\, ds = -t \right\} \qquad \forall (t, Q) \in \mathbb{R} \times M_1. \tag{5.2} \]

Moreover, if (Ω, A, P) is adequate, then (i) and (ii) are equivalent to: (iii) ρ is law-invariant; (iv) R(t, ·) is rearrangement invariant on M_1 for all t ∈ ℝ.

19 See Chong (1974) for this fact and for alternative characterizations of these orders.


Here q_Z denotes any quantile of Z ∈ L1 (see, e.g., Föllmer and Schied 2004),20 and a probability space is adequate if and only if it is either finite and endowed with the uniform probability or nonatomic. We used the term “rearrangement invariant” rather than the equivalent “law-invariant” in (iv) since it gives a better intuition of what happens in the finite equidistributed case: R(t, Q) = R(t, Q∘σ) for all permutations σ of Ω and all (t, Q) ∈ ℝ × M_1.

Proof. The proof heavily relies on the theory of rearrangement invariant Banach spaces developed by Luxemburg (1967) and Chong and Rice (1971). For convenience, the latter reference is denoted from now on by CR. Following its notation, if X is measurable, set

\[ \delta_X(s) = \inf\{x \in \mathbb{R} : P(\{\omega \in \Omega : X(\omega) > x\}) \le s\} = \inf\{x \in \mathbb{R} : F_X(x) \ge 1 - s\} = F_X^{-1}(1 - s) = q_X^{-}(1 - s) \]

for all s ∈ [0, 1].

Step 1. If Y ∈ L∞ and Q ∈ M_1, then

\[ \{E_{Q'}(Y) : M_1 \ni Q' \preceq_{cx} Q\} = \left[ \int_0^1 \delta_Y(s)\, \delta_{dQ/dP}(1 - s)\, ds,\; \int_0^1 \delta_Y(s)\, \delta_{dQ/dP}(s)\, ds \right]. \tag{5.4} \]

Moreover, if (Ω, A, P) is adequate, then

\[ \int_0^1 \delta_Y(s)\, \delta_{dQ/dP}(1 - s)\, ds = \min\{E_{Q'}(Y) : M_1 \ni Q' \sim Q\} \tag{5.5} \]

and

\[ \int_0^1 \delta_Y(s)\, \delta_{dQ/dP}(s)\, ds = \max\{E_{Q'}(Y) : M_1 \ni Q' \sim Q\}. \tag{5.6} \]

Proof of Step 1. [CR 13.4] and [CR 13.8] guarantee that, if Y and X belong to the set M(Ω, A, P) of measurable functions and δ_{|Y|} δ_{|X|} ∈ L1([0, 1]), then

\[ \left\{ \int Y X'\, dP : M(\Omega, A, P) \ni X' \preceq_{cx} X \right\} = \left[ \int_0^1 \delta_Y(s)\, \delta_X(1 - s)\, ds,\; \int_0^1 \delta_Y(s)\, \delta_X(s)\, ds \right]. \]

Moreover, if (Ω, A, P) is adequate, then

\[ \int_0^1 \delta_Y(s)\, \delta_X(1 - s)\, ds = \min\left\{ \int Y X'\, dP : M(\Omega, A, P) \ni X' \sim X \right\} \]

and

\[ \int_0^1 \delta_Y(s)\, \delta_X(s)\, ds = \max\left\{ \int Y X'\, dP : M(\Omega, A, P) \ni X' \sim X \right\}. \]

20 Notice that we are not committing to any specific version of the quantile, e.g., the right continuous, or the left continuous, etc.


The condition δ_{|Y|} δ_{|X|} ∈ L1([0, 1]) is implied by δ_{|Y|} ∈ Lp([0, 1]) and δ_{|X|} ∈ Lq([0, 1]), where either p = ∞ and q = 1 or p = 1 and q = ∞, which is equivalent to Y ∈ Lp(Ω) and X ∈ Lq(Ω) [CR 4.3]. In this case,

\[ \{X' \in M(\Omega, A, P) : X' \preceq_{cx} X\} = \{X' \in L^q : X' \preceq_{cx} X\}. \]

In fact, X ∈ Lq and X′ ≼cx X imply X′ ∈ Lq [CR 10.2]. Therefore, if Y ∈ Lp(Ω) and X ∈ Lq(Ω), then

\[ \left\{ \int Y X'\, dP : L^q \ni X' \preceq_{cx} X \right\} = \left[ \int_0^1 \delta_Y(s)\, \delta_X(1 - s)\, ds,\; \int_0^1 \delta_Y(s)\, \delta_X(s)\, ds \right]. \tag{5.7} \]

Moreover, if (Ω, A, P) is adequate, then

\[ \int_0^1 \delta_Y(s)\, \delta_X(1 - s)\, ds = \min\left\{ \int Y X'\, dP : L^q \ni X' \sim X \right\} \tag{5.8} \]

and

\[ \int_0^1 \delta_Y(s)\, \delta_X(s)\, ds = \max\left\{ \int Y X'\, dP : L^q \ni X' \sim X \right\}. \tag{5.9} \]

If, in addition, X is a probability density (p.d.) and X′ ≼cx X, then X′ ≥ 0 [CR 10.2] and E(X′) = E(X) = 1, that is, X′ is a probability density. Finally, if Y ∈ L∞ and Q ∈ M_1, then

\[ \{E_{Q'}(Y) : M_1 \ni Q' \preceq_{cx} Q\} = \left\{ \int Y X'\, dP : X' \text{ is a p.d. and } X' \preceq_{cx} \frac{dQ}{dP} \right\} = \left\{ \int Y X'\, dP : L^1 \ni X' \preceq_{cx} \frac{dQ}{dP} \right\} = \left[ \int_0^1 \delta_Y(s)\, \delta_{dQ/dP}(1 - s)\, ds,\; \int_0^1 \delta_Y(s)\, \delta_{dQ/dP}(s)\, ds \right]. \]

Moreover, if (Ω, A, P) is adequate, then

\[ \int_0^1 \delta_Y(s)\, \delta_{dQ/dP}(s)\, ds = \max\left\{ \int Y X'\, dP : L^1 \ni X' \sim \frac{dQ}{dP} \right\} = \max\left\{ \int Y X'\, dP : X' \text{ is a p.d. and } X' \sim \frac{dQ}{dP} \right\} = \max\{E_{Q'}(Y) : M_1 \ni Q' \sim Q\}. \]

The formula for the minimum is proved in the same way.

The next step is essentially due to Hardy, see, e.g., [CR 9.1]:

Step 2. Let p = ∞ and q = 1 or vice versa, X, X′ ∈ Lp and Y ∈ Lq.
(a) X ≼cx X′ implies ∫_0^1 δ_X(s) δ_Y(s) ds ≤ ∫_0^1 δ_{X′}(s) δ_Y(s) ds.
(b) X ≼cx X′ implies ∫_0^1 δ_X(s) δ_Y(1 − s) ds ≥ ∫_0^1 δ_{X′}(s) δ_Y(1 − s) ds.
(c) X ≼icx X′ and Y ≥ 0 implies ∫_0^1 δ_X(s) δ_Y(s) ds ≤ ∫_0^1 δ_{X′}(s) δ_Y(s) ds.
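Step 2(a) and (b) can be checked numerically on a finite uniform probability space, where δ_X is simply the vector of values of X sorted in decreasing order and the integrals become averages. The sketch below is only an illustration under that assumption: the helper names, the random data, and the use of a two-coordinate averaging to build some X ≼cx X′ are all hypothetical choices, not part of the paper's argument.

```python
import random

def decreasing_rearrangement(values):
    """delta_X on a uniform n-point space: the values sorted in decreasing order."""
    return sorted(values, reverse=True)

def is_cx_dominated(x, x_prime, tol=1e-12):
    """Partial-sum test for X <=_cx X' on a uniform finite space:
    equal totals, and every partial sum of delta_X is at most that of delta_X'."""
    dx = decreasing_rearrangement(x)
    dxp = decreasing_rearrangement(x_prime)
    if abs(sum(dx) - sum(dxp)) > tol:
        return False
    sx = sxp = 0.0
    for a, b in zip(dx, dxp):
        sx += a
        sxp += b
        if sx > sxp + tol:
            return False
    return True

def comonotone_integral(x, y):
    """Discrete analogue of the integral of delta_X(s) * delta_Y(s) over [0, 1]."""
    dx = decreasing_rearrangement(x)
    dy = decreasing_rearrangement(y)
    return sum(a * b for a, b in zip(dx, dy)) / len(x)

def antimonotone_integral(x, y):
    """Discrete analogue of the integral of delta_X(s) * delta_Y(1 - s) over [0, 1]."""
    dx = decreasing_rearrangement(x)
    dy_reversed = sorted(y)  # delta_Y(1 - s) is increasing in s
    return sum(a * b for a, b in zip(dx, dy_reversed)) / len(x)

def average_two_coordinates(x_prime, i, j, theta):
    """Mean-preserving contraction: mix coordinates i and j with weight theta."""
    x = list(x_prime)
    x[i] = theta * x_prime[i] + (1 - theta) * x_prime[j]
    x[j] = (1 - theta) * x_prime[i] + theta * x_prime[j]
    return x

rng = random.Random(0)
n = 8
x_prime = [rng.uniform(-1, 1) for _ in range(n)]   # the more dispersed variable X'
x = average_two_coordinates(x_prime, 0, 5, 0.3)     # X, a contraction of X'
y = [rng.uniform(-2, 2) for _ in range(n)]

cx_ok = is_cx_dominated(x, x_prime)
step_2a_ok = comonotone_integral(x, y) <= comonotone_integral(x_prime, y) + 1e-12
step_2b_ok = antimonotone_integral(x, y) >= antimonotone_integral(x_prime, y) - 1e-12
```

The partial-sum test used in `is_cx_dominated` is the discrete counterpart of the inequalities ∫_0^w δ_X ≤ ∫_0^w δ_{X′} that the proof of Step 2 relies on.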


Proof of Step 2. X, X′ ∈ Lp and Y ∈ Lq is equivalent to δ_X, δ_{X′} ∈ Lp([0, 1]) and δ_Y ∈ Lq([0, 1]) [CR 4.3]. In particular, δ_X δ_Y, δ_{X′} δ_Y ∈ L1([0, 1]). Also notice that f(s) ∈ Lq([0, 1]) if and only if f(1 − s) ∈ Lq([0, 1]), and ∫_0^1 f(s) ds = ∫_0^1 f(1 − s) ds.

(a) (resp., (b)) If X ≼cx X′, then ∫_0^w δ_X(s) ds ≤ ∫_0^w δ_{X′}(s) ds for all w ∈ [0, 1] and ∫_0^1 δ_X(s) ds = ∫_0^1 δ_{X′}(s) ds; since δ_Y(s) is decreasing (resp., δ_Y(1 − s) is increasing), then ∫_0^1 δ_X(s) δ_Y(s) ds ≤ ∫_0^1 δ_{X′}(s) δ_Y(s) ds (resp., ∫_0^1 δ_X(s) δ_Y(1 − s) ds ≥ ∫_0^1 δ_{X′}(s) δ_Y(1 − s) ds) [CR 9.1].

(c) If X ≼icx X′ and Y ≥ 0, then ∫_0^w δ_X(s) ds ≤ ∫_0^w δ_{X′}(s) ds for all w ∈ [0, 1] and δ_Y is decreasing and nonnegative [CR 2.8], then ∫_0^1 δ_X(s) δ_Y(s) ds ≤ ∫_0^1 δ_{X′}(s) δ_Y(s) ds [CR 9.1].

Step 3. If either R(t, ·) is Schur concave on M_1 for all t ∈ ℝ or (Ω, A, P) is adequate and R(t, ·) is rearrangement invariant on M_1 for all t ∈ ℝ, then

\[ \rho(X) = \max_{Q \in M_1} R\left( \int_0^1 \delta_{-X}(s)\, \delta_{dQ/dP}(s)\, ds,\; Q \right) \qquad \forall X \in L^\infty. \tag{5.12} \]

Proof of Step 3. Let X ∈ L∞. Then, E_Q(−X) ≤ ∫_0^1 δ_{−X}(s) δ_{dQ/dP}(s) ds for all Q ∈ M_1, by (5.4), thus, monotonicity of R in the first component implies

\[ \rho(X) = \max_{Q \in M_1} R(E_Q(-X), Q) \le \sup_{Q \in M_1} R\left( \int_0^1 \delta_{-X}(s)\, \delta_{dQ/dP}(s)\, ds,\; Q \right). \]

Conversely, for any Q ∈ M_1, by (5.4) there exists Q′ ≼cx Q (resp., by (5.6) there exists Q′ ∼ Q) such that

\[ \int_0^1 \delta_{-X}(s)\, \delta_{dQ/dP}(s)\, ds = E_{Q'}(-X). \]

Thus,

\[ R\left( \int_0^1 \delta_{-X}(s)\, \delta_{dQ/dP}(s)\, ds,\; Q \right) = R(E_{Q'}(-X), Q) \le R(E_{Q'}(-X), Q') \le \rho(X) \]

by Schur concavity (resp., rearrangement invariance). Therefore,

\[ \sup_{Q \in M_1} R\left( \int_0^1 \delta_{-X}(s)\, \delta_{dQ/dP}(s)\, ds,\; Q \right) \le \rho(X) \]

and the supremum is attained.

Step 4. (ii) implies (i) and (5.1), also (iv) implies (i) provided (Ω, A, P) is adequate.

Proof of Step 4. By Step 3, (ii) guarantees that (5.12) holds, and the same is true for (iv) if (Ω, A, P) is adequate. But (5.12) is equivalent to (5.1) since δ_Y(s) = q_Y^−(1 − s) for s ∈ [0, 1].


Moreover, X ≽ssd Y if and only if −X ≼icx −Y. Thus, Step 2.c implies ∫_0^1 δ_{−X}(s) δ_{dQ/dP}(s) ds ≤ ∫_0^1 δ_{−Y}(s) δ_{dQ/dP}(s) ds for all Q ∈ M_1, and monotonicity of R allows us to conclude that

\[ \rho(X) = \max_{Q \in M_1} R\left( \int_0^1 \delta_{-X}(s)\, \delta_{dQ/dP}(s)\, ds,\; Q \right) \le \max_{Q \in M_1} R\left( \int_0^1 \delta_{-Y}(s)\, \delta_{dQ/dP}(s)\, ds,\; Q \right) = \rho(Y). \]

Therefore, ρ preserves second order stochastic dominance and, in particular, it is law-invariant.

Step 5. If either ρ preserves second order stochastic dominance or (Ω, A, P) is adequate and ρ is law-invariant, then, for all (t, Q) ∈ ℝ × M_1,

\[ R(t, Q) = \inf\left\{ \rho(Y) : \int_0^1 \delta_{dQ/dP}(s)\, \delta_Y(1 - s)\, ds \le -t \right\} = \inf\left\{ \rho(Y) : \int_0^1 \delta_{dQ/dP}(s)\, \delta_Y(1 - s)\, ds = -t \right\}. \tag{5.13} \]

Proof of Step 5. Notice that if ρ preserves second order stochastic dominance, then it is Schur convex, that is, X ≼cx Y implies ρ(X) ≤ ρ(Y). Let ρ be Schur convex (resp., law-invariant). First observe that R(t, Q) = inf{ρ(X) : E_Q(−X) ≥ t} = inf{ρ(X) : E_Q(X) ≤ −t} for all (t, Q) ∈ ℝ × M_1.21 Since ρ is Schur convex (resp., rearrangement invariant), then

\[ \inf\{\rho(X) : E_Q(X) \le -t\} = \inf\{\rho(Y) : \text{there exists } X \preceq_{cx} Y \text{ such that } E_Q(X) \le -t\} \]

(resp., = inf{ρ(Y) : there exists X ∼ Y such that E_Q(X) ≤ −t}), but,

\[ \inf\{\rho(Y) : E_Q(X) \le -t \text{ for some } X \preceq_{cx} Y\} = \inf\left\{ \rho(Y) : \min\left\{ \int \frac{dQ}{dP} X\, dP : L^\infty \ni X \preceq_{cx} Y \right\} \le -t \right\} \]

(resp.,

\[ \inf\{\rho(Y) : E_Q(X) \le -t \text{ for some } X \sim Y\} = \inf\left\{ \rho(Y) : \min\left\{ \int \frac{dQ}{dP} X\, dP : L^\infty \ni X \sim Y \right\} \le -t \right\} \]

). By (5.7) and (5.8), for all (t, Q) ∈ ℝ × M_1,

\[ R(t, Q) = \inf\left\{ \rho(Y) : \int_0^1 \delta_{dQ/dP}(s)\, \delta_Y(1 - s)\, ds \le -t \right\} \le \inf\left\{ \rho(Y) : \int_0^1 \delta_{dQ/dP}(s)\, \delta_Y(1 - s)\, ds = -t \right\}. \]

21 Clearly, R(t, Q) ≥ inf{ρ(Y) : E_Q(−Y) ≥ t}. Conversely, assume per contra that R(t, Q) > inf{ρ(Y) : E_Q(−Y) ≥ t} for some (t, Q) ∈ ℝ × M_1. This implies the existence of Z ∈ L∞ for which E_Q(−Z) ≥ t and ρ(Z) < R(t, Q). Set m = E_Q(−Z) − t ≥ 0, then Z + m ≥ Z, E_Q(−(Z + m)) = t and R(t, Q) ≤ ρ(Z + m) ≤ ρ(Z) < R(t, Q), a contradiction. The second equality is trivial.


Finally, assume per contra that R(t, Q) < inf{ρ(Y) : ∫_0^1 δ_{dQ/dP}(s) δ_Y(1 − s) ds = −t} for some (t, Q) ∈ ℝ × M_1. This implies the existence of Z ∈ L∞ for which ∫_0^1 δ_{dQ/dP}(s) δ_Z(1 − s) ds ≤ −t and

\[ \rho(Z) < \inf\left\{ \rho(Y) : \int_0^1 \delta_{dQ/dP}(s)\, \delta_Y(1 - s)\, ds = -t \right\}. \]

Since δ_{Z+m} = δ_Z + m for all m ∈ ℝ, then

\[ \int_0^1 \delta_{dQ/dP}(s)\, \delta_{Z+m}(1 - s)\, ds = \int_0^1 \delta_{dQ/dP}(s)\, \delta_Z(1 - s)\, ds + m. \]

Choose m ≥ 0 so that

\[ \int_0^1 \delta_{dQ/dP}(s)\, \delta_{Z+m}(1 - s)\, ds = -t, \]

then Z + m ≥ Z, and ρ(Z + m) ≤ ρ(Z) < inf{ρ(Y) : ∫_0^1 δ_{dQ/dP}(s) δ_Y(1 − s) ds = −t}, a contradiction.

Step 6. (i) implies (ii) and (5.2), also (iii) implies (ii) provided (Ω, A, P) is adequate.

Proof of Step 6. By Step 5, (i) guarantees that (5.13) holds, and the same is true for (iii) if (Ω, A, P) is adequate. But the second part of (5.13) is equivalent to (5.2) since δ_Y(s) = q_Y^−(1 − s) for s ∈ [0, 1], while the first part together with Step 2.b yields the following chain of implications:

\[ Q \preceq_{cx} Q' \implies \int_0^1 \delta_{dQ/dP}(s)\, \delta_Y(1 - s)\, ds \ge \int_0^1 \delta_{dQ'/dP}(s)\, \delta_Y(1 - s)\, ds \quad \text{for all } Y \in L^\infty \]

\[ \implies \left\{ Y : \int_0^1 \delta_{dQ/dP}(s)\, \delta_Y(1 - s)\, ds \le -t \right\} \subseteq \left\{ Y : \int_0^1 \delta_{dQ'/dP}(s)\, \delta_Y(1 - s)\, ds \le -t \right\} \quad \forall t \in \mathbb{R} \]

\[ \implies R(t, Q) \ge R(t, Q') \quad \forall t \in \mathbb{R}. \]

Hence, R(t, ·) is Schur concave for all t ∈ ℝ.

Finally, Steps 4 and 6 guarantee that (i) ⟺ (ii), and in this case (5.1) and (5.2) hold. Moreover, if (Ω, A, P) is adequate, the same steps deliver (iv) ⟹ (i) ⟹ (iii) and (iii) ⟹ (ii) ⟹ (iv).

Theorem 5.1 considers law-invariant quasiconvex risk measures that are upper semicontinuous with respect to bounded pointwise convergence (see Proposition 4.4). Jouini, Schachermayer, and Touzi (2006) show that law-invariant convex monetary risk measures are automatically lower semicontinuous with respect to bounded pointwise convergence, provided (Ω, A, P) is standard. Whether this remains true for quasiconvex risk measures is left for future research (but see Proposition 4.5).
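In the finite equidistributed case, the rearrangement bounds (5.5) and (5.6) used in the proof of Theorem 5.1 reduce to the classical rearrangement inequality and can be verified by brute force over all permutations. The following sketch assumes a uniform five-point space; the payoff `y` and the density `z` (standing in for dQ/dP) are made-up numbers, and the function names are hypothetical.

```python
from itertools import permutations

def rearrangement_bounds(y, z):
    """Discrete versions of the two integrals in (5.5)-(5.6) on a uniform n-point space:
    lower = average of delta_Y(i) * delta_Z(n - 1 - i), upper = average of delta_Y(i) * delta_Z(i)."""
    n = len(y)
    y_dec = sorted(y, reverse=True)
    z_dec = sorted(z, reverse=True)
    upper = sum(a * b for a, b in zip(y_dec, z_dec)) / n
    lower = sum(a * b for a, b in zip(y_dec, reversed(z_dec))) / n
    return lower, upper

def expectations_over_permutations(y, z):
    """E_{Q o sigma}(Y) for every permutation sigma of the n states."""
    n = len(y)
    return [sum(y[i] * z[s[i]] for i in range(n)) / n
            for s in permutations(range(n))]

y = [3.0, -1.0, 0.5, 2.0, -2.5]    # hypothetical bounded position Y
z = [0.5, 2.0, 0.25, 1.25, 1.0]    # hypothetical density dQ/dP, with mean 1 under uniform P

lower, upper = rearrangement_bounds(y, z)
values = expectations_over_permutations(y, z)
min_matches = abs(min(values) - lower) < 1e-12
max_matches = abs(max(values) - upper) < 1e-12
```

Every permuted density `z[s[i]]` has the same law as `z` under the uniform probability, which is why the permutations enumerate exactly the set {Q′ ∈ M_1 : Q′ ∼ Q} in this finite setting.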

5.1. Mean Value Premium Principle

Important examples of law-invariant quasiconvex risk measures that are continuous from below are those of the form

\[ \rho(X) = \ell^{-1}\big(E_P(\ell(-X))\big) \qquad \forall X \in L^\infty, \tag{5.17} \]


where ℓ is a strictly increasing and convex loss function. The characterization of these measures is a version of the classic Nagumo-Kolmogorov-de Finetti Theorem and relies on two additional properties:

Constancy. ρ(m) = −m for all m ∈ ℝ.22

Conditional consistency. Let A ∈ A and X, Y, Z ∈ L∞, ρ(XI_A) ≥ ρ(YI_A) ⟺ ρ(XI_A + ZI_{A^c}) ≥ ρ(YI_A + ZI_{A^c}).

The latter property is inspired by Savage (1954)’s “sure thing principle” and clearly hints at dynamic consistency (see, e.g., Ghirardato 2002).23 The seminal paper of Ellsberg (1961) shows how this assumption is noncontroversial only if agents think that P is a reliable model of the uncertainty they face.24

LEMMA 5.2. Let (Ω, A, P) be a nonatomic probability space. A law-invariant risk measure ρ satisfies constancy, conditional consistency, and continuity with respect to bounded pointwise convergence if and only if there exists a strictly increasing and continuous ℓ : ℝ → ℝ such that

\[ \rho(X) = \ell^{-1}\big(E_P(\ell(-X))\big) \qquad \forall X \in L^\infty. \]

The function ℓ is unique up to strictly increasing affine transformations, and it is convex if and only if ρ is quasiconvex.

Proof. Sufficiency is trivial. Necessity reduces to checking that the function M : D∞ → ℝ defined for each distribution with bounded support F = F_X by M(F) = −ρ(X) satisfies the assumptions of the Nagumo-Kolmogorov-de Finetti Theorem. For the sake of completeness we include this check. Let [a, b] be any closed interval in the real line and D(a, b) be the set of all simple probability distributions supported in [a, b]. The Dirac distribution concentrated in x is denoted by D_x. Constancy guarantees that:

Step 1. For all x ∈ [a, b], M(D_x) = x.

Step 2. If F, G ∈ D(a, b), F ≥ G, and F ≠ G, then M(F) < M(G).

Proof of Step 2. Since (Ω, A, P) is nonatomic there are two simple measurable functions X ≤ Y such that F = F_X and G = F_Y. By monotonicity of ρ, M(F) ≤ M(G). Assume per contra that M(F) = M(G), that is ρ(X) = ρ(Y). If X = Y then F = G, which is absurd. Thus (again by nonatomicity) there exist n ∈ ℕ, A_1 ∈ A with P(A_1) = 1/n, and x, y ∈ ℝ such that

\[ X(\omega) < x < y < Y(\omega) \qquad \forall \omega \in A_1. \]

22 Throughout the paper, ρ(m) is a little abuse for ρ(mI_Ω).
23 Indeed, in the Savagean perspective, these risk measures correspond to certainty equivalents of expected utility maximizers.
24 See, e.g., Maccheroni, Marinacci, and Rustichini (2006) for a recent discussion of this issue.


Therefore, X ≤ xI_{A_1} + XI_{A_1^c} ≤ yI_{A_1} + XI_{A_1^c} ≤ Y. By monotonicity of ρ, ρ(xI_{A_1} + XI_{A_1^c}) = ρ(yI_{A_1} + XI_{A_1^c}), and so, by conditional consistency, ρ(xI_{A_1}) = ρ(yI_{A_1}). Let A_2, …, A_n be such that {A_i}_{i=1}^n form a partition of Ω with P(A_i) = 1/n for all i. By law-invariance

\[ \rho(x I_{A_i}) = \rho(y I_{A_i}) \qquad \forall i = 1, \dots, n. \]

Repeated applications of conditional consistency then deliver

\[ \rho(x I_\Omega) = \rho(x I_{A_1} + x I_{A_2} + x I_{A_3} + \cdots + x I_{A_n}) = \rho(y I_{A_1} + x I_{A_2} + x I_{A_3} + \cdots + x I_{A_n}) = \rho(y I_{A_1} + y I_{A_2} + x I_{A_3} + \cdots + x I_{A_n}) = \cdots = \rho(y I_\Omega), \]

which is absurd by constancy.

Step 3. If F, G, H ∈ D(a, b), λ ∈ (0, 1), and M(F) = M(G), then M(λF + (1 − λ)H) = M(λG + (1 − λ)H).

Proof of Step 3. For every λ ∈ [0, 1], since (Ω, A, P) is nonatomic, there are X, Y, Z ∈ L∞ and A = A_λ ∈ A that are independent and such that F = F_X, G = F_Y, H = F_Z, and P(A) = λ (see, e.g., Billingsley 1995, theorem 5.3). Independence guarantees that F_{WI_A + W′I_{A^c}} = λF_W + (1 − λ)F_{W′} if W, W′ ∈ {X, Y, Z}. If λ = 1/2, then F_{WI_A + W′I_{A^c}} = 2^{−1}F_W + 2^{−1}F_{W′} = 2^{−1}F_{W′} + 2^{−1}F_W = F_{W′I_A + WI_{A^c}}. Assume, per contra, M(2^{−1}F + 2^{−1}H) ≠ M(2^{−1}G + 2^{−1}H), then ρ(XI_A + ZI_{A^c}) ≷ ρ(YI_A + ZI_{A^c}), and by conditional consistency and law-invariance

\[ \rho(X) = \rho(X I_A + X I_{A^c}) \gtrless \rho(Y I_A + X I_{A^c}) = \rho(X I_A + Y I_{A^c}) \gtrless \rho(Y I_A + Y I_{A^c}) = \rho(Y), \]

which contradicts M(F) = M(G). Thus the statement is true for λ = 2^{−1}. Induction guarantees that it is true for any dyadic rational. Continuity with respect to bounded pointwise convergence of ρ and the Skorohod Theorem (see, e.g., Billingsley 1995, theorem 25.6) guarantee that the statement is true for any λ.

For all n ∈ ℕ, M satisfies the properties described in Steps 1–3 on D(−n, n). By the Nagumo-Kolmogorov-de Finetti Theorem,25 for all n ∈ ℕ there exists a unique strictly increasing continuous function φ_n : [−n, n] → ℝ such that φ_n(0) = 0 = φ_n(1) − 1 and

\[ M(F) = \varphi_n^{-1}\left( \int_{\mathbb{R}} \varphi_n(x)\, dF(x) \right) \qquad \forall F \in D(-n, n). \]

25 See, e.g., Hardy, Littlewood, and Pólya (1934, theorem 215).


Define φ(x) = φ_n(x) if |x| ≤ n to obtain M(F) = φ^{−1}(∫_ℝ φ(x) dF(x)) for each simple probability distribution, then,

\[ \rho(X) = -\varphi^{-1}\big(E_P(\varphi(X))\big) \]

for all simple and measurable X : Ω → ℝ. Continuity with respect to bounded pointwise convergence yields the result for ℓ(·) = −φ(−·). Finally, if ρ is quasiconvex, Theorem 5.1 guarantees that ρ preserves second order stochastic dominance. Hence ℓ is convex. The converse is trivial.

The next result builds on Rockafellar (1971) and explicitly evaluates R for risk measures that admit an expected loss representation.

PROPOSITION 5.3. If ℓ : ℝ → ℝ is a strictly increasing convex function and ρ(X) = ℓ^{−1}(E_P(ℓ(−X))) for all X ∈ L∞, then

\[ R(t, Q) = \ell^{-1}\left( \max_{x \ge 0} \left[ xt - E_P\left( \ell^{*}\left( x \frac{dQ}{dP} \right) \right) \right] \right) \qquad \forall (t, Q) \in \mathbb{R} \times M_1. \]

Observe that this amounts to saying that R(t, Q) = t − L(−t; Q, P), where L(w; Q, P) is the generalized distance between probability measures considered by Bellini and Frittelli (2002) and corresponding to an initial endowment w and a utility −ℓ(−·).

Proof. Observe that ℓ(ℝ) is an open half line (l, ∞), with l = inf_{x∈ℝ} ℓ(x). Then ℓ^{−1} can be extended to an extended-valued continuous and monotone function from [−∞, ∞] to [−∞, ∞] by setting ℓ^{−1}(x) = −∞ if x ≤ l and ℓ^{−1}(∞) = ∞. For all (t, Q) ∈ ℝ × M_1,

\[ R(t, Q) = \inf\{\ell^{-1}(E_P(\ell(-X))) : E_Q(-X) = t\} = \ell^{-1}\big(\inf\{E_P(\ell(-X)) : E_Q(-X) = t\}\big). \]

Set φ(·) = −ℓ(−·). Then

\[ \inf\{E_P(\ell(-X)) : E_Q(-X) = t\} = \inf\{-E_P(-\ell(-X)) : E_Q(-X) = t\} = -\sup\{E_P(\varphi(X)) : E_Q(X) = -t\}. \]

However, the function Φ(X) = E_P(φ(X)) for all X ∈ L∞ is concave, continuous, and monotone. Then, it follows immediately from lemma 19 of Cerreia-Vioglio et al. (2008b) and corollary 2A of Rockafellar (1971) that

\[ \sup\{E_P(\varphi(X)) : E_Q(X) = -t\} = \min_{x \ge 0} \left[ x(-t) - \Phi^{*}(xQ) \right] = \min_{x \ge 0} \left[ x(-t) - E_P\left( \varphi^{*}\left( x \frac{dQ}{dP} \right) \right) \right]. \]

Thus,

\[ R(t, Q) = -\varphi^{-1}\left( \min_{x \ge 0} \left[ x(-t) - E_P\left( \varphi^{*}\left( x \frac{dQ}{dP} \right) \right) \right] \right) = \ell^{-1}\left( \max_{x \ge 0} \left[ xt - E_P\left( \ell^{*}\left( x \frac{dQ}{dP} \right) \right) \right] \right), \]

as wanted.
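As a sanity check of the formula in Proposition 5.3, one can take the exponential loss ℓ(u) = e^u, whose convex conjugate is ℓ*(y) = y log y − y. A short calculation (ours, not taken from the paper) then gives R(t, Q) = t − H(Q|P), with H the relative entropy, which is the familiar penalty of the entropic risk measure. The sketch below assumes a three-point probability space with made-up weights; the ternary search and all function names are illustrative choices only.

```python
import math

def mean_value_risk(x, p, loss, loss_inv):
    """rho(X) = loss_inv(E_P[loss(-X)]) on a finite probability space."""
    return loss_inv(sum(pi * loss(-xi) for xi, pi in zip(x, p)))

def exp_conjugate(y):
    """Convex conjugate of exp: sup_u (u*y - e^u) = y*log(y) - y for y > 0, and 0 at y = 0."""
    return 0.0 if y == 0.0 else y * math.log(y) - y

def dual_value(t, z, p, conj, hi=50.0, iters=200):
    """max over x in [0, hi] of x*t - E_P[conj(x * dQ/dP)], by ternary search.
    The objective is concave in x; hi is an assumed upper bracket for the maximizer."""
    def g(x):
        return x * t - sum(pi * conj(x * zi) for zi, pi in zip(z, p))
    lo = 0.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g((lo + hi) / 2)

def relative_entropy(z, p):
    """H(Q|P) = E_P[(dQ/dP) log(dQ/dP)] for a density z = dQ/dP."""
    return sum(pi * zi * math.log(zi) for zi, pi in zip(z, p) if zi > 0)

p = [0.2, 0.3, 0.5]      # hypothetical reference probability P
z = [1.5, 0.5, 1.1]      # hypothetical density dQ/dP (E_P[z] = 1)
t = 0.7

R_dual = math.log(dual_value(t, z, p, exp_conjugate))   # formula of Proposition 5.3
R_closed = t - relative_entropy(z, p)                    # closed form for the exponential loss

# The representation rho(X) = max_Q R(E_Q(-X), Q) is attained at the Gibbs density.
x = [1.0, -0.5, 0.2]
rho_x = mean_value_risk(x, p, math.exp, math.log)
norm = sum(pi * math.exp(-xi) for xi, pi in zip(x, p))
z_star = [math.exp(-xi) / norm for xi in x]
value_at_gibbs = sum(pi * zi * (-xi) for xi, zi, pi in zip(x, z_star, p)) - relative_entropy(z_star, p)
value_at_z = sum(pi * zi * (-xi) for xi, zi, pi in zip(x, z, p)) - relative_entropy(z, p)
```

Here `value_at_gibbs` should coincide with `rho_x`, while any other density, such as `z`, gives a value no larger than `rho_x`.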


5.2. Robust Mean Value Premium Principle

We conclude by introducing a new class of law-invariant quasiconvex risk measures. Consider a supervising agency whose decisions are taken by several supervisors 1, …, k, all of them using the mean value premium principle to rank risks. The cautious nature of the supervising task suggests that: “a reserve for position X is deemed acceptable if and only if all supervisors agree.”

Clearly, the minimum reserve that complies with this criterion is

\[ \rho(X) = \min\left\{ m \in \mathbb{R} : m \ge \ell_i^{-1}\big(E_P(\ell_i(-X))\big) \;\; \forall i = 1, \dots, k \right\} = \max_{i = 1, \dots, k} \ell_i^{-1}\big(E_P(\ell_i(-X))\big). \]

In light of this observation, in this section we study the quasiconvex risk measures defined by

\[ \rho_L(X) = \max_{\ell \in L} \ell^{-1}\big(E_P(\ell(-X))\big) \qquad \forall X \in L^\infty \]

where L is a compact convex (or finite) set of strictly increasing and convex loss functions.26

PROPOSITION 5.4. Let L be a nonempty compact and convex set of strictly increasing and convex functions. Then ρ_L : L∞ → ℝ has the following properties: (a) it is a well-defined, continuous with respect to bounded pointwise convergence, and quasiconvex risk measure that satisfies constancy and preserves second order stochastic dominance; (b) for each X ∈ L∞, ρ_L(X) = max_{ℓ∈ext(L)} ℓ^{−1}(E_P(ℓ(−X))), where ext(L) is the set of extreme points of L; (c) R_L(t, Q) = max_{ℓ∈L} ℓ^{−1}(max_{x≥0} [xt − E_P(ℓ*(x dQ/dP))]) for all (t, Q) ∈ ℝ × M_1.

Proof. Let C be a closed and convex set in L∞. Consider the function γ : C × L → ℝ defined by

\[ \gamma(X, \ell) = \ell^{-1}\big(E_P(\ell(-X))\big) \qquad \forall (X, \ell) \in C \times L. \]

When C = L∞, then ρ_L(X) = max_{ℓ∈L} γ(X, ℓ) for all X ∈ L∞.

Step 1. For each X ∈ C the function γ(X, ·) : L → ℝ is continuous and quasiaffine.27

Proof of Step 1. In order to prove the statement, it is enough to prove that, for each c ∈ ℝ, the sets {ℓ ∈ L : γ(X, ℓ) ≥ c} and {ℓ ∈ L : γ(X, ℓ) ≤ c} are closed and convex. Notice that, if ℓ ∈ L,

\[ \gamma(X, \ell) \ge c \iff \ell^{-1}\big(E_P(\ell(-X))\big) \ge c \iff E_P(\ell(-X)) \ge \ell(c). \tag{5.20} \]

26 Here, C(ℝ) is endowed with the compact convergence topology, defined by the family of seminorms ϕ_K(f) = max_{x∈K} |f(x)| where K ranges over all compact subsets of ℝ. This locally convex topology is metrizable by \( d(f, g) = \sum_{n=1}^{\infty} \frac{\max_{x \in [-n, n]} |f(x) - g(x)|}{2^n \left(1 + \max_{x \in [-n, n]} |f(x) - g(x)|\right)} \). Subsets of C(ℝ) are endowed with the inherited topology.
27 That is, both quasiconvex and quasiconcave.


Assume that ℓ_n ∈ {γ(X, ·) ≥ c} for all n ∈ ℕ and ℓ_n → ℓ. Then ℓ_n(c) → ℓ(c) and E_P(ℓ_n(−X)) → E_P(ℓ(−X)).28 This, matched with (5.20), implies that E_P(ℓ(−X)) ≥ ℓ(c) and ℓ ∈ {γ(X, ·) ≥ c}. Thus {γ(X, ·) ≥ c} is closed. Next, consider ℓ_1, ℓ_2 ∈ {γ(X, ·) ≥ c} and λ ∈ (0, 1). By (5.20), E_P(ℓ_1(−X)) ≥ ℓ_1(c) and E_P(ℓ_2(−X)) ≥ ℓ_2(c), then

\[ E_P\big((\lambda \ell_1 + (1 - \lambda)\ell_2)(-X)\big) = \lambda E_P(\ell_1(-X)) + (1 - \lambda) E_P(\ell_2(-X)) \ge \lambda \ell_1(c) + (1 - \lambda)\ell_2(c) = (\lambda \ell_1 + (1 - \lambda)\ell_2)(c), \]

that is, λℓ_1 + (1 − λ)ℓ_2 ∈ {γ(X, ·) ≥ c}. Thus, {γ(X, ·) ≥ c} is convex. The same arguments yield closure and convexity of {γ(X, ·) ≤ c}.

Step 2. ρ_L is a well-defined risk measure that satisfies constancy and that preserves second order stochastic dominance.

Proof of Step 2. Let C = L∞, and arbitrarily choose X ∈ L∞. Since L is compact and γ(X, ·) is continuous, then ρ_L(X) = max_{ℓ∈L} γ(X, ℓ) is well defined. If m ∈ ℝ then ℓ^{−1}(E_P(ℓ(−m))) = −m for all ℓ ∈ L and hence ρ_L(m) = −m. Finally, if X ≽ssd Y, then −X ≼icx −Y, and ℓ^{−1}(E_P(ℓ(−X))) ≤ ℓ^{−1}(E_P(ℓ(−Y))) for all ℓ ∈ L so that ρ_L(X) ≤ ρ_L(Y). This implies that ρ_L preserves second order stochastic dominance (and a fortiori ρ_L is decreasing and hence a risk measure).

Step 3. For each ℓ ∈ L the function γ(·, ℓ) : C → ℝ is continuous and quasiconvex.

The proof is trivial. Moreover, taking C = L∞, since suprema of lower semicontinuous and quasiconvex functions are lower semicontinuous and quasiconvex, Step 3 implies the following:

Step 4. ρ_L is lower semicontinuous and quasiconvex.

Step 5. ρ_L is continuous with respect to bounded pointwise convergence.

Proof of Step 5. By Proposition 4.5, Step 2, and Step 4, it is enough to prove that ρ_L is continuous from below. If X_n ↗ X, the Dominated Convergence Theorem guarantees that γ(X_n, ℓ) ↘ γ(X, ℓ) for all ℓ ∈ L. This is equivalent to saying that γ(X_n, ·) ↘ γ(X, ·) pointwise on L. By Step 1 and Dini’s Theorem (Aliprantis and Border 2006, theorem 2.66), γ(X_n, ·) → γ(X, ·) uniformly on L and, in particular, max_{ℓ∈L} γ(X_n, ℓ) → max_{ℓ∈L} γ(X, ℓ), that is ρ_L(X_n) → ρ_L(X).

Step 6. For each X ∈ L∞, ρ_L(X) = max_{ℓ∈ext(L)} ℓ^{−1}(E_P(ℓ(−X))), where ext(L) is the set of extreme points of L.

Proof of Step 6. The statement follows from the Bauer Maximum Principle (Aliprantis and Border 2006, corollary 7.75), after observing that γ(X, ·) is explicitly quasiconvex. In fact, γ(X, ℓ_1) < γ(X, ℓ_2) implies γ(X, λℓ_1 + (1 − λ)ℓ_2) < γ(X, ℓ_2) for all ℓ_1, ℓ_2 ∈ L and λ ∈ (0, 1). Specifically, ℓ_1^{−1}(E_P(ℓ_1(−X))) < ℓ_2^{−1}(E_P(ℓ_2(−X))) implies E_P(ℓ_1(−X)) < ℓ_1(ℓ_2^{−1}(E_P(ℓ_2(−X))))

28 Take a version of X such that −X(Ω) ⊆ [a, b]; since ℓ_n → ℓ uniformly on [a, b], then 0 ≤ ∫ |ℓ_n(−X) − ℓ(−X)| dP ≤ sup_{ω∈Ω} |ℓ_n(−X(ω)) − ℓ(−X(ω))| ≤ sup_{x∈[a,b]} |ℓ_n(x) − ℓ(x)| → 0.


thus

\[ E_P\big((\lambda \ell_1 + (1 - \lambda)\ell_2)(-X)\big) = \lambda E_P(\ell_1(-X)) + (1 - \lambda) E_P(\ell_2(-X)) < \lambda \ell_1\big(\ell_2^{-1}(E_P(\ell_2(-X)))\big) + (1 - \lambda)\ell_2\big(\ell_2^{-1}(E_P(\ell_2(-X)))\big) = (\lambda \ell_1 + (1 - \lambda)\ell_2)\big(\ell_2^{-1}(E_P(\ell_2(-X)))\big), \]

hence (λℓ_1 + (1 − λ)ℓ_2)^{−1}(E_P((λℓ_1 + (1 − λ)ℓ_2)(−X))) < ℓ_2^{−1}(E_P(ℓ_2(−X))), as wanted.

Step 7. R_L(t, Q) = max_{ℓ∈L} ℓ^{−1}(max_{x≥0} [xt − E_P(ℓ*(x dQ/dP))]) for all (t, Q) ∈ ℝ × M_1.

Proof of Step 7. By Step 2, Step 4, Step 5, and Lemma 3.2, it follows that

\[ R_L(t, Q) = \inf\{\rho_L(Y) : E_Q(-Y) = t\} \qquad \forall (t, Q) \in \mathbb{R} \times M_1. \]

By Step 1, Step 3, the Sion Minimax Theorem (Sion 1958, corollary 3.3), and Proposition 5.3, it follows that

\[ R_L(t, Q) = \inf_{X \in \{Y \in L^\infty : E_Q(-Y) = t\}} \max_{\ell \in L} \gamma(X, \ell) = \max_{\ell \in L} \inf_{X \in \{Y \in L^\infty : E_Q(-Y) = t\}} \gamma(X, \ell) = \max_{\ell \in L} \inf_{X \in \{Y \in L^\infty : E_Q(-Y) = t\}} \ell^{-1}\big(E_P(\ell(-X))\big) = \max_{\ell \in L} \ell^{-1}\left( \max_{x \ge 0} \left[ xt - E_P\left( \ell^{*}\left( x \frac{dQ}{dP} \right) \right) \right] \right). \]

Finally, (a) follows from Steps 2, 4, and 5, (b) is Step 6, (c) is Step 7.

In particular, observe that the case where L is finite is encompassed by point (b).
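For a finite family of supervisors, the robust mean value premium principle is straightforward to evaluate on a finite probability space. The sketch below is a toy illustration under assumed data: the three loss functions (two exponentials and a kinked piecewise-linear one), the four-state probability vector, and the random positions are all hypothetical. It checks constancy, monotonicity, and quasiconvexity on sampled positions, which Proposition 5.4(a) guarantees in general.

```python
import math
import random

def make_exponential(a):
    """Loss u -> exp(a*u) with its inverse v -> log(v)/a (a > 0)."""
    return (lambda u: math.exp(a * u), lambda v: math.log(v) / a)

def kinked(u):
    """Piecewise-linear loss: slope 1 for gains (u < 0), slope 2 for losses (u >= 0)."""
    return u if u < 0 else 2.0 * u

def kinked_inv(v):
    """Inverse of the kinked loss."""
    return v if v < 0 else v / 2.0

# Hypothetical finite family of strictly increasing convex losses (one per supervisor).
FAMILY = [make_exponential(0.5), make_exponential(2.0), (kinked, kinked_inv)]

def robust_risk(x, p, family=FAMILY):
    """rho_L(X) = max over the family of loss_inv(E_P[loss(-X)])."""
    return max(inv(sum(pi * loss(-xi) for xi, pi in zip(x, p)))
               for loss, inv in family)

p = [0.1, 0.2, 0.3, 0.4]
rng = random.Random(1)

# Constancy: rho_L(m) = -m for a constant position m.
constancy_gap = abs(robust_risk([0.3] * 4, p) + 0.3)

# Monotonicity and quasiconvexity on random positions.
monotone_ok = True
quasiconvex_ok = True
for _ in range(200):
    x = [rng.uniform(-1, 1) for _ in p]
    y = [rng.uniform(-1, 1) for _ in p]
    lam = rng.random()
    bigger = [xi + rng.uniform(0, 0.5) for xi in x]          # bigger >= x statewise
    mix = [lam * xi + (1 - lam) * yi for xi, yi in zip(x, y)]
    if robust_risk(bigger, p) > robust_risk(x, p) + 1e-12:
        monotone_ok = False
    if robust_risk(mix, p) > max(robust_risk(x, p), robust_risk(y, p)) + 1e-12:
        quasiconvex_ok = False

# For a small symmetric bet the kinked supervisor is the binding one.
small_bet = robust_risk([0.1, -0.1], [0.5, 0.5])
```

In this example the binding supervisor depends on the position: for the small symmetric bet the kinked loss gives the largest reserve, while for larger positions the steeper exponential loss takes over.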

6. FINAL REMARKS

Though for mathematical convenience we considered quasiconvex risk measures defined on L∞(Ω, A, P), a parallel analysis can be carried out in any function space with unit,29 like, for example, the space B(Ω, A) of bounded and measurable functions and the space C_b(Ω) of bounded and continuous functions (provided Ω is a topological space). A more delicate case is the one of quasiconvex risk measures defined on Lp(Ω, A, P) for p ∈ [1, ∞); here the results of Cerreia-Vioglio et al. (2008b) no longer apply directly. In view of the importance of these spaces in mathematical finance, in the Appendix we show that a more general version of the key Lemma 3.2 can be proved for these spaces. This is another novel technical contribution of this paper.

29 That is, in any Riesz space of functions with order unit and endowed with the supnorm. This is the general setup of Cerreia-Vioglio et al. (2008b).


APPENDIX: UNBOUNDED RANDOM VARIABLES

In this appendix, we generalize Lemma 3.2 to any Lp space with p ≥ 1. The generalization goes in two directions: unbounded random variables are allowed and the requirement of upper semicontinuity is relaxed. We adhere to the notation and terminology previously used. For p ∈ [1, ∞), the topological dual Lp(Ω, A, P)* of Lp(Ω, A, P) is isometrically isomorphic to Lq(Ω, A, P) with q^{−1} + p^{−1} = 1, which in turn can be identified with the subspace of all countably additive set functions on A that are absolutely continuous with respect to P and whose Radon-Nikodym derivative is q-integrable. The subset of countably additive probabilities with q-integrable density is denoted by

\[ M_{1,q} = \left\{ Q \in M_1 : \frac{dQ}{dP} \in L^q \right\} \qquad \forall p \in [1, \infty) \]

while M_{1,q} = M_{1,f} if p = ∞. Like in the p = ∞ case, given X ∈ Lp(Ω, A, P) and μ ∈ Lp(Ω, A, P)*, we indifferently write: μ(X), ∫ X dμ (= ∫ X (dμ/dP) dP), or even E_μ(X) if μ can be identified with an element of M_{1,q}(Ω, A, P). A subset C of Lp is evenly convex if and only if for each X̄ ∉ C there exists a linear and continuous functional μ on Lp such that

\[ \mu(\bar{X}) < \mu(X) \qquad \forall X \in C. \]

Clearly, evenly convex sets are convex and the intersections of evenly convex sets are evenly convex. Moreover, by standard separation results, it follows that if a set is open (or closed) and convex, then it is evenly convex. A risk measure now is a decreasing function ρ : Lp → [−∞, ∞] and we consider the following quasiconvexity property:

Even Quasiconvexity. The set {X ∈ Lp : ρ(X) ≤ α} is evenly convex for all α ∈ ℝ.

Clearly, evenly quasiconvex risk measures are quasiconvex and it is easy to show that quasiconvex upper semicontinuous risk measures (the ones considered in Lemma 3.2) are evenly quasiconvex. In order to provide a generalization of Lemma 3.2, we have to introduce a new class of functions: ℛ(ℝ × M_{1,q}) for all q ∈ [1, ∞]. Set ℝ^⋄ = ℝ \ {0}. A subset C of ℝ × M_{1,q} is ⋄-evenly convex if and only if for each (t̄, Q̄) ∈ (ℝ × M_{1,q}) \ C there exists (s, X) ∈ ℝ^⋄ × Lp such that

\[ \bar{t}s + E_{\bar{Q}}(X) < ts + E_Q(X) \qquad \forall (t, Q) \in C. \]

Analogously, a function R : ℝ × M_{1,q} → [−∞, ∞] is ⋄-evenly quasiconcave if and only if the set {(t, Q) ∈ ℝ × M_{1,q} : R(t, Q) ≥ α} is ⋄-evenly convex for all α ∈ ℝ. Finally, ℛ(ℝ × M_{1,q}) denotes the class of functions R : ℝ × M_{1,q} → [−∞, ∞] that are ⋄-evenly quasiconcave, increasing in the first component, and such that inf_{t∈ℝ} R(t, Q) = inf_{t∈ℝ} R(t, Q′) for all Q, Q′ ∈ M_{1,q}.30

30 For p = ∞, ℛ(ℝ × M_{1,f}) ⊇ ℛ_0(ℝ × M_{1,f}) ⊇ ℛ_1(ℝ × M_{1,f}).


We are now ready to state the anticipated generalization of Lemma 3.2. Let us remark that while the case $p = \infty$ follows from the results of Cerreia-Vioglio et al. (2008b), this is not the case if $p \in [1, \infty)$.

THEOREM A.1. A function $\rho : L^p \to [-\infty, \infty]$ is an evenly quasiconvex risk measure if and only if there exists $R \in \mathcal{R}^{\diamond}(\mathbb{R} \times \mathcal{M}_{1,q})$ such that

\[ \rho(X) = \sup_{Q \in \mathcal{M}_{1,q}} R(E_Q(-X), Q) \qquad \forall X \in L^p. \tag{A.1} \]

The function $R \in \mathcal{R}^{\diamond}(\mathbb{R} \times \mathcal{M}_{1,q})$ for which (A.1) holds is unique and satisfies

\[ R(t, Q) = \inf \{ \rho(X) : E_Q(-X) = t \} \qquad \forall (t, Q) \in \mathbb{R} \times \mathcal{M}_{1,q}. \tag{A.2} \]
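As a purely illustrative check of the representation (A.1)-(A.2), the sketch below works on a two-point sample space with the worst-case risk measure $\rho(X) = \max_\omega(-X(\omega))$. This example, the function names, and the grid over probabilities are assumptions made here for illustration and do not come from the paper. For this $\rho$, (A.2) gives $R(t, Q) = t$ for every $Q$: any $X$ with $E_Q(-X) = t$ satisfies $\max(-X) \geq t$, and the constant $X = -t$ attains the bound.

```python
def worst_case_risk(x):
    """rho(X) = max over states of -X: decreasing and quasiconvex (in fact convex)."""
    return max(-v for v in x)


def R(t, q):
    """Closed form of (A.2) for the worst-case risk measure (illustrative assumption):
    inf{rho(X) : E_Q(-X) = t} = t, attained by the constant X = -t."""
    return t


def dual_value(x, grid=1000):
    """Right-hand side of (A.1) on a two-point space.

    The supremum over probabilities Q = (q, 1 - q) is approximated on a grid
    that contains both endpoints q = 0 and q = 1.
    """
    best = float("-inf")
    for k in range(grid + 1):
        q = k / grid
        expected_loss = q * (-x[0]) + (1 - q) * (-x[1])  # E_Q(-X)
        best = max(best, R(expected_loss, (q, 1 - q)))
    return best
```

Comparing `dual_value(x)` with `worst_case_risk(x)` for a few payoff pairs `x` should show the two sides of (A.1) agreeing, since the supremum of a linear function over the simplex is attained at a vertex.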

Proof. If $p = \infty$, notice that $L^\infty$ is a normed Riesz space with order unit $I_\Omega$, $\mathcal{M}_{1,f}$ is the positive unit ball of its topological dual, and $-\rho$ is an evenly quasiconcave and monotone increasing function. The statement then follows from lemma 8 and theorem 2 of Cerreia-Vioglio et al. (2008b). Else, if $p \in [1, \infty)$, then $L^p$ is a normed Riesz space, but it does not admit an order unit and $\mathcal{M}_{1,q}$ is not the positive unit ball of its topological dual. Therefore, we cannot invoke the previous results. Next we provide a direct proof.

"Only if." Suppose $\rho$ is evenly quasiconvex and define $R$ by (A.2). By the argument we used in footnote 21,

\[ R(t, Q) = \inf \{ \rho(Y) : E_Q(-Y) \geq t \} \qquad \forall (t, Q) \in \mathbb{R} \times \mathcal{M}_{1,q}. \]

In particular, for each $X \in L^p$

\[ \rho(X) \geq R(E_Q(-X), Q) \qquad \forall Q \in \mathcal{M}_{1,q}, \tag{A.3} \]

thus

\[ \rho(X) \geq \sup_{Q \in \mathcal{M}_{1,q}} R(E_Q(-X), Q) \qquad \forall X \in L^p. \tag{A.4} \]

Fix $\bar{X} \in L^p$. If $\bar{X}$ is not a global minimum, then there is a sequence $\{r_n\}_{n \in \mathbb{N}} \subseteq \mathbb{R}$ that tends to $\rho(\bar{X})$ and such that $\bar{X} \notin \{\rho \leq r_n\} \neq \emptyset$ for all $n \in \mathbb{N}$. Let $k \in \mathbb{N}$. Since $\{\rho \leq r_k\}$ is evenly convex, there exists $\mu_k$ in the topological dual of $L^p$ such that $\mu_k(\bar{X}) < \mu_k(X)$ for all $X \in \{\rho \leq r_k\}$. Without loss of generality, $\mu_k$ can be identified with an element $Q_k$ of $\mathcal{M}_{1,q}$,³¹ that is, $\{\rho \leq r_k\} \subseteq \{X \in L^p : E_{Q_k}(-X) < E_{Q_k}(-\bar{X})\}$. It follows that $\{\rho > r_k\} \supseteq \{X \in L^p : E_{Q_k}(-X) \geq E_{Q_k}(-\bar{X})\}$. Thus, $R(E_{Q_k}(-\bar{X}), Q_k) \geq r_k$ and

\[ \rho(\bar{X}) \geq \sup_{Q \in \mathcal{M}_{1,q}} R(E_Q(-\bar{X}), Q) \geq R(E_{Q_k}(-\bar{X}), Q_k) \geq r_k \]

for all $k \in \mathbb{N}$. Passing to the limits, this implies equality in (A.4), which is obvious if $\bar{X}$ is a global minimum.

³¹ First, observe that $\mu_k$ is a positive linear functional. Let $X \in \{\rho \leq r_k\}$; for each $Y \in L^p_+$ and all $n \in \mathbb{N}$, $\rho(X + nY) \leq \rho(X)$ and $X + nY \in \{\rho \leq r_k\}$. Therefore, $\mu_k(\bar{X}) < \mu_k(X + nY) = \mu_k(X) + n \mu_k(Y)$ for all $n \in \mathbb{N}$. This implies $\mu_k(Y) \geq 0$. Moreover, denoting by $d\mu_k/dP$ the $L^q$ representative of $\mu_k$, $\mu_k \neq 0$ implies $\mu_k(I_\Omega) = \int (d\mu_k/dP) \, dP = \| d\mu_k/dP \|_1 > 0$. Finally, set $Q_k = \mu_k / \mu_k(I_\Omega)$.

It only remains to show that $R \in \mathcal{R}^{\diamond}(\mathbb{R} \times \mathcal{M}_{1,q})$. Let $Q \in \mathcal{M}_{1,q}$. If $t \geq t'$ then $\{X \in L^p : E_Q(-X) \geq t\} \subseteq \{X \in L^p : E_Q(-X) \geq t'\}$. This implies that $R(t, Q) \geq R(t', Q)$. Next, observe that $R(t, Q) \geq \inf_{X \in L^p} \rho(X)$ for all $(t, Q) \in \mathbb{R} \times \mathcal{M}_{1,q}$. Hence, $\inf_{t \in \mathbb{R}} R(t, Q) \geq \inf_{X \in L^p} \rho(X)$ for all $Q \in \mathcal{M}_{1,q}$. Conversely, consider $\{X_n\}_{n \in \mathbb{N}} \subseteq L^p$ such that $\rho(X_n) \downarrow \inf_{X \in L^p} \rho(X)$. For all $Q \in \mathcal{M}_{1,q}$, $\rho(X_n) \geq R(E_Q(-X_n), Q) \geq \inf_{t \in \mathbb{R}} R(t, Q)$ for all $n \in \mathbb{N}$. This implies that $\inf_{t \in \mathbb{R}} R(t, Q) \leq \inf_{X \in L^p} \rho(X)$ for all $Q \in \mathcal{M}_{1,q}$. It follows that

\[ \inf_{t \in \mathbb{R}} R(t, Q) = \inf_{X \in L^p} \rho(X) = \inf_{t \in \mathbb{R}} R(t, Q') \qquad \forall Q, Q' \in \mathcal{M}_{1,q}. \]

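Finally, we show that $(t, Q) \mapsto R(t, Q)$ is $\diamond$-evenly quasiconcave. Fix $\alpha \in \mathbb{R}$ and define $U_\alpha = \{(t, Q) \in \mathbb{R} \times \mathcal{M}_{1,q} : R(t, Q) \geq \alpha\}$. Suppose that $U_\alpha$ is neither empty nor $\mathbb{R} \times \mathcal{M}_{1,q}$. Pick $(\bar{t}, \bar{Q}) \in \mathbb{R} \times \mathcal{M}_{1,q}$ such that $(\bar{t}, \bar{Q}) \notin U_\alpha$. Then, it follows that $R(\bar{t}, \bar{Q}) < \alpha$. This implies that there exists $\bar{X} \in L^p$ such that $E_{\bar{Q}}(-\bar{X}) \geq \bar{t}$ and $\rho(\bar{X}) < \alpha$. But $R(t, Q) \geq \alpha$ for all $(t, Q) \in U_\alpha$, which implies that $E_Q(-\bar{X}) < t$ for all $(t, Q) \in U_\alpha$.³² This, in turn, implies that

\[ \bar{t} + E_{\bar{Q}}(\bar{X}) \leq 0 < t + E_Q(\bar{X}) \qquad \forall (t, Q) \in U_\alpha, \]

as wanted.

³² $E_{Q'}(-\bar{X}) \geq t'$ for some $(t', Q') \in U_\alpha$ would imply $R(t', Q') \leq \rho(\bar{X}) < \alpha$, a contradiction.

"If." Suppose (A.1) holds. We first prove that $\rho$ is evenly quasiconvex. Pick $\alpha \in \mathbb{R}$. We prove that $\{\rho \leq \alpha\}$ is evenly convex. If $\{\rho \leq \alpha\} = L^p$ or $\{\rho \leq \alpha\} = \emptyset$ then there is nothing to prove. Otherwise, let $\bar{X} \notin \{\rho \leq \alpha\}$. By (A.1), there exists $\bar{Q} \in \mathcal{M}_{1,q}$ for which $R(E_{\bar{Q}}(-\bar{X}), \bar{Q}) > \alpha$. Let $X \in \{\rho \leq \alpha\}$. Suppose, by contradiction, that $E_{\bar{Q}}(X) \leq E_{\bar{Q}}(\bar{X})$. Then, since $R$ is increasing in the first component, $\rho(X) \geq R(E_{\bar{Q}}(-X), \bar{Q}) \geq R(E_{\bar{Q}}(-\bar{X}), \bar{Q}) > \alpha$, a contradiction. In other words, there exists $\bar{Q} \in \mathcal{M}_{1,q}$ such that $E_{\bar{Q}}(\bar{X}) < E_{\bar{Q}}(X)$ for all $X \in \{\rho \leq \alpha\}$. Next, suppose that $X, Y \in L^p$ and $X \geq Y$. Then, $E_Q(X) \geq E_Q(Y)$ for all $Q \in \mathcal{M}_{1,q}$ and $R(E_Q(-X), Q) \leq R(E_Q(-Y), Q)$ for all $Q \in \mathcal{M}_{1,q}$. By (A.1), $\rho(X) \leq \rho(Y)$, proving that $\rho$ is a risk measure.

Finally, assume that $\rho$ admits representation (A.1) for some $R \in \mathcal{R}^{\diamond}(\mathbb{R} \times \mathcal{M}_{1,q})$. In order to prove uniqueness it is sufficient to show that $R$ satisfies (A.2).

Claim. For each $(\bar{t}, \bar{Q}) \in \mathbb{R} \times \mathcal{M}_{1,q}$,

\[ R(\bar{t}, \bar{Q}) = \sup_{Q \in \mathcal{M}_{1,q}} \; \inf_{X \in \{Y \in L^p : E_{\bar{Q}}(-Y) \geq \bar{t}\}} R(E_Q(-X), Q). \]

Proof of the Claim. Consider the program

\[ \pi(Q, \bar{Q}, \bar{t}) = \inf_{X \in \{Y \in L^p : E_{\bar{Q}}(-Y) \geq \bar{t}\}} R(E_Q(-X), Q) \tag{A.5} \]

with $Q \in \mathcal{M}_{1,q}$. It is sufficient to show that $\pi(Q, \bar{Q}, \bar{t}) \leq \pi(\bar{Q}, \bar{Q}, \bar{t}) = R(\bar{t}, \bar{Q})$ for all $Q \in \mathcal{M}_{1,q}$. For the second equality just notice that, since $R$ is increasing in the first component,

\[ \pi(\bar{Q}, \bar{Q}, \bar{t}) = \inf_{X \in \{Y \in L^p : E_{\bar{Q}}(-Y) \geq \bar{t}\}} R(E_{\bar{Q}}(-X), \bar{Q}) \geq R(\bar{t}, \bar{Q}). \]

Furthermore, since $\bar{Q} \in \mathcal{M}_{1,q}$, there exists $\bar{Y} \in L^p$ such that $E_{\bar{Q}}(-\bar{Y}) = \bar{t}$, implying the inverse inequality. Next, fix $Q \in \mathcal{M}_{1,q}$. We have two cases:

• Suppose $Q \in \operatorname{span}\{\bar{Q}\}$. Then, $Q = \alpha \bar{Q}$ for some $\alpha \in \mathbb{R}$. Since $Q, \bar{Q} \in \mathcal{M}_{1,q}$, we have that $\alpha = 1$. It follows that $Q = \bar{Q}$ and $\pi(Q, \bar{Q}, \bar{t}) = \pi(\bar{Q}, \bar{Q}, \bar{t}) = R(\bar{t}, \bar{Q})$.

• Suppose $Q \notin \operatorname{span}\{\bar{Q}\}$. By the Fundamental Theorem of Duality (see, e.g., Aliprantis and Border 2006), $\ker(\bar{Q}) \not\subseteq \ker(Q)$. That is, there exists $Z \in L^p$ such that $E_{\bar{Q}}(Z) = 0$ and $E_Q(Z) \neq 0$. Choose $\bar{Y} \in L^p$ such that $E_{\bar{Q}}(\bar{Y}) = -\bar{t}$; then the straight line $\bar{Y} + \alpha Z$ is included in the hyperplane $\{Y \in L^p : E_{\bar{Q}}(-Y) = \bar{t}\}$. Hence, since $R$ belongs to $\mathcal{R}^{\diamond}(\mathbb{R} \times \mathcal{M}_{1,q})$,

\[ \pi(Q, \bar{Q}, \bar{t}) \leq \inf_{\alpha \in \mathbb{R}} R(E_Q(-\bar{Y} - \alpha Z), Q) = \inf_{t \in \mathbb{R}} R(t, Q) = \inf_{t \in \mathbb{R}} R(t, \bar{Q}) \leq R(\bar{t}, \bar{Q}). \]

In sum, $\pi(Q, \bar{Q}, \bar{t}) \leq R(\bar{t}, \bar{Q})$ for all $Q \in \mathcal{M}_{1,q}$ and $\pi(\bar{Q}, \bar{Q}, \bar{t}) = R(\bar{t}, \bar{Q})$.

Let $(\bar{t}, \bar{Q}) \in \mathbb{R} \times \mathcal{M}_{1,q}$. Observe that

\[ \inf \{ \rho(X) : E_{\bar{Q}}(-X) = \bar{t} \} = \inf \{ \rho(X) : E_{\bar{Q}}(-X) \geq \bar{t} \} = \inf_{X \in \{Y \in L^p : E_{\bar{Q}}(-Y) \geq \bar{t}\}} \; \sup_{Q \in \mathcal{M}_{1,q}} R(E_Q(-X), Q). \]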

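By the Claim, $R(\bar{t}, \bar{Q}) = \sup_{Q \in \mathcal{M}_{1,q}} \left( \inf_{X \in \{Y \in L^p : E_{\bar{Q}}(-Y) \geq \bar{t}\}} R(E_Q(-X), Q) \right)$. The general maxmin inequality implies

\[ R(\bar{t}, \bar{Q}) = \sup_{Q \in \mathcal{M}_{1,q}} \; \inf_{X \in \{Y \in L^p : E_{\bar{Q}}(-Y) \geq \bar{t}\}} R(E_Q(-X), Q) \leq \inf_{X \in \{Y \in L^p : E_{\bar{Q}}(-Y) \geq \bar{t}\}} \; \sup_{Q \in \mathcal{M}_{1,q}} R(E_Q(-X), Q), \tag{A.6} \]

so it remains to prove the converse inequality. If $R(\bar{t}, \bar{Q}) = \sup_{(t, Q) \in \mathbb{R} \times \mathcal{M}_{1,q}} R(t, Q)$, the equality in (A.6) is easily checked. Otherwise, set $\alpha = R(\bar{t}, \bar{Q})$. We have $\alpha < \infty$. Moreover, for each scalar $\beta > \alpha$, $U_\beta = \{(t, Q) \in \mathbb{R} \times \mathcal{M}_{1,q} : R(t, Q) \geq \beta\}$ is $\diamond$-evenly convex and $(\bar{t}, \bar{Q}) \notin U_\beta$. If $\beta$ is small enough then $U_\beta$ is neither empty nor $\mathbb{R} \times \mathcal{M}_{1,q}$.³³ Therefore, there are $\bar{X} \in L^p$ and $s \neq 0$ such that

\[ s \bar{t} + E_{\bar{Q}}(\bar{X}) < s t + E_Q(\bar{X}) \qquad \forall (t, Q) \in U_\beta. \tag{A.7} \]

Since $R$ is increasing in the first component, it is easy to see that $s > 0$.³⁴ Set $\lambda = -\bar{t} - E_{\bar{Q}}(s^{-1} \bar{X})$ and $\hat{X} = s^{-1} \bar{X} + \lambda$. It follows that $E_{\bar{Q}}(\hat{X}) = -\bar{t}$, and for each $(t, Q) \in U_\beta$

\[ \begin{aligned} s t + E_Q(\bar{X}) > s \bar{t} + E_{\bar{Q}}(\bar{X}) &\implies E_Q(s^{-1} \bar{X} + \lambda) + t > E_{\bar{Q}}(s^{-1} \bar{X} + \lambda) + \bar{t} \\ &\implies E_Q(\hat{X}) + t > E_{\bar{Q}}(\hat{X}) + \bar{t} \\ &\implies E_Q(\hat{X}) + t > 0. \end{aligned} \]

Therefore, it follows that if $0 \geq E_Q(\hat{X}) + t$ then $(t, Q) \notin U_\beta$. For each $Q \in \mathcal{M}_{1,q}$, set $t_Q = E_Q(-\hat{X})$; then $E_Q(\hat{X}) + t_Q = 0$. Therefore, $(t_Q, Q) = (E_Q(-\hat{X}), Q) \notin U_\beta$ for all $Q \in \mathcal{M}_{1,q}$. This implies $R(E_Q(-\hat{X}), Q) < \beta$ for all $Q \in \mathcal{M}_{1,q}$. Since $\hat{X} \in \{Y \in L^p : E_{\bar{Q}}(-Y) \geq \bar{t}\}$, we have that

\[ \alpha \leq \inf_{X \in \{Y \in L^p : E_{\bar{Q}}(-Y) \geq \bar{t}\}} \; \sup_{Q \in \mathcal{M}_{1,q}} R(E_Q(-X), Q) \leq \sup_{Q \in \mathcal{M}_{1,q}} R(E_Q(-\hat{X}), Q) \leq \beta. \]

This is true for each $\beta$ in a right neighborhood of $\alpha$, thus $\inf_{X \in \{Y \in L^p : E_{\bar{Q}}(-Y) \geq \bar{t}\}} \sup_{Q \in \mathcal{M}_{1,q}} R(E_Q(-X), Q) = \alpha$, as desired.

³³ Take $(t', Q')$ such that $R(t', Q') > R(\bar{t}, \bar{Q})$ and set $\gamma = R(t', Q')$. For all $\beta \in (\alpha, \gamma)$, $(t', Q') \in U_\beta$ and $(\bar{t}, \bar{Q}) \notin U_\beta$.

³⁴ If per contra $s < 0$, take $(t', Q') \in U_\beta$; then by monotonicity $(t' + n, Q') \in U_\beta$ for all $n \in \mathbb{N}$. Therefore, it would follow that $s t' + s n + E_{Q'}(\bar{X}) > s \bar{t} + E_{\bar{Q}}(\bar{X})$ for all $n \in \mathbb{N}$, which is absurd.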

REFERENCES

ACZEL, J., and J. DHOMBRES (1989): Functional Equations in Several Variables, Cambridge: Cambridge University Press.
ALIPRANTIS, C. D., and K. C. BORDER (2006): Infinite Dimensional Analysis, 3rd ed. Berlin: Springer Verlag.
ARTZNER, P., F. DELBAEN, J. EBER, and D. HEATH (1997): Thinking Coherently, RISK 10, 68–71.
ARTZNER, P., F. DELBAEN, J. EBER, and D. HEATH (1999): Coherent Measures of Risk, Math. Finance 9, 203–228.
BELLINI, F., and M. FRITTELLI (2002): On the Existence of Minimax Martingale Measures, Math. Finance 12, 1–21.
BLACK, F. (1972): Capital Market Equilibrium with Restricted Borrowing, J. Business 45, 444–455.
BILLINGSLEY, P. (1995): Probability and Measure, New York: John Wiley & Sons.
CABALLERO, R. J., and A. KRISHNAMURTHY (2008): Knightian Uncertainty and Its Implications for the TARP, Financial Times, November 28.
CARR, P., H. GEMAN, and D. B. MADAN (2001): Pricing and Hedging in Incomplete Markets, J. Finan. Econ. 62, 131–167.
CERREIA-VIOGLIO, S., F. MACCHERONI, M. MARINACCI, and L. MONTRUCCHIO (2008a): Uncertainty Averse Preferences, Carlo Alberto Notebook 77.
CERREIA-VIOGLIO, S., F. MACCHERONI, M. MARINACCI, and L. MONTRUCCHIO (2008b): Complete Monotone Quasiconcave Duality, Carlo Alberto Notebook 80.
CHONG, K. M. (1974): Some Extensions of a Theorem of Hardy, Littlewood and Pólya and their Applications, Can. J. Math. 26, 1321–1340.
CHONG, K. M., and N. M. RICE (1971): Equimeasurable Rearrangements of Functions, Queen’s Papers Pure Appl. Math. 28, 1–177.
DAL MASO, G. (1993): An Introduction to Γ-convergence, Boston: Birkhäuser.
DANA, R. A. (2005): A Representation Result for Concave Schur Concave Functions, Math. Finance 15, 613–634.
DEBREU, G. (1959): Theory of Value, New Haven: Yale University Press.


DE FINETTI, B. (1931): Sul Concetto di Media, Giornale dell’Istituto Italiano degli Attuari 2, 369–396.
EL KAROUI, N., and M. C. QUENEZ (1997): Non-Linear Pricing Theory and Backward Stochastic Differential Equations, in Lecture Notes in Mathematics, Vol. 1656, W. Runggaldier, ed. New York: Springer.
EL KAROUI, N., and C. RAVANELLI (2009): Cash Subadditive Risk Measures and Interest Rate Ambiguity, Math. Finance 19, 561–590.
ELLSBERG, D. (1961): Risk, Ambiguity, and the Savage Axioms, Quart. J. Econ. 75, 643–669.
FILIPOVIĆ, D., and M. KUPPER (2008): Equilibrium Prices for Monetary Utility Functions, Int. J. Theor. Appl. Finance 11, 325–343.
FÖLLMER, H., and A. SCHIED (2002): Convex Measures of Risk and Trading Constraints, Finance Stoch. 6, 429–447.
FÖLLMER, H., and A. SCHIED (2004): Stochastic Finance: An Introduction in Discrete Time, 2nd ed., Berlin: De Gruyter.
FÖLLMER, H., and A. SCHIED (2010): Convex Risk Measures, in Encyclopedia of Quantitative Finance, R. Cont, ed. New York: John Wiley & Sons, pp. 355–363.
FRITTELLI, M., and E. ROSAZZA GIANIN (2002): Putting Order in Risk Measures, J. Bank. Finance 26, 1473–1486.
FRITTELLI, M., and E. ROSAZZA GIANIN (2004): Dynamic Convex Risk Measures, in New Risk Measures for the 21st Century, G. Szegö, ed. New York: John Wiley & Sons.
FRITTELLI, M., and E. ROSAZZA GIANIN (2005): Law-Invariant Convex Risk Measures, Adv. Math. Econ. 7, 33–46.
GHIRARDATO, P. (2002): Revisiting Savage in a Conditional World, Econ. Theory 20, 83–92.
GILBOA, I., and D. SCHMEIDLER (1989): Maxmin Expected Utility with Non-Unique Prior, J. Math. Econ. 18, 141–153.
HARDY, G., J. E. LITTLEWOOD, and G. PÓLYA (1934): Inequalities, Cambridge: Cambridge University Press.
JOUINI, E., W. SCHACHERMAYER, and N. TOUZI (2006): Law Invariant Risk Measures Have the Fatou Property, Adv. Math. Econ. 9, 49–71.
JOUINI, E., W. SCHACHERMAYER, and N. TOUZI (2008): Optimal Risk Sharing for Law Invariant Monetary Utility Functions, Math. Finance 18, 269–292.
KUPPER, M., and W. SCHACHERMAYER (2008): Representation Results for Law Invariant Time Consistent Functions, preprint, Vienna Institute of Finance, Vienna.
KOLMOGOROV, A. N. (1930): Sur la Notion de la Moyenne, Atti della R. Accademia Nazionale dei Lincei 12, 388–391.
KUSUOKA, S. (2001): On Law Invariant Coherent Risk Measures, Adv. Math. Econ. 3, 83–95.
LEITNER, J. (2005): A Short Note on Second-Order Stochastic Dominance Preserving Coherent Risk Measures, Math. Finance 15, 649–651.
LUXEMBURG, W. A. J. (1967): Rearrangement-Invariant Banach Function Spaces, Queen’s Papers Pure Appl. Math. 10, 83–144.
MACCHERONI, F., M. MARINACCI, and A. RUSTICHINI (2006): Ambiguity Aversion, Robustness, and the Variational Representation of Preferences, Econometrica 74, 1447–1498.
MARINACCI, M., and L. MONTRUCCHIO (2004): Introduction to the Mathematics of Ambiguity, in Uncertainty in Economic Theory, I. Gilboa, ed. New York: Routledge.
MARSHALL, A. W., and I. OLKIN (1979): Inequalities: Theory of Majorization and its Applications, New York: Academic Press.


NAGUMO, M. (1930): Über eine Klasse der Mittelwerte, Jpn. J. Math. 7, 71–79.
ROCKAFELLAR, R. T. (1971): Integrals Which Are Convex Functionals II, Pacific J. Math. 39, 439–469.
ROTAR, V. I. (2007): Actuarial Models: The Mathematics of Insurance, Boca Raton: CRC Press.
ROTHSCHILD, M., and J. E. STIGLITZ (1970): Increasing Risk: I. A Definition, J. Econ. Theory 2, 225–243.
SAVAGE, L. J. (1954): The Foundations of Statistics, New York: John Wiley & Sons.
SCHMEIDLER, D. (1989): Subjective Probability and Expected Utility without Additivity, Econometrica 57, 571–587.
SION, M. (1958): On General Minimax Theorems, Pacific J. Math. 8, 171–176.
STAUM, J. (2004): Fundamental Theorems of Asset Pricing for Good Deal Bounds, Math. Finance 14, 141–161.
TUY, H. (1974): On a General Minimax Theorem, Soviet Math. Dokl. 15, 1689–1693.
YOSIDA, K. (1980): Functional Analysis, New York: Springer.

Mathematical Finance, Vol. 21, No. 4 (October 2011), 643–679

ROBUST ASSET ALLOCATION WITH BENCHMARKED OBJECTIVES

ANDREW E. B. LIM, University of California
J. GEORGE SHANTHIKUMAR, Purdue University
THAISIRI WATEWAI, Chulalongkorn University

In this paper, we introduce a new approach for finding robust portfolios when there is model uncertainty. It differs from the usual worst-case approach in that a (dynamic) portfolio is evaluated not only by its performance when there is an adversarial opponent (“nature”), but also by its performance relative to a stochastic benchmark. The benchmark corresponds to the wealth of a fictitious benchmark investor who invests optimally given knowledge of the model chosen by nature, so in this regard, our objective has the flavor of min–max regret. This relative performance approach has several important properties: (i) optimal portfolios seek to perform well over the entire range of models and not just the worst case, and hence are less pessimistic than those obtained from the usual worst-case approach; (ii) the dynamic problem reduces to a convex static optimization problem under reasonable choices of the benchmark portfolio for important classes of models including ambiguous jump-diffusions; and (iii) this static problem is dual to a Bayesian version of a single period asset allocation problem where the prior on the unknown parameters (for the dual problem) corresponds to the Lagrange multipliers in this duality relationship. This dual static problem can be interpreted as a less pessimistic alternative to the single period worst-case Markowitz problem. More generally, this duality suggests that learning and robustness are closely related when benchmarked objectives are used.

KEY WORDS: ambiguity, model uncertainty, relative performance measure, relative regret, regret, robust portfolio selection, robust control, convex duality, Bayesian models.

This work is supported in part by an NSF CAREER Award DMI-0348746 (Lim) and the NSF Grant DMI-0500503 (Lim and Shanthikumar). Nevertheless, the opinions, findings, conclusions, and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Support from the NUS-Berkeley Risk Management Institute at the National University of Singapore and the Coleman Fung Chair in Financial Modelling (Lim) is also acknowledged.
Manuscript received May 2008; final revision received October 2009.
Address correspondence to Andrew E.B. Lim, Industrial Engineering and Operations Research, University of California, Berkeley, CA; e-mail: lim@ieor.berkeley.edu.
DOI: 10.1111/j.1467-9965.2010.00448.x
© 2010 Wiley Periodicals, Inc.

1. INTRODUCTION

This paper concerns optimal asset allocation when there is model uncertainty (or ambiguity). More specifically, we propose a characterization of robustness which is less pessimistic than worst-case expected utility, and use it as the basis for finding robust portfolios. We show that portfolios obtained using our model are substantially


different from those obtained from the classical worst-case Markowitz/expected utility approaches. In our approach, robust portfolios are obtained by optimizing performance relative to a family of stochastic benchmarks. Each benchmark corresponds to the wealth of a fictitious benchmark investor who (unlike our investor) is blessed with knowledge of the model generating asset returns and makes optimal decisions conditional on this knowledge. Our robust investor seeks to perform well relative to the entire family of benchmarks. We show that the resulting robust portfolio is characterized by the solution of a (static) convex optimization problem which has a natural interpretation as a (less pessimistic) version of the robust Markowitz problem. We also show, using convex duality, that the solution of this (primal) static problem is also the solution of another (dual) portfolio selection problem with a Bayesian prior on the unknown parameter. The prior distribution in the dual problem is nondegenerate and corresponds to the Lagrange multiplier associated with the convex (primal) problem. In contrast, the solution of the usual worst-case asset allocation problem is equivalent to a Bayesian problem with a degenerate prior on the most pessimistic/unfavorable model.

Robust stochastic optimization differs from usual stochastic optimization in that it accounts for a family of possible models in the formulation of the problem. The primary reason for doing so is that solutions of stochastic optimization models (which assume a single model) are often highly sensitive to errors in the model, and substantial deterioration in performance of “optimal” (model-based) decisions can arise if actual events/returns are generated from a distribution or process that is different from the one that is assumed. Multiple models enable the decision maker to account for different (possibly conflicting) statistical assumptions and/or parameter values in the underlying stochastic model.
In the context of portfolio selection, it is well known that solutions of classical Markowitz problems are highly sensitive to errors in the expected return and covariance matrix. This observation has motivated substantial interest in the general problem of robust portfolio selection. Almost all of the work on robust portfolio selection has centered around worst-case or max–min objectives, with focus on the development of efficient algorithms for solving these problems¹ and/or the mathematical analysis of their solutions.² Robust optimization has also generated substantial interest in other fields including electrical engineering,³ operations research,⁴ and economics and finance,⁵ where the objectives, again, are almost always of worst-case type. (An important exception includes the papers of Klibanoff, Marinacci, and Mukerji (2005, 2009) which axiomatize a smooth model of decision making under ambiguity; see also Ceria and Stubbs (2006) for another “non-worst-case” approach). One criticism of worst-case objectives as a normative model is that they lead to overly pessimistic solutions.⁶

¹ El Ghaoui, Oks, and Oustry (2003), Goldfarb and Iyengar (2003), Tütüncü and Koenig (2004).
² Gundel (2005), Schied (2005, 2007).
³ Bernhard and Basar (2008), Doyle et al. (1989), Petersen, James, and Dupuis (2000).
⁴ Ben-Tal and Nemirovski (1998, 1999, 2000), El Ghaoui, Oks, and Oustry (2003), Goldfarb and Iyengar (2003), Lim and Shanthikumar (2007), Lim, Shanthikumar, and Shen (2006), Lim, Shanthikumar, and Watewai (2008b), Nilim and El Ghaoui (2005), Tütüncü and Koenig (2004).
⁵ Chen and Epstein (2002), Epstein and Wang (1994), Garlappi, Uppal, and Wang (2005), Gilboa and Schmeidler (1989), Gundel (2005), Hansen et al. (2006), Liu, Pan, and Wang (2005), Maenhout (2004), Schied (2005, 2007).
⁶ This particular criticism, in and of itself, is less serious in descriptive models where worst-case models (motivated by the observations in Ellsberg 1961) are justified by the axiomatic work of Chen and Epstein (2002) and also Gilboa and Schmeidler (1989).

For example, the


solution of the dynamic worst-case expected utility problem with i.i.d. ambiguity is that of a classical (non-worst-case) utility maximization problem with a degenerate prior on the alternative model with the smallest Sharpe ratio. Similarly, the solution of a worst-case Markowitz problem is that of a (nonrobust) Markowitz problem with the same (pessimistic) degenerate prior. It can also be shown that the solution associated with ambiguity models, such as relative entropy, has a similar property.

The first contribution of this paper is the proposal of a new measure of robust performance which avoids the pessimism in the standard worst-case approach. Indeed, we show that the solution of the asset allocation problem associated with this new objective corresponds to a Bayesian problem with a nondegenerate prior distribution that puts mass on optimistic as well as pessimistic models. More specifically, we formulate this optimal robust asset allocation problem in continuous time for a family of models that account for nonstationarities as well as jumps, and show that it reduces to a convex static optimization problem under fairly reasonable assumptions (which we discuss later). Under this relative performance measure, the performance of a portfolio is evaluated by comparing its resulting terminal wealth to that of a fictitious benchmark investor who behaves optimally according to some given performance measure and complete information about the otherwise ambiguous market model. In particular, a portfolio does well if the resulting wealth compares favorably to the benchmark performance over the family of possible models. This differs from classical worst-case approaches where (as we will show) good performance under the worst model is rewarded and under-performance when the model happens to be “good” has no bearing on the solution.
Our benchmark objective is related to the literature on competitive analysis and regret minimization, as discussed in Wald (1950), Savage (1951), Blackwell (1956), Hannan (1957), Milnor (1954), and more recently by Cover (1991), Bergemann and Schlag (2008), DeMarzo, Kremer, and Mansour (2005), Hayashi (2008), Stoye (2008), Terlizzese (2006), and many others (see, e.g., the collection of papers in Foster, Levine, and Vohra 1999). Although it is presently open whether an axiomatic justiﬁcation for our benchmark objective can be developed,7 we believe that it provides a reasonable alternative to the usual worst-case model, which is often criticized as being too pessimistic as a normative model for applications with robustness concerns. Also related is the work by Cover (1991) on universal portfolios. Here, the benchmark investor is blessed with prophetic powers and maximizes wealth over the class of constant rebalanced portfolios given knowledge of future prices. In this context, the (robust) portfolio delivered in Cover (1991) achieves a wealth that has the same asymptotic growth rate as the (prophetic) benchmark investor. In contrast, our benchmark investor is endowed with knowledge of the model that generates the prices but not the realization of future prices, and is restricted to picking portfolios which are adapted to the underlying ﬁltration. In addition, the robust portfolio we seek maximizes relative performance over a ﬁnite time horizon. An important class of stochastic dynamic optimization problems are those where robustness is a concern and learning is possible (see Chen and Epstein 2002; Knox 2002; Epstein and Schneider 2003; Hansen and Sargent 2005, for contributions in this direction). In this regard, we note that while the model in this paper excludes learning, our analysis, results, and interpretations can be extended to problems where learning is possible (see Lim, Shanthikumar, and Vahn 2010). 
⁷ Along these lines, Milnor (1954), Hayashi (2008), and Stoye (2008) develop axiomatic foundations for regret, and Terlizzese (2006) for relative regret, although their results do not cover the models used in this paper.

More specifically, we show using


convex duality that the (primal) robust benchmarking problem (in this paper) is dual to a single-period asset allocation problem with a Bayesian prior on the unknown (and previously ambiguous) parameters. From a technical point of view, the Lagrange multipliers associated with the benchmarking problem map to the Bayesian prior for the dual problem. The prior is a nondegenerate probability measure on the set of unknown parameters, where nondegeneracy arises because our objective has a benchmark. In contrast, the dual problem for the usual worst-case objective has a degenerate prior that puts all its mass on the most pessimistic model. This duality has broader significance in that it suggests a way of combining robust asset allocation with learning. More specifically, when benchmarked objectives are used, convex duality gives a nontrivial asset allocation problem with Bayesian learning. This contrasts with the model in Knox (2002) which avoids degeneracy by imposing a parameterized family of priors in the specification of the problem with nature doing the usual worst-case optimization over the parameters describing this family.

The outline of this paper is as follows. We formulate the class of uncertain models in Section 2. In Section 3, we examine the structure of solutions to the typical worst-case Markowitz and worst-case expected utility problems that have been studied in the literature. We show that the worst-case Markowitz problem is precisely the local problem that investors maximizing worst-case expected utility of terminal wealth will solve when price processes are log-normal, and that both problems have a pessimistic solution in that they are equivalent to solving standard Markowitz/utility maximization problems under the assumption that the least favorable model (in the sense of Sharpe ratio) within the family of alternative models is the true one.
We introduce benchmarked objectives in Section 4 and analyze the properties of the solution for jump-diffusion and continuous price processes in Sections 5 and 6 in the special case when the benchmark process maximizes logarithmic utility. We show in Section 5 that the optimal robust policy corresponding to a log-optimal benchmark is characterized by the solution of a static convex optimization problem that may be interpreted as a (generalized) worst-case minimum norm problem. From an economic point of view, this static problem can be interpreted as a less pessimistic version of the robust Markowitz problem. We show in Section 6, using convex duality, that the dual to this static problem is a single-period asset allocation problem with a nondegenerate Bayesian prior on the set of unknown parameters. This enables us to draw connections between benchmarked objectives and Bayesian learning. Finally, we generalize these results to nonlogarithmic benchmark processes in Section 7.1 and benchmark weighted objectives in Section 7.2. Numerical tests are conducted in Section 8.

2. CLASS OF UNCERTAIN MODELS

In this section, we describe the family of alternative models that we shall be using in our analysis. We adopt what is essentially the i.i.d. ambiguity model of Chen and Epstein (2002). Extensions which incorporate other models of ambiguity are possible but will not be considered here.

Let $(\Omega, \mathcal{F})$ be a measurable space and $W : \Omega \to C[0, T]^d$ and $N : \Omega \to D[0, T]^k$ be given and fixed processes. Let $\mathcal{N}$ denote the family of admissible models. Elements of $\mathcal{N}$ are characterized by a choice of measure $P$ and process $H(t) = (\kappa(t), \sigma(t), \lambda(t))$ such that

(i) $W(t)$ is a standard $d$-dimensional Brownian motion under $P$;
(ii) $N(t)$ is a $k$-dimensional counting process with $\mathbb{F}$-predictable intensity $\lambda(t) = [\lambda_1(t), \lambda_2(t), \ldots, \lambda_k(t)]'$, where $\mathbb{F}$ is the $P$-completed filtration generated by $(W(t), N(t))$;
(iii) $H(t)$ is $\mathbb{F}$-predictable and $H(t) \in \mathcal{H}$ for every $t$, where $\mathcal{H}$ is a compact subset.

We assume throughout that nature is free to choose any model from this set. For every admissible model $(P, H(t)) \in \mathcal{N}$, we have the associated market model

\[ \begin{aligned} dS_0(t) &= S_0(t) r(t) \, dt, & S_0(0) &= 1, \\ dS_i(t) &= S_i(t-) \{ \kappa_i(t) \, dt + \sigma_i(t) \, dW(t) + \theta^i \, dN(t) \}, & S_i(0) &\text{ given}, \quad i = 1, \ldots, q, \end{aligned} \tag{2.1} \]

where (the row vectors) $\theta^i = [\theta^i_1, \theta^i_2, \ldots, \theta^i_k]$ ($i = 1, 2, \ldots, q$) are assumed to be given and unambiguous. More specifically, for every model $H(t) = (\kappa(t), \sigma(t), \lambda(t))$ and each asset price process $S_i(t)$ define

\[ J_i(t) \triangleq \sum_{j=1}^{k} \theta^i_j N_j(t) = \theta^i N(t). \]

$J_i(t)$ is a jump process with arrival rate intensity

\[ \rho_i(t) = \begin{cases} 1_{\theta^i_1 \neq 0} \, \lambda_1(t) + 1_{\theta^i_2 \neq 0} \, \lambda_2(t) + \cdots + 1_{\theta^i_k \neq 0} \, \lambda_k(t), & \theta^i \neq 0, \\ 0, & \theta^i = 0, \end{cases} \]

and jump distribution

\[ \nu_i(\theta^i_j) = P\left( J_i(t) - J_i(t-) = \theta^i_j \,\middle|\, J_i(t) - J_i(t-) \neq 0 \right) = \begin{cases} \dfrac{\lambda_j(t)}{\rho_i(t)} \, 1_{\theta^i_j \neq 0}, & \rho_i(t) \neq 0, \\ 0, & \rho_i(t) = 0. \end{cases} \]

Observe that $\theta^i = [\theta^i_1, \theta^i_2, \ldots, \theta^i_k]$ determines the support of the jumps in $dS_i(t)/S_i(t-)$, whereas $\lambda(t) = [\lambda_1(t), \lambda_2(t), \ldots, \lambda_k(t)]'$ determines the jump rate $\rho_i(t)$ and the jump-size distribution $\nu_i(\theta^i_j)$. Assuming that $\theta^i$ is known but $\lambda(t)$ is ambiguous means that we are unambiguous about the support but are uncertain about the rate $\rho_i(t)$ and the jump distribution $\nu_i(\theta^i_j)$.

The asset $S_0(t)$ in (2.1) is the money-market account with interest rate $r(t)$. Although interest rates $r(t)$ are typically stochastic processes (with an ambiguous model), we assume for simplicity that $r(t)$ is known, deterministic, and uniformly bounded: $|r(t)| \leq K$ for some constant $K < \infty$. Our results extend immediately to the case when $r(t)$ is stochastic and ambiguous, so long as it is constrained to a compact set.

We denote by $\mathbb{G}$ the filtration generated by the asset price processes $(S_0(t), S_1(t), \ldots, S_q(t))$. It is natural to assume that investors make decisions on the basis of market observations, and for this reason, we restrict policies to the following admissible class

\[ \mathcal{A} = \left\{ \pi : [0, T] \times \Omega \to \mathbb{R}^q \,\middle|\, \pi(t) \text{ is } \mathbb{G}\text{-predictable and } \int_0^T |\pi(t)|^2 \, dt < \infty \right\}. \tag{2.2} \]
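The jump rate $\rho_i(t)$ and jump distribution $\nu_i(\theta^i_j)$ defined above can be computed directly from one row $\theta^i$ and the intensity vector $\lambda(t)$ at a fixed time. The following is a minimal sketch; the function names are hypothetical and, as an added assumption, the nonzero entries of $\theta^i$ are taken to be distinct so that each jump size identifies a single counting process.

```python
def jump_rate(theta_i, lam):
    """rho_i = sum over j of 1{theta^i_j != 0} * lambda_j, at a fixed time t."""
    return sum(l for th, l in zip(theta_i, lam) if th != 0)


def jump_distribution(theta_i, lam):
    """nu_i(theta^i_j) = lambda_j / rho_i where theta^i_j != 0, and 0 otherwise.

    Returns one probability per component j. If rho_i == 0 (asset i never
    jumps), every entry is 0, matching the second branch of the definition.
    """
    rho = jump_rate(theta_i, lam)
    if rho == 0:
        return [0.0] * len(lam)
    return [l / rho if th != 0 else 0.0 for th, l in zip(theta_i, lam)]
```

For instance, with `theta_i = [-0.1, 0.0, 0.2]` and `lam = [2.0, 5.0, 6.0]`, only the first and third counting processes move asset $i$, so the second intensity does not enter $\rho_i$.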


Components of an admissible portfolio $\pi(t) = [\pi_1(t), \pi_2(t), \ldots, \pi_q(t)]'$ correspond to the dollar amounts invested in each stock, whereas $\pi_0(t) = x(t) - \pi(t)' \mathbf{1}$ corresponds to the amount invested in the money market account at time $t$, where $x(t)$ is the value of the portfolio. Under the self-financing assumption, $x(t)$ evolves according to the stochastic differential equation:

\[ dx(t) = [r(t) x(t) + b(t)' \pi(t)] \, dt + \pi(t)' \sigma(t) \, dW(t) + \pi(t)' \theta \, dN(t), \qquad x(0) = x, \tag{2.3} \]

where

\[ \begin{aligned} b(t) &= [\kappa_1(t) - r, \kappa_2(t) - r, \ldots, \kappa_q(t) - r]' = \kappa(t) - r \mathbf{1}, \\ \sigma(t) &= [\sigma_1(t)', \sigma_2(t)', \ldots, \sigma_q(t)']', \\ \theta &= [{\theta^1}', {\theta^2}', \ldots, {\theta^q}']'. \end{aligned} \]

Observe that $\pi_0(t)$ is uniquely determined by the wealth $x(t)$ and the investment in stock $\pi(t)$ and hence does not need to be included as one of the decision variables. Finally, we note that ambiguity in $\kappa(t)$ gives rise to ambiguity in the process $b(t)$, and we shall speak, for the remainder of this paper, about uncertainty in $b(t)$ instead of $\kappa(t)$.

3. ROBUST PORTFOLIO SELECTION PROBLEM: STANDARD CASE

We discuss the structure of solutions to classical robust asset allocation problems. We show that the solution to these problems solves a standard (nonrobust) asset allocation problem corresponding to the model with the smallest Sharpe ratio. It is in this sense that the solution of the usual worst-case problem is pessimistic, and provides motivation for our work on alternative objectives in subsequent sections.

3.1. Single-Period Robust Mean–Variance Problems

Assume for the moment that the parameters $\sigma(t)$, $b(t)$, and $r(t)$ in (2.3) are constant and known to an investor for a short time period $0 \le t \le \delta \ll 1$, and suppose that this investor holds his/her portfolio $\pi$ constant over this interval. It follows from (2.3) that his/her excess return at time $\delta$ is

$$R_\delta(\pi) \triangleq x(\delta) - x(0)(1 + r) = b^\top \pi\,\delta + \pi^\top \sigma W(\delta) \stackrel{D}{=} b^\top \pi\,\delta + \sqrt{\delta}\,\pi^\top \sigma Z,$$

where $Z \sim N(0, I)$ is a standard $d$-dimensional multivariate normal random vector. The mean and variance of $R_\delta(\pi)$ are given by

$$E(R_\delta(\pi)) = b^\top \pi\,\delta, \qquad \mathrm{Var}(R_\delta(\pi)) = \delta\,\pi^\top Q \pi,$$

where $Q \triangleq \sigma\sigma^\top$ is the covariance of asset returns. An investor who invests across the period $\delta$ with a mean–variance objective would solve $\min_\pi \mathrm{Var}(x(\delta))$ subject to $E(R_\delta(\pi)) \ge v\delta$, or equivalently

$$\begin{cases} \min_{\pi}\ \pi^\top Q \pi \\ \text{subject to: } \pi^\top b \ge v. \end{cases} \tag{3.1}$$

ROBUST ASSET ALLOCATION WITH BENCHMARKED OBJECTIVES


The solution of this problem is elementary and can be obtained by solving the unconstrained quadratic optimization problem

$$\min_{\pi}\ \{\pi^\top Q \pi - 2\gamma\,\pi^\top b\}, \tag{3.2}$$

where different choices of $\gamma \ge 0$ give solutions of (3.1) for different values of $v$.

Consider now the case where the exact values of $b$ and $Q$ are not known aside from the fact that $(Q, b) \in \mathcal{H}_1 \times \mathcal{H}_2$ for some given uncertainty sets $\mathcal{H}_1$ and $\mathcal{H}_2$. Assume that $\mathcal{H}_1$ is a compact subset of $\mathcal{S}^q_+$, the set of symmetric positive-semi-definite $q \times q$ matrices with real-valued entries, whereas $\mathcal{H}_2$ is a compact subset of $\mathbb{R}^q$. In this section, we shall assume (unless otherwise stated) that $\mathcal{H}_1$ and $\mathcal{H}_2$ are convex as well as compact, although convexity is not really necessary for the insights we obtain if we use the methods in Section 6.2.

The following problem has been proposed as a robust version of the single-period Markowitz problem (3.1) (see, e.g., Goldfarb and Iyengar 2003; Tütüncü and Koenig 2004; Garlappi, Uppal, and Wang 2005):

$$\begin{cases} \min_{\pi} \max_{Q \in \mathcal{H}_1}\ \pi^\top Q \pi, \\ \text{subject to: } \min_{b \in \mathcal{H}_2} \pi^\top b \ge v. \end{cases} \tag{3.3}$$
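The correspondence between the constrained problem (3.1) and the penalized problem (3.2) can be illustrated numerically when $(Q, b)$ is known. The sketch below is only an illustration: the two-asset inputs are made up, and the function names are hypothetical. It uses the first-order condition of (3.2), $\pi = \gamma Q^{-1} b$, and picks $\gamma$ so that the return constraint in (3.1) binds, which assumes $v > 0$ and $b \neq 0$.

```python
import numpy as np

def mean_variance_portfolio(Q, b, gamma):
    """Minimizer of pi'Q pi - 2*gamma*pi'b, i.e. pi = gamma * Q^{-1} b."""
    return gamma * np.linalg.solve(Q, b)

def gamma_for_target(Q, b, v):
    """Multiplier gamma for which the minimizer attains pi'b = v (needs b != 0)."""
    return v / float(b @ np.linalg.solve(Q, b))

# Illustrative (made-up) two-asset inputs.
Q = np.array([[0.04, 0.01],
              [0.01, 0.09]])
b = np.array([0.05, 0.07])
v = 0.02

gamma = gamma_for_target(Q, b, v)
pi = mean_variance_portfolio(Q, b, gamma)
print("gamma =", gamma, "pi =", pi, "pi'b =", pi @ b)
```

Sweeping `v` over positive values traces out the same family of portfolios as sweeping `gamma` over positive values in (3.2).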

In this section, we examine the structure of the solution of (3.3) and the implications of using this method to find robust portfolios. In Section 3.2, we shall repeat the analysis for worst-case dynamic problems (which are closely related to the problems formulated in Chen and Epstein 2002; Gundel 2005; Hansen et al. 2006; Liu, Pan, and Wang 2005). The results in this section are not new (similar observations are made, for instance, in Tütüncü and Koenig 2004) but motivate our study of alternative (benchmark) objectives in subsequent sections.

To begin, observe that the problem (3.3) is convex in $\pi$ (irrespective of whether $\mathcal{H}_1$ and $\mathcal{H}_2$ are convex) and hence can be solved using duality. In particular, the introduction of a Lagrange multiplier $\gamma \ge 0$ leads to the problem

$$\min_{\pi} \max_{(Q, b) \in \mathcal{H}_1 \times \mathcal{H}_2} \{\pi^\top Q \pi - 2\gamma\,\pi^\top b\}.$$

More generally, consider the problem

$$p^* \triangleq \min_{\pi} \max_{(Q, b) \in \mathcal{H}} \{\pi^\top Q \pi - 2\gamma\,\pi^\top b\}. \tag{3.4}$$

Let us assume that $\mathcal{H}$ is a compact subset of $\mathcal{S}^q_+ \times \mathbb{R}^q$ (although not necessarily of the form $\mathcal{H}_1 \times \mathcal{H}_2$). We assume throughout that there is some $\delta > 0$ such that$^{8}$

$$Q \ge \delta I \quad \text{for every } (Q, b) \in \mathcal{H}. \tag{3.5}$$

8 In the case of the asset price model (2.1) (when there is no model uncertainty and $\sigma(t)$ is known) it is typically assumed that the so-called nondegeneracy assumption holds; that is, there is a $\delta > 0$ such that $\sigma(t)\sigma(t)^\top \ge \delta I$ for every $t$ a.s. By assuming (3.5) we are simply saying that the family of models $(\sigma, b)$ satisfies the nondegeneracy assumption uniformly with respect to $\delta$.


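The following result gives the structure of the optimal solution of (3.4). The proof can be found in the Appendix.

PROPOSITION 3.1. Suppose that $\mathcal{H} \subseteq \mathcal{S}^q_+ \times \mathbb{R}^q$ is compact and convex and satisfies the nondegeneracy condition (3.5). Then

\begin{align}
p^* \triangleq \min_{\pi} \max_{(Q, b) \in \mathcal{H}} \{\pi^\top Q \pi - 2\gamma\,\pi^\top b\}
&= \max_{(Q, b) \in \mathcal{H}} \min_{\pi} \{\pi^\top Q \pi - 2\gamma\,\pi^\top b\} \tag{3.6} \\
&= \min_{\pi} \{\pi^\top Q^* \pi - 2\gamma\,\pi^\top b^*\} \tag{3.7} \\
&= -\gamma^2 \min_{(Q, b) \in \mathcal{H}} b^\top Q^{-1} b, \tag{3.8}
\end{align}

where

$$(Q^*, b^*) = \arg\min_{(Q, b) \in \mathcal{H}} b^\top Q^{-1} b \tag{3.9}$$

and

$$\pi^* = \gamma\,[Q^*]^{-1} b^* \tag{3.10}$$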

determine the optimal response by nature and the optimal robust policy, respectively.

Observe that $b^\top Q^{-1} b$ is the Sharpe ratio corresponding to the model with parameters $(Q, b)$. Consequently, the equation (3.7) says that the solution (3.10) of the robust problem (3.4) is the solution of the nonrobust version under the (pessimistic) assumption that the true model is $(Q^*, b^*)$, the one with the smallest Sharpe ratio.$^{9,10}$ For example, if the risk free rate $r$ is 0.01 and there is a single stock with known volatility $\sigma = 0.1$ (say) but unknown mean $\kappa$ which lies in the interval $[0.01 - \epsilon, 0.15]$ for some small $0 \le \epsilon \ll r$, then $\mathcal{H} = \{(Q, b) \mid Q = 0.1^2,\ b \in [-\epsilon, 0.14]\}$ contains the model with $b = 0$ (recall that $b = \kappa - r$) and the robust solution takes $b^* = 0$ (even though $b$ takes values as high as 0.14) and invests only in the money market account ($\pi^* = 0$).

Alternatively, if $\mu$ denotes an arbitrary probability measure on $\mathcal{H}$, then

$$\min_{\pi} E_\mu\{\pi^\top Q \pi - 2\gamma\, b^\top \pi\} = \min_{\pi} \{\pi^\top E_\mu(Q)\,\pi - 2\gamma\, E_\mu(b)^\top \pi\}$$

is a Bayesian version of the asset allocation problem (3.2) and it follows from (3.7) that (3.4) is equivalent to such a problem with a degenerate prior $\mu^*$ on the most pessimistic/least favorable model $(Q^*, b^*)$. In either case, the solution (3.10) delivered by the worst-case approach seems overly pessimistic. It is also interesting to observe that $(Q^*, b^*)$ is independent of $\gamma$.

9 See Tütüncü and Koenig (2004) for similar observations when $\mathcal{H} = \{(Q, b) : L \preceq Q \preceq U,\ l \le b \le u\}$.

10 If the set of alternative models $\mathcal{H}$ (or indeed, nature's optimization problem) is nonconvex, it can still be shown using methods in Section 6.2 that nature's decision has the same pessimistic feature: minimize Sharpe ratio, but now, over distributions on the set of alternative models (i.e., it is no longer a point mass, but still has the same pessimistic property). The simplifying convexity assumption enables us to get to the same basic insight more easily.
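The single-stock example can be checked by brute force. The following sketch is an illustration under stated assumptions rather than part of the original analysis: $\gamma = 1$, $\epsilon = 0.001$, the grids, and all function names are hypothetical choices. It compares the closed form of Proposition 3.1 (take the $b$ in the interval closest to zero, then $\pi^* = \gamma b^*/Q$) with a direct grid search over the min–max problem (3.4).

```python
import numpy as np

def worst_case_value(pi, Q, b_grid, gamma):
    """Inner maximization over b for one stock: max_b {Q*pi^2 - 2*gamma*pi*b}."""
    return np.max(Q * pi**2 - 2.0 * gamma * pi * b_grid)

def robust_policy_closed_form(Q, b_lo, b_hi, gamma):
    """Proposition 3.1 for one stock with known Q and b in [b_lo, b_hi]."""
    b_star = float(np.clip(0.0, b_lo, b_hi))   # minimizes b^2 / Q over the interval
    return gamma * b_star / Q, b_star

def robust_policy_brute_force(Q, b_lo, b_hi, gamma, pi_grid):
    """Grid search over pi of the worst-case objective."""
    b_grid = np.linspace(b_lo, b_hi, 201)
    values = np.array([worst_case_value(p, Q, b_grid, gamma) for p in pi_grid])
    k = int(np.argmin(values))
    return pi_grid[k], values[k]

Q = 0.1 ** 2          # known variance, sigma = 0.1
gamma = 1.0           # assumed risk-return parameter
eps = 0.001           # assumed small epsilon
pi_grid = np.linspace(-5.0, 5.0, 10001)

# Uncertainty set containing b = 0: the robust investor holds no stock.
pi_cf, b_star = robust_policy_closed_form(Q, -eps, 0.14, gamma)
pi_bf, p_star = robust_policy_brute_force(Q, -eps, 0.14, gamma, pi_grid)

# Uncertainty set bounded away from zero: the smallest b is the worst case.
pi_cf2, b_star2 = robust_policy_closed_form(Q, 0.02, 0.14, gamma)
pi_bf2, p_star2 = robust_policy_brute_force(Q, 0.02, 0.14, gamma, pi_grid)
```

In the second case the grid search returns a value close to $-\gamma^2 (b^*)^2 / Q$, in line with (3.8).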


3.2. Dynamic Worst-Case Models

Let $U(x) : \mathbb{R} \to \mathbb{R}$ be a strictly concave twice continuously differentiable utility function. Consider the following worst-case robust portfolio selection problem$^{11}$:

$$\begin{cases} V(s, x) \triangleq \max_{\pi(t)} \min_{H(t)} E[U(x(T))] \\ \text{subject to:} \\ dx(t) = [r(t)x(t) + b(t)^\top \pi(t)]\,dt + \pi(t)^\top \sigma(t)\,dW(t), \\ x(s) = x, \\ \pi(t) \in \mathcal{A}, \quad H(t) = (b(t), \sigma(t)) \in \mathcal{N}. \end{cases} \tag{3.11}$$

We assume throughout this section that the parameter uncertainty set $\mathcal{H}$ is compact and convex. The following structural result, however, does not depend on $\mathcal{H}$ being convex. The proof follows along the lines of that of Proposition 5.1 and we leave details to the interested reader.

PROPOSITION 3.2. Suppose that $U(x)$ is strictly concave and increasing in $x$. Then $V(s, x)$ is concave and increasing in $x$.

The dynamic programming equation for (3.11) is

$$\begin{cases} \max_{\pi} \min_{H \in \mathcal{H}} \Big\{ V_t(t, x) + V_x(t, x)(r x + b^\top \pi) + \tfrac{1}{2} V_{xx}\,\pi^\top \sigma\sigma^\top \pi \Big\} = 0, \\ V(T, x) = U(x). \end{cases} \tag{3.12}$$

The nonlinear PDE (3.12) involves the following optimization problem:

$$\max_{\pi} \min_{(\sigma\sigma^\top, b) \in \mathcal{H}} \Big\{ V_x(t, x)\, b^\top \pi + \tfrac{1}{2} V_{xx}(t, x)\,\pi^\top \sigma\sigma^\top \pi \Big\} = \tfrac{1}{2} V_{xx}(t, x) \min_{\pi} \max_{(Q, b) \in \mathcal{H}} \big\{ \pi^\top Q \pi - 2\gamma(t, x)\, b^\top \pi \big\}, \tag{3.13}$$

where

$$\gamma(t, x) \triangleq -\frac{V_x(t, x)}{V_{xx}(t, x)} \ge 0,$$

with the equality in (3.13) and the nonnegativity of $\gamma(t, x)$ coming from the monotonicity and concavity of $V(t, x)$ as outlined in Proposition 3.2. Observe that for a given $(t, x)$, the static optimization problem (3.13) is precisely the robust Markowitz problem (3.4) with risk-return parameter $\gamma(t, x)$, so the optimal investment strategy $\pi^*(t, x)$ for (3.11) coincides (locally) with a robust mean-variance optimal portfolio characterized by the solution of (3.13). Furthermore, the risk-return preferences of this investor depend on the utility function $U(x)$ through the value function $V(t, x)$ by way of the risk-return parameter $\gamma(t, x)$. The following result is an immediate consequence of Proposition 3.1.

11 Although this model does not include consumption, the conclusions of this section apply equally to situations where consumption is allowed and the utility of consumption is concave and increasing.


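PROPOSITION 3.3. Suppose that $\mathcal{H}$ is compact and convex and satisfies the nondegeneracy assumption (3.5). Then

$$\begin{aligned} \min_{\pi} \max_{(Q, b) \in \mathcal{H}} \{\pi^\top Q \pi - 2\gamma(t, x)\, b^\top \pi\} &= \max_{(Q, b) \in \mathcal{H}} \min_{\pi} \{\pi^\top Q \pi - 2\gamma(t, x)\,\pi^\top b\} \\ &= \min_{\pi} \{\pi^\top Q^* \pi - 2\gamma(t, x)\,\pi^\top b^*\} \\ &= -\left(\frac{V_x(t, x)}{V_{xx}(t, x)}\right)^{2} b^{*\top} [Q^*]^{-1} b^*, \end{aligned} \tag{3.14}$$

where

$$(Q^*, b^*) \triangleq \arg\min_{(Q, b) \in \mathcal{H}} b^\top Q^{-1} b \tag{3.15}$$

is the optimal response by nature and

$$\pi^*(t, x) = -\frac{V_x(t, x)}{V_{xx}(t, x)}\,[Q^*]^{-1} b^* \tag{3.16}$$

is the optimal robust portfolio.

It is easy to show that there is an explicit solution if $U(x) = x^\eta/\eta$ ($\eta < 1$), namely

$$V(t, x) = \frac{1}{\eta}\, x^\eta \exp\left\{ \eta (T - t) \left[ r + \frac{1}{2(1 - \eta)}\, b^{*\top} [Q^*]^{-1} b^* \right] \right\}, \qquad \pi^*(t, x) = \frac{x}{1 - \eta}\,[Q^*]^{-1} b^*. \tag{3.17}$$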
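The explicit value function in (3.17) can be sanity-checked against the dynamic programming equation. The sketch below is illustrative only: the parameter values, step size, and function names are assumptions. It uses the fact that, for $V_{xx} < 0$, the inner maximization $\max_\pi \{V_x \pi^\top b^* + \tfrac12 V_{xx} \pi^\top Q^* \pi\}$ equals $-\tfrac12 (V_x^2 / V_{xx})\, b^{*\top}[Q^*]^{-1} b^*$, so the equation reduces to $V_t + r x V_x - \tfrac12 (V_x^2/V_{xx})\, b^{*\top}[Q^*]^{-1} b^* = 0$, and it evaluates the residual by central finite differences.

```python
import numpy as np

def value_function(t, x, eta, r, sharpe_sq, T):
    """Candidate V(t, x) from (3.17); sharpe_sq stands for b*' [Q*]^{-1} b*."""
    c = r + sharpe_sq / (2.0 * (1.0 - eta))
    return x**eta / eta * np.exp(eta * (T - t) * c)

def hjb_residual(t, x, eta, r, sharpe_sq, T, h=1e-4):
    """Finite-difference residual of V_t + r x V_x - 0.5 (V_x^2 / V_xx) sharpe_sq."""
    V = lambda s, y: value_function(s, y, eta, r, sharpe_sq, T)
    V_t = (V(t + h, x) - V(t - h, x)) / (2 * h)
    V_x = (V(t, x + h) - V(t, x - h)) / (2 * h)
    V_xx = (V(t, x + h) - 2 * V(t, x) + V(t, x - h)) / h**2
    return V_t + r * x * V_x - 0.5 * V_x**2 / V_xx * sharpe_sq

# Illustrative (made-up) parameters.
res_pos = hjb_residual(t=0.3, x=1.5, eta=0.5, r=0.01, sharpe_sq=0.1, T=1.0)
res_neg = hjb_residual(t=0.3, x=1.5, eta=-1.0, r=0.01, sharpe_sq=0.1, T=1.0)
print(res_pos, res_neg)
```

Both residuals should be zero up to discretization and rounding error, for positive and negative values of $\eta$ alike.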

Several properties of the dynamic solution (and similarities with the worst-case single-period problem) are important to highlight. First, nature's optimal response is the same for both single-period and dynamic worst-case problems: it is the model $(\sigma^*, b^*)$ with the smallest Sharpe ratio (see (3.9) and (3.15)). Nature's model is independent of the risk-aversion parameter $\gamma$ (in the single-period case) and the utility function $U(x)$ in the dynamic problem. Secondly, as in the single-period case, the optimal portfolio (3.16) for the dynamic worst-case problem is nothing but the optimal portfolio for a standard (nonrobust) utility maximization problem

$$\begin{cases} \max_{\pi} E[U(x(T))] \\ \text{subject to:} \\ dx(t) = [r(t)x(t) + \pi(t)^\top b^*]\,dt + \pi(t)^\top \sigma^*\,dW(t), \\ x(s) = x \end{cases}$$

with parameters $(\sigma^*, b^*)$ chosen from $\mathcal{H}$ so as to have minimal Sharpe ratio (i.e., $(\sigma^*\sigma^{*\top}, b^*) = (Q^*, b^*)$, the solution of (3.15)). To see this, observe (by way of (3.13) and (3.14)) that the dynamic programming equation (3.12) for the worst-case problem can be rewritten as

$$\begin{cases} V_t(t, x) + r x V_x(t, x) + \max_{\pi} \Big\{ V_x(t, x)\,\pi^\top b^* + \tfrac{1}{2} V_{xx}(t, x)\,\pi^\top (\sigma^*\sigma^{*\top})\,\pi \Big\} = 0, \\ V(T, x) = U(x), \end{cases} \tag{3.18}$$

which is nothing but the dynamic programming equation for the standard utility maximization problem above. It follows that the only concern when solving the robust dynamic problem (as in the single-period case) is to perform well for the model with the smallest Sharpe ratio. The robust investor is unconcerned about under-performance of this strategy when the model does not happen to be the worst-case one.

3.3. Totally Optimistic Investors

We assume throughout this section that $\mathcal{H}$ is compact; convexity is not required. We define a totally optimistic investor$^{12}$ as one who uses the following asset allocation problem as the basis of finding $\pi^*(t)$:

$$\begin{cases} \max_{\pi(t),\, H(t)} E[U(x(T))] \\ \text{subject to:} \\ dx(t) = [r x(t) + b(t)^\top \pi(t)]\,dt + \pi(t)^\top \sigma(t)\,dW(t), \\ x(0) = x_0, \\ \pi(t) \in \mathcal{A}, \quad H(t) = (b(t), \sigma(t)) \in \mathcal{N}. \end{cases} \tag{3.19}$$

The dynamic programming equation for this problem is

$$\begin{cases} \max_{\pi,\, H \in \mathcal{H}} \Big\{ V_t(t, x) + V_x (r x + b^\top \pi) + \tfrac{1}{2} V_{xx}\,\pi^\top \sigma\sigma^\top \pi \Big\} = 0, \\ V(T, x) = U(x). \end{cases} \tag{3.20}$$

Suppose that $(\sigma^*, b^*)$ is the model in $\mathcal{H}$ with the largest Sharpe ratio:

\begin{align}
b^{*\top} (\sigma^*\sigma^{*\top})^{-1} b^* &\triangleq \max_{(\sigma\sigma^\top, b) \in \mathcal{H}} b^\top (\sigma\sigma^\top)^{-1} b \tag{3.21} \\
&= \max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} \int_{\mathcal{H}} b^\top (\sigma\sigma^\top)^{-1} b\ \mu(d(\sigma\sigma^\top, b)) \notag \\
&= \max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} E_\mu[b^\top (\sigma\sigma^\top)^{-1} b], \tag{3.22}
\end{align}

where the max in (3.22) is over the set of probability measures on the Borel sets $\mathcal{B}(\mathcal{H})$ of $\mathcal{H}$. Observe that the optimal Sharpe ratio in (3.21) is achieved by some $(\sigma^*\sigma^{*\top}, b^*) \in \mathcal{H}$ and the probability measure $\mu^*$ which maximizes (3.22) puts all its mass at $\{(\sigma^*\sigma^{*\top}, b^*)\}$.

12 This section comes in handy when we interpret the solution of our robust benchmarking problem in Section 5.


Consider the following (standard) portfolio selection problem with representative model $(\sigma^*, b^*)$:

$$\begin{cases} \max_{\pi(t)} E[U(x(T))] \\ \text{subject to:} \\ dx(t) = [r x(t) + b^*(t)^\top \pi(t)]\,dt + \pi(t)^\top \sigma^*(t)\,dW(t), \\ x(0) = x_0. \end{cases} \tag{3.23}$$

The associated dynamic programming equation is

$$\begin{cases} \max_{\pi} \Big\{ V_t(t, x) + V_x(t, x)[r x + b^{*\top} \pi] + \tfrac{1}{2} V_{xx}(t, x)\,\pi^\top \sigma^*\sigma^{*\top} \pi \Big\} = 0, \\ V(T, x) = U(x). \end{cases} \tag{3.24}$$

The following result shows that the totally optimistic investor who wants to solve (3.19) chooses the alternative model with the largest Sharpe ratio and solves a standard utility maximization problem under the assumption that this model is correct. We leave the proof to the reader.

PROPOSITION 3.4. The partial differential equations (3.20) and (3.24) have the same solution $V(t, x)$ which is the value function of both (3.19) and (3.23). The optimal portfolio for both (3.19) and (3.23) is

$$\pi(t, x) = -\frac{V_x(t, x)}{V_{xx}(t, x)}\,(\sigma^*\sigma^{*\top})^{-1} b^*, \tag{3.25}$$

where

$$(\sigma^*\sigma^{*\top}, b^*) \triangleq \arg\max_{(\sigma\sigma^\top, b) \in \mathcal{H}} b^\top (\sigma\sigma^\top)^{-1} b = \arg\max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} E_\mu[b^\top (\sigma\sigma^\top)^{-1} b].$$
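The contrast between the pessimistic model of Section 3.2 and the optimistic model of this section can be made concrete on a small finite set of candidate models. The sketch below is an illustration under assumptions: the three $(\sigma, b)$ pairs are made up and the function names are hypothetical. It ranks the candidates by $b^\top(\sigma\sigma^\top)^{-1} b$. A finite set is compact but not convex, so the maximizer matches Proposition 3.4 directly, whereas the minimizer is shown only for comparison (Proposition 3.1 assumes convexity).

```python
import numpy as np

def sharpe_measure(sigma, b):
    """b' (sigma sigma')^{-1} b, the quantity called the Sharpe ratio in the text."""
    Q = sigma @ sigma.T
    return float(b @ np.linalg.solve(Q, b))

def extreme_models(models):
    """Indices of the models with the smallest and largest b' (sigma sigma')^{-1} b."""
    scores = [sharpe_measure(sigma, b) for sigma, b in models]
    return int(np.argmin(scores)), int(np.argmax(scores)), scores

# Illustrative (made-up) candidate models (sigma, b) for two stocks.
models = [
    (np.array([[0.20, 0.00], [0.05, 0.25]]), np.array([0.04, 0.06])),
    (np.array([[0.20, 0.00], [0.05, 0.25]]), np.array([0.01, 0.02])),
    (np.array([[0.15, 0.00], [0.00, 0.20]]), np.array([0.05, 0.05])),
]
worst, best, scores = extreme_models(models)
print("pessimistic model:", worst, "optimistic model:", best, "scores:", scores)
```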

4. RELATIVE PERFORMANCE: GENERAL FORMULATION

4.1. Benchmark Investor

Let $(\Omega, \mathcal{F}, P)$ and $H(t) = (b(t), \sigma(t), \lambda(t)) \in \mathcal{N}$ be an admissible alternative model and $U^B(y)$ be a given concave utility function. For every $H(t) \in \mathcal{N}$, consider the following (standard) utility maximization problem:

$$\begin{cases} \max_{\psi(t)} E_P[U^B(y(T))] \\ \text{subject to:} \\ dy(t) = [r y(t^-) + b(t)^\top \psi(t)]\,dt + \psi(t)^\top \sigma(t)\,dW(t) + \psi(t)^\top \theta(t)\,dN(t), \\ y(0) = y_0, \\ \psi(t) \in \mathcal{U}. \end{cases} \tag{4.1}$$

The dynamics describe the evolution of the wealth $y(t)$ of an investor with portfolio $\psi(t)$ when the model is $H(t) \in \mathcal{N}$. We assume that the investor in (4.1) is free to choose any policy $\psi(t)$ from the class$^{13}$

$$\mathcal{U} = \Big\{ \psi(t) : [0, T] \times \Omega \to \mathbb{R}^q \;\Big|\; \psi(t) \text{ is } \mathcal{F}\text{-predictable and } \int_0^T |\psi(t)|^2\,dt < \infty \Big\}.$$

We denote by $y^H(t)$ the optimal wealth process associated with the model $H(t)$ and by $\psi^H(t)$ the associated optimal trading strategy obtained by solving (4.1). We assume that the first derivative of the utility function satisfies $[U^B]'(0) = \infty$ so as to guarantee that $y^H(t) > 0$ for all $t \in [0, T]$, $P$-a.s. for every model $H(t)$. Observe that $y^H(t)$ is adapted to the filtration $\mathcal{F}$ generated by $(W(t), N(t))$ because $H(t)$ is $\mathcal{F}$-predictable.

Intuitively, an investor with utility function $U^B(y)$ would choose the portfolio $\psi^H(t)$ if he/she knew that the model was $H(t)$. In this case, $y^H(t)$ would be the associated optimal wealth process. In what follows, we shall use $y^H(t)$ as a benchmark and evaluate the portfolio $\pi(t)$ by comparing its performance with $y^H(t)$ over the family of models $\mathcal{N}$. We refer to the investor who solves the problem (4.1) as the benchmark investor and to $y^H(t)$ and $\psi^H(t)$ as the benchmark investor's wealth and benchmark investor's policy, respectively.

4.2. Relative Performance Measure

Consider an investor with some trading strategy $\pi(t)$. Suppose that stock prices evolve according to the model (2.1) which is characterized by the parameters $H(t) = (\kappa(t), \sigma(t), \lambda(t))$. The wealth process corresponding to this trading strategy would evolve as

$$dx(t) = [r(t)x(t) + b(t)^\top \pi(t)]\,dt + \pi(t)^\top \sigma(t)\,dW(t) + \pi(t)^\top \theta(t)\,dN(t), \qquad x(0) = x_0. \tag{4.2}$$

The traditional worst-case approach evaluates $\pi(t)$ by finding the worst possible performance under all alternative models, namely $\min_{H(t) \in \mathcal{H}} E[U(x_\pi(T))]$, and says that $\pi(t)$ is "good" if it maximizes this objective. In this regard, we showed in Section 3 that portfolios obtained in this way are typically extremely pessimistic because they are only concerned with optimizing performance for the least favorable model (in the sense of Sharpe ratio) and not the losses incurred by being pessimistic. In this section, we introduce a new measure of robust performance that addresses this problem.$^{14}$

More specifically, consider an investor with portfolio $\pi(t)$. The intuition behind this relative performance measure is that an investor is "satisfied" with the performance of $\pi(t)$ if his/her wealth $x_\pi^H(T)$ compares favorably with the wealth $y^H(T)$ of the well-informed benchmark investor who behaves optimally given knowledge of the true model $H(t)$. Alternatively, we say that the investor's portfolio is robust if the ratio $x_\pi^H(T)/y^H(T)$ is "large" for all alternative models $H(t) \in \mathcal{N}$. This motivates the following relative performance or benchmarked objective:

13 This class of policies is typical for general statements of utility maximization problems.

14 Although not considered in this paper, this framework can be extended to include consumption (Lim, Shanthikumar, and Watewai 2008a). We do not address the important problem of ambiguity with respect to the utility function.


$$\begin{cases} \max_{\pi} \min_{H(t)} E\left[ U\!\left( \dfrac{x_\pi^H(T)}{y^H(T)} \right) \right] \\ \text{subject to:} \\ x_\pi^H(t) \text{ is the solution of (4.2) under } (\pi(t), H(t)), \\ y^H(t) \text{ is the solution of (4.1) under } H(t), \\ H(t) = (b(t), \sigma(t), \lambda(t)) \in \mathcal{N}, \quad \pi(t) \in \mathcal{A}, \end{cases} \tag{4.3}$$

where $U(z)$ is a utility function which represents the investor's attitude towards "beating" or "falling below" the benchmark $y^H(T)$. Observe that $U(z)$ represents utility of performance (where performance is a unit-less quantity because it is the ratio of wealths) whereas $U^B(y)$ measures utility of wealth (a dollar amount). As such, they are utilities over different objects and hence there is no reason to expect that $U(z)$ and $U^B(y)$ should be the same.$^{15}$

In defining the class of admissible investment portfolios for the investor and nature, a key issue concerns the information that is available to both "players." It is natural to allow nature to observe $\mathcal{F}$ (the complete filtration generated by $(W(t), N(t))$) and its choice of parameters $H(t)$ to be $\mathcal{F}$-predictable. On the other hand, the investor's observation filtration should not be $\mathcal{F}$, but the history of the asset prices $\mathcal{G}$, and consequently his/her portfolio $\pi(t)$ should be restricted to the admissible class $\mathcal{A}$ as we have done in (4.3). This setup creates some complications, however, because $y^H(t)$ is $\mathcal{F}$-adapted but not $\mathcal{G}$-adapted and hence not observable to an investor with observation filtration $\mathcal{G}$ (recall that it is the wealth of the fictitious benchmark investor, who is assumed to know the model chosen by nature). This information asymmetry can (in principle) be handled using filtering theory although this brings additional technical baggage.

One method of avoiding filtering is to simply ignore the issues just raised and to allow both the investor and nature to make $\mathcal{F}$-predictable decisions (in particular, to make decisions that depend on the pair $(x_\pi^H(t), y^H(t))$) and to hope that such an approach still gives us a $\mathcal{G}$-predictable investment policy. This leads to the following modification of (4.3):

$$\begin{cases} \max_{\pi} \min_{H(t)} E\left[ U\!\left( \dfrac{x_\pi^H(T)}{y^H(T)} \right) \right] \\ \text{subject to:} \\ x_\pi^H(t) \text{ is the solution of (4.2) under } (\pi(t), H(t)), \\ y^H(t) \text{ is the solution of (4.1) under } H(t), \\ H(t) = (b(t), \sigma(t), \lambda(t)) \in \mathcal{N}, \quad \pi(t) \in \mathcal{U}, \end{cases} \tag{4.4}$$

where the admissible class $\mathcal{A}$ has been replaced by

$$\mathcal{U} = \Big\{ \pi(t) : [0, T] \times \Omega \to \mathbb{R}^q \;\Big|\; \pi(t) \text{ is } \mathcal{F}\text{-predictable and } \int_0^T |\pi(t)|^2\,dt < \infty \Big\}. \tag{4.5}$$

15 Although (4.3) differs from the usual definition of regret/relative regret, namely $\min_\pi \max_{H(t)} E[U(y^H(T)) - U(x_\pi^H(T))]$ or $\min_\pi \max_{H(t)} E[U(y^H(T))/U(x_\pi^H(T))]$ with $U^B(x) = U(x)$, it is of the same spirit because it is a relative measure. Clearly all three objectives are equivalent when $U(x) = U^B(x) = \log(x)$. Only (4.3) is studied in this paper.


(Observe that this admissible class is exactly the same as the one associated with the benchmark investor's problem (4.1).) This problem can be analyzed using dynamic programming and the optimal policy for arbitrary utility functions $U(z)$ and $U^B(y)$ is generally Markov in the state $(x_\pi^H(t^-), y^H(t^-))$, and hence not $\mathcal{G}$-predictable (as we suspected). In certain cases, however, we are lucky. In particular, the choice of log or power utility for either $U(z)$ or $U^B(y)$ results in an optimal policy $\pi(t)$ which is Markov in $x_\pi^H(t^-)$ and hence is predictable with respect to the asset price filtration $\mathcal{G}$. For this reason, we shall adopt (4.4) for the rest of the paper (although we take care in all cases to show that the policy we obtain is $\mathcal{G}$-predictable).

5. JUMP-DIFFUSIONS AND A LOG-OPTIMAL BENCHMARK

5.1. Formulation

We consider in this section the special case of (4.1) when the benchmark process $y^H(t)$ associated with the model $H(t) = (b(t), \sigma(t), \lambda(t))$ is chosen to maximize log utility $U^B(y) = \log(y)$. It can be shown that $y^H(t)$ is the solution of the following stochastic differential equation:

$$\begin{cases} dy^H(t) = y^H(t^-)\big\{ [r + b(t)^\top \psi(t)]\,dt + \psi(t)^\top \sigma(t)\,dW(t) + \psi(t)^\top \theta(t)\,dN(t) \big\}, \\ y^H(0) = x_0, \\ \text{where: } b(t) - \sigma(t)\sigma(t)^\top \psi(t) + \displaystyle\sum_{i=1}^{n} \lambda_i(t)\, \frac{\theta_i}{1 + \psi(t)^\top \theta_i} = 0. \end{cases} \tag{5.1}$$

The implicit equation for $\psi(t)$ determines a nonlinear relationship between the model $H(t) = (b(t), \sigma(t), \lambda(t))$ and $\psi(t)$. Observe that $\psi(t)$ satisfies the inequality $1 + \psi(t)^\top \theta_i > 0$ for every $(b(t), \sigma(t), \lambda(t))$ and $i = 1, 2, \ldots, n$ so that $y^H(t) > 0$ for every $t \in [0, T]$ $P$-a.s.$^{16}$

As a first step towards solving the problem (4.4), let us derive the dynamics for the benchmarked process $z(t) \triangleq x_\pi^H(t)/y^H(t)$. Observing that

$$d\left( \frac{1}{y^H(t)} \right) = -\frac{1}{y^H(t^-)} \left\{ \big[ r + b(t)^\top \psi(t) - \psi(t)^\top \sigma(t)\sigma(t)^\top \psi(t) \big]\,dt + \psi(t)^\top \sigma(t)\,dW(t) + \sum_{i=1}^{n} \frac{\psi(t)^\top \theta_i}{1 + \psi(t)^\top \theta_i}\,dN_i(t) \right\},$$

another application of Itô's formula (and a bit of algebra) gives

$$d\left( \frac{x^H(t)}{y^H(t)} \right) = \frac{1}{y^H(t^-)} \left\{ [\pi(t) - x(t^-)\psi(t)]^\top \big[ b(t) - \sigma(t)\sigma(t)^\top \psi(t) \big]\,dt + [\pi(t) - x(t^-)\psi(t)]^\top \sigma(t)\,dW(t) + \sum_{i=1}^{n} [\pi(t) - x(t^-)\psi(t)]^\top \frac{\theta_i}{1 + \psi(t)^\top \theta_i}\,dN_i(t) \right\}.$$

16 If $1 + \psi(t)^\top \theta_i \le 0$ for some $i$ on a nonzero measure subset of $[0, T]$ with positive probability, then $y^H(T) \le 0$ with positive probability. This is clearly nonoptimal when $U^B(y)$ is logarithmic.


Observing that $\psi(t)$ is the solution of the implicit equation in (5.1) we obtain

$$dz(t) = \left[ \frac{\pi(t)}{y^H(t^-)} - z(t^-)\psi(t) \right]^\top \left\{ \sigma(t)\,dW(t) + \sum_{i=1}^{n} \frac{\theta_i(t)}{1 + \psi(t)^\top \theta_i(t)}\,dM_i(t) \right\}.$$

Defining

$$u(t) \triangleq \frac{\pi(t)}{y^H(t^-)},$$

which may be interpreted as the amount of money, in benchmarked dollars, invested in each stock, we obtain the dynamics of the normalized process $z(t)$. Observe that $u(t) \in \mathcal{U}$ if and only if $\pi(t) \in \mathcal{U}$.$^{17}$ It now follows that the problem (4.4) with a log-optimal benchmark is equivalent to the two-player zero-sum stochastic differential game

$$\begin{cases} \max_{u(t) \in \mathcal{U}} \min_{H(t) \in \mathcal{N}} E[U(z(T))] \\ \text{subject to:} \\ dz(t) = [u(t) - z(t^-)\psi(t)]^\top \left\{ \sigma(t)\,dW(t) + \displaystyle\sum_{i=1}^{n} \frac{\theta_i(t)}{1 + \psi(t)^\top \theta_i(t)}\,dM_i(t) \right\}, \\ z(0) = 1, \\ \text{where: } b(t) - \sigma(t)\sigma(t)^\top \psi(t) + \displaystyle\sum_{i=1}^{n} \lambda_i(t)\, \frac{\theta_i}{1 + \psi^\top \theta_i} = 0, \end{cases} \tag{5.2}$$

where $U(z)$ is any increasing strictly concave utility function. Define the value function

$$V(s, z) \triangleq \max_{u(t) \in \mathcal{U}} \min_{H(t) \in \mathcal{N}} E[U(z(T)) \mid z(s) = z,\ u(t),\ H(t)],$$

which is the solution of the dynamic programming equation

$$\begin{cases} \max_{u} \min_{H \in \mathcal{H}} \Bigg\{ V_t(t, z) + \dfrac{1}{2} V_{zz}(t, z)\,(u - z\psi(H))^\top \sigma\sigma^\top (u - z\psi(H)) \\ \qquad + \displaystyle\sum_{i=1}^{n} \lambda_i \left[ V\!\left(t,\ z + (u - z\psi(H))^\top \frac{\theta_i}{1 + \psi(H)^\top \theta_i} \right) - V(t, z) - V_z(t, z)\,(u - z\psi(H))^\top \frac{\theta_i}{1 + \psi(H)^\top \theta_i} \right] \Bigg\} = 0, \\ V(T, z) = U(z), \end{cases} \tag{5.3}$$

where $\psi(H) = \psi(\kappa, \sigma, \lambda)$ is the solution of the equation

$$b - \sigma\sigma^\top \psi + \sum_{i=1}^{n} \lambda_i\, \frac{\theta_i}{1 + \psi^\top \theta_i} = 0.$$

17 This follows from the observation that a.e. realization of $y(t)$ is bounded and bounded away from 0; that is, there are constants $\delta(\omega) > 0$ and $K(\omega) < \infty$ (which may depend on $\omega$) such that $\delta(\omega) \le y(t) \le K(\omega)$ for every $t \in [0, T]$.
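The implicit equation defining $\psi(H)$ has no closed form when jumps are present, but for a single stock it is easy to solve numerically. The sketch below is a minimal illustration under assumptions: one stock with scalar variance $\sigma^2 > 0$, jump sizes $\theta_i \neq 0$ with intensities $\lambda_i > 0$, made-up parameter values, and hypothetical function names. On the admissible interval where $1 + \psi\theta_i > 0$ for all $i$, the left-hand side is strictly decreasing in $\psi$, so bisection finds the root.

```python
import numpy as np

def foc(psi, b, sigma2, lam, theta):
    """Left-hand side b - sigma^2 psi + sum_i lam_i theta_i / (1 + psi theta_i)."""
    return b - sigma2 * psi + np.sum(lam * theta / (1.0 + psi * theta))

def log_optimal_fraction(b, sigma2, lam, theta, tol=1e-12, max_iter=200):
    """Bisection for the root of foc on the interval where 1 + psi*theta_i > 0."""
    lam, theta = np.asarray(lam, float), np.asarray(theta, float)
    # Admissible interval: psi > -1/theta_i if theta_i > 0, psi < -1/theta_i if theta_i < 0.
    lo = max([-1.0 / th for th in theta if th > 0], default=-1e6)
    hi = min([-1.0 / th for th in theta if th < 0], default=1e6)
    lo, hi = lo + 1e-9 * (1 + abs(lo)), hi - 1e-9 * (1 + abs(hi))
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if foc(mid, b, sigma2, lam, theta) > 0:   # foc is strictly decreasing in psi
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative (made-up) parameters: one downward jump of 20% with intensity 0.5.
lam, theta = np.array([0.5]), np.array([-0.2])
psi_jump = log_optimal_fraction(0.06, 0.04, lam, theta)
psi_diffusion = log_optimal_fraction(0.06, 0.04, [], [])   # reduces to b / sigma^2
print(psi_jump, psi_diffusion)
```

With no jumps the routine recovers $\psi = b/\sigma^2$. With the downward jump above, the computed $\psi$ is smaller than the diffusion-only value for the same $b$ and $\sigma^2$.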


The value function has the following structural properties. The proof can be found in the Appendix. PROPOSITION 5.1. Suppose that U(z) is strictly concave and increasing. Then V(s, z) is strictly concave and increasing in z for every s ∈ [0, T].

5.2. Optimal Robust Portfolio: Power Utility

The robust portfolio selection problem (5.2) has an explicit solution when $U(z)$ is a power utility function.

PROPOSITION 5.2. Suppose that $U(z) = z^\gamma/\gamma$. The value function is

$$V(t, z) = \frac{1}{\gamma}\, z^\gamma \exp\Big\{ -(T - t) \min_{q} \max_{H \in \mathcal{H}} G_\gamma(q, H) \Big\},$$

where

$$G_\gamma(q, H) \triangleq \frac{1}{2}(1 - \gamma)(q - \psi(H))^\top \sigma\sigma^\top (q - \psi(H)) + \sum_{i=1}^{n} \lambda_i \left\{ \frac{1}{\gamma} + \frac{(q - \psi(H))^\top \theta_i}{1 + \psi(H)^\top \theta_i} - \frac{1}{\gamma} \left( 1 + \frac{(q - \psi(H))^\top \theta_i}{1 + \psi(H)^\top \theta_i} \right)^{\gamma} \right\}.$$

The optimal robust policy is given by $u^*(t, z(t^-)) = q^* z(t^-)$, or equivalently, $\pi^*(t, x(t^-)) = x(t^-)\, q^*$, where $q^*$, the robust optimal proportion of wealth to be invested into each stock, and the associated response by nature $H^*$, are given by

$$(q^*, H^*) = \arg\min_{q} \max_{H} G_\gamma(q, H). \tag{5.4}$$

In particular, $\pi^*(t)$ belongs to the set $\mathcal{A}$.

It follows from Proposition 5.2 that the static problem (5.4) needs to be solved to determine the optimal robust portfolio $\pi^*$ (or $q^*$). In this regard, (5.4) plays the analogous role for the relative performance problem (5.2) that the local robust mean–variance problem (3.3) plays for (3.11). In addition, it is clear that $\pi^*(t) = \pi^*(t, x(t^-)) = x(t^-) q^* \in \mathcal{A}$. (In particular, it is the product of a constant $q^*$ and a process $x(t^-)$ that is predictable with respect to the price filtration $\mathcal{G}$.)

Another important observation is that the static problem (5.4) is convex in $q$. Convexity in the case of (5.4) follows from convexity of $G_\gamma(q, H)$ in $q$, for every fixed $H$, which implies in turn that

$$f(q) = \max_{H \in \mathcal{H}} G_\gamma(q, H)$$

(a maximum of convex functions) is convex in $q$. It is also interesting that $G_\gamma(q, H) \ge 0$ for every $(q, H)$ with equality if and only if $q = \psi(H)$. With this in mind, define

$$d_H(q \mid \psi) = \frac{1}{2}(1 - \gamma)(q - \psi)^\top Q (q - \psi) + \sum_{i=1}^{n} \lambda_i \left\{ \frac{1}{\gamma} + \frac{(q - \psi)^\top \theta_i}{1 + \psi^\top \theta_i} - \frac{1}{\gamma} \left( 1 + \frac{(q - \psi)^\top \theta_i}{1 + \psi^\top \theta_i} \right)^{\gamma} \right\}.$$


Because $d_H(q \mid \psi) \ge 0$ for every given $H$ and $\psi$ satisfying $1 + \psi^\top \theta_i > 0$ but varying over $q$, it may be regarded as a measure of the distance of $q$ from $\psi$ (although it is not a metric because it is not symmetric: $d_H(q \mid \psi) \neq d_H(\psi \mid q)$). Hence we can write $G_\gamma(q, H) = d_H(q \mid \psi(H))$ and the static problem (5.4) can be written as

$$\min_{q} \max_{H \in \mathcal{H}} G_\gamma(q, H) = \min_{q} \max_{H} d_H(q \mid \psi(H)). \tag{5.5}$$

This says that the relative performance robust investor chooses the portfolio $q^*$ which is as "close as possible" to every alternative decision $\psi(H)$ that is made by a perfectly informed investor under all possible market models $H \in \mathcal{H}$. This problem is a generalization of the so-called worst-case minimum norm problem (see El Ghaoui and Lebret 1997). A key difference between (5.5) and classical worst-case minimum norm problems is that the metric is part of nature's choice when $Q$ is uncertain. When $Q$ is known and there are no jumps (see the next section), we obtain the classical worst-case minimum norm problem and (5.5) can be solved efficiently for certain choices of $\mathcal{H}$.

This observation is significant because the problem (5.4) (or (5.5)) for the relative performance problem (5.2) is the counterpart of the (local) robust Markowitz problem for the standard dynamic worst-case problem (3.11). In particular, if one agrees (at least normatively) that "benchmarking" (4.4) or (5.2) is preferred to worst-case expected utility (3.11), then one must agree that (5.4) is preferred to (3.3) or (3.4) as a formulation of the single-period robust portfolio selection problem. Related ideas are presented in Sections 7.1 and 7.2 where other benchmark processes and objectives will be discussed.
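The static problem (5.5) can be explored numerically for a single stock and a finite list of candidate models. The sketch below is illustrative only: the two models, the value $\gamma = 0.5$, the grid, and the function names are assumptions. Each model is summarized by the tuple $(\psi, Q, \lambda, \theta)$, with $\psi$ supplied directly rather than solved from the implicit equation in (5.1). This is equivalent to choosing $b$, since $d_H$ depends on the model only through these quantities. The grid for $q$ is restricted so that $1 + q\theta_i > 0$.

```python
import numpy as np

def d_H(q, psi, Q, lam, theta, gamma):
    """Scalar d_H(q | psi): diffusion term plus the jump terms, for one stock."""
    lam, theta = np.asarray(lam, float), np.asarray(theta, float)
    a = (q - psi) * theta / (1.0 + psi * theta)     # relative jump size of z
    jump = np.sum(lam * (1.0 / gamma + a - (1.0 + a) ** gamma / gamma))
    return 0.5 * (1.0 - gamma) * Q * (q - psi) ** 2 + jump

def robust_fraction(models, gamma, q_grid):
    """Grid search for min_q max_H d_H(q | psi(H)) over a finite list of models."""
    worst = [max(d_H(q, *m, gamma) for m in models) for q in q_grid]
    k = int(np.argmin(worst))
    return q_grid[k], worst[k]

# Illustrative (made-up) models: (psi, Q, lam, theta).
m1 = (0.5, 0.04, [0.5], [-0.2])
m2 = (2.0, 0.04, [0.5], [-0.2])
gamma = 0.5
q_star, value = robust_fraction([m1, m2], gamma, np.linspace(0.0, 3.0, 3001))
print(q_star, value)
```

The minimizer lies strictly between the two benchmark decisions $\psi(H)$, reflecting the "as close as possible to every alternative decision" interpretation.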

6. LOG OPTIMAL BENCHMARK: DIFFUSION PROCESSES We turn now to the special case when the price process is a diffusion for which there is uncertainty concerning the value of the drift and diffusion (b(t), σ (t)) (with λ(t) = 0 and θ = 0). The focus of this section is on the structure of the optimal robust portfolio as well as its connection to Bayesian problems which we establish using convex duality. The key insights and results from this section generalize to the jump-diffusion problem discussed in the previous section (at the cost of longer equations).

6.1. Formulation

In the case of an ambiguous diffusion process, the optimal response by nature (5.1) has a closed form expression

$$\psi(H) = \psi(b, \sigma) = (\sigma\sigma^\top)^{-1} b \tag{6.1}$$

and the benchmark process $y^H(t)$ simplifies to

$$dy^H(t) = y^H(t^-)\big\{ [r + b(t)^\top (\sigma(t)\sigma(t)^\top)^{-1} b(t)]\,dt + b(t)^\top (\sigma(t)\sigma(t)^\top)^{-1} \sigma(t)\,dW(t) \big\}, \qquad y^H(0) = x_0.$$


The "benchmarking" problem becomes

$$\begin{cases} \max_{u(t) \in \mathcal{U}} \min_{H(t) \in \mathcal{H}} E[U(z(T))] \\ \text{subject to:} \\ dz(t) = [u(t) - z(t^-)(\sigma(t)\sigma(t)^\top)^{-1} b(t)]^\top \sigma(t)\,dW(t), \\ z(0) = 1, \end{cases} \tag{6.2}$$

for which the associated dynamic programming equation is

$$\begin{cases} \max_{u} \min_{H \in \mathcal{H}} \Big\{ V_t(t, z) + \tfrac{1}{2} V_{zz}(t, z)\,[u - z(\sigma\sigma^\top)^{-1} b]^\top \sigma\sigma^\top [u - z(\sigma\sigma^\top)^{-1} b] \Big\} = 0, \\ V(T, z) = U(z). \end{cases} \tag{6.3}$$

We have the following representation of the value function.

PROPOSITION 6.1.

$$V(t, z) = E[U(z^*(T))] \tag{6.4}$$

is the solution of (6.3) and the value function of (6.2), where

$$dz^*(t) = z^*(t)\,[q^* - (\sigma^*\sigma^{*\top})^{-1} b^*]^\top \sigma^*\,dW(t), \qquad z^*(0) = 1,$$

and

$$(q^*, H^*) = (q^*, (b^*, \sigma^*)) = \arg\min_{q} \max_{H = (b, \sigma) \in \mathcal{H}} [q - (\sigma\sigma^\top)^{-1} b]^\top \sigma\sigma^\top [q - (\sigma\sigma^\top)^{-1} b]. \tag{6.5}$$

Moreover, $\pi^*(t, x(t)) = q^* x(t)$ is the optimal robust portfolio and $H^* = (b^*, \sigma^*)$ is the optimal response by nature. $q^*$ is the robust optimal proportion of wealth to be allocated to each stock. $\pi^*(t)$ belongs to $\mathcal{A}$.

Observe that the optimal portfolio and response by nature, as characterized by (6.5), are independent of the utility function $U(z)$. This is an anomaly unique to the case when benchmark investors maximize log-utility and the price processes are diffusions, and does not occur if there are jumps or if other benchmark processes are used; see (5.4) and also Section 7.1.

It is important to note that unlike the mean–variance problem (see (3.4) and Proposition 3.1), the order of min and max in (6.5) cannot be exchanged, even if $\mathcal{H}$ is convex. Indeed exchanging the order gives

$$0 = \max_{(\sigma\sigma^\top, b) \in \mathcal{H}} \min_{q}\ [q - (\sigma\sigma^\top)^{-1} b]^\top \sigma\sigma^\top [q - (\sigma\sigma^\top)^{-1} b] \le \min_{q} \max_{(\sigma\sigma^\top, b) \in \mathcal{H}} [q - (\sigma\sigma^\top)^{-1} b]^\top \sigma\sigma^\top [q - (\sigma\sigma^\top)^{-1} b],$$

where the inequality is strict unless $\mathcal{H}$ is a singleton (i.e., there is no model ambiguity). This happens because the function

$$f(q, (Q, b)) \triangleq [q - Q^{-1} b]^\top Q\,[q - Q^{-1} b]$$


is jointly convex in (Q, b) when joint concavity is the property required for the min– max theorem to hold.18 The key point is that benchmarking results in nonconvexities in nature’s problem which leads to important differences between the solution of the usual worst-case model (3.4) and benchmarking (6.5). It also means that the methods used to obtain Proposition 3.1 can no longer be used directly to analyze (6.5). We introduce an alternative approach in the next section and show that (6.5) is dual to another single-period asset allocation problem with a nondegenerate prior, where the nondegeneracy is a consequence of the nonconvexities introduced by benchmarking (particularly, the joint convexity in (Q, b)). Of particular interest is the way in which the nondegenerate prior is deﬁned in terms of the worst and best case models (see Propositions 3.1 and 3.4). Observe that similar interpretations for the general case (5.3) can also be obtained by observing that V(s, z) is concave in z.
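The failure of the min–max exchange in (6.5) is easy to see in a one-stock example with two candidate models. The sketch below is illustrative only: the two $(Q, b)$ pairs and the grid are made up. Each model has its own log-optimal fraction $b/Q$ at which $f$ vanishes, so the max–min value is zero. The min–max value is strictly positive and is attained where the two quadratic curves cross.

```python
import numpy as np

def f(q, Q, b):
    """Scalar version of [q - b/Q]' Q [q - b/Q]."""
    return Q * (q - b / Q) ** 2

models = [(0.04, 0.02), (0.01, 0.03)]          # hypothetical (Q, b) pairs
q_grid = np.linspace(0.0, 4.0, 40001)

# min over q of the worst case over models
worst = np.max([f(q_grid, Q, b) for Q, b in models], axis=0)
k = int(np.argmin(worst))
q_star, min_max = q_grid[k], worst[k]

# max over models of the best case over q (attained at q = b/Q, where f = 0)
max_min = max(np.min(f(q_grid, Q, b)) for Q, b in models)
print(q_star, min_max, max_min)
```

For these inputs the benchmark fractions are 0.5 and 3, and solving $0.04(q - 0.5)^2 = 0.01(q - 3)^2$ on the interval between them gives $q^* = 4/3$ with value $1/36$, which the grid search reproduces to within the grid resolution.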

6.2. Duality and Bayesian Problems

Assume once again that the uncertainty set $\mathcal{H}$ is compact (though not necessarily convex). The min–max problem (6.5) is equivalent to

$$
p^* \triangleq \min_{q,\,\eta}\ \eta \quad \text{subject to} \quad [q - Q^{-1} b]^\top Q\, [q - Q^{-1} b] \le \eta, \quad \forall\, (Q, b) \in \mathcal{H}, \tag{6.6}
$$

where we have replaced $\sigma\sigma^\top$ by $Q$. Observe that this problem is convex in $(q, \eta)$, though the number of constraints (parameterized by $(Q, b) \in \mathcal{H}$) is generally infinite. We analyze this problem using convex duality. Let $C(\mathcal{H})$ denote the space of real-valued continuous functionals on $\mathcal{H}$ with sup-norm

$$
\|g\| \triangleq \sup_{(Q, b)\in\mathcal{H}} |g(Q, b)|, \qquad \forall\, g \in C(\mathcal{H}).
$$

The linear space $C(\mathcal{H})$ with this norm is a Banach space (Dunford and Schwartz 1988). Let $P \triangleq \{g \in C(\mathcal{H}) \mid g(Q, b) \ge 0,\ \forall\, (Q, b) \in \mathcal{H}\}$ define the positive cone in $C(\mathcal{H})$. It is easy to see that $P$ has nonempty interior (which is needed for the strong duality results shown later). We say that $f \ge g$ for $f, g \in C(\mathcal{H})$ if $f - g \in P$, and $g \le 0$ if $-g \in P$. We write $g > 0$ if $g(Q, b) > 0$ for every $(Q, b) \in \mathcal{H}$. The relevance of this formalism is that the function $G(q, \eta) = (q - Q^{-1} b)^\top Q (q - Q^{-1} b) - \eta$ (for $(q, \eta)$ fixed) is an element of the space $C(\mathcal{H})$, and the constraint in (6.6) may be regarded as one of the form $G(q, \eta) \le 0$ (or $-G(q, \eta) \in P$), where $G : \mathbb{R}^q \times \mathbb{R} \to C(\mathcal{H})$ associates every choice of $(q, \eta)$ to the continuous function $\{(q - Q^{-1} b)^\top Q (q - Q^{-1} b) - \eta \mid (Q, b) \in \mathcal{H}\}$ on $\mathcal{H}$. Therefore, (6.6) can be equivalently written as

$$
p^* \triangleq \min_{q,\,\eta}\ \eta \quad \text{subject to} \quad G(q, \eta) \le 0. \tag{6.7}
$$

18 To see joint convexity in $(Q, b)$, write $f(q, (Q, b)) = q^\top Q q - 2 q^\top b + b^\top Q^{-1} b$. Observing now that $q^\top Q q - 2 q^\top b$ is jointly convex in $(Q, b)$, and that $b^\top Q^{-1} b = \max_{\pi} \{2 b^\top \pi - \pi^\top Q \pi\}$ is the maximum over $\pi$ of a family of $\pi$-parameterized linear functions of $(Q, b)$ and hence jointly convex in $(Q, b)$, it follows that $f(q, (Q, b))$ is jointly convex in $(Q, b)$ as claimed.

Observe that $p^*$ in (6.7) (or equivalently (6.6)) is achieved uniquely by some $(q^*, \eta^*)$. Further definitions are required to determine the dual problem associated with (6.7). The dual (or conjugate) space $C^*(\mathcal{H})$ is (isomorphic to) the set of measures defined on the Borel sets $\mathcal{B}(\mathcal{H})$ of $\mathcal{H}$ with bounded total variation,

$$
C^*(\mathcal{H}) = \Big\{ \mu \ \Big|\ \int_{\mathcal{H}} |\mu(d(Q, b))| < \infty \Big\};
$$

see, for example, section IV.6.3 in Dunford and Schwartz (1988). Observe that elements of $C^*(\mathcal{H})$ are signed measures. The dual cone of $P$ is defined by $P^* \triangleq \{\mu \in C^*(\mathcal{H}) \mid \int_{\mathcal{H}} f\, d\mu \ge 0,\ \forall\, f \in P\}$ (see Luenberger 1968) and is equal to the subset of $C^*(\mathcal{H})$ consisting of positive measures, $P^* = \{\mu \in C^*(\mathcal{H}) \mid \mu(A) \ge 0,\ \forall\, A \in \mathcal{B}(\mathcal{H})\}$. We write $\mu \ge 0$ when $\mu \in P^*$. We now follow a fairly standard duality argument to analyze (6.7); see, for example, Luenberger (1968). Lagrange multipliers for the constraint $G(q, \eta) \le 0$ in (6.7) are elements of $P^*$, and the Lagrangian associated with (6.7) is

$$
L(q, \eta, \mu) = \eta - \int_{\mathcal{H}} \big\{ \eta - [q - Q^{-1} b]^\top Q\, [q - Q^{-1} b] \big\}\, \mu(d(Q, b)).
$$

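For every $\mu \ge 0$ (i.e., $\mu \in P^*$) define the dual function:

$$
\begin{aligned}
\psi(\mu) &\triangleq \min_{q,\,\eta} L(q, \eta, \mu) \\
&= \min_{(q,\,\eta)} \Big\{ [1 - \mu(\mathcal{H})]\,\eta + \int_{\mathcal{H}} [q - Q^{-1} b]^\top Q\, [q - Q^{-1} b]\, \mu(d(Q, b)) \Big\} \\
&= \begin{cases}
\displaystyle \min_{q} \int_{\mathcal{H}} [q - Q^{-1} b]^\top Q\, [q - Q^{-1} b]\, \mu(d(Q, b)), & \mu(\mathcal{H}) = 1, \\[2mm]
-\infty, & \mu(\mathcal{H}) \ne 1.
\end{cases}
\end{aligned}
$$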

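It follows that

$$
\psi(\mu) = \begin{cases}
\displaystyle \int_{\mathcal{H}} b^\top Q^{-1} b\, \mu(d(Q, b)) - \Big[\int_{\mathcal{H}} b\, \mu(d(Q, b))\Big]^\top \Big[\int_{\mathcal{H}} Q\, \mu(d(Q, b))\Big]^{-1} \Big[\int_{\mathcal{H}} b\, \mu(d(Q, b))\Big], & \mu(\mathcal{H}) = 1, \\[2mm]
-\infty, & \text{otherwise},
\end{cases}
$$

where the minimizing $q$ in the definition of $\psi(\mu)$ is

$$
q_\mu = \Big[\int_{\mathcal{H}} Q\, \mu(d(Q, b))\Big]^{-1} \int_{\mathcal{H}} b\, \mu(d(Q, b)) = [E^\mu(Q)]^{-1} E^\mu(b).
$$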
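The step from the definition of $\psi(\mu)$ to this closed form is a completion of the square. A short sketch follows, using the shorthand $\bar{Q} = \int_{\mathcal{H}} Q\, d\mu$ and $\bar{b} = \int_{\mathcal{H}} b\, d\mu$ (names introduced here only for the derivation) and assuming $\mu(\mathcal{H}) = 1$ with $\bar{Q}$ invertible:

```latex
\begin{align*}
\int_{\mathcal{H}} [q - Q^{-1}b]^\top Q\,[q - Q^{-1}b]\, d\mu
  &= q^\top \bar{Q}\, q - 2\, q^\top \bar{b} + \int_{\mathcal{H}} b^\top Q^{-1} b \, d\mu, \\
\nabla_q \big( q^\top \bar{Q}\, q - 2\, q^\top \bar{b} \big) = 0
  &\iff q_\mu = \bar{Q}^{-1} \bar{b}, \\
q_\mu^\top \bar{Q}\, q_\mu - 2\, q_\mu^\top \bar{b}
  &= -\,\bar{b}^\top \bar{Q}^{-1} \bar{b}.
\end{align*}
```

Substituting the last line into the first gives $\psi(\mu) = \int_{\mathcal{H}} b^\top Q^{-1} b\, d\mu - \bar{b}^\top \bar{Q}^{-1} \bar{b}$.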

Observe that the dual variable $\mu$ is a positive measure (i.e., $\mu \in P^*$) that satisfies $\mu(\mathcal{H}) = 1$ and hence is a probability measure on $\mathcal{H}$. In recognition of this, we will adopt the convention

$$
E^\mu(Q) \triangleq \int_{\mathcal{H}} Q\, \mu(d(Q, b))
$$

(and similarly for $E^\mu(b)$) when $\mu(\mathcal{H}) = 1$, and $\psi(\mu)$ can then be written in the following equivalent form:

$$
\psi(\mu) = \begin{cases}
E^\mu[b^\top Q^{-1} b] - E^\mu(b)^\top [E^\mu(Q)]^{-1} E^\mu(b), & \mu(\mathcal{H}) = 1, \\
-\infty, & \text{otherwise}.
\end{cases}
$$

The dual function $\psi(\mu)$ is a lower bound to the optimal cost $p^*$ of the optimization problem (6.5) (or equivalently, (6.6)) for every feasible dual variable $\mu \in P^*$, and the associated dual problem to (6.6) is

$$
\begin{aligned}
\max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} \psi(\mu)
&= \max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} \Big\{ E^\mu[b^\top Q^{-1} b] + \min_{q} E^\mu[q^\top Q q - 2 b^\top q] \Big\} \\
&= \max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} \Big\{ E^\mu[b^\top Q^{-1} b] - E^\mu(b)^\top [E^\mu(Q)]^{-1} E^\mu(b) \Big\}.
\end{aligned} \tag{6.8}
$$

The following result shows that strong duality holds, and that the maximum in (6.8) is achieved (thus justifying our use of "max" instead of "sup").

PROPOSITION 6.2. The dual function $\psi(\mu)$ is concave in $\mu$, the dual problem achieves its optimal solution $\mu^*$, and strong duality holds:

$$
\min_{q}\ \max_{(Q, b)\in\mathcal{H}} [q - Q^{-1} b]^\top Q\, [q - Q^{-1} b] = \max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} \psi(\mu). \tag{6.9}
$$

The optimal robust portfolio is $\pi^*(t, x(t)) = q^* x(t)$, where

$$
q^* = [E^{\mu^*}(Q)]^{-1} E^{\mu^*}(b) \tag{6.10}
$$

and

$$
\mu^* \triangleq \arg\max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} \psi(\mu).
$$

$\pi^*(t, x(t))$ belongs to the admissible class $\mathcal{A}$.

Proof. Clearly the cost function in (6.6) (or equivalently (6.7)) is convex, and the constraint defined by the functional $G : \mathbb{R}^q \times \mathbb{R} \to C(\mathcal{H})$,

$$
G(q, \eta) = [q - Q^{-1} b]^\top Q\, [q - Q^{-1} b] - \eta, \qquad (Q, b) \in \mathcal{H},
$$

is convex in $(q, \eta)$. In addition, $G(q, \eta) < 0$ is satisfied by choosing $\eta$ sufficiently large, and $p^*$ is finite because (6.5) and (6.6) (or (6.7)) are equivalent. It follows from Theorem 1 on p. 224 in Luenberger (1968) that

$$
p^* \triangleq \min_{q,\,\eta} \{\eta \ \text{subject to}\ G(q, \eta) \le 0\} = \max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} \psi(\mu),
$$

which is precisely (6.9), and that the optimal solution for the dual problem is achieved by some $\mu^*$. Finally, because the optimal solution $(q^*, \eta^*)$ of (6.6) or (6.7) is achieved (which follows from the observation that (6.5) achieves its optimal solution $q^*$ and has finite optimal cost), it follows again from Theorem 1 on p. 224 in Luenberger (1968) that $q^*$ is given by (6.10). Clearly $\pi^*(t, x(t)) = q^* x(t)$ is admissible.

Observe once again that the dual variable $\mu$ is a probability measure over the alternative models, and the optimal robust policy (6.10) corresponds to the optimal policy for the following utility maximization problem:

$$
\begin{cases}
\displaystyle \max_{\pi}\ E[\log(x(T))] \\
\text{subject to:} \\
dx(t) = [r\, x(t) + \pi(t)^\top b^*]\, dt + \pi(t)^\top \sigma^*\, dW(t), \\
x(0) = x_0,
\end{cases} \tag{6.11}
$$

where the representative model $(b^*, \sigma^*)$ is determined from $\mu^*$ via

$$
b^* = E^{\mu^*}(b), \qquad \sigma^* \sigma^{*\top} = E^{\mu^*}(Q).
$$

The structure of the optimal robust portfolio for (6.2) is similar to that of the completely pessimistic (3.11) and the totally optimistic (3.19) investors in that the optimal policy coincides with the optimal choice under an appropriately chosen representative model (Q∗ , b∗ ). What is interesting is that in solving (6.8) a representative model which balances between the completely pessimistic (Proposition 3.1) and totally optimistic (Proposition 3.4) extremes is chosen. Alternatively, the investor’s problem in the ﬁrst equality of (6.8)

$$
\min_{q}\ E^\mu \{ q^\top Q q - 2 b^\top q \} \triangleq \min_{q} \int_{(Q, b)\in\mathcal{H}} \{ q^\top Q q - 2 b^\top q \}\, \mu(d(Q, b)) \tag{6.12}
$$

can be interpreted as a Bayesian version of the problem (3.4) with prior μ on the unknown parameter. The prior distribution that delivers the optimal robust portfolio is characterized by the solution of the dual problem (6.9) and is nondegenerate unless H is a singleton. The signiﬁcance of this observation is that it relates the benchmarking problem (and particularly (6.5)) with a Bayesian problem (6.12). This issue is explored further in Lim, Shanthikumar, and Vahn (2010) in the context of a dynamic model where learning is possible.
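Strong duality (6.9) and the Bayesian form (6.10) of the optimal weight can be checked numerically in the scalar case. The sketch below makes several assumptions of its own: a single risky asset, the box of (8.6) as the uncertainty set, priors restricted to the four corners of the box (enough for the primal by the joint convexity in footnote 18, and a lower bound for the dual), a ternary search for the primal, and a coarse grid over the probability simplex for the dual. All function names are hypothetical.

```python
import itertools

# Corners (Q, b) of the box 0.5 <= Q <= 0.7, 0.1 <= b <= 0.4 (assumed set).
VERTICES = [(0.5, 0.1), (0.5, 0.4), (0.7, 0.1), (0.7, 0.4)]

def worst_case_loss(q):
    """Inner maximum of (6.5) for a scalar weight q, taken over the corners."""
    return max(Q * (q - b / Q) ** 2 for Q, b in VERTICES)

def solve_primal(lo=0.0, hi=1.0, iters=200):
    """Ternary search for the minimizer of the convex worst-case loss."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if worst_case_loss(m1) < worst_case_loss(m2):
            hi = m2
        else:
            lo = m1
    q = 0.5 * (lo + hi)
    return q, worst_case_loss(q)

def moments(weights):
    """Return E[b], E[Q] and E[b^2 / Q] under a prior on the corners."""
    e_b = sum(w * b for w, (Q, b) in zip(weights, VERTICES))
    e_q = sum(w * Q for w, (Q, b) in zip(weights, VERTICES))
    e_bqb = sum(w * b * b / Q for w, (Q, b) in zip(weights, VERTICES))
    return e_b, e_q, e_bqb

def psi(weights):
    """Dual function of (6.8) for a probability vector on the corners."""
    e_b, e_q, e_bqb = moments(weights)
    return e_bqb - e_b ** 2 / e_q

def solve_dual(steps=50):
    """Grid search over the simplex with mesh 1 / steps."""
    best_val, best_w = float("-inf"), None
    for i, j, k in itertools.product(range(steps + 1), repeat=3):
        if i + j + k > steps:
            continue
        w = (i / steps, j / steps, k / steps, (steps - i - j - k) / steps)
        val = psi(w)
        if val > best_val:
            best_val, best_w = val, w
    return best_val, best_w

q_primal, primal_value = solve_primal()
dual_value, mu_star = solve_dual()
e_b, e_q, _ = moments(mu_star)
q_dual = e_b / e_q  # scalar version of (6.10)

print(f"primal {primal_value:.6f} at q = {q_primal:.4f}")
print(f"dual   {dual_value:.6f} with prior {mu_star}, q = {q_dual:.4f}")
```

In this sketch the maximizing prior spreads its mass over more than one corner, in line with the remark that the prior is nondegenerate unless the uncertainty set is a singleton. The agreement between the two values is only as tight as the simplex mesh allows.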

7. EXTENSIONS

We consider extensions of our robust asset allocation problem to "non-log optimal" benchmark wealth processes and objectives of the form $E[f(y(T))\, U(x(T)/y(T))]$ (where $f(y) = y^\kappa$, $0 < \kappa < 1$, is an increasing nonnegative function of $y$). One observation from the results of this section is that the general solution structure of the benchmarked problem does not change significantly in these situations.

7.1. Nonlogarithmic Benchmark Processes

One limitation of the results in Section 5 is that the benchmark process $y(t)$ corresponds to maximizing a log utility: $U^B(y) = \log(y)$. In this section, we consider benchmarks $y(t)$ when $U^B(y)$ is not necessarily logarithmic. In this context, an interesting question concerns the impact of the risk aversion of both the benchmark investor as well as the utility function $U(z)$ (which need not be the same) on the resulting solution of (5.2).

7.1.1 Formulation. In this section, we consider benchmark portfolios of the form

$$
\psi(t) = \frac{1}{1-\gamma}\, [\sigma(t)\sigma(t)^\top]^{-1} b(t)\, y(t), \qquad \gamma < 1, \tag{7.1}
$$

for which the corresponding wealth process is

$$
dy(t) = y(t) \Big\{ \Big[ r + \frac{1}{1-\gamma}\, b(t)^\top (\sigma(t)\sigma(t)^\top)^{-1} b(t) \Big] dt + \frac{1}{1-\gamma}\, b(t)^\top (\sigma(t)\sigma(t)^\top)^{-1} \sigma(t)\, dW(t) \Big\}. \tag{7.2}
$$

Clearly, the benchmark wealth process $y(t)$ is strictly positive, while the choice of $\gamma = 0$ gives the log-optimal benchmark portfolio (6.1). We shall focus on the problem

$$
\begin{cases}
\displaystyle \max_{\pi}\ \min_{H(t)}\ E\Big[ U\Big( \frac{x(T)}{y(T)} \Big) \Big] \\
\text{subject to:} \\
x(t) \text{ is the solution of (4.2) under } (\pi(t), H(t)), \\
y(t) \text{ is the solution of (7.2) under } H(t), \\
\pi(t) \in \mathcal{U},\ H(t) \in \mathcal{N}.
\end{cases} \tag{7.3}
$$

We can justify the benchmark portfolio (7.1) in a number of ways. First, (7.1) is the solution of a Merton problem with utility function $U^B(y) = \frac{1}{\gamma} y^\gamma$ when the parameters $(r(t), b(t), \sigma(t))$ are deterministic. It follows that $\gamma$ can be interpreted as the risk-aversion parameter for the benchmark investor. Alternatively, (7.1) is the optimal portfolio of a locally mean–variance optimal investor who chooses his/her target earnings to be a constant proportion of his/her current wealth. Specifically, if $y(0) = y$ is the investor's wealth at the beginning of a time slice of size $\delta$ and his/her portfolio is $\psi$, then the excess return over risk-free investment is

$$
R_\delta(\psi) \triangleq y(\delta) - y(0)(1 + r) = b^\top \psi\, \delta + \sqrt{\delta}\, \psi^\top \sigma Z.
$$

(7.1) is optimal for the mean–variance problem

$$
\begin{cases}
\displaystyle \min_{\psi}\ \mathrm{Var}[R_\delta(\psi)] = \delta\, \psi^\top \sigma\sigma^\top \psi \\
\text{subject to:} \\
E[R_\delta(\psi)] = \delta\, \psi^\top b \ge \delta\, v\, y,
\end{cases}
$$

when

$$
v = \frac{b(t)^\top (\sigma(t)\sigma(t)^\top)^{-1} b(t)}{1-\gamma} \ge 0.
$$
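The mean–variance claim can be verified with a one-line Lagrangian argument. This sketch assumes $Q = \sigma\sigma^\top$ is positive definite, $b \ne 0$ (so that $v > 0$ and the return constraint binds), and uses a multiplier $\lambda \ge 0$ that is introduced here for illustration only:

```latex
\begin{align*}
\mathcal{L}(\psi, \lambda) &= \delta\, \psi^\top Q\, \psi - \lambda\, \delta\, (\psi^\top b - v\, y), \\
\nabla_\psi \mathcal{L} = 0 &\iff \psi = \tfrac{\lambda}{2}\, Q^{-1} b, \\
\psi^\top b = v\, y &\iff \tfrac{\lambda}{2} = \frac{v\, y}{b^\top Q^{-1} b} = \frac{y}{1-\gamma}, \\
\psi &= \frac{1}{1-\gamma}\, Q^{-1} b\, y .
\end{align*}
```

The last line is (7.1) evaluated at the start of the time slice.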


If $z(t) = x(t)/y(t)$ defines the benchmarked wealth process and $u(t) = \pi(t)/y(t)$, then (7.3) is equivalent to

$$
\begin{cases}
\displaystyle \max_{u(t)}\ \min_{H}\ E[U(z(T))] \\
\text{subject to:} \\
\displaystyle dz(t) = -\frac{\gamma}{1-\gamma} \Big[ u(t) - z(t)\, \frac{(\sigma(t)\sigma(t)^\top)^{-1} b(t)}{1-\gamma} \Big]^\top b(t)\, dt
+ \Big[ u(t) - z(t)\, \frac{(\sigma(t)\sigma(t)^\top)^{-1} b(t)}{1-\gamma} \Big]^\top \sigma(t)\, dW(t), \\
z(0) = 1, \\
u(t) \in \mathcal{U},\ H(t) = (b(t), \sigma(t), \lambda(t)) \in \mathcal{N}.
\end{cases} \tag{7.4}
$$
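The dynamics of $z(t)$ in (7.4) follow from Itô's formula for a quotient. The sketch below is a hedged reconstruction under the assumption that wealth follows the pure-diffusion form $dx = (r x + \pi^\top b)\, dt + \pi^\top \sigma\, dW$ seen in (6.11); the shorthand $Q = \sigma\sigma^\top$ and $a = Q^{-1} b / (1-\gamma)$ is introduced here only for the derivation.

```latex
\begin{align*}
\frac{dy}{y} &= (r + a^\top b)\, dt + a^\top \sigma\, dW, \\
dz &= \frac{dx}{y} - z\, \frac{dy}{y} + z\, \frac{d\langle y \rangle}{y^2} - \frac{d\langle x, y \rangle}{y^2} \\
   &= \big[ (u - z a)^\top b - (u - z a)^\top Q\, a \big]\, dt + (u - z a)^\top \sigma\, dW \\
   &= -\frac{\gamma}{1-\gamma}\, (u - z a)^\top b\, dt + (u - z a)^\top \sigma\, dW,
\end{align*}
```

where the last step uses $Q a = b / (1-\gamma)$, so that $b - Q a = -\gamma\, b / (1-\gamma)$.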

It can be shown that the value function $V(t, z)$ associated with (7.4) is strictly concave and increasing in $z$ for every $t$ whenever $U(z)$ is strictly concave and increasing. When $U(z) = z^\eta/\eta$ ($\eta < 1$), dynamic programming techniques give the following characterization of the value function and optimal policy for (7.4).

PROPOSITION 7.1. When $U(z) = z^\eta/\eta$ ($\eta < 1$), the value function for (7.4) is

$$
V(t, z) = \frac{z^\eta}{\eta} \exp\Big\{ -(T - t)\, \frac{1-\eta}{2}\, \min_{q} \max_{H} G^{\gamma,\eta}(q, H) \Big\},
$$

where

$$
G^{\gamma,\eta}(q, H) = \Big[ q - \frac{1-\eta-\gamma}{1-\eta}\, \frac{(\sigma\sigma^\top)^{-1} b}{1-\gamma} \Big]^\top \sigma\sigma^\top \Big[ q - \frac{1-\eta-\gamma}{1-\eta}\, \frac{(\sigma\sigma^\top)^{-1} b}{1-\gamma} \Big]
- \Big[ \frac{\gamma}{(1-\eta)(1-\gamma)} \Big]^2 b^\top (\sigma\sigma^\top)^{-1} b. \tag{7.5}
$$

The optimal portfolio is $u^*(t, z(t)) = q^* z(t)$, or equivalently $\pi^*(t, x(t)) = q^* x(t)$, and $H^*$ is the optimal choice by nature, where

$$
(q^*, H^*) = \arg \min_{q} \max_{H \in \mathcal{H}} G^{\gamma,\eta}(q, H). \tag{7.6}
$$

The static problem (7.6) is convex in $q$. $\pi^*(t)$ belongs to $\mathcal{A}$.

The static optimization problem (7.6) is analogous to the local problems (5.4) and (6.5) associated with the log-optimal benchmark and, as in these cases, is a convex optimization problem in $q$. The following result gives a Bayesian characterization of $q^*$ and can be proven along the same lines as Proposition 6.2.

PROPOSITION 7.2. Suppose that $U(z) = z^\eta/\eta$ for $\eta < 1$ and $\gamma < 1$ in (7.4). Then

$$
\min_{q} \max_{H \in \mathcal{H}} G^{\gamma,\eta}(q, H) = \max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} \psi(\mu)
$$


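where

$$
\psi(\mu) \triangleq \Big[ \frac{1-\eta-\gamma}{(1-\eta)(1-\gamma)} \Big]^2 \Big[ E^\mu\big(b^\top (\sigma\sigma^\top)^{-1} b\big) - E^\mu(b)^\top [E^\mu(\sigma\sigma^\top)]^{-1} E^\mu(b) \Big]
- \Big[ \frac{\gamma}{(1-\eta)(1-\gamma)} \Big]^2 E^\mu\big[ b^\top (\sigma\sigma^\top)^{-1} b \big]
$$

is the dual function. The dual problem

$$
\mu^* \triangleq \arg \max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} \psi(\mu) \tag{7.7}
$$

is a convex optimization problem and the optimal solution $\mu^*$ is achieved. The optimal portfolio is

$$
q^* = \frac{1-\eta-\gamma}{(1-\eta)(1-\gamma)}\, [E^{\mu^*}(\sigma\sigma^\top)]^{-1} E^{\mu^*}(b). \tag{7.8}
$$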

The dual problem (7.7) is analogous to the dual problem (6.8) for the case of a log-optimal benchmark. The key difference between (7.7) and (6.8) is the additional term in the dual objective, which plays the role of shifting the balance between the "optimistic" and "pessimistic" terms $E^\mu(b^\top (\sigma\sigma^\top)^{-1} b)$ and $E^\mu(b)^\top [E^\mu(\sigma\sigma^\top)]^{-1} E^\mu(b)$. Observe too that the optimal policy $q^*$ given by (7.8) is also affected by $\gamma$, and that (7.7) and (7.8) reduce to the solution with a log-optimal benchmark when $\gamma = 0$.

7.1.2 Comparative Statics. We can now evaluate the impact of the risk aversion of the benchmark investor $\gamma$ and the risk aversion of the investor towards missing the benchmark $\eta$ on the resulting optimal portfolio $\pi^*(t)$ (or equivalently, $q^*(t)$). Consider first the impact of letting $\gamma \to -\infty$. It is clear from (7.1) that $\psi(t) \to 0$ as $\gamma \to -\infty$, which corresponds to the benchmark investor putting all his/her money in the money-market account. The resulting benchmark process is nothing but the money-market account $y(t) = x_0 \exp\{\int_0^t r(s)\, ds\}$ (which is clear from (7.2)), and it follows from (7.4) that $z(t) = x(t)/y(t)$ is nothing but the usual (money-market account) discounted wealth and satisfies

$$
dz(t) = b(t)^\top u(t)\, dt + u(t)^\top \sigma(t)\, dW(t), \qquad z(0) = 1.
$$

The dual objective is

$$
\psi(\mu) = -\frac{1}{(1-\eta)^2}\, [E^\mu(b)]^\top [E^\mu(\sigma\sigma^\top)]^{-1} [E^\mu(b)]
$$

and the dual problem becomes

$$
\mu^* = \arg \max_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} \psi(\mu) = \arg \min_{\mu \ge 0,\ \mu(\mathcal{H}) = 1} [E^\mu(b)]^\top [E^\mu(\sigma\sigma^\top)]^{-1} [E^\mu(b)].
$$


The resulting optimal portfolio is

$$
q^* = \frac{1}{1-\eta}\, [E^{\mu^*}(\sigma\sigma^\top)]^{-1} E^{\mu^*}(b).
$$

When $\mathcal{H}$ is convex, $\mu^*$ is a degenerate measure on $\{(b^*, \sigma^*\sigma^{*\top})\}$, where

$$
(b^*, \sigma^*\sigma^{*\top}) = \arg \min_{(b,\, \sigma\sigma^\top)\in\mathcal{H}} b^\top (\sigma\sigma^\top)^{-1} b.
$$

The resulting optimal portfolio is

$$
q^* = \frac{1}{1-\eta}\, (\sigma^*\sigma^{*\top})^{-1} b^*,
$$

which coincides exactly with the solution of the standard worst-case dynamic asset allocation problem (3.11) with a power utility $U(x) = \frac{1}{\eta} x^\eta$ of terminal wealth; see (3.17). The other interesting limiting case corresponds to letting $\eta \to -\infty$, which can be interpreted as the investor being highly averse to missing the benchmark. In this case, the dual objective becomes

$$
\psi(\mu) = \frac{1}{(1-\gamma)^2} \Big\{ E^\mu\big(b^\top (\sigma\sigma^\top)^{-1} b\big) - [E^\mu(b)]^\top [E^\mu(\sigma\sigma^\top)]^{-1} [E^\mu(b)] \Big\} \tag{7.9}
$$

and

$$
q^* = \frac{1}{1-\gamma}\, [E^{\mu^*}(\sigma\sigma^\top)]^{-1} E^{\mu^*}(b). \tag{7.10}
$$

The dual problem (7.9), its solution μ∗ , and the associated representative model (b∗ , σ ∗ ), are equivalent to the ones encountered when the benchmark investor maximizes a logutility (see, e.g., (6.8) and Proposition 6.2). Unlike the case of the log-optimal benchmark, however, the optimal portfolio q ∗ given by (7.10) depends on γ and coincides with the optimal asset allocation rule for a power utility function with risk-aversion parameter γ (i.e., the same as the benchmark investors) and the representative model (b∗ , σ ∗ ).
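The two limiting cases can be checked by evaluating the scalar coefficients that appear in (7.7) and (7.8). The sketch below is illustrative only: the function names are hypothetical, and the limits are approximated by plugging in a very large negative parameter rather than taken analytically.

```python
def portfolio_scale(eta, gamma):
    """Coefficient multiplying [E(sigma sigma')]^{-1} E(b) in (7.8)."""
    if eta >= 1 or gamma >= 1:
        raise ValueError("both eta and gamma must be strictly less than 1")
    return (1 - eta - gamma) / ((1 - eta) * (1 - gamma))

def dual_weights(eta, gamma):
    """Squared coefficients on the two bracketed terms of the dual function in (7.7)."""
    c = portfolio_scale(eta, gamma)
    d = gamma / ((1 - eta) * (1 - gamma))
    return c * c, d * d

BIG = -1e9  # stand-in for a parameter tending to minus infinity

# gamma = 0: log-optimal benchmark, the scale is 1 for any eta.
log_case = portfolio_scale(0.3, 0.0)

# gamma -> -infinity: scale tends to 1 / (1 - eta), both weights to 1 / (1 - eta)^2.
money_market_scale = portfolio_scale(0.5, BIG)
money_market_weights = dual_weights(0.5, BIG)

# eta -> -infinity: scale tends to 1 / (1 - gamma), second weight vanishes.
averse_scale = portfolio_scale(BIG, 0.5)
averse_weights = dual_weights(BIG, 0.5)

print(log_case, money_market_scale, money_market_weights, averse_scale, averse_weights)
```

When the two weights coincide, as in the first limit, the $E^\mu(b^\top(\sigma\sigma^\top)^{-1}b)$ terms cancel and only the negative quadratic form in the means remains, which matches the dual objective stated for $\gamma \to -\infty$.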

7.2. Benchmark Weighted Objectives

In this section, we consider an objective of the form

$$
\begin{cases}
\displaystyle \max_{\pi}\ \min_{H(t)}\ E\Big[ f(y(T))\, U\Big( \frac{x(T)}{y(T)} \Big) \Big] \\
\text{subject to:} \\
x(t) \text{ is the solution of (4.2) under } (\pi(t), H(t)), \\
y(t) \text{ is the solution of (7.2) under } H(t), \\
\pi(t) \in \mathcal{U},\ H(t) \in \mathcal{N},
\end{cases} \tag{7.11}
$$

where $f(y)$ is an increasing, nonnegative function of $y \in (0, \infty)$. The weighting function $f(y)$ puts larger weight on $U(x(T)/y(T))$ when the terminal wealth of the benchmark investor $y(T)$ is large. The inclusion of this factor in the objective is motivated by the intuition that "good performance" of the investor's portfolio decision relative to the benchmark $y(T)$ is more important when $y(T)$ is large than when it is small. When


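$f(y) = y^\kappa$ ($0 \le \kappa < 1$), $U(z) = \frac{1}{\eta} z^\eta$ ($0 < \eta < 1$), and $y(t)$ is the solution of (7.2) for some $\gamma < 1$, it can be shown (along the lines of Sections 4, 6, and 7.1) that the value function is

$$
V(t, z) = \frac{z^\eta}{\eta} \exp\Big\{ -(T - t) \Big[ -\frac{\kappa}{\eta}\, r + \frac{1-\eta}{2}\, \min_{q} \max_{H \in \mathcal{H}} G^{\kappa,\gamma,\eta}(q, H) \Big] \Big\},
$$

where

$$
\begin{aligned}
G^{\kappa,\gamma,\eta}(q, H) ={}& \Big[ q - \frac{1+\kappa-\eta-\gamma}{(1-\eta)(1-\gamma)}\, (\sigma\sigma^\top)^{-1} b \Big]^\top \sigma\sigma^\top \Big[ q - \frac{1+\kappa-\eta-\gamma}{(1-\eta)(1-\gamma)}\, (\sigma\sigma^\top)^{-1} b \Big] \\
& - \frac{\gamma^2 - \frac{\kappa}{\eta}\, [\gamma + (1-\eta)(1+\kappa-\eta-\gamma)]}{(1-\eta)^2 (1-\gamma)^2}\, b^\top (\sigma\sigma^\top)^{-1} b,
\end{aligned} \tag{7.12}
$$

and the optimal portfolio is $u^*(t, z(t)) = q^* z(t)$, or equivalently, $\pi^*(t, x(t)) = q^* x(t)$, and $H^*$ is the optimal choice by nature, where

$$
(q^*, H^*) = \arg \min_{q} \max_{H \in \mathcal{H}} G^{\kappa,\gamma,\eta}(q, H). \tag{7.13}
$$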

8. EXAMPLES

Suppose that the wealth dynamics associated with policy $q(t)$ and model parameters $H = (Q, b)$ are given by

$$
\begin{cases}
dx(t) = x(t)\{ r + b^\top q(t) \}\, dt + x(t)\, q(t)^\top \sigma\, dW(t), \\
x(0) = 1,
\end{cases} \tag{8.1}
$$

where (for the purposes of this example) the parameters $H = (Q, b)$ are constant over time but constrained to some known uncertainty set $\mathcal{H}$. It is well known that

$$
x(T) = \exp\Big\{ \int_0^T \big\{ r + b^\top q(t) - \tfrac{1}{2}\, q(t)^\top Q\, q(t) \big\}\, dt + \int_0^T q(t)^\top \sigma\, dW(t) \Big\}
$$

is the solution of (8.1). We assume that expected utility given $(H, q(t))$ is

$$
V(H, q) \triangleq E_H \log(x(T)) = E_H \int_0^T \big\{ r + b^\top q(t) - \tfrac{1}{2}\, q(t)^\top Q\, q(t) \big\}\, dt. \tag{8.2}
$$

Within this setting, we consider three investor types:

• Investors blessed with knowledge of the model parameters $H \in \mathcal{H}$ and who maximize expected utility (8.2) conditional on this information. It is well known that

$$
q_H(t) = Q^{-1} b \tag{8.3}
$$

is the optimal portfolio and that

$$
V(H) \triangleq V(H, q_H) = \int_0^T \big\{ r + \tfrac{1}{2}\, b^\top Q^{-1} b \big\}\, dt
$$

is the associated (optimal) utility value.

• Investors who are ambiguous about the model parameters, but know they belong to $\mathcal{H}$, and decide to adopt the solution $q_{wc}$ of a classical worst-case problem

$$
\max_{q} \min_{H \in \mathcal{H}} E_H[\log(x(T))] \quad \text{subject to (8.1)}.
$$

We know from Proposition 3.3 that

$$
q_{wc} = \arg \max_{q} \min_{(Q, b)\in\mathcal{H}} \{ 2 q^\top b - q^\top Q q \} = [Q^*]^{-1} b^*, \tag{8.4}
$$

where

$$
(Q^*, b^*) \triangleq \arg \min_{(Q, b)\in\mathcal{H}} b^\top Q^{-1} b
$$

(see also (3.17) with $\eta = 0$). The associated utility as a function of $H$ (substitute $q_{wc}$ into (8.2)) is

$$
V_{wc}(H) \triangleq V(H, q_{wc}) = \int_0^T \big\{ r + b^\top q_{wc} - \tfrac{1}{2}\, q_{wc}^\top Q\, q_{wc} \big\}\, dt.
$$

• Investors who are ambiguous about the model parameters, but know they belong to $\mathcal{H}$, and decide to adopt the solution $q_{bm}$ of a "benchmarking" problem (7.4) in which the utility function is logarithmic ($U(z) = \log(z)$). We know from Proposition 7.1 (with $\eta = 0$) that

$$
q_{bm} = \arg \min_{q} \max_{(Q, b)\in\mathcal{H}} \Big\{ (q - Q^{-1} b)^\top Q\, (q - Q^{-1} b) - \Big( \frac{\gamma}{1-\gamma} \Big)^2 b^\top Q^{-1} b \Big\}, \tag{8.5}
$$

where $\gamma = 0$ in the case of the log-utility benchmark. The corresponding utility as a function of $H \in \mathcal{H}$ (substitute $q_{bm}$ into (8.2)) is

$$
V_{bm}(H) \triangleq V(H, q_{bm}) = \int_0^T \big\{ r + b^\top q_{bm} - \tfrac{1}{2}\, q_{bm}^\top Q\, q_{bm} \big\}\, dt.
$$

Consider a financial market consisting of a single risky asset and a money market account. We assume that $r = 0.05$, that the initial wealth of all investors is $x(0) = 1$, and that the time horizon is $T = 1$. We adopt the uncertainty set

$$
\mathcal{H} = \{ (Q, b) \mid 0.1 \le b \le 0.4,\ 0.5 \le Q \le 0.7 \}. \tag{8.6}
$$

When we use the log-optimal benchmark ($\gamma = 0$), the optimal investment policy for the static worst-case problem (8.4) is $q_{wc} = 0.143$, while that of the benchmarking problem (8.5) is $q_{bm} = 0.444$. Figure 8.1 plots $V_{wc}(H)$, $V_{bm}(H)$, and $V(H)$ over the family of alternative models $\mathcal{H}$. It compares the utilities $V_{wc}(H)$ and $V_{bm}(H)$ that the investor would obtain under the "robust" policies $q_{wc}$ and $q_{bm}$ to the utility $V(H)$ which he/she would obtain in the ideal situation when the parameters were known and he/she invested optimally. By definition, the knowledgeable investor's utility $V(H)$ dominates $V_{bm}(H)$ and $V_{wc}(H)$. In addition, because the worst-case investor optimizes for the most pessimistic model, it follows that his/her utility equals that of the knowledgeable investor, and dominates that of the benchmarking investor, for the most pessimistic model ($b^* = 0.1$, $Q^* = 0.7$). On the other hand, the benchmark investor's utility $V_{bm}(H)$ sticks closer to the knowledgeable investor's $V(H)$ over the family of models and substantially dominates that of the worst-case investor for more "optimistic" models (e.g., when $b = 0.4$, $Q = 0.5$). The benchmark investor's problem was designed to deliver a portfolio that does "reasonably well" for the entire range of possible models, and not just the worst case, which is exactly what we see.

FIGURE 8.1. Plot of $V_{bm}(H)$, $V_{wc}(H)$, and $V(H)$ when $\mathcal{H}$ is defined by (8.6).

It is also interesting to observe the sensitivity of portfolio weights $q_{bm}$ and $q_{wc}$ to variations in the uncertainty set. The following example shows that they can be quite different. Consider uncertainty sets of the form

$$
\mathcal{H} = \Big\{ (Q, b) \ \Big|\ b_0 - \frac{\delta_b}{2} \le b \le b_0 + \frac{\delta_b}{2},\ \ Q_0 (1 - \delta_Q) \le Q \le Q_0 (1 + \delta_Q) \Big\},
$$

which are characterized by $(b_0, \delta_b, Q_0, \delta_Q)$. Observe that the uncertainty set is described by the nominal parameters $(b_0, Q_0)$ and the maximum deviations from the nominal point determined by $\delta_b$ and $\delta_Q$. We choose the additive deviation for the expected return, but the multiplicative deviation for the variance due to the nonnegativity of the variance. The plots of $q_{bm}$ and $q_{wc}$ as functions of the ambiguity level in the expected return measured by $\delta_b$ under various alternative models are given in Figure 8.2, and those as functions of the ambiguity level in the variance measured by $\delta_Q$ under various alternative models are given in Figure 8.3. These two examples are computed under the assumption of the log-utility benchmark. From Figures 8.2 and 8.3, we can see that $q_{bm}$ increases as (i) $\delta_b$ increases, (ii) $b_0$ increases, (iii) $\delta_Q$ increases, and (iv) $Q_0$ decreases. It might be surprising to see the first and the third observations above, which suggest that the investor should increase his or her portfolio weights when the level of ambiguity increases.
However, as we mentioned earlier, adopting the benchmarking framework is more or less a tradeoff between pessimism and optimism, and hence adding better and worse alternative models to the uncertainty set could result in an increase in the optimal portfolio weight as we see here. On the other hand, qwc decreases when the level of ambiguity increases, and it equals zero when the uncertainty set contains zero expected return. This, as expected, is due to its inherited property of being worst-case concentrated.
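The weights and utility comparisons reported for the uncertainty set (8.6) can be reproduced with a few lines of code. This is a minimal sketch rather than the authors' computation: it assumes a single risky asset with constant weight, evaluates the inner maximum of (8.5) only at the four corners of the box (which suffices by the joint convexity of the loss in $(Q, b)$), and uses a ternary search; the helper names are hypothetical.

```python
R, T = 0.05, 1.0  # risk-free rate and horizon used in the example
VERTICES = [(Q, b) for Q in (0.5, 0.7) for b in (0.1, 0.4)]  # corners of (8.6)

def log_utility(q, Q, b):
    """Expected log-utility (8.2) for a constant weight q under model (Q, b)."""
    return T * (R + b * q - 0.5 * Q * q * q)

def benchmark_loss(q):
    """Inner maximum of (8.5) with gamma = 0, evaluated at the corners."""
    return max(Q * (q - b / Q) ** 2 for Q, b in VERTICES)

def argmin_convex(fn, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fn(m1) < fn(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Worst-case policy (8.4): invest optimally for the model minimizing b^2 / Q.
Q_star, b_star = min(VERTICES, key=lambda m: m[1] ** 2 / m[0])
q_wc = b_star / Q_star

# Benchmarking policy (8.5) with the log-optimal benchmark.
q_bm = argmin_convex(benchmark_loss, 0.0, 1.0)

print(f"q_wc = {q_wc:.3f}, q_bm = {q_bm:.3f}")
for Q, b in [(0.7, 0.1), (0.5, 0.4)]:  # most pessimistic and most optimistic corners
    v_opt = log_utility(b / Q, Q, b)
    v_wc = log_utility(q_wc, Q, b)
    v_bm = log_utility(q_bm, Q, b)
    print(f"Q = {Q}, b = {b}: V = {v_opt:.4f}, V_wc = {v_wc:.4f}, V_bm = {v_bm:.4f}")
```

At the pessimistic corner the worst-case utility coincides with that of the knowledgeable investor, while at the optimistic corner the benchmarking utility sits much closer to it, in line with the discussion of Figure 8.1.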

[Figure 8.2: grid of panels for $\gamma = 0$, with columns $\delta_Q = 0, 0.3, 0.5, 0.7$ and rows $Q_0 = 0.3, 0.5, 0.7$. Each panel plots portfolio weights $q$ against the size of the ambiguity set of expected return, $\delta_b$, for the worst-case (WC) and benchmarking (BM) policies with $b_0 = 0.1, 0.2, 0.3$.]

FIGURE 8.2. Plot of $q_{bm}$ and $q_{wc}$ as function of uncertainty set for mean.

[Figure 8.3: grid of panels for $\gamma = 0$, with columns $\delta_b = 0, 0.1, 0.3, 0.5$ and rows $b_0 = 0.1, 0.2, 0.3$. Each panel plots portfolio weights $q$ against the size of the ambiguity set of variance, $\delta_Q$, for the worst-case (WC) and benchmarking (BM) policies with $Q_0 = 0.3, 0.5, 0.7$.]

FIGURE 8.3. Plot of $q_{bm}$ and $q_{wc}$ as function of uncertainty set for variance.


9. CONCLUSION

In this paper, we have proposed a new characterization of portfolio robustness and used it as the basis for finding robust portfolios. The key feature of this approach is that portfolio performance is evaluated by comparing its return to that of a benchmark investor who knows the model and invests optimally conditional on this knowledge. Its advantage over the usual worst-case methodology is that resulting portfolios are less pessimistic for the same uncertainty set. In particular, a portfolio optimizes the benchmarked objective if its return compares favorably to that of the benchmark investor over all alternative models. This stands in contrast with the typical worst-case approach where "good" performance under the "worst" model is rewarded while "poor" performance under all other market models is ignored. We have obtained fairly explicit solutions for the relative performance problem for utility functions that are of log and power type. In particular, the optimal policy is the solution of a static convex optimization problem. This static problem can be interpreted as the local investment problem that relative performance investors need to solve, and is a less pessimistic alternative to the classical worst-case Markowitz problem. The solution of the local problem has an interesting financial interpretation: It is the portfolio that minimizes the furthest distance from the set of all optimal portfolios of the benchmark investor under all alternative models. It also coincides, via convex duality, with the solution of an asset allocation problem with a Bayesian prior on the unknown parameters. The prior is characterized by the solution of the dual problem, is nondegenerate, and depends on the risk aversion of both the investor and the benchmark. This contrasts with the Bayesian interpretation of the classical worst-case problem which puts a degenerate prior at the "most pessimistic" alternative model.
Finally, although our (original) dynamic robust asset allocation problem does not include learning, the duality relationships for the associated static problem reveal a close connection between Bayesian learning and robust asset allocation with benchmarks. This insight suggests a natural approach for combining robust asset allocation and learning in a dynamic setting, and forms the basis of another paper, Lim, Shanthikumar, and Vahn (2010). This framework can also be extended to handle problems with consumption (Lim, Shanthikumar, and Watewai 2008a).

APPENDIX

Proof of Proposition 3.1. Let

$$
\pi_{Q,b} \triangleq \gamma\, Q^{-1} b = \arg \min_{\pi} \{ \pi^\top Q \pi - 2\gamma\, \pi^\top b \}.
$$

Compactness of $\mathcal{H}$ implies the existence of a constant $M < \infty$ such that $|b| \le M$ for all $(Q, b) \in \mathcal{H}$. Together with the nondegeneracy assumption (3.5), it now follows that

$$
|\pi_{Q,b}| = \gamma\, |Q^{-1} b| \le \gamma\, \|Q^{-1}\|\, |b| \le \frac{\gamma M}{\delta}
$$

for every $(Q, b) \in \mathcal{H}$ (where the constant $\gamma M/\delta$ is independent of $(Q, b)$), and hence

$$
\pi_{Q,b} = \gamma\, Q^{-1} b \in S \triangleq \Big\{ \pi \in \mathbb{R}^q \ \Big|\ |\pi| \le \frac{\gamma M}{\delta} \Big\}, \qquad \forall\, (Q, b) \in \mathcal{H},
$$

where $S$ is clearly both compact and convex. Observe next that (3.4) has a unique solution

$$
\pi^* \triangleq \arg \min_{\pi} \max_{(Q, b)\in\mathcal{H}} \{ \pi^\top Q \pi - 2\gamma\, \pi^\top b \}.
$$

Indeed this follows from the observation that

$$
f(\pi) = \max_{(Q, b)\in\mathcal{H}} \{ \pi^\top Q \pi - 2\gamma\, \pi^\top b \} \tag{A.1}
$$

(as a function of $\pi$) is strictly convex, together with the property that

$$
f(\pi) \ge \delta |\pi|^2 - 2\gamma\, |\pi|\, |b| \ge \delta |\pi|^2 - 2\gamma M |\pi| \to \infty
$$

as $|\pi| \to \infty$, which follows from the nondegeneracy assumption (3.5) and the compactness of $\mathcal{H}$ (which implies the existence of a constant $M < \infty$ such that $|b| \le M$ for all $(Q, b) \in \mathcal{H}$). We can now define $C \triangleq \mathrm{co}(\{\pi^*\}, S)$, the convex hull of $\{\pi^*\}$ and $S$. Clearly, $C$ is compact and convex and

$$
\begin{aligned}
\min_{\pi} \max_{(Q, b)\in\mathcal{H}} \{ \pi^\top Q \pi - 2\gamma\, \pi^\top b \}
&= \min_{\pi \in C} \max_{(Q, b)\in\mathcal{H}} \{ \pi^\top Q \pi - 2\gamma\, \pi^\top b \} \\
&= \max_{(Q, b)\in\mathcal{H}} \min_{\pi \in C} \{ \pi^\top Q \pi - 2\gamma\, \pi^\top b \} \\
&= \max_{(Q, b)\in\mathcal{H}} \min_{\pi} \{ \pi^\top Q \pi - 2\gamma\, \pi^\top b \} \\
&= \max_{(Q, b)\in\mathcal{H}} \{ -\gamma^2\, b^\top Q^{-1} b \},
\end{aligned}
$$

where the first and third equalities follow from the definition of $C$, while the second follows from the min–max theorem (which allows the exchange of min and max, so long as the function satisfies the required marginal convexity/concavity properties, and the variables are constrained to convex compact sets; see Luenberger 1968).
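The last equality in the chain above is a completion of the square in $\pi$. A brief sketch, assuming $Q$ is symmetric positive definite:

```latex
\begin{align*}
\pi^\top Q \pi - 2\gamma\, \pi^\top b
  &= (\pi - \gamma Q^{-1} b)^\top Q\, (\pi - \gamma Q^{-1} b) - \gamma^2\, b^\top Q^{-1} b, \\
\min_{\pi} \{ \pi^\top Q \pi - 2\gamma\, \pi^\top b \}
  &= -\gamma^2\, b^\top Q^{-1} b, \qquad \text{attained at } \pi = \gamma Q^{-1} b .
\end{align*}
```

The minimizer is the point $\gamma Q^{-1} b$ used at the start of the proof, which lies in $S$ and hence in the convex hull of $\{\pi^*\}$ and $S$; this is why restricting $\pi$ to that set does not change the inner minimum.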

Proof of Proposition 5.1. Define

$$
\begin{cases}
\displaystyle g_s(z, u) \triangleq \min_{H \in \mathcal{H}} E[U(z(T)) \mid z(s) = z,\ u(t),\ H(t)] \\
\text{subject to:} \\
\displaystyle dz(t) = [u(t) - z(t^-)\psi(t)]^\top \Big\{ \sigma(t)\, dW(t) + \sum_{i=1}^{n} \frac{\theta_i(t)}{1 + \psi(t)^\top \theta_i(t)}\, dM_i(t) \Big\}, \\
z(s) = z.
\end{cases} \tag{A.2}
$$

Clearly (writing $g$ for $g_s$),

$$
g(\lambda z_1 + (1-\lambda) z_2,\ \lambda u_1 + (1-\lambda) u_2) = \min_{H \in \mathcal{H}} E[U(\lambda z_1(T) + (1-\lambda) z_2(T))],
$$

where $z_i(t)$ is the solution of the wealth equation (A.2) when the initial value is $z(s) = z_i$ and the portfolio is $u(t) = u_i(t)$. Concavity of the utility function $U(z)$ implies that

$$
g(\lambda z_1 + (1-\lambda) z_2,\ \lambda u_1 + (1-\lambda) u_2) \ge \lambda\, g(z_1, u_1) + (1-\lambda)\, g(z_2, u_2),
$$

which shows that $g(z, u)$ is jointly concave in $(z, u)$. Observing that

$$
V(s, z) = \max_{u} g(z, u),
$$

we see that for any admissible $u_1, u_2 \in \mathcal{U}$

$$
\begin{aligned}
V(s, \lambda z_1 + (1-\lambda) z_2) &= \max_{u} g(\lambda z_1 + (1-\lambda) z_2,\ u) \\
&\ge g(\lambda z_1 + (1-\lambda) z_2,\ \lambda u_1 + (1-\lambda) u_2) \\
&\ge \lambda\, g(z_1, u_1) + (1-\lambda)\, g(z_2, u_2),
\end{aligned}
$$

where the first inequality follows from optimality while the second follows from the joint concavity of $g(z, u)$. Because this result is true for every $u_1$ and $u_2$, it follows that

$$
V(s, \lambda z_1 + (1-\lambda) z_2) \ge \lambda\, V(s, z_1) + (1-\lambda)\, V(s, z_2),
$$

which shows that $V(s, z)$ is concave in $z$. Clearly, $V(s, z)$ is strictly concave if $U(z)$ is strictly concave.

To see that $V(s, z)$ is monotonically increasing, let $z_1 < z_2$ and suppose that $H(t)$ and $u(t)$ are given and fixed. Denote by $(u, z_i, z_i(T))$ the triple that corresponds to starting at $z(s) = z_i$ and applying $u(t)$, with $z_i(T)$ denoting the resulting terminal condition. Defining $\delta = z_2 - z_1 > 0$, it can be seen that

$$
z_2(T) = z_1(T) + \delta\, v(T),
$$

where $v(t)$ is the solution of the SDE

$$
\begin{cases}
\displaystyle dv(t) = -v(t^-)\, \psi(t)^\top \Big\{ \sigma(t)\, dW(t) + \sum_{i=1}^{n} \frac{\theta_i(t)}{1 + \psi(t)^\top \theta_i(t)}\, dM_i(t) \Big\}, \\
v(s) = 1,
\end{cases}
$$

for which it is well known that the closed form solution is

$$
v(t) = \exp\Big\{ -\frac{1}{2} \int_0^t \psi(s)^\top \sigma(s)\sigma(s)^\top \psi(s)\, ds - \int_0^t \psi(s)^\top \sigma(s)\, dW(s) \Big\} \prod_{i=1}^{k} \prod_{s=0}^{t} \Big( \frac{1}{1 + \psi(s)^\top \theta_i} \Big)^{N_i(s)} > 0.
$$

It follows that $z_2(T) = z_1(T) + \delta\, v(T) > z_1(T)$ and hence $E[U(z_1(T))] \le E[U(z_2(T))]$ for every $u(t)$ and $H(t)$ because $U(z)$ is increasing in $z$. This implies that

$$
g(z_1, u) = \min_{H(t)} E[U(z_1(T))] \le \min_{H(t)} E[U(z_2(T))] = g(z_2, u), \qquad \forall\, u,
$$

and hence

$$
V(s, z_1) = \max_{u} g(z_1, u) \le \max_{u} g(z_2, u) = V(s, z_2).
$$


REFERENCES

BEN-TAL, A., and A. NEMIROVSKI (1998): Robust Convex Optimization, Math. Oper. Res. 23, 769–805. BEN-TAL, A., and A. NEMIROVSKI (1999): Robust Solutions to Uncertain Programs, Oper. Res. Lett. 25, 1–13. BEN-TAL, A., and A. NEMIROVSKI (2000): Robust Solutions of Linear Programming Problems Contaminated with Uncertain Data, Math. Prog. 88, 411–424. BERGEMANN, D., and K. SCHLAG (2008): Robust Monopoly Pricing, Working Paper, Yale University. BERNHARD, P., and T. BASAR(2008): H-inﬁnity Optimal Control and Related Minimax Design Problems, Boston: Birkhauser. BLACKWELL, D.(1956): An Analog of the Minimax Theorem for Vector Payoffs, Paciﬁc J. Math. 6, 1–8. CERIA, S., and R. A. STUBBS(2006): Incorporating Estimation Errors into Portfolio Selection: Robust Portfolio Construction, J. Asset Manage. 7(2), 109–127. CHEN, Z., and L. EPSTEIN (2002): Ambiguity, Risk, and Asset Returns in Continuous Time, Econometrica 70(4), 1403–1443. COVER, T. M. (1991): Universal Portfolios, Math. Finance, 1, 1–29. DEMARZO, P. M., I. KREMER,and Y. MANSOUR (2005): On Hannan and Blackwell’s Approachability and Aptions—A Game Theoretic Approach for Option Pricing, Working Paper, Graduate School of Business, Stanford University. DOYLE, J. C., K. GLOVER, P. P. KHARGONEKAR, and B. FRANCIS (1989): State-Space Solutions to Standard H2 and H∞ Optimal Control Problems, IEEE Trans. Autom. Control 34, 831– 847. DUNFORD N., and J. T. SCHWARTZ (1988): Linear Operators, Part 1: General Theory, Hoboken, NJ: Wiley. EL GHAOUI, L., and H. LEBRET (1997): Robust Solutions to Least-Square Problems to Uncertain Data Matrices, SIAM J. Matrix Anal. Appl. 18, 1035–1064. EL GHAOUI L., M. OKS, and F. OUSTRY (2003): Worst Case Value-at-Risk and Robust Portfolio Optimization: A Conic Programming Approach, Operat. Res. 51(4), 543–556. ELLSBERG, D. (1961): Risk, Ambiguity, and the Savage Axioms, Quart. J. Econ. 75, 643– 669. EPSTEIN, L., and M. SCHNEIDER (2003): Recursive Multiple Priors, J. Econ. Theory, 113(1), 1–31. 
EPSTEIN, L., and T. WANG (1994): Intertemporal Asset Pricing under Knightian Uncertainty, Econometrica 62, 283–322. FOSTER, D., D. LEVINE, and R. VOHRA (1999): Introduction to the Special Issue, Games Econ. Behav. 29, 1–6. GARLAPPI, L., R. UPPAL, and T. WANG (2005): Portfolio Selection with Parameter and Model Uncertainty: A Multi-Prior Approach, Working Paper, London Business School. GILBOA, I., and D. SCHMEIDLER (1989): Maxmin Expected Utility with Non-Unique Prior, J. Math. Econ., 18, 141–153. GOLDFARB, D., and G. IYENGAR (2003): Robust Portfolio Selection Problems, Math. Operat. Res. 28(1), 1–38. GUNDEL, A. (2005): Robust Utility Maximization for Complete and Incomplete Market Models, Finance Stoch. 9, 151–176.


HANNAN, J. (1957). Approximation to Bayes Risk in Repeated Plays, in Contributions to the Theory of Games, M. Dresher, A. Tucker, and P. Wolfe, eds. Princeton, NJ: Princeton University Press, pp. 97–139. HANSEN, L. P., and T. J. SARGENT (2005): Recursive Robust Estimation and Control without Commitment, J. Econ. Theory 124, 258–301. HANSEN, L. P., T. J. SARGENT, G. A. TURMUHAMBETOVA, and N. WILLIAMS (2006): Robustness Control and Model Misspeciﬁcation, J. Econ. Theory 128, 45–90. HAYASHI, T. (2008): Regret Aversion and Opportunity Dependence, J. Econ. Theory 139, 242– 268. KLIBANOFF, P., M. MARINACCI, and S. MUKERJI (2005): A Smooth Model of Decision Making under Ambiguity, Econometrica 73(6), 1849–1892. KLIBANOFF, P., M. MARINACCI, and S. MUKERJI (2009): Recursive Smooth Ambiguity Preferences, J. Econ. Theory 144, 930–976. KNOX, T. A. (2002): Learning How to Invest When Returns Are Uncertain, Working Paper, Harvard University. LIM, A. E. B., and J. G. SHANTHIKUMAR (2007): Relative Entropy, Exponential Utility, and Robust Dynamic Pricing, Operat. Res. 55(2), 198–214. LIM, A. E. B., J. G. SHANTHIKUMAR, and Z. J. M. SHEN (2006): Model Uncertainty, Robust Optimization, and Learning, TutORial Operat. Res. 3, 66–94. LIM, A. E. B., J. G. SHANTHIKUMAR, and G. Y. VAHN (2010): Robust Portfolio Choice with Learning in the Framework of Regret: Single Period Case, Working Paper, Department of Industrial Engineering and Operations Research, University of California, Berkeley. LIM, A. E. B., J. G. SHANTHIKUMAR, and T. WATEWAI (2008a): Robust Portfolio Selection and Consumption Problems in the Framework of Relative Regret, Working Paper, Department of Industrial Engineering and Operations Research, University of California, Berkeley. LIM, A. E. B., J. G. SHANTHIKUMAR, and T. WATEWAI (2008b): Robust Multi-Product Pricing, Working Paper, Department of Industrial Engineering and Operations Research, University of California, Berkeley. LIU, J., J. PAN, and T. 
WANG (2005): An Equilibrium Model of Rare Event Premia and Its Implication for Option Smirks, Rev. Financ. Stud. 18, 131–164. LUENBERGER, D. G. (1968): Optimization by Vector Space Methods, New York: John Wiley. MAENHOUT, P. J. (2004): Robust Portfolio Rules and Asset Pricing, Rev. Financ. Stud. 17(4), 951–983. MILNOR, J. (1954): Games against Nature, in Decision Processes, R. M. Thrall, C. H. Coombs, and R. L. Davis, eds. New York: John Wiley. NILIM, A., and L. EL GHAOUI (2005): Robust Solutions to Markov Decision Problems with Uncertain Transition Matrices, Operat. Res. 53(5), 780–798. PETERSEN, I. R., M. R. JAMES, and P. DUPUIS (2000): Minmax Optimal Control of Stochastic Uncertain Systems with Relative Entropy Constraints, IEEE Trans. Autom. Control 45, 398– 412. SAVAGE, L. J. (1951): The Theory of Statistical Decision, J. Am. Stat. Assoc. 46, 55–67. SCHIED, A.(2005): Optimal Investments for Robust Utility Functionals in Complete Market Models, Math. Oper. Res. 30(3), 750–764. SCHIED, A. (2007): Optimal Investments for Risk- and Ambiguity-Averse Preferences: A Duality Approach, Finance Stoch. 11(1), 107–129. STOYE, J. (2008): Axioms for Minimax Regret Choice Correspondences, Working Paper, New York University.

ROBUST ASSET ALLOCATION WITH BENCHMARKED OBJECTIVES

679

TERLIZZESE, D. (2006): Relative Minimax, EIEF Working Papers Series, No. 804, Einaudi Institute for Economic and Finance. ¨ UNC ¨ U¨ , R. H., and M. KOENIG (2004): Robust Asset Allocation, Ann. Operat. Res. 132, TUT 157–187. WALD, A. (1950): Statistical Decision Functions, New York: John Wiley.

Mathematical Finance, Vol. 21, No. 4 (October 2011), 595–625

THE STOCHASTIC VOLATILITY MODEL OF BARNDORFF-NIELSEN AND SHEPHARD IN COMMODITY MARKETS

FRED ESPEN BENTH
Centre of Mathematics for Applications, University of Oslo and School of Management, University of Agder

We consider the non-Gaussian stochastic volatility model of Barndorff-Nielsen and Shephard for the exponential mean-reversion model of Schwartz proposed for commodity spot prices. We analyze the properties of the stochastic dynamics, and show in particular that the log-spot prices possess a stationary distribution defined as a normal variance-mixture model. Furthermore, the stochastic volatility model allows for explicit forward prices, which may produce a hump structure inherited from the mean-reversion of the stochastic volatility. Although the spot price dynamics have continuous paths, the forward prices follow a jump dynamics, where jumps occur according to changes in the volatility process. We compare with the popular Heston stochastic volatility dynamics, and show that the Barndorff-Nielsen and Shephard model provides a more flexible framework in describing commodity spot prices. An empirical example on UK spot data is included.

KEY WORDS: commodity markets, Ornstein–Uhlenbeck process, stochastic volatility, Heston model, subordinators.

The author is grateful to Steen Koekebakker for interesting and fruitful discussions. Two anonymous referees are thanked for their constructive criticism of the paper improving its presentation.
Manuscript received November 2008; final revision received October 2009.
Address correspondence to Fred Espen Benth, Centre of Mathematics for Applications, University of Oslo, P.O. Box 1053, Blindern, N–0316 Oslo, Norway; e-mail: fredb@math.uio.no; URL: http://folk.uio.no/fredb/
DOI: 10.1111/j.1467-9965.2010.00445.x
© 2010 Wiley Periodicals, Inc.

1. INTRODUCTION

A classical model for the spot price dynamics in commodity markets is the Schwartz mean-reversion process, defined as the exponential of an Ornstein–Uhlenbeck stochastic process. It has been applied in many markets such as oil (Schwartz 1997), gas (Benth and Šaltytė-Benth 2004), and electricity (Lucia and Schwartz 2002). Geman (2005) and Eydeland and Geman (1998) propose to use the Schwartz model augmented with the Heston stochastic volatility dynamics (see Heston 1993) to model spot power prices. In this paper, we study, in the context of commodities, an alternative class of stochastic volatility models proposed by Barndorff-Nielsen and Shephard (2001) for equity markets. Our paper extends parts of the analysis in Hikspoors and Jaimungal (2007), where various spot models are studied based on a diffusion dynamics for the stochastic volatility. A typical example is the popular Heston model, and the authors apply an asymptotic analysis to discuss pricing of options. The stochastic volatility model of Barndorff-Nielsen and Shephard (2001) (BNS from now on) is defined as the sum of non-Gaussian Ornstein–Uhlenbeck processes. The structure of the model is very simple, yet it provides a very flexible framework for accurately modeling the dependency structure and


leptokurtic distributional properties. Moreover, the BNS stochastic volatility model allows for analytic pricing of forwards, contrary to the Heston model, say.

We provide a thorough analysis of the probabilistic properties of the Schwartz dynamics with stochastic volatility of the BNS type. We make frequent comparison with the Heston model and show that in some sense the BNS dynamics offer a generalization of this. For the BNS model, we are able to characterize explicitly the stationary behavior of the log-spot prices. The characterization is in terms of the cumulant function of the stochastic volatility. Further, we show that the (mean-reversion adjusted) spot returns are normal variance-mixtures allowing for a wide range of distributions, such as for instance the popular CGMY (see Carr et al. 2002) and normal inverse Gaussian (see Barndorff-Nielsen 1998). The normal inverse Gaussian distribution (NIG for short) turns out to be appropriate to model returns in commodity markets. We refer to the works of Benth and Šaltytė-Benth (2004) for using the Schwartz model with NIG-distributed innovations for gas and oil, or Börger et al. (2009) for cross-commodity modeling using exponential NIG-distributed Lévy processes. In this paper, we derive the autocorrelation structure of the (adjusted) returns and their squares. Finally, we fit the model to a series of UK gas spot prices collected at the National Balancing Point. The flexibility of the BNS stochastic volatility allows for accurate modeling of the (adjusted) return distribution as well as the exponentially decaying autocorrelation function observed in the squares of the (adjusted) returns.

The BNS model allows for explicit pricing of forwards due to its affine form. We calculate prices for forward contracts with fixed maturity time, and show that both the spot price and the factors of volatility enter explicitly in the price dynamics. In fact, although the dynamics of the spot price have continuous paths, the forward price dynamics will be of a mixed diffusion and jump type. At every instance of a jump in volatility, the forward price will experience a jump as well. Considering a stochastic volatility of diffusion type, such as the Heston model, the dynamics of the forward price will remain a diffusion and not have any discontinuities in the paths.

We give a detailed analysis of the shape of the forward curve implied by the BNS model. The model may produce a hump shape in the curve as a result of the mean-reversion in the stochastic volatility factors. Explicit knowledge of the hump shape can be given in terms of the speeds of mean-reversion in log-spot prices and volatility factors. We also provide some new insight into the forward curve structure of the Heston model by analyzing the solution of a Riccati equation. The Heston stochastic volatility dynamics is also mean-reverting, and a hump can be observed in the forward curve. On the other hand, the results are not as explicit as for the BNS model.

The findings of this paper are presented as follows. In Section 2, we define the spot price dynamics with stochastic volatility of BNS type and study its properties. An empirical study of the model for UK gas data is performed, along with a comparison with the Heston stochastic volatility model. Analytical forward prices are derived in Section 3 together with an explicit price dynamics. The hump-shape of the forward curve is analyzed, and compared with the corresponding forward curves resulting from the Heston model. Finally, in Section 4, we conclude.

2. A MEAN-REVERTING SPOT MODEL WITH STOCHASTIC VOLATILITY

In this section, we study the Schwartz one-factor mean-reversion model (see Schwartz 1997) with stochastic volatility defined by the Barndorff-Nielsen and Shephard dynamics (see Barndorff-Nielsen and Shephard 2001). The probabilistic properties of the spot price are studied in detail, using the Heston stochastic volatility model for comparison. Finally, we make an empirical study of UK natural gas spot price data, where the one-factor model with the proposed stochastic volatility dynamics is demonstrated to fit well.

2.1. Definition and Properties

Let $(\Omega, \mathcal{F}, P)$ be a probability space equipped with a filtration $\mathcal{F}_t$ satisfying the usual conditions (see Karatzas and Shreve 1991). We denote by $L_j$, $j = 1, \ldots, n$, $n$ independent subordinator processes, that is, increasing Lévy processes. We choose to work with the RCLL version of the $L_j$'s. Let us introduce the stochastic volatility model of Barndorff-Nielsen and Shephard (2001). Define for $j = 1, \ldots, n$ the Ornstein–Uhlenbeck process

$$ dY_j(t) = -\lambda_j Y_j(t)\,dt + dL_j(t), \tag{2.1} $$

where $\lambda_j > 0$ is constant. The Lévy measure of $L_j$ is denoted $\ell_j$. Let $w_j > 0$ and $w_1 + \cdots + w_n = 1$, and define a volatility process $\sigma(t)$ by

$$ \sigma^2(t) = \sum_{j=1}^{n} w_j Y_j(t). \tag{2.2} $$

Note that because the $L_j$'s are subordinators, it follows that $Y_j(t)$ are nonnegative for all $j = 1, \ldots, n$, and thus $\sigma^2(t)$ is nonnegative as well. Therefore, $\sigma(t)$, the square-root of $\sigma^2(t)$, is well defined. We shall assume that the subordinators are driftless, that is, that $L_j(1)$ have cumulant functions given by

$$ \psi_j(\theta) := \ln E\left[e^{i\theta L_j(1)}\right] = \int_0^\infty \left\{e^{i\theta z} - 1\right\}\,\ell_j(dz), \tag{2.3} $$

for $j = 1, \ldots, n$. Note that we have a constant volatility process $\sigma(t)$ whenever $\lambda_j = 0$ and $L_j = 0$ for all $j$. If only the latter holds, the volatility becomes deterministic, converging to zero with time.

To account for possible seasonal effects one may allow for time-dependent coefficients $w_j$ and $\lambda_j$. However, we shall not consider this case in the present paper, but restrict our attention to constant coefficients.

In our spot price model, we suppose that the seasonal level is modeled by a bounded and measurable function $\Lambda : [0, \infty) \to \mathbb{R}_+$. In case there is no seasonality, $\Lambda(t)$ is simply a constant, usually put equal to 1. Define the spot price $S(t)$ to be

$$ S(t) = \Lambda(t)\exp(X(t)), \tag{2.4} $$

with

$$ dX(t) = (\mu - \alpha X(t))\,dt + \sigma(t)\,dB(t), \tag{2.5} $$


where $B(t)$ is a Brownian motion independent of $L_j$, $j = 1, \ldots, n$, and $\mu$ and $\alpha > 0$ are constants. A straightforward application of Itô's Formula yields the dynamics of $S(t)$,

$$ dS(t) = \left( \frac{\Lambda'(t)}{\Lambda(t)} + \mu + \frac{1}{2}\sigma^2(t) + \alpha \ln \Lambda(t) - \alpha \ln S(t) \right) S(t)\,dt + \sigma(t) S(t)\,dB(t). \tag{2.6} $$

We observe that this is a generalization of Schwartz's one-factor model, see Schwartz (1997).

In this paper, we shall frequently encounter a scaling function which compares two speeds of mean-reversion in the spot model. Introduce for two positive constants $a$ and $b$ the notation

$$ \gamma(s; a, b) := \begin{cases} \dfrac{1}{a - b}\left(e^{-bs} - e^{-as}\right), & a \neq b, \\[2mm] s e^{-as}, & a = b. \end{cases} \tag{2.7} $$

Here, $s \geq 0$, and we note that the function is continuously defined for $a \to b$. We easily find that

LEMMA 2.1. The function $\gamma(s; a, b)$ defined in (2.7) is nonnegative and bounded. Moreover, $\gamma(0; a, b) = 0$ and $\lim_{s \to \infty} \gamma(s; a, b) = 0$. Define

$$ s^* = \begin{cases} \dfrac{\ln a - \ln b}{a - b}, & a \neq b, \\[2mm] \dfrac{1}{a}, & a = b. \end{cases} $$

The function is strictly increasing for $0 \leq s < s^*$ and strictly decreasing for $s > s^*$, and has a maximum value for $s = s^*$.

Proof. It is easy to see that $\gamma(s; a, b)$ is nonnegative and bounded, and that it is zero for $s = 0$ and $s \to \infty$. A straightforward differentiation yields that (for $a \neq b$)

$$ \gamma'(s; a, b) = \frac{1}{a - b}\left(a e^{-as} - b e^{-bs}\right). $$

Putting this equal to zero gives $s^*$ and the monotonicity of $\gamma$. The case $a = b$ follows similarly.

We study the stationarity properties of the deseasonalized log-spot prices $X(t)$. In the next proposition, we calculate the cumulant function $\psi_t(\theta)$ of $X(t)$, defined by

$$ \psi_t(\theta) = \ln E\left[e^{i\theta X(t)}\right]. \tag{2.8} $$

Here, $\theta$ is a constant.

PROPOSITION 2.2. For $\theta \in \mathbb{R}$ it holds that

$$ \psi_t(\theta) = i\theta e^{-\alpha t} X(0) + i\theta \frac{\mu}{\alpha}\left(1 - e^{-\alpha t}\right) - \frac{1}{2}\theta^2 \sum_{j=1}^{n} w_j Y_j(0)\gamma(t; 2\alpha, \lambda_j) + \sum_{j=1}^{n} \int_0^t \psi_j\!\left(\frac{1}{2} i\theta^2 w_j \gamma(u; 2\alpha, \lambda_j)\right) du. $$


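Here, $\psi_j$ is the cumulant function of $L_j(1)$ defined in (2.3) for $j = 1, \ldots, n$.

Proof. Because

$$ X(t) = X(0)e^{-\alpha t} + \frac{\mu}{\alpha}\left(1 - e^{-\alpha t}\right) + \int_0^t \sigma(s) e^{-\alpha(t-s)}\,dB(s), $$

we find

$$ \psi_t(\theta) = i\theta X(0)e^{-\alpha t} + i\theta\frac{\mu}{\alpha}\left(1 - e^{-\alpha t}\right) + \ln E\left[\exp\left(i\theta \int_0^t \sigma(s) e^{-\alpha(t-s)}\,dB(s)\right)\right]. $$

To this end, we use the argument for theorem 2.2 in Nicolato and Venardos (2003), and introduce the $\sigma$-algebra

$$ \mathcal{G}_t := \sigma\{\sigma^2(s),\ 0 \leq s \leq t\} \subset \mathcal{F}_t. $$

By the properties of conditional expectation, it follows,

$$ \begin{aligned} E\left[\exp\left(i\theta \int_0^t \sigma(s)e^{-\alpha(t-s)}\,dB(s)\right)\right] &= E\left[E\left[\exp\left(i\theta \int_0^t \sigma(s)e^{-\alpha(t-s)}\,dB(s)\right) \,\Big|\, \mathcal{G}_t\right]\right] \\ &= E\left[E\left[\exp\left(-\frac{1}{2}\theta^2 \int_0^t \sigma^2(s)e^{-2\alpha(t-s)}\,ds\right) \,\Big|\, \mathcal{G}_t\right]\right] \\ &= E\left[\exp\left(-\frac{1}{2}\theta^2 \sum_{j=1}^{n} w_j \int_0^t Y_j(s)e^{-2\alpha(t-s)}\,ds\right)\right]. \end{aligned} $$

Because

$$ Y_j(s) = Y_j(0)e^{-\lambda_j s} + \int_0^s e^{-\lambda_j(s-u)}\,dL_j(u), $$

we find (assuming $2\alpha \neq \lambda_j$ for all $j = 1, \ldots, n$)

$$ \begin{aligned} \int_0^t e^{2\alpha s} Y_j(s)\,ds &= \frac{Y_j(0)}{2\alpha - \lambda_j}\left(e^{(2\alpha-\lambda_j)t} - 1\right) + \int_0^t e^{(2\alpha-\lambda_j)s} \int_0^s e^{\lambda_j u}\,dL_j(u)\,ds \\ &= \frac{Y_j(0)}{2\alpha - \lambda_j}\left(e^{(2\alpha-\lambda_j)t} - 1\right) + \frac{1}{2\alpha - \lambda_j}\int_0^t e^{\lambda_j u}\left(e^{(2\alpha-\lambda_j)t} - e^{(2\alpha-\lambda_j)u}\right) dL_j(u), \end{aligned} $$

where we have applied the stochastic Fubini Theorem (see, e.g., Protter 1990). If $2\alpha = \lambda_j$ for one or more $j$'s, we get by a similar calculation

$$ \int_0^t e^{\lambda_j s} Y_j(s)\,ds = tY_j(0) + \int_0^t (t - u)e^{\lambda_j u}\,dL_j(u). $$

Hence, from the definition of the scaling function $\gamma(u; a, b)$,

$$ \int_0^t Y_j(s)e^{-2\alpha(t-s)}\,ds = Y_j(0)\gamma(t; 2\alpha, \lambda_j) + \int_0^t \gamma(t - u; 2\alpha, \lambda_j)\,dL_j(u). $$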


Appealing to the independence of $L_j$, this leads to

$$ \ln E\left[\exp\left(-\frac{1}{2}\theta^2 \sum_{j=1}^{n} w_j \int_0^t Y_j(s)e^{-2\alpha(t-s)}\,ds\right)\right] = -\frac{1}{2}\theta^2 \sum_{j=1}^{n} w_j Y_j(0)\gamma(t; 2\alpha, \lambda_j) + \sum_{j=1}^{n} \ln E\left[\exp\left(-\frac{1}{2}\theta^2 w_j \int_0^t \gamma(t - u; 2\alpha, \lambda_j)\,dL_j(u)\right)\right]. $$

Using the independent increment property of $L_j$ along with the definition of the cumulant function of $L_j(1)$ yields the desired result. Finally, note that $\psi_j(i f(u))$ is well defined for functions $f$ being nonnegative.

Observe that for a constant volatility $\sigma$, when $\lambda_j = \psi_j = 0$ holds for all $j$, we have

$$ \psi_t(\theta) = i\theta X(0)e^{-\alpha t} + i\theta\frac{\mu}{\alpha}\left(1 - e^{-\alpha t}\right) - \frac{1}{2}\theta^2 \frac{\sigma^2(0)}{2\alpha}\left(1 - e^{-2\alpha t}\right). $$

Not surprisingly, this cumulant coincides with that of the standard Schwartz model. In the limit when $t \to \infty$, $X(t)$ is normally distributed with mean $\mu/\alpha$ and variance equal to $\sigma^2(0)/2\alpha$. For the spot model with stochastic volatility, we find the cumulant function of the stationary distribution of $X(t)$ in Proposition 2.3.

PROPOSITION 2.3. Assume that for $j = 1, \ldots, n$

$$ \int_{z \geq 2} \ln z\,\ell_j(dz) < \infty. $$

Then, when $t \to \infty$, there exists a stationary distribution for $X(t)$ with cumulant function

$$ \psi_\infty(\theta) = i\theta\frac{\mu}{\alpha} + \sum_{j=1}^{n} \int_0^\infty \psi_j\!\left(\frac{1}{2} i\theta^2 w_j \gamma(u; 2\alpha, \lambda_j)\right) du. $$

Proof. We adapt the argument in theorem 17.5 of Sato (1999). According to Bochner's Theorem (see, e.g., proposition 2.5 in Sato 1999), we must show that the limit $\psi_\infty$ exists and is continuous at zero. This holds if

$$ \int_0^\infty \sup_{|\theta| \leq a} \left|\psi_j\!\left(\frac{1}{2} i\theta^2 w_j \gamma(u; 2\alpha, \lambda_j)\right)\right| du < \infty, \quad j = 1, \ldots, n, $$

for every $a > 0$. Consider first the case $2\alpha \neq \lambda_j$, and assume without loss of generality $2\alpha > \lambda_j$. Note that we can majorize as follows when $x$ and $y$ are two positive numbers:

$$ 1 - e^{-xy} \leq xy e^{x}\,\mathbf{1}(0 \leq y \leq 1) + \mathbf{1}(y > 1). $$


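Hence, by letting $x = (1 - e^{-(2\alpha-\lambda_j)u})\theta^2 w_j / 2(2\alpha - \lambda_j)$ and $y = e^{-\lambda_j u} z$, we can estimate the cumulant function as follows:

$$ \begin{aligned} \left|\psi_j\!\left(\frac{1}{2} i\theta^2 w_j \gamma(u; 2\alpha, \lambda_j)\right)\right| &= \int_0^\infty \left\{1 - \exp\left(-\frac{1}{2}\theta^2 \frac{w_j}{2\alpha - \lambda_j}\left(1 - e^{-(2\alpha-\lambda_j)u}\right)e^{-\lambda_j u} z\right)\right\} \ell_j(dz) \\ &\leq \int_0^\infty \frac{1}{2}\theta^2 \frac{w_j}{2\alpha - \lambda_j}\left(1 - e^{-(2\alpha-\lambda_j)u}\right)e^{-\lambda_j u} z \exp\left(\frac{1}{2}\theta^2 \frac{w_j}{2\alpha - \lambda_j}\left(1 - e^{-(2\alpha-\lambda_j)u}\right)\right) \mathbf{1}(0 \leq e^{-\lambda_j u} z \leq 1)\,\ell_j(dz) \\ &\qquad + \int_0^\infty \mathbf{1}(e^{-\lambda_j u} z > 1)\,\ell_j(dz) \\ &\leq \frac{1}{2}\theta^2 \frac{w_j}{2\alpha - \lambda_j} e^{\theta^2 w_j / 2(2\alpha - \lambda_j)} \int_0^\infty z\,\mathbf{1}(0 \leq e^{-\lambda_j u} z \leq 1)\,\ell_j(dz)\,e^{-\lambda_j u} + \int_0^\infty \mathbf{1}(e^{-\lambda_j u} z > 1)\,\ell_j(dz). \end{aligned} $$

Thus,

$$ \begin{aligned} \int_0^\infty \sup_{|\theta| \leq a} \left|\psi_j\!\left(\frac{1}{2} i\theta^2 w_j \gamma(u; 2\alpha, \lambda_j)\right)\right| du &\leq \frac{1}{2}a^2 \frac{w_j}{2\alpha - \lambda_j} e^{a^2 w_j / 2(2\alpha - \lambda_j)} \int_0^\infty \int_0^\infty z\,\mathbf{1}(0 \leq e^{-\lambda_j u} z \leq 1)\,\ell_j(dz)\,e^{-\lambda_j u}\,du \\ &\qquad + \int_0^\infty \int_0^\infty \mathbf{1}(e^{-\lambda_j u} z > 1)\,\ell_j(dz)\,du. \end{aligned} $$

However,

$$ \int_0^\infty \int_0^\infty \mathbf{1}(e^{-\lambda_j u} z > 1)\,du\,\ell_j(dz) = \int_1^\infty \int_0^\infty \mathbf{1}(u < \ln z / \lambda_j)\,du\,\ell_j(dz) = \frac{1}{\lambda_j} \int_1^\infty \ln z\,\ell_j(dz), $$

which is finite by the assumption on $\ell_j(dz)$. Furthermore,

$$ \begin{aligned} \int_0^\infty \int_0^\infty z\,\mathbf{1}(e^{-\lambda_j u} z \leq 1)e^{-\lambda_j u}\,du\,\ell_j(dz) &= \int_0^1 z \int_0^\infty e^{-\lambda_j u}\,du\,\ell_j(dz) + \int_1^\infty z \int_0^\infty e^{-\lambda_j u}\,\mathbf{1}(z \leq e^{\lambda_j u})\,du\,\ell_j(dz) \\ &= \frac{1}{\lambda_j} \int_0^\infty (z \wedge 1)\,\ell_j(dz). \end{aligned} $$

This is also finite by the properties of the Lévy measure $\ell_j(dz)$. Hence, this proves the proposition for $2\alpha \neq \lambda_j$. If $2\alpha = \lambda_j$ we proceed as follows. Naturally, $\lambda_j > 0$, and we can choose an $\epsilon > 0$ such that $\epsilon < \lambda_j$ and $u \exp(-\lambda_j u) \leq c \exp(-(\lambda_j - \epsilon)u)$ for a suitable constant $c > 0$. Next, choose $x = \theta^2 w_j c / 2$ and $y = \exp(-(\lambda_j - \epsilon)u) z$ in the inequality above. Following the same arguments leads to the conclusion of the Proposition also for this case. Hence, the proof is complete.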
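The scaling function $\gamma$ of (2.7), which enters the stationary cumulant of Proposition 2.3 through $\gamma(u; 2\alpha, \lambda_j)$, is easy to explore numerically. The following Python sketch is illustrative only: the function names (`gamma_scale`, `s_star`, `integral_gamma`) and the parameter values are assumptions made for this example and are not taken from the paper. It evaluates $\gamma$, the maximizer $s^*$ of Lemma 2.1, and a trapezoidal approximation of $\int_0^\infty \gamma(s; a, b)\,ds$, which by elementary integration equals $1/(ab)$.

```python
import math


def gamma_scale(s: float, a: float, b: float) -> float:
    """Scaling function gamma(s; a, b) of (2.7) for a, b > 0 and s >= 0."""
    if math.isclose(a, b, rel_tol=1e-12):
        return s * math.exp(-a * s)
    return (math.exp(-b * s) - math.exp(-a * s)) / (a - b)


def s_star(a: float, b: float) -> float:
    """Location of the maximum of gamma(.; a, b), as stated in Lemma 2.1."""
    if math.isclose(a, b, rel_tol=1e-12):
        return 1.0 / a
    return (math.log(a) - math.log(b)) / (a - b)


def integral_gamma(a: float, b: float, upper: float = 200.0,
                   steps: int = 200_000) -> float:
    """Trapezoidal approximation of the integral of gamma over [0, upper].

    The upper limit is an assumption: it must be large enough that the
    exponential tail beyond it is negligible for the chosen a and b.
    """
    h = upper / steps
    total = 0.5 * (gamma_scale(0.0, a, b) + gamma_scale(upper, a, b))
    for k in range(1, steps):
        total += gamma_scale(k * h, a, b)
    return total * h


# Hypothetical inputs: a = 2 * alpha and b = lambda with alpha = 0.127, lambda = 1.11.
a, b = 2 * 0.127, 1.11
peak = s_star(a, b)
area = integral_gamma(a, b)
```

With these inputs, `gamma_scale(peak, a, b)` is larger than the values slightly to either side of `peak`, and `area` agrees with `1 / (a * b)` up to the discretization error of the trapezoidal rule.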


Remark in passing that the logarithmic integrability condition on the Lévy measures is the same as in Sato (1999), theorem 17.5. The condition may be slightly weakened by not majorizing $\gamma(u; 2\alpha, \lambda_j)$ by an exponentially decaying function but instead exploiting its monotonicity properties. However, this will come at the cost of a much more complicated integrability condition which is not as simple to verify as the one above. Further, remark that the logarithmic integrability condition ensures that each of the factors $Y_j$ in the volatility process has a stationary limit (see Sato 1999, theorem 17.5 again).

Note that $\psi_j(\theta)$ are all complex-valued because they are the cumulants of subordinators. Thus, $\psi_j(i \frac{1}{2}\theta^2 w_j \gamma(u; 2\alpha, \lambda_j))$ becomes real-valued. We conclude that $X(t) - \mu/\alpha$ has a centered and symmetric stationary distribution. In fact, the stationary distribution of $X(t) - \mu/\alpha$ is a sum of mean–variance mixture models. Indeed, each term in the cumulant function of the stationary distribution can be viewed as the cumulant of a normal distribution with zero mean and variance given by a random variable having a cumulant defined as

$$ \widetilde{\psi}_j(\theta) := \int_0^\infty \psi_j(\theta w_j \gamma(u; 2\alpha, \lambda_j))\,du. $$

Appealing to the results in Proposition 2.3, it is straightforward to see that $\widetilde{\psi}_j(\theta)$ is the limiting cumulant for the stochastic process $w_j \int_0^t \gamma(t - u; 2\alpha, \lambda_j)\,dL_j(u)$.

The variance of the stationary distribution of $X(t)$ can be calculated from the expression $-\psi''_\infty(0)$. It holds that

$$ \psi''_\infty(\theta) = i \sum_{j=1}^{n} w_j \int_0^\infty \psi'_j\!\left(\frac{1}{2} i\theta^2 w_j \gamma(u; 2\alpha, \lambda_j)\right) \gamma(u; 2\alpha, \lambda_j)\,du - \theta^2 \sum_{j=1}^{n} w_j^2 \int_0^\infty \psi''_j\!\left(\frac{1}{2} i\theta^2 w_j \gamma(u; 2\alpha, \lambda_j)\right) \gamma^2(u; 2\alpha, \lambda_j)\,du. $$

Letting $\theta = 0$ implies a variance given by

$$ -\psi''_\infty(0) = -i \sum_{j=1}^{n} w_j \psi'_j(0) \int_0^\infty \gamma(u; 2\alpha, \lambda_j)\,du = \sum_{j=1}^{n} w_j \frac{E[L_j(1)]}{2\alpha\lambda_j}. $$

Because

$$ \begin{aligned} E[Y_j(t)] &= Y_j(0)e^{-\lambda_j t} + E\left[\int_0^t e^{-\lambda_j(t-u)}\,dL_j(u)\right] \\ &= Y_j(0)e^{-\lambda_j t} + \int_0^\infty z \int_0^t e^{-\lambda_j(t-u)}\,du\,\ell_j(dz) \\ &= Y_j(0)e^{-\lambda_j t} + \lambda_j^{-1}\left(1 - e^{-\lambda_j t}\right) \int_0^\infty z\,\ell_j(dz), \end{aligned} $$

it holds that

$$ \lim_{t \to \infty} E[\sigma^2(t)] = \sum_{j=1}^{n} w_j \frac{1}{\lambda_j} \int_0^\infty z\,\ell_j(dz) = \sum_{j=1}^{n} w_j \frac{E[L_j(1)]}{\lambda_j}. $$


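Thus, the variance of the limiting distribution of $X(t)$ is equal to

$$ \lim_{t \to \infty} E[\sigma^2(t)] / 2\alpha, $$

corresponding to the constant volatility case $\sigma^2 := \lim_{t \to \infty} E[\sigma^2(t)]$ in the Schwartz model (see Schwartz 1997). Obviously, the considerations here only hold as long as the expectation of $\sigma^2(t)$ is finite, which is indeed the case when $\int_1^\infty z\,\ell_j(dz) < \infty$ for all $j = 1, \ldots, n$.

Consider an example. Let $n = 1$ and assume that $L(t)$ is a compound Poisson process

$$ L(t) = \sum_{k=1}^{N(t)} J_k, $$

where the jump intensity of the Poisson process $N(t)$ is the constant $\rho > 0$ and $J_k$ are independent and identically Gamma distributed random variables with parameters $\beta > 0$ and $\nu > 0$. The density of $J_k$ is

$$ \frac{\beta^\nu}{\Gamma(\nu)} x^{\nu - 1} \exp(-\beta x). $$

Hence, the cumulant function of $J_k$ is

$$ \psi_J(\theta) = \nu \ln\left(\frac{\beta}{\beta - i\theta}\right), $$

and therefore the cumulant function of $L(1)$ becomes

$$ \psi(\theta) = \rho\left(\left(\frac{\beta}{\beta - i\theta}\right)^{\nu} - 1\right). $$

The stationary distribution of $X(t)$ can be characterized by the cumulant

$$ \psi_\infty(\theta) = i\theta\frac{\mu}{\alpha} + \rho \int_0^\infty \left(\frac{1}{\left(1 + \dfrac{\theta^2}{2\beta}\gamma(u; 2\alpha, \lambda)\right)^{\nu}} - 1\right) du. $$

Supposing that $2\alpha > \lambda$, we have

$$ \gamma(u; 2\alpha, \lambda) \leq \frac{1}{2\alpha - \lambda} e^{-\lambda u}. $$

Thus, the cumulant $\psi_\infty(\theta) - i\theta\mu/\alpha$ is dominating the cumulant function

$$ \rho \int_0^\infty \left(\frac{1}{\left(1 + \dfrac{\theta^2}{2\beta(2\alpha - \lambda)} e^{-\lambda u}\right)^{\nu}} - 1\right) du. $$

For $\nu = 1$, we get

$$ -\frac{\rho}{\lambda} \ln\left(1 + \frac{\theta^2}{2\beta(2\alpha - \lambda)}\right). $$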
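The closed form for $\nu = 1$ follows from the substitution $v = c\,e^{-\lambda u}$ with $c = \theta^2 / (2\beta(2\alpha - \lambda))$, and it can be checked against a direct quadrature of the dominating integral. The sketch below is a hedged illustration: the function names and the parameter values are hypothetical (chosen only so that $2\alpha > \lambda$ holds), and the midpoint rule with a truncated upper limit is an assumption of this example rather than anything used in the paper.

```python
import math


def dominating_cumulant(theta, rho, beta, alpha, lam, nu, steps=100_000):
    """Midpoint-rule approximation of

        rho * int_0^inf [ (1 + c * exp(-lam * u)) ** (-nu) - 1 ] du,

    where c = theta**2 / (2 * beta * (2 * alpha - lam)) and 2 * alpha > lam.
    The integral is truncated at 40 / lam, where exp(-lam * u) = exp(-40).
    """
    if not 2.0 * alpha > lam:
        raise ValueError("this example requires 2 * alpha > lam")
    c = theta ** 2 / (2.0 * beta * (2.0 * alpha - lam))
    upper = 40.0 / lam
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += (1.0 + c * math.exp(-lam * u)) ** (-nu) - 1.0
    return rho * total * h


def closed_form_nu_one(theta, rho, beta, alpha, lam):
    """Closed form -(rho / lam) * ln(1 + c) for the case nu = 1."""
    c = theta ** 2 / (2.0 * beta * (2.0 * alpha - lam))
    return -(rho / lam) * math.log1p(c)


# Hypothetical parameter values, for illustration only.
params = dict(theta=1.5, rho=2.0, beta=5.0, alpha=1.0, lam=0.5)
numeric = dominating_cumulant(nu=1.0, **params)
exact = closed_form_nu_one(**params)
```

For these inputs `numeric` and `exact` agree to well within the quadrature error, and both are negative, as expected for the cumulant of a centered symmetric law evaluated at a real argument.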


This logarithmic expression is recognized as the cumulant of a variance mixture model of a normal distribution with a $\Gamma(\rho/\lambda, 1/\beta(2\alpha - \lambda))$-variable, known as the variance-gamma model (see Carr et al. 2002; Benth, Šaltytė-Benth, and Koekebakker 2008).

The next point of investigation is the covariance structure of $X(t)$. From an empirical point of view, we are in particular interested in the stationary autocorrelation function for the deseasonalized log-spot prices. It turns out that the stochastic volatility does not affect this. To have finiteness of the variance of $X(t)$, we from now on assume that

$$ \int_1^\infty z\,\ell_j(dz) < \infty, \quad j = 1, \ldots, n. \tag{2.9} $$

First, we derive the covariance structure for $X(t)$.

PROPOSITION 2.4. It holds that

$$ \mathrm{Cov}(X(t), X(t + \tau)) = \mathrm{Var}(X(t)) \times \exp(-\alpha\tau), $$

for $t, \tau \geq 0$, where

$$ \mathrm{Var}(X(t)) = \sum_{j=1}^{n} w_j \left\{ Y_j(0)\gamma(t; 2\alpha, \lambda_j) + \int_0^\infty z\,\ell_j(dz) \int_0^t \gamma(u; 2\alpha, \lambda_j)\,du \right\}. $$

Proof. The Itô Isometry yields

$$ \begin{aligned} \mathrm{Cov}(X(t), X(t + \tau)) &= E\left[\int_0^t \sigma(s)e^{-\alpha(t-s)}\,dB(s) \int_0^{t+\tau} \sigma(s)e^{-\alpha(t+\tau-s)}\,dB(s)\right] \\ &= e^{-\alpha\tau} \int_0^t E[\sigma^2(s)]e^{-2\alpha(t-s)}\,ds. \end{aligned} $$

Observe that the integral on the right-hand side is equal to $\mathrm{Var}(X(t))$. Because

$$ \begin{aligned} E[Y_j(s)] &= Y_j(0)e^{-\lambda_j s} + E\left[\int_0^s e^{-\lambda_j(s-u)}\,dL_j(u)\right] \\ &= Y_j(0)e^{-\lambda_j s} + \int_0^\infty z \int_0^s e^{-\lambda_j(s-u)}\,du\,\ell_j(dz) \\ &= Y_j(0)e^{-\lambda_j s} + \lambda_j^{-1}\left(1 - e^{-\lambda_j s}\right) \int_0^\infty z\,\ell_j(dz), \end{aligned} $$

the result follows from a straightforward integration.

In stationarity, that is, when $t \to \infty$, we obtain

$$ \mathrm{Corr}(X(t), X(t + \tau)) \sim \exp(-\alpha\tau). $$

Hence, there is no effect of the stochastic volatility in the autocorrelation function of log-spot prices.
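The "straightforward integration" behind Proposition 2.4 can be cross-checked numerically. The following Python sketch is an illustration under stated assumptions: each volatility factor is summarized by a hypothetical tuple `(w_j, lambda_j, Y_j(0), m1_j)` with `m1_j` standing for $\int_0^\infty z\,\ell_j(dz) = E[L_j(1)]$, and all names and numbers are invented for the example. It compares the closed form for $\mathrm{Var}(X(t))$ with a midpoint-rule evaluation of $\int_0^t E[\sigma^2(s)]e^{-2\alpha(t-s)}\,ds$.

```python
import math


def gamma_scale(s, a, b):
    """Scaling function gamma(s; a, b) of (2.7)."""
    if math.isclose(a, b, rel_tol=1e-12):
        return s * math.exp(-a * s)
    return (math.exp(-b * s) - math.exp(-a * s)) / (a - b)


def int_gamma(t, a, b):
    """Closed form of the integral of gamma(u; a, b) over u in [0, t]."""
    if math.isclose(a, b, rel_tol=1e-12):
        return (1.0 - math.exp(-a * t) * (1.0 + a * t)) / a ** 2
    return ((1.0 - math.exp(-b * t)) / b - (1.0 - math.exp(-a * t)) / a) / (a - b)


def var_x(t, alpha, factors):
    """Var(X(t)) from Proposition 2.4; factors holds (w, lam, y0, m1) tuples."""
    return sum(
        w * (y0 * gamma_scale(t, 2 * alpha, lam) + m1 * int_gamma(t, 2 * alpha, lam))
        for w, lam, y0, m1 in factors
    )


def var_x_quadrature(t, alpha, factors, steps=20_000):
    """Midpoint rule for int_0^t E[sigma^2(s)] exp(-2 alpha (t - s)) ds."""
    h = t / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        mean_sigma2 = sum(
            w * (y0 * math.exp(-lam * s) + m1 * (1.0 - math.exp(-lam * s)) / lam)
            for w, lam, y0, m1 in factors
        )
        total += mean_sigma2 * math.exp(-2 * alpha * (t - s))
    return total * h


def cov_x(t, tau, alpha, factors):
    """Cov(X(t), X(t + tau)) = Var(X(t)) * exp(-alpha * tau)."""
    return var_x(t, alpha, factors) * math.exp(-alpha * tau)


# Hypothetical two-factor specification (weights sum to one).
factors = [(0.6, 1.11, 0.02, 0.03), (0.4, 0.2, 0.01, 0.004)]
alpha = 0.127
closed = var_x(5.0, alpha, factors)
numeric = var_x_quadrature(5.0, alpha, factors)
```

The two values agree up to quadrature error, and for large `t` the closed form approaches $\sum_j w_j E[L_j(1)] / (2\alpha\lambda_j)$, the stationary variance derived earlier.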


Consider next the adjusted logreturns of the spot over the interval $[t, t + \Delta)$, defined as

$$ R_\alpha(t, \Delta) := X(t + \Delta) - e^{-\alpha\Delta} X(t) - \frac{\mu}{\alpha}\left(1 - e^{-\alpha\Delta}\right) = \int_t^{t+\Delta} \sigma(s)e^{-\alpha(t+\Delta-s)}\,dB(s), \tag{2.10} $$

for $t, \Delta > 0$. Letting $\Delta$ be small, a reasonable approximation (in distribution) is

$$ R_\alpha(t, \Delta) \approx \sqrt{\frac{1 - e^{-2\alpha\Delta}}{2\alpha\Delta}}\,\sigma(t)\,\Delta B(t) \tag{2.11} $$

with $\Delta B(t) = B(t + \Delta) - B(t)$. Hence, $R_\alpha(t, \Delta)$ can be thought of as a normal variance mixture, because approximately, $R_\alpha(t, \Delta)$ conditioned on $\sigma^2(t)$ is normally distributed with mean zero and variance equal to

$$ \frac{1 - e^{-2\alpha\Delta}}{2\alpha}\,\sigma^2(t). $$

By appropriately choosing the stationary distribution of $\sigma^2(t)$, we can obtain a rich class of distributions modeling the adjusted logreturns. For instance, if $\sigma^2(t)$ is generalized inverse Gaussian distributed, the adjusted logreturns will become generalized hyperbolic distributed, with particular cases including the hyperbolic, Student t, and normal inverse Gaussian distributions. The generalized hyperbolic distribution has been applied to model gas prices in Benth and Šaltytė-Benth (2004) and electricity prices in Eberlein and Stahl (2003). We refer to Barndorff-Nielsen and Shephard (2001) for more on such mean–variance mixture models. Another popular choice in financial models is the CGMY distribution, which appears as the mixture of a variance-gamma with a normal distribution (see Carr et al. 2002).

The motivation for the approximation (2.11) of $R_\alpha(t, \Delta)$ comes from the definition of stochastic integration. Using the standard limiting arguments of step functions validates theoretically the approximation when $\Delta \downarrow 0$. In this respect, the scaling function $\sqrt{(1 - e^{-2\alpha\Delta})/2\alpha\Delta}$ is slightly unusual. However, it is chosen so that the approximation in (2.11) is equal in variance to $R_\alpha(t, \Delta)$ when conditioning on the path of $\sigma(s)$ for $s \in [t, t + \Delta)$. One may use the Itô Isometry to calculate error bounds for the approximation.

Letting time $t$ run over a grid $t = 0, \Delta, 2\Delta, \ldots$, we consider the correlation structure for the adjusted returns. Note that due to the independent increment property of Brownian motion and the Itô Isometry,

$$ \mathrm{Corr}(R_\alpha(t + \tau, \Delta), R_\alpha(t, \Delta)) = 0, $$

for $\tau \geq \Delta$. Hence, the adjusted logreturns are uncorrelated (but not necessarily independent because they are in general not normal).
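To make the construction of the adjusted logreturns concrete, the following Python sketch simulates a one-factor version of the model on a grid and then recovers the adjusted logreturns via (2.10). Everything specific here is an assumption of the example, not of the paper: the subordinator is taken to be compound Poisson with exponentially distributed jumps, the volatility is frozen at the left endpoint of each step as in the approximation (2.11), and the function names and parameter values are hypothetical.

```python
import math
import random


def simulate_path(n_steps, delta, alpha, mu, lam, rho, jump_mean, x0, y0, rng):
    """Simulate X and sigma^2 = Y (single factor) on the grid k * delta.

    Y is updated exactly for a compound Poisson subordinator with intensity
    rho and exponential jumps of mean jump_mean. X is advanced with the
    Gaussian step implied by approximation (2.11).
    Returns (xs, ys, shocks); shocks[k] is the noise term added in step k.
    """
    decay_x = math.exp(-alpha * delta)
    scale = math.sqrt((1.0 - math.exp(-2.0 * alpha * delta)) / (2.0 * alpha))
    xs, ys, shocks = [x0], [y0], []
    for _ in range(n_steps):
        x, y = xs[-1], ys[-1]
        shock = scale * math.sqrt(y) * rng.gauss(0.0, 1.0)
        # Jumps of L arriving in the current step, discounted to the step end.
        y_next = y * math.exp(-lam * delta)
        arrival = rng.expovariate(rho)
        while arrival < delta:
            jump = rng.expovariate(1.0 / jump_mean)
            y_next += math.exp(-lam * (delta - arrival)) * jump
            arrival += rng.expovariate(rho)
        xs.append(decay_x * x + (mu / alpha) * (1.0 - decay_x) + shock)
        ys.append(y_next)
        shocks.append(shock)
    return xs, ys, shocks


def adjusted_logreturns(xs, delta, alpha, mu):
    """Adjusted logreturns R_alpha(t, delta) of (2.10) along a grid path."""
    decay = math.exp(-alpha * delta)
    drift = (mu / alpha) * (1.0 - decay)
    return [xs[k + 1] - decay * xs[k] - drift for k in range(len(xs) - 1)]


# Hypothetical parameter values; the seed only makes the run reproducible.
rng = random.Random(0)
xs, ys, shocks = simulate_path(
    n_steps=500, delta=1.0, alpha=0.127, mu=0.0, lam=1.11,
    rho=0.5, jump_mean=0.02, x0=0.0, y0=0.01, rng=rng,
)
returns = adjusted_logreturns(xs, delta=1.0, alpha=0.127, mu=0.0)
```

By construction, `returns` reproduces the simulated noise terms `shocks` up to rounding, and the volatility factor in `ys` stays nonnegative because the subordinator only jumps upward. A path generated this way can be used to inspect the sample autocorrelation of the returns and of their squares.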
The dependency structure shows up in the squared adjusted returns, as the next proposition shows. But first we impose a stronger integrability hypothesis on $\ell_j(dz)$ to ensure finite variances and covariances of the adjusted logreturns. Suppose from now on that

$$ \int_1^\infty z^2\,\ell_j(dz) < \infty, \tag{2.12} $$

for $j = 1, \ldots, n$.


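PROPOSITION 2.5. It holds that

$$ \mathrm{Cov}\left(R_\alpha^2(t + \tau, \Delta), R_\alpha^2(t, \Delta)\right) = e^{-4\alpha\Delta} e^{-2\alpha(2t+\tau)} \int_{t+\tau}^{t+\tau+\Delta} \int_t^{t+\Delta} e^{2\alpha(u+v)}\,\mathrm{Cov}(\sigma^2(u), \sigma^2(v))\,dv\,du, $$

with $\tau \geq \Delta$.

Proof. We consider the covariance between $\left(\int_{t+\tau}^{t+\tau+\Delta} \sigma(u)e^{\alpha u}\,dB(u)\right)^2$ and $\left(\int_t^{t+\Delta} \sigma(v)e^{\alpha v}\,dB(v)\right)^2$. Use the law of double conditioning to find that

$$ \begin{aligned} & E\left[\left(\int_{t+\tau}^{t+\tau+\Delta} \sigma(u)e^{\alpha u}\,dB(u)\right)^2 \left(\int_t^{t+\Delta} \sigma(v)e^{\alpha v}\,dB(v)\right)^2\right] \\ &\quad = E\left[\left(\int_t^{t+\Delta} \sigma(v)e^{\alpha v}\,dB(v)\right)^2 E\left[\left(\int_{t+\tau}^{t+\tau+\Delta} \sigma(u)e^{\alpha u}\,dB(u)\right)^2 \,\Big|\, \mathcal{F}_{t+\Delta}\right]\right] \\ &\quad = E\left[\left(\int_t^{t+\Delta} \sigma(v)e^{\alpha v}\,dB(v)\right)^2 \int_{t+\tau}^{t+\tau+\Delta} e^{2\alpha u} E\left[\sigma^2(u) \,\big|\, \mathcal{F}_{t+\Delta}\right] du\right] \\ &\quad = \int_{t+\tau}^{t+\tau+\Delta} e^{2\alpha u} E\left[\sigma^2(u) \left(\int_t^{t+\Delta} \sigma(v)e^{\alpha v}\,dB(v)\right)^2\right] du, \end{aligned} $$

where we have used the definition of the Itô integral along with the independent increment property of Brownian motion in the second to last equality, and the law of double expectation in the last. Using the definition of the Itô integral and the independent increment property of the Brownian motion, we obtain the desired result.

Letting $\Delta$ be small, we have

$$ \mathrm{Cov}\left(R_\alpha^2(t + \tau, \Delta), R_\alpha^2(t, \Delta)\right) \approx e^{-4\alpha\Delta}\,\mathrm{Cov}(\sigma^2(t + \tau), \sigma^2(t))\,\Delta^2. $$

In Lemma 2.6 we calculate the covariance of $\sigma^2(t)$ and $\sigma^2(s)$.

LEMMA 2.6. It holds that

$$ \mathrm{Cov}(\sigma^2(t), \sigma^2(s)) = \sum_{j=1}^{n} \frac{w_j^2}{2\lambda_j} \int_0^\infty z^2\,\ell_j(dz)\,e^{-\lambda_j(t-s)}\left(1 - e^{-2\lambda_j s}\right), $$

with $t > s$.

Proof. Because the random variables $Y_j(t)$, $j = 1, \ldots, n$, are mutually independent, it holds that

$$ \mathrm{Cov}(\sigma^2(t), \sigma^2(s)) = \sum_{j=1}^{n} w_j^2\,\mathrm{Cov}(Y_j(t), Y_j(s)). $$

Compensating $L_j(t)$ by $t \int_0^\infty z\,\ell_j(dz)$, we can write

$$ Y_j(t) = Y_j(0)e^{-\lambda_j t} + \int_0^t e^{-\lambda_j(t-u)} \int_0^\infty z\,\ell_j(dz)\,du + \int_0^t e^{-\lambda_j(t-u)}\,d\widetilde{L}_j(u), $$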


for a martingale $\widetilde{L}_j$ with zero mean. Hence, for $t > s$,

$$ \begin{aligned} \mathrm{Cov}(Y_j(t), Y_j(s)) &= E\left[\int_0^t e^{-\lambda_j(t-u)}\,d\widetilde{L}_j(u) \int_0^s e^{-\lambda_j(s-u)}\,d\widetilde{L}_j(u)\right] \\ &= e^{-\lambda_j(t+s)} E\left[\left(\int_0^s e^{\lambda_j u}\,d\widetilde{L}_j(u)\right)^2\right] \\ &= e^{-\lambda_j(t+s)} \int_0^s e^{2\lambda_j u}\,du \int_0^\infty z^2\,\ell_j(dz). \end{aligned} $$

The result of the Lemma follows.

For small $\Delta$ we can conclude from Lemma 2.6 that

$$ \mathrm{Var}\left(R_\alpha^2(t, \Delta)\right) \approx e^{-4\alpha\Delta} \sum_{j=1}^{n} \frac{w_j^2}{2\lambda_j} \int_0^\infty z^2\,\ell_j(dz)\left(1 - e^{-2\lambda_j t}\right)\Delta^2. $$

In stationarity, after letting $t \to \infty$, we find that the autocorrelation function of the squared adjusted logreturns becomes (still assuming $\Delta$ being small and $\tau \geq \Delta$)

$$ \mathrm{Corr}\left(R_\alpha^2(t + \tau, \Delta), R_\alpha^2(t, \Delta)\right) \approx \sum_{j=1}^{n} \widehat{w}_j e^{-\lambda_j \tau}, \tag{2.13} $$

for constants

$$ \widehat{w}_j = \frac{\dfrac{w_j^2}{2\lambda_j} \displaystyle\int_0^\infty z^2\,\ell_j(dz)}{\displaystyle\sum_{k=1}^{n} \dfrac{w_k^2}{2\lambda_k} \int_0^\infty z^2\,\ell_k(dz)}, $$

summing to one. Hence, the autocorrelation structure of the adjusted squared returns is described by a sum of exponentials with decay rates given by the mean-reversion of the stochastic volatility factors.

Before moving to the empirical example, we include a discussion of skewness in the adjusted logreturns. The adjusted logreturns will be symmetrically distributed in the specified model. However, we can account for skewness by considering the following extension of the deseasonalized log-spot price process,

$$ dX(t) = \left(\mu + \beta\sigma^2(t) - \alpha X(t)\right) dt + \sigma(t)\,dB(t). \tag{2.14} $$

The mean reversion level will be $\mu + \beta\sigma^2(t)$, that is, depending on the stochastic volatility through a parameter $\beta$. This extension is directly adopted from Barndorff-Nielsen and Shephard (2001), and the conditional adjusted logreturns will be approximately normally distributed with mean $\beta\sigma^2(t)(1 - \exp(-\alpha\Delta))/\alpha$ and variance $\sigma^2(t)(1 - \exp(-2\alpha\Delta))/2\alpha$. In the case we choose $\sigma^2(t)$ to be inverse Gaussian in stationarity, the adjusted logreturns become normal inverse Gaussian distributed with a skewness parameter equal to $\beta$. All the considerations above are easily generalized for the model in (2.14), as well as the analysis to follow. Because our empirical example below did not show any significant skewness, and to keep the analysis slightly more transparent, we have chosen to focus on the case $\beta = 0$ in this paper.
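The normalized weights in (2.13) are simple to compute once the second moments of the Lévy measures are known. The sketch below is illustrative only: each factor is described by a hypothetical tuple `(w_j, lambda_j, m2_j)`, where `m2_j` stands for $\int_0^\infty z^2\,\ell_j(dz)$, and the numerical values are invented for the example rather than taken from the paper.

```python
import math


def acf_weights(factors):
    """Normalized weights of (2.13) from (w_j, lambda_j, m2_j) tuples."""
    raw = [w * w * m2 / (2.0 * lam) for w, lam, m2 in factors]
    total = sum(raw)
    return [r / total for r in raw]


def squared_return_acf(tau, factors):
    """Approximate stationary autocorrelation of squared adjusted logreturns."""
    weights = acf_weights(factors)
    return sum(
        w_hat * math.exp(-lam * tau)
        for w_hat, (_, lam, _) in zip(weights, factors)
    )


# Hypothetical three-factor specification with fast, medium, and slow reversion.
factors = [(0.5, 2.0, 0.004), (0.3, 0.5, 0.002), (0.2, 0.05, 0.001)]
weights = acf_weights(factors)
acf_values = [squared_return_acf(tau, factors) for tau in range(1, 11)]
```

The weights sum to one, and with a single factor the function collapses to the pure exponential $e^{-\lambda_1 \tau}$, whatever the values of $w_1$ and the second moment.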


FIGURE 2.1. Gas spot prices in pence/therm (top panel) and the log-prices together with their fitted seasonal function (bottom panel).

2.2. Empirical Analysis on UK Gas Spot Prices

We analyze empirically a time series of UK gas spot price data using the Schwartz model with stochastic volatility. The data are daily prices collected at the National Balancing Point (NBP) and range from February 6, 2001 to April 27, 2004. Weekends and holidays are disregarded, and the time series contains 806 records. In Figure 2.1 we have plotted the spot prices (top graph). Gas prices are influenced by a seasonally varying demand, and we assume a seasonality function defined by

$$ \ln \Lambda(t) = b_1 + b_2 \sin\left(2\pi b_3 (t - b_4)/250\right). \tag{2.15} $$
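As a small illustration of how (2.15) is used, the Python sketch below evaluates the seasonal function and removes it from a series of prices. It is a sketch under assumptions: the helper names are hypothetical, time is taken to be the trading-day index starting at zero, and the parameter values are the point estimates reported in Table 2.1 of this section, used here only as example inputs.

```python
import math

# Point estimates (b1, b2, b3, b4) as reported in Table 2.1.
UK_GAS_PARAMS = (2.91, -0.287, 1.03, 53.4)


def log_seasonal(t, b1, b2, b3, b4, days_per_year=250.0):
    """ln Lambda(t) of (2.15), with t measured in trading days."""
    return b1 + b2 * math.sin(2.0 * math.pi * b3 * (t - b4) / days_per_year)


def seasonal_level(t, b1, b2, b3, b4, days_per_year=250.0):
    """Seasonal level Lambda(t) in the units of the spot price."""
    return math.exp(log_seasonal(t, b1, b2, b3, b4, days_per_year))


def deseasonalize(prices, params):
    """Deseasonalized log-prices X(t) = ln S(t) - ln Lambda(t).

    Assumes prices[k] is the spot price on trading day k (an assumption of
    this example, since the paper does not state the time origin).
    """
    return [math.log(p) - log_seasonal(t, *params) for t, p in enumerate(prices)]


# A price series that follows the seasonal curve exactly deseasonalizes to zero.
curve = [seasonal_level(t, *UK_GAS_PARAMS) for t in range(250)]
x = deseasonalize(curve, UK_GAS_PARAMS)
```

Since $|\sin| \leq 1$, the log-seasonal level always lies between $b_1 - |b_2|$ and $b_1 + |b_2|$, which gives a quick sanity check on fitted parameters.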

We suppose 250 trading days a year, approximately the average number of trading days for a year in the data set. Fitting the four parameters in the seasonal function to the log-prices using nonlinear least squares¹ yields the estimates reported in Table 2.1. Observe that the estimated frequency parameter $b_3$ is close to 1, telling us that there is an annual cycle in the price dynamics.

¹ We applied the nlinfit function in Matlab.

TABLE 2.1
Parameter Estimates for UK Gas Seasonal Function (2.15)

Parameter    Estimate    95% Confidence interval
b1           2.91        (2.89, 2.93)
b2           −0.287      (−0.313, −0.262)
b3           1.03        (1.01, 1.04)
b4           53.4        (47.5, 59.3)

After deseasonalizing the log-spot prices, we are left with estimating the parameters of the $X(t)$-process in (2.5). Using the dynamics of $X(t)$, we find that

$$ X(t + 1) = \frac{\mu}{\alpha}\left(1 - e^{-\alpha}\right) + e^{-\alpha} X(t) + \int_t^{t+1} \sigma(s)e^{-\alpha(t+1-s)}\,dB(s), $$

that is, a linear regression of tomorrow's log-prices against today's (seasonality subtracted). The residuals are possibly nonnormal due to the stochastic volatility $\sigma(s)$. A scatter plot of the observed $x(t + 1)$ against $x(t)$ is shown in Figure 2.2, and a linear regression exercise estimates the intercept to be 0 (with p-value 0.98) and slope 0.881 (with p-value 0). The $R^2$ is 78%. The intercept is not significantly different from zero, as we may expect because the seasonality function has a level already incorporated. Hence, we estimate $\mu$ to be zero. The 95% confidence interval of the slope estimate is (0.848, 0.913), which is rather tight (approx. 5% variation of the estimated value). The estimated speed of mean reversion becomes $\widehat{\alpha} = 0.127$, which corresponds to a half-life of $\ln 2/\widehat{\alpha} \approx 5.5$ days.

FIGURE 2.2. Scatter plot of ln S(t + 1) against ln S(t).

We next investigate the residuals from the regression. The mean of the residuals is estimated to be zero, while the standard deviation is 0.127, or roughly 200% yearly, a rather high but not untypical number for energy prices. Our model for the residuals is

$$ \int_t^{t+1} \sigma(s)e^{-\alpha(t+1-s)}\,dB(s), $$


FIGURE 2.3. Top panel: Empirical density of the residuals (complete line) with the ﬁtted normal distribution (dashed line). Bottom, left panel: The autocorrelation function of residuals. Bottom, right panel: The autocorrelation function of squared residuals together with the ﬁtted exponential function.

which, in distribution, is approximately equal to $\sigma(t)\sqrt{(1 - e^{-2\alpha})/2\alpha}\;\Delta B(t)$, with $\Delta B(t) = B(t+1) - B(t)$. We recall the properties of $\sigma(t)\Delta B(t)$ as a mean-variance mixture with an exponentially decaying autocorrelation function for the squared residuals. In Figure 2.3 (top panel), the empirical density function (complete line) of the residuals after dividing out the factor $\sqrt{(1 - \exp(-2\alpha))/2\alpha}$ is plotted. The residuals are clearly not normally distributed (the dashed line is the fitted normal distribution). Moreover, the autocorrelation function of the residuals shows zero correlation (see the bottom left panel in Figure 2.3, and footnote 2).

We find the number of factors required in the stochastic volatility model by looking at the autocorrelation function for the squared residuals. In the bottom right panel of Figure 2.3, we show the empirical autocorrelation function of the squared residuals together with a fitted function exp(−λτ). Using nonlinear least squares, we found $\hat{\lambda} = 1.11$ with 95% confidence interval (0.958, 1.255). From inspection of the graph it seems that one factor Y(t) = Y1(t) in the definition of σ²(t) is sufficient. To validate this choice, we tried to fit the sum of two or three exponentials (corresponding to the choices n = 2, 3); however, the quality of the fit did not improve significantly. Hence, we settled with one factor in the stochastic volatility model for this data set, and the estimated speed of mean reversion for σ²(t) = Y(t) becomes $\hat{\lambda} = 1.11$. We remark in passing that Barndorff-Nielsen and Shephard (2001) calibrated the stochastic volatility model to exchange rate data where the best fit to the empirical autocorrelation function of the squared logreturn data was obtained for n = 3. Depending on the data set at hand, one may use the flexibility allowed for in the BNS stochastic volatility model to accurately fit the autocorrelation function of squared (adjusted/log)return data. The various factors can be interpreted as stochastic volatility reverting at different speeds, for instance large jumps followed by a fast reversion together with slowly reverting jumps of smaller size.

The choice of subordinator process L(t) is given from the choice of the residual distribution. In Figure 2.4, we show the empirical density of the residuals together with the fitted normal inverse Gaussian (NIG) distribution. The NIG distribution is a four-parameter family of distributions, with parameters a (tail heaviness), b (skewness), m (location), and d (scale) (see footnote 3). Because we assume centered and symmetric residuals in our model, we must have zero skewness and location in the NIG distribution, leaving us with two parameters a and d to estimate. It is natural to have zero location because we have taken out the mean in the data by the seasonality function. We refer to Barndorff-Nielsen (1998) for more on the NIG distribution and its applications to finance. Except for being slightly more peaked than the empirical density, the NIG seems to fit the residual data very well. The NIG distribution has been used to model various energy spot prices in Benth and Šaltytė-Benth (2004) and Börger et al. (2009), and electricity forwards in Benth, Šaltytė-Benth, and Koekebakker (2008).

Footnote 2: All lags are within a 95% confidence interval around zero. This is not shown on the graphs, but is a result from the estimation.

FIGURE 2.4. NIG distribution (dashed line) and the empirical density (complete line).
From Barndorff-Nielsen and Shephard (2001), one may choose a subordinator process Z(t) driving Y(t) such that in stationarity the residuals are approximately NIG distributed. To have a stationary distribution of Y(t) that is independent of λ, one may let L(t) = Z(λt) (see Barndorff-Nielsen and Shephard 2001 for this modeling choice). Letting Z be a subordinator with inverse Gaussian distributed marginals, we obtain NIG distributed residuals. The parameters of the NIG distribution were estimated using maximum likelihood, and the results are reported in Table 2.2. Estimating a NIG distribution with all four parameters gave basically zero skewness and location, which justifies the assumption of a centered symmetric NIG distribution for the residuals. We conclude that the one-factor model of Schwartz with stochastic volatility provides a reasonable model for the UK gas spot price dynamics.

Footnote 3: The usual notation for these parameters is α, β, μ, and δ.

TABLE 2.2 Parameter Estimates for the NIG Distribution

Parameter   Estimate   95% Confidence interval
a           4.83       (3.65, 6.01)
d           0.071      (0.061, 0.081)
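The point estimates quoted in this section follow from simple transformations of the regression output. The short Python sketch below reproduces that arithmetic; the variable names are illustrative, and the square-root-of-time rule used to annualize the residual standard deviation is an assumption made here, not a step stated in the text.

```python
import math

TRADING_DAYS = 250        # trading days per year, as supposed in the text
slope = 0.881             # regression slope of x(t+1) on x(t)
resid_std_daily = 0.127   # daily standard deviation of the regression residuals

# The slope estimates exp(-alpha), so alpha follows from a logarithm.
alpha_hat = -math.log(slope)
half_life_days = math.log(2.0) / alpha_hat

# Assumption: annualize with the square-root-of-time rule.
resid_std_yearly = resid_std_daily * math.sqrt(TRADING_DAYS)

print(f"alpha_hat        = {alpha_hat:.3f}")
print(f"half-life (days) = {half_life_days:.1f}")
print(f"yearly std       = {resid_std_yearly:.0%}")
```

Up to rounding, the printed values agree with the speed of mean reversion, the half-life, and the yearly volatility reported above.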

2.3. Comparison with the Heston Stochastic Volatility Model

Let us compare the non-Gaussian stochastic volatility dynamics with the Heston model (see Heston 1993). The Heston stochastic volatility model applied to gas and electricity spot prices is discussed in Geman (2005) (see also Eydeland and Geman 1998, for a general discussion of stochastic volatility in power markets), and takes the form σ²(t) = Y(t), where

$$dY(t) = \eta\left(\zeta - Y(t)\right) dt + \delta\sqrt{Y(t)}\, d\tilde{B}(t). \tag{2.16}$$

Here, η, ζ, and δ are positive constants and $\tilde{B}$ is a Brownian motion which in general may be correlated with B. We recognize the dynamics of Y(t) as the Cox-Ingersoll-Ross (CIR) model (see Cox, Ingersoll, and Ross 1981). This is known to have a positive solution Y(t) which admits an explicit characteristic function. The conditional characteristic function for Y(t) is derived in Cox, Ingersoll, and Ross (1981), and presented in the next lemma for convenience.

LEMMA 2.7. The conditional cumulant function of the Heston stochastic volatility model Y(t) in (2.16) is given as

$$\ln E\left[\exp(i\theta Y(T)) \mid Y(t) = y\right] = a(T-t) + b(T-t)\, y,$$

where

$$a(\tau) = \zeta c \ln\left(\frac{c}{c - i\theta(1 - e^{-\eta\tau})}\right), \qquad b(\tau) = i\theta e^{-\eta\tau}\, \frac{c}{c - i\theta(1 - e^{-\eta\tau})},$$

for c = 2η/δ² and θ ∈ R.

Proof. See Cox, Ingersoll, and Ross (1981).
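As a sanity check on Lemma 2.7, the cumulant can be differentiated numerically at θ = 0 and compared with the well-known CIR conditional mean ζ + (y − ζ)e^{−ητ}. The Python sketch below does this; the function names and the parameter values (in particular ζ and y) are hypothetical choices for illustration, not estimates from the text.

```python
import cmath
import math

def cir_cumulant(theta, tau, y, eta, zeta, delta):
    """ln E[exp(i*theta*Y(T)) | Y(t) = y] with tau = T - t, following Lemma 2.7."""
    c = 2.0 * eta / delta**2
    decay = math.exp(-eta * tau)
    ratio = c / (c - 1j * theta * (1.0 - decay))
    a = zeta * c * cmath.log(ratio)
    b = 1j * theta * decay * ratio
    return a + b * y

def conditional_mean(tau, y, eta, zeta, delta, h=1e-5):
    """E[Y(T) | Y(t) = y] from a central difference of the cumulant at theta = 0."""
    up = cir_cumulant(h, tau, y, eta, zeta, delta)
    down = cir_cumulant(-h, tau, y, eta, zeta, delta)
    return ((up - down) / (2.0 * h * 1j)).real

# Illustrative (hypothetical) parameter values.
eta, zeta, delta = 1.11, 0.016, 0.05
y0, tau = 0.06, 2.0

mean_numeric = conditional_mean(tau, y0, eta, zeta, delta)
mean_closed = zeta + (y0 - zeta) * math.exp(-eta * tau)
print(mean_numeric, mean_closed)
```

The two printed numbers should coincide up to the finite-difference error, which supports the reading of a(τ) and b(τ) given in the lemma.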


The probability distribution of Y(t) is non-central χ² (see Cox, Ingersoll, and Ross 1981). Observe that when letting t = 0 and passing to the limit T → ∞, we find that the cumulant has a limit given as

$$\lim_{T\to\infty} \ln E\left[\exp(i\theta Y(T))\right] = \zeta c \ln\left(\frac{c}{c - i\theta}\right),$$

with c as in the lemma above. We recognize this as the cumulant function of a Γ(c, ζc)-distribution, as observed by Cox, Ingersoll, and Ross (1981). This is a special case of the BNS stochastic volatility model, as the following example shows. Consider a one-factor BNS stochastic volatility model driven by a compound Poisson process with exponentially distributed jumps. That is, $L(t) = \sum_{k=1}^{N(t)} J_k$ for a Poisson process N(t) with intensity ρ > 0 and $J_k$ being i.i.d. exponentially distributed with parameter c > 0. Hence, we easily calculate the cumulant function of L(t) to be

$$\psi(\theta) = \rho\, \frac{i\theta}{c - i\theta}.$$

Appealing to theorem 17.5 in Sato (1999) gives that the stationary distribution of the BNS volatility process with speed of mean-reversion equal to λ > 0 has cumulant

$$\psi_Y(\theta) = \int_0^\infty \psi(\theta e^{-\lambda s})\, ds = \frac{\rho}{\lambda} \ln\left(\frac{c}{c - i\theta}\right).$$

Now, by choosing λ and ρ such that the ratio ρ/λ is equal to ζc, we recover the stationary distribution of the Heston stochastic volatility model.

A basic assumption in the BNS stochastic volatility model is the independence between the factors Yj(t) and the Brownian motion B(t) driving the log-price X(t). Supposing that B and $\tilde{B}$ are independent in the Heston model, the adjusted logreturns will become a variance-mixture of a normal distribution with a Gamma-law of the variance. This is very restrictive compared to the flexibility allowed for in the BNS model. In fact, from Barndorff-Nielsen and Shephard (2001), any self-decomposable distribution can be used as a stationary limiting distribution, because this class allows for the existence of subordinators driving the factors in the stochastic volatility process. This opens up a wide variety of normal variance-mixture models for the adjusted logreturns. Furthermore, the BNS model can fit the autocorrelation structure for the squared adjusted returns with a high degree of accuracy because we can use several factors with different speeds of mean-reversion, yielding a sum of exponentials for the autocorrelation structure.

The Heston model allows for correlation between the driving noises in the spot and volatility processes, B and $\tilde{B}$. Hence, one may incorporate a leverage effect in the spot price. In energy markets one frequently talks of the inverse leverage effect, meaning that high prices are associated with high volatility (see Geman 2005, for further discussion of this). In probabilistic terms we can let B and $\tilde{B}$ be positively correlated to model this.
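The identification of the Gamma law as a stationary BNS distribution can be checked numerically. The sketch below, in plain Python, approximates the integral of ψ(θe^{−λs}) over s with a trapezoidal rule and compares it with the closed form (ρ/λ) ln(c/(c − iθ)). The function names, the truncation point `s_max`, the grid size, and the values of c and ζ are assumptions chosen for illustration; only λ = 1.11 is taken from the estimates in Section 2.2.

```python
import cmath
import math

def psi(theta, rho, c):
    """Cumulant function of a compound Poisson process with Exp(c) jumps and intensity rho."""
    return rho * 1j * theta / (c - 1j * theta)

def stationary_cumulant_numeric(theta, rho, c, lam, s_max=60.0, n=200_000):
    """Trapezoidal approximation of the integral of psi(theta*exp(-lam*s)) over [0, s_max]."""
    h = s_max / n
    total = 0.5 * (psi(theta, rho, c) + psi(theta * math.exp(-lam * s_max), rho, c))
    for k in range(1, n):
        total += psi(theta * math.exp(-lam * k * h), rho, c)
    return total * h

def stationary_cumulant_closed(theta, rho, c, lam):
    """Closed form (rho/lam) * ln(c / (c - i*theta))."""
    return (rho / lam) * cmath.log(c / (c - 1j * theta))

lam = 1.11               # speed of mean reversion estimated in Section 2.2
c, zeta = 2.0, 0.06      # hypothetical values for illustration
rho = lam * zeta * c     # chosen so that rho / lam = zeta * c
theta = 1.5

numeric = stationary_cumulant_numeric(theta, rho, c, lam)
closed = stationary_cumulant_closed(theta, rho, c, lam)
gamma_cumulant = zeta * c * cmath.log(c / (c - 1j * theta))
print(abs(numeric - closed), abs(closed - gamma_cumulant))
```

Both printed differences should be close to zero: the first confirms the integral, the second confirms that the choice ρ/λ = ζc reproduces the limiting Heston cumulant.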
The BNS stochastic volatility model presented in this paper will not have such flexibility, because we assume independence between B and the subordinators Lj in the volatility factors. It is difficult to assume any dependence structure between a Brownian motion and Lévy processes (see Kallsen and Tankov 2006, for a discussion on this and Lévy copulas). However, the obvious way out is to include some of the subordinators directly in the log-spot dynamics, following the idea proposed in Barndorff-Nielsen and Shephard (2001). In the context of energy and commodity markets, it may be more natural to have a multifactor model for the log-spot dynamics, where some of the factors are dependent on the stochastic volatility factors. We investigate the various modeling perspectives closer in Benth and Vos (2009), where the results of this paper are generalized.

Let us briefly discuss the stationary distribution of the log-spot prices X(t) with the Heston stochastic volatility model assuming independence between the volatility noise and B(t). Using the argument with conditioning on knowledge of the paths of Y(s) (as we used for the BNS model), we derive

$$E\left[\exp(i\theta X(t))\right] = \exp\left(i\theta\left(X(0)e^{-\alpha t} + \frac{\mu}{\alpha}\left(1 - e^{-\alpha t}\right)\right)\right) E\left[\exp\left(-\frac{1}{2}\theta^2 \int_0^t Y(s)\, e^{-2\alpha(t-s)}\, ds\right)\right].$$

The characterization of a stationary limit of X(t) rests on knowing the limiting distribution (or the cumulant) of

$$\int_0^t Y(s)\, e^{-2\alpha(t-s)}\, ds. \tag{2.17}$$

This is the same problem as for the BNS model; however, in that case the Ornstein–Uhlenbeck dynamics of the Y's makes it possible to calculate the limiting cumulant of (2.17) explicitly. For the Heston model, the problem is more delicate. Another approach is to calculate the conditional characteristic function of X(t) given X(s) for t ≥ s using that this is a Feynman–Kac solution of a partial differential equation. The structure of the partial differential equation suggests an affine solution in terms of X(s) and Y(s); however, the coefficients in the solution will themselves be solutions of Riccati equations which seem difficult to solve analytically. We will come back to this approach in calculating the forward price in the next section. However, given that we have the limiting cumulant of (2.17) for the Heston stochastic volatility dynamics, we see that X(t) is a normal variance-mixture in the limit.

3. PRICING OF FORWARDS AND OPTIONS

In this section, we derive forward and option prices based on our proposed spot model. The forward prices will be explicit in terms of the underlying spot price and the state of the stochastic volatility. We want to derive the forward price F(t, T) at time t for a contract delivering the spot at time T ≥ t. For this purpose, a risk-neutral dynamics for the spot price is required, and it is convenient to introduce a parametric class of risk-neutral probabilities by a Girsanov transform (see, e.g., Karatzas and Shreve 1991),

$$dW(t) = dB(t) - \frac{\theta(t)}{\sigma(t)}\, dt. \tag{3.1}$$

Here, θ(t) is a bounded measurable function on [0, T]. It is frequently called the market price of risk, because it can be viewed as a measure of the degree of unhedgeability of the derivatives based on the spot (see, e.g., Benth, Šaltytė-Benth, and Koekebakker 2008, for more on these issues). Note that the Novikov condition (see, e.g., Karatzas and Shreve 1991) is satisfied on [0, T] because

$$\sigma^2(t) \ge \sum_{j=1}^{n} w_j Y_j(0)\, e^{-\lambda_j t},$$

and θ(t) is bounded. Hence, there exists a probability measure Q such that W is a Q-Brownian motion on [0, T]. The characteristics of Lj remain unchanged under the new probability Q. The Q-dynamics of X becomes

$$dX(t) = \left(\mu + \theta(t) - \alpha X(t)\right) dt + \sigma(t)\, dW(t). \tag{3.2}$$

Furthermore, a straightforward calculation shows that

$$X(T) = X(t)\, e^{-\alpha(T-t)} + \int_t^T \left(\mu + \theta(u)\right) e^{-\alpha(T-u)}\, du + \int_t^T \sigma(u)\, e^{-\alpha(T-u)}\, dW(u), \tag{3.3}$$

for t ≤ T. One may also introduce a change of measure with respect to the jump processes Yj in the stochastic volatility dynamics. For instance, a simple class of measures could be obtained from the Esscher transform, which would exponentially tilt the Lévy jump measure by a coefficient function. The tilting could give more or less emphasis to the big jumps, thereby introducing a risk premium for jump risk. We will not consider this more general risk-neutral measure, but refer to Benth, Šaltytė-Benth, and Koekebakker (2008) for more on the Esscher transform for non-Gaussian Ornstein–Uhlenbeck processes. In the following proposition, we derive an explicit expression for the forward price F(t, T).

PROPOSITION 3.1. Assume for j = 1, . . . , n the exponential integrability condition

$$\int_1^\infty e^{0.5\,\gamma(s^*; 2\alpha, \lambda_j)\, w_j z}\, \ell_j(dz) < \infty,$$

where γ(s; 2α, λj) is defined in (2.7) and s* is its maximum point (see Lemma 2.1). Then we have

$$F(t,T) = \Lambda(T)\, H_\theta(t,T) \left(\frac{S(t)}{\Lambda(t)}\right)^{\exp(-\alpha(T-t))} \exp\left(\frac{1}{2}\sum_{j=1}^{n} w_j\, \gamma(T-t; 2\alpha, \lambda_j)\, Y_j(t)\right),$$

where

$$\ln H_\theta(t,T) = \frac{\mu}{\alpha}\left(1 - e^{-\alpha(T-t)}\right) + \int_t^T \theta(u)\, e^{-\alpha(T-u)}\, du + \sum_{j=1}^{n} \int_0^{T-t} \psi_j\left(-i\,\frac{w_j}{2}\,\gamma(u; 2\alpha, \lambda_j)\right) du.$$

Proof. Appealing to (3.3) and the adaptedness of X(t), we find

$$F(t,T) = E_Q\left[S(T) \mid \mathcal{F}_t\right] = \Lambda(T) \exp\left(X(t)\, e^{-\alpha(T-t)} + \int_t^T \left(\mu + \theta(u)\right) e^{-\alpha(T-u)}\, du\right) \times E_Q\left[\exp\left(\int_t^T \sigma(u)\, e^{-\alpha(T-u)}\, dW(u)\right) \Big|\, \mathcal{F}_t\right].$$

Note that the characteristics of σ(t) remain unchanged by the probability transform from P to Q. We use the same argument as in the proof of Proposition 2.2 to reach that

$$E_Q\left[\exp\left(\int_t^T \sigma(u)\, e^{-\alpha(T-u)}\, dW(u)\right) \Big|\, \mathcal{F}_t\right] = E\left[\exp\left(\frac{1}{2}\int_t^T \sigma^2(u)\, e^{-2\alpha(T-u)}\, du\right) \Big|\, \mathcal{F}_t\right] = \prod_{j=1}^{n} E\left[\exp\left(\frac{w_j}{2}\int_t^T Y_j(u)\, e^{-2\alpha(T-u)}\, du\right) \Big|\, \mathcal{F}_t\right],$$

where the second equality follows by the independence of the stochastic volatility factors Yj. The explicit dynamics of Yj(u), u ≥ t, is given as

$$Y_j(u) = Y_j(t)\, e^{-\lambda_j(u-t)} + \int_t^u e^{-\lambda_j(u-v)}\, dL_j(v),$$

and thus we find by appealing to the stochastic Fubini Theorem (see, e.g., Protter 1990)

$$\int_t^T Y_j(u)\, e^{-2\alpha(T-u)}\, du = Y_j(t)\, \gamma(T-t; 2\alpha, \lambda_j) + \int_t^T \gamma(T-u; 2\alpha, \lambda_j)\, dL_j(u).$$

Hence, by the independent increment property of Lj and the exponential integrability condition, we find

$$E\left[\exp\left(\frac{w_j}{2}\int_t^T Y_j(u)\, e^{-2\alpha(T-u)}\, du\right) \Big|\, \mathcal{F}_t\right] = \exp\left(\frac{w_j}{2}\, Y_j(t)\, \gamma(T-t; 2\alpha, \lambda_j)\right) E\left[\exp\left(\frac{w_j}{2}\int_t^T \gamma(T-u; 2\alpha, \lambda_j)\, dL_j(u)\right) \Big|\, \mathcal{F}_t\right]$$

$$= \exp\left(\frac{w_j}{2}\, Y_j(t)\, \gamma(T-t; 2\alpha, \lambda_j) + \int_t^T \psi_j\left(-i\,\frac{w_j}{2}\,\gamma(T-u; 2\alpha, \lambda_j)\right) du\right),$$

where we have used the adaptedness of Yj(t). The proposition follows.

We observe that the forward price is explicitly dependent on the spot price, however, not on the volatility. Each factor Yj in the volatility structure comes in weighted by the function γ(T − t; 2α, λj) instead. Thus, the forward price is not Markovian with respect to spot and volatility, but with respect to spot and each of the factors of the volatility. This means that in general we cannot derive the forward price based solely on observing the spot price. To recover the different factors making up the stochastic volatility, one must resort to filtering techniques. The particle filter methods described and analyzed in Barndorff-Nielsen and Shephard (2001) can be applied to recover the states of the factors in the stochastic volatility from observations of the spot price. Alternatively, one may adapt the particle filter so that forward price observations can be used to find the current states of the factors. In the special case n = 1, that is, a one-factor model σ²(t) = Y(t), one may observe the stochastic volatility from the quadratic variation process of the spot price. Thus, in this situation we can indeed recover the volatility from spot price observations directly.

Note that the forward prices have jumps explicitly given in their dynamics. These jumps are inherited from the stochastic volatility model and tell us that whenever the spot price is experiencing a volatility jump, this is directly affecting the forward price. This occurs despite the fact that the spot price paths are continuous. Because Yj(t) are mean-reverting, the jumps in the forward price are killed over time with a speed given by the mean-reversion rates of Yj and the spot through the term γ(t; 2α, λj). Hence, how short- or long-lived a jump is depends on the parameters λj and α.

Based on the explicit forward price dynamics in Proposition 3.1 we can calibrate the market price of risk θ from spot and forward data. This can be done by minimizing the distance between the observed and theoretical forward prices,

$$\hat{\theta} = \arg\min_{\theta \in \Theta} \sum_{k=1}^{N} \sum_{l=1}^{M_k} \left\| F_{k,l} - F_\theta(t_k, T_l) \right\|.$$

Here, the observations are $F_{k,l}$, the forward price at time $t_k$ of a contract maturing at time $T_l$. The theoretical forward price $F_\theta(t_k, T_l)$ is given in Proposition 3.1, where we use a subscript θ to emphasize the dependency on the market price of risk. The distance is measured by ‖·‖, usually being the Euclidean distance or some weighted version of it. The set Θ specifies the class of market price of risk functions. A typical choice is the set of piecewise constant functions. Among many papers utilizing this established technique, we would like to mention a recent article by Härdle and Lopez Cabrera (2009), where the market price of risk is studied for temperature forwards.

The shape of the forward curve T ↦ F(t, T) is determined by the seasonality function Λ(T), the function T ↦ Hθ(t, T) incorporating the risk premium, today's spot price S(t), and the volatility factors Yj(t). The first two terms, Λ(T) and Hθ(t, T), give rise to fixed shapes for the forward curve, not stochastically varying with the level of spot or volatility. We investigate the two other terms. First, the term

$$\left(\frac{S(t)}{\Lambda(t)}\right)^{\exp(-\alpha(T-t))} \tag{3.4}$$

will monotonically decrease towards 1 when S(t) > Λ(t) as T increases, starting at S(t)/Λ(t) for T = t. This corresponds to a curve in backwardation. On the other hand, a contango situation is obtained when S(t) < Λ(t), in which case the term (3.4) will give an increasing curve towards 1 as T → ∞. In Figure 3.1 (top panel), we have plotted the two situations for α given as for the UK gas spot data and setting t = 0. The term giving the contribution from the stochastic volatility is

$$\exp\left(\frac{1}{2}\sum_{j=1}^{n} w_j\, \gamma(T-t; 2\alpha, \lambda_j)\, Y_j(t)\right). \tag{3.5}$$

For simplicity, we restrict our attention to one factor, n = 1. Because γ(u; 2α, λ) is an increasing function for u < s* and decreasing to zero when u > s* according to Lemma 2.1, we find that (3.5) produces a hump shape. In the short end, when T ≈ t, the expression in (3.5) is close to 1, which is also the case in the long end of the curve. However, for T − t < s* it will be increasing, and decreasing thereafter. Hence, there will be a hump at T − t = s*, where we recall from Lemma 2.1 that s* = (ln 2α − ln λ)/(2α − λ) for 2α ≠ λ and s* = 1/2α otherwise. In the bottom panel of Figure 3.1, we have plotted the shape of (3.5) for the parameters estimated for UK gas spot. We have supposed that the current volatility is Y(0) = σ²(0) = 0.06, which corresponds to approximately 388% annually (compare with the 200% being the mean annual volatility of the data). The maximal value of this curve is achieved at around 2 days with the parameters from UK gas spot data.
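The location and size of the hump can be reproduced with a few lines of Python. The definition (2.7) of γ is not restated in this section, so the sketch assumes the form γ(u; a, b) = (e^{−bu} − e^{−au})/(a − b), which is the expression implied by the Fubini step in the proof of Proposition 3.1; the function names and the weight w = 1 are likewise assumptions made for illustration.

```python
import math

def gamma_fn(u, a, b):
    """Assumed form of gamma(u; a, b); the limit u*exp(-a*u) is used when a == b."""
    if math.isclose(a, b):
        return u * math.exp(-a * u)
    return (math.exp(-b * u) - math.exp(-a * u)) / (a - b)

def s_star(a, b):
    """Maximum point of u -> gamma(u; a, b), as quoted from Lemma 2.1."""
    if math.isclose(a, b):
        return 1.0 / a
    return (math.log(a) - math.log(b)) / (a - b)

alpha, lam = 0.127, 1.11   # estimates for UK gas spot from Section 2.2
y0, w = 0.06, 1.0          # current variance Y(0); weight w = 1 assumed for n = 1

def hump_factor(tau):
    """The term (3.5) for one factor, as a function of time to maturity tau."""
    return math.exp(0.5 * w * gamma_fn(tau, 2.0 * alpha, lam) * y0)

peak = s_star(2.0 * alpha, lam)
annual_vol = math.sqrt(y0 * 250)   # square-root-of-time annualization over 250 days

print(f"s* = {peak:.2f} days")
print(f"hump factor at s* = {hump_factor(peak):.4f}")
print(f"annualized volatility = {annual_vol:.0%}")
```

Under these assumptions the maximum sits a little below two days, in line with the statement that the maximal value of the curve is achieved at around 2 days.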


FIGURE 3.1. (Top panel): The shape from the spot in (3.4) in the forward curve with Λ(0) = 24.33, and S(0) = 26 and S(0) = 22, respectively. (Bottom panel): The shape from the stochastic volatility in (3.5) with Y(0) = 0.06.

Note that the state of the volatility σ²(t) = Y(t) will scale the size of the hump: the bigger the volatility, the bigger the hump contribution. We obtain n such hump-shaped curves in the general case, leading to the possibility of several humps along the forward curve.

In Figure 3.2 we have plotted the forward curve in two situations to illustrate the flexibility implied by the BNS model. We see the combination of a contango or a backwardation shape with a hump implied by the stochastic volatility factor. The estimated parameters of the UK gas spot price and a risk premium equal to zero (i.e., θ = 0) are used for illustration. The function ψ in H0 becomes the cumulant function of an inverse Gaussian variable, which is known analytically (see, e.g., Barndorff-Nielsen and Shephard 2001, p. 31). The integration in the calculation of H0 is performed using a simple Riemann approximation. We have plotted the two situations with a spot being below or above the seasonal mean. We note the hump in the backwardation case, and the concavity in the contango case. In the latter case, the effect of volatility is very small, and the spot together with seasonality is dominating. In the former case, we see directly the influence of volatility on the curve. In the long end, the prices settle around the seasonal function. Of course, in reality the forward prices will include a risk premium, so the illustrations here are only relevant for presenting the possible shapes of the predicted spot prices over time to maturity, unadjusted for any risk premium. In Figure 3.3, we show the contribution from a constant market price of risk θ.
The two curves show an exponentially upward (corresponding to a positive market price of risk, here θ = 0.05) or downward sloping (corresponding to a negative market price of risk, here θ = −0.05) graph. To make up the true forward curve, we multiply these with the curves depicted in Figure 3.2. The dynamics of F(t, T) is presented in Proposition 3.2.
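For a constant market price of risk, the θ-integral in ln Hθ from Proposition 3.1 reduces to (θ/α)(1 − e^{−α(T−t)}), which is the factor behind the two curves just described. The Python sketch below evaluates it and, as a loudly simplified version of the calibration discussed earlier, recovers a constant θ by least squares on log-price gaps. The use of log prices as the distance, the maturity grid, and all function names are assumptions made here, and the data are synthetic.

```python
import math

def risk_premium_factor(tau, theta, alpha):
    """exp((theta/alpha)*(1 - exp(-alpha*tau))): constant-theta part of H_theta, tau = T - t."""
    return math.exp(theta / alpha * (1.0 - math.exp(-alpha * tau)))

def calibrate_constant_theta(taus, log_price_gaps, alpha):
    """Least-squares theta from gaps ln F_observed - ln F_(theta=0), one gap per maturity."""
    g = [(1.0 - math.exp(-alpha * tau)) / alpha for tau in taus]
    return sum(gi * di for gi, di in zip(g, log_price_gaps)) / sum(gi * gi for gi in g)

alpha = 0.127                      # speed of mean reversion estimated in Section 2.2
taus = [1, 5, 20, 60, 120, 250]    # times to maturity in days (hypothetical grid)

up = [risk_premium_factor(tau, 0.05, alpha) for tau in taus]
down = [risk_premium_factor(tau, -0.05, alpha) for tau in taus]

# Synthetic check: gaps generated with theta = 0.05 should give back 0.05.
synthetic_gaps = [math.log(f) for f in up]
theta_hat = calibrate_constant_theta(taus, synthetic_gaps, alpha)
print(theta_hat)
```

Because ln Fθ is linear in a constant θ, the log-price version of the problem has this closed-form solution; a piecewise constant θ or a price-level distance would call for a numerical optimizer instead.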


FIGURE 3.2. The forward curve using parameters estimated from UK gas spot prices. (Top panel): Spot price above seasonal mean. (Bottom panel): Spot price below seasonal mean.

FIGURE 3.3. Contribution to the forward curve from a constant market price of risk θ . (Top panel): θ = 0.05. (Bottom panel): θ = −0.05.


PROPOSITION 3.2. Under the exponential integrability conditions in Proposition 3.1, the dynamics of F(t, T) is

$$\frac{dF(t,T)}{F(t-,T)} = \sigma(t)\, e^{-\alpha(T-t)}\, dW(t) + \sum_{j=1}^{n} \int_0^\infty \left(e^{w_j \gamma(T-t; 2\alpha, \lambda_j) z/2} - 1\right) \tilde{N}_j(dz, dt),$$

with $\tilde{N}_j$ being the compensated Poisson random measure associated to Lj, j = 1, . . . , n.

Proof. We apply Itô's Formula for jump processes (see, e.g., Ikeda and Watanabe 1981) on F(t, T), where we may simplify considerably because we know that F must be a martingale.

We observe that the dW-term in the dynamics of F has a typical volatility structure incorporating the Samuelson effect, in the sense that when time to maturity of the contract goes to zero, the forward volatility tends to the spot volatility. The jump terms also contribute to the volatility; however, because γ(T − t; 2α, λj) → 0 when T − t → 0, we find that these terms disappear in the limit. This is obviously in line with the fact that the forward price converges to the spot as time to delivery approaches zero.

Multifactor spot models are popular in commodity markets, because they can describe effects like spikes in electricity prices, say. Such models frequently consist of one or more mean-reverting factors where the driving noise is a Lévy process (jump process). The forward price based on such spot models can be derived analytically in terms of the characteristic function of the Lévy noise, and its dynamics will contain terms analogous to what we have in Proposition 3.2 (see chapter 4 in Benth, Šaltytė-Benth, and Koekebakker 2008, for explicit calculations). Thus, pathwise (first-order) effects of the spot price are transferred into a pathwise effect of the forward, where the mean reversion leads to a damping over time to maturity. The interesting consequence from the analysis in this paper is that also second-order (stochastic volatility) effects transfer over to a similar pathwise effect of the forward dynamics. This is valuable knowledge if one for instance wants to perform an analysis of jumps in observed spot and forward time series data (using, say, the techniques developed by Ait-Sahalia and Jacod 2009).
Let us consider the forward price under the Heston stochastic volatility model with the market price of risk θ in (3.1) supposed to be constant. Remark that one may do a measure-change in the volatility dynamics Y(t) in order to produce a risk-neutral probability. We ignore this possibility here. The result for the noncorrelated case is collected in the next proposition (see also Hikspoors and Jaimungal 2007, for a similar result).

PROPOSITION 3.3. The forward price F(t, T) under the Heston stochastic volatility model is given by

$$F(t,T) = \Lambda(T)\, G_\theta(T-t) \exp\left(\xi(T-t)\, Y(t)\right) \left(\frac{S(t)}{\Lambda(t)}\right)^{\exp(-\alpha(T-t))},$$

where

$$\ln G_\theta(\tau) = \frac{\mu + \theta}{\alpha}\left(1 - e^{-\alpha\tau}\right) + \eta\zeta \int_0^\tau \xi(s)\, ds,$$

and ξ(τ) is the solution of the Riccati equation

$$\xi'(\tau) = \delta\left(\xi(\tau) - \frac{\eta}{2\delta}\right)^2 - \frac{\eta^2}{4\delta} + \frac{1}{2}\, e^{-2\alpha\tau}, \tag{3.6}$$

with initial condition ξ(0) = 0.

Proof. The forward price is (after using the Markov property)

$$F(t,T) = \Lambda(T)\, E_\theta\left[e^{X(T)} \mid X(t), Y(t)\right] = \Lambda(T)\, u(t, X(t), Y(t)),$$

where $u(t,x,y) = E_\theta\left[e^{X(T)} \mid X(t) = x, Y(t) = y\right]$ is the Feynman–Kac solution of the partial differential equation

$$\frac{\partial u}{\partial t} + (\mu + \theta - \alpha x)\frac{\partial u}{\partial x} + \eta(\zeta - y)\frac{\partial u}{\partial y} + \frac{1}{2}\, y\, \frac{\partial^2 u}{\partial x^2} + \delta y\, \frac{\partial^2 u}{\partial y^2} = 0$$

with terminal condition u(T, x, y) = exp(x). Supposing a solution of the affine form u(t, x, y) = exp(a(t) + b(t)x + c(t)y) gives

$$a'(t) + (\mu + \theta)\, b(t) + \eta\zeta\, c(t) = 0,$$
$$b'(t) - \alpha\, b(t) = 0,$$
$$c'(t) - \eta\, c(t) + \frac{1}{2}\, b^2(t) + \delta\, c^2(t) = 0,$$

with terminal conditions a(T) = c(T) = 0 and b(T) = 1. The proof is complete after reversing time in a, b, and c.

Observe the structural similarity in the forward prices resulting from the Heston and the BNS stochastic volatility models. The Heston stochastic volatility model leads to a Riccati equation for the coefficient function ξ(T − t) in front of the volatility term Y(t) which seems difficult to solve analytically. For the BNS model, we have an explicit formula for the forward price. Note that the correlated case for the Heston model has a similar solution, which we leave to the interested reader to derive. As noted by Hikspoors and Jaimungal (2007), it seems difficult to obtain an analytical solution for the forward price on a mean-reverting spot dynamics with Heston stochastic volatility.

FIGURE 3.4. Plot of ξ as a function of time to maturity. The function was evaluated numerically based on an Euler scheme for the Riccati equation.

Let us discuss at an informal level the properties of the solution ξ(t) of the Riccati equation (3.6). First, because ξ(0) = 0, we have that the derivative of ξ at zero must be equal to 0.5, and thus the solution is increasing to begin with. Further, if ξ(t) = 0 for some t > 0, we find from (3.6) that

$$\xi'(t) = \frac{1}{2}\, e^{-2\alpha t} > 0,$$

and thus ξ(t) will be pushed back into the positive domain. Hence, ξ(t) ≥ 0 for all t ≥ 0. For small t we have that ξ(t) is increasing, and therefore (ξ(t) − η/2δ)² is decreasing. On the other hand, exp(−2αt) − η²/2δ is decreasing with time, and eventually turning negative (if not negative to begin with). We conclude that ξ(t) will increase until the quadratic term on the right-hand side of (3.6) balances the negative contribution from the inhomogeneous term. At this point we will have a zero derivative. This will be reached for a value of ξ(t) less than η/2δ, because for ξ(t) = η/2δ we would have a negative derivative, and thus a decreasing value of ξ(t). Thus, ξ(t) ∈ [0, η/2δ). Next, a simple calculation shows that the maximum value of ξ(t) is attained at a point t* > 0, for which

$$\xi(t^*) = \frac{\eta}{2\delta}\left(1 \pm \sqrt{1 - \frac{2\delta}{\eta^2}\, e^{-2\alpha t^*}}\right).$$

We must choose the negative solution because ξ(t) ≤ η/2δ. Of course, we do not know what t* is, but the derivation tells us that ξ(t) is increasing from zero up to ξ(t*) > 0 for 0 ≤ t ≤ t*. Thereafter, we have that ξ'(t) ≤ 0. From above we know that ξ(t) ≥ 0, so at least $\lim_{t\to\infty} \xi'(t) = 0$, otherwise we will cross zero, and using this in (3.6) shows that the limit of ξ(t) is either 0 or η/δ. From the upper bound of ξ(t) we reach that

$$\lim_{t\to\infty} \xi(t) = 0.$$
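The qualitative argument can be checked by integrating the Riccati equation (3.6) numerically with an explicit Euler scheme. The Python sketch below uses η = 1.11, α = 0.127, and δ = 0.05, the values used for Figure 3.4; the step size, the time horizon, and the function name are hypothetical choices, and the code is an illustration rather than the implementation behind the figure.

```python
import math

def solve_riccati_euler(eta, alpha, delta, tau_max=30.0, dt=0.01):
    """Explicit Euler scheme for equation (3.6) with xi(0) = 0; returns (taus, xis)."""
    n_steps = int(round(tau_max / dt))
    taus, xis = [0.0], [0.0]
    xi = 0.0
    for k in range(n_steps):
        tau = k * dt
        # Right-hand side of (3.6) evaluated at the current point.
        drift = (delta * (xi - eta / (2.0 * delta)) ** 2
                 - eta ** 2 / (4.0 * delta)
                 + 0.5 * math.exp(-2.0 * alpha * tau))
        xi += dt * drift
        taus.append(tau + dt)
        xis.append(xi)
    return taus, xis

eta, alpha, delta = 1.11, 0.127, 0.05
taus, xis = solve_riccati_euler(eta, alpha, delta)

i_max = max(range(len(xis)), key=xis.__getitem__)
print(f"hump at tau = {taus[i_max]:.2f} with xi = {xis[i_max]:.3f}")
print(f"xi at tau = {taus[-1]:.0f}: {xis[-1]:.5f}")
```

With these inputs the numerical solution starts at zero, rises to a single maximum well below η/2δ, and decays back towards zero, which is the hump shape argued for above.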

Hence, we have argued for a hump-shaped function, starting at zero and with a zero asymptotic behavior, analogous to the shape of the function γ(t) in the forward price for the BNS model. In Figure 3.4 we have plotted ξ as a function of time to maturity T − t. The parameters were chosen to be η = 1.11, α = 0.127, and δ = 0.05. An Euler scheme was used to solve the Riccati equation for ξ numerically. Note that we used the same speed of mean-reversion for the Heston model as estimated for the BNS model for the gas data in Section 2.2. Further, "the volatility of volatility" δ was arbitrarily chosen as δ = 0.05 for illustration only. As in the BNS stochastic volatility model, the Heston model produces a hump in the short end of the forward curve, whereas the shapes from S(t) will be either contango or backwardation. In this respect, the two stochastic volatility models produce similar forward curves as long as we choose one factor driving the volatility in the BNS model. In the case of many factors in the BNS model, a forward curve with many humps may be produced.

The time-dynamics of the forward price for the Heston stochastic volatility model is straightforwardly calculated using the Itô Formula,

$$\frac{dF(t,T)}{F(t,T)} = \sigma(t)\left(e^{-\alpha(T-t)}\, dW(t) + \xi(T-t)\, d\tilde{B}(t)\right). \tag{3.7}$$

In this case there are no jumps in the dynamics of the forward price, but two Brownian motions driving the price evolution. In fact, this model is equivalent to a geometric Brownian dynamics with volatility defined as $\sigma(t)\sqrt{\exp(-2\alpha(T-t)) + \xi^2(T-t)}$, and thus fundamentally different from the dynamics we obtained for the BNS stochastic volatility model. The stochastic volatility structure resulting from the Heston model still possesses the Samuelson effect, because ξ(0) = 0. Note, however, that we need to know ξ(τ) in order to have the dynamics of F(t, T) available, contrary to the BNS case where we have all parameters accessible.

We end this section with a remark on the pricing of European options written on forward contracts. Such options are commonly traded in many commodity markets, even in an organized way. For example, one may buy options on oil and gas forwards at NYMEX. Options written on forwards F(t, T) as above (including BNS and Heston, and other affine structures) can readily be priced by means of Fourier transformation. The expression for the price will involve the Fourier transform of the payoff function of the option and the characteristic function of the log-forward price. This expression can be calculated numerically using the fast Fourier transform device as long as the characteristic function of the log-price is known. For the BNS model this will indeed be the case. We refer the reader to Carr and Madan (1998) and Benth, Šaltytė-Benth, and Koekebakker (2008) for more information on the Fourier approach to option pricing.

4. CONCLUSIONS

We analyze the stochastic volatility model of Barndorff-Nielsen and Shephard (2001) for the Schwartz dynamics (see Schwartz 1997). The stochastic volatility model is defined as a sum of non-Gaussian Ornstein–Uhlenbeck processes. The log-spot price dynamics will be stationary, with a limiting distribution given as a normal variance-mixture with the cumulant of the variance explicitly given in terms of the cumulants for the subordinator processes driving the volatility. Furthermore, we show that the spot returns adjusted for mean-reversion (so-called adjusted returns) are variance mixture models incorporating known distributions like the normal inverse Gaussian or CGMY. The dependence structure of returns can be accurately modeled by appropriately choosing the speed of mean-reversion of the stochastic volatility factors. An empirical study using UK gas spot prices observed at the National Balancing Point demonstrates the flexibility in the stochastic volatility model.

Explicit forward prices are calculated, along with the price dynamics. A particularly interesting observation is the explicit dependence on the stochastic volatility in the forward price, giving rise to jumps in the dynamics. This is an immediate result of the affine structure of the model, and occurs despite the spot price having continuous paths. The forward curve incorporates a hump in the short end, which is dependent on the state of the volatility and its mean-reverting property.

Another popular stochastic volatility model is the Heston dynamics, where the volatility is given by the Cox–Ingersoll–Ross process (see Cox, Ingersoll, and Ross 1981). This is also a mean-reverting model; however, one can consider the Barndorff-Nielsen and Shephard model as a generalization of Heston because the latter has a Gamma-distributed stationary limit. There is a hump structure implied by the Heston model in the forward curve. The hump shape is defined as the solution of a Riccati equation which seems difficult to solve analytically. We provide a qualitative analysis of the solution yielding a hump shape. Further, we observe that although the Heston volatility explicitly drives the forward price dynamics, the paths will be continuous due to the diffusion structure of the volatility.

Admittedly, the one-factor mean-reversion model for spot price evolution of Schwartz is too simplistic to account for all the stylized facts of commodity prices. Even with stochastic volatility, it will not capture important features like a stochastic mean level, spikes, and (inverse) leverage. Extensions of the Barndorff-Nielsen and Shephard stochastic volatility model to incorporate these effects will be studied in future work; see Benth and Vos (2009).

REFERENCES

AIT-SAHALIA, Y., and J. JACOD (2009): Testing for Jumps in Discretely Observed Processes, Ann. Stat. 37(1), 184–222.
BARNDORFF-NIELSEN, O. E. (1998): Processes of Normal Inverse Gaussian Type, Finance Stoch. 2(1), 41–68.
BARNDORFF-NIELSEN, O. E., and N. SHEPHARD (2001): Non-Gaussian Ornstein-Uhlenbeck-Based Models and Some of Their Uses in Economics, J. R. Statist. Soc. B 63(2), 167–241 (with discussion).
BENTH, F. E., and J. ŠALTYTĖ-BENTH (2004): The Normal Inverse Gaussian Distribution and Spot Price Modelling in Energy Markets, Intern. J. Theor. Appl. Finance 7(2), 177–192.
BENTH, F. E., J. ŠALTYTĖ BENTH, and S. KOEKEBAKKER (2008): Stochastic Modeling of Electricity and Related Markets, Singapore: World Scientific.
BENTH, F. E., and L. VOS (2009): A Non-Gaussian Stochastic Volatility Model with Leverage for Commodity Markets, Working Paper, available at: http://ssrn.com/abstract=1495156.
BÖRGER, R., A. CARTEA, R. KIESEL, and G. SCHINDLMAYER (2009): A Multivariate Commodity Analysis and Applications to Risk Management, J. Futures Markets 29(3), 197–217.
CARR, P., H. GEMAN, D. P. MADAN, and M. YOR (2002): The Fine Structure of Asset Returns: An Empirical Investigation, J. Business 75(2), 61–73.
CARR, P., and D. P. MADAN (1998): Option Valuation Using the Fast Fourier Transform, J. Comp. Finance 2, 61–73.
COX, J. C., J. E. INGERSOLL, and S. A. ROSS (1981): A Theory of the Term Structure of Interest Rates, Econometrica 53, 385–408.

EBERLEIN, E., and G. STAHL (2003): Both Sides of a Fence: A Statistical and Regulatory View of Electricity Risk, Energy Risk 8, 371–406.
EYDELAND, A., and H. GEMAN (1998): Pricing Power Derivatives, RISK, September issue.
GEMAN, H. (2005): Commodities and Commodity Derivatives, Chichester, UK: Wiley-Finance, John Wiley and Sons.
HÄRDLE, W., and B. LOPEZ CABRERA (2009): Implied Market Price of Weather Risk, Discussion Paper, SFB649, Humboldt University, Germany.
HESTON, S. L. (1993): A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options, Rev. Finan. Stud. 6(2), 327–343.
HIKSPOORS, S., and S. JAIMUNGAL (2007): Asymptotic Pricing of Commodity Derivatives for Stochastic Volatility Spot Models, Appl. Math. Finance 15(5–6), 449–477.
IKEDA, N., and S. WATANABE (1981): Stochastic Differential Equations and Diffusion Processes, Amsterdam, Oxford, New York: North-Holland/Kodansha.
KALLSEN, J., and P. TANKOV (2006): Characterization of Dependence of Multivariate Lévy Processes Using Lévy Copulas, J. Multivar. Anal. 97(7), 1551–1572.
KARATZAS, I., and S. SHREVE (1991): Brownian Motion and Stochastic Calculus, Berlin, Heidelberg, New York: Springer Verlag.
LUCIA, J., and E. S. SCHWARTZ (2002): Electricity Prices and Power Derivatives: Evidence from the Nordic Power Exchange, Rev. Derivat. Res. 5(1), 5–50.
NICOLATO, E., and E. VENARDOS (2003): Option Pricing in Stochastic Volatility Models of the Ornstein-Uhlenbeck Type, Math. Finance 13(4), 445–466.
PROTTER, PH. (1990): Stochastic Integration and Differential Equations, Berlin, Heidelberg, New York: Springer Verlag.
SATO, K. (1999): Lévy Processes and Infinite Divisibility, Cambridge: Cambridge University Press.
SCHWARTZ, E. S. (1997): The Stochastic Behaviour of Commodity Prices: Implications for Valuation and Hedging, J. Finance LII(3), 923–973.