Loss Models: Study Pack Joshuah Touyz © Draft date November 28, 2009
Contents
Preface
1 Review of probability and statistics
1.1 Introduction
1.1.1 The indicator function
1.2 Conditional distributions
1.2.1 Marginal Distributions
1.2.2 Conditional distribution of X|Y = y
1.2.3 Conditional expectation
2 Parameter Estimation
2.1 Unbiased estimation
2.2 Maximum Likelihood Estimation
2.3 Bayesian Estimation
2.3.1 Introduction
2.3.2 Definitions
2.3.3 Examples
3 Credibility
3.1 Introduction
3.2 Limited Fluctuation Credibility
3.2.1 Full Credibility
3.2.2 Partial Credibility
3.3 Greatest Accuracy Credibility Theory
3.3.1 Introduction
3.3.2 Background on risk classes
3.3.3 Bayesian Methodology
3.3.4 Examples: putting it all together
3.3.5 The Credibility Premium
3.3.6 Bühlmann Model
3.3.7 The Bühlmann-Straub Model
3.3.8 Exact credibility
4 Severity Distributions
4.1 Introduction
4.2 Parametric Distributions
4.2.1 Multiplication by a constant
4.2.2 Raising to a Power
4.2.3 Exponentiation
4.2.4 Mixture of distributions
4.2.5 Summary and Cautions
5 Tail Distributions
5.1 Introduction
5.2 Existence of moments
5.3 Limiting Ratios
5.4 Hazard Rate
5.4.1 Terminology
5.5 The residual lifetime and excess loss r.v.s
5.5.1 The residual lifetime and its mean
5.5.2 Excess loss random variable and its mean
5.5.3 Some important equations for the moments of the excess loss r.v.
5.5.4 Terminology for residual lifetime r.v.
6 Policy Limits
6.1 Introduction
6.2 Policy adjustments
6.2.1 Types
6.2.2 Order of application
6.3 Loss payment vs. payment basis
6.3.1 Distributional properties of Y^P
6.4 Analysis of policy adjustments on reporting methods
6.4.1 Policy limit u with no deductible
6.4.2 Policy limit u with deductible d
6.4.3 Policies with an ordinary deductible d
6.4.4 Franchise deductible
6.4.5 Some Examples
7 Frequency Models
7.1 Introduction
7.1.1 Useful properties of the PGF
7.2 Possible candidates for modelling N
7.2.1 Selection of basic distributions
7.2.2 The (a, b, 0) class of distributions
7.2.3 The (a, b, 1) class of distributions
7.2.4 Mixture of distributions
7.2.5 Compound random variables
7.2.6 Panjer's recursion for the (a, b, 0) class
7.2.7 Panjer's recursion for the (a, b, 1) class
7.3 Effect of policy adjustments
7.3.1 Effect of exposure
7.3.2 Effect of Severity
Bibliography
Preface
The notes herewith are an attempt to provide some structure to the University of Waterloo's actuarial science classes Actsc431/831, Actsc432/832 (Loss Models 1, Loss Models 2) in electronic form. They aim to be a comprehensive study pack for the concepts presented in class and will hopefully bring an increased level of clarity to students who may otherwise find the concepts difficult.
Chapter 1 Review of probability and statistics To boldly go where no map has gone before
1.1
Introduction
Before developing formal mathematical tools to understand credibility theory it is necessary to review basic probability and statistics. Below, the indicator function, marginals, conditionals, compound mixtures/distributions and generating functions are considered.
1.1.1
The indicator function
The indicator function is similar to a Bernoulli trial. It assumes a value of 1 if the event A occurs and 0 otherwise. It facilitates writing functions in more compact forms.
Definition 1.1.1 The indicator function or characteristic function of a subset $A \subseteq \Omega$ is a function defined on the set $\Omega$ which indicates whether an element $x$ of $\Omega$ is contained in $A$. Should $x$ be contained in $A$ then the function's value is 1, and 0 otherwise. Formally:
\[
\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases} \tag{1.1}
\]
Example: Let $\Omega$ be a sample space and let $A$ be an event. Given that $\mathbf{1}_A$ is a random variable, compute $E[\mathbf{1}_A]$.
Solution: We are given that
\[
\mathbf{1}_A = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{if } A \text{ does not occur.} \end{cases}
\]
Taking the expected value:
\[
E[\mathbf{1}_A] = 1 \times \Pr(A) + 0 \times \Pr(A^c) = \Pr(A).
\]
Example: Suppose $X \sim \text{Exp}(\beta)$. Write the probability density function of $X$ using the indicator function.
Solution: The first thing to do in problems like these is to write out the intervals on which the function is defined, including all end points. Once all end points have been described, the indicator function is attached to the appropriate PMF or PDF.
\[
f_X(x) = \begin{cases} \beta e^{-\beta x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \;=\; \beta e^{-\beta x}\,\mathbf{1}_{x>0}.
\]
Similarly:
\[
E[X] = \int_{-\infty}^{\infty} x\,\beta e^{-\beta x}\,\mathbf{1}_{x>0}\,dx = \int_{0}^{\infty} x\,\beta e^{-\beta x}\,dx = \frac{1}{\beta}.
\]
1.2
Conditional distributions
In this section we examine the conditional distributions of random variables. For discrete random variables we consider their joint probability mass functions (PMF), whereas for continuous random variables we consider their joint probability density functions (PDF). For convenience both are represented by the function $f_{X,Y}(x, y)$. Marginals, conditionals, expectations and variances are reviewed in this context.
1.2.1
Marginal Distributions
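Let $X$ and $Y$ be two random variables that have PMF or PDF $f_{X,Y}(x, y)$; then the marginal distribution of $X$ can be written as:
\[
f_X(x) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy & \text{continuous case} \\[2ex] \displaystyle\sum_{y} f_{X,Y}(x, y) & \text{discrete case} \end{cases}
\]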
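The marginal distribution of $Y$ can be similarly written, this time integrating or summing over $x$:
\[
f_Y(y) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx & \text{continuous case} \\[2ex] \displaystyle\sum_{x} f_{X,Y}(x, y) & \text{discrete case} \end{cases}
\]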
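The discrete case of the marginal formulas can be illustrated with a short Python sketch. The joint PMF below is a made-up example (its values are assumptions chosen only so that the probabilities sum to one), and the helper name `marginals` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF f_{X,Y}(x, y) on a 2 x 2 grid (illustrative values only).
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(3, 8), (1, 1): Fraction(1, 4),
}

def marginals(joint_pmf):
    """Return (f_X, f_Y) by summing the joint PMF over the other variable."""
    f_x, f_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), prob in joint_pmf.items():
        f_x[x] += prob  # f_X(x): sum over y
        f_y[y] += prob  # f_Y(y): sum over x
    return dict(f_x), dict(f_y)

f_x, f_y = marginals(joint)
```

Exact fractions are used so that each marginal can be checked to sum to one without rounding error.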
1.2.2
Conditional distribution of X|Y = y
A conditional distribution is an a posteriori (after the fact) distribution that depends on an a priori (before the fact) event occurring. It may be formally denoted as:
\[
f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} \quad \text{if } f_Y(y) \neq 0.
\]
If $X$ and $Y$ are independent then $f_{X|Y}(x|y) = f_X(x)$. A similar argument can be made for $Y$. It follows from the definition that if $X$ and $Y$ are independent then $f_{X,Y}(x, y) = f_X(x) f_Y(y)$ is always satisfied.
Example: Suppose $X$ has a mixture distribution that depends on the random variable $Y$. Assume $Y$ is discrete with two possible values $y_1$ and $y_2$. Express $f_X(x)$ in terms of the conditional distributions given $y_1$ and $y_2$.
Solution: We are given that $Y$ is discrete and only has two possible values. This implies:
\[
\Pr(Y = y_i) = \begin{cases} p & \text{if } i = 1 \\ 1 - p & \text{if } i = 2 \\ 0 & \text{otherwise} \end{cases}
\]
where $0 < p < 1$. Since $X$ is a two-point mixture it can be written as the sum of conditionals:
\[
f_X(x) = f_{X|Y}(x|y_1)\Pr(Y = y_1) + f_{X|Y}(x|y_2)\Pr(Y = y_2) = f_{X|Y}(x|y_1)\,p + f_{X|Y}(x|y_2)\,(1 - p).
\]
The preceding example illustrates an important property of mixture distributions. Although it seems trivial, the result can be generalized to an n-point mixture:
\[
f_X(x) = f_{X|Y}(x|y_1)\Pr(Y = y_1) + \dots + f_{X|Y}(x|y_n)\Pr(Y = y_n) = \sum_{i=1}^{n} f_{X|Y}(x|y_i)\Pr(Y = y_i).
\]
The next example should be familiar to students writing Actuarial Exam MLC.
Example: Suppose $X|\Theta = \theta \sim \text{Poisson}(\theta)$ and that $\Theta \sim \text{Gamma}(\alpha, \beta)$. Find the
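PMF of $X$.
Solution: Let $f_\Theta(\theta) = \dfrac{\beta^\alpha \theta^{\alpha-1} e^{-\beta\theta}}{\Gamma(\alpha)}\,\mathbf{1}_{\theta>0}$. To find the marginal distribution of $X$ we integrate the product of the marginal of $\Theta$ and the conditional of $X|\Theta$ over $\theta$:
\begin{align*}
f_X(x) &= \int f_\Theta(\theta)\, f_{X|\Theta}(x|\theta)\,d\theta \\
&= \int_0^\infty \frac{e^{-\theta}\theta^{x}}{x!} \cdot \frac{\beta^\alpha \theta^{\alpha-1} e^{-\beta\theta}}{\Gamma(\alpha)}\,d\theta \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)\,x!} \int_0^\infty \theta^{x+\alpha-1} e^{-(\beta+1)\theta}\,d\theta \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)\,x!} \cdot \frac{\Gamma(x+\alpha)}{(\beta+1)^{\alpha+x}} \int_0^\infty \frac{(\beta+1)^{\alpha+x}}{\Gamma(x+\alpha)}\,\theta^{\alpha+x-1} e^{-(\beta+1)\theta}\,d\theta \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)\,x!} \cdot \frac{\Gamma(x+\alpha)}{(\beta+1)^{\alpha+x}} \\
&= \frac{\Gamma(\alpha+x)}{\Gamma(\alpha)\,x!} \left(\frac{\beta}{\beta+1}\right)^{\alpha} \left(\frac{1}{\beta+1}\right)^{x} \\
&= \binom{x+\alpha-1}{x} \left(\frac{\beta}{\beta+1}\right)^{\alpha} \left(\frac{1}{\beta+1}\right)^{x}
\end{align*}
since $\int_0^\infty \frac{(\beta+1)^{\alpha+x}}{\Gamma(x+\alpha)}\,\theta^{\alpha+x-1} e^{-(\beta+1)\theta}\,d\theta = 1$ (the integrand is a Gamma density). The resultant distribution of $X$ is a negative binomial with parameters $\alpha$ and $\frac{\beta}{\beta+1}$, i.e. $X \sim NB\left(\alpha, \frac{\beta}{\beta+1}\right)$. This result should be committed to memory. Several other conditional distributions are outlined below and are equally important to know:

Conditional distribution of Y|Θ    Distribution of Θ            Mixed distribution of Y
--------------------------------------------------------------------------------------
Y|Θ ∼ Exp(Θ)                       Θ ∼ InverseGamma(α, θ)       Y ∼ Pareto(α, θ)
Y|Θ ∼ InvExp(Θ)                    Θ ∼ Gamma(α, θ)              Y ∼ InvPareto(τ = α, θ)
Y|Θ ∼ N(Θ, c)                      Θ ∼ N(µ, σ²)                 Y ∼ N(µ, c + σ²)

1.2.3
Conditional expectation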
To derive conditional expectations directly it is necessary to know the conditional probability mass/density function. If we assume $X|Y = y$ is a continuous random variable then:
\[
E[X|Y = y] = \int x \cdot f_{X|Y}(x|y)\,dx \tag{1.2}
\]
A similar equation holds for $X|Y = y$ if it is discrete. Note that $E[X|Y]$ is a random variable whereas $E[X|Y = y]$ is not.
Method of Double Expectation
Suppose $E[X|Y]$ and $f_Y(y)$ are given; then it is possible to find $E[X]$ from these two ingredients. The way in which $E[X]$ is obtained is through the method of double expectation.
\begin{align*}
E[E[X|Y]] &= \int E[X|Y = y]\, f_Y(y)\,dy && \text{(note: } E[X|Y] \text{ is a function of } Y\text{)} \\
&= \int \left( \int x\, f_{X|Y}(x|y)\,dx \right) f_Y(y)\,dy && \text{(inside the integral } y \text{ is treated as known)} \\
&= \int x \left( \int f_{X|Y}(x|y)\, f_Y(y)\,dy \right) dx && \text{(by Fubini's Theorem)} \\
&= \int x\, f_X(x)\,dx && \text{since } \int f_{X|Y}(x|y)\, f_Y(y)\,dy = f_X(x)
\end{align*}
Hence:
\[
E[E[X|Y]] = E[X] \tag{1.3}
\]
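Example: Compute $E[X]$ given $X|\Theta = \theta \sim \text{Poisson}(\theta)$ and that $\Theta \sim \text{Gamma}(\alpha, \beta)$.
Solution:
\begin{align*}
E[X] &= E[E[X|\Theta]] = E[\Theta] && \text{since } X|\Theta = \theta \sim \text{Poisson}(\theta) \\
&= \frac{\alpha}{\beta} = \frac{r(1-p)}{p} && \text{if } \alpha = r \text{ and } \frac{\beta}{1+\beta} = p
\end{align*}
In this solution the parametrization for the negative binomial is $X \sim NB\left(\alpha, \frac{\beta}{\beta+1}\right)$, which means the expected value of $X$ is $\alpha/\beta$. Using the alternate parametrization $X \sim NB(r, p)$, the same result is realized as shown above. The method of double expectation can be generalized to a function $h(\cdot, \cdot)$:
\[
E[h(X, Y)] = E[E[h(X, Y)|Y]] \tag{1.4}
\]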
This proves to be valuable when deriving the variance of $X$ where only the distributions of $X|Y$ and $Y$ are given. More precisely:
\begin{align*}
E[Var(X|Y)] &= E\big[E[(X - E[X|Y])^2 \,|\, Y]\big] = E[E[X^2|Y]] - E[(E[X|Y])^2] = E[X^2] - E[(E[X|Y])^2] \\
Var[E(X|Y)] &= E[(E[X|Y])^2] - (E[E[X|Y]])^2 = E[(E[X|Y])^2] - (E[X])^2
\end{align*}
Putting these two equations together:
\begin{align*}
Var[E(X|Y)] + E[Var(X|Y)] &= \big(E[X^2] - E[(E[X|Y])^2]\big) + \big(E[(E[X|Y])^2] - (E[X])^2\big) \\
&= \big(E[X^2] - (E[X])^2\big) + \big(E[(E[X|Y])^2] - E[(E[X|Y])^2]\big) \\
&= Var[X] + 0 = Var[X]
\end{align*}
The variance may accordingly be obtained from the conditional distributions and is summarized by the formula:
\[
Var[X] = E[Var(X|Y)] + Var[E(X|Y)] \tag{1.5}
\]
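Example: Compute $Var[X]$ given $X|\Theta = \theta \sim \text{Poisson}(\theta)$ and that $\Theta \sim \text{Gamma}(\alpha, \beta)$.
Solution: Since $X|\Theta = \theta \sim \text{Poisson}(\theta)$:
\begin{align*}
E[Var[X|\Theta]] &= E[\Theta] = \alpha/\beta \\
Var[E[X|\Theta]] &= Var[\Theta] = \alpha/\beta^2 \\
Var[X] &= E[\Theta] + Var[\Theta] = \frac{\alpha}{\beta} + \frac{\alpha}{\beta^2} = \frac{\alpha}{\beta} \cdot \frac{1+\beta}{\beta}
\end{align*}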
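The Poisson-Gamma results of this chapter can be sanity-checked numerically. The following Python sketch evaluates the negative binomial PMF obtained for the mixture in Section 1.2.2 and confirms that its mean and variance agree with $\alpha/\beta$ and $\alpha(1+\beta)/\beta^2$. The parameter values, the truncation point of 200 and the function name `nb_pmf` are assumptions made for illustration.

```python
import math

def nb_pmf(x, alpha, beta):
    """PMF of X when X | Theta ~ Poisson(Theta) and Theta ~ Gamma(alpha, rate beta)."""
    # log of Gamma(alpha + x) / (Gamma(alpha) * x!)
    log_coeff = math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
    # (beta / (beta + 1))^alpha * (1 / (beta + 1))^x on the log scale
    log_probs = alpha * math.log(beta / (beta + 1)) - x * math.log(beta + 1)
    return math.exp(log_coeff + log_probs)

alpha, beta = 3.0, 2.0      # hypothetical parameter values
support = range(200)        # truncation; the neglected tail is negligible here
probs = [nb_pmf(x, alpha, beta) for x in support]

total = sum(probs)
mean = sum(x * p for x, p in zip(support, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(support, probs))
```

With these values the double-expectation formulas give a mean of 1.5 and a variance of 2.25, which the truncated sums should reproduce to within floating-point error.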
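Example (Compound Poisson): Suppose $Y_i$ $\{i = 1, 2, \dots\}$ are independent identically distributed random variables with mean $\mu$ and variance $\sigma^2$. If $N \sim \text{Poisson}(\lambda)$ is random, independent of the $Y_i$, and $X = \sum_{i=1}^{N} Y_i\,\mathbf{1}_{N \ge 1}$, find $E[X]$ and $Var[X]$.
Solution:
\begin{align*}
E[X] &= E[E[X|N]] = E\left[E\left[\sum_{i=1}^{N} Y_i \,\Big|\, N\right]\right] = E\left[\sum_{i=1}^{N} E[Y_i]\right] = E[N\,E[Y]] \\
&= E[N]\,E[Y] = \lambda\mu
\end{align*}
\begin{align*}
Var[X] &= E[Var[X|N]] + Var[E[X|N]] \\
&= E\left[Var\left[\sum_{i=1}^{N} Y_i \,\Big|\, N\right]\right] + Var\left[E\left[\sum_{i=1}^{N} Y_i \,\Big|\, N\right]\right] \\
&= E\left[\sum_{i=1}^{N} Var[Y_i]\right] + Var\left[\sum_{i=1}^{N} E[Y_i]\right] = E[N\,Var[Y]] + Var[N\,E[Y]] \\
&= E[N]\,Var[Y] + Var[N]\,(E[Y])^2 = \lambda\sigma^2 + \lambda\mu^2 = \lambda E[Y^2] \quad (\text{since } E[Y^2] = \sigma^2 + \mu^2)
\end{align*}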
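The compound Poisson moments can also be checked by simulation. The sketch below is only an illustration under stated assumptions: the severities are taken to be Gamma with shape 4 and scale 5 (so $\mu = 20$, $\sigma^2 = 100$), $\lambda = 2$, and the Poisson sampler is Knuth's multiplication method written out by hand because the Python standard library has no Poisson generator. All names are hypothetical.

```python
import math
import random
import statistics

def poisson_sample(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold, count, product = math.exp(-lam), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def compound_sample(lam, shape, scale, rng):
    """One draw of X = Y_1 + ... + Y_N with N ~ Poisson(lam) and Y_i ~ Gamma(shape, scale)."""
    return sum(rng.gammavariate(shape, scale) for _ in range(poisson_sample(lam, rng)))

rng = random.Random(2009)           # fixed seed so the run is reproducible
lam, shape, scale = 2.0, 4.0, 5.0   # assumed values: mu = 20, sigma^2 = 100
draws = [compound_sample(lam, shape, scale, rng) for _ in range(100_000)]

sim_mean = statistics.fmean(draws)
sim_var = statistics.pvariance(draws)
theory_mean = lam * shape * scale                                    # lambda * mu
theory_var = lam * (shape * scale ** 2 + (shape * scale) ** 2)       # lambda * E[Y^2]
```

The simulated mean and variance should land close to the theoretical values of 40 and 1000, with the gap shrinking as the number of draws grows.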
Chapter 2 Parameter Estimation 2.1
Unbiased estimation
In unbiased estimation an estimator $\hat{\theta}(X)$ is chosen for a given parametric model $f(x; \theta)$ such that $E[\hat{\theta}(X)] = \theta$. To ease future notation suppose that we have $n$ independent and identically distributed random variables $X_1, \dots, X_n$; then they may be represented by the random vector $X = [X_1, \dots, X_n]^T$.
Example 2.1.1 Let $X_1, \dots, X_n$ be $n$ independent identically distributed random variables with a common underlying distribution. Show that if the distribution is $f(x; \beta) = \beta e^{-x\beta}\,\mathbf{1}_{x>0}$ then $\bar{X} = \sum_i X_i / n$ is an unbiased estimator of $1/\beta$.
Solution 2.1.2 Clearly $E[X_i] = 1/\beta$, then:
\[
E[\bar{X}] = E\left[\frac{\sum_i X_i}{n}\right] = \frac{1}{n} E\left[\sum_i X_i\right] = \frac{1}{n} \sum_i E[X_i] = \frac{1}{n} \cdot \frac{n}{\beta} = \frac{1}{\beta}
\]
Warning: Two words of caution:
• $1/\bar{X}$ is not unbiased for $\beta$
• There might be significant differences in the range of parameters and the estimator
In credibility theory we will construct nonparametric unbiased estimators. Consider the example below with variance.
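Example 2.1.3 Suppose $X_1, \dots, X_n$ are $n$ independent identically distributed random variables with mean $\mu$ and variance $\sigma^2$. Show:
1. $\bar{X} = \frac{\sum_i X_i}{n}$ is unbiased for $\mu$
2. $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ is unbiased for $\sigma^2$
Solution 2.1.4
1. Clearly:
\[
E[\bar{X}] = E\left[\frac{\sum_i X_i}{n}\right] = \frac{1}{n} E\left[\sum_i X_i\right] = \frac{1}{n} \sum_i E[X_i] = \frac{1}{n}(n\mu) = \mu
\]
2. Showing that the sample variance $s^2$ is unbiased is a bit more difficult. Using $\sum_i (X_i - \mu) = n(\bar{X} - \mu)$:
\begin{align*}
E[s^2] &= E\left[\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}\right] = E\left[\frac{\sum_{i=1}^{n} \big((X_i - \mu) + (\mu - \bar{X})\big)^2}{n-1}\right] \\
&= \frac{E\left[\sum_{i=1}^{n} \big((X_i - \mu)^2 + 2(X_i - \mu)(\mu - \bar{X}) + (\mu - \bar{X})^2\big)\right]}{n-1} \\
&= \frac{\sum_{i=1}^{n} E[(X_i - \mu)^2] + 2\sum_{i=1}^{n} E[(X_i - \mu)(\mu - \bar{X})] + \sum_{i=1}^{n} E[(\mu - \bar{X})^2]}{n-1} \\
&= \frac{n\,Var[X_i] - 2n\,E[(\bar{X} - \mu)^2] + n\,E[(\mu - \bar{X})^2]}{n-1} = \frac{1}{n-1}\big[n\,Var[X_i] - n\,Var[\bar{X}]\big] \\
&= \frac{n\sigma^2 - \sigma^2}{n-1} = \sigma^2
\end{align*}
The following may seem "strange" at the moment but it will make sense when looking at the Bühlmann-Straub model: For $j = 1, 2, \dots, n$, $X_j$ is the average of $m_j$ independent identically distributed random variables with mean $\mu$ and variance $\alpha$ (think of $X_j$ as $X_j = \sum_{k=1}^{m_j} Z_k^{(j)} / m_j$). Then
\[
E[X_j] = \mu, \qquad Var[X_j] = \frac{1}{m_j^2} \sum_{k=1}^{m_j} \alpha = \frac{\alpha}{m_j}.
\]
For later use let's assume that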
2.2
Maximum Likelihood Estimation
Assume that $X_1, \dots, X_n$ is a sample with distribution $f(x; \theta)$ where $\theta$ can be a vector; then the likelihood function $L(\theta|x)$ for $\theta$ is given by:
\[
L(\theta|x) = L(\theta) = \prod_{j=1}^{n} f(x_j|\theta) \quad \text{where } x = [x_1, \dots, x_n]^T \tag{2.1}
\]
The maximum likelihood estimator ($\hat{\theta}$) is the point $\theta$ where $L(\theta)$ reaches its maximum. A similar concept may be defined by taking the log of the likelihood function to derive the log-likelihood function $l(\theta|x)$. It is defined as follows and is in general easier to work with than the likelihood function:
\[
l(\theta|x) = \ln(L(\theta|x)) = \ln\left(\prod_{j=1}^{n} f(x_j|\theta)\right) = \sum_{j=1}^{n} \ln\big(f_{X_j}(x_j|\theta)\big) \tag{2.2}
\]
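Example 2.2.1 Assume that $X_1, \dots, X_n$ are independent identically distributed random variables with a Poisson distribution. If the mean of the distribution is $\lambda$, find the likelihood function as well as the maximum likelihood estimator $\hat{\lambda}$.
Solution 2.2.2 The likelihood function is given by:
\[
L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = e^{-n\lambda}\, \lambda^{\sum_{i=1}^{n} x_i} \prod_{i=1}^{n} \frac{1}{x_i!}
\]
To find the MLE it is easier to use the log-likelihood function:
\[
l(\lambda) = -n\lambda + \sum_{i=1}^{n} x_i \ln(\lambda) + c \qquad (c \text{ is a constant})
\]
\begin{align*}
\frac{\partial l(\lambda)}{\partial \lambda} &= -n + \frac{n\bar{x}}{\lambda} && \text{setting this to 0} \\
\Rightarrow\ -n + \frac{n\bar{x}}{\hat{\lambda}} &= 0 \\
\Rightarrow\ \hat{\lambda} &= \bar{x}
\end{align*}
It can be further checked that $\hat{\lambda}$ is a maximum:
\[
\frac{\partial^2 l(\lambda)}{\partial \lambda^2} = -\frac{n\bar{x}}{\lambda^2} \le 0
\]
Since the second derivative is non-positive for every $\lambda > 0$, the log-likelihood is concave and the stationary point $\hat{\lambda} = \bar{x}$ is indeed the maximum.
Example 2.2.3 Suppose that $X_1, \dots, X_n$ are independent identically distributed normal random variables with $X_i \sim N(\mu, \sigma^2)$; then the likelihood and log-likelihood functions are given by:
\begin{align*}
L(\mu, \sigma^2) &= \prod_{i=1}^{n} \left(\frac{1}{2\pi\sigma^2}\right)^{1/2} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) \\
&= \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}\right) && \text{taking the ln} \\
\Rightarrow\ l(\mu, \sigma^2) &= -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}
\end{align*}
Now we can take the partial derivatives with respect to the two parameters $\mu$ and $\sigma^2$. There are two maximum likelihood estimates to find:
\begin{align*}
\frac{\partial l(\mu, \sigma^2)}{\partial \mu} &= \frac{2\sum_{i=1}^{n} (x_i - \mu)}{2\sigma^2} && \text{setting this equal to 0} \\
\Rightarrow\ \frac{2\sum_{i=1}^{n} (x_i - \hat{\mu})}{2\sigma^2} &= 0 \ \Rightarrow\ \hat{\mu} = \bar{x} \\
\frac{\partial l(\mu, \sigma^2)}{\partial \sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^{n} (x_i - \mu)^2 && \text{taking the partial and setting equal to 0} \\
\Rightarrow\ \frac{n}{2\hat{\sigma}^2} &= \frac{1}{2(\hat{\sigma}^2)^2} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 \\
\Rightarrow\ \hat{\sigma}^2 &= \frac{\sum_{i=1}^{n} (x_i - \hat{\mu})^2}{n}
\end{align*}
There are two things that should be noted:
1. The estimator $\hat{\sigma}^2$ is biased.
2. We can check whether $\hat{\mu}$ and $\hat{\sigma}^2$ are really maximums. This can be done by verifying that the Hessian of the log-likelihood is negative definite at $(\hat{\mu}, \hat{\sigma}^2)$; we take it as given here.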
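The two maximum likelihood examples above can be checked numerically with the Python standard library. The sketch below is illustrative only: the data sets are made up, and the helper `poisson_loglik` is a hypothetical name. It confirms that the Poisson log-likelihood is largest at the sample mean and that the normal MLE of the variance equals $(n-1)/n$ times the unbiased sample variance.

```python
import math
import statistics

def poisson_loglik(lam, data):
    """Poisson log-likelihood l(lambda) for an i.i.d. sample, constant term included."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

counts = [2, 0, 3, 1, 4, 2]           # hypothetical claim counts
lam_hat = statistics.fmean(counts)    # Poisson MLE: the sample mean

obs = [4.1, 5.3, 3.8, 6.0, 5.2]       # hypothetical normal observations
n = len(obs)
mu_hat = statistics.fmean(obs)                   # normal MLE of mu
sigma2_hat = statistics.pvariance(obs, mu_hat)   # MLE of sigma^2: divides by n (biased)
s2 = statistics.variance(obs, mu_hat)            # sample variance: divides by n - 1
```

Comparing `sigma2_hat` with `s2` makes the bias visible: the MLE is always the smaller of the two by the factor $(n-1)/n$.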
2.3
Bayesian Estimation
2.3.1
Introduction
In previous sections we took a frequentist approach to estimating various parameters. That is, we generated the most likely estimates for the parameters of a given fixed underlying distribution. The Bayesian approach is different in that it does not treat the parameters as fixed: they are given a distribution of their own, which is updated as more data is collected.
2.3.2
Definitions
Definition 2.3.1 "A prior distribution is a probability distribution over the space of all possible parameter values. It is denoted $\pi(\theta)$ and represents our opinion concerning the relative chances that various values of $\theta$ are the true value of the parameter."[1]
That is, a prior distribution represents our original assumption about how likely the various parameter values are before data has been collected. The literature suggests that it is often difficult to define an exact prior distribution, and consequently the definition of a prior distribution has been loosened to include the improper prior distribution.
Definition 2.3.2 An improper prior distribution is one for which the probabilities (or pdf) are nonnegative but their sum (or integral) is infinite.
A noninformative or vague prior is a distribution that reflects minimal knowledge about the parameter. Opinions vary concerning the construction of vague priors. There is agreement however that an appropriate noninformative prior distribution for a scale parameter is $\pi(\theta) = 1/\theta$, $\theta > 0$, which is an example of an improper prior. Under Bayesian analysis, as data is collected a model distribution is constructed to reflect this new information.
Definition 2.3.3 "A model distribution is the probability distribution for the data as collected given a particular value for the parameter. Its pdf is denoted as $f_{X|\Theta}(x|\theta)$"[1] where $x$ is a vector of data collected. For independent observations it is:
\[
f_{X|\Theta}(x|\theta) = \prod_{i=1}^{n} f_{X_i|\Theta}(x_i|\theta) = f_{X_1|\Theta}(x_1|\theta) \cdot \ldots \cdot f_{X_n|\Theta}(x_n|\theta)
\]
Similarly the joint and marginal distributions are defined:
\begin{align*}
f_{X,\Theta}(x, \theta) &= f_{X|\Theta}(x|\theta)\,\pi(\theta) \\
f_X(x) &= \int f_{X,\Theta}(x, \theta)\,d\theta = \int f_{X|\Theta}(x|\theta)\,\pi(\theta)\,d\theta
\end{align*}
Notice that a model distribution (when consisting of more than one data point) is identical to the likelihood function (the names are used interchangeably). As more data is collected a better estimate of the parameter can be obtained. The updated knowledge is reflected in the posterior distribution.
Definition 2.3.4 The posterior distribution "is the conditional probability distribution of the parameters given the observed data"[1]. It is written as:
\begin{align*}
\pi_{\Theta|X}(\theta|x) &= \frac{f_{X,\Theta}(x, \theta)}{\int f_{X,\Theta}(x, \theta)\,d\theta} = \frac{f_{X|\Theta}(x|\theta)\,\pi(\theta)}{\int f_{X|\Theta}(x|\theta)\,\pi(\theta)\,d\theta} \tag{2.3} \\
&= \frac{f_{X|\Theta}(x_1|\theta) \cdot \ldots \cdot f_{X|\Theta}(x_n|\theta)\,\pi(\theta)}{f_X(x)} \tag{2.4}
\end{align*}
Similarly, suppose we wanted to estimate the probability of a new observation given our current knowledge of the posterior; then we would construct a predictive distribution.
Definition 2.3.5 "The predictive distribution is the conditional probability distribution of a new observation $y$ given the data $x$:
\[
f_{Y|X}(y|x) = \int f_{Y|\Theta}(y|\theta)\,\pi_{\Theta|X}(\theta|x)\,d\theta \tag{2.5}
\]
where $f_{Y|\Theta}(y|\theta)$ is the pdf of the new observation, given the parameter value"[1]
Remark 2.3.6 There are two things to remark from the definitions above:
1. The posterior mean $E[\Theta|X_1 = x_1, \dots, X_n = x_n]$ is a point estimator for $\Theta$. This will be important in Bayesian credibility.
2. The predictive mean is often used in forecasting as a point estimator for the $(n+1)$st observation given the first $n$ observations and the prior distribution.
2.3.3
Examples
In this section we look at several examples involving the definitions above to reinforce the ideas introduced.
Example 2.3.7 We are given $n$ independent, identically distributed random variables $X_1, \dots, X_n$. Suppose $X_j|\Theta = \theta \sim \text{Poisson}(\theta)$ $\{j = 1, 2, \dots, n\}$ and that the prior distribution is $\Theta \sim \text{Gamma}(\alpha, \beta)$ (i.e. $\pi(\theta) = \frac{\beta^\alpha \theta^{\alpha-1} e^{-\beta\theta}}{\Gamma(\alpha)}\,\mathbf{1}_{\theta>0}$). Find the model distribution, the marginal distribution of $x$, the posterior distribution and the posterior mean.
Solution 2.3.8 Model distribution:
\[
f_{X|\Theta}(x|\theta) = \prod_{j=1}^{n} f_{X|\Theta}(x_j|\theta) = \prod_{j=1}^{n} \frac{\theta^{x_j} e^{-\theta}}{x_j!} = \frac{e^{-n\theta}\,\theta^{\sum_{j=1}^{n} x_j}}{\prod_{j=1}^{n} x_j!}
\]
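Marginal distribution:
\begin{align*}
f_X(x) &= \int f_{X|\Theta}(x|\theta)\,\pi(\theta)\,d\theta = \int \frac{\theta^{\sum_{j=1}^{n} x_j} e^{-n\theta}}{\prod_{j=1}^{n} x_j!} \cdot \frac{\beta^\alpha \theta^{\alpha-1} e^{-\beta\theta}}{\Gamma(\alpha)}\,d\theta \\
&= \frac{\beta^\alpha}{\prod_{j=1}^{n} x_j!\,\Gamma(\alpha)} \int \theta^{\sum_{j=1}^{n} x_j}\, e^{-n\theta}\, \theta^{\alpha-1} e^{-\beta\theta}\,d\theta \\
&= \frac{\beta^\alpha}{\prod_{j=1}^{n} x_j!\,\Gamma(\alpha)} \int \theta^{n\bar{x}+\alpha-1} e^{-\theta(n+\beta)}\,d\theta \\
&= c \int \theta^{\alpha^*-1} e^{-\theta\beta^*}\,d\theta = c\,\frac{\Gamma(\alpha^*)}{(\beta^*)^{\alpha^*}}
\end{align*}
where $\alpha^* = n\bar{x} + \alpha$, $\beta^* = n + \beta$ and $c = \dfrac{\beta^\alpha}{\prod_{j=1}^{n} x_j!\,\Gamma(\alpha)}$.
Remark 2.3.9 The integrand is, up to a constant, the density of a Gamma distribution with parameters $\alpha^*, \beta^*$.
Posterior distribution: in the posterior distribution $f_X(x)$ is a constant with respect to $\theta$, so all that needs to be focused on is the numerator.
\begin{align*}
\pi_{\Theta|X}(\theta|x) &= \frac{f_{X|\Theta}(x|\theta)\,\pi(\theta)}{\int f_{X,\Theta}(x, \theta)\,d\theta} \\
&\propto \frac{\theta^{\sum_{j=1}^{n} x_j} e^{-n\theta}}{\prod_{j=1}^{n} x_j!} \cdot \frac{\beta^\alpha \theta^{\alpha-1} e^{-\beta\theta}}{\Gamma(\alpha)} = \frac{\beta^\alpha}{\prod_{j=1}^{n} x_j!\,\Gamma(\alpha)}\, \theta^{n\bar{x}+\alpha-1} e^{-\theta(n+\beta)} \\
&\propto \theta^{\alpha^*-1} e^{-\theta\beta^*} \qquad \text{where } \alpha^* = n\bar{x} + \alpha,\ \beta^* = n + \beta
\end{align*}
So the posterior distribution is in fact a Gamma distribution: $\Theta|X = x \sim \text{Gamma}(\alpha^* = n\bar{x} + \alpha,\ \beta^* = n + \beta)$.
Posterior mean:
\begin{align*}
E[\Theta|X_1 = x_1, \dots, X_n = x_n] &= \frac{\alpha^*}{\beta^*} = \frac{n\bar{x} + \alpha}{n + \beta} \\
&= \frac{n}{n+\beta}\,\bar{x} + \frac{\beta}{n+\beta} \cdot \frac{\alpha}{\beta} \\
&= Z\bar{x} + (1 - Z)E[\Theta] \qquad \text{where } 0 < Z = \frac{n}{n+\beta} < 1
\end{align*}
Interestingly, the posterior mean is a weighted average of the sample mean and the prior distribution mean.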
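The Gamma-Poisson update derived in Solution 2.3.8 is simple enough to express as a small function. The Python sketch below is a minimal illustration: the function name `gamma_poisson_posterior`, the prior parameters and the data are all assumptions, and $\beta$ is treated as a rate parameter as in the solution above.

```python
def gamma_poisson_posterior(alpha, beta, data):
    """Update a Gamma(alpha, rate beta) prior with i.i.d. Poisson observations.

    Returns (alpha_star, beta_star, posterior_mean, z), where z = n / (n + beta)
    is the weight placed on the sample mean.
    """
    n, total = len(data), sum(data)
    alpha_star = alpha + total   # alpha* = n * x_bar + alpha
    beta_star = beta + n         # beta*  = n + beta
    z = n / (n + beta)
    return alpha_star, beta_star, alpha_star / beta_star, z

# Hypothetical prior and claim counts.
alpha, beta = 2.0, 1.0
claims = [3, 1, 2, 4]
alpha_star, beta_star, post_mean, z = gamma_poisson_posterior(alpha, beta, claims)

# The same posterior mean written as a weighted average of x_bar and E[Theta].
x_bar = sum(claims) / len(claims)
weighted = z * x_bar + (1 - z) * (alpha / beta)
```

Computing the posterior mean both ways shows that $\alpha^*/\beta^*$ and $Z\bar{x} + (1-Z)E[\Theta]$ are the same number.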
Chapter 3 Credibility
Credibility Theory: a branch of actuarial science originally developed as a method to calculate an individual's risk premium by combining the individual risk experience with the class risk experience.
3.1
Introduction
Credibility theory was first developed as a response to the credibility problem. It tries to deal with reconciling "how much of a difference in experience of a given policyholder is due to random variation in the underlying claims experience and how much is due [...] to the policyholder really [being] a better or worse risk than average for the rating class"[1]. The original paper by Mowbray (1914)[?] formulates the problem as follows:
Suppose:
1. $X_j$ represents the $j$-th experience ($j \in \{1, 2, 3, \dots\}$) of a policy (or policyholder) from a group (or particular rating scheme).
2. $X_j$'s mean is stable over time, i.e. $E[X_j] = \xi$ (this would be the premium charged net of expenses if its value was known).
3. $X_j$'s variance is $Var[X_j] = \sigma^2$ for all $j$.
Now the past experience may be summarized by the statistic $\bar{X} = \sum_{j=1}^{n} X_j / n$ (which is the average over the $n$ observations). Taking the expected value and variance of $\bar{X}$ we get that $E[\bar{X}] = \xi$ and $Var[\bar{X}] = \sigma^2/n$. The insurer's goal is to choose an appropriate value for $\xi$. This may be done in one of three ways:
1. Ignore past data (no credibility) and choose the manual premium $M$ (which can be found in a book (manual) of premiums, hence its name)
2. Use only past data (full credibility) and charge $\bar{X}$
3. Use a mixture of $M$ and $\bar{X}$ (partial credibility)
Generally insurers like to use $\bar{X}$ when it is stable across time and a good indicator of next year's results. Where more variability exists a manual premium is more suitable.
3.2
Limited Fluctuation Credibility
3.2.1
Full Credibility
Framework
To assign full credibility to a policyholder, $\bar{X}$ must be relatively stable over time. From a statistical perspective this means the difference between $\bar{X}$ and $\xi$ must be relatively small with high probability. Mathematically:
\[
\Pr(-r\xi \le \bar{X} - \xi \le r\xi) \ge p \tag{3.1}
\]
where $r > 0$ is close to 0 and $0 < p < 1$. It is often more mathematically convenient to express this in terms of absolute values and something akin to a normal approximation (since for a large part of the time the normal assumption is made). Specifically:
\[
\Pr\left( \left| \frac{\bar{X} - \xi}{\sigma/\sqrt{n}} \right| \le \frac{r\xi\sqrt{n}}{\sigma} \right) \ge p \ \Rightarrow\ y_p = \inf\left\{ y : \Pr\left( \left| \frac{\bar{X} - \xi}{\sigma/\sqrt{n}} \right| \le y \right) \ge p \right\} \tag{3.2}
\]
Here $y_p$ is the minimum value of $y$ that satisfies the probability statement above. In continuous cases (i.e. $\bar{X}$ is continuous), $\ge$ is substituted with $=$ since the exact value $p$ is attained. Accordingly the condition for full credibility is met when:
\[
\frac{r\xi\sqrt{n}}{\sigma} \ge y_p \tag{3.3}
\]
This equation can be cleaned up a bit by rearranging the terms and letting $\lambda_0 = (y_p/r)^2$:
\begin{align*}
\frac{r\xi\sqrt{n}}{\sigma} \ge y_p \ &\Rightarrow\ \frac{\sigma}{\xi} \le \frac{r}{y_p}\sqrt{n} \\
&\Rightarrow\ \frac{\sigma}{\xi} \le \sqrt{\frac{n}{\lambda_0}}
\end{align*}
The left hand side of the inequality ($\sigma/\xi$) is the coefficient of variation. Similarly, $n$ can be isolated to give the number of units required for full credibility:
\[
n \ge \lambda_0 \left( \frac{\sigma}{\xi} \right)^2 \tag{3.4}
\]
This will be the equation used to determine credibility on the amount of claims and number of claims basis.
The normality assumption
When $n$ is large the normal assumption is applied to $\bar{X}$, which has mean $\xi$ and variance $\sigma^2/n$. To find the value $y_p$, we solve using the normal distribution:
\begin{align*}
p &= \Pr\left( \left| \frac{\bar{X} - \xi}{\sigma/\sqrt{n}} \right| \le y_p \right) = \Pr(|Z| \le y_p) = \Phi(y_p) - \Phi(-y_p) \\
&= 2\Phi(y_p) - 1
\end{align*}
Solving for $y_p$:
\[
\Phi(y_p) = \frac{1+p}{2} \ \Longrightarrow\ y_p = \Phi^{-1}\left( \frac{1+p}{2} \right) \tag{3.5}
\]
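Equations (3.4) and (3.5) translate directly into code. The following Python sketch uses `statistics.NormalDist` from the standard library for $\Phi^{-1}$; the function names and the inputs ($p = 0.95$, $r = 0.05$, and a claim count with mean and variance both equal to 0.8) are illustrative assumptions.

```python
from statistics import NormalDist

def full_credibility_constants(p, r):
    """Return (y_p, lambda_0) with y_p = Phi^{-1}((1 + p) / 2) and lambda_0 = (y_p / r)^2."""
    y_p = NormalDist().inv_cdf((1 + p) / 2)
    return y_p, (y_p / r) ** 2

def exposures_for_full_credibility(lambda_0, mean, variance):
    """Lower bound on n from n >= lambda_0 * (sigma / xi)^2."""
    return lambda_0 * variance / mean ** 2

y_p, lambda_0 = full_credibility_constants(p=0.95, r=0.05)

# Illustrative claim-count distribution with mean = variance = 0.8 per policy.
n_required = exposures_for_full_credibility(lambda_0, mean=0.8, variance=0.8)
```

Because the code keeps the unrounded quantile (about 1.95996 rather than the table value 1.96), its results differ slightly from hand calculations that round $y_p$ to two decimals.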
This is important since, given a $p$, $y_p$ will not be $\Phi^{-1}(p)$ but rather $\Phi^{-1}((1+p)/2)$. Consider the examples below:
Example 3.2.1 Suppose $p = 0.95$ and $r = 0.05$. Find $y_p$ and $\lambda_0$.
Solution 3.2.2 Using equation (3.5), $y_p = \Phi^{-1}((1 + 0.95)/2) = \Phi^{-1}(0.975) = 1.96$. Also $\lambda_0 = (1.96/0.05)^2 = (39.2)^2 = 1536.64$. This means that given $p = 0.95$ and $r = 0.05$ the number of exposure units required for full credibility will be $n \ge \lambda_0 \sigma^2/\xi^2 = 1536.64\,\sigma^2/\xi^2$.
Determining the number of claims/amount
Questions on full credibility come in two flavours:
1. Determine the number of exposure units required for full credibility based on the expected number of claims per policy (also known as number of claims basis)
2. Determine the number of exposure units required for full credibility based on the expected dollars of claims per policy (also known as amount claims basis)
The two questions are slightly different in what they ask. The first looks at the number of claims produced by policyholders without considering the amount of each claim. The second question however considers the amount of each claim produced by policyholders (that means we have to consider the conditional distribution of claims). Consider the following example:
Example 3.2.3 Suppose we are looking at a compound Poisson distribution. Let:
1. $Y_{i,j}$ be the $j$-th claim for policyholder $i$ with mean $\theta_Y$ and variance $\sigma_Y^2$
2. $X_i$ be the total loss for policyholder $i$
3. $N_i \sim \text{Poisson}(\lambda)$
Then $X_i$ is given by:
\[
X_i = \sum_{j=1}^{N_i} Y_{i,j} = Y_{i,1} + Y_{i,2} + \dots + Y_{i,N_i}
\]
Determine the number of units required for full credibility based on the number of claims and amount of claims if $r = 0.05$, $p = 0.95$, $\theta_Y = 45$, $\sigma_Y = 10$ and $\lambda = 0.8$ claims per policy.
Solution 3.2.4 Number of claims basis: We are only interested in the number of claims generated. Accordingly we look at the mean and variance of the claim count distribution $N_i$. The distribution's mean is given by $\xi = E[N_i] = \lambda = 0.8$ and its variance is given by $\sigma^2 = Var[N_i] = \lambda = 0.8$. Plugging these values into equation (3.4):
\[
n \ge \lambda_0 \left( \frac{\sigma}{\xi} \right)^2 = (1.96/0.05)^2 \left( \frac{0.8^{0.5}}{0.8} \right)^2 = \frac{1536.64}{0.8} = 1920.8
\]
For full credibility 1920.8 observations are required. But since the question asks for the number of claims generated we multiply 1920.8 by 0.8 claims per policy. That is, the number of claims required for full credibility is 1536.64 claims.
Amount claims basis: Now we are interested in the total claim random variable $X_i$. So for $X_i$: $\xi = E[X_i] = \lambda\theta_Y$ and $\sigma^2 = Var[X_i] = \lambda(\theta_Y^2 + \sigma_Y^2)$, and
\begin{align*}
n \ge \lambda_0 \left( \frac{\sigma}{\xi} \right)^2 &= \lambda_0\, \frac{\lambda(\theta_Y^2 + \sigma_Y^2)}{(\lambda\theta_Y)^2} = \frac{\lambda_0}{\lambda} \left[ 1 + \left( \frac{\sigma_Y}{\theta_Y} \right)^2 \right] \\
&= \frac{1536.64}{0.8} \left[ 1 + \left( \frac{10}{45} \right)^2 \right] = 2015.65
\end{align*}
Under an amount claims basis 2015.65 observations are required. However the question asks for the expected total dollars of claims, so 2015.65 is multiplied by 0.8 claims per policy and \$45 per claim, i.e.
(number of observations) × E[number of claims per policy] × E[claim amount per claim] = 2015.65(0.8)(45) = \$72563.4
So \$72563.4 worth of claims have to be generated before considering full credibility.
Note: In the example above $\lambda$ was given. For empirical distributions this will not be the case; both the underlying mean and variance will need to be estimated. Suppose that in the example above there were 1500 observations out of which 450 resulted in policy claims. Then $\hat{\lambda}$ = (number of claims)/(number of observations) = 450/1500 = 0.3 per policy.
3.2.2
Partial Credibility
Sometimes the full credibility assumption may be inappropriate. This may arise due to competitive policy pricing or the feeling that the premium charged under full credibility isn't an appropriate reflection of the actual premium. In such cases recourse to partial credibility is available.
Partial credibility combines past experience $\bar{X}$ in the net premium with the externally obtained mean $M$. Various methods for defining the premium under partial credibility (called $P_c$) exist (see []); however an intuitively appealing method is to take a weighted average of $\bar{X}$ and $M$:
\[
P_c = Z\bar{X} + (1 - Z)M, \qquad Z \in [0, 1] \tag{3.6}
\]
There are different ways to define $Z$. In this particular context we define $Z$ such that the difference between $P_c$ and $\xi$ is small with high probability:
\[
\Pr\big[\,|P_c - \xi| \le r\xi\,\big] = \Pr\left[ \left| \frac{P_c - \xi}{\sqrt{Var(P_c)}} \right| \le \frac{r\xi}{\sqrt{Var(P_c)}} \right] \ge p
\]
Clearly $p$ can be controlled by changing the variance of $P_c$. Before proceeding recall $Var[\bar{X}] = \sigma^2/n$; now to find the variance of $P_c$ we have:
\[
Var[P_c] = Var[Z\bar{X} + (1 - Z)M] = Z^2\,Var[\bar{X}] = Z^2\,\frac{\sigma^2}{n} \tag{3.7}
\]
Plugging $Var[P_c]$ back into the equation above:
\begin{align*}
\Pr\left[ \left| \frac{P_c - \xi}{\sqrt{Var(P_c)}} \right| \le \frac{r\xi}{\sqrt{Var(P_c)}} \right] &= \Pr\left[ \left| \frac{P_c - \xi}{Z\sigma/\sqrt{n}} \right| \le \frac{r\xi}{\sqrt{Z^2\sigma^2/n}} \right] \ge p \\
&\Rightarrow\ y_p \le \frac{r\xi}{\sqrt{Z^2\sigma^2/n}} \qquad \text{where } y_p \text{ is given by } \Pr[\,|N(0, 1)| \le y_p\,] = p
\end{align*}
The condition for partial credibility is subsequently satisfied when:
\[
y_p \le \frac{r\xi}{Z\sigma/\sqrt{n}} \qquad \text{(condition for partial credibility)} \tag{3.8}
\]
Rearranging the terms, $Z$ is obtained:
\[
Z = \min\left\{ 1,\ \sqrt{\frac{n}{\lambda_0(\sigma/\xi)^2}} \right\} \qquad \text{since } 0 \le Z \le 1 \tag{3.9}
\]
Clearly if Z â‰Ľ 1 then the number of observations required for pricing Pc reduces to finding number of observations required for full credibility. Also, its interseting to see that the denominator (Îť0 (Ďƒ/Âľ)2 ) is the number of required claims for full credibility. Consequently, Z can be interpreted as the square root of total obervations divided by the number of observations required for full credibility. Letâ€™s look at an example: Example 3.2.5 : Compound Gamma Suppose we are looking at a compound Gamma distribution. Let: 1. Yi,j be the j th claim for policyholder i with mean Î¸Y = 45 and variance ĎƒY2 = 10 P i 2. Xi be the total loss for policyholder i with Xi = N j=1 Yi,j = Yi,1 +Yi,2 +...+Yi,Ni 3. Ni âˆźGamma(Îą, Î˛) where Îą = 4, Î˛ = 3 A.Determine the number of units required for full credibility under number claims and amount claims methods if p = 0.05, p = 0.95. B.Given that there have been 15 observations and that the manual premium is $75 determine the credibility estimate for Z and the premium paid under both number claims and amount claims methods.
3.2. LIMITED FLUCTATION CREDIBILITY
Solution 3.2.6
Before digging into the solution there is some preliminary work to be done to find $E[X_i]$ and $Var[X_i]$. Since $X_i$ is a compound distribution, the double expectation theorem can be used:
$$E[X_i] = E\left[\sum_{j=1}^{N_i} Y_{i,j}\right] = E[E[X_i \mid N_i]] = E[N_i E[Y_{i,j}]] = E[N_i]E[Y_{i,j}] = (\alpha\beta)\theta_Y = (4(3))45 = 540$$
$$E[Var[X_i \mid N_i]] = E[N_i Var[Y_{i,j}]] = E[N_i]Var[Y_{i,j}] = (\alpha\beta)\sigma_Y^2 = (4(3))10 = 120$$
$$Var[E[X_i \mid N_i]] = Var[N_i E[Y_{i,j}]] = Var[N_i](E[Y_{i,j}])^2 = (\alpha\beta^2)\theta_Y^2 = (4(3)^2)(45^2) = 72900$$
$$Var[X_i] = E[Var[X_i \mid N_i]] + Var[E[X_i \mid N_i]] = 120 + 72900 = 73020$$
Now that we have $E[X_i]$ and $Var[X_i]$, we may proceed.
A. Let $n_{nc}$ be the number of required observations under the number of claims method and let $n_{ac}$ be the number of required observations under the amount claims method. Given that $\lambda_0 = 1536.64$, $E[N_i] = 4(3) = 12$ and $Var[N_i] = 4(3)^2 = 36$:
$$n_{nc} = \lambda_0\left(\frac{\sigma_{nc}}{\mu_{nc}}\right)^2 = 1536.64\left(\frac{6}{12}\right)^2 = 384.16$$
$$n_{ac} = \lambda_0\left(\frac{\sigma_{ac}}{\mu_{ac}}\right)^2 = 1536.64\left(\frac{\sqrt{73020}}{540}\right)^2 = 384.79$$
In both situations roughly 385 observations are required.
B. Since the required number of observations is the same under both methods, only one Z needs to be defined:
$$Z = \sqrt{\frac{15}{385}} = 0.1973855 \;\longrightarrow\; P_c = (0.1973855)\$45 + (1 - 0.1973855)\$75 = \$69.07843$$
The premium charged under partial credibility should be about $69.08. Comparing this to the full credibility case there is a significant difference in pricing: $69.08 − $45 = $24.08.
Two additional examples are included below for completeness.
Example 3.2.7
For a group, the amount of claims in year i is a compound Poisson random variable with Poisson parameter λ. The individual claim size has an exponential distribution with mean β. If credibility is based on the number of claims then Z = 0.8. Determine Z if credibility is based on the total amount of claims.
Solution 3.2.8
Let:
1. $Y_{i,j}$ represent the $j$th claim during the $i$th year, i.e. $Y_{i,j} \sim \text{Exp}(\beta)$
2. $N_i$ represent the number of claims taking place in year $i$, so $N_i \sim \text{Poisson}(\lambda)$
3. $X_i$ be the amount of claims experienced in year $i$
Also note that $E[Y_{i,j}] = \beta$, $Var[Y_{i,j}] = \beta^2$ and $E[N_i] = Var[N_i] = \lambda$. The example further specifies that credibility under the number of claims method has Z = 0.8; accordingly:
$$0.8 = \min\left\{1, \sqrt{\frac{n}{\lambda_0\left(\frac{Var[N_i]^{0.5}}{E[N_i]}\right)^2}}\right\} = \sqrt{\frac{n}{\lambda_0\left(\frac{1}{\lambda}\right)}} = \sqrt{\frac{n\lambda}{\lambda_0}}$$
$$0.8^2 = 0.64 = \frac{n}{\lambda_0\left(\frac{1}{\lambda}\right)}$$
Now tackling Z under the amount claims method:
$$Z_{amount} = \min\left\{1, \sqrt{\frac{n}{\frac{\lambda_0}{\lambda}\left(1 + \left(\frac{Var[Y_{i,j}]^{0.5}}{E[Y_{i,j}]}\right)^2\right)}}\right\} = \min\left\{1, \sqrt{\frac{n\lambda}{\lambda_0\left(1 + \frac{\beta^2}{\beta^2}\right)}}\right\} = \min\left\{1, \sqrt{\frac{n\lambda}{\lambda_0(2)}}\right\}$$
$$= \min\left\{1, \sqrt{\frac{n\lambda}{\lambda_0}}\,\frac{1}{\sqrt{2}}\right\} = \min\left\{1, \frac{0.8}{\sqrt{2}}\right\} = \frac{0.8}{\sqrt{2}} = 0.5657$$
Example 3.2.9
The compound negative binomial: Suppose the total amount in claims produced in year i is a compound negative binomial random variable $X_i$. Let $N_i \sim \text{NegativeBinomial}(r, \beta)$; then $X_i$ is given by:
$$X_i = \begin{cases} \sum_{j=1}^{N_i} Y_{i,j}, & \text{if } N_i \ge 1 \\ 0, & \text{if } N_i = 0 \end{cases}$$
If $Y_{i,j}$ represents the $j$th claim in year $i$ (with mean $\mu_Y$ and variance $\sigma_Y^2$), develop formulas for Z under partial credibility for both the number of claims and claim amount methods.
Solution 3.2.10
Number of claims method
Since $N_i$ represents the number of claims in period $i$, its expected value and variance are $E[N_i] = r\beta$ and $Var[N_i] = r\beta(1+\beta)$. Under full credibility the required number of observations is given by:
$$n_{fc} \ge \lambda_0\left(\frac{\sqrt{Var[N_i]}}{E[N_i]}\right)^2 = \lambda_0\left(\frac{\sqrt{r\beta(1+\beta)}}{r\beta}\right)^2 = \lambda_0\frac{r\beta(1+\beta)}{(r\beta)^2} = \frac{\lambda_0[1+\beta]}{r\beta}$$
So for partial credibility the factor Z is given by:
$$Z = \min\left\{1, \sqrt{\frac{n}{n_{fc}}}\right\} = \min\left\{1, \sqrt{\frac{n}{\lambda_0[1+\beta]/(r\beta)}}\right\} = \min\left\{1, \sqrt{\frac{n(r\beta)}{\lambda_0[1+\beta]}}\right\}$$
Amount claimed method
Under this method $E[X_i]$ and $Var[X_i]$ are required. Using the double expectation theorem we have:
$$E[X_i] = \mu = E[E[X_i \mid N_i]] = E\left[E\left[\sum_{j=1}^{N_i} Y_{i,j} \,\Big|\, N_i\right]\right] = E[N_i E[Y_{i,j}]] = E[N_i]E[Y_{i,j}] = \mu_Y[r\beta]$$
$$E[Var[X_i \mid N_i]] = E\left[Var\left[\sum_{j=1}^{N_i} Y_{i,j} \,\Big|\, N_i\right]\right] = E[N_i Var[Y_{i,j}]] = E[N_i]Var[Y_{i,j}] = r\beta\,\sigma_Y^2$$
$$Var[E[X_i \mid N_i]] = Var\left[E\left[\sum_{j=1}^{N_i} Y_{i,j} \,\Big|\, N_i\right]\right] = \mu_Y^2\, r\beta(1+\beta)$$
$$Var[X_i] = r\beta\,\sigma_Y^2 + \mu_Y^2\, r\beta(1+\beta)$$
So under full credibility the number of observations required is:
$$n_{fc} \ge \lambda_0\left(\frac{\sqrt{Var[X_i]}}{E[X_i]}\right)^2 = \lambda_0\,\frac{r\beta\,\sigma_Y^2 + \mu_Y^2\, r\beta(1+\beta)}{(r\beta\,\mu_Y)^2} = \frac{\lambda_0}{r\beta}\left(1 + \beta + \frac{\sigma_Y^2}{\mu_Y^2}\right)$$
Under partial credibility Z is given as:
$$Z = \min\left\{1, \sqrt{\frac{n}{n_{fc}}}\right\} = \min\left\{1, \sqrt{\frac{n}{\frac{\lambda_0}{r\beta}\left(1+\beta+\frac{\sigma_Y^2}{\mu_Y^2}\right)}}\right\} = \min\left\{1, \sqrt{\frac{n(r\beta)\mu_Y^2}{\lambda_0[\mu_Y^2(1+\beta) + \sigma_Y^2]}}\right\}$$
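The limited fluctuation calculations of this section all follow the same two steps: find the full credibility standard from a mean and a variance, then take the square root of the ratio of observations to that standard. A minimal Python sketch is given below; the function names are hypothetical, and the inputs are the figures of Example 3.2.5 (Gamma claim counts with mean αβ and variance αβ²).

```python
import math

def full_credibility_n(lam0, mean, variance):
    """Observations needed for full credibility: lam0 * variance / mean**2."""
    return lam0 * variance / mean ** 2

def partial_credibility_z(n, n_full):
    """Square-root rule for the partial credibility factor, capped at 1."""
    return min(1.0, math.sqrt(n / n_full))

# Figures from Example 3.2.5.
lam0 = 1536.64
alpha, beta = 4.0, 3.0
theta_y, var_y = 45.0, 10.0

mean_n, var_n = alpha * beta, alpha * beta ** 2   # claim count moments
mean_x = mean_n * theta_y                         # compound mean
var_x = mean_n * var_y + var_n * theta_y ** 2     # compound variance

n_nc = full_credibility_n(lam0, mean_n, var_n)    # number of claims basis
n_ac = full_credibility_n(lam0, mean_x, var_x)    # amount of claims basis

z = partial_credibility_z(15, 385)                # 15 observations, standard of 385
premium = z * 45.0 + (1.0 - z) * 75.0             # weights as used in the solution

print(n_nc, n_ac, z, premium)
```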
3.3
Greatest Accuracy Credibility Theory
3.3.1
Introduction
Greatest accuracy credibility theory is known by two other names: “European” and greatest fluctuation credibility. It is an outgrowth of Bühlmann’s 1967 paper, which addresses questions similar to those asked in limited fluctuation credibility but from a Bayesian approach. Let’s consider the basic problem. Suppose that for a particular policyholder n exposure units of past claims are observed, i.e. $X = (X_1, \dots, X_n)^T$. Although we have a manual rate µ, past experience may suggest that it is an inappropriate measure of what should be charged for the net premium next year. Should the premium be based on µ, on $\bar{X}$, or on a combination of the two? This is similar to the question we asked before, but this time the risk class is considered. Two questions are considered in this context:
1. What premium should be charged given that a policyholder’s past experience is different from that of their class?
2. How much of this difference is due to random variation amongst the members of a class and how much is due to random chance?
It is these two questions that greatest accuracy credibility theory addresses.
3.3.2
Background on risk classes
When selecting an individual to be part of an insurance policy (or any risk based contract for that matter), there is usually a strict underwriting process that accompanies a policyholder’s application. In this process a policyholder is rated on different levels and then placed into a risk class. The risk class is assumed to be homogeneous; however, the underwriting process is far from perfect. There is usually some heterogeneity (in risk ratings) amongst policyholders of the same class. The assumption that a policyholder is different from other members of the same class leads to the question of pricing. How does an underwriter choose an appropriate rate for the policyholder? Several assumptions need to be made:
• Every policyholder is characterized by a risk level θ within the rating class
• θ varies amongst policyholders
• Since θ varies, a PDF and CDF can be assigned to it; call them $\pi_\Theta(\theta)$ and $\Pi_\Theta(\theta) = \Pr(\Theta \le \theta)$, respectively.
3.3. GREATEST ACCURACY CREDIBILITY THEORY
Remark 3.3.1: “Since all observable underwriting characteristics have been used, θ may be viewed as representative of the residual, unobserved factors that affect the risk level” [1]. Also note:
1. $\Pi_\Theta(\theta)$ represents the proportion of policyholders with risk parameter less than or equal to θ
2. We have implicitly assumed that the structure of risk characteristics within the population, $\pi_\Theta(\theta)$, is known.
3.3.3
Bayesian Methodology
Under the Bayesian methodology we assume that the pool of potential risk parameters derives from the prior distribution π(θ). The conditional distribution $f_{X|\Theta}(x \mid \theta)$ represents a particular policyholder’s losses/claims given risk parameter θ. The problem we face is deriving a premium to cover the claim $X_{n+1}$ given that X = x has occurred. Some caution should be taken when looking at the past claims. The $X_i$ for different exposure periods are assumed to be conditionally independent but not necessarily identically distributed, i.e. the $X_i \mid \theta$ are independent but their distribution may change from one period of exposure to the next. In order to predict the claims for period n + 1 we would normally condition on θ; however θ is unknown, so instead $X_{n+1}$ is conditioned on x. The distribution that results is the predictive distribution, which was introduced earlier in the section on parameter estimation. Recall that the predictive distribution for $X_{n+1}$ is given by:
$$f_{X_{n+1}|X}(x_{n+1} \mid x) = \frac{\int f(x_1, \dots, x_{n+1}, \theta)\,d\theta}{f_X(x)} = \frac{\int \left[\prod_{j=1}^{n+1} f_{X_j|\Theta}(x_j \mid \theta)\right]\pi(\theta)\,d\theta}{f_X(x)} = \int f_{X_{n+1}|\Theta}(x_{n+1} \mid \theta)\left(\frac{\prod_{j=1}^{n} f_{X_j|\Theta}(x_j \mid \theta)\,\pi(\theta)}{f_X(x)}\right)d\theta$$
which gives
$$f_{X_{n+1}|X}(x_{n+1} \mid x) = \int f_{X_{n+1}|\Theta}(x_{n+1} \mid \theta)\,\pi_{\Theta|X}(\theta \mid x)\,d\theta \tag{3.10}$$
Effectively what is being said is that we are using the past information x to ascertain the future loss/claim amount at time n + 1. The formula above nicely summarizes the idea of using past information, through the posterior distribution of θ, to predict the next claim. A similar equation exists for a discrete distribution:
$$f_{X_{n+1}|X}(x_{n+1} \mid x) = \sum_\theta f_{X_{n+1}|\Theta}(x_{n+1} \mid \theta)\,\pi_{\Theta|X}(\theta \mid x) \tag{3.11}$$
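Bayesian Premium
“In what follows we assume that the predictive distribution is known. How much should we charge to cover $X_{n+1}$? Ideally, one would charge the hypothetical mean $\mu_{n+1}(\theta) = E[X_{n+1} \mid \Theta = \theta]$. Unfortunately, Θ cannot be observed. We replace Θ by X and calculate the Bayesian premium $E[X_{n+1} \mid X = x]$.” By definition:
$$E[X_{n+1} \mid X = x] = \int x_{n+1}\, f_{X_{n+1}|X}(x_{n+1} \mid x)\,dx_{n+1} = \int x_{n+1}\left[\int f_{X_{n+1}|\Theta}(x_{n+1} \mid \theta)\,\pi_{\Theta|X}(\theta \mid x)\,d\theta\right]dx_{n+1}$$
$$= \int\left[\int x_{n+1}\, f_{X_{n+1}|\Theta}(x_{n+1} \mid \theta)\,dx_{n+1}\right]\pi_{\Theta|X}(\theta \mid x)\,d\theta$$
$$E[X_{n+1} \mid X = x] = \int \mu_{n+1}(\theta)\,\pi_{\Theta|X}(\theta \mid x)\,d\theta \tag{3.12}$$
In the discrete case the equation above is rewritten as:
$$E[X_{n+1} \mid X = x] = \sum_\theta \mu_{n+1}(\theta)\,\pi_{\Theta|X}(\theta \mid x) \tag{3.13}$$
If no experience is known then a collective premium is charged to cover $X_{n+1}$:
$$E[X_{n+1}] = \int x_{n+1}\, f_{X_{n+1}}(x_{n+1})\,dx_{n+1} = \int\!\!\int x_{n+1}\, f_{X_{n+1}|\Theta}(x_{n+1} \mid \theta)\,\pi(\theta)\,d\theta\,dx_{n+1}$$
$$= \int\left[\int x_{n+1}\, f_{X_{n+1}|\Theta}(x_{n+1} \mid \theta)\,dx_{n+1}\right]\pi(\theta)\,d\theta = \int \mu_{n+1}(\theta)\,\pi(\theta)\,d\theta$$
where $\mu_{n+1}(\theta) = E[X_{n+1} \mid \Theta = \theta]$ is the hypothetical mean defined above.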
3.3.4
Examples: putting it all together
Introduction
The ideas of a predictive distribution, posterior distribution and the Bayesian premium may seem purely theoretical at this point. Below two examples are presented to solidify these concepts: one is discrete, the other is continuous.
Example 1 (discrete)
Suppose there are three types of drivers. Good drivers make up 60% of the population; their probability of having 0, 2 and 4 claims in a given year is 0.75, 0.15 and 0.1 respectively. Similarly, average drivers make up 20% of the population; their probability of having 0, 2 and 4 claims is 0.6, 0.3 and 0.1. Lastly, bad drivers make up the remaining 20% of the population; their probability of having 0, 2 and 4 claims is 0.4, 0.4 and 0.2. When a driver buys insurance it is unknown whether they are a good, average or bad driver. The risk parameter Θ can take on one of three values: Θ = G, Θ = A or Θ = B. The probability model for the number of claims, X, and the risk parameter Θ is given in the table below.

x    Pr(X = x | Θ = G)    Pr(X = x | Θ = A)    Pr(X = x | Θ = B)
0    0.75                 0.6                  0.4
2    0.15                 0.3                  0.4
4    0.1                  0.1                  0.2

θ    Pr(Θ = θ)
G    0.6
A    0.2
B    0.2

Table 3.1: Probabilities associated with risk parameters

For a particular policyholder it has been observed that $x_1 = 4$, $x_2 = 2$. Determine the predictive distribution of $X_3 \mid X_1 = 4, X_2 = 2$, the posterior distribution of $\Theta \mid X_1 = 4, X_2 = 2$, as well as the Bayesian premium for $X_3$.
Predictive distribution
$$f_X(4, 2) = \sum_\theta f_{X_1|\Theta}(4 \mid \theta)\,f_{X_2|\Theta}(2 \mid \theta)\,\pi(\theta)$$
$$= \Pr(X_1 = 4 \mid \Theta = G)\Pr(X_2 = 2 \mid \Theta = G)\pi(G) + \Pr(X_1 = 4 \mid \Theta = A)\Pr(X_2 = 2 \mid \Theta = A)\pi(A) + \Pr(X_1 = 4 \mid \Theta = B)\Pr(X_2 = 2 \mid \Theta = B)\pi(B)$$
$$= (0.1)(0.15)(0.6) + (0.1)(0.3)(0.2) + (0.2)(0.4)(0.2) = 0.031$$
Similarly the joint distribution over all three variables is given by:
$$f_{X,X_3}(4, 2, x_3) = \sum_\theta f_{X_1|\Theta}(4 \mid \theta)\,f_{X_2|\Theta}(2 \mid \theta)\,f_{X_3|\Theta}(x_3 \mid \theta)\,\pi(\theta)$$
Writing out the probabilities for all claim values of $X_3$:
$$f_{X,X_3}(4, 2, 0) = (0.1)(0.15)(0.75)(0.6) + (0.1)(0.3)(0.6)(0.2) + (0.2)(0.4)(0.4)(0.2) = 0.01675$$
$$f_{X,X_3}(4, 2, 2) = (0.1)(0.15)(0.15)(0.6) + (0.1)(0.3)(0.3)(0.2) + (0.2)(0.4)(0.4)(0.2) = 0.00955$$
$$f_{X,X_3}(4, 2, 4) = (0.1)(0.15)(0.1)(0.6) + (0.1)(0.3)(0.1)(0.2) + (0.2)(0.4)(0.2)(0.2) = 0.0047$$
The predictive distribution $f_{X_3|X}(x_3 \mid x_1 = 4, x_2 = 2)$ is then given by:
$$f_{X_3|X}(0 \mid 4, 2) = \frac{0.01675}{0.031} = 0.5403, \qquad f_{X_3|X}(2 \mid 4, 2) = \frac{0.00955}{0.031} = 0.3081, \qquad f_{X_3|X}(4 \mid 4, 2) = \frac{0.0047}{0.031} = 0.1516$$
Which can be summarized as:
$$f_{X_3|X}(x_3 \mid 4, 2) = \begin{cases} 0.5403 & x_3 = 0 \\ 0.3081 & x_3 = 2 \\ 0.1516 & x_3 = 4 \end{cases}$$
The posterior distribution
The posterior distribution depends on the value that Θ assumes.
$$\pi(G \mid 4, 2) = \frac{f(4 \mid G)f(2 \mid G)\pi(G)}{f(4, 2)} = \frac{(0.1)(0.15)(0.6)}{0.031} = 0.2903$$
$$\pi(A \mid 4, 2) = \frac{f(4 \mid A)f(2 \mid A)\pi(A)}{f(4, 2)} = \frac{(0.1)(0.3)(0.2)}{0.031} = 0.1935$$
$$\pi(B \mid 4, 2) = \frac{f(4 \mid B)f(2 \mid B)\pi(B)}{f(4, 2)} = \frac{(0.2)(0.4)(0.2)}{0.031} = 0.5161$$
Once again the distribution can be neatened up:
$$\pi(\theta \mid 4, 2) = \begin{cases} 0.2903 & \theta = G \\ 0.1935 & \theta = A \\ 0.5161 & \theta = B \end{cases}$$
From a computational point of view, a simpler method for deriving the predictive probabilities is available. Suppose we have a vector x of n observations ($x = [x_1\; x_2\; \dots\; x_n]^T$), where observations can take on values in $\{a_1, a_2, \dots, a_r\}$, and we wish to find the probability distribution of the (n + 1)st observation (as in the example above). Then, given that the posterior distribution is known for x, we have:
$$f(x_{n+1} = a_i \mid x) = f(x_{n+1} = a_i \mid x_1, x_2, \dots, x_n) = \sum_{\forall\theta} f(a_i \mid \theta)\,\pi(\theta \mid x_1, x_2, \dots, x_n) \tag{3.14}$$
In the example above we have:
$$f(0 \mid 4, 2) = f(0 \mid G)\pi(G \mid 4, 2) + f(0 \mid A)\pi(A \mid 4, 2) + f(0 \mid B)\pi(B \mid 4, 2) = 0.75(0.2903) + 0.6(0.1935) + 0.4(0.5161) = 0.540265$$
$$f(2 \mid 4, 2) = f(2 \mid G)\pi(G \mid 4, 2) + f(2 \mid A)\pi(A \mid 4, 2) + f(2 \mid B)\pi(B \mid 4, 2) = 0.15(0.2903) + 0.3(0.1935) + 0.4(0.5161) = 0.308035$$
$$f(4 \mid 4, 2) = f(4 \mid G)\pi(G \mid 4, 2) + f(4 \mid A)\pi(A \mid 4, 2) + f(4 \mid B)\pi(B \mid 4, 2) = 0.1(0.2903) + 0.1(0.1935) + 0.2(0.5161) = 0.1516$$
Notice that this is the same result as before, with differences due to rounding.
Bayesian Premium
The Bayesian premium can be computed in one of two ways:
• Directly:
$$E[X_3 \mid 4, 2] = \sum_{x_3} x_3\, f_{X_3|X}(x_3 \mid x) = (0)\Pr(x_3 = 0 \mid x) + (2)\Pr(x_3 = 2 \mid x) + (4)\Pr(x_3 = 4 \mid x) = (0)(0.5403) + (2)(0.3081) + (4)(0.1516) = 1.2226$$
• Indirectly: this method is a bit longer since it requires the computation of the unobservable (hypothetical) means.
$$\mu_3(G) = \sum_{x} x \Pr(x \mid \Theta = G) = (0)(0.75) + (2)(0.15) + (4)(0.1) = 0.7$$
$$\mu_3(A) = \sum_{x} x \Pr(x \mid \Theta = A) = (0)(0.6) + (2)(0.3) + (4)(0.1) = 1$$
$$\mu_3(B) = \sum_{x} x \Pr(x \mid \Theta = B) = (0)(0.4) + (2)(0.4) + (4)(0.2) = 1.6$$
Now to calculate $E[X_3 \mid 4, 2]$:
$$E[X_3 \mid 4, 2] = \sum_\theta \mu_3(\theta)\Pr(\Theta = \theta \mid x) = \mu_3(G)\Pr(G \mid 4, 2) + \mu_3(A)\Pr(A \mid 4, 2) + \mu_3(B)\Pr(B \mid 4, 2)$$
$$= 0.7(0.2903) + 1(0.1935) + 1.6(0.5161) = 1.22247$$
The two answers differ slightly due to rounding.
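The discrete example can be reproduced in a few lines of Python, which also avoids the intermediate rounding. This is only a sketch: the dictionary layout and the function name `posterior` are illustrative choices, while the probabilities are those of Table 3.1.

```python
# Prior over driver types and claim-count likelihoods from Table 3.1.
prior = {"G": 0.6, "A": 0.2, "B": 0.2}
likelihood = {
    "G": {0: 0.75, 2: 0.15, 4: 0.10},
    "A": {0: 0.60, 2: 0.30, 4: 0.10},
    "B": {0: 0.40, 2: 0.40, 4: 0.20},
}
observed = [4, 2]

def posterior(prior, likelihood, observed):
    """Return (posterior over theta, marginal probability of the data)."""
    joint = {}
    for theta, p in prior.items():
        weight = p
        for x in observed:
            weight *= likelihood[theta][x]   # conditional independence given theta
        joint[theta] = weight
    marginal = sum(joint.values())
    return {theta: w / marginal for theta, w in joint.items()}, marginal

post, marginal = posterior(prior, likelihood, observed)

# Predictive distribution via equation (3.14).
predictive = {x: sum(likelihood[t][x] * post[t] for t in post) for x in (0, 2, 4)}

# Bayesian premium computed directly and through the hypothetical means.
premium_direct = sum(x * p for x, p in predictive.items())
hyp_mean = {t: sum(x * p for x, p in likelihood[t].items()) for t in likelihood}
premium_indirect = sum(hyp_mean[t] * post[t] for t in post)

print(marginal, post, predictive, premium_direct, premium_indirect)
```

Without rounding, both routes give the same premium, 37.9/31 ≈ 1.2226.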
Example 2 (continuous)
Suppose an individual’s claim amounts are given by an exponential distribution with mean 1/Θ, where Θ is Gamma distributed with parameters α = 5 and β = 2000 (β enters the density below as a rate, i.e. the scale is 1/2000). Three claims have been observed in the amounts of $2000, $1000 and $3000. With this information:
1. Provide a mathematical description of this model
2. Determine the predictive distribution as well as the posterior distribution
3. Find the Bayesian premium
Mathematical description
The claims amount distribution is given by:
$$f_{X|\Theta}(x \mid \theta) = \theta e^{-x\theta} \qquad \text{where } x \ge 0,\ \theta \ge 0$$
The risk parameter distribution is given by:
$$\pi(\theta) = \frac{\theta^{\alpha-1} e^{-\theta\beta}\beta^\alpha}{\Gamma(\alpha)} = \frac{\theta^4\, 2000^5\, e^{-2000\theta}}{\Gamma(5)}$$
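Predictive Distribution: Some preliminary work needs to be done:
$$f(2000, 1000, 3000) = \int_0^\infty (\theta e^{-2000\theta})(\theta e^{-1000\theta})(\theta e^{-3000\theta})\,\frac{\theta^4\, 2000^5\, e^{-2000\theta}}{\Gamma(5)}\,d\theta = \frac{2000^5}{24}\int_0^\infty \theta^7 e^{-8000\theta}\,d\theta$$
$$= \frac{2000^5}{\Gamma(5)}\,\frac{\Gamma(8)}{8000^8}\underbrace{\int_0^\infty \frac{\theta^7\, 8000^8\, e^{-8000\theta}}{\Gamma(8)}\,d\theta}_{1} = \frac{2000^5}{8000^8}\,\frac{5040}{24} = 210\,\frac{2000^5}{8000^8}$$
$$f(2000, 1000, 3000, x_4) = \int_0^\infty (\theta e^{-2000\theta})(\theta e^{-1000\theta})(\theta e^{-3000\theta})\,\theta e^{-x_4\theta}\,\frac{\theta^4\, 2000^5\, e^{-2000\theta}}{\Gamma(5)}\,d\theta = \frac{2000^5}{24}\int_0^\infty \theta^8 e^{-\theta(8000+x_4)}\,d\theta$$
$$= \frac{2000^5}{24}\,\frac{\Gamma(9)}{(8000+x_4)^9}\underbrace{\int_0^\infty \frac{(8000+x_4)^9\,\theta^8\, e^{-\theta(8000+x_4)}}{\Gamma(9)}\,d\theta}_{1} = \frac{8!}{4!}\,\frac{2000^5}{(8000+x_4)^9}$$
Now the predictive distribution is given by:
$$f(x_4 \mid 2000, 1000, 3000) = \frac{\frac{8!}{4!}\frac{2000^5}{(8000+x_4)^9}}{\frac{7!}{4!}\frac{2000^5}{8000^8}} = \frac{8(8000)^8}{(8000+x_4)^9}$$
So the predictive distribution is a type 2 Pareto distribution with shape 8 and scale 8000.
Posterior Distribution: We’ll take a shortcut here and show that $\pi(\theta \mid 2000, 1000, 3000)$ is proportional to a known density, then add the constants back in to make sure it integrates to 1.
$$\pi(\theta \mid 2000, 1000, 3000) = \frac{f(\theta, x_1, x_2, \dots, x_n)}{f(x_1, x_2, \dots, x_n)} = \frac{f(\theta, 2000, 1000, 3000)}{f(2000, 1000, 3000)} = \frac{\frac{2000^5}{4!}\,\theta^4 e^{-2000\theta}\,\theta^3 e^{-6000\theta}}{\frac{7!}{4!}\frac{2000^5}{8000^8}} \propto \theta^7 e^{-8000\theta}$$
This implies $\Theta \mid 2000, 1000, 3000 \sim \text{Gamma}(8, 8000)$. Having obtained the posterior distribution, the predictive distribution could also have been calculated by:
$$f(x_4 \mid 2000, 1000, 3000) = \int_0^\infty f(x_4 \mid \theta)\,\pi(\theta \mid x)\,d\theta = \int_0^\infty \theta e^{-x\theta}\,\frac{8000^8\,\theta^7 e^{-8000\theta}}{\Gamma(8)}\,d\theta = \frac{8000^8}{\Gamma(8)}\int_0^\infty \theta^8 e^{-\theta(8000+x)}\,d\theta$$
$$= \frac{8000^8}{\Gamma(8)}\,\frac{\Gamma(9)}{(8000+x)^9}\underbrace{\int_0^\infty \frac{(8000+x)^9\,\theta^8 e^{-\theta(8000+x)}}{\Gamma(9)}\,d\theta}_{1} = \frac{8(8000)^8}{(8000+x)^9}$$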
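Bayesian Premium
The Bayesian premium can be derived in one of two ways; both are considered for completeness.
$$E[X_{n+1} \mid X] = \int_0^\infty x_{n+1}\, f_{X_{n+1}|X}(x_{n+1} \mid x)\,dx_{n+1} = \underbrace{\int_0^\infty x_{n+1}\,\frac{8(8000)^8}{(8000+x_{n+1})^9}\,dx_{n+1}}_{\text{mean of Pareto}} = \frac{8000}{8-1} = \frac{8000}{7}$$
Alternatively, with $\mu_{n+1}(\theta) = E[X_{n+1} \mid \Theta = \theta] = \frac{1}{\theta}$:
$$E[X_{n+1} \mid X] = \int_0^\infty \mu_{n+1}(\theta)\,\pi_{\Theta|X}(\theta \mid x)\,d\theta = \int_0^\infty \frac{1}{\theta}\,\frac{\theta^7 (8000)^8 e^{-8000\theta}}{\Gamma(8)}\,d\theta = \frac{8000^8\,\Gamma(7)}{\Gamma(8)\,8000^7}\int_0^\infty \frac{\theta^6 (8000)^7 e^{-8000\theta}}{\Gamma(7)}\,d\theta = \frac{8000}{7}$$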
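The conjugate update in this example reduces to adding the number of claims to the shape and the claim total to the rate. The following Python sketch illustrates this; the function names are hypothetical and the Gamma prior is parameterized by shape and rate, as in the density used above.

```python
def gamma_posterior(shape, rate, claims):
    """Posterior (shape, rate) of Theta for exponential claims with a Gamma prior."""
    return shape + len(claims), rate + sum(claims)

def bayesian_premium(post_shape, post_rate):
    """E[1/Theta] under Gamma(post_shape, post_rate); needs post_shape > 1."""
    return post_rate / (post_shape - 1)

def predictive_density(x, post_shape, post_rate):
    """Type 2 Pareto predictive density of the next claim."""
    return post_shape * post_rate ** post_shape / (post_rate + x) ** (post_shape + 1)

claims = [2000.0, 1000.0, 3000.0]
post_shape, post_rate = gamma_posterior(5, 2000.0, claims)
premium = bayesian_premium(post_shape, post_rate)

print(post_shape, post_rate, premium)
```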
The results of Example 2 can be generalized. Assuming the claims distribution is exponential with mean $\Theta^{-1}$ and that the risk parameter Θ is Gamma distributed with parameters (α, β), the marginal, posterior and predictive distributions and the Bayesian premium are given by the following equations:
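Marginal
$$f(x) = \int_0^\infty f_{X|\Theta}(x \mid \theta)\,\pi_\Theta(\theta)\,d\theta = \int_0^\infty \prod_{i=1}^{n} f_{X_i|\Theta}(x_i \mid \theta)\,\pi_\Theta(\theta)\,d\theta = \int_0^\infty \prod_{i=1}^{n} \theta e^{-x_i\theta}\,\frac{\beta^\alpha\theta^{\alpha-1}e^{-\beta\theta}}{\Gamma(\alpha)}\,d\theta$$
$$= \int_0^\infty \frac{\beta^\alpha\,\theta^{\alpha+n-1}\,e^{-\theta(\beta+\sum_i x_i)}}{\Gamma(\alpha)}\,d\theta = \frac{\beta^\alpha}{\Gamma(\alpha)}\,\frac{\Gamma(\alpha+n)}{(\beta+\sum_i x_i)^{\alpha+n}}\underbrace{\int_0^\infty \frac{(\beta+\sum_i x_i)^{\alpha+n}\,\theta^{\alpha+n-1}\,e^{-\theta(\beta+\sum_i x_i)}}{\Gamma(\alpha+n)}\,d\theta}_{1}$$
$$= \frac{\beta^\alpha}{\Gamma(\alpha)}\,\frac{\Gamma(\alpha+n)}{(\beta+\sum_i x_i)^{\alpha+n}}$$
Posterior Distribution
$$\pi(\theta \mid x) = \frac{f_{\Theta,X}(\theta, x)}{f_X(x)} = \frac{\pi(\theta)\prod_{i=1}^{n} f_{X_i|\Theta}(x_i \mid \theta)}{f_X(x)} \propto \pi(\theta)\prod_{i=1}^{n} f_{X_i|\Theta}(x_i \mid \theta) = \frac{\beta^\alpha\theta^{\alpha-1}e^{-\theta\beta}}{\Gamma(\alpha)}\,\theta^n e^{-\theta\sum_i x_i} \propto \theta^{\alpha+n-1}e^{-\theta(\beta+\sum_i x_i)}$$
$$\Rightarrow\quad \pi(\theta \mid x) = \underbrace{\frac{\theta^{\alpha+n-1}\,(\beta+\sum_i x_i)^{\alpha+n}\,e^{-\theta(\beta+\sum_i x_i)}}{\Gamma(\alpha+n)}}_{\text{Gamma}(\alpha+n,\ \beta+\sum_i x_i)}$$
Predictive Distribution
$$f_{X_{n+1}|X}(x_{n+1} \mid x) = \int_0^\infty f_{X_{n+1}|\Theta}(x_{n+1} \mid \theta)\,\pi_{\Theta|X}(\theta \mid x)\,d\theta = \int_0^\infty \theta e^{-x_{n+1}\theta}\,\frac{\theta^{\alpha+n-1}\,(\beta+\sum_i x_i)^{\alpha+n}\,e^{-\theta(\beta+\sum_i x_i)}}{\Gamma(\alpha+n)}\,d\theta$$
$$= \int_0^\infty \frac{\theta^{\alpha+n}\,(\beta+\sum_i x_i)^{\alpha+n}\,e^{-\theta(\beta+\sum_i x_i+x_{n+1})}}{\Gamma(\alpha+n)}\,d\theta$$
$$= \frac{(\beta+\sum_i x_i)^{\alpha+n}\,\Gamma(\alpha+n+1)}{\Gamma(\alpha+n)\,(\beta+\sum_i x_i+x_{n+1})^{\alpha+n+1}}\underbrace{\int_0^\infty \frac{\theta^{\alpha+n}\,(\beta+\sum_i x_i+x_{n+1})^{\alpha+n+1}\,e^{-\theta(\beta+\sum_i x_i+x_{n+1})}}{\Gamma(\alpha+n+1)}\,d\theta}_{1}$$
$$= \frac{(\alpha+n)\,(\beta+\sum_i x_i)^{\alpha+n}}{(\beta+\sum_i x_i+x_{n+1})^{\alpha+n+1}}$$
Bayesian Premium
$$E[X_{n+1} \mid X] = \int_0^\infty \mu_{n+1}(\theta)\,\pi_{\Theta|X}(\theta \mid x)\,d\theta = \int_0^\infty x_{n+1}\,f_{X_{n+1}|X}(x_{n+1} \mid x)\,dx_{n+1}$$
$$= \underbrace{\int_0^\infty x_{n+1}\,\frac{(\alpha+n)\,(\beta+\sum_i x_i)^{\alpha+n}}{(\beta+\sum_i x_i+x_{n+1})^{\alpha+n+1}}\,dx_{n+1}}_{\text{mean of Pareto with parameters } r=\alpha+n,\ k=\beta+\sum_i x_i} = \frac{\beta+\sum_i x_i}{\alpha+n-1}$$
This agrees with Example 2, where α + n − 1 = 7 and β + Σ xᵢ = 8000.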
3.3.5
The Credibility Premium
In the previous sections we looked at how to estimate the unknown individual premium $\mu_{n+1}(\theta)$ (also called the hypothetical mean) using past data; the result was the Bayesian premium $E[X_{n+1} \mid X]$. One major drawback of the Bayesian method is the limited number of distributions that can be used without resorting to numerical methods. Such a limited family can hardly be considered representative of the models met in practice. To address this issue we can use an approximation developed by Buhlmann. “The idea is to estimate $\mu_{n+1}(\theta)$ by a linear function of past data.” Suppose the estimator takes on the following form:
$$\alpha_0 + \sum_{j=1}^{n}\alpha_j X_j \tag{3.15}$$
where the $\alpha_j$, $j \in \{0, 1, \dots, n\}$, are defined by the model. Then we can minimize the squared distance between $\mu_{n+1}(\Theta)$ and the estimator:
$$\Delta^2 = \left(\mu_{n+1}(\Theta) - \left(\alpha_0 + \sum_{j=1}^{n}\alpha_j X_j\right)\right)^2 \tag{3.16}$$
However $\Delta^2$ is a random variable. Taking its expected value we derive the mean squared error:
$$Q = E[\Delta^2] = E\left[\left(\mu_{n+1}(\Theta) - \left(\alpha_0 + \sum_{j=1}^{n}\alpha_j X_j\right)\right)^2\right]$$
To minimize Q we take the derivatives with respect to $\alpha_0$ and $\alpha_j$, $j \in \{1, 2, \dots, n\}$, and set the ensuing equations equal to zero. Let the minimizing values be represented by $\hat\alpha_0$ and $\hat\alpha_j$, $j \in \{1, 2, \dots, n\}$.
$$\frac{\partial Q}{\partial\alpha_0} = E\left[2\left(\mu_{n+1}(\Theta) - \left(\alpha_0 + \sum_{j=1}^{n}\alpha_j X_j\right)\right)(-1)\right] = 0 \;\Longrightarrow$$
$$E[\mu_{n+1}(\Theta)] = \hat\alpha_0 + \sum_{j=1}^{n}\hat\alpha_j E[X_j] \qquad \text{(Unbiasedness Equation)}$$
The equation above is known as the Unbiasedness Equation because it requires the estimator to be unbiased for $E[\mu_{n+1}(\Theta)] = E[E[X_{n+1} \mid \Theta]] = E[X_{n+1}]$. Note however that the credibility estimator may be biased as an estimator of $\mu_{n+1}(\theta) = E[X_{n+1} \mid \Theta = \theta]$ for a given θ (the quantity we are trying to estimate); it is only on average over all Θ that the bias vanishes. Now taking the derivative with respect to $\alpha_i$ we have that:
$$\frac{\partial Q}{\partial\alpha_i} = E\left[2\left(\mu_{n+1}(\Theta) - \left(\alpha_0 + \sum_{j=1}^{n}\alpha_j X_j\right)\right)(-X_i)\right] = 0 \;\Longrightarrow$$
$$E[\mu_{n+1}(\Theta)X_i] = \hat\alpha_0 E[X_i] + \sum_{j=1}^{n}\hat\alpha_j E[X_j X_i]$$
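Note however that, by conditional independence given Θ:
$$E[\mu_{n+1}(\Theta)X_i] = E[E[\mu_{n+1}(\Theta)X_i \mid \Theta]] = E[\mu_{n+1}(\Theta)E[X_i \mid \Theta]] = E[E[X_{n+1} \mid \Theta]E[X_i \mid \Theta]] = E[E[X_{n+1}X_i \mid \Theta]] = E[X_{n+1}X_i]$$
Which gives:
$$E[X_{n+1}X_i] = \hat\alpha_0 E[X_i] + \sum_{j=1}^{n}\hat\alpha_j E[X_j X_i]$$
Multiplying the unbiasedness equation by $E[X_i]$ and subtracting it from the equation above, the covariance of $X_i$ and $X_{n+1}$ is derived:
$$E[X_i X_{n+1}] - E[X_i]E[X_{n+1}] = \hat\alpha_0 E[X_i] + \sum_{j=1}^{n}\hat\alpha_j E[X_j X_i] - \left(\hat\alpha_0 E[X_i] + \sum_{j=1}^{n}\hat\alpha_j E[X_j]E[X_i]\right)$$
$$= \sum_{j=1}^{n}\hat\alpha_j E[X_j X_i] - \sum_{j=1}^{n}\hat\alpha_j E[X_j]E[X_i]$$
$$Cov(X_i, X_{n+1}) = \sum_{j=1}^{n}\hat\alpha_j\,Cov(X_i, X_j), \qquad i \in \{1, 2, \dots, n\}$$
The unbiasedness equation and the covariance equations together are known as the Normal Equations. Using these equations we can solve for the values $\{\hat\alpha_0, \hat\alpha_1, \dots, \hat\alpha_n\}$ to yield the credibility premium:
$$\hat\alpha_0 + \sum_{j=1}^{n}\hat\alpha_j X_j$$
The solutions $\{\hat\alpha_0, \hat\alpha_1, \dots, \hat\alpha_n\}$ may be expressed in matrix form. It is interesting to note that the credibility premium is also the best linear approximation to the Bayesian premium $E[X_{n+1} \mid X]$ and to $X_{n+1}$ itself. That is, it can be shown that the same values minimize both $Q_1$ and $Q_2$:
$$Q_1 = E\left[\left(E[X_{n+1} \mid X] - \left(\alpha_0 + \sum_{j=1}^{n}\alpha_j X_j\right)\right)^2\right]$$
$$Q_2 = E\left[\left(X_{n+1} - \left(\alpha_0 + \sum_{j=1}^{n}\alpha_j X_j\right)\right)^2\right]$$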
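Example (page 583, LM)
Suppose $E[X_j] = \mu$, $Var[X_j] = \sigma^2$ and $Cov(X_i, X_j) = \rho\sigma^2$ for $i \ne j$, where $-1 < \rho < 1$. Then, using the credibility premium $\hat\alpha_0 + \sum_{j=1}^{n}\hat\alpha_j X_j$, the unbiasedness equation gives:
$$\mu = \hat\alpha_0 + \sum_{j=1}^{n}\hat\alpha_j\mu \;\Rightarrow\; \sum_{j=1}^{n}\hat\alpha_j = 1 - \frac{\hat\alpha_0}{\mu}$$
and the covariance equations give, for $i \in \{1, 2, \dots, n\}$:
$$\rho\sigma^2 = \sum_{j=1}^{n}\hat\alpha_j\,Cov(X_i, X_j) = \sum_{j \ne i}\hat\alpha_j\,Cov(X_i, X_j) + \hat\alpha_i\,Var(X_i) = \sum_{j \ne i}\hat\alpha_j\rho\sigma^2 + \hat\alpha_i\sigma^2 \;\Longrightarrow$$
$$\rho = \sum_{j \ne i}\hat\alpha_j\rho + \hat\alpha_i = \sum_{j=1}^{n}\hat\alpha_j\rho + \hat\alpha_i(1-\rho) \;\Longrightarrow$$
$$\hat\alpha_i = \frac{\rho\left(1 - \sum_{j=1}^{n}\hat\alpha_j\right)}{1-\rho} = \frac{\rho\hat\alpha_0}{\mu(1-\rho)} \qquad (**)$$
Summing (**) over i:
$$\sum_{i=1}^{n}\hat\alpha_i = \frac{n\rho\hat\alpha_0}{\mu(1-\rho)} = 1 - \frac{\hat\alpha_0}{\mu} \;\Longrightarrow\; \hat\alpha_0 = \frac{(1-\rho)\mu}{1-\rho+n\rho}$$
which implies:
$$\hat\alpha_j = \frac{\rho\hat\alpha_0}{\mu(1-\rho)} = \frac{\rho}{1-\rho+n\rho}$$
The credibility premium can then be written as:
$$\hat\alpha_0 + \sum_{j=1}^{n}\hat\alpha_j X_j = \frac{(1-\rho)}{1-\rho+n\rho}\,\mu + \sum_{j=1}^{n}\frac{\rho}{1-\rho+n\rho}\,X_j = \underbrace{\frac{(1-\rho)}{1-\rho+n\rho}}_{1-Z}\,\mu + \underbrace{\frac{n\rho}{1-\rho+n\rho}}_{Z}\,\underbrace{\frac{\sum_{j=1}^{n}X_j}{n}}_{\bar{X}} = (1-Z)\mu + Z\bar{X}$$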
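The closed-form weights of this example can be checked by substituting them back into the normal equations. The Python sketch below does this for arbitrary illustrative inputs (n = 5, µ = 100, σ² = 4, ρ = 0.3, which are assumptions and not taken from the text); the function names are hypothetical.

```python
def equicorrelated_weights(n, mu, rho):
    """Closed-form solution (alpha0_hat, [alpha_j_hat]) for the equicorrelated case."""
    denom = 1.0 - rho + n * rho
    alpha0 = (1.0 - rho) * mu / denom
    alphas = [rho / denom] * n
    return alpha0, alphas

def normal_equation_residuals(n, mu, sigma2, rho):
    """Residuals of the unbiasedness equation and of each covariance equation."""
    alpha0, alphas = equicorrelated_weights(n, mu, rho)
    unbiased = mu - (alpha0 + mu * sum(alphas))
    cov_residuals = []
    for i in range(n):
        # Cov(X_i, X_j) is sigma2 on the diagonal and rho * sigma2 elsewhere.
        rhs = sum(alphas[j] * (sigma2 if i == j else rho * sigma2) for j in range(n))
        cov_residuals.append(rho * sigma2 - rhs)   # Cov(X_i, X_{n+1}) = rho * sigma2
    return unbiased, cov_residuals

n, mu, sigma2, rho = 5, 100.0, 4.0, 0.3
alpha0, alphas = equicorrelated_weights(n, mu, rho)
z = sum(alphas)                                    # credibility factor Z
unbiased, cov_residuals = normal_equation_residuals(n, mu, sigma2, rho)

print(alpha0, z, unbiased, cov_residuals)
```

All residuals should be zero up to floating point error, and the sum of the weights equals Z = nρ/(1 − ρ + nρ).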
3.3.6
Buhlmann Model
The Buhlmann model is a credibility model. It is the simplest model that we explore and is foundational to the subsequent development of the Buhlmann-Straub model. The Buhlmann model assumes that the past losses $X_1, \dots, X_n$ have the same mean and variance and are i.i.d. conditional on Θ. Define the following terms:
$\mu(\theta) = E[X_j \mid \Theta = \theta]$ (hypothetical mean)
$v(\theta) = Var[X_j \mid \Theta = \theta]$ (process variance)
$\mu = E[\mu(\Theta)]$ (expected value of the hypothetical means)
$v = E[v(\Theta)] = E[Var[X_j \mid \Theta]]$ (expected value of the process variance)
$a = Var[\mu(\Theta)] = Var[E[X_j \mid \Theta]] = E[\mu(\Theta)^2] - (E[\mu(\Theta)])^2$ (variance of the hypothetical means)
µ is known as the collective premium and is the value that is used if there is no past information on claims data. The mean, variance and covariance of the $X_j$ are given as:
$$E[X_j] = E[E[X_j \mid \Theta]] = E[\mu(\Theta)] = \mu$$
$$Var[X_j] = E[Var[X_j \mid \Theta]] + Var[E[X_j \mid \Theta]] = E[v(\Theta)] + Var[\mu(\Theta)] = v + a$$
$$Cov(X_i, X_j)_{i \ne j} = E[X_i X_j] - E[X_i]E[X_j] = E[E[X_i X_j \mid \Theta]] - E[E[X_i \mid \Theta]]E[E[X_j \mid \Theta]]$$
$$= E[E[X_i \mid \Theta]E[X_j \mid \Theta]] - E[\mu(\Theta)]E[\mu(\Theta)] = E[\mu(\Theta)^2] - (E[\mu(\Theta)])^2 = a$$
Using the linear relation developed in the previous section, the credibility premium is given by:
$$\hat\alpha_0 + \sum_{j=1}^{n}\hat\alpha_j X_j = \underbrace{\frac{n}{n + \frac{v}{a}}}_{Z}\,\bar{X} + \underbrace{\frac{\frac{v}{a}}{n + \frac{v}{a}}}_{1-Z}\,\mu \tag{3.17}$$
The ratio v/a is usually denoted k (Buhlmann’s k), and Z = n/(n + k) is called the Buhlmann credibility factor. Notice that when k is written out
$$k = \frac{v}{a} = \frac{E[Var(X_j \mid \Theta)]}{Var[E[X_j \mid \Theta]]} \tag{3.18}$$
it is in fact a ratio of the two components of $Var[X_j]$. The Buhlmann model seems intuitively appealing since, as more data is collected, the weighting tends more towards the sample mean rather than the collective premium µ. In a homogeneous population the ratio n/(n + k) will be small, since Θ will be relatively similar between the policyholder and everyone else in the same class (a is small, so k is large), in which case Z → 0. However, should the population be heterogeneous, more weight will be given to the sample mean as it is a better indicator of an individual’s future claims.
Example (discrete)
Suppose we are using the data from the discrete example in the Bayesian Premium section. For convenience the table is repeated below:

x    Pr(X = x | Θ = G)    Pr(X = x | Θ = A)    Pr(X = x | Θ = B)
0    0.75                 0.6                  0.4
2    0.15                 0.3                  0.4
4    0.1                  0.1                  0.2

θ    Pr(Θ = θ)
G    0.6
A    0.2
B    0.2

Table 3.2: Probabilities associated with risk parameters

Suppose three claim counts of 4, 4 and 2 are observed. Using the table above and this information find $\mu(\Theta)$, $v(\Theta)$, $\mu$, $v$, $a$, $k$, $Z$, $\bar{X}$ and the best approximation to the Bayesian premium.
Solution
$$\mu(G) = 0(0.75) + 2(0.15) + 4(0.1) = 0.7, \qquad \mu(A) = 0(0.6) + 2(0.3) + 4(0.1) = 1, \qquad \mu(B) = 0(0.4) + 2(0.4) + 4(0.2) = 1.6$$
$$\mu(\Theta) = \begin{cases} 0.7 & \Theta = G \\ 1 & \Theta = A \\ 1.6 & \Theta = B \end{cases}$$
$$v(G) = 0^2(0.75) + 2^2(0.15) + 4^2(0.1) - 0.7^2 = 1.71$$
$$v(A) = 0^2(0.6) + 2^2(0.3) + 4^2(0.1) - 1^2 = 1.8$$
$$v(B) = 0^2(0.4) + 2^2(0.4) + 4^2(0.2) - 1.6^2 = 2.24$$
$$v(\Theta) = \begin{cases} 1.71 & \Theta = G \\ 1.8 & \Theta = A \\ 2.24 & \Theta = B \end{cases}$$
$$\mu = \sum_\theta \mu(\theta)\pi(\theta) = 0.7(0.6) + 1(0.2) + 1.6(0.2) = 0.94$$
$$a = Var[\mu(\Theta)] = Var[E[X_j \mid \Theta]] = \sum_\theta \mu(\theta)^2\pi(\theta) - \mu^2 = 0.7^2(0.6) + 1^2(0.2) + 1.6^2(0.2) - 0.94^2 = 0.1224$$
$$v = E[v(\Theta)] = E[Var[X_j \mid \Theta]] = \sum_\theta v(\theta)\pi(\theta) = 1.71(0.6) + 1.8(0.2) + 2.24(0.2) = 1.834$$
$$k = \frac{v}{a} = \frac{1.834}{0.1224} = 14.9837$$
$$Z = \frac{3}{3 + 14.9837} = 0.1668$$
$$\bar{X} = \frac{4 + 4 + 2}{3} = 3.\bar{3}$$
$$\text{Buhlmann premium} = 3.\bar{3}(0.1668) + 0.94(1 - 0.1668) = 1.3393$$
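The Buhlmann quantities for a discrete prior are just probability-weighted sums, so they are easy to recompute. The Python sketch below does so from the table; the helper name `buhlmann_discrete` is hypothetical, and the process variance for bad drivers is taken as v(B) = 2.24 when averaging.

```python
prior = {"G": 0.6, "A": 0.2, "B": 0.2}
likelihood = {
    "G": {0: 0.75, 2: 0.15, 4: 0.10},
    "A": {0: 0.60, 2: 0.30, 4: 0.10},
    "B": {0: 0.40, 2: 0.40, 4: 0.20},
}

def buhlmann_discrete(prior, likelihood, observations):
    """Return (mu, v, a, k, Z, premium) for a discrete risk parameter."""
    # Hypothetical means and process variances for each risk class.
    hyp_mean = {t: sum(x * p for x, p in d.items()) for t, d in likelihood.items()}
    proc_var = {t: sum(x * x * p for x, p in d.items()) - hyp_mean[t] ** 2
                for t, d in likelihood.items()}
    mu = sum(hyp_mean[t] * prior[t] for t in prior)
    v = sum(proc_var[t] * prior[t] for t in prior)
    a = sum(hyp_mean[t] ** 2 * prior[t] for t in prior) - mu ** 2
    k = v / a
    n = len(observations)
    z = n / (n + k)
    xbar = sum(observations) / n
    return mu, v, a, k, z, z * xbar + (1.0 - z) * mu

mu, v, a, k, z, premium = buhlmann_discrete(prior, likelihood, [4, 4, 2])
print(mu, v, a, k, z, premium)
```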
Example (mixed)
Suppose you are given that claim amounts follow a Poisson distribution with parameter Λ. If Λ follows a single-parameter Pareto distribution with parameters α = 3 and θ = 10, and 4 claims in the amounts of 15, 18, 16 and 13 have been made, determine $\mu(\Lambda)$, $v(\Lambda)$, $\mu$, $v$, $a$, $k$, $Z$, $\bar{X}$ and the best approximation to the Bayesian premium.
$$\mu(\Lambda) = \Lambda, \qquad v(\Lambda) = \Lambda$$
$$\mu = E[\mu(\Lambda)] = E[\Lambda] = \frac{10(3)}{2} = 15$$
$$v = E[v(\Lambda)] = E[\Lambda] = \frac{10(3)}{2} = 15$$
$$a = Var[\mu(\Lambda)] = Var[\Lambda] = \frac{10^2(3)}{(3-1)^2(3-2)} = 75$$
$$k = \frac{v}{a} = \frac{15}{75} = 0.2$$
$$Z = \frac{4}{4 + 0.2} = 0.95238$$
$$\bar{X} = \frac{15 + 18 + 16 + 13}{4} = 15.5$$
$$\text{Buhlmann premium} = 0.95238(15.5) + (1 - 0.95238)15 = 15.47619$$
Note: When a Poisson distribution is used to model the claim amount, the approximation to the Bayesian premium is a function of the risk parameter distribution only, and not of the claims distribution, since both µ(Λ) and v(Λ) equal Λ.
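Example (continuous)
Recall the generalized continuous exponential-gamma model visited in the section on the Bayesian premium (i.e. $X \sim \text{Exp}(\Theta)$, $\Theta \sim \Gamma(\alpha, \beta)$). Suppose n claims are observed; provide a generalized model under the Buhlmann framework for $\mu(\Theta)$, $v(\Theta)$, $\mu$, $v$, $a$, $k$, $Z$, $\bar{X}$ and the best approximation to the Bayesian premium.
$$\mu(\Theta) = \frac{1}{\Theta}, \qquad v(\Theta) = \frac{1}{\Theta^2}$$
$$\mu = E[\mu(\Theta)] = E\left[\frac{1}{\Theta}\right] = \frac{\beta}{\alpha-1}$$
$$v = E[v(\Theta)] = E\left[\frac{1}{\Theta^2}\right] = \frac{\beta^2}{(\alpha-1)(\alpha-2)}$$
$$a = Var[\mu(\Theta)] = Var\left[\frac{1}{\Theta}\right] = \frac{\beta^2}{(\alpha-1)(\alpha-2)} - \left(\frac{\beta}{\alpha-1}\right)^2 = \frac{\beta^2}{(\alpha-1)}\left[\frac{1}{\alpha-2} - \frac{1}{\alpha-1}\right] = \frac{\beta^2}{(\alpha-1)}\,\frac{1}{(\alpha-1)(\alpha-2)} = \frac{\beta^2}{(\alpha-1)(\alpha-2)}\,\frac{1}{\alpha-1}$$
$$k = \frac{v}{a} = \frac{\frac{\beta^2}{(\alpha-1)(\alpha-2)}}{\frac{\beta^2}{(\alpha-1)(\alpha-2)}\frac{1}{\alpha-1}} = \alpha - 1$$
$$Z = \frac{n}{n+k} = \frac{n}{n+\alpha-1}$$
$$\text{Buhlmann premium} = \frac{n}{n+\alpha-1}\,\bar{X} + \frac{\alpha-1}{n+\alpha-1}\,\frac{\beta}{\alpha-1}$$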
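As a numerical check of these formulas, the sketch below evaluates the Buhlmann premium for the data of Example 2 in the Bayesian premium section (α = 5, rate β = 2000, claims of 2000, 1000 and 3000) and compares it with the Bayesian premium 8000/7 found there. The function name is hypothetical.

```python
def buhlmann_exp_gamma(alpha, beta, claims):
    """Buhlmann premium for exponential claims with a Gamma(alpha, rate beta) prior.

    Requires alpha > 2 so that v and a are finite.
    """
    n = len(claims)
    xbar = sum(claims) / n
    mu = beta / (alpha - 1)                            # collective premium
    v = beta ** 2 / ((alpha - 1) * (alpha - 2))        # expected process variance
    a = beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))   # variance of hypothetical means
    k = v / a                                          # equals alpha - 1
    z = n / (n + k)
    return k, z, z * xbar + (1.0 - z) * mu

k, z, premium = buhlmann_exp_gamma(5.0, 2000.0, [2000.0, 1000.0, 3000.0])
print(k, z, premium)
```

For these inputs the Buhlmann premium coincides with the Bayesian premium of Example 2.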
3.3.7
The Buhlmann-Straub Model
A practical difficulty encountered when dealing with the Buhlmann model is that it does not take into account variations in size or exposure. The Buhlmann-Straub model adjusts the variance to accommodate fluctuations in size and exposure. The model is similar to the Buhlmann model but some modifications are needed. For the Buhlmann-Straub model we define the following:
$$\mu(\theta) = E[X_j \mid \Theta = \theta], \qquad Var[X_j \mid \Theta = \theta] = \frac{v(\theta)}{m_j}$$
where $m_j$ is a measure of exposure that modifies $Var[X_j \mid \Theta = \theta]$.
Note: The model above is appropriate if each $X_j$ is an average of $m_j$ independent variables (conditional on Θ). In such a case $X_j$ would have mean µ(θ) and variance $v(\theta)/m_j$.
Using similar notation to the Buhlmann model we derive µ, v, a and $Var[X_j]$:
$$\mu = E[\mu(\Theta)], \qquad v = E[v(\Theta)], \qquad a = Var[\mu(\Theta)]$$
$Var[X_j]$ requires some modification:
$$Var[X_j] = E[Var[X_j \mid \Theta]] + Var[E[X_j \mid \Theta]] = E\left[\frac{v(\Theta)}{m_j}\right] + Var[\mu(\Theta)] = \frac{E[v(\Theta)]}{m_j} + a = \frac{v}{m_j} + a$$
Now to find the credibility premium. Let $m = \sum_{i=1}^{n} m_i$; then using the normal equations we have that:
$$E[X_{n+1}] = \mu = \hat\alpha_0 + \sum_{j=1}^{n}\hat\alpha_j\underbrace{E[X_j]}_{\mu} \;\Rightarrow\; \sum_{j=1}^{n}\hat\alpha_j = 1 - \frac{\hat\alpha_0}{\mu} \qquad (*)$$
$$Cov(X_i, X_{n+1}) = a = \sum_{j \ne i}\hat\alpha_j\,Cov(X_i, X_j) + \hat\alpha_i\,Var[X_i] = \sum_{j \ne i}\hat\alpha_j a + \hat\alpha_i\left(\frac{v}{m_i} + a\right) = \sum_{j=1}^{n}\hat\alpha_j a + \hat\alpha_i\frac{v}{m_i} \;\Longrightarrow$$
$$\hat\alpha_i = \frac{m_i a}{v}\left(1 - \sum_{j=1}^{n}\hat\alpha_j\right) = \frac{m_i a}{v}\,\frac{\hat\alpha_0}{\mu}$$
Summing over i gives
$$\sum_{i=1}^{n}\hat\alpha_i = \frac{m a}{v}\,\frac{\hat\alpha_0}{\mu} \qquad (**)$$
Equating (*) and (**) we have that:
$$\frac{m a}{v}\,\frac{\hat\alpha_0}{\mu} = 1 - \frac{\hat\alpha_0}{\mu}$$
Which reduces to:
$$\hat\alpha_0 = \frac{\mu}{1 + (ma/v)} = \frac{v/a}{m + v/a}\,\mu \tag{3.19}$$
and so
$$\hat\alpha_j = \frac{m_j a\,\hat\alpha_0}{\mu v} = \frac{m_j}{m + v/a} \tag{3.20}$$
So the credibility premium under the Buhlmann-Straub model is given by:
$$P_{c,BS} = \hat\alpha_0 + \sum_{j=1}^{n}\hat\alpha_j X_j = Z\bar{X} + (1-Z)\mu$$
which implies
$$Z = \frac{m}{m+k} = \frac{m}{m + v/a} \tag{3.21}$$
where
$$\bar{X} = \sum_{j=1}^{n}\frac{m_j}{m}X_j \tag{3.22}$$
There are a couple of things to notice here. First, m is the total exposure of the policyholder; clearly Z is dependent on m. Secondly, $\bar{X}$ is a weighted average of the $X_j$. If $X_j$ is interpreted as the average claim/loss experienced by the $m_j$ group members in year j, then $m_j X_j$ is the total claims/loss experienced by the group in year j. If we wanted to price a group premium for $m_{n+1}$ policyholders in year n + 1, the Buhlmann-Straub credibility premium would be multiplied by $m_{n+1}$, i.e.:
$$m_{n+1}[Z\bar{X} + (1-Z)\mu] \tag{3.23}$$
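The Buhlmann-Straub premium of equations (3.21) to (3.23) can be sketched in Python as follows. The function name and the numerical inputs (three years of average claims per policy, exposures, and values of µ, v and a) are purely illustrative assumptions and do not come from the text.

```python
def buhlmann_straub(x, m, mu, v, a):
    """Return (Z, exposure-weighted mean, individual premium).

    x  : average claim per exposure unit in each year
    m  : exposure in each year
    mu : collective premium, v : expected process variance,
    a  : variance of the hypothetical means
    """
    total_m = sum(m)
    xbar = sum(mj * xj for mj, xj in zip(m, x)) / total_m   # equation (3.22)
    z = total_m / (total_m + v / a)                         # equation (3.21)
    return z, xbar, z * xbar + (1.0 - z) * mu

# Hypothetical data for illustration only.
x = [0.20, 0.30, 0.25]
m = [100, 150, 250]
z, xbar, individual_premium = buhlmann_straub(x, m, mu=0.2, v=0.2, a=0.01)

# Group premium for a hypothetical 300 policyholders next year, equation (3.23).
group_premium = 300 * individual_premium

print(z, xbar, individual_premium, group_premium)
```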
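Had we in fact known the correct weighting for $\bar{X}$, then under the Buhlmann framework the same answer would have been derived as in the Buhlmann-Straub model, had $\bar{X}$ instead of $X_j$ been used as the random variable:
$$Var(\bar{X} \mid \Theta) = Var\left(\sum_{j=1}^{n}\frac{m_j X_j}{m}\,\Big|\,\Theta\right) = \sum_{j=1}^{n}\frac{m_j^2}{m^2}\,\frac{v(\theta)}{m_j} = \sum_{j=1}^{n}\frac{m_j\,v(\theta)}{m^2} = \frac{m\,v(\theta)}{m^2} = \frac{v(\theta)}{m}$$
$$Var[E[\bar{X} \mid \Theta]] = Var\left[E\left[\sum_{j=1}^{n}\frac{m_j X_j}{m}\,\Big|\,\Theta\right]\right] = Var\left[\sum_{j=1}^{n}\frac{m_j}{m}E[X_j \mid \Theta]\right] = Var[E[X_j \mid \Theta]] = Var[\mu(\Theta)] = a$$
Which implies that under the Buhlmann framework, with the single observation $\bar{X}$ (n = 1) and k = (v/m)/a:
$$Z = \frac{n}{n+k} = \frac{1}{1 + v/(ma)} = \frac{m}{m + v/a}$$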
Note: Many problems give the total number of claims $N_j$ and the number of policyholders in the group $m_j$. To find the value of $X_j$ it is then necessary to take the average number of claims, that is $X_j = N_j/m_j$. If a distribution is given for $N_j$ it will be necessary to transform it into a distribution for $X_j$. Typically problems give a distribution for $X_j$ and include $N_j$ and $m_j$ so that the group premium may be calculated once the individual premium has been calculated. Note that there is an inherent assumption here that every member of the group has the same distribution as $X_j$, which would make sense if the group were homogeneous; however, the purpose of the credibility premium is to adjust based on an individual’s past experience.
Example 1 (Discrete) Suppose that the number of claims per policy is Poisson distributed with parameter $\Lambda$, where $\Lambda$ follows a negative binomial distribution with parameters $r$ and $\beta$. It is also known that during year $j$ there are $N_j$ claims from $m_j$ policies, $j = 1, 2, \ldots, n$. Given that there are $n$ years of data determine $\mu(\Lambda)$, $v(\Lambda)$, $\mu$, $v$, $a$, $Var[E[X_j \mid \Lambda]]$, $k$, $Z$, $\bar{X}$, the Buhlmann-Straub individual premium as well as the Buhlmann-Straub collective premium.
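Solution From above, $X_j = N_j/m_j$ where, given $\Lambda$, the claims per policy are Poisson($\Lambda$), and $\Lambda \sim$ NegBin$(r, \beta)$. Then:
\[ \mu(\Lambda) = E[X_j \mid \Lambda] = \Lambda \]
\[ Var[X_j \mid \Lambda] = \frac{v(\Lambda)}{m_j} = \frac{\Lambda}{m_j} \quad \Rightarrow \quad v(\Lambda) = \Lambda \]
\[ \mu = E[\mu(\Lambda)] = E[\Lambda] = r\beta \]
\[ v = E[v(\Lambda)] = E[\Lambda] = r\beta \]
\[ a = Var[E[X_j \mid \Lambda]] = Var[\Lambda] = r\beta(1 + \beta) \]
\[ k = \frac{v}{a} = \frac{r\beta}{r\beta(1 + \beta)} = \frac{1}{1 + \beta} \]
\[ Z = \frac{m}{m + k} = \frac{m}{m + \frac{1}{1+\beta}} \]
\[ P_{c,BS-I} = Z \underbrace{\bar{X}}_{\sum_{j=1}^{n} \frac{m_j X_j}{m}} + (1 - Z) \underbrace{\mu}_{r\beta} \]
\[ P_{c,BS-C} = m_{n+1} P_{c,BS-I} = m_{n+1} \left[ Z\bar{X} + (1 - Z) r\beta \right] \]
An interesting point to notice in this example is that $k = \frac{1}{1+\beta}$, so that $1 - k = \frac{\beta}{1+\beta}$, which is the probability assigned to an event occurring in the negative binomial model.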
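Example 2 (Continuous) Suppose that individual claim amounts for policies are exponential with mean $\Theta^{-1}$, where $\Theta$ follows a Gamma distribution with parameters $\alpha, \beta$ where $\alpha > 2$. It is known that during year $j$ there are $N_j$ claims from $m_j$ policies, $j = 1, 2, \ldots, n$. Using this information determine $\mu(\Theta)$, $v(\Theta)$, $\mu$, $v$, $a$, $Var[E[X_j \mid \Theta]]$, $k$, $Z$, $\bar{X}$, the Buhlmann-Straub individual premium as well as the Buhlmann-Straub collective premium.
Solution The problem gives $X_j = N_j/m_j \sim$ Exp$(\theta)$ and $\Theta \sim$ Gamma$(\alpha, \beta)$. This implies:
\[ \mu(\Theta) = E[X_j \mid \Theta] = \Theta^{-1} \]
\[ v(\Theta) = m_j\, Var(X_j \mid \Theta) = m_j \cdot \frac{1}{\Theta^2 m_j} = \Theta^{-2} \]
\[ \mu = E[\mu(\Theta)] = E[\Theta^{-1}] = \frac{\beta}{\alpha - 1} \]
\[ v = E[v(\Theta)] = E[\Theta^{-2}] = \frac{\beta^2}{(\alpha - 1)(\alpha - 2)} \]
\[ a = Var[\mu(\Theta)] = Var[\Theta^{-1}] = \frac{\beta^2}{(\alpha - 1)(\alpha - 2)} - \left( \frac{\beta}{\alpha - 1} \right)^2 = \frac{\beta^2}{(\alpha - 1)^2 (\alpha - 2)} \]
\[ E[Var[X_j \mid \Theta]] = E\left[ \frac{v(\Theta)}{m_j} \right] = E\left[ \frac{\Theta^{-2}}{m_j} \right] = \frac{\beta^2}{m_j (\alpha - 1)(\alpha - 2)} \]
\[ k = \frac{v}{a} = \frac{\beta^2 / [(\alpha - 1)(\alpha - 2)]}{\beta^2 / [(\alpha - 1)^2 (\alpha - 2)]} = \alpha - 1 \]
\[ Z = \frac{m}{m + k} = \frac{m}{m + \alpha - 1} \]
\[ P_{c,BS-I} = \frac{m}{m + \alpha - 1} \bar{X} + \frac{\alpha - 1}{m + \alpha - 1} \underbrace{\mu}_{\frac{\beta}{\alpha - 1}} \]
\[ P_{c,BS-C} = m_{n+1} P_{c,BS-I} = m_{n+1} \left[ \frac{m}{m + \alpha - 1} \bar{X} + \frac{\alpha - 1}{m + \alpha - 1} \cdot \frac{\beta}{\alpha - 1} \right] \]
3.3.8
Exact credibility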
In some cases the best linear approximation of the Bayesian premium equals the Bayesian premium. In such instances the term exact credibility is employed to express the equivalence of the two premiums; the Poisson-Gamma model is an example. In the context of exact credibility it is required that $E[\mu(\Theta)]$, $E[v(\Theta)]$ and $Var[\mu(\Theta)]$ be finite. Exact credibility arises in the Buhlmann (and Buhlmann-Straub) models specifically in "situations involving the linear exponential family members ($X_j \mid \Theta = \theta$) and their conjugate priors ($\Theta$)".
Aside: Linear Exponential Family
Definition 3.3.2 The distribution of $X \mid \Theta = \theta$ is a member of the linear exponential family if its PDF (or PMF) can be written as:
\[ f_{X \mid \Theta}(x \mid \theta) = \frac{p(x)\, e^{r(\theta) x}}{q(\theta)} \tag{3.24} \]
where the support does not depend on the parameter $\theta$. Some common members of the linear exponential family include:

Distribution   p(x)                          r(θ)     q(θ)
Poisson        1/x!                          ln(θ)    e^θ
Normal         (2πv)^{-1/2} e^{-x^2/(2v)}    θ/v      e^{θ^2/(2v)}
Table 3.3: Members of the linear exponential family

Note that the exponential distribution with mean $\theta$ is also a member of the linear exponential family, with $p(x) = 1$, $r(\theta) = -1/\theta$ and $q(\theta) = \theta$. For a member of the linear exponential family the mean is given by:
\[ \mu(\theta) = E[X \mid \Theta = \theta] = \frac{q'(\theta)}{r'(\theta)\, q(\theta)} \tag{3.25} \]
also
\[ Var(X \mid \Theta = \theta) = v(\theta) = \frac{\mu'(\theta)}{r'(\theta)} \tag{3.26} \]
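Definition 3.3.3 A prior distribution is said to be a conjugate prior distribution for a given model if the resulting posterior distribution is from the same family as the prior distribution, but perhaps with different parameters.
Theorem 3.3.4 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ are i.i.d. with pf
\[ f_{X_j \mid \Theta}(x_j \mid \theta) = \frac{p(x_j)\, e^{r(\theta) x_j}}{q(\theta)} \]
where $\Theta$ has pdf
\[ \pi(\theta) = \frac{[q(\theta)]^{-k}\, e^{\mu k r(\theta)}\, r'(\theta)}{c(\mu, k)} \]
where $k$ and $\mu$ are parameters of the distribution and $c(\mu, k)$ is the normalizing constant. Then the posterior pf $\pi_{\Theta \mid X}(\theta \mid x)$ is of the same form as $\pi(\theta)$, with parameters $k_*$ and $\mu_*$ in place of $k$ and $\mu$.
Proof:
\[ \pi(\theta \mid x) \propto \frac{\prod_{j=1}^{n} p(x_j)\, e^{r(\theta) x_j}}{[q(\theta)]^{n}} \cdot \frac{[q(\theta)]^{-k}\, e^{\mu k r(\theta)}\, r'(\theta)}{c(\mu, k)} \]
\[ \propto e^{r(\theta) \sum_j x_j}\, [q(\theta)]^{-(k+n)}\, e^{\mu k r(\theta)}\, r'(\theta) \]
\[ = e^{r(\theta) (\mu k + \sum_j x_j)}\, [q(\theta)]^{-(k+n)}\, r'(\theta) \]
\[ = \exp\Big\{ r(\theta) \underbrace{\frac{\mu k + \sum_j x_j}{k + n}}_{\mu_*} \underbrace{(k + n)}_{k_*} \Big\}\, [q(\theta)]^{-(k+n)}\, r'(\theta) \]
\[ \propto \frac{[q(\theta)]^{-k_*}\, e^{\mu_* k_* r(\theta)}\, r'(\theta)}{c(\mu_*, k_*)} \]
where
\[ k_* = k + n, \qquad \mu_* = \frac{\mu k + \sum_j x_j}{n + k} = \frac{k}{n + k}\mu + \frac{n}{n + k}\bar{x} \]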
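The parameter update in the theorem can be checked numerically. The sketch below is a minimal illustration: the function name `posterior_parameters` and the prior values and observations are made up for the example. It confirms that $\mu_*$ coincides with the credibility form $Z\bar{x} + (1 - Z)\mu$ with $Z = n/(n + k)$.

```python
def posterior_parameters(mu, k, xs):
    """Update (mu, k) to (mu_star, k_star) after observing xs."""
    n = len(xs)
    k_star = k + n
    mu_star = (mu * k + sum(xs)) / k_star
    return mu_star, k_star


# Hypothetical prior parameters and observations.
mu, k = 2.0, 4.0
xs = [3, 1, 4, 2, 5, 3]

mu_star, k_star = posterior_parameters(mu, k, xs)

# The same quantity written as a credibility-weighted average.
n = len(xs)
z = n / (n + k)
x_bar = sum(xs) / n
credibility_form = z * x_bar + (1 - z) * mu

print(mu_star, k_star, credibility_form)
```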
Chapter 4 Severity Distributions To boldly go where no map has gone before
4.1
Introduction
When a loss occurs the full amount of the loss is not necessarily the amount paid by the insurer (e.g. policy adjustments such as a deductible, limit or coinsurance may apply). Since full amounts are not necessarily paid, a complete data set of losses is not used. When only a deductible is considered the data become truncated, whereas when a policy limit is considered the data become censored. There is clearly a distinction to be made between the amount of the loss (called the ground-up loss) and the amount paid by the insurer. Our objective is to find a reasonable model for the ground-up loss. Essential properties of a ground-up loss: The distribution 1. Distribution is on the positive reals 2. The distribution
4.2
Parametric Distributions
For parametric distributions we treat historical data as a sample from the underlying distribution of X. Based on the shape of the empirical distribution we choose a continuous parametric distribution as a candidate to model X. Definition: A parametric distribution is defined as a set of distribution functions, each member of which is determined by specifying its parameters. Examples include the exponential, gamma, Pareto, ... New parametric distributions can be created in one of 4 ways:
4.2.1
Multiplication by a constant
In the process of searching for a loss distribution, multiplying by a constant ($c > 0$) is equivalent to applying the multiplicative factor uniformly across all loss amounts. Let $X$ be a continuous random variable with PDF $f_X(x)$ and CDF $F_X(x)$. Define $Y = cX$, where the CDF of $Y$ is given by:
\[ F_Y(y) = P(Y \le y) = P(cX \le y) = P\left( X \le \frac{y}{c} \right) = F_X\left( \frac{y}{c} \right) \]
It follows that:
\[ f_Y(y) = \frac{d}{dy} F_X\left( \frac{y}{c} \right) = \frac{1}{c} f_X\left( \frac{y}{c} \right) \]
Definition: We say that a parametric distribution is a scale distribution if, when $X$ belongs to the set of distributions, so does $Y = cX$.
Example: If $X \sim \exp(\theta)$ with $f_X(x) = \frac{1}{\theta} e^{-x/\theta}$, $x \ge 0$, then letting $Y = cX$ gives $f_Y(y) = \frac{1}{c\theta} e^{-y/(c\theta)}$, so $Y \sim \exp(c\theta)$. So the exponential distribution is a scale distribution with $\theta$ as a scale parameter.
Definition: For random variables with nonnegative support, a scale parameter is a parameter of a scale distribution that meets 2 criteria:
• When a member of the scale distribution is multiplied by a constant, the parameter is also multiplied by that constant.
• All other parameters, if any, are unaffected.
Example: Suppose we use $X \sim \exp(\lambda)$ with $f_X(x) = \lambda e^{-\lambda x}$, $x \ge 0$. Letting $Y = cX$, $f_Y(y) = \frac{\lambda}{c} e^{-\frac{\lambda}{c} y}$. In this form $Y \sim \exp(\lambda / c)$: the parameter is divided rather than multiplied by $c$, so $\lambda$ is not quite a scale parameter.
4.2.2
Raising to a Power:
Let $X$ be a continuous random variable with density $f_X(x)$ and CDF $F_X(x)$. Define $Y = X^{1/\tau}$. The transformation is monotone increasing or monotone decreasing depending on the sign of $\tau$:
a. $\tau > 0$ → increasing monotone, i.e.:
\[ F_Y(y) = P(Y \le y) = P(X^{1/\tau} \le y) = P(X \le y^{\tau}) = F_X(y^{\tau}) \]
similarly
\[ f_Y(y) = \frac{d}{dy} F_X(y^{\tau}) = \tau y^{\tau - 1} f_X(y^{\tau}) \]
b. $\tau < 0$ → decreasing monotone, i.e.:
\[ F_Y(y) = P(Y \le y) = P(X^{1/\tau} \le y) = P(X \ge y^{\tau}) = 1 - F_X(y^{\tau}) \]
similarly
\[ f_Y(y) = \frac{d}{dy} \left[ 1 - F_X(y^{\tau}) \right] = -\tau y^{\tau - 1} f_X(y^{\tau}), \quad y \ge 0 \]
Note: see the appendix in Loss Models; several inverse distributions are derived using this technique.
4.2.3
Exponentiation:
Let $X$ be a continuous random variable with density function $f_X(x)$ and CDF $F_X(x)$, then define $Y = e^X$. Then the CDF of $Y$ is given by:
\[ F_Y(y) = P(Y \le y) = P(e^X \le y) = P(X \le \ln(y)) = F_X(\ln(y)) \]
\[ f_Y(y) = \frac{d}{dy} F_X(\ln(y)) = \frac{1}{y} f_X(\ln(y)) \]
There is a special case for exponentiation, that is when $X \sim N(\mu, \sigma^2)$. If this is the case $Y \sim LogN(\mu, \sigma^2)$.
4.2.4
Mixture of distributions
see handout Suppose Θ is a mixing random variable:
4.2.5
Summary and Cautions
Note: It is possible to have a scale distribution without a scale parameter. Recall a scale parameter is defined by two conditions:
1. When a member of the scale distribution is multiplied by a positive constant, the scale parameter is multiplied by the same constant.
2. When a member of the scale distribution is multiplied by a constant, all other parameters are unchanged.
An example of a scale distribution without a scale parameter is the inverse Gaussian (inverse normal) distribution. Two of its parameters are multiplied by the constant when scaling occurs, but it remains part of the same family of distributions. When applying a transform it is best to assume no scaling initially takes place. Once a distribution has been found, scaling may be added back in.
Example: Find the transformed gamma distribution.
Solution: Let $X \sim$ Gamma$(\alpha)$ with no scale parameter and let the transformed random variable be $Y = X^{1/\tau}$, $\tau > 0$. Then:
\[ \Pr(Y \le y) = \Pr(X^{1/\tau} \le y) = \Pr(X \le y^{\tau}) \]
Using the CDF of a Gamma distribution is difficult. The PDF presents a more tractable approach. Accordingly $\Pr(X \le y^{\tau})$ is differentiated with respect to $y$:
\[ f_Y(y) = \frac{d}{dy} \Pr(X \le y^{\tau}) = \frac{d(y^{\tau})}{dy} \cdot \frac{(y^{\tau})^{\alpha - 1} e^{-y^{\tau}}}{\Gamma(\alpha)} = \frac{\tau y^{\tau - 1} (y^{\tau})^{\alpha - 1} e^{-y^{\tau}}}{\Gamma(\alpha)} = \frac{\tau (y^{\tau})^{\alpha} e^{-y^{\tau}}}{y\, \Gamma(\alpha)} \]
In the PDF above no scale parameter was assumed. With the scale parameter $\beta$ added in, the function becomes:
\[ f_Y(y) = \frac{\tau \left( (y/\beta)^{\tau} \right)^{\alpha} e^{-(y/\beta)^{\tau}}}{y\, \Gamma(\alpha)} \]
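As a sanity check on the transformed gamma density, the following sketch integrates it numerically and compares the result with the gamma CDF evaluated at $(y/\beta)^{\tau}$. The parameter values are arbitrary, the helper names are hypothetical, and $\alpha = 2$ is chosen only because the gamma CDF then has the closed form $1 - e^{-x}(1 + x)$.

```python
import math


def transformed_gamma_pdf(y, alpha, tau, beta=1.0):
    """Density of Y = beta * X**(1/tau), where X ~ Gamma(alpha) with unit scale."""
    u = (y / beta) ** tau
    return tau * u ** alpha * math.exp(-u) / (y * math.gamma(alpha))


def integrate(f, lo, hi, steps=100_000):
    """Midpoint rule; the endpoints are never evaluated, so y = 0 is avoided."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


alpha, tau, beta = 2.0, 1.5, 3.0   # illustrative values only
y0 = 4.0

cdf_numeric = integrate(lambda y: transformed_gamma_pdf(y, alpha, tau, beta), 0.0, y0)

# Pr(Y <= y0) = Pr(X <= (y0 / beta)**tau); closed form valid for alpha = 2.
x0 = (y0 / beta) ** tau
cdf_exact = 1.0 - math.exp(-x0) * (1.0 + x0)

print(cdf_numeric, cdf_exact)
```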
Chapter 5 Tail Distributions :
5.1
Introduction
A primary concern for the insurer is to appropriately quantify the tail of a ground-up loss distribution (i.e. the probability of large losses). In this section a number of criteria are developed to assess whether the tail of the distribution is thick or not. In what follows let the random variable $X$ be continuous with:
1. PDF: $f_X(x)$
2. CDF: $F_X(x)$ (where $\bar{F}_X(x) = 1 - F_X(x)$)
3. Hazard rate: $h(x) = f_X(x) / \bar{F}_X(x)$
Given that some criteria compare the tails of two distributions, a second random variable $Y$ is defined with its own PDF, CDF and hazard rate.
5.2
Existence of moments
Recall that the $k$th moment of $X$ is defined as:
\[ E[X^k] = \int_0^{\infty} x^k f_X(x)\,dx \]
If $X$ can take on large values with significant probability then the $k$th moment of $X$ may not converge, i.e. $E[X^k] = \infty$. Accordingly $X$ can be classified based on the convergence of its moments:
1. A distribution is said to have a heavy tail if (below a certain threshold $k$) it lacks one or many moments.
2. Conversely, a distribution is said to have a light tail if all of its moments exist.
A threshold for $k$ is specified because some distributions may lack moments yet have light tails. Example: Consider the t-distribution with $k$ degrees of freedom. The t-distribution is unique in that with $k$ degrees of freedom it has $k - 1$ moments. As its degrees of freedom increase so does its number of moments. At roughly 30 degrees of freedom $k$ is sufficiently large for it to be considered a light-tailed distribution. If the $k$th moment of $X$ does not exist this implies that the MGF of $X$ does not exist either. Consider the Taylor series expansion of the MGF:
\[ M_X(t) = E[e^{tX}] = E\left[ \sum_{k=0}^{\infty} \frac{t^k X^k}{k!} \right] = \sum_{k=0}^{\infty} \frac{t^k}{k!} E[X^k] \]
Accordingly heavy and light tailed distributions can be classified based on the existence of an MGF. Caution however should be taken when describing whether a distribution has a heavy or light tail based on the MGF. The Lognormal distribution is an example of a distribution where all of its moments converge however it does not have an MGF.
5.3
Limiting Ratios
The survival function can be used to assess the thickness of the tail of a given distribution. We ask the question "how large is $\bar{F}(x)$ for large values of $x$?" This question can be answered by comparing the distributions of $X$ and $Y$, taking the limit of the ratio of their survival functions:
\[ \lim_{x \to \infty} \frac{\bar{F}_X(x)}{\bar{F}_Y(x)} = c \ge 0 \]
If:
1. $c = 0$, $\bar{F}_X(x)$ goes to 0 a lot faster than its counterpart $\bar{F}_Y(x)$ ⇒ $X$ has a lighter tail than $Y$
2. $0 < c < \infty$, $\bar{F}_X(x)$ and $\bar{F}_Y(x)$ have comparable behavior
3. $c = \infty$, $\bar{F}_X(x)$ goes to 0 slower than $\bar{F}_Y(x)$ ⇒ $X$ has a heavier tail than $Y$
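From l'Hospital's rule the ratio above is equivalent to the limit of the ratio of the density functions, i.e.:
\[ \lim_{x \to \infty} \frac{\bar{F}_X(x)}{\bar{F}_Y(x)} \overset{H}{=} \lim_{x \to \infty} \frac{\frac{d}{dx} \bar{F}_X(x)}{\frac{d}{dx} \bar{F}_Y(x)} = \lim_{x \to \infty} \frac{-f_X(x)}{-f_Y(x)} = \lim_{x \to \infty} \frac{f_X(x)}{f_Y(x)} = c \]
Example: Demonstrate that the Pareto distribution with $X \sim$ Pareto$(\alpha, \theta)$ has a heavier tail than a Gamma distribution with $Y \sim$ Gamma$(\tau, \lambda)$.
Solution:
\[ \lim_{x \to \infty} \frac{f_{Pareto}(x)}{f_{Gamma}(x)} = \lim_{x \to \infty} \frac{\alpha \theta^{\alpha} / (\theta + x)^{\alpha + 1}}{x^{\tau - 1} e^{-x/\lambda} / (\lambda^{\tau} \Gamma(\tau))} = \lim_{x \to \infty} \frac{\alpha \theta^{\alpha} \lambda^{\tau} \Gamma(\tau)\, e^{x/\lambda}}{x^{\tau - 1} (x + \theta)^{\alpha + 1}} \overset{H}{\longrightarrow} \infty \]
Note: an exponential goes to infinity faster than any polynomial, because under repeated application of l'Hospital's rule the exponential can be differentiated indefinitely whereas a polynomial is reduced to a constant after a finite number of differentiations. Hence the ratio goes to infinity and the Pareto distribution has a heavier tail than the Gamma distribution. This was to be expected since a Pareto distribution has a finite number of moments whereas the Gamma distribution has infinitely many.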
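The limit in the Pareto/Gamma example can be illustrated numerically by evaluating the ratio of densities at increasing values of $x$. The sketch below uses arbitrary parameter values and hypothetical helper names; it only shows the ratio growing, it does not prove the limit.

```python
import math


def pareto_pdf(x, alpha, theta):
    return alpha * theta ** alpha / (x + theta) ** (alpha + 1)


def gamma_pdf(x, tau, lam):
    return x ** (tau - 1) * math.exp(-x / lam) / (lam ** tau * math.gamma(tau))


alpha, theta = 3.0, 100.0   # Pareto parameters (illustrative)
tau, lam = 2.0, 50.0        # Gamma shape and scale (illustrative)

grid = (500.0, 1000.0, 2000.0, 4000.0)
ratios = [pareto_pdf(x, alpha, theta) / gamma_pdf(x, tau, lam) for x in grid]

for x, r in zip(grid, ratios):
    print(f"x = {x:7.0f}   f_Pareto / f_Gamma = {r:.3e}")
```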
5.4
Hazard Rate
The hazard rate reveals information about the tail of the distribution. The hazard rate, when multiplied by $dx$, can be interpreted as the conditional probability of $X$ failing within $dx$ given $X > x$; mathematically:
\[ h_X(x)\,dx = \frac{f_X(x)}{\bar{F}_X(x)}\,dx \cong \frac{\Pr[X \in (x, x + dx)]}{\Pr[X > x]} = \Pr[X \in (x, x + dx) \mid X > x] \]
Hazard rates can be separated based on whether they are increasing or decreasing. Suppose the hazard rate of $X$ is a decreasing function in $x$. Then:
\[ h(x_1)\,dx \ge h(x_2)\,dx \quad \text{where } x_1 \le x_2 \]
\[ \Pr[X \in (x_1, x_1 + dx) \mid X \ge x_1] \ge \Pr[X \in (x_2, x_2 + dx) \mid X \ge x_2] \]
Which means that the random variable $X$ (conditional on the fact that $X > x$) is decreasingly likely (as $x$ increases) to take on the smaller values just above $x$. As a result, $X \mid X \ge x$ is increasingly likely to have values in the range $(x + dx, \infty)$. In such situations $X$ is heavy tailed. Now suppose the hazard rate of $X$ is an increasing function in $x$. Then:
\[ h(x_1)\,dx \le h(x_2)\,dx \quad \text{where } x_1 \le x_2 \]
\[ \Pr[X \in (x_1, x_1 + dx) \mid X \ge x_1] \le \Pr[X \in (x_2, x_2 + dx) \mid X \ge x_2] \]
Which means that the random variable $X$ (conditional on the fact that $X > x$) is increasingly likely (as $x$ increases) to take on a value just above $x$, and decreasingly likely to have values in the range $(x + dx, \infty)$. In such situations $X$ is light tailed.
5.4.1
Terminology
More compact terminology has been developed to refer to increasing and decreasing hazard rates:
1. The distribution function $F_X(x)$ is said to have an increasing failure rate (IFR) if $h_X(x)$ is a nondecreasing function in $x$ → light-tailed distribution.
2. The distribution function $F_X(x)$ is said to have a decreasing failure rate (DFR) if $h_X(x)$ is a nonincreasing function in $x$ → heavy-tailed distribution.
Given that the exponential distribution is the only distribution with a constant failure rate, it is the only distribution that is both IFR and DFR.
Example: For $X \sim$ Pareto$(\alpha, \theta)$ with PDF $f_X(x) = \alpha \theta^{\alpha} / (x + \theta)^{\alpha + 1}$ and survival function $\bar{F}_X(x) = (\theta / (x + \theta))^{\alpha}$, determine if its hazard rate is IFR or DFR.
Solution: We have:
\[ h_X(x) = \frac{f_X(x)}{\bar{F}_X(x)} = \frac{\alpha \theta^{\alpha} / (x + \theta)^{\alpha + 1}}{(\theta / (x + \theta))^{\alpha}} = \frac{\alpha}{x + \theta} \]
Clearly $h_X(x)$ is a decreasing function: as $x$ increases the value of $h_X(x)$ decreases, so the Pareto distribution is DFR. Sometimes it may be difficult to check whether $h_X(x)$ is decreasing or increasing. In cases like these it is necessary to differentiate with respect to $x$ and consider the sign of the resulting slope. Here, $\frac{d}{dx} \frac{\alpha}{x + \theta} = -\frac{\alpha}{(x + \theta)^2} < 0$.
Example: Show that the exponential distribution has a constant hazard rate and is consequently both IFR and DFR.
Solution: Let $X \sim$ Exponential$(\lambda)$ with $f_X(x) = (1/\lambda) e^{-x/\lambda}$ and $\bar{F}_X(x) = e^{-x/\lambda}$, then:
\[ h_X(x) = \frac{f_X(x)}{\bar{F}_X(x)} = \frac{(1/\lambda) e^{-x/\lambda}}{e^{-x/\lambda}} = \frac{1}{\lambda} \]
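The two hazard rate examples can be reproduced directly from the definition $h_X(x) = f_X(x) / \bar{F}_X(x)$. The sketch below uses arbitrary parameter values and a hypothetical helper `hazard`; it evaluates the hazard rate on a small grid to show the Pareto rate decreasing and the exponential rate staying constant.

```python
import math


def hazard(pdf, sf, x):
    """Hazard rate h(x) = f(x) / S(x), straight from the definition."""
    return pdf(x) / sf(x)


alpha, theta = 3.0, 100.0   # Pareto parameters (illustrative)
lam = 50.0                  # exponential mean (illustrative)


def pareto_pdf(x):
    return alpha * theta ** alpha / (x + theta) ** (alpha + 1)


def pareto_sf(x):
    return (theta / (x + theta)) ** alpha


def expo_pdf(x):
    return math.exp(-x / lam) / lam


def expo_sf(x):
    return math.exp(-x / lam)


grid = [0.0, 50.0, 200.0, 1000.0]
pareto_h = [hazard(pareto_pdf, pareto_sf, x) for x in grid]   # alpha / (x + theta)
expo_h = [hazard(expo_pdf, expo_sf, x) for x in grid]         # 1 / lam everywhere

print(pareto_h)
print(expo_h)
```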
5.5
The residual lifetime and excess loss r. v.s
In the following section the mean of the excess loss random variable will be used to determine the heaviness of a distribution's tail. Before proceeding, however, it is necessary to review some terminology for both the residual lifetime and excess loss random variables. The residual lifetime and excess loss random variables are similarly defined. The residual lifetime random variable $T_X$ is predominantly used for modelling survival over a given time period whereas the excess loss random variable $Y_P$ is used to model financial loss.
5.5.1
The residual lifetime and its mean
In life contingencies $T_X$ denotes the residual lifetime random variable. It is the future amount of time a person/machine will live/survive given that it has already survived $x$ years. Formally its definition is as follows:
Definition 5.5.1 The residual lifetime random variable $T_X$ is defined as:
\[ T_X = X - x \mid X > x, \quad x > 0. \]
$T_X$'s survival function can be represented as:
\[ \bar{F}_{T_X}(t) = \Pr(T_X > t) = \Pr[X - x > t \mid X > x] = \frac{\bar{F}_X(x + t)}{\bar{F}_X(x)}, \quad t \ge 0 \]
Similarly its density function is:
\[ f_{T_X}(t) = \frac{f_X(x + t)}{\bar{F}_X(x)}, \quad t \ge 0 \]
Taking the expected value of $T_X$ the mean residual lifetime ($E[T_X]$) is realized:
\[ \mathring{e}_X(x) = E[T_X] = E[X - x \mid X > x] = \frac{\int_x^{\infty} (t - x) f(t)\,dt}{\bar{F}_X(x)} \]
5.5.2
Excess loss random variable and its mean
In loss models $Y_P$ denotes the excess loss random variable.
Definition 5.5.2 The excess loss random variable $Y_P$ is defined as:
\[ Y_P = X - d \mid X > d, \quad d > 0. \]
The excess loss random variable's survival function can be represented as:
\[ \bar{F}_{Y_P}(y) = \Pr(Y_P > y) = \Pr[X - d > y \mid X > d] = \frac{\bar{F}_X(d + y)}{\bar{F}_X(d)}, \quad y \ge 0 \]
Similarly its density function is:
\[ f_{Y_P}(y) = \frac{f_X(d + y)}{\bar{F}_X(d)}, \quad y \ge 0 \]
Taking the expected value of $Y_P$ the mean excess loss ($E[Y_P]$) is realized:
\[ e_X(d) = E[Y_P] = E[X - d \mid X > d] = \frac{\int_d^{\infty} (y - d) f(y)\,dy}{\bar{F}_X(d)} \]
5.5.3
Some important equations for the moments of the excess loss r.v
Listed below are several important properties of the moments of the excess loss r.v. The $k$th moment is given below and is only defined if the integral converges:
\[ e_X^k(d) = \frac{\int_d^{\infty} (x - d)^k f(x)\,dx}{1 - F(d)} \]
For the mean excess loss function we have, integrating by parts:
\[ e_X(d) = \frac{\int_d^{\infty} (x - d) f(x)\,dx}{1 - F(d)} = \frac{-(x - d) S(x) \big|_d^{\infty} + \int_d^{\infty} S(x)\,dx}{S(d)} = \frac{\int_d^{\infty} S(x)\,dx}{S(d)} \]
\[ = \int_d^{\infty} \frac{S(x)}{S(d)}\,dx \qquad \text{let } u = x - d \]
\[ = \int_0^{\infty} \frac{S(u + d)}{S(d)}\,du = \int_0^{\infty} S_d(u)\,du \qquad \text{where } S_d(u) = \frac{S(u + d)}{S(d)} \]
Also it should be noted that while $TVaR(X)$ is similar to the mean excess loss they are not quite the same. Recall that $e_X(d) = E[X - d \mid X > d]$ whereas $TVaR(X) = E[X \mid X > d]$. $TVaR(X)$ however can be written in terms of the mean excess loss plus a constant ($d$):
\[ TVaR(X) = E[X \mid X > d] = E[X - d + d \mid X > d] = E[X - d \mid X > d] + d = e_X(d) + d \]
Which makes sense since $e_X(d) = E[X - d \mid X > d] = E[X \mid X > d] - d = TVaR(X) - d$.
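The identity $e_X(d) = \int_0^{\infty} S(u + d)/S(d)\,du$ lends itself to a numerical check. The sketch below is an illustration under assumptions: the function name `mean_excess_loss` is hypothetical, the parameters are arbitrary, and the comparison value $(\theta + d)/(\alpha - 1)$ is the standard closed form for a Pareto$(\alpha, \theta)$ loss with $\alpha > 1$. The infinite range is mapped to $(0, 1)$ with the substitution $u = s/(1 - s)$.

```python
def mean_excess_loss(sf, d, steps=200_000):
    """Approximate e_X(d) = integral_0^inf S(d + u) / S(d) du.

    Uses the substitution u = s / (1 - s), du = ds / (1 - s)**2, and a
    midpoint rule on (0, 1) so that the endpoint s = 1 is never evaluated.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        u = s / (1.0 - s)
        total += sf(d + u) / (1.0 - s) ** 2
    return h * total / sf(d)


alpha, theta, d = 3.0, 100.0, 50.0   # illustrative Pareto parameters and deductible


def pareto_sf(x):
    return (theta / (x + theta)) ** alpha


numeric = mean_excess_loss(pareto_sf, d)
closed_form = (theta + d) / (alpha - 1.0)

# TVaR-style quantity E[X | X > d] = e_X(d) + d
conditional_mean = numeric + d

print(numeric, closed_form, conditional_mean)
```

The closed form grows linearly in $d$, so the mean excess loss of the Pareto increases with the deductible.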
5.5.4
Terminology for residual lifetime r.v.
Definition 5.5.3 A distribution function $F_X(x)$ is said to have an increasing mean residual lifetime (IMRL) if $e(x)$ is a nondecreasing function.
Definition 5.5.4 A distribution function $F_X(x)$ is said to have a decreasing mean residual lifetime (DMRL) if $e(x)$ is a nonincreasing function.
Conclusions
1. If $X$ is IMRL → $X$ is heavy tailed
2. If $X$ is DMRL → $X$ is light tailed
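Example 5.5.5 Consider the following mixture of two exponentials:
\[ f_X(x) = \frac{1}{2} \cdot \frac{1}{\theta} e^{-x/\theta} + \frac{1}{2} \cdot \frac{1}{\lambda} e^{-x/\lambda} \quad \text{where } x \ge 0 \]
Taking $\lambda = 2\theta$, show that the mean residual lifetime is in fact a nondecreasing function, so that $X$ is consequently IMRL.
Solution: Recall:
\[ e(x) = E[T_X] = E[X - x \mid X > x] = \int_x^{\infty} (y - x) f_{X \mid X > x}(y)\,dy = \int_x^{\infty} (y - x) \frac{f_X(y)}{\bar{F}_X(x)}\,dy \]
\[ \bar{F}_X(x) = \int_x^{\infty} \left( \frac{1}{2} \cdot \frac{1}{\theta} e^{-y/\theta} + \frac{1}{2} \cdot \frac{1}{2\theta} e^{-y/(2\theta)} \right) dy = \frac{1}{2} e^{-x/\theta} + \frac{1}{2} e^{-x/(2\theta)} \]
Now inserting the equations above and letting $y' = y - x$:
\[ e(x) = \frac{\int_x^{\infty} (y - x) \left( \frac{1}{2} \cdot \frac{1}{\theta} e^{-y/\theta} + \frac{1}{2} \cdot \frac{1}{2\theta} e^{-y/(2\theta)} \right) dy}{\frac{1}{2} e^{-x/\theta} + \frac{1}{2} e^{-x/(2\theta)}} = \frac{\int_0^{\infty} y' \left( \frac{1}{2} \cdot \frac{1}{\theta} e^{-(y' + x)/\theta} + \frac{1}{2} \cdot \frac{1}{2\theta} e^{-(y' + x)/(2\theta)} \right) dy'}{\frac{1}{2} e^{-x/\theta} + \frac{1}{2} e^{-x/(2\theta)}} \]
\[ = \frac{\frac{1}{2} \theta e^{-x/\theta} + \frac{1}{2} (2\theta) e^{-x/(2\theta)}}{\frac{1}{2} e^{-x/\theta} + \frac{1}{2} e^{-x/(2\theta)}} \]
Cleaning up the equation (dividing through by $\frac{1}{2} e^{-x/(2\theta)}$) and letting $y(x) = e^{-x/(2\theta)}$:
\[ e(x) = \theta\, \frac{e^{-x/(2\theta)} + 2}{e^{-x/(2\theta)} + 1} = \theta\, \frac{y(x) + 2}{y(x) + 1} = \theta \left( 1 + \frac{1}{y(x) + 1} \right) \]
At this point it is difficult to tell if $e(x)$ is IMRL. To find out, $e(x)$ is differentiated with respect to $x$, using $y'(x) = -\frac{1}{2\theta} e^{-x/(2\theta)}$:
\[ \frac{d}{dx} e(x) = -\theta\, \frac{y'(x)}{(y(x) + 1)^2} = \frac{\frac{1}{2} e^{-x/(2\theta)}}{(y(x) + 1)^2} \ge 0 \]
Accordingly $e(x)$ can be seen to be a nondecreasing function of $x$; that means $X$ is IMRL and hence heavy tailed.
In what follows we show that there exists a connection between the concept of IMRL/DMRL of a distribution function $F_X$ and the concept of DFR/IFR of the associated hazard rate. Indeed we have:
\[ \bar{F}_X(x) = \exp\left\{ -\int_0^x h_X(y)\,dy \right\} \iff h_X(x) = -\frac{d}{dx} \ln(\bar{F}_X(x)) \]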
Which implies:
\[ \frac{\bar{F}_X(x + t)}{\bar{F}_X(x)} = \frac{e^{-\int_0^{x+t} h_X(y)\,dy}}{e^{-\int_0^{x} h_X(y)\,dy}} = e^{-\int_x^{x+t} h_X(y)\,dy} \]
Assume that $F_X$ is DFR. It follows that $\left( e^{-\int_0^{x+t} h_X(y)\,dy} \right) / \left( e^{-\int_0^{x} h_X(y)\,dy} \right)$ is a nondecreasing function of $x$. What follows is:
\[ \frac{\bar{F}_X(x_1 + t)}{\bar{F}_X(x_1)} \le \frac{\bar{F}_X(x_2 + t)}{\bar{F}_X(x_2)} \quad \text{where } x_1 \le x_2 \]
Which means:
\[ e(x_1) = \int_0^{\infty} \frac{\bar{F}_X(x_1 + t)}{\bar{F}_X(x_1)}\,dt \le \int_0^{\infty} \frac{\bar{F}_X(x_2 + t)}{\bar{F}_X(x_2)}\,dt = e(x_2) \]
From the equation above it is clear that $e(x)$ is a nondecreasing function of $x$ (for $x_1 \le x_2$) when $F_X(x)$ is DFR. That is, $X$ is IMRL. The conclusions can be summarized as follows:
1. If $X$ is DFR → $X$ is IMRL
2. If $X$ is IFR → $X$ is DMRL
The converse implications are however not true, that is:
1. If $X$ is IMRL ↛ $X$ is DFR
2. If $X$ is DMRL ↛ $X$ is IFR
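The implication DFR → IMRL can be illustrated with the equal-weight mixture of two exponentials with means $\theta$ and $2\theta$. In the sketch below the value of $\theta$ and the grid are arbitrary; the hazard rate is computed from its definition and the mean residual lifetime from the closed form $e(x) = \theta\,(1 + 1/(y(x) + 1))$ with $y(x) = e^{-x/(2\theta)}$, which is assumed here for this particular mixture.

```python
import math

theta = 10.0   # illustrative value


def sf(x):
    """Survival function of the equal-weight mixture of Exp(theta) and Exp(2 theta)."""
    return 0.5 * math.exp(-x / theta) + 0.5 * math.exp(-x / (2 * theta))


def pdf(x):
    return (0.5 / theta) * math.exp(-x / theta) \
        + (0.5 / (2 * theta)) * math.exp(-x / (2 * theta))


def hazard(x):
    return pdf(x) / sf(x)


def mean_residual_life(x):
    y = math.exp(-x / (2 * theta))
    return theta * (1.0 + 1.0 / (y + 1.0))


grid = [0.0, 5.0, 10.0, 20.0, 40.0, 80.0]
hazards = [hazard(x) for x in grid]
mrl = [mean_residual_life(x) for x in grid]

print(hazards)   # decreasing towards 1 / (2 * theta)
print(mrl)       # increasing towards 2 * theta
```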
Chapter 6 Policy Limits 6.1
Introduction
This chapter introduces policy adjustments. When an individual, family or company purchases an insurance policy there are usually some constraints placed on the policy at the time of purchase. Such constraints are called policy adjustments. They provide an economic incentive for the policyholder not to abuse the insurance system, by limiting the exposure of the insurer to extreme events, petty claims and policyholder underinsurance (which occurs when a policyholder has not purchased full insurance coverage but is entitled to an insurance benefit). A combination of adjustments is possible. For example an insurance company may require both a minimum loss amount before an insurance policy becomes active and a limit on the amount of coverage available. This is normal in auto insurance, where both a deductible (usually $500) and a maximum loss (the value of the car) are embedded into the insurance policy. In this chapter we will proceed by:
1. Introducing the various types of policy adjustments that exist in insurance contracts.
2. Identifying two types of reporting methods.
3. Analysing the impact policy adjustments have on these reporting methods.
At which point several examples, using the two reporting methods and policy adjustments, will be given to solidify the concepts.
6.2
Policy adjustments
6.2.1
Types
Insurance policies often contain adjustments to determine the amount to be paid for a given (ground-up) loss. The most common policy adjustments are listed below:
1. Policy limit $u$ ($u > 0$):
• For any given loss the maximum amount paid by the insurer is $u$
• Provides protection against extreme events with large losses
• The insurer pays the minimum between the loss $X$ and the policy limit $u$
• If there are no other policy adjustments on the policy the insurer pays
\[ \min(X, u) = \begin{cases} X, & X < u \\ u, & X \ge u \end{cases} \]
2. Ordinary deductible $d$ ($d \ge 0$):
• For any given loss the first $d$ dollars fall on the insured
• Provides protection against a large volume of very small claims
• The insurer pays the maximum between $X - d$ and 0
• If there are no other adjustments on the policy the insurer pays
\[ \max(X - d, 0) = \begin{cases} 0, & X \le d \\ X - d, & X > d \end{cases} \]
3. Coinsurance factor $\alpha$ ($0 \le \alpha \le 1$):
• The insurer pays a proportion $\alpha$ of the loss amount; the remaining $(1 - \alpha)$ is the insured's responsibility
• Kicks in when an insured is entitled to a portion of insurance but not the full amount
• If there are no other policy adjustments on the policy the insurer pays $\alpha X$
4. Franchise deductible $d$ ($d \ge 0$):
• Differs from the ordinary deductible in that when the loss exceeds $d$, the deductible is waived and the full loss is paid by the insurer
• If there are no other policy adjustments on the policy, the insurer pays
\[ X \cdot 1_{(X \ge d)} = \begin{cases} 0, & X < d \\ X, & X \ge d \end{cases} \]
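Each of the four adjustments, applied on its own, can be written as a one-line payment function. The sketch below is only an illustration: the function names and the dollar amounts are hypothetical.

```python
def policy_limit(x, u):
    """Insurer pays min(X, u)."""
    return min(x, u)


def ordinary_deductible(x, d):
    """Insurer pays max(X - d, 0)."""
    return max(x - d, 0.0)


def coinsurance(x, alpha):
    """Insurer pays the proportion alpha of the loss."""
    return alpha * x


def franchise_deductible(x, d):
    """Insurer pays the full loss once it reaches d, otherwise nothing."""
    return x if x >= d else 0.0


# Made-up losses run through each adjustment separately.
for loss in (200.0, 500.0, 1200.0, 8000.0):
    print(
        loss,
        policy_limit(loss, 5000.0),
        ordinary_deductible(loss, 500.0),
        coinsurance(loss, 0.8),
        franchise_deductible(loss, 500.0),
    )
```

The contrast between the two deductibles shows up at a loss of 1200 with $d = 500$: the ordinary deductible pays 700, while the franchise deductible pays the full 1200.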
6.2.2
Order of application
It is possible that more than one policy adjustment is included in an insurance policy. An important question arises: which policy adjustment is applied first to determine the amount paid by the insurer? In this course (and unless otherwise specified), we assume that policy adjustments are introduced in the following order:
• Policy limit (if any)
• Policy deductible (if any)
• Coinsurance (if any)
For instance if there is an ordinary deductible $d$ and a policy limit $u$ ($u \ge d$), then the maximum amount paid by the insurer would be $u - d$ (not $u$). In general the insurer pays:
\[ \max(\min(X, u) - d, 0) = \begin{cases} 0 & X \le d \\ X - d & d < X \le u \\ u - d & X > u \end{cases} \]
for a given loss $X$ on such a policy. If the policy also contains a coinsurance factor $\alpha$ then the amount paid is:
\[ \alpha \max(\min(X, u) - d, 0) = \begin{cases} 0 & X \le d \\ \alpha (X - d) & d < X \le u \\ \alpha (u - d) & X > u \end{cases} \]
Notice that the order in which these policy adjustments are applied affects the amount paid by the insurer. As above consider an insurance policy with a deductible and a policy limit, but this time the deductible is applied before the policy limit. Then the insurer has to pay:
\[ \min(\max(0, X - d), u) = \begin{cases} 0 & X < d \\ X - d & d \le X \le u + d \\ u & X \ge u + d \end{cases} \]
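The effect of the order of application can be seen by coding both conventions side by side. This is a sketch with hypothetical function names and made-up amounts; the first function follows the order assumed in this course (limit, then deductible, then coinsurance).

```python
def limit_then_deductible(x, d, u, alpha=1.0):
    """Course convention: alpha * max(min(X, u) - d, 0)."""
    return alpha * max(min(x, u) - d, 0.0)


def deductible_then_limit(x, d, u, alpha=1.0):
    """Alternative order: alpha * min(max(X - d, 0), u)."""
    return alpha * min(max(x - d, 0.0), u)


d, u = 500.0, 5000.0   # illustrative deductible and limit
for loss in (300.0, 2000.0, 5200.0, 9000.0):
    print(loss, limit_then_deductible(loss, d, u), deductible_then_limit(loss, d, u))
```

The two orders agree for losses up to $u$ and differ above it: under the course convention the payment is capped at $u - d$, under the alternative it is capped at $u$.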
Note that the application of deductibles and limits to losses are examples of truncation and censoring of observations:
Definition 6.2.1 A truncated loss is a loss that is not observed. For a policy with a deductible $d$, any losses below the deductible would be truncated. Truncated distributions are conditional distributions, i.e. $f(x \mid x > d)$ is a conditional distribution since it is conditioned on $x > d$.
Definition 6.2.2 A censored loss is a loss that is observed to occur, but whose value is not known. For a policy with a limit u, any losses above the limit u would be censored given that those losses will be observed but the exact loss amount would be unknown. Censored losses are unconditionally distributed.
6.3
Loss payment vs. payment basis
When considering the amount paid by the insurer, it is typical to distinguish between 2 types of reporting methods: the loss basis and the payment basis.
1. Loss basis: $Y_L$ = amount paid per loss (report all losses)
• There is an entry for every single loss (even if the amount paid by the insurer is 0)
• Includes the 0 amount paid by the insurer on some losses (if any)
• For a policy with limit $u$, deductible $d$ and coinsurance $\alpha$: $Y_L = \alpha \max\{\min(X, u) - d, 0\}$
• In the presence of a deductible $d$, the distribution of $Y_L$ has a probability mass at 0 of $F_X(d) = \Pr(X \le d)$
2. Payment basis: $Y_P = Y_L \mid Y_L > 0$ = amount paid per payment
• Only the nonzero payments of the insurer are included
• There is not an entry for every single loss
• In the presence of a deductible $d$ the distribution of $Y_P$ does not have a probability mass at 0
6.3.1
Distributional properties of YP
Since $Y_P = Y_L \mid Y_L > 0$ it is possible to characterize $Y_P$ in terms of $Y_L$. Suppose that $F_{Y_L}(y) = \Pr(Y_L \le y)$; then the cumulative distribution function of $Y_P$ is given by:
\[ F_{Y_P}(y) = \Pr[Y_P \le y] = \Pr[Y_L \le y \mid Y_L > 0] = \frac{\Pr(0 < Y_L \le y)}{\Pr(Y_L > 0)} = \frac{F_{Y_L}(y) - F_{Y_L}(0)}{1 - F_{Y_L}(0)} \]
Assuming that $F_{Y_L}(y)$ is differentiable for $y > 0$, i.e. $f_{Y_L}(y) = \frac{d}{dy} F_{Y_L}(y)$, then the probability density function of $Y_P$ is:
\[ f_{Y_P}(y) = \frac{d}{dy} F_{Y_P}(y) = \frac{d}{dy} \left[ \frac{F_{Y_L}(y) - F_{Y_L}(0)}{1 - F_{Y_L}(0)} \right] = \frac{f_{Y_L}(y)}{1 - F_{Y_L}(0)}, \quad y > 0 \]
The information about $Y_L$ is therefore sufficient to characterize $Y_P$. Typically if an insurance policy does not contain a deductible then there is no difference between the 2 reporting methods; that is, if $d = 0$ so that $F_{Y_L}(0) = 0$, then
\[ F_{Y_P}(y) = \frac{F_{Y_L}(y) - 0}{1 - 0} = F_{Y_L}(y) \]
However if there is a deductible then clearly the two methods are different: $F_{Y_L}(0) = F_X(d) = \Pr(X \le d) > 0$, since a mass point now exists at 0 for $Y_L$ but not for $Y_P$.
6.4
Analysis of policy adjustments on reporting methods
In this section policy adjustments are revisited and considered under different reporting methods. For ease of use it is assumed that the ground-up loss random variable $X$ has probability density function $f_X(x)$ and cumulative distribution function $F_X(x) = 1 - \bar{F}_X(x)$ where $\bar{F}_X(x) = \Pr(X > x)$.
6.4.1
Policy limit u with no deductible
Given that there is no deductible in this specific case ($d = 0$, $\Pr[Y_L = 0] = \Pr[X \le 0] = 0$), $Y_L$ and $Y_P$ will have the same distribution. With only a policy limit $u$:
\[ Y_L = Y_P = \min(X, u) = \begin{cases} X & X < u \\ u & X \ge u \end{cases} \tag{6.1} \]
Note that $Y_L = Y_P$ is a mixed random variable with the following properties:
• Mass point at $Y_L = Y_P = u$ with probability $\Pr[X > u] = \bar{F}_X(u)$
• Its probability density function is defined as $f_{Y_L}(y) = f_X(y)$; $0 \le y < u$
• Its cumulative distribution function is defined as
\[ F_{Y_L}(y) = \begin{cases} F_X(y) & 0 \le y < u \\ 1 & y \ge u \end{cases} \tag{6.2} \]
• Its mean and variance are given by:
\[ E[Y_L] = E[Y_P] = E[\min(X, u)] = E[X \wedge u] = \int_0^u y f_X(y)\,dy + u \bar{F}_X(u) = \int_0^u \bar{F}_X(y)\,dy \tag{6.3} \]
\[ Var(Y_L) = Var(Y_P) = E[\min(X, u)^2] - (E[\min(X, u)])^2 = \int_0^u y^2 f_X(y)\,dy + u^2 \bar{F}_X(u) - (E[\min(X, u)])^2 \tag{6.4} \]
\[ = 2 \int_0^u y \bar{F}_X(y)\,dy - \left( \int_0^u \bar{F}_X(y)\,dy \right)^2 \tag{6.5} \]
6.4.2
Policy limit u with deductible d
Let $u$ represent the policy limit and $d$ represent the deductible. Unlike before, $Y_L$ and $Y_P$ do not have the same distribution because the deductible is nonzero ($\Pr[Y_L = 0] = \Pr[X \le d] = F_X(d) > 0$), which truncates $Y_P$.
Analysis of YL
To understand $Y_P$ it is first necessary to understand $Y_L$. When $Y_L$ is subject to both a policy limit and a deductible, its random variable is defined as:
\[ Y_L = \max(\min(X, u) - d, 0) = \min\{X, u\} - \min\{X, d\} = \begin{cases} X & X < u \\ u & X \ge u \end{cases} - \begin{cases} X & X < d \\ d & X \ge d \end{cases} \]
\[ \Longrightarrow \quad Y_L = \begin{cases} 0 & X < d \\ X - d & d \le X < u \\ u - d & X \ge u \end{cases} \tag{6.6} \]
Clearly $Y_L$ is a mixed random variable; it has the following properties:
• Its cumulative distribution function has two mass points, at 0 and $u - d$, along with a region in between where the function is increasing; mathematically:
\[ F_{Y_L}(y) = \Pr(Y_L \le y) = \begin{cases} F_X(d) = \Pr(X \le d) & y = 0 \\ F_X(y + d) = \Pr(X \le y + d) & 0 < y < u - d \\ 1 & y \ge u - d \end{cases} \tag{6.7} \]
• Notice there are two mass points for $Y_L$:
– One at $Y_L = 0$ ($X \le d$) with probability $F_X(d) = \Pr(X \le d)$
– Another at $Y_L = u - d$ ($X \ge u$) with probability $\bar{F}_X(u) = \Pr(X > u)$
• Note that an amount paid of $y$ ($0 < y < u - d$) is generated by a loss of $y + d$; proof: $\max(\min((y + d), u) - d, 0) = \max(y + d - d, 0) = y$
• The mean and variance of $Y_L$ are more complicated to find when there is both a deductible and a policy limit. Below both the mean and variance are derived for $Y_L$:
\[ E[Y_L] = E[\min(X, u) - \min(X, d)] = E[X \wedge u] - E[X \wedge d] = \int_0^u \bar{F}_X(x)\,dx - \int_0^d \bar{F}_X(x)\,dx = \int_d^u \bar{F}_X(x)\,dx \tag{6.8} \]
\[ Var[Y_L] = E[Y_L^2] - (E[Y_L])^2 = 2 \int_0^u x \bar{F}_X(x)\,dx + 2 \int_0^d x \bar{F}_X(x)\,dx - 2 \left( \int_0^d x^2 f(x)\,dx + d \int_d^u x f(x)\,dx + d u \bar{F}_X(u) \right) - \left( \int_d^u \bar{F}_X(x)\,dx \right)^2 \tag{6.9} \]
where
\[ E[Y_L^2] = E[(\min(X, u) - \min(X, d))^2] = E[(X \wedge u)^2] - 2 E[(X \wedge u)(X \wedge d)] + E[(X \wedge d)^2] \]
\[ E[(X \wedge u)^2] = \int_0^u x^2 f(x)\,dx + u^2 \bar{F}_X(u) = 2 \int_0^u x \bar{F}_X(x)\,dx \]
\[ E[(X \wedge u)(X \wedge d)] = \int_0^d x^2 f(x)\,dx + d \int_d^u x f(x)\,dx + d u \bar{F}_X(u) \]
where
\[ (X \wedge u)(X \wedge d) = \begin{cases} X^2 & X < d \\ X d & d < X < u \\ u d & X > u \end{cases} \]
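The per-loss mean $E[Y_L] = \int_d^u \bar{F}_X(x)\,dx$ can be checked numerically. The sketch below assumes, purely for illustration, an exponential ground-up loss with mean $\theta$, for which the integral has the closed form $\theta (e^{-d/\theta} - e^{-u/\theta})$; the function name and parameter values are hypothetical.

```python
import math


def expected_payment_per_loss(sf, d, u, steps=100_000):
    """Midpoint-rule approximation of the integral of S(x) from d to u."""
    h = (u - d) / steps
    return h * sum(sf(d + (i + 0.5) * h) for i in range(steps))


theta, d, u = 1000.0, 500.0, 3000.0   # illustrative values only


def exponential_sf(x):
    return math.exp(-x / theta)


numeric = expected_payment_per_loss(exponential_sf, d, u)
closed_form = theta * (math.exp(-d / theta) - math.exp(-u / theta))

# Dividing by Pr(X > d) converts the per-loss mean into a per-payment mean.
per_payment = numeric / exponential_sf(d)

print(numeric, closed_form, per_payment)
```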
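Analysis of YP
Recall that $Y_P = Y_L \mid Y_L > 0$, so $Y_P$ can be defined as:
\[ Y_P = \max(\min(X, u) - d, 0) \mid X > d = \begin{cases} X - d & d < X < u \\ u - d & X \ge u \end{cases} \tag{6.10} \]
$Y_P$ is a mixed random variable. $Y_P$ has the following properties:
• Its CDF is given by:
\[ F_{Y_P}(y) = \begin{cases} \dfrac{F_{Y_L}(y) - F_{Y_L}(0)}{1 - F_{Y_L}(0)} = \dfrac{F_X(y + d) - F_X(d)}{1 - F_X(d)} & 0 < y < u - d;\ d < X < u \\[2ex] \dfrac{F_{Y_L}(u - d) - F_{Y_L}(0)}{1 - F_{Y_L}(0)} = \dfrac{1 - F_{Y_L}(0)}{1 - F_{Y_L}(0)} = 1 & y \ge u - d;\ X \ge u \end{cases} \tag{6.11} \]
• $Y_P$ has a single mass point at $Y_P = u - d$ ($X \ge u$) with probability
\[ \Pr(Y_P = u - d) = \Pr(Y_L = u - d \mid Y_L > 0) = \frac{\bar{F}_X(u)}{\bar{F}_X(d)} \]
• If $\frac{d}{dy} F_{Y_L}(y)$ exists then $Y_P$'s PDF is given by:
\[ f_{Y_P}(y) = \frac{d}{dy} F_{Y_P}(y) = \frac{\frac{d}{dy} \left( F_{Y_L}(y) - F_{Y_L}(0) \right)}{1 - F_{Y_L}(0)} = \frac{\frac{d}{dy} F_{Y_L}(y)}{1 - F_{Y_L}(0)} \]
\[ \Longrightarrow \quad f_{Y_P}(y) = \frac{f_{Y_L}(y)}{\bar{F}_{Y_L}(0)} = \frac{f_X(y + d)}{\bar{F}_X(d)}, \quad 0 < y < u - d;\ d < X < u \]
• The moments of $Y_P$ are similar to those of $Y_L$; the difference is that they are divided by the factor $\bar{F}_X(d)$, since $Y_P$ is a truncated version of $Y_L$ and $Y_L = 0$ whenever $X \le d$. The mean and variance are:
\[ E(Y_P) = E(Y_L \mid Y_L > 0) = \frac{E(Y_L)}{\Pr(Y_L > 0)} = \frac{\int_d^u \bar{F}_X(x)\,dx}{\bar{F}_X(d)} = \int_d^u \frac{\bar{F}_X(x)}{\bar{F}_X(d)}\,dx \tag{6.12} \]
\[ Var(Y_P) = Var(Y_L \mid Y_L > 0) = E(Y_P^2) - (E(Y_P))^2 = \frac{E(Y_L^2)}{\Pr(Y_L > 0)} - \left( \frac{E(Y_L)}{\Pr(Y_L > 0)} \right)^2 \tag{6.13} \]
Note that the variance of $Y_P$ is not simply $Var(Y_L)$ divided by a power of $\Pr(Y_L > 0)$: it is the first and second moments, not the variance, that scale by $1 / \Pr(Y_L > 0)$.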
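A small simulation can make the per-payment moments concrete. The sketch below assumes an exponential ground-up loss with mean $\theta$ and arbitrary $d$ and $u$ (none of which come from the text); for that choice the memoryless property gives $E[Y_P] = \theta (1 - e^{-w/\theta})$ and $E[Y_P^2] = 2\theta^2 [1 - e^{-w/\theta}(1 + w/\theta)]$ with $w = u - d$, which serve as comparison values.

```python
import math
import random

random.seed(2009)                      # fixed seed so the run is repeatable
theta, d, u = 1000.0, 500.0, 3000.0    # illustrative values only

payments = []
for _ in range(200_000):
    x = random.expovariate(1.0 / theta)     # ground-up loss with mean theta
    y_l = max(min(x, u) - d, 0.0)           # amount paid per loss
    if y_l > 0.0:
        payments.append(y_l)                # keep only nonzero payments (Y_P)

n = len(payments)
sim_mean = sum(payments) / n
sim_var = sum((y - sim_mean) ** 2 for y in payments) / (n - 1)

# Closed forms for the exponential case (assumption of this sketch).
w = u - d
exact_mean = theta * (1.0 - math.exp(-w / theta))
exact_second_moment = 2.0 * theta ** 2 * (1.0 - math.exp(-w / theta) * (1.0 + w / theta))
exact_var = exact_second_moment - exact_mean ** 2

print(sim_mean, exact_mean)
print(sim_var, exact_var)
```

The share of simulated losses that produce a payment should be close to $\bar{F}_X(d) = e^{-d/\theta}$, which is the factor linking the per-loss and per-payment moments.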
6.4.3 Policies with an ordinary deductible d

This is a special case of a policy with a deductible $d$ and a limit $u$ that goes to infinity ($u \to \infty$).

Analysis of $Y_L$
The random variable $Y_L$ is defined as:

\[ Y_L = \max(X - d, 0) \tag{6.14} \]

It is a mixed random variable with:
• A mass point at $Y_L = 0$ with probability $F_X(d)$.
• Assuming $F_{Y_L}(y)$ is differentiable for $y > 0$, the CDF and PDF of $Y_L$ are given by:

\[ F_{Y_L}(y) = \Pr(Y_L \le y) = \Pr(X \le y + d) = \begin{cases} F_X(d) & y = 0 \\ F_X(y + d) & y > 0 \end{cases} \tag{6.15} \]
\[ f_{Y_L}(y) = \frac{d}{dy} F_{Y_L}(y) = f_X(y + d), \qquad y > 0 \]

• The mean and variance of $Y_L$ are given by (substituting $t = x - d$ and integrating by parts):

\[ E[Y_L] = E[\max(X - d, 0)] = E[(X - d) \cdot 1_{(X > d)}] = \int_d^\infty (x - d) f_X(x)\,dx = \int_0^\infty t f_X(t + d)\,dt = \int_0^\infty \bar F_X(t + d)\,dt = \int_d^\infty \bar F_X(x)\,dx = e_X(d)\,\bar F_X(d) \tag{6.16} \]

where $e_X(d) = \int_d^\infty \bar F_X(x)\,dx \,/\, \bar F_X(d)$ is the mean excess loss function.

\[ Var[Y_L] = E[(\max(X - d, 0))^2] - (E[\max(X - d, 0)])^2 = E[(X - d)^2 \cdot 1_{(X > d)}] - (E[(X - d) \cdot 1_{(X > d)}])^2 = \int_d^\infty (x - d)^2 f_X(x)\,dx - (e_X(d)\,\bar F_X(d))^2 \tag{6.17} \]
Analysis of $Y_P$
The random variable $Y_P$ is known as the excess loss random variable. In life contingencies it is referred to as the residual lifetime random variable. It is defined as:

\[ Y_P = Y_L \mid Y_L > 0 = \max(X - d, 0) \mid X > d \tag{6.18} \]

Unlike $Y_L$, $Y_P$ is a continuous random variable (when $X$ is continuous) and has the following properties:
• The CDF of $Y_P$ is given by:
\[ F_{Y_P}(y) = \Pr(Y_P \le y) = \Pr(X - d \le y \mid X > d) = \frac{\Pr(d < X \le y + d)}{\Pr(X > d)} \]
so that
\[ F_{Y_P}(y) = \frac{F_X(y + d) - F_X(d)}{1 - F_X(d)}, \qquad y \ge 0 \tag{6.19} \]
• The PDF of $Y_P$ is given by:
\[ f_{Y_P}(y) = \frac{dF_{Y_P}(y)}{dy} = \frac{f_X(y + d)}{1 - F_X(d)} = \frac{f_X(y + d)}{\bar F_X(d)} \]
• The mean and variance of $Y_P$ are given by:
\[ E[Y_P] = E[X - d \mid X > d] = e_X(d) = \frac{\int_d^\infty \bar F_X(x)\,dx}{\bar F_X(d)} \]
\[ Var[Y_P] = Var[X - d \mid X > d] = \frac{\int_d^\infty (x - d)^2 f_X(x)\,dx}{\bar F_X(d)} - (e_X(d))^2 \]
• Notice that when a deductible is imposed the mean excess loss function is closely related to TVaR: if $d = VaR_p(X)$, then $E[Y_P] = e_X(d)$ is the expected loss in excess of $VaR_p(X)$, and for continuous $X$, $TVaR_p(X) = d + e_X(d)$.
6.4.4 Franchise deductible

Under a franchise deductible $d$ the loss is paid in full once it exceeds $d$. Clearly $Y_L$ and $Y_P$ do not have the same distribution.

Analysis of $Y_L$
The random variable $Y_L$ is defined as:

\[ Y_L = \max(0, X \cdot 1_{(X > d)}) = \begin{cases} 0 & X \le d \\ X & X > d \end{cases} \tag{6.20} \]

As in the case of an ordinary deductible, $Y_L$ is a mixed random variable and has the following properties:
• $Y_L$ has a mass point at $Y_L = 0$ with probability $F_X(d)$.
• The CDF and PDF of $Y_L$ are given by:

\[ F_{Y_L}(y) = \Pr(Y_L \le y) = \Pr(\max(0, X \cdot 1_{(X > d)}) \le y) = \begin{cases} F_X(d) & 0 \le y \le d \\ F_X(y) & y > d \end{cases} \tag{6.21} \]
\[ f_{Y_L}(y) = \frac{dF_{Y_L}(y)}{dy} = \begin{cases} 0 & 0 < y < d \\ f_X(y) & y > d \end{cases} \]

• The mean and variance of $Y_L$ are given by (integrating by parts with $u = x$, $dv = f_X(x)\,dx$, $du = dx$, $v = -\bar F_X(x)$):

\[ E[Y_L] = (0) F_X(d) + \int_d^\infty x f_X(x)\,dx = \left[ -x \bar F_X(x) \right]_d^\infty + \int_d^\infty \bar F_X(x)\,dx = d\,\bar F_X(d) + \bar F_X(d)\,\frac{\int_d^\infty \bar F_X(x)\,dx}{\bar F_X(d)} = \bar F_X(d)\,[d + e_X(d)] \tag{6.22} \]

where $\int_d^\infty \bar F_X(x)\,dx \,/\, \bar F_X(d) = e_X(d)$.

\[ Var[Y_L] = E[Y_L^2] - (E[Y_L])^2 = 0 + \int_d^\infty x^2 f_X(x)\,dx - (\bar F_X(d)\,[d + e_X(d)])^2 \tag{6.23} \]
Analysis of $Y_P$
For a given deductible $d$, the mean of the per-payment random variable $Y_P$ under a franchise deductible has the same form as TVaR (namely $d + e_X(d)$); consequently the analysis is very similar.

If $Y_P$ denotes the per-payment franchise deductible random variable then:

\[ Y_P = \max(0, X \cdot 1_{(X > d)}) \mid Y_L > 0 = Y_L \mid Y_L > 0 = X \mid X > d \tag{6.24} \]

$Y_P$ is a continuous random variable. Its PDF, CDF, mean and variance are given by:

\[ f_{Y_P}(y) = \frac{dF_{Y_P}(y)}{dy} = \begin{cases} 0 & 0 < y < d \\ \dfrac{f_X(y)}{1 - F_X(d)} & y \ge d \end{cases} \tag{6.25} \]
\[ F_{Y_P}(y) = \frac{F_{Y_L}(y) - F_{Y_L}(0)}{1 - F_{Y_L}(0)} = \begin{cases} 0 & 0 < y < d \\ \dfrac{F_X(y) - F_X(d)}{1 - F_X(d)} & y \ge d \end{cases} \tag{6.26} \]
\[ E[Y_P] = E[Y_L \mid Y_L > 0] = \frac{E[Y_L]}{1 - F_{Y_L}(0)} = \frac{\bar F_X(d)\,(d + e_X(d))}{\bar F_X(d)} = d + e_X(d) \tag{6.27} \]
\[ Var[Y_P] = Var[Y_L \mid Y_L > 0] = \frac{\int_d^\infty x^2 f_X(x)\,dx}{\bar F_X(d)} - (d + e_X(d))^2 \tag{6.28} \]

Coinsurance
With coinsurance the amount paid by the insurer is $\alpha$ times the amount that would have been paid without coinsurance. Thus we have

\[ Y_L = \alpha Y_L^*, \qquad Y_P = \alpha Y_P^* \tag{6.29} \]

where $Y_L^*$ and $Y_P^*$ denote the amount paid (per loss and per payment respectively) on an identical contract without coinsurance. We can therefore apply the results of "creating distributions" (specifically multiplication by a constant) to $Y_L$ and $Y_P$. Hence the CDF, PDF and mean of $Y_L$ and $Y_P$ are given by:

\[ F_{Y_L}(y) = F_{Y_L^*}\left( \frac{y}{\alpha} \right), \qquad F_{Y_P}(y) = F_{Y_P^*}\left( \frac{y}{\alpha} \right) \tag{6.30} \]
\[ f_{Y_L}(y) = \frac{1}{\alpha} f_{Y_L^*}\left( \frac{y}{\alpha} \right), \qquad f_{Y_P}(y) = \frac{1}{\alpha} f_{Y_P^*}\left( \frac{y}{\alpha} \right) \tag{6.31} \]
\[ E[Y_L] = \alpha E[Y_L^*], \qquad E[Y_P] = \alpha E[Y_P^*] \tag{6.32} \]

Loss Elimination Ratio
Having introduced all these policy adjustments, it is interesting to determine the proportion of loss (on average) that has been eliminated in the process. We refer to this proportion as the loss elimination ratio (abbreviated as LER), which is written as:

\[ LER = 1 - \frac{E[Y_L]}{E[X]} \tag{6.33} \]

The ratio $E[Y_L]/E[X]$ corresponds to the proportion of the expected loss that is still paid by the insurer.
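As a rough numerical illustration of the loss elimination ratio, the sketch below evaluates LER for an ordinary deductible applied to an exponential loss. The exponential model, the values `theta = 100` and `d = 50`, and the helper names are assumptions made only for this illustration; they are not part of the notes. For this model $E[Y_L] = \int_d^\infty \bar F_X(x)\,dx = \theta e^{-d/\theta}$, so the ratio should agree with $1 - e^{-d/\theta}$.

```python
import math

def survival_exponential(x, theta):
    """Survival function of an exponential loss with mean theta (assumed model)."""
    return math.exp(-x / theta)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

theta, d = 100.0, 50.0           # hypothetical mean loss and deductible
expected_loss = theta            # E[X] for the exponential

# E[Y_L] = integral of the survival function from d to infinity;
# the upper limit is truncated at 50*theta, where the tail is negligible.
expected_paid = integrate(lambda x: survival_exponential(x, theta), d, 50 * theta)

ler = 1 - expected_paid / expected_loss
closed_form = 1 - math.exp(-d / theta)
print(round(ler, 6), round(closed_form, 6))
```

The numerical integral is used on purpose: swapping in another survival function gives the LER for that model without needing a closed form.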
6.4.5 Some Examples

Example 1 Let $X$ be an equal mixture of two exponential random variables with means 20 and 40. Then its PDF and CDF are given by:

\[ f_X(x) = \frac{1}{2} \cdot \frac{1}{20} e^{-x/20} + \frac{1}{2} \cdot \frac{1}{40} e^{-x/40}, \qquad F_X(x) = 1 - \frac{1}{2} e^{-x/20} - \frac{1}{2} e^{-x/40} \]

A. If an ordinary deductible is applied to each loss, find the CDF of $Y_P$.
B. What is the PDF of $Y_P$?

Solution A Let $d$ be the ordinary deductible applied to each loss. Then the random variable $Y_P$ is defined as $Y_P = Y_L \mid Y_L > 0 = X - d \mid X > d$, which means

\[ F_{Y_P}(y) = \frac{F_{Y_L}(y) - F_{Y_L}(0)}{1 - F_{Y_L}(0)} = \frac{F_X(y + d) - F_X(d)}{\bar F_X(d)} \]
\[ = \frac{\left( 1 - 0.5 e^{-(y+d)/20} - 0.5 e^{-(y+d)/40} \right) - \left( 1 - 0.5 e^{-d/20} - 0.5 e^{-d/40} \right)}{1 - \left( 1 - 0.5 e^{-d/20} - 0.5 e^{-d/40} \right)} \]
\[ = \frac{e^{-d/20} + e^{-d/40} - e^{-(y+d)/20} - e^{-(y+d)/40}}{e^{-d/20} + e^{-d/40}} \]
\[ = 1 - \frac{e^{-d/20}}{e^{-d/20} + e^{-d/40}} e^{-y/20} - \frac{e^{-d/40}}{e^{-d/20} + e^{-d/40}} e^{-y/40} \]

Notice that $Y_P$ is again a mixture of exponentials with means 20 and 40, since the weights sum to one:

\[ \underbrace{\frac{e^{-d/20}}{e^{-d/20} + e^{-d/40}}}_{p(d)} + \underbrace{\frac{e^{-d/40}}{e^{-d/20} + e^{-d/40}}}_{1 - p(d)} = 1 \]

Solution B Differentiating $F_{Y_P}(y)$ with respect to $y$:

\[ f_{Y_P}(y) = \frac{dF_{Y_P}(y)}{dy} = \frac{e^{-d/20}}{e^{-d/20} + e^{-d/40}} \cdot \frac{e^{-y/20}}{20} + \frac{e^{-d/40}}{e^{-d/20} + e^{-d/40}} \cdot \frac{e^{-y/40}}{40} \]
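The result of Example 1 can be sanity-checked numerically. The sketch below compares the per-payment CDF computed directly from $(F_X(y+d) - F_X(d))/\bar F_X(d)$ with the mixture-of-exponentials form using the weight $p(d)$. The function names and the test values `d = 10`, `y = 15` are assumptions chosen for illustration.

```python
import math

def sf_x(x):
    """Survival function of the equal mixture of exponentials with means 20 and 40."""
    return 0.5 * math.exp(-x / 20) + 0.5 * math.exp(-x / 40)

def cdf_yp_direct(y, d):
    """Per-payment CDF from the definition: (F_X(y+d) - F_X(d)) / (1 - F_X(d))."""
    return (sf_x(d) - sf_x(y + d)) / sf_x(d)

def mixing_weight(d):
    """Weight p(d) placed on the mean-20 exponential after the deductible."""
    a, b = math.exp(-d / 20), math.exp(-d / 40)
    return a / (a + b)

def cdf_yp_mixture(y, d):
    """Per-payment CDF written as a mixture of the two exponentials."""
    p = mixing_weight(d)
    return 1 - p * math.exp(-y / 20) - (1 - p) * math.exp(-y / 40)

d, y = 10.0, 15.0                     # hypothetical deductible and payment level
print(cdf_yp_direct(y, d), cdf_yp_mixture(y, d))
print(mixing_weight(0.0), mixing_weight(d))
```

With no deductible the weight is 0.5, and it falls below 0.5 as `d` grows, because survivors of the deductible are more likely to come from the heavier (mean 40) component.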
Chapter 7 Frequency Models

7.1 Introduction

We are ultimately interested in the aggregate claim amount under a collective risk model:

\[ S = \begin{cases} \sum_{i=1}^{N} X_i & N > 0 \\ 0 & N = 0 \end{cases} \tag{7.1} \]

$S$ is viewed as a sum of a random number $N$ of claim or payment amounts. The objective in this chapter is to study the process which generates claims for a portfolio of policies, i.e. the random variable $N$. The number of claims/payments in a given period of time is modeled as a discrete, integer-valued random variable $N$ with probability mass function (PMF)

\[ p_k = \Pr(N = k), \qquad k = 0, 1, 2, \ldots \tag{7.2} \]

If $P_N(t)$ is the associated probability generating function (PGF) of $\{p_k\}$ then:

\[ P_N(t) = E[t^N] = \sum_{k=0}^{\infty} t^k p_k \tag{7.3} \]

which is defined for all $t$ for which the sum converges.
7.1.1 Useful properties of the PGF

Both the probabilities $p_k$ and the $n$th factorial moment can be derived from $P_N(t)$.

1. Probabilities $p_n$:
\[ P_N^{(n)}(0) = \left. \frac{\partial^n P_N(t)}{\partial t^n} \right|_{t=0} = n!\,p_n \]
which means
\[ p_n = \frac{P_N^{(n)}(0)}{n!} \]

2. The $n$th factorial moment:
\[ P_N^{(1)}(t) = \frac{\partial P_N(t)}{\partial t} = \sum_{k=0}^{\infty} k t^{k-1} p_k = \sum_{k=1}^{\infty} k t^{k-1} p_k \implies P_N^{(1)}(1) = \sum_{k=1}^{\infty} k p_k = E[N] \]
similarly
\[ P_N^{(2)}(t) = \frac{\partial^2 P_N(t)}{\partial t^2} = \sum_{k=0}^{\infty} k(k-1) t^{k-2} p_k = \sum_{k=2}^{\infty} k(k-1) t^{k-2} p_k \implies P_N^{(2)}(1) = \sum_{k=2}^{\infty} k(k-1) p_k = E[N(N-1)] \]
In general $P_N(t)$ can be differentiated $n$ times with respect to $t$; letting $t = 1$ after differentiation results in the $n$th factorial moment:
\[ P_N^{(n)}(1) = \left. \frac{\partial^n P_N(t)}{\partial t^n} \right|_{t=1} = \sum_{k=n}^{\infty} k(k-1)\cdots(k-n+1)\,p_k = E[N(N-1)\cdots(N-n+1)] \]
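The two PGF properties can be illustrated with a distribution of finite support, for which the PGF is a polynomial whose coefficients are the probabilities. The sketch below is only an illustration under assumed inputs: it uses a binomial PMF with `m = 4` and `q = 0.3` (hypothetical values) and differentiates the polynomial coefficient by coefficient.

```python
from math import comb, factorial

def pgf_derivative(coeffs):
    """Coefficients of the derivative of sum_k coeffs[k] * t**k."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, t):
    """Evaluate the polynomial sum_k coeffs[k] * t**k at t."""
    return sum(c * t**k for k, c in enumerate(coeffs))

m, q = 4, 0.3                                   # assumed binomial parameters
pmf = [comb(m, k) * q**k * (1 - q)**(m - k) for k in range(m + 1)]

d1 = pgf_derivative(pmf)                        # coefficients of P'(t)
d2 = pgf_derivative(d1)                         # coefficients of P''(t)

mean = evaluate(d1, 1.0)                        # P'(1)  = E[N]
second_factorial = evaluate(d2, 1.0)            # P''(1) = E[N(N-1)]
p2_recovered = evaluate(d2, 0.0) / factorial(2) # P''(0)/2! = p_2

print(mean, second_factorial, p2_recovered, pmf[2])
```

For the binomial these should match $mq$, $m(m-1)q^2$ and $p_2$ respectively, up to floating-point error.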
7.2 Possible candidates for modelling N

7.2.1 Selection of basic distributions

Poisson Distribution
The probability mass function of a Poisson random variable $N$ with parameter $\lambda > 0$ is:
\[ p_k = \frac{e^{-\lambda} \lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots \]
It has the property:
\[ E[N] = Var[N] = \lambda \]
The corresponding PGF of $N$ is:
\[ P_N(t) = \sum_{k=0}^{\infty} t^k \frac{e^{-\lambda} \lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!} = e^{-\lambda} e^{\lambda t} = e^{\lambda(t-1)} \]
Property 1 If $N_1, N_2, \ldots, N_m$ are independent Poisson random variables with parameters $\lambda_1, \lambda_2, \ldots, \lambda_m$ then $N = N_1 + N_2 + \ldots + N_m$ is a Poisson random variable with parameter $\lambda = \lambda_1 + \lambda_2 + \ldots + \lambda_m$. By independence ($\perp$):

\[ P_N(t) = E[t^N] = E[t^{N_1 + N_2 + \ldots + N_m}] = E[t^{N_1} t^{N_2} \cdots t^{N_m}] \overset{\perp}{=} E[t^{N_1}] E[t^{N_2}] \cdots E[t^{N_m}] = e^{\lambda_1(t-1)} e^{\lambda_2(t-1)} \cdots e^{\lambda_m(t-1)} = e^{(\lambda_1 + \lambda_2 + \ldots + \lambda_m)(t-1)} = e^{\lambda(t-1)} \]

which is the PGF of a Poisson random variable with parameter $\lambda$.

Example Suppose $X_1$ and $X_2$ are independent Poisson distributed random variables with means $\lambda_1 = 3$ and $\lambda_2 = 0.4$. If $S = X_1 + X_2$, find the PMF, mean and variance of $S$.
Solution From property 1, $S \sim$ Poisson$(3 + 0.4)$, so:
\[ \Pr(S = k) = \frac{3.4^k e^{-3.4}}{k!}, \quad k = 0, 1, 2, \ldots \qquad E[S] = Var[S] = 3.4 \]

Example Suppose $X_1$, $X_2$ and $X_3$ are independent random variables where, given $\Lambda_i$, $X_i$ is Poisson distributed with mean $\Lambda_i$, and $\Lambda_1 \sim$ Uniform$(0, 3)$, $\Lambda_2 \sim$ Uniform$(0, 2)$ and $\Lambda_3 \sim$ Uniform$(0, 4)$. If $S = X_1 + X_2 + X_3$, is $S$ a Poisson random variable?
Solution For $\Lambda_i \sim$ Uniform$(a, b)$ we have
\[ E[t^{X_i}] = E[E[t^{X_i} \mid \Lambda_i]] = E[e^{\Lambda_i(t-1)}] = M_{\Lambda_i}(t - 1) = \frac{e^{(t-1)b} - e^{(t-1)a}}{(t-1)(b-a)} \]
so that, by independence,
\[ E[t^S] = E[t^{X_1 + X_2 + X_3}] \overset{\perp}{=} E[t^{X_1}] E[t^{X_2}] E[t^{X_3}] = \frac{e^{3(t-1)} - 1}{3(t-1)} \cdot \frac{e^{2(t-1)} - 1}{2(t-1)} \cdot \frac{e^{4(t-1)} - 1}{4(t-1)} = \frac{(e^{3(t-1)} - 1)(e^{2(t-1)} - 1)(e^{4(t-1)} - 1)}{24(t-1)^3}, \qquad t \ne 1 \]
This is not of the form $e^{\lambda(t-1)}$, so clearly $S$ is not a Poisson random variable. When $\Lambda$ is a random variable property 1 cannot be used.

Property 2 Suppose that
1. The overall number of events $N$ in a certain time period is a Poisson random variable with parameter $\lambda$.
2. Each event is one of $k$ distinct types of events that are independent of each other. Given an event occurs, the probability the event is of type $i$ is $p_i$, where $\sum_{i=1}^{k} p_i = 1$.
Then:
1. For each fixed $i = 1, 2, \ldots, k$ the number of events of type $i$, say $N_i$, has a Poisson distribution with parameter $\lambda p_i$.
2. The random variables $N_1, N_2, \ldots, N_k$ are independent.
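Proof Assume $k = 2$. Then $n = n_1 + n_2$ and, conditional on $N = n$, the counts have the binomial (multinomial) PMF:
\[ \Pr(N_1 = n_1, N_2 = n_2 \mid N = n) = \binom{n_1 + n_2}{n_1} p_1^{n_1} p_2^{n_2} = (n_1 + n_2)!\,\frac{p_1^{n_1} p_2^{n_2}}{n_1!\,n_2!} \]
If $n \ne n_1 + n_2$ then $\Pr(N_1 = n_1, N_2 = n_2 \mid N = n) = 0$. Summing over $N = n$, the unconditional joint probability is derived:
\[ \Pr(N_1 = n_1, N_2 = n_2) = \sum_{n=0}^{\infty} \Pr(N_1 = n_1, N_2 = n_2 \mid N = n) \Pr(N = n) = (n_1 + n_2)!\,\frac{p_1^{n_1} p_2^{n_2}}{n_1!\,n_2!} \cdot \frac{\lambda^{n_1 + n_2} e^{-\lambda}}{(n_1 + n_2)!} \]
\[ = \frac{p_1^{n_1} p_2^{n_2}}{n_1!\,n_2!}\,\lambda^{n_1 + n_2} e^{-\lambda(p_1 + p_2)} = \frac{(\lambda p_1)^{n_1} e^{-\lambda p_1}}{n_1!} \cdot \frac{(\lambda p_2)^{n_2} e^{-\lambda p_2}}{n_2!} \]
To find the marginal distributions of $N_1$ and $N_2$, $\Pr(N_1 = n_1, N_2 = n_2)$ is summed over $n_2$ and $n_1$ respectively, that is:
\[ \Pr(N_1 = n_1) = \sum_{n_2 = 0}^{\infty} \Pr(N_1 = n_1, N_2 = n_2) = \frac{(\lambda p_1)^{n_1} e^{-\lambda p_1}}{n_1!} \sum_{n_2 = 0}^{\infty} \frac{(\lambda p_2)^{n_2} e^{-\lambda p_2}}{n_2!} = \frac{(\lambda p_1)^{n_1} e^{-\lambda p_1}}{n_1!} \]
\[ \Pr(N_2 = n_2) = \sum_{n_1 = 0}^{\infty} \Pr(N_1 = n_1, N_2 = n_2) = \frac{(\lambda p_2)^{n_2} e^{-\lambda p_2}}{n_2!} \sum_{n_1 = 0}^{\infty} \frac{(\lambda p_1)^{n_1} e^{-\lambda p_1}}{n_1!} = \frac{(\lambda p_2)^{n_2} e^{-\lambda p_2}}{n_2!} \]
Hence $\Pr(N_1 = n_1, N_2 = n_2) = \Pr(N_1 = n_1)\Pr(N_2 = n_2)$.

It has thus been shown that $N_1$ and $N_2$ are Poisson random variables with parameters $\lambda p_1$ and $\lambda p_2$ and that they are independent. In general, for $N = N_1 + N_2 + \ldots + N_k$ a similar proof can be constructed using the multinomial expansion to show that $N_1, N_2, \ldots, N_k$ are both Poisson distributed with parameters $\lambda p_1, \lambda p_2, \ldots, \lambda p_k$ and independent.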
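The factorization in the proof can be checked numerically for the two-type case. The sketch below builds the joint probability as (binomial split given $N = n$) times (Poisson probability of $n$) and compares it with the product of two Poisson probabilities. The values `lam = 3.0` and `p1 = 0.4` and the function names are assumptions for illustration only.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

def joint_pmf(n1, n2, lam, p1):
    """Pr(N1 = n1, N2 = n2): binomial split of n = n1 + n2 events, times Pr(N = n)."""
    n = n1 + n2
    conditional = comb(n, n1) * p1**n1 * (1 - p1)**n2
    return conditional * poisson_pmf(n, lam)

lam, p1 = 3.0, 0.4                    # hypothetical rate and type-1 probability
lhs = joint_pmf(2, 3, lam, p1)
rhs = poisson_pmf(2, lam * p1) * poisson_pmf(3, lam * (1 - p1))
print(lhs, rhs)
```

The two printed values should agree up to rounding, which is the independence statement of Property 2 for one pair $(n_1, n_2)$.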
Example Suppose the highway patrol wishes to count the number of vehicles that arrive at a busy intersection. The patrol notices that vehicles arrive at a Poisson rate of 30 per hour, where 50% of the vehicles are cars, 20% are trucks and 30% are buses. Calculate the probability that 4 cars, 3 trucks and 0 buses arrive in the next 15 minutes.
Solution Using property 2, the numbers of cars, trucks and buses arriving in a 15 minute period are independent Poisson random variables with means:
Expected number of cars = 30(0.5)/4 = 3.75
Expected number of trucks = 30(0.2)/4 = 1.5
Expected number of buses = 30(0.3)/4 = 2.25
Notice that the expected number of vehicles (car, truck or bus) arriving in a 15 minute period need not be an integer. Accordingly the probability that 4 cars, 3 trucks and 0 buses arrive in the next 15 minutes is:
\[ \Pr(\text{cars} = 4, \text{trucks} = 3, \text{buses} = 0) = \frac{3.75^4 e^{-3.75}}{4!} \cdot \frac{1.5^3 e^{-1.5}}{3!} \cdot \frac{2.25^0 e^{-2.25}}{0!} = 0.002563 \]

Example Using the information from the previous example, the patrol wants to know the probability that 2 trucks arrive before 3 buses.

Binomial distribution
Since a binomial random variable has finite support, the use of a binomial distribution for the number of claims/payments $N$ implies that there is a maximum number of claims/payments that can occur. For a binomial random variable with parameters $m \in \mathbb{N}^+$ and $q \in (0, 1)$ its PMF, mean and variance are:
\[ p_k = \binom{m}{k} q^k (1 - q)^{m-k}, \qquad k = 0, 1, 2, \ldots, m \]
\[ E[N] = mq, \qquad Var[N] = mq(1 - q) \]
Note that the mean of $N$ is larger than the variance of $N$, i.e. $E[N] > Var[N]$. The PGF of $N$ is given by:
\[ P_N(t) = \sum_{k=0}^{m} t^k \binom{m}{k} q^k (1 - q)^{m-k} = \sum_{k=0}^{m} \binom{m}{k} (tq)^k (1 - q)^{m-k} = (1 - q + tq)^m \underbrace{\sum_{k=0}^{m} \binom{m}{k} \left( \frac{tq}{1 - q + tq} \right)^k \left( \frac{1 - q}{1 - q + tq} \right)^{m-k}}_{1} = (1 - q + tq)^m \]
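Negative Binomial Distribution
For a negative binomial random variable $N$ with parameters $r > 0$ and $\beta > 0$ the PMF, mean and variance are:

\[ p_k = \binom{k + r - 1}{k} \left( \frac{1}{1 + \beta} \right)^r \left( \frac{\beta}{1 + \beta} \right)^k, \qquad k = 0, 1, \ldots \]
\[ E[N] = r\beta, \qquad Var[N] = r\beta(1 + \beta) \]

where the generalized binomial coefficient
\[ \binom{x}{k} = \frac{\Gamma(x + 1)}{k!\,\Gamma(x - k + 1)} = \frac{x(x-1)\cdots(x-k+1)}{k!} \]
is defined for any nonnegative integer $k$ and any real $x$ (and is positive here since $r > 0$).
Remark: For $r \in \mathbb{N}^+$ a negative binomial random variable $N$ can be viewed as the number of failures until reaching the $r$th success. Notice that its variance is larger than its mean ($E[N] < Var[N]$). The PGF of $N$ is given by:

\[ P_N(t) = \sum_{k=0}^{\infty} t^k \binom{k + r - 1}{k} \left( \frac{1}{1 + \beta} \right)^r \left( \frac{\beta}{1 + \beta} \right)^k = \left( \frac{1}{1 + \beta} \right)^r \sum_{k=0}^{\infty} \binom{k + r - 1}{k} \left( \frac{t\beta}{1 + \beta} \right)^k \]
\[ = \left( \frac{1}{1 + \beta} \right)^r \left( 1 - \frac{t\beta}{1 + \beta} \right)^{-r} \underbrace{\sum_{k=0}^{\infty} \binom{k + r - 1}{k} \left( \frac{t\beta}{1 + \beta} \right)^k \left( 1 - \frac{t\beta}{1 + \beta} \right)^r}_{1} \]
\[ = \left( \frac{\frac{1}{1 + \beta}}{1 - \frac{t\beta}{1 + \beta}} \right)^r = \left( \frac{1}{1 + \beta - t\beta} \right)^r = (1 - \beta(t - 1))^{-r} \]

Remark
• When $r = 1$, the resulting distribution is referred to as the geometric distribution. The geometric distribution possesses the memoryless property and can be viewed as a discrete counterpart of the exponential distribution.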
• The negative binomial is another example of a mixture. Indeed, if $N \mid \Lambda = \lambda \sim$ Poisson$(\lambda)$ and $\Lambda \sim$ Gamma$(\alpha, \theta)$ then $N$ has a negative binomial distribution with parameters $r = \alpha$ and $\beta = \theta$. Proof:
\[ P_N(t) = E[t^N] = E[E[t^N \mid \Lambda]] = E[e^{\Lambda(t-1)}] = M_\Lambda(t - 1) = (1 - \theta(t - 1))^{-\alpha} \]
since the MGF of $\Lambda$ is $M_\Lambda(t) = (1 - \theta t)^{-\alpha}$. $P_N(t)$ is precisely the PGF of a negative binomial with $r = \alpha$ and $\beta = \theta$.
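7.2.2 The (a, b, 0) class of distributions

The Poisson, Binomial and Negative Binomial all belong to a class of discrete integer-valued distributions called the $(a, b, 0)$ class. A discrete nonnegative random variable $N$ with PMF $p_k$, $k = 0, 1, \ldots$ is a member of the $(a, b, 0)$ class if there exist 2 constants $a$ and $b$ such that for all $k = 1, 2, \ldots$

\[ p_k = \left( a + \frac{b}{k} \right) p_{k-1}, \qquad \text{equivalently} \qquad \frac{p_k}{p_{k-1}} = a + \frac{b}{k} \tag{7.4} \]

Example 1: Poisson
\[ \frac{p_k}{p_{k-1}} = \frac{e^{-\lambda} \lambda^k / k!}{e^{-\lambda} \lambda^{k-1} / (k-1)!} = \frac{\lambda}{k}, \qquad a = 0, \quad b = \lambda \]

Example 2: Binomial
\[ \frac{p_k}{p_{k-1}} = \frac{\binom{m}{k} q^k (1 - q)^{m-k}}{\binom{m}{k-1} q^{k-1} (1 - q)^{m-(k-1)}} = \frac{\dfrac{m!}{(m-k)!\,k!}}{\dfrac{m!}{(m-(k-1))!\,(k-1)!}} \cdot \frac{q}{1 - q} = \frac{m - k + 1}{k} \cdot \frac{q}{1 - q} = \frac{-q}{1 - q} + \frac{1}{k} \cdot \frac{(m + 1)q}{1 - q} \]
\[ a = \frac{-q}{1 - q}, \qquad b = \frac{(m + 1)q}{1 - q} \]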
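Example 3: Negative Binomial
\[ \frac{p_k}{p_{k-1}} = \frac{\binom{k + r - 1}{k} \left( \frac{1}{1+\beta} \right)^r \left( \frac{\beta}{1+\beta} \right)^k}{\binom{k + r - 2}{k - 1} \left( \frac{1}{1+\beta} \right)^r \left( \frac{\beta}{1+\beta} \right)^{k-1}} = \frac{k + r - 1}{k} \cdot \frac{\beta}{1 + \beta} = \frac{\beta}{1 + \beta} + \frac{1}{k} \cdot \frac{(r - 1)\beta}{1 + \beta} \]
\[ a = \frac{\beta}{1 + \beta}, \qquad b = \frac{(r - 1)\beta}{1 + \beta} \]

Important Note These 3 distributions (with the geometric as the special case $r = 1$ of the negative binomial) are the only possible distributions in the $(a, b, 0)$ class. On an exam, you must be able to identify the distribution given that it is $(a, b, 0)$ and the form of $a$ and $b$. Note:

Distribution         Sign of a
Poisson              a = 0
Binomial             a < 0
Negative Binomial    a > 0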
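The $(a, b, 0)$ recursion $p_k = (a + b/k)\,p_{k-1}$ gives a simple way to generate all three PMFs from $p_0$. The following is a minimal sketch; the helper name `ab0_pmf` and all parameter values are assumptions for illustration, and the $a$, $b$ values are those derived in Examples 1 to 3.

```python
from math import comb, exp, factorial

def ab0_pmf(a, b, p0, kmax):
    """Generate p_0..p_kmax from the (a, b, 0) recursion p_k = (a + b/k) p_{k-1}."""
    probs = [p0]
    for k in range(1, kmax + 1):
        probs.append((a + b / k) * probs[-1])
    return probs

# Poisson(lam): a = 0, b = lam, p0 = exp(-lam)
lam = 2.0
poisson = ab0_pmf(0.0, lam, exp(-lam), 10)

# Binomial(m, q): a = -q/(1-q), b = (m+1)q/(1-q), p0 = (1-q)^m
m, q = 5, 0.3
binomial = ab0_pmf(-q / (1 - q), (m + 1) * q / (1 - q), (1 - q)**m, m)

# Negative binomial(r, beta): a = beta/(1+beta), b = (r-1)beta/(1+beta), p0 = (1+beta)^(-r)
r, beta = 3, 0.5
negbin = ab0_pmf(beta / (1 + beta), (r - 1) * beta / (1 + beta), (1 + beta)**(-r), 10)

print(poisson[3], exp(-lam) * lam**3 / factorial(3))
print(binomial[2], comb(m, 2) * q**2 * (1 - q)**(m - 2))
print(negbin[2], comb(2 + r - 1, 2) * (1 / (1 + beta))**r * (beta / (1 + beta))**2)
```

Each printed pair compares the recursive value with the closed-form PMF; the sign of `a` in the three calls is zero, negative and positive respectively.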
7.2.3 The (a, b, 1) class of distributions

At times the $(a, b, 0)$ class of distributions does not adequately describe the characteristics of some insurance data with regard to the claim arrival process. In particular, the fit provided by any of the $(a, b, 0)$ distributions may be poor for the probability of no claim/payment in a given period (i.e. $\Pr(N = 0)$). The idea is then to modify the $(a, b, 0)$ class by:
• Maintaining the recursive relationship for $k = 2, 3, 4, \ldots$
• Choosing the probability $\Pr(N = 0)$ freely.
At this point it is necessary to distinguish between 2 cases for choosing $\Pr(N = 0)$.
a.) Case 1: If the new probability chosen for $\Pr(N = 0)$ is 0, the resulting distribution is referred to as a zero-truncated distribution.
b.) Case 2: If the new probability chosen for $\Pr(N = 0)$ is different from the $p_0$ suggested by the $(a, b, 0)$ class, then the resulting distribution is called the zero-modified distribution.
Consider that $N$ is a member of the $(a, b, 0)$ class with $p_k = \Pr(N = k)$ for $k = 0, 1, 2, \ldots$ and PGF $P_N(z) = \sum_{k=0}^{\infty} z^k p_k$.

Case 1: The probabilities of its corresponding zero-truncated distribution are $p_k^T$ for $k = 1, 2, \ldots$ with $p_0^T = 0$. For the $(a, b, 1)$ recursion to be valid we need, for some constant $\beta$:

\[ p_k^T = \beta p_k, \qquad k = 1, 2, \ldots \tag{7.5} \]
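so that
\[ \frac{p_k^T}{p_{k-1}^T} = \frac{\beta p_k}{\beta p_{k-1}} = \frac{p_k}{p_{k-1}} = a + \frac{b}{k}, \qquad k = 2, 3, \ldots \]
Also $\sum_{k=0}^{\infty} p_k^T = 1$ must be true, so:
\[ p_0^T + \sum_{k=1}^{\infty} p_k^T = 1 \implies \sum_{k=1}^{\infty} \beta p_k = 1 \quad (\text{since } p_0^T = 0) \implies \beta(1 - p_0) = 1 \quad \left( \text{since } \sum_{k=0}^{\infty} p_k = 1 \right) \implies \beta = \frac{1}{1 - p_0} \]
Accordingly $p_k^T = p_k / (1 - p_0)$ for $k = 1, 2, \ldots$. Now that $p_k^T$ is described over all of $\mathbb{N}^+$ the PGF can be described:
\[ P_N^T(z) = \sum_{k=0}^{\infty} z^k p_k^T = \sum_{k=1}^{\infty} z^k p_k^T = \sum_{k=1}^{\infty} z^k \frac{p_k}{1 - p_0} = \frac{\sum_{k=0}^{\infty} z^k p_k - p_0}{1 - p_0} = \frac{P_N(z) - p_0}{1 - p_0} \]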
Case 2: The probabilities of its corresponding zero-modified distribution are $p_k^M$, $k = 0, 1, 2, \ldots$. Once $p_0^M$ has been chosen the other probabilities are determined such that the $(a, b, 1)$ recursion holds, i.e.
\[ \frac{p_k^M}{p_{k-1}^M} = a + \frac{b}{k}, \qquad k = 2, 3, \ldots \]
As before $p_k^M = \beta p_k$ for $k = 1, 2, \ldots$. Since $\sum_{k=0}^{\infty} p_k^M = 1$ it follows that:
\[ p_0^M + \sum_{k=1}^{\infty} p_k^M = p_0^M + \beta \sum_{k=1}^{\infty} p_k = p_0^M + \beta(1 - p_0) = 1 \implies \beta = \frac{1 - p_0^M}{1 - p_0} \]
That means the zero-modified probabilities are given by:

\[ p_k^M = \frac{1 - p_0^M}{1 - p_0}\,p_k, \qquad k = 1, 2, \ldots \tag{7.6} \]
Its corresponding PGF is:
\[ P_N^M(z) = \sum_{k=0}^{\infty} z^k p_k^M = p_0^M + \sum_{k=1}^{\infty} z^k p_k^M = p_0^M + \frac{1 - p_0^M}{1 - p_0} \sum_{k=1}^{\infty} z^k p_k = p_0^M + \frac{1 - p_0^M}{1 - p_0} (P_N(z) - p_0) \]
Alternatively the expression above can be written as:
\[ P_N^M(z) = p_0^M - \frac{1 - p_0^M}{1 - p_0}\,p_0 + \frac{1 - p_0^M}{1 - p_0} P_N(z) = \frac{p_0^M - p_0}{1 - p_0} + \frac{1 - p_0^M}{1 - p_0} P_N(z) \]
If $p_0^M > p_0$ the zero-modified distribution can be viewed as a mixture between the degenerate distribution at 0 and the original (non-modified) distribution.

Example Suppose $N$ is a Poisson random variable with mean $\lambda = 2$. We consider its zero-modified version with $p_0^M = 0.3$. The PGF of this zero-modified Poisson random variable is:
\[ P_N^M(z) = 0.3 + \frac{1 - 0.3}{1 - p_0} (P_N(z) - p_0) = 0.3 + \frac{0.7}{1 - e^{-2}} (e^{2(z-1)} - e^{-2}) \]
equivalently:
\[ P_N^M(z) = \frac{0.3 - e^{-2}}{1 - e^{-2}} + \frac{1 - 0.3}{1 - e^{-2}}\,e^{2(z-1)} \]
Since $0.3 > e^{-2}$ the zero-modified random variable is a mixture between a degenerate distribution at 0 and the original Poisson random variable.
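The zero-modified Poisson of the example can be tabulated directly from $p_k^M = \frac{1 - p_0^M}{1 - p_0} p_k$. The sketch below is illustrative only: the function name and the truncation of the support at 60 terms are assumptions. It also computes the weight placed on the degenerate distribution at 0 in the mixture interpretation.

```python
from math import exp, factorial

def zero_modified_poisson_pmf(k, lam, p0_mod):
    """PMF of a zero-modified Poisson: p0_mod at 0, rescaled Poisson elsewhere."""
    p0 = exp(-lam)                       # Pr(N = 0) of the unmodified Poisson
    if k == 0:
        return p0_mod
    scale = (1 - p0_mod) / (1 - p0)      # the constant beta of equation (7.6)
    return scale * exp(-lam) * lam**k / factorial(k)

lam, p0_mod = 2.0, 0.3                   # values from the example
pmf = [zero_modified_poisson_pmf(k, lam, p0_mod) for k in range(60)]
total = sum(pmf)                         # should be 1 up to a negligible tail

# Mixture view: weight on the point mass at 0, remainder on the Poisson(2)
weight_at_zero = (p0_mod - exp(-lam)) / (1 - exp(-lam))
mixture_p0 = weight_at_zero + (1 - weight_at_zero) * exp(-lam)

print(total, pmf[2] / pmf[1], weight_at_zero, mixture_p0)
```

The ratio `pmf[2] / pmf[1]` equals $a + b/2 = 0 + 2/2 = 1$, confirming that the recursion is preserved for $k \ge 2$, and `mixture_p0` recovers the chosen $p_0^M$.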
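7.2.4 Mixture of distributions

Suppose that $N \mid \Theta = \theta$ has conditional PMF $\Pr(N = n \mid \Theta = \theta)$ and the mixing random variable $\Theta$ is either:
a.) Discrete with PMF $\Pr(\Theta = \theta_i) = a_i$, $i = 1, 2, \ldots, M$
b.) Continuous with density $g_\Theta(\theta)$
When the distributions of both $N \mid \Theta = \theta$ and $\Theta$ are known, the unconditional PMF of $N$ is:

\[ \Pr(N = n) = \begin{cases} \sum_{i=1}^{M} a_i \Pr(N = n \mid \Theta = \theta_i) & \text{if } \Theta \text{ is discrete} \\ \int \Pr(N = n \mid \Theta = \theta)\,g_\Theta(\theta)\,d\theta & \text{if } \Theta \text{ is continuous} \end{cases} \tag{7.7} \]

Example Suppose that $N \mid \Theta = \theta$ is Poisson with mean $\theta$. Suppose $\Theta$ has density
\[ g_\Theta(\theta) = \frac{\lambda^2 (\theta + 1) e^{-\lambda\theta}}{\lambda + 1}, \qquad \theta > 0 \]
where $\lambda > 0$. Determine the unconditional distribution of $N$. Note that
\[ g_\Theta(\theta) = \frac{\lambda^2}{\lambda + 1} \theta e^{-\lambda\theta} + \frac{\lambda^2}{\lambda + 1} e^{-\lambda\theta} = \frac{1}{\lambda + 1} \underbrace{(\lambda^2 \theta e^{-\lambda\theta})}_{\text{Erlang}(2,\,1/\lambda)} + \frac{\lambda}{\lambda + 1} \underbrace{(\lambda e^{-\lambda\theta})}_{\text{Exponential}(1/\lambda)} \]
so $\Theta$ is a mixture of a Gamma$(2, 1/\lambda)$ (Erlang$(2, 1/\lambda)$) and a Gamma$(1, 1/\lambda)$ (exponential$(1/\lambda)$) with mixing weights $1/(1 + \lambda)$ and $\lambda/(1 + \lambda)$. This implies that $\Pr(N = n)$ is:
\[ \Pr(N = n) = \int_0^\infty \Pr(N = n \mid \Theta = \theta)\,g_\Theta(\theta)\,d\theta = \int_0^\infty \frac{\theta^n e^{-\theta}}{n!} \left[ \frac{1}{\lambda + 1} (\lambda^2 \theta e^{-\lambda\theta}) + \frac{\lambda}{\lambda + 1} (\lambda e^{-\lambda\theta}) \right] d\theta \]
\[ = \frac{1}{1 + \lambda} \underbrace{\int_0^\infty \frac{\theta^n e^{-\theta}}{n!} (\lambda^2 \theta e^{-\lambda\theta})\,d\theta}_{\text{Neg Binomial with } r = 2,\ \beta = 1/\lambda} + \frac{\lambda}{1 + \lambda} \underbrace{\int_0^\infty \frac{\theta^n e^{-\theta}}{n!} (\lambda e^{-\lambda\theta})\,d\theta}_{\text{Neg Binomial with } r = 1,\ \beta = 1/\lambda} \]
\[ = \frac{1}{1 + \lambda} \binom{n + 2 - 1}{2 - 1} \left( \frac{1}{1 + 1/\lambda} \right)^2 \left( \frac{1/\lambda}{1 + 1/\lambda} \right)^n + \frac{\lambda}{1 + \lambda} \binom{n + 1 - 1}{1 - 1} \left( \frac{1}{1 + 1/\lambda} \right)^1 \left( \frac{1/\lambda}{1 + 1/\lambda} \right)^n \]
\[ = \frac{1}{1 + \lambda} (n + 1) \frac{\lambda^2}{(1 + \lambda)^{n+2}} + \frac{\lambda}{1 + \lambda} \cdot \frac{\lambda}{(1 + \lambda)^{n+1}} \]
$N$ is a mixture of 2 negative binomials with mixing weights $1/(1 + \lambda)$ and $\lambda/(1 + \lambda)$.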
7.2.5 Compound random variables

A larger class of distributions can be created by compounding any 2 discrete distributions. Suppose $N$ and $M$ are 2 discrete nonnegative integer-valued random variables. Define $S$ to be the random sum:

\[ S = \begin{cases} \sum_{i=1}^{N} M_i & N > 0 \\ 0 & N = 0 \end{cases} \tag{7.8} \]

where each $M_i$ has the same distribution as $M$, and $M_1, M_2, \ldots$ and $N$ are mutually independent. We refer to
• $N$ as the primary distribution with PMF $p_k$ and PGF $P_N(z)$.
• $M$ as the secondary distribution with PMF $f_k$, PGF $P_M(z)$ and MGF $M_M(t)$.
$S$ is called a compound random variable. In insurance the distribution of $S$ arises naturally. For instance in an auto insurance context, $N$ may represent the number of accidents and $M_i$ the number of claims generated by the $i$th accident. In this case $S$ is the total number of claims over a given time period from the auto insurance portfolio. $S$, like any other random variable, can be characterized via its PGF, MGF, mean, variance and PMF.

PGF of $S$:
\[ P_S(t) = E[t^S] = E[E[t^S \mid N]] = E[(P_M(t))^N] = P_N(P_M(t)) \tag{7.9} \]
where, by independence,
\[ E[t^S \mid N = n] = E\left[ t^{\sum_{i=1}^{N} M_i} \,\Big|\, N = n \right] = E[t^{M_1} t^{M_2} \cdots t^{M_n}] = E[t^{M_1}] E[t^{M_2}] \cdots E[t^{M_n}] = (P_M(t))^n \]
Note that for $n = 0$, $E[t^S \mid N = 0] = 1 = (P_M(t))^0$. It also follows that $E[t^S \mid N]$ is the random variable $(P_M(t))^N$ ($N$ is random here).

MGF of $S$:
\[ M_S(r) = E[e^{rS}] = P_S(e^r) = P_N(P_M(e^r)) = P_N(M_M(r)) \tag{7.10} \]

Mean of $S$:
\[ E[S] = E[E[S \mid N]] = E\left[ E\left[ \sum_{i=1}^{N} M_i \,\Big|\, N \right] \right] = E[N\,E[M]] = E[N]\,E[M] \tag{7.11} \]
Alternatively, using the PGF of $S$:
\[ P_S'(t) = \frac{\partial P_S(t)}{\partial t} = \frac{\partial P_M(t)}{\partial t} P_N'(P_M(t)) \implies P_S'(1) = P_M'(1)\,P_N'(P_M(1)) = P_M'(1)\,P_N'(1) = E[M]\,E[N] \]
Variance of $S$:
\[ Var[S] = E[Var[S \mid N]] + Var[E[S \mid N]] = E[N]\,Var[M] + (E[M])^2\,Var[N] \tag{7.12} \]

PMF of $S$: Let $\{g_k\}_{k \ge 0}$ be the PMF of $S$, i.e. $g_k = \Pr(S = k)$, $k = 0, 1, 2, \ldots$. Recall:
a.) $\{p_n\}_{n \ge 0}$ is the PMF of the primary random variable $N$
b.) $\{f_k\}_{k \ge 0}$ is the PMF of the secondary random variable $M$

Calculation of $g_0$:
\[ g_0 = \Pr(S = 0 \mid N = 0)\,p_0 + \sum_{n=1}^{\infty} \Pr(S = 0 \mid N = n)\,p_n = p_0 + \sum_{n=1}^{\infty} \Pr\left( \sum_{i=1}^{n} M_i = 0 \right) p_n = p_0 + \sum_{n=1}^{\infty} \Pr(M_1 = 0, M_2 = 0, \ldots, M_n = 0)\,p_n \]
\[ = p_0 + \sum_{n=1}^{\infty} (f_0)^n p_n = p_0 (f_0)^0 + \sum_{n=1}^{\infty} (f_0)^n p_n = \sum_{n=0}^{\infty} (f_0)^n p_n = P_N(f_0) \]

Calculation of $\{g_k\}_{k \ge 1}$: since $\Pr(S = k \mid N = 0) = 0$ for $k \ge 1$,
\[ g_k = \Pr(S = k) = \Pr(S = k \mid N = 0)\,p_0 + \sum_{n=1}^{\infty} \Pr(S = k \mid N = n)\,p_n = \sum_{n=1}^{\infty} \Pr(M_1 + M_2 + \ldots + M_n = k)\,p_n = \sum_{n=1}^{\infty} f_k^{*n} p_n \]
where $f_k^{*n} = \Pr(M_1 + M_2 + \ldots + M_n = k)$ is the PMF associated with the $n$-fold convolution of $f_k$ with itself (i.e. the PMF of the sum of $n$ independent identically distributed random variables with PMF $\{f_k\}_{k \ge 0}$). Note that $f_k^{*1} = f_k$.

Example Suppose $p_0 = 0.4$, $p_1 = 0.4$ and $p_2 = 0.2$. Also $f_0 = 0.5$, $f_1 = 0.3$ and $f_2 = 0.2$. Find the PMF of $S$.
Solution We proceed by finding the values of the 2-fold convolution $f_k^{*2}$:
\[ f_1^{*2} = \Pr(M_1 + M_2 = 1) = \Pr(M_1 = 0, M_2 = 1) + \Pr(M_1 = 1, M_2 = 0) = f_0 f_1 + f_1 f_0 = 2 f_0 f_1 = 2(0.5)(0.3) = 0.3 \]
\[ f_2^{*2} = \Pr(M_1 + M_2 = 2) = \Pr(M_1 = 0, M_2 = 2) + \Pr(M_1 = 2, M_2 = 0) + \Pr(M_1 = 1, M_2 = 1) = f_0 f_2 + f_2 f_0 + f_1 f_1 = 2(0.5)(0.2) + (0.3)^2 = 0.29 \]
\[ f_3^{*2} = \Pr(M_1 + M_2 = 3) = \Pr(M_1 = 1, M_2 = 2) + \Pr(M_1 = 2, M_2 = 1) = 2 f_1 f_2 = 2(0.3)(0.2) = 0.12 \]
\[ f_4^{*2} = \Pr(M_1 + M_2 = 4) = \Pr(M_1 = 2, M_2 = 2) = (0.2)^2 = 0.04 \]
It follows that
\[ g_0 = P_N(f_0) = \sum_{n=0}^{2} (f_0)^n p_n = p_0 + f_0 p_1 + (f_0)^2 p_2 = 0.4 + 0.5(0.4) + 0.25(0.2) = 0.65 \]
\[ g_1 = \sum_{n=1}^{2} f_1^{*n} p_n = f_1^{*1} p_1 + f_1^{*2} p_2 = 0.3(0.4) + 0.3(0.2) = 0.18 \]
\[ g_2 = \sum_{n=1}^{2} f_2^{*n} p_n = f_2^{*1} p_1 + f_2^{*2} p_2 = 0.2(0.4) + 0.29(0.2) = 0.138 \]
\[ g_3 = \sum_{n=2}^{2} f_3^{*n} p_n = f_3^{*2} p_2 = 0.12(0.2) = 0.024 \]
\[ g_4 = \sum_{n=2}^{2} f_4^{*n} p_n = f_4^{*2} p_2 = 0.04(0.2) = 0.008 \]
Note that $\sum_{k=0}^{4} g_k = 1$.
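The calculation in this example can be reproduced with a direct convolution. The sketch below is an illustrative, brute-force implementation for primary and secondary distributions with finite support; the function names `convolve` and `compound_pmf` are assumptions and not notation from the notes.

```python
def convolve(f, g):
    """PMF of the sum of two independent nonnegative integer random variables."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def compound_pmf(primary, secondary):
    """g_k = sum_n p_n * f_k^{*n} for finite-support primary and secondary PMFs."""
    size = (len(primary) - 1) * (len(secondary) - 1) + 1
    result = [0.0] * size
    conv = [1.0]                      # 0-fold convolution: point mass at 0
    for pn in primary:
        for k, prob in enumerate(conv):
            result[k] += pn * prob
        conv = convolve(conv, secondary)   # move from f^{*n} to f^{*(n+1)}
    return result

p = [0.4, 0.4, 0.2]                   # primary PMF p_0, p_1, p_2
f = [0.5, 0.3, 0.2]                   # secondary PMF f_0, f_1, f_2
g = compound_pmf(p, f)
print([round(x, 3) for x in g])       # expected: [0.65, 0.18, 0.138, 0.024, 0.008]
```

This approach costs one full convolution per value of $n$, which is why a recursion is preferable when the primary distribution has large or infinite support.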
In general the $n$-fold convolution $f_k^{*n}$ can be calculated recursively, for $k = 0, 1, 2, \ldots$, via:
\[ f_k^{*n} = \Pr(M_1 + M_2 + \ldots + M_n = k) = \sum_{j=0}^{k} \Pr(M_1 + M_2 + \ldots + M_n = k \mid M_n = j) \Pr(M_n = j) \]
\[ = \sum_{j=0}^{k} \Pr(M_1 + M_2 + \ldots + M_{n-1} = k - j) \Pr(M_n = j) = \sum_{j=0}^{k} f_{k-j}^{*(n-1)} f_j \]
In most cases the evaluation of $f_k^{*n}$ for large values of $n$ is a cumbersome operation. Fortunately, under some restrictions on the primary distribution $N$, a recursion formula known as Panjer's recursion can be used to efficiently obtain the PMF $\{g_k\}_{k \ge 0}$.
7.2.6 Panjer's recursion for the (a, b, 0) class

Suppose that $N$ is an $(a, b, 0)$ member. In order to derive Panjer's recursion two equations are needed; we will call them equation 1 and equation 2. From the $(a, b, 0)$ recursion, for $n = 1, 2, \ldots$
\[ \frac{p_n}{p_{n-1}} = a + \frac{b}{n} \implies n p_n = (na + b)\,p_{n-1} = ((n - 1)a + a + b)\,p_{n-1} \qquad \text{(equation 1)} \]
Recall that $P_S(t) = P_N(P_M(t))$; differentiating with respect to $t$:
\[ P_S'(t) = P_M'(t)\,P_N'(P_M(t)) = P_M'(t) \sum_{n=1}^{\infty} n (P_M(t))^{n-1} p_n \qquad \text{(equation 2)} \]
Now multiply both sides of equation 1 by $P_M'(t)(P_M(t))^{n-1}$:
\[ P_M'(t)(P_M(t))^{n-1}\,n p_n = P_M'(t)(P_M(t))^{n-1}((n - 1)a + a + b)\,p_{n-1} = a P_M'(t)(P_M(t))^{n-1}(n - 1)\,p_{n-1} + (a + b) P_M'(t)(P_M(t))^{n-1} p_{n-1} \]
Summing the above equation over $n$ from 1 to $\infty$ and shifting the index on the right-hand side:
\[ \sum_{n=1}^{\infty} P_M'(t)\,n (P_M(t))^{n-1} p_n = a P_M'(t) \sum_{n=1}^{\infty} (P_M(t))^{n-1}(n - 1)\,p_{n-1} + (a + b) P_M'(t) \sum_{n=1}^{\infty} (P_M(t))^{n-1} p_{n-1} \]
\[ = a P_M(t) \underbrace{P_M'(t) \sum_{n=0}^{\infty} n (P_M(t))^{n-1} p_n}_{P_S'(t)} + (a + b) P_M'(t) \underbrace{\sum_{n=0}^{\infty} (P_M(t))^n p_n}_{P_S(t)} \]
By equation 2 the left-hand side is $P_S'(t)$, so
\[ P_S'(t) = a P_M(t)\,P_S'(t) + (a + b) P_M'(t)\,P_S(t) \]
Expanding both sides as power series in $t$, using $P_S(t) = \sum_k t^k g_k$ and $P_M(t) = \sum_i t^i f_i$:
\[ \sum_{k=0}^{\infty} t^{k-1} (k g_k) = a \left( \sum_{i=0}^{\infty} t^i f_i \right) \left( \sum_{j=0}^{\infty} t^{j-1} (j g_j) \right) + (a + b) \left( \sum_{i=0}^{\infty} t^{i-1} (i f_i) \right) \left( \sum_{j=0}^{\infty} t^j g_j \right) \]
which holds for all $|t| < 1$. As a result the coefficients of $t^{k-1}$ for $k = 1, 2, 3, \ldots$ must be the same on both sides, i.e.
\[ k g_k = a \sum_{i+j=k} j f_i g_j + (a + b) \sum_{i+j=k} i f_i g_j = a \sum_{i+j=k} (i + j) f_i g_j + b \sum_{i+j=k} i f_i g_j \]
\[ = a k \sum_{i=0}^{k} f_i g_{k-i} + b \sum_{i=0}^{k} i f_i g_{k-i} = a k f_0 g_k + \sum_{i=1}^{k} (ak + bi) f_i g_{k-i} \]
so that
\[ k g_k (1 - a f_0) = \sum_{i=1}^{k} (ak + bi) f_i g_{k-i} \implies g_k = \frac{1}{1 - a f_0} \sum_{i=1}^{k} \left( a + \frac{bi}{k} \right) f_i g_{k-i} \]

Theorem 7.2.1 For $N$ a member of the $(a, b, 0)$ class the PMF $\{g_k\}_{k \ge 0}$ can be calculated recursively via:

\[ g_k = \frac{1}{1 - a f_0} \sum_{i=1}^{k} \left( a + \frac{bi}{k} \right) f_i g_{k-i} \qquad \text{for } k = 1, 2, 3, \ldots \tag{7.13} \]

where the starting point of the recursion is $g_0 = P_N(f_0)$.

A compound random variable $S$ with $N$ as a Poisson (Binomial) [Negative Binomial] random variable is called a compound Poisson (Binomial) [Negative Binomial] random variable.

Example Suppose that $S$ is a compound Poisson random variable with Poisson parameter $\lambda > 0$ and the secondary distribution is Bernoulli with mean $q$. Determine the distribution of $S$.
Solution: Recall that a Bernoulli random variable has PMF:
\[ p_X(x) = \begin{cases} q & x = 1 \\ 1 - q & x = 0 \end{cases} \]
with PGF $P_M(t) = 1 - q + qt$.
Approach 1: Use the PGF of $S$:
\[ P_S(t) = P_N(P_M(t)) = P_N(1 - q + qt) = e^{\lambda(1 - q + qt - 1)} = e^{\lambda q(t-1)} \]
where $P_N(t) = e^{\lambda(t-1)}$. Since $P_S(t)$ has the same form as the PGF of a Poisson random variable with parameter $\lambda q$, $S$ is also a Poisson random variable.
Approach 2: Use Panjer's recursion. For $N$ a Poisson random variable with parameter $\lambda$ we know that $a = 0$ and $b = \lambda$. The starting point of Panjer's recursion is

\[ g_0 = P_N(f_0) = e^{\lambda(f_0 - 1)} = e^{\lambda(1 - q - 1)} = e^{-\lambda q} \tag{7.14} \]

Also, using Panjer's recursion for $k = 1, 2, \ldots$ (the Bernoulli random variable is distributed over 0 and 1, so only the $i = 1$ term survives):
\[ g_k = \sum_{i=1}^{k} \frac{\lambda i}{k} f_i g_{k-i} = \sum_{i=1}^{1} \frac{\lambda i}{k} f_i g_{k-i} = \frac{\lambda q}{k}\,g_{k-1} \]
The PMF $\{g_k\}_{k \ge 0}$ satisfies the $(a, b, 0)$ recursion, that is:
\[ g_k = \left( a + \frac{b}{k} \right) g_{k-1} = \frac{\lambda q}{k}\,g_{k-1} \tag{7.15} \]
with $a = 0$ and $b = \lambda q$, so $S$ has to be a Poisson random variable with parameter $\lambda q$.

Example: Suppose that $S_i$ is a compound Poisson random variable with Poisson parameter $\lambda^{(i)}$, $i = 1, 2$. The secondary distribution has PMF $\{f_k^{(i)}\}_{k \ge 0}$ and PGF $P^{(i)}(t) = \sum_{k=0}^{\infty} t^k f_k^{(i)}$. Assuming $S_1$ and $S_2$ are independent, find the distribution of $S = S_1 + S_2$.
Solution: The PGF of $S$ is:
\[ P_S(t) = E[t^S] = E[t^{S_1 + S_2}] = E[t^{S_1} t^{S_2}] \overset{\perp}{=} E[t^{S_1}] E[t^{S_2}] = e^{\lambda^{(1)}(P^{(1)}(t) - 1)}\,e^{\lambda^{(2)}(P^{(2)}(t) - 1)} \]
\[ = e^{\lambda^{(1)} P^{(1)}(t) + \lambda^{(2)} P^{(2)}(t) - (\lambda^{(1)} + \lambda^{(2)})} = e^{(\lambda^{(1)} + \lambda^{(2)})(P(t) - 1)} \]
where
\[ P(t) = \frac{\lambda^{(1)}}{\lambda^{(1)} + \lambda^{(2)}} P^{(1)}(t) + \frac{\lambda^{(2)}}{\lambda^{(1)} + \lambda^{(2)}} P^{(2)}(t) \]
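One concludes that $S$ is a compound Poisson random variable with Poisson parameter $\lambda = \lambda^{(1)} + \lambda^{(2)}$ and secondary distribution with PGF $P(t)$. If we denote by $\{f_k\}_{k \ge 0}$ the PMF corresponding to $P(t)$ then:
\[ f_k = \frac{\lambda^{(1)}}{\lambda^{(1)} + \lambda^{(2)}} f_k^{(1)} + \frac{\lambda^{(2)}}{\lambda^{(1)} + \lambda^{(2)}} f_k^{(2)} \]
which is a mixture of the two original secondary distributions $\{f_k^{(1)}\}_{k \ge 0}$ and $\{f_k^{(2)}\}_{k \ge 0}$. Therefore the PMF of $S$ can be calculated via Panjer's recursion with $a = 0$ and $b = \lambda^{(1)} + \lambda^{(2)}$:
\[ g_k = \frac{1}{1 - a f_0} \sum_{i=1}^{k} \left( a + \frac{bi}{k} \right) f_i g_{k-i} = \frac{1}{1 - 0(f_0)} \sum_{i=1}^{k} \left( 0 + \frac{(\lambda^{(1)} + \lambda^{(2)})\,i}{k} \right) \left( \frac{\lambda^{(1)}}{\lambda^{(1)} + \lambda^{(2)}} f_i^{(1)} + \frac{\lambda^{(2)}}{\lambda^{(1)} + \lambda^{(2)}} f_i^{(2)} \right) g_{k-i} \]
\[ = \sum_{i=1}^{k} \frac{i}{k} \left( \lambda^{(1)} f_i^{(1)} + \lambda^{(2)} f_i^{(2)} \right) g_{k-i} \]
and
\[ g_0 = P_N(f_0) = e^{(\lambda^{(1)} + \lambda^{(2)})(f_0 - 1)} = \exp\left( (\lambda^{(1)} + \lambda^{(2)}) \left( \frac{\lambda^{(1)}}{\lambda^{(1)} + \lambda^{(2)}} f_0^{(1)} + \frac{\lambda^{(2)}}{\lambda^{(1)} + \lambda^{(2)}} f_0^{(2)} - 1 \right) \right) \]
\[ = \exp\left( \lambda^{(1)} f_0^{(1)} + \lambda^{(2)} f_0^{(2)} - \lambda^{(1)} - \lambda^{(2)} \right) = e^{-\lambda^{(1)}(1 - f_0^{(1)})}\,e^{-\lambda^{(2)}(1 - f_0^{(2)})} \]

7.2.7 Panjer's recursion for the (a, b, 1) class

Using a similar method (with some more technical details), when $N$ is a member of the $(a, b, 1)$ class then, for $k = 1, 2, \ldots$:

\[ g_k = \frac{(p_1 - (a + b) p_0) f_k + \sum_{i=1}^{k} \left( a + \frac{bi}{k} \right) f_i g_{k-i}}{1 - a f_0} \tag{7.16} \]

with starting value $g_0 = P_N(f_0)$.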
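The $(a, b, 0)$ recursion of Theorem 7.2.1 translates almost line for line into code. The following is a minimal sketch, not a production implementation: the function name `panjer_ab0` and the parameter values are assumptions for illustration. It is checked against the compound Poisson with Bernoulli secondary, which the example above shows to be Poisson with parameter $\lambda q$.

```python
from math import exp, factorial

def panjer_ab0(a, b, g0, f, kmax):
    """Panjer recursion: g_k = (1/(1 - a f_0)) * sum_{i=1..k} (a + b i/k) f_i g_{k-i}.

    f is the secondary PMF as a list [f_0, f_1, ...] with finite support;
    g0 must equal P_N(f_0) for the chosen primary distribution.
    """
    g = [g0]
    for k in range(1, kmax + 1):
        total = 0.0
        # f_i = 0 beyond the end of the list, so the sum stops at len(f) - 1
        for i in range(1, min(k, len(f) - 1) + 1):
            total += (a + b * i / k) * f[i] * g[k - i]
        g.append(total / (1 - a * f[0]))
    return g

lam, q = 3.0, 0.25                    # hypothetical Poisson rate and Bernoulli mean
f = [1 - q, q]                        # Bernoulli secondary PMF
g0 = exp(lam * (f[0] - 1))            # P_N(f_0) for a Poisson primary
g = panjer_ab0(0.0, lam, g0, f, 10)   # Poisson primary: a = 0, b = lam

poisson_lq = [exp(-lam * q) * (lam * q)**k / factorial(k) for k in range(11)]
print(max(abs(x - y) for x, y in zip(g, poisson_lq)))
```

Each $g_k$ needs at most $k$ multiplications, so computing $g_0, \ldots, g_K$ costs on the order of $K^2$ operations, compared with repeated full convolutions in the direct approach.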
7.3 Effect of policy adjustments

As we have been saying, the frequency random variable $N$ represents the number of losses/payments generated by a portfolio of policies. We will examine 2 key factors that impact the distribution of $N$.

7.3.1 Effect of exposure

Clearly the larger the exposure of the portfolio (in terms of the number of insureds, square feet of a building, miles driven) the larger we expect $N$ to be. Assume the current portfolio consists of $n$ entities, each of which could produce claims. Let $N_j$ be the number of claims produced by the $j$th entity. Then the total number of claims produced by the $n$ entities is $N = \sum_{i=1}^{n} N_i$. Assuming that $\{N_j\}_{j=1}^{n}$ are independent and identically distributed random variables with PGF $P_{N_1}(t)$ then:

\[ P_N(t) = E[t^N] = E[t^{N_1 + N_2 + \ldots + N_n}] = (P_{N_1}(t))^n \tag{7.17} \]

Now suppose that the portfolio is expected to have $n^*$ entities in the following year. Let $N^* = \sum_{i=1}^{n^*} N_i$ be the total number of claims produced by the $n^*$ entities. Then:
\[ P_{N^*}(t) = E[t^{N^*}] = E[t^{N_1 + N_2 + \ldots + N_{n^*}}] = (P_{N_1}(t))^{n^*} = \left( (P_{N_1}(t))^n \right)^{n^*/n} = (P_N(t))^{n^*/n} \]
Given that $n^*$ is expected to be different from $n$, a desirable property of a discrete distribution is given next:
Definition 7.3.1 A discrete distribution with PGF $P_N(t)$ is said to be infinitely divisible if for all $n = 1, 2, 3, \ldots$ the function $[P_N(t)]^{1/n}$ is the PGF of some random variable.

Example Suppose that $N$ is a Poisson random variable with mean $\lambda > 0$. Then its PGF is $P_N(t) = e^{\lambda(t-1)}$. To check whether $N$ is infinitely divisible we check whether $[P_N(t)]^{1/n}$ is the PGF of some other random variable:
\[ [P_N(t)]^{1/n} = e^{(\lambda/n)(t-1)} \]
Clearly $N$ is infinitely divisible because $[P_N(t)]^{1/n}$ is the PGF of a Poisson random variable with parameter $\lambda/n$. Since the Poisson distribution is infinitely divisible, $P_{N^*}(t) = (P_N(t))^{n^*/n} = e^{(\lambda n^*/n)(t-1)}$ corresponds to a Poisson random variable with mean $\lambda n^*/n$.

Example Suppose that $N$ is a negative binomial random variable with mean $r\beta$ and variance $r\beta(1 + \beta)$. Then $P_N(t) = (1 + \beta - \beta t)^{-r}$ and $(P_N(t))^{1/n} = (1 + \beta - \beta t)^{-r/n}$ is the PGF of another negative binomial random variable with mean $r\beta/n$ and variance $r\beta(1 + \beta)/n$ for all $n = 1, 2, \ldots$. So the negative binomial is infinitely divisible.

Example Suppose $N$ is a binomial random variable with mean $mq$ and variance $mq(1 - q)$. Then:
\[ P_N(t) = (1 - q + qt)^m \implies (P_N(t))^{1/n} = (1 - q + qt)^{m/n} \]
In some cases $(P_N(t))^{1/n}$ is not a PGF, and accordingly the binomial is not infinitely divisible. Since $m/n$ may not be an integer, the random variable associated with $[P_N(t)]^{1/n}$ need not be binomial. For example take $n = 2m$; then $(P_N(t))^{1/(2m)} = (1 - q + qt)^{0.5}$. Note that $(P_N(1))^{1/(2m)} = (1 - q + q)^{0.5} = 1$ but:
\[ \left. \frac{\partial^2 (P_N(t))^{1/(2m)}}{\partial t^2} \right|_{t=0} = 2!\,p_2 = \left. -\frac{1}{4} q^2 (1 - q + qt)^{-3/2} \right|_{t=0} = -\frac{1}{4} q^2 (1 - q)^{-3/2} < 0 \]
There exists no random variable with a negative probability at 2, therefore the binomial random variable is not infinitely divisible.

Note on the logarithmic distribution

Definition 7.3.2 A discrete random variable $N$ is said to follow a logarithmic distribution with parameter $\beta > 0$ if it has PMF
\[ \Pr(N = k) = \frac{1}{k} \left( \frac{\beta}{1 + \beta} \right)^k \frac{1}{\ln(1 + \beta)}, \qquad k = 1, 2, \ldots; \quad p_0 = 0 \]
It belongs to the $(a, b, 1)$ class with:
\[ a = \frac{\beta}{1 + \beta}, \qquad b = \frac{-\beta}{1 + \beta} \]
Note that
\[ p_k = \frac{1}{k} \left( \frac{\beta}{1 + \beta} \right)^k \frac{1}{-\ln\left( \frac{1}{1 + \beta} \right)} \]
and therefore $\sum_{k=1}^{\infty} p_k = 1$ is equivalent to:
\[ \sum_{k=1}^{\infty} \frac{1}{k} \left( \frac{\beta}{1 + \beta} \right)^k = -\ln\left( \frac{1}{1 + \beta} \right) = -\ln\left( 1 - \frac{\beta}{1 + \beta} \right) \]
In fact this is the Taylor series expansion of $\ln(1-x)$, valid for $|x| < 1$, evaluated at $x = \beta/(1+\beta)$, i.e.
\[ \ln(1-x) = -\sum_{k=1}^{\infty}\frac{1}{k}\,x^{k} \]
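As a numerical sanity check on the logarithmic distribution, the short Python sketch below verifies that the PMF sums to one, that it satisfies the $(a, b, 1)$ recursion $p_k = (a + b/k)\,p_{k-1}$ for $k \geq 2$, and that the series identity holds. The value $\beta = 2$, the truncation at 500 terms, and the function name `logarithmic_pmf` are illustrative assumptions, not part of the notes.

```python
import math

def logarithmic_pmf(k, beta):
    """Pr(N = k) for the logarithmic distribution, k = 1, 2, ...; zero otherwise."""
    if k < 1:
        return 0.0
    x = beta / (1.0 + beta)
    return x**k / (k * math.log(1.0 + beta))

beta = 2.0                      # illustrative value (assumption)
a = beta / (1.0 + beta)
b = -beta / (1.0 + beta)

# Partial sum of the PMF; the tail decays like (beta/(1+beta))^k,
# so 500 terms are more than enough for this beta.
total = sum(logarithmic_pmf(k, beta) for k in range(1, 501))

# (a, b, 1) recursion: p_k = (a + b/k) * p_{k-1} for k = 2, 3, ...
recursion_ok = all(
    math.isclose(
        logarithmic_pmf(k, beta),
        (a + b / k) * logarithmic_pmf(k - 1, beta),
        rel_tol=1e-12,
    )
    for k in range(2, 50)
)

# Series identity: sum_{k>=1} x^k / k = -ln(1 - x), here with x = beta/(1+beta)
x = a
series = sum(x**k / k for k in range(1, 501))

print(total, recursion_ok, series, -math.log(1.0 - x))
```

Note that $-\ln(1 - \beta/(1+\beta)) = \ln(1+\beta)$, which is exactly the normalising constant in the PMF.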
Example: Suppose that $S$ is a discrete compound random variable with primary random variable $N$ and secondary random variable $M$. Then $P_S(t) = P_N(P_M(t))$ implies
\[ (P_S(t))^{1/n} = (P_N(P_M(t)))^{1/n} = P_{N^*}(P_M(t)) \]
where $P_{N^*}(t) = [P_N(t)]^{1/n}$ is the PGF of a random variable if $N$ is infinitely divisible. Thus, if the distribution of $N$ is infinitely divisible, so is the distribution of $S$. One also concludes that compound Poisson and compound negative binomial distributions are infinitely divisible.
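The infinite-divisibility checks for the negative binomial and binomial distributions can be reproduced numerically. The sketch below expands $(c_0 + c_1 t)^{\alpha}$ with the generalized binomial series and inspects the coefficients of $t^k$: for the negative binomial root they are all nonnegative and sum to one, while for the binomial with $n = 2m$ the coefficient of $t^2$ is negative. The helper names and the parameter values ($q = 0.3$, $m = 3$, $r = 2$, $\beta = 1.5$) are assumptions chosen for illustration.

```python
import math

def gen_binom(alpha, k):
    """Generalized binomial coefficient C(alpha, k) = alpha(alpha-1)...(alpha-k+1)/k!."""
    out = 1.0
    for j in range(k):
        out *= (alpha - j) / (j + 1)
    return out

def power_coeffs(c0, c1, alpha, kmax):
    """Coefficients of t^0..t^kmax in the expansion of (c0 + c1*t)**alpha about t = 0."""
    return [gen_binom(alpha, k) * c1**k * c0**(alpha - k) for k in range(kmax + 1)]

# Binomial: (1 - q + q t)^(m/n) with n = 2m, i.e. exponent 1/2
q, m = 0.3, 3
n = 2 * m
binom_root = power_coeffs(1 - q, q, m / n, 5)

# Negative binomial: (1 + beta - beta t)^(-r/n)
r, beta = 2.0, 1.5
nb_root = power_coeffs(1 + beta, -beta, -r / n, 200)

print("binomial root, coefficient of t^2:", binom_root[2])   # negative
print("negative binomial root, smallest coefficient:", min(nb_root))
print("negative binomial root, sum of coefficients:", sum(nb_root))
```

A single negative coefficient is enough to rule out a PGF, which is why the binomial check only needs the $t^2$ term.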
7.3.2 Effect of Severity
Let $X$ be the ground-up loss and $N_L$ be the number of losses for a particular portfolio of risks. Suppose that the amount paid ($Y_L$ or $Y_P$) is an amount modified from $X$ due to some policy adjustments. There are two common ways to define the loss model for the total amount paid:

1. Keep the frequency distribution unchanged and use the modified severity on a per-loss basis (i.e. $Y_L$):
\[ S = \begin{cases} \sum_{i=1}^{N_L} Y_{L,i} & N_L > 0 \\ 0 & N_L = 0 \end{cases} \tag{7.18} \]
where $Y_{L,i}$ is the amount paid on the $i$th loss.

2. Use the modified severity on a per-payment basis ($Y_P$) in conjunction with a modified frequency distribution $N_P$, which counts the number of losses resulting in a nonzero payment:
\[ S = \begin{cases} \sum_{i=1}^{N_P} Y_{P,i} & N_P > 0 \\ 0 & N_P = 0 \end{cases} \tag{7.19} \]
where $Y_{P,i}$ is the amount paid on the $i$th nonzero payment.

Note that $N_P$ is also a compound random variable:
\[ N_P = \begin{cases} \sum_{i=1}^{N_L} I_i & N_L > 0 \\ 0 & N_L = 0 \end{cases} \tag{7.20} \]
where
\[ I_i = \begin{cases} 1 & \text{if the $i$th loss results in a nonzero payment} \\ 0 & \text{if the $i$th loss results in a zero payment} \end{cases} \]
That is, $I_i$ is a Bernoulli random variable.
Example Let $Y_{L,1} = 10$, $Y_{L,2} = 0$, $Y_{L,3} = 20$, $Y_{L,4} = 0$ and $N_L = 4$. In this case $I_1 = 1$, $I_2 = 0$, $I_3 = 1$ and $I_4 = 0$, so $N_P = 2$. We can find $S$ using the two methods mentioned above:
1. $S = \sum_{i=1}^{N_L} Y_{L,i} = 10 + 0 + 20 + 0 = 30$
2. $S = \sum_{i=1}^{N_P} Y_{P,i} = 10 + 20 = 30$

In general it is assumed that $\Pr(I_i = 0) = \Pr(Y_L = 0) = 1 - \alpha$ (for all $i$), and of course $\Pr(I_i = 1) = \alpha$. Then the PGF of an arbitrary $I_i$ is $P_I(t) = 1 - \alpha + \alpha t$. One concludes that the PGF of $N_P$ is $P_{N_P}(t) = P_{N_L}(P_I(t)) = P_{N_L}(1 - \alpha + \alpha t)$.

Note: $\alpha$ depends on the specifics of the policy adjustments.
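The two ways of computing $S$ in the small example can be mirrored in a few lines of Python. This is only a sketch of the bookkeeping: the function names are hypothetical, and the list holds the per-loss payments $Y_{L,i}$ from the example.

```python
def total_per_loss(per_loss_amounts):
    """Method 1: sum Y_{L,i} over all N_L losses, zeros included."""
    return sum(per_loss_amounts)

def total_per_payment(per_loss_amounts):
    """Method 2: count nonzero payments (N_P) and sum only those (Y_{P,i})."""
    indicators = [1 if y > 0 else 0 for y in per_loss_amounts]  # I_i
    n_p = sum(indicators)                                       # N_P = sum of I_i
    payments = [y for y in per_loss_amounts if y > 0]           # Y_{P,i}
    return n_p, sum(payments)

per_loss_amounts = [10, 0, 20, 0]   # Y_{L,1..4} from the example, N_L = 4

s_per_loss = total_per_loss(per_loss_amounts)
n_payments, s_per_payment = total_per_payment(per_loss_amounts)

print(s_per_loss, n_payments, s_per_payment)
```

Both methods give the same total because the losses dropped in the per-payment view contribute zero to the sum.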
Example Let $N_L$ be Poisson with mean $\lambda$. Find the distribution of $N_P$ given that $\Pr(Y_L = 0) = 1 - \alpha$.
Solution:
\[ P_{N_P}(t) = P_{N_L}(1 - \alpha + \alpha t) = e^{\lambda(1 - \alpha + \alpha t - 1)} = e^{\lambda\alpha(t-1)} \]
$N_P$ is Poisson with mean $\lambda\alpha$.

Example Let $N_L$ be binomial with mean $mq$ and variance $mq(1-q)$. Find the distribution of $N_P$ given that $\Pr(Y_L = 0) = 1 - \alpha$.
Solution:
\[ P_{N_P}(t) = P_{N_L}(1 - \alpha + \alpha t) = [1 - q + q(1 - \alpha + \alpha t)]^m = (1 - q\alpha + q\alpha t)^m \]
$N_P$ is binomial with mean $m\alpha q$ and variance $m\alpha q(1 - \alpha q)$.

Example Suppose that $N_L$ is a zero-modified Poisson random variable with Poisson parameter $\lambda$. Find the distribution of $N_P$ given that $\Pr(Y_L = 0) = 1 - \alpha$.
Solution: Recall that
\[ P_{N_L}(t) = p_0^M + \frac{1 - p_0^M}{1 - e^{-\lambda}}\left(e^{\lambda(t-1)} - e^{-\lambda}\right) \]
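It follows that
\[
\begin{aligned}
P_{N_P}(t) &= P_{N_L}(1 - \alpha + \alpha t) = p_0^M + \frac{1 - p_0^M}{1 - e^{-\lambda}}\left(e^{\lambda(1 - \alpha + \alpha t - 1)} - e^{-\lambda}\right) \\
&= p_0^M + \frac{1 - p_0^M}{1 - e^{-\lambda}}\left(e^{\lambda\alpha(t-1)} - e^{-\alpha\lambda} + e^{-\alpha\lambda} - e^{-\lambda}\right) \\
&= p_0^M + \frac{1 - p_0^M}{1 - e^{-\lambda}}\left(e^{-\alpha\lambda} - e^{-\lambda}\right) + \frac{\frac{1 - p_0^M}{1 - e^{-\lambda}}\left(1 - e^{-\alpha\lambda}\right)}{1 - e^{-\alpha\lambda}}\left(e^{\lambda\alpha(t-1)} - e^{-\alpha\lambda}\right)
\end{aligned}
\]
Define $(p_0^M)^* = p_0^M + \frac{1 - p_0^M}{1 - e^{-\lambda}}\left(e^{-\alpha\lambda} - e^{-\lambda}\right)$; then one can verify that
\[ 1 - (p_0^M)^* = (1 - p_0^M)\,\frac{1 - e^{-\alpha\lambda}}{1 - e^{-\lambda}} \quad\text{and}\quad P_{N_P}(t) = (p_0^M)^* + \frac{1 - (p_0^M)^*}{1 - e^{-\alpha\lambda}}\left(e^{\lambda\alpha(t-1)} - e^{-\alpha\lambda}\right) \]
$N_P$ is zero-modified Poisson with probability mass at 0 equal to $(p_0^M)^*$ and Poisson parameter $\alpha\lambda$.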
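The three results for $N_P$ (Poisson, binomial and zero-modified Poisson) can be checked numerically without PGFs, by thinning the PMF of $N_L$ directly: $\Pr(N_P = k) = \sum_{n \geq k} \Pr(N_L = n)\binom{n}{k}\alpha^k(1-\alpha)^{n-k}$. The sketch below is a minimal check under assumed parameter values ($\lambda = 3$, $\alpha = 0.4$, $m = 10$, $q = 0.25$, $p_0^M = 0.5$) and a truncation of the Poisson support at 80; the function names are hypothetical.

```python
import math

def thin(pmf_L, alpha, kmax):
    """Pr(N_P = k) = sum_n Pr(N_L = n) * C(n, k) * alpha^k * (1 - alpha)^(n - k)."""
    return [
        sum(p * math.comb(n, k) * alpha**k * (1 - alpha) ** (n - k)
            for n, p in enumerate(pmf_L) if n >= k)
        for k in range(kmax + 1)
    ]

def poisson_pmf(lam, nmax):
    """Poisson probabilities for n = 0..nmax (truncated support)."""
    return [math.exp(-lam) * lam**n / math.factorial(n) for n in range(nmax + 1)]

def binomial_pmf(m, q):
    """Binomial(m, q) probabilities for n = 0..m."""
    return [math.comb(m, n) * q**n * (1 - q) ** (m - n) for n in range(m + 1)]

def zm_poisson_pmf(p0m, lam, nmax):
    """Zero-modified Poisson: mass p0m at 0, rescaled Poisson probabilities for n >= 1."""
    base = poisson_pmf(lam, nmax)
    scale = (1 - p0m) / (1 - math.exp(-lam))
    return [p0m] + [scale * p for p in base[1:]]

def max_abs_diff(u, v):
    """Largest absolute difference between two PMFs on their common support."""
    return max(abs(x - y) for x, y in zip(u, v))

lam, alpha, m, q, p0m = 3.0, 0.4, 10, 0.25, 0.5   # illustrative values (assumptions)

# Poisson(lam) thinned by alpha versus Poisson(lam * alpha)
diff_poisson = max_abs_diff(thin(poisson_pmf(lam, 80), alpha, 10),
                            poisson_pmf(lam * alpha, 10))

# Binomial(m, q) thinned by alpha versus Binomial(m, alpha * q)
diff_binomial = max_abs_diff(thin(binomial_pmf(m, q), alpha, m),
                             binomial_pmf(m, alpha * q))

# Zero-modified Poisson thinned by alpha versus ZM Poisson with (p0^M)* and alpha * lam
p0_star = p0m + (1 - p0m) / (1 - math.exp(-lam)) * (math.exp(-alpha * lam) - math.exp(-lam))
diff_zm = max_abs_diff(thin(zm_poisson_pmf(p0m, lam, 80), alpha, 10),
                       zm_poisson_pmf(p0_star, alpha * lam, 10))

print(diff_poisson, diff_binomial, diff_zm)
```

All three differences should be at the level of floating-point rounding, which is consistent with the PGF arguments above.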
Published on Nov 29, 2009
Loss Models study guide for University of Waterloo's Actsci431/831 and Actsc432/832