Presentation by Ruud Koning at VSAE

Computing power

Random forests

Introduction

Methodology

Empirical analysis

Claim liabilities

Computing and insurance

Computing, because we can? (not for distribution)


Ruud H. Koning Dept Economics, Econometrics & Finance University of Groningen r.h.koning@rug.nl

4 March 2019


Computers...

Around 1990: PC mini tower, 486 chip at 66 MHz, 8 MB RAM, 520 MB hard disk, VGA colour monitor (640×480), and both a large and a small floppy disk drive. Cost: ƒ 5,990 (Dutch guilders).


Now: iMac 27-inch, 3.8 GHz i5 chip, 16 GB RAM, 3 TB disk. Improvement relative to around 1990, roughly: RAM a factor 2000, clock speed a factor 60, storage a factor 6000.
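A quick check of those improvement factors, as a small R sketch. The slide does not say whether decimal or binary units were used, so binary units (1 GB = 1024 MB) are assumed here.

```r
# Specifications from the slides; RAM and storage in MB, clock speed in MHz.
# Binary units (1 GB = 1024 MB, 1 TB = 1024 GB) are an assumption here.
pc_1990   <- c(ram = 8, clock = 66, storage = 520)
imac_2019 <- c(ram = 16 * 1024, clock = 3800, storage = 3 * 1024^2)

# Improvement factor per component (element-wise ratio of named vectors)
improvement <- imac_2019 / pc_1990
print(round(improvement, 1))
```

The ratios come out at roughly 2048, 58 and 6049, in line with the rounded factors on the slide.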


Cluster: nodes ("pizza boxes") with 24 cores and 128 GB RAM, or 6 nodes with Nvidia cards, or nodes with 48 cores and up to 2 TB RAM.

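```r
library("gputools")   # GPU routines such as gpuMatMult(); needs an Nvidia card with CUDA
n <- 10000

# Two random n x n matrices (10^8 entries each)
A <- matrix(rnorm(n^2), n)
B <- matrix(rnorm(n^2), n)

print("cpu time: ")
system.time(A %*% B)            # matrix product on the CPU
print("gpu time: ")
system.time(gpuMatMult(A, B))   # the same product on the GPU
```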


Output:

```
[1] "cpu time: "
   user  system elapsed
 47.986   0.167  48.125
[1] "gpu time: "
   user  system elapsed
  1.880   2.049   7.867
```

Moving data around internally also takes time! On the GPU, the elapsed time (7.9 s) is far larger than the user time (1.9 s).


Storage is fairly cheap, but it quickly raises privacy issues. How do we store data so that we can find it again when we need it? In any case, we can now store individual-level data; we no longer have to restrict ourselves to aggregated data (means, frequency tables, etc.). Computing power is phenomenally large. Parallel computing! Simulation over an entire portfolio. We can estimate models that used to be impossible to estimate. Computing has been democratized. Note, however, that being able to compute something does not mean that the outcome means anything.


Loss model


Value at time $t$: $V_t = f(t, Z_t)$, where $Z_t$ are the risk factors. For example, the value of a stock portfolio at any moment depends on the number of shares in the portfolio and on the prices of those shares. $X_t$ is the change in the risk factors: $X_t = Z_t - Z_{t-1}$. The loss over the period $(t, t+1)$ is therefore

$$L_{t+1} = -(V_{t+1} - V_t) = -\bigl(f(t+1, Z_t + X_{t+1}) - f(t, Z_t)\bigr).$$

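A linear approximation is then

$$L_{t+1} \approx -\Bigl(f_t(t, Z_t) + \frac{\partial f}{\partial Z}(t, Z_t)' X_{t+1}\Bigr) = a_t + b_t' X_{t+1},$$

with $a_t = -f_t(t, Z_t)$ and $b_t = -\frac{\partial f}{\partial Z}(t, Z_t)$.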
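How good is the linear approximation? A minimal R sketch for a stock portfolio, assuming (as is common in textbook treatments, but not stated on the slide) that the risk factors are log prices, $Z_{t,i} = \log S_{t,i}$, so that $f(t, Z_t) = \sum_i n_i e^{Z_{t,i}}$ with $n_i$ shares of stock $i$. Then $f$ has no explicit time dependence, $a_t = 0$ and $b_t = -(n_i e^{Z_{t,i}})_i$. The share counts, prices and returns are made up.

```r
# Hypothetical portfolio: number of shares and current prices
n_shares <- c(100, 50, 200)
price_t  <- c(20, 80, 5)
z_t      <- log(price_t)                  # risk factors Z_t = log prices

# Portfolio value f(t, Z) = sum_i n_i * exp(Z_i); no explicit time dependence
value <- function(z) sum(n_shares * exp(z))

# One hypothetical vector of log returns X_{t+1}
x_next <- c(-0.03, 0.01, -0.10)

# Exact loss: L = -(f(t+1, Z_t + X_{t+1}) - f(t, Z_t))
loss_exact <- -(value(z_t + x_next) - value(z_t))

# Linear approximation: a_t = -f_t = 0, b_t = -df/dZ = -n_i * exp(Z_i)
b_t <- -(n_shares * exp(z_t))
loss_linear <- sum(b_t * x_next)

print(c(exact = loss_exact, linear = loss_linear))
```

Here the exact loss is about 114 against 120 for the linear approximation: because $e^x \ge 1 + x$, the linear approximation never understates the loss of a long stock position in this set-up.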


Distribution of the losses $L_{t+1}$

If $X_{t+1} \sim N(\mu, \Sigma)$, then $L_{t+1} \sim N(a_t + b_t'\mu,\; b_t'\Sigma b_t)$. The normal distribution is closed under linear transformations, which is extremely convenient analytically.

If $X_{t+1}$ follows an elliptical distribution, $L_{t+1}$ still follows an elliptical distribution (for example the symmetric generalized hyperbolic distributions, GHD). Elliptical distributions are closed under linear combinations, which is analytically convenient.
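In the normal case the loss distribution, and with it a risk measure such as Value-at-Risk, follows directly from $a_t$, $b_t$, $\mu$ and $\Sigma$. A minimal R sketch with made-up parameter values; the 99% VaR is only one example of what can be computed.

```r
# Hypothetical linearised loss L = a + b' X with X ~ N(mu, Sigma)
a     <- 0
b     <- c(-2000, -4000, -1000)          # sensitivities (minus position values)
mu    <- c(0.0005, 0.0003, 0.0010)       # expected daily log returns
Sigma <- matrix(c(4.0e-4, 1.0e-4, 0.5e-4,
                  1.0e-4, 2.5e-4, 0.8e-4,
                  0.5e-4, 0.8e-4, 9.0e-4), nrow = 3, byrow = TRUE)

# L ~ N(a + b' mu, b' Sigma b)
loss_mean <- a + sum(b * mu)
loss_sd   <- sqrt(drop(t(b) %*% Sigma %*% b))

# 99% Value-at-Risk: the 0.99 quantile of the loss distribution
var_99 <- qnorm(0.99, mean = loss_mean, sd = loss_sd)
print(c(mean = loss_mean, sd = loss_sd, VaR99 = var_99))
```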


[Figure: density of daily returns (horizontal axis roughly −0.10 to 0.10), with fitted normal and t(2.91) densities.]


Monte Carlo simulation: draw $\tilde X^1_{t+1}, \dots, \tilde X^s_{t+1} \sim F_X(\cdot; \theta)$, compute

$$\tilde L^i_{t+1} = -\bigl(f(t+1, Z_t + \tilde X^i_{t+1}) - f(t, Z_t)\bigr), \quad i = 1, \dots, s,$$

and then you can compute numerically anything you want.
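The Monte Carlo recipe in R, as a minimal sketch. The choices here are assumptions for illustration: log-price risk factors for a made-up three-stock portfolio, and for $F_X(\cdot;\theta)$ a multivariate t distribution with 4 degrees of freedom (an elliptical distribution with heavier tails than the normal), simulated as a normal variance mixture.

```r
set.seed(2019)

# Hypothetical portfolio: risk factors Z_t are log prices (an assumption)
n_shares <- c(100, 50, 200)
z_t      <- log(c(20, 80, 5))
v_t      <- sum(n_shares * exp(z_t))        # current value f(t, Z_t)

# F_X(.; theta): multivariate t with nu degrees of freedom, built as
# X = mu + sqrt(nu / W) * A'Y with W ~ chi^2_nu and Y ~ N(0, I).
# Sigma is the dispersion matrix; the covariance is nu / (nu - 2) * Sigma.
nu    <- 4
mu    <- c(0.0005, 0.0003, 0.0010)
Sigma <- matrix(c(4.0e-4, 1.0e-4, 0.5e-4,
                  1.0e-4, 2.5e-4, 0.8e-4,
                  0.5e-4, 0.8e-4, 9.0e-4), nrow = 3, byrow = TRUE)
A <- chol(Sigma)                              # upper triangular, Sigma = A'A

s <- 100000                                   # number of simulated scenarios
Y <- matrix(rnorm(s * 3), ncol = 3)           # independent N(0, 1) draws
W <- rchisq(s, df = nu)
X <- sweep((Y %*% A) * sqrt(nu / W), 2, mu, "+")   # row i is X~^i_{t+1}

# Full revaluation: L~^i = -(f(t+1, Z_t + X~^i) - f(t, Z_t))
v_next <- drop(exp(sweep(X, 2, z_t, "+")) %*% n_shares)
losses <- -(v_next - v_t)

# Anything can now be computed numerically, e.g. the mean loss and 99% VaR
print(c(mean = mean(losses), VaR99 = unname(quantile(losses, 0.99))))
```

Because each scenario is revalued in full, no linearization is needed, and swapping in another $F_X$ only changes the lines that generate `X`.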


Key question

How do we capture the probability distribution of (changes in) the risk factors? With $F_X(\cdot;\theta)$, or $F_X(\cdot;\theta_j)$, or $F_X(\cdot;\theta_i)$?

In other words: one model for everyone, some form of grouping, or a separate model for each individual? The computer will do it for you!


Random forests

```
  pclass survived    sex age sibsp parch    fare embarked
1  Upper       No   male  61     0     0 32.3208        S
2  Lower       No   male  42     0     0  7.6500        S
4  Lower       No female  39     1     5 31.2750        S
7  Upper      Yes female  49     0     0 25.9292        S
8 Middle       No   male  29     0     0 10.5000        S
9  Upper      Yes   male  37     1     1 52.5542        S
```

Who survived the Titanic disaster?


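[Figure: classification tree for survival. Each node shows the predicted class, the number of passengers who did not survive / survived, and the node's share of all passengers.]

```
Root: No, 497 / 339, 100%; split on sex = male
  sex = male: No, 418 / 108, 63%; split on age >= 3.5
    age >= 3.5: No, 413 / 94, 61%
    age < 3.5: Yes, 5 / 14, 2%
  sex = female: Yes, 79 / 231, 37%; split on pclass = Lower
    pclass = Lower: No, 67 / 56, 15%; split on fare >= 23
      fare >= 23: No, 17 / 2, 2%
      fare < 23: Yes, 50 / 54, 12%; split on fare < 16
        fare < 16: No, 44 / 38, 10%; split on age >= 28
          age >= 28: No, 18 / 7, 3%
          age < 28: Yes, 26 / 31, 7%; split on fare >= 13
            fare >= 13: No, 6 / 2, 1%
            fare < 13: Yes, 20 / 29, 6%
        fare >= 16: Yes, 6 / 16, 3%
    pclass = Upper or Middle: Yes, 12 / 175, 22%
```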


One tree (first six rows):

```
  left daughter right daughter split var split point status prediction
1             2              3       sex     1.00000      1       <NA>
2             4              5       age    30.50000      1       <NA>
3             6              7      fare    14.47710      1       <NA>
4             8              9     sibsp     4.50000      1       <NA>
5            10             11     parch     4.50000      1       <NA>
6            12             13       age    14.00000      1       <NA>
```

Another tree (first six rows):

```
  left daughter right daughter split var split point status prediction
1             2              3      fare    48.30210      1       <NA>
2             4              5       sex     1.00000      1       <NA>
3             6              7  embarked     3.00000      1       <NA>
4             8              9    pclass     3.00000      1       <NA>
5            10             11       age     6.50000      1       <NA>
6            12             13       sex     1.00000      1       <NA>
```

(There are 498 more trees, each about 130 rows long.)


The majority wins. Many trees, each fairly unstable, but the averaged prediction is fairly stable. How do we measure model uncertainty? How transparent is the outcome?
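The "many unstable trees, majority vote" idea can be shown with a deliberately simplified sketch in base R. Each "tree" here is a single split (a stump) fitted on a bootstrap sample, with the candidate variables drawn once per tree; a full random forest, such as the one in the R package randomForest, grows deep trees and samples candidate variables at every split. The data are simulated and do not reproduce the Titanic results.

```r
set.seed(1912)

# Hypothetical Titanic-like data (simulated, not the real passenger list)
n    <- 500
male <- rbinom(n, 1, 0.6)
age  <- round(runif(n, 1, 70))
fare <- round(rexp(n, rate = 1 / 30), 2)
survived <- rbinom(n, 1, plogis(1 - 2.5 * male + 0.02 * fare - 0.01 * age))
X <- data.frame(male, age, fare)

# Best single split ("stump") on one variable, judged by misclassifications;
# candidate thresholds are a grid of sample quantiles
fit_stump <- function(x, y) {
  thresholds <- unique(quantile(x, probs = seq(0.05, 0.95, by = 0.05), names = FALSE))
  best <- list(err = Inf)
  for (thr in thresholds) {
    left <- x <= thr
    if (all(left) || !any(left)) next        # need observations on both sides
    pred_left  <- as.integer(mean(y[left])  >= 0.5)
    pred_right <- as.integer(mean(y[!left]) >= 0.5)
    err <- sum(y[left] != pred_left) + sum(y[!left] != pred_right)
    if (err < best$err) {
      best <- list(err = err, cut = thr, left = pred_left, right = pred_right)
    }
  }
  best
}

# One "tree": bootstrap the rows, draw mtry candidate variables at random,
# and keep the stump with the fewest in-sample errors
fit_tree <- function(X, y, mtry = 2) {
  rows   <- sample(nrow(X), replace = TRUE)
  vars   <- sample(names(X), mtry)
  stumps <- lapply(vars, function(v) fit_stump(X[rows, v], y[rows]))
  best   <- which.min(sapply(stumps, `[[`, "err"))
  c(list(var = vars[best]), stumps[[best]])
}

predict_tree <- function(tree, X) {
  ifelse(X[[tree$var]] <= tree$cut, tree$left, tree$right)
}

# The forest: many unstable trees, combined by majority vote
forest <- replicate(200, fit_tree(X, survived), simplify = FALSE)
votes  <- sapply(forest, predict_tree, X = X)      # n x 200 matrix of 0/1 votes
pred   <- as.integer(rowMeans(votes) >= 0.5)

print(table(split_variable = sapply(forest, `[[`, "var")))
print(mean(pred == survived))                      # in-sample accuracy
```

The table of split variables shows how much the individual trees differ from one another; the majority vote smooths this variation out.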

That was the "boxes" approach to classifying people; next, an example of a more "traditional" approach.


Abstract


This study proposes a new method to estimate the liabilities from income insurance claims filed by self-employed workers. The first step involves a multistate mixed proportional hazards approach to model self-employed workers' disability durations, together with changes in their health condition during their incapacity. Combining the multistate model with simulations yields risk-based estimates of liabilities, both for individual claims and portfolios of claims. We apply back-testing to assess the accuracy of the estimated liabilities for real-life portfolios of claims, for which we make use of a unique data set of income insurance claims provided by a large Dutch insurance company.


Introduction

Risk-based estimates of liabilities are the basis for determining evidence-based underwriting criteria and calculating insurance premiums. Moreover, the Solvency II Directive requires insurance companies to base loss reserves on risk-based estimates of liabilities (Sandström, 2011). Two approaches to estimating liabilities:

1. The first approach uses aggregate claim data, which makes it hard to relate person-specific risk factors to individual claim sizes (Zhao and Zhou, 2010).

2. The second approach is based on a statistical model that captures the relation between individual claim sizes on the one hand and person-specific characteristics or other relevant risk factors on the other hand.


The economic literature has emphasized the importance of individual-specific unobserved heterogeneity (alternatively known as frailty) in modeling individual-specific risks such as unemployment and disability (Van den Berg, 2001). Ignoring unobserved heterogeneity may result in biased model coefficients; in survival models it may also lead to overestimation of the degree of negative duration dependence (Lancaster, 1990). The big data available today may help in implementing the micro (individual-claim) approach.
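The point about negative duration dependence can be illustrated with two hypothetical frailty types in R. Every individual has a constant recovery hazard, yet the hazard of the population that is still ill falls with duration, because those with a high hazard leave first. A model that ignores the heterogeneity would read this as negative duration dependence. The hazards and shares below are made up.

```r
# Two hypothetical frailty types, each with a constant (exponential) hazard
lambda <- c(fast = 0.30, slow = 0.05)   # monthly recovery hazards
p      <- c(fast = 0.5,  slow = 0.5)    # population shares at duration 0

# Hazard of the whole population still ill at duration t:
# each type is reweighted by its probability of still being ill
population_hazard <- function(t) {
  still_ill <- p * exp(-lambda * t)
  sum(lambda * still_ill) / sum(still_ill)
}

months <- c(0, 3, 6, 12, 24)
h <- sapply(months, population_hazard)
print(round(setNames(h, months), 3))
```

The population hazard starts at 0.175 and declines towards the hazard of the slow type, although no individual hazard changes over time.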


The goal of this research is to make a connection between the actuarial and economic literature by accounting for individual-specific unobserved heterogeneity in the estimation of insurance liabilities from individual claim data. Our approach is based on the observation from contract theory that there is information asymmetry between the insurer and the policyholder about the risks incurred by the latter (Chiappori and Salanié, 2003). We show that unobserved heterogeneity allows for a form of experience learning that can reduce this asymmetry, which makes it easier for the insurance company to distinguish between high-risk and low-risk claimants. The experience learning entails that we update the distribution of a claimant's unobserved risk factors on the basis of his or her claim history.


Our approach is general and can be applied to several types of insurance, including disability, long-term care, and unemployment insurance. In this study we use disability insurance (also known as income insurance) for self-employed workers to illustrate our approach. Unobserved heterogeneity has been shown to explain a substantial part of the differences in the return-to-work process between self-employed workers (see Spierdijk et al., 2009). Examples of unobserved risk factors are risk aversion, motivation to recover, willingness to take prescribed medication, and individual workplace heterogeneity.


In our method of calculating liabilities for income insurance, the unobserved risk factors account for the phenomenon of inertia: claimants who have been sick for several months have a higher risk of being trapped in long-term disability than similar individuals who have just started a disability spell. The former group of claimants is more likely to have unfavorable unobserved risk factors that account for long and severe disability durations. Ceteris paribus, the longer an existing claimant's outstanding claim duration, the more likely he or she is to have unfavorable unobserved risk factors. We apply Bayes' rule to extract the information contained in claimants' ongoing disability duration: existing claimants' probability of having unfavorable unobserved risk factors is updated on the basis of their ongoing disability duration, which is a form of experience learning.


By allowing the ongoing disability duration to affect a claimant's future disability, our disability model no longer satisfies the (semi-)Markov property. Consequently, analytically deriving the expected liabilities from income insurance by means of the Chapman-Kolmogorov equations is not straightforward (Haberman and Pitacco, 1999). We therefore adopt a simulation-based approach to calculating the complete distribution of the liabilities. When applied to income insurance claims of Dutch self-employed workers, our new method for estimating outstanding claim liabilities yields more accurate best estimates than the traditional method, in which unobserved risk factors and experience learning play no role.


Two main contributions:

1. We propose a solution to the actuarial problem of calculating a best estimate of outstanding claim liabilities. We do so by connecting the economic literature on individual-specific unobserved heterogeneity to the actuarial literature on risk-based estimates of insurance liabilities.

2. We implement our method in the context of disability insurance for the self-employed and demonstrate the empirical relevance of allowing for unobserved heterogeneity.


Main conclusions

Main conclusions of the study:

1. Both observed and unobserved risk factors are important in modelling transitions.

2. Individual-specific unobserved heterogeneity allows for experience learning that reduces information asymmetry. This yields better best estimates.

3. Information on education, lifestyle, and health status is not included in this study, but may help to reduce the role of unobserved risk factors.


General outline

The distribution of the claim liabilities (and thereby the best estimate of these liabilities) depends on, among other things, the distribution of the individual-specific unobserved heterogeneity. The frailty distribution is usually subject to certain parametric assumptions and is estimated from a given data set of observed (but possibly right-censored) insurance claims available at time $t_0$; this estimate is called the prior distribution. Without experience learning, the best estimate of the insurance liabilities at time $t > t_0$ is calculated using the prior distribution. However, at time $t$, additional information is available about outstanding claims.


For example, for outstanding disability and unemployment claims, the ongoing claim duration is available. Individuals with long-lasting claims are more likely to have unfavorable frailty that results in long and severe claims. The availability of such additional information allows us to calculate the posterior frailty distribution by means of Bayes' rule. Because the posterior distribution is based on the information contained in outstanding claims, it is more specific to longer-lasting claims than the prior distribution. Consequently, the posterior distribution is more informative about high-risk and low-risk claimants than the prior frailty distribution. We therefore expect the posterior frailty distribution to yield more accurate best estimates of outstanding claim liabilities.


Case study: disability claims

Uncertainty faced by the insurer:

1. Claim incidence varies. Some policyholders fall ill and request financial compensation for their loss of income, whereas others stay healthy and never file a claim.

2. Self-employed customers who file a valid claim vary in two respects, unknown in advance, that affect the benefit payment due: the duration of their incapacity and the benefit income to be paid.

This paper focuses on the second type of uncertainty.


Because the disability classification can take any whole-percentage value between 0 percent and 100 percent, there are 101 "disability states". This naturally leads to the concept of a multistate disability model (Haberman and Pitacco, 1999). Some aggregation is needed to reduce the number of states. Consider a multistate model with the following three states: healthy (state 0; no benefit payment), less severe illness (state 1; low benefit payment), and severe illness (state 2; high benefit payment). The policyholder moves between these states over time as his or her disability status changes. The resulting three-state model has six transitions: 0 → 1, 0 → 2, 1 → 0, 1 → 2, 2 → 0, and 2 → 1. The associated transition intensities (i.e., the transition-specific hazard rates) are denoted $\lambda_{ij}(t \mid x, v_{ij})$ for $i \neq j$, given a $K$-dimensional vector $x$ of observed individual-specific risk factors and unobserved individual-specific and transition-specific heterogeneity $v_{ij}$.


Disability path $S_\tau = (S(0), S(1), \dots, S(\tau))$, where $\tau$ is the planning horizon and $S(s) \in \{0, 1, 2\}$ is the state in month $s$. For individual $k$, the discounted premiums are

$$P^k_\tau(t) = \sum_{s=t}^{\tau-1} v^{s-t} p_k,$$

and the discounted benefit payments are

$$B^k_\tau(t) = \sum_{s=t+1}^{\tau} v^{s-t} b_k(s, S(s-1)),$$

where $v$ is the discount factor, $p_k$ the premium, and $b_k(s, S(s-1))$ the replacement income paid at time $s$ to claimant $k$, who was in state $S(s-1)$ in period $s-1$. The model focuses on the dynamics of $S(t)$.
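The two present values can be computed directly for a given path. A minimal R sketch; the discount rate, premium and benefit levels are hypothetical, and the benefit is assumed to depend only on the previous month's state.

```r
# Hypothetical inputs (illustration only)
v   <- 1 / 1.03^(1 / 12)               # monthly discount factor for 3% a year
p_k <- 150                              # monthly premium of claimant k
benefit_by_state <- c(0, 1000, 2000)    # monthly benefit in states 0, 1, 2

# b_k(s, S(s-1)): benefit paid at time s, determined by the state in s - 1
b_k <- function(s, state_prev) benefit_by_state[state_prev + 1]

# P_tau^k(t) = sum_{s=t}^{tau-1} v^(s-t) p_k   (zero when t >= tau)
pv_premiums <- function(t, tau) {
  if (t >= tau) return(0)
  s <- t:(tau - 1)
  sum(v^(s - t) * p_k)
}

# B_tau^k(t) = sum_{s=t+1}^{tau} v^(s-t) b_k(s, S(s-1)),
# where path[s + 1] holds S(s), so S(s - 1) is path[s]
pv_benefits <- function(t, tau, path) {
  if (t >= tau) return(0)
  s <- (t + 1):tau
  sum(v^(s - t) * b_k(s, path[s]))
}

# Example path S(0), ..., S(12): severe illness, partial recovery, recovery
path <- c(2, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
P0 <- pv_premiums(0, 12)
B0 <- pv_benefits(0, 12, path)
print(c(premiums = P0, benefits = B0, net = B0 - P0))
```

Indexing is the only subtle point: R vectors start at 1, so the state $S(s)$ is stored in `path[s + 1]`.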


Path of payments: $B^k_\tau = (B^k_\tau(0), B^k_\tau(1), \dots, B^k_\tau(\tau))$; the premium path $P^k_\tau$ is defined similarly. Quantity of interest: $D^k_\tau = B^k_\tau - P^k_\tau = (D^k_\tau(0), D^k_\tau(1), \dots, D^k_\tau(\tau))$, the path of discounted net benefit payments (DNBP). We want to forecast $D^k_\tau(t)$. We use the expected value of $D^k_\tau(t)$ as the forecast; this expected value is our best estimate of the discounted liabilities at time $t$. In calculating the best estimate, we want to take into account the information contained in the self-employed person's disability history. How can we incorporate individual heterogeneity?


Consider a self-employed person at time $t$ who has been sick for $m$ months. Using Bayes' theorem, we can obtain the posterior frailty distribution given the information about the ongoing disability duration. In the distribution of the unobserved heterogeneity $v$, certain values correspond to relatively fast recovery, whereas other values are associated with slower recovery. A self-employed person who has been disabled for a long period will have a relatively high posterior probability on the values associated with slow recovery, and vice versa. We therefore use the posterior frailty distribution to calculate the expected DNBP.


Haberman and Pitacco (1999) describe an analytical method for calculating expected liabilities in multistate models for disability. The main ingredients of their approach are the transition probabilities. The relation between the transition probabilities and the intensities is captured by the Kolmogorov forward and backward differential equations. For a (semi-)Markov model the transition probabilities can be solved analytically from these differential equations (Haberman and Pitacco, 1999). The experience learning we propose allows the ongoing transition duration to affect a claimant's future disability path, so our disability model no longer satisfies the (semi-)Markov property. We therefore use a simulation-based approach to estimating liabilities.
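A minimal R sketch of the simulation idea, under heavy simplifying assumptions: discrete monthly time, two hypothetical frailty types ("fast" and "slow" recovery) with their own made-up monthly transition matrices, a claimant currently in state 2, a 12-month horizon, and prior and posterior type weights that are simply assumed. It illustrates how the expected DNBP follows from averaging simulated paths; it is not the authors' model, which works with MPH transition intensities.

```r
set.seed(42)

# Hypothetical monthly transition matrices (rows: from state 0, 1, 2)
P_fast <- matrix(c(0.95, 0.03, 0.02,
                   0.40, 0.50, 0.10,
                   0.20, 0.20, 0.60), nrow = 3, byrow = TRUE)
P_slow <- matrix(c(0.95, 0.03, 0.02,
                   0.10, 0.75, 0.15,
                   0.03, 0.07, 0.90), nrow = 3, byrow = TRUE)

v   <- 1 / 1.03^(1 / 12)            # monthly discount factor (assumption)
p_k <- 150                          # monthly premium (hypothetical)
benefit_by_state <- c(0, 1000, 2000)

# Simulate S(t), ..., S(tau) starting in state s_t; path[1] holds S(t)
simulate_path <- function(s_t, t, tau, P) {
  path <- numeric(tau - t + 1)
  path[1] <- s_t
  for (i in seq_len(tau - t)) {
    path[i + 1] <- sample(0:2, 1, prob = P[path[i] + 1, ])
  }
  path
}

# D(t) = B(t) - P(t) for one path (requires t < tau); S(s - 1) is path[s - t]
dnbp <- function(path, t, tau) {
  s <- (t + 1):tau
  benefits <- sum(v^(s - t) * benefit_by_state[path[s - t] + 1])
  premiums <- sum(v^((t:(tau - 1)) - t) * p_k)
  benefits - premiums
}

# Best estimate: mean of D(t) over simulated paths, with the frailty type
# drawn from `weights` (prior or posterior type probabilities)
best_estimate <- function(weights, s_t = 2, t = 0, tau = 12, n_sim = 5000) {
  d <- replicate(n_sim, {
    P <- if (runif(1) < weights[["fast"]]) P_fast else P_slow
    dnbp(simulate_path(s_t, t, tau, P), t, tau)
  })
  mean(d)
}

prior     <- c(fast = 0.7, slow = 0.3)   # hypothetical prior type weights
posterior <- c(fast = 0.4, slow = 0.6)   # hypothetical weights after a long spell
be <- c(prior = best_estimate(prior), posterior = best_estimate(posterior))
print(be)
```

Shifting weight towards the slow-recovery type raises the best estimate, which is the effect of experience learning on the liabilities.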


Statistical model

Model for the transition rates from $i$ to $j$ (mixed proportional hazards, MPH):

$$\lambda_{ij}(t \mid x, v_{ij}) = \lambda^0_{ij}(t) \exp(x'\beta_{ij} + v_{ij}).$$

Model for the unobserved heterogeneity:

$$v_{ij,mn} = a_{ij} + b_{ij} w_{1,m} + c_{ij} w_{2,n}.$$

$w_1$ and $w_2$ are discrete random variables (a two-factor model). More factors could be added, but additional factors turn out to be insignificant.
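A minimal R sketch of the MPH intensity with a two-factor discrete frailty, for one transition $i \to j$. The baseline hazard's functional form, all parameter values, and the support points $w_1, w_2 \in \{0, 1\}$ are hypothetical; the slide does not specify them.

```r
# Hypothetical parameter values, for illustration only
lambda0 <- function(t) 0.2 * exp(-0.05 * t)   # baseline hazard lambda0_ij(t), assumed form
beta    <- c(age10 = -0.2, male = 0.3)        # coefficients beta_ij
x       <- c(age10 = 3, male = 1)             # covariates: age / 10 and sex

# Two-factor discrete frailty v_mn = a_ij + b_ij * w1[m] + c_ij * w2[n]
a_ij <- 0; b_ij <- -0.8; c_ij <- 0.5
w1 <- c(0, 1)                                 # support points of w1 (assumed)
w2 <- c(0, 1)                                 # support points of w2 (assumed)
v_mn <- a_ij + outer(b_ij * w1, c_ij * w2, "+")   # 2 x 2: rows m, columns n

# MPH intensity lambda_ij(t | x, v) = lambda0_ij(t) * exp(x' beta_ij + v)
intensity <- function(t) lambda0(t) * exp(sum(x * beta) + v_mn)

print(round(intensity(6), 4))   # one intensity per frailty type (m, n)
```

Each frailty type $(m, n)$ shifts the log intensity by $v_{ij,mn}$ while sharing the baseline and covariate effects, so the intensities of different types are proportional at every $t$.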


Data


We use a unique flow sample of approximately 20,000 disability claims by Dutch self-employed people between August 2004 and July 2010. Each claim requires a medical certificate. After filtering out inconsistent claims, the final sample consists of 19,285 claims by 14,065 different policyholders, or an average of 1.4 claims per person. For each claim, we know the total disability duration, measured in months. The disability classification during the person's incapacity is time-varying, with monthly updates.


Sex, age at disability, socioeconomic status, insured income.


Occupational class, labor intensity of profession.

Disorder (psychological, locomotive, cardiovascular, other).

Region.

Year in which illness started (cohort).

Four transitions, no data on intensity. Three states, W/U (work/unpaid sick leave), P50 (26%-50% disability), and P100 (51%-100% disability).
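Mapping an observed disability percentage to the three states, as a small R sketch. The slide gives the P50 and P100 bands; treating 0%-25% as W/U is our reading, not stated explicitly.

```r
# Map a disability percentage (0-100) to the states used in the data:
# P100 for 51%-100%, P50 for 26%-50%, and W/U for 0%-25% (assumed)
disability_state <- function(pct) {
  cut(pct, breaks = c(-Inf, 25, 50, 100),
      labels = c("W/U", "P50", "P100"), right = TRUE)
}

print(disability_state(c(0, 25, 26, 50, 51, 80, 100)))
```

With `right = TRUE` each interval includes its upper bound, so 25% maps to W/U and 50% to P50.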


Transitions

30,736 transitions in the data set.


Transitions: P50 to W/U (recovery)

The recovery rate decreases substantially with age: the full recovery rate of a 50-year-old person is 67% of that of a comparable 30-year-old. The recovery rate of a self-employed person with a cardiovascular or psychological disorder is significantly lower (by 36% and 44%, respectively) than that of a similar individual suffering from a different disease. Self-employed workers in the South of the Netherlands have an 8% higher recovery rate than those living elsewhere in the Netherlands. However, gender, insured income, and occupational class do not significantly affect the full recovery rate.
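In a proportional hazards model these percentages are hazard ratios. As a worked example, and assuming (our assumption) that age enters the linear index $x'\beta$ linearly, the 67% figure translates into a per-year coefficient as follows.

```r
# A covariate change dx multiplies the hazard by exp(dx * beta).
# Assumption: age enters linearly, so 20 extra years give a ratio of 0.67.
hr_20_years <- 0.67
beta_age    <- log(hr_20_years) / 20      # log-hazard change per year of age
hr_per_year <- exp(beta_age)              # hazard ratio per extra year

# With a constant hazard, expected duration is 1 / hazard, so the expected
# time to full recovery scales by 1 / 0.67 (exact only for a constant hazard)
duration_ratio <- 1 / hr_20_years
print(round(c(beta_age = beta_age, hr_per_year = hr_per_year,
              duration_ratio = duration_ratio), 4))
```

So each extra year of age lowers the full recovery rate by about 2%, and under a constant hazard the 50-year-old's expected time to full recovery is about 1.49 times that of the 30-year-old; with a time-varying baseline hazard this last step does not hold exactly in general.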


Transitions: P50 to P100 (fall-back)


If claimants suffer from a cardiovascular, psychological, or locomotive disease, they have a relatively low fall-back rate (23%, 16%, and 17% lower, respectively, than for other disorders). Hence, a self-employed person with such a disorder who falls back to the worst health state does so only after spending a relatively long period in P50. The occupational class also affects this fall-back rate: people working in the industrial sector fall back relatively quickly, at a rate 14% higher than that of other professions. The fall-back rate of self-employed southerners is 16% higher than that of claimants living elsewhere in the Netherlands. Gender, age, and insured income do not significantly affect the fall-back rate.


Transitions: P100 to W/U (recovery from severe illness)

Men recover significantly faster than women; the difference in hazard rates equals 40%. Furthermore, the recovery rate decreases with age: the full recovery rate of a 50-year-old person is 64% of that of a 30-year-old counterpart. People suffering from cardiovascular, psychological, or locomotive diseases recover fully relatively slowly compared with people with other disorders. Specifically, the recovery rate is 68% lower for a cardiovascular disorder, 80% lower for a psychological disease, and 27% lower for a locomotive problem. That is, full recovery from a severe psychological or cardiovascular disorder is a particularly slow process. Self-employed retailers recover relatively slowly compared with other sectors (26% lower). The recovery rate of claimants living in the South is 9% higher than that of self-employed people living in other parts of the country.


Transitions: P100 to P50 (partial recovery)


Men show significantly faster partial recovery than women; their partial recovery rate is 15% higher. The partial recovery rate of a self-employed person aged 50 is 8% lower than that of a similar claimant aged 30. Furthermore, the partial recovery rate of a claimant with an insured income of 50,000 euro is 9% higher than that of a comparable self-employed person with an insured income of only 25,000 euro. Claimants suffering from cardiovascular or psychological diseases show relatively slow partial recovery compared with other disorders: 26% and 49% lower, respectively. In contrast, people with a locomotive disorder show relatively fast partial recovery (23% higher). Furthermore, self-employed agricultural workers show relatively fast partial recovery compared with other professions (41% higher), and again there are strong regional effects: relative to the rest of the Netherlands, the partial recovery rate of claimants living in the South is 16% higher.


Benchmark individual

Our benchmark is a self-employed 30-year-old man with an initial disability classification of 51%-100% (P100), whose incapacity started in 2005, whose insured income equals 25,000 euro per year, who works in the agricultural sector, does not live in the South of the Netherlands, and suffers from a locomotive disorder. The disability starts at $t = 0$. Because the time horizon is 1 year, the expected DNBP equals 0 at $t = 12$, and we report the expected DNBP up to $t = 11$.


Posterior distribution of frailty

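Posterior distribution of frailty, given that the claim is still ongoing after $l$ months ($M(t) > l$):

$$\Pr(w_1 = m, w_2 = n \mid M(t) > l) = \frac{\Pr(M(t) > l \mid w_1 = m, w_2 = n)\, p_{1,m}\, p_{2,n}}{\sum_{m', n' \in \{0,1\}} \Pr(M(t) > l \mid w_1 = m', w_2 = n')\, p_{1,m'}\, p_{2,n'}},$$

where $p_{1,m} = \Pr(w_1 = m)$ and $p_{2,n} = \Pr(w_2 = n)$ are the prior probabilities.

This is statistical learning: the longer someone has been ill, the more unfavourable his or her posterior frailty.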
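The posterior formula in R, as a minimal sketch. The slide leaves $\Pr(M(t) > l \mid w_1 = m, w_2 = n)$ unspecified; here it is assumed, for illustration only, that each frailty type has a constant monthly recovery hazard $\lambda_{mn}$, so that this probability equals $e^{-\lambda_{mn} l}$. The hazards and prior probabilities are hypothetical.

```r
p1 <- c(0.7, 0.3)                           # prior Pr(w1 = 0), Pr(w1 = 1)
p2 <- c(0.6, 0.4)                           # prior Pr(w2 = 0), Pr(w2 = 1)
hazard <- matrix(c(0.30, 0.15,
                   0.10, 0.04), nrow = 2, byrow = TRUE)   # rows m, columns n

posterior_frailty <- function(l) {
  prior <- outer(p1, p2)                    # p1[m] * p2[n]
  lik   <- exp(-hazard * l)                 # Pr(M > l | w1 = m, w2 = n), assumed form
  joint <- lik * prior
  joint / sum(joint)                        # Bayes' rule: normalise over (m, n)
}

for (l in c(0, 3, 12)) {
  cat("l =", l, "\n")
  print(round(posterior_frailty(l), 3))
}
```

At $l = 0$ the posterior equals the prior; as $l$ grows, the weight shifts towards the type with the lowest recovery hazard (row 2, column 2).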


Insurance is sharing the good fortune of a large group with the bad luck of a few.

[Figure: absolute performance 2015, by municipality; colour scale: observed − expected costs, from −300 to 600.]


Questions


How transparent should outcomes be?

How robust are the results?

How do you organize the governance of algorithms? Are there "basic" insurance products?

Which processes lend themselves to "black box" approaches?