distributed (visual inspection showed this to be a reasonable assumption). These fitted models can then be used to estimate the uncertainty in IBO rankings. To do so, we simulated practical exams, drawing the model for each simulated exam at random from the four fitted models. In addition, we assumed that the true abilities of students were distributed as estimated when combining all four practical exams and the theoretical exam using the t-score method. For each simulation, we used the t-score method to obtain a final score across all simulated practical exams and calculated two statistics summarizing the quality of the obtained scores: 1) the correlation between the obtained scores and the true abilities assumed in the simulation, and 2) the average absolute difference between the rank based on the simulated scores and the true rank given by the true abilities. The results of these simulations are shown in FIGURE 6.5 for different numbers of practical exams. Based on this model, we estimate the correlation between the true abilities of students and the final practical score to be relatively high when four practical exams are used (median ρ across 1000 simulations was 0.91). Even so, the resulting ranking carries a high degree of uncertainty: each student was on average 24 ranks away from their true rank. While reducing the number of practical exams is expected to worsen the quality of the final ranking considerably, increasing the number beyond four has a more moderate effect. In contrast, a considerable improvement of the ranking could be obtained by designing the practical exams in such a way that the relative abilities of the students appear more uniformly distributed.
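The simulation procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the report: the fitted models are not given here, so each exam is represented by a hypothetical (intercept, slope, noise SD) triple with normally distributed noise, the t-score is taken to be the common convention of rescaling each exam to mean 50 and standard deviation 10, the final score is taken to be the mean of the t-scores, ρ is computed as a Pearson correlation, and the cohort size and ability distribution are made up.

```python
import numpy as np

def t_scores(raw):
    """Rescale each exam (column) to mean 50 and SD 10 (assumed t-score convention)."""
    z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
    return 50.0 + 10.0 * z

def ranks(values):
    """Rank 1 is the highest value; ties are broken by position."""
    order = np.argsort(-values, kind="stable")
    r = np.empty(len(values), dtype=int)
    r[order] = np.arange(1, len(values) + 1)
    return r

def simulate_once(abilities, models, n_exams, rng):
    """Simulate n_exams practical exams; return (correlation, mean |rank difference|)."""
    raw = np.empty((len(abilities), n_exams))
    for j in range(n_exams):
        # Pick one of the exam models at random (hypothetical parameter triples).
        intercept, slope, noise_sd = models[rng.integers(len(models))]
        noise = rng.normal(0.0, noise_sd, len(abilities))
        raw[:, j] = intercept + slope * abilities + noise
    final = t_scores(raw).mean(axis=1)          # combine exams via t-scores
    rho = np.corrcoef(final, abilities)[0, 1]   # Pearson correlation (assumption)
    rank_diff = np.abs(ranks(final) - ranks(abilities)).mean()
    return rho, rank_diff

rng = np.random.default_rng(0)
abilities = rng.normal(0.0, 1.0, 200)  # hypothetical cohort, not the IBO 2013 data
models = [(50.0, 8.0, 6.0), (40.0, 5.0, 7.0), (60.0, 10.0, 9.0), (55.0, 6.0, 5.0)]
for n_exams in (1, 4, 8, 16):
    results = np.array([simulate_once(abilities, models, n_exams, rng)
                        for _ in range(200)])
    print(n_exams, np.median(results[:, 0]), np.median(results[:, 1]))
```

Because all parameters here are invented, the printed medians will not match the values reported for the IBO 2013 data; the sketch only shows how the two quality statistics are obtained from one simulated set of exams.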

[Figure: panels plotted against the number of practical exams (x-axis "# Practical Exams", 1 to 16). Y-axes: "Correlation with true skills" (0.5 to 1.0) and "|true − obtained rank|" (0 to 50, shown twice). Legend: Median, 50% Quantile, 80% Quantile, 98% Quantile.]

FIGURE 6.5  Quality of Ranking:

We simulated practical tests with varying numbers of individual practical exams, where the scores of each exam were generated using general linear models fitted to the practical exams of the IBO 2013. The simulations assumed the abilities of students to be distributed as identified by the total score at the IBO 2013. For each simulation, we calculated the correlation between the abilities of students and the obtained final score in the practical test (left), as well as the average difference between the rank resulting from the total score and the rank based on the ability of the students (middle). Shown are the median as well as different quantiles of these metrics obtained across 1000 simulations. We then repeated the simulations assuming that the abilities of students were distributed uniformly within the same range (right). Dashed lines highlight the use of four practical exams, as is currently done.

IBO 2013 Final Report