6.1.5 Statistical Analysis of the Obtained Scores

Total Scores
Overall, practical exams 4 (Syst) and 2 (Plant) turned out to be the easiest, with median scores of 64% and 61% of the maximum attainable score, respectively. By comparison, the median scores of practicals 3 (Etho) and 1 (Cell) were only 52% and 43%, respectively. Unlike high school or university exams, the exams at an IBO are not designed to test a particular set of skills, but to produce a reliable final ranking. The distribution of the obtained scores should therefore be broad and ideally uniform. As shown in Figure 6.2, the distributions of the scores obtained in the four practical exams of the IBO 2013 are all relatively broad, but nonetheless clearly non-uniform. This is well in line with results from previous IBOs and highlights the difficulty of designing even more discriminating exams.
Figure 6.2 Distribution of scores: The cumulative (left) and actual distribution of points obtained by the participants are shown for each practical exam. The points were standardized such that 1 corresponds to the maximum number of points that could be obtained in an exam. Dashed lines indicate the median scores.
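As an illustration of how a cumulative distribution of standardized scores can be compared with the uniform ideal, the following minimal sketch computes an empirical cumulative distribution and its largest deviation from a uniform distribution on [0, 1]. It is not the analysis used for this report: the function names and the example scores are hypothetical, and the choice of the maximum deviation (a Kolmogorov-Smirnov-type distance) as a measure of non-uniformity is an assumption made here for illustration.

```python
from bisect import bisect_right


def ecdf(scores):
    """Return a function F(x) giving the fraction of scores <= x."""
    ordered = sorted(scores)
    n = len(ordered)

    def F(x):
        return bisect_right(ordered, x) / n

    return F


def max_deviation_from_uniform(scores):
    """Largest gap between the empirical CDF and the Uniform(0, 1) CDF.

    Assumes the scores are already standardized to the range [0, 1].
    """
    ordered = sorted(scores)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so both sides of the jump are compared with the uniform CDF (= x).
        gap = max(gap, abs((i + 1) / n - x), abs(x - i / n))
    return gap


# Hypothetical standardized scores, NOT data from the IBO 2013.
example_scores = [0.25, 0.5, 0.75]
F = ecdf(example_scores)
print(F(0.5), max_deviation_from_uniform(example_scores))
```

A value close to 0 would indicate scores spread almost evenly over the attainable range, whereas larger values indicate a concentration of scores in part of the range.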
Among the exams, practical 3 (Etho) had the smallest variance (2.4% after standardizing the scores by the maximum number of attainable points), and practicals 1 (Cell) and 4 (Syst) the largest (4.6% and 4.0%, respectively). The larger variance observed for practical 1 (Cell) is partly explained by the bimodal distribution of the obtained scores, which is itself due to a large number of points that could only be obtained by conducting one particular experiment (pull-down of cells expressing procycline) very carefully. This practical is thus expected to have suffered from increased stochasticity, which probably reduced its power to discriminate accurately among students. These results suggest that the design of this practical could have been improved by balancing the attribution of points such that a single step would have less impact on the total score.
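The standardization and summary statistics described above can be sketched as follows. This is a minimal illustration, not the code used for this report: the function names, the raw points, and the maximum of 100 points are hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not state which was used.

```python
from statistics import median, pvariance


def standardize(raw_points, max_points):
    """Divide raw points by the maximum attainable points (1 = full marks)."""
    return [p / max_points for p in raw_points]


def summarize(raw_points, max_points):
    """Median and variance of the standardized scores of one practical."""
    scores = standardize(raw_points, max_points)
    return {
        "median": median(scores),
        # Assumption: population variance; swap in statistics.variance
        # for the sample variance if that is what is required.
        "variance": pvariance(scores),
    }


# Hypothetical raw points for one practical, NOT data from the IBO 2013.
example_raw = [20, 45, 50, 80]
print(summarize(example_raw, max_points=100))
```

Because every score is divided by the maximum attainable points before the variance is computed, the variances of practicals with different maximum scores become directly comparable.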
Since each practical exam is an independent but noisy assessment of the skill of students, it is of interest to estimate the variance in the data that is informative about the performance of students (in contrast to being purely stochastic). A first indication can be obtained by look