Tekster dissertation

Page 242

Chapter 10

students’ writing performance than only one written text is necessary to draw inferences about students’ underlying writing proficiency, and we developed a benchmark rating procedure that led to more reliable and valid ratings of text quality than are currently achieved. Increasing our understanding of writing assessment is essential for improving writing education for three reasons. First, teachers need to know whether their students perform at a sufficient level. Second, teachers can adapt their lessons to better suit the specific needs of students, either through general classroom instruction or through individual feedback. Third, reliable and valid assessment of writing performance is necessary to monitor students’ writing progress over time.

As in every study that includes writing assessments, we had to develop a rating procedure in which multiple raters evaluate the quality of a large number of students’ texts without exceeding the raters’ physical or psychological limits. The two large-scale intervention studies presented in this dissertation were especially challenging: they included a total of 2766 students, each of whom wrote nine texts varying in topic and genre. This resulted in almost 25,000 texts, the quality of which had to be rated by at least three raters. Clearly, it was not possible to assign all texts to a single jury of two or three raters. Even with a benchmark rating procedure, in which raters need only about one minute to score the quality of each text, this would take too much time, which could affect the reliability of their scores. Therefore, we used overlapping rater teams (Van den Bergh & Eiting, 1989), in which each rater received three subsamples. In a design of overlapping rater teams, the writing products are randomly split into as many subsamples as there are raters.
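To make the workload argument concrete, the sketch below recomputes the totals from the figures given in the text (2766 students, nine texts each, three ratings per text, roughly one minute per rating) and compares a single three-rater jury with an overlapping design. The number of raters passed to the hypothetical function `per_rater_workload` is an illustrative parameter, not a figure reported for the studies, and subsample sizes are rounded up, so the result is an upper bound when the split is uneven.

```python
import math

# Figures taken from the text
N_STUDENTS = 2766        # participants across both intervention studies
TEXTS_PER_STUDENT = 9    # texts written by each student
RATINGS_PER_TEXT = 3     # each text is scored by three raters
MINUTES_PER_RATING = 1   # approximate benchmark-rating time per text


def per_rater_workload(n_raters: int) -> tuple[int, float]:
    """Return (texts, hours) for one rater when all texts are split into
    n_raters subsamples and each rater scores three of them.

    n_raters is a hypothetical design parameter; n_raters == 3 corresponds
    to a single jury that rates every text.
    """
    if n_raters < RATINGS_PER_TEXT:
        raise ValueError("need at least as many raters as ratings per text")
    total = N_STUDENTS * TEXTS_PER_STUDENT
    subsample_size = math.ceil(total / n_raters)  # round up for uneven splits
    texts = RATINGS_PER_TEXT * subsample_size
    return texts, texts * MINUTES_PER_RATING / 60


total_texts = N_STUDENTS * TEXTS_PER_STUDENT
print(total_texts)                     # 24894 texts ("almost 25,000")
print(total_texts * RATINGS_PER_TEXT)  # 74682 ratings in total
print(per_rater_workload(3))           # single jury: (24894, 414.9)
print(per_rater_workload(30))          # hypothetical 30 raters: (2490, 41.5)
```

Under these assumptions, a single jury would leave each rater with roughly 415 hours of rating, whereas spreading the same texts over more raters reduces each rater's share in proportion to 3 / n_raters.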
Each subsample is assigned to multiple raters (in this study, three) according to a predetermined design in which each rater is directly or indirectly linked to every other rater. Table 10.1 provides an example of such an overlapping design with five raters and five subsamples. The overlapping rater design has several advantages. First, it decreases the total rating time for each individual rater. Second, the ratings are far more reliable and generalizable, because the scores do not depend on a single jury of raters but are based on multiple juries (47 juries in Study 1 and 18 juries in Study 2). Third, the covariance matrix between raters allows estimation of the overall reliability as well as the reliability per in

Table 10.1 Design of overlapping rater teams with five raters and five subsamples
[Table: rows Rater 1 to Rater 5; columns Subsample 1 to Subsample 5; each rater is assigned three subsamples, marked x.]
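The assignment scheme described above can be sketched as follows. The cyclic layout used here (subsample s goes to raters s, s+1 and s+2, wrapping around) is an assumption chosen for illustration: it meets the stated constraints (as many subsamples as raters, three raters per subsample, every rater directly or indirectly linked to every other rater), but it is not necessarily the layout of Table 10.1 or the one proposed by Van den Bergh and Eiting (1989). Raters and subsamples are numbered from 0, and all function names are hypothetical.

```python
import random
from collections import deque


def split_into_subsamples(text_ids, n_subsamples, seed=None):
    """Randomly split texts into n_subsamples groups whose sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(text_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsamples] for i in range(n_subsamples)]


def cyclic_design(n_raters, raters_per_subsample=3):
    """Map each subsample s to raters s, s+1, ... (mod n_raters); hypothetical layout.

    Because there are as many subsamples as raters, every rater also
    receives raters_per_subsample subsamples.
    """
    if not 1 <= raters_per_subsample <= n_raters:
        raise ValueError("raters_per_subsample must be between 1 and n_raters")
    return {
        s: [(s + j) % n_raters for j in range(raters_per_subsample)]
        for s in range(n_raters)
    }


def all_raters_linked(design, n_raters):
    """True if every rater is directly or indirectly linked to every other rater;
    two raters are directly linked when they score a common subsample."""
    neighbours = {r: set() for r in range(n_raters)}
    for jury in design.values():
        for a in jury:
            neighbours[a].update(b for b in jury if b != a)
    seen, queue = {0}, deque([0])  # breadth-first search from rater 0
    while queue:
        for nb in neighbours[queue.popleft()] - seen:
            seen.add(nb)
            queue.append(nb)
    return len(seen) == n_raters


def subsamples_per_rater(design, n_raters):
    """Count how many subsamples each rater has to score."""
    counts = [0] * n_raters
    for jury in design.values():
        for r in jury:
            counts[r] += 1
    return counts


design = cyclic_design(5)
print(design)  # {0: [0, 1, 2], 1: [1, 2, 3], 2: [2, 3, 4], 3: [3, 4, 0], 4: [4, 0, 1]}
print(subsamples_per_rater(design, 5))  # [3, 3, 3, 3, 3]
print(all_raters_linked(design, 5))     # True
subsamples = split_into_subsamples(range(100), 5, seed=1)
print([len(s) for s in subsamples])     # [20, 20, 20, 20, 20]
```

In this sketch each subsample defines one jury, so the number of distinct juries equals the number of subsamples; the linkage check is what allows scores from different juries to be placed on a common scale.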

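The third advantage, estimating reliability from the covariance matrix between raters, can be illustrated with the minimal sketch below. The text does not state which estimator was used, so the choices here are assumptions: the overall reliability of the summed scores is computed as Cronbach's alpha, and the reliability of a single rater assumes essentially tau-equivalent raters (every pair of raters shares the same true-score variance). The covariance matrix is taken as given; how it is estimated from the incomplete data of an overlapping design is outside this sketch, and the example values are invented for illustration.

```python
def overall_reliability(cov):
    """Cronbach's alpha for the sum of k raters' scores, from a k x k covariance matrix."""
    k = len(cov)
    if k < 2:
        raise ValueError("need at least two raters")
    total_variance = sum(sum(row) for row in cov)       # variance of the summed score
    rater_variances = sum(cov[i][i] for i in range(k))  # sum of the diagonal
    return k / (k - 1) * (1 - rater_variances / total_variance)


def rater_reliability(cov, i):
    """Reliability of rater i alone, assuming essentially tau-equivalent raters.

    Under that assumption every inter-rater covariance estimates the
    true-score variance, so the mean covariance of rater i with the other
    raters is divided by rater i's own variance.
    """
    k = len(cov)
    mean_cov = sum(cov[i][j] for j in range(k) if j != i) / (k - 1)
    return mean_cov / cov[i][i]


# Hypothetical covariance matrix for three raters (illustrative values only)
cov = [
    [1.0, 0.6, 0.6],
    [0.6, 1.0, 0.6],
    [0.6, 0.6, 1.0],
]
print(round(overall_reliability(cov), 3))                       # 0.818
print([round(rater_reliability(cov, i), 3) for i in range(3)])  # [0.6, 0.6, 0.6]
```

With equal covariances, the alpha of 0.818 coincides with the Spearman-Brown projection of a single-rater reliability of 0.6 to a jury of three raters, 3 × 0.6 / (1 + 2 × 0.6).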
