
4.5 The Importance of Item Piloting
Figures 4.1 and 4.2 show open-ended and multiple-choice reading literacy items developed under the SEA-PLM assessment framework. As discussed in chapter 9, this regional large-scale assessment measures reading literacy using different text types (for example, narrative, descriptive, and persuasive) and drawing on different cognitive processes linked to reading comprehension (such as locating information in a text, interpreting information, and reflecting). The open-ended item in figure 4.1 requires the student to use reading skills to compare pieces of information about two countries. The multiple-choice item in figure 4.2 presents a narrative text and requires the student to locate information about the action that one of the characters takes.
In addition to the item development and content review process by subject matter experts, national large-scale assessment teams must pilot the items; the pilot helps determine the psychometric properties of each item and allows those with adequate levels of difficulty and discrimination to be selected. The pilot study is also a good opportunity to check for student understanding of each item in the assessment and address any content-related problems before the final assessment administration (box 4.5).
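The difficulty and discrimination statistics mentioned above can be illustrated with a minimal classical-test-theory sketch. This is not a procedure from the source; the function names, data layout, and use of the point-biserial correlation as a discrimination index are assumptions for demonstration only.

```python
# Illustrative sketch (not from the source): classical item statistics a
# pilot analysis typically computes. responses[s][i] is 1 if student s
# answered item i correctly, else 0.
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def item_statistics(responses):
    """Return (difficulty, discrimination) for each item.

    Difficulty is the proportion of students answering correctly;
    discrimination is the point-biserial correlation between the item
    score and the student's total score.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]  # each student's total score
    stats = []
    for i in range(n_items):
        scores = [row[i] for row in responses]
        difficulty = sum(scores) / n_students  # proportion correct
        discrimination = pearson(scores, totals)
        stats.append((difficulty, discrimination))
    return stats
```

Under this convention, very high or very low difficulty values and near-zero or negative discrimination values would mark an item for revision or removal, in the spirit of the selection process the text describes.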
BOX 4.5. The Importance of Item Piloting
Before the final version of the test booklets is constructed, it is important to pilot the proposed test items to identify those that provide the most accurate and reliable evidence on what students know and can do. A pilot should be conducted several months in advance of test administration to allow sufficient time for data collection and analysis and for creating, printing, and distributing the final version of the test.

The piloting process helps sort the proposed items into three groups: items that should be omitted from the final version of the test, items that need revision before they can be included, and items that are ready to be included as they are. For instance, items that are extremely easy or extremely difficult for students at the target age or grade may need to be removed, while items that are unclear or have poor-quality distractors may improve with revision. It is also important to determine whether items perform similarly across population subgroups; no item should be systematically easier or more difficult for students of a particular sociodemographic group. An item that is may need to be revised or removed from the final version of the test.

Because many items are dropped or reworked at this stage, it is common to pilot two to three times as many items as will be included in the final version of the test. For instance, if the final instrument will include 30 items per subject, at least 60 items should be piloted for each subject.

In addition to supporting the selection of items for the final test form, a well-designed pilot study gives the assessment team an opportunity to improve the instructions for assessment administrators, determine how long participants take to answer the test items, gauge student engagement during the assessment, strengthen scoring rubrics for open-ended items, and refine data collection procedures before test administration. Anderson and Morgan (2008) provide an in-depth description of how to plan for, design, and conduct a pilot.
Source: Adapted from Anderson and Morgan 2008.
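The subgroup comparison described in the box can be sketched as a crude screening pass. This is an assumption for illustration, not the source's procedure: it simply compares per-item proportion-correct across groups and flags large gaps, whereas an operational analysis would apply a proper differential item functioning method (such as Mantel-Haenszel) that controls for overall ability.

```python
# Illustrative sketch (assumption, not the source's method): flag items
# whose proportion-correct differs sharply between population subgroups.
# The threshold of 0.15 is arbitrary, chosen only for demonstration.
def flag_subgroup_gaps(scores_by_group, threshold=0.15):
    """scores_by_group: dict mapping group name -> list of per-item
    proportion-correct values (same item order in every group).
    Returns the indices of items whose largest between-group gap
    exceeds the threshold."""
    groups = list(scores_by_group.values())
    n_items = len(groups[0])
    flagged = []
    for i in range(n_items):
        vals = [group[i] for group in groups]
        if max(vals) - min(vals) > threshold:
            flagged.append(i)
    return flagged
```

Items flagged this way would then be reviewed and, as the box notes, revised or removed before the final test form is assembled.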