CONSTRUCTING TEST ITEMS
TEST ITEM TYPES
• Multiple choice
• True or False
• Completion/Short Answers
• Matching
• Essay Questions
• Performance Assessment
MULTIPLE CHOICE
• Multiple-choice items can be used to measure knowledge outcomes and various types of learning outcomes.
• They are most widely used for measuring knowledge, comprehension, and application outcomes.
• The multiple-choice item provides the most useful format for measuring achievement at various levels of learning.
• When selection-type items are to be used (multiple-choice, true-false, matching, check all that apply), an effective procedure is to start each item as a multiple-choice item and switch to another item type only when the learning outcome and content make it desirable to do so. For example, (1) when there are only two possible alternatives, a shift can be made to a true-false item; and (2) when there are a number of similar factors to be related, a shift can be made to a matching item.
STRENGTHS
• Learning outcomes from simple to complex can be measured.
• Highly structured and clear tasks are provided.
• A broad sample of achievement can be measured.
• Incorrect alternatives provide diagnostic information.
• Scores are less influenced by guessing than true-false items.
• Scores are more reliable than subjectively scored items (e.g., essays).
• Scoring is easy, objective, and reliable.
• Item analysis can reveal how difficult each item was and how well it discriminated between the stronger and weaker students in the class.
• Performance can be compared from class to class and year to year.
• Can cover a lot of material very efficiently (about one item per minute of testing time).
• Items can be written so that students must discriminate among options that vary in degree of correctness.
• Avoids the absolute judgments found in True-False tests.
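The item analysis mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `item_analysis`, the upper/lower group comparison, and the 27% group size are assumptions chosen for the example (the handout does not specify a method). Difficulty is taken as the proportion of students answering correctly, and discrimination as the difference in that proportion between the highest- and lowest-scoring groups.

```python
from typing import List, Tuple


def item_analysis(correct: List[bool], total_scores: List[float],
                  fraction: float = 0.27) -> Tuple[float, float]:
    """Return (difficulty, discrimination) for a single item.

    correct[i]      -- True if student i answered this item correctly
    total_scores[i] -- student i's total score on the whole test
    fraction        -- share of students in the upper and lower groups
                       (0.27 is an assumed, conventional choice)
    """
    n = len(correct)
    if n == 0 or n != len(total_scores):
        raise ValueError("need one response and one total score per student")

    # Difficulty index: proportion of all students who got the item right.
    difficulty = sum(correct) / n

    # Rank students by total score, highest first.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    k = max(1, round(n * fraction))
    upper, lower = order[:k], order[-k:]

    # Discrimination index: proportion correct in the upper group
    # minus proportion correct in the lower group.
    p_upper = sum(correct[i] for i in upper) / k
    p_lower = sum(correct[i] for i in lower) / k
    return difficulty, p_upper - p_lower


# Hypothetical class of ten students: the six highest scorers got the item right.
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
answers = [True] * 6 + [False] * 4
difficulty, discrimination = item_analysis(answers, scores)
```

A discrimination value near zero or below zero would suggest the item does not separate stronger from weaker students and may need revision.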
LIMITATIONS
• Constructing good items is time consuming.
• It is frequently difficult to find plausible distracters.
• This item type is ineffective for measuring some types of problem solving and the ability to organize and express ideas.
• Real-world problem solving differs: proposing a solution involves a different process than selecting a solution from a set of alternatives.
• Scores can be influenced by reading ability.
• There is a lack of feedback on individual thought processes; it is difficult to determine why individual students selected incorrect responses.
• Students can sometimes read more into the question than was intended.
• Items often focus on testing factual information and fail to test higher levels of cognitive thinking.
• Sometimes there is more than one defensible “correct” answer.
• They place a high degree of dependence on the student’s reading ability and the instructor’s writing ability.
• They do not provide a measure of writing ability.
• They may encourage guessing.
Helpful Hints
• Base each item on an educational or instructional objective of the course, not trivial information.
• Try to write items in which there is one and only one correct or clearly best answer.
• The phrase that introduces the item (stem) should clearly state the problem.
• Test only a single idea in each item.
• Be sure wrong answer choices (distracters) are at least plausible.
• Incorporate common errors of students in distracters.
• The position of the correct answer should vary randomly from item to item.
• Include from three to five options for each item.
• Avoid overlapping alternatives (see Example 3 following).
• The length of the response options should be about the same within each item (preferably short).
• There should be no grammatical clues to the correct answer.
• Format the items vertically, not horizontally (i.e., list the choices vertically).
• The response options should be indented and in column form.
• Word the stem positively; avoid negative phrasing such as “not” or “except.” If this cannot be avoided, the negative words should always be highlighted by underlining or capitalization: Which of the following is NOT an example ……
• Avoid excessive use of negatives and/or double negatives.
• Avoid the excessive use of “All of the above” and “None of the above” in the response alternatives.
• In the case of “All of the above”, students need only partial information to answer the question. In a question with four or more options, students need only know that two of the options are correct to determine that “All of the above” is the correct answer choice. Conversely, students need only eliminate one answer choice as implausible to eliminate “All of the above” as an answer choice.
• Similarly, when “None of the above” is used as the correct answer choice, information is gained about students’ ability to detect incorrect answers. However, the item does not reveal whether students know the correct answer to the question.
Multiple-Choice Item Writing Guidelines
Multiple-choice questions typically have 3 parts: STEM, KEY & DISTRACTERS
Procedural Rules:
• Use either the best answer or the correct answer format.
• Best answer format refers to a list of options that can all be correct in the sense that each has an advantage, but one of them is the best.
• Correct answer format refers to one and only one right answer.
• Format the items vertically, not horizontally (i.e., list the choices vertically).
• Allow time for editing and other types of item revisions.
• Use good grammar, punctuation, and spelling consistently.
• Minimize the time required to read each item.
• Avoid trick items.
• Use the active voice.
• The ideal question will be answered correctly by 60-65% of the tested population.
• Have your questions peer-reviewed.
• Avoid giving unintended cues, such as making the correct answer longer than the distracters.
Content-related Rules:
• Base each item on an educational or instructional objective of the course, not trivial information.
• Test for important or significant information.
• Focus on a single problem or idea for each test item.
• Keep the vocabulary consistent with the examinees’ level of understanding.
• Avoid cueing one item with another; keep items independent of one another.
• Use the author’s examples as a basis for developing your items.
• Avoid overly specific knowledge when developing items.
• Avoid textbook, verbatim phrasing when developing the items.
• Avoid items based on opinions.
• Use multiple-choice to measure higher level thinking.
• Be sensitive to cultural and gender issues.
• Use case-based questions that use a common text to which a set of questions refers.
Stem Construction Rules:
• State the stem in either question form or completion form.
• When using a completion form, don’t leave a blank for completion in the beginning or middle of the stem.
• Ensure that the directions in the stem are clear, and that wording lets the examinee know exactly what is being asked.
• Avoid window dressing (excessive verbiage) in the stem.
• Word the stem positively; avoid negative phrasing such as “not” or “except.” If this cannot be avoided, the negative words should always be highlighted by underlining or capitalization: Which of the following is NOT an example ……
• Include the central idea and most of the phrasing in the stem.
• Avoid giving clues such as linking the stem to the answer (for example, if the stem ends with “… is an example of an:”, test-wise students will know the correct answer should start with a vowel).
General Option Development Rules:
• Place options in logical or numerical order.
• Use letters in front of options rather than numbers; numerical answers in numbered items may be confusing to students.
• Keep options independent; options should not be overlapping.
• Keep all options homogeneous in content.
• Keep the length of options fairly consistent.
• Avoid, or use sparingly, the phrase all of the above.
• Avoid, or use sparingly, the phrase none of the above.
• Avoid the use of the phrase I don’t know.
• Phrase options positively, not negatively.
• Avoid distracters that can clue test-wise examinees; for example, absurd options, formal prompts, or semantic (overly specific or overly general) clues.
• Avoid giving clues through the use of faulty grammatical construction.
• Avoid specific determiners, such as never and always.
• Position the correct option so that it appears about the same number of times in each possible position for a set of items.
• Make sure that there is one and only one correct option.
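The rule that the correct option should appear about equally often in each position can be sketched in Python. This is only an illustration under stated assumptions: the helper names `balanced_key_positions` and `place_key` are hypothetical, and the approach applies only to items whose options have no natural logical or numerical order (when they do, the ordering rule above determines where the key falls).

```python
import random
from collections import Counter
from typing import List, Optional


def balanced_key_positions(num_items: int, num_options: int,
                           seed: Optional[int] = None) -> List[int]:
    """Return one key position (0-based) per item, with each position
    used as equally often as possible, in random order."""
    rng = random.Random(seed)
    # Cycle through the positions so the counts differ by at most one...
    positions = [i % num_options for i in range(num_items)]
    # ...then shuffle so the sequence has no predictable pattern.
    rng.shuffle(positions)
    return positions


def place_key(key: str, distracters: List[str], position: int) -> List[str]:
    """Return the option list with the key inserted at the given position."""
    options = list(distracters)
    options.insert(position, key)
    return options


# Hypothetical 20-item test with four options per item.
positions = balanced_key_positions(num_items=20, num_options=4, seed=1)
counts = Counter(positions)  # each of the four positions is used five times
```

Fixing the counts first and shuffling afterwards guarantees the balance; drawing each position independently at random would only balance them on average.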
Distracter (incorrect options) Development Rules:
• Use plausible distracters.
• Incorporate common errors of students in distracters.
• Avoid technically phrased distracters.
• Use familiar yet incorrect phrases as distracters.
• Use true statements that do not correctly answer the item.
• Avoid the use of humor when developing options.
• Distracters that are not chosen by any examinees should be replaced.
Suggestions for Writing Good Multiple Choice Items:
• Present practical or real-world situations to the students.
• Present the student with a diagram of equipment and ask for application, analysis or evaluation.
• Present actual quotations taken from newspapers or other published sources and ask for the interpretation or evaluation of these quotations.
• Use pictorial materials that require students to apply principles and concepts.
• Use charts, tables or figures that require interpretation.
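The distracter rule above, that distracters chosen by no examinees should be replaced, is easy to check once responses are collected. The following Python sketch assumes responses are recorded as option letters; the function name `unused_distracters` and the sample data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def unused_distracters(responses: Sequence[str], options: Sequence[str],
                       key: str) -> List[str]:
    """Return the distracters (options other than the key) that no
    examinee selected."""
    counts = Counter(responses)
    # Counter returns 0 for options that never appear in the responses.
    return [opt for opt in options if opt != key and counts[opt] == 0]


# Hypothetical responses from nine students to one four-option item keyed "b".
responses = ["b", "b", "a", "b", "c", "b", "b", "a", "b"]
to_replace = unused_distracters(responses, options=["a", "b", "c", "d"], key="b")
# Here only "d" was never chosen, so it is the candidate for replacement.
```

With very small classes an unchosen distracter may simply reflect chance, so a result like this is better treated as a prompt for review than as an automatic rule.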
General Guidelines to Writing Test Items
• Begin writing items well ahead of the time when they will be used; allow time for revision.
• Match items to intended outcomes at the proper difficulty level to provide a valid measure of the instructional objectives.
• Be sure each item deals with an important aspect of the content area and not with trivia.
• Be sure that the problem posed is clear and unambiguous.
• Be sure that each item is independent of all other items (i.e., a hint to an answer should not be unintentionally embedded in another item).
• Be sure the item has one correct or best answer on which experts would agree.
• Prevent unintended clues to the answer in the statement or question (e.g., grammatical inconsistencies such as ‘a’ or ‘an’ give clues).
• Avoid duplication of the textbook in writing test items; don’t lift quotes directly from any textual materials.
• Avoid trick or catch questions in an achievement test. (Don’t waste time testing how well the student can interpret your intentions.)
• On a test with different question formats (e.g., multiple choice and True-False), one should group all items of similar format together.
• Questions should follow an easy to difficult progression.
• Space the items to eliminate overcrowding.
• Have diagrams and tables above the item using the information, not below.
Examples & Tips
Below are some strategies to reduce the cognitive load of your test items.
1. Keep the stem simple, only including relevant information.
Example:
Change
[Stem]: The purchase of the Louisiana Territory, completed in 1803 and considered one of Thomas Jefferson's greatest accomplishments as president, primarily grew out of our need for
a. the port of New Orleans*
b. helping Haitians against Napoleon
c. the friendship of Great Britain
d. control over the Indians
To
[Stem]: The purchase of the Louisiana Territory primarily grew out of our need for
a. the port of New Orleans*
b. helping Haitians against Napoleon
c. the friendship of Great Britain
d. control over the Indians
*An asterisk indicates the correct answer.
Any additional information that is irrelevant to the question, such as the phrase "completed in 1803…," can distract or confuse the student, thus providing an alternative explanation for why the item was missed. Keep it simple.
2. Keep the alternatives simple by adding any common words to the stem rather than including them in each alternative.
Example:
Change
When your body adapts to your exercise load,
a. you should decrease the load slightly.
b. you should increase the load slightly.*
c. you should change the kind of exercise you are doing.
d. you should stop exercising.
To
When your body adapts to your exercise load, you should
a. decrease the load slightly.
b. increase the load slightly.*
c. change the kind of exercise you are doing.
d. stop exercising.
Instead of repeating the phrase "you should" at the beginning of each alternative, add that phrase to the end of the stem. The less reading the student has to do, the less chance there is for confusion.
3. Put alternatives in a logical order.
Example:
Change
According to the 1991 census, approximately what percent of the United States population is of Spanish or Hispanic descent?
a. 25%
b. 39%
c. 2%
d. 9%*
To
a. 2%
b. 9%*
c. 25%
d. 39%
The more mental effort (or cognitive load) students must use to make sense of an item, the more likely a comprehension error will occur, providing another rival explanation for a missed item. By placing the alternatives in a logical order, the reader can focus on the content of the question rather than having to reorder the items mentally. Although such reordering might require only a limited amount of cognitive load, a student's capacity is finite, and it does not take much additional processing to reach the point where concentration is negatively impacted. Thus, this guideline is consistently recommended (Haladyna, Downing, & Rodriguez, 2002).
4. Limit the use of negatives (e.g., NOT, EXCEPT).
Example:
Change
Which of the following is NOT true of the Constitution?
a. The Constitution sets limits on how a government can operate
b. The Constitution is open to different interpretations
c. The Constitution has not been amended in 50 years*
To
Which of the following is true of the Constitution?
a. The Constitution has not been amended in 50 years
b. The Constitution sets limits on how a government can operate*
c. The Constitution permits only one possible interpretation
Once again, trying to determine which answer is NOT consistent with the stem requires more cognitive load from the students and increases the likelihood of confusion. If that additional load or confusion is unnecessary, it should be avoided (Haladyna, Downing, & Rodriguez, 2002). If you are going to use NOT or EXCEPT, the word should be highlighted in some manner so that students recognize a negative is being used.
5. Include the same number of alternatives for each item.
The more consistent and predictable a test is, the less cognitive load the student needs to process it. Consequently, the student can focus on the questions themselves without distractions. Additionally, if students must transpose their answers onto a score sheet of some kind, there is less likelihood of error in the transposition if the number of alternatives for each item is always the same.
Reducing the Chance of Guessing Correctly
• It is easy to inadvertently include clues in your test items that point to the correct answer, help rule out incorrect alternatives or narrow the choices.
• Any such clue would decrease your ability to distinguish students who know the material from those who do not, thus providing rival explanations.
Keep the grammar consistent between stem and alternatives.
Example:
Change
What is the dietary substance that is often associated with heart disease when found in high levels in the blood?
a. glucose
b. cholesterol*
c. beta carotene
d. proteins
To
a. glucose
b. cholesterol*
c. beta carotene
d. protein
Obviously, "proteins" is inconsistent with the stem since it is plural while the stem and the other alternatives are singular. However, it can be easy for the test writer to miss such inconsistencies. As a result, students may more easily guess the correct answer without understanding the concept, which is a rival explanation.
Avoid including an alternative that is significantly longer than the rest.
Example:
Change
What is the best reason for listing information sources in your research assignment?
a. It is required
b. It is unfair and illegal to use someone's ideas without giving proper credit*
c. To get a better grade
d. To make it longer
To
a. It is required by most teachers
b. It is unfair and illegal to use someone's ideas without giving proper credit*
c. To get a better grade on the project
d. So the reader knows from where you got your information
Students often recognize that a significantly longer, more complex alternative is commonly the correct answer. Even if the longer alternative is not the correct answer, some students who might otherwise answer the question correctly could be misled by this common clue and select the wrong answer. So, to be safe and avoid a rival explanation, keep the alternatives similar in length.
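A rough automated check for this length clue can be sketched in Python. The function name `key_length_ratio` and the threshold of 2.0 are assumptions made for illustration, not an established standard; character count is only a crude proxy for how long and complex an alternative looks.

```python
from typing import List


def key_length_ratio(options: List[str], key_index: int) -> float:
    """Return the length of the keyed option divided by the mean length
    of the other options (lengths measured in characters)."""
    key_len = len(options[key_index])
    others = [len(opt) for i, opt in enumerate(options) if i != key_index]
    return key_len / (sum(others) / len(others))


KEY = "It is unfair and illegal to use someone's ideas without giving proper credit"

original = ["It is required", KEY, "To get a better grade", "To make it longer"]
revised = [
    "It is required by most teachers",
    KEY,
    "To get a better grade on the project",
    "So the reader knows from where you got your information",
]

THRESHOLD = 2.0  # assumed cut-off for flagging an item for review

flag_original = key_length_ratio(original, 1) > THRESHOLD  # True: key stands out
flag_revised = key_length_ratio(revised, 1) > THRESHOLD    # False: lengths are closer
```

The check only flags candidates; whether an item actually gives away its answer still calls for the writer's judgment.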
Make all distracters plausible.
Example:
Change
Lincoln was assassinated by
a. Lee Harvey Oswald
b. John Wilkes Booth*
c. Oswald Garrison Villard
d. Ozzie Osbourne
To
Lincoln was assassinated by
a. Lee Harvey Oswald
b. John Wilkes Booth*
c. Oswald Garrison Villard
d. Louis Guiteau
If students can easily discount one or more distracters (obviously Ozzie Osbourne does not belong), then the chance of guessing correctly is increased, reducing the discriminability of that item. There is some limited evidence that including humor on a test can have certain benefits, such as reducing the anxiety of the test-takers (Berk, 2000; McMorris, Boothroyd, & Pietrangelo, 1997). But humor can be included in a manner that does not reduce the discriminability of the item. For example, the question in the stem may be humorous yet still address the material in a meaningful way.
Avoid giving too many clues in your alternatives.
Example:
Change
"Yellow Journalism" is associated with what two publishers?
a. Adolph Ochs and Martha Graham
b. William Randolph Hearst and Joseph Pulitzer*
c. Col. Robert McCormick and Marshall Field III
d. Michael Royko and Walter Cronkite
To
a. Adolph Ochs and Martha Graham
b. William Randolph Hearst and Joseph Pulitzer*
c. Joseph Pulitzer and Adolph Ochs
d. Martha Graham and William Randolph Hearst
Since both of the publishers in choice "b" are associated with yellow journalism and none of the other people mentioned is, the student only has to know of one such publisher to identify that "b" is the correct answer. That makes the item easier than if just one name is listed for each alternative. To make the question more challenging, at least some of the distracters could mention one of the correct publishers but not the other, as in the second example (e.g., in distracter "c" Pulitzer is correct but Ochs is not). As a result, the student must recognize both publishers associated with yellow journalism to be certain of the correct answer.
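The effect of partial knowledge in this example can be made concrete with a small Python sketch. It models a hypothetical student who knows only that Hearst is associated with yellow journalism and who keeps every alternative that includes him; the function name `consistent_alternatives` and this simple model of the student are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def consistent_alternatives(alternatives: Dict[str, Tuple[str, str]],
                            known_correct: Set[str]) -> List[str]:
    """Return the labels of alternatives that contain every name the
    student already knows to be correct."""
    return [label for label, names in alternatives.items()
            if known_correct <= set(names)]


original = {
    "a": ("Adolph Ochs", "Martha Graham"),
    "b": ("William Randolph Hearst", "Joseph Pulitzer"),
    "c": ("Col. Robert McCormick", "Marshall Field III"),
    "d": ("Michael Royko", "Walter Cronkite"),
}
revised = {
    "a": ("Adolph Ochs", "Martha Graham"),
    "b": ("William Randolph Hearst", "Joseph Pulitzer"),
    "c": ("Joseph Pulitzer", "Adolph Ochs"),
    "d": ("Martha Graham", "William Randolph Hearst"),
}

known = {"William Randolph Hearst"}  # the student recognizes only one publisher

remaining_original = consistent_alternatives(original, known)  # ["b"]
remaining_revised = consistent_alternatives(revised, known)    # ["b", "d"]
```

In the original item this student is left with only "b" and answers correctly for certain; in the revised item two alternatives remain, so knowing a single publisher is no longer enough.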