Outcome Measurement in Brain Injury


Item Response Theory and Postacute Brain Injury Rehabilitation Outcome Measurement by Jacob Kean, PhD, and James F. Malec, PhD

If you pick up the newspaper to help decide what movie to see this coming weekend, you may favor the critic’s 4-star drama over the 3-star comedy. On the critic’s 4-star movie scale, the drama is a better movie. On the other hand, if you are a fan of comedies, you may choose the 3-star movie instead because, while it may not be “outstanding” in the critic’s eye, it’s not a 1-star waste of celluloid, either. Ghostbusters may not have been Citizen Kane, but it beat the pants off Ishtar, right? The number of stars awarded by the critic gives us some idea of how the movie stacks up against others, independent of genre. Though brain injury professionals are not faultfinding film pundits, we too are critics in the sense that we use numeric rating scales to measure performance. For instance, we could judge a patient’s completely independent dressing as a 7, supervised dressing as a 5, and moderately assisted dressing as a 3. We use these numbered scales like critics use stars: to give us an idea of how one patient’s functional performance stacks up against another’s, independent of the particular task being rated.

On both the film and clinical scales, the numbers represent a relative order, not a numerical value. For example, we think the 4-star drama is better than the 3-star comedy, but not one unit greater in quality. Likewise, we know a patient has improved if she has moved from being a “mod” assist to being completely independent on a given task. However, we cannot say she has made 4 units of improvement (i.e., from 3 to 7 on a 7-point scale) because we don’t know the value of the unit. In other words, a 7 is better than a 3 (which is better than a 2), but we can’t say exactly how much better. These kinds of scales are called ordinal scales, and they produce rankings. They are useful in situations where the order matters but the differences between values do not.
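The distinction can be made concrete with a short sketch. The two hypothetical patients and the 7-point dressing scale below are illustrative, not from the article: ordinal codes support ranking, but arithmetic on them is misleading.

```python
# Hypothetical ordinal ratings on a 7-point dressing scale (as in the text):
# 7 = complete independence, 5 = supervision, 3 = moderate assistance.
admission = {"Patient A": 3, "Patient B": 5}
discharge = {"Patient A": 7, "Patient B": 7}

# Ordinal scales support ranking: at admission, Patient B outranks Patient A.
ranked = sorted(admission, key=admission.get, reverse=True)
print(ranked)  # ['Patient B', 'Patient A']

# But subtracting the codes is not meaningful. Both patients reach a 7,
# yet we cannot conclude Patient A improved "twice as much" just because
# 7 - 3 = 4 while 7 - 5 = 2: the size of each "unit" is unknown.
gain_a = discharge["Patient A"] - admission["Patient A"]  # 4 "units"?
gain_b = discharge["Patient B"] - admission["Patient B"]  # 2 "units"?
print(gain_a, gain_b)
```

The code runs, but the final two "gains" are exactly the kind of arithmetic an ordinal scale cannot justify.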
When you want to know how much, ordinal measures fail, because the numbers are really only placeholders – like stars.

More or Less

In many situations, the ranking produced by ordinal measures is acceptable for the intended purpose, such as the star ratings given to films. They are subjective snapshots used to guide low-stakes decisions. In other cases, ordinal measures are used because more objective assessment of the areas of interest seems impossible. For instance, quantifying rehabilitation effectiveness is a high-stakes activity with important consequences for patients and families, as well as providers and insurers – yet how can we translate abstract concepts such as performance in outcome domains into mathematical units? To answer this, consider another abstract concept: the quality of a basketball team. Suppose the University of Kentucky men’s basketball team is ranked #2 and the University of Kansas men’s team is ranked #4. From this ranking, we know pollsters think Kentucky and one other team are better than Kansas. However, if casinos set the odds of Kentucky winning the national championship at 2:1 and the odds of Kansas winning at 4:1, we know Vegas gives Kentucky exactly twice the odds of winning it all (an implied chance of 1 in 3 versus 1 in 5). The odds of success form an interval scale, a higher level of measurement than an ordinal (i.e., ranking) scale, because it tells us not only who is better but also by how much. Thus, one solution to interval-level assessment of abstract concepts is the probability of success. This logic has been used in educational measurement for the past 50 years. Item response theory (IRT), pioneered by Georg Rasch and Frederic Lord, treats the encounter between a student and each test item as a competition. In this kind of model, more able students have a greater probability of responding correctly to an item, whereas less able students have a lesser probability of responding correctly to the same item. As the competition between students and items rages on (i.e., as item responses accumulate), we learn not only about the ability of the students taking the test but also about the difficulty of the items.
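The probability-of-success idea can be sketched with the simplest IRT model, the one-parameter (Rasch) model, in which the log-odds of a correct response equal the respondent's ability minus the item's difficulty. The logit values below are purely illustrative:

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a successful response under the Rasch (1PL) model.

    Ability and difficulty sit on the same logit (log-odds) scale,
    which is an interval scale: equal differences in (ability -
    difficulty) mean equal changes in the log-odds of success.
    """
    logit = ability - difficulty
    return 1.0 / (1.0 + math.exp(-logit))

# A respondent exactly matched to an item's difficulty has even odds:
print(rasch_probability(0.0, 0.0))                 # 0.5

# A respondent 1 logit above the item succeeds about 73% of the time:
print(round(rasch_probability(1.0, 0.0), 2))       # 0.73

# The same respondent facing a harder item is back to even odds:
print(rasch_probability(1.0, 1.0))                 # 0.5
```

Because only the difference between ability and difficulty matters, accumulating many responses lets us estimate both quantities on a common interval scale, which is the point the article is building toward.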
Just like the casino can set the odds of winning a championship based on the performance of teams during the regular season, IRT “sets the odds” of a respondent producing a successful response to an item. Once the difficulty of the items has been estimated in hundreds of student vs. item “com-