
U8. The ASSURE Model for Instructional Design: Evaluation and Revision

Instructional System Development - Evaluation Phase - Chapter VI

This phase is ongoing throughout the entire ISD process. That is, it is performed during the analysis, design, development, and implementation phases. It is also performed after the learners return to their jobs. Its purpose is to collect and document learner performance in a training course, as well as on the job. The goal is to fix problems and make the system better, not to lay blame.

"The most exciting place in teaching is the gap between what the teacher teaches and what the student learns. This is where the unpredictable transformation takes place, the transformation that means that we are human beings, creating and dividing our world, and not objects, passive and defined." - Alice Reich (1983)

Evaluation helps to measure Reich's gap by determining the value and effectiveness of a learning program. It uses assessment and validation tools to provide data for the evaluation. Assessment is the measurement of the practical results of the training in the work environment, while validation determines whether the objectives of the training goal were met. Bramley and Newby identify five main purposes of evaluation:

1. Feedback - Linking learning outcomes to objectives and providing a form of quality control.
2. Control - Making links from training to organizational activities and considering cost-effectiveness.
3. Research - Determining the relationships between learning, training, and the transfer of training to the job.
4. Intervention - The results of the evaluation influence the context in which it is occurring.
5. Power games - Manipulating evaluative data for organizational politics.

Evaluations are normally divided into two broad categories: formative and summative.


Formative

Formative evaluation (also known as internal) is a method of judging the worth of a program while the program activities are forming (in progress). This part of the evaluation focuses on the process. Thus, formative evaluations are basically done on the fly. They permit the learner and the instructor to monitor how well the instructional objectives are being met. Their main purpose is to catch deficiencies so that the proper intervention can take place, allowing the learner to master the required skills and knowledge. Formative evaluation is also useful in analyzing learning materials, student learning and achievements, and teacher effectiveness.

"Formative evaluation is primarily a building process which accumulates a series of components of new materials, skills, and problems into an ultimate meaningful whole." - Wally Guyot (1978)


Summative

Summative evaluation (also known as external) is a method of judging the worth of a program at the end of the program activities (summation). The focus is on the outcome.

If we refer to Kirkpatrick's four levels of evaluation, levels one and two (reaction and learning) are formative evaluations, while levels three and four (performance and impact) are summative evaluations. The reaction evaluation is a tool to help determine if the objectives can be reached, the learning evaluation is a tool to help reach the objectives, the performance evaluation is a tool to see if the objectives have actually been met, and the impact evaluation is a tool to judge the value or worth of the objectives. Thus, there are four major break points.

The various instruments used to collect the data are questionnaires, surveys, interviews, observations, and testing. The model or methodology used to gather the data should be a specified step-by-step procedure. It should be carefully designed and executed to ensure the data is accurate and valid.

Questionnaires are the least expensive procedure for external evaluations and can be used to collect large samples of graduate information. They should be trialed before use to ensure the recipients understand them the way the designer intended. When designing questionnaires, keep in mind that the most important feature is the guidance given for their completion. All instructions should be clearly stated; let nothing be taken for granted.

Revise System

Once a training deficiency has been noted, the ISD process is repeated to correct the deficiency. This does not mean that the entire training program is rebuilt, just the portions that had deficiencies or will be affected by the changes.

The Four Levels of Training Evaluation


Perhaps the best known training evaluation methodology is Kirkpatrick's Four Level Evaluation Model (1994) of reaction, learning, performance, and impact. The chart below shows how the evaluation process fits together:
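The chart from the original article is not reproduced here. As a stand-in, the sketch below summarizes the four levels as they are described in the sections that follow; the focus and instrument wording paraphrases this text, not Kirkpatrick's original taxonomy:

```python
# A compact summary of Kirkpatrick's four levels as described in this text.
# The focus/instrument entries paraphrase the sections that follow.
KIRKPATRICK_LEVELS = {
    1: {"name": "Reaction",
        "focus": "How learners perceive the course",
        "instruments": ["attitude questionnaires"]},
    2: {"name": "Learning",
        "focus": "Change in knowledge, skills, and attitudes",
        "instruments": ["pre-tests", "post-tests"]},
    3: {"name": "Performance",
        "focus": "Application of learning on the job",
        "instruments": ["on-the-job testing", "observation"]},
    4: {"name": "Results",
        "focus": "Impact on the organization",
        "instruments": ["organizational metrics", "cost-benefit data"]},
}

for level, info in KIRKPATRICK_LEVELS.items():
    print(f"Level {level} - {info['name']}: {info['focus']}")
```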




Level One - Reaction

As the word implies, evaluation at this level measures how the learners react to the training. It is often measured with attitude questionnaires that are passed out after most training classes. This level measures one thing: the learner's perception (reaction) of the course. Learners are keenly aware of what they need to know to accomplish a task. If the training program fails to satisfy their needs, a determination should be made as to whether the fault lies with the program's design or its delivery.



This level is not indicative of the training's performance potential, as it does not measure what new skills the learners have acquired or what they have learned that will transfer back to the working environment. This has caused some evaluators to downplay its value. However, the interest, attention, and motivation of the participants are critical to the success of any training program. People learn better when they react positively to the learning environment.

When a learning package is first presented, whether it be e-learning, classroom training, CBT, etc., the learner has to decide whether he or she will pay attention to it. If the goal or task is judged as important and doable, then the learner is normally motivated to engage in it (Markus & Ruvulo, 1990). However, if the task is presented as low-relevance or there is a low probability of success, then a negative effect is generated and motivation for task engagement is low. This differs somewhat from Kirkpatrick, who writes, "Reaction may best be considered as how well the trainees liked a particular training program" (1996).

The less relevant the learning package is to a learner, the more effort has to be put into its design and presentation. That is, if it is not relevant to the learner, then the learning package has to "hook" the learner through slick design, humor, games, etc. This is not to say that design, humor, or games are unimportant; however, their use in a learning package should be to promote the "learning process," not to promote the "learning package" itself. If a learning package is built on sound design, then it should help the learners to fix a performance gap. Hence, they should be motivated to learn! If not, something went dreadfully wrong during the planning and building processes. So if you find yourself having to hook the learners through slick design, then you probably need to reevaluate the purpose of the learning program. For more information on reaction, see Self-System.

Level Two - Learning

This is the extent to which participants change attitudes, improve knowledge, and increase skill as a result of attending the program. It addresses the question: did the participants learn anything? The learning evaluation requires post-testing to ascertain what skills were learned during the training. In addition, post-testing is only valid when combined with pre-testing, so that you can differentiate between what learners already knew prior to training and what they actually learned during the training program.
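As a minimal sketch of this pre-test/post-test comparison (all scores below are hypothetical, and the simple point-gain measure is an illustrative choice, not something prescribed by the source):

```python
# Illustrative pre-test/post-test comparison for a level-two evaluation.
# Names and scores are hypothetical; "gain" is simply post minus pre.
pre_scores  = {"ana": 40, "ben": 55, "carla": 60}   # percent correct before training
post_scores = {"ana": 75, "ben": 80, "carla": 85}   # percent correct after training

for learner in pre_scores:
    gain = post_scores[learner] - pre_scores[learner]
    print(f"{learner}: pre={pre_scores[learner]} post={post_scores[learner]} gain={gain}")

# Without the pre-test baseline, a high post-test score could simply mean
# the learner already knew the material before the training began.
avg_gain = sum(post_scores[k] - pre_scores[k] for k in pre_scores) / len(pre_scores)
print(f"average gain: {avg_gain:.1f} points")
```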



Measuring the learning that takes place in a training program is important in order to validate the learning objectives. Evaluating the learning that has taken place typically focuses on such questions as:

• What knowledge was acquired?
• What skills were developed or enhanced?
• What attitudes were changed?

Learner assessments are created to allow a judgment to be made about the learner's capability for performance. There are two parts to this process: the gathering of information or evidence (testing the learner) and the judging of the information (what does the data represent?). This assessment should not be confused with evaluation. Assessment is about the progress and achievements of the individual learners, while evaluation is about the learning program as a whole (Tovey, 1997, p. 88).


Evaluation in this process comes through the learner assessment that was built in the design phase. Note that the assessment instrument normally has more benefit to the designer than to the learner. Why? For the designer, building the assessment helps to define what the learning must produce. For the learner, assessments are statistical instruments that normally correlate poorly with the realities of performance on the job, and they rate learners low on the "assumed" correlatives of the job requirements (Gilbert, 1998). Thus, the next level is the preferred method of ensuring that the learning transfers to the job, but sadly, it is quite rarely performed.

Level Three - Performance (Behavior)

In Kirkpatrick's original four levels of evaluation, he names this level "behavior." However, behavior is the action that is performed, while the final result of the behavior is the performance. Gilbert said that performance has two aspects: behavior being the means and its consequence being the end (1998). If we were only worried about the behavioral aspect, then this could be done in the training environment. However, the consequence of the behavior (performance) is what we are really after: can the learner now perform in the working environment? This evaluation involves testing the students' capabilities to perform learned skills while on the job, rather than in the classroom. Level three evaluations can be performed formally (testing) or informally (observation). They determine if the correct performance is now occurring by answering the question, "Do people use their newly acquired learning on the job?"

It is important to measure performance because the primary purpose of training is to improve results by having the students learn new skills and knowledge and then actually apply them on the job. Learning new skills and knowledge is no good to an organization unless the participants actually use them in their work activities. Since level three measurements must take place after the learners have returned to their jobs, the actual measurement will typically involve someone closely involved with the learner, such as a supervisor. Although it takes a greater effort to collect this data than it does to collect data during training, its value is important to the training department and the organization, as it provides insight into the transfer of learning from the classroom to the work environment and into the barriers encountered when attempting to implement the new techniques learned in the program.


Level Four - Results

This measures the final results that occur, that is, the training program's effectiveness: "What impact has the training achieved?" These impacts can include such items as money, efficiency, morale, teamwork, etc. While it is often difficult to isolate the results of a training program, it is usually possible to link training contributions to organizational improvements. Collecting, organizing, and analyzing level four information can be difficult, time-consuming, and more costly than the other three levels, but the results are often quite worthwhile when viewed in the full context of their value to the organization.

As we move from level one to level four, the evaluation process becomes more difficult and time-consuming; however, it provides information of increasingly significant value. Perhaps the most frequently used type of measurement is level one, because it is the easiest to measure, yet it provides the least valuable data. Measuring results that affect the organization is considerably more difficult, thus it is conducted less frequently, yet it yields the most valuable information. Each evaluation level should be used to provide a cross-set of data for measuring the training program.



The first three levels of Kirkpatrick's evaluation (reaction, learning, and performance) are largely "soft" measurements; however, the decision-makers who approve such training programs prefer results (returns or impacts). That does not mean the first three are useless; indeed, their use is in tracking problems within the learning package:

• Reaction informs you how relevant the training is to the work the learners perform (it measures how well the training requirement analysis processes worked).
• Learning informs you of the degree to which the training package worked to transfer KSAs from the training material to the learners (it measures how well the design and development processes worked).
• Performance informs you of the degree to which the learning can actually be applied to the learner's job (it measures how well the performance analysis process worked).
• Impact informs you of the "return" the organization receives from the training. Decision-makers prefer this harder "result," although not necessarily in dollars and cents. For example, a recent study of financial and information technology executives found that they consider both hard and soft "returns" when it comes to customer-centric technologies, but give more weight to non-financial metrics (soft), such as customer satisfaction and loyalty (Hayes, 2003).

Note the difference between "information" and "returns." That is, the first three levels give you "information" for improving the learning package, while the fourth level gives you "impacts." A hard result is generally given in dollars and cents, while soft results are more informational in nature; instead of evaluating how well the training worked, they evaluate the impact the training has upon the organization. There are exceptions. For example, if the organizational vision is to provide learning opportunities (perhaps to increase retention), then a level two or level three evaluation could be used to provide a soft return.

This final measurement of the training program might be met with a more "balanced" approach or a "balanced scorecard" (Kaplan & Norton, 2001), which looks at the impact or return from four perspectives:


• Financial: A measurement, such as an ROI, that shows a monetary return, or the impact itself, such as how the output is affected. Financial measures can be either soft or hard results (a basic ROI sketch follows this list).
• Customer: Improving an area in which the organization differentiates itself from competitors to attract, retain, and deepen relationships with its targeted customers.
• Internal: Achieving excellence by improving such processes as supply-chain management, the production process, or support processes.
• Innovation and Learning: Ensuring the learning package supports a climate for organizational change, innovation, and the growth of individuals.
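For the financial perspective, one common way to express a hard monetary return is the basic ROI ratio. The source does not spell out the formula, so the sketch below uses the standard definition with hypothetical figures:

```python
# Basic ROI calculation for a training program (hypothetical figures).
# ROI = (net program benefits / program costs) * 100
program_costs = 50_000.0      # design, delivery, learner time, etc.
program_benefits = 80_000.0   # e.g., estimated value of productivity gains

roi_percent = (program_benefits - program_costs) / program_costs * 100
print(f"ROI: {roi_percent:.0f}%")   # ROI: 60%
```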

Item Analysis


One of the tools used in the evaluation process is an item analysis, used to "test the test." It ensures testing instruments measure the required behaviors needed by the learners to perform a task to standard. When evaluating tests we need to ask: do the scores on the test provide information that is really useful and accurate in evaluating student performance? The item analysis provides information about the reliability and validity of test items and learner performance.

Item analysis has two purposes (Brown, 1971): first, to identify defective test items, and second, to pinpoint the learning materials (content) the learners have and have not mastered, particularly what skills they lack and what material still causes them difficulty.

Item analysis is performed by comparing the proportion of learners who pass a test item in contrasting criterion groups. That is, for each question on a test, how many learners with the highest test scores (U) answered the question correctly or incorrectly compared with the learners who had the lowest test scores (L)? The upper (U) and lower (L) criterion groups are selected from the extremes of the distribution. The use of very extreme groups, say the upper and lower 10 percent, would result in a sharper differentiation, but it would reduce the reliability of the results because of the small number of cases utilized. In a normal distribution, the optimum point at which these two conditions balance out is 27 percent (Kelly, 1939).



NOTE: With the large and normally distributed samples used in the development of standardized tests, it is customary to work with the upper and lower 27 percent of the criterion distribution. Many of the tables used for the computation of item validity indices are based on the assumption that the "27 percent rule" has been followed. Also, if the total sample contains 370 cases, the U and L groups will each include exactly 100 cases, preventing the necessity of computing percentages. For this reason it is desirable in a large test item analysis to use a sample of 370 persons.

Because item analysis is often done with small classroom-size groups, a simpler procedure will be used here. This simple analysis uses a percentage of 33 percent to divide the class into three groups: Upper (U), Middle (M), and Lower (L). An example will be used for this discussion. In a class of 30 students we have chosen the 10 students (33 percent) with the highest scores and the 10 students (33 percent) with the lowest scores. We now have three groups: U, M, and L. The test has 10 items in it.


Next, we tally the correct responses to each item given by the students in the three groups. This can easily be done by listing the item numbers in one column and preparing three other columns named U, M, and L. As we go through each student's paper, we place a tally mark next to each item that was answered correctly. This is done for each of the ten test papers in the U group, then each of the ten test papers in the M group, and finally for each of the ten papers in the L group. The tallies are then counted and recorded for each group as shown in the table below.
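The original tally table is not reproduced here. As a stand-in, here is a minimal sketch of the grouping-and-tallying procedure just described, assuming each student's paper has already been scored item by item (the response data is randomly generated for illustration):

```python
# Split a class into Upper (U), Middle (M), and Lower (L) thirds by total
# score, then tally correct responses per test item within each group.
import random

NUM_ITEMS = 10
# Hypothetical data: for each of 30 students, a list of 10 booleans
# (True = answered that item correctly).
random.seed(1)
students = [[random.random() < 0.6 for _ in range(NUM_ITEMS)] for _ in range(30)]

# Rank students by total score, highest first.
ranked = sorted(students, key=sum, reverse=True)
third = len(ranked) // 3                  # 10 students per group for a class of 30
groups = {"U": ranked[:third], "M": ranked[third:2 * third], "L": ranked[2 * third:]}

# Tally correct responses to each item within each group.
tally = {g: [sum(s[i] for s in members) for i in range(NUM_ITEMS)]
         for g, members in groups.items()}

for item in range(NUM_ITEMS):
    print(f"item {item + 1}: U={tally['U'][item]} M={tally['M'][item]} L={tally['L'][item]}")
```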



A measure of item difficulty is obtained by adding the number passing each item in all three criterion groups (U + M + L), as shown in the fifth column. A rough index of the validity or discriminative value of each item is found by subtracting the number of persons answering it correctly in the L group from the number answering it correctly in the U group (U - L), as shown in the sixth column. Reviewing the table reveals five test items (marked with an *) that require closer examination; a sketch of these computations follows the list below.

• Item 2 shows a low difficulty level. It might be too easy, having been passed by 29 out of 30 learners. If the test item is measuring a valid performance standard, then it could still be an excellent test item.
• Item 4 shows a negative value. Apparently, something about the question or one of the distracters confused the U group, since a greater number of them marked it wrong than the L group. Some of the elements to look for are: wording of the question, double negatives, incorrect terms, distracters that could be considered right, or text that differs from the instructional material.
• Item 5 shows a zero discriminative value. A test item of this nature with a good difficulty rating might still be a valid test item, but other factors should be checked, e.g., was a large number of the U group missing from training when this point was taught? Was the L group given additional training that could also benefit the U group?
• Item 7 shows a high difficulty level. The training program should be checked to see if this point was sufficiently covered by the trainers or if a different type of learning presentation should be developed.
• Item 9 shows a negative value. The high value of the negative number probably indicates a test item that was incorrectly keyed.
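Continuing the sketch above, the difficulty (U + M + L) and discrimination (U - L) columns can be computed and the flagged patterns detected automatically. The tally counts and flagging thresholds below are hypothetical, chosen to mirror the five items just discussed:

```python
# Compute difficulty (U + M + L correct) and a rough discrimination index
# (U - L) per item, then flag the patterns discussed above. The tally
# counts are hypothetical, shaped to mirror the items discussed
# (2 very easy, 4 and 9 negative, 5 zero discrimination, 7 very hard).
CLASS_SIZE = 30
tally = {                      # correct answers per item, by group of 10
    "U": [8, 10, 9, 4, 6, 8, 2, 9, 1, 7],
    "M": [7, 10, 7, 6, 6, 6, 1, 7, 3, 6],
    "L": [5,  9, 4, 7, 6, 3, 0, 4, 8, 3],
}

for i in range(len(tally["U"])):
    difficulty = tally["U"][i] + tally["M"][i] + tally["L"][i]
    discrimination = tally["U"][i] - tally["L"][i]
    flags = []
    if difficulty >= CLASS_SIZE - 1:
        flags.append("* very easy")
    elif difficulty <= 3:
        flags.append("* very hard")
    if discrimination < 0:
        flags.append("* negative discrimination (mis-keyed or confusing?)")
    elif discrimination == 0:
        flags.append("* no discrimination")
    print(f"item {i + 1:2d}: difficulty={difficulty:2d} "
          f"discrimination={discrimination:+d} {' '.join(flags)}")
```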

As you can see, the item analysis identifies deficiencies either in the test or in the instruction. Discussing questionable items with the class is often sufficient to diagnose the problem. In narrowing down the source of difficulty, it is often helpful to carry out a further analysis of each test item. The table below shows the number of learners in the three groups who chose each option in answering the particular items. For brevity, only the first three test items are shown. The correct answers are marked with an *.

This analysis could be done with just the items that were chosen for further examination, or with the complete test. You might wonder why we would perform another analysis for the complete test if most of the test items proved valid in the first one. The answer is to see how well the distracters performed their job. To illustrate this, look at the distracters chosen for item 1. Although the first analysis showed this to be a valid test item, only distracters B and C were used by the learners. Nine learners chose distracter B, seven learners chose distracter C, while none chose distracter D. This distracter needs to be made more realistic or eliminated from the test item. This type of analysis helps us to further refine the testing instrument.
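A minimal sketch of this option-level tally for a single item, using hypothetical responses shaped like the item 1 discussion above (A is assumed to be the keyed answer):

```python
# Option-level distracter analysis for one test item. Counts are
# hypothetical, shaped like the item 1 discussion: the keyed answer is A,
# and distracter D is never chosen.
from collections import Counter

# Each entry: (group, option chosen) for the 30 learners on item 1.
responses = (
    [("U", "A")] * 8 + [("U", "B")] * 2 +
    [("M", "A")] * 4 + [("M", "B")] * 3 + [("M", "C")] * 3 +
    [("L", "A")] * 2 + [("L", "B")] * 4 + [("L", "C")] * 4
)

counts = Counter(responses)                  # (group, option) -> count
for option in "ABCD":
    row = {g: counts.get((g, option), 0) for g in "UML"}
    total = sum(row.values())
    marker = "*" if option == "A" else " "   # * marks the keyed answer
    print(f"{marker}{option}: U={row['U']} M={row['M']} L={row['L']} total={total}")

# An option no learner ever picks (D here) is doing no work as a
# distracter and should be revised or replaced.
```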


References

Brown, Frederick G. (1971). Measurement and Evaluation. Itasca, Illinois: F.E. Peacock.

Gilbert, T. (1998). A Leisurely Look at Worthy Performance. The 1998 ASTD Training and Performance Yearbook. Woods, J. & Gortada, J. (editors). New York: McGraw-Hill.

Hayes, M. (2003, Feb 3). Just Who's Talking ROI? Information Week, p. 18.

Kelly, T. L. (1939). The Selection of Upper and Lower Groups for the Validation of Test Items. Journal of Educational Psychology, Vol. 30, pp. 17-24.

Kirkpatrick, Donald (1994). Evaluating Training Programs. San Francisco, CA: Berrett-Koehler Publishers, Inc. (Note: Donald L. Kirkpatrick is an HRD Hall of Fame member.)

Markus, H. & Ruvulo, A. (1990). "Possible selves: Personalized representations of goals." Goal Concepts in Psychology. Pervin, L. (editor). Hillsdale, NJ: Lawrence Erlbaum, pp. 211-241.

Tovey, Michael (1997). Training in Australia. Sydney: Prentice Hall Australia.

Copyright 2000 by Donald Clark. Created October 9, 1995. Last updated November 5, 2004.

To cite this text:


Clark, D. (2000). Instructional System Development - Evaluation Phase. Retrieved November 28, 2006, from http://www.nwlink.com/~donclark/hrd/sat6.html#kirkpatrick
