Page 1



OCULAR ACTIVITY AS A MEASURE OF MENTAL AND VISUAL WORKLOAD M. Sage Jessee Army Research Laboratory, Human Research and Engineering Directorate, Redstone Arsenal, AL The mental workload construct is examined along with visual workload in order to discriminate the relationship between the converging constructs. Six ocular activity variables were measured in order to test convergent and discriminant properties with regards to visual and mental workload. Three pilot crews flew six flight scenarios in which subjective and physiological mental workload measures were analyzed across task difficulty and task differences (pilot on-controls versus pilot off-controls). Results indicate less subjective mental workload for the hover task compared to the action on contact task and greater ocular fixation duration variability for the lower difficulty task. Blink interval was greater for the pilot on-controls and saccadic extent was greater for the pilot off-controls. Data suggest that blink interval and saccadic extent are diagnostic of different aspects of visual workload, whereas fixation duration variability is sensitive to mental workload.

Not subject to U.S. copyright restrictions. DOI 10.1518/107118110X12829369835761

INTRODUCTION In many dynamic operational environments, workload data are gathered in order to assess the resource demands of the operator. Often these measures fail to capture subtle and quick shifts in workload, or do not provide diagnostic information regarding the cause of the shift. Furthermore, workload is such a broad construct that one measure typically only captures components, which is why the use of multiple workload measures are often suggested (Verwey & Veltman, 1996) in order to capture a more complete workload picture. It is suggested that the construct be broken down into resource components consistent with Wicken’s Multiple Resource Model (1980, 2000) and McCracken and Aldrich (1984) workload components in order to further clarify operator constraints during underload and overload events. Although non-parsimonious, further discerning the workload construct will enhance researchers’ ability to properly validate and implement various workload measures. One of the resource pools taken as a component of workload is the visual system. Visual workload (VWL) has been mentioned in several studies (Backs & Walrath, 1992; Hancock & Desmond, 2001; van der Horst, 2004; Verwey & Veltman, 1996; Wickens, Helleberg, & Xu, 2002; McCracken & Aldrich, 1984), but has received little attention compared to mental workload (MWL). In addition, VWL has received even less consideration regarding its definition and theoretical underpinnings. Here, VWL is defined as the effort required to detect visual information while MWL represents the servicing of information. Both of these detection and servicing events require effort from the operator and have various implications regarding the overall design of a system. For example, if the visual system has to exert too much effort in order to detect information that is broadly dispersed or buried within layers of pages on a dynamic display, then the operator runs the risk of not detecting prominent visual information. In this situation, VWL is too high, and the system must be redesigned such that visual information is easily accessible. This may be achieved in a number of ways, including but not limited to, optimizing font, font size, color, and the dispersion of visual information. Given operator information capacity limitations, especially in the aviation domain, the consequences of visual

overload can lead to inaccurate pictures of the current state of the aircraft and non-detection of important information. This could lead to poor decision making, decreases in overall performance, and eventually accidents. To date, several researchers have demonstrated some ocular activity variables correlated with VWL. Verwey and Veltman (1996) compared nine different workload assessment measures and suggest that eye blinks are diagnostic of VWL, because people tend to suppress eye blinks when they have to process visual information (Veltman & Gaillard, 1996). Saccadic extent as measure of VWL was alluded to by Wickens (2002) when he suggested that two overlapping visual information sources, if they are far apart, require an added effort of the operator when scanning between them. In addition to the amount of effort required to scan between highly dispersed information sources, saccadic suppression “occludes” vision during saccades. Thus, for longer saccades, vision tends to be occluded. Many ocular activity variables have been shown to be correlated with MWL, such as fixation duration (Tole, et al., 1983; Willems, Allen, & Stein, 1999), fixation frequency (Van Orden, Limbert, Makeig, & Jung, 2001), blink rate (Wilson, 2002), blink duration (Veltman & Gaillard, 1998; Ahlstom & Friedman-Berg, 2006), pupil diameter (Ahlstom & FriedmanBerg, 2006), blink frequency (Van Orden, Limbert, Makeig, & Jung, 2001; Van Orden, Jung, & Makeig, 2000), blink interval (Ryu & Myung, 2005) and saccadic extent (Van Orden, Limbert, Makeig, & Jung, 2001). Much of the blink data are correlated, but overall, this data suggests that during higher levels of mental workload, operators tend to blink quickly, inhibit blink activity, experience dilated pupils, and exhibit shorter saccades. The largest body of data supports the blink and pupil correlation to MWL and a smaller amount of literature supports the MWL correlation with saccadic extent. Fixation duration data is less clear. Willems, Allen, and Stein (1999) found that fixation durations decreased while MWL increased, while Tole, et al., (1983) found that fixation durations increased with MWL, especially for novice participants. The current study proposes that the following VWL and MWL ocular activity variables will diverge. VWL variables include blink interval, pupil size variability, and saccadic


extent. These variables were selected as VWL measures because they capture the effort exerted by the visual system as well as the occlusion of visual information (blink interval). Although there is no standardized manipulation check for VWL, it is suggested that VWL was higher for the pilot oncontrols compared to the pilot off-controls. The primary responsibility of the pilot on-controls is to aviate. Thus, the visual information relevant to his goal is dispersed throughout the environment, both inside and outside the cockpit. This implies that VWL will be higher because more effort is required to obtain visual information that is farther apart compared to visual information that is closely packed. The pilot off-controls, however, tends to be focused in a more localized area, the instrument cluster. MWL, however, has been correlated with pupil size, fixation duration, and fixation duration variability. It is suggested that these variables will diverge because the MWL variables do not demonstrate effortful visual activity but have still been correlated with MWL. METHOD Participants Participants (N = 6) were U.S. Army helicopter aviation pilots who flight-tested a simulator of the UH-60M upgrade Black Hawk helicopter. Research was conducted on Redstone Arsenal in Huntsville, AL, at the Software Engineering Directorate (SED) in a Systems Integration Laboratory (SIL). All pilots were male with an age range of 28 to 37 (M = 32.16). They were selected as part of a Limited Usability Test (LUT) conducted during the acquisition cycle of the UH-60M upgrade helicopter. Data from one of the pilots had to be dropped because the image capturing device was bumped during the flight scenario. All pilots were trained for the UH60M upgrade LUT during a one week period prior to data collection, but had no experience with the specific vignettes flown during the experimental period.


Task difference was also manipulated at two levels, pilot on-controls and pilot off-controls. To ensure equal task difficulty for the pilot on-controls and pilot off-controls, BWL ratings were compared. Results indicate that there were no significant differences for task difficulty for the pilot oncontrols and pilot off-controls, t(5) = 2.24, p > .05. The dependent variables included six ocular activity metrics, blink interval, pupil size variability, saccadic extent, pupil size, fixation duration, and fixation duration variability. Apparatus Ocular activity data were collected using an Applied Science Laboratory (ASL) Eye-Trac 6 head-mounted eye tracker and a laser-guided head tracker (LaserBird II) from Ascension Corporation. Thus, eye head integration was implemented in order to allow pilots to freely move their head. During the installation and set-up procedure, three scene planes were used to set up the head and eye tracker environment, which can be viewed in Figure 1. These include scene plane zero, the calibration scene plane, and both Multifunctional Display units (MFD), which corresponded to scene plane one and two for each pilot. The outside the window (OTW) scene corresponded to scene plane zero, while the Control Display Unit (CDU) was represented by scene plane three. Scene plane zero is an infinite extension, so it was used to capture data related to observations of the pilot’s notepad and controls. A complete eye-head integration system was employed for the pilot seat and the co-pilot’s seat. Thus, each system had three scene planes that were essentially mirrored images of each other.

Experimental Design The study was a 2 × 2, full factorial within-subjects design. The two independent variables were task difficulty and task difference. Task difficulty was manipulated at two levels, high and low workload, verified by Bedford Workload Scale (BWL) ratings. Tasks were derived from the Aircrew Training Manual (ATM). ATM task 2042 (perform actions on contact) and ATM task 1038 (perform hovering flight) were selected as the high and low workload tasks, respectively. These tasks were selected for two primary reasons. First, they were discrete events; second, the nature of those event initiation and termination points were such that data coders could consistently identify the start and stop points and be categorized reliably or without confounding broader situational events. Confirmation of the task difficulty levels (high and low workload) was indicated from the BWL ratings. BWL ratings for task 1038, hover (M = 1.58) and task 2042, action on contact (M = 4.25) were significantly different, t(11) = 8.6, p < .05.

Figure 1. Scene plane locations for pilot and co-pilot eye tracking systems Procedure After all required consent forms were completed, participants began a standard training procedure in order to familiarize themselves with the new components of the UH-


60M upgrade. Pilot training consisted of a one week long training course with a mix of classroom style lectures, desktop training, and simulation training. There was approximately two weeks between training and test execution with a one day refresher course preceding each crewâ&#x20AC;&#x2122;s series of trials. Once pilot training was completed, data collection began. During a three-week data collection period, three crews (pilot and copilot) flew six simulated flight scenarios in a one week period. The flight scenarios were designed by an experienced Black Hawk pilot with 2875 hours as an instructor pilot, 975 hours as a pilot, and 3550 hours as a pilot in command. They were developed in order to mimic typical flight scenarios for Black Hawk helicopters. Before entering the SIL, pilots went through a pre-flight briefing during which the scenario was explained. Once pilots entered the SIL, they were fitted with the ASL eye tracker, the eye tracker was calibrated, and then the pilots began the preflight tasks. Pilots were asked to complete mission objectives, and were instructed to fly as if it were a real-life situation. The purpose of this strategy was to increase ecological validity such that data can be generalized to real-world situations. After each vignette, pilots were instructed to complete the Bedford Workload Scale for a number of ATM tasks. Once pilots completed the vignettes and surveys, they were debriefed and returned the next day for two subsequent flight scenario. RESULTS A series of paired samples t-tests, and paired samples ttests of variance, were conducted to evaluate how the six ocular activity measures relate to task difficulty and task differences of pilots while operating a UH-60M upgrade simulator. A paired samples t-test of variance was conducted on fixation duration variance and pupil size variance. The ttest of variance is used to test if two variance parameters are equal (Glass & Hopkins, 1995). Essentially, it factors out of the standard error term, a non-zero correlation coefficient, since paired scores were expected to be correlated. Across task difficulty, it was expected that fixation duration, pupil size, and fixation duration variability would be significantly different. Results, illustrated in Table 1, indicate that the mean fixation duration variability was significantly higher for ATM task 1038, perform hovering flight (M = 0.25, SD = 0.11), than for ATM task 2042, perform action on contact (M = 0.15 SD = 0.05), t(13) = 1.86, p < .05. All other results indicate that there were no statistically significant differences for all other ocular activity comparisons across task difficulty. In order to evaluate which ocular activity measurements varied significantly as a function of task differences, a series of paired samples t-tests was conducted and the family wise error rate was controlled using the Bonferroni correction, which adjusts the alpha level to the equivalent .05 probability. A significant difference showed that the saccadic extent was significantly lower for the pilot on-controls (M = 8.29, SD = 1.73) versus the pilot off-controls (M = 11.21, SD = 3.88), t(13) = 2.31, p = .038. Blink interval was found to significantly increase for the pilot on-controls (M = 8.11, SD =


5.10) compared to the pilot off-controls (M = 2.84, SD = 1.24), t(13) = 4.09, p = .001. All other comparisons of ocular activity measures between task difficulty and task differences were not statistically significant. Additionally, no ocular activity variables were significant across both independent variables. Table 1: Means for Significantly Different Ocular Activity Values across Task Difficulty and Task Differences Task Difficulty Task Difference Ocular Action on OnOffHover Variable Contact Controls Controls Fixation .25 s .15 s Variability Saccadic 8.29 deg 11.21 deg Extent Blink 8.11 s 2.84 s Interval * s = seconds and deg = degrees of visual angle. DISCUSSION One of the primary motivating factors of this research was to attempt to discriminate VWL and MWL by examining how ocular activity variables change across task difficulty while holding task differences constant, and separately, by holding task difficulty constant across task differences. Data indicate that fixation duration variability is sensitive to task difficulty, while saccadic extent and blink interval are sensitive to differences in visual demands (Table 1), as indicated by variations between the pilot on-controls versus the pilot offcontrols. Specifically, fixation duration variability was greater during the hover task compared to the action on contact task; saccadic extent was greater for the pilot off-controls compared to the pilot on-controls; and blink interval was greater for the pilot on-controls compared to the pilot off-controls. All other comparisons were statistically unsupported. Saccadic Extent The data on saccadic extent indicate two potential conclusions. First, the VWL experience by the pilot offcontrols may have been significantly greater than the pilot oncontrols. Taken at face value, many would likely argue that the VWL demands of the pilot performing the primary aviation tasks (pilot on-controls) are more visually demanding than the pilot off-controls. However, one of the primary features of the simulation is a flying system referred to as flyby-wire, which allows pilots to automate much of the mechanical aviation process under conditions such as hovering. Although data were collapsed across both task differences (hover with automated flight controls and actions on contact with manual flight controls), this may have greatly reduced the VWL typical of the pilot on-controls, which would explain the reduced VWL experienced by the pilot oncontrols compared to the pilot off-controls. This would be similar to suggesting that a driverâ&#x20AC;&#x2122;s passenger who is in charge of navigation experiences a higher level of VWL because their saccades tend to transition from deep within the interior of a vehicle, perhaps a map on their lap, to outside the window.


This type of visual transition would tend to be a much greater distance than the driver’s saccadic extent between the outside the window view and the rear view mirror, as an example. A competing explanation indicates that the pilot offcontrols experienced excessive visual resources. The driver in the previous example must stay fixated on important visual information nearly all the time in order to successfully operate the vehicle. The distance of a saccade is highly influenced by the location of important visual information, which suggests that VWL, like MWL, is also a multi-faceted construct that is influenced by more than just the distance the eye moves, but also by the location and quality of important visual information. Further research should be conducted in order to investigate the expected value of visual information. This would provide designers information about how detrimental it is to place certain types of information outside the scope of an operator’s close visual range. Fixation Duration Variability Fixation duration has been shown to relate to information processing (Salthouse & Ellis, 1980), but the only data collected on fixation duration variability have varied by level of experience. The current study held level of operator experience constant but still found variability across task difficulty. This indicates that fixation duration variability may also be a sensitive measure of MWL, as it is manipulated by task demands and not only operator history, as reported by (Wikman, Nieminen, & Summala, 1998). The diagnostic value of fixation duration variability is similar to saccadic extent. Fixation duration variability was statistically different across the task difficulty variable, but not the task difference variable, indicating that it only changes as a function of MWL and not VWL. Blink Interval The data on blink interval have several implications. First, as reported by Verway and Veltman (1996), blink interval may be sensitive to subtle changes in MWL, but incapable of capturing larger magnitudes of task difficulty changes. Similar results were found in the present study because of the nature of the different tasks for which MWL data were collected. This would suggest that there are subtle changes in MWL between the pilot on-controls compared to the pilot offcontrols that blink interval data is sensitive to, but were not captured by the BWL data. Additionally, Veltman and Gaillard (1998) found that blink interval increased as more visual information had to be processed, but during a memory task, blink intervals decreased. This suggests that blink interval is diagnostic with respect to operator processing resources (visual versus cognitive). Although the current data set only provides evidence that blink interval varied across task differences, other researchers have found that blink interval is sensitive to MWL as well as VWL (Wierwille & Eggemeier, 1993). Future studies may look to examine the relationship between multiple blink variables, such as interval and duration, and tasks with more controlled visual and mental components.


CONCLUSIONS AND LIMITATIONS When interpreting this data, several limitations should be considered regarding the conclusions that are derived. First, further data must be collected in a more controlled environment in order to replicate these results. For example, the use of a low VWL task unrelated to aviating would be more favorable. Second, a VWL manipulation validation was unavailable for the current study, and would have provided data, aside from ocular activity data, regarding the VWL of the pilot on-controls versus the pilot off-controls. In conclusion, MWL is a multifaceted construct that requires several measures to fully capture. The purpose of measuring VWL is three-fold. First, it allows developers and evaluators the ability to gain information about the specific perceptual system that is being overloaded, or has spare capacity under certain task constraints, while concurrently providing information about more general aspects of workload. Second, it can inform the designer of specific aspects of the system that may need to be altered. For example, if a driver uses a vehicle telematics system with a visual display, it may draw the operator’s visual attention away from the road and reduce safety. Thus, the designer may implement an auditory system that alerts the driver of important navigation events that greatly reduce the need to focus visual attention on the display, offloading VWL onto the auditory system. Essentially, this means that measuring VWL provides information to designers and evaluators about the available capacity of the visual system, which can be used to inform design decisions and not purely diagnostics. Third, the author of this paper maintains that measuring VWL can provide construct clarification within the broader workload domain. The term MWL, or even more generally, workload, is often used to characterize task demands (difficulty, number, rate, or complexity of demands), the level of performance an operator is capable of achieving, an operator’s response to task demands as opposed to the demands directly, or an operator’s perceptions of task demands (Huey & Wickens, 1993). Thus, measuring the VWL construct separately from the broader workload domain would provide more specific and clear data regarding the nature of systems under evaluation, as well as their design implications. REFERENCES Ahlstrom, U., & Friedman-Berg, F. J. (2006). Using eye movement activity as a correlate of cognitive workload. International Journal of Industrial Ergonomics, 36, 623636. Backs, R. W., & Walrath, C. L. (1992). Eye movement and pupillary response indices of mental workload during visual search of symbolic displays. Applied Ergonomics, 23(4), 243-254. Glass, G. V., & Hopkins, K. D. (1995). Statistical methods in educational psychology. Needlham Heights, MA: Simon & Schuster. Hancock, P. A., & Desmond, P. A. (2001). Stress, workload and fatigue. Mahwah, NJ: Lawrence Erlbaum.


Huey, B. M., & Wickens, C. D. (1993). Workload transition: Implications for individual and team performance. Washington, DC, National Academy Press. McCracken, J. H., & Aldrich, T. B. (1984). Analyses of selcected LHX mission functions: Implications for operator workload and system automation goals. Technical Note ASI479-024-84; U.S. Army Research Institute Aviation Research and Development Activity: Fort Rucker, AL. Ryu, K., & Myung, R. (2005). Evaluation of mental workload with a combined measure based on physiological indices during a dual task of tracking and mental arithmetic. International Journal of Industrial Ergonomics, 35, 9911009. Salthouse, T. A., & Ellis, C. L. (1980). Determinants of eyefixation duration. American Journal of Psychology, 93(2), 207-234. Tole, J. R., Stephens, A. T., Vivaudou, M., Ephrath, A. R., & Young, L. R. (1983). Visual scanning behavior and pilot workload (NASA Contractor Rep. No. 3717). Hampton, VA: Langley Research Center. Van der Horst, R. (2004). Occlusion as a measure for visual workload: An overview of TNO occlusion research in car driving. Applied Ergonomics, 35, 189-196. Van Orden, K. F., Jung, T. P., & Makeig, S. (2000). Combined eye activity measures accurately estimate changes in sustained visual task performance. Biological Psychology, 52, 221-240. Van Orden, K. F., Limbert, W., Makeig, S., & Jung, T. (2001). Eye activity correlates of workload during a visuospatial memory task. Human Factors, 43(1), 111-121. Veltman, J. A., & Gaillard A. W. K. (1998). Physiological workload reactions to increasing levels of task difficulty. Ergonomics, 41(5), 656-669.


Verwey, W. B., & Veltman, H. A. (1996). Detecting short peaks of elevated workload: A comparison of nine workload assessment techniques. Journal of Experimental Psychology: Applied, 2, 270-285. Wickens, C. D. (1980). The structure of attentional resources. In R. Nickerson (Ed.), Attention and Performance VIII, (pp. 239-257). Hillsdale, NJ: Erlbaum. Wickens, C. D., & Hollands, J. G. (2000). Engineering psychology and human performance (3rd ed.). New Jersey: Prentice Hall. Wickens, C. D., Helleberg, J., & Xu, X. (2002). Pilot maneuver choice and workload in free flight. Human Factors, 44, 171-188. Wierwill, W. W., & Eggemeier, F. T. (1993). Recommendations for mental workload measurement in a test and evaluation environment. Human Factors, 35, 263-281. Wikman, A. S., Nieminen, T., & Summala, H. (1998). Driving experience and time-sharing during in-car tasks on roads of different width. Ergonomics, 41, 358-372. Willems, B., Allen, R. C., & Stein, E. S. (1999). Air traffic control specialist visual scanning II: Task load, visual noise, and intrusions into controlled airspace (DOT/FAA/CT-TN99/23). Federal Aviation Administration, William J. Hughes Technical Center. Atlantic City International Airport, NJ. Wilson, G. F. (2002). An analysis of mental workload in pilots during flight using multiple psychophysiological measures. International Journal of Aviation Psychology, 12,(1), 3-18.

Ocular Activity as a Measure of Mental and Visual Workload  

Some of what I do.