User-Experience from an Inference Perspective PAUL VAN SCHAIK, Teesside University MARC HASSENZAHL, Folkwang University JONATHAN LING, University of Sunderland

In many situations, people make judgments on the basis of incomplete information, inferring unavailable attributes from available ones. These inference processes may well also operate when judgments about a product’s user-experience are made. To examine this, an inference model of user-experience, based on Hassenzahl and Monk’s [2010], was explored in three studies using Web sites. All studies supported the model’s predictions and its stability, with hands-on experience, different products, and different usage modes (action mode versus goal mode). Within a unified framework of judgment as inference [Kruglanski et al. 2007], our approach allows for the integration of the effects of a wide range of information sources on judgments of user-experience.

Categories and Subject Descriptors: H.1.2 [Models and Principles]: User/Machine Systems—Human information processing; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Theory and methods; H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia—Theory; I.6.5 [Simulation and Modeling]: Model Development

General Terms: Experimentation, Human Factors, Theory

Additional Key Words and Phrases: User-experience, model, inference perspective, beauty, aesthetics

ACM Reference Format: van Schaik, P., Hassenzahl, M., and Ling, J. 2012. User-experience from an inference perspective. ACM Trans. Comput.-Hum. Interact. 19, 2, Article 11 (July 2012), 25 pages. DOI = 10.1145/2240156.2240159 http://doi.acm.org/10.1145/2240156.2240159

Authors’ addresses: P. van Schaik, School of Social Sciences and Law, Teesside University, United Kingdom; email: p.van-schaik@tees.ac.uk; M. Hassenzahl, Ergonomics in Industrial Design, Folkwang University, Germany; J. Ling, Faculty of Applied Sciences, University of Sunderland.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from the Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701, USA, fax +1 (212) 869-0481, or permissions@acm.org.

© 2012 ACM 1073-0516/2012/07-ART11 $15.00 DOI 10.1145/2240156.2240159 http://doi.acm.org/10.1145/2240156.2240159

ACM Transactions on Computer-Human Interaction, Vol. 19, No. 2, Article 11, Publication date: July 2012.

1. INTRODUCTION

Imagine you want to enhance your voice-over-IP calls with a high-definition image. By coincidence, a local shop makes an exceptional offer (in terms of “value for money”) of a multifunctional (“all-singing-all-dancing”) webcam. Will you accept? The problem is to predict whether or to what extent the product would meet your needs. As you have no hands-on experience, you visit the shop to see for yourself what the product looks like in reality and to get further information from the helpful staff. However, you are not allowed to open the attractive transparent box in which the seductive product patiently awaits your expenditure. You simply cannot try the product before buying it. Therefore, in effect, you try to “guess”, or infer, the product’s reliability, usefulness and ease of use from the specific pieces of information that you find relevant. This type of inference is a ubiquitous process, which underlies many phenomena [Kardes et al. 2004b; Loken 2006]; some even argue that it is the very essence of human judgment [Kruglanski et al. 2007]. The purpose of the current studies was to investigate how people infer specific attributes of an interactive product from other attributes or broader evaluations, such as beauty or an overall evaluation (e.g., goodness).

1.1 Inference in Judgments about Interactive Products

While inference is ubiquitous, available models of the perceived quality of interactive products, from Technology Acceptance [Venkatesh et al. 2003] to User-Experience (UX) [Hartmann et al. 2008; Hassenzahl 2003; Lavie and Tractinsky 2004], predominantly assume only one particular pattern of inference: induction or a specific-to-general inference [Kardes et al. 2004b]. This approach suggests that overall assessments or attitudes are “built” from the careful consideration, weighting and integration of specific attributes, such as usability, functionality, expressive aesthetics, hedonic quality, and/or engagement. The approach is reminiscent of computational, multi-attribute theories of decision-making [Keeney and Raiffa 1976]. These assume that people construct their overall assessment from single, distinct and specific attributes, which are assessed separately (e.g., “How usable is the product?”), weighted (e.g., “How important is usability to me?”) and combined into an overall evaluation. While studies seem to provide some support for the induction of general value from specific attributes, it should not be taken as the major or even the only process that operates. Current approaches to judgment and decision making take a rather noncomputational approach [Chater and Brown 2008; Chater et al. 2003; Gigerenzer and Gaissmaier 2011; Kruglanski and Gigerenzer 2011]. Supported by a wealth of empirical evidence [Gigerenzer and Gaissmaier 2011], these approaches suggest that, rather than doing complex (weighted and summated) calculations to induce, people use relatively simple cognitive strategies (e.g., “simple rules” [Chater and Brown 2008]; “heuristics” [Gigerenzer and Gaissmaier 2011]) to make judgments. This type of processing is due to the way the world is structured in terms of available information. Induction assumes, for example, some knowledge about each attribute, which is only rarely available [Gigerenzer and Gaissmaier 2011].
People may simply lack hands-on experience of a product, which makes it difficult to assess its usability. Nevertheless, people make global value judgments and more specific attribute judgments, even when information is absent or limited. They do so by inferring unobservable, momentarily hard-to-assess product attributes from their global valuation of the product (i.e., general-to-specific) or by inferring them from other, more accessible attributes (i.e., specific-to-specific). At the heart of this inference process are rules, which tie together available and unavailable information. These rules are based on lay theories and knowledge about their applicability in a particular situation. A well-known inference rule, for example, is that the more expensive a product, the higher the quality (price-quality correlation, e.g., Kardes et al. [2004a]). People, for example, guess the taste (quality) of wine, based on its price. Note that the application of rules is context-dependent. People do not apply the price-quality rule when a product is on special offer [Chernev and Carpenter 2001]. In addition, the application of inference rules is not necessarily conscious or deliberate; it can also be automatic, unconscious and hardly accessible to a particular individual [Kardes et al. 2004b; Kruglanski et al. 2007].

1.2 Inference in User Experience

The variety of potential rules to infer attributes from other attributes has interesting implications for the study of UX, preferences, and acceptance. In most studies, data collection is concurrent, that is, all constructs are assessed simultaneously. Researchers then evaluate their specified model, which is almost exclusively inductive. However, disregarding potential deduction (e.g., general-to-specific inference) can easily lead to false conclusions. Consider the following hypothetical example. Participants are asked to assess the (perceived) usability and innovativeness of a given product and its general appeal. The researcher regresses appeal on usability and innovativeness to determine the relative importance of usability and innovativeness in explaining appeal. Now assume that people have only limited or even no hands-on experience with the product. This makes their assessment of usability difficult: without hands-on experience, the question “Do I find the product predictable?” is hard to answer. On the other hand, from the description of the product concept and what they see, they may tell right away whether they find it innovative or not. People who appreciate innovativeness will provide high appeal ratings, others low ones. Innovativeness becomes predictive of appeal. Usability, however, cannot be assessed as easily. People nevertheless provide an assessment when prompted to do so. However, rather than inferring usability from actual specific information, they may deduce it from their general appeal rating, following the simple rule of “I like it, it must be good on all attributes” (which is simply a version of the ubiquitous “halo-effect” [Thorndike 1920]). The consequence is that people who find the product appealing (mainly due to its innovativeness) provide higher ratings of usability as well. However, the researcher applying an inductive model to the data will reach quite a different conclusion. Usability becomes highly predictive of appeal. It becomes “important” for any judgment of appeal.
However, this is not a consequence of usability’s role in forming an overall judgment (of appeal in this case), but a consequence of people deducing a hard-to-assess attribute (e.g., usability) from general value (e.g., appeal). As long as people strive for consistency in their judgments, these “hypothetical” effects are likely to be responsible for many findings and potentially false beliefs in UX and information systems (IS) research. The potential different interpretations of people’s judgment of attributes and general values alone justify the quest for a better understanding of the structure of UX models. Take the study of Cyr et al. [2006] as one of many examples. In their study, 60 participants used a single mobile Web site for five to fifteen minutes to perform some given information retrieval tasks. Subsequently, design aesthetics, perceived usefulness, perceived ease-of-use, perceived enjoyment, and loyalty were assessed concurrently with a questionnaire. Their suggested model assumes that design aesthetics is used to infer three aspects: enjoyment, ease-of-use, and usefulness. Ease-of-use is then in turn used to infer enjoyment and usefulness. From the latter two, loyalty is inferred. Given the mechanisms of human judgment, this particular model is hard to justify. For example, one could easily argue that participants got an impression of design aesthetics from looking at the site and ease of use from using the site. From these two pieces of specific information, usefulness and enjoyment are then inferred separately and combined into loyalty. In this interpretation, the effect of design aesthetics on ease of use is spurious, perhaps due to the correlation of both these variables with usefulness and enjoyment. Similarly, the effects of ease of use on enjoyment and of design aesthetics on enjoyment would be spurious. 
The point here is that without the notion of inference and a careful consideration of how assessments are potentially made in different situations (boundary conditions), the theoretical justification of a model is almost impossible. However, this is exactly what is needed, because whatever sophisticated statistical modeling techniques are employed, the results and their credibility depend on the specification of the model, which is outside of statistical considerations.
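The hypothetical usability/innovativeness example in this section can be made concrete with a small simulation. The sketch below is illustrative only: the sample size, the weights (0.7, 0.8) and the noise levels are assumptions chosen for the demonstration, not values from any study. Appeal is generated from innovativeness alone, and usability ratings are then deduced from appeal (a halo effect); an inductive regression of appeal on both attributes nevertheless assigns usability a large weight.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def std_betas(y, x1, x2):
    """Standardized OLS weights of x1 and x2 when predicting y."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 2000                # hypothetical number of raters (assumption)

# Innovativeness is directly observable from the product concept.
innovativeness = [rng.gauss(0, 1) for _ in range(n)]
# Appeal is driven by innovativeness only (assumed weight 0.7).
appeal = [0.7 * i + rng.gauss(0, 0.7) for i in innovativeness]
# Usability cannot be observed, so raters deduce it from appeal (halo).
usability = [0.8 * a + rng.gauss(0, 0.6) for a in appeal]

# The "inductive" analysis: appeal predicted from the two attributes.
beta_usability, beta_innovativeness = std_betas(appeal, usability, innovativeness)
print(f"usability weight:      {beta_usability:.2f}")
print(f"innovativeness weight: {beta_innovativeness:.2f}")
```

Under these assumptions the regression cannot distinguish "usability shapes appeal" from "appeal shapes usability ratings"; only the model specification, which lies outside the statistics, can.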

Fig. 1. Basic inference model.

1.3 Taking an Inference Perspective in User-Experience Research

Despite the potential implications (in terms of better-specifying UX models) of considering various types of situation-dependent inference rules beyond the specific-to-general that is predominantly used, neither the field of IS nor human-computer interaction (HCI) seems to explicitly consider an inference perspective. A recent exception is Hassenzahl and Monk’s work [2010]. They argue that “beauty” should be thought of as an affect-driven, evaluative response to the visual Gestalt of an interactive product [Hassenzahl 2008]. This has two implications: first, judgments of beauty are always based on information; they require only a visual input, which is almost always available. Second, its predominantly affective nature makes it very quick [Tractinsky et al. 2006]. Both characteristics point to beauty as an important starting point for inferring other attributes, which are at least initially hard to access, due to, for example, a lack of hands-on experience or other missing information. Therefore, Hassenzahl and Monk [2010] argued that inference from beauty is extremely likely, especially in the absence of any further experience, and that it may remain a dominant mode of judgment even after later hands-on experience. They further suggested specific rules that govern inference for interactive products. Due to its evaluative nature, beauty is an important input to the general evaluation of the product (goodness) (the direct link from Beauty to Goodness in Figure 1). This is reminiscent of Dion et al.’s [1973] classic “what is beautiful is good”: a stereotype judgment in person perception. Two further constructs are of interest in the IS and HCI literature: perceived pragmatic quality (broadly related to perceived usability, perceived ease-of-use) and perceived hedonic quality (broadly related to perceived enjoyment, novelty, stimulation) [Hassenzahl 2010].
Taking an inference perspective, Hassenzahl and Monk [2010] proposed two distinct rules to account for the inference of pragmatic and hedonic quality from beauty. The link between pragmatic quality and beauty is indirect. It is a consequence of evaluative consistency [Lingle and Ostrom 1979], where individuals infer unavailable attributes from general value (goodness) to keep their judgments consistent (the path from Beauty to Goodness and from Goodness to Pragmatic quality). In contrast to Pragmatic quality, Hedonic quality is directly inferred from beauty (the direct link from Beauty to Hedonic quality in Figure 1). According to probabilistic consistency [Ford and Smith 1987], individuals infer unavailable attributes directly from a specific available attribute believed to be conceptually or even causally linked to the unavailable attribute. In other words, while people may hold specific beliefs about how beauty and hedonic quality are related, any observable link between beauty and pragmatic quality is just the consequence of people inferring overall quality (goodness) from beauty and then inferring conceptually different specific product attributes from overall quality.

Fig. 2. Hassenzahl and Monk’s [2007] test results of an inference model. Figures are (standardized) path coefficients (with bivariate correlations in brackets). Bold signifies statistical significance (p < .05).

Hassenzahl and Monk [2007, 2010] put these rules to an initial test. Figure 2 shows the results of LISREL analysis from a sample of 430 assessments of 21 different interactive products. The sample was recruited using the online questionnaire tool AttrakDiff2 (footnote 1). The data were fully anonymized; thus, nothing can be said about the specific interactive products constituting this sample. The analysis reported was published in Hassenzahl and Monk [2007], but the data are identical with Dataset 4 of Hassenzahl [2010]. However, the latter paper used a different analysis strategy and rather focused on the inference of pragmatic quality (i.e., perceived usability) from beauty. For the current paper, the LISREL analysis from 2007 is more illustrative. As expected, the relation between beauty and goodness was substantial (.71) and stronger than the relations between goodness and pragmatic quality (.65) and between goodness and hedonic quality (.48). Moreover, the link between beauty and pragmatic quality, which was substantial as a bivariate correlation (.53), completely disappeared when goodness was included in the analysis (path coefficient: −.07), emphasizing the fully mediated nature of the link between beauty and pragmatic quality (i.e., an example of evaluative consistency). In contrast, the direct link between beauty and hedonic quality remained intact (path coefficient: .42), hinting at a partial mediation, where some of the hedonic quality is directly inferred from beauty (i.e., an example of probabilistic consistency) and some indirectly through goodness (i.e., an example of evaluative consistency).
The results provide strong support for an inference-based model, which was substantiated with three further studies [Hassenzahl and Monk 2010]. In sum, Hassenzahl and Monk [2007, 2010] suggest that (1) beauty is a direct determinant of goodness, (2) beauty is an indirect determinant of pragmatic quality, operating through goodness, and (3) beauty is a direct determinant of hedonic quality.
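The pattern described here, a substantial bivariate correlation that vanishes once the mediator is included, can be reproduced with synthetic data. The sketch below is not a re-analysis of the reported dataset: the path weights (0.7 and 0.65), the sample size and the use of ordinary regression on observed scores (rather than LISREL on latent variables) are all assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def std_betas(y, x1, x2):
    """Standardized OLS weights of x1 and x2 when predicting y."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


rng = random.Random(2)
n = 5000  # hypothetical number of assessments (assumption)

beauty = [rng.gauss(0, 1) for _ in range(n)]
# Goodness is inferred from beauty (assumed standardized weight 0.7).
goodness = [0.7 * b + rng.gauss(0, sqrt(1 - 0.7 ** 2)) for b in beauty]
# Pragmatic quality is inferred from goodness only: full mediation.
pragmatic = [0.65 * g + rng.gauss(0, sqrt(1 - 0.65 ** 2)) for g in goodness]

r_beauty_pragmatic = pearson(beauty, pragmatic)
beta_beauty, beta_goodness = std_betas(pragmatic, beauty, goodness)

print(f"bivariate r(beauty, pragmatic): {r_beauty_pragmatic:.2f}")
print(f"direct path beauty -> pragmatic: {beta_beauty:.2f}")
print(f"path goodness -> pragmatic:      {beta_goodness:.2f}")
```

In this setup beauty and pragmatic quality correlate only because both are tied to goodness, so the direct path shrinks towards zero when goodness enters the regression.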

Footnote 1: http://www.attrakdiff.de

1.4 Aims of the Current Study

While representing a first step towards more adequately addressing the variety of human judgment processes, the studies of Hassenzahl and Monk [2010] left a number of questions unanswered. We aim to explore some of these in the present paper. We report three independent, complementary studies, which focus on the following aspects.

Replicability of the suggested inference for single products. Monk [2004] strongly urged us to avoid a “fixed-effect fallacy” when studying the relationship between attribute judgments about products. In brief, the argument was that typically in these studies participants as well as products contribute variance. Accordingly, we need to carefully sample products as well and need to make sure that all models hold for people (i.e., subjects analysis) and products (i.e., materials analysis) alike. Hassenzahl and Monk [2010] tested the inference model (as described in Figure 1) on four independent datasets, in a subjects as well as a materials analysis, and found the model was stable. However, while methodologically sound, the requirement of having a sample of products does not fit well with practices of HCI or the domain of UX. Typically, practitioners and researchers alike evaluate a single product by handing out questionnaires to a sample of people. While the results from Hassenzahl and Monk [2010] suggest that the inference model should also hold for a single product (i.e., when the product variance is held constant), this has not yet been tested. Therefore, we used a different single product in each of the three studies. Our first aim was simply to replicate the inference model.

Effects of hands-on experience. In Hassenzahl and Monk [2010], all participants were instructed to have brief hands-on experience with a particular product. Even when assuming that inference is a stable process, which easily overrules brief hands-on experience, this is not the most straightforward way of testing the model.
Accordingly, the present studies contrasted two different measurements, one based on the presentation of an interactive product only and another after verifiable hands-on experience. The inference model we have outlined should hold for the “presentation-only” condition (in the sense of a control condition, where an interactive product is presented, but users do not interact with the product). In addition, this study design allows testing of the stability of the model, given additional hands-on experience. Our second aim was to explore potential effects of hands-on experience on the model.

Types of experience. In most studies (though see van Schaik and Ling [2011]), hands-on experience is either task-oriented (i.e., people are asked to complete given tasks, such as the information retrieval tasks provided by Cyr et al. [2006]) or left to the participants [e.g., Hassenzahl and Monk 2010]. Inspired by Apter [1989], Hassenzahl [2003] conceptualized the psychological consequences of the different situations while interacting with a product in goal mode or action mode. In goal mode the fulfillment of a given goal is to the fore. The goal has clear relevance and determines all actions. The product is therefore just a means to an end. While interacting with a product, people in goal mode try to be effective and efficient. In action mode the action itself is to the fore. It determines rather volatile goals during use. Using the product is an end in itself. Several studies [Hassenzahl and Ullrich 2007; van Schaik and Ling 2009, 2011] revealed a profound effect of mode of use on how products are judged, which merits its inclusion. In the present paper, we studied experience in action mode in Study 1, by not specifying any particular task, and asking participants to just explore the artifact. In Study 2, we introduced specific information-retrieval tasks (experience in goal mode), which enabled us to examine potential differences.
In Study 3, we additionally varied task complexity to introduce more or less demand in goal mode [van Oostendorp et al. 2009; van Schaik and Ling (to appear)]. Task complexity (path length, defined as the number of steps involved in finding the information) has a negative effect on task performance, owing to the increased probability of selecting a wrong link on longer paths or of misjudging the relevance of presented information [Gwizdka and Spence 2006]. For Study 3, we assumed that the manipulation of task complexity would influence perceptions of pragmatic quality but not of hedonic quality. Hassenzahl [2001], for example, found pragmatic and hedonic quality to be independent after hands-on experience in a goal-oriented mode. In that study, subjective mental effort (a consequence of task demands and/or usability problems) negatively impacted pragmatic quality, but not hedonic quality. We assume a similar asymmetry here. Thus, in addition to studying the inference model under two different usage modes (Study 1 and Study 2), in Study 3 we set out to deliberately manipulate experience by making the task more or less complex and to explore its effects on the prediction of the inference model. Note that, within the framework of the person-artifact-task model [Finneran and Zhang 2003; van Schaik and Ling, to appear], task complexity is only one of the possible variables that affect judgment. Existing models of judgment in HCI [Hartmann et al. 2008; Hassenzahl 2003] address external variables (such as person, artifact and task), but can be classified uniformly as specific-to-general. This limits their potential when applied to the general-to-specific type of inference investigated in the current research. In other words, these models are useful in highlighting the effects of external variables, but do not illuminate how these variables would affect the specific inference rules suggested in the present paper. Our third aim was to explore how well the inference model works across different types of experience.
In the following, we report three independent studies. For each we expect the basic inference model described previously to hold (Figure 1). However, we set out to clarify whether the model replicates when different single products are used (Aim 1, all studies). To do so, we employed three different Web sites that were not homogeneous and varied in familiarity to participants, in order to establish the generality of the results over a range of artifacts. In addition, we studied the effect of additional hands-on experience (Aim 2, all studies) by deliberately comparing judgments before and after actual experience. This comparison provides information about the persistence of an inference model when users gain additional information in the process of product use. To further broaden the scope and potential generality of the inference model (Aim 3), we employed two different types of experience (activity-oriented versus goal-oriented, Study 1 versus Study 2 and Study 3) and even deliberately varied experience through an external factor (in Study 3).

2. STUDY 1: ACTION-MODE

2.1 Method

2.1.1 Participants. Ninety-four undergraduate psychology students (73 females and 21 males), with a mean age of 24 years (SD = 9), took part in the experiment as a course requirement. All participants had used the World Wide Web and all but two had used the target Web site (Wikipedia). Mean expertise using the Web was 8 years (SD = 3), mean time per week spent using the Web and Wikipedia was 17 h and 3 h, respectively (SD = 12/7), and mean frequency of Web/Wikipedia use per week was 17/2 times (SD = 15/4).

2.1.2 Materials and Equipment. Participants gave responses to a 10-item short version of the AttrakDiff2 questionnaire [Hassenzahl and Monk 2010], consisting of 7-point semantic-differential items (see Appendix A). The following constructs were measured: pragmatic quality (four items), hedonic quality (four items), beauty (one item) and goodness (one item) (footnote 2). Participants used Wikipedia’s Web site as it existed in January 2010 (see Figure 3 for a sample Web page). The experiment in this study and the following were programmed in Visual Basic 6 and ran on personal computers (Intel Pentium, 1.86 GHz, 2 GB RAM, Microsoft Windows XP operating system, 17-inch monitors); the screen dimensions were 1280 × 1024; contrast (50%) and brightness (75%) were set to optimal levels.

Footnote 2: Refer to Appendix B, Tables II and III for a summary of the scales’ psychometric properties.

Fig. 3. Sample Web page (Experiment 1). Wikipedia.org, Creative Commons License 3.0.

2.1.3 Procedure. The study consisted of two phases and ran in a computer laboratory with groups of 15–20 participants working independently. In Phase 1, each participant was introduced to the Web site through self-paced presentation of 10 noninteractive screenshots of different pages from Wikipedia’s Web site. Participants then completed the short version of the AttrakDiff2 questionnaire. In Phase 2, participants were free to use the same Web site to explore their own interests for 20 minutes (i.e., action mode). The median number of pages visited was 25, with a semi-interquartile range of 22. After this, participants again completed the AttrakDiff2 and answered demographic questions. The study took about 35 minutes to complete. The procedure used in this and the following study represents an extension of that used by Hassenzahl and Monk [2010] in that there were two separate phases for presentation only and additional hands-on experience, with measures taken at the end of each phase, and having a fixed set of screenshots for presentation and a predefined task for interaction.

2.1.4 Data Analysis with PLS. Partial-least-squares (PLS) path modeling [Vinzi et al. 2010] was used for data analysis in all three studies for the following reasons. PLS allows the analysis of both single-stage and multi-stage models with latent variables, allowing the integrated analysis of a measurement model and a structural model. Each latent variable (usually a psychological construct) is measured using one or more manifest variables (usually items). PLS is compatible with multiple regression analysis, analysis of variance and unrelated t-tests: the results of these techniques are special cases of the results of PLS, but these techniques do not account for measurement error, whereas PLS does. PLS does not require some of the assumptions imposed by covariance-based structural equation modeling, including those of large sample sizes, and univariate and multivariate normality. All PLS analyses reported here satisfied the following minimum requirement for robust estimations in PLS path modeling [Henseler et al. 2009]: the larger of (a) ten times the number of indicators of the scale with the largest number of formative indicators and (b) ten times the largest number of structural paths directed at a particular construct in the inner path model. Recent simulation studies have demonstrated that PLS path modeling performs at least as well as and, under various conditions, is superior to covariance-based structural equation modeling in terms of bias, root mean square error and mean absolute deviation [Hulland et al. 2010; Vilares et al. 2010]. For a consistent approach, the data analyses for all studies were conducted with PLS by way of the SmartPLS software [Ringle et al. 2005], unless stated otherwise. A bootstrapping procedure (N = 5000, as Henseler et al. [2009] recommend) was used to test the significance of model parameters.
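The minimum-sample-size requirement quoted above is a simple maximum of two products, which the following sketch spells out. The function name and the example inputs are hypothetical: the text does not state how many formative indicators the scales have, so zero is assumed here, and three inbound structural paths are assumed for the most heavily predicted construct. The sample sizes (94 and 66) are the participant numbers reported for Study 1 and Study 2.

```python
def min_sample_size(max_formative_indicators, max_inbound_paths):
    """Ten-times rule: the larger of (a) and (b) as described in the text."""
    rule_a = 10 * max_formative_indicators  # (a) largest formative block
    rule_b = 10 * max_inbound_paths         # (b) most paths into one construct
    return max(rule_a, rule_b)


# Assumed model properties, for illustration only.
required = min_sample_size(max_formative_indicators=0, max_inbound_paths=3)

# Participant numbers as reported for the first two studies.
samples = {"Study 1": 94, "Study 2": 66}
for study, n in samples.items():
    status = "meets" if n >= required else "falls short of"
    print(f"{study}: N = {n} {status} the minimum of {required}")
```

The rule is only a lower bound on sample size; it says nothing about statistical power for small effects.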
Each indirect (mediated) effect (e.g., the effect of beauty on pragmatic quality mediated by goodness) was calculated as the product of the two constituent direct effects (e.g., of beauty on goodness and of goodness on pragmatic quality) comprising the indirect effect. Then bootstrapping was used to test the indirect effect. In particular, each bootstrap sample produced parameter estimates for the constituent direct effects, from which the indirect effects were calculated. The mean and standard error of this calculated estimate were computed over the bootstrap samples and, from these, a t-statistic was calculated to test the significance of the indirect effect. The total effect (e.g., of beauty on pragmatic quality) was broken down into the indirect effect (e.g., of beauty on pragmatic quality mediated by goodness) and the direct effect (e.g., of beauty on pragmatic quality with goodness held constant). In each experiment, tests of the difference between regression coefficients before and after interaction were conducted to test their stability. In a single analysis, each bootstrap sample produced all coefficients (for presentation-only and hands-on experience). The mean difference and the standard error of the difference of each pair of coefficients (for presentation-only and hands-on experience) were computed and, from these, a t-statistic was calculated to test the significance of the difference.

2.2 Results and Discussion

Figure 4 shows the results of the proposed inference model (see Figure 1) for (a) presentation-only and (b) additional hands-on experience. Presented are (a) standardized path coefficients, with figures in brackets representing indirect effects, and (b) the variance in each endogenous latent variable that is explained by the direct effects on it. For example, in Figure 4(a), 39% (R² = .39) of variance in pragmatic quality was explained by the direct effects of beauty and goodness, while 35% (R² = .35) of variance in hedonic quality was explained by beauty, goodness and pragmatic quality.
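The bracketed indirect effects and their significance tests follow the bootstrap procedure described in Section 2.1.4. A minimal sketch of that logic is given below, with several stated simplifications: it uses ordinary regression on observed scores instead of PLS latent-variable scores, synthetic data with assumed weights of 0.7, and 1000 rather than 5000 resamples to keep the run short. The function names are hypothetical and do not correspond to SmartPLS.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def path_estimates(beauty, goodness, pragmatic):
    """Standardized paths: a (beauty->goodness), b (goodness->pragmatic), direct."""
    a = pearson(beauty, goodness)
    r_pb = pearson(pragmatic, beauty)
    r_pg = pearson(pragmatic, goodness)
    b = (r_pg - r_pb * a) / (1 - a ** 2)       # goodness, beauty held constant
    direct = (r_pb - r_pg * a) / (1 - a ** 2)  # beauty, goodness held constant
    return a, b, direct


def bootstrap_indirect(beauty, goodness, pragmatic, n_boot=1000, seed=0):
    """Mean, standard error and t-statistic of a*b over bootstrap samples."""
    rng = random.Random(seed)
    n = len(beauty)
    products = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants
        a, b, _ = path_estimates([beauty[i] for i in idx],
                                 [goodness[i] for i in idx],
                                 [pragmatic[i] for i in idx])
        products.append(a * b)
    mean = sum(products) / n_boot
    se = sqrt(sum((p - mean) ** 2 for p in products) / (n_boot - 1))
    return mean, se, mean / se


# Synthetic ratings for 94 hypothetical participants (weights are assumptions).
rng = random.Random(3)
beauty = [rng.gauss(0, 1) for _ in range(94)]
goodness = [0.7 * x + rng.gauss(0, 0.7) for x in beauty]
pragmatic = [0.7 * g + rng.gauss(0, 0.7) for g in goodness]

a, b, direct = path_estimates(beauty, goodness, pragmatic)
total = direct + a * b  # total effect = direct effect + indirect effect
mean, se, t = bootstrap_indirect(beauty, goodness, pragmatic)
print(f"indirect = {a * b:.2f}, direct = {direct:.2f}, total = {total:.2f}")
print(f"bootstrap mean = {mean:.2f}, SE = {se:.2f}, t = {t:.2f}")
```

With standardized regression weights, the total effect computed this way equals the bivariate correlation between beauty and pragmatic quality, which is a convenient check on the decomposition.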



P. van Schaik et al.

Fig. 4. Structural model (Study 1) – (a) presentation-only, (b) with hands-on experience.

In the presentation-only condition the link between beauty and pragmatic quality was fully mediated, with a nonsignificant direct effect of beauty on pragmatic quality, β = −.05, p > .05, but a significant indirect effect of beauty on pragmatic quality via goodness, β = .34, p < .001. The opposite was true for the effect of beauty on hedonic quality. The direct effect was significant, β = .42, p < .01, while the indirect effect was not, β = .06, p > .05. After additional hands-on experience, the results changed little. The indirect effect of beauty on hedonic quality became significant, β = .13, p < .05, but was still much smaller than the direct effect, β = .62, p < .001. Tests of the difference between each of the path coefficients in the presentation-only condition and with additional hands-on experience were not significant, all |t| ≤ 1.22, p > .05, demonstrating the stability of the inference model. This is the case despite differences in the standardized means between the four measures (with a small change for goodness, but a very small change for hedonic quality and a negligible change for pragmatic quality and beauty; see Table II in Appendix B for details). Overall, the inference model was replicated and proved stable, even with hands-on experience.
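As a worked check of the effect decomposition, the short sketch below recomputes the indirect and total effects of beauty for the presentation-only condition of Study 1 from the direct effects reported in Table I. The variable names are illustrative only, and small discrepancies from published values can arise because the inputs are already rounded to two decimals.

```python
# Direct effects for Study 1, presentation-only, as reported in Table I.
b_to_g = 0.52     # beauty -> goodness
g_to_pq = 0.65    # goodness -> pragmatic quality
g_to_hq = 0.11    # goodness -> hedonic quality
b_to_pq = -0.05   # beauty -> pragmatic quality (direct)
b_to_hq = 0.42    # beauty -> hedonic quality (direct)

# Indirect effect = product of the two constituent direct effects.
indirect_pq = b_to_g * g_to_pq
indirect_hq = b_to_g * g_to_hq

# Total effect = direct effect + indirect effect.
total_pq = b_to_pq + indirect_pq
total_hq = b_to_hq + indirect_hq

print(round(indirect_pq, 2), round(total_pq, 2))  # 0.34 0.29
print(round(indirect_hq, 2), round(total_hq, 2))  # 0.06 0.48
```

These values agree with the indirect and total effects listed for Study 1 (presentation-only) in Table I.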


User-Experience from an Inference Perspective


Fig. 5. Sample Web page (Study 2). Reproduced with permission of Manchester City Council.

3. STUDY 2: GOAL MODE

3.1 Method

3.1.1 Participants. Sixty-six undergraduate psychology students (49 females and 17 males), with a mean age of 24 years (SD = 8), took part in the study as a course requirement. All participants had used the World Wide Web, but only one had ever used the target site that was employed in the experiment (Manchester City Council’s Web site). Mean expertise using the Web was 10 years (SD = 3), mean time per week spent using the Web was 19 hr (SD = 16) and mean frequency of Web use per week was 16 times (SD = 10).

3.1.2 Materials and Equipment. As in Study 1, participants gave responses to AttrakDiff2.³ Participants used Manchester City Council’s Web site as it existed in October 2009 (see Figure 5 for a sample Web page).

³ Refer to Appendix C, Tables IV and V for a summary of the scales’ psychometric properties.

3.1.3 Procedure. The study was run in a computer laboratory with groups of 15–20 participants who worked independently. This study, like the previous study, consisted of two phases. In Phase 1, each participant was introduced to the Web site through the self-paced presentation of nine noninteractive screenshots of different pages from the site. Participants then completed the short version of the AttrakDiff2 questionnaire. In Phase 2, a series of information retrieval tasks was presented, which reflected the various types of information that were available on the Web site. The target information was factual and could be found by following links in the Web site starting from the home page; its location within the information area of the Web pages was not predictable from one task to the next. In each trial, a question appeared at the top of the screen, for instance “What is the cost of a new application for a Street Trading License in Manchester?” Once participants had read the question, they clicked on a button labeled “Show Web site.” The home page of the site appeared on the screen and they had to find the answer to the question, which remained visible while participants used the site to search for the appropriate information. Participants were told to take the most direct route possible to locate the answer. Having found it, they clicked on a button labeled “Your answer,” which opened a dialogue box at the bottom of the screen. Participants typed their answers into the box, clicked on OK and moved on to the next question. After 3 practice questions, the main set of 10 information retrieval tasks followed, with a maximum duration of 20 minutes. After the information retrieval task, participants again completed the AttrakDiff2 before answering demographic questions. The study took about 35 minutes to complete. Data analysis was identical to that conducted in Study 1.

3.2 Results and Discussion

Figure 6 shows the results of the PLS analysis of the inference model for (a) presentation-only and (b) additional hands-on experience. In the presentation-only condition the link between beauty and pragmatic quality was fully mediated, with a significant indirect effect, β = .48, p < .001. Again, the opposite was true for the effect of beauty on hedonic quality. The direct effect was significant, β = .55, p < .001, while the indirect effect was not, β = .11, p > .05. The same pattern was apparent after hands-on experience. The link between beauty and pragmatic quality was fully mediated, with a significant indirect effect, β = .45, p < .001. By contrast, the direct effect of beauty on hedonic quality was significant, β = .43, p < .001, while the indirect effect was not, β = .19, p > .05. Tests of the difference between each of the regression coefficients in the presentation-only condition and the hands-on experience condition were not significant, all |t| < 1, further demonstrating the stability of the basic inference model. This was the case despite differences in the standardized means between the four measures (with large changes for goodness and hedonic quality, but a medium change for beauty and a small change for pragmatic quality; see Table V(c) in Appendix C for details). Overall, the inference model was replicated and proved stable even with hands-on experience. It did so although the usage mode (action versus goal) and the product (Wikipedia versus council Web site) differed between Study 1 and Study 2.

Fig. 6. Structural model (Study 2) – (a) presentation-only, (b) with hands-on experience.

4. STUDY 3: GOAL MODE WITH VARIED COMPLEXITY

4.1 Method

4.1.1 Participants. One hundred and twenty-seven undergraduate psychology students (102 females and 25 males), with a mean age of 23 years (SD = 8), took part in the experiment as a course requirement. All participants had used the World Wide Web, but had not used the target Web site that was employed ([fictional] Whitmore University’s psychology intranet site). Mean expertise using the Web was 11 years (SD = 3), mean time per week spent using the Web was 17 hr (SD = 14) and mean frequency of Web use per week was 15 times (SD = 10).

4.1.2 Study Design. Study 3 advanced Study 2 by introducing two additional factors, varying the hands-on experience: task complexity (simple or complex; see Section 4.1.4) and site complexity (simple or complex; see Section 4.1.3). Site complexity was primarily included to establish the generality of the effect of task complexity across different levels of site complexity.

4.1.3 Materials and Equipment. As in the previous two experiments, participants gave responses to the short version of the AttrakDiff2 questionnaire.⁴ Two versions of a Web site were modeled on a typical psychology site for university students and were specially designed and programmed for the experiment. In addition to the home page, the main pages of the high-complexity site (see Figure 7(a)) were Teaching, Research, Fees and Funding, Hall of Fame, Library, Staff, Sports and Leisure, Careers, and About, with 9990 further Web pages. In addition to the home page, the main pages of the low-complexity site (see Figure 7(b)) were Teaching, Research, Fees and Funding, and Hall of Fame, with 620 further Web pages. All links and content of the low-complexity site were also included in the high-complexity site. The latter had more pages than the former, but both sites had the same depth of four levels (from the home page) and therefore both site versions allowed simple and complex tasks to be completed.

⁴ Refer to Appendix D, Tables VI and VII for a summary of the scales’ psychometric properties.

Fig. 7. Low (a) and high (b) complexity site with a sample Web page (Study 3).

4.1.4 Procedure. The procedure of Study 3 was essentially the same as that of Study 2. In Phase 1, five noninteractive screenshots of different pages from either the low- or high-complexity site were presented. In Phase 2, 3 practice questions and a main set of 37 information retrieval tasks followed, with a maximum of 20 minutes available to complete the tasks. From the home page, complex tasks required users to select four links in succession to access the Web page with the necessary information. Simple tasks required users to select two successive links.

4.1.5 Data Analysis with PLS. In Study 3, we used the inference model in the presentation-only condition, as in Study 1 and Study 2. Theoretically, site complexity could impact perception of the site in the presentation-only condition, but this seemed unlikely due to its operationalization (see Sections 4.1.3 and 4.1.4). Site complexity was varied by the number of pages in the Web site (620 versus 9990) and the number of sections offered (4 versus 9). In the presentation-only condition, however, each participant was presented with the same number of sample pages as screenshots. Therefore, it was essentially not possible to experience site complexity in terms of number of pages. The number of sections, though, was visible through the sites’ main menus (see Figure 7). However, from visual inspection this difference seemed negligible when one does not interact with the site. For the hands-on experience condition, we extended the model to account for potential effects of task complexity. As in Studies 1 and 2, PLS was used for data analysis.

It was felt important to verify that the relations in the basic inference model were not influenced (moderated) by task complexity, in order to establish that the effects in this model held true irrespective of task complexity. For this purpose, Henseler’s [2009] nonparametric analysis for two-group comparisons of PLS model parameters was used. Significance tests were conducted of the difference between each of the regression coefficients from the low-task complexity conditions and the high-task complexity conditions in the basic inference model after hands-on experience.

4.2 Results and Discussion
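The idea behind the two-group comparison described in Section 4.1.5 can be illustrated with a deliberately simplified sketch. This is not Henseler's published procedure: it is an assumed stand-in that compares two sets of bootstrap estimates pairwise and omits any bias correction or centering that the published method may apply. The function names and the example values are hypothetical.

```python
def prob_group1_exceeds_group2(boot1, boot2):
    """Share of all (group 1, group 2) pairs of bootstrap estimates
    in which the group-1 coefficient is larger than the group-2 one."""
    wins = sum(1 for a in boot1 for b in boot2 if a > b)
    return wins / (len(boot1) * len(boot2))


def two_sided_p(boot1, boot2):
    """Rough two-sided p-value derived from the pairwise comparison.

    Assumption: ties are rare enough to ignore, as with continuous estimates.
    """
    p = prob_group1_exceeds_group2(boot1, boot2)
    return 2 * min(p, 1 - p)


# Hypothetical bootstrap estimates of one path coefficient in two groups
# (e.g., low versus high task complexity).
low_complexity = [0.10, 0.20, 0.30]
high_complexity = [0.15, 0.25]
print(prob_group1_exceeds_group2(low_complexity, high_complexity))  # 0.5
```

A share close to .5 indicates that neither group's coefficient is systematically larger, which corresponds to a nonsignificant group difference; shares close to 0 or 1 indicate a difference between groups.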

Figure 8 shows the results of the PLS analysis of the inference model for (a) presentation-only, (b) additional hands-on experience, and (c) additional hands-on experience, including task complexity and site complexity. In the presentation-only condition (see Figure 8(a)) the link between beauty and pragmatic quality was fully mediated, with a significant indirect effect, β = .33, p < .001. The effect of beauty on hedonic quality was partially mediated, with a large significant direct effect, β = .57, p < .001, and a small significant indirect effect, β = .16, p < .001. In the hands-on experience condition (see Figure 8(b)), the link between beauty and pragmatic quality was only partially mediated, with a significant indirect effect, β = .36, p < .001. However, while significant, the direct effect of beauty on pragmatic quality, β = −.25, p < .001, was negative. The link between beauty and hedonic quality was partially mediated, with a large significant direct effect, β = .66, p < .001, and a small significant indirect effect, β = .10, p < .05. Tests of the difference between each of the regression coefficients in the presentation-only condition and the hands-on experience condition were not significant, all |t| < 1, further demonstrating the stability of the inference model. This is the case despite differences in the standardized means between the four measures (with a large change for pragmatic quality, but a very small change for goodness and a negligible change for hedonic quality and beauty; see Table VII(c) in Appendix D for details).

Figure 8(c) shows the basic inference model, including task and site complexity. As expected, task complexity only influenced the perception of pragmatic quality but not that of hedonic quality. The more complex the task, the less pragmatic the Web site appeared. This lends further support to the conceptual distinction (i.e., construct validity) between pragmatic and hedonic quality. While the former is related to goals and their achievement, the latter is not. This is reflected by the differential relationships. Besides this, the relationships in the model did not change due to the manipulation of pragmatic quality through task complexity. In fact, none of the coefficients differed between participants who were exposed to low task complexity (n = 64) and those who were exposed to high task complexity (n = 63), with p > .05 for all comparisons. Thus, these effects were independent of the manipulation of task complexity; task complexity was not a moderator of the coefficients. In conclusion, the results of Study 3 further support the inference model. The results also provide support for the negative effect of task complexity on pragmatic quality and the lack of such an effect on hedonic quality, while the basic relations according to general-to-specific inference remained intact.

Table I summarizes the results from all three studies to facilitate comparison. The rightmost column further presents the average coefficient across all studies. As expected, the direct links between beauty and goodness, and between beauty and hedonic quality, were substantial (.56 and .55, respectively). The direct link between beauty and pragmatic quality was not (−.05). In contrast, the indirect effect of beauty on pragmatic quality was substantial (.38) and larger than the indirect effect of beauty on hedonic quality (.13).
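The averaging behind the rightmost column of Table I (Fisher-Z transformation, averaging, retransformation, as stated in the table note) can be sketched as follows. The function name `fisher_mean` is hypothetical; the input values are the direct effects of beauty on goodness listed in Table I.

```python
import math


def fisher_mean(coefficients):
    """Average coefficients via Fisher's z: atanh, arithmetic mean, tanh."""
    z_values = [math.atanh(r) for r in coefficients]
    return math.tanh(sum(z_values) / len(z_values))


# Direct effects of beauty on goodness (B->G), in the order of Table I:
# Study 1, Study 2 and Study 3, each presentation-only then hands-on.
beauty_to_goodness = [0.52, 0.61, 0.60, 0.59, 0.56, 0.48]
print(round(fisher_mean(beauty_to_goodness), 2))  # 0.56
```

The result matches the mean of .56 reported for the direct link between beauty and goodness. Averaging on the z scale rather than averaging the raw coefficients matters most when coefficients are large, because the sampling distribution of a correlation-type coefficient becomes increasingly skewed as it approaches ±1.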

5. GENERAL DISCUSSION

5.1 The Inference Model

Fig. 8. Structural model (Study 3) – (a) presentation-only, (b) with hands-on experience, (c) with hands-on experience, including task and site complexity.

Table I. Standardized Regression Coefficients for the Inference Model Across Studies

            Study 1                       Study 2                       Study 3
Effect      Presentation-  With hands-on  Presentation-  With hands-on  Presentation-  With hands-on  Mean(a)
            only           experience     only           experience     only           experience
Direct
  B→G       ***.52         ***.61         ***.60         ***.59         ***.56         ***.48         .56
  B→HQ      **.42          ***.62         ***.55         ***.43         ***.57         ***.66         .55
  B→PQ      −.05           −.04           .03            .13            −.09           ***−.25        −.05
  G→HQ      .11            *.21           .18            .33            ***.29         *.21           .22
  G→PQ      ***.65         ***.54         ***.79         ***.76         ***.59         ***.75         .69
  PQ→HQ     .19            ***.20         .21            −.01           −.07           −.07           .08
Indirect
  B→G→HQ    .06            .13            .11            .19            .16            .10            .13
  B→G→PQ    ***.34         ***.33         ***.48         ***.45         ***.33         ***.36         .38
Total
  B→HQ      .48            .75            .65            .62            .74            .76            .68
  B→PQ      .29            .29            .51            .58            .24            .11            .35

Note: To facilitate comparison between studies, the results of Study 3 are those from the model without the experimental manipulations (i.e., task complexity and site complexity).
(a) Coefficients were Fisher-Z transformed, averaged, and retransformed.
* p < .05. ** p < .01. *** p < .001.

In relation to Aim 1, three studies supported our specific inference model for tying together beauty, pragmatic quality (i.e., perceived usability) and hedonic quality (i.e., stimulation, identification). Beauty and overall evaluation were highly correlated, confirming the longstanding inference rule of “What is beautiful is good” [Dion et al. 1972], borrowed from person perception. As further expected, the effect of beauty on hedonic quality was primarily direct (probabilistic consistency as an inference rule), but the effect of beauty on pragmatic quality was primarily indirect (evaluative consistency as an inference rule), in other words, mediated by goodness.

The present findings replicate Hassenzahl and Monk’s [2007, 2010] previous work, but with the following advances. First, in relation to Aim 2, evidence for inference rules was found when hands-on experience was experimentally controlled. Specifically, the same pattern of results was found after only presentation of a product and with subsequent hands-on experience. This increases the generalizability of the model. Admittedly, the hands-on experience was rather brief and may not have been intense enough to overrule well-used inference rules. The results from Study 1, however, suggest that this may not be the case. This is because participants were reportedly regular (in frequency) and substantial (in time spent) users of Wikipedia (i.e., the Web site used in the study). However, expertise over time (for example, years of experience) was not recorded, so the extent of this experience could not be analyzed in relation to inference of usability from beauty. Thus, future studies may extend the hands-on experience even further to explore when acquired specific information about attributes may overrule simple inference rules. But for now, our finding of a stable model adds to its applicability. Second, in relation to Aim 3, evidence was found for the suggested inference rules across two types of tasks (goal mode and action mode), with different products (Wikipedia, council Web site, hypothetical university Web site) and even when task complexity and artifact complexity were systematically varied. Our findings thus increase external validity in terms of generalizability across tasks and products as well as levels of task and product complexity.

Despite the achievements of the research reported here, there are some potential limitations, in particular in relation to the choice of Web sites and range of participants. Although the Web sites used here were representative of different content classes of Web sites as artifacts (knowledge repository, government service and study in formal education), were substantially large and predominantly presented information through text, with some use of graphics, they do not exhaustively represent the full range of Web sites or artifacts more generally. Therefore, replication using a wider range of content (e.g., online sales, content sharing and social networking) and use of media (e.g., more extensive use of graphics, sound and video) would further increase confidence in the generalizability of the findings.



The use of psychology students as participants is typically justified in research, such as that reported here, where the goal is to examine psychological processes that operate generally across the population to which the tasks apply, and where there is no credible reason to assume that a different type of processing would occur in other members of the same population. In our research there is no apparent reason why different psychological processing would occur in the formation of UX judgments by other members of the populations of artifact users in each of our experiments than in the samples that took part. Still, further confidence would be gained in the generalizability of the results reported here if they were replicated with samples from nonstudent populations of Web users.

Three findings require some further discussion: the negative direct link between beauty and pragmatic quality (Study 3), the combination of evaluative and probabilistic consistency for beauty and hedonic quality, and the relationship between hedonic and pragmatic quality.

Beauty and Pragmatic Quality. In Study 3, a significant direct effect of beauty on pragmatic quality was observed. However, this effect was negative, hinting at a “What is beautiful cannot be usable”-rule. Indeed, Hassenzahl and Monk [2010] reported two out of seven direct effects of beauty on pragmatic quality to be negative and significant (but not a single significant positive one). It may be that another particular type of inference is operating here, namely compensatory inference [Chernev and Carpenter 2001]. Imagine you are faced with two laptops of the same price. While laptop A has mediocre processing power but a large hard disk, laptop B has better processing power, but the capacity of its hard disk is unknown. In this situation many people infer that the hard disk capacity of laptop B must be inferior to that of A. Kardes and colleagues [2004b] call this negative-correlation-based inference, and the present example draws upon people’s lay theories of competition in the market. Given the same price and some outstanding attributes, there must be some other, poorer attributes that are compensated for. While Chernev and Carpenter [2001] demonstrated the relevance of compensatory rules only in the specific domain of prices and product details, they might be much more pervasive. Who would not start to feel suspicious if a friend excitedly tells them about a new acquaintance who is not only “educated”, “good-looking” and “sexy”, but also “unearthly rich”? In an unpublished study on people’s strategies for inferring usability from a number of other attributes, we found only 3% of all participants to use beauty as an indicator of a lack of pragmatic quality (i.e., “the dark side of beauty” [Hassenzahl 2008]). In the present studies, of six models only one revealed a negative correlation. In other words, while potentially interesting and worth further exploration, we believe this negative stereotyping to be rather marginal.

Evaluative and Probabilistic Inference Combined. In three out of six models (Study 1, Study 3) the effect of beauty on hedonic quality was partially mediated, that is, a larger direct effect (.55 on average, Table I) was complemented by a smaller indirect effect (.13 on average, Table I). We do not see this as a contradiction, but rather believe this to be a good illustration of how different inference rules intertwine. Evaluative consistency is the consequence of applying the “What is beautiful is good”-rule and the “I like it, it must be good on all attributes”-rule (i.e., “halo”-effect). This results in a highly beauty-driven overall evaluation, which in turn impacts all other attribute judgments. In other words, the more beautiful the better on all other attributes. It is easy to envision this as a ubiquitous, almost automatic process: the predominant mode to handle momentarily unavailable information. In addition, more specific rules may modify the outcome of this process, such as the rule that beauty is a direct indicator of hedonic quality. (Another example would be the “What is beautiful cannot be usable”-rule discussed before.) Therefore, stronger direct effects occur (due to probabilistic consistency), in addition to indirect effects (due to evaluative consistency).

Furthermore, in the present studies goodness is more related to pragmatic quality (average .69, see Table I) than to hedonic quality (average .22, see Table I). In general, it is assumed that both qualities equally contribute to a product’s appeal [Hassenzahl et al. 2000; Hassenzahl 2001]; however, situational aspects may override this. In fact, people only use available information to infer the unavailable if they find the available information relevant [Kruglanski et al. 2007]. For example, people rely more on their feelings towards an object when the judgment calls for the affective. In other words, people may find it more appropriate to rely on their feelings when asked about whether they believe a shoe to be comfortable (i.e., experiential) compared to whether they believe a shoe to be a good bargain (i.e., factual) [Yeung and Wyer 2004]. If they find a particular piece of information to be inappropriate, they will tend to discount it. The same may happen here. Given that all products used in the present studies are rather utilitarian in nature (even the product used in action mode in Study 1), people may find it inappropriate to infer hedonic quality from their global judgments, resulting in a weaker link between goodness and hedonic quality than that between goodness and pragmatic quality.

Pragmatic and Hedonic Quality. In general, we expected perceptions of pragmatic quality and hedonic quality to be independent. On average, this was true for the present studies (the average regression coefficient was .08; see Table I). In addition, task complexity had a significant effect only on pragmatic quality, but not on hedonic quality, thereby further corroborating the conceptual distinction between both constructs in the sense of construct validity. Nevertheless, Study 1 revealed a small but significant relationship between pragmatic and hedonic quality. One explanation may be that the independence between pragmatic and hedonic quality is less strong in a situation where the focus is on the action itself [action mode; Hassenzahl 2003] rather than on achieving goals. This is because in such a situation, the interaction itself could to some extent be a source of pleasure. Our results provide evidence to demonstrate independence between pragmatic quality and hedonic quality after task-oriented hands-on experience in Study 2 and Study 3, where standardized regression coefficients were (extremely) small and not positive. However, consistent with our account, in Study 1 the small positive coefficient remained stable after exploration-based hands-on experience. Overall, the small correlations between pragmatic and hedonic quality found in Study 1 and Study 2 do not throw doubt on the general findings presented here. They are nevertheless an interesting topic for future research.

5.2 Inference of User-Experience from a Wider Perspective

Two broad classes of models accounting for processes of judgment and decision-making can be distinguished. One class considers these processes as computational, whereas the other does not. The approach taken here, based on Kruglanski et al.â&#x20AC;&#x2122;s [2007] unified framework for conceptualizing and studying judgment as inference, fits with the approach taken by noncomputational theories. In contrast to computational theories (such as normative theories [e.g. subjective utility theory] and psychological descriptive theories [e.g., prospect theory]), noncomputational theories assume that people do not make complex (weighted and summated) calculations using probability and utility. Instead, noncomputational theories assume that people use relatively simple cognitive strategies (simples rules Chater and Brown [2008], heuristics [Gigerenzer and Gaissmaier 2011; Kruglanski and Gigerenzer 2011] in making judgments and decisions. Kruglanski et al. [2007] provide a unified framework for conceptualizing and studying human judgment as inference. Specifically, they present cogent evidence ACM Transactions on Computer-Human Interaction, Vol. 19, No. 2, Article 11, Publication date: July 2012.


11:20

P. van Schaik et al.

that, perhaps surprisingly, inference is ubiquitous in the formation of judgments across a wide range of judgment types, including conditioning, perception, pattern recognition and social judgment, which were previously believed to be ruled by disparate mechanisms. Important questions are then what sources are used for inference and how are these used. Information Sources in Judgment. Our participants had two or three sources of information available to make their judgment: (1) their impression from the presentation of a product, (2) hands-on experience from of subsequent interaction with the product, and (3) memory of previous product experience (in Study 1). A systematic study of the effects of the different information sources in UX seems important. Recent research in cognitive psychology, for example, has demonstrated that the results of judgments from description and those from experience can differ dramatically [Barron and Leidner 2010; Hertwig et al. 2004]. Perhaps as important as the distinction between the different sources of judgment is that the effect of one source on judgment may be moderated by the effects of others, for example, where product attributes in a product description contradict those that a user encounters in the presentation of the same product. Therefore, the choice of the combination of sources that are studied could have profound effect on the results and the conclusions drawn from these results regarding the way people form judgments. Indeed, research has demonstrated that product description can have an effect on product evaluations if they considered these before they considered their own product experience [Wooten and Reed 1998] and sometimes people cannot even distinguish whether they actually experienced a product or only were presented with a description of an experience [Rajagopal and Montgomery 2011]. Judgment Parameters in Inference-Based Judgment. 
After making a cogent case for the ubiquity of inference in human judgment, Kruglanski et al. [2007] provide an account for the use of information sources in the formation of judgments by the application of (simple) inference rules of the form “If X then Y.” A large store of such rules, if judiciously applied in appropriate circumstances, can be a powerful tool in a wide range of human judgments. In their inference framework, four largely independent parameters influence judgment. First, informational relevance, is the extent to which a particular inference rule is seen as valid and therefore can affect the likelihood of its application (such as whether goodness may be relevant to infer hedonic quality of a predominantly utilitarian product). Second, task demands can affect the likelihood of the application of an inference rule. Third, cognitive resources, which are an individual’s capabilities in a particular situation, will also influence the application of inference. Fourth, motivation controls both the amount of effort expended to process information (nondirectional motivation) and the weight attached to each element of information, with a higher weight for items that are seen as more desirable (directional motivation). Kruglanski et al. [2007] maintain that in a particular judgment situation, the values on the identified parameters of judgment can account for people’s judgment, irrespective of the process(es) that produced these values. This is therefore a unified framework for studying judgment and allows for the systematic investigation of the parameters on people’s judgment of UX quality. When placed in the context of this framework, the current study found evidence that the application of inference rules for beauty, goodness, pragmatic quality and hedonic quality were not influenced by the task type (goal mode or action mode, although this situation was confounded with the artifact that was used). 
Moreover, when task demands increased, evidence for the application of the basic inference rules remained essentially unchanged. An attractive
feature of Kruglanski et al.’s framework is that it can be seen as compatible with existing frameworks for studying human-computer interaction, in particular Finneran and Zhang’s [2003] person-artifact-task model. In this context, cognitive resources and motivation would be examples of person characteristics, informational relevance would be an example of an artifact characteristic, and task demands would be an instance of a task characteristic.

One potential limitation, in principle, of Kruglanski et al.’s framework in the context of UX is that it seems to focus on information from description, presentation and memory, but (whether intentionally or not) not on direct interaction or actual task performance with a product, which is ultimately important in human-computer interaction. This matters because research on the experience-description gap demonstrates that firsthand experience may produce different outcomes for judgment and decision-making. In the present study, however, the relationships between the constructs in the inference model remained essentially unchanged with experience of interacting with a particular artifact. Perhaps this was because participants were familiar with the type of task that they were performing (i.e., navigating a Web site), so that judgment processes were not radically altered by hands-on experience.

Another apparent limitation of Kruglanski et al.’s work is the use of simple inference rules of the form “If X then Y.” We found evidence for more complex rules in the form of an extension of evaluative consistency (a relationship between beauty and pragmatic quality mediated by goodness), and other work has revealed the operation of a still more complex type [moderated mediation, Hassenzahl et al. 2010]. Nevertheless, by allowing more complex rules, Kruglanski et al.’s framework could accommodate these findings.

Collections of inference rules rather than models.
The conceptualization of judgment as inference with different rules for different types of judgment has consequences for UX models. Given the focus on such rules, it follows that the evidence needed to establish whether particular rules have been applied in judgment is the predictive value of the antecedents for the consequents of each rule [Kruglanski et al. 2007]. We therefore suggest abandoning the notion of a model in favor of a network of (interconnected) inference rules or even a set of rules (with unspecified connections), where the predictive value of antecedent for consequent constructs is analyzed, but covariances between indicators in different rules are not the focus of interest. This proposition has some interesting implications. For example, we need to find ways to disentangle the combination of rules manifest in data. As noted previously, it is likely that multiple rules operate simultaneously. Therefore, patterns in a particular data set are the result of the combined effect of the rules that operated on variable values. In addition, we may broaden our methods to study individual judgment processes, that is, how a single person makes inferences under different conditions [see Jacobsen 2004 for an example in the domain of aesthetic judgments].

5.3 Implications of the Inference Perspective and Future Work

Given the consistent findings reported here, one might ask whether there is any way that the suggested inference model could be “falsified.” In the light of the previous discussion, the answer would be that from an inference perspective a comprehensive model is not of interest. Rather, the task is to demonstrate evidence for particular inference rules that people may apply in making their judgments of UX, and some of these rules may be applied more frequently than others. For example, in the case of product purchase, price may moderate the influence of beauty on other variables. If price and beauty do not match (for instance, beauty is very high, but price is very low) then people may conclude, through compensatory inference, that overall quality and pragmatic quality must be low and perhaps even that hedonic quality in interaction
with the product must be low. The more rules we know, the better we are able to understand the faceted nature of judging interactive products.

Consistent with the inference rules of evaluative consistency for pragmatic quality (and hedonic quality) and of probabilistic consistency for hedonic quality, the results of the current study indicate that beauty and goodness are important predictors of pragmatic quality and hedonic quality both after presentation of a product and after subsequent hands-on experience. Therefore, from the perspective of design, characteristics that contribute positively to people’s evaluations of beauty and overall quality before and after actual use of a product would contribute to their perceptions of pragmatic and hedonic quality, in particular when hands-on experience is limited or absent.

From an inference perspective, future work would further test the effect of the putatively generally applicable judgment parameters on people’s judgments in human-computer interaction, while taking into account the different sources of information identified here. For example, given that the current study was mainly concerned with the inference of judgments of product attributes, in particular pragmatic quality, in contrast to hedonic quality, we manipulated task complexity to establish evidence for the application of a rule that links task complexity to pragmatic quality. However, other experimental manipulations could further elucidate the operation of inference rules in judgment. For instance, the evaluation of beauty would be a mediator of the effect of user-interface design decisions (e.g., fonts, colors, grids) on goodness and hedonic quality, and goodness would be a further mediator of the mediated effect of aesthetics; however, if perceptions of beauty diminish over time, this mediation may no longer occur.
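The mediation relationships discussed in this section (for example, beauty influencing pragmatic quality through goodness) can be illustrated with a small numerical sketch. The Python code below is not the analysis used in this article; it assumes a simple product-of-coefficients decomposition with ordinary least squares, and the ratings are invented purely for illustration. The function name `simple_mediation` and all variable names are hypothetical.

```python
from statistics import mean


def cov(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def simple_mediation(x, m, y):
    """Decompose the effect of x on y into a direct part and a part
    mediated by m, using ordinary least squares (closed form)."""
    vx, vm = cov(x, x), cov(m, m)
    cxm, cxy, cmy = cov(x, m), cov(x, y), cov(m, y)
    a = cxm / vx                           # slope of m regressed on x
    det = vx * vm - cxm ** 2               # zero only if x and m are collinear
    b = (cmy * vx - cxy * cxm) / det       # slope of m in y ~ x + m
    direct = (cxy * vm - cmy * cxm) / det  # slope of x in y ~ x + m
    total = cxy / vx                       # slope of x in y ~ x
    return {"a": a, "b": b, "indirect": a * b,
            "direct": direct, "total": total}


# Invented ratings on an assumed 7-point scale (illustration only).
beauty = [2, 3, 4, 5, 6, 7, 3, 5]
goodness = [3, 3, 5, 5, 6, 7, 4, 4]
pragmatic = [3, 4, 5, 4, 6, 6, 4, 5]

effects = simple_mediation(beauty, goodness, pragmatic)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

In this decomposition the total effect of the antecedent equals the direct effect plus the indirect (mediated) effect, so a rule such as "beauty implies goodness, which implies pragmatic quality" would appear as a sizeable indirect effect alongside a smaller direct effect. With real data, a confidence interval for the indirect effect (for instance, by bootstrapping) would be needed before drawing conclusions.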
A further manipulation concerns usability itself: the inherent design characteristics of a product that contribute to its “objective” usability would also contribute to judgments of pragmatic quality, in addition to inferences from product evaluations and task complexity. Furthermore, Bloch et al.’s [2003] research on consumer products suggests that some people may be more influenced in their judgments by beauty than others. If this were the case for interactive computer-based products, then it could have implications for judgments from inference. The consequence would be that people hold different rules in terms of the predictive validity of beauty.

Obviously, experimental research can control both the situations in which people experience products through presentation and interaction, and the scenarios in which they imagine product experience through descriptions. According to Kruglanski et al. [2007], people only apply inference rules (not necessarily consciously) that they deem valid (but, again, not necessarily consciously) to make judgments. However, while not explicitly taking an inference perspective, Hassenzahl [2010] found evidence to support the idea that attribution (people’s belief that the product is responsible for their product experience) is necessary for need fulfillment to have an effect on perceptions of hedonic quality. The more general question then becomes to what extent people need to attribute their experience during their encounter (presentation, interaction, description, memory) with a particular product to the product for inference rules to operate. This is particularly important because, when people are asked to what extent (or whether) they attribute their experience to the product, they may differ in the aspects of UX that they consider, and they may not even consider the aspects that correspond to the UX variables under investigation.
Given that with repeated experience of interaction with a product people have increasingly more directly relevant information to make judgments about product attributes, a further research question would be whether general-to-specific inference rules still operate on a longer time-scale (months or years). Within Kruglanski et al.’s [2007] framework, this will depend on the extent to which a particular inference rule has been activated in the past. Here the consistent previous successful application of
the same rule (across products of the same type) would be important for its application in the future. For example, if probabilistic consistency for hedonic quality (“What is beautiful is pleasurable”) has been applied repeatedly (over a long period of time) and every time a beautiful product (condition met) was found to be pleasurable (outcome achieved), then it would be more likely that this rule will be applied the next time one encounters a new product; it would be less likely if, when the condition was met, the outcome was not consistently achieved.

6. CONCLUSION

In conclusion, we found consistent evidence for the operation of a basic set of inference rules to account for people’s perceptions of product attributes across different artifacts and usage modes. Our approach and findings fit into a more general theoretical framework of judgment as inference that leads to interesting propositions for future research. We look forward to the wider application of this approach, in particular experimental work that investigates the effects of person characteristics and of manipulating artifact- and task characteristics on model parameters.

APPENDIX A

Pragmatic Quality – I judge the Web pages to be
  PQ1        Confusing – Structured
  PQ2        Impractical – Practical
  PQ3        Unpredictable – Predictable
  PQ4        Complicated – Simple

Hedonic Quality – I judge the Web pages to be
  HQ1        Dull – Captivating
  HQ2        Tacky – Stylish
  HQ3        Cheap – Premium
  HQ4        Unimaginative – Creative

Beauty – I judge the Web pages overall to be
  Beauty1    Ugly – Beautiful

Goodness – I judge the Web pages overall to be
  Goodness1  Bad – Good
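As an illustration of how responses to the items above might be aggregated, the following Python sketch averages item ratings per construct. It rests on two assumptions that are not stated in this appendix: that each item is answered on a 7-point scale coded 1 to 7 (left anchor to right anchor), and that a simple mean is an acceptable construct score. The function name `score_constructs` and the example responses are hypothetical.

```python
from statistics import mean

# Item identifiers as listed in Appendix A.
CONSTRUCT_ITEMS = {
    "pragmatic_quality": ["PQ1", "PQ2", "PQ3", "PQ4"],
    "hedonic_quality": ["HQ1", "HQ2", "HQ3", "HQ4"],
    "beauty": ["Beauty1"],
    "goodness": ["Goodness1"],
}


def score_constructs(responses, scale_min=1, scale_max=7):
    """Return the mean rating per construct for one respondent.

    The 1-7 coding is an assumption made for this sketch.
    """
    scores = {}
    for construct, items in CONSTRUCT_ITEMS.items():
        values = []
        for item in items:
            value = responses[item]
            if not scale_min <= value <= scale_max:
                raise ValueError(f"{item}={value} is outside the scale range")
            values.append(value)
        scores[construct] = mean(values)
    return scores


# Invented responses from a single hypothetical participant.
example = {
    "PQ1": 6, "PQ2": 5, "PQ3": 6, "PQ4": 7,
    "HQ1": 4, "HQ2": 5, "HQ3": 3, "HQ4": 4,
    "Beauty1": 5, "Goodness1": 6,
}
scores = score_constructs(example)
print(scores)
```

Beauty and goodness are single-item measures, so their scores are simply the item ratings. An analysis that treats the items as separate indicators of a latent construct would not need this averaging step.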

ELECTRONIC APPENDIX

The electronic appendix for this article can be accessed in the ACM Digital Library.

REFERENCES

APTER, M. J. 1989. Reversal Theory: Motivation, Emotion and Personality. Taylor & Frances/Routledge, Florence, KY.
BARRON, G. AND LEIDNER, B. 2010. The role of experience in the Gambler’s Fallacy. J. Behav. Decis. Making 23, 117–129.
BLOCH, P. H., BRUNEL, F. F., AND ARNOLD, T. J. 2003. Individual differences in the centrality of visual product aesthetics: Concept and measurement. J. Consum. Resear. 29, 551–565.
CHATER, N. AND BROWN, G. D. A. 2008. From universal laws of cognition to specific cognitive models. Cogn. Sci. Multidisc J. 32, 36–67.
CHATER, N., OAKSFORD, M., NAKISA, R., AND REDINGTON, M. 2003. Fast, frugal, and rational: How rational norms explain behavior. Organiz. Behav. Hum. Decis. Proc. 90, 63–86.
CHERNEV, A. AND CARPENTER, G. S. 2001. The role of market efficiency intuitions in consumer choice: A case of compensatory inferences. J. Market. Resear. 38, 349–361.
CHIN, W. W. 2010. How to write up and report PLS analyses. In Handbook of Partial Least Squares: Concepts, Methods and Applications in Marketing and Related Fields, V. E. Vinzi, W. W. Chin, J. Henseler, and H. Wang Eds., Springer, Berlin, 655–690.

CYR, D., HEAD, M., AND IVANOV, A. 2006. Design aesthetics leading to m-loyalty in mobile commerce. Inf. Manage. 43, 950–963.
DION, K., BERSCHEID, E., AND WALSTER, E. 1972. What is beautiful is good. J. Person. Soc. Psych. 24, 285–290.
FINNERAN, C. M. AND ZHANG, P. 2003. A person-artefact-task (PAT) model of flow antecedences in computer-mediated environments. Int. J. Hum.-Comput. Stud. 59, 475–496.
FORD, G. T. AND SMITH, R. A. 1987. Inferential beliefs in consumer evaluations: An assessment of alternative processing strategies. J. Consum. Resear. 14, 363–371.
GIGERENZER, G. AND GAISSMAIER, W. 2011. Heuristic decision making. Ann. Rev. Psych. 62, 451–482.
GWIZDKA, J. AND SPENCE, I. 2006. What can searching behavior tell us about the difficulty of information tasks? A study of web navigation. In Proceedings of the 69th Annual Meeting of the American Society for Information Science and Technology.
HARTMANN, J., SUTCLIFFE, A., AND DE ANGELI, A. 2008. Towards a theory of user judgment of aesthetics and user interface quality. ACM Trans. Comput.-Hum. Interact. 15, 15.
HASSENZAHL, M. 2001. The effect of perceived hedonic quality on product appealingness. Int. J. Hum.-Comput. Interact. 13, 481–499.
HASSENZAHL, M. 2003. The thing and I: Understanding the relationship between user and product. In Funology: From Usability to Enjoyment, M. Blythe, C. Overbeeke, A. Monk, and P. Wright Eds., Kluwer, Dordrecht, 31–42.
HASSENZAHL, M. 2008. Aesthetics in interactive products: Correlates and consequences of beauty. In Product Experience, R. Schifferstein and P. Hekkert Eds., Elsevier, 287–302.
HASSENZAHL, M. 2010. Experience design: Technology for All the Right Reasons. Morgan and Claypool, San Rafael, CA.
HASSENZAHL, M. AND MONK, A. F. 2007. Was uns Schönheit signalisiert. Zum Zusammenhang von Schönheit, wahrgenommener Gebrauchstauglichkeit und hedonischen Qualitäten [What beauty signals to us. About the association between beauty, perceived usability and hedonic quality]. In Prospektive Gestaltung von Mensch-Technik-Interaktion. 7. Berliner Werkstatt Mensch-Maschine-Systeme, M. Rötting, G. Wozny, A. Klostermann, and J. Hus Eds., VDI, Düsseldorf, 227–232.
HASSENZAHL, M. AND MONK, A. 2010. The inference of perceived usability from beauty. Hum.-Comput. Interact. 25, 235–260.
HASSENZAHL, M. AND ULLRICH, D. 2007. To do or not to do: Differences in user experience and retrospective judgments depending on the presence or absence of instrumental goals. Interact. Comput. 19, 429–437.
HASSENZAHL, M., PLATZ, A., BURMESTER, M., AND LEHNER, K. 2000. Hedonic and ergonomic quality aspects determine a software’s appeal. In Proceedings of the Conference on Human Factors in Computing Systems. ACM, New York, NY, 201–208.
HASSENZAHL, M., DIEFENBACH, S., AND GÖRITZ, A. 2010. Needs, affect, and interactive products – Facets of user experience. Interact. Comput. 22, 353–362.
HENSELER, J., RINGLE, C., AND SINKOVICS, R. 2009. The use of partial least squares modeling in international marketing. In New Challenges in International Marketing (Advances in International Marketing), T. Cavusgil, R. Sinkovicks, and P. Ghauri Eds., Emerald, London, 277–319.
HERTWIG, R., BARRON, G., WEBER, E. U., AND EREV, I. 2004. Decisions from experience and the effect of rare events in risky choice. Psych. Sci. 15, 534–539.
HULLAND, J., RYAN, M., AND RAYNER, R. 2010. Modeling customer satisfaction: A comparative evaluation of covariance structure analysis versus partial least squares. In Handbook of Partial Least Squares: Concepts, Methods and Applications in Marketing and Related Fields, V. E. Vinzi, W. W. Chin, J. Henseler, and H. Wang Eds., Springer, 307–325.
JACOBSEN, T. 2004. Individual and group modelling of aesthetic judgment strategies. Brit. J. Psych. 95, 41–56.
KARDES, F. R., CRONLEY, M. L., KELLARIS, J. J., AND POSAVAC, S. S. 2004a. The role of selective information processing in price-quality inference. J. Consum. Resear. 31, 368–374.
KARDES, F. R., POSAVAC, S. S., AND CRONLEY, M. L. 2004b. Consumer inference: A review of processes, bases, and judgment contexts. J. Consum. Psych. 14, 230–256.
KEENEY, R. L. AND RAIFFA, H. 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Cambridge University Press, New York, NY.
KRUGLANSKI, A. W. AND GIGERENZER, G. 2011. Intuitive and deliberate judgments are based on common principles. Psych. Rev. 118, 97–109.

KRUGLANSKI, A. W., PIERRO, A., MANNETTI, L., ERB, H., AND CHUN, W. Y. 2007. On the parameters of human judgment. In Advances in Experimental Social Psychology, Vol 39, M. P. Zanna Ed., Elsevier Academic Press, San Diego, CA, 255–303.
LAVIE, T. AND TRACTINSKY, N. 2004. Assessing dimensions of perceived visual aesthetics of web sites. Int. J. Hum.-Comput. Stud. 60, 269–298.
LINGLE, J. H. AND OSTROM, T. M. 1979. Retrieval selectivity in memory-based impression judgments. J. Person. Soc. Psych. 37, 180–194.
LOKEN, B. 2006. Consumer psychology: Categorization, inferences, affect, and persuasion. In Annual Review of Psychology, Vol 57, D. L. Schacter Ed., Annual Reviews, Palo Alto, CA, 453–485.
MONK, A. F. 2004. The product as a fixed-effect fallacy. Hum.-Comput. Interact. 19, 371–375.
OOSTENDORP, H. V., MADRID, R. I., AND PUERTA MELGUIZO, M. C. 2009. The effect of menu type and task complexity on information retrieval performance. The Ergon. Open J. 2, 64–71.
RAJAGOPAL, P. AND MONTGOMERY, N. V. 2011. I imagine, I experience, I like. J. Consum. Resear. 38, 578–594.
RINGLE, C., WENDE, S., AND WILL, A. 2005. SmartPLS 2.0. Institute of Operations Management and Organizations, University of Hamburg, Germany.
THORNDIKE, E. L. 1920. A constant error on psychological rating. J. Appl. Psych. 4, 25–29.
TRACTINSKY, N., COKHAVI, A., KIRSCHENBAUM, M., AND SHARFI, T. 2006. Evaluating the consistency of immediate aesthetic perceptions of web pages. Int. J. Hum.-Comput. Stud. 64, 1071–1083.
VAN SCHAIK, P. AND LING, J. 2009. The role of context in perceptions of the aesthetics of web pages over time. Int. J. Hum.-Comput. Stud. 67, 79–89.
VAN SCHAIK, P. AND LING, J. 2011. An integrated model of interaction experience for information retrieval in a Web-based encyclopaedia. Interact. Comput. 23, 18–32.
VAN SCHAIK, P. AND LING, J. (To appear). An experimental analysis of experiential and cognitive variables in web navigation. Hum.-Comput. Interact. 26.
VENKATESH, V., MORRIS, M. G., DAVIS, G. B., AND DAVIS, F. D. 2003. User acceptance of information technology: Toward a unified view. MIS Q. 27, 425–478.
VILARES, M., ALMEIDA, M., AND COELHO, P. 2010. Comparison of likelihood and PLS estimators for structural equation modeling: A simulation with customer satisfaction data. In Handbook of Partial Least Squares: Concepts, Methods and Applications in Marketing and Related Fields, V. E. Vinzi, W. W. Chin, J. Henseler, and H. Wang Eds., Springer, 289–305.
VINZI, V. E., CHIN, W. W., HENSELER, J., AND WANG, H. Eds. 2010. Handbook of Partial Least Squares: Concepts, Methods and Applications in Marketing and Related Fields. Springer.
WOOTEN, D. B. AND REED, A. 1998. Informational influence and the ambiguity of product experience: Order effects on the weighting of evidence. J. Consum. Psych. 7, 79–99.
YEUNG, C. W. M. AND WYER, J. 2004. Affect, appraisal, and consumer judgment. J. Consum. Resear. 31, 412–424.

Received April 2011; revised October 2011, January 2012; accepted January 2012
