User-Experience from an Inference Perspective by Marc Hassenzahl

User-Experience from an Inference Perspective

PAUL VAN SCHAIK, Teesside University
MARC HASSENZAHL, Folkwang University
JONATHAN LING, University of Sunderland

In many situations, people make judgments on the basis of incomplete information, inferring unavailable attributes from available ones. These inference processes may also well operate when judgments about a product’s user-experience are made. To examine this, an inference model of user-experience, based on Hassenzahl and Monk’s [2010], was explored in three studies using Web sites. All studies supported the model’s predictions and its stability, with hands-on experience, different products, and different usage modes (action mode versus goal mode). Within a unified framework of judgment as inference [Kruglanski et al. 2007], our approach allows for the integration of the effects of a wide range of information sources on judgments of user-experience.

Categories and Subject Descriptors: H.1.2 [Models and Principles]: User/Machine Systems - Human information processing; H.5.2 [Information Interfaces and Presentation]: User Interfaces - Theory and methods; H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia - Theory; I.6.5 [Simulation and Modeling]: Model Development

General Terms: Experimentation, Human Factors, Theory

Additional Key Words and Phrases: User-experience, model, inference perspective, beauty, aesthetics

ACM Reference Format: van Schaik, P., Hassenzahl, M., and Ling, J. 2012. User-experience from an inference perspective. ACM Trans. Comput.-Hum. Interact. 19, 2, Article 11 (July 2012), 25 pages. DOI = 10.1145/2240156.2240159 http://doi.acm.org/10.1145/2240156.2240159

Authors’ addresses: P. van Schaik, School of Social Sciences and Law, Teesside University, United Kingdom; email: p.van-schaik@tees.ac.uk; M. Hassenzahl, Ergonomics in Industrial Design, Folkwang University, Germany; J. Ling, Faculty of Applied Sciences, University of Sunderland.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from the Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701, USA, fax +1 (212) 869-0481, or permissions@acm.org.

© 2012 ACM 1073-0516/2012/07-ART11 $15.00 DOI 10.1145/2240156.2240159 http://doi.acm.org/10.1145/2240156.2240159

ACM Transactions on Computer-Human Interaction, Vol. 19, No. 2, Article 11, Publication date: July 2012.

1. INTRODUCTION

Imagine you want to enhance your voice-over-IP calls with a high-definition image. By coincidence, a local shop makes an exceptional offer (in terms of “value for money”) of a multifunctional (“all-singing-all-dancing”) webcam. Will you accept? The problem is to predict whether or to what extent the product would meet your needs. As you have no hands-on experience, you visit the shop to see for yourself what the product looks like in reality and to get further information from the helpful staff. However, you are not allowed to open the attractive transparent box in which the seductive product patiently awaits your expenditure. You simply cannot try the product before buying it. Therefore, in effect, you try to “guess” (or infer) the product’s reliability, usefulness and ease of use from the specific pieces of information that you find relevant. This type of inference is a ubiquitous process, which underlies many phenomena [Kardes et al. 2004b; Loken 2006]; some even argue that it is the very essence of human judgment [Kruglanski et al. 2007]. The purpose of the current studies was to investigate how people infer specific attributes of an interactive product from other attributes or broader evaluations, such as beauty or an overall evaluation (e.g., goodness).

1.1 Inference in Judgments about Interactive Products

While inference is ubiquitous, available models of the perceived quality of interactive products, from Technology Acceptance [Venkatesh et al. 2003] to User-Experience (UX) [Hartmann et al. 2008; Hassenzahl 2003; Lavie and Tractinsky 2004], predominantly assume only one particular pattern of inference: induction or a specific-to-general inference [Kardes et al. 2004b]. This approach suggests that overall assessments or attitudes are “built” from the careful consideration, weighting and integration of specific attributes, such as usability, functionality, expressive aesthetics, hedonic quality, and/or engagement. The approach is reminiscent of computational, multi-attribute theories of decision-making [Keeney and Raiffa 1976]. These assume that people construct their overall assessment from single, distinct and specific attributes, which are assessed separately (e.g., “How usable is the product?”), weighted (e.g., “How important is usability to me?”) and combined into an overall evaluation.

While studies seem to provide some support for the induction of general value from specific attributes, induction should not be taken as the major or even the only process that operates. Current approaches to judgment and decision making take a rather noncomputational approach [Chater and Brown 2008; Chater et al. 2003; Gigerenzer and Gaissmaier 2011; Kruglanski and Gigerenzer 2011]. Supported by a wealth of empirical evidence [Gigerenzer and Gaissmaier 2011], these approaches suggest that, rather than doing complex (weighted and summated) calculations to induce, people use relatively simple cognitive strategies (e.g., “simple rules” [Chater and Brown 2008]; “heuristics” [Gigerenzer and Gaissmaier 2011]) to make judgments. This type of processing is due to the way the world is structured in terms of available information. Induction assumes, for example, some knowledge about each attribute, which is only rarely available [Gigerenzer and Gaissmaier 2011].
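As a rough illustration of the multi-attribute (specific-to-general) view described above, the following sketch computes an overall evaluation as a weighted average of attribute ratings. The function name, the attribute names, and the ratings and weights are all hypothetical; they are not taken from any of the cited models.

```python
def weighted_additive(ratings, weights):
    """Overall evaluation as a weighted average of attribute ratings.

    Both arguments map attribute names to numbers (illustrative 1-7 ratings
    and importance weights; every name used here is hypothetical).
    """
    missing = [name for name in weights if name not in ratings]
    if missing:
        # Induction presupposes a rating for every weighted attribute.
        raise ValueError(f"no rating available for: {missing}")
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


# Hypothetical ratings and importance weights for one product.
ratings = {"usability": 6, "aesthetics": 4, "engagement": 5}
weights = {"usability": 2.0, "aesthetics": 1.0, "engagement": 1.0}
overall = weighted_additive(ratings, weights)
```

The error branch mirrors the limitation noted in the text: the calculation needs a rating for every attribute, and such ratings are often unavailable when hands-on experience is lacking.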
People may simply lack hands-on experience of a product, which makes it difficult to assess its usability. Nevertheless, people make global value judgments and more specific attribute judgments, even when information is absent or limited. They do so by inferring unobservable, momentarily hard-to-assess product attributes from their global valuation of the product (i.e., general-to-specific) or by inferring them from other, more accessible attributes (i.e., specific-to-specific). At the heart of this inference process are rules, which tie together available and unavailable information. These rules are based on lay theories and knowledge about their applicability in a particular situation. A well-known inference rule, for example, is that the more expensive a product, the higher the quality (price-quality correlation, e.g., Kardes et al. [2004a]). People, for example, guess the taste (quality) of wine based on its price. Note that the application of rules is context-dependent. People do not apply the price-quality rule when a product is on special offer [Chernev and Carpenter 2001]. In addition, the application of inference rules is not necessarily conscious or deliberate; it can also be automatic, unconscious and hardly accessible to a particular individual [Kardes et al. 2004b; Kruglanski et al. 2007].

1.2 Inference in User Experience

The variety of potential rules to infer attributes from other attributes has interesting implications for the study of UX, preferences, and acceptance. In most studies, data collection is concurrent, that is, all constructs are assessed simultaneously. Researchers then evaluate their specified model, which is almost exclusively inductive. However, disregarding potential deduction (e.g., general-to-specific inference) can easily lead to false conclusions. Consider the following hypothetical example. Participants are asked to assess the (perceived) usability and innovativeness of a given product and its general appeal. The researcher regresses appeal on usability and innovativeness to determine the relative importance of usability and innovativeness in explaining appeal. Now assume that people have only limited or even no hands-on experience with the product. This makes their assessment of usability difficult: without hands-on experience, the question “Do I find the product predictable?” is hard to answer. On the other hand, from the description of the product concept and what they see they may tell right away whether they find it innovative or not. People who appreciate innovativeness will provide high appeal ratings, and others low ones. Innovativeness becomes predictive of appeal. Usability, however, cannot be assessed as easily. People nevertheless provide an assessment when prompted to do so. They do not infer usability from actual specific information, however; they may instead deduce it from their general appeal rating, following the simple rule of “I like it, it must be good on all attributes” (which is simply a version of the ubiquitous “halo-effect” [Thorndike 1920]). The consequence is that people who find the product appealing (mainly due to its innovativeness) provide higher ratings of usability as well. However, the researcher applying an inductive model to the data will reach quite a different conclusion. Usability becomes highly predictive of appeal. It becomes “important” for any judgment of appeal.
However, this is not a consequence of usability’s role in forming an overall judgment (of appeal in this case), but a consequence of people deducing a hard-to-assess attribute (e.g., usability) from general value (e.g., appeal). As long as people strive for consistency in their judgments, these “hypothetical” effects are likely to be responsible for many findings and potentially false beliefs in UX and information systems (IS) research. The potentially different interpretations of people’s judgments of attributes and general values alone justify the quest for a better understanding of the structure of UX models.

Take the study of Cyr et al. [2006] as one of many examples. In their study, 60 participants used a single mobile Web site for five to fifteen minutes to perform some given information retrieval tasks. Subsequently, design aesthetics, perceived usefulness, perceived ease-of-use, perceived enjoyment, and loyalty were assessed concurrently with a questionnaire. Their suggested model assumes that design aesthetics is used to infer three aspects: enjoyment, ease-of-use, and usefulness. Ease-of-use is then in turn used to infer enjoyment and usefulness. From the latter two, loyalty is inferred. Given the mechanisms of human judgment, this particular model is hard to justify. For example, one could easily argue that participants got an impression of design aesthetics from looking at the site and of ease of use from using the site. From these two pieces of specific information, usefulness and enjoyment are then inferred separately and combined into loyalty. In this interpretation, the effect of design aesthetics on ease of use is spurious, perhaps due to the correlation of both these variables with usefulness and enjoyment. Similarly, the effects of ease of use on enjoyment and of design aesthetics on enjoyment would be spurious.
The point here is that without the notion of inference and a careful consideration of how assessments are potentially made in different situations (boundary conditions), the theoretical justification of a model is almost impossible. However, this justification is exactly what is needed, because whatever sophisticated statistical modeling techniques are employed, the results and their credibility depend on the specification of the model, which lies outside statistical considerations.
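The hypothetical usability and innovativeness example above can be reproduced with a small simulation. This is a minimal sketch under assumed effect sizes (0.7, 0.6) and noise levels chosen purely for illustration; the function names are hypothetical, and ordinary least squares stands in for whatever inductive model a researcher might fit. Appeal is generated from innovativeness only, and usability ratings are deduced from appeal (a halo), yet the regression of appeal on both attributes still assigns usability a clearly positive weight.

```python
import random


def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def two_predictor_slopes(y, x1, x2):
    """Least-squares slopes of y on x1 and x2 (model with an intercept)."""
    s11, s22, s12 = covariance(x1, x1), covariance(x2, x2), covariance(x1, x2)
    s1y, s2y = covariance(x1, y), covariance(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate_halo(n=5000, seed=0):
    """Return the slopes of appeal on (usability, innovativeness)."""
    rng = random.Random(seed)
    innovativeness = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: appeal is driven by innovativeness only.
    appeal = [0.7 * i + rng.gauss(0, 0.7) for i in innovativeness]
    # Assumption: usability is not observed but deduced from appeal (halo).
    usability = [0.6 * a + rng.gauss(0, 0.8) for a in appeal]
    # The "inductive" analysis: regress appeal on both attributes.
    return two_predictor_slopes(appeal, usability, innovativeness)


b_usability, b_innovativeness = simulate_halo()
```

Although usability plays no causal role in forming appeal in this simulation, its slope is well above zero and the slope for innovativeness falls below the generating value of 0.7, which is the misattribution the text warns about.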

Fig. 1. Basic inference model.

1.3 Taking an Inference Perspective in User-Experience Research

Despite the potential implications (in terms of better-specifying UX models) of considering various types of situation-dependent inference rules beyond the predominantly used specific-to-general rule, neither the field of IS nor that of human-computer interaction (HCI) seems to consider an inference perspective explicitly. A recent exception is Hassenzahl and Monk’s work [2010]. They argue that “beauty” should be thought of as an affect-driven, evaluative response to the visual Gestalt of an interactive product [Hassenzahl 2008]. This has two implications. First, judgments of beauty are always based on information; they require only a visual input, which is almost always available. Second, its predominantly affective nature makes the judgment very quick [Tractinsky et al. 2006]. Both characteristics point to beauty as an important starting point for inferring other attributes, which are at least initially hard to access, due to, for example, a lack of hands-on experience or other missing information. Therefore, Hassenzahl and Monk [2010] argued that inference from beauty is extremely likely, especially in the absence of any further experience, and that this inference may remain a dominant mode of judgment even after later hands-on experience. They further suggested specific rules that govern inference for interactive products. Due to its evaluative nature, beauty is an important input to the general evaluation of the product (goodness) (the direct link from Beauty to Goodness in Figure 1). This is reminiscent of Dion et al.’s [1973] classic “what is beautiful is good”: a stereotype judgment in person perception. Two further constructs are of interest in the IS and HCI literature: perceived pragmatic quality (broadly related to perceived usability, perceived ease-of-use) and perceived hedonic quality (broadly related to perceived enjoyment, novelty, stimulation) [Hassenzahl 2010].
Taking an inference perspective, Hassenzahl and Monk [2010] proposed two distinct rules to account for the inference of pragmatic and hedonic quality from beauty. The link between pragmatic quality and beauty is indirect. It is a consequence of evaluative consistency [Lingle and Ostrom 1979], where individuals infer unavailable attributes from general value (goodness) to keep their judgments consistent (the path from Beauty to Goodness and from Goodness to Pragmatic quality). In contrast to Pragmatic quality, Hedonic quality is directly inferred from beauty (the direct link from Beauty to Hedonic quality in Figure 1). According to probabilistic consistency [Ford and Smith 1987], individuals infer unavailable attributes directly from a specific available attribute believed to be conceptually or even causally linked to the unavailable attribute. In other words, while people may hold specific beliefs about how beauty and hedonic quality are related, any observable link between beauty and pragmatic quality is just the consequence of people inferring overall quality (goodness) from beauty and then inferring conceptually different specific product attributes from overall quality.

Fig. 2. Hassenzahl and Monk’s [2007] test results of an inference model. Figures are (standardized) path coefficients (with bivariate correlations in brackets). Bold signifies statistical significance (p < .05).

Hassenzahl and Monk [2007, 2010] put these rules to an initial test. Figure 2 shows the results of a LISREL analysis of a sample of 430 assessments of 21 different interactive products. The sample was recruited using the online questionnaire tool AttrakDiff2 (http://www.attrakdiff.de). The data were fully anonymized; thus, nothing can be said about the specific interactive products constituting this sample. The analysis reported was published in Hassenzahl and Monk [2007], but the data are identical with Dataset 4 of Hassenzahl [2010]. However, the latter paper used a different analysis strategy and focused on the inference of pragmatic quality (i.e., perceived usability) from beauty. For the current paper, the LISREL analysis from 2007 is more illustrative. As expected, the relation between beauty and goodness was substantial (.71) and stronger than the relations between goodness and pragmatic quality (.65) and between goodness and hedonic quality (.48). Moreover, the link between beauty and pragmatic quality, which was substantial as a bivariate correlation (.53), completely disappeared when goodness was included in the analysis (path coefficient: −.07), emphasizing the fully mediated nature of the link between beauty and pragmatic quality (i.e., an example of evaluative consistency). In contrast, the direct link between beauty and hedonic quality remained intact (path coefficient: .42), hinting at a partial mediation, where some of the hedonic quality is directly inferred from beauty (i.e., an example of probabilistic consistency) and some indirectly through goodness (i.e., an example of evaluative consistency).
The results provide strong support for an inference-based model, which was substantiated with three further studies [Hassenzahl and Monk 2010]. In sum, Hassenzahl and Monk [2007, 2010] suggest that (1) beauty is a direct determinant of goodness, (2) beauty is an indirect determinant of pragmatic quality, operating through goodness, and (3) beauty is a direct determinant of hedonic quality.
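As a worked illustration, the indirect effects implied by these results can be computed with the usual product-of-coefficients rule. This assumes that the values .65 and .48 quoted above are the path coefficients from goodness to pragmatic and hedonic quality; the symbols B (beauty), G (goodness), PQ (pragmatic quality) and HQ (hedonic quality) are introduced here only for illustration.

```latex
\begin{align*}
\text{indirect}(B \to G \to PQ) &= \beta_{B \to G}\,\beta_{G \to PQ} = .71 \times .65 \approx .46,
  & \text{direct}(B \to PQ) &= -.07 \\
\text{indirect}(B \to G \to HQ) &= \beta_{B \to G}\,\beta_{G \to HQ} = .71 \times .48 \approx .34,
  & \text{direct}(B \to HQ) &= .42
\end{align*}
```

On this reading, beauty relates to pragmatic quality almost entirely through goodness, whereas for hedonic quality a sizeable direct path remains alongside the indirect one.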

http://www.attrakdiff.de

1.4 Aims of the Current Study

While representing a first step towards more adequately addressing the variety of human judgment processes, the studies of Hassenzahl and Monk [2010] left a number of questions unanswered. We aim to explore some of these in the present paper. We report three independent, complementary studies, which focus on the following aspects.

Replicability of the suggested inference for single products. Monk [2004] strongly urged us to avoid a “fixed-effect fallacy” when studying the relationship between attribute judgments about products. In brief, the argument was that typically in these studies participants as well as products contribute variance. Accordingly, we need to carefully sample products as well and need to make sure that all models hold for people (i.e., subjects analysis) and products (i.e., materials analysis) alike. Hassenzahl and Monk [2010] tested the inference model (as described in Figure 1) on four independent datasets, in a subjects as well as a materials analysis, and found the model was stable. However, while methodologically sound, the requirement of having a sample of products does not fit well with practices of HCI or the domain of UX. Typically, practitioners and researchers alike evaluate a single product by handing out questionnaires to a sample of people. While the results from Hassenzahl and Monk [2010] suggest that the inference model should also hold for a single product (i.e., when the product variance is held constant), this has not yet been tested. Therefore, we used a different single product in each of the three studies. Our first aim was simply to replicate the inference model.

Effects of hands-on experience. In Hassenzahl and Monk [2010], all participants were instructed to have brief hands-on experience with a particular product. Even when assuming that inference is a stable process, which easily overrules brief hands-on experience, this is not the most straightforward way of testing the model.
Accordingly, the present studies contrasted two different measurements, one based on the presentation of an interactive product only and another after verifiable hands-on experience. The inference model we have outlined should hold for the “presentation-only” condition (in the sense of a control condition, where an interactive product is presented, but users do not interact with the product). In addition, this study design allows testing of the stability of the model, given additional hands-on experience. Our second aim was to explore potential effects of hands-on experience on the model.

Types of experience. In most studies (though see van Schaik and Ling [2011]), hands-on experience is either task-oriented (i.e., people are asked to complete given tasks, such as the information retrieval tasks provided by Cyr et al. [2006]) or left to the participants [e.g., Hassenzahl and Monk 2010]. Inspired by Apter [1989], Hassenzahl [2003] conceptualized the psychological consequences of the different situations while interacting with a product in goal mode or action mode. In goal mode the fulfillment of a given goal is to the fore. The goal has clear relevance and determines all actions. The product is therefore just a means to an end. While interacting with a product, people in goal mode try to be effective and efficient. In action mode the action itself is to the fore. It determines rather volatile goals during use. Using the product is an end in itself. Several studies [Hassenzahl and Ullrich 2007; van Schaik and Ling 2009, 2011] revealed a profound effect of mode of use on how products are judged, which merits its inclusion. In the present paper, we studied experience in action mode in Study 1, by not specifying any particular task and asking participants to just explore the artifact. In Study 2, we introduced specific information-retrieval tasks (experience in goal mode), which enabled us to examine potential differences.
In Study 3, we additionally varied task complexity to introduce more or less demand in goal mode [van Oostendorp et al. 2009; van Schaik and Ling (to appear)]. Task complexity (path length, defined as the number of steps involved in finding the information) has a negative effect on task performance, due to the increased probability of selecting a wrong link on longer paths or of misjudging the relevance of presented information [Gwizdka and Spence 2006]. For Study 3, we assumed that the manipulation of task complexity would influence perceptions of pragmatic quality but not of hedonic quality. Hassenzahl [2001], for example, found pragmatic and hedonic quality to be independent after hands-on experience in a goal-oriented mode. In that study, subjective mental effort (a consequence of task demands and/or usability problems) negatively impacted pragmatic quality, but not hedonic quality. We assume a similar asymmetry here. Thus, in addition to studying the inference model under two different usage modes (Study 1 and Study 2), in Study 3 we set out to deliberately manipulate experience by making the task more or less complex and to explore its effects on the prediction of the inference model. Note that, within the framework of the person-artifact-task model [Finneran and Zhang 2003; van Schaik and Ling, to appear], task complexity is only one of the possible variables that affect judgment. Existing models of judgment in HCI [Hartmann et al. 2008; Hassenzahl 2003] address external variables (such as person, artifact and task), but can be classified uniformly as specific-to-general. This limits their potential when applied to the general-to-specific type of inference investigated in the current research. In other words, these models are useful in highlighting the effects of external variables, but do not illuminate how these variables would affect the specific inference rules suggested in the present paper. Our third aim was to explore how well the inference model works across different types of experience.
In the following, we report three independent studies. For each we expect the basic inference model described previously to hold (Figure 1). However, we set out to clarify whether the model replicates when different single products are used (Aim 1, all studies). To do so, we employed three different Web sites that were not homogeneous and varied in familiarity to participants, in order to establish the generality of the results over a range of artifacts. In addition, we studied the effect of additional hands-on experience (Aim 2, all studies) by deliberately comparing judgments before and after actual experience. This comparison provides information about the persistence of an inference model when users gain additional information in the process of product use. To further broaden the scope and potential generality of the inference model (Aim 3), we employed two different types of experience (activity versus goal-oriented, Study 1 versus Study 2 and Study 3) and even deliberately varied experience through an external factor (in Study 3).

2. STUDY 1: ACTION-MODE

2.1 Method

2.1.1 Participants. Ninety-four undergraduate psychology students (73 females and 21 males), with a mean age of 24 years (SD = 9), took part in the experiment as a course requirement. All participants had used the World Wide Web and all but two had used the target Web site (Wikipedia). Mean expertise using the Web was 8 years (SD = 3), mean time per week spent using the Web and Wikipedia was 17 h and 3 h, respectively (SD = 12/7), and mean frequency of Web/Wikipedia use per week was 17/2 times (SD = 15/4).

2.1.2 Materials and Equipment. Participants gave responses to a 10-item short version of the AttrakDiff2 questionnaire [Hassenzahl and Monk 2010], consisting of 7-point semantic-differential items (see Appendix A). The following constructs were measured: pragmatic quality (four items), hedonic quality (four items), beauty (one item) and goodness (one item); refer to Appendix B, Tables II and III for a summary of the scales’ psychometric properties. Participants used Wikipedia’s Web site as it existed in January 2010 (see Figure 3 for a sample Web page). The experiments in this study and the following ones were programmed in Visual Basic 6 and ran on personal computers (Intel Pentium, 1.86 GHz, 2 GB RAM, Microsoft Windows XP operating system, 17-inch monitors); the screen dimensions were 1280 × 1024; contrast (50%) and brightness (75%) were set to optimal levels.

Fig. 3. Sample Web page (Experiment 1). Wikipedia.org, Creative Commons License 3.0.

2.1.3 Procedure. The study consisted of two phases and ran in a computer laboratory with groups of 15–20 participants working independently. In Phase 1, each participant was introduced to the Web site through self-paced presentation of 10 noninteractive screenshots of different pages from Wikipedia’s Web site. Participants then completed the short version of the AttrakDiff2 questionnaire. In Phase 2, participants were free to use the same Web site to explore their own interests for 20 minutes (i.e., action mode). The median number of pages visited was 25, with a semi-interquartile range of 22. After this, participants again completed the AttrakDiff2 and answered demographic questions. The study took about 35 minutes to complete. The procedure used in this and the following study represents an extension of that used by Hassenzahl and Monk [2010] in that there were two separate phases for presentation only and additional hands-on experience, with measures taken at the end of each phase, and having a fixed set of screen shots for presentation and a predefined task for interaction.

2.1.4 Data Analysis with PLS. Partial-least-squares (PLS) path modeling [Vinzi et al. 2010] was used for data analysis in all three studies for the following reasons. PLS allows the analysis of both single-stage and multi-stage models with latent variables, allowing the integrated analysis of a measurement model and a structural model. Each latent variable (usually a psychological construct) is measured using one or more manifest variables (usually items). PLS is compatible with multiple regression analysis, analysis of variance and unrelated t-tests: the results of these techniques are special cases of the results of PLS, but these techniques do not account for measurement error, whereas PLS does. PLS does not require some of the assumptions imposed by covariance-based structural equation modeling, including those of large sample sizes, and univariate and multivariate normality. All PLS analyses reported here satisfied the following minimum sample-size requirement for robust estimations in PLS path modeling [Henseler et al. 2009]: the larger of (a) ten times the number of indicators of the scale with the largest number of formative indicators and (b) ten times the largest number of structural paths directed at a particular construct in the inner path model. Recent simulation studies have demonstrated that PLS path modeling performs at least as well as and, under various conditions, is superior to covariance-based structural equation modeling in terms of bias, root mean square error and mean absolute deviation [Hulland et al. 2010; Vilares et al. 2010]. For a consistent approach, the data analyses for all studies were conducted with PLS by way of the SmartPLS software [Ringle et al. 2005], unless stated otherwise. A bootstrapping procedure (N = 5000, as recommended by Henseler et al. [2009]) was used to test the significance of model parameters.
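The bootstrap test of a mediated effect, computed as the product of its two constituent direct effects, can be sketched as follows. This is a simplified illustration and not the SmartPLS procedure: it uses ordinary least-squares slopes on observed scores instead of PLS latent-variable estimates, the function names and the synthetic data are hypothetical, and the t-statistic is assumed to be the bootstrap mean divided by the bootstrap standard error.

```python
import random
import statistics


def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def indirect_effect(beauty, goodness, pragmatic):
    """Product of two direct effects: beauty -> goodness -> pragmatic quality."""
    s_bb = covariance(beauty, beauty)
    s_gg = covariance(goodness, goodness)
    s_bg = covariance(beauty, goodness)
    s_bp = covariance(beauty, pragmatic)
    s_gp = covariance(goodness, pragmatic)
    a = s_bg / s_bb  # slope of goodness on beauty
    # Slope of pragmatic quality on goodness with beauty held constant.
    b = (s_bb * s_gp - s_bg * s_bp) / (s_gg * s_bb - s_bg ** 2)
    return a * b


def bootstrap_indirect(rows, n_boot=500, seed=1):
    """Return (mean, standard error, t) of the indirect effect over resamples."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        # Resample participants with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        beauty, goodness, pragmatic = zip(*sample)
        estimates.append(indirect_effect(beauty, goodness, pragmatic))
    mean = statistics.mean(estimates)
    se = statistics.stdev(estimates)
    return mean, se, mean / se


def make_rows(n=300, seed=2):
    """Synthetic (beauty, goodness, pragmatic) scores with assumed effects."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        beauty = rng.gauss(0, 1)
        goodness = 0.7 * beauty + rng.gauss(0, 0.7)
        pragmatic = 0.6 * goodness + rng.gauss(0, 0.8)
        rows.append((beauty, goodness, pragmatic))
    return rows


mean_effect, std_error, t_value = bootstrap_indirect(make_rows())
```

The default of 500 resamples only keeps the sketch quick to run; the analyses reported here used 5000.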
Each indirect (mediated) effect (e.g., the effect of beauty on pragmatic quality mediated by goodness) was calculated as the product of the two constituent direct effects (e.g., of beauty on goodness and of goodness on pragmatic quality) comprising the indirect effect. Then bootstrapping was used to test the indirect effect. In particular, each bootstrap sample produced parameter estimates for the constituent direct effects, from which the indirect effect was calculated. The mean and standard error of this calculated estimate were computed over the bootstrap samples and, from these, a t-statistic was calculated to test the significance of the indirect effect. The total effect (e.g., of beauty on pragmatic quality) was broken down into the indirect effect (e.g., of beauty on pragmatic quality mediated by goodness) and the direct effect (e.g., of beauty on pragmatic quality with goodness held constant). In each experiment, tests of the difference between regression coefficients before and after interaction were conducted to test their stability. In a single analysis, each bootstrap sample produced all coefficients (for presentation-only and hands-on experience). The mean difference and the standard error of the difference of each pair of coefficients (for presentation-only and hands-on experience) were computed and, from these, a t-statistic was calculated to test the significance of the difference.

2.2 Results and Discussion

Figure 4 shows the results of the proposed inference model (see Figure 1) for (a) presentation-only and (b) additional hands-on experience. Presented are standardized path coefficients (with figures in brackets representing indirect effects) and the variance in each endogenous latent variable that is explained by the direct effects on it. For example, in Figure 4(a), 39% (R² = .39) of variance in pragmatic quality was explained by the direct effects of beauty and goodness, while 35% (R² = .35) of variance in hedonic quality was explained by beauty, goodness and pragmatic quality.