
Bernd Leplow (Editor)

Zeitschrift für Psychologie Founded in 1890 Volume 225 / Number 1 / 2017 Editor-in-Chief Edgar Erdfelder Associate Editors Michael Bošnjak Herta Flor Benjamin E. Hilbig Heinz Holling Bernd Leplow Steffi Pohl Christiane Spiel Elsbeth Stern

Applied Psychological Measurement


Test development and construction: Current practices and advances “This book is indispensable for all who want an up-to-date resource about constructing valid tests.” Prof. Dr. Johnny R. J. Fontaine, President of the European Association of Psychological Assessment, Faculty of Psychology and Educational Sciences, Ghent University, Belgium

Karl Schweizer / Christine DiStefano (Editors)

Principles and Methods of Test Construction Standards and Recent Advances

(Series: Psychological Assessment – Science and Practice – Vol. 3) 2016, vi + 336 pp. US $69.00 / € 49.95, ISBN 978-0-88937-449-2. Also available as eBook.

This latest volume in the series Psychological Assessment – Science and Practice describes the current state-of-the-art in test development and construction. The past 10–20 years have seen substantial advances in the methods used to develop and administer tests. In this volume many of the world's leading authorities collate these advances and provide information about current practices, thus equipping researchers and students to successfully construct new tests using the best modern standards and techniques. The first section explains the benefits of considering the underlying theory when designing tests, such as factor analysis and item response theory. The second section looks at item format and test presentation. The third discusses model testing and selection, while the fourth goes into statistical methods that can find group-specific bias. The final section discusses topics of special relevance, such as multitrait-multimethod analyses and development of screening instruments.

www.hogrefe.com


Bernd Leplow (Editor)

Applied Psychological Measurement

Zeitschrift für Psychologie Vol. 225, No. 1, 2017


Library of Congress Cataloging in Publication information is available via the Library of Congress Marc Database under the LC Control Number 2017931095

© 2017 Hogrefe Publishing

Hogrefe Publishing Incorporated and registered in the Commonwealth of Massachusetts, USA, and in Göttingen, Lower Saxony, Germany

No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the publisher.

Cover image © vgajic – istockphoto.com

Printed and bound in Germany

ISBN 978-0-88937-498-0

The Zeitschrift für Psychologie, founded by Hermann Ebbinghaus and Arthur König in 1890, is the oldest psychology journal in Europe and the second oldest in the world. Since 2007, it appears in English and is devoted to publishing topical issues that provide convenient state-of-the-art compilations of research in psychology, each covering an area of current interest. The Zeitschrift für Psychologie is available as a journal in print and online by annual subscription, and the different topical compendia are also available as individual titles by ISBN.

Editor-in-Chief

Edgar Erdfelder, University of Mannheim, Psychology III, Schloss, Ehrenhof-Ost, 68131 Mannheim, Germany, Tel. +49 621 181-2146, Fax +49 621 181-3997, erdfelder@psychologie.uni-mannheim.de

Associate Editors

Michael Bošnjak, Mannheim, Germany Herta Flor, Mannheim, Germany

Benjamin Hilbig, Landau, Germany Heinz Holling, Münster, Germany Bernd Leplow, Halle, Germany

Steffi Pohl, Berlin, Germany Christiane Spiel, Vienna, Austria Elsbeth Stern, Zurich, Switzerland

Editorial Board

G. M. Bente, Cologne, Germany D. Dörner, Bamberg, Germany N. Foreman, London, UK D. Frey, Munich, Germany J. Funke, Heidelberg, Germany W. Greve, Hildesheim, Germany W. Hacker, Dresden, Germany R. Hartsuiker, Ghent, Belgium J. Hellbrück, Eichstätt-Ingolstadt, Germany

F. W. Hesse, Tübingen, Germany R. Hübner, Konstanz, Germany A. Jacobs, Berlin, Germany M. Jerusalem, Berlin, Germany A. Kruse, Heidelberg, Germany W. Miltner, Jena, Germany T. Moffitt, London, UK A. Molinsky, Waltham, MA, USA H. Moosbrugger, Frankfurt/Main, Germany

W. Schneider, Würzburg, Germany B. Schyns, Durham, UK B. Six, Halle, Germany P. K. Smith, London, UK W. Sommer, Berlin, Germany A. von Eye, Vienna, Austria K. Wiemer-Hastings, DeKalb, IL, USA I. Zettler, Copenhagen, Denmark

Publisher

Hogrefe Publishing, Merkelstr. 3, 37085 Göttingen, Germany, Tel. +49 551 999 50 0, Fax +49 551 999 50 425, publishing@hogrefe.com North America: Hogrefe Publishing, 7 Bulfinch Place, 2nd floor, Boston, MA 02114, USA, Tel. +1 (866) 823 4726, Fax +1 (617) 354 6875, customerservice@hogrefe-publishing.com

Production

Christina Sarembe, Hogrefe Publishing, Merkelstr. 3, 37085 Göttingen, Germany, Tel. +49 551 999 50 424, Fax +49 551 999 50 425, production@hogrefe.com

Subscriptions

Hogrefe Publishing, Herbert-Quandt-Str. 4, 37081 Göttingen, Germany, Tel. +49 551 999 50 900, Fax +49 551 999 50 998

Advertising/Inserts

Hogrefe Publishing, Merkelstr. 3, 37085 Göttingen, Germany, Tel. +49 551 999 50 423, Fax +49 551 999 50 425, marketing@hogrefe.com

ISSN

ISSN-L 2151-2604, ISSN-Print 2190-8370, ISSN-Online 2151-2604

Copyright Information

© 2017 Hogrefe Publishing. This journal as well as the individual contributions and illustrations contained within it are protected under international copyright law. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without prior written permission from the publisher. All rights, including translation rights, reserved.

Publication

Published in 4 topical issues per annual volume.

Subscription Prices

Calendar year subscriptions only. Rates for 2017: Institutions US $372.00 / € 292.00; Individuals US $195.00 / € 139.00 (all plus US $16.00 / € 12.00 shipping & handling; € 6.00 in Germany). Single issue US $49.00 / € 34.95 (plus shipping & handling).

Payment

Payment may be made by check, international money order, or credit card, to Hogrefe Publishing, Merkelstr. 3, 37085 Go¨ttingen, Germany. US and Canadian subscriptions can also be ordered from Hogrefe Publishing, 7 Bulfinch Place, 2nd floor, Boston, MA 02114, USA

Electronic Full Text

The full text of Zeitschrift für Psychologie is available online at www.econtent.hogrefe.com and in PsycARTICLES™.

Abstracting Services

Abstracted/indexed in Current Contents/Social and Behavioral Sciences (CC/S&BS), Social Sciences Citation Index (SSCI), Research Alert, PsycINFO, PASCAL, PsycLit, IBZ, IBR, ERIH, and PSYNDEX. Impact Factor (2015): 0.820



Contents

Editorial

Ten Years After
Edgar Erdfelder and Bernd Leplow ..... 1

Original Articles

Measuring Individual Differences in Implicit Learning with Artificial Grammar Learning Tasks: Conceptual and Methodological Conundrums
Daniel Danner, Dirk Hagemann, and Joachim Funke ..... 5

Measuring Age-Related Differences in Using a Simple Decision Strategy: The Case of the Recognition Heuristic
Rüdiger F. Pohl ..... 20

Measuring the Zero-Risk Bias: Methodological Artefact or Decision-Making Strategy?
Elisabeth Schneider, Bernhard Streicher, Eva Lermer, Rainer Sachs, and Dieter Frey ..... 31

Assessing Suffering in Experimental Pain Models: Psychological and Psychophysiological Correlates
M. Brunner, M. Löffler, S. Kamping, S. Bustan, A. M. González-Roldán, F. Anton, and H. Flor ..... 45

Is the Implicit Association Test for Aggressive Attitudes a Measure for Attraction to Violence or Traumatization?
Matthias Bluemke, Anselm Crombach, Tobias Hecker, Inga Schalinski, Thomas Elbert, and Roland Weierstall ..... 54

Measuring a Mastery Goal Structure Using the TARGET Framework: Development and Validation of a Classroom Goal Structure Questionnaire
Marko Lüftenegger, Ulrich S. Tran, Lisa Bardach, Barbara Schober, and Christiane Spiel ..... 64

Parents' and Teachers' Opinions on Bullying and Cyberbullying Prevention: The Relevance of Their Own Children's or Students' Involvement
Petra Gradinger, Dagmar Strohmeier, and Christiane Spiel ..... 76

Intercultural Competence Development Among University Students From a Self-Regulated Learning Perspective: Theoretical Model and Measurement
Dagmar Strohmeier, Petra Gradinger, and Petra Wagner ..... 85

Call for Papers

Delusions: Risk-Factors, Models, and Approaches to Psychological Intervention: A Topical Issue of the Zeitschrift für Psychologie
Guest Editor: Tania Lincoln ..... 95



Editorial

Ten Years After

Edgar Erdfelder¹ and Bernd Leplow²

¹ Department of Psychology, School of Social Sciences, University of Mannheim, Germany
² Institute of Psychology, Martin Luther University Halle-Wittenberg, Germany

A Decade of Internationalization and Topical Issues

With the current issue 1 of volume 225 (2017), the Zeitschrift für Psychologie (ZfP) celebrates its tenth anniversary as an English-language journal devoted to publishing topical issues from all branches of psychological science. Departing from the journal's history that dates back to 1890, when Hermann Ebbinghaus and others founded the ZfP as a German-language psychology journal titled Zeitschrift für Psychologie und Physiologie der Sinnesorgane, Holling and Erdfelder (2007) were the first to launch an issue that was entirely published in English – a topical issue on human memory. Looking back at the past 10 years and the 40 topical issues that have appeared since Holling and Erdfelder's issue 1 of volume 215 (2007), we feel that the step taken in 2007 was a good one. It was clearly worthwhile to move toward greater internationality and a broader thematic scope of the ZfP – with respect to guest editors, authors, readers, and the topics of psychological research addressed in the journal.

As it now stands, the main objectives of the change in format have been met. First, the new ZfP attracted guest editors and authors from all over the world, several of them leading scholars in their fields. Second, almost all issues benefitted from international perspectives on the topic of interest, as exemplified by multinational groups of senior and junior researchers in the respective field. Third, as originally intended, the topical issues addressed so far covered virtually all parts of psychology. This not only holds for basic psychological science (a recent example is the issue on psychological reactance research, see Steindl, Jonas, Sittenthaler, Traut-Mattausch, & Greenberg, 2015) and applied psychology (see, e.g., the issue on competence measurement in higher education; Blömeke, Gustafsson, & Shavelson, 2015) but often also for research that aims at bridging the gap between applied and basic research, as exemplified by the topical issues on the role of memory in posttraumatic stress disorder (Flor & Wessa, 2010), on basic mechanisms underlying placebo effects (Klinger & Colloca, 2014), on psychological and neural consequences of torture (Weierstall, Elbert, & Maercker, 2011), and the recent issue on educational neuroscience (Ahr, Borst, & Houdé, 2016; Bédard, Laplante, & Mercier, 2016; Dion & Restrepo, 2016; Stern, Grabner, & Schumacher, 2016). Fourth, as a gratifying by-product of these achievements, the impact of ZfP publications on the field of psychology as a whole has also increased considerably. This is partly reflected in the positive developmental trend of the journal's Web of Science™ impact factor since 2007, but even more so in several key papers published in ZfP that have become important for their field, as indicated by high citation rates. Most of these key papers are reviews (for a recent example, see Blömeke et al., 2015).

Formally, most ZfP issues focus on a single psychological topic and begin with a pertinent review, followed by up to six original articles. Where appropriate, guest editors may also include Research Notes on work in progress of high importance. In addition, the Opinion section can be used to cover current scientific controversies, typically on topics for which the empirical evidence is inconclusive. Because of the increasing importance of research syntheses such as reviews, ZfP has recently introduced an additional type of topical issue, called "Hotspots in Psychology" (Erdfelder & Bošnjak, 2016). In contrast to other issues, hotspot issues are thematically heterogeneous and allow for inclusion of all topics that are currently fiercely debated in psychology. The common thread running through potential contributions is methodological in nature: Hotspot issues include state-of-the-art meta-analyses (e.g., Chodura, Kuhn, & Holling, 2015; Lazarević et al., 2016; Rennung & Göritz, 2016, for recent examples), systematic reviews (e.g., Giroux, Coburn, Harley, Connolly, & Bernstein, 2016), and also methodological contributions on research synthesis approaches (e.g., Kühberger, Scherndl, Ludwig, & Simon, 2016). One hotspot issue has appeared so far (Erdfelder & Bošnjak, 2016), and we plan to continue this format once per year, next in issue 1 of volume 226 (2018).



This brings us to the contents of the current topical issue on applied psychological measurement, edited by Bernd Leplow. In behavioral science, measurement methods and measurement theory are often discussed in isolation, that is, separate from specific substantive research questions. Highly specialized disciplines like psychometrics and test theory typically aim at developing general-purpose tools that can be adapted to the needs of different psychological disciplines. Quite often, however, these general-purpose tools do not fit the substantive research questions closely enough to provide convincing scientific answers. As a consequence, more specific theory-guided measurement devices, instruments, and associated statistical methods need to be developed that are tailored to the research questions of interest.

In This Issue

The current topical issue presents a collection of examples of this type of research-question-driven applied psychological measurement. Because of the heterogeneity of the research topics and measurement methods covered in this issue, we have refrained from including a review article. Thus, all eight contributions to this issue report original research. The current issue consists of three parts. The first part is about measurement of individual differences in cognition and includes three articles. In the second part, measurement issues in applied fields such as neuropsychology and trauma research are addressed in two contributions. Finally, in the third part, three articles discuss measurement issues in educational psychology and competence research.

Measuring Individual Differences in Cognition

The first part on cognition begins with a contribution by Danner, Hagemann, and Funke (2017) on measuring individual differences in implicit learning. Using one of the standard tasks to assess implicit learning – the artificial grammar learning task – the authors investigated effects of different task instructions and the inclusion of an explicit knowledge test. Their three experiments suggest that measures of individual differences in implicit learning share problems of low internal consistency and low reliability often found for other complex learning measures as well. Despite these caveats, however, results provide evidence that individual differences in implicit learning are unrelated to general intelligence and educational attainment. Thus, to allow for an exhaustive multidimensional assessment of cognitive skills, future research aiming at improved measures of implicit learning is necessary.


Another measure of individual differences in cognition – the propensity for using simple decision heuristics – is addressed in Pohl's (2017) contribution. The author refers to the recognition heuristic (RH) of fast and frugal decision making, that is, the tendency to pick the recognized object whenever the decision maker has to choose between a recognized and a non-recognized object. Based on the assumptions (a) that RH use involves knowledge about the validity of the recognition cue in the underlying decision domain (see Pohl, Michalkiewicz, Erdfelder, & Hilbig, 2017) and (b) that knowledge increases with age, the author predicts that RH use should increase from childhood to adolescence and adult age. Based on a multinomial model of RH use that disentangles the probability of pure RH use from other reasons to choose recognized objects, Pohl (2017) found support for the assumption that knowledge increases with age in a cross-sectional study. Somewhat surprisingly, however, RH use showed an inverted U-shaped trajectory across the life span: RH use increased from childhood to adolescence but then decreased again for adults.

The third contribution to Part 1 is also concerned with the measurement of decision preferences. Schneider, Streicher, Lermer, Sachs, and Frey (2017) address the so-called zero-risk bias in risky choice – the preference for certain wins (with probability = 1) over uncertain wins (with probability < 1), even if the expected value of the uncertain win is much higher than the proposed certain win. To clarify inconsistent results reported in the relevant literature, the authors assessed the effects of (a) the zero-risk bias measurement method and (b) the decision context in four different experiments. The results reveal that the measurement method is less crucial than the context in which the risky decision is embedded. Thus, the zero-risk bias appears to be a state rather than a stable trait, and it is influenced by task characteristics such as abstractness of the task and the decision domain. In sum, the zero-risk bias, although replicable as a psychological phenomenon, is labile and highly context sensitive if conceived as an individual difference measure.

Psychological Measurement in Clinical Contexts

The second part of this issue focuses on measurement in clinical contexts and starts with a neuropsychological contribution. Brunner et al. (2017) address a particularly challenging measurement issue in pain research, namely, the assessment of pain-induced suffering. The authors investigated this difficult and largely neglected topic using multimethod experimental pain induction approaches with healthy adults. Both visual analogue scales of pain-related suffering and the Pictorial Representation of Illness and Self Measure were correlated with measures of subjective pain intensity, subjective unpleasantness, and various psychophysiological



measures. Results suggest that suffering is distinct from subjective pain intensity and unpleasantness but is related to fear of pain, to private self-consciousness, and also to various peripheral physiological measures such as resting heart rate, skin conductance responses (SCR), and electromyogram (EMG). Thus, pain experiences should be conceived as multidimensional in nature. They require innovative assessment methods over and above established measures of subjective pain intensity and unpleasantness. The Implicit Association Test (IAT) has been used as an indirect measure of various attitudes and personality dimensions for many years, despite the fact that IAT research has also uncovered a number of methodological problems inherent in IAT methodology. In their article, Bluemke et al. (2017) used IAT measures of aggressive attitudes to assess trauma effects among war-affected young men in Northern Uganda, including former child soldiers. The authors observed an intriguing dissociation between explicit and implicit attitudes toward aggression in this sample: With increasing trauma experiences, participants explicitly reported more appetite for aggression but at the same time they also showed less positive implicit attitudes toward aggression in the IAT. Importantly, objective physiological stress as reflected in cortisol levels was best predicted by the implicit attitude. IAT measures based on the diffusion model proved to be most useful in establishing validity.

Applied Measurement in Educational Contexts

The third and final part of this issue addresses applied measurement in educational psychology and competence assessment. In their contribution, Lüftenegger, Tran, Bardach, Schober, and Spiel (2017) propose the TARGET framework including the six instructional dimensions Task, Autonomy, Recognition, Grouping, Evaluation, and Time as a theoretical starting point for measuring the mastery goal structure of students in a classroom context. Given the lack of measurement instruments tailored to the TARGET dimensions, they develop a new questionnaire-based assessment instrument, called the Goal Structure Questionnaire (GSQ). Their findings indicate that the six GSQ scales are sufficiently reliable and can predict, among others, mastery goals and performance approach goals of students in the classroom context.

The measurement of teachers' and parents' attitudes toward bullying and cyberbullying among children is addressed in the subsequent contribution by Gradinger, Strohmeier, and Spiel (2017). Their results show that parents' and teachers' attitudes are moderated by experiences with bullying and cyberbullying in their own children or students. Specifically, attitudes toward bullying prevention


programs were more positive in teachers of students affected by bullying. Finally, in the eighth contribution to this issue, Strohmeier, Gradinger, and Wagner (2017) propose a three-phase learning model and a corresponding questionnaire-based assessment instrument to capture the development of intercultural competence among adults. For a sample of university students, they show that the self-report questionnaire data follow a multidimensional structure that can be aligned to the proposed three phases of the intercultural competence learning process.

Changes in the Editorial Team

A final point should not go unnoticed in this Editorial. After almost a decade of serving as editor-in-chief of the ZfP, Bernd Leplow decided to step down and contribute to the journal as an associate editor in the future. On behalf of all other editors of the ZfP, Edgar Erdfelder would like to thank Bernd Leplow for his excellent work and his enormous efforts in planning and coordinating numerous topical issues, in inviting guest editors and supporting them in the editorial decision process, and in encouraging authors to submit their work to ZfP. The success of the new ZfP is his success to a large extent, and the editors of ZfP are grateful for Bernd Leplow's willingness to act as the head of the Editorial team for so many years. Beginning with this issue, Edgar Erdfelder takes over the office of the editor-in-chief from Bernd Leplow. As the incoming editor-in-chief, he would also like to thank the associate editors for supporting the journal. Of the longstanding associate editors, Herta Flor, Heinz Holling, and Christiane Spiel will fortunately stay in the team, whereas Dieter Frey and Friedrich W. Hesse have decided to step down. We are happy that we can count on their future support as editorial board members of the ZfP. As a consequence of their decision, the group of editors has recently been complemented by new members. Michael Bošnjak (GESIS, Mannheim) and Benjamin E. Hilbig (Landau) joined in the course of last year. In addition, beginning with this issue, Steffi Pohl (FU Berlin) and Elsbeth Stern (ETH Zurich) join the team of associate editors. We welcome all new members and thank them for their willingness to support the journal in the future!

Topics for Future Issues

We very much hope that the topical issues of the ZfP will be even more successful in the future than they have been in the past. Whether this will turn out to be true also depends to a large extent on the journal's readers. We therefore encourage readers to let the editors know



about their preferences for future topical issues and to suggest potential guest editors who are international experts in the respective domain. In particular, timely and broadly discussed topics of psychological research suggest themselves as potential candidates for ZfP issues. The editors will definitely watch out for appropriate topics, but they will also be happy to discuss reader suggestions.

References

Ahr, E., Borst, G., & Houdé, O. (2016). The learning brain: Neuronal recycling and inhibition. Zeitschrift für Psychologie, 224, 277–285. doi: 10.1027/2151-2604/a000263
Bédard, M., Laplante, L., & Mercier, J. (2016). Development of reading remediation for dyslexic individuals: Added benefits of the joint consideration of neurophysiological and behavioral data. Zeitschrift für Psychologie, 224, 240–246. doi: 10.1027/2151-2604/a000259
Blömeke, S., Gustafsson, J.-E., & Shavelson, R. J. (2015). Beyond dichotomies: Competence viewed as a continuum. Zeitschrift für Psychologie, 223, 3–13. doi: 10.1027/2151-2604/a000194
Bluemke, M., Crombach, A., Hecker, T., Schalinski, I., Elbert, T., & Weierstall, R. (2017). Is the implicit association test for aggressive attitudes a measure for attraction to violence or traumatization? Zeitschrift für Psychologie, 225, 55–64. doi: 10.1027/2151-2604/a000281
Brunner, M., Löffler, M., Kamping, S., Bustan, S., González-Roldán, A. M., Anton, F., & Flor, H. (2017). Assessing suffering in experimental pain models: Psychological and psychophysiological correlates. Zeitschrift für Psychologie, 225, 45–54. doi: 10.1027/2151-2604/a000279
Chodura, S., Kuhn, J.-T., & Holling, H. (2015). Interventions for children with mathematical difficulties: A meta-analysis. Zeitschrift für Psychologie, 223, 129–144. doi: 10.1027/2151-2604/a000211
Danner, D., Hagemann, D., & Funke, J. (2017). Measuring individual differences in implicit learning with artificial grammar learning tasks: Conceptual and methodological conundrums. Zeitschrift für Psychologie, 225, 5–19. doi: 10.1027/2151-2604/a000280
Dion, J.-S., & Restrepo, G. (2016). A systematic review of the literature linking neural correlates of feedback processing to learning. Zeitschrift für Psychologie, 224, 247–256. doi: 10.1027/2151-2604/a000260
Erdfelder, E., & Bošnjak, M. (2016). "Hotspots in Psychology": A new format for special issues of the Zeitschrift für Psychologie. Zeitschrift für Psychologie, 224, 141–144. doi: 10.1027/2151-2604/a000249
Flor, H., & Wessa, M. (2010). Memory and posttraumatic stress disorder: A matter of context? Zeitschrift für Psychologie, 218, 61–63. doi: 10.1027/0044-3409/a000012
Giroux, M. E., Coburn, P. I., Harley, E. M., Connolly, D. A., & Bernstein, D. M. (2016). Hindsight bias and law. Zeitschrift für Psychologie, 224, 190–203. doi: 10.1027/2151-2604/a000253
Gradinger, P., Strohmeier, D., & Spiel, C. (2017). Parents' and teachers' opinions on bullying and cyberbullying prevention: The relevance of their own children's or students' involvement. Zeitschrift für Psychologie, 225, 77–85. doi: 10.1027/2151-2604/a000278
Holling, H., & Erdfelder, E. (2007). The new Zeitschrift für Psychologie/Journal of Psychology. Zeitschrift für Psychologie, 215, 1–3. doi: 10.1027/0044-3409.215.1.1
Klinger, R., & Colloca, L. (2014). Approaches to a complex phenomenon: The basic mechanisms and clinical applications of placebo effects. Zeitschrift für Psychologie, 222, 121–123. doi: 10.1027/2151-2604/a000175
Kühberger, A., Scherndl, T., Ludwig, B., & Simon, D. M. (2016). Comparative evaluation of narrative reviews and meta-analyses: A case study. Zeitschrift für Psychologie, 224, 145–156. doi: 10.1027/2151-2604/a000250
Lazarević, L. B., Bošnjak, M., Knežević, G., Petrović, B., Purić, D., Teovanović, P., . . . Bodroža, B. (2016). Disintegration as an additional trait in the psychobiological model of personality: Assessing discriminant validity via meta-analysis. Zeitschrift für Psychologie, 224, 204–215. doi: 10.1027/2151-2604/a000254
Lüftenegger, M., Tran, U. S., Bardach, L., Schober, B., & Spiel, C. (2017). Measuring a mastery goal structure using the TARGET framework: Development and validation of a classroom goal structure questionnaire. Zeitschrift für Psychologie, 225, 65–76. doi: 10.1027/2151-2604/a000277
Pohl, R. F. (2017). Measuring age-related differences in using a simple decision strategy: The case of the recognition heuristic. Zeitschrift für Psychologie, 225, 20–30. doi: 10.1027/2151-2604/a000283
Pohl, R. F., Michalkiewicz, M., Erdfelder, E., & Hilbig, B. E. (2017). Use of the recognition heuristic depends on the domain's recognition validity, not on the recognition validity of selected sets of objects. Memory & Cognition. Advance online publication. doi: 10.3758/s13421-017-0689-0
Rennung, M., & Göritz, A. S. (2016). Prosocial consequences of interpersonal synchrony: A meta-analysis. Zeitschrift für Psychologie, 224, 168–189. doi: 10.1027/2151-2604/a000252
Schneider, E., Streicher, B., Lermer, E., Sachs, R., & Frey, D. (2017). Measuring the zero-risk bias: Methodological artefact or decision-making strategy? Zeitschrift für Psychologie, 225, 31–44. doi: 10.1027/2151-2604/a000284
Steindl, C., Jonas, E., Sittenthaler, S., Traut-Mattausch, E., & Greenberg, J. (2015). Understanding psychological reactance: New developments and findings. Zeitschrift für Psychologie, 223, 205–214. doi: 10.1027/2151-2604/a000222
Stern, E., Grabner, R. H., & Schumacher, R. (2016). Educational neuroscience: A field between false hopes and realistic expectations. Zeitschrift für Psychologie, 224, 237–239. doi: 10.1027/2151-2604/a000258
Strohmeier, D., Gradinger, P., & Wagner, P. (2017). Intercultural competence development among university students from a self-regulated learning perspective: Theoretical model and measurement. Zeitschrift für Psychologie, 225, 86–95. doi: 10.1027/2151-2604/a000282
Weierstall, R., Elbert, T., & Maercker, A. (2011). Torture: Psychological approaches to a major humanitarian issue. Zeitschrift für Psychologie, 219, 129–132. doi: 10.1027/2151-2604/a000059

Published online July 12, 2017

Edgar Erdfelder
Cognition and Individual Differences Lab
University of Mannheim
Schloss, Ehrenhof-Ost
68131 Mannheim
Germany
erdfelder@psychologie.uni-mannheim.de

Bernd Leplow
Institute of Psychology
Martin Luther University Halle-Wittenberg
Emil-Abderhalden-Str. 26-27
06108 Halle (Saale)
Germany
bernd.leplow@psych.uni-halle.de



Original Article

Measuring Individual Differences in Implicit Learning with Artificial Grammar Learning Tasks: Conceptual and Methodological Conundrums

Daniel Danner¹, Dirk Hagemann², and Joachim Funke²

¹ GESIS – Leibniz Institute for the Social Sciences, Mannheim, Germany
² Institute of Psychology, Heidelberg University, Heidelberg, Germany

Abstract: Implicit learning can be defined as learning without intention or awareness. We discuss conceptually and investigate empirically how individual differences in implicit learning can be measured with artificial grammar learning (AGL) tasks. We address whether participants should be instructed to rate the grammaticality or the novelty of letter strings and look at the impact of a knowledge test on measurement quality. We discuss these issues from a conceptual perspective and report three experiments which suggest that (1) the reliability of AGL is moderate and too low for individual assessments, (2) a knowledge test decreases task consistency and increases the correlation with reportable grammar knowledge, and (3) performance in AGL tasks is independent from general intelligence and educational attainment. Keywords: implicit learning, artificial grammar learning task, reliability, validity

Implicit learning has been defined as learning without intention or as acquiring complex information without awareness of what has been learned (e.g., Mackintosh, 1998; Reber, 1992). One example of implicit learning is the learning of grammatical rules: in many situations we know whether a sentence is grammatically right or wrong but we cannot report the underlying grammatical rule. Several authors suggest that implicit learning is a fundamental aspect of learning in real life. For example, Gomez and Gerken (1999) suggest that implicit learning is crucial for learning languages, Funke and Frensch (2007) suggest that implicit learning is a determinant of success in solving complex real life problems, and Mackintosh (1998) suggests that implicit learning may even be a predictor of educational attainment. In line with that, there are empirical findings which suggest that implicit learning is a meaningful individual difference variable. First, several studies reported low associations between implicit learning and general intelligence (e.g., Gebauer & Mackintosh, 2007; Reber, Walkenfeld, & Hernstadt, 1991). These results suggest a good divergent validity of implicit learning. In addition, Kaufman et al. (2010) and Pretz, Totz, and Kaufman (2010) reported a significant relation between implicit learning and academic performance which also suggests the predictive

validity of implicit learning. However, investigations on how individual differences in implicit learning can be measured have been sparse. This is the aim of the present study.

The Artificial Grammar Learning Task

During the last 45 years, the artificial grammar learning (AGL) task has become a standard paradigm of implicit learning (e.g., Reber, 1967). During a learning phase, the participants are asked to memorize a set of artificial letter strings (like WNSNXT). After that learning phase, the participants are informed that these letter strings were constructed according to a specific grammatical rule. In the subsequent testing phase, the participants are asked to judge new strings as grammatical or nongrammatical. One half of these strings are constructed according to the grammar and the other half are not. The percentage of correct judgments is taken as an indicator for implicit learning success. Typically, the participants show above chance performance which suggests that they learned something but are not able to report the grammar rules, which suggests that they learned the rules implicitly. The logic of such a task is



intuitively plausible. However, to serve as a meaningful individual difference variable, three criteria have to be met:

(1) The performance variable has to be reliable. The reliability is important because only a variable that can be measured reliably allows making inferences about individuals' ability. In particular, a low reliability results in a large confidence interval of an individual score whereas a high reliability allows an accurate estimate of an individual's ability. In addition, the reliability of a variable is important for evaluating the validity of a variable. A low reliability limits correlations with other variables and hence, correlations between variables with low reliability cannot be interpreted properly.

(2) The performance variable has to be task consistent. Task consistency means that several AGL tasks measure the same construct. In a research context, the task consistency may be important for investigating whether implicit learning is a trait-like ability that is stable over time. In an applied context, the task consistency may be important for an individual assessment (e.g., if a participant is tested more than one time).

(3) The performance variable has to be independent from reportable grammar knowledge. If AGL tasks measure implicit learning, there should be no correlation between the judgment accuracy and reportable grammar knowledge.

The usefulness of a performance measure may further be evaluated by its divergent validity and its predictive value. We will replicate the finding that implicit learning is independent from general intelligence and we will investigate its relation with educational attainment, but first, we will discuss previous findings and conceptual challenges.
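The point made under criterion (1), that low reliability caps the correlations a measure can show, corresponds to the classical attenuation formula from test theory. The formula below is a standard psychometric result added here for illustration; it is not spelled out in the article.

```latex
% Attenuation of an observed correlation by unreliability (classical test theory):
% r_obs   = correlation between the observed scores X and Y
% r_true  = correlation between the underlying true scores
% rho_XX', rho_YY' = reliabilities of X and Y
r_{\mathrm{obs}} = r_{\mathrm{true}} \sqrt{\rho_{XX'}\,\rho_{YY'}}
\qquad \Rightarrow \qquad
\lvert r_{\mathrm{obs}} \rvert \le \sqrt{\rho_{XX'}\,\rho_{YY'}}
```

For example, with a reliability of .40 for an AGL score and .90 for an intelligence test, the observed correlation cannot exceed the square root of .40 × .90, that is, .60, so a small observed correlation is ambiguous between true independence and mere unreliability.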

Previous Findings

Reliability

There have only been sparse investigations of the reliability of implicit learning variables. Reber et al. (1991) examined N = 20 students and reported a Cronbach's alpha of α = .51 for 100 grammaticality judgments. Likewise, Salthouse, McGuthry, and Hambrick (1999) assessed N = 183 participants between 18 and 87 years of age and reported a reliability of α = .40 for an AGL task. These results demonstrate that it is possible to measure individual differences in implicit learning, although this measurement is not very consistent. However, a limitation of these studies is that only a single grammar was used. So, in sum, there is so far only weak support for the measurement of reliable individual differences in artificial grammar learning. Thus, we will


systematically investigate the reliability of the performance in AGL tasks using Cronbach’s alpha, the split-half correlation, and the retest correlation.
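The three reliability estimates just named can be computed along the following lines. This is a minimal sketch in Python, not the authors' analysis code; the input arrays (a persons × items matrix of 0/1 accuracy scores and two accuracy vectors for the repeated strings) are hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons, n_items) array of 0/1 accuracy scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()      # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the sum scores
    return k / (k - 1) * (1 - item_var / total_var)

def split_half(items):
    """Odd-even split-half correlation, stepped up with Spearman-Brown: 2r / (1 + r)."""
    items = np.asarray(items, dtype=float)
    odd = items[:, 0::2].mean(axis=1)
    even = items[:, 1::2].mean(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)

def deattenuated_retest(first, second):
    """Retest correlation of the repeated strings, de-attenuated by 5r / (1 + 4r),
    because only one fifth of the 100 test strings is presented twice."""
    r = np.corrcoef(first, second)[0, 1]
    return 5 * r / (1 + 4 * r)
```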

Task Consistency

Estimating the task consistency requires the same participants to complete at least two AGL tasks, which causes a conceptual obstacle: during a first AGL task, the participants are asked to memorize a set of letter strings but they do not know that these strings are constructed according to a grammatical rule. Thus, they cannot learn the grammar intentionally. However, during a second AGL task, the participants already do know that there is a grammar and accordingly they may try to discover the grammar with intention. Having this in mind, can we be sure that a second or third AGL task still measures implicit learning? To avoid this problem, Gebauer and Mackintosh (2007) refined the standard paradigm. In the learning phase, they asked their participants to memorize a set of letter strings. In the subsequent testing phase, they asked their participants not to judge whether a letter string is grammatical but to judge whether a letter string was presented before ("old"). Even though none of the strings had previously been presented, they scored a grammatical letter string which was classified as "old" as a correct decision. The cunning idea behind this procedure was that the participants learn something about the grammar, thus they feel familiar with the grammatical strings and therefore they classify a grammatical string as an "old" one. From a conceptual point of view, novelty judgments and grammaticality judgments may be seen as similar. However, from an empirical point of view, it is unclear whether asking participants to rate the novelty of letter strings measures the same construct as asking participants to rate the grammaticality of letter strings. Using novelty judgments, Gebauer and Mackintosh (2007) reported a correlation of r = .15 between two AGL tasks. This points toward a low task consistency. However, it is not known at present if this result indicates a low consistency of AGL in general or just in the case that the participants are asked to rate the novelty instead of the grammaticality of the strings. Thus, we will investigate the task consistency of AGL tasks in three experiments. A large correlation between two tasks suggests a good task consistency; a small correlation between two tasks suggests a poor task consistency.
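The scoring rule and the resulting consistency check can be expressed compactly as follows. This is our own sketch with hypothetical variable names and response codings; the article provides no code.

```python
import numpy as np

def novelty_judgment_accuracy(responses, is_grammatical):
    """Gebauer-and-Mackintosh-style scoring: a grammatical string judged 'old'
    or a nongrammatical string judged 'new' counts as a correct classification."""
    responses = np.asarray(responses)                      # entries: 'old' or 'new'
    is_grammatical = np.asarray(is_grammatical, dtype=bool)
    correct = np.where(is_grammatical, responses == 'old', responses == 'new')
    return 100.0 * correct.mean()                          # percentage correct

def task_consistency(accuracy_task1, accuracy_task2):
    """Task consistency: correlation of per-person accuracies across two AGL tasks."""
    return np.corrcoef(accuracy_task1, accuracy_task2)[0, 1]
```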

Reportable Grammar Knowledge

Reber (1967) suggested that the participants in an AGL task learn the grammar rules implicitly because they are not able to report their grammar knowledge. However, to test



whether grammaticality judgments are independent from reportable knowledge, it is necessary to define what kind of knowledge is relevant for performance in AGL tasks. Without doubt, this is a thorny question and, over the years, there have been controversial and fertile discussions about this topic. For example, Reber and Allen (1978) found that their participants were not able to report any knowledge about grammar rules and therefore suggested that they learned the grammar rules implicitly. Dulany, Carlson, and Dewey (1984) criticized that asking participants to report the grammar rules is too difficult and therefore the participants might not have been able to report their knowledge. To avoid this problem, Dulany et al. (1984) asked their participants to report letter string features on which they based their grammaticality judgments, showed that the reported knowledge was sufficient to explain the above chance accuracy of grammaticality judgments, and concluded that the acquired knowledge was not implicit at all. In a similar vein, Perruchet and Pacteau (1990) showed that knowledge of bigrams (e.g., the bigram NX occurs more often in grammatical letter strings) was sufficient to explain the above chance accuracy of grammaticality judgments. Jamieson and Mewhort (2009) used an episodic memory model to demonstrate that grammaticality judgments can also be explained by the similarity of letter strings with previously learned strings (see also Knowlton & Squire, 1996). Other authors suggested that participants make grammaticality judgments based on chunks (e.g., Servan-Schreiber & Anderson, 1990), fluency (e.g., Kinder, Shanks, Cock, & Tunney, 2003), or fragment overlap (e.g., Boucher & Dienes, 2003). Having these different explanation attempts in mind, it seems difficult to find an appropriate measurement for the relevant knowledge. Shanks and St. John (1994) concluded that it is only possible to measure the relevant knowledge for implicit learning tasks when the information criterion and the sensitivity criterion are met. A knowledge test meets the information criterion if it captures all kinds of relevant knowledge. It meets the sensitivity criterion if it is as similar as possible to the original task in terms of retrieval context and instruction. Thus, to investigate the relation between implicit learning performance and reportable knowledge, we developed a knowledge test that was designed to meet the information as well as the sensitivity criterion. In particular, we selected those bi- and trigrams (n-grams) that occurred in the learning phase and which also occurred in the testing phase more frequently in grammatical than in nongrammatical strings. These n-grams allow participants to identify grammatical strings based on n-gram knowledge. We also selected those n-grams that did not occur in the learning phase but that did occur in the testing phase more frequently in nongrammatical strings than



grammatical ones. These n-grams allow participants to identify nongrammatical strings based on n-gram knowledge. We presented these n-grams to the participants and asked them to rate whether the n-gram occurred more often in grammatical or nongrammatical letter strings. The test meets the information criterion because it is a direct test of participants' n-gram knowledge and also an indirect test of other, correlated knowledge such as similarity, chunks, or fragment overlap. Hence, an above chance classification of n-grams indicates that an above chance classification of letter strings can be explained by reportable knowledge such as n-grams, similarity, chunks, or fragment overlap. However, the n-gram test does not indicate which source of knowledge was used. The test further meets the sensitivity criterion because the presentation of the stimuli and the response format were identical to those in the testing phase. Thus, a large correlation between performance in the testing phase and performance in the knowledge test suggests that participants use reportable knowledge to make their judgments. A small correlation between performance in the testing phase and performance in the knowledge test suggests that participants do not use reportable knowledge.
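As an illustration of this selection rule, the sketch below (our own code, with hypothetical string lists; the actual materials are in ESM 1) picks the bi- and trigrams that discriminate grammatical from nongrammatical test strings in the two ways just described.

```python
from collections import Counter

def ngrams(string, n):
    return [string[i:i + n] for i in range(len(string) - n + 1)]

def count_ngrams(strings, n_values=(2, 3)):
    counts = Counter()
    for s in strings:
        for n in n_values:
            counts.update(ngrams(s, n))
    return counts

def select_test_ngrams(learning, test_grammatical, test_nongrammatical, k=12):
    learned = count_ngrams(learning)
    in_gram = count_ngrams(test_grammatical)
    in_nongram = count_ngrams(test_nongrammatical)
    candidates = set(in_gram) | set(in_nongram)
    # n-grams seen during learning that occur more often in grammatical test strings
    diagnostic_grammatical = sorted(
        (g for g in candidates if learned[g] > 0 and in_gram[g] > in_nongram[g]),
        key=lambda g: in_gram[g] - in_nongram[g], reverse=True)[:k]
    # n-grams absent from the learning strings that occur more often in nongrammatical test strings
    diagnostic_nongrammatical = sorted(
        (g for g in candidates if learned[g] == 0 and in_nongram[g] > in_gram[g]),
        key=lambda g: in_nongram[g] - in_gram[g], reverse=True)[:k]
    return diagnostic_grammatical, diagnostic_nongrammatical
```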

Relation With General Intelligence and Educational Attainment

Implicit learning is often described as part of an unconscious, intuitive learning system that is independent from explicit, declarative learning (e.g., Mackintosh, 1998; Reber & Allen, 2000). In line with this, several studies report a weak association between AGL performance and general intelligence (e.g., Gebauer & Mackintosh, 2007; Reber et al., 1991). However, even if these findings seem appealing at first glance, they may be criticized. For example, some studies did not report reliability estimates and therefore, a low correlation may also be explained by a low reliability. Other studies modified the standard AGL task and it is unclear whether this finding may be generalized to the standard AGL task. Therefore, a further aim of the present study was to replicate the finding that AGL performance and general intelligence are only weakly related. From a practical point of view, the most important characteristic of a measure may be its predictive value. Mackintosh (1998) hypothesizes that performance in AGL may be a predictor of educational attainment. However, there is no empirical evidence for this hypothesis yet. Therefore, we will close this gap and test whether performance in an AGL task can predict educational success.

Zeitschrift für Psychologie (2017), 225(1), 5–19




The Present Study

The present study investigates if individual differences in implicit learning can be measured with AGL tasks. In Experiment 1, we will investigate the reliability, the task consistency, and the relation with reportable knowledge when the participants are asked to rate the novelty of letter strings. In Experiment 2, we will investigate the reliability, the task consistency, and the relation with reportable knowledge when the participants are asked to rate the grammaticality of letter strings. In Experiment 3, we will investigate whether a knowledge test affects the task consistency and the relation with reportable knowledge. In this latter experiment we will further investigate how AGL performance is associated with general intelligence and educational attainment.

Figure 1. Grammar 1 (string construction identical to Gebauer & Mackintosh, 2007).

Experiment 1

Estimating the task consistency requires that participants complete two AGL tasks. Because this may cause a validity problem, Gebauer and Mackintosh (2007) asked their participants to rate the novelty of letter strings. Even though this idea is theoretically sound, there is no empirical evidence for the similarity of grammaticality and novelty ratings. Therefore, the aim of this experiment was (1) investigating the reliability, the task consistency, and the relation with reportable knowledge and (2) investigating whether asking the participants to rate the novelty of letter strings measures the same construct as asking the participants to rate the grammaticality of letter strings. The participants completed three AGL tasks. In Task 1 and Task 2 the participants rated the novelty of letter strings; in Task 3 the participants rated the grammaticality of letter strings.

Figure 2. Grammar 2 (string construction identical to Gebauer & Mackintosh, 2007).

Method

Participants

The participants were N = 21 students from Heidelberg University who were recruited from the campus and were paid €5 for their participation. This sample size was chosen because it allows detection of a population correlation of r = .50 between accuracy of novelty and grammaticality rating with a type-one-error probability of 0.05 (one-tailed) and a power of 0.80.

Stimulus Material

There were three grammars. The strings of Grammar 1 (Figure 1) and Grammar 2 (Figure 2) were the same as those used by Gebauer and Mackintosh (2007). The strings of Grammar 3 were constructed as shown in Figure 3. For each grammar, there were 30 grammatical strings in the learning phase and 40 grammatical and 40 nongrammatical strings in the testing phase (complete lists of the

Figure 3. Grammar 3.




strings used in each phase are provided in the Electronic Supplementary Material, ESM 1). The nongrammatical strings contained one violation of the grammar at random positions of the strings. The length of the strings varied between three and eight letters. To test the reportable grammar knowledge of the participants, 24 n-grams were selected for each grammar. There were 12 n-grams which occurred in the learning phase and which also occurred in the testing phase more frequently in grammatical than in nongrammatical strings (see ESM 1). These n-grams were chosen because they may help to identify grammatical strings as grammatical. In addition, there were 12 n-grams which did not occur in the learning phase but which did occur in the testing phase more frequently in nongrammatical strings than in grammatical ones. Those n-grams were chosen because they may help to identify nongrammatical strings.

Procedure

Each participant completed three AGL tasks. The first artificial grammar learning task was run with Grammar 1. In the learning phase 30 letter strings were presented and the participants were instructed to memorize them. Each string was presented individually for 3 s. The participants were asked to repeat the strings correctly by pressing the respective letters on the keyboard. When a string was repeated correctly, the feedback "correct" was given and the next string occurred. When a string was repeated incorrectly, the feedback "false" was given and the string was displayed again until repeated correctly. After a participant repeated 10 strings correctly, these 10 strings were simultaneously displayed for 90 s on the screen and the participant was asked to repeat them silently. After a participant repeated all 30 strings correctly the learning phase was finished.

In the testing phase, 80 new strings were presented (see ESM 1). Ten grammatical and 10 nongrammatical strings were presented twice. These strings were randomly selected out of the original 80 strings. Thus there were a total of 100 strings in the testing phase and the retest correlation of the 20 strings could be computed. Even though all strings were new (not presented in the learning phase), the participants were instructed to rate the strings as "old" (presented in the learning phase) or "new" (not presented in the learning phase). To judge a string as "old," the participants had to press the A-key of the keyboard; to judge a string as "new," they had to press the L-key. The order of presentation of the strings was fixed across participants in a random order. This was done to ensure that possible effects of order would affect all participants in the same way.

Immediately after the testing phase, the participants completed the n-gram knowledge test. In the n-gram knowledge test, the participants were instructed to judge


whether an n-gram (e.g., NWS) occurred more often in "old" strings or whether an n-gram occurred more often in "new" strings. To judge an n-gram as occurring more often in "old" strings, the participants had to press the A-key of the keyboard; to judge an n-gram as occurring more often in "new" strings, they had to press the L-key. The order of presentation of the n-grams was fixed across participants in a random order. All n-grams were presented twice so that the retest correlation could be computed.

After a short break, the second artificial grammar learning task was run with Grammar 2. The procedures of the learning phase, the testing phase, and the knowledge test were the same as in the first AGL task. After a short break, the third artificial grammar learning task was run with Grammar 3. After the learning phase was finished, the participants were informed that all strings in the learning phase were constructed according to a complex rule system. In the testing phase, the participants were instructed to rate the strings as grammatical or nongrammatical. Otherwise, the procedure was identical with the first and the second task.

Measures

The judgment accuracy was quantified as the percentage of correct classifications of the strings in the testing phase. As suggested by Gebauer and Mackintosh (2007), grammatical strings which were rated as "old" strings and nongrammatical strings which were rated as "new" strings were counted as correct classifications. The amount of n-gram knowledge was quantified as the percentage of correct classifications of n-grams in the knowledge test. Analogous to the testing phase, grammatical n-grams which were rated as "old" and nongrammatical n-grams which were rated as "new" were counted as correct classifications.

Results

Judgment Accuracy

The judgment accuracy, the reliability estimates, and the correlation between tasks are shown in Table 1. In line with previous studies, the judgment accuracy was significantly above chance in all tasks (all M ≥ 57.29%, all t ≥ 5.95, all p < .001). The reliability was estimated with Cronbach's alpha, the split-half correlation (odd-even split, Spearman-Brown corrected), and the retest correlation. Because the retest correlation was based on only 20 out of the 100 presented strings, we de-attenuated the correlation coefficient by 5r/(1 + 4r) (Lord & Novick, 1974, p. 86). As can be seen, the reliability estimates were rather heterogeneous. Some estimates were even negative, which clearly indicates that the assumptions of the underlying measurement model were violated. The greatest reliability estimate was the retest correlation of 0.87 in Task 2.




Table 1. Judgment accuracy in Experiment 1

                                     Task 1     Task 2     Task 3
Instruction                          Novelty    Novelty    Grammaticality
Mean (%)                             64.00      63.62      57.29
SD (%)                               4.06       6.26       5.60
Cronbach's α                         .16        .46        .30
Split-half correlation¹              .45        .29        .37
Retest correlation²                  .52        .87        .27
Correlation with Task 1                         .18        .58**
Correlation with Task 2                                    .08

Note. **p < .01. ¹Spearman-Brown corrected by 2r/(1 + r); ²corrected by 5r/(1 + 4r).

Table 2. N-gram knowledge in Experiment 1

                                     Task 1     Task 2     Task 3
Instruction                          Novelty    Novelty    Grammaticality
Mean (%)                             72.94      62.70      71.63
SD (%)                               11.18      8.17       11.53
Cronbach's α                         .39        .15        .47
Split-half correlation¹              .59        .67        .22
Retest correlation                   .50        .49        .64
Correlation with judgment accuracy   .27        .10        .16

Note. No correlation with the judgment accuracy was significant. ¹Spearman-Brown corrected by 2r/(1 + r).

There was no significant or substantial correlation between Task 1 and Task 2 (r = .18, p = .443) or Task 2 and Task 3 (r = .08, p = .728), but a significant correlation between Task 1 and Task 3 (r = .58, p = .006).

N-Gram Knowledge

Performance in the n-gram knowledge test and correlation with judgment accuracy are shown in Table 2. N-gram knowledge was significantly above chance in all tasks (all M ≥ 62.70%, all t ≥ 8.44, all p < .001). There was no significant correlation between n-gram knowledge and judgment accuracy in Task 1, Task 2, or Task 3.

Discussion

This first experiment addressed the conceptual obstacle that arises when an AGL task is completed for a second time. After participants have completed a standard AGL task, they do know that there is a grammar constituting the letter strings and this may change their learning strategies in a second task. To avoid this problem, we followed Gebauer and Mackintosh's (2007) approach and asked the participants to rate the novelty instead of the grammaticality. Then, we investigated whether novelty ratings indicate implicit learning success and whether novelty ratings bear an incremental value over grammaticality ratings. The present results do suggest that novelty ratings indicate implicit learning success. First, the judgment accuracy was significantly above chance in all three tasks. This replicates the results of Gebauer and Mackintosh (2007) and suggests that learning took place. Second, in Task 1 and Task 2 there are individual differences in implicit learning. The retest correlations were r = .52 and r = .87 which correspond with the reliability estimates reported by Gebauer and Mackintosh (2007) and Reber et al. (1991). Third, the correlations with the n-gram knowledge test were nonsignificant. This suggests that judgment accuracy cannot be explained by the participants' n-gram knowledge. However, the correlation between judgment accuracy in Task 1 and Task 2 was small and nonsignificant (r = .18).

Table 2. N-gram knowledge in Experiment 1

                                      Task 1     Task 2     Task 3
Instruction                           Novelty    Novelty    Grammaticality
Mean (%)                              72.94      62.70      71.63
SD (%)                                11.18      8.17       11.53
Cronbach's α                          .39        .15        .47
Split-half correlation¹               .59        .67        .22
Retest correlation                    .50        .49        .64
Correlation with judgment accuracy    .27        .10        .16

Note. No correlation with judgment accuracy was significant. ¹Spearman-Brown corrected by 2r/(1 + r).

This suggests that novelty ratings are not task consistent: performance in the first AGL task indicates something different from performance in the second AGL task. When participants complete an AGL task for the first time, they are asked to learn a list of letter strings, but they do not know that they will be asked to rate letter strings as new or old afterwards. When participants complete an AGL task for the second time, they are asked to learn a list of letter strings again, but they already know that they will be asked to rate letter strings as new or old afterwards. This may cause the participants to use different strategies or heuristics to remember the strings, and the performance in the second task may reflect not only implicitly learned knowledge but also a change in cognitive processing. In sum, the results of the first experiment suggest that novelty ratings are moderately reliable and independent of n-gram knowledge but not task consistent. Therefore, novelty ratings seem to create no incremental value over grammaticality ratings.

Experiment 2
The results of the first experiment suggest that novelty ratings are not task consistent. In Experiment 2 we will investigate whether grammaticality ratings are a better performance indicator for implicit learning. The participants complete three AGL tasks. We estimate the reliability, the task consistency, and the association with n-gram knowledge when the participants are asked to rate the grammaticality of letter strings. In addition, we add the order of presentation of the grammars as a between-subjects factor to make sure that a feature of a specific grammar does not bias the results.

Method
Participants
The participants were N = 42 students from Heidelberg University who were recruited on campus and were paid €5 for their participation. The order of presentation of the grammars was added as a between-participant variable.



One participant had already participated in Experiment 1 and was therefore excluded from the analysis.

Stimulus Material
The stimuli were the same as used in Experiment 1.

Procedure
All participants completed three AGL tasks. Half of the participants completed Task 1 with Grammar 1, Task 2 with Grammar 2, and Task 3 with Grammar 3 (order 1). The other half completed Task 1 with Grammar 2, Task 2 with Grammar 1, and Task 3 with Grammar 3 (order 2). The order of presentation of Grammar 3 was not included as a between-participant variable, since that would have required a larger sample size. The procedures of the learning phase, the testing phase, and the knowledge test were the same as in Experiment 1, with two exceptions. First, the participants were asked to rate the grammaticality of the letter strings in all three tasks. Second, the n-grams were presented only once instead of twice because of a software problem.

Measures
As in Experiment 1, judgment accuracy and the amount of n-gram knowledge were recorded.

Results
The pattern of results was the same for order 1 and order 2. Therefore, we present the results pooled over both groups.

Judgment Accuracy
The judgment accuracy, the reliability estimates, and the correlations between tasks are shown in Table 3. Again, judgment accuracy was significantly above chance in all tasks (all M ≥ 59.15%, all t ≥ 8.88, all p < .001). As in Experiment 1, the reliability estimates were moderate (between 0.49 and 0.80). There was no significant or substantial correlation between Task 1 and Task 2 (r = .05, p = .715) or between Task 1 and Task 3 (r = .08, p = .631), but a significant correlation between Task 2 and Task 3 (r = .38, p = .014).

N-Gram Knowledge
Performance in the n-gram knowledge test and the correlations with judgment accuracy are shown in Table 4. N-gram knowledge was significantly above chance in all tasks (all M ≥ 64.90%, all t ≥ 7.83, all p < .001). There was no significant correlation between n-gram knowledge and judgment accuracy in Task 1 (r = .06, p = .715). However, there was a small correlation in Task 2 (r = .30, p = .060) and a significant correlation in Task 3 (r = .34, p = .023).

Discussion
In Experiment 2 we investigated whether grammaticality ratings can be used to measure individual differences in implicit learning. As in Experiment 1, the judgment accuracy was above chance in all three tasks, which suggests that learning took place.

Table 3. Judgment accuracy in Experiment 2

                            Task 1          Task 2          Task 3
Instruction                 Grammaticality  Grammaticality  Grammaticality
Mean (%)                    61.22           61.76           59.15
SD (%)                      7.12            7.00            6.59
Cronbach's α                .55             .54             .49
Split-half correlation¹     .55             .75             .62
Retest correlation²         .79             .80             .52
Correlation with Task 1                     .05             .08
Correlation with Task 2                                     .38*

Note. *p < .05. ¹Spearman-Brown corrected by 2r/(1 + r); ²corrected by 5r/(1 + 4r).

Table 4. N-gram knowledge in Experiment 2

                                      Task 1          Task 2          Task 3
Instruction                           Grammaticality  Grammaticality  Grammaticality
Mean (%)                              66.12           67.21           64.90
SD (%)                                8.94            11.90           11.53
Cronbach's α                          .42             .24             .20
Split-half correlation¹               .25             .40             .17
Correlation with judgment accuracy    .06             .30             .34*

Note. *p < .05. ¹Spearman-Brown corrected by 2r/(1 + r).


In Task 1, there was no association between judgment accuracy and n-gram knowledge. In Task 2, there was a small correlation, and in Task 3 there was a significant correlation between judgment accuracy and n-gram knowledge. This suggests that performance in Task 1 captures individual differences in implicit learning, but performance in Task 2 and Task 3 reflects a different learning process. In line with this interpretation, there was no association between performance in Task 1 and Task 2 or between Task 1 and Task 3, but there was a significant correlation between Task 2 and Task 3. Comparing the results of Experiment 1 and Experiment 2 further reveals that there was a substantial and significant correlation between the first and the third task in Experiment 1 but not in Experiment 2. This suggests that novelty ratings and grammaticality ratings are not equivalent even though both indicators measure aspects of implicit learning. At first glance, this pattern of results appears to demonstrate that grammaticality ratings may only be used once to measure individual differences in implicit learning. During a second task, the participants may direct their attention toward n-grams, and judgment accuracy is no longer a valid indicator of implicit learning. However, isn't there an alternative explanation? The participants completed a knowledge test (containing n-grams of letter strings) after every AGL task. Therefore, it is also possible that the knowledge test, and not the grammar awareness, changed the participants' strategy and caused the low task consistency as well as the relation with reported knowledge. In Experiment 3, we will follow up on this possible explanation and investigate whether a knowledge test affects the task consistency and the relation with reportable knowledge.


Experiment 3
In Experiment 3 we investigate whether an n-gram knowledge test affects the task consistency of AGL tasks. Half of the participants completed two AGL tasks and an n-gram knowledge test after each AGL task (n-gram group). The other half completed an n-gram knowledge test after the second AGL task only (control group). In addition, we investigated the relation between AGL performance, general intelligence, and educational attainment.

Method
Participants
The participants were N = 106 students from Heidelberg University who were recruited on campus and were paid €5 for their participation. The participants were randomly assigned to either the n-gram group (N = 53) or the control group (N = 53).

Stimulus Material
The stimuli for the first AGL task were constructed according to Grammar 4 (Figure 4). The stimuli for the second AGL task were constructed according to Grammar 5 (Figure 5). We created these two new grammars to ensure that our results are generalizable to different grammatical structures.

Figure 4. Grammar 4.
Figure 5. Grammar 5.


Procedure
All participants completed (1) a first AGL task, then (2) a knowledge test, then (3) the Culture Fair Intelligence Test (CFT; Cattell, Krug, & Barton, 1973), then (4) an additional AGL task, and then (5) a further knowledge test.
(1) The first artificial grammar learning task. The procedure of the AGL task was identical with Experiment 2, except that the learning phase consisted of 39 letter strings and that the testing phase consisted of 78 letter strings, which were all repeated.
(2) The first knowledge test. Immediately after the testing phase, the participants completed a knowledge test. The n-gram group completed an n-gram knowledge test and the control group completed a dummy knowledge test. The n-gram knowledge test assessed participants' knowledge of n-grams. To judge an n-gram as grammatical, the participants had to press the A-key.


To judge an n-gram as nongrammatical, the participants had to press the L-key. There were 34 different n-grams for Grammar 4 (see ESM 1). All n-grams were presented twice so that there were a total of 68 items in the n-gram knowledge test. The order of presentation of the strings was fixed across the participants in a random order. The percentage of correct judgments in the n-gram knowledge test was taken as an indicator of the amount of reportable knowledge. In order to make the procedure for the n-gram and the control group parallel, the control group completed a dummy knowledge test which was unrelated to the letter strings. The dummy knowledge test consisted of statements like "Alberto Fujimori was president of Peru from 1990 to 2000" (which is correct, by the way) and the participants were asked to rate the truth of the statements. To rate a statement as true, the participants had to press the A-key of the keyboard; to rate a statement as false, the L-key. There were 34 different statements and all statements were presented twice so that there were a total of 68 items in the dummy knowledge test. Participants' responses in the dummy knowledge test were not analyzed.
(3) The Culture Fair Intelligence Test (Cattell et al., 1973) was used as an indicator of participants' general intelligence. The test consists of 48 different figural reasoning items. The speed version of the test was administered, which took approximately 25 min. The number of correctly solved items was taken as the performance indicator of participants' general intelligence. In the present sample, mean IQ was M = 128 (range 91–150, SD = 12.8).
(4) The second artificial grammar learning task. The second AGL task also consisted of a learning phase and a testing phase. The procedure was identical to the first AGL task.
(5) The second knowledge test. The procedure for the second knowledge test was identical to the first with the exception that all participants completed an n-gram knowledge test after the testing phase and the n-gram knowledge test consisted of 36 items for Grammar 5. All n-grams were presented twice so that there was a total of 72 items in the n-gram knowledge test. The stimuli are shown in ESM 1.

Results
Judgment Accuracy
Judgment accuracy, reliability estimates, and the correlations between tasks are shown in Table 5. In line with previous studies, judgment accuracy was significantly above chance in all tasks (all M ≥ 56.98%, all t ≥ 11.05, all p < .001). Reliability estimates were generally moderate. In the control group, there was a significant correlation between Task 1 and Task 2 (r = .39, p = .004). In contrast, there was no significant correlation between Task 1 and Task 2 (r = .22, p = .109) in the n-gram group.

N-Gram Knowledge
Performance in the n-gram knowledge test and the correlations with judgment accuracy are shown in Table 6. N-gram knowledge was significantly above chance in all tasks (all M ≥ 55.03%, all t ≥ 5.50, all p < .001). In the n-gram group, there was no significant correlation between n-gram knowledge and judgment accuracy in Task 1 (r = .01, p = .942), but there was a significant correlation in Task 2 (r = .30, p = .029). In the control group, there was no knowledge test after the first AGL task. There was no significant correlation (r = .02, p = .884) between n-gram knowledge and judgment accuracy in Task 2.

Relation With General Intelligence and Educational Attainment
To investigate the relation between implicit learning, general intelligence, and educational attainment, we took the performance in the first AGL task as an indicator of implicit learning performance. The data of the n-gram group and the control group were analyzed together because the procedure for both groups was identical until the completion of the first AGL task. The number of solved items in the Culture Fair Intelligence Test served as a measure of participants' general intelligence. Cronbach's alpha for the 48 items was α = .73. The correlation between performance in the first AGL task and the CFT was low and not significant (r = .16, p = .111). We also computed the correlation corrected for attenuation (r*) across both groups.

Table 5. Judgment accuracy in Experiment 3

                            With knowledge test              Without knowledge test
                            Task 1          Task 2           Task 1          Task 2
Instruction                 Grammaticality  Grammaticality   Grammaticality  Grammaticality
Mean (%)                    58.09           56.98            59.62           57.28
SD (%)                      7.96            4.61             6.55            4.32
Cronbach's α                .77             .33              .66             .21
Split-half correlation¹     .71             .13              .49             .12
Retest correlation²         .75             .52              .68             .43
Correlation with Task 1                     .22                              .39*

Note. *p < .05. ¹Spearman-Brown corrected by 2r/(1 + r); ²corrected by 5r/(1 + 4r).


Table 6. N-gram knowledge in Experiment 3

                                      With knowledge test              Without knowledge test
                                      Task 1          Task 2           Task 1          Task 2
Instruction                           Grammaticality  Grammaticality   Grammaticality  Grammaticality
Mean (%)                              55.45           55.03            –               56.40
SD (%)                                7.92            6.66             –               6.87
Cronbach's α                          .45             .33              –               .35
Split-half correlation¹               .49             .01              –               .00
Retest correlation                    .55             .32              –               .33
Correlation with judgment accuracy    .01             .30*             –               .02

Note. *p < .05. ¹Spearman-Brown corrected by 2r/(1 + r). No n-gram knowledge test was administered after Task 1 in the group without knowledge test.

The retest correlation of Task 1 for both groups was r = .72 and hence, the correlation corrected for attenuation was r* = .22.
We asked the participants to report the grade point average of their final school exams (1 = "very good" to 6 = "failed"). The grades ranged between 1.0 and 3.1 with a mean of M = 1.81. The correlation between school grades and performance in the Culture Fair Intelligence Test was significant (r = .35, p < .001, r* = .40), but the correlation between school grades and performance in the first AGL task was not (r = .10, p = .320, r* = .12). In addition, we ran a multiple regression analysis predicting school grades from performance in the Culture Fair Intelligence Test and in the first AGL task. There was a significant association between school grades and the Culture Fair Intelligence Test (β = .34, p < .001) but not between school grades and the AGL task (β = .05, p = .600).
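As a minimal sketch of the correction applied here, the classical double correction for attenuation, r* = r / sqrt(rel_x × rel_y), reproduces the reported r* = .22 under the assumption that the retest reliability of the first AGL task (.72) and Cronbach's alpha of the CFT (.73) are the two reliabilities entering the correction; the helper name below is illustrative.

from math import sqrt

def corrected_for_attenuation(r_xy, rel_x, rel_y):
    """Classical double correction for attenuation."""
    return r_xy / sqrt(rel_x * rel_y)

# Observed AGL-CFT correlation r = .16; AGL Task 1 retest reliability .72; CFT alpha .73
print(round(corrected_for_attenuation(0.16, 0.72, 0.73), 2))  # 0.22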

Discussion
In Experiment 3, we investigated the hypothesis that AGL tasks are consistent if there is no n-gram test between subsequent tasks but not consistent if there is an n-gram test between tasks. The present results support this hypothesis. There was a significant correlation between two successive AGL tasks in the control group (no n-gram test after the first task) but not in the n-gram group (n-gram test after the first task). The present results further suggest that the decrease in task consistency was due to an attention shift toward n-grams. In the n-gram group, there was no correlation between judgment accuracy and n-gram knowledge in Task 1, but there was a significant correlation between judgment accuracy and n-gram knowledge in Task 2. This suggests that the participants started to base their grammaticality judgments on n-grams after completing an n-gram test. In contrast, in the control group, there was no correlation between judgment accuracy and n-gram knowledge in Task 2. This suggests that it is the n-gram test, and not the awareness that there is a grammar constituting the letter strings, that decreases the task consistency.
Buchner and Wippich (2000) discuss that the typically higher reliability of explicit measures compared to implicit measures makes it more likely to observe a significant correlation between two explicit measures than between an implicit and an explicit measure. This suggests that the low and nonsignificant correlation between implicit learning and school grades could be explained by the lower reliability of the implicit learning measure. However, the reliability estimates for the first implicit learning task (α = .73 across both conditions) and the CFT (α = .73) were identical and thus, the reliabilities of the measures cannot have biased the correlation.

General Discussion
Implicit learning has stimulated research in various fields of psychology. In cognitive psychology, implicit learning tasks have spawned fertile discussions about cognitive strategies and conscious and unconscious processes (e.g., Pothos, 2007). In psychophysiological research, AGL tasks have been used to investigate physiological foundations of implicit learning (e.g., Schankin, Hagemann, Danner, & Hager, 2011). Recently, implicit learning has also attracted attention as an individual difference variable (e.g., Danner, Hagemann, Schankin, Hager, & Funke, 2011). The present work addressed conceptual and methodological conundrums in measuring individual differences in implicit learning. For one thing, we discussed and investigated the obstacles that arise when participants complete an AGL task more than once. For another, we addressed how reportable knowledge can be measured and how a knowledge test affects the psychometric properties of an implicit learning performance variable. In the following sections, we summarize the core findings and discuss their theoretical and practical implications.



The three core findings of our research are: (1) overall, the reliability of the performance indicators is moderate, (2) an n-gram knowledge test decreases the task consistency and increases the correlation with reportable grammar knowledge, and (3) performance in AGL tasks is independent of general intelligence and educational attainment.

Moderate Reliability of Performance Indicators
A glance over Tables 1–6 shows that both Cronbach's alpha and the split-half correlations repeatedly have negative values (e.g., Tables 1, 2, 4–6). Because reliability coefficients cannot be negative by definition, this points to a violation of the assumptions that underlie the interpretation of these statistics as estimates of reliability. In particular, these statistics can be interpreted as point estimates of reliability only if all items (in the case of Cronbach's alpha) or both test halves (in the case of split-half correlations) measure exactly the same true score and if the measurement errors are uncorrelated (and, in the case of the split-half correlations, if the error variances of both test halves are equal). The negative values of these statistics imply that at least one assumption is violated and that these statistics therefore cannot be interpreted as estimates of reliability. On the other hand, all retest correlations were positive and therefore in the admissible range. Hence, there is no direct indication that the assumptions underlying these statistics are violated, and we can interpret the retest correlations as coefficients of reliability. As an estimate of reliability, the retest correlation requires the average of the repeated items to have the same true score and the same error variance (Lord & Novick, 1974). In particular, the method requires that the true score of the average judgment accuracy during the first presentation is the same true score as the average judgment accuracy during the second presentation. On the one hand, participants may feel more familiar with the strings and thus be more likely to rate them as grammatical. This would artificially increase the proportion of grammaticality ratings during the second presentation and thus decrease the retest correlation. On the other hand, participants may also explicitly remember the strings as well as their prior ratings, which would artificially increase the correlation. However, as shown by Reber and Allen (1978) and our own pretests, the participants are not able to remember specific letter strings or their responses to specific letter strings.1 Hence, the retest correlation may provide the most accurate reliability estimate in the present study, and repeating items in an implicit learning task may be the most promising method to obtain an accurate reliability estimate.

1 One limitation of this argument must be mentioned. Reber and Allen (1978) used introspective reports to examine whether the participants remember specific strings, but even if the participants were not able to recall specific strings, they may still feel more familiar with some and hence rate them as grammatical (cf. Whittlesea & Leboe, 2000).


In the present experiments, the average retest correlation was r = .58. The magnitude of this estimate is in line with previous research. Reber et al. (1991) reported a Cronbach’s alpha of α = .50 and Gebauer and Mackintosh (2007) reported split-half correlations of r = .70. These findings suggest that the manifest performance score is too inaccurate to make inferences regarding individuals’ abilities and that AGL tasks should not be used for individual assessments. Buchner and Wippich (2000) suggest that participants use various cognitive processes when they complete implicit learning tasks and that this may be one reason for their typical low reliability. Beyond the generally moderate reliability estimates, the results show a further conspicuity: there were substantial differences in the reliability estimates between tasks. For example, in Experiment 1, Task 2, the retest correlation was r = .87 whereas in Experiment 3, Task 2, the retest correlation was only r = .21. How can these differences be explained? First, the reliability estimates of the judgment accuracies were strongly associated with the variance of judgment accuracies. We computed Spearman’s ρ between the SDs of learning scores and the respective retest correlations across all tasks and experiments. This correlation was ρ = .64 (p = .045), that is, larger SDs of learning scores are positively related to larger reliabilities of learning scores. Some grammars are associated with larger individual differences in performance than others, and the former ones are particularly well suited for a reliable measurement of individual differences in AGL task performance. Second, the properties of the specific letter strings can affect the reliability estimates because different letter strings may indicate implicit learning to a different extent which in turn can decrease the (true score) variance in implicit learning for a specific set of letter strings. One way of increasing the reliability may be selecting the items with the highest item-total correlation. On the one hand, this will yield a homogeneous set of letter strings and greater reliability estimates. On the other hand, the selected items may not be representative of the underlying grammar any more. For example, selecting items with the highest item-total correlation may produce a set of grammatical and nongrammatical strings that do not only differ in grammaticality but also in superficial features such as string length or fluency. Accordingly, the judgment accuracy in such a set of items may no longer indicate implicit learning but rather the use of fluency, string complexity, or string length as a heuristic. In addition, an increasing number of items may decrease participants’ concentration



or motivation. Thus, increasing the number of items may not be the best way to increase measurement accuracy. Another way may be developing letter strings that are less susceptible to fragment knowledge, fluency, or similarity. However, since we do not know which specific strings are affected by these effects, this will be a rocky road to greater reliability.
Third, the heterogeneity of participants can influence the reliability of a variable. Reliability is defined as the true score variance of a variable relative to the observed variance. Thus, in a homogeneous sample with only minor true score differences, the reliability of a variable may be small even if the test or instrument itself allows an accurate measurement with small error variance. A small variation of implicit learning true scores is also consistent with Reber's evolutionary model of implicit learning (e.g., Reber & Allen, 2000). The model describes implicit learning as a mechanism that developed long before explicit learning. Such an unconscious learning system that, for example, allows detecting the association between climate and the occurrence of particular food sources may have been crucial for success or survival. Accordingly, individuals with high implicit learning abilities would have a higher probability of surviving whereas individuals with low implicit learning abilities would have a lower probability of surviving (principle of success). Over a long period of time, only successful implicit learners would survive, which would result in smaller individual differences in implicit learning (principle of conservation). Hence, the moderate reliability estimates in AGL may generally reflect small individual differences in implicit learning.

Effects of N-Gram Knowledge Test
In Experiment 1 and Experiment 2, there were low and nonsignificant correlations between the first and the second AGL task when the participants completed an n-gram knowledge test between tasks (−.18 ≤ r ≤ .05). Likewise, in Experiment 3, there was a low and nonsignificant correlation when the participants completed an n-gram knowledge test between tasks (r = .22). This suggests a low task consistency when the participants complete knowledge tests between subsequent AGL tasks. On the other hand, there was a substantial and significant correlation (r = .39) between subsequent tasks when the participants did not complete an n-gram knowledge test between tasks. Adjusting this correlation for unreliability even reveals a correlation of r = .39/√(.68 × .43) = .72 between the true scores. This suggests that a knowledge test decreases the task consistency of AGL tasks – not the participants' awareness that there is a grammar constituting the letter strings.2

2 The adjusted correlation is based on the reliability estimates of the tasks. As discussed, these reliability estimates can be biased and thus, the adjusted correlation may overestimate the correlation between true scores.

In other words, measuring n-gram knowledge appears to generate a Heisenberg effect: by measuring the phenomenon, we change the phenomenon. The findings suggest that the participants start to shift their attention toward n-grams after completing a knowledge test; they may start to pay attention to which n-grams occur in subsequent learning phases and base their grammaticality judgments on their n-gram knowledge. For example, after completing an n-gram knowledge test, a participant may pay more attention to the n-grams in a subsequent learning phase. Hence, the participant may notice that the n-grams WNS and NXT occur more frequently than other n-grams. Thus, in a subsequent testing phase, the participant will judge letter strings containing these n-grams as grammatical. In line with this interpretation, the correlation between n-gram knowledge and judgment accuracy rose across tasks in Experiment 2 (r = .06, r = .30, r = .34). Likewise, in Experiment 3, there was a significant correlation between n-gram knowledge and judgment accuracy after the participants completed an n-gram knowledge test (r = .30), but not when the participants did not complete an n-gram test before (r = .02). This interpretation is in line with Perruchet and Pacteau (1990), who demonstrated that participants can use n-gram knowledge to reach above-chance accuracy. Therefore, we suggest avoiding n-gram knowledge tests if the same participants are to complete another AGL task in the future.

Artificial Grammar Learning Not Related to General Intelligence or Educational Attainment
In Experiment 3, performance in AGL was independent of general intelligence. This replicates previous research and suggests the divergent validity of AGL. Individual differences in implicit learning do not overlap with general intelligence and can reveal insights into cognitive ability beyond IQ. Thus, implicit learning may be seen as a complementary construct to describe human ability. The correlation between AGL performance and the participants' school grades was also low and nonsignificant. This does not imply, however, that implicit learning is irrelevant for educational success. There have not been many investigations of the predictive value of implicit learning yet, and the students' grade point average may only be seen as a rough indicator of success in students' real lives. Therefore, future research will help to understand the role of implicit learning in real life in greater detail.



Limitation
Before strong conclusions can be drawn, one limitation of the present work must be noted. In student samples,



cognitive performance variables may be biased toward the upper range and may be restricted in their variance. This problem can readily be demonstrated in Experiment 3 where we used the CFT to measure intelligence. According to the norm tables (that are based on samples from 1963 to 1970), the participants had an IQ range between 91 and 150 with an above-average IQ of M = 128 (instead of the expected population M = 100) and a reduced variability of SD = 12.8 (instead of the expected population SD = 15). When interpreting these data, one must keep in mind that according to the “Flynn effect” the performance in IQ tests increases over the years, which may in part explain the above-average IQ scores in the present sample (see Pietschnig & Voracek, 2015, for a recent meta-analysis). Nonetheless, our data point to a variance reduction of nearly 30% in IQ scores, which may mitigate the correlations between IQ and other variables in the present study such as AGL task performance and school grades. In particular, Roth et al. (2015) performed a meta-analysis of the association between IQ scores and school grades. Based on 240 independent samples, they corrected for sampling error, unreliability, and restrictions of range and estimated the population correlation to be r = .54. The homogeneity of our samples with regard to cognitive performance may be one important reason why the observed correlation between IQ scores and school grades was only r = .35. Taken together, the use of student samples may reduce the variance of cognitive performance measures (such as AGL task performance, IQ scores, and school grades), which in turn mitigates reliabilities and correlations of these measures. Alternative Setups of the AGL Task One straightforward solution to the problem of low reliabilities of AGL scores noted above might be the use of different grammars in a sequence of different AGL tasks and combine the learning score. From a classical test theory perspective it is a sound suggestion to measure a construct with a variety of independent tasks and aggregate across them to obtain a total test score of great reliability and generality (validity). This idea is exemplified in test batteries of general intelligence such as the Wechsler tests. With respect to AGL task performance, one might perform subsequent AGL tasks with different grammars. This immediately poses the problem of repeated AGL tasks, that is, after the first task the participants know the grammatical nature of the letter strings and therefore may change their learning strategy in subsequent tasks. There may be one interesting option to circumvent the problem of repeated testing with the AGL task. In the present experiments we used a sequential setup of the tasks, that is, we conducted one AGL task with one grammar at one time, then we conducted the next AGL task with Ó 2017 Hogrefe Publishing


another grammar and so forth, each task having its own learning and testing phase. However, it would also be possible to use these different grammars to produce letter strings and to mix strings from different grammars in one learning phase. This approach would effectively prevent the problem that knowledge of the grammatical nature of the letter strings may influence the next learning phase. Unfortunately, this approach has some limitations of its own. Using two or more different grammars to generate one set of letter strings is indistinguishable from constructing one hybrid grammar and using this to generate the strings. Such a hybrid grammar would start with a common starting node and then switch to one of the basic grammars. For example, consider Figures 1 and 2, which show two basic grammars. They can easily be combined by using the left-hand starting node of each grammar as a common start and allowing four paths N, W, L, and R, with the former two continuing with Grammar 1 and the latter two continuing with Grammar 2. Of course, this hybrid grammar would be much more complex than each of the basic grammars. Schiff and Katan (2014) investigated the association between grammar complexity and performance in AGL tasks. They meta-analyzed data from 56 experiments that used 10 different grammars and showed that there is a negative correlation of r = −.32 between grammar complexity (quantified as the topological entropy of the grammar chart) and AGL performance across all experiments. From this result it may be inferred that intermixing letter strings of several separate grammars will impair performance, that is, the task difficulty will increase. A shift toward greater task difficulty must shift each person's judgment accuracy toward chance level, which in turn must reduce the variance of judgment accuracy. In turn, a reduction of the variance of AGL task performance will mitigate the reliability of the performance measures and their correlations with other variables (such as intelligence and school grades). Therefore, mixing several items from different grammars in one learning phase may solve the problem of task knowledge in a sequential setup of the AGL task but will reduce reliability and correlations. Whether this route is viable may be a target of future research.

Summary
Artificial grammar learning tasks can be used to measure individual differences in implicit learning. The low correlation with other ability constructs such as general intelligence suggests a good divergent validity of AGL. In line with previous research, the present results suggest that the reliability of the measurement is generally moderate; hence, AGL tasks may be used for the study of individual differences but not for individual assessments.



The present results further suggest that n-gram knowledge tests should be avoided when the participants complete more than one AGL task, because an n-gram test shifts attention toward n-grams, decreases the task consistency, and increases the relation with reportable grammar knowledge. We hope that the present results and reflections stimulate and support this line of research in the field of implicit learning.

Acknowledgments
This research was supported by grants awarded by the Deutsche Forschungsgemeinschaft to Dirk Hagemann (HA3044/7-1). Part of this research was previously published as part of Daniel Danner's dissertation at Heidelberg University. We gratefully thank Andreas Neubauer, Anna-Lena Schubert, and Katharina Weskamp for administering the experiments and Saul Goodman and an anonymous reviewer for helpful comments on an earlier draft of this manuscript.

Electronic Supplementary Material
The electronic supplementary material is available with the online version of the article at http://dx.doi.org/10.1027/2151-2604/a000280
ESM 1. Tables (.doc). Strings used in the experiments.

References
Boucher, L., & Dienes, Z. (2003). Two ways of learning associations. Cognitive Science, 27, 807–842. doi: 10.1207/s15516709cog2706_1
Buchner, A., & Wippich, W. (2000). On the reliability of implicit and explicit memory measures. Cognitive Psychology, 40, 227–259. doi: 10.1006/cogp.1999.0731
Cattell, R. B., Krug, S. E., & Barton, K. (1973). Technical supplement for the Culture Fair Intelligence Tests, Scales 2 and 3. Champaign, IL: Institute for Personality and Ability Testing.
Danner, D., Hagemann, D., Schankin, A., Hager, M., & Funke, J. (2011). Beyond IQ: A latent state-trait analysis of general intelligence, dynamic decision making, and implicit learning. Intelligence, 39, 323–334. doi: 10.1016/j.intell.2011.06.004
Dulany, D. E., Carlson, R. A., & Dewey, G. I. (1984). A case of syntactical learning and judgment: How conscious and how abstract? Journal of Experimental Psychology: General, 113, 541–555. doi: 10.1037/0096-3445.113.4.541
Funke, J., & Frensch, P. A. (2007). Complex problem solving: The European perspective – 10 years after. In D. H. Jonassen (Ed.), Learning to solve complex scientific problems (pp. 25–47). New York, NY: Erlbaum.
Gebauer, G. F., & Mackintosh, N. J. (2007). Psychometric intelligence dissociates implicit and explicit learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 34–54. doi: 10.1037/0278-7393.33.1.34

Zeitschrift für Psychologie (2017), 225(1), 5–19

D. Danner et al., Individual Differences in Implicit Learning

Gomez, R. L., & Gerken, L. (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition, 70, 109–135. doi: 10.1016/S0010-0277(99)00003-7
Jamieson, R. K., & Mewhort, D. J. K. (2009). Applying an exemplar model to the artificial-grammar task: Inferring grammaticality from similarity. The Quarterly Journal of Experimental Psychology, 62, 550–575. doi: 10.1080/17470210802055749
Kaufman, S. B., DeYoung, C. G., Gray, J. R., Jiménez, L., Brown, J., & Mackintosh, N. (2010). Implicit learning as an ability. Cognition, 116, 321–340. doi: 10.1016/j.cognition.2010.05.011
Kinder, A., Shanks, D. R., Cock, J., & Tunney, R. J. (2003). Recollection, fluency, and the explicit/implicit distinction in artificial grammar learning. Journal of Experimental Psychology: General, 132, 551–565. doi: 10.1037/0096-3445.132.4.551
Knowlton, B. J., & Squire, L. R. (1996). Artificial grammar learning depends on implicit acquisition of both abstract and exemplar-specific information. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 169–181. doi: 10.1037/0278-7393.22.1.169
Lord, F. M., & Novick, M. R. (1974). Statistical theories of mental test scores. Oxford, UK: Addison-Wesley.
Mackintosh, N. J. (1998). IQ and human intelligence. New York, NY: Oxford University Press.
Perruchet, P., & Pacteau, C. (1990). Synthetic grammar learning: Implicit rule abstraction or explicit fragmentary knowledge? Journal of Experimental Psychology: General, 119, 264–275. doi: 10.1037/0096-3445.119.3.264
Pietschnig, J., & Voracek, M. (2015). One century of global IQ gains: A formal meta-analysis of the Flynn effect (1909–2013). Perspectives on Psychological Science, 10, 282–306. doi: 10.1177/1745691615577701
Pothos, E. M. (2007). Theories of artificial grammar learning. Psychological Bulletin, 133, 227–244. doi: 10.1037/0033-2909.133.2.227
Pretz, J. E., Totz, K. S., & Kaufman, S. B. (2010). The effects of mood, cognitive style, and cognitive ability on implicit learning. Learning and Individual Differences, 20, 215–219. doi: 10.1016/j.lindif.2009.12.003
Reber, A. S. (1967). Implicit learning of artificial grammars. Journal of Verbal Learning & Verbal Behavior, 6, 855–863. doi: 10.1016/S0022-5371(67)80149-X
Reber, A. S. (1992). The cognitive unconscious: An evolutionary perspective. Consciousness and Cognition, 1, 93–133. doi: 10.1016/1053-8100(92)90051-B
Reber, A. S., & Allen, R. (1978). Analogic and abstraction strategies in synthetic grammar learning: A functionalist interpretation. Cognition, 6, 189–221. doi: 10.1016/0010-0277(78)90013-6
Reber, A. S., & Allen, R. (2000). Individual differences in implicit learning: Implications for the evolution of consciousness. In R. G. Kunzendorf & B. Wallace (Eds.), Individual differences in conscious experience (Vol. 20, pp. 227–247). Amsterdam, The Netherlands: John Benjamins.
Reber, A. S., Walkenfeld, F. F., & Hernstadt, R. (1991). Implicit and explicit learning: Individual differences and IQ. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 888–896. doi: 10.1037/0278-7393.17.5.888
Roth, B., Becker, N., Romeyke, S., Schäfer, S., Dominik, F., & Spinath, F. M. (2015). Intelligence and school grades: A meta-analysis. Intelligence, 53, 118–137. doi: 10.1016/j.intell.2015.09.002
Salthouse, T. A., McGuthry, K. E., & Hambrick, D. Z. (1999). A framework for analyzing and interpreting differential aging patterns: Application to three measures of implicit learning. Aging, Neuropsychology, and Cognition, 6(1), 1–18.


Schankin, A., Hagemann, D., Danner, D., & Hager, M. (2011). Violations of implicit rules elicit an early negativity in the ERP. NeuroReport, 13, 642–645. doi: 10.1097/WNR.0b013e328349d146
Schiff, R., & Katan, P. (2014). Does complexity matter? Meta-analysis of learner performance in artificial grammar tasks. Frontiers in Psychology, 5, 1084. doi: 10.3389/fpsyg.2014.01084
Servan-Schreiber, E., & Anderson, J. R. (1990). Learning artificial grammars with competitive chunking. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 592–608.
Shanks, D. R., & St. John, M. F. (1994). Characteristics of dissociable human learning systems. Behavioral and Brain Sciences, 17, 367–447. doi: 10.1017/S0140525X00035032
Whittlesea, B. W. A., & Leboe, J. P. (2000). The heuristic basis of remembering and classification: Fluency, generation, and resemblance. Journal of Experimental Psychology: General, 129, 84–106. doi: 10.1037/0096-3445.129.1.84


Received October 30, 2016
Revision received November 29, 2016
Accepted December 13, 2016
Published online July 12, 2017

Daniel Danner
GESIS – Leibniz Institute for the Social Sciences
PO Box 122155
68072 Mannheim
Germany
daniel.danner@gesis.org



Original Article

Measuring Age-Related Differences in Using a Simple Decision Strategy
The Case of the Recognition Heuristic
Rüdiger F. Pohl
Department of Psychology, School of Social Sciences, University of Mannheim, Germany

Abstract: According to the recognition heuristic, decision makers base their inferences on recognition alone, assuming that recognized objects have larger criterion values than unrecognized ones. Knowing that recognition is a valid cue and thus using the recognition heuristic should increase with age. This was tested in two experiments with preadolescents (N = 140), adolescents (N = 186), and adults (N = 78). The results show, as expected, a monotonic age-related trend in the improvement of domain-specific knowledge but, unexpectedly, a nonmonotonic one for using the recognition heuristic. More specifically, use of the recognition heuristic increased from preadolescents to adolescents, but then dropped for adults. Keywords: decision making, heuristics, recognition, knowledge, children, adolescents, adult

In the past two decades, Gigerenzer, Todd, and ABC Research Group (1999) and Todd, Gigerenzer, and ABC Research Group (2012) proposed that decision making can be much more accurate than the so-called “heuristics and biases” program suggested previously (see, e.g., Gilovich, Griffin, & Kahneman, 2002; Kahneman, Slovic, & Tversky, 1982). They assumed that adults possess a repertoire of “fast and frugal” heuristics – the so-called “mental toolbox” – that exploit regularities of the environment and are thus applied more adaptively than the previously proposed heuristics. Adults are believed to have learned which cue is valid in which decision domain through their extended experience with such domains (Goldstein & Gigerenzer, 2002). By contrast, not much is known about the trajectory of this development in childhood and adolescence, but acquiring such knowledge concerning cue validities and single-cue strategies is essential for being able to apply such simple heuristics. This paper focuses on the recognition heuristic (RH), arguably the simplest judgment heuristic specified as part of the mental toolbox. After introducing the RH, this paper summarizes research on the developmental aspects of using simple heuristics such as the RH and then reports two experiments testing how frequently preadolescents, adolescents, and adults use the RH in inferential decision making.

Zeitschrift für Psychologie (2017), 225(1), 20–30 DOI: 10.1027/2151-2604/a000283

The Recognition Heuristic
According to Gigerenzer and Todd (1999), heuristics are strategies that help people interact with their environment in an ecologically rational way. These heuristics use less time and information than more complex strategies, but nevertheless lead to high levels of accuracy. Alongside several others, the RH is one such strategy (Gigerenzer et al., 1999). Goldstein and Gigerenzer (1999, 2002) described the RH as the simplest heuristic of the mental toolbox because it is based on one easily accessible cue only, namely object recognition. More precisely, the RH can, in principle, be applied whenever only one of the given options is recognized. For example, when asked, "Which city is larger, New York or Abidjan," it is common that a decision maker will have heard of New York, but not of Abidjan. In this case, the RH predicts that he or she will infer that New York is the larger city (which happens to be correct). The RH works well in domains in which the probability of recognizing an object is related to the judgment criterion (such as city population). This is known as the "recognition validity" and is defined as the proportion of object pairs with one object recognized and the other not, in which the recognized one is the correct choice (Goldstein & Gigerenzer, 2002).
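The decision rule and the definition of recognition validity can be summarized in a short Python sketch; the recognition set and the criterion values below are hypothetical toy figures used only for illustration.

# Recognition heuristic (RH): if exactly one of two objects is recognized,
# infer that the recognized object has the larger criterion value.

def rh_inference(obj_a, obj_b, recognized):
    """Return the RH choice, or None if the RH is not applicable (0 or 2 objects recognized)."""
    rec_a, rec_b = obj_a in recognized, obj_b in recognized
    if rec_a and not rec_b:
        return obj_a
    if rec_b and not rec_a:
        return obj_b
    return None  # RH not applicable

def recognition_validity(pairs, recognized, criterion):
    """Proportion of one-recognized pairs in which the recognized object has the larger criterion value."""
    applicable = [(a, b) for a, b in pairs if (a in recognized) != (b in recognized)]
    hits = 0
    for a, b in applicable:
        chosen = rh_inference(a, b, recognized)
        other = b if chosen == a else a
        if criterion[chosen] > criterion[other]:
            hits += 1
    return hits / len(applicable)

# Toy example (illustrative population figures in millions)
criterion = {"New York": 8.5, "Abidjan": 4.4, "Tokyo": 13.5}
recognized = {"New York", "Tokyo"}
pairs = [("New York", "Abidjan"), ("Tokyo", "Abidjan"), ("New York", "Tokyo")]
print(rh_inference("New York", "Abidjan", recognized))      # New York
print(recognition_validity(pairs, recognized, criterion))   # 1.0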




Since its introduction, the RH has been investigated in numerous studies (see, e.g., the three-part special issue of the journal Judgment and Decision Making; Marewski, Pohl, & Vitouch, 2010, 2011a, 2011b) and has also stimulated some controversy (see, e.g., Gigerenzer & Goldstein, 2011; Hilbig; 2010a, 2010b; Pohl, 2011). Central findings are that recognition is a valid cue in many domains and that the RH predicts a substantial proportion of people’s inferences (see, e.g., Gigerenzer & Goldstein, 2011; Marewski, Gaissmaier, Schooler, Goldstein, & Gigerenzer, 2010; Pachur, 2010; Pachur, Bröder, & Marewski, 2008; Pachur, Todd, Gigerenzer, Schooler, & Goldstein, 2011; Pohl, 2006). Other – more critical – findings are that actually using the RH concerns a much smaller proportion of inferences than predicted by the RH (see, e.g., Hilbig, Erdfelder, & Pohl, 2010), that often-further information beyond recognition is considered (see, e.g., Hilbig & Pohl, 2008), and that alternative process models may provide more parsimonious accounts of decision makers’ behavior (see, e.g., Glöckner & Bröder, 2014; Glöckner, Hilbig, & Jekel, 2014). This paper focuses on an aspect that has been largely neglected so far, namely the question of how the ability to apply heuristics such as the RH develops with children’s age. To this end, at least two prerequisites seem to be essential: Children need to have (1) knowledge whether recognition is a valid cue in a given domain and (2) the ability to ignore irrelevant cues and apply one-reason strategies. As both of these abilities develop with increasing experience, use of the RH should also increase with age. In addition and according to dual-process theories of thinking (see, e.g., Stanovich, West, & Toplak, 2011), age should also be related to the development of a reflective-analytical mode of thinking (as opposed to an intuitive-heuristic one), so that heuristic inferences can be overruled by more sophisticated processes and inferential strategies can be applied increasingly flexible. This could lead to either a decrease or an increase in RH use depending on individual knowledge and the task constraints (see Horn, Ruggeri, & Pachur, 2016).

Development of Decision Making
So far, studies on the RH have mainly investigated young adults, with a few looking at developmental differences between younger and older adults (Horn, Pachur, & Mata, 2015; Mata, Schooler, & Rieskamp, 2007; Pachur, Mata, & Schooler, 2009). Only one study to date has directly tested children's and adolescents' use of the RH, namely Horn et al. (2016). They tested 9-, 12-, and 17-year-olds with two different materials, namely US cities and infectious diseases, the former having a high and the latter a low


recognition validity. The authors’ main question was at what age children are able to apply the RH adaptively to the given domains. Their results showed several things: (1) Even the youngest children already made substantial use of the RH. (2) There was not much difference in the behavior of 9- and 12-year-olds. (3) Most importantly, only the 17-year-olds applied the RH selectively, that is, significantly more often in the high-validity domain than in the low-validity one. The younger children did not differentiate these domains and used the RH at intermediate levels. In other words, RH use increased with age in the domain with high recognition validity, but decreased in the domain with low recognition validity. In addition, Horn et al. observed as expected that the percentage of recognized objects as well as participants’ knowledge validity increased with age. Knowledge validity is defined as the proportion of correct choices in pairs with both objects recognized (Goldstein & Gigerenzer, 2002). A number of other investigations have focused on children’s general development of decision-making skills (see, e.g., Bereby-Meyer, Assor, & Katz, 2004; Howse, Best, & Stone, 2003; Jacobs & Klaczynski, 2002, 2005; Klaczynski, Byrnes, & Jacobs, 2001; Kuhn & Franklin, 2006). The common view is that younger children employ different, and often less successful, strategies than older children do. Klayman (1985) conducted one of the first developmental studies in this domain and found that 12-year-olds as well as adults searched information systematically and shifted toward simpler heuristics when information load increased. Davidson (1991) accordingly proposed that basic decisionmaking skills, like those found in adults, are developed in early adolescence, an age at which children show a prominent improvement in their ability to focus selectively on relevant information (see also Bereby-Meyer et al., 2004; Gregan-Paxton & Roedder John, 1997; Jacobs & Potenza, 1991; Klaczynski et al., 2001; Kuhn & Franklin, 2006). In her own study, Davidson tested the decision making of 8-, 11-, and 14-year-old children using an information-board design, in which children could consecutively uncover the features of several options before making a decision. Davidson reported that younger children searched more information and were also less systematic in their search compared to older children. Davidson summarized her research as showing that the most important developmental changes consist in learning to ignore irrelevant information and to apply less demanding and simpler strategies (see also Howse et al., 2003; Klaczynski et al., 2001). Mata, von Helversen, and Rieskamp (2011) reported further supporting evidence. They tested 9–10-year-olds, 11–12-year-olds, and young adults in two different decision domains that either favored an information-integration strategy or a simple heuristic relying on one piece of Zeitschrift für Psychologie (2017), 225(1), 20–30



information only. Their results showed that younger children compared to older children and adults had more difficulties to adaptively use the simpler heuristic (see also Bereby-Meyer et al., 2004). In line with the positions outlined above, Mata et al. explained the observed difference in decision making with the younger children’s less developed skill to selectively attend to the relevant information. In addition, Mata et al. proposed that these difficulties could be due to the still ongoing maturation of the prefrontal cortex areas that are associated with working memory and selective attention and that mature considerably from late childhood to adolescence (Bunge & Zelazo, 2006; Kuhn & Franklin, 2006). Ruggeri and Katsikopoulos (2013) compared 8-, 10-, and 18-year-olds and found that the young adults outperformed both groups of children in successful cue selection. Betsch and Lang (2013) reported similar results after testing decision making in an information-board design in three age groups, namely preschoolers (6 years), children from elementary school (9 years), and young adults (23 years). They found that the youngest children were most disturbed by the presence of an irrelevant (invalid) cue, and that the ability to use a highly valid cue increased with age (see also Betsch, Lang, Lehmann, & Axmann, 2014). In sum, the evidence on the development of children’s decision making suggests a marked change for adolescents. Compared to younger children, adolescents are better able to ignore irrelevant information and to apply single-cue strategies. In addition, children become more and more acquainted with different domains and their cues’ validities. In a dual-process approach, Klaczynski (2004) differentiated experiential from analytic processing modes and summarized the empirical evidence as showing that “experiential processing predominates adolescent decision making under most conditions, and many adult-like heuristics are acquired by late childhood” (p. 99). Kuhn and Franklin (2006), moreover, stressed that one important feature of this stage is that adolescents become able to think about their own thinking, that is, they improve their meta-cognitive skills, which in turn would enable more adaptive and efficient strategy use (as seen, e.g., in the results of Horn et al., 2016).

The Present Experiments The main aim of the following two experiments was to investigate whether the observed developmental change in decision making during adolescence also applies to using the RH, thus replicating and extending the Horn et al. (2016) results. To this end, groups of preadolescents and adolescents were tested. In addition, young adults were Zeitschrift für Psychologie (2017), 225(1), 20–30


tested to assess the further course of development (and to enable comparison to other studies with young adults). The expectation was that knowledge about the validity of cues (including recognition) and the ability to apply onereason strategies increase with age. In turn, domain-specific knowledge and use of the RH should also increase with age (as Horn et al. had found). Knowledge was measured via the percentage of recognized objects and the quality of knowledge about these objects, the so-called knowledge validity (Goldstein & Gigerenzer, 2002). As materials, the two experiments used the hitherto most often used knowledge domain, namely cities. This material has been used in countless studies from the earliest ones establishing the RH (Goldstein & Gigerenzer, 1999) up to the most recent ones testing age-related differences in RH use (Horn et al., 2015, 2016). Thus, sticking to cities as materials allows comparison across a wide range of previous findings. One may object, though, that cities may not represent an appropriate domain for children and that other materials might be better suited to compare children to older age groups, especially materials where age groups do not differ in their respective knowledge. While this may be true, it appears next to impossible to find such materials where all age groups recognize the same number of objects and have the same knowledge validity (see, e.g., Horn et al., 2016). Thus, it appears more favorable to use established and well-known materials and then to take any observed knowledge differences into account. To get a first estimate of how many cities children recognize, a small pilot study was conducted: 15 children (from 7 to 15 years old) were asked which of 60 major world cities (taken out of a list of world cities with more than 3 million inhabitants without metropolitan areas) they recognize. They recognized 21 of the 60 cities on average, with older children recognizing more cities than the younger ones did, namely 27 versus 14. The overall recognition validity was .63. Assuming that the availability of additional knowledge increases with the number of recognized objects, the observed frequencies appeared large enough to conclude that children possess at least some knowledge in this domain so that it would be suited as material. Knowledge validity was, however, expected to be lower than recognition validity, so that the chosen domain would be comparable to the domain of US cities used by Horn et al. (2016), but not to their domain of diseases (where recognition validity and knowledge validity were similarly large), or to other domains where knowledge validity exceeds recognition validity. Another problem of studies like these here is how to assess RH use. Several studies simply took the adherence (or accordance) rate as a proxy of RH use (e.g., Goldstein & Gigerenzer, 2002). This measure counts how often Ó 2017 Hogrefe Publishing



This measure counts how often inferences are in line with the RH, that is, how often recognized cities are chosen over unrecognized ones. The adherence rate, however, cannot be used to assess RH use, because it is confounded (Hilbig, 2010b; Horn et al., 2015; Schwikert & Curran, 2014). The reason is that several processes may lead to choosing the recognized object, only one of which is using the RH, that is, basing one's decision on recognition alone. Another possibility is that further knowledge about the recognized object is used either in addition to the recognition cue or instead of it. In order to resolve this confound and to estimate the proper probability of RH use, Hilbig et al. (2010) developed the r-model (a summary of the r-model is provided in the Electronic Supplementary Material, ESM 1), which belongs to the class of multinomial processing-tree models (Erdfelder et al., 2009) and provides a robust and unbiased measure of RH use. More specifically, the r-model seeks to explain categorical observable data (i.e., the decision frequencies) through a set of four latent parameters, the most important of which is r, the probability of RH use (i.e., the proportion of choices in pairs with one object recognized and the other not, in which the recognized object is chosen based on recognition alone). The r-model has been validated through simulations and experimental studies (see Hilbig et al., 2010, for details) and has been applied in several studies since. The overall model fit as well as differences between parameter values can easily be tested by using log-likelihood ratio statistics (see the summary in ESM 1). In both of the following experiments, the freely available software tool "multiTree" (Moshagen, 2010) was used for these tests.
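To make the structure of the r-model concrete, the following minimal Python sketch computes the category probabilities of the processing tree and recovers the parameters by maximum likelihood. It is based on the verbal description above and on Hilbig et al. (2010); the exact parameterization is documented in ESM 1, and the decision counts used here are purely hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def rmodel_probs(r, a, b, g):
    """Category probabilities of the r-model (sketch).

    r: probability of RH use, a: recognition validity,
    b: knowledge validity, g: probability of a correct guess.
    """
    return {
        # Pairs with exactly one object recognized (RH-applicable cases)
        "rec_chosen_correct":   a * r + a * (1 - r) * b,
        "unrec_chosen_false":   a * (1 - r) * (1 - b),
        "rec_chosen_false":     (1 - a) * r + (1 - a) * (1 - r) * (1 - b),
        "unrec_chosen_correct": (1 - a) * (1 - r) * b,
        # Pairs with both objects recognized (knowledge cases)
        "both_correct": b,
        "both_false":   1 - b,
        # Pairs with neither object recognized (guessing cases)
        "none_correct": g,
        "none_false":   1 - g,
    }

def neg_log_lik(params, counts):
    """Product-multinomial negative log-likelihood (constant terms omitted)."""
    p = rmodel_probs(*params)
    return -sum(n * np.log(p[cat]) for cat, n in counts.items() if n > 0)

# Hypothetical aggregated decision frequencies for one age group
counts = {"rec_chosen_correct": 900, "unrec_chosen_false": 150,
          "rec_chosen_false": 250, "unrec_chosen_correct": 100,
          "both_correct": 700, "both_false": 500,
          "none_correct": 300, "none_false": 290}

fit = minimize(neg_log_lik, x0=[.5, .7, .6, .5], args=(counts,),
               bounds=[(.01, .99)] * 4)
r_hat, a_hat, b_hat, g_hat = fit.x
print(f"r = {r_hat:.2f}, a = {a_hat:.2f}, b = {b_hat:.2f}, g = {g_hat:.2f}")
```

A G² fit statistic of the kind reported below can then be obtained as twice the difference between the log-likelihood of a saturated model (which reproduces the observed category proportions exactly) and that of the r-model.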

Experiment 1

Method

Sample

In total, 192 participants from three different age groups were tested: (a) 49 preadolescents (34 female, 15 male; mean age = 9.1 years, SD = 0.8 years, range = 8–10 years) from the 3rd and 4th grade of an elementary school (in Lippstadt-Benninghausen, Germany); (b) 105 adolescents (55 female, 50 male; mean age = 13.6 years, SD = 1.1 years, range = 12–15 years) from the 7th and 9th grade of a secondary school (also in Lippstadt-Benninghausen); and (c) 38 young adults (25 female, 11 male, 2 undisclosed; mean age = 23.2 years, SD = 5.1 years, range = 19–39 years), mainly students from the University of Mannheim, Germany. I refer to the two groups of children as 9-year-olds and 14-year-olds, respectively. Children and their parents provided written consent for participation in the study. Given that the adults possibly represent a selected group (of university students), their comparability to the children and adolescents may be somewhat compromised.


With the given sample sizes and setting α = .05, the power to detect age-group differences of medium size in an analysis of variance (ANOVA; Cohen's f = 0.25) was .88 (computed with the software tool "G*Power"; Faul, Erdfelder, Lang, & Buchner, 2007). A post hoc power analysis was conducted on the aggregated data set with all three age groups to compute the power of the r-model to detect differences in r (with the software tool "multiTree"; Moshagen, 2010). All parameters of the r-model were set to plausible values with no group differences (i.e., recognition validity a = .70, knowledge validity b = .60, and guessing g = .50), but were allowed to vary freely under both the H1 and the H0 model. Most importantly, the r parameter (estimating RH use) was set to increase by .05 from one age group to the next (i.e., .55, .60, and .65) and was again free under H1, but constrained to be equal across groups under H0. Given α = .05, the resulting power to detect such group differences of Δr = .05 was .81.
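The ANOVA power value of .88 reported above can be reproduced from the noncentral F distribution; the short sketch below is one way to do this (it is not the G*Power computation itself, and rounding may differ slightly).

```python
from scipy.stats import f, ncf

def anova_power(effect_f, n_total, k_groups, alpha=0.05):
    """Power of a one-way ANOVA given Cohen's f, total N, and number of groups."""
    df1 = k_groups - 1
    df2 = n_total - k_groups
    nc = (effect_f ** 2) * n_total        # noncentrality parameter
    f_crit = f.ppf(1 - alpha, df1, df2)   # critical F value under H0
    return 1 - ncf.cdf(f_crit, df1, df2, nc)

# Experiment 1: N = 192 in three age groups, medium effect size f = 0.25
print(round(anova_power(0.25, 192, 3), 2))  # approximately .88
```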



Material

This study used a selection of 12 cities out of the largest world cities as materials (details available in ESM 1). An important question concerns the validity of recognition as a cue. To achieve a rather large recognition validity in the chosen set of cities, one that would – in principle – be easily detected even by the youngest children, the 12 cities were selected in the following way: Given the recognition rates from the pilot study and ordering the 60 used cities according to their size, six highly recognized cities were chosen from the upper half and six highly unrecognized ones from the lower half of the list. This selection procedure should maximize the recognition validity in the chosen set. With respect to the pilot data, the recognition validity of this set was indeed .94. The selection procedure should also ensure that the proportion of RH-applicable pairs comes close to its maximum of 54.5% (with n of the 12 cities recognized, n × (12 − n) of the 66 pairs are RH-applicable; this peaks at 6 × 6 = 36 pairs, that is, 54.5%, when exactly half of the cities are recognized). Then the 12 cities were combined in all possible pairs, resulting in 66 pairs that were randomly ordered into three lists that were given alternately to the participants.

Procedure

The children were tested as a group in their respective classes. Adults were tested in small groups of up to six participants in a laboratory. The procedure was the same for all participants and consisted of two consecutive paper-and-pencil tasks, namely, a recognition task and a paired-comparison task, mirroring the widely used procedure in studying the RH. The task order was not varied, because previous studies had found no effect of order (see Goldstein & Gigerenzer, 2002; Pachur & Hertwig, 2006; Pachur et al., 2009; see also Michalkiewicz & Erdfelder, 2016). In the recognition task, participants received an alphabetically ordered list of the 12 cities and were asked to indicate for each city whether they had heard of it before. In the paired-comparison task, participants received the randomly ordered list of 66 city pairs and were asked to indicate for each pair which of the two cities was the more populous one (i.e., has more inhabitants). There were no time constraints. The 9-year-olds needed on average 23 min to complete the study, whereas the 14-year-olds required 14 min. All children received sweets as compensation. For adults, the study took on average only 4 min and was run as a filler task in an unrelated experiment, for which participants received course credit or payment.

Results

Even the 9-year-olds recognized almost half of the cities. This percentage increased significantly with age (see Table 1), F(2, 189) = 14.36, p < .001, ηp² = .13. Post hoc Tukey tests revealed that 9-year-olds recognized significantly fewer cities than both 14-year-olds and adults, both p < .001, whereas the latter groups did not differ significantly, p = .23. The number of cities recognized determines, in turn, what proportion of the 66 city pairs will consist of one recognized and one unrecognized city, that is, pairs in which the RH could potentially be applied. Here, these proportions were around 50% (see Table 1), that is, close to the maximum, and did not differ significantly between groups, F(2, 189) = 2.86, p = .06, ηp² = .03. Next, the recognition validity and the knowledge validity were computed for each participant. Based on the order of cities that was used to construct the material set, the mean recognition validity was .92 (just as expected from the pilot study's data), but as it turned out (after running both Experiments 1 and 2), the order used was wrong. Applying the correct order yielded much lower values. The means are given in Table 1. Recognition validity differed significantly between age groups, F(2, 189) = 5.65, p = .004, ηp² = .06. Post hoc Tukey tests revealed that 9-year-olds had a lower recognition validity than 14-year-olds, p = .008, and adults, p = .011, whereas the latter two groups did not differ, p = .84. Similarly, knowledge validity showed significant differences, too, F(2, 189) = 5.29, p = .006, ηp² = .05. Post hoc Tukey tests again showed that 9-year-olds had a lower knowledge validity than 14-year-olds, p = .006, and adults, p = .034, whereas the latter two groups did not differ, p > .99.
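As an illustration of the two validity measures just reported, the following sketch computes one participant's recognition validity and knowledge validity; the definitions follow Goldstein and Gigerenzer (2002), and the city names, populations, and choices are invented stand-ins, not the actual material.

```python
from itertools import combinations

# Invented data for one participant: recognition judgments and city populations (millions)
recognized = {"Tokyo": True, "Cairo": True, "Lima": False, "Wuhan": False}
population = {"Tokyo": 13.5, "Cairo": 9.5, "Lima": 8.9, "Wuhan": 10.6}

# Recognition validity: proportion of RH-applicable pairs (exactly one city recognized)
# in which the recognized city is in fact the larger one
rec_pairs = valid_pairs = 0
for a, b in combinations(population, 2):
    if recognized[a] != recognized[b]:
        rec_pairs += 1
        rec_city, unrec_city = (a, b) if recognized[a] else (b, a)
        valid_pairs += population[rec_city] > population[unrec_city]
recognition_validity = valid_pairs / rec_pairs

# Knowledge validity: proportion of correct choices in pairs with both cities recognized
choices = {("Tokyo", "Cairo"): "Tokyo"}  # invented choices for both-recognized pairs
correct = sum(population[chosen] == max(population[a], population[b])
              for (a, b), chosen in choices.items())
knowledge_validity = correct / len(choices)

print(recognition_validity, knowledge_validity)  # 0.75 and 1.0 for these toy data
```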


To assess RH use, the r-model (Hilbig et al., 2010) was applied to the aggregated data set. The model was tripled to account for all three groups simultaneously, comprising 12,672 data points (= 192 participants × 66 paired comparisons). The model fit the data well, as indicated by the nonsignificant log-likelihood ratio statistic, G²(3) = 1.30, p = .73. Differences in parameter r, denoting the probability of RH use (see Table 1), were tested by equating r in all pairs of age groups (see the summary of the r-model in ESM 1 for details). This revealed that RH use was lower for 9-year-olds than for 14-year-olds, ΔG²(1) = 65.59, p < .001, Δr = .19 (Δr indicates the effect size; Moshagen, 2010), and adults, ΔG²(1) = 18.35, p < .001, Δr = .14. In addition, and contrary to expectations, adults used the RH significantly less often than 14-year-olds did, ΔG²(1) = 5.03, p = .025, Δr = .05. However, in absolute terms, this difference appears rather small: Of about 33 RH-applicable cases (about 50% of the 66 pairs), adults applied the RH on average in only about two fewer cases (= 5% of 33) than 14-year-olds did.

To ensure that the r-model results were not an artifact of aggregating across individual data, the r-model was also applied to each participant's data separately. The results showed misfit of the model in 11 cases, which is close to what would be expected (i.e., about 5% of 192 participants). For the remaining 181 participants, the mean values of parameter r (and SDs) were .58 (.32), .75 (.27), and .69 (.30) for 9-year-olds, 14-year-olds, and adults, respectively. These are close to the values of the aggregate analysis. An ANOVA confirmed significant group differences in RH use, F(2, 178) = 5.47, p = .005, ηp² = .06. Post hoc Tukey tests, however, revealed that only the difference between 9-year-olds and 14-year-olds remained significant, p = .003, whereas adults differed neither from 9-year-olds, p = .21, nor from 14-year-olds, p = .49.

To check whether age-group differences in RH use were related to corresponding differences in recognition validity or knowledge validity, a multiple linear regression was computed, including log(age), centered log(age)² (to capture the quadratic age-related trend in RH use), recognition validity, and knowledge validity as predictors and RH use as dependent variable. The model was significant, F(4, 176) = 6.38, p < .001, ηp² = .13, adj. R² = .11, and found age, β = .24, p = .003, as well as squared age, β = −.27, p < .001, to be significantly related to RH use. These two regression weights signify that RH use generally increased with age but followed an inverted U-shaped trend. In addition, knowledge validity was a significant predictor, too, β = −.23, p = .003, but not recognition validity, β = .06, p = .43. Note that the relation between knowledge validity and RH use was negative, suggesting that the more valid one's knowledge, the less often the RH was used. To further explore this relation, correlations were computed separately for each age group. Only 9-year-olds showed a strong and highly significant negative correlation between knowledge validity and RH use, r = −.56, p < .001, which was moreover significantly stronger than that of 14-year-olds, r = −.15, p = .13, and adults, r = −.11, p = .51, with z = 2.57, p = .015, and z = 2.24, p = .033, respectively. This pattern suggests that the trade-off between quality of knowledge and RH use in the regression analysis was mainly due to the 9-year-olds.
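The regression just described can be specified as follows. This is only a sketch with simulated stand-in data; the variable names and the exact centering of the squared term are assumptions, not the original analysis script.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated stand-in data: age in years, per-participant validities, and RH use
n = 181
age = rng.choice([9, 14, 23], size=n)
df = pd.DataFrame({
    "log_age": np.log(age),
    "rec_validity": rng.uniform(.6, .9, n),
    "know_validity": rng.uniform(.4, .8, n),
    "rh_use": rng.uniform(.3, .9, n),
})
# Centered, squared log(age) captures the quadratic (inverted U-shaped) age trend
df["log_age_sq_c"] = (df["log_age"] - df["log_age"].mean()) ** 2

X = sm.add_constant(df[["log_age", "log_age_sq_c", "rec_validity", "know_validity"]])
print(sm.OLS(df["rh_use"], X).fit().summary())
```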



Table 1. Mean proportions of cities recognized and RH pairs, mean values of recognition validity and knowledge validity, and mean estimates of RH use (and standard errors) for the three age groups in Experiments 1 and 2

                                     Experiment 1                                       Experiment 2
                                     9-year-olds    14-year-olds   Young adults        10-year-olds   14-year-olds   Young adults
                                     (N = 49)       (N = 105)      (N = 38)            (N = 71)       (N = 81)       (N = 40)
Proportion of cities recognized      .47 (.02)      .55 (.01)      .58 (.01)           .40 (.02)      .52 (.01)      .62 (.01)
Proportion of RH pairs               .50 (.01)      .52 (< .01)    .51 (.01)           .48 (.01)      .53 (< .01)    .49 (.01)
Recognition validity                 .72 (.02)      .77 (.01)      .78 (.01)           .51 (.01)      .45 (.01)      .49 (.01)
Knowledge validity                   .50 (.02)      .59 (.01)      .59 (.01)           .52 (.01)      .63 (.01)      .65 (.01)
Probability of RH use                .54 (.02)      .73 (.01)      .68 (.02)           .71 (.01)      .82 (.01)      .71 (.02)

Discussion

Experiment 1 compared 9- and 14-year-olds and young adults and found that the percentage of cities recognized increased with age, and so did recognition validity and knowledge validity, although 14-year-olds and adults showed no differences here. Most importantly, use of the RH showed a non-monotonic trend, with the expected increase from 9- to 14-year-olds but then an unexpected decrease for adults. This result suggests that, while familiarity with and knowledge in a decision domain increase with age, use of a simple heuristic such as the RH may not follow the same trend.

It could be argued that the increase in RH use from 9- to 14-year-olds is confounded with corresponding increases in recognition validity and knowledge validity (see Table 1). However, this conjecture is rather unlikely for two reasons. First, there is no corresponding pattern for the differences between 14-year-olds and adults. Here, recognition validity and knowledge validity were almost identical, but the probability of RH use nevertheless decreased significantly for adults (albeit only slightly in absolute terms). Second, the regression analysis found that recognition validity had no effect on RH use and that knowledge validity was negatively related to RH use, not positively as the age-group means in Table 1 suggest. This negative correlation is, however, an important result in itself, especially as it was mainly driven by the 9-year-olds. In other words, participants in this age group either had good knowledge and thus relied less on the RH, or had poor knowledge and thus relied more on the RH.

Nevertheless, one critical aspect of this experiment is that the materials were specifically selected to maximize recognition validity (albeit based on the wrong rank order of cities). This could have influenced participants' decision-making behavior. A further shortcoming is that the two groups of children came from different schools, thus possibly compromising their comparability. Finally, and most importantly, the non-monotonic trend in RH use was not expected, and the surprising decrease in RH use from 14-year-olds to adults was relatively small. This pattern should therefore be replicated before turning to potential explanations. Experiment 2 addressed all these issues.

Experiment 2

Experiment 2 was designed mainly to replicate Experiment 1, but with two changes: The two age groups of children now came from one and the same school, and the material was selected differently in order to yield a less extreme recognition validity.

Method

Sample

In total, 192 participants were tested, again from three age groups, but now the two groups of children came from the same school (a secondary school in Steinheim, Germany): (a) 71 preadolescents from 5th grade (42 female, 27 male, 2 undisclosed; mean age = 10.2 years, SD = 0.5 years, range = 9–11 years); (b) 81 adolescents from 9th grade (47 female, 34 male; mean age = 14.2 years, SD = 0.5 years, range = 12–15 years); and (c) 40 young adults (29 female, 10 male, 1 undisclosed; mean age = 23.8 years, SD = 5.7 years, range = 18–41 years), again mainly students from the University of Mannheim. The groups of children are referred to as 10-year-olds and 14-year-olds, respectively. Children and their parents provided written consent for participation in the study. Again, the comparability of the adults could be compromised, as they presumably represent a selected group.

With these sample sizes and setting α = .05, the power to detect age-group differences of medium size in an ANOVA (Cohen's f = 0.25) was .88 (computed with "G*Power 3"; Faul et al., 2007). To determine the power of the r-model to detect differences in r, a post hoc power analysis was run just as in Experiment 1, using the same procedure and parameter values (computed with "multiTree"; Moshagen, 2010). With α = .05, the resulting power to detect group differences of Δr = .05 was .87.

Material

As before, 12 cities out of the largest world cities served as materials (details available in ESM 1), but in contrast to Experiment 1, and in order to yield a lower (but still substantial) recognition validity, the selection procedure was different.



Based on the recognition rates from the pilot study and ordering cities according to their size, six cities were chosen from the upper half of the list, four of which were highly recognized and two highly unrecognized, and six from the lower half, four of which were highly unrecognized and two highly recognized. This led to a recognition validity of .70 according to the pilot data (which is comparable to values in other studies using world cities). Then again all 66 possible pairs were constructed from these 12 cities and ordered randomly into three lists that were given alternately to the participants.

Procedure

Except for the different materials used, the procedure was identical to that of Experiment 1. The whole experiment lasted about 13 min for the 10-year-olds, 8 min for the 14-year-olds, and 4 min for the adults.

Results

The 10-year-olds recognized on average 40% of the cities. Again, this percentage increased significantly with age (see Table 1), F(2, 189) = 60.96, p < .001, ηp² = .39. Post hoc Tukey tests showed that all pairwise differences were highly significant, p < .001. The proportion of the 66 city pairs in which the RH could be applied again varied around 50% (Table 1), but was significantly different for the three age groups, F(2, 189) = 14.27, p < .001, ηp² = .13. Post hoc Tukey tests showed that this proportion was significantly larger for 14-year-olds than for the other two age groups, both p < .001, which did not differ from one another, p = .82. However, in absolute numbers, the difference is rather small: For 10-year-olds and adults, the RH could on average be applied to about 32 out of the 66 pairs, whereas it could be applied to about 35 pairs for 14-year-olds.

Based on the wrong order of cities (that was used to construct the material set), the mean recognition validity was .70 (just as expected from the pilot study's data), but using the correct order yielded much lower values. In fact, the mean recognition validity was close to chance in all three age groups. Recognition validity was significantly different between the three age groups, F(2, 189) = 9.20, p < .001, ηp² = .09. Post hoc Tukey tests showed that 14-year-olds had a significantly lower recognition validity than 10-year-olds, p < .001, and adults, p = .050, whereas the latter two groups did not differ, p = .47. Similarly, knowledge validity was also significantly different, F(2, 189) = 9.48, p < .001, ηp² = .09. Post hoc Tukey tests showed that 10-year-olds had a lower knowledge validity than 14-year-olds, p < .001, and adults, p = .002, whereas the latter two groups did not differ, p = .93.


As before, to assess RH use, the r-model (Hilbig et al., 2010) was applied to the aggregated data set. The model was tripled to account for all three groups simultaneously, comprising 12,672 data points (= 192 participants × 66 paired comparisons). The r-model fit the data well, G²(3) = 2.12, p = .55. The probability of RH use (parameter r; see Table 1) was found to be smaller for 10-year-olds than for 14-year-olds, ΔG²(1) = 38.98, p < .001, Δr = 0.11, but larger for 14-year-olds than for adults, ΔG²(1) = 29.26, p < .001, Δr = 0.11, whereas 10-year-olds and adults did not differ, ΔG²(1) = 0.01, p = .93.

To check whether the r-model results were an artifact of aggregating across individual data, the r-model was again applied to each participant's data separately. The model did not fit the data in 10 cases, which is to be expected (i.e., about 5% of 192 participants). For the remaining 182 participants, the mean values (and SDs) of parameter r were .73 (.26), .83 (.19), and .73 (.29) for 10-year-olds, 14-year-olds, and adults, respectively. These values were again close to the ones found above and also differed significantly between groups, F(2, 179) = 4.53, p = .012, ηp² = .05. Post hoc Tukey tests, however, revealed that only the difference between 10-year-olds and 14-year-olds remained significant, p = .018, whereas adults differed neither from 10-year-olds, p > .99, nor from 14-year-olds, p = .08.

Next, a multiple linear regression on RH use was computed as in Experiment 1. The model was significant, F(4, 174) = 3.81, p = .005, ηp² = .08, adj. R² = .06, and found age, β = .22, p = .022, as well as squared age, β = −.35, p < .001, to be significant predictors. This again shows that RH use generally increased with age but followed an inverted U-shaped trend. In contrast, neither knowledge validity, β = .09, p = .28, nor recognition validity, β = .01, p = .87, was significantly related to RH use. Despite the overall nonsignificant impact of knowledge validity on RH use, separate correlations were computed for each age group (to allow comparison with the results from Experiment 1), but none of them was significant. The correlations were r = .21, p = .10, for 10-year-olds, r = .06, p = .58, for 14-year-olds, and r = .19, p = .27, for adults. Also, none of the differences between age groups reached significance, with z = 1.87, p = .07, for the largest difference.

Discussion

The percentage of cities recognized showed a clear monotonic trend with age. In addition, 10-year-olds had less valid knowledge about recognized cities compared to 14-year-olds and adults, who did not differ from each other. More importantly, use of the RH again showed a clear non-monotonic trend, with increasing use from 10-year-olds to 14-year-olds but then decreasing use by adults, thus fully replicating the findings from Experiment 1.



The material in this study, however, had a much less extreme recognition validity than in Experiment 1 (one that was even accidentally close to chance), yet RH use was frequent, thus ruling out the possibility that the high estimates of RH use in Experiment 1 were influenced by the high recognition validity of the chosen city set in that experiment. In addition, the current results support the view that the recognition validities of specific sets of objects from a domain represent only more or less valid estimates of the "true" recognition validity in that domain and that participants base their behavior on the latter, not the former. In other words, participants take object sets as representative of the underlying domain. This view has received support in two recent studies that were published after the present experiments were run (see Basehore & Anderson, 2016, Experiment 2; Pohl, Michalkiewicz, Erdfelder, & Hilbig, 2017).

One may nevertheless argue that the increase in RH use from 10- to 14-year-olds was influenced by corresponding differences in their knowledge validity. As in Experiment 1, this conjecture seems rather unlikely. First, the argument does not fit the pattern of results for 14-year-olds and adults, who showed the same knowledge validity, yet adults still used the RH less often. Second, the multiple regression analysis (as in Experiment 1) found neither recognition validity nor knowledge validity to be significant predictors of RH use.

General Discussion

Two studies were reported that investigated age-related differences in using a very simple inferential decision strategy, the recognition heuristic (RH; Gigerenzer & Goldstein, 2011; Goldstein & Gigerenzer, 1999, 2002). The RH assumes that when people are asked to infer which of two objects has the higher criterion value, but they recognize only one of them, they choose the recognized one, that is, they base their inference on recognition alone. This heuristic is considered "ecologically rational" as long as the probability of recognizing an object is related to its criterion value. Thus, learning that recognition is a valid cue in some domains and that one-reason strategies can be quite accurate is a matter of sufficient experience. Correspondingly, use of the RH should generally increase with children's age (see Horn et al., 2016).

Both experiments compared three age groups, namely preadolescents (9–10-year-olds), adolescents (14-year-olds), and young adults, and asked them to infer which of two major world cities in a number of pairs is the larger one. The chosen material proved to be well suited, not only because it allows comparison with numerous other studies (most importantly the one by Horn et al., 2016), but also because even the youngest children recognized a substantial proportion of the cities and all age groups had about the same proportion of RH-applicable pairs (about 50%), that is, pairs in which the RH could in principle be used.


The main results are as follows: (a) The percentage of recognized objects increased with age; (b) recognition validity of the chosen set of objects had little if any influence on participants' behavior; (c) knowledge validity increased from preadolescents to adolescents, but did not differ between the latter and adults; (d) use of the RH increased from preadolescents to adolescents, but then decreased for adults; and (e) the between-group differences in RH use could not be explained by differences in knowledge validity.

That familiarity with a domain (like world cities) and the quality of knowledge increased with age is not so surprising (see Horn et al., 2016, for similar results). Unexpected, however, is the non-monotonic trend of RH use that was found in both studies. This pattern of RH use could not be explained by age-group differences in knowledge validity and thus indicates an independent developmental trend in using inferential heuristics. In the following, a potential explanation is offered.

The preadolescents were the only group that showed a strong negative correlation between knowledge validity and RH use, suggesting a trade-off between knowledge-based and heuristic strategies. Their knowledge was on average still poor, but recognition already had some validity as a cue, leading to substantial RH use. Put differently, recognition was perhaps the only valid cue some of them had. Accordingly, one might even have expected them to use the RH all the time, but that was not the case. One explanation could be that children at this age still have problems attending to the most valid cue and using single-cue strategies (as has been repeatedly found). Instead, they may have preferred to guess. Another reason is that some participants in this age group did have some valid knowledge, preferred to rely on that, and accordingly used the RH less often.

Previous research on the development of decision making suggested a marked change during adolescence: At this age, children improve their decision-making skills by searching information more systematically, by better eliminating irrelevant aspects or options, or by relying on one cue only (Bereby-Meyer et al., 2004; Davidson, 1991; Gregan-Paxton & Roedder John, 1997; Howse et al., 2003; Jacobs & Potenza, 1991; Klaczynski et al., 2001; Klayman, 1985; Kuhn & Franklin, 2006; Mata et al., 2011; Ruggeri & Katsikopoulos, 2013). The present results fit nicely into this picture: Adolescents made substantially more use of the RH than preadolescents did, suggesting that the efficient use of simple heuristics based on a single cue increases with children's age from preadolescence to adolescence (see also Betsch & Lang, 2013; Betsch et al., 2014; Horn et al., 2016). With improved knowledge and increased experience with simple heuristics, the adolescents apparently relied more strongly on both.



However, their meta-cognitive abilities – while certainly improved from childhood (Kuhn & Franklin, 2006) – are perhaps not yet developed enough to adaptively switch between strategies (i.e., to "suspend" use of the RH; Pachur & Hertwig, 2006; Pachur et al., 2009). In terms of dual-process theories (Stanovich et al., 2011), the analytical-reflective mode did not overrule the intuitive-heuristic one often enough. Hence, use of the RH was particularly frequent. This corresponds to Klaczynski's (2004) statement that adolescents' thinking is mainly driven by experiential (heuristic) processes. Similarly, Horn et al. (2016) reported that 12-year-olds did not use the RH adaptively (i.e., according to recognition validity), whereas 17-year-olds did.

With still more experience in decision making, adults might eventually have learned to trust their knowledge (i.e., they have developed appropriate meta-cognitions) and thus preferred to rely on it more often, even when the RH would have sufficed. In particular, adults may be eager to outperform the RH by employing knowledge beyond recognition in order to discriminate cases in which the recognized city is the correct versus the false choice (cf. Hilbig & Pohl, 2008; Pohl, 2006). This is the only way to increase performance beyond the level achieved by always following the RH (which is given by the recognition validity). In other words, the reflective-analytical mode of thinking (as assumed in dual-process approaches) might be better developed and more often invoked in adults than in younger participants. Michalkiewicz, Minich, and Erdfelder (2017) tested such an approach on adults' RH use and found that RH use was negatively related to "need for cognition," which could be considered a proxy for a preference for the analytical-reflective mode of thinking.

One major shortcoming of the adults tested here (university students) is that they possibly represent selected samples that are not readily comparable to those of the children. Presumably, not all of the tested children will later enter university. One selection criterion could be intelligence, so that the tested adults are perhaps on average more intelligent than the tested children. However, Michalkiewicz, Arden, and Erdfelder (2017) reanalyzed the data from Hilbig (2008) and reported that RH use increased with intelligence. Thus, potential age-related differences in intelligence can hardly explain the observed decrease in RH use for adults. But, of course, other systematic differences between the tested adults and children not captured here may nevertheless exist. Discovering such differences could also help to explain why the non-monotonic trend in RH use reported here differs from what was found in the studies by Horn et al. (2015, 2016).


In a domain where recognition validity exceeded knowledge validity (just as was the case here), Horn et al. (2016) found RH use of approximately r = .40 (exact values not reported) for both groups of 9- and 12-year-olds, but of approximately r = .65 for 17-year-olds. Using the same material, Horn et al. (2015) reanalyzed the data from Pachur et al. (2009) and reported that younger and older adults used the RH with a high and equal probability of r = .85. So if one wants to compare the results across both of these studies (despite all potential differences), RH use increased monotonically with age and did not decline from adolescents to adults as was found here. But again, it is questionable whether such a comparison is actually justified. At least, Horn et al. (2016) themselves refrained from drawing such a conclusion.

In sum, two experiments on inferential decision making showed that use of the RH, one of the simplest strategies because it relies on recognition alone, increased from preadolescence to adolescence but then, surprisingly, decreased for young adults. This result suggests that use of the RH develops in a non-monotonic fashion with a peak in adolescence. It remains for future research to further explore the underlying mechanisms of this pattern.

Electronic Supplementary Material

The electronic supplementary material is available with the online version of the article at http://dx.doi.org/10.1027/2151-2604/a000283

ESM 1. Text (.docx). Summary of the r-model and material sets used in Experiments 1 and 2.

Acknowledgments

I thank Franziska von Massow, Barbara Beckmann-Schumacher, and Simone Malejka for their help in data collection, Nikoletta Symeonidou and Sabine Schellhaas for data analysis, and Benjamin E. Hilbig for helpful comments.

References Basehore, Z., & Anderson, R. B. (2016). The simple life: New experimental tests of the recognition heuristic. Judgment and Decision Making, 11, 301–309. Retrieved from http://journal. sjdm.org/16/16202a/jdm16202a.pdf Bayen, U. J., Erdfelder, E., Bearden, J. N., & Lozito, J. P. (2006). The interplay of memory and judgment processes in effects of aging on hindsight bias. Journal of Experimental Psychology: Learning, Memory and Cognition, 32, 1003–1018. doi: 10.1037/ 0278-7393.32.5.1003 Bereby-Meyer, Y., Assor, A., & Katz, I. (2004). Children’s choice strategies: The effects of age and task demands. Cognitive Development, 19, 25–47. doi: 10.1016/j.cogdev.2003. 11.003


Bernstein, D. M., Erdfelder, E., Meltzoff, A. N., Peria, W., & Loftus, G. R. (2011). Hindsight bias from 3 to 95 years of age. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 378–391. doi: 10.1037/a0021971 Betsch, T., & Lang, A. (2013). Utilization of probabilistic cues in the presence of irrelevant information: A comparison of risky choice in children and adults. Journal of Experimental Child Psychology, 115, 108–125. doi: 10.1016/j.jecp. 2012. 11.003 Betsch, T., Lang, A., Lehmann, A., & Axmann, J. M. (2014). Utilizing probabilities as decision weights in closed and open information boards: A comparison of children and adults. Acta Psychologica, 153, 74–86. doi: 10.1016/j.actpsy.2014. 09.008 Bunge, S. A., & Zelazo, P. D. (2006). A brain-based account of the development of rule use in childhood. Current Directions in Psychological Science, 15, 118–121. doi: 10.1111/j.0963-7214. 2006.00419.x Coolin, A., Erdfelder, E., Bernstein, D. M., Thornton, A. E., & Thornton, W. L. (2015). Explaining individual differences in cognitive processes underlying hindsight bias. Psychonomic Bulletin & Review, 22, 328–348. doi: 10.3758/s13423-0140691-5 Davidson, D. (1991). Children’s decision-making examined with an information-board procedure. Cognitive Development, 6, 77–90. doi: 10.1016/0885-2014(91)90007-Z Erdfelder, E., Auer, T.-S., Hilbig, B. E., Aßfalg, A., Moshagen, M., & Nadarevic, L. (2009). Multinomial processing tree models: A review of the literature. Zeitschrift für Psychologie, 217, 108–124. doi: 10.1027/0044-3409.217.3.108 Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191. doi: 10.3758/BF03193146 Gigerenzer, G., & Goldstein, D. G. (2011). The recognition heuristic: A decade of research. Judgment and Decision Making, 6, 100–121. Retrieved from http://journal.sjdm.org/11/rh15/rh15.pdf Gigerenzer, G., & Todd, P. M. (1999). Fast and frugal heuristics: The adaptive toolbox. In G. Gigerenzer, P. M. Todd, & ABC Research Group (Eds.), Simple heuristics that make us smart (pp. 3–34). New York, NY: Oxford University Press. Gigerenzer, G., Todd, P. M., & ABC Research Group. (Eds.). (1999). Simple heuristics that make us smart. New York, NY: Oxford University Press. Gilovich, T., Griffin, D., & Kahneman, D. (Eds.). (2002). Heuristics and biases: The psychology of intuitive judgment. New York, NY: Cambridge University Press. Glöckner, A., & Bröder, A. (2014). Cognitive integration of recognition information and additional cues in memory-based decisions. Judgment and Decision Making, 9, 35–50. Retrieved from http://journal.sjdm.org/13/13912/jdm13912.pdf Glöckner, A., Hilbig, B. E., & Jekel, M. (2014). What is adaptive about adaptive decision making? A parallel constraint satisfaction account. Cognition, 133, 641–666. doi: 10.1016/j.cognition. 2014.08.017 Goldstein, D. G., & Gigerenzer, G. (1999). The recognition heuristic: How ignorance makes us smart. In G. Gigerenzer, P. M. Todd, & ABC Research Group (Eds.), Simple heuristics that make us smart (pp. 37–58). New York, NY: Oxford University Press. Goldstein, D. G., & Gigerenzer, G. (2002). Models of ecological rationality: The recognition heuristic. Psychological Review, 109, 75–90. doi: 10.1037/0033-295X.109.1.75 Gregan-Paxton, J., & Roedder John, D. (1997). The emergence of adaptive decision making in children. Journal of Consumer Research, 24, 43–56. doi: 10.1086/209492


Groß, J., & Bayen, U. J. (2015a). Hindsight bias in younger and older adults: The role of access control. Aging, Neuropsychology, & Cognition, 22, 183–200. doi: 10.1080/13825585.2014. 901289 Groß, J., & Bayen, U. J. (2015b). Adult age differences in hindsight bias: The role of recall ability. Psychology and Aging, 30, 253–258. doi: 10.1037/pag0000017 Hilbig, B. E. (2008). Individual differences in fast-and-frugal decision making: Neuroticism and the recognition heuristic. Journal of Research in Personality, 42, 1641–1645. doi: 10.1016/j.jrp.2008.07.001 Hilbig, B. E. (2010a). Precise models deserve precise measures: A methodological dissection. Judgment and Decision Making, 5, 272–284. Retrieved from http://journal.sjdm.org/10/rh5/rh5. pdf Hilbig, B. E. (2010b). Reconsidering ‘evidence’ for fast and frugal heuristics. Psychonomic Bulletin & Review, 17, 923–930. doi: 10.3758/PBR.17.6.923 Hilbig, B. E., Erdfelder, E., & Pohl, R. F. (2010). One-reason decision-making unveiled: A measurement model of the recognition heuristic. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 123–134. doi: 10.1037/a0017518 Hilbig, B. E., & Pohl, R. F. (2008). Recognizing users of the recognition heuristic. Experimental Psychology, 55, 394–401. doi: 10.1027/1618-3169.55.6.394 Horn, S., Pachur, T., & Mata, R. (2015). How does aging affect recognition-based inference? A hierarchical Bayesian modeling approach. Acta Psychologica, 154, 77–85. doi: 10.1016/j.actpsy. 2014.11.001 Horn, S. S., Ruggeri, A., & Pachur, T. (2016). The development of adaptive decision making: Recognition-based inference in children and adolescents. Developmental Psychology, 52, 1470–1485. doi: 10.1037/dev0000181 Howse, R. B., Best, D. L., & Stone, E. R. (2003). Children’s decision making: The effect of training and memory aids. Cognitive Development, 18, 247–268. doi: 10.1016/S0885-2014(03)00023-6 Jacobs, J. E., & Klaczynski, P. A. (2002). The development of judgment and decision making during childhood and adolescence. Current Directions in Psychological Science, 11, 145–149. doi: 10.1111/1467-8721.00188 Jacobs, J. E. & Klaczynski, P. A. (Eds.). (2005). The development of judgment and decision making in children and adolescents. Mahwah, NJ: Erlbaum. Jacobs, J. E., & Potenza, M. (1991). The use of judgment heuristics to make social and object decisions: A developmental perspective. Child Development, 62, 166–178. doi: 10.2307/1130712 Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics and biases. Cambridge, UK: Cambridge University Press. Klaczynski, P. A. (2004). A dual-process model of adolescent development: Implications for decision making, reasoning, and identity. Advances in Child Development and Behavior, 32, 73–123. doi: 10.1016/S0065-2407(04)80005-3 Klaczynski, P. A., Byrnes, J. P., & Jacobs, J. E. (2001). Introduction to the special issue: The development of decision making. Journal of Applied Developmental Psychology, 22, 225–236. doi: 10.1016/S0193-3973(01)00081-8 Klayman, J. (1985). Children’s decision strategies and their adaptation to task characteristics. Organizational Behavior and Human Performance, 35, 179–201. doi: 10.1016/07495978(85)90034-2 Kuhn, D., & Franklin, S. (2006). The second decade: What develops (and how). In D. Kuhn, R. S. Siegler, W. Damon, & R. M. Lerner (Eds.), Handbook of child psychology. Vol. 2: Cognition, perception, and language (6th ed., pp. 953–993). Hoboken, NJ: Wiley.


Marewski, J. N., Gaissmaier, W., Schooler, L. J., Goldstein, D. G., & Gigerenzer, G. (2010). From recognition to decisions: Extending and testing recognition-based models for multialternative inference. Psychonomic Bulletin & Review, 17, 287–309. doi: 10.3758/PBR.17.3.287 Marewski, J. N., Pohl, R. F., & Vitouch, O. (2010). Recognition processes in inferential decision making [Special issue; Part 1]. Judgment and Decision Making, 5(4). Retrieved from http:// journal.sjdm.org/vol5.4.html Marewski, J. N., Pohl, R. F., & Vitouch, O. (2011a). Recognition processes in inferential decision making [Special issue; Part 2]. Judgment and Decision Making, 6(1). Retrieved from http:// journal.sjdm.org/vol6.1.html Marewski, J. N., Pohl, R. F., & Vitouch, O. (2011b). Recognition processes in inferential decision making [Special issue; Part 3]. Judgment and Decision Making, 6(5). Retrieved from http:// journal.sjdm.org/vol6.5.html Mata, R., Schooler, L. J., & Rieskamp, J. (2007). The aging decision maker: Cognitive aging and the adaptive selection of decision strategies. Psychology and Aging, 22, 796–810. doi: 10.1037/ 0882-7974.22.4.796 Mata, R., von Helversen, B., & Rieskamp, J. (2011). When easy comes hard: The development of adaptive strategy selection. Child Development, 82, 687–700. doi: 10.1111/j.1467-8624. 2010.01535.x Michalkiewicz, M., Arden, K., & Erdfelder, E. (2017). Do smarter people make better decisions? The influence of intelligence on adaptive use of the recognition heuristic. Manuscript submitted for publication. Michalkiewicz, M., & Erdfelder, E. (2016). Individual differences in use of the recognition heuristic are stable across time, choice objects, domains, and presentation formats. Memory & Cognition, 44, 454–468. doi: 10.3758/s13421-015-0567-6 Michalkiewicz, M., Minich, B., & Erdfelder, E. (2017). Explaining individual differences in fast-and-frugal decision making: The impact of need for cognition and faith in intuition on use of the recognition heuristic. Manuscript submitted for publication. Moshagen, M. (2010). MultiTree: A computer program for the analysis of multinomial processing tree models. Behavior Research Methods, 42, 42–54. doi: 10.3758/BRM.42.1.42 Pachur, T. (2010). Recognition-based inference: When is less more in the real world? Psychonomic Bulletin & Review, 17, 589–598. doi: 10.3758/PBR.17.4.589 Pachur, T., Bröder, A., & Marewski, J. (2008). The recognition heuristic in memory-based inference: Is recognition a non-compensatory cue? Journal of Behavioral Decision Making, 21, 183–210. doi: 10.1002/bdm.581 Pachur, T., & Hertwig, R. (2006). On the psychology of the recognition heuristic: Retrieval primacy as a key determinant of its use. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 983–1002. doi: 10.1037/02787393.32.5.983


Pachur, T., Mata, R., & Schooler, L. J. (2009). Cognitive aging and the adaptive use of recognition in decision making. Psychology and Aging, 24, 901–915. doi: 10.1037/a0017211 Pachur, T., Todd, P. M., Gigerenzer, G., Schooler, L. J., & Goldstein, D. G. (2011). The recognition heuristic: A review of theory and tests. Frontiers in Psychology, 2, 1–14. doi: 10.3389/ fpsyg.2011.00147 Pohl, R. F. (2006). Empirical tests of the recognition heuristic. Journal of Behavioral Decision Making, 19, 251–271. doi: 10.1002/bdm.522 Pohl, R. F. (2011). Recognition information in inferential decision making: An overview of the debate. Judgment and Decision Making, 6, 423–438. Retrieved from http://journal.sjdm.org/11/ rh19/rh19.pdf Pohl, R. F., Bayen, U. J., & Martin, C. (2010). A multi-process account of hindsight bias in children. Developmental Psychology, 46, 1268–1282. doi: 10.1037/a0020209 Pohl, R. F., Michalkiewicz, M., Erdfelder, E., & Hilbig, B. E. (2017). Use of the recognition heuristic depends on the domain’s recognition validity, not on the recognition validity of selected sets of objects. Memory & Cognition. Advance online publication. doi: 10.3758/s13421-017-0689-0 Ruggeri, A., & Katsikopoulos, K. V. (2013). Make your own kinds of cues: When children make more accurate inferences than adults. Journal of Experimental Child Psychology, 115, 517–535. doi: 10.1016/j.jecp.2012.11.007 Schwikert, S. R., & Curran, T. (2014). Familiarity and recollection in heuristic decision making. Journal of Experimental Psychology: General, 143, 2341–2365. doi: 10.1037/xge0000024 Stanovich, K. E., West, R. F., & Toplak, M. E. (2011). The complexity of developmental predictions from dual process models. Developmental Review, 31, 103–118. doi: 10.1016/j.dr.2011. 07.003 Todd, P. M., Gigerenzer, G., & ABC Research Group. (Eds.). (2012). Ecological rationality: Intelligence in the world. New York, NY: Oxford University Press. Received October 17, 2016 Revision received November 17, 2016 Accepted November 18, 2016 Published online July 12, 2017 Rüdiger F. Pohl Department of Psychology School of Social Sciences University of Mannheim 68131 Mannheim Germany pohl@psychologie.uni-mannheim.de



Original Article

Measuring the Zero-Risk Bias: Methodological Artefact or Decision-Making Strategy?

Elisabeth Schneider,1 Bernhard Streicher,2 Eva Lermer,1 Rainer Sachs,3 and Dieter Frey1

1 Department of Psychology, LMU Munich, Germany
2 Department of Psychology and Medical Sciences, UMIT University, Hall in Tyrol, Austria
3 Munich Re, Munich, Germany

Abstract: Uncertainty is a dynamic state that is perceived as discomforting, and individuals are highly motivated to reduce these feelings. With regard to risky decision making, people tend to overweight the value of certainty and opt for zero-risk solutions, even if this results in a less favorable outcome. This phenomenon is referred to as the zero-risk bias, and it has been demonstrated in varying contexts and with different methods. However, there is high variance in the emergence of the bias reported in the existing literature, leaving it unclear to what extent the bias was evoked by the method or whether other psychological factors influenced people's decision making. Four studies were conducted in order to investigate methodological and situational influences on the bias, comparing its emergence within different task formats (questionnaires vs. behavioral tasks), decision types (forced choice vs. free resource allocation), and different decision domains. Results indicate that the zero-risk bias is persistent over different methods but highly sensitive to contextual factors: abstractness of the task, decision domain, and appropriateness of the zero-risk option. First, its emergence varied between the task formats, in that it was shown more often in abstract than in concrete tasks. Second, participants' choice of zero-risk did not correlate between different tasks, indicating effects of decision domain. Third, a zero-risk strategy seemed to be appropriate for dividing risks among objects (lottery urns in a gambling task) but not among persons (in a health scenario). In the latter situation, aspects like fairness influenced choice. Future research is needed to explore the relation between these factors and identify their underlying mechanisms.

Keywords: zero-risk bias, risky decision making, measurement, domain-specific risk taking

Assume you are taking part in the TV game show "Who wants to be a millionaire?" You have to correctly answer a series of questions of increasing difficulty to win one million euros. For each question, you get a certain amount of money, which is (mostly) doubled for the next question. You now have to choose one of two variants: You can either play the safe mode, in which you have two safe-haven questions, the first at a level of €500 and the other at a level of €16,000 (if you answer these questions correctly, you will always get this safe amount, even if you answer a subsequent question incorrectly), or you can play the risk mode, in which you trade the safe level of €16,000 for an additional joker (you can ask a person in the audience). Which mode would you choose? As seen on TV, a lot of people choose the safe mode, for different reasons: to have a relatively high amount of money guaranteed (since the majority of participants reach the €16,000 level), to reduce the potential loss one would experience by giving a wrong answer in the late rounds and hence reduce expected regret, or simply to feel safe when gambling in the last rounds, which offer high cash prizes.

Now, assuming the same situation and knowing that in 95% of cases the audience joker gives the correct answer, and that participants playing the risk mode win €110,000 on average whereas participants playing the safe mode win only €40,000 on average, which mode would you choose now? A rational agent should now choose the risk mode, since its expected utility is much higher than that of the safe mode. However, it is doubtful that everybody really decides this way – especially when sitting in the hot seat in the TV studio. The example illustrates some basic psychological motives influencing decision making: being averse to uncertainty and risk as well as avoiding losses. Whereas laypeople might consider uncertainty and risk alike, since both might be linked to similar unpleasant emotional states, research distinguishes between the two terms. The term "risk" refers to a known set of (negative) outcomes, whereas in the case of "uncertainty" such knowledge is missing (Knight, 1921). Following this, risk is associated with known, accessible, or objective probabilities, and uncertainty with unknown, unobtainable, or subjective probabilities (Gigerenzer, 2002; Knight, 1921; Leroy & Singell, 1987). Often there is a continuous transition between risk and uncertainty (Gigerenzer, 2002), depending on knowledge, information, time, or level of perspective (Tannert, Elvers, & Jandrig, 2007). With respect to the above-mentioned example, the risk mode should rather be considered an uncertain mode.



However, when enough information is available to calculate outcome probabilities, as in the second part of the example, the uncertain mode becomes a true risk mode. The most prominent decision-making theory considering those psychological motives of risk aversion and loss avoidance is Kahneman and Tversky's (1979) prospect theory. It proposes three major principles: (1) a reference point from which all better outcomes are perceived as gains and all worse outcomes as losses, (2) a utility function that is (a) concave for gains (risk aversion) and convex for losses (risk seeking) and (b) steeper for losses than for gains (loss aversion), and (3) a nonlinear probability weighting function implying overweighting of small probabilities, underweighting of large probabilities, and reduced sensitivity to probabilities around .50. By following those three principles, people often violate the principles of utility maximization. For example, when deciding between the reduction of two (or more) risks, people often prefer the safe option, even if this option is less favorable than the other alternative(s) – they exhibit a zero-risk bias.

The reasons for opting for zero-risk can be many. In general, they could depend on personal factors (e.g., low ambiguity tolerance, a high need for safety, high anticipated regret) and on situational factors (e.g., social norms, task complexity, available information). Since research regarding the zero-risk bias is sparse and it cannot be ruled out that the bias was just evoked by the methods used, we concentrate on situational variables affecting the zero-risk bias in our study. Returning to the example, a person might just choose the safe mode because he/she is not aware of the expected values and probabilities or does not have the mathematical skills to compute the optimal solution. Hence, the bias could be evoked by the task's complexity and diminished if the task is simple enough. However, in real life we usually cannot assign values and probabilities to all alternatives. "Playing the safe mode" might therefore serve as a general strategy to reduce complexity and simplify the decision itself. The current research addresses this issue by examining the zero-risk bias with several designs and in various contexts, in order to answer the question of whether the bias is a methodological artifact or a decision-making strategy and which situational factors promote the bias.
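The value-function properties listed under (2) can be made tangible with a small numerical sketch. The functional form and parameter values below are the common ones from cumulative prospect theory (Tversky & Kahneman, 1992), used here purely for illustration; they are assumptions, not taken from the present article.

```python
def value(x, alpha=0.88, lam=2.25):
    """Prospect-theory value function: concave for gains, convex and steeper for
    losses (parameters from Tversky & Kahneman, 1992; an assumption here)."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** alpha)

# Diminishing sensitivity: the same €100 difference matters less far from the reference point
print(value(100) - value(0))      # ~57.5
print(value(1100) - value(1000))  # ~38

# Loss aversion: a €100 loss looms larger than a €100 gain
print(value(100), value(-100))    # ~57.5 vs ~-129
```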

Theoretical Background

Zero-Risk Bias

For decades, psychological research has demonstrated the shortcomings of a rational agent model in human decision making and has highlighted the boundaries of expected utility theory (e.g., Birnbaum, 2004; Hastie & Dawes, 2001; Hsu, Krajbich, Zhao, & Camerer, 2009; Kahneman, 2012; Kahneman & Tversky, 1979; Rottenstreich & Hsee, 2001).


Allais (1953) was one of the first to note that the simple multiplication of the objective probability of an alternative by its value (as proposed by expected utility theory) cannot predict decision making properly, since people subjectively distort objective probabilities. Kahneman and Tversky (1979) called this effect diminishing sensitivity and embedded it as one of the three major principles in prospect theory (besides the point of reference and loss aversion). They demonstrated that probabilities are not perceived linearly, in that subjective probability distortion is greatest toward the poles of the function, that is, 0.0 and 1.0. Reducing a risk from 5% to 0% is perceived as more valuable than a reduction over an equal range from 55% to 50%, since the former is closer to certainty. This phenomenon is referred to as the zero-risk bias, and people are willing to accept considerable drawbacks in order to gain certainty (e.g., Allais, 1953; Kahneman, 2012; Kahneman & Tversky, 1979; Viscusi, Magat, & Huber, 1987).

Baron, Gowda, and Kunreuther (1993) demonstrated that, even when choosing the zero-risk option out of three possible alternatives resulted in the highest overall risk, 11% of participants rated this option as the best of the three and as many as 42% did not rate it as the worst alternative. When certainty is expressed in monetary terms, around 89% of participants were willing to pay a certainty premium (Viscusi et al., 1987): they were willing to pay more for the complete elimination of one risk than for an equal reduction of another risk with residual overall risk. Furthermore, when participants were asked how much money they were willing to pay to reduce risk, they paid around 1.5 times more for the complete elimination of one out of two risks than for the reduction of two risks to a certain rest-risk – both alternatives resulting in the same overall risk (Ritov, Baron, & Hershey, 1993). Using a questionnaire design similar to that of Baron et al. (1993), in our own research we demonstrated that the zero-risk bias varied over decision domains (it was most pronounced in the social domain and less so in the health domain, in that participants opted more often for zero-risk when the risk was being separated from family or friends than when they could reduce their own risk of getting injured) and was quite robust to changes in the favorability of a rest-risk option (Schneider, Streicher, Lermer, Sachs, & Frey, 2015), indicating that people neglected the overall risk.

In sum, a zero-risk bias could be demonstrated in hypothetical decision and gambling scenarios as well as in pricing tasks or consumer decisions. However, there is high variance in its prevalence, and previous research has failed to identify the psychological mechanisms causing the bias.
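The probability distortion near certainty described above, which underlies the zero-risk bias, can be illustrated with a probability weighting function. The one-parameter form and the parameter value below come from cumulative prospect theory (Tversky & Kahneman, 1992) and are assumptions for illustration only, not the formulation used in this article.

```python
def weight(p, gamma=0.61):
    """Probability weighting function w(p); gamma < 1 overweights small
    probabilities and underweights moderate-to-large ones (Tversky & Kahneman, 1992)."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

# The same 5-percentage-point reduction removes far more decision weight when it
# eliminates the risk entirely than when a residual risk remains
print(weight(0.05) - weight(0.00))  # ~0.13 (reduction from 5% to 0%)
print(weight(0.55) - weight(0.50))  # ~0.03 (reduction from 55% to 50%)
```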


The number of people exhibiting a zero-risk bias (1) varied between the different methods used (e.g., behavioral tasks, questionnaires) as well as (2) depended on the decision domain addressed. The former could be explained by assuming that the zero-risk bias is just a methodological artifact. However, since each method evoked the bias, we propose that the variance in its prevalence was rather driven by methodological demands or context-specific factors: First, different tasks may trigger different ways of thinking (e.g., concrete vs. abstract; see Trope & Liberman, 2010), and second, they may require different answer modes (e.g., behavioral decision vs. hypothetical decision in a questionnaire). To our knowledge, there are no studies addressing the influence of the method on the zero-risk bias. Furthermore, there is no research explaining the psychological reasons for this variance. By drawing on construal level theory (Trope & Liberman, 2010) and domain-specific risk taking, we try to offer theoretical explanations for the findings mentioned above.

Context Sensitivity of Risk Estimations

When thinking of risks, people usually think in verbal terms, that is, "It is very likely that it will rain tomorrow," and not numerically, that is, "There is a chance of 95% that it will rain tomorrow" (Dobelli, 2011). Effects like framing and decision-making biases are very sensitive to the scale being used, in that they often appear on verbal rating scales (e.g., very likely) but are reduced or vanish completely when the same task is presented in numerical frequencies (e.g., 1 out of 10). For example, Reyna, Chick, Corbin, and Hsia (2014) demonstrated that the emergence of framing effects varied depending on how the probabilities were presented: when presented verbally/qualitatively (e.g., very likely), framing effects were strongest, whereas they vanished when the probability of each outcome was presented numerically. Similar effects were found for the conjunction fallacy, that is, rating the conjunction of two events as more likely than one of them alone (Tversky & Kahneman, 1983): It decreased when participants were asked for absolute frequencies instead of verbal probabilities or percentages (Fiedler, 1988; Hertwig & Gigerenzer, 1999). Furthermore, studies showed that verbal scales are much more sensitive to psychological effects like unrealistic optimism than numerical scales (Lermer, Streicher, Sachs, & Frey, 2013; Windschitl & Wells, 1996). Regarding the zero-risk bias, to the best of our knowledge there is no research systematically testing the effects of scales or of verbal and numerical frequencies on the bias. At the extreme, and similar to the conjunction fallacy, it might be evoked by the mere presentation of verbal probabilities instead of numerical frequencies. The present research uses different presentation styles/scales, in that we presented
percentages, absolute frequencies, and a visual analog scale to compare the bias's emergence.

Construal Level Theory

Besides scales, tasks can also vary in their level of abstractness: Drawing balls from a pot in a lottery, feeling them, and seeing the pots in front of you is very concrete, compared with merely reading about that lottery in a questionnaire and imagining taking part in it, which is rather abstract. Construal level theory (Trope & Liberman, 2010) addresses this issue. Its basic assumption is that people think at different construal levels and that these construal levels affect the perceived psychological distance of an object, person, or event – and vice versa. Construals are mental representations that are either concrete (low level construal) or abstract (high level construal). Psychological distance reflects the perceived closeness of an object, person, or event to oneself in the present state (Trope & Liberman, 2010). The theory postulates a bidirectional relationship between construal level and psychological distance, meaning that if the mental representation of, for example, an object is concrete (vs. abstract), it is perceived as close (vs. distant). In turn, if an object is perceived as close (vs. distant), a concrete (vs. abstract) mental representation is triggered. With respect to task presentations, research on construal level theory found that words are associated with high level construals and abstract representation, whereas pictures are associated with low level construals and concrete representation (Amit, Algom, & Trope, 2009; Amit, Algom, Trope, & Liberman, 2008; Trope & Liberman, 2010). For example, presenting an event in words results in more general categorization than presenting it in pictures, and response times to pictures of objects are faster than response times to the corresponding words (Amit et al., 2009). Based on this reasoning, in the present research we assume that questionnaire scenarios, that is, written descriptions, are linked to a rather abstract mental representation on a high level construal, whereas behavioral tasks, analogous to pictures, trigger a rather concrete mental representation on a low level construal. Research shows that construal level and psychological distance affect probability estimations, risk seeking, and framing effects. Participants who were thinking concretely about events (low level construal) rated them as more likely than participants who thought about the same events on a high level construal (Lermer, Streicher, Sachs, Raue, & Frey, 2016; Raue, Streicher, Lermer, & Frey, 2015; Wakslak & Trope, 2009). When thinking on a low level construal, participants were more risk averse, and risks that are perceived as close are rated as more likely and trigger less risk seeking (Lermer et al., 2016). Furthermore, Raue et al. (2015) demonstrated that framing effects only occurred
on a low level construal, but not on a high level construal. This can explain the findings of Keysar, Hayakawa, and An (2012), who demonstrated that framing effects only occurred when material was presented in the mother tongue, but not in a (well-mastered) foreign language. A foreign language is perceived as distant and triggers a high level construal, whereas the mother tongue is (emotionally) close and represented on a low level construal. Wakslak and Trope (2008) demonstrated that spatially remote cats were thought to be more likely to have a rare blood type (unlikely events are perceived on a high level construal), whereas spatially near cats were thought to be more likely to have a common blood type (likely events are perceived on a low level construal). Based on these findings, it seems fruitful to build on construal level theory to compare the effects of task presentation formats on the zero-risk bias.

Domain-Specific Risk Taking

Besides scales/presentation styles and abstractness, tasks can vary in the risk domains they address; for example, they can touch health-related risks or financial risks. Research shows that risk attitude and risk taking behavior vary depending on the risk domain (e.g., Blais & Weber, 2006; MacCrimmon & Wehrung, 1990; Soane & Chmiel, 2005). Blais and Weber (2006) propose five different risk domains, namely health, financial, ethical, recreational, and social. For example, people might behave in a very risk-seeking manner regarding financial decisions but act risk averse regarding health decisions. Recent research (Schneider et al., 2015) confirmed these domain-specific effects and extended them to the zero-risk bias, showing that the zero-risk bias was most pronounced in the social domain and least pronounced in the health domain. By further considering different domains, we extend the existing research, which has mainly focused on health and financial risks, and consolidate recent findings.

The Present Research

In sum, research regarding the zero-risk bias is sparse, and the number of participants exhibiting the bias varied from 11% (Baron et al., 1993) up to 89% (Viscusi et al., 1987). All studies used different research methods (e.g., decision scenario, Baron et al., 1993; Schneider et al., 2015; spending money, Ritov et al., 1993; Viscusi et al., 1987) and touched different risk domains (e.g., deposit cleanup to reduce the emergence of cancer, Baron et al., 1993; Ritov et al., 1993; gambling, e.g., Gottlieb, Weiss, & Chapman, 2007; Hsu et al., 2009; Kahneman & Tversky, 1979; willingness to pay in order to reduce toxic side effects, Viscusi et al., 1987). There is no common research method to investigate the bias, and it is unclear whether the variance in the emergence of the bias is simply caused by the methods used or due to other factors. For example, the different presentations of frequencies in the tasks, the different response modes of the tasks (concrete behavior vs. abstract decision scenario), or the different domains they touch (health, gambling, financial) could affect the bias's emergence. The current research aims to explore the emergence of the bias in different settings and designs in order to shed further light on the question of whether the bias is a methodological artifact or a decision-making strategy. For this purpose, we deployed different methods to measure the zero-risk bias and examined the relationship between these methods. We conducted four studies, focusing on the following research questions:

Research Question 1 (RQ1): Does the emergence of the zero-risk bias vary depending on the method used?

Research Question 2 (RQ2): If so, are these variations in its emergence systematic; for example, can they be explained by task characteristics or the decision domain?

On these grounds, we compared forced choice answer formats (Study 1 and 2) with free allocation formats (Study 3 and 4) and abstract questionnaire scenarios (assumed to be associated with a high level construal) with concrete behavioral tasks (assumed to be associated with a low level construal; Study 1 and 2), and we examined the bias's emergence within different risk domains (gambling, Study 1 and 2; financial, Study 3; social, Study 1, 2, and 3; health, Study 3 and 4). By doing so, we try to integrate the rather isolated previous findings and to identify conditions promoting the occurrence of the zero-risk bias.

Study 1

Study 1 was conducted to examine the relation of the emergence of the bias between two different tasks. In short: do people who exhibit the bias in one task also exhibit the bias in another task? For that purpose, we compared participants' choice in a behavioral gambling task (low level construal) with their choice in a social decision scenario (high level construal).

Method

Participants and Design
Data was collected from students on the campus at two Bavarian universities. The sample consisted of 121 participants (female: n = 90, 74.4%; male: n = 31, 25.6%), their age ranged from 18 to 34 years (M = 22.25, SD = 3.13), and they came from different fields of study (social or
educational sciences: n = 77, 63.6%; engineering and natural sciences [including mathematics]: n = 23, 19.0%; economics or law: n = 11, 9.1%; human or cultural sciences: n = 9, 7.5%; no indication: n = 1, 0.8%). Note that this sample size provides a power of .97 to detect an effect of ω = .30.

The study included two structurally similar tasks, a lottery and a questionnaire scenario. The tasks touched two different risk domains and different construal levels: The lottery task was a behavioral task, which we assume to evoke concrete, low construal level (CL) thinking, and touched the gambling domain. The questionnaire scenario was set in a social context (domain) and required hypothetical thinking, which should evoke a rather high CL. Participants completed both tasks sequentially. In both tasks, participants could reduce the overall risk, but they had to decide whether to reduce a risk from 5% to 0% (zero-risk option) or to reduce a risk from 20% (or 30%) to 5% (rest-risk option). This resulted in a 2 (gambling and low CL vs. social and high CL) × 2 (amount of reduction in the rest-risk option: low [from 20% to 5%] vs. high [from 30% to 5%]) design with repeated measures on the first factor. Participation was voluntary and anonymous, and participants received a small gift.

Materials and Procedure
Participants were recruited on the university campus and the study was conducted directly there. First, participants received a small gift (one piece of candy). After that, they completed the first task (low CL, gambling domain): Participants were told by the examiner that they could now double their stakes in a lottery task, that is, get a second gift, or lose everything, that is, have to give the gift back to the examiner. In the lottery, they had two pots in front of them, each containing several gains and losses (red and blue balls). They had to blindly draw one ball from each pot, meaning that they had to draw one ball from pot 1 and one ball from pot 2. In order to control for effects of placement, the pots were placed randomly and participants were asked about their handedness. If they drew two gains, that is, one gain from pot 1 and one gain from pot 2, they won. They lost everything if they drew at least one loss from one of the pots (e.g., if one drew a gain from pot 1 and a loss from pot 2, one would lose everything). Participants were informed that both pots contained 40 balls. In the high (low) overall risk reducing range condition, 12 (8) of the 40 balls in pot A were losses. In pot B, two of the 40 balls were losses. Participants were then told that they could choose one out of two interventions in order to reduce their risk of losing. They could either choose to remove the two losses in pot B (zero-risk option) or choose to remove 10 (6) losses from pot A, so that two losses were still left in this pot (rest-risk option). After this information, participants had to indicate which intervention they would choose, removed the losses according to the chosen intervention,
and then drew once from each pot. Since participants had to draw from each pot, in both conditions choosing the rest-risk option yielded a lower overall risk of losing (zero-risk option: 30%/20% chance of losing; rest-risk option: 10.25% chance of losing). After the lottery task, they received a paper describing a decision situation that was similar to the lottery but located in the social risk domain (social scenario: high CL, social domain). In the high (low) overall risk reducing range condition, they read the following: "Assume that the company you are working in is restructuring and several employees have to relocate to either site Alpha or site Beta. You don't want to move to another site since your family and your friends are living at your current location. You get the chance to reduce the risk of relocation by choosing one out of two interventions. Which intervention would you choose? (A) With a probability of 30% (20%), you have to move to site Alpha. Intervention A would reduce this risk to 5%. (B) With a probability of 5%, you have to move to site Beta. Intervention B would reduce this risk to 0%." The structure of the lottery and the scenario is similar, both offering a rest-risk (pot/intervention A) and a zero-risk option (pot/intervention B) with the same chances to reduce the overall risk. Note that in the social scenario the events are disjoint, leading to a slightly lower overall risk if the rest-risk option is taken (social scenario: 10%; lottery: 10.25%). However, in both tasks the rest-risk option is always superior to the zero-risk option. At the end, sociodemographic data and field of study were requested.

Results and Discussion

Lottery
There were no significant differences in choice behavior with respect to condition (30% losses vs. 20% losses), χ²(1) = 0.05, p = .822. Consequently, the two condition groups were combined for further analysis. In sum, 17 participants showed the zero-risk bias (14.0%). As reported above, there was no significant influence of condition: 9 of 61 participants (14.8%) showed it in the 20%-losses condition, compared to 8 of 60 participants (13.3%) in the 30%-losses condition.
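As an illustration of this comparison (a sketch added here, not code from the original article), the condition test can be reproduced from the published counts of 9 out of 61 and 8 out of 60 zero-risk choices, assuming a Pearson χ²-test without continuity correction:

from scipy.stats import chi2_contingency

# Zero-risk vs. rest-risk choices in the lottery task, by range condition
# (cell counts taken from the reported figures: 9/61 and 8/60).
table = [[9, 61 - 9],   # 20%-losses condition
         [8, 60 - 8]]   # 30%-losses condition

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(round(chi2, 2), round(p, 3))  # about 0.05 and .822, matching the reported values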


Scenario
As in the lottery, there were no significant differences in choice behavior with respect to condition (30% losses vs. 20% losses), χ²(1) = 0.01, p = .923. Consequently, the two condition groups were combined for further analysis. In sum, 57 participants showed the zero-risk bias (47.1%) in the decision scenario. This was significantly more than the percentage in the behavioral task, χ²(1) = 109.50, p < .001. A χ²-test revealed that there was no significant relation between the decision in the lottery and the decision in the scenario, χ²(1) = 2.46, p = .117. Regarding decisional consistency, 69 participants (57.0%) decided congruently, meaning that they either chose zero-risk in both tasks or rest-risk in both tasks. There was no significant difference in congruency with respect to choice in the lottery, χ²(1) = 0.48, p = .490, but there was a significant relation between congruency and choice in the decision scenario, χ²(1) = 62.59, p < .001. Forty-six people (38.0%) who did not show the bias in the lottery task showed it later in the scenario task. In turn, only six persons (5.0%) took the zero-risk option in the lottery and the rest-risk option in the scenario.

Summary
We found intrapersonal differences in choice between the two operationalizations of the zero-risk bias. The bias was less pronounced in the behavioral gambling task (only 14.0%), whereas nearly one half of participants showed it in the social decision scenario. In both tasks, the overall risk reduction range, that is, the favorability of the rest-risk option, had no effect on participants' choice. If these differences in choice, that is, the zero-risk bias itself, were not just artificially caused by methodology or procedure, differences in construal level and decision domain could be possible explanations for the variation in the emergence of the bias. In order to further investigate whether and how decision domain and construal level influence the emergence of the bias, Study 2 was conducted.

Study 2

The purpose of Study 2 was (a) to test our assumption that construal level and/or domain affect the zero-risk bias and (b) to separate effects of construal level and domain in order to find out which has more influence on the bias.

Method

Participants and Design
In this study, 40 students (female: n = 19, 47.5%; male: n = 20, 50.0%; no indication: n = 1, 2.5%), recruited on the campus and in lectures at a Bavarian university, participated. Their age ranged from 18 to 30 years (M = 22.08, SD = 3.17), with various fields of study (mathematics, engineering, and natural sciences: n = 24, 60.0%; human or cultural sciences: n = 7, 17.5%; economics or law: n = 7, 17.5%; social or educational sciences: n = 1, 2.5%; no indication: n = 1, 2.5%). Due to the smaller sample size, this study provides a power of .48 to detect an effect of ω = .30. Again, the study followed a 2 (gambling and high CL vs. social and high CL) × 2 (amount of reduction in the rest-risk option: low [from 20% to 5%] vs. high [from 30% to 5%]) design with repeated measures on the first factor.

Materials and Procedure
Study 2 was similar to Study 1, but the lottery task was described as a questionnaire scenario. Participation was voluntary and uncompensated. As in Study 1, participants were randomly assigned to either the high range condition (reduce the rest-risk from 30% to 5%; n = 20) or the low range condition (reduce the rest-risk from 20% to 5%; n = 20). After reading the lottery description (high CL, gambling domain), participants had to indicate which intervention they would choose (depending on the condition, the rest-risk reduction could be either from 12 losses [30%, high reduction] to 2 losses [5%] or from 8 losses [20%, low reduction] to 2 losses [5%]). After that, they received the social scenario from Study 1 (high CL, social domain). At the end, they had to write down the reasons for their decisions in both tasks, and sociodemographic data was collected.

Results and Discussion

Lottery
A χ²-test revealed that there were no significant differences in choice behavior with respect to condition (30% losses vs. 20% losses), χ²(1) = 0.96, p = .327. Consequently, the two condition groups were combined for further analysis. In sum, 15 participants showed the zero-risk bias (37.5%). This was significantly more than in Study 1, where only 14.0% exhibited the bias in the lottery task, χ²(1) = 15.88, p < .001. Since the two samples were similar regarding sociodemographic variables, the difference could indicate that choice varied depending on the response mode (i.e., construal level), in that the bias was less pronounced in the low level construal behavioral task than in the high level construal decision task. However, other factors causing the results cannot be ruled out.

Scenario
There were no significant differences in choice with respect to range condition (30% losses vs. 20% losses), χ²(1) = 0.92, p = .337, so the two condition groups were combined for further analysis. In sum, 17 participants showed the zero-risk bias (42.5%). This distribution did not significantly differ from that found in Study 1, where 47.1% showed the zero-risk bias, χ²(1) = 0.40, p = .527. This could indicate that the foregoing gambling task did not affect participants' decisions in the social scenario; however, we cannot rule out other influencing factors which might have varied between the studies.


Furthermore, the number of participants showing the zero-risk bias in the social scenario did not significantly differ from that in the lottery task, χ²(1) = 0.43, p = .514. A χ²-test revealed that there was no significant relation between the decision in the lottery and the decision in the scenario, χ²(1) = 1.15, p = .283. Even though nearly the same number of participants showed the zero-risk bias in the two tasks, there was no relation between the decisions in the tasks. Twenty-four participants (60.0%) decided congruently, meaning that they either chose zero-risk in both tasks or rest-risk in both tasks. The other 40% changed their choice, which might be due to domain-specific risk taking.

Summary
Compared to Study 1, more people exhibited a zero-risk bias when the lottery task was presented as a scenario. This finding might be caused by different construal levels, meaning that on a high level construal (abstract scenario) the bias could be more pronounced than on a low level construal (lottery task in Study 1). Taking a zero-risk option might have served as a strategy to reduce complexity: the behavioral lottery task in Study 1 might have been easier to understand than its rather abstract description in Study 2. Due to the higher cognitive effort in the abstract lottery in Study 2, people might have chosen a less cognitively effortful decision strategy, that is, taking the zero-risk option. This is in line with research demonstrating that higher cognitive effort promotes heuristic information processing (e.g., Betsch & Glöckner, 2010; Hilbig, Michalkiewicz, Castela, Pohl, & Erdfelder, 2015). Furthermore, there was no relation between participants' choices in the two scenarios (lottery and social). Participants opting for zero-risk in the lottery scenario did not necessarily opt for zero-risk in the social scenario. So far, we have measured the zero-risk bias with three forced choice decision tasks in which people had to choose between a rest-risk and a zero-risk option. In each task, some people exhibited a zero-risk bias; however, the proportion varied between 14.0% and 47.1%. We found differences in choice within and between subjects. The former could be explained by domain-specific risk taking (e.g., Blais & Weber, 2006; Schneider et al., 2015). The latter seems to be caused by task characteristics (behavioral vs. questionnaire). As noted above, the structure of the three tasks was similar and people were forced to choose between two options. The results indicate that people did not choose one option at random; however, their choice might have been driven by other factors, for example, cognitive effort or misconception of the task itself. In order to control for this explanation, we introduced another way to operationalize the zero-risk bias: allocating resources to different risks.

Study 3

The purpose of introducing a resource allocation task was twofold: First, if the zero-risk bias is also shown in resource allocation, a random selection of one of the two alternatives as an explanation for the zero-risk bias can be excluded, and further support for the existence of a zero-risk bias is given. Second, resource allocation might be of higher ecological validity, since in real life dichotomous all-or-nothing alternatives are rarely given. In order to analyze whether the zero-risk bias is also pronounced in resource allocation and whether this corresponds to a decision between two options, Studies 3 and 4 were conducted.

Method

Participants and Design
Overall, 115 participants filled in the questionnaire. There were two subsamples: The first one consisted of 89 (77.4%) students (female: n = 41, 46.1%; male: n = 47, 52.8%; no indication: n = 1, 1.1%), recruited in lectures at two Bavarian universities. Their age ranged from 18 to 31 years (M = 22.09, SD = 2.62), with various fields of study (mathematics, engineering, and natural sciences: n = 47, 52.8%; social or educational sciences: n = 29, 32.6%; economics or law: n = 4, 4.5%; human or cultural sciences: n = 4, 4.5%; medicine and health sciences: n = 2, 2.2%; no indication: n = 3, 3.4%). The second subsample consisted of 26 businesspeople (female: n = 9, 34.6%; male: n = 17, 65.4%), recruited in leadership seminars. Their age ranged from 33 to 62 years (M = 45.92, SD = 8.55). Eighteen of them (69.2%) provided information regarding their educational background, all of them having a university degree (mathematics, engineering, or natural sciences: n = 11, 61.1%; economics or law: n = 5, 27.8%; human or cultural sciences: n = 2, 11.1%). The sample size provides a power of .83 to detect an effect of ω = .30. Participants were randomly allocated to one of three conditions for the scenario task (domain: health/financial/social), which was followed by a lottery scenario that all participants completed. This results in a 3 (domain) × 2 (task) design with repeated measures on the last factor.

Materials and Procedure
The questionnaire was similar to that used in Study 2, with slight differences regarding question order and the lottery decision task. First, participants received either a social, financial, or health decision scenario (factor risk domain). In these scenarios, participants again had to decide between a zero-risk and a rest-risk option (with the zero-risk option always being inferior to the rest-risk option regarding overall risk reduction). After that, the lottery scenario was presented. However, contrary to the foregoing studies, participants were now free to allocate the losses between
the two pots. They were told that they were in the final round of a lottery and could now double their win or lose everything. Again, they had to draw one ball from each of the two pots. Each pot contained 50 gains; however, there were also 10 losses, which participants could now divide freely between the two pots. For example, they could put all losses in one pot (zero-risk), nine losses in pot 1 and one loss in pot 2, and so on. They then indicated how they would divide the losses. At the end, sociodemographic data was requested and participants were asked to write down their reasons for their decisions in the two tasks.

Results and Discussion
No significant effects of the control variables sample (students vs. businesspeople), age, and sex on the dependent variables were found, all ps > .124. Hence, the two subsamples were combined for further analysis.

Scenario
In sum, 30 participants took the zero-risk option (26.1%). There were domain-specific effects on the zero-risk bias, χ²(2) = 10.23, p = .006. The zero-risk bias was shown most often in the social scenario (n = 14; 40.0%), compared to the financial (n = 11; 33.3%) and the health scenario (n = 5; 10.7%). The proportional distribution of participants showing the zero-risk bias in the health scenario significantly differed from that in the financial and in the social scenario, all χ²(1) > 10.89, ps < .001. The proportional emergence of the zero-risk bias did not significantly differ between the social and the financial scenario, χ²(1) = 0.70, p = .403. This replicates previous findings (Schneider et al., 2015). There was no significant relation between the decision in the social/financial/health scenario and the lottery task, χ²(1) = 1.43, p = .232, and domain had no significant effect on the lottery task, χ²(2) = 2.05, p = .359. Similar to Study 2 (60.0%), 63.5% of participants (n = 73) decided congruently, meaning that they either chose zero-risk in both tasks or rest-risk in both tasks.
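The reported domain effect can be sketched in the same way as above (an added illustration, not code from the original article). The per-domain group sizes are not stated explicitly; in the snippet below they are inferred from the reported percentages (14/40.0% ≈ 35, 11/33.3% ≈ 33, 5/10.7% ≈ 47, summing to the 115 participants), so treat them as an assumption of this illustration rather than as reported data.

from scipy.stats import chi2_contingency

# Zero-risk vs. rest-risk choices per scenario domain
# (group sizes inferred from the reported percentages).
counts = {"social": (14, 35), "financial": (11, 33), "health": (5, 47)}
table = [[zero, n - zero] for zero, n in counts.values()]

chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 2), round(p, 3))  # about 10.23 and .006, consistent with the reported effect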

Lottery
In the lottery, 50.4% of the participants (n = 58) preferred a 50/50 split, meaning that they put five losses in each pot; 31.3% (n = 36) preferred zero-risk in one pot, that is, they put all losses either in the first or in the second pot. The majority of them (n = 26; 72.2%) put all losses in the first pot, so that they could "win in the second round for sure" (anonymous participant; verbal communication). Twenty-one of the participants (18.3%) divided the losses differently (e.g., 1/9, 2/8, 3/7, 4/6), and there was no preference for one type of division. When asked for the reasons for their division, participants preferring zero-risk in one pot mostly indicated, among other reasons, having certainty in at least one pot, reducing excitement when they had the risk of losing in only one pot, or having all the tension in advance when they put all losses in the first pot. In Study 2, 37.5% showed the zero-risk bias in the lottery task, compared to 31.3% in Study 3. These distributions did not significantly differ from each other, χ²(1) = 1.82, p = .177, but differed from that found in Study 1 in the behavioral task, χ²(1) = 29.04, p < .001. Again, note that other factors varying between the studies could have caused this pattern.

Summary
The results indicate that the zero-risk bias varies with the response mode of the task (behavioral vs. scenario), which could be explained by different construal levels, but not with the formulation of the task (lottery in Study 2 vs. lottery in Study 3). This supports the proposition that the zero-risk bias is not a methodological artifact, but rather a strategy to reduce unpleasant states like high cognitive load or tension.

Study 4

So far, Study 3 extended the findings by using another, more indirect way to measure the zero-risk bias, namely letting participants freely allocate losses in an abstract gambling task (high level construal). As domain-specific effects were found for the decision scenario, where participants had to decide between two options, there might also be domain-specific effects for decision tasks in which participants have to allocate resources/losses. Hence, Study 4 used a design similar to the gambling task in Study 3, but within the health domain.

Method

Participants and Design
The sample consisted of 50 students (female: n = 39, 78.0%; male: n = 11, 22.0%), recruited on the campus and in lectures at a Bavarian university. Their age ranged from 19 to 37 years (M = 23.20, SD = 3.01), and they came from various fields of study (social or educational sciences: n = 37, 74.0%; human or cultural sciences: n = 6, 12.0%; economics or law: n = 6, 12.0%; mathematics, engineering, and natural sciences: n = 1, 2.0%). Since we explored the distribution in a descriptive way, no effects were calculated (this sample size would have provided a power of .56 to detect an effect of ω = .30). The questionnaire was identical for all participants, so that the study followed a one-factorial design. Participation was voluntary and uncompensated.

Materials and Procedure
The study was performed in a paper-and-pencil format and participants received a short questionnaire. They read a description of a health scenario in which they had to assume that they were first aiders at a car accident. They could allocate the resources "time" and "dressing" to two injured persons. One person had a 30% risk of dying, and if the first aider spent all of their time/dressing on that person, this risk could be reduced to 5% (rest-risk option). The other person had a 5% risk of dying, and if the first aider spent all of their time/dressing on that person, this person's risk of dying could be reduced to zero (zero-risk option). Participants then had to indicate on a visual analog scale, with persons 1 and 2 as its poles, how they would allocate their resources. The distance between pole 1 and pole 2 was 10 cm, so that 1 cm corresponded to 10%. Individual scores were traced with a ruler in full centimeters. In addition, participants had to write down, in percentages, the share of resources they would give to each person. At the end, sociodemographic data was requested and participants were asked to write down the reasons for their choice.

Results and Discussion
In sum, participants allocated more of their resources to the severely injured person (30% risk of dying, rest-risk option). They gave around 70% of their resources to the severely injured person and around 30% to the slightly injured person (analog scale: M = 29.92, SD = 27.11; percentages: M = 28.86, SD = 23.46). Only two participants (4.0%) showed the zero-risk bias and gave all their resources to the slightly injured person. In contrast, six participants (12.0%) gave all their resources to the severely injured person. Eighty percent of participants (n = 40) gave at least 10% of their resources to the slightly injured person. However, only 16% of participants (n = 8) spent more than half of their resources on the slightly injured person, meaning that the majority concentrated more on the severely injured person. Only a small number of participants (n = 8; 16.0%) gave all resources to either one person, and only five participants (10.0%) split their resources equally. When asked for the reasons for their decision, most participants listed aspects of fairness and stated that they aimed to reduce the risk of dying for both persons to a nearly equal or minimal level.

Summary
In general, resource or risk allocation differed between Study 3 and Study 4. Whereas over 30% of participants put all losses in either one pot (Study 3), only 16% of participants gave all resources to either one person (and only 4.0% of participants gave all resources to save one person for sure; Study 4). Furthermore, around 50% of participants preferred a 50/50 split in the lottery, compared to only 10% in the first aider scenario. Most of them mentioned aspects of fairness as a reason for the resource allocation in the first aider task. However, note that one has to give all the resources to the severely injured person in order to obtain an equal rest-risk of dying for both persons, that is, 5%. Taking the zero-risk option or giving all resources to only one person (i.e., giving the other one nothing) might have been considered an inappropriate strategy in this context and might have been replaced by a strategy of fairness. However, note that the differences between the studies were not based on an experimental manipulation and could be driven by factors other than those suggested. Table 1 presents an overview of the tasks and the emergence of the zero-risk bias.

Table 1. Overview of the percentages of participants choosing zero-risk in each task

Study | Task        | CL   | Domain    | Decision type   | Zero-risk option (%)
1     | Lottery     | Low  | Gambling  | Forced choice   | 14.0
1     | Scenario    | High | Social    | Forced choice   | 47.1
2     | Lottery     | High | Gambling  | Forced choice   | 37.5
2     | Scenario    | High | Social    | Forced choice   | 42.5
3     | Lottery     | High | Gambling  | Free allocation | 31.3
3     | Scenario    | High | Social    | Forced choice   | 40.0
3     | Scenario    | High | Financial | Forced choice   | 33.3
3     | Scenario    | High | Health    | Forced choice   | 10.7
4     | First aider | High | Health    | Free allocation | 4.0 (all to slightly injured); 16.0 (all to either one person)

Note. CL = construal level.
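To make the allocation trade-off in the first aider scenario concrete, the following sketch assumes, purely for illustration, that a person's risk falls linearly with the share of resources they receive (the article does not specify the functional form). Under that assumption, the typical split of roughly 70/30 in favor of the severely injured person still leaves unequal residual risks, and only the full allocation to the severely injured person equalizes both risks at 5%.

def residual_risks(share_to_severe):
    """Residual risks of dying, assuming risk falls linearly with the resource share."""
    severe = 30 - 25 * share_to_severe        # 30% baseline, 5% with full resources
    slight = 5 - 5 * (1 - share_to_severe)    # 5% baseline, 0% with full resources
    return severe, slight

print(residual_risks(0.7))  # (12.5, 3.5): a roughly 70/30 split leaves unequal risks
print(residual_risks(1.0))  # (5.0, 5.0): only the full allocation equalizes both risks
print(residual_risks(0.0))  # (30.0, 0.0): the zero-risk allocation maximizes the overall risk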

General Discussion

The purpose of the present research was to explore the effects of measurement on the emergence of the zero-risk bias. In particular, we tried to find out whether there are variations in its occurrence depending on the methods used, that is, the tasks. With construal level theory and the domain specificity of risk-taking behavior, we offer potential explanatory approaches for the results. Four major findings emerge from our studies: First, the zero-risk bias varied depending on the task in general, ranging from 4.0% (resource allocation in a first aider questionnaire scenario) up to 47.1% (social questionnaire scenario) of participants showing the bias. Second, the bias varied depending on the abstractness of the task. The bias
occurred less often in the behavioral task (14.0% of participants showed the bias), which we assumed to evoke a concrete, low level construal, and it was more pronounced in abstract questionnaires (up to 47.1% of participants), which might evoke a rather high level construal. Third, the zero-risk bias varied with the risk domain the method addressed. It was most pronounced in the social domain and least pronounced in the health domain. Fourth, we found intrapersonal differences in choice between a (social) decision scenario and gambling tasks, in that a participant opting for zero-risk in a social scenario did not necessarily opt for zero-risk in a gambling scenario.

Emergence of the Zero-Risk Bias

There was high variance in the emergence of the zero-risk bias between the tasks. The bias was least pronounced in the first aider scenario (4.0%, or 16.0% when including an allocation of all resources to only one person), where participants had to allocate resources to injured persons, as well as in the behavioral gambling task, where only 14.0% of participants exhibited the bias. Note that, when the same gambling task was on a high level construal, that is, formulated as a gambling scenario in a questionnaire, 37.5% took the zero-risk option. When asked how to divide losses between two pots, 31.3% preferred a zero-risk split (i.e., all losses in one pot and one pot without any losses). In addition, around 30% of participants showed the bias in a questionnaire decision scenario, but its emergence varied significantly depending on the domain the scenario addressed. Only 10.7% took a zero-risk option in a health scenario, compared to 33.3% in a financial scenario and 40.0% (even 47.1% in Study 1) in a social scenario. In sum, the emergence of the bias clearly varied depending on the design, but it never vanished completely, nor was it shown by more than one half of the participants. This is in line with previous findings (Baron et al., 1993; Schneider et al., 2015; Viscusi et al., 1987) and supports the assumption that the bias is persistent over different methods, indicating that it might be a context-sensitive decision strategy rather than a methodological artifact. In the current research, we propose two factors that influence the bias's emergence: construal level and domain.

Zero-Risk Bias and Construal Level Theory

The first context factor we manipulated was the abstractness of the task: We posed exactly the same task on two different abstraction levels, in that participants had to complete a gambling task either in a concrete, behavioral setting, with two pots in front of them from which they could draw, or in an abstract questionnaire with a precise description of the scenario, in which they had to mark the option they would take. We assumed that the two tasks should trigger different construal levels, in that the behavioral task evoked concrete
thinking about behavioral responses (low level construal), compared to the questionnaire scenario, in which one had to imagine the task and would think rather abstractly about the different options (high level construal). In line with research on construal level theory indicating that low level construals are linked to more risk aversion and less risk seeking (e.g., Lermer, Streicher, Sachs, Raue, & Frey, 2015; Lermer et al., 2016; Wakslak & Trope, 2009), participants indeed chose the option which resulted in a lower overall risk, that is, the rest-risk option. In the low level construal condition, only 14.0% took the zero-risk option, compared to 37.5% in the high level construal condition.

Another important factor that affects risk perception biases is the presentation mode of the numbers or the measuring scale, respectively (e.g., "very likely" vs. "one out of ten" vs. 10%). For example, research on the conjunction fallacy (rating two conjunct events as likelier than just one of them) demonstrated that the bias decreases or is completely eliminated when facts are presented in verbal frequencies instead of numerical probabilities (Fiedler, 1988; Tversky & Kahneman, 1983). This can be transferred to the present research, in that the colored balls in both lottery tasks were presented in frequencies (e.g., two blue losses), whereas in the social scenario risks were presented as probabilities (e.g., 5%). Regarding the differences in the exhibition of the zero-risk bias between the social scenario and the behavioral lottery, this is similar to research on the conjunction fallacy: The zero-risk bias was less pronounced in the lottery tasks than in the social scenario. However, this cannot explain the differences between the two lottery presentations (behavioral vs. questionnaire), since the frequencies in the high level construal lottery scenario were also presented verbally and illustrated by a small picture. Hence, it seems plausible that the differences between the lottery and the social scenario are driven by domain-specific risk taking rather than by the presentation of the probabilities.

Regarding the differences between the two lottery tasks, thinking about concrete actions in the behavioral lottery could have promoted a concrete style of thinking about the associated risks, that is, deeper information processing. On the other hand, thinking abstractly about risks in the questionnaire scenario might have triggered rather superficial information processing, since one might have been less involved. Based on the assumption that human decision making is guided by two processing systems – a fast, intuitive system 1 and a slow, rule-based, rational system 2 (dual-system approach, see Kahneman & Frederick, 2009; Lermer et al., 2013; Loewenstein, Weber, Hsee, & Welch, 2001; Schwarz, 2009; Slovic, Finucane, Peters, & MacGregor, 2004; Slovic & Peters, 2006), superficial information processing in the questionnaire scenario might have triggered system 1. This intuitive system is linked to heuristic decision making, and taking the zero-risk
might have served as a simple rule of thumb. On the other hand, a concrete thinking style might have triggered system 2, which is linked to rather rational, utility-maximizing decision-making strategies. However, note that we did not measure participants' construal level, so we cannot conclude that the two presentation modes indeed evoked different construal levels. Moreover, the outlined dual-system explanation is a mere post hoc explanation (Gigerenzer, 1998; Keren & Schul, 2009) and, therefore, future research should test the corresponding predictions. For example, response times should be shorter for information processing in system 1, and differences between a system 1 associated presentation style and a system 2 associated presentation style should disappear when participants are under cognitive load in the system 1 associated condition.

An alternative explanation might lie in the cognitive effort the two presentation modes required. It might have been cognitively more demanding to imagine the lottery task and then draw conclusions than to see the two pots in front of oneself in a real-world situation. Cognitive capacity is often considered a limiting factor in judgment and decision making (for limitations of that assumption, see Betsch & Glöckner, 2010; Glöckner & Witteman, 2010). The higher the cognitive load, the likelier the application of simple decision rules, which are often linked to intuitive, heuristic judgment (e.g., Gilovich, Griffin, & Kahneman, 2009; Kahneman, 2012; Kruglanski & Gigerenzer, 2011). Regarding the present findings, people might have applied the simple zero-risk rule in the abstract lottery, since it already demanded a large amount of cognitive capacity (note, however, that we did not measure cognitive load). Research showed that when cognitive load was heightened, for example, by contradicting information (Betsch & Glöckner, 2010) or distracting cues (Hilbig et al., 2015), people were more prone to use heuristic processing.

Domain Specificity

The second context factor we investigated was the decision domain. Again, the zero-risk bias varied with the risk domain the method addressed. In line with previous findings (Schneider et al., 2015), it was most pronounced in the social domain and least pronounced in the health domain (when using a decision scenario with a zero-risk and a rest-risk option). The current research extended previous findings by adding another design, namely a free allocation of resources between two alternatives in order to reduce an overall risk. We used this design within a gambling context (Study 3) and a health context (Study 4) and found considerable differences. Whereas only 16% of participants allocated all resources to one alternative (injured person) – and only 4.0% took the clear zero-risk option (i.e., giving all resources to the slightly injured person who can be saved
for sure) – 31.3% of participants preferred a zero-risk option in the gambling task. This distribution is very similar to that found between the health and the financial decision scenario: In the health scenario, 10.7% took the zero-risk option, compared to 33.3% in the financial scenario. This gives further support for the consistency of domain-specific effects on the zero-risk bias, independent of the method used.

With regard to the underlying mechanisms causing these effects, one can only speculate, but our qualitative analyses give indications of relevant factors that might influence domain-specific risk taking. In the health domain, participants focused more on the overall risk. Furthermore, aspects of fairness were important for the allocation of resources, leading to a 70/30 split in favor of the severely injured person compared to the less injured person. However, with such a split they did not reach an equal rest-risk for both persons, indicating that the decision was driven not by outcome fairness but rather by procedural fairness. This is in line with research demonstrating that procedural justice is perceived as more relevant than outcome fairness, even if this results in an unequal outcome distribution (e.g., Greenberg & Colquitt, 2005). Note that, even though both tasks were on a high level construal, it is interesting that participants did not solely focus on the task (overall risk reduction), but extended the scenario with context factors such as the anticipated consequences or emotions, making fairness salient. Indeed, people try to embed risks in a broader context in order to give them meaning and make them more assessable (e.g., Petts et al., 2000). By extending the scenarios with such features, participants could have given meaning to the scenario. With regard to the gambling scenario, most participants taking a zero-risk split wanted all losses in the first pot, so that they had the unpleasant state first and could then relax, that is, they did not have to split their attention. Similar considerations were found in the comments regarding the financial scenario. Here, participants tried to eliminate one risk completely, so that they could focus all their energy on the (higher) rest-risk. Taking these statements together, cognitive effort could be one mechanism causing the bias, in that eliminating one risk completely could release cognitive capacity. Since one major purpose of heuristics is to reduce cognitive effort (e.g., Gigerenzer & Gaissmaier, 2011; Shah & Oppenheimer, 2008; Simon, 1959), taking a zero-risk option could perform better in effort reduction than a rest-risk option. Future research is needed to detect the underlying mechanisms causing differences in risk taking behavior between domains.

Intrapersonal Variability

Methods did not correlate, meaning that we found no relation between the emergence of the zero-risk bias in the (social) decision scenario and in the other gambling
tasks. Participants choosing the zero-risk option in one task did not necessarily choose it in the other task as well (the same holds for participants choosing the rest-risk option). On the one hand, this supports the assumption of domain-specific effects on risk taking (since gambling was always compared with a decision scenario). On the other hand, it also provides evidence that the type of task (free resource allocation vs. decision between two options) influences the emergence of the bias.

Limitations and Future Research

As mentioned above, one major limitation of the study is that construal level was not measured. Building on construal level theory and on empirical findings, we assumed that the gambling tasks triggered different construal levels. However, the two presentation modes could also have imposed different levels of cognitive load. Future research should test this assumption by explicitly measuring and manipulating construal level and cognitive effort. Another limitation is that most tasks varied between but not within studies. Attributions of task-specific differences to the proposed concepts should therefore be made with care. However, the present research provides several new research designs for the zero-risk bias, and it could be demonstrated that the occurrence of the bias is task specific. Future research should concentrate on individual differences between tasks. In addition, we did not vary the order of the tasks in Studies 1, 2, and 3. Hence, the preceding task could have affected responses in the subsequent one. Additional studies should manipulate task order. Note that in Studies 2 and 4, power was not high due to the smaller sample sizes. Since a zero-risk effect was always present and our approach was rather exploratory, we do not consider this a severe limitation. Furthermore, since there are effects of domain on the emergence of the bias, we cannot transfer the findings from the gambling tasks to other domains, and we cannot predict the relation between presentation mode and domain. Future research is needed in order to further explore the effects of cognitive load and construal level on risk taking behavior and risky decision making across different domains.

In addition, one might argue that the decision scenarios seem rather unrealistic or artificial, since in real life, for example, the probabilities of dying or of being relocated are usually unknown. However, the purpose of the study was to create situations in which the decision in favor of a certain option leads to a more unfavorable outcome (i.e., more overall risk). For this reason, we chose to present probabilities in order to better manipulate and compare the amount of overall risk reduction. When comparing effects of framing between verbal and numerical probability presentations, Reyna et al. (2014) could show that framing effects were strongest for verbal probability presentations. This might also be the case
for the zero-risk bias, leading to an even higher number of participants showing the bias than found in this study. Future research is needed to test the bias (a) with different probability presentations and (b) in real-life decisions touching domains other than gambling. However, considering, for example, the first aider scenario, this might be very difficult or morally questionable.

Conclusion

The aim of our studies was to shed further light on the findings regarding the prevalence of the zero-risk bias. By using different research designs, we found evidence that the bias is a decision-making strategy rather than a methodological artifact, and we identified three contextual factors that influence the bias: domain, presentation mode (frequencies and abstractness), and decision type (forced choice vs. free allocation). We extended the existing research by identifying these contextual factors, which offer possible explanations for the somewhat inconsistent findings regarding the bias's prevalence. Future research is needed to explore the relations between these factors and to identify their underlying mechanisms. The aim should be to derive strategies that reduce or eliminate the bias, not only to give us better chances of winning game shows, but to achieve better resource allocation, especially with regard to future generations.

References

Allais, M. (1953). Le comportement de l'homme rationnel devant le risque: Critique des postulats et axiomes de l'école américaine [The rational individual's behavior in the presence of risk: Critique of the postulates and axioms of the American school]. Econometrica, 21, 503–546. doi: 10.2307/1907921
Amit, E., Algom, D., & Trope, Y. (2009). Distance-dependent processing of pictures and words. Journal of Experimental Psychology: General, 138, 400–415. doi: 10.1037/a0015835
Amit, E., Algom, D., Trope, Y., & Liberman, N. (2008). "Thou shalt not make unto thee any graven image": The distance-dependence of representation. In K. D. Markman, W. M. P. Klein, & J. A. Suhr (Eds.), The handbook of imagination and mental simulation (pp. 53–68). New York, NY: Psychology Press.
Baron, J., Gowda, R., & Kunreuther, H. (1993). Attitudes toward managing hazardous waste: What should be cleaned up and who should pay for it? Risk Analysis, 13, 183–192. doi: 10.1111/j.1539-6924.1993.tb01068.x
Betsch, T., & Glöckner, A. (2010). Intuition in judgment and decision making: Extensive thinking without effort. Psychological Inquiry, 21, 279–294. doi: 10.1080/1047840X.2010.517737
Birnbaum, M. H. (2004). Causes of Allais common consequence paradoxes: An experimental dissection. Journal of Mathematical Psychology, 48, 87–106. doi: 10.1016/j.jmp.2004.01.001
Blais, A., & Weber, E. U. (2006). A Domain-Specific Risk-Taking (DOSPERT) scale for adult populations. Judgment and Decision Making, 1, 33–47.
Dobelli, R. (2011). Die Kunst des klaren Denkens [The art of thinking clearly]. Munich, Germany: Hanser.
Fiedler, K. (1988). The dependence of the conjunction fallacy on subtle linguistic factors. Psychological Research, 50, 123–129. doi: 10.1007/BF00309212
Gigerenzer, G. (1998). Surrogates for theories. Theory & Psychology, 8, 195–204. doi: 10.1177/0959354398082006
Gigerenzer, G. (2002). Calculated risks: How to know when numbers deceive you. New York, NY: Simon & Schuster.
Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of Psychology, 62, 451–482. doi: 10.1146/annurev-psych-120709-145346
Gilovich, T., Griffin, D., & Kahneman, D. (Eds.). (2009). Heuristics and biases (2nd ed.). New York, NY: Cambridge University Press.
Glöckner, A., & Witteman, C. (2010). Beyond dual-process models: A categorisation of processes underlying intuitive judgement and decision making. Thinking & Reasoning, 16, 1–25. doi: 10.1080/13546780903395748
Gottlieb, D. A., Weiss, T., & Chapman, G. B. (2007). The format in which uncertainty information is presented affects decision biases. Psychological Science, 18, 240–246. doi: 10.1111/j.1467-9280.2007.01883.x
Greenberg, J., & Colquitt, J. A. (Eds.). (2005). Handbook of organizational justice. Mahwah, NJ: Erlbaum.
Hastie, R., & Dawes, R. M. (2001). A normative, rational decision theory. In R. Hastie & R. Dawes (Eds.), Rational choice in an uncertain world: The psychology of judgement and decision making (pp. 249–287). Thousand Oaks, CA: Sage.
Hertwig, R., & Gigerenzer, G. (1999). The "conjunction fallacy" revisited: How intelligent inferences look like reasoning errors. Journal of Behavioral Decision Making, 12, 275–305. doi: 10.1002/(SICI)1099-0771(199912)12:4<275::AID-BDM323>3.0.CO;2-M
Hilbig, B. E., Michalkiewicz, M., Castela, M., Pohl, R. F., & Erdfelder, E. (2015). Whatever the cost? Information integration in memory-based inferences depends on cognitive effort. Memory & Cognition, 43, 659–671. doi: 10.3758/s13421-014-0493-z
Hsu, M., Krajbich, I., Zhao, C., & Camerer, C. F. (2009). Neural response to reward anticipation under risk is nonlinear in probabilities. The Journal of Neuroscience, 29, 2231–2237. doi: 10.1523/JNEUROSCI.5296-08.2009
Kahneman, D. (2012). Thinking, fast and slow. London, UK: Penguin.
Kahneman, D., & Frederick, S. (2009). Representativeness revisited: Attribute substitution in intuitive judgment. In T. Gilovich, D. W. Griffin, & D. Kahneman (Eds.), Heuristics and biases (pp. 49–81). New York, NY: Cambridge University Press.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–292. doi: 10.2307/1914185
Keren, G., & Schul, Y. (2009). Two is not always better than one: A critical evaluation of two-system theories. Perspectives on Psychological Science, 4, 533–550. doi: 10.1111/j.1745-6924.2009.01164.x
Keysar, B., Hayakawa, S. L., & An, S. G. (2012). The foreign-language effect: Thinking in a foreign tongue reduces decision biases. Psychological Science, 23, 661–668. doi: 10.1177/0956797611432178
Knight, F. H. (1921). Risk, uncertainty, and profit. In Hart, Schaffner & Marx Prize Essays, 31. Boston, MA/New York, NY: Houghton Mifflin.
Kruglanski, A. W., & Gigerenzer, G. (2011). Intuitive and deliberate judgments are based on common principles. Psychological Review, 118, 97–109. doi: 10.1037/a0020762 Lermer, E., Streicher, B., Sachs, R., & Frey, D. (2013). How risky? The impact of target person and answer format on risk assessment. Journal of Risk Research, 16, 903–919. doi: 10.1080/13669877.2012.761267 Lermer, E., Streicher, B., Sachs, R., Raue, M., & Frey, D. (2015). The effect of construal level on risk-taking. European Journal of Social Psychology, 45, 99–109. doi: 10.1002/ ejsp.2067 Lermer, E., Streicher, B., Sachs, R., Raue, M., & Frey, D. (2016). Thinking concretely increases the perceived likelihood of risks: The effect of construal level on risk estimation. Risk Analysis, 36, 623–637. doi: 10.1111/risa.12445 Leroy, S. F., & Singell, L. D. Jr. (1987). Knight on risk and uncertainty. Journal of Political Economy, 95, 394–406. doi: 10.1086/ 261461 Loewenstein, G. F., Weber, E. U., Hsee, C. K., & Welch, N. (2001). Risk as feelings. Psychological Bulletin, 127, 267–286. doi: 10.1037/0033-2909.127.2.267 MacCrimmon, K. R., & Wehrung, D. A. (1990). Characteristics of risk taking executives. Management Science, 36, 422–435. doi: 10.1287/mnsc.36.4.422 Petts, J., Horlick-Jones, T., Murdock, G., Hargreaves, D., McLachlan, S., & Löftstedt, R. (2000). Social amplification of risk: The media and the public. London, UK: Health and Safety Executive. Retrieved from: http://www.hse.gov.uk/research/ crr_pdf/2001/crr01329.pdf Raue, M., Streicher, B., Lermer, E., & Frey, D. (2015). How far does it feel? Construal level and decisions under risk. Journal of Applied Research in Memory and Cognition, 4, 256–264. doi: 10.1016/j.jarmac.2014.09.005 Reyna, V. F., Chick, C. F., Corbin, J. C., & Hsia, A. N. (2014). Developmental reversals in risky decision making: Intelligence agents show larger decision biases than college students. Psychological Science, 25, 76–84. doi: 10.1177/ 0956797613497022 Ritov, I., Baron, J., & Hershey, J. C. (1993). Framing effects in the evaluation of multiple risk reduction. Journal of Risk and Uncertainty, 6, 145–159. doi: 10.1016/0167-6687(93)90905-5 Rottenstreich, Y., & Hsee, C. K. (2001). Money, kisses, and electric shocks: On the affective psychology of risk. Psychological Science, 12, 185–190. doi: 10.1111/1467-9280. 00334 Schneider, E., Streicher, B., Lermer, E., Sachs, R., & Frey, D. (2015). The zero-risk bias: Limits of its occurrence and domain specific effects. Manuscript submitted for publication. Schwarz, N. (2009). Feelings as information: Moods influence judgments and processing strategies. In T. Gilovich, D. W. Griffin, & D. Kahneman (Eds.), Heuristics and biases (pp. 543–547). New York, NY: Cambridge University Press. Shah, A. K., & Oppenheimer, D. M. (2008). Heuristics made easy: An effort-reduction framework. Psychological Bulletin, 134, 207–222. doi: 10.1037/0033-2909.134.2.207 Simon, H. (1959). Theories of decision making in economics and behavioural science. The American Economic Review, 49, 253–283. Slovic, P., Finucane, M. L., Peters, E., & MacGregor, D. G. (2004). Risk as analysis and risk as feelings: Some thoughts about affect, reason, risk, and rationality. Risk Analysis, 24, 311–322. doi: 10.1111/j.0272-4332.2004.00433.x Slovic, P., & Peters, E. (2006). Risk perception and affect. Current Directions in Psychological Science, 15, 322–325. doi: 10.1111/ j.1467-8721.2006.00461.x

Zeitschrift für Psychologie (2017), 225(1), 31–44


44

Soane, E., & Chmiel, N. (2005). Are risk preferences consistent? The influence of decision domain and personality. Personality and Individual Differences, 38, 1781–1791. doi: 10.1016/ j.paid.2004.10.005 Tannert, C., Elvers, H.-D., & Jandrig, B. (2007). The ethics of uncertainty. In the light of possible dangers, research becomes a moral duty. EMBO Reports, 8, 892–896. doi: 10.1038/sj.embor.7401072 Trope, Y., & Liberman, N. (2010). Construal-level theory of psychological distance. Psychological Review, 117, 440–463. doi: 10.1037/a0018963 Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293–315. doi: 10.1037/0033-295X. 90.4.293 Viscusi, W. K., Magat, W. A., & Huber, J. (1987). An investigation of the rationality of consumer valuations of multiple health risks. The RAND Journal of Economics, 18, 465–479. doi: 10.2307/ 2555636 Wakslak, C. J., & Trope, Y. (2008). The who, where, and when of low and high probability events: Probability as distance and everyday decision making. Unpublished manuscript. New York, NY: New York University.

Zeitschrift für Psychologie (2017), 225(1), 31–44

E. Schneider et al., Measuring the Zero-Risk Bias

Received October 15, 2016
Revision received November 1, 2016
Accepted November 21, 2016
Published online July 12, 2017

Elisabeth Schneider
Department of Psychology
LMU Munich
Leopoldstr. 13
80802 Munich
Germany
elisabeth.schneider2502@web.de



Original Article

Assessing Suffering in Experimental Pain Models: Psychological and Psychophysiological Correlates

M. Brunner,1 M. Löffler,1 S. Kamping,1 S. Bustan,2 A. M. González-Roldán,2 F. Anton,2 and H. Flor1

1 Department of Cognitive and Clinical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
2 Institute for Health and Behavior, FLSHASE/INSIDE, University of Luxembourg, Luxembourg

Abstract: Although suffering is a central issue in pain, there is little research on this topic. The aim of this study was to assess suffering in an experimental context using various stimulation methods and durations, and to examine which psychological or psychophysiological measures covary with pain-related suffering. Twenty-one healthy volunteers participated in two experiments in which we used tonic thermal and phasic electric stimuli with short and long stimulus durations. The participants rated pain intensity, unpleasantness, and pain-related suffering on separate visual analog scales (VAS) and completed the Pictorial Representation of Illness and Self Measure (PRISM), originally developed to assess suffering in chronic illness. We measured heart rate, skin conductance responses (SCRs), and the electromyogram (EMG) of the musculus corrugator supercilii. For both heat and electric pain, we obtained high ratings on the suffering scale, confirming that suffering can be evoked in experimental pain conditions. Whereas pain intensity and unpleasantness were highly correlated, both scales were less highly related to suffering, indicating that suffering is distinct from pain intensity and unpleasantness. Higher suffering ratings were associated with more pronounced fear of pain and increased private self-consciousness. Pain-related suffering was also related to high resting heart rate, increased SCR, and decreased EMG during painful stimulation. These results offer an approach to the assessment of suffering in an experimental setting using thermal and electric pain stimulation and shed light on its psychological and psychophysiological correlates.

Keywords: pain-related suffering, pain intensity, pain unpleasantness, fear of pain, private self-consciousness, PRISM

Suffering has been described as a pronounced state of distress which threatens the physical or psychological integrity of a person through helplessness, loss of control, and concerns about the future (Cassell, 1982, 1999). Pain and suffering are often described by the same words (Cassell, 1982), making it difficult to distinguish between them (Kahn & Steeves, 1996). Loeser (1980) and Loeser and Melzack (1999) defined four pain-related dimensions: nociception, pain, suffering, and pain behaviors – with suffering being the affective response triggered by nociception or an aversive event (e.g., fear, threat, loss). In an attempt to further refine these concepts, Fields (1999) and Price (2000) suggested that pain includes sensory and affective dimensions and differentiated between primary and secondary pain unpleasantness. The latter may be interpreted as pain-related suffering. In contrast to these attempts at subsuming suffering under the affective component of pain, Bustan (in press) proposed that pain-related suffering is a separate and distinct integral component of pain.

Several assessment tools have been proposed to assess suffering. Monin and Schulz (2009) suggested a battery of questionnaires assessing affective states or symptoms to fully grasp the psychological aspects of suffering. Buechi and Sensky (1999) and Streffer et al. (2009) used the Pictorial Representation of Illness and Self Measure (PRISM) to evaluate suffering by nonverbal means. In a first experimental pain study, Bustan et al. (2015) induced suffering in the laboratory using mechanical stimulation and showed that unpleasantness and suffering are strongly associated but could be distinguished as two dimensions. We were interested in determining whether levels of suffering similar to those in a clinical context could be elicited, as indicated by the PRISM measure, and we examined its psychological and psychophysiological correlates. We hypothesized that pain-related suffering would be associated with increased heart rate, elevated levels of skin conductance responses (SCRs), and increased corrugator electromyographic activity. Furthermore, we examined how pain-related suffering was associated with depression, anxiety, fear of pain, self-consciousness, or catastrophizing.

Methods

Participants

We investigated 21 healthy, right-handed participants (mean age = 23.76 years, standard deviation (SD) = 3.85; 11 female; see Table 1 in Electronic Supplementary Material, ESM 1, for participant characteristics). All volunteers were screened for somatic disorders or chronic pain syndromes and assessed for mental disorders using the German version of the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders IV (SCID; Wittchen, Zaudig, & Fydrich, 1997). Exclusion criteria were a positive diagnosis for these disorders, heavy smoking (more than 20 cigarettes per day), caffeine intake less than 4 hr prior to the experiment, alcohol consumption on the day prior to the experiment, drug abuse, current medication (contraceptives and thyroxin excluded), pregnancy, and known allergies toward chilies (capsaicin). The participants were invited for two study days, provided written consent for taking part in the study, and received a monetary compensation of €72. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the Medical Faculty Mannheim, Heidelberg University, Germany.

Experimental Design

On the two study days, separated by one week, the participants received electrical and heat pain stimulation. The order of the sessions was counterbalanced (for further information, see Figure 1 in ESM 1). The participants were introduced to the different procedures and how to use the visual analog scales (VAS) and the PRISM-rating. For electrical stimulation, we applied an established intraepidermal stimulation method (Inui & Kakigi, 2012) that selectively activates nociceptors. A pair of disposable sensory needle electrodes (20 × 0.35 mm, active recording area 2 mm²; Natus Europe, Skovlunde, Denmark) was placed intracutaneously at the participant's left medial trapezius muscle with a distance of 5 mm. The electrical stimuli (stimulation frequency of 2 Hz, 400 V, 2 ms stimulus duration) were applied using a standard electric stimulation device (Constant Current Stimulator, model DS7A; Digitimer, Hertfordshire, UK). For the heat pain stimulation, we used a capsaicin-induced heat hyperalgesia model (Dirks, Petersen, & Dahl, 2003; Madsen, Johnsen, Fuglsang-Frederiksen, Jensen, & Finnerup, 2012):
To pre-sensitize the skin, 2 g of a commercially available capsaicin cream (0.075% capsaicin; Hansaplast ABC Wärme-Creme, Beiersdorf AG, Hamburg, Germany) was applied to the participant's nondominant medial volar forearm. The cream was rubbed in gently for 3 min and left for 15 min, covering a skin area of approximately 8 × 6 cm. After 15 min, the remaining cream was wiped off. For the following assessments and the experiment, contact heat was delivered at the center of the cream-covered skin by means of a thermal stimulator (Medoc Pathway; Medoc Advanced Medical Systems, Ramat Yishay, Israel) using a 30 × 30 mm square contact thermode. Prior to the experiment, conduction currents and temperature were individually calibrated for each participant. To examine pain thresholds, the participants were asked to stop the continuously increasing phasic currents (approximately 0.05 mA/s) or continuously increasing temperature (baseline at 32 °C, temperature rise 0.4 °C/s) by clicking a mouse button on the first occasion they experienced the stimulation as painful. To evaluate pain tolerance, the stimulation was gradually increased until the participant could no longer tolerate the painful stimulation. The assessments were repeated four times and the last three trials were averaged to determine pain threshold and tolerance. For the experiment, we used a stimulation intensity at 80% of the range between pain threshold and tolerance. The participants received painful stimuli in three test trials (duration of electrical stimulation 30 s, duration of heat pain stimulation 90 s). After each test trial, the participants rated their pain intensity via a computer screen on a visual analog scale (VAS) of an 800-pixel length with the endpoints no pain and extreme pain. The VAS-scores were transformed to values ranging from 0 to 100. We adapted the stimulation to an intensity rating of approximately 80. During the experiment, the individual current/temperature was held constant. On one day, short phasic electrical stimuli were administered with two different stimulus block durations of 10 (short) and 30 (long) s each, repeated 10 times. The stimulation blocks were separated by blocks without painful stimulation of the same length. Due to technical problems with the stimulation, one volunteer was excluded from the analyses. On the other day, heat pain stimulation was applied in 10 blocks each for the short and long stimulus durations and ten rest blocks of the same duration, during which no stimulation was applied. The two stimulus conditions consisted of two different stimulus durations (ramp-and-hold procedures from baseline 32 °C to stimulation temperature) with 30 (short) and 90 (long) s each, repeated 10 times. The order of presentation of the short or long stimulus durations for both stimulation methods was pseudorandomized to control for sequence effects.



Table 1. Partial correlations for the three pain components and the PRISM-rating for N = 20 participants. Pearson's correlations with p-values reported

                  Intensity          Unpleasantness     Suffering          PRISM
Intensity         1                  .837 (p < .001)    .376 (p = .001)    .022 (p = .848)
Unpleasantness    .837 (p < .001)    1                  .150 (p = .181)    .096 (p = .395)
Suffering         .376 (p = .001)    .150 (p = .181)    1                  .391 (p < .001)
PRISM             .022 (p = .848)    .096 (p = .395)    .391 (p < .001)    1

After every stimulation block, the participants rated the preceding trial on three separate VAS and on a modified, computer-assisted version of PRISM (Buechi & Sensky, 1999; Streffer et al., 2009). The order of the different VAS-ratings and the PRISM-rating was randomized (see Figure 1 in ESM 1). The three VAS represented each of the three pain components: pain intensity, pain unpleasantness, and pain-related suffering, with the endpoints no pain/unpleasantness/suffering to extreme pain/unpleasantness/suffering. As recommended by Cassell (1999), we directly asked subjects to rate their suffering experience ("How much do you suffer?") on the Pain Suffering Scale. PRISM is a visual instrument developed and validated to evaluate suffering in patients (Krikorian, Limonero, & Corey, 2013; Krikorian, Limonero, Roman, Vargas, & Palacio, 2014). The participants used a computer mouse to freely move a red disk representing the painful stimulus ("situation") relative to a fixed yellow circle representing their self ("self"). The instruction was to move the pain source (red disk) closer to the self (yellow circle) the more the self was affected by the experienced pain. The Euclidean distance between the centers of both circles was transformed to a scale from 0 to 100%, indicating an ascending impact of the painful situation on the individual self.
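For illustration, the two rating transformations described above could be implemented as in the following minimal Python sketch; the function names, the assumption that cursor positions are reported on the 800-pixel scale, and the maximum-distance parameter for PRISM are ours and not part of the original stimulation software.

```python
import math

VAS_LENGTH_PX = 800  # length of the on-screen visual analog scale in pixels

def vas_to_score(cursor_px: float) -> float:
    """Map a cursor position on the 800-pixel VAS to a 0-100 rating."""
    return 100.0 * cursor_px / VAS_LENGTH_PX

def prism_to_percent(self_xy, disk_xy, max_distance_px: float) -> float:
    """Map the Euclidean self-disk distance to 0-100%.

    0% = disk at the maximum possible distance from the self (no impact),
    100% = disk centered on the self (maximum impact). The maximum distance
    is an assumption; the original software parameters are not reported.
    """
    d = math.dist(self_xy, disk_xy)
    return 100.0 * (1.0 - min(d, max_distance_px) / max_distance_px)

# Example: a cursor at 640 px reads as 80; a disk halfway toward the self reads as 50%.
print(vas_to_score(640))                           # 80.0
print(prism_to_percent((0, 0), (300, 400), 1000))  # 50.0
```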

Psychophysiological Recordings

Concurrent with the pain induction, heart rate (HR), skin conductance responses (SCRs), and the electromyographic (EMG) activity of the musculus corrugator supercilii were continuously recorded in each experimental session. All psychophysiological data were recorded with a sampling rate of 1,000 Hz using a BrainAmp ExG amplifier in combination with the Brain Vision Recorder software (Brain Products, Munich, Germany). An adaptation phase of 10 min was followed by a baseline assessment of 5 min for all three measures. Heart rate was recorded using 3-lead chest electrocardiography with foam ECG electrodes (ASF40C, Asmuth GmbH, Minden, Germany) applied on the participants' left lateral sternum at the upper and lower edges of the musculus pectoralis major. The ground electrode was placed on the right hip bone (Jennings et al., 1981). SCR was assessed with two electrodes (TK7, Medizinelektronik Eissner, Wetzlar, Germany) filled with TD 246 electrode gel (PAR Medizintechnik GmbH, Berlin, Germany), which were affixed to the palmar surface of the medial phalanges of the second and third digits (Benedek & Kaernbach, 2010b; Fowles, Christie, Grings, Lykken, & Venables, 1981). The EMG recordings followed the recommendations by Fridlund and Cacioppo (1986). The skin was cleansed with alcohol swabs and lightly abraded using abrasive electrolyte gel (Abralyt 2000, EASYCAP GmbH, Herrsching, Germany). Two Ag-AgCl electrodes (E220N, EASYCAP GmbH, Herrsching, Germany) prefilled with electrode gel (Microlyte Electrode Gel, Coulbourn Instruments, Whitehall, PA) were placed on the inner border of the eyebrow and 1 cm lateral to the first electrode, above the brow. The ground electrode was attached to the forehead close to the hairline. Electrode impedance was kept below 10 kΩ. The EMG data were band-pass (20–450 Hz) and notch (50 Hz) filtered, rectified, and integrated with a time constant of 10 ms. All data were baseline-corrected by subtracting the mean baseline scores of the corresponding blocks without and with painful stimulation.
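The EMG preprocessing chain described above can be sketched as follows. This is a generic re-implementation in Python, not the BrainVision Analyzer pipeline actually used in the study; the filter order and the approximation of the 10-ms integration by a 10-sample moving average are our assumptions.

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

FS = 1000.0  # sampling rate (Hz), as reported for all psychophysiological channels

def preprocess_emg(raw):
    """Band-pass (20-450 Hz), notch (50 Hz), rectify, and integrate corrugator EMG."""
    raw = np.asarray(raw, dtype=float)
    b_bp, a_bp = butter(4, [20.0, 450.0], btype="bandpass", fs=FS)  # filter order is an assumption
    emg = filtfilt(b_bp, a_bp, raw)
    b_n, a_n = iirnotch(w0=50.0, Q=30.0, fs=FS)                     # 50-Hz line-noise notch
    emg = filtfilt(b_n, a_n, emg)
    rectified = np.abs(emg)
    window = int(0.010 * FS)                                        # ~10 ms integration window
    return np.convolve(rectified, np.ones(window) / window, mode="same")

def baseline_correct(block_mean, baseline_mean):
    """Subtract the mean baseline score from a block mean, as described in the article."""
    return block_mean - baseline_mean
```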

Questionnaire Data

Depression was evaluated using the German version of the Center for Epidemiologic Studies Depression Scale (Hautzinger & Bailer, 1993). Affective state was assessed using the German version of the Positive and Negative Affect Schedule (PANAS; Krohne, Egloff, Kohlmann, & Tausch, 1996). For trait anxiety, we used the German version of the State-Trait Anxiety Inventory (Laux & Spielberger, 1981). The Fear of Pain Questionnaire (Kröner-Herwig, Gaßmann, Tromsdorf, & Zahrend, 2012) assesses fears about severe, minor, and medical pain, with higher scores indicating more fear. The Pain-Related Self-Statements Scale (Flor, Behle, & Birbaumer, 1993) evaluates cognitive aspects of coping with pain and consists of the two subscales Active Coping and Catastrophizing. We assessed perceived controllability using the Multidimensional Locus of Control Scales (Krampen, 1981) with the subscales belief in chance, powerful others, and perceived internal control. Self-consciousness was evaluated using the Self-Consciousness Scale (Filipp & Freudenberg, 1989) with the subscales private and public self-consciousness, that is, the tendency to attend more to inner, private aspects of the self versus those related to public display that are more overt. All questionnaires were completed before the beginning of the study. Immediately before and after the experiment, the participants rated their current arousal and valence using the Self-Assessment Manikin (Bradley & Lang, 1994).

Data Analysis

All data were analyzed with the statistical software packages IBM SPSS Statistics 22 (IBM, Armonk, NY) and Matlab (The MathWorks, Natick, MA). The psychophysiological data were segmented and analyzed using BrainVision Analyzer 2 (Brain Products, Munich, Germany). The HR data were examined with the help of Kubios HRV (Biosignal Analysis and Medical Imaging Group, Kuopio, Finland). SCRs were analyzed using continuous decomposition analysis (smoothing = 0.2, grid size = 10, peak level = .01) in Ledalab V3.4.6c (Benedek & Kaernbach, 2010a). To examine the behavioral data, we conducted a repeated-measures analysis of variance (ANOVA) with a 4 × 2 × 2 design (component [intensity, unpleasantness, suffering, PRISM] × stimulus type [thermal, electric] × stimulus duration [short, long]). For the EMG, HR, and SCR data, we used paired-samples t-tests to check for differences between baseline-corrected blocks without stimulation and with stimulation, as well as between baseline measures and blocks without and with painful stimulation. Due to a technical problem, we had to exclude 258 of 5,040 individual measures (5.12%). Partial correlations were used to investigate the relationship between the different pain components. We employed two separate hierarchical stepwise linear regression analyses with missing values replaced by means (13.53% for the first, 0.23% for the second analysis), with suffering as a dependent variable and pain intensity and unpleasantness as covariates. For the first linear regression, the baselines and the baseline-corrected means of blocks without and with painful stimulation of the EMG, HR, and SCR data were used as independent variables. For the second model, we entered depression, mood (positive, negative), trait anxiety, fear of pain (severe, minor, medical, total), self-consciousness (private and public), pain-related self-statements (catastrophizing and coping), and locus of control (internal, powerful others, chance) as independent variables. Statistical significance was considered at p < .05 (two-tailed).
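As a sketch of the partial-correlation step, each pair of measures can be correlated while controlling for the remaining two, for example with the pingouin package in Python; this is a substitute for the SPSS procedure actually used, and the file name and column names below are hypothetical.

```python
import pandas as pd
import pingouin as pg

# One row per rating block with the four rating columns (hypothetical layout)
df = pd.read_csv("ratings_long.csv")  # columns: intensity, unpleasantness, suffering, prism

cols = ["intensity", "unpleasantness", "suffering", "prism"]
pairs = [(x, y) for i, x in enumerate(cols) for y in cols[i + 1:]]

for x, y in pairs:
    covar = [c for c in cols if c not in (x, y)]  # control for the remaining two measures
    res = pg.partial_corr(data=df, x=x, y=y, covar=covar, method="pearson")
    print(f"{x} ~ {y} | controlling for {covar}")
    print(res.round(3))
```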


Results

Stimulation Methods

We found high pain-related suffering ratings for both the painful heat conditions (Mshort = 66.78, SD = 30.51; Mlong = 67.24, SD = 28.43) and the painful electric stimulation (Mshort = 59.8, SD = 29.28; Mlong = 62.68, SD = 31.03). Figure 1 shows an overview of the mean ratings for all four pain conditions for the three VAS-ratings and PRISM. While we found no significant effect between electric and thermal stimulation (method), there was a significant differentiation between short versus long stimulus duration, F(1, 19) = 9.419, p = .006, with long stimulus duration showing higher ratings. We observed no significant interaction effect between stimulation method and stimulus duration. We found no significant differences between the thermal and electric stimulation, which were performed on different days, for depression, t(19) = 1.308, p = .206, positive, t(19) = 1.624, p = .121, and negative mood, t(19) = 0.401, p = .693 (see Table 1 in ESM 1), emotional valence pre, t(19) = 1.100, p = .285, and post, t(19) = 0.578, p = .570, and arousal pre, t(19) = 0.208, p = .837, and post, t(19) = 0.080, p = .937. We found no significant differences for valence and arousal levels before and after the experiment for either electric [valence: t(19) = 0.484, p = .634; arousal: t(19) = 0.227, p = .823] or thermal stimulation [valence: t(19) = 1.410, p = .175; arousal: t(19) = 1.234, p = .232]. Table 1 in ESM 1 shows the main demographic and experimental data.

Interrelationship of Suffering and Other Pain-Related Variables

Since the stimulation types did not yield significant differences, we merged the data sets of both stimulation methods for the subsequent analyses. Due to the high positive correlation between pain intensity and unpleasantness (Geissner, 1996; Weiss, 1996), we computed partial correlations, in each case controlling for the two remaining pain and PRISM-ratings, to remove overlapping effects. The intensity and unpleasantness components showed a high positive correlation even when pain-related suffering and PRISM were partialed out (r = .837, p < .001). However, we found significantly lower partial correlations between intensity and pain-related suffering (r = .376, p < .001) and between unpleasantness and pain-related suffering (r = .15, p = .181). The partial correlation between intensity and unpleasantness was significantly higher (z = 2.52, p = .006) than that between intensity and pain-related suffering.
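The article does not state which procedure was used for the z-test comparing the two dependent correlations; one common choice is the z-test for correlated correlation coefficients of Meng, Rosenthal, and Rubin (1992), sketched below under the assumption that the partial correlations are treated like ordinary correlations for this purpose and that n denotes the number of observations the correlations are based on.

```python
import math
from scipy.stats import norm

def fisher_z(r):
    return 0.5 * math.log((1 + r) / (1 - r))

def meng_z(r_y1, r_y2, r_12, n):
    """Meng-Rosenthal-Rubin (1992) test comparing r(y, x1) with r(y, x2).

    r_12 is the correlation between the two predictors x1 and x2;
    n is the number of observations underlying the correlations.
    """
    rbar_sq = (r_y1 ** 2 + r_y2 ** 2) / 2.0
    f = min((1.0 - r_12) / (2.0 * (1.0 - rbar_sq)), 1.0)
    h = (1.0 - f * rbar_sq) / (1.0 - rbar_sq)
    z = (fisher_z(r_y1) - fisher_z(r_y2)) * math.sqrt((n - 3) / (2.0 * (1.0 - r_12) * h))
    return z, 2.0 * norm.sf(abs(z))  # z and two-tailed p

# Illustrative call with the Table 1 values; n = 20 is an assumption,
# since the article does not report the n used for this comparison.
print(meng_z(r_y1=0.837, r_y2=0.376, r_12=0.150, n=20))
```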


Figure 1. Subjects’ visual analog scale ratings (linearly transformed to range from 0 to 100) for all four stimulus conditions. Means, standard errors of the mean, and significant differences (p-values, determined by paired-samples t test) with effect sizes (Cohen’s d, in parentheses) are depicted.

While there was no significant association between the PRISM-ratings and either intensity (r = .022, p = .848) or unpleasantness (r = .096, p = .395), we found a moderately high partial correlation between the PRISM and the pain-related suffering ratings (r = .391, p < .001). Table 1 depicts the partial correlations of the three pain components and the PRISM-ratings.

Psychophysiological Data

For an overview of the psychophysiological data, see Table 1 in ESM 1. The EMG data showed a significant difference between blocks without and with painful stimulation for short, t(20) = 3.126, p = .005, and long, t(20) = 3.036, p = .007, thermal stimulation as well as for long electric stimulation, t(19) = 3.038, p = .007, with blocks with painful stimulation having higher values. We found no significant difference between blocks without and with painful stimulation for short electric stimulation, t(19) = 1.202, p = .244; see Figure 2A. Baseline measurements were significantly lower compared to the blocks without, t(79) = 8.995, p < .001, and with painful stimulation, t(79) = 4.064, p < .001. The heart rate data showed an orienting response with significantly higher heart rates in blocks without stimulation, similar to those reported in other studies (Berntson, Boysen, & Cacioppo, 1991). In detail, short electric stimulation, t(16) = 4.896, p < .001, and long, t(16) = 3.886, p = .001, as well as short thermal stimulation, t(16) = 2.761, p = .014, and long, t(16) = 3.654, p = .002, revealed significant differences between blocks with and without painful stimulation, with painful stimulation showing a reduction of heart beats per minute (BPM); see Figure 2B. Baseline measurements were significantly higher compared to the blocks without, t(57) = 4.074, p < .001, and with painful stimulation, t(57) = 8.762, p < .001. For SCR, no significant differences between the conditions were found [all t(20) < 1.91, all p > .07]; see Figure 2C. Baseline values were significantly lower compared to the blocks with painful stimulation, t(61) = 2.615, p = .011. There was no significant difference between baseline and the blocks without stimulation, t(61) = .091, p = .928.

Relationship of Psychophysiological Variables and Pain-Related Suffering

For the psychophysiological variables [total R² = .44, F(4, 77) = 15.101, p < .001], baseline HR (variance explained 22.75%), pain-related SCR (variance explained 4.58%), and pain-related EMG levels (variance explained 4.93%) significantly (all t > 2.397, all p < .019) accounted for pain-related suffering after controlling for pain intensity and pain unpleasantness. For the psychological variables [total R² = .438, F(3, 78) = 20.29, p < .001], pain-related suffering was significantly (all t > 4.371, all p < .001) associated with fear of minor pain (variance explained 19.36%) and private self-consciousness (variance explained 16%) after controlling for pain intensity and pain unpleasantness. For an overview of the different coefficients, see Table 2.
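The two-step logic of these analyses, covariates first and the block of candidate predictors second, can be sketched as follows. This is a simplified hierarchical ordinary-least-squares sketch in Python rather than the SPSS stepwise procedure with mean imputation reported in the article, and the file and column names are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("suffering_data.csv")  # hypothetical file and column names

# Step 1: covariates only (pain intensity and unpleasantness)
X1 = sm.add_constant(df[["intensity", "unpleasantness"]])
m1 = sm.OLS(df["suffering"], X1).fit()

# Step 2: add the candidate predictors (here: the psychophysiological block)
predictors = ["hr_baseline", "emg_pain", "scr_pain"]
X2 = sm.add_constant(df[["intensity", "unpleasantness"] + predictors])
m2 = sm.OLS(df["suffering"], X2).fit()

print(f"R2 step 1 = {m1.rsquared:.3f}, R2 step 2 = {m2.rsquared:.3f}")
print(f"Incremental variance explained = {100 * (m2.rsquared - m1.rsquared):.2f}%")
print(m2.params)  # coefficients; standardized betas require z-standardized columns
```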

Discussion

The purpose of this study was to assess pain-related suffering in healthy subjects and to examine its psychological and psychophysiological correlates.



Table 2. Coefficients for pain-related suffering for both regression analyses

Model                 R² (total)    Coefficients                  β
Physiology (1)        .44           HR (baseline)                 .477
                                    EMG (painful stimulation)     .222
                                    SCR (painful stimulation)     .214
Questionnaires (2)    .438          FPQ minor                     .440
                                    SCS private                   .400

Notes. HR (baseline) = baseline heart rate; EMG (painful stimulation) = corrugator electromyographic activity during pain stimulation; SCR (painful stimulation) = skin conductance response during pain stimulation; FPQ = Fear of Pain Questionnaire; SCS = Self-Consciousness Scale.


Induction of Pain-Related Suffering in a Laboratory Setting

It was possible to induce pain-related suffering in an experimental context using painful thermal and electrical stimulation. Threshold levels and stimulation intensities for electric as well as thermal stimulation were in line with previous findings (Arendt-Nielsen, Andersen, & Jensen, 1996; Buchgreitz, Egsgaard, Jensen, Arendt-Nielsen, & Bendtsen, 2008; Dirks et al., 2003; Neddermeyer, Flühr, & Lötsch, 2008). Contrary to our expectation, we observed that tonic heat stimulation on capsaicin-prepared skin as well as phasic electric stimulation equally led to high suffering ratings. This complements our previous findings indicating that pain-related suffering can be elicited by tonic mechanical stimulation (Bustan et al., 2015). Thermal stimulation led to significantly higher pain intensity ratings. However, the stimulation method (thermal vs. electric) did not affect the unpleasantness, suffering, and PRISM-ratings. Longer stimulus presentation led to higher intensity, unpleasantness, and PRISM-ratings. Moreover, we found moderate levels of suffering as indicated by PRISM, consistent with the values reported in other studies using PRISM in patient populations (Krikorian, Limonero, Vargas, & Palacio, 2013; Streffer et al., 2009). Thus, the experimentally induced suffering experience was comparable to the suffering of pain patients.

Figure 2. Scores of the baseline-corrected psychophysiological responses for the different short and long blocks with no stimulation (without stim.) and painful stimulation (painful stim.) for both thermal and electric stimuli. (A) Activity of the m. corrugator supercilii (EMG), data depicted in millivolts (mV); (B) heart rate (HR), data depicted in beats per minute (BPM); (C) skin conductance response (SCR), data depicted in microsiemens (μS). Means, standard errors of the mean, and significant differences (p-values, determined by paired-samples t test) with effect sizes (Cohen's d, in parentheses) are depicted.

Relationship Between Pain Components


We found high positive correlations between pain intensity and unpleasantness, which is in line with theoretical assumptions and previous empirical studies (Geissner, 1996; Stein & Mendl, 1988). These reported associations between intensity and unpleasantness seem to depend largely on the setting and background of the study and the subjects investigated. In our study, we used rather painful but tolerable stimulation, leading to positively correlated high intensity and unpleasantness ratings in healthy adults. PRISM is a validated tool to assess suffering in patients with chronic pain (Kassardjian, Gardner-Nix, Dupak, Barbati, & Lam-McCullock, 2008; Krikorian et al., 2013; Streffer et al., 2009). In our experiment, we found a moderate positive correlation between pain-related suffering and PRISM, indicating overlapping features but some conceptual differences between the two measures. A similar correlation between suffering VAS and PRISM has been reported in a study investigating cancer patients (Krikorian et al., 2013). These results suggest that our painful stimulation led to high suffering experiences and had a moderate impact on the integrity of the subjects' self. Therefore, our findings are in line with Cassell's (1999) hypothesis that an actual or potential threat to the integrity of a person is important for the experience of suffering.

Suffering as a Distinct Third Component of the Pain Experience

Using phasic electric and tonic thermal stimulation, we found very low correlations between pain-related suffering and unpleasantness as well as between PRISM and unpleasantness, suggesting that unpleasantness and pain-related suffering are independent constructs processed and interpreted in separate ways. Similar findings were reported in our previous investigation of pain-related suffering using phasic and tonic mechanical stimulation (Bustan et al., 2015). Fields' and Price's theoretical framework (Fields, 1999; Price, 2000) suggested that an acute and short-lasting "primary unpleasantness" is essentially coupled with stimulus intensity, whereas a longer-lasting "secondary unpleasantness/suffering" is associated with higher-order processing and should therefore correlate less strongly with stimulus intensity. We were able to confirm this hypothesis experimentally, having found a significant difference between the high positive correlation between pain intensity and unpleasantness (r = .831) and the moderate correlation between pain intensity and pain-related suffering (r = .338). With regard to the model mentioned before, unpleasantness would cover the immediate emotional reaction to a nociceptive input via autonomic arousal, while pain-related suffering – independent of the first emotional impression – would involve memory and cognitive appraisal associated with the actual pain experience.

Psychological and Psychophysiological Correlates of Pain-Related Suffering

In our analyses, pain-related suffering was associated with higher baseline heart rate and increased SCR levels as well as decreased EMG activity during painful stimulation phases. High resting heart rate has been associated with low heart rate variability, which has been viewed as an indicator of deficient adaptation to environmental challenge and lack of inhibitory control (Thayer & Lane, 2009). A high SCR response can be viewed as an expression of high arousal in the situation of painful stimulation and has been associated with fear of pain (Glombiewski et al., 2015). In contrast to other reports on pain and aversive emotions, where high corrugator activity has been observed, high suffering was related to lower EMG levels, potentially indicative of a lack of control (Lindström, Mattsson-Marn, Golkar, & Olsson, 2013). To what extent this change is specific to suffering needs to be determined in further work. Furthermore, pain-related suffering was related to increased levels of fear of pain and increased private self-consciousness. Importantly, the fear of pain scores for severe pain (M = 36.62), minor pain (M = 19.67), and the total score (M = 81.71) of participants reporting high pain-related suffering ratings corresponded to the mean scores of chronic pain patients (severe pain = 37.1, minor pain = 19.2, total = 79.7) in the norming sample of the Fear of Pain Questionnaire III (McNeil & Rainwater, 1998). Healthy subjects with increased fear of severe (e.g., breaking a leg) or minor injuries (e.g., biting your tongue) or fear of pain in general experience more suffering when facing painful but tolerable stimuli. Previous studies have shown that fear of pain plays an important role in the experience of pain and in behavioral outcomes such as avoidance or escape responses, in healthy subjects as well as in pain patients (McNeil et al., 2001; McNeil & Rainwater, 1998; Osman, Breitenstein, Barrios, Gutierrez, & Kopper, 2002; Vlaeyen et al., 1999). Elevated levels of fear of pain lead to restrictions, disabilities, and distress (McCracken, Faber, & Janeck, 1998; McCracken, Gross, Sorg, & Edmands, 1993) – and may therefore indicate that pain-related suffering has an impact on daily life. According to Buss (2001), private self-consciousness relates to a tendency to introspect by heightening the sensitivity to experienced feelings and to one's self-image. Suffering may force the individual to introspect and focus on the internal self, as suggested by Bustan (in press), and this could be demonstrated in the present study. Interestingly, we did not find any associations between pain-related suffering and depression or anxiety, contrary to what has been suggested by several authors (Cassell, 1999; Ferrell & Coyle, 2008; Loeser & Melzack, 1999; Monin & Schulz, 2009). However, it must be noted that we examined a healthy sample and that the findings in patients might differ.

Limitations

Only healthy volunteers with no severe injuries or traumata participated in our study. It could be hypothesized that the participants' experience of pain-related suffering during the painful stimulation is substantially different from that of a patient suffering from a chronic disease. However, the moderately high PRISM-ratings and a correlation between the pain-related suffering rating and PRISM similar to that found in studies of chronic pain syndromes suggest that there are overlapping features. Additionally, ethical considerations led us to conduct our experiment in healthy participants prior to implementing the suffering dimension into research on chronic pain patients. Future investigations should aim to apply our approach to clinical patient groups.

Conclusions

Providing a model to study pain-related suffering in a controlled and reproducible environment using phasic electric and tonic thermal pain stimulation, we could identify pain-related suffering as an independent component of the pain experience in addition to pain intensity and unpleasantness. Furthermore, we found that fear of pain and private self-consciousness are specifically associated with pain-related suffering. Since pain-related suffering seems to be a factor independent of unpleasantness and may cover unique qualities, we suggest additionally assessing this dimension in future pain studies.

Acknowledgments

This work was supported by grants from the Deutsche Forschungsgemeinschaft (FL 156/34-1) and the Fonds National de la Recherche Luxembourg (PASCOM). M. Brunner and M. Löffler contributed equally to this work. The authors declare that they have no conflict of interest.

Electronic Supplementary Material

The electronic supplementary material is available with the online version of the article at http://dx.doi.org/10.1027/2151-2604/a000279
ESM 1. Text (.docx). Table (participant characteristics and experimental data) and Figure (pictorial representation of the experimental procedure).

References Arendt-Nielsen, L., Andersen, O. K., & Jensen, T. S. (1996). Brief, prolonged and repeated stimuli applied to hyperalgesic skin areas: A psychophysiological study. Brain Research, 712, 165–167. doi: 10.1016/0006-8993(95)01529-9 Benedek, M., & Kaernbach, C. (2010a). A continuous measure of phasic electrodermal activity. Journal of Neuroscience Methods, 190, 80–91. doi: 10.1016/j.jneumeth.2010.04.028 Benedek, M., & Kaernbach, C. (2010b). Decomposition of skin conductance data by means of nonnegative deconvolution. Psychophysiology, 47, 647–658. doi: 10.1111/j.1469-8986. 2009.00972.x


Berntson, G. G., Boysen, S. T., & Cacioppo, J. T. (1991). Cardiac orienting and defensive responses: Potential origins in autonomic space. In B. A. Campbell (Ed.), Attention and information processing in infants and adults (pp. 163–200). Hillsdale, NJ: Erlbaum. Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: The selfassessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 26, 49–59. doi: 10.1016/0005-7916(94)90063-9 Buchgreitz, L., Egsgaard, L. L., Jensen, R., Arendt-Nielsen, L., & Bendtsen, L. (2008). Abnormal pain processing in chronic tension-type headache: A high-density EEG brain mapping study. Brain Research, 131, 3232–3238. doi: 10.1093/brain/ awn199 Buechi, S., & Sensky, T. (1999). PRISM: Pictorial representation of illness and self measure: A brief nonverbal measure of illness impact and therapeutic aid in psychosomatic medicine. Psychosomatics, 40, 314–320. doi: 10.1016/S0033-3182(99) 71225-9 Buss, A. (2001). Psychological dimensions of the self. Thousand Oaks, CA: Sage. Bustan, S. (in press). The pain and suffering lived matrix. In S. Bustan (Ed.), Fundamental transdisciplinary questions on pain and suffering. New York, NY: Springer. Bustan, S., Gonzalez-Roldan, A. M., Kamping, S., Brunner, M., Loeffler, M., Flor, H., & Anton, F. (2015). Suffering as an independent component of the experience of pain. European Journal of Pain, 19, 1035–1048. doi: 10.1002/ejp.709 Cassell, E. J. (1982). The nature of suffering and the goals of medicine. The New England Journal of Medicine, 306, 639–645. doi: 10.1056/NEJM198203183061104 Cassell, E. J. (1999). Diagnosing suffering: A perspective. Annals of Internal Medicine, 131, 531–534. doi: 10.7326/0003-4819-1317-199910050-00009 Dirks, J., Petersen, K., & Dahl, J. (2003). The heat/capsaicin sensitization model: A methodological study. The Journal of Pain, 4, 122–128. doi: 10.1054/jpai.2003.10 Ferrell, B., & Coyle, N. (2008). The nature of suffering and the goals of nursing. New York, NY: Oxford University Press. Fields, H. L. (1999). Pain: An unpleasant topic. Pain, 82, 61–69. doi: 10.1016/S0304-3959(99)00139-6 Filipp, S. H., & Freudenberg, E. (1989). Fragebogen zur Erfassung dispositionaler Aufmerksamkeit (SAM-Fragebogen) [Questionnaire for the assessment of self-consciousness]. Göttingen, Germany: Hogrefe. Flor, H., Behle, D. J., & Birbaumer, N. (1993). Assessment of painrelated cognitions in chronic pain patients. Behavior Research and Therapy, 31, 63–73. doi: 10.1016/0005-7967(93)90044-U Fowles, D. C., Christie, M. J., Grings, W. W., Lykken, D. T., & Venables, P. H. (1981). Publication recommendations for electrodermal measurements. Psychophysiology, 18, 232–239. doi: 10.1111/j.1469-8986.2012.01384.x Fridlund, A. J., & Cacioppo, J. T. (1986). Guidelines for human electromyographic research. Psychophysiology, 23, 567–589. doi: 10.1111/j.1469-8986.1986.tb00676.x Geissner, E. (1996). Schmerzempfindungs-Skala (SES) [Pain Perception Scale]. Göttingen, Germany: Hogrefe. Glombiewski, J. A., Riecke, J., Holzapfel, S., Rief, W., Konig, S., Lachnit, H., & Seifart, U. (2015). Do patients with chronic pain show autonomic arousal when confronted with feared movements? An experimental investigation of the fear-avoidance model. Pain, 156, 547–554. doi: 10.1097/01.j.pain.0000460329. 48633.ce Hautzinger, M., & Bailer, M. (1993). Allgemeine Depressions Skala [Center of Epidemiological Studies – Depression Scale]. Göttingen, Germany: Beltz.


Inui, K., & Kakigi, R. (2012). Pain perception in humans: Use of intraepidermal electrical stimulation. Journal of Neurology, Neurosurgery, and Psychiatry, 83, 551–556. doi: 10.1136/ jnnp-2011-301484 Jennings, J. R., Berg, W. K., Hutcheson, J. S., Obrist, P., Porges, S., & Turpin, G. (1981). Publication guidelines for heart rate studies in man. Psychophysiology, 18, 226–231. doi: 10.1111/j.14698986.1981.tb03023.x Kahn, D. L., & Steeves, R. H. (1996). The experience of suffering: Conceptual clarification and theoretical definition. Journal of Advanced Nursing, 11, 623–631. doi: 10.1111/j.1365-2648. 1986.tb03379.x Kassardjian, C. D., Gardner-Nix, J., Dupak, K., Barbati, J., & Lam-McCullock, J. (2008). Validating PRISM (pictorial representation of illness and self measure) as a measure of suffering in chronic non-cancer pain patients. The Journal of Pain, 9, 1135–1143. doi: 10.1016/j.jpain.2008.06.016 Krampen, G. (1981). IPC-Fragebogen zur Kontrollüberzeugung [Questionnaire for locus of control]. Göttingen, Germany: Hogrefe. Krikorian, A., Limonero, J. T., & Corey, M. T. (2013). Suffering assessment: A review of available instruments for use in palliative care. Journal of Palliative Medicine, 16, 130–142. doi: 10.1089/jpm.2012.0370 Krikorian, A., Limonero, J. T., Roman, J. P., Vargas, J. J., & Palacio, C. (2014). Predictors of suffering in advanced cancer. American Journal of Hospice & Palliative Medicine, 31, 534–542. doi: 10.1177/1049909113494092 Krikorian, A., Limonero, J. T., Vargas, J. J., & Palacio, C. (2013). Assessing suffering in advanced cancer patients using pictorial representation of illness and self-measure (PRISM), preliminary validation of the Spanish version in a Latin American population. Support Care Cancer, 21, 3327–3336. doi: 10.1007/ s00520-013-1913-5 Krohne, H. W., Egloff, B., Kohlmann, C. W., & Tausch, A. (1996). Investigations with a German version of the “positive and negative affect schedule” (PANAS). Diagnostica, 42, 139–156. Kröner-Herwig, B., Gaßmann, J., Tromsdorf, M., & Zahrend, E. (2012). The effects of sex and gender role on responses to pressure pain. Psychosocial Medicine, 9, 1–10. doi: 10.3205/ psm000079 Laux, L., & Spielberger, C. D. (1981). Das State-TraitAngstinventar: STAI [State-Trait Anxiety Inventory (STAI)]. Weinheim, Germany: Beltz. Lindström, B. R., Mattsson-Marn, I. B., Golkar, A., & Olsson, A. (2013). In your face: Risk of punishment enhances cognitive control and error-related activity in the corrugator supercilii muscle. PLoS One, 8, e65692. doi: 10.1371/journal.pone. 0065692 Loeser, J. D. (1980). Perspectives on pain. In P. Turner (Ed.), Proceedings of the first world congress on clinical pharmacology and therapeutics (pp. 316–326). London, UK: Macmillan. Loeser, J. D., & Melzack, R. (1999). Pain: An overview. The Lancet, 353, 1607–1609. doi: 10.1016/S0140-6736(99)01311-2 Madsen, C. S., Johnsen, B., Fuglsang-Frederiksen, A., Jensen, T. S., & Finnerup, N. B. (2012). Increased contact heat pain and shortened latencies of contact heat evoked potentials following capsaicin-induced heat hyperalgesia. Clinical Neurophysiology, 123, 1429–1436. doi: 10.1016/j.clinph.2011.11.032 McCracken, L. M., Faber, S. D., & Janeck, A. S. (1998). Painrelated anxiety predicts non-specific physical complaints in persons with chronic pain. Behaviour Research and Therapy, 36, 621–630. doi: 10.1016/S0005-7967(97)10039-0 McCracken, L. M., Gross, R. T., Sorg, P. J., & Edmands, T. A. (1993). 
Prediction of pain in patients with chronic low back pain: Effects of inaccurate prediction and pain-related anxiety.
Behaviour Research and Therapy, 31, 647–652. doi: 10.1016/ 0005-7967(93)90117-D McNeil, D. W., Au, A. R., Zvolensky, M. J., McKee, D. R., Klineberg, I. J., & Ho, C. C. K. (2001). Fear of pain in orofacial pain patients. Pain, 89, 245–252. doi: 10.1016/S0304-3959(00) 00368-7 McNeil, D. W., & Rainwater, A. J. (1998). Development of the fear of pain questionnaire-III. Journal of Behavioral Medicine, 21, 389–410. doi: 10.1023/A:1018782831217 Monin, J. K., & Schulz, R. (2009). Interpersonal effects of suffering in older adult caregiving relationships. Psychology and Aging, 24, 681–695. doi: 10.1037/a0016355 Neddermeyer, T. J., Flühr, K., & Lötsch, J. (2008). Principle components analysis of pain thresholds to thermal, electrical, and mechanical stimuli suggests a predominant common source of variance. Pain, 138, 286–291. doi: 10.1016/j.pain. 2007.12.015 Osman, A., Breitenstein, J., Barrios, F., Gutierrez, P., & Kopper, B. (2002). The fear of pain questionnaire-III: Further reliability and validity with nonclinical samples. Journal of Behavioral Medicine, 25, 155–173. doi: 10.1023/a:1014884704974 Price, D. D. (2000). Psychological and neural mechanisms of the affective dimension of pain. Science, 288, 1769–1772. doi: 10.1126/science.288.5472.1769 Stein, C., & Mendl, G. (1988). The German counterpart to McGill pain questionnaire. Pain, 32, 251–255. doi: 10.1016/0304-3959 (88)90074-7 Streffer, M.-L., Buechi, S., Mörgeli, H., Galli, U., & Ettlin, D. (2009). PRISM (pictorial representation of illness and self measure): A novel visual instrument to assess pain and suffering in orofacial pain patients. Journal of Orofacial Pain, 23, 140–146. doi: 10.5167/uzh-25322 Thayer, J. F., & Lane, R. D. (2009). Claude Bernard and the heartbrain connection: Further elaboration of a model of neurovisceral integration. Neuroscience and Behavioral Reviews, 33, 81–88. doi: 10.1016/j.neubiorev.2008.08.004 Vlaeyen, J. W. S., Seelen, H. A. M., Peters, M., de Jong, P., Aretz, E., Beisiegel, E., & Weber, W. E. J. (1999). Fear of movement/(re)injury and muscular reactivity in chronic low back pain patients: An experimental investigation. Pain, 82, 297–304. doi: 10.1016/S0304-3959(99)00054-8 Weiss, L. (1996). Modulation des Schmerzerlebens und-verhaltens durch Entspannung – Einfluss der Art der Entspannungsinduktion und des Einsatzzeitpunktes der Entspannung [Modulation of pain experience and behavior by relaxation – the influence of relaxation induction type and timing]. Dissertation, HeinrichHeine-Universität Düsseldorf, Düsseldorf, Germany. Wittchen, H.-U., Zaudig, M., & Fydrich, T. (1997). SKID-I. Strukturiertes Klinisches Interview für DSM-IV [Structured clinical interview for DSM-IV axis I disorders]. Göttingen, Germany: Hogrefe. Received October 20, 2016 Revision received November 6, 2016 Accepted November 13, 2016 Published online July 12, 2017 Herta Flor Department of Cognitive and Clinical Neuroscience Central Institute of Mental Health Medical Faculty Mannheim University of Heidelberg Square J5 68159 Mannheim Germany herta.flor@zi-mannheim.de



Original Article

Is the Implicit Association Test for Aggressive Attitudes a Measure for Attraction to Violence or Traumatization?

Matthias Bluemke,1,2 Anselm Crombach,3 Tobias Hecker,4 Inga Schalinski,3 Thomas Elbert,3 and Roland Weierstall5

1 Department of Survey Design and Methodology, GESIS – Leibniz-Institute for the Social Sciences, Mannheim, Germany
2 Psychological Institute, University of Heidelberg, Germany
3 Department of Psychology, University of Konstanz, Germany
4 Department of Psychology, Bielefeld University, Germany
5 MSH Medical School Hamburg, University of Applied Sciences and Medical University, Hamburg, Germany

Abstract: Traumatic exposure is particularly devastating for those who, at a young age, have become combatants or experienced massive adversity after abduction by armed movements. We investigated the impact of traumatic stressors on psychopathology among war-affected young men of Northern Uganda, including former child soldiers. Adaptation to violent environments and coping with trauma-related symptoms often result in an increasing appetite for violence. We analyzed implicit attitudes toward violence, assessed by an Implicit Association Test (IAT), among 64 male participants. Implicit attitudes varied as a function of the number of experienced traumatic event types and committed offense types. As the number of traumatic experiences and violence exposure increased, more appetitive aggression was reported, whereas the IAT indicated increasingly negative implicit attitudes toward aggression. The IAT was also the strongest predictor of cortisol levels. Diffusion-model analysis was the best way to demonstrate IAT validity. Implicit measures revealed the trauma-related changes of cognitive structures.

Keywords: posttraumatic stress disorder, appetitive aggression, Implicit Association Test (IAT), combatants

Twenty-first century "modern warfare" is often characterized by violence between rivaling groups within state borders and by the recruitment of children as soldiers. Children are involved as active fighters in over 75% of the world's armed conflicts (Twum-Danso, 2003). The UN estimates that up to 300,000 child soldiers – that is, any person under the age of 18 years who is part of armed forces – are currently fighting in 50 different states around the world (Singer, 2001). For over a decade, Uganda's Lord's Resistance Army (LRA), a military movement operating in Northern Uganda and adjacent countries, has forcibly abducted over 12,000 children to turn them into soldiers for its fight against the Ugandan government. The mental health consequences of being exposed to extreme violence have been extensively investigated: In line with a dose-response effect, repeated exposure to different traumatic event types increases the risk for trauma spectrum disorders, foremost posttraumatic stress disorder (PTSD) and affective disorders, such as Major Depression

(Schauer & Elbert, 2010). The dysregulation of the hypothalamus-pituitary-adrenal (HPA) axis is a key feature of a range of trauma-related symptoms (de Kloet, Joëls, & Holsboer, 2005). The HPA axis describes a set of interactions between the hypothalamus, the pituitary gland, and the adrenal gland, which results in the release of its effector cortisol. For example, patients with a history of chronic traumatization displayed altered cortisol responses after exposure to acute stressors (Heim et al., 2000). Consequently, serious mental problems can be found among child soldiers as well (Betancourt, Brennan, Rubin-Smith, Fitzmaurice, & Gilman, 2010). A study with former child soldiers in Northern Uganda suggested that PTSD in severely traumatized individuals who continue to live under stressful conditions might be associated with general hypercortisolism (Steudte et al., 2011). Many juvenile war survivors not only suffer from PTSD and/or depressive symptoms, but also show an increased level of aggressive and disruptive behavior (Shaw, 2003).



Aggressive behavior in the aftermath of trauma exposure has often been traced back to increased anger or a general hyperarousal as one of the trauma symptoms (e.g., Jakupcak et al., 2007). However, there is also another form of aggressive behavior that is not linked to merely increased impulsivity or defensive responses toward threats. This aggression subtype, called appetitive aggression, is found rather in combatants and perpetrators of serious atrocities. It describes the phenomenon that violence is perceived as self-rewarding, appealing, and exciting without being linked to pathological aggressive behavior (Elbert, Weierstall, & Schauer, 2010). Whereas a moral perspective insists that aggression in most cases poses a problem for the human species that needs to be solved, contemporary trends in aggression research also consider its functionality and adaptive value (Duntley & Buss, 2011). In line with such a conception, we demonstrated that appetitive aggression facilitates the adaptation to an adverse environment and may serve as a resilience factor against the development of trauma symptoms, by altering the processing of violence cues (e.g., Hecker, Hermenau, Maedl, Schauer, & Elbert, 2013; Weierstall, Huth, Knecht, Nandi, & Elbert, 2012; Weierstall, Schaal, Schalinski, Dusingizemungu, & Elbert, 2011). Instead of eliciting traumatic processing of violence cues (e.g., triggering fear), events such as witnessing serious atrocities or being under threat to life during combat rather lose their frightening connotation. In its place, combatants develop an attraction to participating in cruel acts; they experience violence more and more as self-rewarding. Joining military forces at a young age and perpetrating violence intensify this appetite for aggression (Hecker, Hermenau, Maedl, Elbert, & Schauer, 2012; Nandi, Crombach, Bambonye, Elbert, & Weierstall, 2015). Unsurprisingly, the assessment of appetitive aggression with questionnaires is plagued by two major difficulties: First, acknowledging being attracted to cruelty and enjoying violent behavior is usually a social taboo and may result in shame or criminal prosecution – maybe even more so in crisis regions such as Rwanda that have just recently been pacified (Weierstall et al., 2011). Second, not everyone is aware of his or her own affect in violent situations, and most likely some parts of our affective responses to aggressive behavior are beyond our capacity for conscious perception or remain unconscious due to self-censorship (Bluemke & Teige-Mocigemba, 2015; Bluemke & Zumbach, 2012). To mitigate these difficulties, indirect methods for assessing aggression rely on associative impulses rather than deliberate reflection. Implicit measures assess attitudes, for instance, via speeded-classification tasks. Assessing appetitive aggression via automatic associations would effectively reduce blatant desirable responding in a socially sanctioned domain.


cannot introspectively access or find hard to deliberately control (Strack & Deutsch, 2004; Nosek & Smyth, 2007). Considering that the conflict in Northern Uganda has abated since 2006, individuals' attitudes toward aggressive behavior are most likely buried under social norms, though they still have a strong influence on people's behavior. Implicit measures of appetitive aggression should allow an objective view on the impact of traumatic experiences on people's attitudes toward violence (Richetin, Richardson, & Mason, 2010). The Implicit Association Test (IAT) is the most reliable implicit measure based on response latencies (Greenwald, McGhee, & Schwartz, 1998; LeBel & Paunonen, 2011), and the only currently available implicit measurement procedure to successfully assess aggressiveness (Bluemke & Teige-Mocigemba, 2015; Richetin & Richardson, 2008). For instance, Banse, Messer, and Fischer (2015) reported significant correlations between the IAT and observers' aggression ratings. The IAT incrementally predicted unprovoked aggression too, at least for people high in trait aggression (Bluemke & Friese, 2012; Brugman et al., 2015). Of note, the evidence is limited to a specific variant, the self-concept IAT. The present study was part of a larger project (for other aspects of this project, see Crombach, Weierstall, Hecker, Schalinski, & Elbert, 2013; Weierstall, Schalinski, Crombach, Hecker, & Elbert, 2012); in this part, we administered an attitudinal IAT that is less frequently used in the domain of aggression (cf. Gray, MacCulloch, Smith, Morris, & Snowden, 2003). The aim of the study was to investigate the relationships of attitudinal IATs to trauma-related symptoms (i.e., PTSD symptoms, depressive symptoms) and appetitive aggression in a unique sample of Ugandan war-affected youth, including former child soldiers. We hypothesized that, on the one hand, traumatic experiences and trauma-related symptoms in war-affected young men from Northern Uganda would be related to more negative associations with violence. On the other hand, if appetitive aggression were directly related to automatic associations, this might pull associations with violence more toward the favorable side. As a validation criterion, we used cumulative hair cortisol as a physiological marker of long-lasting stress due to previous traumatic experiences. Similar relationships had been observed with Ugandan child soldiers before (Steudte et al., 2011).

Method

Participants

Sixty-four (out of 83) male Ugandans provided valid IAT data (some participants had to be dropped due to



difficulties with the computer-based task; see "Computation of IAT Effects" below). Out of these 64 men, 29 had been abducted by the LRA and had spent between 2 days and 12 years in the bush (median = 7 months; M = 2.13 years, SD = 3.20). On average, they were 11.45 years old when abducted. They had returned from the bush on average 8.74 years earlier (SD = 4.14; range: 1–17). The other 35 participants had also experienced the war in Northern Uganda, but had not been abducted. Consequently, they had not stayed in the bush as child soldiers (a few hours of abduction at most). The two participant groups were nearly the same age (M = 21.31 years, SD = 2.48 vs. M = 21.54 years, SD = 2.51).

Procedure and Materials

Assessments were conducted individually in a private setting in a camp for internally displaced people in Pabbo, Northern Uganda (September–October 2009). Four clinical psychologists carried out semi-structured interviews with the help of five local interpreters, who had been trained in the concepts of mental disorders and aggression for 2 months (Ertl et al., 2010). All questionnaires were translated into the local language, Acholi, using back-and-forth translation. After the semi-structured interview, participants took the IAT on a laptop computer. For many participants, this was the first time they had seen or used a laptop computer. The Ethical Review Board of the University of Konstanz and the Uganda National Council for Science and Technology had approved this study as part of a larger project on the impact of combat exposure on mental health in former child soldiers. All participants (or, alternatively, two caretakers) gave their informed consent. Participants received a financial compensation of 4,000 Ugandan shillings (US $1.50) for the 2.5-hr assessment.

Traumatic Event Types

Traumatic event types were indexed by a checklist of 34 war- and non-war-related potentially life-threatening events, such as injury by weapon, rape, or accidents (Neuner et al., 2004), including those from the Posttraumatic Diagnostic Scale (Foa, Cashman, Jaycox, & Perry, 1997; see below). The number of times a specific event had been experienced was not assessed; measuring event types provides an accurate and practical measure of trauma experiences (Wilker et al., 2015). We initially distinguished event types participants had experienced themselves from those they had witnessed (Neuner et al., 2004). On average, participants reported 16.38 (range: 5–28) different traumatic event types altogether. Previously collected data from Uganda showed that the event list had high test-retest reliability (r = .73), significant accordance with the Composite


International Diagnostic Interview (CIDI) Event List (Ertl et al., 2010), and correlated with cortisol in hair as an indicator of chronic stress (Steudte et al., 2011).

PTSD Symptom Severity

The validated Acholi version of the widely used Posttraumatic Diagnostic Scale (PDS) in its interview form (Ertl et al., 2010; Foa et al., 1997) assessed PTSD symptom severity. Each of the 17 items corresponds to one PTSD symptom specified in DSM-IV, with ratings ranging from 0 (= "never") to 3 (= "5 times per week or more/very severe/nearly always"). Participants evaluated the severity of PTSD symptoms in the past four weeks with regard to their most stressful life event.

Depression

The short version of the Hopkins Symptom Checklist (HSCL; Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974) was used to assess the extent of depression. It is a valid screening instrument for 15 depression symptoms, available for use in many languages. It is also suitable for samples of refugees in post-conflict countries, including Uganda (cf. Pfeiffer & Elbert, 2011; Roberts, Ocaka, Browne, Oyok, & Sondorp, 2008; Vinck, Pham, Stover, & Weinstein, 2007; Winkler et al., 2015). The respective HSCL section can also be used with a locally validated depression cut-off (Ertl et al., 2010).

Cortisol

Following Steudte et al. (2011), we used cumulative hair cortisol as a known physiological correlate of stress due to habitually elevated cortisol levels, for instance, due to re-experiencing traumatic episodes. Close-to-the-scalp hair strands of up to 3 cm length were taken (posterior vertex position) to estimate the cumulative cortisol exposure across the previous 15 weeks or less (Loussouarn, 2001). In line with the procedure reported by Davenport, Tiefenbacher, Lutz, Novak, and Meyer (2006) and Kirschbaum, Tietze, Skoluda, and Dettenborn (2009), the hair strands were first washed to remove contaminants; cortisol molecules were then extracted chemically from the hair by means of steroid extraction, and an immunoassay with chemiluminescence detection was run to quantify the cortisol concentration (CLIA, IBL-Hamburg, Germany).

Number of Offense Types Committed

We assessed aggressive behavior via the number of different types of committed offenses (OFF), perpetrated either individually or as part of a group. The checklist of 17 different offense types ranged from physical assault to rape or killings. Each offense type was coded as 1 (= committed) or 0 (= not committed), and the total score represented the number of different offense types committed.
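The checklist-based measures described above reduce to simple counts and sums. As a minimal sketch, with purely hypothetical column names that are not the authors' data layout or analysis code, the scores could be derived as follows:

```python
import pandas as pd

def score_checklists(df: pd.DataFrame) -> pd.DataFrame:
    """Derive summary scores from hypothetical 0/1 checklist items and PDS ratings."""
    scores = pd.DataFrame(index=df.index)
    # traumatic event types: number of different event types endorsed (not frequencies)
    scores["t_exp"] = df.filter(like="event_exp_").sum(axis=1)    # experienced, 0/1 items
    scores["t_witn"] = df.filter(like="event_witn_").sum(axis=1)  # witnessed, 0/1 items
    scores["t_total"] = scores["t_exp"] + scores["t_witn"]
    # PTSD symptom severity: sum of the 17 PDS items rated 0-3
    scores["pds"] = df.filter(like="pds_").sum(axis=1)
    # offense types: count of the 17 offense items coded 1 = committed, 0 = not committed
    scores["off"] = df.filter(like="off_").sum(axis=1)
    return scores
```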



General Aggression

The Buss and Perry Aggression Questionnaire (BPAQ; Buss & Perry, 1992) serves as a reliable and valid quasi-standard index of overall aggressiveness (e.g., Collani & Werner, 2005). On 5-point Likert scales, participants indicated for 16 culturally adapted items how much they agreed or disagreed with statements relating to the facets of physical aggression, verbal aggression, hostility, and anger.

Appetitive Aggression

The Appetitive Aggression Scale (AAS; Weierstall & Elbert, 2011) is a relatively recent measure that has been validated with over 1,600 ex-combatants and has shown good psychometric properties. Its 15 items assess participants' perceptions while committing acts of violence (e.g., "Is it exciting for you if you make an opponent really suffer?" or "Once fighting has started, do you get carried away by the violence?"), rated on 5-point Likert scales (0 = disagree; 4 = agree; range: 0–51). Previous analyses showed that the scale sum score is reliable (α = .85) and represents a distinct construct of human aggression (32% of variance explained by a first factor; Weierstall & Elbert, 2011).

Implicit Association Test (IAT) Procedure

The attitudinal IAT is a computer-based measure for the assessment of attitudes that are otherwise prone to social desirability (Greenwald et al., 1998). Based on objective response latencies, it aims to measure participants' automatic associations between the concepts of violence (violent vs. peaceful) and valence (good vs. bad). The procedure requires participants to sort stimuli presented in random order into one of four categories, using only two response keys. Following Gray and colleagues (2003), in one crucial block of trials evaluatively compatible categories are mapped onto shared response keys ("violent+bad," "peaceful+good"). Another block combines the same categories in an incompatible manner ("violent+good," "peaceful+bad"). In both blocks, the stimuli have to be categorized by pressing the appropriate left or right response key as quickly as possible without error. Necessary cultural adaptations of the stimulus materials included, first, representing the categories on the computer screen with symbols and, second, using pictures instead of words as test stimuli. "Violent" and "peaceful" were represented by a fist and a handshake, "good" and "bad" by a thumbs-up and a thumbs-down, all of them culturally valid symbols. Violent and peaceful stimuli, matched for authorship, background scenes, and complexity, were 10 pairs of pencil drawings that reflected typical Ugandan social situations. They were selected out of a pool of 40 drawings according to their potential to best represent violence and its counter-category, as rated by 11 Ugandan men.


Valence stimuli were photos of 10 happy and 10 angry facial expressions, posed by the same 10 Ugandan men, none of whom were known to our participants. The procedure was programmed in PsyScope X Build 53 (Cohen, MacWhinney, Flatt, & Provost, 1993). The setup encompassed seven blocks of speeded classification (Greenwald, Nosek, & Banaji, 2003). Apart from typical practice blocks (20 trials each), one critical block assessed the normatively incompatible association of violent with good, while a compatible block coupled violent with bad (40 trials each). Counterbalancing the order of block compatibility and the left/right position of categories did not affect any of the following conclusions. Keyboard buttons were marked accordingly to facilitate responses.

Computation of IAT Effects

Higher scores reflected more positive implicit attitudes toward violence; yet, just like the procedural setup, the computation of IAT effects required adjustments for participants' education level. None of the participants had ever used a computer, and over half of them had attended school for less than 4 years (primary school). To deal with the low computer literacy, (a) three different algorithms for computing the IAT effect were compared, and (b) procedural adjustments to the typical computation of these algorithms were necessary. As regards the algorithms, we computed (1) IAT effects as conventional difference scores based on the mean reaction times of correct responses in compatible and incompatible blocks (IAT-RT effects), after dropping latencies outside a response window from 300 to 3,000 ms (Greenwald et al., 1998); (2) an improved algorithm, the so-called IAT-D score (Greenwald et al., 2003), which typically converts erroneous responses into latencies by adding latency penalties; its advantage over the algorithm for IAT-RT effects is that it reduces nuisance variance by standardizing each individual's difference score by his or her pooled standard deviation; and (3) IAT-DM effects on the basis of diffusion-model analysis, extracting the speed of information processing in compatible and incompatible blocks by simultaneously modeling the response latencies of accurate and inaccurate responses (Klauer, Voss, Schmitz, & Teige-Mocigemba, 2007). Given that the young men had low education levels and difficulties when working on the IAT, a substantial number of errors occurred. Diffusion-model analysis is ideally suited to incorporate the information available from erroneous trials as well, while minimizing the influences of decision bias and of the duration of general non-decisional (executive) processes. As regards the necessary procedural adjustments for the computation of the three algorithms in light of the present sample, we first had to exclude participants with apparent limitations in motivation or cognitive skills (n = 19), sparing




Table 1. Intercorrelations of the IAT effects (RT, D, DM), cortisol (CORT), traumatic event types witnessed (T-witn.), experienced (T-exp.), and total (T-total), PTSD symptom severity (PDS), depression (HSCL), offense types committed (OFF), general aggression (BPAQ), and appetitive aggression (AAS). [Correlation matrix not legible in the source.]

Notes. Above diagonal: N = 64; below diagonal: N = 62 without misfitting participants according to diffusion-model analysis. N = number of participants; IAT = Implicit Association Test; RT = reaction-time based IAT analysis; D = improved scoring algorithm based IAT analysis; DM = diffusion-model based IAT analysis; CORT = cortisol; T-witn. = traumatic event types witnessed; T-exp. = traumatic event types experienced; T-total = traumatic event types total; PDS = Posttraumatic Diagnostic Scale; HSCL = Hopkins Symptom Checklist; OFF = number of offense types committed; BPAQ = Buss & Perry Aggression Questionnaire; AAS = Appetitive Aggression Scale. †p < .10; *p < .05; **p < .01; ***p < .001.

64 participants. The excluded participants had obvious difficulties following task instructions promptly: They responded more than 1 SD more slowly (> 1,774 ms) than the typical respondent (M = 1,472 ms, SD = 329), and they had more than 50% incorrect responses or response latencies that fell outside the 300–3,000 ms response window. For these participants, less than half of the data were valid, rendering the IAT procedure itself invalid, the computation of IAT effects unreliable, and the imputation of missing data unfeasible (cf. Nosek & Smyth, 2007). Second, although it is standard nowadays to include the double-discrimination practice blocks (3 and 6) in the computation of IAT effects, these practice blocks had very low criterion correlations. As participants had severe difficulties adjusting to the unfamiliar technical equipment, the unknown procedure, and the complex IAT tasks, all three algorithms were based exclusively on the trials of the main critical IAT blocks (4 and 7). As regards the computation of standard IAT-RT effects (algorithm 1), the substantial number of implausibly short and long latencies (outside the 300–3,000 ms interval) prevented us from recoding slow/fast outliers to the boundary values; instead, we treated these values as missing data. With regard to the computation of IAT-D effects (algorithm 2), we did not apply the proposed latency penalties for erroneous responses because, in our sample, errors mostly reflected the difficulty of following task instructions and were not indicative of proper associative processes (following a suggestion by Bluemke & Zumbach, 2012). Given that we faced a massive loss of data and potentially limited skills among the retained participants, valid IAT effects are most likely to be expected on the basis of

diffusion-model analysis (IAT-DM effects; algorithm 3). Diffusion models (Ratcliff, 1978) are suitable for analyzing data from binary choice tasks (e.g., IATs). They allow estimating the performance of information processing in the critical IAT blocks in a cognitively process-pure fashion (via the so-called drift rate, v, representing the speed of information uptake in either the compatible or the incompatible block). A full account is beyond the scope of the present paper (Voss, Nagler, & Lerche, 2013, provide a quick overview; for a full account of diffusion modeling applied to IATs, see Klauer et al., 2007). The conventional IAT-RT effect is a blend of drift rates, non-decision components, and speed-accuracy settings; the diffusion-model-based IAT-DM effect isolates the parameter of interest. We used the free software Fast-dm with the Kolmogorov-Smirnov method for parameter estimation (Voss & Voss, 2007), allowing response latencies from slow responders between 300 and 5,000 ms, as these can be accommodated by diffusion models (Voss et al., 2013). Fast-dm provides chi-square distributed goodness-of-fit tests for each participant to check whether the assumptions of the diffusion model hold. As the decision-making processes of two participants did not comply with the diffusion-model assumptions, they were excluded (compare Table 1 values above and below the diagonal).
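To make these scoring adjustments concrete, the following minimal sketch shows how the three effects could be computed for a single participant. The trial-record layout (block, rt_ms, correct) and all names are illustrative assumptions rather than the original analysis code, and the closed-form EZ-diffusion solution (Wagenmakers, van der Maas, & Grasman, 2007) is used here only as a rough stand-in for the Fast-dm (Kolmogorov-Smirnov) estimation that was actually employed.

```python
import numpy as np

def ez_drift(rt_s, correct, s=0.1):
    """Closed-form EZ-diffusion drift rate (Wagenmakers et al., 2007).
    rt_s: latencies in seconds (correct and error trials); correct: boolean array.
    Only a rough stand-in for the Fast-dm estimation used in the study."""
    n = len(correct)
    pc = correct.mean()
    if pc in (0.0, 1.0):                      # edge correction so the logit is defined
        pc = min(max(pc, 1.0 / (2 * n)), 1.0 - 1.0 / (2 * n))
    if pc == 0.5:
        pc += 1.0 / (2 * n)
    vrt = np.var(rt_s[correct], ddof=1)       # variance of correct latencies
    logit = np.log(pc / (1.0 - pc))
    x = logit * (logit * pc**2 - logit * pc + pc - 0.5) / vrt
    return np.sign(pc - 0.5) * s * x**0.25

def score_iat(trials):
    """Three IAT effects for one participant, following the adjustments above
    (critical blocks only, 300-3,000 ms window, out-of-window latencies treated
    as missing, no error penalties for the D score). Positive values indicate
    relatively faster violent+good sorting, i.e., more favorable implicit
    associations with violence."""
    rt, ok = {}, {}
    for blk in ("compatible", "incompatible"):   # compatible = violent+bad block
        sub = [t for t in trials if t["block"] == blk]
        rt[blk] = np.array([t["rt_ms"] for t in sub], dtype=float)
        ok[blk] = np.array([t["correct"] for t in sub], dtype=bool)

    valid = {b: (rt[b] >= 300) & (rt[b] <= 3000) & ok[b] for b in rt}
    # participant-level exclusion: fewer than half valid trials in either block
    if any(valid[b].mean() < 0.5 for b in rt):
        return None

    # (1) IAT-RT effect: difference of mean correct in-window latencies
    m = {b: rt[b][valid[b]].mean() for b in rt}
    iat_rt = m["compatible"] - m["incompatible"]

    # (2) IAT-D effect: the same difference standardized by the pooled SD of all
    #     valid latencies from both critical blocks
    pooled_sd = np.concatenate([rt[b][valid[b]] for b in rt]).std(ddof=1)
    iat_d = iat_rt / pooled_sd

    # (3) IAT-DM effect: drift-rate difference between blocks (EZ approximation,
    #     300-5,000 ms window, using correct and error trials)
    win = {b: (rt[b] >= 300) & (rt[b] <= 5000) for b in rt}
    v = {b: ez_drift(rt[b][win[b]] / 1000.0, ok[b][win[b]]) for b in rt}
    iat_dm = v["incompatible"] - v["compatible"]

    return {"iat_rt": iat_rt, "iat_d": iat_d, "iat_dm": iat_dm}
```

Note that out-of-window latencies are simply treated as missing and no error penalties enter the D score, mirroring the adjustments described above.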

Results

Table 2 summarizes the descriptives of the present set of variables. As a preliminary analysis, we describe the impact of traumatizing experiences, beginning with the differences




Table 2. Descriptives of trauma-related variables (N = 64)

                                          Total (N = 64)                Non-abducted (n = 35)   Abducted (n = 29)
Measure                                   Min    Max     M       SD     M       SD              M       SD
IAT effect (RT)                           …      …       …       …      …       …               …       …
IAT effect (D)                            …      …       …       …      …       …               …       …
IAT effect (DM)                           …      …       …       …      …       …               …       …
Cortisol (nmol/L; CORT)                   0.02   48.43   13.57   8.01   12.83   6.97            14.46   9.17
Traumatic events witnessed (T-witn.)      3      13      8.64    2.41   7.49    2.13            10.03   1.96
Traumatic events experienced (T-exp.)     1      17      7.73    3.94   5.11    1.89            10.90   3.42
Traumatic events total (T-total)          5      28      16.38   5.84   12.60   3.35            20.93   4.87
PTSD Symptom Severity (PDS)               0      21      4.86    5.15   3.77    5.02            6.17    5.09
Depression (HSCL)                         0      38      9.36    8.63   8.00    7.93            11.00   9.29
Offenses committed (OFF)                  0      16      5.61    3.93   4.06    2.26            7.48    4.67
Buss & Perry Aggression (BPAQ)            2      53      23.06   12.52  21.94   13.47           24.41   11.35
Appetitive Aggression (AAS)               0      51      17.70   14.34  13.91   11.32           22.28   16.35

Notes. N = number of participants. IAT = Implicit Association Test; RT = reaction-time based IAT analysis; D = improved scoring algorithm based IAT analysis; DM = diffusion-model based IAT analysis; CORT = cortisol; T-witn. = traumatic event types witnessed; T-exp. = traumatic event types experienced; T-total = traumatic event types total; PDS = Posttraumatic Diagnostic Scale; HSCL = Hopkins Symptom Checklist; OFF = number of offense types committed; BPAQ = Buss & Perry Aggression Questionnaire; AAS = Appetitive Aggression Scale.

between abducted and non-abducted Ugandan men. In total, abducted participants had witnessed or experienced almost twice as many traumatic event types as non-abducted ones, t(62) = 8.08, p < .001, Cohen's d = 2.05. According to PDS scores, they also reported higher PTSD symptom severity than non-abducted participants, though not to the extent that a dose-response effect would suggest, t(62) = 1.89, p = .06, d = 0.48. They hardly differed in reported symptoms of depression (HSCL scores), t(62) = 1.39, p = .17, d = 0.23. Rather than being generally more aggressive (BPAQ scores), t < 1, p = .44, they displayed characteristically higher appetitive aggression (AAS scores), t(62) = 2.41, p = .02, d = 0.61. As evident from the intercorrelations (Table 1), the number of traumatic event types correlated with the degree of traumatization, depression scores, and aggression scores across all war-affected young men. Exposure to traumatic experiences was positively related to PTSD symptom severity and, at the same time, to appetitive aggression. Note that the BPAQ and AAS did not merely represent the same kind of aggressiveness: the number of committed offense types (OFF) was more closely associated with appetitive aggression (AAS) than with general aggression (BPAQ). This underscores the motivational-affective quality of appetitive aggression, which theoretically serves as an instigator of aggressive acts and, at the same time, as a coping mechanism for trauma. IAT scores were reliable (odd-even split-half correlations of .88–.93). The three IAT algorithms converged strongly, albeit imperfectly. Diffusion modeling handled the noise in the data of the young Ugandan men best because,

despite similar reliability estimates, the sophisticated IAT-DM effect consistently yielded the highest criterion correlations. Note that cortisol levels were not significantly associated with any self-report measure (|r| = .00–.15). If there was a measure that best reflected the stressful experiences encapsulated in hair strands, it was the IAT (r = .22, p = .08), yet only when IAT scores were freed from as much error variance as possible (IAT-DM scores), and only in the analysis of participants for whom the diffusion model fitted (N = 62; see Table 1, below the diagonal). Implicit attitudes toward violence were only weakly related to PTSD symptom severity (PDS). Yet, participants' traumatic encounters predicted negative implicit attitudes. The greater the exposure to traumatic events, and to self-experienced events in particular, the more negative were the associations with violence (IAT scores). At the same time, a higher number of traumatic event types correlated with higher self-reported general and appetitive aggression. Likewise, the number of committed offenses correlated negatively with implicit attitudes toward violence (IAT scores), whereas it correlated positively with participants' self-reported aggression. An additional analysis of partial correlations showed that the relationships between the IAT and the explicit measures were not simply masked by group differences between abducted and non-abducted men. Rather, the relationships of IAT scores with general and appetitive aggression scores became weaker. (The relationships between the IAT and the trauma-related variables were, of course, slightly attenuated after partialling out relevant group differences.)
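For readers who want to reproduce this type of analysis, the following sketch collects the standard formulas behind the statistics reported here (group comparisons with Cohen's d, odd-even split-half reliability, and partial correlations controlling for abduction status). Variable names are hypothetical and this is not the authors' analysis script; whether the reported split-half coefficients were additionally stepped up by the Spearman-Brown formula is not stated in the text.

```python
import numpy as np
from scipy import stats

def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / pooled

def split_half(odd_half, even_half):
    """Odd-even split-half correlation, stepped up to full length via Spearman-Brown."""
    r = stats.pearsonr(odd_half, even_half)[0]
    return 2 * r / (1 + r)

def partial_corr(x, y, covariate):
    """Correlation of x and y after partialling a covariate (e.g., abduction status,
    coded 0/1) out of both variables via simple linear-regression residuals."""
    rx = x - np.polyval(np.polyfit(covariate, x, 1), covariate)
    ry = y - np.polyval(np.polyfit(covariate, y, 1), covariate)
    return stats.pearsonr(rx, ry)[0]
```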



Discussion

The current research contributes to our understanding of the relationship between experiencing and perpetrating violence, particularly for those who were forced to grow up in war zones. Implicit attitudes toward violence were unrelated to an explicit measure of general aggression (BPAQ). Yet, for the first time, we demonstrated that indirect measurement techniques can reveal how traumatic events imprint on young men's associative structures: Higher exposure to traumatic experiences and violent offenses in the past was associated with more negative implicit associations with violence. At the same time, in line with our expectations, general aggressiveness and specifically appetitive aggression were positively associated with violent experiences and committed offenses, supporting previous findings about the validity of appetitive aggression as a construct (Nandi et al., 2015; Weierstall, Huth, et al., 2012). Whereas the extent of traumatic events predicted negative implicit attitudes, it also predicted positive explicit attitudes (appetitive aggression). Very revealingly, and unlike what one might assume on the basis of typically positive implicit-explicit relationships (Hofmann, Gawronski, Gschwendner, Le, & Schmitt, 2005; Nosek, 2005; Nosek & Smyth, 2007), admitting to being high in appetitive aggression (i.e., positive explicit attitudes toward violence) was not reflected in positive associations with violence according to the IAT. In other words, participants high in appetitive aggression did not have more favorable implicit associations with violence; if anything, they tended to have negative ones (r = −.20; Table 1). Such a negative implicit-explicit correlation is an atypical finding in the domain of implicit attitudes (e.g., Hofmann et al., 2005). We attribute this to the special character of our sample and the experiences among young Ugandan men. This dissociation might be highly relevant for understanding appetitive aggression as a coping mechanism and inspires further research. Note, however, that appetitive aggression and traumatic events themselves were correlated substantially, presumably because appetitive aggression serves an adaptive function in violent environments (see Weierstall, Huth, et al., 2012; Weierstall, Schalinski, et al., 2012). In a cross-sectional design, the mere variance overlap between appetitive aggression and traumatic events prevents a clear-cut answer as to whether appetitive aggression indeed shifts implicit violence associations in a positive direction. Of course, implicit attitudes can be subject to confluent dynamic processes: developing more negative implicit attitudes toward violence in response to traumatic experiences, and developing more positive implicit attitudes as a protective mechanism that prevents one from becoming overly traumatized by one's own atrocities.


We interpret the overall pattern in line with research on the consequences of trauma, where a substantial alteration in the processing of violence has been described (e.g., Elbert & Schauer, 2002). Cues that trigger previous traumatic experiences typically activate a fear network in the brain, thereby provoking a fear response and fostering the clinical symptoms of trauma-related disorders such as PTSD, including hyperarousal and repetitive reliving of traumatic memories, as well as depression. This reaction is rooted in altered brain mechanisms and physiological correlates, as shown in the altered cortisol responses of different severely traumatized samples, including child soldiers (Heim et al., 2000; Steudte et al., 2011). We propose that the same mechanism is responsible for building up appetitive aggression in parallel: Despite a general tendency to predominantly associate violence with bad, the change of explicit attitudes toward experiencing aggression as appetitive must be rooted in the same pathways responsible for the altered processing of violence cues. The strengthened (faster) association between violence and bad in those who experienced more traumatic events points to newly formed connections in the brains of traumatized people who acted as perpetrators (Elbert et al., 2011). Perpetrators perceive violence differently from victims. Elbert et al. (2010) proposed that the violence-related association network competes with the fear network. The more appetite for violence perpetrators develop, thereby experiencing feelings of control, power, fascination, and lust while harming or killing someone, the more cues (such as blood, pain, and screams) become integrated into a positive association network while having reduced impact on the fear network. At first glance it may seem surprising that those who reported significantly more appetitive aggression did not have more positive implicit attitudes toward violence. Considering, however, that the IAT measures a general attitude toward violent situations, whereas the AAS specifically assesses how self-committed violence is perceived, the seeming contradiction dissolves. Note that the IAT stimuli, unlike the AAS items, were ambiguous with regard to who the perpetrator was. From a test-taker's perspective, the IAT did not differentiate between perpetrating and victimizing situations. Child soldiers very often live in insecure and violent environments and are exposed to many traumatic events in active and passive ways (Hecker et al., 2012). That formerly abducted participants had experienced more traumatic events than the non-abducted ones may be responsible for the relationship between negative implicit attitudes and higher cortisol levels; but at the same time, their appetitive aggression counteracted the stress they experienced. It is possible that the attraction to violence among our participants decreased over time. The time spent in civil




society after returning from the bush amounted to more than 8 years on average. It is conceivable that any observed relationships represent only lower bounds for the estimated strengths of these effects (see also Hermenau, Hecker, Schaal, Maedl, & Elbert, 2013). But this also shows how profound the effects are and how persistently individuals are affected after engaging in violence and committing atrocities. Our findings highlight the importance of considering appetitive aggression when reintegrating former child soldiers into society.


Limitations


There are some caveats in our research. Some researchers (including some of us) have voiced concerns about the scientific merits of implicit attitudes toward violence and aggression (Bluemke & Zumbach, 2012). One would usually expect positive evidence for IAT validity if the procedure were based on the aggressive self-concept, not on attitudes toward aggression (Bluemke & Teige-Mocigemba, 2015). Yet, in the present case, our participants had been subject to the strongest life events we can think of, unlike, say, participants from the normal population who are merely exposed to virtual computer-game violence. Such intense experiences may be a precondition for rendering the implicit attitude toward violence a reliable and valid indicator in the domain of aggression (cf. Gray et al., 2003). Furthermore, our sample was limited to young men with little education. On the one hand, this may have limited participants' capacity to control associative impulses during the IAT task, fostering the validity of the attitudinal IAT. On the other hand, the typical IAT procedure was less feasible than usual; instead, adjustments in the procedure were required, affecting the stimulus selection, the task labels, and the data-analytic strategy. It is unclear to what extent our findings generalize to other samples. Another limitation is the small sample size. Our sample contained participants who are difficult to recruit, former abductees as well as war-affected but not formerly abducted youth. Computing refined analyses separately for each group is not feasible, as correlations for such small group sizes (Ns ≤ 35) would not be reliable. It is also impossible to control for any moderating impact of subtle group differences on the obtained correlation coefficients (e.g., via interaction terms in regression models).

Conclusion

The present study demonstrated not only the methodological difficulties, but also the feasibility of applying implicit measures in field studies with populations that have handled neither computers nor reaction-time tasks before. It provides a perspective on the usefulness of computerized assessment of psychological measures beyond clinical interviews and self-rating instruments. Yet, the attitudinal IAT was related to traumatic repercussions rather than to aggression. Only psychological interventions that consider the fundamentally altered processing of violence cues (e.g., Hecker, Hermenau, Crombach, & Elbert, 2015) will help the successful reintegration of war-affected youth into civil societies.

References

Banse, R., Messer, M., & Fischer, I. (2015). Predicting aggressive behavior with the Aggressiveness-IAT. Aggressive Behavior, 41, 65–83. doi: 10.1002/ab.21574 Betancourt, T. S., Brennan, R. T., Rubin-Smith, J., Fitzmaurice, G. M., & Gilman, S. E. (2010). Sierra Leone’s former child soldiers: A longitudinal study of risk, protective factors, and mental health. Journal of the Academy of Child and Adolescent Psychiatry, 49, 606–615. doi: 10.1016/j.jaac.2010.03.008 Bluemke, M., & Friese, M. (2012). On the validity of idiographic and generic self-concept Implicit Association Tests: A core concept model. European Journal of Personality, 26, 515–528. doi: 10.1002/per.850 Bluemke, M., & Teige-Mocigemba, S. (2015). Automatic processes in aggression: Conceptual and measurement issues. Aggressive Behavior, 41, 44–50. doi: 10.1002/ab.21567 Bluemke, M., & Zumbach, J. (2012). Assessing aggressiveness via reaction times online. Cyberpsychology: Journal of Psychosocial Research on Cyberspace, 6, 5–. doi: 10.5817/CP2012-1-5 Brugman, S., Lobbestael, J., Arntz, A., Cima, M., Schuhmann, T., Dambacher, F., & Sack, A. T. (2015). Identifying cognitive predictors of reactive and proactive aggression. Aggressive Behavior, 41, 51–64. doi: 10.1002/ab.21573 Buss, A. H., & Perry, M. (1992). The aggression questionnaire. Journal of Personality and Social Psychology, 63, 452–459. doi: 10.1037/0022-3514.63.3.452 Collani, G., & Werner, R. (2005). Self-related and motivational constructs as determinants of aggression. An analysis and validation of a German version of the Buss-Perry Aggression Questionnaire. Personality and Individual Differences, 38, 1631–1643. doi: 10.1016/j.paid.2004.09.027 Cohen, J., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: An interactive graphical system for designing and controlling experiments in the psychology laboratory using Macintosh computers. Behavioral Research Methods, Instrumentation, and Computers, 25, 257–271. Crombach, A., Weierstall, R., Hecker, T., Schalinski, I., & Elbert, T. (2013). Social status and the desire to resort to violence – Using the example of Uganda’s former child soldiers. Journal of Aggression, Maltreatment, and Trauma, 22, 559–575. doi: 10.1080/10926771.2013.785458 Davenport, M. D., Tiefenbacher, S., Lutz, C. K., Novak, M. A., & Meyer, J. S. (2006). Analysis of endogenous cortisol concentrations in the hair of rhesus macaques. General and Comparative Endocrinology, 147, 255–261. doi: 10.1016/j.ygcen.2006. 01.005 de Kloet, E. R., Joëls, M., & Holsboer, F. (2005). Stress and the brain: From adaptation to disease. Nature Reviews Neuroscience, 6, 463–475. doi: 10.1038/nrn1683 Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974). The Hopkins Symptom Checklist (HSCL):





A self-report symptom inventory. Behavioral Science, 19, 1–15. doi: 10.1002/bs.3830190102 Duntley, J. D., & Buss, D. M. (2011). Homicide adaptations. Aggression and Violent Behavior, 16, 399–410. doi: 10.1016/ j.avb.2011.04.016 Elbert, T., & Schauer, M. (2002). Burnt into memory. Nature, 419, 883. doi: 10.1038/419883a Elbert, T., Schauer, M., Ruf, M., Weierstall, R., Neuner, F., Rockstroh, B., & Junghöfer, M. (2011). The tortured brain: Imaging neural representations of traumatic stress experiences using RSVP with affective pictorial stimuli. The Journal of Psychology, 219, 167–174. doi: 10.1027/21512604/a000064 Elbert, T., Weierstall, R., & Schauer, M. (2010). Fascination violence – On mind and brain of man hunters. European Archives of Psychiatry and Clinical Neuroscience, 260, 100–105. doi: 10.1007/s00406-010-0144-8 Ertl, V., Pfeiffer, A., Saile, R., Schauer, E., Elbert, T., & Neuner, F. (2010). Validation of a mental health assessment in an African conflict population. Psychological Assessment, 22, 318–324. doi: 10.1037/a0018810 Foa, E. B., Cashman, L., Jaycox, L., & Perry, K. (1997). The validation of a self-report measure of posttraumatic stress disorder: The Posttraumatic Diagnostic Scale. Psychological Assessment, 9, 445–451. doi: 10.1037/1040-3590.9.4.445 Gray, N. S., MacCulloch, M. J., Smith, J., Morris, M., & Snowden, R. J. (2003). Forensic psychology: Violence viewed by psychopathic murderers. Nature, 423, 497–498. doi: 10.1038/ 423497a Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The Implicit Association Test. Journal of Personality and Social Psychology, 74, 1464–1480. doi: 10.1037/0022-3514.74.6.1464 Greenwald, A. G., Nosek, B. A., & Banaji, M. R. (2003). Understanding and using the Implicit Association Test: I. An improved scoring algorithm. Journal of Personality and Social Psychology, 85, 197–216. doi: 10.1037/0022-3514.85.2.197 Hecker, T., Hermenau, K., Crombach, A., & Elbert, T. (2015). Treating traumatized offenders and veterans by means of Narrative Exposure Therapy. Frontiers in Psychiatry, 6, 80. doi: 10.3389/fpsyt.2015.00080 Hecker, T., Hermenau, K., Maedl, A., Elbert, T., & Schauer, M. (2012). Appetitive aggression in former combatants – Derived from the ongoing conflict in DR Congo. International Journal of Law and Psychiatry, 35, 244–249. doi: 10.1016/j.ijlp. 2012. 02.016 Hecker, T., Hermenau, K., Maedl, A., Schauer, M., & Elbert, T. (2013). Aggression inoculates against PTSD symptom severity – Insights from armed groups in the eastern DR Congo. European Journal of Psychotraumatology, 4, 20070. doi: 10.3402/ejpt. v4i0.20070 Heim, C., Newport, D. J., Heit, S., Graham, Y., Wilcox, M., . . . Nemeroff, C. B. (2000). Pituitary-adrenal and autonomic responses to stress in women after sexual and physical abuse in childhood. The Journal of the American Medical Association, 284, 592–597. doi: 10.1001/jama.284.5.592 Hermenau, K., Hecker, T., Schaal, S., Maedl, A., & Elbert, T. (2013). Addressing post-traumatic stress and aggression by means of narrative exposure – A randomized controlled trial with ex-combatants in the eastern DRC. Journal of Aggression, Maltreatment and Trauma, 22, 916–934. doi: 10.1080/ 10926771.2013.824057 Hofmann, W., Gawronski, B., Gschwendner, T., Le, H., & Schmitt, M. (2005). A meta-analysis on the correlation between the Implicit Association Test and explicit self-report measures. Personality and Social Psychology Bulletin, 31, 1369–1385. doi: 10.1177/0146167205275613



Jakupcak, M., Conybeare, D., Phelps, L., Hunt, S., Holmes, H. A., Felker, B., . . . McFall, M. E. (2007). Anger, hostility, and aggression among Iraq and Afghanistan war veterans reporting PTSD and subthreshold PTSD. Journal of Traumatic Stress, 20, 945–954. doi: 10.1002/jts.20258 Kirschbaum, C., Tietze, A., Skoluda, N., & Dettenborn, L. (2009). Hair as a retrospective calendar of cortisol production: Increased cortisol incorporation into hair in the third trimester of pregnancy. Psychoneuroendocrinology, 34, 32–37. doi: 10.1016/j.psyneuen.2008.08.024 Klauer, K. C., Voss, A., Schmitz, F., & Teige-Mocigemba, S. (2007). Process components of the Implicit Association Test: A diffusion-model analysis. Journal of Personality and Social Psychology, 93, 353–368. doi: 10.1037/0022-3514.93.3.353 LeBel, E. P., & Paunonen, S. V. (2011). Sexy but often unreliable: The impact of unreliability on the replicability of experimental findings with implicit measures. Personality and Social Psychology Bulletin, 37, 570–583. doi: 10.1177/ 0146167211400619 Loussouarn, G. (2001). African hair growth parameters. The British Journal of Dermatology, 145, 294–297. doi: 10.1046/j.13652133.2001.04350.x Nandi, C., Crombach, A., Bambonye, M., Elbert, T., & Weierstall, R. (2015). Predictors of posttraumatic stress and appetitive aggression in active soldiers and former combatants. European Journal of Psychotraumatology, 6, 26553. doi: 10.3402/ejpt. v6.26553 Neuner, F., Schauer, M., Karunakara, U., Klaschik, C., Robert, C., & Elbert, T. (2004). Psychological trauma and evidence for enhanced vulnerability for posttraumatic stress disorder through previous trauma among West Nile refugees. BMC Psychiatry, 4, 34. doi: 10.1186/1471-244X-4-34 Nosek, B. A. (2005). Moderators of the relationship between implicit and explicit evaluation. Journal of Experimental Psychology: General, 134, 565–584. doi: 10.1037/0096-3445. 134.4.565 Nosek, B. A., & Smyth, F. L. (2007). A multitrait-multimethod validation of the Implicit Association Test: Implicit and explicit attitudes are related but distinct constructs. Experimental Psychology, 54, 14–29. doi: 10.1027/1618-3169.54.1.14 Pfeiffer, A., & Elbert, T. (2011). PTSD, depression and anxiety among former abductees in Northern Uganda. Conflict and Health, 5, 14–. doi: 10.1186/1752-1505-5-14 Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108. doi: 10.1037/0033-295X.85.2.59 Richetin, J., & Richardson, D. S. (2008). Automatic processes and individual differences in aggressive behavior. Aggression and Violent Behavior, 13, 423–430. doi: 10.1016/j.avb.2008. 06.005 Richetin, J., Richardson, D. S., & Mason, G. D. (2010). Predictive validity of IAT aggressiveness in the context of provocation. Social Psychology, 41, 27–34. doi: 10.1027/1864-9335/a000005 Roberts, B., Ocaka, K. F., Browne, J., Oyok, T., & Sondorp, E. (2008). Factors associated with post-traumatic stress disorder and depression amongst internally displaced persons in northern Uganda. BMC Psychiatry, 8, 38. doi: 10.1186/1471244X-8-38 Schauer, E., & Elbert, T. (2010). The psychological impact of child soldiering. In E. Martz (Ed.), Trauma rehabilitation after war and conflict (pp. 311–360) doi: 10.1007/978-1-4419-5722-1. New York, NY: Springer Shaw, J. A. (2003). Children exposed to war⁄terrorism. Clinical Child and Family Psychology Review, 6, 237–246. doi: 10.1023/ B:CCFP.0000006291.10180 Singer, P. W. (2001). Caution: Children at war. Parameters, 31, 40–56.




Strack, F., & Deutsch, R. (2004). Reflective and impulsive determinants of social behavior. Personality and Social Psychology Review, 8, 220–247. doi: 10.1207/s15327957pspr0803_1 Steudte, S., Kolassa, I.-T., Stalder, T., Pfeiffer, A., Kirschbaum, C., & Elbert, T. (2011). Increased cortisol concentrations in hair of severely traumatized Ugandan individuals with PTSD. Psychoneuroendocrinology, 36, 1193–1200. doi: 10.1016/ j.psyneuen.2011.02.012 Twum-Danso, A. (2003). Africa’s young soldiers: The co-option of childhood. Monograph, 82, . Retrieved from http://www.issafrica.org/uploads/Mono82.pdf Vinck, P., Pham, P. N., Stover, E., & Weinstein, H. M. (2007). Exposure to war crimes and implications for peace building in northern Uganda. The Journal of the American Medical Association, 298, 543–554. doi: 10.1001/jama.298.5.543 Voss, A., Nagler, M., & Lerche, V. (2013). Diffusion models in experimental psychology: A practical introduction. Experimental Psychology, 60, 385–402. doi: 10.1027/1618-3169/a000218 Voss, A., & Voss, J. (2007). Fast-dm: A free program for efficient diffusion model analysis. Behavioral Research Methods, 39, 767–775. doi: 10.3758/BF03192967 Weierstall, R., & Elbert, T. (2011). The Appetitive Aggression Scale. European Journal of Psychotraumatology, 2, 8430. doi: 10.3402/ ejpt.v2i0.8430 Weierstall, R., Huth, S., Knecht, J., Nandi, C., & Elbert, T. (2012). Attraction to violence as a resilience factor against trauma disorders: Appetitive aggression and PTSD in German World War II veterans. PLoS One, 7, e50891. doi: 10.1371/journal. pone.0050891 Weierstall, R., Schaal, S., Schalinski, I., Dusingizemungu, J. P., & Elbert, T. (2011). The thrill of being violent as an antidote to



posttraumatic stress disorder in Rwandese genocide perpetrators. European Journal of Psychotraumatology, 2, 6345. doi: 10.3402/ejpt.v2i0.6345 Weierstall, R., Schalinski, I., Crombach, A., Hecker, T., & Elbert, T. (2012). When combat prevents PTSD symptoms – Results from a survey with former child soldiers in Northern Uganda. BMC Psychiatry, 12, 41. doi: 10.1186/1471-244X-12-41 Wilker, S., Pfeiffer, A., Kolassa, S., Koslowski, D., Elbert, T., & Kolassa, I.-T. (2015). How to quantify exposure to traumatic stress? Reliability and predictive validity of measures for cumulative trauma exposure in a post-conflict population. European Journal of Psychotraumatology, 6, 28306. doi: 10.3402/ejpt.v6.28306 Winkler, N., Ruf-Leuschner, M., Ertl, V., Pfeiffer, A., Schalinski, I., . . . Elbert, T. (2015). From war to classroom: PTSD and depression in formerly abducted youth in Uganda. Frontiers in Psychiatry, 6, 2. doi: 10.3389/fpsyt.2015.00002

Received November 30, 2016
Revision received December 16, 2016
Accepted December 24, 2016
Published online July 12, 2017

Matthias Bluemke
Department of Survey Design and Methodology
GESIS – Leibniz-Institute for the Social Sciences
B2, 1
68159 Mannheim
Germany
matthias.bluemke@gesis.org



Original Article

Measuring a Mastery Goal Structure Using the TARGET Framework
Development and Validation of a Classroom Goal Structure Questionnaire

Marko Lüftenegger,1 Ulrich S. Tran,2 Lisa Bardach,1 Barbara Schober,1 and Christiane Spiel1

1 Department of Applied Psychology: Work, Education and Economy, Faculty of Psychology, University of Vienna, Austria
2 Department of Basic Psychological Research and Research Methods, Faculty of Psychology, University of Vienna, Austria

Abstract: In prior research, goal structures have been measured as macroscopic and holistic constructs referring to all activities in the classroom setting associated with learning and performing on a meta-level. A more comprehensive approach for identifying concrete classroom structures that should foster students’ mastery goals is provided by the multidimensional TARGET framework with its six instructional dimensions (Task, Autonomy, Recognition, Grouping, Evaluation, Time). However, measurement instruments assessing students’ perceptions of all TARGET dimensions are largely lacking. The main aim of this study was to develop and validate a new student questionnaire for comprehensive assessment of the perceived TARGET classroom structure (the Goal Structure Questionnaire – GSQ). Scales were constructed using a rational-empirical strategy based on classical conceptions of the TARGET dimensions and prior empirical research. The instrument was tested in a study using a sample of 1,080 secondary school students. Findings indicate that the scales are reliable, internally valid, and externally valid in terms of relationships with students’ achievement goals. More concretely, analyses revealed that the TARGET mastery goal structure positively predicts mastery goals, performance approach goals, and an incremental implicit theory of intelligence. No associations were found with performance avoidance goals. Keywords: motivation, achievement goals, classroom, goal structure, TARGET framework

"The most important attitude that can be formed is that of desire to go on learning" (John Dewey, 1938, p. 48).

One of the most important objectives of school is to enable, empower, and equip children to become lifelong learners (Schober, Lüftenegger, Wagner, Finsterwald, & Spiel, 2013). This is crucial both for personal development across the life span and for being able to handle constant change and transition resulting from rapid technological and scientific changes, organizational innovation, and global competition. Schools, and in particular contextual characteristics of the classroom, can promote profoundly different definitions of what teaching and learning are about, and children can adopt them: mastery or performance goals. Mastery goals are related to developing new skills and improving one's level of competence, whereas performance goals focus on demonstrating competence and ability in comparison to others (approach focus), or on avoiding failure and unfavorable judgments of one's ability by others (avoidance focus) (Elliot, 2005). Studies demonstrated that

students show the most beneficial motivational and cognitive patterns when they focus on mastery goals (see Meece, Anderman, & Anderman, 2006). Consequently, researchers investigated characteristics of the classroom context in order to describe the extent and the way in which environmental factors support the adoption of different goals. One prominent example of these efforts is the TARGET framework (Epstein, 1988) with its six instructional strategies or dimensions (Task, Authority, Recognition, Grouping, Evaluation, Time). Different types of achievement goals can be stressed in classrooms along any of these dimensions, and students tend to adopt these goals (Meece et al., 2006). However, measurement instruments assessing students' perceptions of the TARGET dimensions are largely lacking. In particular, a comprehensive measurement instrument assessing all six TARGET dimensions that represent a mastery goal structure has been missing so far. Additionally, the majority of studies examining the consequences of goal structures have mainly relied on college or high school student samples.



Thus, we know less about the adoption of achievement goals in younger students, particularly immediately after the transition to secondary school. Therefore, the aim of the present study is to develop a measurement instrument to adequately assess a perceived mastery goal structure following the six TARGET dimensions. Moreover, another aim is to test the associations between the perceived TARGET mastery goal structure and personal achievement goals.

Personal Achievement Goals

In educational settings, student goals are the purposes or reasons for engaging, choosing, and persisting in different learning activities or achievement tasks (Pintrich, 2003). Research on students' achievement goals has a long tradition in educational research and has resulted in the development of various conceptual models. The dichotomous model (Ames, 1992; Dweck & Leggett, 1988; Maehr, 1989; Nicholls, 1984) distinguishes between mastery goals (to develop competence) and performance goals (to demonstrate competence). Central for individuals pursuing mastery goals is a focus on developing new skills, improving one's level of competence, and trying to understand new learning subjects. In contrast, individuals with performance goals focus on demonstrating their competence and ability in comparison to others (approach focus). The dichotomous model was extended by bifurcating performance goals into performance approach and performance avoidance goals (the latter focusing on avoiding failure and unfavorable judgments of one's ability by others) (Elliot, 2005). This trichotomous achievement goal model was further expanded into a 2 × 2 model by also bifurcating the mastery goal construct into approach and avoidance goal types (Elliot, 2005). In addition to the expansion of the model, there is an active debate about the precise definition of achievement goals (see Hulleman, Schrager, Bodmann, & Harackiewicz, 2010; Kaplan & Maehr, 2007). In this study, we confine ourselves to the trichotomous achievement goal model as a theoretical basis because the mastery-avoidance construct has received little empirical support at the early secondary education level (junior high school) so far. Young children and adolescents are still building their competences, which is why mastery-avoidance goals may be of greater importance among older populations (Lee & Bong, 2016). The effects of students' achievement goals have been extensively studied in experimental and correlational research designs over the last three decades. Most studies have relied on samples from a higher education context (college, university). We focus on empirical findings within the school context. The findings for mastery goals have been consistent and mostly favorable. Mastery goals are positively associated with greater effort and persistence


in learning (Wolters, 2004), deep-level learning strategies (Greene, Miller, Crowson, Duke, & Akey, 2004), self-efficacy (Bong, 2009), and well-being (Kaplan & Maehr, 1999). The association between mastery goals and performance is more complex, and the empirical evidence is mixed so far (Bergsmann, Lüftenegger, Jöstl, Schober, & Spiel, 2013; Bong, 2009; Hulleman et al., 2010; Kaplan & Maehr, 1999). Performance avoidance goals are typically associated with negative consequences like self-handicapping (Midgley & Urdan, 2001; Urdan, 2004), anxiety (Federici, Skaalvik, & Tangen, 2015), procrastination, and low persistence (Wolters, 2004). In contrast, empirical findings on performance approach goals are mixed, and possible consequences are subject to controversial discussion (see Senko, Hulleman, & Harackiewicz, 2011). For example, this applies to the relation between performance approach goals and academic performance: Performance approach goals have been found to be positively related to academic performance (Bong, 2009) or not related at all (Paulick, Watermann, & Nückles, 2011). These mixed results with regard to performance can be partly explained by the different conceptualizations of performance approach goals (focus on normative comparison vs. demonstration of competence; see Senko et al., 2011). For instance, normative performance approach goals are positively associated with performance, whereas competence demonstration goals are not (see Hulleman et al., 2010).

Goal Structure and TARGET Framework

From the beginnings of achievement goal theory, researchers have highlighted the idea that students' adoption of personal achievement goals may be influenced by what happens in the classroom (Ames, 1992; Maehr & Midgley, 1996). Classroom structure has the potential to be managed and modified by the actions of those within the classroom (both teachers and students) and can make different achievement goals salient on a contextual level. Children tend to adopt the goals that are stressed in their classroom as their own guiding purposes. More accurately, their perception of these goal structures (Meece et al., 2006), also known as classroom goal structures or classroom goals, should determine the adoption of personal achievement goals. To avoid ambiguities, we use the term goal structures consistently throughout this paper. Empirical findings are in agreement and strongly suggest that perceived goal structures are related to students' personal achievement goals (e.g., Bergsmann et al., 2013; Church, Elliot, & Gable, 2001; Federici et al., 2015; Greene et al., 2004; Lau & Nie, 2008; Lüftenegger, van de Schoot, Schober, Finsterwald, & Spiel, 2014; Urdan, 2004; Wolters, 2004).



But which classroom structures exactly shape students’ achievement goals? Researchers have identified core dimensions of instructional practices in classrooms involved in the shaping of students’ personal achievement goals. Joyce Epstein (1988) used the acronym TARGET for a prominent systematization of key classroom dimensions that affect students’ development and learning: Task design, distribution of Authority/autonomy, Recognition/rewards of students, Grouping arrangements, Evaluation practices, and Time allocation. Carole Ames (1992) used these six classroom dimensions to describe how personal mastery goals should be facilitated. Therefore, the TARGET dimensions were conceptualized to represent a mastery goal structure. The task dimension concerns the design of classwork and homework. Appropriate tasks focus on learning, offer moderate challenges, and foster curiosity and active involvement. Authority refers to the opportunity to participate actively in making decisions in the classroom that are relevant to instruction. Teachers share authority over instructional decisions with students, taking into account their needs and feelings. The opportunity to decide for oneself what exercises and tasks one should complete in a certain subject area is also included in this dimension, as is shared responsibility in social decision-making processes, for instance, enacting class rules. To better reflect the theoretical description of this classroom dimension, we decided to use autonomy instead of authority throughout the manuscript. The recognition dimension concerns the formal or informal provision of recognition through incentives, rewards, or feedback. Rewards can be useful for students if they provide information about their progress or competence. The grouping dimension involves the use of heterogeneous cooperative groups and peer interaction to encourage working with others (Ames, 1992). The evaluation dimension focuses on methods that assess progress and improvement while avoiding the establishment of a competitive environment. Students should experience that it is normal to make mistakes and that these are allowed in the classroom (Steuer, Rosentritt-Brunn, & Dresel, 2013). Time encompasses the appropriateness of workload, the pace of instruction, and the time allotted for students to introduce their own topics and interests. The time dimension is closely linked to the design of tasks and autonomy. In some concepts and studies, time and task are treated as a joint single dimension (Ames, 1992; Greene et al., 2004). Given the manifold nature of the aspects covered, we assume that the six described dimensions are distinguishable, but nevertheless interrelated, subdimensions of students’ perception of their goal structure, in line with the TARGET framework (Lüftenegger et al., 2014). We assume that each of the dimensions contributes to the perception of the overall TARGET goal structure in the classroom. Therefore, we assume that perceptions of the six subdimensions vary rather consistently between learners and that one superordinate uniform factor of TARGET can be conceptualized.

Assessment of Perceived Goal Structure and TARGET

In line with tradition among achievement goal researchers, we argue that students’ subjective perception of their learning environment is the appropriate source of data (e.g., Maehr & Midgley, 1996; Schwinger & Stiensmeier-Pelster, 2011; but also Lüdtke, Robitzsch, Trautwein, & Kunter, 2009). It is not what the assumed objective outsider (e.g., the teacher or the researcher) sees that is the immediate cause of student attitudes, behavior, or goal adoption. Students’ perceptions of the goals emphasized in the classroom affect thoughts, beliefs, and attitudes associated with the personal investment in learning. Different classrooms have different learning environments with different classroom goal messages that influence students’ goal adoption. Therefore, it is important to examine the degree to which perceptions in classrooms are shared. A broad array of empirical evidence indicates that a mastery goal structure is linked to students’ adoption of a personal mastery goal orientation (e.g., Lau & Lee, 2008; Lüftenegger et al., 2014; Urdan, 2004). Studies examining the relation between a perceived mastery goal structure and students’ personal performance approach and avoidance goals reveal contradictory results: For instance, one study showed that the perception of a mastery goal structure was positively related to both types of performance goals (Federici et al., 2015), whereas no significant relations between mastery goal structure and the two performance goal orientations were found in another study (Midgley & Urdan, 2001). Including only performance approach goals in their analyses, Lau and Lee (2008) showed a positive relationship between the perception of a mastery goal structure and performance approach goals. Other studies report that a mastery goal structure negatively predicts performance approach goals or found no significant relation with performance avoidance goals (Urdan, 2004; Wolters, 2004). In summary, the reported studies cover the whole range of possible relations between a mastery goal structure and the two types of performance goals, and no consistent pattern explaining these differing results could be found. It should be noted that the reported studies differ with respect to the complexity of their statistical analyses. However, the inconsistent empirical evidence could also possibly be traced back to the measurement instruments used and their underlying conceptualizations of performance approach and avoidance goals. Instruments assessing achievement goals often vary in their theoretical conceptions and operational definitions of performance goals (see, e.g., Hackel, Jones, Carbonneau, & Mueller, 2016). Regarding the operationalization of performance approach and avoidance goals, two critical elements can be distinguished: the desire to demonstrate competence (e.g., Kaplan & Maehr, 2007) or to outperform others (e.g., Elliot, 2005). Therefore, Senko and colleagues (2011) suggest refining the constructs of both performance goals by distinguishing between normatively-focused performance goals (outperforming others) and appearance-based performance goals (demonstrating competence). Current measurements of achievement goals tend to either mix the “normative” and “appearance” components in their scales for performance approach and performance avoidance goals (e.g., SELLMO; Spinath, Stiensmeier-Pelster, Schöne, & Dickhäuser, 2002), or use items that assess solely normative (e.g., Achievement Goals Questionnaire, AGQ; Elliot & Murayama, 2008) or appearance aspects (e.g., Patterns of Adaptive Learning, PALS; Midgley et al., 2000). So far, goal structures in the classroom have been measured as a macroscopic and holistic construct that refers to all activities in the classroom setting associated with learning and performing on a meta-level (e.g., PALS, Midgley et al., 2000; Schwinger & Stiensmeier-Pelster, 2011). Research on the TARGET framework has mostly been conducted in empirical studies where individual dimensions were investigated separately (Church et al., 2001; Greene et al., 2004; Lau & Lee, 2008; Tapola & Niemivirta, 2008), or where a few dimensions representing a mastery goal structure were observed together (Bergsmann et al., 2013). Examining singular dimensions, therefore, presumably leads to different results than considering classroom instruction in its entire complexity (Church et al., 2001). In particular, the systematic promotion of mastery goals presumably requires a classroom structure that does not focus on a single set of strategies or a particular instructional method, but rather engages a complete constellation of conceptually related strategies, such as the system provided theoretically by the TARGET framework. Lüftenegger and colleagues (2014) considered all six TARGET dimensions and could show in a longitudinal design that a mastery goal structure following TARGET has a causal effect on junior high school students’ personal mastery goals. However, the measurement of several dimensions was very limited in terms of content validity due to the use of single items. So far, to the best of our knowledge, no instrument has been presented that comprehensively assesses all six TARGET dimensions and has been rigorously validated. Due to the comprehensive conceptualization of the TARGET dimensions in the classroom, it is not sufficient to include some items from other instruments of mastery goal structure (e.g., PALS, Midgley et al., 2000) or from existing operationalizations of TARGET dimensions (Church et al., 2001; Greene et al., 2004; Lau & Lee, 2008; Tapola & Niemivirta, 2008; Lüftenegger et al., 2014). Instead, a comprehensive assessment of all six dimensions seems necessary to ensure the content validity of scores regarding the overall TARGET goal structure.

The Present Investigation

Based on prior work on favorable goal structures, the main purpose of the present study was to develop a reliable measurement instrument to assess perceived goal structure based on the TARGET framework with its six subdimensions. Furthermore, we aimed to analyze the effects of this perceived TARGET goal structure on students’ achievement goals and implicit theories. Specifically, the present research is designed to investigate the structural validity of the measurement instrument and to examine links with achievement goals and implicit theories that have been shown to be both conceptually and empirically important in prior work on goal structure and personal achievement goals more generally. Based on theory and prior research (Lüftenegger et al., 2014), we expected the TARGET mastery goal structure to be a multidimensional classroom characteristic. We expected that the TARGET goal structure has multiple, interrelated subdimensions that, in concert, constitute a superordinate and uniform overall perceived TARGET mastery classroom structure (Hypothesis 1a). This overall structure is expected to vary between classrooms (Hypothesis 1b). The second objective was to investigate how the TARGET classroom goal structure is related to students’ personal achievement goals and implicit theories of intelligence. Prior research (Lüftenegger et al., 2014) found that a perceived TARGET mastery goal structure influenced students’ mastery goals over time. To our knowledge, no study has investigated the effects on other achievement goals. Moreover, we focused on implicit theories of intelligence, which have been shown to be conceptually and empirically important for the adoption of achievement goals (Dweck & Leggett, 1988). To our knowledge, studies focusing on goal structures as possible antecedents of implicit theories are lacking so far. As such, it would seem promising to include implicit theories in the goal structure’s nomological network. Perceived TARGET mastery goal structure is expected to positively predict personal mastery goals (Hypothesis 2a). Perceived TARGET mastery goal structure is expected to predict implicit theories (Hypothesis 2b). Due to inconsistent research results, interrelations with performance goal types (approach normative, approach appearance, avoidance normative, avoidance appearance) are expected, but their direction cannot be specified (Hypothesis 2c).

Method

Sample
The survey was conducted with 1,080 Austrian students in May 2015. Participation was voluntary, and only students with active parental consent participated in the study. Less than 1% of students were not allowed to participate by their parents. Students completed the questionnaire during normal classroom hours and were instructed by trained research assistants. The students did not receive compensation for their participation in the study. The data were collected in five academic-track secondary schools (Gymnasium schools) in Vienna. In line with the typical composition of academic-track schools in Austria, girls were slightly overrepresented in this study (53.2% of the sample). Students’ mean age was 12.8 years (SD = 1.01) and they were enrolled in grades six (34%), seven (37.1%), and eight (28.9%). The average number of children per classroom was 23.46 (SD = 3.29). The subject for each class was determined randomly prior to data collection. The subjects investigated were German (51.8%) and mathematics (48.2%). In Austria, the new school year begins in September; therefore, sufficient time had clearly passed for goal structures to be established.

Measures
Personal achievement goals, the TARGET goal structure, and implicit theories were all assessed with a questionnaire. All questionnaire items, except those measuring implicit theories, used a 6-point scale ranging from 1 (= strongly disagree) to 6 (= strongly agree) and referred to specific school subjects, that is, mathematics and German language class. Mathematics and German language were chosen as the focus of study because previous research on goals in secondary schools has found these to be particularly important domains of inquiry (e.g., Midgley & Urdan, 2001; Murayama & Elliot, 2009; Schwinger & Stiensmeier-Pelster, 2011; Wolters, 2004).

Personal Achievement Goals
Following the trichotomous achievement goal conceptualization, we assessed mastery goals, performance approach goals, and performance avoidance goals with the respective subscales of the well-validated German achievement goal questionnaire SELLMO-S (Spinath et al., 2002). The scales reflect several dimensions of achievement goals classified in a meta-analysis of achievement goal measures (Hulleman et al., 2010). The items for mastery goals mainly focus on a preference for challenging activities, but also represent the motivation to master a task in order to gain new knowledge. The operationalization of performance approach goals and performance avoidance goals focuses on both the normative and appearance components of performance goals. We divided both performance goal scales into two subscales, one with a focus on appearance and the other on normative comparisons, resulting in four performance goal scales (approach normative, approach appearance, avoidance normative, avoidance appearance). All items were introduced with the phrase “In my German/math class, I personally strive . . .” followed by the statements referring to the respective goal type. Examples are “. . . to learn as much as possible” for mastery goals (eight items), “. . . to get my work done better than others” for performance approach goals with a normative focus (four items), “. . . to demonstrate what I can and what I know” for performance approach goals with an appearance focus (three items), “. . . not to stand out by asking stupid questions” for performance avoidance goals with an appearance focus (three items), and “. . . that other students don’t consider me stupid” for performance avoidance goals with a normative focus (five items) (see Schwinger & Stiensmeier-Pelster, 2011, for a complete list of items). Internal consistencies of the five subscales were good (α = .70–.86; CR = .71–.86).

Implicit Theory of Intelligence
Students’ implicit theory of intelligence was measured using a subscale of a well-established German instrument for the assessment of subjective beliefs about factors underlying learning and achievement (SE-SÜBELLKO; Spinath & Schöne, 2003). The scale consists of three items in the form of statements about the nature of intelligence. Students completed these statements by indicating the degree of malleability they believe in on a 6-point semantic differential (sample item: “You have a certain amount of intelligence that cannot be changed vs. that can be changed.”). Higher values represent higher endorsement of an incremental theory (α = .80; CR = .80).

TARGET Goal Structure
TARGET goal structure was assessed via student perceptions of the six proposed dimensions. These comprised Task, Autonomy, Recognition, Grouping, Evaluation, and Time (the items are provided in the Electronic Supplementary Material, ESM 1). The development of the measurement instrument comprised several steps. First, we newly formulated 12–15 items for each TARGET dimension that were derived from the conceptual understanding of the respective dimension. Second, to ensure content validity, we revised these items using expert judgments from members of our research group. In the next step, we selected the ten items with the best representation of the conceptual understanding of the respective dimension and applied criteria regarding semantic redundancy. These items were used as a preliminary version of the questionnaire in the present study. The selection of items for the final scales was based on several criteria: (1) having an efficient and balanced instrument; (2) representation of all proposed aspects of the respective TARGET dimension; (3) good psychometric properties of the scales (reliability and validity). This also included attending to both convergent and divergent scale validity. Items were selected according to convergent item validity (i.e., high factor loadings on the relevant TARGET scale) as well as divergent item validity (i.e., low factor loadings on other TARGET scales). We selected six items per dimension according to these three criteria. Analyses on the item level showed sufficient properties for the 36 items constituting the final Goal Structure Questionnaire (GSQ). Analyses on the scale level also revealed sufficient properties for all subscales (α = .69–.85; CR = .69–.85).

Strategy of Analyses and Missing Data
Statistical analyses were conducted using structural equation models (SEMs) with the complex design option in Mplus 7.4 (Muthén & Muthén, 1998–2015) to control for the hierarchical nature of the data. The complex design option takes into account the nonindependence of the scores of students from the same class – that is, the clustering effect of students nested within classes. As the constructs are measured as ordinal-level variables, we used robust weighted least squares estimation (WLSMV) for all analyses. Goodness-of-fit of the models was evaluated using several different indices, including the χ² Test of Model Fit, the Tucker-Lewis Index (TLI), Comparative Fit Index (CFI), and Root Mean Square Error of Approximation (RMSEA). In addition, a 90% confidence interval around the point estimate enabled an assessment of the precision of the RMSEA estimate (for details about these indices, see Kline, 2011). We used traditional cutoff scores indicative of excellent and adequate fit to the data, respectively: CFI and TLI ≥ .95 and ≥ .90, and RMSEA ≤ .06 and ≤ .08. We provide standardized coefficients. Standardized coefficients represent the amount of change in the outcome that can be expected from a one standard deviation unit change in the predictors. Following Cohen’s (1988) guidelines, in the context of regression parameters, standardized values greater than 0.10, 0.30, and 0.50 generally reflect small, moderate, and large effect sizes. The influence of domain and sex was investigated in preliminary analyses. Main effects of the domain were found for autonomy and grouping. Main effects of sex were found only for mastery goals. All significant main effects were considered in the main analysis. The rate of individuals omitting items (nonresponse) in the present study was at a maximum of 1.3% for all considered items and, as such, very low. We used the full information maximum likelihood (FIML) approach implemented in Mplus 7.4 to deal with missing values. This approach takes all available information from the observed data into account when estimating parameters and standard errors.
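The reliability indices reported throughout (Cronbach’s α and composite reliability, CR) follow standard formulas. As a minimal illustration only – the data, scale length, and loadings below are hypothetical and not taken from the study – both coefficients can be computed as follows:

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical responses of 1,080 students to a six-item scale (1-6 Likert format).
    items = rng.integers(1, 7, size=(1080, 6)).astype(float)

    def cronbach_alpha(x):
        # alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score)
        k = x.shape[1]
        item_variances = x.var(axis=0, ddof=1).sum()
        total_variance = x.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_variances / total_variance)

    def composite_reliability(loadings):
        # CR = (sum of standardized loadings)^2 /
        #      ((sum of loadings)^2 + sum of residual variances), residual = 1 - loading^2
        loadings = np.asarray(loadings, dtype=float)
        numerator = loadings.sum() ** 2
        residuals = (1.0 - loadings ** 2).sum()
        return numerator / (numerator + residuals)

    print(round(cronbach_alpha(items), 2))
    # Hypothetical standardized loadings for one six-item TARGET subscale.
    print(round(composite_reliability([0.62, 0.70, 0.55, 0.68, 0.74, 0.60]), 2))

Composite reliability requires standardized loadings from a fitted measurement model; the values passed above are invented purely for demonstration.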

Results

Table 1 provides descriptive statistics, bivariate correlations, and reliabilities (Cronbach’s α and composite reliability) of the investigated variables. The findings indicate that there was sufficient variation of scores on all scales and that reliability for all scales ranged from moderate to excellent.

Dimensionality of the Perceived Goal Structure
In order to analyze the dimensionality of the perceived TARGET goal structure, confirmatory factor analyses (CFAs) were performed. In a first step, six CFAs were conducted to examine the construct validity of each of the six TARGET dimensions. All six models showed moderate to good model fit, CFI = .951–.992, TLI = .919–.975, RMSEA = .049–.100. Standardized item loadings were in the range of λ = .45–.81 with four exceptions: two items in the subscale Task and two items in the subscale Grouping had standardized loadings of less than λ = .27 and were excluded from all subsequent analyses and the final scales (see ESM 1). Without these items, the Task and Grouping models were characterized by good model fit indices and satisfactory standardized loadings (Figure 1). In a second step, we examined three models to test for the dimensionality of the perceived goal structure. Our hypothesized Model 1 included one factor for each of the six perceived TARGET dimensions with loadings of the respective items. This model fitted the data well (see Table 2). We additionally tested Model 1 against an alternative model. Model 2 was a one-factor model reflecting a strictly unidimensional conceptualization of perceived goal structure in which all items load on one factor. Model estimation and model comparison revealed significant advantages for the hypothesized Model 1 (see Table 2). To test our hypothesis that the six TARGET dimensions constitute a superordinate and uniform factor reflecting the overall goal structure in the classroom (Hypothesis 1a), we specified another model (Model 3): Based on Model 1, we modeled one second-order factor with loadings of all six dimensions. This model also showed acceptable fit to the data (see Table 2), although slightly worse than that for Model 1. The model comparison revealed advantages for Model 1 over Model 2. Model 3 provides estimates of the latent relationships, which were in the range of φ = .54–.94, indicating the appropriateness of a mastery goal structure conception including six subdimensions.


Table 1. Bivariate correlations, descriptive statistics, and reliabilities

[The full layout of Table 1 could not be recovered from the source. For the TARGET classroom structure dimensions (1. Task, 2. Autonomy, 3. Recognition, 4. Grouping, 5. Evaluation, 6. Time) and the student characteristics (7. Mastery goal, 8. Performance goal AP-N, 9. Performance goal AP-A, 10. Performance goal AV-N, 11. Performance goal AV-A, 12. Implicit theories), the table reports the bivariate correlations as well as the number of items, M, SD, skewness, actual range, Cronbach’s α with 95% CI, composite reliability (CR), ICC1, and ICC2.]

Notes. AP-A = approach appearance; AP-N = approach normative; AV-A = avoidance appearance; AV-N = avoidance normative; CI = confidence interval; CR = composite reliability; ICC = intraclass correlation. |r| ≥ .07, p < .05.
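For orientation, a summary of the kind shown in Table 1 (means, standard deviations, skewness, ranges, and bivariate correlations) can be assembled from scale scores with standard data tooling. The sketch below uses simulated scores and illustrative column names, not the study’s data or variable names:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    # Hypothetical 1-6 scale scores for a few of the constructs listed in Table 1.
    scores = pd.DataFrame(
        rng.uniform(1, 6, size=(1080, 4)),
        columns=["task", "autonomy", "mastery_goal", "implicit_theories"],
    )

    summary = pd.DataFrame({
        "M": scores.mean(),
        "SD": scores.std(ddof=1),
        "skewness": scores.skew(),
        "min": scores.min(),
        "max": scores.max(),
    }).round(2)
    correlations = scores.corr().round(2)

    print(summary)
    print(correlations)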




Figure 1. Structural equation model showing external linkages of the superordinate and uniform TARGET goal structure factor. Standardized regression coefficients are reported, and nonsignificant paths are shown as dashed lines. PGAP-A = performance goals approach appearance; PGAP-N = performance goals approach normative; PGAV-A = performance goals avoidance appearance; PGAV-N = performance goals avoidance normative.

Although some of the relationships between conceptually similar constructs, such as recognition and evaluation, were high, they clearly indicate that all of the dimensions are separable, given that the latent coefficients were corrected for unreliability and represent the highest possible estimates for these relationships. These results indicate that it is justifiable to conceptualize perceived goal structure as hierarchically structured, consisting of distinguishable subdimensions that contribute to one superordinate uniform factor of mastery goal structure (see Hypothesis 1a).
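The correction for unreliability mentioned above corresponds, conceptually, to the classical attenuation correction, in which an observed correlation is divided by the square root of the product of the two scales’ reliabilities. In the latent-variable models used here the correction is handled by the model itself; the following sketch with hypothetical values only illustrates the idea:

    def disattenuated_r(r_observed, rel_x, rel_y):
        # Correct an observed correlation for measurement error in both scales.
        return r_observed / (rel_x * rel_y) ** 0.5

    # Hypothetical values: observed correlation of .63 between two subscales
    # with reliabilities of .75 and .81.
    print(round(disattenuated_r(0.63, 0.75, 0.81), 2))  # ~0.81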


Thus, we decided in favor of the model with one superordinate factor and six dimensions.

Classroom Differences
As expected in Hypothesis 1b, we were able to find significant and moderate to large differences between classrooms in all six subdimensions and the superordinate uniform factor of mastery goal structure (ICC1 = .11–.30; p < .001; see Table 1). Within classrooms, students’ perceptions of the TARGET subdimensions seem to be rather homogeneous (ICC2 = .74–.91).
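ICC1 (the share of variance located between classrooms) and ICC2 (the reliability of the aggregated classroom mean) can be obtained from a one-way analysis-of-variance decomposition of student ratings nested in classes. A minimal sketch with simulated, purely illustrative data:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(2)
    # Hypothetical ratings: 46 classes with 23 students each, plus a class-level effect.
    classes = np.repeat(np.arange(46), 23)
    ratings = rng.normal(4.0, 1.0, size=classes.size) + rng.normal(0, 0.5, size=46)[classes]
    data = pd.DataFrame({"class_id": classes, "rating": ratings})

    def icc1_icc2(df, group, value):
        groups = df.groupby(group)[value]
        k = groups.size().mean()                      # average class size
        grand_mean = df[value].mean()
        n_groups = groups.ngroups
        ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
        ss_within = ((df[value] - groups.transform("mean")) ** 2).sum()
        ms_between = ss_between / (n_groups - 1)
        ms_within = ss_within / (len(df) - n_groups)
        icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
        icc2 = (ms_between - ms_within) / ms_between  # Spearman-Brown step-up of ICC1
        return icc1, icc2

    print([round(v, 2) for v in icc1_icc2(data, "class_id", "rating")])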





Table 2. Results from CFA models of mastery classroom goal structure indicators

                                                             χ² or Δχ²   df or Δdf   CFI    TLI    RMSEA [90% CI]
Model fit
  Model 1: six dimensions                                     1106.92*      449      .943   .937   .037 [.034–.040]
  Model 2: one overall g factor                               1449.06*      464      .915   .909   .044 [.042–.047]
  Model 3: six subfactors and one superordinate
    uniform factor of mastery goal structure                  1157.39*      458      .940   .935   .038 [.035–.040]
Model comparison
  Model 1 vs. Model 2                                          410.41*       15
  Model 1 vs. Model 3                                           93.88*        9
  Model 3 vs. Model 2                                          373.61*        6

Notes. n = 1,080. All items were treated as ordered categorical, utilizing the WLSMV estimator in Mplus (Muthén & Muthén, 1998–2015). Model comparisons were conducted using the robust difference testing procedure for mean- and variance-adjusted test statistics. CFA = confirmatory factor analysis; CFI = comparative fit index; RMSEA = root mean square error of approximation; TLI = Tucker-Lewis Index. *p < .001.
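The Δχ² values in Table 2 come from Mplus’s robust difference-testing procedure for WLSMV, which rescales the test statistics and therefore does not equal a simple subtraction of the printed χ² values. For orientation only, a naive (non-robust) χ² difference test can be written as follows; the inputs are the Table 2 values for Models 1 and 2, and the resulting statistic is only a rough heuristic here:

    from scipy.stats import chi2

    def chi2_difference(chi2_restricted, df_restricted, chi2_full, df_full):
        # Naive difference test: NOT the robust WLSMV DIFFTEST procedure used in the article.
        delta_chi2 = chi2_restricted - chi2_full
        delta_df = df_restricted - df_full
        p_value = chi2.sf(delta_chi2, delta_df)
        return delta_chi2, delta_df, p_value

    # Model 2 (one factor, df = 464) against Model 1 (six factors, df = 449), values from Table 2.
    print(chi2_difference(1449.06, 464, 1106.92, 449))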

ICC2 can be interpreted as an indicator of the reliability of the measurement of a contextual characteristic via several individual perceptions. ICC2 is interpreted in line with other reliability measures (Lüdtke et al., 2009).

Predicting Students’ Personal Achievement Goals
In order to investigate whether students’ perception of the TARGET mastery goal structure can predict their personal achievement goals and implicit theories (Hypotheses 2a–2c), structural equation modeling was employed (see Figure 1 – structural model). Overall fit indices showed a good model fit, CFI = .943, TLI = .940, RMSEA = .026 [.024, .028]. Estimation revealed that a perceived TARGET goal structure positively predicted personal mastery goals (b = .53, SE = .03, p < .001), performance approach goals with an appearance (b = .42, SE = .04, p < .001) and normative focus (b = .07, SE = .04, p = .048), and implicit theories (b = .17, SE = .03, p < .001). No associations were found between the TARGET goal structure and performance avoidance goals for either the appearance (b = .02, SE = .04, p = .632) or the normative focus (b = .04, SE = .03, p = .237).
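The reported paths are standardized coefficients from a latent-variable model with cluster-robust corrections, so the sketch below is only a loose, purely illustrative analogue on observed scores: when predictor and outcome are z-standardized, the ordinary least squares slope equals their correlation and can be read against Cohen’s (1988) benchmarks (.10/.30/.50). The data and effect size are simulated, not the study’s:

    import numpy as np

    rng = np.random.default_rng(3)
    # Hypothetical standardized TARGET perception scores and mastery-goal scores.
    target = rng.normal(size=1080)
    mastery = 0.5 * target + rng.normal(scale=np.sqrt(1 - 0.25), size=1080)

    def standardized_slope(x, y):
        # With both variables z-standardized, the OLS slope equals the correlation.
        zx = (x - x.mean()) / x.std(ddof=1)
        zy = (y - y.mean()) / y.std(ddof=1)
        return (zx * zy).sum() / (len(x) - 1)

    print(round(standardized_slope(target, mastery), 2))  # ~0.5, a large effect by the benchmarks above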

Discussion

The purpose of the study was to develop and validate a measurement instrument which adequately assesses goal structures within the TARGET conceptualization. By building upon previous work (e.g., Lüftenegger et al., 2014), this study aimed to extend existing findings and refine previous attempts to measure goal structures in the TARGET framework. Our newly developed Goal Structure Questionnaire (GSQ) incorporates a broad range of theoretically-rooted aspects of the TARGET dimensions and can therefore be considered a highly comprehensive and differentiated measurement instrument. Results revealed that students’ perceptions comprise six dimensions of interrelated but distinguishable classroom structures (Task, Autonomy, Recognition, Grouping, Evaluation, Time). Instead of regarding these six dimensions as isolated aspects, this study’s findings provide a more holistic view of goal structures. Taken together, the six dimensions form a learning environment in classrooms which we refer to as “TARGET.” The findings of this study support the existence of TARGET as a macroscopic construct representing students’ overall perceptions of their mastery goal structure (Hypothesis 1a). As students in different classrooms receive different goal-relevant messages from their teachers and are exposed to different classroom practices and activities, we assumed that the perception of TARGET varies between classrooms. As hypothesized, we found differences between perceived TARGET goal structures in different classrooms, indicating adequate variability of TARGET goal structures on the class level (Hypothesis 1b). All in all, these results substantiate the internal validity of the TARGET questionnaire. Moreover, the study shows that the TARGET questionnaire has predictive power in explaining students’ mastery goals (Hypothesis 2a) and implicit theories (Hypothesis 2b). As expected and in line with theoretical considerations as well as empirical evidence (Lau & Lee, 2008; Lüftenegger et al., 2014; Tapola & Niemivirta, 2008), TARGET goal structures turned out to be positively related to students’ personal mastery goals. This finding underscores the suitability of the developed scales for assessing a mastery goal structure. The positive relationship to implicit theories corroborates our research hypotheses and allows us to draw some initial conclusions about the interplay between TARGET goal structures and students’ adoption of an incremental theory of intelligence. Regarding the relationship between TARGET goal structures and the four types of students’ personal performance goals (Hypothesis 2c), it was shown that TARGET goal structures are positively related to students’ performance approach goals with an appearance focus and, to a lesser extent, to students’ performance approach goals with a normative focus. No significant relations between TARGET goal structures and either type of performance avoidance goals were found. All in all, the results for Hypotheses 2a–2c substantiate the external validity of the measurement instrument. Given the inconsistent findings on the relation between mastery goal structures and students’ personal performance goals (e.g., Lau & Lee, 2008; Urdan, 2004; Wolters, 2004), this study’s findings reinforce the premise that a more comprehensive conceptualization of mastery goal structures as provided here may be especially appropriate for investigating and therefore enhancing our understanding of this relation. An important finding is that even though personal performance approach and avoidance goals are highly positively related (which is in line with broad empirical evidence; see Linnenbrink-Garcia et al., 2012), our results suggest that TARGET mastery-focused classrooms do not endorse both performance goals equally. Keeping in mind the negative consequences of pursuing performance avoidance goals, the finding that a TARGET mastery goal structure is not related to students’ adoption of performance avoidance goals has important implications for classroom practice. Furthermore, the positive relation of TARGET mastery goal structures with mastery goals and (both types of) performance approach goals can be interpreted in terms of a multiple goal perspective (Wormington & Linnenbrink-Garcia, 2016). Mastery goals are still considered the most beneficial goals, and their promotion should therefore represent a major aim in class (Wormington & Linnenbrink-Garcia, 2016). However, the simultaneous pursuit of a performance approach goal may also be adaptive for specific outcomes (Senko et al., 2011). The magnitude of the relations between the TARGET mastery goal structure and students’ personal goals found in this study (mastery > performance approach appearance > performance approach normative) reflects these assumptions and further emphasizes the suitability of the newly developed GSQ for research and classroom practice.

Limitations and Implications for Future Research
Some limitations of the present study should be noted. It has to be mentioned that our conclusions regarding the external validity of this study are limited by the correlational study design and do not allow us to interpret relations between TARGET goal structures, achievement goals, and implicit theories in causal ways. Additionally, due to the cross-sectional design it was not possible to examine the stability of the GSQ. Longitudinal studies could bring insights into the impact of TARGET goal structures on the development trajectories of achievement goals and incremental theories. Our reliance on students’ self-reports can be seen as another limitation. The use of multiple methods such as interviews (see, e.g., Urdan, 2004) or classroom observations (see, e.g., Patrick, Anderman, Ryan, Edelin, & Midgley, 2001) is therefore highly recommended to expand our understanding of the TARGET goal structures. Moreover, additional research on TARGET should obtain information from multiple informants (teachers and students), with the concordance between students’ perceptions of classroom structures and teachers’ perceptions of their instructional behavior of particular interest (Urdan, Midgley, & Anderman, 1998). Furthermore, it is an open question whether our findings can be generalized to other populations (e.g., gifted students, students of different age groups like elementary school students). More research is also needed to investigate whether the results of this study can be replicated in different countries and cultures. Another interesting avenue for future research would involve the deployment of the TARGET questionnaire in a completely different context, for example, sport psychology. One example of a research question might be how various aspects of instructional structures (measured by the TARGET questionnaire) influence the motivational development of adolescents in competitive sports. Methodologically, the modest reliability of the dimension Task has to be acknowledged. Further studies could address this limitation by developing and psychometrically testing additional items for the Task dimension. The high loading of especially Recognition on the TARGET higher-order factor has two implications: First, Recognition is apparently the most salient indicator of the overall TARGET goal structure, in the sense that the higher-order factor is effectively identical to this specific lower-order indicator. Second, with regard to measurement, assessment of the Recognition factor provides at the same time a reliable proxy measure of the overall higher-order factor. This may be of use in future applied research. Additionally, the TARGET questionnaire should be further validated by fully considering the inherent multilevel structure of goal structure using doubly latent modeling (latent in relation to both measurement and sampling error). However, doubly latent models are very complex and require substantial ICCs and large sample sizes on both the individual and class level (Lüdtke et al., 2011). Future research should also focus on further broadening the nomological network of mastery goal structures by investigating the effects on other motivational constructs such as academic interest, self-efficacy or self-concept, and how this nomological network (goal structure → personal motivation) is connected to school performance. In conclusion, despite these limitations, the findings of the present study indicate that the newly developed scales are internally valid as demonstrated by confirmatory factor analysis, externally valid as demonstrated in terms of relationships with achievement goals, and reliable. The GSQ can therefore be considered the first comprehensive and psychometrically sound instrument assessing students’ perception of TARGET goal structures.



Electronic Supplementary Material
The electronic supplementary material is available with the online version of the article at http://dx.doi.org/10.1027/2151-2604/a000277
ESM 1. Text (.docx). Items of the final questionnaire.

References Ames, C. (1992). Classrooms: Goals, structures, and student motivation. Journal of Educational Psychology, 84, 261–271. doi: 10.1037/0022-0663.84.3.261 Bergsmann, E., Lüftenegger, M., Jöstl, G., Schober, B., & Spiel, C. (2013). The role of classroom structure in fostering students’ school functioning: A comprehensive and application-oriented approach. Learning and Individual Differences, 26, 131–138. doi: 10.1016/j.lindif.2013.05.005 Bong, M. (2009). Age-related differences in achievement goal differentiation. Journal of Educational Psychology, 101, 879–896. doi: 10.1037/a0015945 Church, M. A., Eliott, A. J., & Gable, S. L. (2001). Perceptions of classroom environment, achievement goals, and achievement outcomes. Journal of Educational Psychology, 93, 45–54. doi: 10.1037/0022-0663.93.1.43 Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum. Dewey, J. (1938). Experience and education. West Lafayette, IN: Kappa Delta Pi. Dweck, C. S., & Leggett, E. L. (1988). A social-cognitive approach to motivation and personality. Psychological Review, 95, 256–273. Elliot, A. J. (2005). A conceptual history of the achievement goal construct. In A. J. Elliot & C. S. Dweck (Eds.), Handbook of competence and motivation (pp. 52–72). New York, NY: Guilford Press. Elliot, A. J., & Murayama, K. (2008). On the measurement of achievement goals: Critique, illustration, and application. Journal of Educational Psychology, 100, 613–628. doi: 10.1037/ 0022-0663.100.3.613 Epstein, J. L. (1988). Effective schools or effective students: Dealing with diversity. In R. Haskins & D. MacRae (Eds.), Policies for America’s public schools: Teacher, equity and indicators (pp. 89–126). Norwood, NJ: Ablex. Federici, R. A., Skaalvik, E. M., & Tangen, T. N. (2015). Students’ perceptions of the goal structure in mathematics classrooms: Relations with goal orientations, mathematics anxiety, and help-seeking behavior. International Education Studies, 8, 146–158. doi: 10.5539/ies.v8n3p146 Greene, B. A., Miller, R. B., Crowson, M. H., Duke, B. L., & Akey, K. L. (2004). Predicting high school students’ cognitive engagement and achievement: Contributions of classroom perceptions and motivation. Contemporary Educational Psychology, 29, 462–482. doi: 10.1016/j.cedpsych.2004.01.006 Hackel, T. S., Jones, M. H., Carbonneau, K. J., & Mueller, C. E. (2016). Re-examining achievement goal instrumentation: Convergent validity of AGQ and PALS. Contemporary Educational Psychology, 46, 73–80. doi: 10.1016/j.cedpsych.2016. 04.005 Hulleman, C. S., Schrager, S. M., Bodmann, S. M., & Harackiewicz, J. M. (2010). A meta-analytic review of achievement goal measures: Different labels for the same constructs or different constructs with similar labels? Psychological Bulletin, 136, 422–449. doi: 10.1037/a0018947



Kaplan, A., & Maehr, M. L. (1999). Achievement goals and student well-being. Contemporary Educational Psychology, 24, 330–358. doi: 10.1006/ceps.1999.0993 Kaplan, A., & Maehr, M. L. (2007). The contributions and prospects of goal orientation theory. Educational Psychology Review, 19, 141–184. doi: 10.1007/s10648-006-9012-5 Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford Press. Lau, K. L., & Lee, J. (2008). Examining Hong Kong students’ achievement goals and their relations with students’ perceived classroom environment and strategy use. Educational Psychology, 28, 357–372. doi: 10.1080/01443410701612008 Lau, S., & Nie, Y. (2008). Interplay between personal goals and classroom goal structures in predicting student outcomes: A multilevel analysis of person-context interactions. Journal of Educational Psychology, 100, 15–29. doi: 10.1037/0022-0663. 100.1.15 Lee, M., & Bong, M. (2016). In their own words: Reasons underlying the achievement striving of students in schools. Journal of Educational Psychology, 108, 274–294. doi: 10.1037/ edu0000048 Linnenbrink-Garcia, L., Middleton, M. J., Ciani, K. D., Easter, M. A., O’Keefe, P. A., & Zusho, A. (2012). The strength of the relation between performance approach and performance avoidance goal orientations: Theoretical, methodological, and instructional implications. Educational Psychologist, 47, 281–301. doi: 10.1080/00461520.2012.722515 Lüdtke, O., Marsh, H. W., Robitzsch, A., & Trautwein, U. (2011). A 2 2 taxonomy of multilevel latent contextual models: Accuracy-bias trade-offs in full and partial error correction models. Psychological Methods, 16, 444–467. doi: 10.1037/a0024376 Lüdtke, O., Robitzsch, A., Trautwein, U., & Kunter, M. (2009). Assessing the impact of learning environments: How to use student ratings of classroom or school characteristics in multilevel modeling. Contemporary Educational Psychology, 34, 120–131. doi: 10.1016/j.cedpsych.2008.12.001 Lüftenegger, M., van de Schoot, R., Schober, B., Finsterwald, M., & Spiel, C. (2014). Promotion of students’ mastery goal orientations: Does TARGET work? Educational Psychology, 34, 451–469. doi: 10.1080/01443410.2013.814189 Maehr, M. L. (1989). Thoughts about motivation. In C. Ames & R. Ames (Eds.), Research on motivation in education (Vol. 3, pp. 299–315). New York, NY: Academic Press. Maehr, M. L., & Midgley, C. (1996). Transforming school cultures. Boulder, CO: Westview Press. Meece, J. L., Anderman, E. M., & Anderman, L. H. (2006). Classroom goal structure, student motivation, and academic achievement. Annual Review of Psychology, 57, 487–503. doi: 10.1146/annurev.psych.56.091103.070258 Midgley, C., Maehr, M. L., Hruda, L. Z., Anderman, E., Anderman, L., Freeman, K. E., . . . Urdan, T. (2000). Manual for the Patterns of Adaptive Learning Scales (PALS). Ann Arbor, MI: University of Michigan. Midgley, C., & Urdan, T. (2001). Academic self-handicapping and achievement goals: A further examination. Contemporary Educational Psychology, 26, 61–75. doi: 10.1006/ceps.2000.1041 Murayama, K., & Elliot, A. J. (2009). The joint influence of personal achievement goals and classroom goal structures on achievement-relevant outcomes. Journal of Educational Psychology, 101, 432–447. doi: 10.1037/a0014221 Muthén, B. O., & Muthén, L. K. (1998–2015). Mplus (Version 7. 4). Los Angeles, CA: Muthén & Muthén. Nicholls, J. G. (1984). Achievement motivation: Conceptions of ability, subjective experience, task choice, and performance. 
Psychological Review, 91, 328–346. doi: 10.1037/0033-295X. 91.3.328




Patrick, H., Anderman, L. H., Ryan, A. M., Edelin, K. C., & Midgley, C. (2001). Teachers’ communication of goal orientations in four fifth-grade classrooms. The Elementary School Journal, 102, 35–58. doi: 10.1086/499692 Paulick, I., Watermann, R., & Nückles, M. (2011). Zielorientierungen und schulische Leistungen am Grundschulübergang [Achievement goals and school achievement during the transition from elementary to secondary school]. Unterrichtswissenschaft, 39, 365–384. Pintrich, P. R. (2003). A motivational science perspective on the role of student motivation in learning and teaching contexts. Journal of Educational Psychology, 95, 667–686. doi: 10.1037/ 0022-0663.95.4.667 Schober, B., Lüftenegger, M., Wagner, P., Finsterwald, M., & Spiel, C. (2013). Facilitating lifelong learning in school-age learners. European Psychologist, 18, 114–125. doi: 10.1027/ 1016-9040/a000129 Schwinger, M., & Stiensmeier-Pelster, J. (2011). Performanceapproach and performance-avoidance classroom goals and the adoption of personal achievement goals. The British Journal of Educational Psychology, 81, 680–699. doi: 10.1111/j.20448279.2010.02012.x Senko, C., Hulleman, C. S., & Harackiewicz, J. M. (2011). Achievement goal theory at the crossroads: Old controversies, current challenges, and new directions. Educational Psychologist, 46, 26–47. doi: 10.1080/00461520.2011.538646 Spinath, B., & Schöne, C. (2003). Die Skalen zur Erfassung subjektiver Überzeugungen zu Bedingungen von Erfolg in Lernund Leistungskontexten (SE-SÜBELLKO). In J. StiensmeierPelster & F. Rheinberg (Eds.), Diagnostik von Motivation und Selbstkonzept [Diagnosis of motivation and self-concept] (pp. 15–27). Göttingen, Germany: Hogrefe. Spinath, B., Stiensmeier-Pelster, J., Schöne, C., & Dickhäuser, O. (2002). Die Skalen zur Erfassung von Lern- und Leistungsmotivation (SELLMO) [Scales for the assessment of learning and performance motivation (SELLMO)]. Göttingen, Germany: Hogrefe. Steuer, G., Rosentritt-Brunn, G., & Dresel, M. (2013). Dealing with errors in mathematics classrooms: Structure and relevance of



perceived error climate. Contemporary Educational Psychology, 38, 196–210. doi: 10.1016/j.cedpsych.2013.03.002 Tapola, A., & Niemivirta, M. (2008). The role of achievement goal orientations in students’ perceptions of and preferences for classroom environment. The British Journal of Educational Psychology, 78, 291–312. doi: 10.1348/ 000709907X205272 Urdan, T. (2004). Using multiple methods to assess students’ perceptions of classroom goal structures. European Psychologist, 9, 222–231. doi: 10.1027/1016-9040.9.4.222 Urdan, T., Midgley, C., & Anderman, E. M. (1998). The role of classroom goal structure in students’ use of self-handicapping strategies. American Educational Research Journal, 35, 101–122. doi: 10.3102/00028312035001101 Wolters, C. A. (2004). Advancing achievement goal theory: Using goal structure and goal orientations to predict students’ motivation, cognition and achievement. Journal of Educational Psychology, 96, 236–250. doi: 10.1037/0022-0663. 96.2.2 Wormington, S. V., & Linnenbrink-Garcia, L. (2016). A new look at multiple goal pursuit: The promise of a person-centered approach. Educational Psychology Review. Advance online publication. doi: 10.1007/s10648–016-9358–2 Received November 14, 2016 Revision received November 26, 2016 Accepted November 29, 2016 Published online July 12, 2017 Marko Lüftenegger Department of Applied Psychology: Work, Education and Economy Faculty of Psychology University of Vienna Universitätsstr. 7 1010 Vienna Austria marko.lueftenegger@univie.ac.at



Original Article

Parents’ and Teachers’ Opinions on Bullying and Cyberbullying Prevention
The Relevance of Their Own Children’s or Students’ Involvement

Petra Gradinger,1 Dagmar Strohmeier,1 and Christiane Spiel2

1 School of Medical Engineering and Social Sciences, University of Applied Social Sciences Upper Austria, Linz, Austria
2 Department of Applied Psychology: Work, Education and Economy, Faculty of Psychology, University of Vienna, Austria

Abstract: The goals of the present study were (1) to examine parents’ and teachers’ opinions on bullying and cyberbullying prevention, and (2) to investigate whether the involvement of their children or students in bullying affects their opinions. Altogether, 959 adults (466 parents, 493 teachers) reported on their opinions. More than 95% of parents and teachers regarded bullying as an important topic. Cyberbullying was seen as the least serious form and physical bullying as the most serious one. Ninety-five percent of parents and 90% of teachers stated that they would accept a bullying prevention program; 61% of parents and 75% of teachers were willing to actively participate in bullying prevention; 34% of parents and 66% of teachers reported that their own children or students were victims of bullying. This involvement moderated teachers’ opinions. Teachers of students affected by bullying rated verbal and cyberbullying as more serious, accepted prevention programs more readily, and were more willing to actively participate in a program compared to teachers whose students were not involved. Keywords: bullying, parents, teachers, attitudes, prevention

A substantial number of children and youth are engaged in or suffer from bullying, a subcategory of aggressive behavior (Roland, 1989). Negative consequences for bullies as well as for victims are well documented, and there is an international call for a prevention policy (Inchley et al., 2016; Spiel & Strohmeier, 2011). To put effective bullying prevention into practice, measures on several levels are necessary. According to the socio-ecological perspective, mechanisms of bullying lie not only on the individual level, but also on the level of schools and families (Espelage & Swearer, 2004). Therefore, it is important to better understand the opinions and the behaviors of parents and teachers and to integrate this knowledge into prevention programs. Research has already demonstrated that bullying prevention programs are more effective when they also include training for teachers and parents (Fox, Farrington, & Ttofi, 2012). Moreover, substantial heterogeneity among teachers and parents regarding their commitment to anti-bullying work in schools has been reported, affecting the fidelity of high-quality program implementation (Schultes, Stefanek, van de Schoot, Strohmeier, & Spiel, 2014). This lack of commitment may be due to several factors, like negative opinions toward prevention programs, a lack of responsibility to prevent and intervene in bullying incidents, or a lack of knowledge regarding the seriousness of different forms of bullying, thus seeing bullying not as a problem at school (Cunningham et al., 2016; Green et al., 2017). What has not been investigated so far is whether adults’ opinions are moderated by the bullying involvement of their children or students. It is likely that parents or teachers whose children or students were already involved in bullying and who had told them about these incidents differ from teachers and parents without this involvement. Therefore, the main goal of the present study is to shed light on this question.

Consequences of Bullying and Cyberbullying for Young People

Bullying is characterized by intent to harm, repetition, and imbalance of power (Olweus, 1993; Roland, 1989; Smith & Sharp, 1994). Bullying can take many forms, ranging from physical and verbal to relational bullying. A large body of studies demonstrates that the involvement in bullying and victimization is a health risk. Both bullies and victims exhibit lower levels of health and well-being, and report higher levels of depression (Crick & Grotpeter, 1995; Isaacs, Hodges, & Salmivalli, 2008; Klomek, Marrocco, Kleinman, Schonfeld, & Gould, 2007), anxiety (Kaltiala-Heino, Rimpelä, Rantanen, & Rimpelä, 2000), suicidality (Bonanno & Hymel, 2010; Klomek et al., 2007), and psychosomatic symptoms (Kaltiala-Heino et al., 2000) compared to uninvolved students. Furthermore, both bullies and victims show more school truancy, feel more unsafe at school (Berthold & Hoover, 2000; Juvonen, Nishina, & Graham, 2000), and have lower academic achievements compared with uninvolved youth (Glew, Fan, Katon, Rivara, & Kernic, 2005; Juvonen et al., 2000; Nansel et al., 2001). Moreover, bullying also has long-term effects that extend into adulthood (Gladstone, Parker, & Malhi, 2006; Isaacs et al., 2008). With the spread of modern communication tools like computers and mobile phones, bullying can also be carried out in cyberspace. This new form of bullying is called cyberbullying (Li, 2006; Smith et al., 2008). Although cyberbullying is carried out less frequently than bullying (Inchley et al., 2016), a huge body of evidence shows that cyberbullies and cybervictims also suffer from a large number of negative consequences (Kowalski, Giumetti, Schroeder, & Lattanner, 2014). This is not surprising, as there is a high overlap between cyberbullying and traditional bullying (Gradinger, Strohmeier, & Spiel, 2009, 2012). Taken together, bullying has been identified as a major public health problem (Srabstein & Leventhal, 2010) and a threat to the educational system and economy (Cowie & Jennifer, 2008). Therefore, bullying prevention and anti-bullying intervention in schools are of high importance. From a socio-ecological perspective (Espelage & Swearer, 2004), bullying incidents unfold in social contexts which are constituted not only by single individuals such as the bully or the victim (Yoon, 2004), but also by the interaction of peers (e.g., bystanders, reinforcers, defenders; Salmivalli, 2010) and adults (e.g., teachers, school administrators, counselors; Hawkins, Pepler, & Craig, 2001). It is therefore important that bullying prevention and intervention efforts target the whole system, aiming for a supportive and respectful school climate where students can feel safe and secure. This is best done by so-called whole-school approaches, where teachers, parents, and students are committed to creating a bullying-resistant climate (Bosworth & Judkins, 2014).

The Importance of Parents’ and Teachers’ Opinions for Their Commitment to Prevention

Meta-analyses show that bullying prevention programs are on average effective in reducing traditional bullying and victimization (e.g., Farrington & Ttofi, 2009; Ttofi & Farrington, 2011). Recently, several programs that specifically tackle cyberbullying have been developed and evaluated (e.g., Menesini, Nocentini, & Palladino, 2012; Ortega-Ruiz, Del Rey, & Casas, 2012; Pieschl & Porsch, 2013; Schultze-Krumbholz, Zagorscak, Wölfer, & Scheithauer, 2014). Meanwhile, there is increasing evidence that bullying prevention programs are also effective in preventing cyberbullying in addition to traditional bullying (Gradinger, Yanagida, Strohmeier, & Spiel, 2014; Williford et al., 2013; Yanagida, Strohmeier, & Spiel, 2016), even with long-term effects (Gradinger, Yanagida, Strohmeier, & Spiel, 2016). Nevertheless, meta-analyses have shown that several implementation features and program components of bullying prevention programs are differentially associated with program effectiveness (Fox et al., 2012). Programs containing teacher or parent training are more effective than programs without these elements (Fox et al., 2012; Ttofi & Farrington, 2011). Moreover, programs with longer duration and more components for teachers and students are more effective than programs with shorter duration and fewer components. These studies emphasize the importance of the commitment of teachers and parents, who need to actively participate in such programs for them to be most effective. Research has already demonstrated the negative effect of low teacher commitment on bullying prevention program effectiveness. Teachers’ self-efficacy was significantly more enhanced in schools where a bullying prevention program had been implemented with high fidelity. Furthermore, only teachers with high participant responsiveness significantly changed their behavior in bullying situations after program participation (Schultes et al., 2014). But why are educators not sufficiently committed to implementing and sustaining bullying prevention programs? Cunningham and colleagues (2016) interviewed 103 teachers and identified the following opinions as barriers: curriculum demands limit the time for training, implementation, and prompt responses to bullying. Principals fail to back up teachers; ambivalent colleagues, uncooperative parents, and a lack of evidence reduce teachers’ commitment to program implementation. Teachers feel frustrated and discouraged, and struggle to mobilize the enthusiasm needed to ensure successful implementation. Besides organizational support for teachers within the schools, it is crucial that they have high awareness of the seriousness of all forms of bullying, not only the overt ones, and that they are willing to intervene and to prevent bullying (Duy, 2013). Unfortunately, not all forms of bullying are seen as serious, which is why some teachers do not hold intolerant attitudes against bullying (Craig, Henderson, & Murphy, 2000). This is problematic, as it is crucial that teachers are committed to the prevention program content and are informed about the importance of high-fidelity implementation, as their attitudes are related to students’ outcomes in school-based violence prevention programs (Biggs, Vernberg, Twemlow, Fonagy, & Dill, 2008). For instance, teachers’ opinions on staff importance in resolving bully-victim problems predicted teacher implementation of the prevention program’s classroom procedures. Research shows that teachers’ own victimization experiences as students are associated with their later bullying prevention competence (Kokko & Pörhölä, 2009). Teachers who had been victimized by their peers when they were students were better equipped to deal with bullying than non-victims, or those who had been victimized less consistently. It was demonstrated that higher levels of empathy felt with the victim, combined with higher levels of willingness to communicate bullying incidents and high communication competence in general, are a solid foundation for bullying prevention. To the best of our knowledge, the question of whether the bullying involvement of their own children or students affects teachers’ and parents’ opinions regarding bullying has not been investigated yet. As not all educators will have these kinds of intense first-hand experiences, it is important to find out whether this kind of indirect involvement in bullying moderates opinions about bullying as a problem at school as well as about bullying prevention.

The Present Study Considering teachers’ and parents’ importance for bullying and cyberbullying prevention at school, there is a need to better understand the factors influencing their commitment to prevention programs, especially as intense programs with more elements and a longer duration are more effective. Empirical studies showed that there is heterogeneity among adults and that several teachers and parents have rather negative opinions regarding bullying prevention. Concretely, many adults think that bullying is not an important problem at school, especially covert forms of bullying are not so serious, and school staff might not be responsible for resolving bully-victim problems, thus should not be involved in bullying prevention programs or intervention efforts. As teachers’ own experience as a child has an influence on their pro-bullying prevention competence, it is conceivable that their indirect involvement through their children or students also moderates their opinions. Therefore, the present study aims (1) to better understand parents’ and teacher’ opinions regarding bullying as a problem and bullying prevention at school and (2) to investigate whether their indirect involvement in bullying via their students’ or children’s moderates these opinions. This knowledge is important to better understand factors underlying teachers’ and parents’ commitment to bullying Zeitschrift für Psychologie (2017), 225(1), 76–84

prevention programs. We hypothesize that teachers’ and parents’ indirect involvement in bullying via their students or children moderates their opinions. Concretely, we expect adults whose children or students were involved in bullying to perceive bullying more as a problem at school (importance, seriousness, prevalence) and to be more accepting of systematic bullying prevention programs (acceptance, willingness to participate, inclusion of school staff and parents in prevention) than adults without indirect bullying involvement.

Method
Design and Procedure
The present study was conducted in Austria. After the study was approved by the relevant bodies in Austria, a variety of methods were used to reach teachers and parents in numerous regions of the country. The homepages of schools’ and parents’ associations in each of the nine federal states – stratified by big cities, small cities, and rural areas – were searched, and key multipliers were contacted to inform teachers and parents about the study. Interested teachers and parents were then contacted personally by mail or e-mail by research assistants and asked to complete the questionnaire either in a paper-pencil version or online. In addition, all participants were asked to recommend the study to other teachers or parents (snowball sampling). For parents, the link to the online questionnaire was also posted on websites, for instance those of parents’ associations or special interest groups (e.g., www.eltern-forum.at, www.kinder.univie.ac.at/forum, www.psychoforum.at). To also reach parents with little online affinity, research assistants personally approached parents at public playgrounds and sports clubs. After obtaining their active consent, the participants were assured that their answers would be kept confidential and that they could stop answering the questions at any time.
Participants
The present study is based on two samples, a parents’ sample and a teachers’ sample.
Parent Sample
In total, 466 parents responded to the questionnaire. On average, the parents were 43.34 years old (range: 26–68 years). The majority of respondents were female. Half of the parents came from large cities, while nearly as many came from small towns or rural areas. Only a minority of parents came from single-parent families.
Teacher Sample
In total, 493 teachers responded to the questionnaire. On average, the teachers were 44.52 years old (range: 23–64 years). The majority of respondents were female.



Half of the teachers came from large cities and a minority came from small towns. On average, the teachers had 16.51 years of teaching experience (range: 1–40 years). The teachers in the study were significantly – although not substantially – older than the participating parents, F(1, 939) = 4.894, p < .05, had a similar gender distribution, χ²(1) = 2.349, p > .05, and more of them came from rural areas – while fewer came from small towns – than parents, χ²(2) = 15.399, p < .001. More sample details are available in Table 1 in the Electronic Supplementary Material, ESM 1.
Measures
To obtain as many complete data sets as possible, the questionnaire was kept as short as possible. Therefore, only one item was used to measure each construct. For the online version of the questionnaire, the tool SoSci Survey (https://www.soscisurvey.de/) was used. At the beginning of the questionnaire, the following definition of bullying was provided: “We call it bullying when a student who cannot easily defend her- or himself is teased, hassled or attacked by one or more students repeatedly and over a longer period of time. These attacks can be carried out physically (e.g., beating, kicking, or boxing someone), verbally (e.g., insulting someone), relationally (e.g., excluding someone) or via computer or mobile phone (e.g., mean e-mails or text messages).”
Importance of Bullying at School
The self-reported importance of bullying at school was measured with one item: “How important is the topic of bullying at Austrian schools?” Response options ranged from (1) not at all important, (2) not so important, (3) important, to (4) very important.
Seriousness of Bullying
The self-reported seriousness of bullying was measured with four items referring to physical, verbal, relational, and cyberbullying, respectively: “How serious are the following bullying behaviors? – physical behaviors (e.g., hitting, kicking, boxing someone), verbal behaviors (e.g., insulting someone), relational behavior (e.g., excluding someone), and cyber behaviors (e.g., writing mean e-mails or SMS).” Response options were: (1) not at all serious, (2) not so serious, (3) serious, (4) very serious.
Prevalence Estimation of Bullying
To measure beliefs about the prevalence of bullying at school, the respondents were asked to rate the prevalence of bullying and victimization in Austrian schools: “How big is the percentage of students who are involved in bullying [or victimization] at school?” Response options ranged from (0) zero percent to (100) one hundred percent.


Prevalence Estimation of Victims’ Reports
To measure beliefs about how many victims report bullying, the respondents were asked to rate the prevalence of victims who report the incident to teachers or parents: “How big is the percentage of victims who are reporting the incident to a teacher [or parent]?” Response options ranged from (0) zero percent to (100) one hundred percent.
Acceptance of a Prevention Program
One item was used to measure the acceptance of a bullying prevention program at the respondents’ own school: “Would you like to have a bullying prevention program at your school?” Response options were (0) no and (1) yes.
Willingness to Participate
One item was used to measure the willingness to participate in a bullying prevention program at the respondents’ own school: “Would you actively participate in a bullying prevention program at your school?” Response options were: (0) no, (1) yes, and (2) maybe.
Target Groups for Prevention
Four items measured possible target groups of a bullying prevention program at the respondents’ own school: “Who should participate in a bullying prevention program at your school? Students? Teachers? Parents? Principals?” Response options for each were (0) no and (1) yes.
Bullying Involvement of Their Children or Students
Two items measured the bullying involvement of participants’ own children or students: “Did any of your students (children) report being bullied by class-mates?” Response options were (0) no and (1) yes. The questionnaire is available on osf.io under the project name Parents’ and Teachers’ Opinions on Bullying and Cyberbullying Prevention.

Results
All analyses were calculated with IBM SPSS Statistics, Version 24. In the first step, we compared the mean levels of parents’ and teachers’ reports using a series of single-factor analyses of variance (ANOVAs) with target group (teachers vs. parents) as the independent factor, and we used Pearson’s chi-squared tests (χ²) to test for differences in the categorical data. In the second step, a series of two-factor ANOVAs with target group (teachers vs. parents) and involvement (involved vs. noninvolved) as independent variables was conducted to test whether the past bullying experiences of their own children or students moderate the opinions. Again, Pearson’s chi-squared tests (χ²) were used to test for differences in the categorical data.
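As an illustration of this two-step analysis plan, the following sketch shows how an equivalent two-factor ANOVA and chi-squared test could be set up in Python (the authors used SPSS; the data set, variable names, and values below are synthetic placeholders, not the study data):

```python
# Minimal sketch of the reported analysis strategy on synthetic data (not the study data).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "group": rng.choice(["teacher", "parent"], size=n),   # target group
    "involved": rng.choice(["yes", "no"], size=n),         # indirect involvement
    "importance": rng.integers(1, 5, size=n),              # 1-4 rating, as in the questionnaire
    "accept_program": rng.integers(0, 2, size=n),          # 0 = no, 1 = yes
})

# Step 2 analogue: two-factor ANOVA with main effects and the group x involvement interaction
model = smf.ols("importance ~ C(group) * C(involved)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Chi-squared test for a categorical outcome (e.g., program acceptance by involvement)
table = pd.crosstab(df["involved"], df["accept_program"])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.3f}, p = {p:.3f}")
```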




Teachers’ and Parents’ Opinions on Bullying as a Problem
Generally, the teachers and parents in the study rate bullying as an important or very important topic at school and see all forms of bullying as serious or very serious. Bullying at school received a similar importance rating from both parents and teachers, F(1, 956) = 0.466, p > .05. However, differences between the two groups were found with regard to the seriousness of the various forms of bullying. Parents attach a higher seriousness rating to physical bullying than teachers, F(1, 956) = 4.705, p < .05, while teachers rate verbal bullying, F(1, 957) = 11.787, p < .01, and relational bullying, F(1, 959) = 16.927, p < .001, as more serious than parents. Parents and teachers rate cyberbullying similarly high, F(1, 433) = 0.972, p > .05. They differ in their estimates of prevalence rates: Parents estimate both bullying, F(1, 942) = 5.893, p < .05, and victimization rates, F(1, 940) = 12.034, p < .01, higher than teachers. Parents also estimate the rate of victims reporting incidents to parents as higher than teachers do, F(1, 936) = 5.137, p < .05. Their estimates of the rate of victims reporting incidents to teachers are similarly high, F(1, 939) = 0.047, p > .05. Means and standard deviations are provided in Table 2 in ESM 1.
Importance and Seriousness of Bullying Forms by Target Group and Involvement
Regarding the importance of bullying, results revealed neither a main effect of target group, F(1, 956) = 1.313, p > .05, nor of involvement, F(1, 956) = 2.892, p > .05. However, a significant interaction effect of target group and involvement was found, F(1, 956) = 5.042, p < .05. Involved teachers rate the importance of bullying at school higher than uninvolved teachers, while involved and uninvolved parents rate the importance of bullying at school similarly high. Regarding the seriousness of physical bullying, results revealed a main effect of target group, F(1, 996) = 4.029, p < .05, while no main effect of involvement, F(1, 959) = 0.033, p > .05, and no interaction effect of target group and involvement, F(1, 959) = 1.648, p > .05, was found. Parents rate the seriousness of physical bullying higher than teachers. Regarding the seriousness of verbal bullying, results revealed a main effect of target group, F(1, 957) = 7.110, p < .001. Results also showed a main effect of involvement, F(1, 957) = 4.432, p < .05. No interaction effect between target group and involvement was found, F(1, 959) = 1.658, p > .05. Teachers rate the seriousness of verbal bullying higher than parents, and involved adults rate verbal bullying as more serious than uninvolved adults.
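For readers less familiar with this design, the two-factor ANOVAs reported in this and the following subsections correspond to the standard decomposition (generic textbook notation, not taken from the article):

```latex
% Y_{ijk}: rating of respondent k in target group i (teacher/parent) and involvement condition j
% \alpha_i, \beta_j: main effects; (\alpha\beta)_{ij}: interaction; \varepsilon_{ijk}: residual
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
```

A significant interaction term, as found here for the importance ratings, means that the effect of involvement differs between teachers and parents.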

Regarding the seriousness of relational bullying, results revealed a main effect of target group, F(1, 959) = 16.288, p < .001. There was no main effect of involvement, F(1, 959) = 0.112, p > .05, and no interaction effect of target group and involvement, F(1, 959) = 1.585, p > .05. Teachers rate the seriousness of relational bullying higher than parents. Regarding the seriousness of cyberbullying, results revealed no main effect of target group, F(1, 433) = 3.665, p > .05. However, a main effect of involvement, F(1, 433) = 10.645, p < .01, and an interaction between target group and involvement, F(1, 433) = 21.446, p < .001, were found. Involved teachers rate the seriousness of cyberbullying higher than uninvolved teachers, while involved parents rate the seriousness of cyberbullying similarly to uninvolved parents. Means and standard deviations are provided in Table 3 in ESM 1.
Prevalence Estimates by Target Group and Involvement
Regarding the prevalence estimates of bullying, results revealed a main effect of target group, F(1, 942) = 12.874, p < .001. Results also showed a main effect of involvement, F(1, 942) = 17.607, p < .001. No interaction between target group and involvement was found, F(1, 942) = 0.062, p > .05. Parents estimate bullying rates higher than teachers, and involved adults estimate bullying rates higher than uninvolved adults. Regarding the prevalence estimates of victimization, results revealed a main effect of target group, F(1, 940) = 19.926, p < .001, and a main effect of involvement, F(1, 940) = 14.423, p < .001. No interaction between target group and involvement was found, F(1, 940) = 0.225, p > .05. Parents estimate victimization rates higher than teachers, and involved adults estimate victimization rates higher than uninvolved adults. Regarding the prevalence estimates of victims reporting incidents to teachers, results revealed no main effect of target group, F(1, 939) = 0.118, p > .05, nor of involvement, F(1, 939) = 3.779, p > .05. However, an interaction effect of target group and involvement was found, F(1, 939) = 7.844, p < .01. Involved teachers estimate the rate of victims reporting to teachers higher than uninvolved teachers, while involved parents estimate this rate similarly to uninvolved parents. Regarding the prevalence estimates of victims reporting incidents to parents, results revealed a main effect of target group, F(1, 936) = 5.598, p < .05. No main effect of involvement was found, F(1, 939) = 3.779, p > .05. However, an interaction of target group and involvement was revealed, F(1, 939) = 7.844, p < .01. Parents estimate the rate of victims reporting to parents higher than teachers. Involved teachers estimate



the rate of victims reporting to parents higher than uninvolved teachers, while involved parents estimate this rate similarly to uninvolved parents. Means and standard deviations are provided in Table 4 in ESM 1.
Bullying Prevention Opinions by Target Group and Involvement
In order to investigate the relevance of involvement for parents’ and teachers’ bullying prevention attitudes, we conducted Pearson’s chi-squared tests (χ²) – separately for parents and teachers – to compare the frequency distributions across adults’ involvement (involved vs. noninvolved).
Parents
A similar number of uninvolved and involved parents accept a bullying prevention program at their school, χ²(1) = 0.005, p > .05, and are willing to participate in a bullying prevention program, χ²(1) = 0.068, p > .05. Furthermore, a similar number of uninvolved and involved parents think that students, χ²(1) = 3.137, p > .05, teachers, χ²(1) = 0.645, p > .05, parents, χ²(1) = 0.215, p > .05, and principals, χ²(1) = 0.124, p > .05, should participate in bullying prevention programs.
Teachers
A higher number of involved than uninvolved teachers accept a bullying prevention program at their school, χ²(1) = 14.135, p < .001, and are willing to participate in a bullying prevention program, χ²(1) = 21.314, p < .001. Furthermore, a higher rate of involved than uninvolved teachers think that principals, χ²(1) = 5.764, p < .05, should participate in bullying prevention programs. A similar rate of uninvolved and involved teachers think that students, χ²(1) = 0.203, p > .05, teachers, χ²(1) = 1.121, p > .05, and parents, χ²(1) = 2.216, p > .05, should participate in bullying prevention programs. Means and standard deviations are provided in Table 5 in ESM 1.

Discussion
The present study described and compared teachers’ and parents’ opinions relevant for bullying prevention. Importantly, the moderating role of teachers’ and parents’ indirect involvement in bullying incidents was explored. Indirect involvement was defined as having an own child (as a parent) or students (as a teacher) who had told them that they had been victimized. Altogether, 959 adults – 466 parents and 493 teachers – from various regions



in Austria filled in the questionnaire. In the following, we discuss the most interesting results.
Teachers’ and Parents’ Opinions on Bullying as a Problem
Nearly all parents (96%) and teachers (97%) in the sample believe that bullying is an important or very important topic for Austrian schools. This is remarkable, especially as schools are normally seen as places for learning rather than for socio-emotional development. It is possible that parents and teachers attribute this high importance to bullying in schools because there have been many media reports on this topic in Austria and because a national strategy to prevent violence in schools was developed and implemented stepwise (Spiel & Strohmeier, 2011). Nearly all parents and teachers rate all forms of bullying as serious or very serious. In Austria, physical bullying is rated as the most serious and cyberbullying as the least serious form. This is congruent with other studies showing that overt bullying is seen as more serious than covert forms (Craig, Bell, & Leschied, 2011). However, with the increasing importance of social media and the increasing number of media reports about cyberbullying, teachers are also becoming more aware of the seriousness of cyberbullying (Cassidy, Brown, & Jackson, 2012). The prevalence estimations of parents and teachers are rather accurate when compared to the Austrian rates reported in the latest Health Behaviour in School-aged Children (HBSC) study (Inchley et al., 2016). More concretely, their estimates are rather high in comparison to a strict cut-off value (at least 2–3 times a month in the last couple of months), but quite accurate in comparison to a lenient cut-off value (at least once in the last couple of months). Taken together, parents and teachers have a good picture of the number of involved youth. Interestingly, parents’ and teachers’ estimates of how many victims report their experiences to teachers or parents are rather low. For instance, Dooley, Gradinger, Strohmeier, Cross, and Spiel (2010) found that in Austria 24% of female victims and 32% of male victims told their teacher, and 50% of female and male victims told their parents about the incidents. Although parents and teachers seem to underestimate the number of students or children approaching them, they realistically see that parents are approached by victims more often than teachers.
Differences in Opinions Between Parents and Teachers
Although parents and teachers in general rate all forms of bullying as serious, there are some interesting differences between the two groups. While parents rate physical





bullying as more serious than teachers, the teachers rate verbal and relational bullying as more serious than parents. This finding might indicate that the pedagogical “experts,” the teachers, are probably already more aware that verbal bullying – as the most prevalent form – and relational bullying – as a very covert form – also have very serious implications for both victims and bullies (Strohmeier, Gradinger, Schabmann, & Spiel, 2012).
Relevance of Involvement for Opinions
As expected, indirect involvement in bullying moderated the opinions. Our data show that these moderation effects were driven by teachers, who changed their attitudes when they had students who told them about their victimization experiences. For instance, compared with noninvolved teachers, indirectly involved teachers attribute higher importance to bullying at school, believe that verbal and cyberbullying are more serious, estimate prevalence rates higher, expect more victims to approach teachers and parents, are more accepting of a bullying prevention program at their school, and are more willing to actively participate in a bullying prevention program at their school. Moreover, involved teachers believe more strongly than uninvolved teachers that principals should actively participate in bullying prevention programs. These opinions are all positive and crucial for a high-fidelity implementation of whole-school bullying prevention programs. The finding that parents’ opinions regarding the implementation of prevention programs in schools do not change much due to their indirect involvement is not problematic, because overall their opinions toward bullying prevention are already very favorable.
Strengths, Limitations, and Future Research
The present study has several strengths. To begin with, a high number of adults answered the questionnaire. This is remarkable, as it is usually difficult to motivate teachers and parents to participate in research. Thus, the high participation rate is another indicator of the high importance of the topic of bullying at school. Furthermore, the external validity of the present study is high. Contrary to studies measuring hypothetical answers in hypothetical situations (Burger, Strohmeier, Spröber, Bauman, & Rigby, 2015), the present study measured opinions regarding real experiences. The present study is not without limitations, however. First of all, we do not know to what degree the reported opinions relate to actual behavior. It can be assumed that there is only a weak or moderate association between opinion and behavior. Future studies might also offer a prevention program and ask for adults’ registration at the end of the questionnaire. Secondly, we compared parents’ and


teachers’ opinions, but we do not know whether some parents are also teachers and whether some teachers are also parents. As such co-occurring statuses are very likely, future studies need to disentangle them. Thirdly, we do not know enough about parents’ and teachers’ involvement in the victimization of their children or students. For instance, it is possible that a teacher with 30 years of teaching experience was told about a bullying incident by a student only once, whereas another teacher had to deal with several bullying cases every year. Future studies are therefore needed to investigate the intensity of the indirect involvement in depth. Fourthly, it is important to keep in mind that the present study is cross-sectional. Therefore, no causal conclusions can be drawn. Fifthly, the present study relied on a nonprobability sample, which might not be representative of the teacher and parent population. While we used a wide variety of methods to reach this hard-to-reach population, future studies with stratified random samples – for example, randomly selected children, teachers, and parents from the same schools, stratified by region and other variables influencing the dependent variables of the study – need to verify the present results. Finally, only single items were used to measure the variables of the study. Although multiple-item scales are normally recommended to measure latent constructs in psychological research, we used simpler and more concrete statements that are easy to understand in order to heighten study adherence. Future studies are advised to use multiple-item measures to be able to separate out measurement error. To summarize, the present study demonstrated that both parents and teachers hold very favorable opinions toward bullying prevention. It was also revealed that indirect involvement is important – especially for teachers. It is important to integrate this knowledge when implementing prevention programs in schools, as the indirect involvement of both teachers and parents might be used to foster the willingness to participate in a whole-school prevention program, for instance by implementing anonymous victim reporting systems at school or by distributing online victim reports to adults before asking them for program participation.
Acknowledgments
We thank all teachers and parents who participated in this study. We are also very grateful to Brigitte Hirschegger, Angelika Killmann, Nathalie Schopper, and Julia Strobl for their invaluable work regarding data collection. The conceptualization, development of measurement, and data collection were funded by the Austrian Federal Ministry for Education, Arts and Cultural Affairs. The data analyses and writing of the present study were funded by the Platform for Intercultural Competences, University of Applied Sciences Upper Austria.




The authors declare that they have no competing interests.

Electronic Supplementary Material
The electronic supplementary material is available with the online version of the article at http://dx.doi.org/10.1027/2151-2604/a000278
ESM 1. Text (.docx). Tables.

References Berthold, K. A., & Hoover, J. H. (2000). Correlates of bullying and victimization among intermediate students in the midwestern USA. School Psychology International, 21, 65–78. Biggs, B. K., Vernberg, E. M., Twemlow, S., Fonagy, P., & Dill, E. (2008). Teacher adherence and its relation to teacher attitudes and student outcomes in an elementary schoolbased violence prevention program. School Psychology Review, 37, 533–549. Bonanno, R., & Hymel, S. (2010). Beyond hurt feelings: Investigating why some victims of bullying are at greater risk for suicidal ideation. Merrill-Palmer Quarterly, 56, 420–442. Bosworth, K., & Judkins, M. (2014). Tapping into the power of school climate to prevent bullying: One application of schoolwide positive behavior interventions and supports. Theory Into Practice, 53, 300–307. doi: 10.1080/00405841. 2014.947224 Burger, C., Strohmeier, C., Spröber, N., Bauman, S., & Rigby, K. (2015). How teachers respond to school bullying: An examination of self-reported intervention strategy use, moderator effects, and concurrent use of multiple strategies. Teaching and Teacher Education, 51, 191–202. doi: 10.1016/j.tate.2015. 07.004 Cassidy, W., Brown, K., & Jackson, M. (2012). “Under the radar”: Educators and cyberbullying in schools. School Psychology International, 33, 520–532. doi: 10.1177/0143034312445245 Cowie, H., & Jennifer, D. (2008). New perspectives on bullying. New York, NY: McGraw-Hill. Craig, K., Bell, D., & Leschied, A. (2011). Pre-service teachers’ knowledge and attitudes regarding school-based bullying. Canadian Journal of Education Revue Canadienne de L’éducation, 34, 21–33. Craig, W. M., Henderson, K., & Murphy, J. G. (2000). Prospective teachers’ attitudes toward bullying and victimization. School Psychology International, 21, 5–21. doi: 10.1177/ 0143034300211001 Crick, N. R., & Grotpeter, J. K. (1995). Relational aggression, gender, and social-psychological adjustment. Child Development, 66, 710–722. doi: 10.1111/j.1467-8624.1995.tb00900.x Cunningham, C. E., Rimas, H., Mielko, S., Mapp, C., Cunningham, L., Buchanan, D., . . . Marcus, M. (2016). What limits the effectiveness of antibullying programs? A thematic analysis of the perspective of teachers. Journal of School Violence, 15, 460–482. doi: 10.1080/15388220.2015.1095100 Dooley, J., Gradinger, P., Strohmeier, D., Cross, D., & Spiel, C. (2010). Cyber-victimisation: The association between helpseeking behaviours and self-reported emotional symptoms in Australia and Austria. Australian Journal of Guidance and Counselling, 20, 194–209. doi: 10.1375/ajgc.20.2.194



Duy, B. (2013). Teachers’ attitudes toward different types of bullying and victimization in Turkey. Psychology in the Schools, 50, 987–1002. doi: 10.1002/pits.21729 Espelage, D., & Swearer, S. (2004). Bullying in American schools: A social-ecological perspective on prevention and intervention. Mahwah, NJ: Erlbaum. Farrington, D. P., & Ttofi, M. M. (2009). School-based programs to reduce bullying and victimization. Campbell Systematic Reviews, 2009, 6. Fox, B. H., Farrington, D. P., & Ttofi, M. M. (2012). Successful bullying prevention programs: Influence of research design, implementation features, and program components. International Journal of Conflict and Violence, 6, 273–282. Received from http://ijcv.org/index.php/ijcv/article/viewFile/ 245/pdf_65 Gladstone, G. L., Parker, G. L., & Malhi, G. S. (2006). Do bullied children become anxious and depressed adults? A crosssectional investigation of the correlates of bullying and anxious depression. The Journal of Nervous and Mental Disease, 194, 201–208. doi: 10.1097/01.nmd.0000202491.99719.c3 Glew, G. M., Fan, M.-Y., Katon, W., Rivara, F. P., & Kernic, M. A. (2005). Bullying, psychological adjustment, and academic performance in elementary school. Archives of Pediatrics and Adolescent Medicine, 159, 1026–1031. doi: 10.1001/archpedi. 159.11.1026 Gradinger, P., Strohmeier, D., & Spiel, C. (2009). Traditional bullying and cyberbullying: Identification of risk groups for adjustment problems. The Journal of Psychology, 217, 205–213. doi: 10.1027/0044-3409.217.4.205 Gradinger, P., Strohmeier, D., & Spiel, C. (2012). Motives for bullying others in cyberspace: A study on bullies and bullyvictims in Austria. In Q. Li, D. Cross, & P. Smith (Eds.), Cyberbullying in the global playground: Research from international perspectives (pp. 263–284). Chichester, UK: Wiley. Gradinger, P., Yanagida, T., Strohmeier, D., & Spiel, C. (2014). Prevention of cyberbullying and cyber victimization: Evaluation of the ViSC social competence program. Journal of School Violence, 14, 87–110. doi: 10.1080/15388220.2014. 963231 Gradinger, P., Yanagida, T., Strohmeier, D., & Spiel, C. (2016). Effectiveness and sustainability of the ViSC social competence program to prevent cyberbullying and cyber-victimization: Class and individual level moderators. Aggressive Behavior, 42, 181–193. doi: 10.1002/ab.21631 Green, V. A., Johnston, M., Mattioni, L., Prior, T., Harcourt, S., & Lynch, T. (2017). Who is responsible for addressing cyberbullying? Perspectives from teachers and senior managers. International Journal of School & Educational Psychology, 5, 100–114. doi: 10.1080/21683603.2016.1194240 Hawkins, D. L., Pepler, D., & Craig, W. M. (2001). Naturalistic observations of peer interventions in bullying. Social Development, 10, 512–527. doi: 10.1111/1467-9507.00178 Inchley, J., Currie, D., Young, S., Samdal, O., Torsheim, T., Augustson, L., . . . Barnekow, V. (2016). Growing up unequal: Gender and socioeconomic differences in young people’s health and well-being. Health Behaviour in School-aged Children (HBSC) study: International report from the 2013/2014 survey. Copenhagen, Denmark: WHO Regional Office for Europe. Isaacs, J., Hodges, E. V. E., & Salmivalli, C. (2008). Long-term consequences of victimization by peers: A follow-up from adolescence to young adulthood. European Journal of Developmental Science, 2, 387–397. doi: 10.3233/DEV-2008-2404 Juvonen, J., Nishina, A., & Graham, S. (2000). Peer harassment, psychological adjustment, and school functioning in early adolescence. 
Journal of Educational Psychology, 92, 349–359. doi: 10.1037/0022-0663.92.2.349





Kaltiala-Heino, R., Rimpelä, M., Rantanen, P., & Rimpelä, A. (2000). Bullying at school: An indicator of adolescents at risk for mental disorders. Journal of Adolescence, 23, 661–674. doi: 10.1006/jado.2000.0351 Klomek, A. B., Marrocco, F., Kleinman, M., Schonfeld, I. S., & Gould, M. S. (2007). Bullying, depression, and suicidality in adolescents. Journal of the American Academy of Child Adolescent Psychiatry, 46, 40–49. doi: 10.1097/01.chi.0000242237. 84925.18 Kokko, T. H. J., & Pörhölä, M. (2009). Tackling bullying: Victimized by peers as a pupil, an effective intervener as a teacher? Teaching and Teacher Education, 25, 1000–1008. doi: 10.1016/ j.tate.2009.04.005 Kowalski, R. M., Giumetti, G. W., Schroeder, A. N., & Lattanner, M. R. (2014). Bullying in the digital age: A critical review and meta-analysis of cyberbullying research among youth. Psychological Bulletin, 140, 1073–1137. doi: 10.1037/a0035618 Li, Q. (2006). Cyberbullying in schools. A research of gender differences. School Psychology International, 27, 157–170. doi: 10.1177/0143034306064547 Menesini, E., Nocentini, A., & Palladino, B. (2012). Empowering students against bullying and cyberbullying: Evaluation of an Italian peer-led model. International Journal of Conflict and Violence, 6, 314–321. doi: 10.4119/UNIBI/ijcv.253 Nansel, T. R., Overpeck, M., Pilla, R. S., Ruan, W. J., SimmonsMorton, B., & Scheidt, P. (2001). Bullying behaviors among us youth. The Journal of the American Medical Association, 285, 2094–2100. doi: 10.1001/jama.285.16.2094 Olweus, D. (1993). Bullying at school: What we know and what we can do. Oxford, UK: Blackwell. Ortega-Ruiz, R., Del Rey, R., & Casas, J. (2012). Knowing, building and living together on internet and social networks: The conred cyberbullying prevention program. International Journal of Conflict and Violence, 6, 303–313. doi: 10.4119/UNIBI/ijcv.250 Pieschl, S., & Porsch, T. (2013). Das Präventionsprogramm SurfFair gegen Cybermobbing – Eine Einführung [The prevention program Surf-Fair against cyberbullying – An introduction]. Psychologie in Oesterreich, 33, 14–21. Roland, E. (1989). A system oriented strategy against bullying. In E. Roland & E. Munthe (Eds.), Bullying: An international perspective (pp. 143–151). London, UK: David Fulton. Salmivalli, C. (2010). Bullying and the peer group: A review. Aggression and Violent Behavior, 15, 112–120. doi: 10.1016/ j.avb.2009.08.007 Schultes, M.-T., Stefanek, E., van de Schoot, R., Strohmeier, D., & Spiel, C. (2014). Measuring implementation of a school-based violence prevention program on two levels. Fidelity and teachers’ responsiveness as predictors of proximal outcomes. Zeitschrift für Psychologie, 222, 49–57. doi: 10.1027/21512604/a000165 Schultze-Krumbholz, A., Zagorscak, P., Wölfer, R., & Scheithauer, H. (2014). Prävention von Cybermobbing und Reduzierung aggressiven Verhaltens Jugendlicher durch das Programm Medienhelden. Ergebnisse einer Evaluationsstudie [Prevention of cyberbullying and reduction of adolescents’ aggressive


behavior using the “Medienhelden” program. Results from an evaluation study]. Diskurs Kindheits-und Jugendforschung, 9, 61–79. Smith, P. K., Mahdavi, J., Carvalho, M., Fisher, S., Russell, S., & Tippett, N. (2008). Cyberbullying: Its nature and impact on secondary school pupils. Journal of Child Psychology and Psychiatry, 49, 376–385. doi: 10.1111/j.1469-7610.2007. 01846.x Smith, P. K., & Sharp, S. (1994). School bullying: Insights and perspectives. London, UK: Routledge. Spiel, C., & Strohmeier, D. (2011). National strategy for violence prevention in Austrian schools and kindergarten: Development and implementation. International Journal of Behavioral Development, 35, 412–418. doi: 10.1177/0165025411407458 Srabstein, J. C., & Leventhal, B. L. (2010). Prevention of bullyingrelated morbidity and mortality: A call for public health policies. Bulletin of the World Health Organization, 88, 403. doi: 10.2471/ BLT.10.077123 Strohmeier, D., Gradinger, P., Schabmann, A., & Spiel, C. (2012). Gewalterfahrungen von Jugendlichen. Prävalenzen und Risikogruppen [Adolescent’s experiences of violence: Prevalence and groups at risk]. In F. Eder (Ed.), PISA 2009. Nationale Zusatzerhebungen (pp. 165–208). Münster, Germany: Waxmann. Ttofi, M. M., & Farrington, D. P. (2011). Effectiveness of schoolbased programs to reduce bullying: A systematic and metaanalytic review. Journal of Experimental Criminology, 7, 27–56. doi: 10.1007/s11292-010-9109-1 Williford, A., Elledge, C., Boulton, A., DePaolis, K., Little, T., & Salmivalli, C. (2013). Effects of the KiVa antibullying program on cyberbullying and cybervictimization frequency among finnish youth. Journal of Clinical Child and Adolescent Psychology, 42, 820–833. doi: 10.1080/15374416.2013.787623 Yanagida, T., Strohmeier, D., & Spiel, C. (2016). Dynamic change of aggressive behavior and victimization among adolescents: Effectiveness of the ViSC program. Journal of Clinical Child and Adolescent Psychology. doi: 10.1080/15374416.2016. 1233498 Yoon, J. S. (2004). Predicting teacher interventions in bullying situations. Education and Treatment of Children, 27, 37–45. Received November 1, 2016 Revision received December 13, 2016 Accepted December 17, 2016 Published online July 12, 2017 Petra Gradinger School of Applied Health and Social Sciences University of Applied Sciences Upper Austria Garnisonstr. 21 4020 Linz Austria petra.gradinger@gmail.com



Original Article

Intercultural Competence Development Among University Students From a Self-Regulated Learning Perspective: Theoretical Model and Measurement
Dagmar Strohmeier, Petra Gradinger, and Petra Wagner
Faculty of Medical Engineering and Applied Social Sciences, University of Applied Sciences Upper Austria, Linz, Austria

Abstract: Intercultural competence is defined as a lifelong learning task that can be developed in any intergroup situation. A self-regulated learning model is applied to better understand the intercultural learning process that is initiated during the forethought phase, monitored during the performance phase, and evaluated during the self-reflection phase. In each phase, particular psychological constructs are important to initiate, monitor, and evaluate the learning process. The empirical goals of the present study were (1) to develop a self-report questionnaire capturing the three learning phases, (2) to test the theoretical structure of the proposed intercultural learning process, and (3) to examine two theoretically meaningful learning cycles. Data were collected from 188 women and 48 men aged 18–47 years (M = 26.41, SD = 6.19). Structural equation models (SEMs) demonstrated that intercultural learning goals, intercultural self-efficacy, and intercultural intrinsic interest form the latent factor forethought phase. In line with composite models of intercultural competence, the intercultural learning goals had a three-factor structure (knowledge domain, attitude domain, communication domain). Self-monitoring, self-recording, and selfexperimentation form the latent factor performance phase. Mediation analyses provided initial evidence of the existence of two distinct learning cycles: (1) The forethought phase precedes the performance phase which precedes both self-evaluation and success attribution on intercultural competence (constructs of the self-reflection phase). (2) The performance phase precedes optimizing future learning (construct of the self-reflection phase) which precedes the forethought phase indicating the emergence of a future learning action. The theoretical and practical value of the newly developed self-assessment of intercultural competence is discussed. Keywords: self-regulated learning, intercultural competence, structural equation models, intercultural social work, university students, assessment

To foster intercultural competence among university students is an important goal of many degree programs offered at universities worldwide. Assuming that the development of intercultural competence is a lifelong self-regulated learning task, it is important to better understand and to foster the learning process of students in intercultural situations. To model the learning process in intercultural situations, state-of-the-art scholarly knowledge on intercultural competence development was innovatively combined with theories on self-regulated learning. Based on our new theoretical model, the empirical goals of the present study are (1) to develop a self-report questionnaire capturing the three learning phases, (2) to test the theoretical structure of the proposed intercultural learning process, and (3) to examine two theoretically meaningful learning cycles. Ó 2017 Hogrefe Publishing

Intercultural Competence Development
Acknowledging that social groups of any size can have their own distinct cultures and that every individual belongs simultaneously to many different social groups, Barrett (2013) suggests applying an intergroup perspective to better understand the concept of intercultural competence. According to the intergroup perspective, the context or the situation defines the importance of particular cultural affiliations. When cultural signs are salient and prompt individuals to shift their frame of reference, a situation changes from interpersonal to intercultural. Thus, every interpersonal situation is potentially an intercultural situation when salient group memberships lead people to respond to each



other as cultural group members and not on the basis of their individual characteristics. Therefore, intercultural competence becomes relevant in all intergroup situations in order to deal with the tasks, difficulties, or challenges they present. To better understand the complex competencies needed in an intercultural situation, long lists of personal characteristics comprising cognitions, affects, and behaviors have been suggested, and numerous compositional models have been developed in the literature (Spitzberg & Changnon, 2009). Although these models offer a useful structure to identify the multitude of personal characteristics potentially necessary in an intercultural situation, many shortcomings of such models have also been identified. To begin with, one critical point is the rather static view of such models on “trait-like” personal characteristics, which often neglects the dynamic, constructivist, and developmental perspective on intercultural competence (Hammer, 2015). Instead of understanding competencies as rather stable personal characteristics, Koester and Lustig (2015) argued that competence versus incompetence are social judgments; thus, people perceive themselves, or are perceived by others, as competent or incompetent in particular social situations (see also Spitzberg & Cupach, 2011). Moreover, the lack of empirical validation of the interconnections and causal pathways between the sets of cognitions, affects, and behaviors suggested in the compositional models is considered problematic (e.g., Barrett, 2013; Spitzberg & Changnon, 2009; van de Vijver & Leung, 2009). Although a large number of measures is available (Fantini, 2009), a review of available tests to assess intercultural competencies (Matsumoto & Hwang, 2013) identified only three instruments that provided satisfactory evidence regarding their underlying theoretical factors. Thus, there is still a need to develop theoretically sound measurements of intercultural competence. To develop a new measurement, we argue that it is important to shift the paradigm to intercultural learning competence development instead of conceptualizing intercultural competence as a trait-like personal characteristic. Such a paradigm shift offers several theoretical advantages. Most importantly, a focus on intercultural learning competence development implies that intercultural competence development is a lifelong learning process (Deardorff, 2015). Consequently, any intercultural situation potentially offers new learning opportunities. Such a theoretical perspective also acknowledges that intercultural learning is a cyclical process, because learning actions follow each other as they are initiated, monitored, and evaluated (see Holmes & O’Neill, 2012, for a similar approach).



Self-Regulated Learning in Intercultural Situations
To be able to model an intercultural learning process, we used the basic ideas of theories on self-regulated learning (Pintrich, 2003; Zimmerman, 2000). Self-regulation is not viewed as a stable characteristic or trait that is constant across all situations; rather, it is assumed that an individual is not equally motivated in all situations. Nevertheless, there is a general consensus that central psychological determinants exist that can have a positive influence on the successful execution of learning actions (Pintrich, 2003). In the model of self-regulated learning developed by Zimmerman (2000), the learning action is divided into three phases that are characterized by particular motivational constructs (see Figure 1). In the forethought phase, a learning action is initiated and planned. According to our theorizing, learning is initiated in an intercultural situation when learners set themselves the goal to improve their intercultural competence in several domains. Borrowing ideas from the composite models of intercultural competence (e.g., Bolten, 2006; Erll & Gymnich, 2010; Ting-Toomey & Kurogi, 1998), learners might want to improve their knowledge, attitudes, or communication skills in an intercultural situation. For instance, they might wish to improve their knowledge regarding a particular country, their cultural perspective-taking, or their ability to communicate. Thus, we theorize that, from the perspective of the self-regulated learner, the competencies defined in the composite models of intercultural competence reflect potential learning goals in particular intercultural situations. Based on the expectancy-value theory of motivation (Wigfield & Eccles, 2000), learners start to learn if they believe that they can be successful in performing a task (= expectancy component) and if they consider the task to be important (= value component). Thus, learners will initiate a learning process in intercultural

Figure 1. Intercultural learning process.




situations if their self-efficacy and intrinsic interest regarding their intercultural competence are high. In the performance phase, the execution of the learning action takes place. It is assumed that learning only remains attractive for learners when they also know how to learn successfully during an intercultural situation (see, e.g., Weinstein & Hume, 1998). During an intercultural situation, learners need to observe their learning process via self-monitoring, self-recording, or self-experimentation. For instance, during intercultural situations learners might reflect on the appropriateness of their actions (self-monitoring), they might take notes (self-recording), or they might use different strategies in order to reach their goals (self-experimentation). In the self-reflection phase, a functional assessment of the learning action takes place, so that continued learning remains attractive in the future. Only learners who attribute success to their own performance and see failure as something that can be coped with will be able to maintain their appreciation for and expectation of success in the context of learning (see, e.g., Weiner, 2005). Thus, it is assumed that learners will evaluate their actions after an intercultural learning situation and that this evaluation is related to both their past and their future learning in the intercultural situation.

The Present Study
The present study aims to understand, from the perspective of a self-regulated learner, the learning process through which university students acquire intercultural competence. Intercultural competence is defined as a lifelong learning task that can be developed in any intergroup situation. Importantly, intercultural competence is also understood as acting appropriately and effectively in intercultural situations. To build a sound theoretical model of intercultural learning competence development, self-regulated learning models (e.g., Zimmerman, 2000) were combined with composite models of intercultural competence (Bolten, 2006; Erll & Gymnich, 2010; Ting-Toomey & Kurogi, 1998). Based on our theorizing (see Figure 1), a new self-report questionnaire to measure the intercultural learning process was developed, and the following hypotheses were formulated. Based on the self-regulated learning models (Zimmerman, 2000), we assumed that the learning process can be empirically described in three phases: Intercultural learning is initiated during the forethought phase, monitored during the performance phase, and evaluated during the self-reflection phase. We assumed that the various possible intercultural learning goals of a self-regulated learner can be best described


with a three-factor structure – knowledge domain, attitude domain, communication domain – reflecting the theorizing of the composite models of intercultural competence (Bolten, 2006; Erll & Gymnich, 2010; Ting-Toomey & Kurogi, 1998). We hypothesized that these three intercultural learning goals, together with intercultural self-efficacy and intercultural intrinsic interest, would form the forethought phase. We hypothesized that self-monitoring, self-recording, and self-experimentation would form the performance phase, because we expected that self-regulated learners would observe their learning process during an intercultural interaction. We hypothesized that self-evaluation, success attribution, and optimizing future learning, which are important motivational constructs during the self-reflection phase, are differentially related to the learning cycles. While self-evaluation and success attribution are hypothesized to be carried out at the end of a learning action, optimizing future learning is assumed to trigger a new learning action. Thus, based on self-regulated learning theory (Zimmerman, 2000; see also Figure 1), we were able to formulate two distinct learning cycles for empirical testing. Cycle 1: We hypothesized that motivational beliefs during the forethought phase are positively related to both self-evaluation and success attribution in the self-reflection phase; however, we assumed that this association is fully mediated by actions taken during the performance phase (see Figures 2 and 3). Thus, we assumed that the higher the motivational beliefs of learners during the forethought

Figure 2. Mediation model of the first intercultural learning cycle (ILG-K = intercultural learning goals knowledge domain; ILG-A = intercultural learning goals attitude domain; ILG-C = intercultural learning goals communication domain; I-SE = intercultural self-efficacy; I-IIN = intercultural intrinsic interest; S-MO = self-monitoring; S-REC = self-recording; S-EXP = self-experimentation; IT1 = After I acted in intercultural situations I reflect if my strategies were meaningful; IT2 = After I acted in intercultural situations I reflect if my planned procedure is consistent with my actual procedure; IT3 = After I acted in intercultural situations I reflect if I reached my goals; IT4 = After I acted in intercultural situations I reflect if I proceeded appropriately).





Method

Figure 3. Mediation model of the first intercultural learning cycle (ILG-K = intercultural learning goals knowledge domain; ILG-A = intercultural learning goals attitude domain; ILG-C = intercultural learning goals communication domain; I-SE = intercultural selfefficacy; I-IIN = intercultural intrinsic interest; S-MO = self-monitoring; S-REC = self-recording; S-EXP = self-experimentation).

Figure 4. Mediation model of the second intercultural learning cycle (S-MO = self-monitoring; S-REC = self-recording; S-EXP = selfexperimentation; IT1 = If an intercultural interaction was negative I take the opportunity to reflect how I can change my actions next time; IT2 = If an intercultural interaction was negative I can learn a lot what I can change next time; IT3 = If an intercultural interaction was negative I try to understand my mistakes to better know what I can change next time; I-IIN = intercultural intrinsic interest; I-SE = intercultural selfefficacy; ILG-C = intercultural learning goals communication domain; ILG-A = intercultural learning goals attitude domain; ILGK = intercultural learning goals knowledge domain).

phase, the more self-observing actions would be taken during the performance phase, which in turn would be positively related to both self-evaluation and success attribution at the end of a learning action. Cycle 2: We hypothesized that actions taken during the performance phase are positively related to motivational beliefs in the forethought phase; however, we assumed that this association is fully mediated by optimizing future learning in the self-reflection phase (see Figure 4). Thus, we assumed that learners who show a high level of self-observation during the performance phase would also have a high ability to optimize their future learning, which in turn would be positively related to motivational beliefs in the forethought phase, indicating the emergence of a future learning action.
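Expressed in generic path-model notation (the path labels a, b, and c′ are illustrative and not estimates from this article), the full-mediation hypothesis of Cycle 1 can be sketched as follows, with F denoting forethought-phase beliefs, P performance-phase actions, and R the self-reflection outcomes:

```latex
% mediation paths: a (F -> P), b (P -> R), c' (direct effect F -> R)
P = a\,F + e_P, \qquad R = c'\,F + b\,P + e_R
% indirect effect = a \cdot b; full mediation corresponds to c' \approx 0
```

Cycle 2 follows the same structure, with performance-phase actions as the predictor, optimizing future learning as the mediator, and forethought-phase beliefs as the outcome.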

Participants
At the beginning of the academic year in 2012, all students (N = 262) enrolled in either the bachelor or the master program of Social Work at the University of Applied Sciences Upper Austria were invited to participate in the study. The invitation email was sent out by the administrative personnel. The students were informed that the study was about intercultural competence development and that their participation was voluntary and anonymous. Data collection took place during regular lessons at the University of Applied Sciences in the following week. A reminder email containing a link to the online survey was sent to all students who were absent on the day of data collection. The participation rate was very high (90%), and the final sample comprised 236 students; 169 students were enrolled in the bachelor program and 67 students in the master program. There were 188 women and 48 men aged 18–47 years (M = 26.41, SD = 6.19). The vast majority (> 90%) of students were monolingual (N = 220), born in Austria (N = 216), and held Austrian citizenship (N = 225).
Measures
A self-report questionnaire was developed to measure the psychological constructs relevant in the three learning phases. Several items were newly developed or, if available, adapted from existing instruments. The whole questionnaire including all items can be obtained from the authors upon request. A definition of intercultural competence was provided before participants were asked to answer the items: “Below we ask you some questions regarding your intercultural competence. Please think how you perceive yourself right now. The term “intercultural competence” can have different meanings for different people. In order to make it easier for you to answer the questions below, we provide you with a definition. Intercultural competence means that one’s own knowledge, feelings and actions are focused to act appropriately in intercultural situations.” We omitted the term “effectively” in this definition because effectiveness is ambivalent in social work: Social workers usually intervene in rather challenging situations, in which acting appropriately, rather than acting effectively, is considered more important. The answer options for all items were given on a 6-point scale:




Table 1. Intercultural learning goals: Summary of the exploratory factor analyses M (SD)

Geomin rotated loadings

N = 236

Knowledge

. . .Acquire a broad cultural-and country-specific knowledge

3.92 (1.01)

0.423*

To improve my intercultural competence, I set myself the goal to ...

Attitudes

. . .Engage as intensively as possible with one particular culture

3.23 (1.12)

0.643*

. . .Learn a new language

3.52 (1.32)

0.242*

. . .Acquire knowledge about cultural differences

4.11 (0.87)

0.605*

. . .Acquire knowledge about cultural similarities

3.92 (0.89)

0.819*

. . . Become more aware of my attitudes toward my own culture

3.98 (0.94)

0.627*

. . .Become more aware of my attitudes toward other cultures

4.19 (0.83)

0.365*

. . .Improve my ability to better understand others’ feelings

4.19 (0.91)

0.750*

. . .Increase my open-mindedness

4.09 (0.98)

0.787*

0.259*

. . .Deal better with inconsistencies

4.07 (0.84)

0.549*

. . .Respect other people even more

4.00 (1.04)

0.919*

. . .Not give in immediately in difficult intercultural situations

4.08 (0.88)

0.522*

. . .Discuss the way I talk when difficulties in communication occur

3.63 (1.15)

. . .Address unpleasant topics indirectly, if it is appropriate in the situation

2.94 (1.22)

. . .Address unpleasant topics directly, if it is appropriate in the situation

3.90 (1.00)

. . .Improve my conflict resolution strategies

4.20 (0.81)

Communication

0.436* 0.186* 0.327* 0.430*

0.648*

Note. *p < .01.

0 (= “I fully disagree”), 1 (= “I disagree”), 2 (= “I rather disagree”), 3 (= “I rather agree”), 4 (= “I agree”), and 5 (= “I fully agree”).
Forethought Phase
This phase comprises the motivational beliefs necessary to initiate and plan an intercultural interaction.
Intercultural Learning Goals
We hypothesized that learners set priorities in three different domains before acting in intercultural situations (Bolten, 2006; Erll & Gymnich, 2010; Ting-Toomey & Kurogi, 1998):
– Knowledge domain. Based on the composite models of intercultural competence, seven new items were developed (see Table 1), for example, “To improve my intercultural competence, I set myself the goal to acquire a broad cultural and country-specific knowledge.”
– Attitude domain. Based on the composite models of intercultural competence, five new items were developed (see Table 1), for example, “To improve my intercultural competence, I set myself the goal to improve my ability to better understand others’ feelings.”
– Communication domain. Based on the composite models of intercultural competence, four new items were developed (see Table 1), for example, “To improve my intercultural competence, I set myself the goal to improve my conflict resolution strategies.”
Intercultural self-efficacy describes the motivational belief in one’s ability to reach intercultural goals, to continue an

Learners who have high levels of intercultural self-efficacy believe in their future successes and keep on trying even if they experience setbacks in intercultural situations. By modifying existing items (Jerusalem & Satow, 1999), four items were newly developed, for example, “I can act interculturally competent even in difficult situations, if I make an effort.”

Intercultural intrinsic interest is characterized by an unconditional and essential interest in the intercultural topic. Learners who have high levels of intercultural intrinsic interest think that developing their intercultural competence is an end in itself. By modifying existing items (Schiefele, Krapp, Wild, & Winteler, 1993; Schmitz, Perels, Bruder, & Otto, 2003), four items were newly developed, for example, “My major goal is to improve my intercultural competence.”

Performance Phase
The performance phase comprises the competences necessary during an intercultural situation. Because our sample comprises social work students, professional intercultural situations are most likely carried out in a counseling setting.

Self-monitoring is the competence to check whether one’s own actions are goal oriented or in accordance with the previous plan. Learners scoring high on self-monitoring are able to monitor their actions, to compare them with their plan, and to check whether they are meaningful. By modifying existing items (Wosnitza, 2000), four items were newly developed, for example, “During an intercultural interaction I regularly check if my previous actions are useful.”



Self-recording is the competence to write down or to record what is happening during intercultural situations. Learners scoring high on self-recording are able to produce written protocols and take minutes in order to make notes about what has been happening. Four items were newly developed, for example, “During an intercultural interaction I make notes how I proceed.”

Self-experimentation is the competence to vary strategies in order to find out which one is most effective. Learners scoring high on self-experimentation intentionally use different strategies to be able to identify which one works best in the particular intercultural situation. Four items were newly developed, for example, “During an intercultural interaction I try out different strategies to find out which one fits best.”

Self-Reflection Phase
The self-reflection phase comprises competencies necessary after an intercultural situation. Because of the cyclical nature of the intercultural learning process, the results of the self-reflection phase also influence whether the learner will reengage in the next intercultural learning action.

Self-evaluation is the competence to judge whether the chosen procedures were meaningful and to reflect on whether the previously formulated goals were reached. Four items were newly developed, for example, “After I acted in intercultural situations I reflect if I reached my goals.”

Success attributed to intercultural competence is the causal attribution of a successful intercultural interaction to one’s own intercultural competence. The following item was newly developed: “If an intercultural interaction was positive, it was because of my intercultural competence.”

Optimizing future learning is the competence to reflect on one’s own actions for future learning in case of failure. Learners scoring high on optimizing future learning do not repeat the same ineffective or unsuccessful action over and over again, but try out other behavioral alternatives if the intercultural interaction was negative. By modifying existing items (Schmitz et al., 2003), three items were newly developed, for example, “If an intercultural interaction was negative, I try to understand my mistakes to better know what I can change next time.”

Data Analytical Strategy
All data analyses were carried out using Mplus 7. We computed (1) measurement models (both EFAs and CFAs) to evaluate the construct validity of all constructs, (2) structural models on the factorial structure of the forethought phase and the performance phase, and (3) mediation models to test the two proposed learning cycles. To evaluate model fit, three criteria were used: the chi-square test, the Comparative Fit Index (CFI; Bentler, 1990), and the root mean squared error of approximation (RMSEA; Steiger, 1990).


Nonsignificant chi-square values indicate good model fit. However, because the chi-square test is known to be sensitive to sample size, the CFI and RMSEA indices were also examined. CFI ranges from 0 to 1.00, with values above 0.95 indicating good fit and values above 0.90 indicating adequate fit. RMSEA ranges from 0 to 1, with values below 0.05 indicating good fit and values below 0.08 indicating adequate fit. Maximum likelihood estimation with the MLR estimator of Mplus was used, which provides standard errors and test statistics that are robust to non-normality of the data and to nonindependence of observations.
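For readers who want to trace how these two descriptive indices relate to the reported chi-square values, the sketch below applies the standard definitions from Bentler (1990) and Steiger (1990); it is not part of the original analyses, which were run in Mplus, and the baseline-model chi-square used in the example is a hypothetical value, since the article reports only target-model statistics.

```python
import math

def cfi(chi2_model: float, df_model: float, chi2_base: float, df_base: float) -> float:
    """Comparative Fit Index (Bentler, 1990): noncentrality of the target model
    relative to the noncentrality of the baseline (independence) model."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    return 1.0 if den == 0.0 else 1.0 - num / den

def rmsea(chi2_model: float, df_model: float, n: int) -> float:
    """Root mean squared error of approximation (Steiger, 1990)."""
    if df_model == 0:
        return 0.0  # just-identified models fit perfectly by definition
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))

# Example: the three-factor EFA solution reported below (chi2 = 150.39, df = 75, N = 236).
print(round(rmsea(150.39, 75, 236), 3))       # ~0.065, matching the reported RMSEA
print(round(cfi(150.39, 75, 900.0, 105), 2))  # illustrative only: baseline chi-square assumed
```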

Results

Measurement Models

Intercultural Learning Goals
To establish the factor structure of the intercultural learning goals, exploratory factor analysis (EFA) was applied. Three EFAs extracting one to three factors with robust maximum likelihood estimation (MLR) were conducted. In line with the composite models of intercultural competence, the three-factor solution, χ²(75) = 150.39, p < .01, CFI = 0.91, RMSEA = 0.065, described the data much better than the one-factor solution, χ²(104) = 319.99, p < .01, CFI = 0.74, RMSEA = 0.082, or the two-factor solution, χ²(89) = 189.58, p < .01, CFI = 0.88, RMSEA = 0.069. As shown in Table 1, the intercultural learning goals can be structured according to three meaningful underlying dimensions.

Confirmatory Factor Analyses
To establish the factor structure of all constructs, confirmatory factor analyses (CFAs) were applied separately for each construct. As shown in Table 2, all constructs showed satisfying construct validity. Therefore, no item was deleted from the subsequent analyses.

Structural Models
To construct the structural models, parcels (= scale means) were used. The means, standard deviations, and bivariate correlations of all manifest variables are presented in Table 3. Parcels are preferred for the subsequent analyses because, compared with items, parcels have superior psychometric quality that reduces both Type I and Type II sources of error but does not bias or otherwise inflate construct relations (for details, see Little, 1997).

Forethought Phase
Five constructs (the three intercultural learning goal domains, intercultural self-efficacy, and intercultural intrinsic interest) were specified to form the latent factor forethought phase. The model showed an excellent fit, χ²(5) = 8.41, p = .013, CFI = 0.978, RMSEA = 0.054.
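As a rough illustration of the factor-number comparison reported under Intercultural Learning Goals above, the sketch below fits one-, two-, and three-factor EFA models with the Python factor_analyzer package; it substitutes an oblimin rotation for the Geomin rotation used in Mplus, and the data file and column layout are assumptions, not part of the published study.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical CSV with the 16 learning-goal items, scored 0-5, one column per item.
items = pd.read_csv("learning_goal_items.csv")

for n_factors in (1, 2, 3):
    # Maximum likelihood extraction; an oblique (oblimin) rotation stands in
    # for the Geomin rotation reported in the article (rotation needs >= 2 factors).
    rotation = "oblimin" if n_factors > 1 else None
    efa = FactorAnalyzer(n_factors=n_factors, method="ml", rotation=rotation)
    efa.fit(items)
    print(f"{n_factors}-factor solution, loading matrix:")
    print(efa.loadings_.round(2))
```

The formal χ²/CFI/RMSEA comparison reported above would still need to come from an SEM program such as Mplus; this package only returns the loading pattern.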



Table 2. Summary of the confirmatory factor analyses

Constructs | Number of items | Cronbach α | χ² | df | p | CFI | RMSEA

Forethought phase
Intercultural learning goals – knowledge | 7 | .72 | 24.38 | 11 | .01 | 0.957 | 0.072
Intercultural learning goals – attitudes | 5 | .83 | 6.10 | 4 | .19 | 0.995 | 0.047
Intercultural learning goals – communication | 4 | .46 | 4.86 | 2 | .09 | 0.940 | 0.078
Intercultural self-efficacy | 4 | .81 | 2.33 | 1 | .12 | 0.996 | 0.075
Intercultural intrinsic interest | 4 | .84 | 7.22 | 1 | < .01 | 0.984 | 0.162

Performance phase
Self-monitoring | 4 | .78 | 8.43 | 2 | .01 | 0.974 | 0.117
Self-recording | 4 | .88 | 0.62 | 2 | .73 | 1.000 | 0.000
Self-experimentation | 3 | .72 | 0.00 | 0 | < .01 | 1.000 | 0.000

Self-reflection phase
Self-evaluation | 4 | .86 | 7.05 | 2 | .02 | 0.988 | 0.104
Optimizing future learning | 4 | .68 | 0.00 | 0 | < .01 | 1.000 | 0.000

Note. CFI = comparative fit index, RMSEA = root mean squared error of approximation.
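The internal consistencies in Table 2 are Cronbach's alpha coefficients. As a point of reference, a minimal sketch of the standard formula is given below; it is illustrative only, and the simulated score matrix is an assumption rather than the study data.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a persons-by-items matrix of item scores."""
    k = scores.shape[1]                                # number of items
    item_variances = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_variance = scores.sum(axis=1).var(ddof=1)    # variance of the sum score
    return k / (k - 1) * (1 - item_variances / total_variance)

# Simulated 4-item scale answered by 236 respondents (hypothetical data).
rng = np.random.default_rng(0)
true_score = rng.normal(size=(236, 1))
items = true_score + rng.normal(size=(236, 4))
print(round(cronbach_alpha(items), 2))
```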

Table 3. Means, standard deviations, and bivariate correlations between the study variables organized along the learning process (N = 236)

Constructs | M (SD) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10

Forethought phase
1. Intercultural learning goals – knowledge | 3.84 (0.61)
2. Intercultural learning goals – attitudes | 4.09 (0.71) | .48
3. Intercultural learning goals – communication | 3.67 (0.65) | .38 | .42
4. Intercultural self-efficacy | 3.45 (0.70) | .24 | .12ns | .20
5. Intercultural intrinsic interest | 3.89 (0.88) | .47 | .44 | .30 | .24

Performance phase
6. Self-monitoring | 3.12 (0.88) | .14 | .20 | .15 | .06ns | .15
7. Self-recording | 2.62 (1.23) | .11 | .13 | .09 | .02ns | .04ns | .41
8. Self-experimentation | 3.49 (0.82) | .26 | .16 | .28 | .18 | .20 | .31 | .35

Self-reflection phase
9. Self-evaluation | 4.04 (0.73) | .19 | .30 | .19 | .04ns | .24 | .46 | .44 | .45
10. Success attribution on intercultural competence | 3.39 (0.85) | .07 | .21 | .16 | .16 | .23 | .33 | .16 | .21 | .37
11. Optimizing for future learning | 4.29 (0.63) | .28 | .39 | .25 | .11ns | .33 | .40 | .41 | .33 | .52 | .33

Note. All items range between 0 and 5. All bivariate correlations were statistically significant at the p < .05 level, except the ones marked with ns.
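A correlation matrix of this kind can be reproduced from the scale means with a few lines of code; the sketch below flags coefficients that miss the p < .05 criterion with the same "ns" marker used in Table 3. The data file and column names are hypothetical.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical data frame with one column per construct scale mean (236 rows).
scores = pd.read_csv("scale_means.csv")

cols = list(scores.columns)
table = pd.DataFrame("", index=cols, columns=cols)
for i, a in enumerate(cols):
    for j, b in enumerate(cols):
        if j < i:  # lower triangle only, as in Table 3
            r, p = pearsonr(scores[a], scores[b])
            table.loc[a, b] = f"{r:.2f}" + ("" if p < .05 else "ns")
print(table)
```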

Performance Phase
Three constructs (self-monitoring, self-recording, and self-experimentation) were specified to form the latent factor performance phase. The model was just identified, χ²(0) = 0.00, p = .00, CFI = 1.00, RMSEA = 0.00.

Testing the Two Hypothesized Learning Cycles
According to Baron and Kenny (1986), mediation was analyzed in three steps, using structural equation modeling (SEM). The first step is to show that the antecedent variable predicts the outcome. The second step is to show that the mediator predicts the outcome. For the third step, we tested whether the effect of the antecedent variable on the outcome was either strongly reduced (partial mediation) or no longer present (full mediation) when the mediator was included. Because the data are cross-sectional, this is a mediation analysis in a statistical sense only, which does not allow inferring causality.
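A minimal sketch of these three steps on observed parcel scores is given below, using ordinary least squares regressions from statsmodels rather than the latent-variable models the article estimated in Mplus; the file and column names are assumptions made for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical parcel-level data set with one column per phase score.
df = pd.read_csv("parcels.csv")  # columns: forethought, performance, self_evaluation

# Step 1: the antecedent (forethought) predicts the outcome (self-evaluation).
step1 = smf.ols("self_evaluation ~ forethought", data=df).fit()

# Step 2: the mediator (performance) predicts the outcome.
step2 = smf.ols("self_evaluation ~ performance", data=df).fit()

# Step 3: outcome regressed on both; full mediation is indicated if the direct
# forethought effect is no longer significant while the performance effect remains.
step3 = smf.ols("self_evaluation ~ forethought + performance", data=df).fit()

print(step1.params, step2.params, step3.params, sep="\n")
```

Current practice would additionally bootstrap the indirect effect rather than rely on the causal-steps logic alone.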


– Learning Cycle 1 (Figure 2) – Prediction of Self-Evaluation
Step 1: We tested whether the forethought phase predicted self-evaluation, χ²(26) = 42.14, p = .0237, CFI = 0.971, RMSEA = 0.051; the forethought phase positively predicted self-evaluation (β = .36, p < .01).
Step 2: We tested whether the performance phase predicted self-evaluation, χ²(13) = 23.42, p = .0369, CFI = 0.978, RMSEA = 0.050; the performance phase predicted self-evaluation (β = .79, p < .01).



Step 3: When the mediator was introduced into the model, the forethought phase predicted the performance phase (β = .41, p < .01) and the performance phase predicted self-evaluation (β = .77, p < .01). However, the forethought phase no longer predicted self-evaluation (β = .04, p = .62), indicating full mediation, χ²(51) = 82.93, p = .0031, CFI = 0.957, RMSEA = 0.052.

– Learning Cycle 1 (Figure 3) – Prediction of Success Attribution on Intercultural Competence
Step 1: We tested whether the forethought phase predicted the success attribution, χ²(26) = 42.14, p = .0237, CFI = 0.971, RMSEA = 0.051; the forethought phase positively predicted the success attribution (β = .26, p < .01).
Step 2: We tested whether the performance phase predicted the success attribution, χ²(13) = 23.42, p = .0369, CFI = 0.978, RMSEA = 0.050; the performance phase positively predicted the success attribution (β = .40, p < .01).
Step 3: When the mediator was introduced into the model, the forethought phase predicted the performance phase (β = .40, p < .01) and the performance phase predicted the success attribution (β = .36, p < .01). However, the forethought phase no longer predicted the success attribution (β = .12, p = .24), indicating full mediation, χ²(25) = 39.48, p = .0330, CFI = 0.946, RMSEA = 0.050.

– Learning Cycle 2 (Figure 4) – Prediction of Forethought Phase
Step 1: We tested whether the performance phase predicted the forethought phase, χ²(19) = 28.60, p = .0725, CFI = 0.962, RMSEA = 0.046; the performance phase positively predicted the forethought phase (β = .40, p < .01).
Step 2: We tested whether optimizing future learning predicted the forethought phase, χ²(19) = 20.03, p = .393, CFI = 0.998, RMSEA = 0.015; optimizing future learning positively predicted the forethought phase (β = .53, p < .01).
Step 3: When the mediator was introduced into the model, the performance phase predicted optimizing future learning (β = .65, p < .01) and optimizing future learning predicted the forethought phase (β = .48, p < .01). However, the performance phase no longer predicted the forethought phase (β = .07, p = .65), indicating full mediation, χ²(41) = 53.14, p = .09, CFI = 0.979, RMSEA = 0.035.
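The article reports the individual paths but not the indirect effects themselves; under the usual product-of-coefficients definition they can be derived from the standardized paths above, as in this small sketch (derived values, not figures reported in the original article).

```python
# Product-of-coefficients estimates of the standardized indirect effects,
# computed from the path coefficients of the three full-mediation models above.
indirect_effects = {
    "forethought -> performance -> self-evaluation": 0.41 * 0.77,
    "forethought -> performance -> success attribution": 0.40 * 0.36,
    "performance -> optimizing future learning -> forethought": 0.65 * 0.48,
}
for path, value in indirect_effects.items():
    print(f"{path}: {value:.2f}")
```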


Discussion
Due to globalization and increased mobility, intercultural competence is considered a key competence for many professions. Assuming that every interpersonal situation is potentially an intercultural situation when salient group memberships lead people to respond to each other as cultural group members rather than on the basis of their individual characteristics (Barrett, 2013), intercultural competence becomes relevant in any intergroup situation. Therefore, better understanding and fostering the learning processes of students in intercultural situations is important for every university degree program. Although many theoretical models of intercultural competence have been developed (for a summary, see Spitzberg & Changnon, 2009), we are not aware of a model that conceptualizes intercultural competence development from the perspective of self-regulated learning. Applying the main ideas of self-regulated learning models (Zimmerman, 2000), a cyclical intercultural learning process was proposed. Based on cross-sectional data, the theoretical structure of the proposed process was rigorously tested and two distinct learning cycles were examined.

Based on self-regulated learning theory (Zimmerman, 2000), we hypothesized that the intercultural learning process can be described in three distinct phases and operates in a cyclical fashion: the forethought phase precedes the performance phase, which precedes the self-reflection phase, which in turn precedes the next forethought phase (see Figure 1). We hypothesized that a learning process is initiated in the forethought phase when learners set themselves intercultural learning goals and when their intercultural self-efficacy and intercultural intrinsic interest are high. We also hypothesized that learners observe a learning action via self-monitoring, self-recording, or self-experimentation. Structural equation models confirmed the theoretical structure of the forethought and the performance phase. Testing the cyclical nature of the intercultural learning process, we found evidence for two distinct learning cycles. To begin with, we assumed that the higher the motivational beliefs of a learner during the forethought phase, the more self-observing actions would be taken during the performance phase, which in turn would be positively related to both self-evaluation and success attribution at the end of a learning action. Second, we hypothesized that a learner who shows a high level of self-observation during the performance phase would also show a high ability to optimize future learning, which in turn would be positively related to motivational beliefs in the forethought phase, indicating the emergence of a future learning action.




Thus, the present study provides initial evidence for the validity of the newly developed self-regulated learning approach and for the existence of two theoretically meaningful learning cycles.

Conceptual Similarities and Differences of Approaches
The self-regulated learning approach shows several similarities with the developmental paradigm of intercultural competence (Bennett, 1986a, 1986b; Hammer, 2011, 2015). To begin with, the proposed intercultural learning process describes a dynamic and cyclical interaction during a particular intercultural situation and does not focus on rather static, trait-like personality characteristics. In line with the developmental paradigm, the self-regulated learning approach assumes that intercultural competence evolves over time as a result of cyclical intercultural learning processes. Contrary to the developmental paradigm, however, the self-regulated learning approach is also compatible with the cognitive/affective/behavioral (CAB) paradigm. This is because the dimensions developed in composite models of intercultural competence (for more details, see Spitzberg & Changnon, 2009, pp. 10–15) are integrated into the intercultural learning process as possible learning goals in intercultural situations. Our analyses revealed that the many possible learning goals of self-regulated learners can be meaningfully organized into three dimensions, as suggested by the CAB paradigm. We labeled the three intercultural learning goals found in the present study “knowledge,” “attitudes,” and “communication.”

The self-regulated learning approach is also compatible with the process models of intercultural competence (Deardorff, 2006), as it specifies a cyclical learning process in three phases. The intercultural learning goals set by self-regulated learners have many similarities with the attitudes, knowledge, and skills dimensions described in the Deardorff process model (Deardorff, 2006). However, instead of defining desired internal or external outcomes in terms of an informed frame-of-reference shift or effective and appropriate communication in an intercultural situation, the self-regulated learning approach assumes that the intercultural learning process is successful when the self-reflection phase is again followed by a forethought phase. The self-regulated learning perspective on intercultural competence thus puts the ability for lifelong learning at the heart of intercultural competence development: instead of evaluating the outcomes of intercultural learning cycles in terms of effectiveness or appropriateness, the goal of intercultural learning is that a self-regulated learner keeps on learning in intercultural situations. To summarize, the self-regulated learning approach innovatively aligns intercultural competence development with the framework of lifelong learning, which has been defined as an important future research agenda (Deardorff, 2015).


Limitations and Future Research
The results of this study should be evaluated in the context of some limitations. To gather initial evidence on the intercultural learning process, a newly developed self-assessment questionnaire was applied. Although all items were constructed relying on validated instruments and the structural validity of the new measures was rigorously tested, it should be kept in mind that a self-assessment questionnaire is only one way to investigate intercultural learning processes. Future studies could also collect data via learning diaries completed before, during, and after a series of specific real-life intercultural situations. Alternatively, experimentally manipulated hypothetical vignettes could be applied. Both methods could be used to investigate the validity of the intercultural learning process across different intercultural situations and to examine the cyclical nature of self-regulated learning longitudinally. Moreover, additional items should be developed for the subscale intercultural learning goals – communication, as its reliability was rather low and the current scale comprises four rather heterogeneous items.

To summarize, the present study demonstrates that a self-regulated learning model developed in the US can be successfully adapted to describe intercultural learning processes of social work students in Austria. Although there is no reason to assume that the self-regulated learning paradigm is not applicable to other subcultures and countries, replication studies in Austria with different target groups as well as cross-national studies are recommended to rule out a possible ethnocentric bias inherent in the model.

Acknowledgments
The present study was funded by the Platform for Intercultural Competences, University of Applied Sciences Upper Austria, between 2012 and 2015 (PI: Dagmar Strohmeier).

References
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182. doi: 10.1037/0022-3514.51.6.1173
Barrett, M. (2013). Intercultural competence: A distinctive hallmark of interculturalism? In M. Barrett (Ed.), Interculturalism and multiculturalism: Similarities and differences (pp. 147–168). Strasbourg, France: Council of Europe.
Bennett, M. J. (1986a). Towards ethnorelativism: A developmental model of intercultural sensitivity. In R. M. Paige (Ed.), Cross-cultural orientation: New conceptualizations and applications (pp. 27–70). New York, NY: University Press of America.


Bennett, M. J. (1986b). A developmental approach to training for intercultural sensitivity. International Journal of Intercultural Relations, 10, 179–196. doi: 10.1016/0147-1767(86)90005-2
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
Bolten, J. (2006). Interkultureller Trainingsbedarf aus der Perspektive der Problemerfahrungen entsandter Führungskräfte [Need for intercultural training from the perspective of managers abroad with problem experiences]. In K. Götz (Ed.), Interkulturelles Lernen, interkulturelles Training (pp. 57–75). Munich, Germany: Hampp.
Deardorff, D. K. (2006). Identification and assessment of intercultural competence as a student outcome of internationalization. Journal of Studies in International Education, 10, 241–266. doi: 10.1177/1028315306287002
Deardorff, D. K. (2015). Intercultural competence: Mapping the future research agenda. International Journal of Intercultural Relations, 48, 3–5. doi: 10.1016/j.ijintrel.2015.03.002
Erll, A., & Gymnich, M. (2010). Interkulturelle Kompetenzen – Erfolgreich kommunizieren zwischen den Kulturen [Intercultural competencies – successful communication between cultures]. Stuttgart, Germany: Klett Lerntraining.
Fantini, A. E. (2009). Assessing intercultural competence: Issues and tools. In D. K. Deardorff (Ed.), The SAGE handbook of intercultural competence (pp. 456–476). Thousand Oaks, CA: Sage.
Hammer, M. R. (2011). Additional cross-cultural validity testing of the Intercultural Development Inventory. International Journal of Intercultural Relations, 35, 474–487. doi: 10.1016/j.ijintrel.2011.02.014
Hammer, M. R. (2015). The developmental paradigm for intercultural competence research. International Journal of Intercultural Relations, 48, 12–13. doi: 10.1016/j.ijintrel.2015.03.004
Holmes, P., & O’Neill, G. (2012). Developing and evaluating intercultural competence: Ethnographies of intercultural encounters. International Journal of Intercultural Relations, 36, 707–718. doi: 10.1016/j.ijintrel.2012.04.010
Jerusalem, M., & Satow, L. (1999). Schulbezogene Selbstwirksamkeitserwartung [Self-efficacy expectations at school]. In R. Schwarzer & M. Jerusalem (Eds.), Skalen zur Erfassung von Lehrer- und Schülermerkmalen (pp. 15–16). Berlin, Germany: FU.
Koester, J., & Lustig, M. W. (2015). Intercultural communication competence: Theory, measurement, and application. International Journal of Intercultural Relations, 48, 20–21. doi: 10.1016/j.ijintrel.2015.03.006
Little, T. D. (1997). Mean and covariance structures (MACS) analyses of cross-cultural data: Practical and theoretical issues. Multivariate Behavioral Research, 32, 53–76. doi: 10.1207/s15327906mbr3201_3
Matsumoto, D., & Hwang, H. C. (2013). Assessing cross-cultural competence: A review of available tests. Journal of Cross-Cultural Psychology, 44, 849–873. doi: 10.1177/0022022113492891



Pintrich, P. R. (2003). A motivational science perspective on the role of student motivation in learning and teaching contexts. Journal of Educational Psychology, 95, 667–686.
Schiefele, U., Krapp, A., Wild, K.-P., & Winteler, A. (1993). Der “Fragebogen zum Studieninteresse” (FSI) [Study interest questionnaire]. Diagnostica, 39, 335–351.
Schmitz, B., Perels, F., Bruder, S., & Otto, B. (2003). Fragebogen Selbstregulation [Questionnaire self-regulation]. Unpublished questionnaire, TU Darmstadt, Darmstadt, Germany.
Spitzberg, B. H., & Changnon, G. (2009). Conceptualizing intercultural competence. In D. K. Deardorff (Ed.), The SAGE handbook of intercultural competence (pp. 1–52). Thousand Oaks, CA: Sage.
Spitzberg, B. H., & Cupach, W. R. (2011). Interpersonal skills. In M. L. Knapp & J. A. Daly (Eds.), Handbook of interpersonal communication (4th ed., pp. 481–524). Newbury Park, CA: Sage.
Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25, 173–180. doi: 10.1207/s15327906mbr2502_4
Ting-Toomey, S., & Kurogi, A. (1998). Facework competence in intercultural conflict: An updated face-negotiation theory. International Journal of Intercultural Relations, 22, 187–225. doi: 10.1016/S0147-1767(98)00004-2
van de Vijver, F. J. R., & Leung, K. (2009). Methodological issues in researching intercultural competence. In D. K. Deardorff (Ed.), The SAGE handbook of intercultural competence (pp. 404–418). Thousand Oaks, CA: Sage.
Weiner, B. (2005). Motivation from an attribution perspective and the social psychology of perceived competence. In A. J. Elliot & C. S. Dweck (Eds.), Handbook of competence and motivation (pp. 73–84). New York, NY: Guilford Press.
Weinstein, C. E., & Hume, L. M. (1998). Study strategies for lifelong learning. Washington, DC: American Psychological Association.
Wigfield, A., & Eccles, J. S. (2000). Expectancy-value theory of achievement motivation. Contemporary Educational Psychology, 25, 68–81. doi: 10.1006/ceps.1999.1015
Wosnitza, M. (2000). Motiviertes selbstgesteuertes Lernen im Studium [Motivated self-regulated learning in university students]. Landau, Germany: Verlag Empirische Pädagogik.
Zimmerman, B. J. (2000). Attaining self-regulation: A social cognitive perspective. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 13–39). London, UK: Academic Press.

Received November 14, 2016
Revision received December 13, 2016
Accepted December 23, 2016
Published online July 12, 2017

Dagmar Strohmeier
Faculty of Medical Engineering and Applied Social Sciences
University of Applied Sciences Upper Austria
Garnisonstr. 21
4020 Linz
Austria
dagmar.strohmeier@fh-linz.at



Call for Papers
Delusions: Risk-Factors, Models, and Approaches to Psychological Intervention
A Topical Issue of the Zeitschrift für Psychologie
Guest Editor: Tania Lincoln, University of Hamburg, Germany

Delusions are beliefs that are held with strong conviction despite evidence to the contrary. They develop on a background of genetic and environmental risk and tend to be accompanied by severe distress, both for the individual who holds them and for others involved. Delusions typically occur in the context of mental disorders and are the most prominent symptom of psychosis. Since the 1950s, when Aaron Beck published a description of how he treated a psychotic patient using a cognitive approach, research in this field has come a long way. We now know that different types of cognitive behavioral interventions are helpful in reducing delusional conviction and distress. Somewhat disappointingly, though, none of the existing interventions has consistently shown more than a small effect on delusions over and above medication effects. This challenges us to continue our search for the best psychological approaches to delusions. But how can we arrive at better interventions?

One step towards improved interventions is to come to a better understanding of the factors that contribute to the formation and maintenance of delusions. Numerous risk factors (e.g., social trauma and other social adversity) and correlates of delusions (e.g., low self-esteem, reasoning biases, and difficulties in emotion regulation) have been identified. Nevertheless, it is unclear how much each individual factor contributes to explaining delusions. Moreover, the interactions between different risk factors and their combined relevance to delusions are insufficiently understood. More research in this area is necessary in order to work out which factors should be given priority in therapy.

Another step towards better interventions is to refine explanatory models that can guide assessment and treatment. Existing models of delusions tend to focus on a selection of relevant factors and often fail to integrate all relevant levels of explanation (e.g., social, biological, cognitive, experiential, behavioral). More complex models that include a range of factors can help clarify which factors need to be addressed in treatment and may also improve prediction of which type of intervention could be helpful for whom and to what extent.

A third step lies in developing and evaluating novel interventions for delusions. Several researchers have proposed that it would be beneficial to focus interventions more explicitly on the factors identified as causal to delusion formation and maintenance. This can be done either by refining existing psychological interventions and tailoring them more specifically to the factors known to cause and maintain delusions or by developing novel types of interventions that target causal factors.

This topical issue of the Zeitschrift für Psychologie aims to bring together research that focuses on the challenge of improving psychological therapy for delusions via one of the three steps outlined above. I invite original or review papers that contribute to the understanding of how delusions arise and are maintained, theoretical models that integrate findings on delusions from different levels of explanation, and intervention research that investigates the effects of interventions derived from what is known about the factors that trigger, cause, and maintain delusions. Contributions may also involve studies that build on the continuum assumption and investigate mechanisms or interventions in samples with subclinical levels of delusions, although studies involving clinical samples are preferred. In addition to full original or review articles and meta-analyses, shorter research notes and opinion papers are also welcome. Interested authors are invited to submit their abstracts on potential papers electronically to the guest editor, Tania Lincoln (e-mail: tania.lincoln@uni-hamburg.de).

How to submit: Interested authors should submit a letter of intent including (1) a working title for the manuscript, (2) names, affiliations, and contact information for all authors, and (3) an abstract of no more than 500 words detailing the content of the proposed manuscript.



There is a two-stage submission process. Initially, interested authors are requested to submit only abstracts of their proposed papers. Authors of the selected abstracts will then be invited to submit full papers. All papers will undergo blind peer review.

For additional information, please contact the guest editor.

Deadline for submission of abstracts is July 15, 2017. Deadline for submission of full papers is November 15, 2017. The journal seeks to maintain a short turnaround time, with the final version of accepted papers being due by February 15, 2018. The topical issue will be published as issue 3 (2018).


About the Journal
The Zeitschrift für Psychologie, founded in 1890, is the oldest psychology journal in Europe and the second oldest in the world. One of the founding editors was Hermann Ebbinghaus. Since 2007, it has been published in English and is devoted to publishing topical issues that provide state-of-the-art reviews of current research in psychology. For detailed author guidelines, please see the journal’s website at www.hogrefe.com/j/zfp



Instructions to Authors
The Zeitschrift für Psychologie publishes high-quality research from all branches of empirical psychology that is clearly of international interest and relevance, and does so in four topical issues per year. Each topical issue is carefully compiled by guest editors. The subjects being covered are determined by the editorial team after consultation within the scientific community, thus ensuring topicality. The Zeitschrift für Psychologie thus brings convenient, cutting-edge compilations of the best of modern psychological science, each covering an area of current interest. The Zeitschrift für Psychologie publishes the following types of articles: Review Articles, Original Articles, Research Spotlights, Horizons, and Opinions.

Manuscript submission: A call for papers is issued for each topical issue. Current calls are available on the journal’s website at www.hogrefe.com/j/zfp. Manuscripts should be submitted as Word or RTF documents via e-mail attachment to either the responsible guest editor(s) or the Editor-in-Chief (Prof. Dr. Edgar Erdfelder, University of Mannheim, Psychology III, Schloss, Ehrenhof-Ost, 68131 Mannheim, Germany, Tel. +49 621 181-2146, Fax +49 621 181-3997, erdfelder@psychologie.uni-mannheim.de). Detailed instructions to authors are provided at http://www.hogrefe.com/j/zfp

Copyright Agreement: By submitting an article, the author confirms and guarantees on behalf of him-/herself and any coauthors that he or she holds all copyright in and titles to the submitted contribution, including any figures, photographs, line drawings, plans, maps, sketches and tables, and that the article and its contents do not infringe in any way on the rights of third parties. The author indemnifies and holds harmless the publisher from any third-party claims. The author agrees, upon acceptance of the article for publication, to transfer to the publisher on behalf of him-/herself and any coauthors the exclusive right to reproduce and distribute the article and its contents, both physically and in nonphysical, electronic, and other form, in the journal to which it has been submitted and in other independent publications, with no limits on the number of copies or on the form or the extent of the distribution.


These rights are transferred for the duration of copyright as defined by international law. Furthermore, the author transfers to the publisher the following exclusive rights to the article and its contents:
1. The rights to produce advance copies, reprints, or offprints of the article, in full or in part, to undertake or allow translations into other languages, to distribute other forms or modified versions of the article, and to produce and distribute summaries or abstracts.
2. The rights to microfilm and microfiche editions or similar, to the use of the article and its contents in videotext, teletext, and similar systems, to recordings or reproduction using other media, digital or analog, including electronic, magnetic, and optical media, and in multimedia form, as well as for public broadcasting in radio, television, or other forms of broadcast.
3. The rights to store the article and its content in machine-readable or electronic form on all media (such as computer disks, compact disks, magnetic tape), to store the article and its contents in online databases belonging to the publisher or third parties for viewing or downloading by third parties, and to present or reproduce the article or its contents on visual display screens, monitors, and similar devices, either directly or via data transmission.
4. The rights to reproduce and distribute the article and its contents by all other means, including photomechanical and similar processes (such as photocopying or facsimile), and as part of so-called document delivery services.
5. The right to transfer any or all rights mentioned in this agreement, as well as rights retained by the relevant copyright clearing centers, including royalty rights, to third parties.

Online Rights for Journal Articles: Guidelines on authors’ rights to archive electronic versions of their manuscripts online are given in the document “Guidelines on sharing and use of articles in Hogrefe journals” on the journal’s web page at www.hogrefe.com/j/zfp

January 2017



The relationship between neuroscience and behavioral research

Elsbeth Stern / Roland H. Grabner / Ralph Schumacher (Editors)
Neuroscience and Education: Added Value of Combining Brain Imaging and Behavioral Research
(Series: Zeitschrift für Psychologie – Vol. 224)
2016, iv + 66 pp., large format
US $49.00 / € 34.95
ISBN 978-0-88937-485-0

Educational neuroscience is an interdisciplinary field that has emerged through the improvement of brain imaging techniques and investigates human learning inside and outside of schools. In this issue, leading experts look back at the progress made within the field over the last decade, focusing on the bidirectional relationship between neuroscience and behavioral research.

Contents and topics include:
• Development of reading remediation for dyslexic individuals: Added benefits of the joint consideration of neurophysiological and behavioral data, by Mélanie Bédard, Line Laplante, and Julien Mercier
• A systematic review of the literature linking neural correlates of feedback processing to learning, by Jan-Sébastien Dion and Gérardo Restrepo
• The effect of a prospected reward on semantic processing: An N400 EEG study, by Sanne H. G. van der Ven, Sven A. C. van Touw, Anne H. van Hoogmoed, Eva M. Janssen, and Paul P. M. Leseman
• Proportional reasoning: The role of congruity and salience in behavioral and imaging research, by Ruth Stavy, Reuven Babai, and Arava Y. Kallai
• The learning brain: Neuronal recycling and inhibition, by Emmanuel Ahr, Grégoire Borst, and Olivier Houdé

www.hogrefe.com


Assessment methods in health psychology
“This book is an excellent overview of measurement issues that are central to health psychology.”
David French, PhD, Professor of Health Psychology, University of Manchester, UK

Yael Benyamini / Marie Johnston / Evangelos C. Karademas (Editors)
Assessment in Health Psychology
(Series: Psychological Assessment – Science and Practice – Vol. 2)
2016, vi + 346 pp.
US $69.00 / € 49.95
ISBN 978-0-88937-452-2
Also available as eBook

Assessment in Health Psychology presents and discusses the best and most appropriate assessment methods and instruments for all specific areas that are central for health psychologists. It also describes the conceptual and methodological bases for assessment in health psychology, as well as the most important current issues and recent progress in methods. A unique feature of this book, which brings together leading authorities on health psychology assessment, is its emphasis on the bidirectional link between theory and practice. Assessment in Health Psychology is addressed to masters and doctoral students in health psychology, to all those who teach health psychology, to researchers from other disciplines, including clinical psychology, health promotion, and public health, as well as to health policy makers and other healthcare practitioners. This latest volume in the series Psychological Assessment – Science and Practice provides a thorough and authoritative record of the best available assessment tools and methods in health psychology, making it an invaluable resource both for students and academics as well as for practitioners in their daily work.

www.hogrefe.com


How to assess the social atmosphere in forensic hospitals and identify ways of improving it
“All clinicians and researchers who want to help make forensic treatment environments safe and effective should buy this book.”
Mary McMurran, PhD, Professor of Personality Disorder Research, Institute of Mental Health, University of Nottingham, UK

Norbert Schalast / Matthew Tonkin (Editors)
The Essen Climate Evaluation Schema – EssenCES: A Manual and More
2016, x + 108 pp.
US $49.00 / € 34.95
ISBN 978-0-88937-481-2
Also available as eBook

The Essen Climate Evaluation Schema (EssenCES) described here is a short, well-validated questionnaire that measures three essential facets of an institution’s social atmosphere. An overview of the EssenCES is followed by detailed advice on how to administer and score it and how to interpret findings, as well as reference norms from various countries and types of institutions. The EssenCES “manual and more” is thus a highly useful tool for researchers, clinicians, and service managers working in forensic settings.

www.hogrefe.com


Alternatives to traditional self-reports in psychological assessment
“A unique and timely guide to better psychological assessment.”
Rainer K. Silbereisen, Research Professor, Friedrich Schiller University Jena, Germany; Past-President, International Union of Psychological Science

Tuulia Ortner / Fons J. R. van de Vijver (Editors)
Behavior-Based Assessment in Psychology: Going Beyond Self-Report in the Personality, Affective, Motivation, and Social Domains
(Series: Psychological Assessment – Science and Practice – Vol. 1)
2015, vi + 234 pp.
US $63.00 / € 44.95
ISBN 978-0-88937-437-9
Also available as eBook

Traditional self-reports can be an insufficient source of information about personality, attitudes, affect, and motivation. What are the alternatives? This first volume in the authoritative series Psychological Assessment – Science and Practice discusses the most influential, state-of-the-art forms of assessment that can take us beyond self-report. Leading scholars from various countries describe the theoretical background and psychometric properties of alternatives to self-report, including behavior-based assessment, observational methods, innovative computerized procedures, indirect assessments, projective techniques, and narrative reports. They also look at the validity and practical application of such forms of assessment in domains as diverse as health, forensic, clinical, and consumer psychology.

www.hogrefe.com


Volume 225 / Number 1 / 2017

In behavioral science, measurement methods and theory are often discussed in isolation, separate from specific substantive research questions. This frequently leads to the development of tools that do not fit substantive research questions of current interest closely enough to provide convincing scientific answers. As a consequence, there is a need for the development of more specific theory-guided measurement devices, instruments, and associated statistical methods that are tailored to the research questions of interest. This volume presents examples of this type of research question-driven applied psychological measurement in three areas: individual differences in cognition, applied fields such as neuropsychology and trauma research, and educational psychology and competence research.

Contents include:
Ten Years After, by Edgar Erdfelder and Bernd Leplow
Measuring Individual Differences in Implicit Learning with Artificial Grammar Learning Tasks: Conceptual and Methodological Conundrums, by Daniel Danner, Dirk Hagemann, and Joachim Funke
Measuring Age-Related Differences in Using a Simple Decision Strategy: The Case of the Recognition Heuristic, by Rüdiger F. Pohl
Measuring the Zero-Risk Bias: Methodological Artefact or Decision-Making Strategy?, by Elisabeth Schneider, Bernhard Streicher, Eva Lermer, Rainer Sachs, and Dieter Frey
Assessing Suffering in Experimental Pain Models: Psychological and Psychophysiological Correlates, by M. Brunner, M. Löffler, S. Kamping, S. Bustan, A. M. González-Roldán, F. Anton, and H. Flor


Is the Implicit Association Test for Aggressive Attitudes a Measure for Attraction to Violence or Traumatization?, by Matthias Bluemke, Anselm Crombach, Tobias Hecker, Inga Schalinski, Thomas Elbert, and Roland Weierstall
Measuring a Mastery Goal Structure Using the TARGET Framework: Development and Validation of a Classroom Goal Structure Questionnaire, by Marko Lüftenegger, Ulrich S. Tran, Lisa Bardach, Barbara Schober, and Christiane Spiel
Parents’ and Teachers’ Opinions on Bullying and Cyberbullying Prevention: The Relevance of Their Own Children’s or Students’ Involvement, by Petra Gradinger, Dagmar Strohmeier, and Christiane Spiel
Intercultural Competence Development Among University Students From a Self-Regulated Learning Perspective: Theoretical Model and Measurement, by Dagmar Strohmeier, Petra Gradinger, and Petra Wagner

Hogrefe Publishing Group Göttingen · Berne · Vienna · Oxford Boston · Paris · Amsterdam · Prague Florence · Copenhagen · Stockholm Helsinki · São Paulo · Madrid · Lisbon www.hogrefe.com

ISBN 978-0-88937-498-0


