Volume 32 / Number 2 / 2016

European Journal of Psychological Assessment

Editor-in-Chief Matthias Ziegler

Official Organ of the European Association of Psychological Assessment

Associate Editors Martin Bäckström, Gary N. Burns, Laurence Claes, Marjolein Fokkema, David Gallardo-Pujol, Samuel Greiff, Christoph Kemper, Stefan Krumm, Lena Lämmle, Carolyn MacCann, René T. Proyer, Sebastian Sauer, Marit Sijbrandij



The multiaxial diagnostic system for psychodynamically oriented psychiatrists and therapists, now for children and adolescents. OPD Task Force / Franz Resch / Georg Romer / Klaus Schmeck / Inge Seiffge-Krenke (Editors)

OPD-CA-2: Operationalized Psychodynamic Diagnosis in Childhood and Adolescence. Theoretical Basis and User Manual. 2016, ca. X + 400, ca. US $79.00 / € 57.00, ISBN 978-0-88937-489-8. Also available as eBook.

Following the success of the Operationalized Psychodynamic Diagnosis for Adults (OPD-2), this multiaxial diagnostic and classification system based on psychodynamic principles has now been adapted for children and adolescents by combining psychodynamic, developmental, and clinical psychiatric perspectives. The OPD-CA-2 is based on four axes that are aligned with the new dimensional approach in the DSM-5: I = interpersonal relations, II = conflict,

www.hogrefe.com

III = structure, and IV = prerequisites for treatment. After an initial interview, the clinician (or researcher) can evaluate the patient’s psychodynamics according to these axes to get a comprehensive psychodynamic view of the patient. Easy-to-use checklists and evaluation forms are provided. The set of tools and procedures the OPD-CA-2 manual provides have been widely used for assessing indications for therapy, treatment planning, and measuring change, as well as providing information for parental work.


European Journal of Psychological Assessment

Volume 32, No. 2, 2016

Official Organ of the European Association of Psychological Assessment


Editor-in-Chief

Matthias Ziegler, Department of Psychology, Humboldt University Berlin, Rudower Chaussee 18, 12489 Berlin, Germany (Tel. +49 30 2093 9447, Fax +49 30 2093 9361, E-mail zieglema@hu-berlin.de)

Editor-in-Chief (past)

Karl Schweizer, Department of Psychology, Goethe University Frankfurt, Theodor-W.-Adorno-Platz 6, 60323 Frankfurt a.M., Germany (Tel. +49 69 798-22081, Fax +49 69 798-23847, E-mail k.schweizer@psych.uni-frankfurt.de)

Editorial Assistant

Doreen Bensch, Department of Psychology, Humboldt University Berlin, Rudower Chaussee 18, 12489 Berlin, Germany (Tel. +49 30 2093 9441, Fax +49 30 2093 9360, E-mail benschdx@cms.hu-berlin.de)

Associate Editors

Martin Bäckström, Sweden; Gary N. Burns, USA; Laurence Claes, Belgium; Marjolein Fokkema, The Netherlands; David Gallardo-Pujol, Spain; Samuel Greiff, Luxembourg; Christoph Kemper, Germany; Stefan Krumm, Germany; Lena Lämmle, Germany; Carolyn MacCann, Australia; René Proyer, Germany; Sebastian Sauer, Germany; Marit Sijbrandij, The Netherlands

Consulting Editors

Paul De Boeck, USA; Christine DiStefano, USA; Anastasia Efklides, Greece; Rocío Fernández-Ballesteros, Spain; Brian F. French, USA; Klaus Kubinger, Austria; Kerry Lee, Singapore; Helfried Moosbrugger, Germany

Founders

Rocío Fernández-Ballesteros and Fernando Silva

Supporting Organizations

The journal is the official organ of the European Association of Psychological Assessment (EAPA). The EAPA was founded to promote the practice and study of psychological assessment in Europe and to foster the exchange of information on this discipline around the world. Members of the EAPA receive the journal as part of their membership. Further, the Division for Psychological Assessment and Evaluation (Division 2) of the International Association of Applied Psychology (IAAP) sponsors the journal: Members of this association receive the journal at a special rate (see below).

Publisher

Hogrefe Publishing, Merkelstr. 3, D-37085 Göttingen, Germany, Tel. +49 551 999-500, Fax +49 551 999-50111, E-mail publishing@hogrefe.com, Web http://www.hogrefe.com
North America: Hogrefe Publishing, 38 Chauncy Street, Suite 1002, Boston, MA 02111, USA, Tel. +1 866 823-4726, Fax +1 617 354-6875, E-mail customerservice@hogrefe-publishing.com, Web http://www.hogrefe.com

Production

Regina Pinks-Freybott, Hogrefe Publishing, Merkelstr. 3, D-37085 Göttingen, Germany, Tel. +49 551 999-500, Fax +49 551 999-50111, E-mail production@hogrefe.com

Subscriptions

Hogrefe Publishing, Herbert-Quandt-Strasse 4, D-37081 Göttingen, Germany, Tel. +49 551 50688-900, Fax +49 551 50688-998

Advertising/Inserts

Melanie Beck, Hogrefe Publishing, Merkelstr. 3, D-37085 Göttingen, Germany, Tel. +49 551 999-500, Fax +49 551 999-50111, E-mail marketing@hogrefe.com

ISSN

ISSN-L 1015-5759, ISSN-Print 1015-5759, ISSN-Online 2151-2426

Copyright Information

© 2016 Hogrefe Publishing. This journal as well as the individual contributions and illustrations contained within it are protected under international copyright law. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without prior written permission from the publisher. All rights, including translation rights, reserved.

Publication

Published in 4 issues per annual volume.

Subscription Prices

Annual subscription, Institutions (2016): €334.00 / US $434.00 / £268.00
Annual subscription, Individuals (2016): €152.00 / US $208.00 / £121.00
Postage and handling: €12.00 / US $16.00 / £10.00
Special rates: IAAP / Colegio Oficial de Psicólogos members: €108.00 / US $151.00 / £86.00 (+ €12.00 / US $16.00 / £10.00 postage and handling); EAPA members: included in membership
Single issues: €84.00 / US $109.00 / £67.00 (+ postage and handling)

Payment

Payment may be made by check, international money order, or credit card, to Hogrefe Publishing, Merkelstr. 3, D-37085 Göttingen, Germany, or, for North American customers, to Hogrefe Publishing, 38 Chauncy Street, Suite 1002, Boston, MA 02111, USA.

Electronic Full Text

The full text of the European Journal of Psychological Assessment is available online at http://econtent.hogrefe.com and in PsycARTICLES.

Abstracting/Indexing Services

The journal is abstracted/indexed in Current Contents / Social & Behavioral Sciences (CC/S&BS), Social Sciences Citation Index (SSCI), Social SciSearch, PsycINFO, Psychological Abstracts, PSYNDEX, ERIH, and Scopus. Impact Factor (2014): 1.973.

Consulting Editors (continued)

Janos Nagy, Hungary; Tuulia Ortner, Austria; Willibald Ruch, Germany; Manfred Schmitt, Germany; Stéphane Vautier, France; Fons J. R. van de Vijver, The Netherlands; Cilia Witteman, The Netherlands


Contents

Editorial

105  50 Facets of a Trait – 50 Ways to Mess Up?
     Matthias Ziegler and Martin Bäckström

Original Articles

111  The Beck Hopelessness Scale: Specific Factors of Method Effects?
     Marianna Szabó, Veronika Mészáros, Judit Sallay, Gyöngyi Ajtay, Viktor Boross, Ágnes Udvardy-Mészáros, Gabriella Vizin, and Dóra Perczel-Forintos

119  The Utrecht-Management of Identity Commitments Scale (U-MICS): Measurement Invariance and Cross-National Comparisons of Youth From Seven European Countries
     Radosveta Dimitrova, Elisabetta Crocetti, Carmen Buzea, Venzislav Jordanov, Marianna Kosic, Ergyul Tair, Jitka Taušová, Natasja van Cittert, and Fitim Uka

128  Measurement Invariance of the Self-Description Questionnaire II in a Chinese Sample
     Kim Chau Leung, Herbert W. Marsh, Rhonda G. Craven, and Adel S. Abduljabbar

140  Selecting the Best Items for a Short-Form of the Experiences in Close Relationships Questionnaire
     Marie-France Lafontaine, Audrey Brassard, Yvan Lussier, Pierre Valois, Philip R. Shaver, and Susan M. Johnson

155  Measuring the Situational Eight DIAMONDS Characteristics of Situations: An Optimization of the RSQ-8 to the S8*
     John F. Rauthmann and Ryne A. Sherman

165  Ultra-Brief Measures for the Situational Eight DIAMONDS Domains
     John F. Rauthmann and Ryne A. Sherman



Editorial

50 Facets of a Trait – 50 Ways to Mess Up?

Matthias Ziegler (Humboldt-Universität zu Berlin, Berlin, Germany) and Martin Bäckström (Lunds Universitet, Lund, Sweden)

In recent years, papers introducing measures of constructs with a lower order facet structure have made up a large proportion of our submissions. Different notions of facets can be found in the literature, though. The most prominent concept of facets is the idea of hierarchically structured nomological nets. This idea has existed for a very long time: especially in research on the structure of intelligence, it has been argued that abilities can be clustered and structured hierarchically (see Ackerman, 1996, for an excellent summary). Whereas this view has been successful in the area of mental abilities, with researchers tentatively agreeing on a structure (McGrew, 2009) and empirical evidence supporting the validity of test scores from different levels of the hierarchy (Schmidt & Hunter, 1998; Ziegler, Dietl, Danay, Vogel, & Bühner, 2011), the idea of hierarchies is still not fully established within the assessment of non-cognitive personality aspects.

One of the first proponents of this idea was Eysenck (1947), who suggested that human behavior could be summarized on four levels: specific responses, habits, traits, and types. The lowest level of this framework can be understood as the items in a questionnaire. The next levels, which we will call facets and traits (or domains), have inspired research and controversy over the last decades. A conceptually different notion of facets can be found in circumplex models (Di Blas, 2000; Martínez-Arias, Silva, Díaz-Hidalgo, Ortet, & Moro, 1999). Here, facets are operationalized as the composite of two domains. One prominent example of this approach is the Abridged Big 5 Circumplex (AB5C), which has inspired much research (Hofstee, De Raad, & Goldberg, 1992). Most of the papers we see conceive of facets as lower order structures beneath a broader domain in sensu Eysenck. We will therefore focus on this approach and only make a few side notes on circumplex models where appropriate.
Probably the best-known model of a personality hierarchy is the facetted Five Factor Model (FFM) by Costa and McCrae (1995a). However, despite its widespread use, empirical evidence for the psychometric quality of the proposed model is still lacking for many important aspects (see below). This critique does not hold only for the facets of the

FFM. Other hierarchical measures of personality, for example the HEXACO measures (Lee & Ashton, 2004; Lee & Ashton, 2006) or the IPIP (Goldberg et al., 2006), could be criticized likewise. Especially striking is the lack of evidence, from confirmatory factor analysis, for the factorial validity of the assumed hierarchy. Still, the idea of facets as an important aspect of human personality is becoming more and more popular (Woods & Anderson, 2016). There is one obvious reason for this phenomenon: test-criterion correlations. Even though the bandwidth-fidelity dilemma (Cronbach & Gleser, 1965) has been discussed extensively (Ones & Viswesvaran, 1996), its general problem also applies to the question of facets versus traits. If facets are conceptualized as narrower versions of a general trait, the bandwidth-fidelity debate applies in full force. Importantly, though, it has been shown that personality facets can outperform their underlying traits when predicting a variety of criteria (Paunonen & Ashton, 2001, 2013; Ziegler, Bensch, et al., 2014; Ziegler, Danay, Schölmerich, & Bühner, 2010). However, as convincing as this evidence may seem, there is also evidence contradicting these findings (Salgado, Moscoso, & Berges, 2013). To summarize, based on early theories of human personality, the idea of multifaceted personality traits has inspired the construction of many personality questionnaires. Empirical evidence supports the utility of these facets in many cases. Still, doubts remain, especially regarding other aspects such as reliability and construct validity (see below). This editorial is meant to provide a different look at the concept of facets. Moreover, problems that should be addressed when proposing a facetted measure are discussed.

European Journal of Psychological Assessment 2016; Vol. 32(2):105–110. DOI: 10.1027/1015-5759/a000372

The Concept of Facets

As outlined above, the idea of facets located beneath a more general domain is often explained based on Eysenck's



[Figure 1. A trait hierarchy: specific responses cluster into facets (each with a specific factor, SF), and facets cluster into traits.]

terminology. He proposed that specific responses to stimuli can be clustered into habits, or what we now call facets. Thus, facets in this sense can be understood as a collection of specific behaviors with systematic interindividual differences, which occur systematically across situations and time. The next level, the trait or domain level, is meant to represent clusters of such facets (see Figure 1). In other words, the trait score itself should represent the common core built by the shared variance of all facets. Whether a factor score derived from a hierarchical model is a better estimate of this than a sum score is another issue. The facets themselves should be described by specific behaviors that can be representative of the common core but do not necessarily need to be located there. It would be possible for a facet to include behaviors not located in the common core but specific to the trait itself. In other words, when interpreting facet and domain scores it is important to keep in mind that the domain score represents the common core of all facets, whereas the facet scores can contain behaviors that are specific. Moreover, when speaking of the trait, one should specify whether the whole trait is meant or just the trait score. Clearly, based on the ideas just outlined, the trait itself can be more than the trait score implies. Thus, within a nomological net (Cronbach & Meehl, 1955; Ziegler, 2014) it should be defined which behaviors or specific responses form a facet and how different facets can be understood as representing a common core. This is represented in Figures 1 and 2 for the idea of hierarchically structured traits. If we use this to formalize the individual response to an item according to classical test theory, we arrive at the following equation:

X_Item = T_Trait + S_Facet + E_Item.   (1)
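To make this decomposition concrete, it can be simulated; the loadings and error standard deviation below are arbitrary values chosen for this sketch, not estimates from any real questionnaire:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000  # simulated respondents

# Independent standard-normal latent scores:
trait = rng.normal(size=n)       # T_Trait: common core shared by all facets
facet_spec = rng.normal(size=n)  # S_Facet: specificity of one facet

# One item following X_Item = T_Trait + S_Facet + E_Item, with weights:
t_load, s_load, e_sd = 0.6, 0.5, 0.6
item = t_load * trait + s_load * facet_spec + rng.normal(scale=e_sd, size=n)

# Decompose the item's variance into its three sources:
var_parts = {"trait": t_load**2, "facet_specific": s_load**2, "error": e_sd**2}
total = sum(var_parts.values())
shares = {k: round(v / total, 2) for k, v in var_parts.items()}
print(shares)  # → {'trait': 0.37, 'facet_specific': 0.26, 'error': 0.37}
```

The empirical variance of `item` matches `total` up to sampling error; the point is simply that a single item carries two systematic variance sources rather than one.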

In other words, the answer to an item is influenced by the latent trait, the specific variance of the facet (facet specificity), and an error. This equation mirrors the idea of two sources of systematic variance, positioned at two latent yet related hierarchy levels. In terms of test score interpretations, this means that a multifaceted and

[Figure 2. The nomological net of a trait with four facets depicted as a Venn diagram.]

hierarchically structured trait score reflects the variance shared by all facets and, therefore, the common core of the trait as explained above. Within each item this influence is represented by the amount of trait variance. A facet score, however, represents an area in the nomological net which is still a part of the trait but can also contain behaviors that are specific to this facet and do not strongly overlap with the other facets within this trait (see Figure 2). Within an item this systematic source of variance is represented by the variance explained by the facet. Thus, defining the nomological net of hierarchically organized traits requires defining the behaviors making up the common core but also the specific behaviors (specifics) of a trait represented by facets. Figure 2 exemplifies this idea: a trait and its four facets are depicted as Venn diagrams. The overlap between the facets represents the common core. It can also be seen that each facet contains specific variance representative of the trait but not shared with other facets. Moreover, there are areas of the trait not covered by any of the facets, hinting at the idea that the trait is not fully explored yet. Importantly, this differentiation into two systematic sources of variance bears great implications for the



estimation of reliability and the validation strategy. The challenge during the validation process is to show that the specific variance of the facet is an essential part of the trait's nomological net, and not variance that is additionally measured but actually represents a different construct. At the same time, having two sources of systematic variance complicates the interpretation of test-criterion correlations. Again, we need to stress that all of this mostly refers to hierarchically structured traits. Circumplex models, which combine facets in a different way, pose different problems. Another important aspect we need to mention here is the notion of facets as situation-specific or domain-specific manifestations of a trait. One example of this is school-subject-specific or domain-specific measures of achievement motivation (Sparfeldt et al., 2015). Here the idea of a facet is combined with situational perception (Horstmann & Ziegler, 2016; Ziegler & Horstmann, 2015; Ziegler & Ziegler, 2015). In such cases it is important, though, to clearly state whether it is assumed that there actually is a lower order facet of a specific trait, like mathematical achievement motivation, or whether the facet is a combination of the trait and situational perception (Rauthmann & Sherman, 2016; Rauthmann, Sherman, & Funder, 2015). The latter case would require extending or adjusting the equation stated above to include situational perception and interactions between situational perception and the trait measured.
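One way such an extended equation might look (our illustrative sketch, not a formula proposed in the sources cited above):

```latex
X_{\text{Item}} = T_{\text{Trait}} + S_{\text{Facet}} + P_{\text{Situation}}
                + \left( T_{\text{Trait}} \times P_{\text{Situation}} \right) + E_{\text{Item}}
```

where P_Situation captures interindividual differences in situational perception and the product term captures its interaction with the trait.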

Facets and Reliability Estimates

Despite widespread critique (Cronbach & Shavelson, 2004; McCrae, 2015; McCrae, Kurtz, Yamagata, & Terracciano, 2011; Ziegler, Poropat, & Mell, 2014; Zinbarg, Revelle, Yovel, & Li, 2005), Cronbach's alpha (Cronbach, 1951) is still the reliability estimate chosen in most papers published here and elsewhere. An important assumption of this reliability estimate is tau-equivalence of the items. In almost all cases, this assumption is violated. It is often argued, though, that alpha is still a lower-bound estimate if the items are at least congeneric. In other words, if the items are at least unidimensional, alpha is a reasonable estimator. Applying this to facets and traits introduces a problem. We know that the items are affected by two latent entities: the trait and the facet specifics. Even though the assumption of unidimensionality could still be met (Ziegler & Hagemann, 2015) if all items were affected by the two sources in a comparable fashion (Bejar, 1983), considering the loadings often seen for facetted measures, this seems at least doubtful. However, if the assumption of unidimensional items does not hold, Cronbach's alpha is not a suitable estimate of reliability. In fact, Cronbach (1951) suggested: "Tests divisible into distinct subtests should be so divided before using the formula" (p. 297). This alone would mean that alpha should not be used to estimate the reliability of the trait score interpretation. Here composite reliabilities should be used (Raykov & Pohl, 2013). Moreover, the fact that there are two sources of variance is problematic for the reliability estimate of the facet


score interpretation as well. The Cronbach's alpha formula capitalizes on the correlations between items: the larger (and more positive) these are, the larger the reliability estimate. However, the size of the correlations between items belonging to a facet is driven not only by the facet-specific variance but also by the trait variance. Thus, the estimate could be too high. Similar criticism could be brought forward against facets within circumplex models. The easiest way to deal with this would be to stop speaking of an estimate of facet score reliability; it would be more correct to speak of the reliability of the facet score plus the trait influence whenever using Cronbach's alpha. A more practical way would be to use other estimates such as McDonald's omega (Ziegler & Brunner, 2016). This, however, could yield reliability estimates rendering facet scores practically useless (Brunner & Süß, 2005). While this might seem devastating at first, it just stresses the importance of obtaining test-retest correlations, which might not only be better estimates but also seem to be more consequential for the validity of a score interpretation (McCrae, 2015; McCrae et al., 2011).
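To make the inflation concrete, here is a small simulation in the spirit of the argument above; the number of items, loadings, and error variance are arbitrary choices for the sketch:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(1)
n = 10_000
trait = rng.normal(size=n)  # common core shared with the other facets
spec = rng.normal(size=n)   # specificity of this one facet

# Four items of a single facet, each loaded on BOTH systematic sources:
items = np.stack(
    [0.6 * trait + 0.5 * spec + rng.normal(scale=0.6, size=n) for _ in range(4)],
    axis=1,
)

alpha = cronbach_alpha(items)

# Share of each item's systematic variance that is actually facet-specific:
specific_share = 0.5**2 / (0.6**2 + 0.5**2)
print(round(alpha, 2), round(specific_share, 2))
```

Alpha comes out high (about .87) even though only about 41% of the systematic item variance is facet-specific; the rest is trait variance that every other facet of the same trait also carries.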

Facets and Construct Validity

In order to provide evidence for construct validity it is customary to show convergent and discriminant correlations for the scores derived from a measure (Campbell & Fiske, 1959; Ziegler, 2014). As expressed by the equation above, items forming a facet scale within a hierarchically organized model should contain two systematic sources of variance, that is, trait variance and facet specificity. Therefore, for scores from facetted measures, construct validity has to be shown for both sources. Convergent validity evidence for trait scores can be a tricky business (Miller, Gaughan, Maples, & Price, 2011; Pace & Brannick, 2010) because of the many different ways traits are being interpreted (Ziegler, Booth, & Bensch, 2013). However, paying attention to differences due to test family, such evidence can be obtained relatively easily. Such ease is harder to find when trying to provide convergent validity evidence for facet score interpretations: since there is no generally agreed-upon facet framework for most traits, finding convergent measures is complicated. In the case of the Big Five, a recent paper by Woods and Anderson (2016) could pave the way for such a framework. In any case, evidence for the convergent validity of facet score interpretations requires a detailed definition of the nomological net of the facet, the trait, and also the other facets within the trait, as outlined above. Ideally, the nomological net for hierarchically structured multifaceted traits should also contain information regarding neighboring or overlapping constructs (and their facets). This seems to be the only way to theoretically justify the selection of convergent facet measures. Admittedly, this is a high bar to clear. Still, the effect could be a slimming cure for such models. Even more important is evidence regarding discriminant validity. There are at least two general pitfalls: (1) facet specificity is not distinct from the specificity of



other facets within the same or related traits; (2) facet specificity is not really part of the trait's nomological network but captures a different trait altogether. The first pitfall, that is, facet specificity that is correlated with the specificity of another facet within the same trait or a related one, was acknowledged early on. Costa and McCrae (1995b), for example, admitted that the correlations between their FFM scores might be due to a lack of discriminant validity of some of the facets. Thus, the problem has been known for a while, and different solutions have been proposed. Most of these solutions rely in some way on factor-analytic methods. It has to be noted that distinguishing the facets within one trait should already be regarded as evidence for discriminant validity. Confirmatory approaches such as structural equation modeling seem to be mandatory at the final stage of the validation process. Additionally, if such models only contain facets from one trait, the problem of facet specificity not being unique to one trait but also being present in facets of other traits (pitfall 1) cannot be detected. Thus, it is important to test models containing more than one trait and its facets (e.g., Beauducel & Kersting, 2002). The second pitfall would mean that the variance captured within the facet is systematic but not part of the nomological network it is supposed to represent. An example can be found in the work by Marsh exploring the structure of self-esteem (Marsh, 1986, 1996). His work shows that one purported facet of self-esteem, negative self-esteem, actually represented differences in verbal ability due to negative item keying. Thus, the facet did not really represent the trait's nomological network but a totally different construct. Such issues are not always easy to detect.
Rigorous theoretical definitions of the facets, their anchoring within the nomological net of the trait, and the distinction from neighboring traits and their facets are an important first step. Based on these definitions it should be possible to empirically test whether the facet specificity is uniquely associated with the trait in question or simply represents a different trait altogether.

Test-Criterion Correlations as Evidence for Criterion Validity

Many papers presenting evidence for test scores' criterion validities contain long and elaborate tables of correlations between the scores under scrutiny and (hopefully) theoretically selected criteria. While this may suffice for scores representing distinct traits, it does not work to show that a facet score interpretation within a hierarchically structured multifaceted trait framework can be used to predict behavior. Again, we come back to the equation. The facet score contains variance due to the trait in question and facet specificity. The former variance source, the common core, is also shared by all other facets within the trait. Thus, correlations with criteria can be misleading. This would be the case if the correlations were driven only by the trait variance (common core, see Figure 2) and not

the facet-specific variance. The test-criterion correlations for the facet scores would still differ according to the amount of trait variance reflected in the facet score variance. However, the facet itself would not be needed to explain the test-criterion correlation; only the trait score, as a reflection of the common core, would be needed. Thus, in order to show that a facet score has unique test-criterion correlations, it is necessary to control for the common core, that is, the overlap between the facets. This is usually done using regression analyses (Ziegler, Bensch, et al., 2014; Ziegler et al., 2010) or bifactor modeling (Leue & Beauducel, 2011; Ziegler & Bühner, 2009). Only if a facet score retains a significant and substantial criterion relation in such an analysis would the practical use of the facet score be advisable. We also want to remark that during this procedure it could be vital to include other traits or their facets to show that it really is the specificity of the facet in question that predicts a criterion and not a third, unwanted source of variance. An interesting approach that also uses these ideas has been proposed by Siegling, Petrides, and Martskvishvili (2015) to identify facets that should be deleted from a trait's nomological net. In this multistep approach, extraneous and redundant facets are identified. Some of the ideas outlined above, that is, facets reflecting traits that are not part of the focal trait's nomological net, also appear there. A final note here concerns the selection of the right criteria: in order to show that a facet score "works", criteria need to be selected that reflect the specificities of the facets and not just the common core.
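A minimal sketch of such a hierarchical regression check on simulated data (all effect sizes are invented for illustration): the criterion is regressed on the common-core (trait) score first, and the facet score is then added to see whether it explains criterion variance beyond the common core.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
trait = rng.normal(size=n)              # common-core (trait) score
spec = rng.normal(size=n)               # facet specificity
facet_score = 0.7 * trait + 0.7 * spec  # facet score mixes both sources

# Criterion driven by the common core AND by the facet's specificity:
criterion = 0.4 * trait + 0.3 * spec + rng.normal(scale=0.8, size=n)

def r_squared(y: np.ndarray, predictors: np.ndarray) -> float:
    """R² of an OLS regression of y on the given predictor column(s)."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_trait = r_squared(criterion, trait)                                 # step 1
r2_both = r_squared(criterion, np.column_stack([trait, facet_score]))  # step 2
print(round(r2_trait, 2), round(r2_both, 2))  # facet adds incremental R²
```

Had the criterion been driven by the trait alone (weight on `spec` set to zero), `r2_both` would match `r2_trait` up to sampling error, and the facet score would add nothing beyond the common core.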

Editorial Guidelines

Clearly, the issues discussed above call for one general conclusion: facetted measures need to be built on strong theoretical assumptions and not just on empirically driven hunts for more scores. We already admitted that this is a high bar to clear. Still, we strongly believe that in the end the benefits will outweigh the trouble. Perhaps, in relation to Newton's interpretation of Occam's razor, we could say that "facets are not to be multiplied beyond necessity." Another important implication is the differentiation between the trait represented in the nomological net and the scores derived from the measure. We showed above that the trait score reflects a common core, and facet scores a proportion of this core plus some specific variance. The trait in the nomological net, however, comprises all of the behaviors in question. It is therefore important to pay attention to the correct wording when talking about a trait, a trait score, or a facet score. Finally, we want to suggest some editorial guidelines that papers presenting reliability or validity evidence for a hierarchically organized, multifaceted trait measure should adhere to:

(1) The nomological net of the trait should be defined as outlined above.



(2) Reliability estimates or their interpretation should reflect the existence of more than one source of systematic variance.

(3) The validation strategy should be informed by the nomological net. For construct validation this means that convergent and discriminant test scores should be chosen with the facet as well as the trait level in mind. Validity evidence regarding structure should use confirmatory methods. And, finally, validity evidence regarding test-criterion correlations should use methods that ensure that the specific relation between a facet and a criterion is differentiated from the common core's overlap with the criterion.

References

Ackerman, P. L. (1996). A theory of adult intellectual development: Process, personality, interests, and knowledge. Intelligence, 22, 227–257.
Beauducel, A., & Kersting, M. (2002). Fluid and crystallized intelligence and the Berlin Model of Intelligence Structure (BIS). European Journal of Psychological Assessment, 18, 97–112.
Bejar, I. I. (1983). Achievement testing: Recent advances. Beverly Hills, CA: Sage.
Brunner, M., & Süß, H. M. (2005). Analyzing the reliability of multidimensional measures: An example from intelligence research. Educational and Psychological Measurement, 65, 227–240.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Costa, P. T., & McCrae, R. R. (1995a). Domains and facets: Hierarchical personality assessment using the Revised NEO Personality Inventory. Journal of Personality Assessment, 64, 21–50.
Costa, P. T., & McCrae, R. R. (1995b). Solid ground in the wetlands of personality: A reply to Block. Psychological Bulletin, 117, 216–220.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L. J., & Gleser, G. (1965). The bandwidth-fidelity dilemma. In Psychological tests and personnel decisions (pp. 97–107).
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64, 391–418.
Di Blas, L. (2000). A validation study of the Interpersonal Circumplex Scales in the Italian language. European Journal of Psychological Assessment, 16, 177.
Eysenck, H. J. (1947). The dimensions of human personality. London, UK: Routledge & Kegan Paul.
Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., & Gough, H. G. (2006). The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality, 40, 84–96.
Hofstee, W. K., De Raad, B., & Goldberg, L. R. (1992). Integration of the Big Five and circumplex approaches to trait structure. Journal of Personality and Social Psychology, 63, 146–163.

© 2016 Hogrefe Publishing


European Journal of Psychological Assessment 2016; Vol. 32(2):105–110


Matthias Ziegler
Institut für Psychologie
Humboldt-Universität zu Berlin
Rudower Chaussee 18
12489 Berlin
Germany
Tel. +49 30 2093-9447
Fax +49 30 2093-9361
E-mail zieglema@hu-berlin.de

Martin Bäckström
Lund University
Department of Psychology
Box 117
221 00 Lund
Sweden
E-mail martin.backstrom@psy.lu.se



Alternatives to traditional self-reports in psychological assessment

“A unique and timely guide to better psychological assessment.”
Rainer K. Silbereisen, Research Professor, Friedrich Schiller University Jena, Germany; Past-President, International Union of Psychological Science

Tuulia Ortner / Fons J. R. van de Vijver (Editors)

Behavior-Based Assessment in Psychology: Going Beyond Self-Report in the Personality, Affective, Motivation, and Social Domains (Series: Psychological Assessment – Science and Practice – Vol. 1). 2015, vi + 234 pp. US $63.00 / € 44.95. ISBN 978-0-88937-437-9. Also available as eBook.

Traditional self-reports can be an insufficient source of information about personality, attitudes, affect, and motivation. What are the alternatives? This first volume in the authoritative series Psychological Assessment – Science and Practice discusses the most influential, state-of-the-art forms of assessment that can take us beyond self-report. Leading scholars from various countries describe the theoretical background and psychometric properties of alternatives to self-report, including behavior-based assessment, observational methods, innovative computerized procedures, indirect assessments, projective techniques, and narrative reports. They also look at the validity and practical application of such forms of assessment in domains as diverse as health, forensic, clinical, and consumer psychology.

www.hogrefe.com


Hogrefe OpenMind
Open Access Publishing? It’s Your Choice!

Your Road to Open Access
Authors of papers accepted for publication in any Hogrefe journal can now choose to have their paper published as an open access article as part of the Hogrefe OpenMind program. This means that anyone, anywhere in the world will – without charge – be able to read, search, link, send, and use the article for noncommercial purposes, in accordance with the internationally recognized Creative Commons licensing standards.

The Choice Is Yours
1. Open Access Publication: The final “version of record” of the article is published online with full open access. It is freely available online to anyone in electronic form. (It will also be published in the print version of the journal.)
2. Traditional Publishing Model: Your article is published in the traditional manner, available worldwide to journal subscribers online and in print and to anyone by “pay per view.”
Whichever you choose, your article will be peer-reviewed, professionally produced, and published both in print and in electronic versions of the journal. Every article will be given a DOI and registered with CrossRef.

www.hogrefe.com

How Does Hogrefe’s Open Access Program Work?
After submission to the journal, your article will undergo exactly the same steps, no matter which publishing option you choose: peer review, copy-editing, typesetting, data preparation, online reference linking, printing, hosting, and archiving. In the traditional publishing model, the publication process (including all the services that ensure the scientific and formal quality of your paper) is financed via subscriptions to the journal. Open access publication, by contrast, is financed by means of a one-time article fee (€ 2,500 or US $3,000) payable by you the author, or by your research institute or funding body. Once the article has been accepted for publication, it’s your choice – open access publication or the traditional model. We have an open mind!


Original Article

The Beck Hopelessness Scale: Specific Factors or Method Effects?

Marianna Szabó,¹ Veronika Mészáros,² Judit Sallay,² Gyöngyi Ajtay,² Viktor Boross,² Ágnes Udvardy-Mészáros,² Gabriella Vizin,² and Dóra Perczel-Forintos²

¹School of Psychology, The University of Sydney, NSW, Australia
²Department of Clinical Psychology, Semmelweis University, Budapest, Hungary

European Journal of Psychological Assessment, 2016, Vol. 32(2), 111–118. DOI: 10.1027/1015-5759/a000240. © 2015 Hogrefe Publishing

Abstract. The aim of the present study was to examine the construct and cross-cultural validity of the Beck Hopelessness Scale (BHS; Beck, Weissman, Lester, & Trexler, 1974). Beck et al. applied exploratory Principal Components Analysis and argued that the scale measured three specific components (affective, motivational, and cognitive). Subsequent studies identified one, two, three, or more factors, highlighting a lack of clarity regarding the scale’s construct validity. In a large clinical sample, we tested the original three-factor model and explored alternative models using both confirmatory and exploratory factor analytical techniques appropriate for analyzing binary data. In doing so, we investigated whether method variance needs to be taken into account in understanding the structure of the BHS. Our findings supported a bifactor model that explicitly included method effects. We concluded that the BHS measures a single underlying construct of hopelessness, and that an incorporation of method effects consolidates previous findings where positively and negatively worded items loaded on separate factors. Our study further contributes to establishing the cross-cultural validity of this instrument by showing that BHS scores differentiate between depressed, anxious, and nonclinical groups in a Hungarian population.

Keywords: Beck Hopelessness Scale, method effect, validity, factor analysis

The concept of hopelessness was first introduced by Aaron Beck (1963), who observed that depressed individuals share a set of negative expectations concerning the self and the future. Subsequent research has shown that hopelessness is a powerful predictor of suicidal behaviors and is associated with depression and a range of other clinical conditions (Beck, Brown, Berchick, Stewart, & Steer, 1990; Beck, Riskind, Brown, & Steer, 1988). It is therefore crucial to assess and understand hopelessness in the context of various forms of psychopathology. The most widely used measure of hopelessness is the Beck Hopelessness Scale (BHS; Beck, Weissman, Lester, & Trexler, 1974), a 20-item self-report instrument containing both negative (e.g., I don’t expect to get what I really want) and positive (e.g., I look forward to the future with hope and enthusiasm) statements about the future. An initial Principal Components Analysis (PCA) by the authors suggested the presence of three independent underlying dimensions. “Feelings about the Future” was thought to indicate affective associations, such as a lack of enthusiasm or faith, “Loss of Motivation” was concerned with giving up trying, and “Future Expectations” was thought to reflect a cognitive component of hopelessness, concerning beliefs about a dark and uncertain future. Since its first publication, the BHS has become a widely used instrument in research and clinical practice in English-speaking cultures, and the validity of its translated versions is beginning to be established in non-English-speaking cultures as well (Dozois & Covin, 2004). In countries with

extremely high suicide rates, such as Hungary, the introduction and valid use of the BHS has crucial clinical implications (Perczel-Forintos, Sallai, & Rózsa, 2010). In general, testing the construct validity of this instrument and its three proposed underlying dimensions is both theoretically and clinically important. If hopelessness in fact contains three components, these might differentially predict depression, suicidal behavior, or other forms of psychopathology. For example, it has been shown that the three components have differential associations with the number of physical symptoms and with a desire for hastened death among patients with AIDS (Rosenfeld, Gibson, Kramer, & Breitbart, 2004). Nevertheless, studies that examined the factor structure of the BHS yielded inconsistent results. While some researchers (Davidson, Tripp, Fabrigar, & Davidson, 2008; Dyce, 1996; Hill, Gallagher, Thompson, & Ishida, 1988; Iliceto & Fino, 2014; Rosenfeld et al., 2004; Steer, Iguchi, & Platt, 1994) supported a three-factor model resembling that offered by Beck et al. (1974), others suggested alternative solutions containing between one and five factors (e.g., Aish & Wasserman, 2001; Nissim et al., 2009; Pompili, Tatarelli, Rogers, & Lester, 2007; Steed, 2001; Steer, Beck, & Brown, 1997; Tanaka et al., 1998; Young, Halper, Clark, Scheftner, & Fawcett, 1992). In addition, several studies found that the 20 items do not converge in an interpretable structure, and recommended the use of either 18 (Tanaka et al., 1998), 16 (Steed, 2001), 15 (Davidson et al., 2008), 14 (Steer et al., 1994), or 4 (Aish



& Wasserman, 2001) items, with a large variability in the placement of these items on the suggested factors. A possible reason for the heterogeneity of the findings may be that some studies tested the structure of the BHS in psychologically healthy volunteers, while others used clinical samples. It has been shown that the BHS has relatively low reliability in nonclinical samples (see Dozois & Covin, 2004), and some researchers argued that studies involving such samples have limited utility in helping us understand hopelessness in individuals who exhibit various forms of psychopathology (Pompili et al., 2007; Rosenfeld et al., 2004). Nevertheless, the results of studies involving psychiatric patients also varied. Researchers who supported a three-factor solution disagreed in the placement of some of the 20 items on the factors (Dyce, 1996; Hill et al., 1988), or included fewer than 20 items in their solutions (Steer et al., 1994). Others concluded that a two-factor (Kao et al., 2012; Steer et al., 1997) or a one-factor (Aish & Wasserman, 2001; Young et al., 1992) solution fit the data best. Therefore, further research involving psychiatric samples is needed to confirm the structure of the hopelessness construct. Another reason for the reported inconsistencies may be that, even among studies that included comparable samples, a variety of analytical techniques has been employed. Importantly, items on the BHS have a binary response format that is not appropriate for traditional factor analysis. Although some researchers sought to overcome this problem by converting the item format into Likert scales (Steed, 2001; Iliceto & Fino, 2014), this strategy alters the meaning of participants’ responses and reduces comparability with other studies.
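The dichotomization problem can be illustrated with a short simulation (a sketch, not part of the original study): splitting two correlated normal "item propensities" at their medians shrinks the observed Pearson (phi) correlation well below the latent correlation that tetrachoric-based estimators attempt to recover.

```python
# Sketch: Pearson correlations of true/false items understate the
# latent correlation. The latent correlation and thresholds here are
# illustrative, not taken from the BHS.
import math
import random

random.seed(42)
rho = 0.6        # latent correlation between the two item propensities
n = 50_000

xs, ys = [], []
for _ in range(n):
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    xs.append(1 if z1 > 0 else 0)  # dichotomize at the median
    ys.append(1 if z2 > 0 else 0)

def pearson(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / m
    va = sum((x - ma) ** 2 for x in a) / m
    vb = sum((y - mb) ** 2 for y in b) / m
    return cov / math.sqrt(va * vb)

phi = pearson(xs, ys)
theoretical = (2 / math.pi) * math.asin(rho)  # phi implied at zero thresholds
print(phi, theoretical)  # both near 0.41, clearly below the latent 0.6
```

Estimators designed for categorical data work backward from the observed binary associations to the latent correlations, which is why they are preferred for true/false items.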
Alternatively, advanced factor analytic methods that can handle categorical data may be used, such as mean- and variance-adjusted weighted least squares (WLSMV) parameter estimation, with the weighted root mean square residual (WRMR) as an index of model fit (Muthén & Muthén, 1998–2010; Yu, 2002). So far, only one clinically relevant study has utilized such methods: Aish and Wasserman (2001) tested several models in a sample of Swedish suicide attempters and concluded that a four-item, one-factor solution had excellent fit. While this solution may be statistically defensible, a four-item scale may have limited utility beyond providing a quick screen for potential suicidality. Among the large variety of solutions offered so far, two consistent findings emerged. In studies proposing more than one factor, the first factor tended to account for the largest amount of variance, prompting some researchers to question whether the results would in fact be best interpreted as supporting a one-factor solution (e.g., Dyce, 1996), and some reviewers to conclude that the BHS is arguably a unidimensional measure of hopelessness (Dozois & Covin, 2004). However, it has also been observed that in studies reporting more than one factor, the majority of positively worded items tended to load on one factor, while the second and third factors tended to contain most of the negatively worded items (Beck et al., 1974; Dyce, 1996; Hill et al., 1988). While it can be argued that the factors represent “hopefulness” and “hopelessness,” respectively (Steed, 2001), it is also possible that this pattern indicates a

methodological artifact. The BHS may in fact reflect a unitary construct of hopelessness, as well as method effects resulting from item wording. Although this possibility may help to reconcile the reported pattern of item loadings in previous studies, no research has yet tested a model that explicitly includes method effects. The aim of the present study, therefore, was to examine the construct validity of the BHS in a large Hungarian clinical sample. We first tested the original three-factor model using Confirmatory Factor Analysis (CFA). We followed this analysis with an exploratory approach to allow any alternative item loadings to emerge empirically. Further, we sought to explain previous observations concerning the separation of positively and negatively worded items. To that end, we used CFA to test a two-factor model involving the distinct factors of “hopelessness” versus “hopefulness,” and a bifactor model involving a unitary “hopelessness” factor and two method factors indicating positive versus negative item wording. We compared these models with a one-factor model involving a unitary construct of hopelessness that did not involve method effects. Following the factor analyses, we sought to document the convergent and discriminant validity of the Hungarian BHS by assessing its relationship with depression and anxiety, as well as its ability to differentiate among various diagnostic groups.
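The competing measurement structures described above can be summarized as loading patterns. The sketch below is illustrative only — the factor labels and the helper function are ours, but the split into positively and negatively worded items follows the article's tables:

```python
# Sketch: which factor(s) each BHS item is allowed to load on under the
# one-factor, two-factor, and bifactor (content + method) models.

POSITIVE = {1, 3, 5, 6, 8, 10, 13, 15, 19}  # hopefully worded items
ITEMS = range(1, 21)

def loading_pattern(model):
    """Map each item number to the factors it loads on under `model`."""
    pattern = {}
    for item in ITEMS:
        positive = item in POSITIVE
        if model == "one-factor":
            pattern[item] = ["hopelessness"]
        elif model == "two-factor":
            pattern[item] = ["hopefulness" if positive else "hopelessness"]
        elif model == "bifactor":
            # every item loads on the content factor plus exactly one
            # wording-specific method factor
            pattern[item] = ["hopelessness",
                             "method_positive" if positive else "method_negative"]
    return pattern

bifactor = loading_pattern("bifactor")
print(bifactor[1])   # ['hopelessness', 'method_positive']
print(bifactor[2])   # ['hopelessness', 'method_negative']
```

The bifactor pattern is what lets content variance (hopelessness) be estimated separately from wording-specific variance.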

Method

Participants

The factor analyses involved data from 905 clinic-referred individuals (72.33% women, M age = 35.97, SD = 13.52). A separate sample of 100 psychologically healthy participants (65% women, M age = 30.93, SD = 8.87 years) served as a comparison group to test the discriminant validity of the BHS; data from these participants were not included in the factor analyses. The clinic-referred sample (N = 905) was recruited from all successive intakes at two outpatient clinical psychology services during 2002–2011. Primary diagnoses were established according to ICD-10 (WHO, 2013). Of the total group, 362 participants were diagnosed with Mood Disorders. A further 440 participants were diagnosed with Neurotic, Stress-related, and Somatoform disorders, for example, phobic or other anxiety disorders (n = 156), mixed-anxiety depression (n = 103), or adjustment disorders (n = 113). Additionally, 103 individuals received diagnoses for a variety of other clinical problems, including personality disorders (n = 43), eating disorders (n = 17), or schizophrenia (n = 10).

Materials

The Beck Hopelessness Scale (Beck et al., 1974) consists of 20 items. Respondents use a true/false format to indicate whether they agree with each statement, referring to the



previous week. The BHS was translated into Hungarian by the last author. This was followed by back-translation by a native English speaker who also speaks fluent Hungarian. Differences between the two translations were discussed and corrected. Cronbach’s alpha for the Hungarian version was .91 (Perczel-Forintos et al., 2010). The Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) is a 21-item self-report inventory for measuring the severity of depression. Participants rate each item on a 4-point scale (0 = not true at all, 3 = very much true). Reliability and validity indices for the BDI have been reported by Beck et al. (1988). We used the Hungarian version of the BDI (Perczel-Forintos, Ajtay, Barna, Kiss, & Komlósi, 2012); its internal consistency in the current sample was .92. The Beck Anxiety Inventory (BAI; Beck, Epstein, Brown, & Steer, 1988) consists of 21 items designed to measure symptoms of anxiety. Participants indicate the extent to which they experienced these symptoms during the past week, using a 4-point response format (0 = not at all, 3 = severely). We used the Hungarian version of the BAI (Perczel-Forintos et al., 2012). Internal consistency in the current sample was .71.
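As a side note on the internal-consistency figures reported above, Cronbach's alpha is simple to compute directly. The sketch below uses made-up binary responses, not the study's data:

```python
# Sketch: Cronbach's alpha = k/(k-1) * (1 - sum of item variances /
# variance of the total score). The response matrix is invented.

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

responses = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
print(round(cronbach_alpha(responses), 2))  # 0.79 for this toy data
```

Note that alpha treats all systematic variance as one source, which is precisely why the article turns to bifactor modeling for the BHS itself.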

Procedure

The study protocol was approved by the Institutional Review Board. Participants completed the questionnaires while waiting for their first interview with a clinical psychologist, as a routine part of initial assessment at the participating clinics. Diagnostic information was obtained at the intake interview by intern clinical psychologists under the supervision of the last author. Diagnoses were established according to ICD-10 (WHO, 2013).

Results

Factor Analyses

First, we used CFA to evaluate the original three-factor structure of the BHS in the total clinical sample. Items were


set to load on the Affective, Cognitive, and Motivational factors as originally specified by Beck et al. (1974). We followed this analysis by testing a less restricted, exploratory three-factor model where all items were allowed to freely load on any factor. This analysis was expected to reveal whether the 20 items would converge in a three-factor structure that would approximate the original model proposed by Beck et al. Next, we used CFA to test further hypothetical solutions. For this series of analyses, the sample was randomly split into two subgroups; Subgroup 2 was used to cross-validate the results obtained in Subgroup 1. A model including two distinct content-related factors denoting “hopelessness” versus “hopefulness” was tested first, followed by a one-factor solution reflecting a single underlying hopelessness construct. Finally, we tested a bifactor model where one content-related factor (hopelessness) and two method-related factors (positive versus negative wording) were specified. For each factor analysis, we used Mplus 7.2 and employed WLSMV parameter estimates (Muthén & Muthén, 1998–2010). Goodness of fit was evaluated using chi-square, RMSEA, CFI, TLI, and the WRMR, an index recommended as an important addition to model fit assessment when binary variables are used (Yu, 2002). Acceptable model fit was defined by RMSEA < .05, CFI > .96, TLI > .95, and WRMR < 1.00 (Yu, 2002).
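The adopted cut-offs can be written as a small checking routine (a sketch; the two sets of example values are the bifactor and one-factor 18-item rows as read from Table 1):

```python
# Sketch: evaluate fit indices against the Yu (2002) cut-offs used here.

CUTOFFS = {
    "RMSEA": lambda v: v < .05,
    "CFI":   lambda v: v > .96,
    "TLI":   lambda v: v > .95,
    "WRMR":  lambda v: v < 1.00,
}

def acceptable_fit(indices):
    """Return, per index, whether the recommended cut-off is met."""
    return {name: rule(indices[name]) for name, rule in CUTOFFS.items()}

bifactor_18  = {"RMSEA": .036, "CFI": .989, "TLI": .986, "WRMR": 0.810}
onefactor_18 = {"RMSEA": .063, "CFI": .954, "TLI": .948, "WRMR": 1.295}

print(acceptable_fit(bifactor_18))   # every index meets its cut-off
print(acceptable_fit(onefactor_18))  # every index misses its cut-off
```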

Testing the Original Three-Factor Model

We first tested a three-factor model with item placements as specified by Beck et al. (1974) in the total clinical sample (N = 905). The model fit indices presented in the first row of Table 1 show that although model fit was adequate according to some of the indices, the RMSEA and the WRMR indicated inadequate fit. Standardized factor correlations were 0.848 between the Motivational and Affective factors, 0.977 between the Motivational and Cognitive factors, and 0.925 between the Cognitive and Affective factors, indicating poor differentiation. All items loaded significantly on their allocated factors, but item 10 had a very low loading of 0.156. Modification Indices suggested 12 instances of potential misspecification of item placements.

Table 1. Model fit indices for all CFA models tested in the clinical sample

Model                     χ²        RMSEA   p(RMSEA ≤ .05)   CFI    TLI    WRMR
Total sample (N = 905)
  Three-factor 20-item    652.62    .057    .008             .965   .960   1.498
Subsample 1 (N = 442)
  One-factor 20-item      N/A       –       –                –      –      –
  Two-factor 20-item      367.914   .052    .347             .971   .967   1.125
  Bifactor 20-item        319.337   .051    .443             .975   .969   0.987
Subsample 2 (N = 463)
  One-factor 18-item      382.256   .063    .002             .954   .948   1.295
  Two-factor 18-item      259.533   .045    .845             .981   .978   1.041
  Bifactor 18-item        186.912   .036    .995             .989   .986   0.810

Note. One-factor 18-item and two-factor 18-item = BHS items 10 and 13 excluded. N/A = the one-factor 20-item model did not converge in Subsample 1.

European Journal of Psychological Assessment 2016; Vol. 32(2):111–118



Table 2. Oblimin rotated loadings and factor correlations obtained from EFA of a three-factor model in the total clinical sample (N = 905)

Item (abbreviated content)                      Loading   Original factor
Factor 1
  16. Never get what I want                     0.912     Motivational
  14. Things won’t work out                     0.872     Cognitive
  20. No use in really trying                   0.871     Motivational
  9. Just don’t get the breaks                  0.777     Motivational
  17. Very unlikely to get real satisfaction    0.766     Motivational
  2. Might as well give up                      0.737     Motivational
  11. Ahead of me is unpleasantness             0.663     Motivational
  12. Don’t expect to get what I really want    0.616     Motivational
  7. Future seems dark                          0.562     Cognitive
Factor 2
  15. Have great faith in the future            0.822     Affective
  1. Look forward to the future with hope       0.735     Affective
  10. Experiences prepared well for future      0.653     Cognitive
  6. Expect to succeed                          0.603     Affective
  8. Expect to get more good things             0.483     Cognitive
  19. Look forward to more good times           0.473     Affective
Factor 3
  13. Expect to be happier than now             0.572     Affective
  18. Future seems vague and uncertain          0.493     Cognitive
  3. Helped knowing can’t stay that way         0.412     Motivational
  5. Have enough time to accomplish things      0.271     Affective
  4. Can’t imagine life in 10 years             —         Cognitive

Factor correlations: Factor 1–Factor 2 = .660, Factor 1–Factor 3 = .031, Factor 2–Factor 3 = .094.

Note. Loadings < 0.20 are omitted for clarity; loadings > .40 were set in bold in the original. Item 10 also loaded 0.353 on Factor 1. Smaller secondary loadings (between .20 and .47) appeared in the original table, but their exact row placement was lost in text extraction and is not reproduced here.

Noting the large number of misspecifications indicated, we proceeded to explore a model where a correlated three-factor structure was specified but items were allowed to freely load on any factor. This less restricted three-factor model fit the data well (χ² = 304.183, df = 133, p < .0001; RMSEA = .038, CFI = .988, TLI = .982), but the pattern of item loadings differed substantially from the original pattern. As Table 2 shows, the first factor was defined primarily by the original “Motivational” items, while the second factor included a mixture of “Affective” and “Cognitive” items, with item 10 double-loading on the first and second factors. The third factor contained only three items, and was primarily defined by item 13.

Alternative Models

For the next set of analyses, the clinical sample was split into two random groups. Initial CFAs were conducted in subsample 1 (n = 442), followed by cross-validating the obtained models in subsample 2 (n = 463). There were no significant differences between the two subsamples regarding background demographic variables.

Subsample 1

We first tested a two-factor model where all positively worded items were set to load on Factor 1 (“Hopefulness”)

and all negatively worded items on Factor 2 (“Hopelessness”). Fit indices presented in Table 1 show that although the TLI and CFI suggested adequate fit, the RMSEA and WRMR did not meet recommended thresholds. Item loadings for the two-factor model are shown in Table 3. Although all items loaded significantly on their allocated factors, item 10 had a very low loading. The only Modification Index above 10 for this model suggested that item 10 load on both factors. Items 13 and 5 also had relatively low loadings, although item 5 approached the acceptable threshold. Finally, a high factor correlation indicated little differentiation between the two factors. Nevertheless, a one-factor model where all items were set to load on a single “hopelessness” factor failed to converge in this subsample. Next, we tested a bifactor model where all items were set to load on a single content-related factor (“hopelessness”), all positively worded items were set to load on Method Factor 1, and all negatively worded items on Method Factor 2. The fit indices for this model are also presented in Table 1, and item loadings are given in Table 4. Most items loaded highly on the content factor, and less strongly on the method factors. Consistent with the other models, item 10 failed to load significantly on the content factor, and item 13 exhibited the second-lowest loading. The only Modification Index over 10 suggested that item 13 also load on the second method factor (tapping into both positive and negative method-related effects).



Table 3. Standardized factor loadings from CFA testing a two-factor model in subgroup 1 (n = 442)

Item (abbreviated content)                      Factor 1   Factor 2
  6. Expect to succeed                          0.889
  1. Look forward to the future with hope       0.828
  19. Look forward to more good times           0.803
  15. Great faith in the future                 0.798
  8. Expect to get more good things             0.698
  3. Helped knowing can’t stay that way         0.524
  5. Enough time to accomplish things           0.492
  13. Expect to be happier than now             0.427
  10. Experiences prepared well for future      0.218
  7. Future seems dark                                     0.903
  20. No use in really trying                              0.893
  11. Ahead of me is unpleasantness                        0.845
  12. Don’t expect to get what I really want               0.815
  16. Never get what I want                                0.800
  9. Just don’t get the breaks                             0.798
  18. Future seems vague and uncertain                     0.790
  2. Might as well give up                                 0.777
  14. Things won’t work out                                0.761
  17. Very unlikely to get real satisfaction               0.706
  4. Can’t imagine life in 10 years                        0.592

Factor correlation: Factor 1–Factor 2 = .861.

Note. Relatively low factor loadings (items 10, 13, and 5) were set in italics in the original.

Table 4. Standardized factor loadings from CFA of an orthogonal bifactor model specifying one content-related factor and two method-related factors in subgroup 1 (n = 442)

Original item number and abbreviated content     Factor 1    Factor 2   Factor 3
                                                 (content)   (method)   (method)
 6. Expect to succeed                             0.826       0.214
19. Look forward to more good times               0.744       0.229
 1. Look forward to the future with hope          0.739       0.397
15. Great faith in the future                     0.692       0.541
 8. Expect to get more good things                0.635       0.276
 3. Helped knowing can't stay that way            0.468       0.262
 5. Enough time to accomplish things              0.438       0.247
13. Expect to be happier than now                 0.392       0.128
10. Experiences prepared well for future          0.131       0.458
 7. Future seems dark                             0.908                  0.071
11. Ahead of me is unpleasantness                 0.817                  0.212
18. Future seems vague and uncertain              0.809                  0.029
20. No use in really trying                       0.799                  0.455
12. Don't expect to get what I really want        0.788                  0.206
 9. Just don't get the breaks                     0.781                  0.160
 2. Might as well give up                         0.719                  0.330
14. Things won't work out                         0.680                  0.420
16. Never get what I want                         0.660                  0.616
17. Very unlikely to get real satisfaction        0.659                  0.280
 4. Can't imagine life in 10 years                0.598                  0.021

Note. Items 10 and 13 had relatively low loadings on the content factor (shown in italics in the original table).

Cross-Validating: Subsample 2

Each of the models tested so far indicated that item 10 did not contribute substantially to the definition of the underlying construct, and item 13 also had low or inconsistent loadings. Therefore, we cross-validated these models in subsample 2 (n = 463), omitting items 10 and 13. A series of CFAs tested the model fit of the one-factor, two-factor, and bifactor solutions described above with the remaining 18 items. As shown in Table 1, although the fit indices improved for each model, only the 18-item bifactor model had excellent fit according to all indices.

Table 5. Descriptive statistics for the 18-item BHS in a nonclinical sample and in separate diagnostic groups

Group                           n     Mean     SD      α
Nonclinical                    100    2.03    2.11   .678
Phobic and anxiety disorders   148    5.82    4.34   .869
Mixed-anxiety depression        93    8.85    4.47   .856
Mood disorders                 340   10.41    4.92   .884

European Journal of Psychological Assessment 2016; Vol. 32(2):111–118

Reliability and Validity of the 18-Item Hungarian BHS

We proceeded to calculate total scores for the 18-item version of the BHS to explore its reliability and validity. The mean BHS score was 8.13 (SD = 5.13, N = 844), and the internal consistency estimate (Cronbach's alpha) was .898 in the total clinical sample. For comparison with the traditional Cronbach's alpha value, we calculated model-based reliability indices (Brunner, Nagy, & Wilhelm, 2012). Coefficient omega, indicating the proportion of variance attributable to a blend of the global hopelessness factor and the specific method factors, was .789. Coefficient omega hierarchical, indicating the proportion of variance in hopelessness scores attributable to the content factor only, was .620. In the clinical sample as a whole, the 18-item BHS scores had a correlation of 0.69 with the BDI (p < .001, N = 766) and 0.36 with the BAI (p < .001, N = 841). Finally, we compared mean BHS scores among groups of individuals diagnosed with mood disorders, mixed-anxiety depression, phobic and other anxiety disorders, and psychiatrically healthy individuals. Descriptive statistics are shown in Table 5. The significant overall ANOVA, F(3, 677) = 107.25, p < .001, and linear trend, F(1, 677) = 303.59, p < .001, indicated that BHS scores increased gradually across these groups, with the nonclinical group obtaining the lowest and the depressed group the highest scores. Internal consistency estimates were similar among the clinical groups but appeared relatively lower in the nonclinical group.
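Coefficient omega and omega hierarchical can be computed directly from standardized bifactor loadings (Brunner, Nagy, & Wilhelm, 2012). A minimal sketch with hypothetical loadings (not the article's estimates), assuming a six-item scale with orthogonal general and wording-method factors:

```python
import numpy as np

# Hypothetical standardized loadings: g = general (content) factor,
# m1/m2 = positive- and negative-wording method factors.
g  = np.array([0.80, 0.70, 0.75, 0.60, 0.65, 0.70])
m1 = np.array([0.30, 0.40, 0.20, 0.00, 0.00, 0.00])
m2 = np.array([0.00, 0.00, 0.00, 0.25, 0.35, 0.30])

# Residual (unique) variances under standardized orthogonal factors.
theta = 1 - g**2 - m1**2 - m2**2

common = g.sum()**2 + m1.sum()**2 + m2.sum()**2
omega_total = common / (common + theta.sum())        # content + method variance
omega_h = g.sum()**2 / (common + theta.sum())        # content variance only

print(round(omega_total, 3), round(omega_h, 3))      # omega_h < omega_total
```

Because omega hierarchical counts only general-factor variance in the numerator, it is necessarily smaller than omega total, mirroring the article's .620 versus .789.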

Discussion

The present study explored the factor structure and convergent and discriminant validity of the Beck Hopelessness Scale (Beck et al., 1974) in a large Hungarian clinical sample. We used CFA to test the original three-factor model hypothesized to reflect "Feelings about the Future," "Loss of Motivation," and "Future Expectations," followed by a less restricted three-factor model in which all items were allowed to load freely on any factor. The CFA indicated inadequate model fit and a large number of misspecifications. The exploratory model failed to replicate the original pattern of item loadings. Only a few items loaded on their expected factors, and the third factor was defined by a very small number of items. The majority of negatively worded items loaded on the first factor, and the majority of positively worded items loaded on the second. We proceeded to test further hypotheses using CFA in two randomly established subsamples of clinic-referred individuals. We tested a two-factor, a one-factor, and a bifactor model in the first subsample. Each of the models tested in this series of analyses suggested that items 10 and 13 contributed little to the definition of the underlying construct. Therefore, we proceeded to cross-validate these models in the second subsample with only the remaining 18 items. These analyses indicated that the 18-item bifactor model provided the best fit for the data. We concluded that the BHS reflects a unitary construct of hopelessness, with method effects resulting from item wording. Our findings concerning the factor structure of the BHS have significant theoretical and practical implications. The importance of modeling method effects to improve our understanding of various constructs has been increasingly emphasized in the literature. Our findings are consistent with previous studies in which such effects have been found to underlie the structure of other well-known instruments, for example, the Rosenberg Self-Esteem Scale (Tomas & Oliver, 1999).
Incorporating method effects into the structure of the BHS consolidates previous findings in which either one-factor solutions were reported (Young et al., 1992) or positively and negatively worded items loaded on separate factors (e.g., Hill et al., 1988; Rosenfeld et al., 2004). In addition to providing an explanation for previous results and a clearer delineation of hopelessness as a unitary construct, modeling both content- and method-related factors allowed a more precise assessment of the reliability of the BHS in this study. Our results show that Cronbach's alpha overestimates the reliability of the BHS, and provide a more precise assessment of the amount of variance accounted for by the underlying construct itself. The results also add to a growing body of evidence questioning the three-factor structure of the BHS proposed by Beck et al. (1974). Considering the limited support for this structure, researchers and clinicians need to exercise caution when using subscale scores derived from the three putative components. Although differential associations of the components with other measures, for example, with a desire for hastened death, have previously been reported, the majority of correlations obtained in that study were in fact similar for all three components (Rosenfeld et al., 2004). A more recent study also reported that the three components were associated with depression, aggression, and various personality traits to similar degrees (Iliceto & Fino, 2014). Therefore, a single score assessing the unitary construct of hopelessness is more advisable in both research and clinical practice. In the present study, total scores derived from the 18-item Hungarian-language BHS had a stronger relationship with depression than with anxiety, and were able to discriminate between individuals diagnosed with depression, mixed-anxiety depression, phobic and other anxiety disorders, and those with no diagnosed disorders. These results are consistent with previous findings (Beck, Riskind, et al., 1988), and support the validity and potential utility of the BHS as a measure of a unitary hopelessness construct in a Hungarian clinical population. Similar to previous studies (e.g., Aish & Wasserman, 2001; Steer et al., 1994), our results also indicated that not all 20 items of the BHS contribute substantially to the definition of hopelessness. Each model we tested indicated that items 10 and 13 were weakly or inconsistently associated with the underlying construct. Item 10 ("My experiences prepared me well for future") had the lowest loading in the original PCA reported by Beck et al. (1974) and has been identified as a weak item by others (e.g., Steer et al., 1994). It appears that this statement may be understood as having either an optimistic or a pessimistic tone, and may therefore be interpreted inconsistently by respondents. Indeed, in our clinical experience, respondents often ask for clarification of this item when completing the BHS. The second item we selected for deletion, item 13 ("When I think about the future, I expect to be happier than I am now"), has not previously been identified as a weak item. Therefore, it is possible that this item was affected by the translation process. For example, the Hungarian word used to translate the English expression "I expect . . . (to be happier)" can also be understood as "I hope . . . (to be happier)," lending a less clearly positive meaning to this item. Considering the nature and aims of our study, it cannot be ruled out that other language or cultural differences affected our results.
While the results make both theoretical and statistical sense and are largely consistent with previous data, future studies in other cultures are needed to establish whether the BHS indeed reflects a unitary construct of hopelessness with method effects, rather than two dimensions of "hopelessness" versus "hopefulness," or the three commonly used dimensions initially offered by Beck et al. (1974). Replication of the inclusion and exclusion of particular items is also necessary in varied cultural and language groups, and in both clinic-referred and psychiatrically healthy individuals. For example, because hopelessness is associated with a wide range of psychopathologies, we included a heterogeneous psychiatric sample to study this phenomenon. Further studies may now explore the same question in more homogeneous groups, in particular in depressed and suicidal individuals. Although total BHS scores differentiated between various diagnostic groups and a nonclinical sample in our study, it also needs to be acknowledged that internal consistency appeared to be substantially lower in the nonclinical sample compared to the clinic-referred groups. This finding supports the contention that the construct or measurement of hopelessness needs to be considered separately in clinical versus nonclinical samples, and calls for future research to examine possible differences in the experience of hopelessness in psychiatrically healthy and in clinic-referred individuals (Dozois & Covin, 2004; Rosenfeld et al., 2004). Finally, because it was not possible to establish a standardized environment for data collection, uncontrollable environmental factors in the data collection process may have influenced our results and further underline the need for replication. In the meantime, our study is the first to use modern factor analytic methods in a large clinical sample to reconcile previously inconsistent results concerning the BHS. Our data have shown that hopelessness is a unitary construct, and that method effects associated with positive and negative item wording need to be acknowledged when using this instrument in research and clinical practice.

References

Aish, A. M., & Wasserman, D. (2001). Does Beck's Hopelessness Scale really measure several components? Psychological Medicine, 31, 367–372.
Beck, A. T. (1963). Thinking and depression. Archives of General Psychiatry, 9, 326–333.
Beck, A. T., Brown, G., Berchick, R. J., Stewart, B. L., & Steer, R. A. (1990). Relationship between hopelessness and ultimate suicide: A replication with psychiatric outpatients. American Journal of Psychiatry, 147, 190–195.
Beck, A. T., Epstein, N., Brown, G., & Steer, R. A. (1988). An inventory for measuring clinical anxiety: Psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893–897.
Beck, A. T., Riskind, J. H., Brown, G., & Steer, R. A. (1988). Levels of hopelessness in DSM-III disorders: A partial test of content specificity in depression. Cognitive Therapy and Research, 12, 459–469.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561–571.
Beck, A. T., Weissman, A., Lester, D., & Trexler, L. (1974). The measurement of pessimism: The hopelessness scale. Journal of Consulting and Clinical Psychology, 42, 861–865.
Brunner, M., Nagy, G., & Wilhelm, O. (2012). A tutorial on hierarchically structured constructs. Journal of Personality, 80, 796–846.
Davidson, M. A., Tripp, D. A., Fabrigar, L. R., & Davidson, P. R. (2008). Chronic pain assessment: A seven-factor model. Pain Research & Management, 13, 299–308.
Dozois, D., & Covin, R. (2004). The Beck Depression Inventory-II (BDI-II), the Beck Hopelessness Scale (BHS), and the Beck Scale for Suicidal Ideation (BSS). In M. Hersen (Ed.), Comprehensive handbook of psychological assessment (Vol. 2, pp. 50–85). Hoboken, NJ: Wiley.
Dyce, J. A. (1996). Factor structure of the Beck Hopelessness Scale. Journal of Clinical Psychology, 52, 555–558.
Hill, R. D., Gallagher, D., Thompson, L. W., & Ishida, T. (1988). Hopelessness as a measure of suicidal intent in the depressed elderly. Psychology and Aging, 3, 230–232.
Iliceto, P., & Fino, E. (2014). Beck Hopelessness Scale (BHS): A second-order confirmatory factor analysis. European Journal of Psychological Assessment. Advance online publication. http://dx.doi.org/10.1027/1015-5759/a000201
Kao, Y. C., Liu, Y. P., & Lu, C. W. (2012). Beck Hopelessness Scale: Exploring its dimensionality in patients with schizophrenia. Psychiatric Quarterly, 83, 241–255.
Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.



Nissim, R., Flora, D. B., Cribbie, R. A., Zimmerman, C., Gagliese, L., & Rodin, G. (2009). Factor structure of the Beck Hopelessness Scale in individuals with advanced cancer. Psycho-Oncology, 19, 255–263.
Perczel-Forintos, D., Ajtay, Gy., Barna, Cs., Kiss, Zs., & Komlósi, S. (2012). Kérdőívek, becslőskálák a klinikai pszichológiában [Questionnaires and assessment methods in clinical psychology]. Budapest: Semmelweis Kiadó.
Perczel-Forintos, D., Sallai, J., & Rózsa, S. (2010). Adaptation of the Beck Hopelessness Scale in Hungary. Psychological Topics, 19, 307–321.
Pompili, M., Tatarelli, R., Rogers, J. R., & Lester, D. (2007). The Hopelessness Scale: A factor analysis. Psychological Reports, 100, 375–378.
Rosenfeld, B., Gibson, C., Kramer, M., & Breitbart, W. (2004). Hopelessness and terminal illness: The construct of hopelessness in patients with advanced AIDS. Palliative & Supportive Care, 2, 43–53.
Steed, L. (2001). Further validity and reliability evidence for Beck Hopelessness Scale scores in a nonclinical sample. Educational and Psychological Measurement, 61, 303–316.
Steer, R. A., Beck, A. T., & Brown, G. K. (1997). Factors of the Beck Hopelessness Scale: Fact or artifact? Multivariate Experimental Clinical Research, 11, 131–144.
Steer, R. A., Iguchi, M. Y., & Platt, J. J. (1994). Hopelessness in IV drug-users not in treatment and seeking HIV testing and counseling. Drug and Alcohol Dependence, 34, 99–103.
Tanaka, E., Sakamoto, S., Ono, Y., Fujihara, S., & Kitamura, T. (1998). Hopelessness in a community population: Factorial structure and psychosocial correlates. Journal of Social Psychology, 138, 581–590.
Tomas, J. M., & Oliver, A. (1999). Rosenberg's Self-Esteem Scale: Two factors or method effects. Structural Equation Modeling: A Multidisciplinary Journal, 6, 84–98.
WHO. (2013, November 13). International classification of diseases [Web page]. Retrieved from http://www.who.int/classifications/icd/en/
Young, M. A., Halper, I. S., Clark, D. C., Scheftner, W., & Fawcett, J. (1992). An item-response theory evaluation of the Beck Hopelessness Scale. Cognitive Therapy and Research, 16, 579–587.
Yu, C.-Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes (Unpublished doctoral dissertation). University of California, Los Angeles, CA.

Date of acceptance: August 22, 2014 Published online: February 27, 2015

Dóra Perczel-Forintos
Department of Clinical Psychology
Semmelweis University
Tömő street 25–29
1083 Budapest
Hungary
E-mail: perczel_forintos.dora@med.semmelweis-univ.hu



Original Article

The Utrecht-Management of Identity Commitments Scale (U-MICS): Measurement Invariance and Cross-National Comparisons of Youth From Seven European Countries

Radosveta Dimitrova,1 Elisabetta Crocetti,2 Carmen Buzea,3 Venzislav Jordanov,4 Marianna Kosic,5 Ergyul Tair,6 Jitka Taušová,7 Natasja van Cittert,8 and Fitim Uka9

1Department of Psychology, Stockholm University, Sweden; 2Utrecht University, The Netherlands; 3Transylvania University of Brasov, Romania; 4National Sports Academy, Bulgaria; 5Scientific-Cultural Institute Mandala, Slovene Research Institute, Italy; 6Bulgarian Academy of Sciences, Bulgaria; 7Palacký University, Czech Republic; 8Tilburg University, The Netherlands; 9European Center for Vocational Education "Qeap-Heimerer", Kosovo

Abstract. The Utrecht-Management of Identity Commitments Scale (U-MICS; Crocetti, Rubini, & Meeus, 2008) is a recently developed measure of identity that has been shown to be a reliable tool for assessing identity processes in adolescents. This study examines the psychometric properties of the U-MICS in a large adolescent sample from seven European countries, focusing on the interplay of commitment, in-depth exploration, and reconsideration of commitment. Participants were 1,007 adolescents from Bulgaria (n = 146), the Czech Republic (n = 142), Italy (n = 144), Kosovo (n = 150), Romania (n = 142), Slovenia (n = 156), and the Netherlands (n = 127). We tested the measurement invariance of the U-MICS, estimated reliability for each language version, and compared latent identity means across groups. Results showed that the U-MICS has good internal consistency as well as configural, metric, and partial scalar invariance across groups in the sampled countries.

Keywords: U-MICS, identity, Bulgaria, the Czech Republic, Kosovo, the Netherlands, Slovenia, Romania, cross-national comparisons

In his seminal work, Erikson (1950) postulated the development of a coherent sense of identity as a core task for adolescents. Among the most influential elaborations of Erikson's view on identity formation was Marcia's (1966) identity status model, which distinguished four identity statuses based on the amount of exploration and commitment that adolescents experience. Exploration refers to consideration of a broad array of goals, values, and beliefs, and commitment refers to adopting one or more of these. The combination of these dimensions leads to identity statuses of achievement (active exploration leading to a firm set of identity commitments), foreclosure (strong commitments without much exploration), moratorium (active exploration of different alternatives without clear commitments), and diffusion (absence of commitment and haphazard exploration). Numerous studies have found support for Marcia's identity status model (for a review, see Kroger & Marcia, 2011). However, there have been concerns about identity status research because of its strong focus on classifying individuals into statuses rather than examining the process of identity development (Bosma, 1985; Côté & Levine, 1988). In response to such largely characterological rather than developmental approaches, scholars began to acknowledge the importance of studying identity formation as a developmental process in addition to investigating differences between and among statuses (Grotevant, 1987; Stephen, Fraser, & Marcia, 1992). Therefore, more nuanced process models of identity have been proposed (for a review, see Meeus, 2011), with clear advantages for identity research.

European Journal of Psychological Assessment 2016; Vol. 32(2):119–127 DOI: 10.1027/1015-5759/a000241

R. Dimitrova et al.: U-MICS Across Europe

Process models have expanded the study of identity in at least two ways (Crocetti, Schwartz, Fermani, & Meeus, 2010). First, they have provided a conceptual framework for tracking developmental processes over time (e.g., Klimstra, Hale, Raaijmakers, Branje, & Meeus, 2010; Luyckx, Goossens, & Soenens, 2006; Meeus, van de Schoot, Keijsers, Schwartz, & Branje, 2010). In this respect, longitudinal studies have shown that personal identity develops progressively during adolescence (for a review, see Meeus, 2011). Most research findings on identity dimensions show identity maturation, with youth increasing their certainty about current commitments over the course of adolescence (Klimstra et al., 2010). Longitudinal studies on identity statuses indicate that a substantial number of adolescents remain in the same identity status over time, whereas adolescents who undertake identity transitions show mainly identity progressions (e.g., from the status of moratorium to the status of achievement). Identity regressions (e.g., from achievement to foreclosure) are less common (Meeus et al., 2010). Second, process models may be used flexibly in variable-centered approaches (which focus on the links between identity processes and relevant correlates) and in person-centered approaches (which focus on differences among individuals classified into various identity statuses). In this regard, identity statuses can be derived from process-oriented models by means of empirical classification methods (e.g., cluster analysis, latent class analysis; cf. Crocetti & Meeus, 2014). In this way, it has been possible to advance the identity literature by empirically extracting the identity statuses theorized by Marcia (1966) and by identifying new statuses that have clarified contradictory findings reported in the literature (see Crocetti, Rubini, Luyckx, & Meeus, 2008, for a distinction between two types of moratorium; and Luyckx, Goossens, Soenens, Beyers, & Vansteenkiste, 2005, for a distinction between two types of diffusion). In line with this renewed line of research, Meeus, Crocetti, and colleagues (Crocetti, Rubini, & Meeus, 2008; Meeus et al., 2010) developed a three-factor identity model focused on the developmental dynamics of identity formation, including commitment, in-depth exploration, and reconsideration of commitment as core identity processes in adolescence.
To measure this three-dimensional model of identity formation, Meeus and Crocetti developed the Utrecht-Management of Identity Commitments Scale (U-MICS; Crocetti, Rubini, & Meeus, 2008). The present study was designed to examine psychometric properties of the U-MICS in a large adolescent sample from seven European countries.

U-MICS Three-Dimensional Approach: Commitment, In-Depth Exploration, and Reconsideration of Commitment

The three-dimensional model (Crocetti, Rubini, & Meeus, 2008; Meeus et al., 2010) posits three dimensions underlying identity formation. Commitment involves enduring choices that youth make in various developmental domains and the self-confidence they derive from these choices. In-depth exploration refers to the extent to which youth actively explore their commitments, reflect on them, and discuss their choices with other people. Reconsideration of commitment involves comparing present commitments with alternative commitments when the current ones are no longer fulfilling. This three-factor model includes a dual-cycle process. Specifically, the first cycle is focused on identity formation: adolescents form commitments by considering and reconsidering them. The second cycle represents identity maintenance: adolescents become more familiar with their present commitments through in-depth exploration of current commitments. Thus, the tentative order of both cycles is reconsideration → commitment → in-depth exploration (Meeus, 2011). Furthermore, the Meeus-Crocetti model captures Erikson's (1950) dynamic of identity versus identity diffusion by including commitment, exploration in depth, and reconsideration. In addition, similar to Marcia's (1966) dimensions of exploration and commitment, the Meeus-Crocetti model can be employed to assign participants to identity status categories (e.g., Crocetti, Rubini, Luyckx, et al., 2008). It should be noted, however, that the Meeus-Crocetti model differs from Marcia's (1966) model in two major ways. First, it differentiates Marcia's concept of exploration into in-depth exploration (whose function is to maintain and validate existing commitments) and reconsideration of commitment (whose function is to change current commitments by searching for new alternatives). Second, the Meeus-Crocetti model adopts more of a process orientation by assuming that adolescents regularly reflect upon their present commitments. As a result, commitments are formed and revised through a process of choosing commitments and reconsidering them (Luyckx, Goossens, & Soenens, 2006; Luyckx, Goossens, Soenens, & Beyers, 2006). Additionally, Crocetti, Rubini, and Meeus (2008) found that commitment, in-depth exploration, and reconsideration of commitment are interrelated but relatively distinct identity processes. Commitment was found to be positively related to in-depth exploration (associations between these factors are strong, with standardized interrelationships between latent variables around .50/.60; Crocetti, Rubini, & Meeus, 2008; Crocetti et al., 2010), whereas in-depth exploration was positively associated with reconsideration of commitment (associations between these factors are moderate, with standardized interrelationships between latent variables between .15 and .40; Crocetti, Rubini, & Meeus, 2008; Crocetti et al., 2010). Indeed, youth with strong commitments also actively explored their present choices, and those who explored present commitments also explored commitment alternatives. Finally, commitment and reconsideration of commitment were not interrelated in adolescence, indicating that endorsing and evaluating commitments represent separate processes (Luyckx, Goossens, Soenens, & Beyers, 2006). The recently developed Utrecht-Management of Identity Commitments Scale (U-MICS; Crocetti, Rubini, & Meeus, 2008) has been shown to be a valid and reliable tool for assessing identity processes in the Netherlands (Crocetti et al., 2008), in Italy (Crocetti et al., 2010), and in French-speaking Switzerland (Zimmermann, Biermann, Mantzouranis, Genoud, & Crocetti, 2012). The three-factor structure of the U-MICS was found to be consistent across gender, age, and across ethnic minority and mainstream groups (Crocetti, Avanzi, Hawk, Fraccaroli, & Meeus, 2014; Crocetti, Rubini, & Meeus, 2008; Crocetti et al., 2010).

Aims and Hypotheses

Based on prior evidence indicating that the U-MICS is a valuable instrument for assessing identity processes, and given the need to ascertain the usefulness of identity measures across national contexts (Schwartz, Adamson, Ferrer-Wreder, Dillon, & Berman, 2006), this study examines the psychometric properties of the U-MICS in different countries across Europe (i.e., Bulgaria, the Czech Republic, Italy, Kosovo, the Netherlands, Romania, and Slovenia). First, using Confirmatory Factor Analysis (CFA), we examined the validity of the three-factor model in all seven language versions of the U-MICS. Based on previous findings with Dutch-, Italian-, and French-speaking respondents (Crocetti, Rubini, & Meeus, 2008; Crocetti et al., 2010; Zimmermann et al., 2012), we expected that a three-factor solution would provide a good fit in each national sample (Hypothesis 1). Second, we compared the U-MICS factor structure across samples to test measurement invariance across national groups. This test requires multiple hierarchical steps (e.g., Byrne & van de Vijver, 2010; van de Schoot, Lugtig, & Hox, 2012): (a) configural invariance (the same number of factors and pattern of fixed and freely estimated parameters hold across groups); (b) metric invariance (equivalence of factor loadings, indicating that respondents from multiple groups attribute the same meaning to the latent construct of interest); and (c) scalar invariance (invariance of factor loadings and of item intercepts, indicating that the meaning of the construct and the levels of the underlying items are equal across groups). Based on previous cross-national comparisons in Italy and the Netherlands (Crocetti et al., 2010), we expected to establish configural and metric invariance, while also extending prior research by examining scalar invariance across the seven target countries in this study (Hypothesis 2).
Third, assuming that cross-national measurement invariance would be found, we sought to test for structural invariance across countries (e.g., Cheung, 2008; Vandenberg, 2002). We compared mean scores and covariances of the three U-MICS factors across countries. In line with available evidence (Crocetti et al., 2010), we hypothesized that differences would occur in the endorsement of identity processes (i.e., in latent mean scores) (Hypothesis 3). However, since this is the first study in which identity processes are compared across such a wide range of nations, we could not advance more detailed hypotheses about the direction of differences.
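The hierarchical invariance steps above are typically evaluated by comparing each more constrained model against the previous, freer one with a chi-square difference (likelihood-ratio) test, often supplemented by change-in-CFI criteria. A sketch with made-up fit values (not the article's), using SciPy:

```python
from scipy import stats

def chi2_diff_test(chi2_constrained, df_constrained, chi2_free, df_free):
    """Chi-square difference test for nested models: the more constrained
    model (e.g., metric invariance) against the freer one (e.g., configural)."""
    d_chi2 = chi2_constrained - chi2_free
    d_df = df_constrained - df_free
    p = stats.chi2.sf(d_chi2, d_df)  # upper-tail probability
    return d_chi2, d_df, p

# Hypothetical fit values: configural (free) vs. metric (constrained) model.
d_chi2, d_df, p = chi2_diff_test(310.5, 180, 280.2, 168)
print(d_chi2, d_df, p)  # a significant p argues against the added constraints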

Method

Participants and Procedure

Participants were 1,007 adolescents (age: M = 15.96 years, SD = 1.47) from Bulgaria (n = 146), the Czech Republic (n = 142), Italy (n = 144), Kosovo (n = 150), Romania (n = 142), Slovenia (n = 156), and the Netherlands (n = 127). Gender and age demographics for each country are reported in Table 1. Participants for this study were recruited through public schools in major towns in Bulgaria (Sofia), the Czech Republic (North Bohemia: Lovosice, Most, Krupka, Teplice; Central Bohemia: Prague; South Moravia: Olomouc, Brno, Prostejov; Czech Silesia: Ostrava), Italy (Cesena, Macerata, and Trieste), Kosovo (Pristina), the Netherlands (Breda and Eindhoven), Romania (Brasov), and Slovenia (Ljubljana). Prior to data collection, local school authorities, teachers, parents, and students were informed about the purpose and methods of the study in order to obtain their consent and participation.

Measures

Sociodemographic Questionnaire

Participants in all countries provided information on their nationality, age, and gender.

Identity We employed the U-MICS (Crocetti et al., 2008) to assess commitment, in-depth exploration, and reconsideration of commitment across countries. The Dutch (Crocetti et al., 2008), Italian (Crocetti et al., 2010), and Romanian (Negru & Crocetti, 2010) versions were already available. The original English version was translated into Bulgarian, Kosovan, Czech, and Slovenian by a team of bilingual translators following the recommended procedures for the establishment of linguistic equivalence (Van de Vijver & Leung, 1997). The U-MICS consists of 13 items rated on a response scale ranging from 1 (= completely untrue) to 5 (= completely true). In this study, we measured identity dimensions in one ideological domain (education) and in one interpersonal domain (friendship), therefore presenting each item once for the ideological domain and once for the interpersonal domain. Specifically, across domains, 10 items measure commitment, 10 items assess in-depth exploration, and 6 items refer to reconsideration of commitment for a total of 26 items. Sample items are ‘‘My education/best friend gives me certainty in life’’ (commitment), ‘‘I think a lot about my education/best friend’’ (in-depth exploration), and ‘‘I often think it would be better to try to find a different education/best friend’’ (reconsideration of commitment). Ideological and interpersonal domain items can be also combined into one global identity factor (Crocetti, Rubini, & Meeus, 2008). Thus, for each of the identity dimensions, we combined responses across the ideological and interpersonal domains (Crocetti, Rubini, & Meeus, 2008). Good internal reliability consistency of the U-MICS subscales across samples, in terms of Cronbach’s alphas, was found (Table 1). European Journal of Psychological Assessment 2016; Vol. 32(2):119–127
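The scoring scheme can be sketched as follows; the column layout below is illustrative (not the official U-MICS item key), assuming the 26 responses are arranged with each dimension's items pooled across the education and friendship domains:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical response matrix: 4 respondents x 26 items, each rated 1-5.
responses = rng.integers(1, 6, size=(4, 26))

# Assumed column blocks: first 10 items = commitment, next 10 = in-depth
# exploration, last 6 = reconsideration (each block spans both domains).
commitment = responses[:, :10].mean(axis=1)
exploration = responses[:, 10:20].mean(axis=1)
reconsideration = responses[:, 20:].mean(axis=1)

# Averaging keeps each dimension score on the 1-5 response metric.
print(commitment.shape, exploration.shape, reconsideration.shape)
```

Averaging rather than summing makes the three dimension scores directly comparable despite their unequal item counts (10, 10, and 6).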


Table 1. Means and standard deviations for adolescents in seven countries

Country (n)               Age M (SD)     Female n (%) / Male n (%)   Comm M (SD), α     In-DepE M (SD), α   RecCom M (SD), α
Bulgaria (n = 146)        14.86 (1.74)   71 (49.3%) / 73 (50.7%)     3.80 (0.65), .87   3.48 (0.60), .79    2.51 (0.82), .80
Czech Republic (n = 142)  14.79 (1.11)   67 (47.2%) / 75 (52.8%)     3.43 (0.71), .84   3.31 (0.58), .75    2.44 (0.76), .77
Italy (n = 144)           16.30 (1.28)   87 (61.3%) / 55 (38.7%)     3.36 (0.66), .86   3.51 (0.61), .77    2.68 (0.70), .68
Kosovo (n = 150)          16.47 (1.36)   96 (64%) / 54 (36%)         3.75 (0.51), .80   3.16 (0.44), .63    1.96 (0.59), .71
Romania (n = 142)         16.35 (0.65)   96 (67.6%) / 46 (32.4%)     3.83 (0.47), .77   3.24 (0.56), .70    2.07 (0.78), .77
Slovenia (n = 156)        17.33 (0.82)   53 (35.1%) / 98 (64.9%)     3.68 (0.56), .79   3.09 (0.57), .74    1.86 (0.78), .66
The Netherlands (n = 127) 15.49 (0.95)   82 (64.6%) / 45 (35.4%)     3.68 (0.40), .72   3.30 (0.60), .74    2.23 (0.78), .75
Total sample (n = 1,007)  15.96 (1.47)   552 (55.3%) / 446 (44.7%)   3.65 (0.60), .80   3.25 (0.76), .82    2.08 (0.81), .82

Notes. Cronbach's α is reported after each scale mean. Missing information on gender (number of cases): Bulgaria (2), Slovenia (5), Italy (2). Comm = Commitment; In-DepE = In-depth exploration; RecCom = Reconsideration of commitment.

R. Dimitrova et al.: U-MICS Across Europe

Results

Confirmatory Factor Analyses in Each National Context
The first aim of this study was to examine the extent to which the three-factor structure of the U-MICS documented in previous studies (Crocetti, Rubini, & Meeus, 2008; Crocetti et al., 2010; Zimmermann et al., 2012) would be replicated in the samples examined in the current investigation. To this end, we first performed CFAs in the total sample and in each national sample. We conducted analyses in Mplus 7.11 (Muthén & Muthén, 2010) using maximum likelihood estimation. We tested a model consisting of three latent variables (commitment, in-depth exploration, and reconsideration of commitment) with three observed indicators per latent variable. In line with previous identity studies (e.g., Crocetti, Rubini, & Meeus, 2008; Crocetti et al., 2010; Luyckx et al., 2006) and statistical recommendations (e.g., Marsh, Hau, Balla, & Grayson, 1998), the three observed indicators for each latent factor were constructed through the item-to-construct balance parceling method (Little, Cunningham, Shahar, & Widaman, 2002). Following prior research on the U-MICS (Crocetti, Rubini, & Meeus, 2008; Crocetti et al., 2010; Zimmermann et al., 2012), we applied parceling rather than an individual-item approach for several reasons. Compared with individual items, parcels have been shown to offer greater reliability (Kishton & Widaman, 1994), a larger ratio of common-to-unique variance (Little et al., 2002), distributions that more closely approximate normality (Bagozzi & Heatherton, 1994), a more favorable indicator-to-sample-size ratio (Bagozzi & Edwards, 1998), a greater likelihood of achieving a proper model solution (Marsh et al., 1998), and better model fit (Bandalos & Finney, 2001).
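A minimal sketch of the item-to-construct balance idea (Little et al., 2002): rank the items by their factor loadings and deal them to the parcels in a back-and-forth order, so each parcel mixes stronger and weaker indicators. The function and example loadings below are illustrative, not the study's actual assignments:

```python
def balanced_parcels(loadings, n_parcels=3):
    """Assign item indices to parcels by item-to-construct balance:
    items are ranked by loading (highest first) and dealt to parcels
    in serpentine order (0, 1, 2, 2, 1, 0, 0, 1, 2, ...)."""
    ranked = sorted(range(len(loadings)), key=lambda i: loadings[i], reverse=True)
    # Build the serpentine sequence of parcel slots, one per item
    slots, forward = [], True
    while len(slots) < len(ranked):
        slots.extend(range(n_parcels) if forward else reversed(range(n_parcels)))
        forward = not forward
    parcels = [[] for _ in range(n_parcels)]
    for item, slot in zip(ranked, slots):
        parcels[slot].append(item)
    return parcels
```

With six items whose loadings decrease from .9 to .4, the strongest and weakest items end up together in parcel 1, the second strongest with the second weakest in parcel 2, and so on.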
Model fit was examined in terms of various indices (Byrne, 2009): the ratio of the chi-square statistic to the degrees of freedom (χ2/df) should be less than 3; the Comparative Fit Index (CFI) is considered acceptable with values higher than .90 and excellent with values exceeding .95; and the Root Mean Square Error of Approximation (RMSEA) should be less than .08, with values less than .05 representing a very good fit. As reported in Table 2, fit indices suggested that the U-MICS three-factor structure fit the data well. Model comparisons (conducted by computing changes in χ2 and in model fit indices) indicated that the fit of the three-factor model was significantly better than the fit of the one-factor model, in which all indicators loaded on a single latent general identity factor (Δχ2 = 2,289.093, Δdf = 3, p < .001, ΔCFI = .503, ΔRMSEA = .251), and of the two-factor model, in which the indicators for in-depth exploration and reconsideration of commitment loaded on the same general exploration factor (Δχ2 = 1,347.478, Δdf = 2, p < .001, ΔCFI = .296, ΔRMSEA = .188). Additionally, the three-factor model fit within each national group (see Table 2).

© 2015 Hogrefe Publishing
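The cutoff logic above can be encoded in a few lines. The sketch below (function name ours) simply restates the rule-of-thumb thresholds from Byrne (2009) and is not a substitute for a full SEM fit evaluation:

```python
def judge_fit(chi2, df, cfi, rmsea):
    """Apply the rule-of-thumb cutoffs described in the text."""
    return {
        "chi2/df < 3": chi2 / df < 3,
        "cfi": "excellent" if cfi > .95 else ("acceptable" if cfi > .90 else "poor"),
        "rmsea": "very good" if rmsea < .05 else ("acceptable" if rmsea < .08 else "poor"),
    }
```

Plugging in the overall three-factor solution from Table 2 (χ2 = 68.624, df = 24, CFI = .990, RMSEA = .043) yields an acceptable χ2/df ratio, an excellent CFI, and a very good RMSEA, whereas the one-factor model (χ2 = 2,357.717, df = 27, CFI = .487, RMSEA = .294) fails all three.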



Table 2. Fit indices for the U-MICS

Model                      χ2          df   χ2/df    CFI    RMSEA (90% CI)
One-factor model
  Overall sample           2,357.717   27   87.322   .487   .294 (.284; .304)
Two-factor model
  Overall sample           1,416.102   26   54.565   .694   .231 (.221; .241)
Three-factor model
  Overall sample           68.624      24   2.85     .990   .043 (.031; .055)
  National samples
    Bulgaria               60.177      24   2.50     .956   .102 (.070; .134)
    Czech Republic         44.185      24   1.84     .973   .078 (.040; .113)
    Italy                  38.602      24   1.60     .978   .065 (.021; .102)
    Kosovo                 33.689      24   1.40     .985   .052 (.000; .090)
    The Netherlands        28.883      24   1.20     .989   .040 (.000; .086)
    Romania                18.575      24   0.78     1.00   .000 (.000; .048)
    Slovenia               33.899      24   1.41     .983   .052 (.000; .089)

Notes. χ2 = chi-square; df = degrees of freedom; CFI = Comparative Fit Index; RMSEA = Root Mean Square Error of Approximation and 90% Confidence Interval.

Table 3. Tests of U-MICS national measurement invariance

Model                                                   χ2        df    χ2/df   CFI    RMSEA (90% CI)      ΔCFI   ΔRMSEA
1. Configural invariance                                258.010   168   1.54    .980   .061 (.046; .076)   –      –
2. Metric invariance                                    331.587   204   1.62    .971   .066 (.053; .079)   .009   .005
3a. Full scalar invariance (compared to 2)              622.019   240   2.59    .913   .106 (.095; .116)   .058   .040
3b. Partial scalar invariance (compared to 2)           422.943   222   1.91    .954   .080 (.068; .091)   .017   .014
4. Covariance invariances (compared to 2)
   Commitment–in-depth exploration                      384.091   210   1.83    .960   .076 (.064; .088)   .011   .010
   In-depth exploration–reconsideration of commitment   347.307   210   1.65    .969   .068 (.055; .080)   .002   .002
   Commitment–reconsideration of commitment             353.040   210   1.68    .968   .069 (.056; .081)   .003   .003

Notes. χ2 = chi-square; df = degrees of freedom; CFI = Comparative Fit Index; RMSEA = Root Mean Square Error of Approximation and 90% Confidence Interval; Δ = change in the parameter relative to the comparison model.

Measurement Invariance Tests

The second aim of this study was to test measurement invariance across national groups. To evaluate this aim, we conducted successive multigroup CFAs. To determine significant differences between models, we followed Chen's (2007, p. 501) recommendations, according to which a ΔCFI ≥ .010, supplemented by a ΔRMSEA ≥ .015, would be indicative of non-invariance. Findings reported in Table 3 clearly indicated the presence of configural invariance (the same number of factors and pattern of fixed and freely estimated parameters held across groups) and metric invariance (factor loadings were invariant, indicating that respondents from multiple groups attribute the same meaning to the latent construct of interest). Standardized factor loadings for the total sample are reported in Figure 1. These loadings were very high, ranging from .74 to .86.
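Chen's (2007) decision rule can be sketched as a small predicate (our own paraphrase and function name; the fit values come from Table 3):

```python
def non_invariant(cfi_free, rmsea_free, cfi_constrained, rmsea_constrained):
    """Chen (2007): flag non-invariance when, relative to the less
    constrained model, CFI drops by >= .010 AND RMSEA rises by >= .015."""
    d_cfi = cfi_free - cfi_constrained
    d_rmsea = rmsea_constrained - rmsea_free
    return d_cfi >= .010 and d_rmsea >= .015
```

Applied to Table 3: metric vs. configural (ΔCFI = .009) is below both cutoffs, so metric invariance holds; full scalar vs. metric (ΔCFI = .058, ΔRMSEA = .040) exceeds both, so full scalar invariance is rejected; partial scalar vs. metric (ΔCFI = .017, ΔRMSEA = .014) exceeds only the CFI cutoff, so under this joint rule it is not flagged as non-invariant.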

Finally, full scalar invariance was not established, indicating that intercepts differed across groups. Therefore, we tried to establish partial scalar invariance (i.e., constraining two intercepts for each latent factor to be equal across groups and freeing one intercept). Specifically, we freely estimated the intercepts of parcels 2, 6, and 7. In this case the ΔCFI (.017) suggested non-invariance, but the ΔRMSEA (.014) was lower than the cutoff of .015, so partial scalar invariance was retained.

Structural Invariance Tests

The third aim of this study was to test structural invariance. This aim included two subgoals: (a) to compare latent means; and (b) to examine covariances among identity processes across countries. Because we could establish partial scalar invariance, we compared latent means across




Figure 1. Standardized solution of the three-factor model of the U-MICS. All factor loadings and correlations are significant at p < .001, with the only exception of the correlation between commitment and reconsideration of commitment, which is not significant. [Path diagram: Commitment is measured by Parcels 1–3 (loadings .80, .86, .74); In-Depth Exploration by Parcels 4–6 (.80, .81, .79); Reconsideration of Commitment by Parcels 7–9 (.85, .86, .79). Factor correlations: commitment–in-depth exploration = .60; in-depth exploration–reconsideration of commitment = .22; commitment–reconsideration of commitment = -.06.]

countries. Therefore, each national group was considered as the reference group and compared to all the other groups. Results yielded the following pattern of findings: (IT, CZ) < (NL, SLO, KO, BG, RO) for commitment, NL < RO (SLO, CZ) IT BG < KO for in-depth exploration, and (NL, RO, SLO, CZ) < IT BG KO for reconsideration of commitment. Additionally, as can be seen in Table 3, findings indicated that covariances between identity processes were all invariant across countries.

Discussion

Identity formation is a dynamic, lifelong, and universal (Erikson, 1950) process and a central developmental task during adolescence (Arnett, 2000). Therefore, it is of particular importance to test the validity of instruments that measure identity processes across different cultural contexts. Various studies have shown that the U-MICS is a valid instrument and have provided evidence for its cross-national validity (Crocetti, Rubini, & Meeus, 2008; Crocetti et al., 2010, 2012; Zimmermann et al., 2012). Building on this prior evidence, the objective of this study was to explore the psychometric properties of the U-MICS in several countries across Europe (Bulgaria, the Czech Republic, Italy, Kosovo, the Netherlands, Romania, and Slovenia).

In line with the first hypothesis, the results confirmed the three-factor structure of the U-MICS, indicating that commitment, in-depth exploration, and reconsideration of commitment represent distinct identity processes. Consistent with the findings of Crocetti, Rubini, and Meeus (2008), the same three-factor structure was replicated within each country, providing evidence of the validity of the U-MICS across different European contexts. In line with the second hypothesis, the results of a multigroup CFA indicated configural invariance, implying that the number of factors and the pattern of fixed and free parameters are constant across the national groups. Our findings also supported metric invariance, meaning that factor loadings on the subscales were equal across samples. This indicates that participants from all seven countries attributed the same meaning to the latent constructs measured by the U-MICS. According to the third hypothesis, we expected to find scalar invariance. The results partly met our expectations. Although the assumption of full scalar invariance was not satisfied, we were able to establish partial scalar invariance (Byrne, Shavelson, & Muthén, 1989). Accordingly, we found that intercepts of the parceled indicators were comparable across the samples participating in our study. Thus, we were able to replicate results of previous single-culture studies on the U-MICS and to demonstrate that identity dimensions were similar across a variety of cultural settings. From these findings it can



be concluded that examining identity processes in youth is relevant not only in individualistic countries, where identity theories and much of the research have been developed, but also in countries characterized by more collectivistic value orientations. Moreover, our findings attest to a basic level of cross-national comparability of the U-MICS, which can therefore be validly used for comparisons across different European countries. However, all the countries we investigated were European; findings might have been different in Asia, Africa, or Latin America. In a final step, we tested the structural invariance of the U-MICS by examining latent means and covariances among identity processes across countries. Thus, in addition to the structure of the construct, we examined cross-national differences in the mean level of identity across cultural groups. Results indicated diverse patterns of findings when comparing each national group, taken as the reference group, with all the other groups regarding commitment, exploration, and reconsideration of commitment. Our findings indicated that identity uncertainty was detected primarily in adolescents from Bulgaria and Kosovo, who showed a combination of high commitment, high in-depth exploration, and high reconsideration of commitment. On the other hand, adolescents from the Netherlands, Romania, and Slovenia reported greater identity stability, as indicated by high commitment coupled with low reconsideration of commitment. An intermediate result was reported by adolescents from Italy and the Czech Republic, who scored low on commitment and moderately to low on reconsideration of commitment. These findings are exploratory and should be interpreted with caution given the small sample sizes within each participating country. Future larger cross-cultural studies are needed to identify and test the socioeconomic and cross-cultural dimensions that can explain this pattern of national differences.
Additionally, we could also show that covariances between identity processes were all invariant across countries. The factor covariance invariance indicates that the three latent variables have similar interrelationships across groups (Milfont & Fischer, 2010). Taken together, these results provide evidence that the U-MICS is a reliable self-report measure for assessing the three identity factors of commitment, exploration, and reconsideration of commitment in a large adolescent sample across seven European countries. Thus, the measure proved to be theoretically and empirically useful in different cultural contexts.


Limitations and Suggestions for Future Research

Our findings provide the first comparative perspective on the U-MICS across seven European countries; however, a number of limitations need to be addressed. First, the sample size for each country was rather small, and these small sample sizes may partly be responsible for the lack of full scalar invariance in our data. Second, it is not clear how the patterns observed in this study would generalize to adolescents and young adults in other countries, in which youth might have different identity formation experiences. Further research using large samples is necessary to examine how aspects of commitment, in-depth exploration, and reconsideration of commitment are represented across different conditions relevant for identity formation among youth in other societies, especially outside Europe. In addition, we need to verify our results in more representative samples, including ethnic minority groups in the countries we investigated. Our study was concerned with mainstream White samples only, thereby limiting our ability to generalize the observed effects to other groups in these countries. Thus, further replications of this study should involve both mainstream and ethnic minority groups. Specifically, additional replication in ethnic minority groups in Bulgaria (e.g., Turkish-Bulgarian and Roma), the Czech Republic (e.g., Roma), Italy (e.g., Slovene), Kosovo (e.g., Roma), the Netherlands (e.g., Turkish-Dutch), Romania (e.g., Roma), and Slovenia (e.g., Italian) will increase our confidence in the current findings. Lastly, an intriguing question remains regarding the relationship between identity formation and well-being and how this relationship may vary across national contexts. Future studies should include well-being measures in addition to the U-MICS in different cultural contexts, which could model the more complex, potentially interactive, relationship between identity and well-being from a cross-cultural perspective.

Acknowledgments

The authors would like to acknowledge the support of a COFAS Marie Curie fellowship (Forte-projekt 2013-2669) to the first author and a Marie Curie fellowship (FP7-PEOPLE-2010-IEF; Project number 272400) to the second author. We would like to thank Katia Levovnik, Neli Filipova, Roza Krasteva, Iglika Nedelcheva, Silvia Bakardjieva, Stefan Aranyosi, Dorel Agache, principal Scataro, all school personnel, and the students for their help in carrying out the study. This paper was part of the Early Researchers Union (ERU)/European Association of Developmental Psychology (EADP) writing week 2014, and we are profoundly grateful to EADP for supporting this event, which made it possible to work on this paper.

References

Arnett, J. J. (2000). Emerging adulthood: A theory of development from the late teens through the twenties. The American Psychologist, 55, 469–480. doi: 10.1037/0003-066X.55.5.469
Bagozzi, R. P., & Edwards, J. R. (1998). A general approach to representing constructs in organizational research. Organizational Research Methods, 1, 45–87. doi: 10.1177/109442819800100104
Bagozzi, R. P., & Heatherton, T. F. (1994). A general approach to representing multifaceted personality constructs: Application to state self-esteem. Structural Equation Modeling, 1, 35–67. doi: 10.1080/10705519409539961



Bandalos, D. L., & Finney, S. J. (2001). Item parceling issues in structural equation modeling. In G. A. Marcoulides & R. E. Schumacker (Eds.), New developments and techniques in structural equation modeling (pp. 269–296). Mahwah, NJ: Erlbaum.
Bosma, H. A. (1985). Identity development in adolescents: Coping with commitments (Unpublished doctoral dissertation). University of Groningen, The Netherlands.
Byrne, B. M. (2009). Structural equation modeling with AMOS: Basic concepts, applications, and programming (2nd ed.). New York, NY: Routledge, Taylor & Francis Group.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456–466. doi: 10.1037/0033-2909.105.3.456
Byrne, B. M., & van de Vijver, F. J. R. (2010). Testing for measurement and structural equivalence in large-scale cross-cultural studies: Addressing the issue of nonequivalence. International Journal of Testing, 10, 107–132. doi: 10.1080/15305051003637306
Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 14, 464–504. doi: 10.1080/10705510701301834
Cheung, G. W. (2008). Testing equivalence in the structure, means, and variances of higher-order constructs with structural equation modeling. Organizational Research Methods, 11, 593–613. doi: 10.1177/1094428106298973
Côté, J. E., & Levine, C. (1988). A critical examination of the ego identity status paradigm. Developmental Review, 8, 147–184. doi: 10.1016/0273-2297
Crocetti, E., Avanzi, L., Hawk, S. T., Fraccaroli, F., & Meeus, W. (2014). Personal and social facets of job identity: A person-centered approach. Journal of Business and Psychology. doi: 10.1007/s10869-013-9313-x
Crocetti, E., & Meeus, W. (2014). Identity statuses: Advantages of a person-centered approach. In K. C. McLean & M. Syed (Eds.), The Oxford handbook of identity development (pp. 97–114). New York, NY: Oxford University Press.
Crocetti, E., Rubini, M., Luyckx, K., & Meeus, W. (2008). Identity formation in early and middle adolescents from various ethnic groups: From three dimensions to five statuses. Journal of Youth and Adolescence, 37, 983–996. doi: 10.1007/s10964-007-9222-2
Crocetti, E., Rubini, M., & Meeus, W. (2008). Capturing the dynamics of identity formation in various ethnic groups: Development and validation of a three-dimensional model. Journal of Adolescence, 31, 207–222. doi: 10.1016/j.adolescence.2007.09.002
Crocetti, E., Schwartz, S., Fermani, A., Klimstra, T., & Meeus, W. (2012). A cross-national study of identity statuses in Dutch and Italian adolescents: Status distributions and correlates. European Psychologist, 17, 171–181. doi: 10.1027/1016-9040/a000076
Crocetti, E., Schwartz, S., Fermani, A., & Meeus, W. (2010). The Utrecht Management of Identity Commitments Scale (U-MICS): Italian validation and cross-national comparisons. European Journal of Psychological Assessment, 26, 169–183. doi: 10.1027/1015-5759/a000024
Erikson, E. (1950). Childhood and society. New York, NY: Norton.
Grotevant, H. D. (1987). Toward a process model of identity formation. Journal of Adolescent Research, 2, 203–222. doi: 10.1177/074355488723003
Kishton, J. M., & Widaman, K. F. (1994). Unidimensional versus domain representative parceling of questionnaire items: An empirical example. Educational and Psychological Measurement, 54, 757–765. doi: 10.1177/0013164494054003022

Klimstra, T. A., Hale, W. W., Raaijmakers, Q. A. W., Branje, S. J. T., & Meeus, W. H. J. (2010). Identity formation in adolescence: Change or stability? Journal of Youth and Adolescence, 39, 150–162. doi: 10.1007/s10964-009-9401-4
Kroger, J., & Marcia, J. E. (2011). The identity statuses: Origins, meanings, and interpretations. In S. J. Schwartz, K. Luyckx, & V. L. Vignoles (Eds.), Handbook of identity theory and research (pp. 31–53). New York, NY: Springer.
Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to parcel: Exploring the question, weighing the merits. Structural Equation Modeling, 9, 151–173. doi: 10.1207/S15328007SEM0902_1
Luyckx, K., Goossens, L., & Soenens, B. (2006). A developmental contextual perspective on identity construction in emerging adulthood: Change dynamics in commitment formation and commitment evaluation. Developmental Psychology, 42, 366–380. doi: 10.1037/0012-1649.42.2.366
Luyckx, K., Goossens, L., Soenens, B., & Beyers, W. (2006). Unpacking commitment and exploration: Validation of an integrative model of adolescent identity formation. Journal of Adolescence, 29, 361–378. doi: 10.1016/j.adolescence.2005.03.008
Luyckx, K., Goossens, L., Soenens, B., Beyers, W., & Vansteenkiste, M. (2005). Identity statuses based on 4 rather than 2 identity dimensions: Extending and refining Marcia's paradigm. Journal of Youth and Adolescence, 34, 605–618. doi: 10.1007/s10964-005-8949-x
Marcia, J. E. (1966). Development and validation of ego-identity status. Journal of Personality and Social Psychology, 3, 551–558. doi: 10.1037/h0023281
Marsh, H. W., Hau, K. T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research, 33, 181–220. doi: 10.1207/s15327906mbr3302_1
Meeus, W. (2011). The study of adolescent identity formation 2000–2010: A review of longitudinal and narrative research. Journal of Research on Adolescence, 1, 75–94. doi: 10.1111/j.1532-7795.2010.00716.x
Meeus, W., van de Schoot, R., Keijsers, L., Schwartz, S. J., & Branje, S. (2010). On the progression and stability of adolescent identity formation: A five-wave longitudinal study in early-to-middle and middle-to-late adolescence. Child Development, 81, 1565–1581. doi: 10.1111/j.1467-8624.2010.01492.x
Milfont, T. L., & Fischer, R. (2010). Testing measurement invariance across groups: Applications in cross-cultural research. International Journal of Psychological Research, 3, 111–121.
Muthén, L. K., & Muthén, B. O. (2010). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Negru, O., & Crocetti, E. (2010). Dimensions of well-being and identity development in Romanian and Italian emerging adults: A cross-cultural analysis. Psychology & Health, 25, 286.
Schwartz, S. J., Adamson, L., Ferrer-Wreder, L., Dillon, F. R., & Berman, S. L. (2006). Identity status measurement across contexts: Variations in measurement structure and mean levels among White American, Hispanic American, and Swedish emerging adults. Journal of Personality Assessment, 86, 61–76. doi: 10.1207/s15327752jpa8601_08
Stephen, J., Fraser, E., & Marcia, J. E. (1992). Moratorium-achievement (Mama) cycles in lifespan identity development: Value orientations and reasoning system correlates. Journal of Adolescence, 15, 283–300. doi: 10.1016/0140-1971(92)90031-Y
van de Schoot, R., Lugtig, P., & Hox, J. (2012). A checklist for testing measurement invariance. European Journal of Developmental Psychology, 9, 486–492. doi: 10.1080/17405629.2012.686740



Van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Thousand Oaks, CA: Sage.
Vandenberg, R. J. (2002). Toward a further understanding of and improvement in measurement invariance methods and procedures. Organizational Research Methods, 5, 139–158. doi: 10.1177/1094428102005002001
Zimmermann, G., Biermann, E., Mantzouranis, G., Genoud, P. A., & Crocetti, E. (2012). Brief report: The Identity Style Inventory (ISI-3) and the Utrecht-Management of Identity Commitments Scale (U-MICS): Factor structure, reliability, and convergent validity in French-speaking college students. Journal of Adolescence, 35, 461–465. doi: 10.1016/j.adolescence.2010.11.013



Date of acceptance: August 11, 2014
Published online: February 27, 2015

Radosveta Dimitrova
Department of Psychology
Stockholm University
10691 Stockholm
Sweden
E-mail: rdimitrova@tiscali.it
Tel. +468 163-881
Fax +468 159-342



Original Article

Measurement Invariance of the Self-Description Questionnaire II in a Chinese Sample

Kim Chau Leung,1 Herbert W. Marsh,2,3 Rhonda G. Craven,2 and Adel S. Abduljabbar3

1 Hong Kong Institute of Education, Hong Kong, PR China; 2 Australian Catholic University, Sydney, NSW, Australia; 3 King Saud University, Riyadh, Saudi Arabia

Abstract. Studies on the construct validity of the Self-Description Questionnaire II (SDQII) have not compared the factor structure of the English and Chinese versions of the SDQII. Using rigorous multiple-group comparison procedures based upon confirmatory factor analysis (CFA) tests of measurement invariance, the present study examined the responses of Australian high school students (N = 302) and Chinese high school students (N = 322) to the English and Chinese versions of the SDQII, respectively. CFA provided strong evidence that the factor structure (factor loadings and item intercepts) is invariant across the Chinese and English versions of the SDQII, allowing researchers to confidently utilize both versions with Chinese and Australian samples separately and cross-culturally.

Keywords: Self-Description Questionnaire II, construct validity, measurement invariance, self-concept

An English version of the Self-Description Questionnaire II (SDQII) has been widely used and its construct validity has been well established (Boyle, 1994; Byrne, 1984; Hattie, 1992; Leach, Henson, Odom, & Cagle, 2006; Wylie, 1989). Although there is clear support for the reliability and validity of the Chinese version of the SDQII (Marsh, Kong, & Hau, 2000), measurement invariance between the Chinese and English versions of the SDQII has not been examined. The present investigation adopts rigorous multiple-group comparisons based on CFA to test the measurement invariance of these two versions of this self-concept instrument.

Multidimensionality of Self-Concept Recent self-concept research has emphasized the multidimensionality of self-concept (e.g., Leung, Marsh, Craven, Yeung, & Abduljabbar, 2013; Leung, Marsh, Yeung, & Abduljabbar, 2014; Marsh, Byrne, & Shavelson, 1988; Marsh & Craven, 1997, 2006; Marsh & O’Mara, 2008). The multidimensionality of self-concept has been strongly supported by numerous factor analytic studies (e.g., Harter, 1982; Marsh, Barnes, & Hocevar, 1985; Marsh, Parker, & Barnes, 1985) and construct validity reviews (e.g., Byrne, 1984; Marsh & Shavelson, 1985).

European Journal of Psychological Assessment 2016; Vol. 32(2):128–139 DOI: 10.1027/1015-5759/a000242

Shavelson, Hubner, and Stanton (1976) posited that self-concept is multifaceted rather than unidimensional in nature. As described in the Shavelson model, there is a general self-concept defined by academic and nonacademic self-concepts. Academic self-concept is further separated into domains for particular subject areas, whereas nonacademic self-concept is divided into social, physical, moral, and emotional domains. On the basis of Shavelson et al.'s (1976) model, Marsh developed the SDQ instruments: the Self-Description Questionnaire I (SDQI) for preadolescent primary school students, the Self-Description Questionnaire II (SDQII) for junior high and high school students, and the Self-Description Questionnaire III (SDQIII) for late adolescents and young adults (see Marsh, 1990b, 1992a, 1992b). SDQ research (Byrne, 1984; Hattie, 1992; Marsh, 1990a, 1993; Marsh & Craven, 1997; Marsh, Parada, Craven, & Finger, 2004; Marsh & Shavelson, 1985; Shavelson & Marsh, 1986) has provided sound evidence for the multidimensionality of self-concept, and the SDQ instruments have been considered among the best multidimensional instruments with regard to their psychometric properties and construct validity (Boyle, 1994; Byrne, 1984; Hattie, 1992; Leach et al., 2006; Wylie, 1989). The majority of SDQ studies have involved English-speaking samples (e.g., American, Canadian, and Australian



K. C. Leung et al.: Measurement Invariance of Chinese SDQII

sample populations; e.g., Kaminski, Shafer, Neumann, & Ramos, 2006; Marsh, 1990b, 1992a; Marsh, 1994; Marsh, Craven, & Debus, 1991; Marsh, Ellis, & Craven, 2002; Marsh, Parada, & Ayotte, 2004; Marsh, Smith, & Barnes, 1985; Von der Luft, Harman, Koenig, Nixon-Cave, & Gaughan, 2008). Extensive research studies have revealed that the reliability, factor structure, and construct validity of the English version of the SDQII are sound (e.g., Leach et al., 2006; Marsh, 1992a, 1994; Marsh, Plucker, & Stocking, 2001). However, there is a dearth of studies examining the factor structure of the SDQII using Chinese students. For example, Kong (2000) developed a Chinese version of the SDQII and demonstrated that the reliability estimates were reasonably high and the factor structure for all of the SDQII subscales was distinct based on a sample of 5,694 Grade 8 and 9 students from 44 Chinese high schools in Hong Kong, China. Also, the convergent and discriminant validity, factorial invariance (factor loadings, factor covariances, and variances) over time, and stability coefficients of the measure were demonstrated. This work provided clear evidence to support the Chinese version of the SDQII as a reliable and valid measure of self-concept for Chinese students. Although these findings provide clear support for the validity of the SDQII in either English or Chinese version, they are uninformative since they did not compare the factor structure of the English and Chinese versions of the SDQII using rigorous multiple group comparison procedures based upon CFA of measurement invariance. There is a strong need to examine the generalizability of the SDQII in Chinese populations. 
Measurement Invariance

When conducting cross-cultural comparisons of constructs, it is important to establish that the scales of the measured constructs have the same meaning across the cultures examined and that there are no cultural biases or differences in the meaning of items (Lee, Little, & Preacher, 2011; Marsh, Hau, Artelt, Baumert, & Peschar, 2006). Hence, steps should be taken to establish that the items consistently assess the measured constructs across the cultures examined and that the scale properties are not affected by cultural differences. Typically, this is established by testing measurement invariance (Meredith, 1993; Vandenberg & Lance, 2000; Widaman & Reise, 1997) with multigroup confirmatory factor analyses. Measurement invariance involves examination of a series of invariance models. In the first step, the configural invariance model (Widaman & Reise, 1997) is used to test whether the factorial structure is the same across the countries considered. In this model, no estimated parameters are constrained to be invariant except for those fixed to zero or one that define the factor structure in each group (Marsh et al., 2013). The configural invariance model serves as a baseline model against which the more restrictive invariance models can be compared.

2015 Hogrefe Publishing


The second model is the metric invariance model (Vandenberg & Lance, 2000), or weak measurement invariance model (Meredith, 1993), which requires the factor loadings to be invariant across countries. When weak measurement invariance holds, the factor loadings are comparable across countries and there is a constant metric between latent and manifest variables that allows comparison of latent factor correlations across groups. After metric invariance is established, the strong measurement invariance model (Meredith, 1993), or scalar invariance model (Vandenberg & Lance, 2000), is tested. Here, the item intercepts as well as the factor loadings are constrained to be invariant across countries. When strong measurement invariance is established, the latent factor means are comparable across countries (Marsh et al., 2009; Meredith, 1993; Widaman & Reise, 1997). Next is the strict measurement invariance model (Meredith, 1993), which requires invariance of the item uniquenesses in addition to invariant factor loadings and item intercepts. Strict invariance justifies meaningful comparison of manifest scale scores across countries (Marsh et al., 2009; Meredith, 1993; Widaman & Reise, 1997); it implies equal reliability across groups and permits comparison of the variances of manifest variables.

Aims of the Present Investigation

The present investigation adopts a CFA approach to test (1) the multidimensionality of the English and Chinese versions of the SDQII and (2) measurement invariance, comparing the factor structure of the English and Chinese versions of the SDQII. Given the similar reliability and factor structure of the English and Chinese versions of the SDQII evident in previous research (Kong, 2000), it was predicted that the factor structure of the Chinese version of the SDQII would be invariant in comparison to responses to the English version, thus supporting the measurement invariance of this measure.

Method

Participants and Procedure

Australian Participants
Participants were 302 high school students (n = 102 in Year 7, n = 100 in Year 8, and n = 100 in Year 9) from two high schools in metropolitan Sydney, New South Wales, Australia. The age of the participants ranged from 11 to 16 years (M = 13.10, SD = 0.92). The sample comprised 152 males (50.3%) and 150 females (49.7%). Participants were from diverse socioeconomic backgrounds, ranging from working class (59%) to middle class families (41%), and the vast majority (> 90%) of students were Caucasian.

European Journal of Psychological Assessment 2016; Vol. 32(2):128–139

K. C. Leung et al.: Measurement Invariance of Chinese SDQII

Chinese Participants
Participants were 322 Chinese high school students (n = 100 in Year 7, n = 105 in Year 8, and n = 117 in Year 9) from a Chinese high school in Hong Kong. The participants were all Chinese and ranged in age from 11 to 16 years (M = 13.15, SD = 0.90). The sample comprised 155 males (48.1%) and 167 females (51.9%). Participants also came from diverse socioeconomic backgrounds, ranging from working class (57%) to middle class families (43%). Participants were asked to fill out the questionnaire during class. The Australian participants completed the English version of the SDQII (Marsh, 1992a), while the Chinese participants completed the Chinese version of the SDQII (Kong, 2000).

Materials

English Version of the SDQII
The SDQII was designed to measure 11 areas of the self-concept of adolescents (Marsh, 1992a; Marsh, Parker, et al., 1985). The SDQII comprises seven nonacademic self-concept scales (Opposite-Sex Relations, Same-Sex Relations, Parent Relations, Honesty-Trustworthiness, Emotional Stability, Physical Ability, Physical Appearance), three academic self-concept scales (Verbal, Mathematics, General School self-concept), and General self-concept. Responses to declarative sentence items are rated on a scale that ranges from 1 (false) to 6 (true). In the present investigation, the short form of the SDQII, consisting of 51 items, was used (see Appendix). Previous studies reported in the SDQII test manual (Marsh, 1992a) and subsequent research (see Leung et al., 2014; Marsh, 1990a, 1993; Marsh & Craven, 1997; Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Marsh, Parada, Craven, et al., 2004) demonstrated that each subscale of the SDQII has good reliability, and factor analyses have consistently shown the 11 distinct factors.

Chinese Version of the SDQII
The Chinese version of the SDQII (Kong, 2000; Marsh et al., 2000) was utilized in the present study. This Chinese version is a translation of the original 11 subscales of the English version of the SDQII, which was designed to measure 11 facets of the self-concept of adolescents (Marsh, 1992a). As for the English version, the short form of the SDQII, consisting of 51 items, was used for the Chinese version. Responses to declarative sentence items are rated on a scale that ranges from 1 (false) to 6 (true). The validity of the instrument has been well established,

including SDQ studies using a Chinese translation of the instrument (Kong, 2000; Marsh et al., 2000). The factor structure was shown to be distinct in that the factor loadings are substantially high on the target factors and the correlations among factors are reasonably low and comparable to the normative data reported in the SDQII test manual (Marsh, 1992a). Moreover, the convergent and discriminant validity of the measure was established by multitrait-multitime analysis, which demonstrated substantially high correlations between the same self-concept subscales on different occasions, whereas the correlations between different self-concept subscales on the same occasion were substantially lower and decreased further across test occasions (Kong, 2000). There is also strong evidence to support the factorial invariance of factor loadings, factor covariances, and variances in self-concept structure over time (Kong, 2000).

Statistical Analyses

Reliability of the SDQII
Cronbach's alpha estimate of reliability, which ranges from 0 to 1, was used in this study.

Factor Structure of the SDQII
Fifty-one items were used for the English and Chinese versions of the SDQII, respectively, and a 51 × 51 covariance matrix for the 11 correlated SDQ subscales for each version of the SDQII was constructed for CFA using LISREL 8.54 (Jöreskog & Sörbom, 2003) to examine the factor structure of the two versions. Robust maximum likelihood estimation was adopted for the models because of its robustness in correcting for nonnormality (Jöreskog & Sörbom, 2003). The Satorra-Bentler chi-square was also estimated (Satorra & Bentler, 1988, 1994, 2001), since it adjusts the goodness-of-fit chi-square for bias due to multivariate nonnormality by dividing the goodness-of-fit chi-square value for the model by a scaling correction factor that represents the degree of average multivariate kurtosis distorting the data. Goodness of fit of the models was assessed by evaluating various fit indices. The Tucker-Lewis Index (TLI), Relative Noncentrality Index (RNI), Comparative Fit Index (CFI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR), as well as the chi-square test statistic, were evaluated, as recommended by previous research (Bentler, 1990; Hu & Bentler, 1999; Kline, 2005; Marsh, Balla, & Hau, 1996; Marsh, Balla, & McDonald, 1988). The TLI, RNI, and CFI vary along a 0-to-1 continuum on which values greater than .90 and .95, respectively, are typically considered to reflect acceptable and excellent model fit (Bentler, 1990; Bentler & Bonett, 1980; Hu & Bentler, 1999). RMSEA values of less than .05 are taken to reflect



a good model fit, while values less than .08 suggest a reasonable model fit (Browne & Cudeck, 1993; see also Jöreskog & Sörbom, 1993). For SRMR, values below 0.10 suggest good model fit (Kline, 2005). In the present study, it was hypothesized that the a priori 11-correlated-factor model, with factors corresponding to the 11 subscales of the SDQII, would provide a good fit to the data. In addition to the hypothesized 11-correlated-factor model, a single-factor model in which all the measured variables across all scales loaded on one factor was also evaluated. The purpose of including single-factor models alongside the corresponding multifactorial models was to show that the multifactorial models fit the data much better than the corresponding single-factor models.

Measurement Invariance
As mentioned in the previous section, a series of invariance models was examined in the following sequence. In the first step, the configural invariance model (no invariance) is tested, followed by the metric or weak measurement invariance model (factor loadings invariant), then the strong measurement or scalar invariance model (item intercepts invariant as well as the factor loadings), and finally the strict measurement invariance model (item uniquenesses invariant in addition to factor loadings and item intercepts). The fit indices mentioned above were assessed to indicate whether there was support for the invariance of the parameters.
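The incremental fit indices used throughout this section can be computed directly from the target model's chi-square and the chi-square of the null (independence) model. The sketch below, in plain Python with made-up chi-square values purely for illustration (not the values from the present study), shows the standard formulas for TLI, RNI, CFI, and RMSEA.

```python
from math import sqrt

def fit_indices(chi2_m, df_m, chi2_null, df_null, n):
    """TLI, RNI, CFI, and RMSEA from the target and null-model chi-squares.

    chi2_m/df_m: target model; chi2_null/df_null: independence model;
    n: sample size (RMSEA here uses the n - 1 convention).
    """
    tli = ((chi2_null / df_null) - (chi2_m / df_m)) / ((chi2_null / df_null) - 1.0)
    rni = 1.0 - (chi2_m - df_m) / (chi2_null - df_null)  # unbounded version of CFI
    cfi = 1.0 - max(chi2_m - df_m, 0.0) / max(chi2_null - df_null, chi2_m - df_m, 0.0)
    rmsea = sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    return tli, rni, cfi, rmsea

# Illustrative (made-up) values: target chi2 = 150 on 100 df,
# null chi2 = 2000 on 120 df, N = 300
tli, rni, cfi, rmsea = fit_indices(150.0, 100, 2000.0, 120, 300)
print(f"TLI={tli:.3f} RNI={rni:.3f} CFI={cfi:.3f} RMSEA={rmsea:.3f}")
# TLI=0.968 RNI=0.973 CFI=0.973 RMSEA=0.041
```

Note that RNI and CFI coincide whenever the target model fits better than chance (the usual case); CFI simply bounds the noncentrality estimates at zero.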
This involves comparing the more restrictive models with the less restrictive models: the configural invariance model (no invariance) versus the metric measurement invariance model (factor loadings invariant); the metric measurement invariance model versus the scalar invariance model (item intercepts invariant as well as the factor loadings); and the scalar invariance model versus the strict measurement invariance model (item uniquenesses invariant in addition to factor loadings and item intercepts). If adding invariance constraints (a more restrictive model) leads to a substantial decline in fit indices in comparison with the less restrictive model, the invariance constraints are not supported. Conversely, if imposing invariance constraints does not lead to a substantial decrement in fit indices, the invariance constraints are established. In evaluating the magnitude of the change in fit indices, Cheung and Rensvold (2001, 2002) and Chen (2007) suggested that the more constrained model is supported if the decrement in a fit index (e.g., TLI, RNI, and CFI) for the constrained model is less than .01. Chen (2007) and Chen, Curran, Bollen, Kirby, and Paxton (2008) suggested that there is good support for the more parsimonious model when the RMSEA increases by less than .015. For the chi-square difference, scaled chi-square difference tests (Bryant & Satorra, 2012; Satorra & Bentler,



2001, 2010) were conducted to compare each less restrictive model with its more restrictive counterpart. If there is no significant change in chi-square, there is reasonable support for the more restrictive model. Hence, in the present investigation, measurement invariance tests of the English and Chinese versions of the SDQII involved assessing the fit indices for a series of invariance models.
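A scaled difference test cannot be obtained by simply subtracting two Satorra-Bentler chi-squares; the Satorra-Bentler (2001) procedure pools the scaling correction factors of the two nested models. A minimal sketch, with made-up statistics purely for illustration:

```python
def scaled_chi2_diff(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler (2001) scaled chi-square difference statistic.

    t0, df0, c0: uncorrected ML chi-square, degrees of freedom, and scaling
    correction factor of the MORE restrictive (nested) model; t1, df1, c1:
    the same quantities for the less restrictive model. The statistic is
    referred to a chi-square distribution with df0 - df1 degrees of freedom.
    (It can be negative in small samples; Satorra & Bentler, 2010, give a
    strictly positive variant.)
    """
    ddf = df0 - df1
    cd = (df0 * c0 - df1 * c1) / ddf  # pooled scaling correction
    return (t0 - t1) / cd, ddf

# Made-up chi-squares and scaling corrections, for illustration only
trd, ddf = scaled_chi2_diff(t0=3600.0, df0=2389, c0=1.20,
                            t1=3400.0, df1=2338, c1=1.18)
print(round(trd, 2), ddf)  # → 94.48 51
```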

Results

Descriptive Statistics for Australian and Chinese Samples
Descriptive statistics, including means, standard deviations, kurtosis, skewness, standard errors of the mean, and reliability estimates for each of the self-concept scales, are shown in Table 1. The results suggest that the English version of the SDQII has reasonably high reliability for each self-concept scale. The coefficient alpha estimates of reliability range from .77 to .90; most were greater than .80, with a median of .84 (see Table 1). Similarly, the coefficient alpha estimates of reliability for each scale of the Chinese version of the SDQII were reasonable, ranging from .74 to .86, with most greater than .75 and a median of .76 (see Table 1). Most of the reliability estimates are higher for the Australian sample than for the Chinese sample.
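Coefficient alpha as reported in Table 1 can be computed directly from a respondents-by-items score matrix. The sketch below, in plain Python/NumPy with a tiny made-up example rather than the study's data, implements the standard formula: k/(k − 1) times one minus the ratio of the summed item variances to the variance of the total score.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the scale totals
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Two perfectly correlated items yield an alpha of exactly 1.0
print(cronbach_alpha(np.array([[1., 1.], [2., 2.], [3., 3.]])))  # → 1.0
```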

Factor Structure for Each Version of the SDQII

Multidimensionality of the SDQII (English Version)
The results of the present study support the multidimensionality of the English version of the SDQII in that there are 11 distinct factors. Compared with the 1-factor model, the 11-factor model provides the better goodness-of-fit indices (see Table 2). The TLI, RNI, and CFI values indicate an excellent fit because all three were higher than .95; the RMSEA was less than .05, reflecting a good fit; and the SRMR was smaller than 0.10, indicating good model fit. These results suggest that the 11-factor model provides a reasonable fit to the data. The target factor loadings are reasonable (most > .50), ranging from .35 to .93 in this study. The correlations among the factors are modest except for the modestly high correlation between Mathematics and Verbal (see Table 3a). Nevertheless, the typically modest correlations among the SDQII factors provide preliminary support for the discriminant validity of the factors. In sum, the results demonstrate patterns similar to those found in previous SDQ research and support the notion that the 11 subscales are distinguishable from each other





Table 1. Descriptive statistics for the SDQII in Australian and Chinese samples

Australian sample
Subscale    M     SD    Kurt   Skew    N     a    SEM
Phy        4.24  1.31   0.68   0.49   302   .80   .08
Appr       3.52  1.34   0.71   0.19   301   .84   .08
Osex       4.04  1.37   0.57   0.49   301   .84   .08
Ssex       4.99  1.02   1.73   1.29   302   .83   .06
Hons       4.62  0.96   0.47   0.48   302   .77   .06
Prnt       5.11  1.03   1.23   1.31   302   .83   .06
Emot       3.96  1.24   0.72   0.24   302   .79   .07
Genl       4.84  0.91   0.14   0.83   302   .82   .05
Math       3.81  1.40   0.82   0.22   302   .90   .08
Verb       4.10  1.38   0.72   0.44   302   .89   .08
Schl       4.38  1.22   0.47   0.54   302   .85   .07

Chinese sample
Subscale    M     SD    Kurt   Skew    N     a    SEM
Phy        3.52  1.34   0.86   0.02   322   .83   .08
Appr       3.06  1.03   0.14   0.14   322   .76   .06
Osex       3.92  1.11   0.35   0.17   320   .74   .06
Ssex       4.67  0.96   0.96   0.84   322   .74   .05
Hons       4.01  0.94   0.09   0.19   322   .76   .05
Prnt       4.38  1.10   0.03   0.59   322   .76   .06
Emot       3.83  1.18   0.36   0.17   322   .80   .07
Genl       3.99  0.91   0.11   0.09   322   .75   .05
Math       3.13  1.35   0.70   0.21   322   .86   .08
Verb       3.39  1.23   0.65   0.17   322   .84   .07
Schl       3.48  1.09   0.06   0.04   322   .79   .06

Notes. SEM = standard error of the mean; Kurt = Kurtosis; Skew = Skewness; a = coefficient alpha estimate of reliability; Phy = Physical Ability; Appr = Physical Appearance; Osex = Opposite-Sex Relations; Ssex = Same-Sex Relations; Hons = Honesty-Trustworthiness; Prnt = Parent Relations; Emot = Emotional Stability; Genl = General self-concept; Math = Mathematics; Verb = Verbal; Schl = General School self-concept.

Table 2. Goodness-of-fit summary for alternative models of the SDQII subscales for Australian and Chinese samples

Model                   χ²         df     TLI    RNI    CFI   RMSEA (90% CI)     SRMR
1. Australian sample
 a. 11-factor model   1,520.44   1,169   .975   .977   .96   .035 (.030–.040)   .062
 b. 1-factor model    5,752.16   1,224   .690   .703   .79   .120 (.120–.130)   .120
2. Chinese sample
 a. 11-factor model   1,551.92   1,169   .943   .948   .91   .038 (.033–.043)   .069
 b. 1-factor model    4,346.89   1,224   .534   .553   .65   .110 (.110–.110)   .120

Notes. TLI = Tucker-Lewis Index; RNI = Relative Non-Centrality Index; CFI = Comparative Fit Index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root Mean Square Residual; CI = Confidence Interval.

such that the multidimensionality of the self-concept scales of the English version of the SDQII is demonstrated.

Multidimensionality of the SDQII (Chinese Version)
Compared with the 1-factor model, the 11-factor model provides the better goodness-of-fit indices (see Table 2). The TLI and RNI values indicate a good fit because both were close to .95, and the CFI indicates an acceptable model fit because it falls in the range of .90–.95. The RMSEA was less than .05, while the SRMR was smaller than 0.10, indicating good model fit. These results suggest that the 11-factor model provides a reasonable fit to the data. The target factor loadings were reasonable (most > .50), ranging from .46 to .91 in this study. The typically modest correlations among the SDQII factors provide clear support for the discriminant validity of the factors (see Table 3b). In sum, the results demonstrate patterns consistent with the findings of previous SDQ studies and support the multidimensionality of the self-concept scales of the Chinese


version of the SDQII in that the 11 subscales are distinct factors.

Measurement Invariance Over Two Versions of the SDQII

Measurement invariance testing of the SDQII in the present study involved assessing the fit indices for four models across the Chinese and Australian samples. The first model (configural invariance) contained no invariance constraints across the two samples (NO IN); the second model (weak measurement invariance) constrained the factor loadings to be invariant (LOAD = IN); the third model (strong measurement invariance) constrained the factor loadings and item intercepts to be invariant (LOAD = IN; INTERCEPT = IN); and the fourth model (strict measurement invariance) held the factor loadings, item intercepts, and uniquenesses invariant (LOAD = IN; INTERCEPT = IN; UN = IN). The results for these four models across the two samples are presented in Table 4. The first





Table 3a. CFA correlations among 11 subscales of the SDQII in Australian sample

Subscale   Phy      Appr     Osex     Ssex     Hons     Prnt     Emot     Genl     Math     Verb     Schl
Phy        –
Appr       .31***   –
Osex       .27***   .54***   –
Ssex       .31***   .38***   .37***   –
Hons       .05      .07      .09      .20***   –
Prnt       .27***   .37***   .09      .25***   .29***   –
Emot       .19*     .20*     .30***   .26**    .27***   .19*     –
Genl       .36***   .53***   .37***   .44***   .38***   .51***   .40***   –
Math       .23***   .27*     .14*     .19**    .20**    .23**    .21**    .55***   –
Verb       .23**    .22**    .15*     .34***   .28***   .21**    .29**    .65***   .50***   –
Schl       .31***   .37***   .24**    .34***   .27***   .29***   .32***   .80***   .74***   .86***   –

Notes. Phy = Physical Ability; Appr = Physical Appearance; Osex = Opposite-Sex Relations; Ssex = Same-Sex Relations; Hons = Honesty-Trustworthiness; Prnt = Parent Relations; Emot = Emotional Stability; Genl = General self-concept; Math = Mathematics; Verb = Verbal; Schl = General School self-concept. *p < .05. **p < .01. ***p < .001.

Table 3b. CFA correlations among 11 subscales of the SDQII in Chinese sample

Subscale   Phy      Appr     Osex     Ssex     Hons     Prnt     Emot     Genl     Math     Verb     Schl
Phy        –
Appr       .31***   –
Osex       .26**    .45***   –
Ssex       .26**    .27**    .62***   –
Hons       .06      .19*     .03      .32***   –
Prnt       .03      .20*     .04      .22***   .17      –
Emot       .18*     .19*     .24**    .44**    .21*     .21*     –
Genl       .35***   .51***   .50***   .54***   .38***   .37***   .42***   –
Math       .17*     .20*     .00      .04      .20*     .22**    .13      .37***   –
Verb       .05      .26**    .09      .14      .20*     .13      .06      .36***   .01      –
Schl       .13      .36***   .23**    .21*     .27**    .19*     .24**    .59***   .65***   .22**    –

Notes. Phy = Physical Ability; Appr = Physical Appearance; Osex = Opposite-Sex Relations; Ssex = Same-Sex Relations; Hons = Honesty-Trustworthiness; Prnt = Parent Relations; Emot = Emotional Stability; Genl = General self-concept; Math = Mathematics; Verb = Verbal; Schl = General School self-concept. *p < .05. **p < .01. ***p < .001.

model with no invariance (TLI = .950; RNI = .954; CFI = .94; RMSEA = .043; SRMR = .067) served as the baseline for comparison with the more restrictive models that impose invariance constraints. Imposing constraints on the factor loadings in Model 2 resulted in slightly lower fit indices (TLI = .947; RNI = .950; CFI = .94; RMSEA = .044; SRMR = .079) than the no-invariance model (Model 1), with only a slight change in fit indices (ΔTLI = −.003; ΔRNI = −.004; ΔCFI = .000; ΔRMSEA = +.001; ΔSRMR = +.012). The scaled chi-square difference test revealed that the change in chi-square was significant (Model 1 vs. Model 2: scaled chi-square difference = 1,163.35, df = 51, p < .001). The results provide strong support for the invariance of the factor loadings, since the decrease in fit indices (e.g., TLI, RNI, and CFI) for Model 2 is less than .01 and the RMSEA increases by less than .015, although the change in the Satorra-Bentler chi-square is statistically significant.

Similarly, the fit indices in Model 3 (TLI = .941; RNI = .943; CFI = .93; RMSEA = .047; SRMR = .078) were slightly lower than those in Model 2 after constraining the item intercepts in addition to the factor loadings, and there was only a slight change in fit indices (ΔTLI = −.006; ΔRNI = −.007; ΔCFI = −.010; ΔRMSEA = +.003; ΔSRMR = −.001). The scaled chi-square difference test revealed that the change in chi-square was significant (Model 2 vs. Model 3: scaled chi-square difference = 639.57, df = 40, p < .001). These results also provide reasonable support for the invariance of the item intercepts and factor loadings, since the decrease in fit indices (e.g., TLI, RNI, and CFI) for Model 3 is not greater than .01 and the RMSEA increases by less than .015, although the change in the Satorra-Bentler chi-square is statistically significant. In order to estimate the latent means for the two samples, it is necessary to set the latent means for one group to zero to serve as a reference for comparison




Table 4. Measurement invariance tests for the English and Chinese versions of the SDQII

Model                                            χ²         df     TLI    RNI    CFI   RMSEA (90% CI)     SRMR
Model 1: No IN                                 3,373.89   2,338   .950   .954   .94   .043 (.040–.046)   .067
Model 2: LOAD = IN                             3,504.08   2,389   .947   .950   .94   .044 (.041–.047)   .079
Model 3: LOAD = IN; INTERCEPT = IN             3,697.62   2,429   .941   .943   .93   .047 (.044–.050)   .078
Model 4: LOAD = IN; INTERCEPT = IN; Uniq = IN  4,193.93   2,480   .921   .923   .92   .054 (.051–.057)   .082

Notes. TLI = Tucker-Lewis Index; RNI = Relative Non-Centrality Index; CFI = Comparative Fit Index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root Mean Square Residual; CI = Confidence Interval; IN = invariant; LOAD = factor loadings; Uniq = uniquenesses.

with other groups, since a latent variable is unobservable without an intrinsic scale (Byrne, 1998, 2012; Jöreskog & Sörbom, 1993). In the present investigation, the Australian sample was designated as the reference group and its factor means were constrained to zero. Hence, latent means were estimated for the Chinese sample relative to the Australian sample. The results indicate small but systematic differences in favor of the Australian students (see Table 5). However, for Model 4, the fit indices (TLI = .921; RNI = .923; CFI = .92; RMSEA = .054; SRMR = .082) were much lower than for Model 3 when the uniquenesses were held invariant in addition to the factor loadings and item intercepts. Furthermore, there was a considerable change in fit indices (ΔTLI = −.020; ΔRNI = −.020; ΔCFI = −.010; ΔRMSEA = +.007; ΔSRMR = +.004). The scaled chi-square difference test revealed that the change in chi-square was significant (Model 3 vs. Model 4: scaled chi-square difference = 869.66, df = 51, p < .001).

Table 5. Estimated latent means (unstandardized) and standard errors for the SDQII in Australian and Chinese samples

Subscale   Chinese sample    Australian sample
Phy        −0.63* (0.31)     0.00
Appr       −0.47 (0.37)      0.00
Osex       −0.18 (0.38)      0.00
Ssex       −0.34 (0.36)      0.00
Hons       −0.79* (0.32)     0.00
Prnt       −0.81** (0.31)    0.00
Emot       −0.06 (0.34)      0.00
Genl       −1.07** (0.33)    0.00
Math       −0.54 (0.31)      0.00
Verb       −0.67* (0.29)     0.00
Schl       −0.87** (0.31)    0.00

Notes. Phy = Physical Ability; Appr = Physical Appearance; Osex = Opposite-Sex Relations; Ssex = Same-Sex Relations; Hons = Honesty-Trustworthiness; Prnt = Parent Relations; Emot = Emotional Stability; Genl = General self-concept; Math = Mathematics; Verb = Verbal; Schl = General School self-concept. Values inside brackets indicate the standard error. *p < .05. **p < .01.

These results do not provide support for the invariance of the uniquenesses, since the TLI and RNI dropped by more than .01 and the change in the Satorra-Bentler chi-square was statistically significant, although the RMSEA increased by less than .015 and the CFI decreased by no more than .01. In sum, these measurement invariance tests indicated that the factor structure (factor loadings and item intercepts) is invariant across the Chinese and English versions of the SDQII, while the uniquenesses are not invariant across the two versions.
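These decision rules can be checked mechanically against the fit indices in Table 4. The sketch below, in plain Python, encodes the published TLI, RNI, CFI, and RMSEA values and flags, for each successive pair of nested models, whether the more restrictive model is retained under the guidelines used here (TLI/RNI/CFI drop of no more than .01, RMSEA rise of less than .015; Cheung & Rensvold, 2002; Chen, 2007).

```python
# (TLI, RNI, CFI, RMSEA) for the four nested models, copied from Table 4
models = {
    "configural": (0.950, 0.954, 0.94, 0.043),
    "metric":     (0.947, 0.950, 0.94, 0.044),
    "scalar":     (0.941, 0.943, 0.93, 0.047),
    "strict":     (0.921, 0.923, 0.92, 0.054),
}

def supported(less_restrictive, more_restrictive):
    """Retain the more restrictive model if TLI/RNI/CFI drop by no more
    than .01 and RMSEA rises by less than .015 (small float tolerance)."""
    drops = [l - m for l, m in zip(less_restrictive[:3], more_restrictive[:3])]
    rmsea_rise = more_restrictive[3] - less_restrictive[3]
    return all(d <= 0.01 + 1e-9 for d in drops) and rmsea_rise < 0.015

names = list(models)
for prev, curr in zip(names, names[1:]):
    verdict = "retain" if supported(models[prev], models[curr]) else "reject"
    print(f"{prev} -> {curr}: {verdict}")
# configural -> metric: retain
# metric -> scalar: retain
# scalar -> strict: reject
```

This reproduces the pattern of results reported above: loadings and intercepts invariant, uniquenesses not.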

Discussion

Previous research has provided clear support for the reliability and validity of both the English and Chinese versions of the SDQII; however, no research thus far had tested the measurement invariance between the Chinese and English versions of the SDQII. Hence, the present investigation is the first to conduct measurement invariance testing of the Chinese and English versions of the SDQII and contributes uniquely to self-concept research by providing clear support for the measurement invariance of these two versions of the instrument. Since the a priori 11-factor model fit the data well in separate analyses of the Australian and Chinese data, it provided a sound basis for the subsequent measurement invariance tests. In the measurement invariance tests, the results showed that the factor structure (factor loadings and item intercepts) of the Chinese version of the SDQII is invariant in comparison to responses to the English version of the short form of the SDQII. These results imply that researchers can confidently utilize the English and Chinese versions of the SDQII with Australian and Chinese samples, separately and cross-culturally. Hence, future research can examine the cultural generalizability of self-concept findings across samples from English-speaking countries and Chinese cultures. However, the present study showed that manifest means were not invariant between the Chinese and English versions of the SDQII, since most of the reliability estimates are higher for the Australian sample than for the



Chinese sample. These differences make it inappropriate to compare Chinese and Australian samples on manifest scores not corrected for unreliability; comparisons should instead be made on latent variables. Marsh et al. (2013) found similar results when comparing manifest means on mathematics and science motivational scales, including self-concept, between Arab and Anglo countries based on the Trends in International Mathematics and Science Study (TIMSS) 2007 data: manifest means were not invariant, since most of the reliability estimates were higher for the Anglo sample. Of particular interest is the modestly high correlation between Mathematics and Verbal in the English version of the SDQII. This is consistent with the patterns reported in a study based on archival data (Marsh et al., 2005) and other SDQ findings (e.g., Bodkin-Andrews, Ha, Craven, & Yeung, 2010; Marsh, 1990c, 1992c; Marsh & Craven, 1991). These results indicate that these two school subjects are closely related in the Australian sample in the present investigation. Also, there is support for item intercept invariance between the Chinese and English versions of the SDQII (see Model 3). There were small but systematic differences in favor of the Australian students (see Table 5). Previous studies have also shown that US self-concepts are much higher than those in China (Shen & Pedulla, 2000; Shen & Tam, 2006). These results may be explained by the fact that collectivism and humbleness are emphasized in Chinese culture under the influence of Confucian values, whereas individualism and the value of outperforming others are emphasized in Western culture (Kurman, 2003; Li, Zhang, Bhatt, & Yum, 2006; Wang, 2001). Hence, Chinese students may downgrade themselves in relation to others in comparison with American students (Kurman, 2003; Stigler, Smith, & Mao, 1985; Uttal, Lummis, & Stevenson, 1988).
However, even if there were sound evidence supporting the invariance of item intercepts, Marsh et al. (2006) argued that it may not be appropriate to analyze mean-level differences across cultures, since cultural differences in modesty and self-enhancement (Kurman, 2003) might affect the mean responses to items. As illustrated in the Program for International Student Assessment (PISA) study (Artelt, Baumert, Julius-McElvany, & Peschar, 2003), a Korean student who claims to work hard in his or her studies may not be comparable to a Brazilian student who also claims to work hard, since the cultural norms regarding modesty and self-assertion differ: the former comes from a culture that emphasizes humbleness and collectivism, while the latter comes from a culture that emphasizes outperforming others. In other words, children from different cultures express self-evaluations in very different ways. Marsh et al. (2013) likewise revealed, based on the international TIMSS 2007 data, that students from Arab countries (Saudi Arabia, Jordan, Oman, and Egypt) displayed higher self-concepts than students from Anglo countries (United States, Australia, England, and Scotland), since Arab students have been socialized to be less critical of themselves in school and



family (Abu-Hilal, 2001) and hence evaluate themselves more highly than students from Anglo countries. Thus, cross-cultural investigation of self-reported attributes (e.g., self-concept, student approaches to learning) is vulnerable to problems of comparability across cultures (e.g., Bempechat, Jimenez, & Boulay, 2002; Heine, Lehman, Markus, & Kitayama, 1999; Van de Vijver & Leung, 1997). Hence, group means from the two countries (Australia and China) in the present investigation cannot be compared to generate meaningful interpretations. Marsh et al. (2006) emphasized that although there was no evidence to support the invariance of item intercepts across the 57 countries participating in PISA 2006, this is not problematic if the relations among variables, and theoretical models of these relations within each country, are the key focus, since differences in modesty and self-enhancement across cultures might affect the item means while leaving the relations among variables and their relations to the underlying constructs unaffected. However, item intercept invariance is a critical issue when the rank ordering of countries on the mean levels of these constructs is the main focus. Hence, the Chinese version of the SDQII is most appropriately used for studies of Chinese students, or for comparing relations among variables between Chinese and Australian samples, but perhaps not for comparing means between Chinese and Australian samples.

Limitations and Future Research

While the findings of this study provide sound evidence to support the measurement invariance of the English and Chinese versions of the SDQII, a major limitation of the present study is that the sample size was small relative to the large number of parameters estimated in the CFA models, which leads to less powerful tests of measurement invariance. Although appropriate estimation procedures (robust maximum likelihood) and appropriate fit indices were used to account for the small sample, a larger sample would be more desirable because it would allow examination of the generalizability of the findings to a larger population. Nevertheless, the present study has provided strong support for the scalar measurement invariance of the English and Chinese versions of the SDQII considered here.

References

Abu-Hilal, M. M. (2001). Correlates of achievement in the United Arab Emirates: A sociocultural study. In D. M. McInerney & S. Van Etten (Eds.), Research on sociocultural influences on motivation and learning (Vol. 1, pp. 205–230). Greenwich, CT: Information Age.
Artelt, C., Baumert, J., Julius-McElvany, N., & Peschar, J. (2003). Learners for life: Student approaches to learning. Results from PISA 2000. Paris, France: OECD.

European Journal of Psychological Assessment 2016; Vol. 32(2):128–139



K. C. Leung et al.: Measurement Invariance of Chinese SDQII

Bempechat, J., Jimenez, N. V., & Boulay, B. A. (2002). Cultural-cognitive issues in academic achievement: New directions for cross-national research. In A. C. Porter & A. Gamoran (Eds.), Methodological advances in cross-national surveys of educational achievement (pp. 117–149). Washington, DC: National Academic Press.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.
Bodkin-Andrews, G. H., Ha, M. T., Craven, R. G., & Yeung, A. S. (2010). Factorial invariance testing and latent mean differences for the Self-Description Questionnaire II (short version) with indigenous and non-indigenous Australian secondary school students. International Journal of Testing, 9, 47–79. doi: 10.1080/15305050903352065
Boyle, G. I. (1994). Self-Description Questionnaire II: A review. Test Critiques, 10, 632–643.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Bryant, F. B., & Satorra, A. (2012). Principles and practice of scaled difference chi-square testing. Structural Equation Modeling, 19, 372–398.
Byrne, B. M. (1984). The general/academic self-concept nomological network: A review of construct validation research. Review of Educational Research, 54, 427–456.
Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic concepts, applications, and programming. Mahwah, NJ: Erlbaum.
Byrne, B. M. (2012). Structural equation modeling with Mplus: Basic concepts, applications, and programming. New York, NY: Routledge Academic.
Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14, 464–504. doi: 10.1080/10705510701301834
Chen, F., Curran, P. J., Bollen, K. A., Kirby, J., & Paxton, P. (2008). An empirical evaluation of the use of fixed cutoff points in RMSEA test statistic in structural equation models. Sociological Methods & Research, 36, 462–494. doi: 10.1177/0049124108314720
Cheung, G. W., & Rensvold, R. B. (2001). The effects of model parsimony and sampling error on the fit of structural equation models. Organizational Research Methods, 4, 236–264. doi: 10.1177/109442810143004
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255. doi: 10.1207/S15328007SEM0902_5
Harter, S. (1982). The perceived competence scale for children. Child Development, 53, 87–97.
Hattie, J. (1992). Self-concept. Hillsdale, NJ: Erlbaum.
Heine, S. J., Lehman, D. R., Markus, H. R., & Kitayama, S. (1999). Is there a universal need for positive self-regard? Psychological Review, 106, 766–794.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language [Computer software]. Chicago, IL: Scientific Software International.
Jöreskog, K. G., & Sörbom, D. (2003). LISREL 8.54 [Computer software]. Chicago, IL: Scientific Software International.
Kaminski, P. L., Shafer, M. E., Neumann, C. S., & Ramos, V. (2006). Self-concept in Mexican American girls and boys: Validating the Self-Description Questionnaire-I. Cultural Diversity & Ethnic Minority Psychology, 11, 321–338. doi: 10.1037/1099-9809.11.4.321
Kline, R. B. (2005). Principles and practices of structural equation modeling. New York, NY: Guilford.
Kong, C. K. (2000). Chinese students' self-concept: Structure, frame of reference, and relation with academic achievement. Dissertation Abstracts International Section A: Humanities & Social Sciences, 61, 880.
Kurman, J. (2003). Why is self-enhancement low in certain collectivist cultures? An investigation of two competing explanations. Journal of Cross-Cultural Psychology, 34, 496–510. doi: 10.1177/0022022103256474
Leach, L. F., Henson, R. K., Odom, L. R., & Cagle, L. S. (2006). A reliability generalization study of the Self-Description Questionnaire. Educational and Psychological Measurement, 66, 285–304.
Lee, J., Little, T. D., & Preacher, K. J. (2011). Methodological issues in using structural equation models for testing differential item functioning. In E. Davidov, P. Schmitt, & J. Billiet (Eds.), Cross-cultural analysis: Methods and applications (Vol. 1, pp. 55–85). New York, NY: Routledge.
Leung, K. C., Marsh, H. W., Craven, R. G., Yeung, A. S., & Abduljabbar, A. S. (2013). Domain specificity between peer support and self-concept. Journal of Early Adolescence, 33, 227–244.
Leung, K. C., Marsh, H. W., Yeung, A. S., & Abduljabbar, A. S. (2014). Validity of social, moral and emotional facets of Self Description Questionnaire II. Journal of Experimental Education. doi: 10.1080/00220973.2013.876229
Li, H. Z., Zhang, Z., Bhatt, G., & Yum, Y. O. (2006). Rethinking culture and self-construal: China as a middle land. Journal of Social Psychology, 146, 591–610.
Marsh, H. W. (1990a). A multidimensional, hierarchical self-concept: Theoretical and empirical justification. Educational Psychology Review, 2, 77–172.
Marsh, H. W. (1990b). Self-Description Questionnaire (SDQ) I: A theoretical and empirical basis for the measurement of multiple dimensions of preadolescent self-concept: A test manual and a research monograph. Sydney, Australia: University of Western Sydney.
Marsh, H. W. (1990c). The structure of academic self-concept: The Marsh/Shavelson model. Journal of Educational Psychology, 82, 623–636.
Marsh, H. W. (1992a). Self-Description Questionnaire (SDQ) II: Manual. Sydney, Australia: University of Western Sydney. (Original work published 1990).
Marsh, H. W. (1992b). Self-Description Questionnaire (SDQ) III: A theoretical and empirical basis for the measurement of multiple dimensions of late adolescent self-concept: A test manual and a research monograph. Sydney, Australia: University of Western Sydney.
Marsh, H. W. (1992c). The content specificity of relations between academic achievement and academic self-concept. Journal of Educational Psychology, 84, 35–42.
Marsh, H. W. (1993). Academic self-concept: Theory, measurement, and research. In J. Suls (Ed.), Psychological perspectives on the self (Vol. 4, pp. 59–98). Hillsdale, NJ: Erlbaum.
Marsh, H. W. (1994). Using the National Longitudinal Study of 1988 to evaluate theoretical models of self-concept: The Self-Description Questionnaire. Journal of Educational Psychology, 80, 439–456.
Marsh, H. W., Abduljabbar, A. S., Abu-Hilal, M. M., Morin, A. J. S., Abdelfattah, F., Leung, K. C., . . . Parker, P. (2013). Factorial, convergent, and discriminant validity of TIMSS math and science motivation measures: A comparison of Arab and Anglo-Saxon countries. Journal of Educational Psychology, 105, 108–128. doi: 10.1037/a0029907



Marsh, H. W., Balla, J. R., & Hau, K. T. (1996). An evaluation of incremental fit indices: A clarification of mathematical and empirical processes. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling techniques (pp. 315–353). Hillsdale, NJ: Erlbaum.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391–410.
Marsh, H. W., Barnes, J., & Hocevar, D. (1985). Self-other agreement on multidimensional self-concept ratings: Factor analysis and multitrait-multimethod analysis. Journal of Personality and Social Psychology, 49, 1360–1377.
Marsh, H. W., Byrne, B. M., & Shavelson, R. (1988). A multifaceted academic self-concept: Its hierarchical structure and its relation to academic achievement. Journal of Educational Psychology, 80, 366–380.
Marsh, H. W., & Craven, R. G. (1991). Self-other agreement on multiple dimensions of preadolescent self-concept: Inferences by teachers, mothers, and fathers. Journal of Educational Psychology, 83, 393–404.
Marsh, H. W., & Craven, R. G. (1997). Academic self-concept: Beyond the dustbowl. In G. Phye (Ed.), Handbook of classroom assessment: Learning, achievement, and adjustment (pp. 131–198). Orlando, FL: Academic Press.
Marsh, H. W., & Craven, R. G. (2006). Reciprocal effects of self-concept and performance from a multidimensional perspective: Beyond seductive pleasure and unidimensional perspectives. Perspectives on Psychological Science, 1, 133–163.
Marsh, H. W., Craven, R. G., & Debus, R. (1991). Self-concepts of young children 5 to 8 years of age: Measurement and multidimensional structure. Journal of Educational Psychology, 83, 377–392.
Marsh, H. W., Ellis, L. A., & Craven, R. G. (2002). How do preschool children feel about themselves? Unraveling measurement and multidimensional self-concept structure. Developmental Psychology, 38, 376–393.
Marsh, H. W., Ellis, L. A., Parada, R. H., Richards, G., & Heubeck, B. G. (2005). A short version of the Self Description Questionnaire II: Operationalizing criteria for short-form evaluation with new applications of confirmatory factor analyses. Psychological Assessment, 17, 81–102.
Marsh, H. W., Hau, K.-T., Artelt, C., Baumert, J., & Peschar, J. (2006). OECD's brief self-report measure of educational psychology's most useful affective constructs: Cross-cultural, psychometric comparisons across 25 countries. International Journal of Testing, 6, 311–360. doi: 10.1207/s15327574ijt0604_1
Marsh, H. W., Kong, C. K., & Hau, K. T. (2000). Longitudinal multilevel models of the big-fish-little-pond effect on academic self-concept: Counterbalancing contrast and reflected-glory effects in Hong Kong schools. Journal of Personality and Social Psychology, 78, 337–349.
Marsh, H. W., Muthén, B. O., Asparouhov, T., Lüdtke, O., Robitzsch, A., Morin, A. J. S., & Trautwein, U. (2009). Exploratory structural equation modeling, integrating CFA and EFA: Application to students' evaluations of university teaching. Structural Equation Modeling, 16, 439–476.
Marsh, H. W., & O'Mara, A. (2008). Reciprocal effects between academic self-concept, self-esteem, achievement, and attainment over seven adolescent years: Unidimensional and multidimensional perspectives of self-concept. Personality and Social Psychology Bulletin, 34, 542–552. doi: 10.1177/0146167207312313
Marsh, H. W., Parada, R. H., & Ayotte, V. (2004). A multidimensional perspective of relations between self-concept (Self Description Questionnaire II) and adolescent mental health (Youth Self Report). Psychological Assessment, 16, 27–41.


Marsh, H. W., Parada, R. H., Craven, R. G., & Finger, L. (2004). In the looking glass: A reciprocal effects model elucidating the complex nature of bullying, psychological determinants and the central role of self-concept. In C. S. Sanders & G. D. Phye (Eds.), Bullying: Implications for the classroom (pp. 63–106). Orlando, FL: Academic Press.
Marsh, H. W., Parker, J., & Barnes, J. (1985). Multidimensional adolescent self-concepts: Their relationship to age, sex and academic measures. American Educational Research Journal, 22, 422–444.
Marsh, H. W., Plucker, J. A., & Stocking, V. B. (2001). The Self-Description Questionnaire II and gifted students: Another look at Plucker, Taylor, Callahan, and Tomchin's (1997) "Mirror, mirror on the wall". Educational and Psychological Measurement, 61, 976–996.
Marsh, H. W., & Shavelson, R. J. (1985). Self-concept: Its multifaceted, hierarchical structure. Educational Psychologist, 20, 107–125.
Marsh, H. W., Smith, I. D., & Barnes, J. (1985). Multidimensional self-concepts: Relations with sex and academic achievement. Journal of Educational Psychology, 77, 581–596.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525–543.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In ASA 1988 Proceedings of the Business and Economic Statistics Section (pp. 308–313). Alexandria, VA: American Statistical Association.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419). Thousand Oaks, CA: Sage.
Satorra, A., & Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment structure analysis. Psychometrika, 66, 507–514.
Satorra, A., & Bentler, P. M. (2010). Ensuring positiveness of the scaled difference chi-square test statistic. Psychometrika, 75, 243–248.
Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Self-concept: Validation of construct interpretations. Review of Educational Research, 46, 407–441.
Shavelson, R. J., & Marsh, H. W. (1986). On the structure of self-concept. In R. Schwarzer (Ed.), Anxiety and cognitions (pp. 305–330). Hillsdale, NJ: Erlbaum.
Shen, C., & Pedulla, J. J. (2000). The relationship between students' achievement and their self-perception of competence and rigour of mathematics and science: A cross-national analysis. Assessment in Education, 7, 237–253.
Shen, C., & Tam, H. P. (2006). The paradoxical relationship between students' achievement and their self-perceptions: A cross-national analysis based on three waves of TIMSS data. In H. Wagemaker (Ed.), The Second IEA International Research Conference: Proceedings of the IRC-2006, Vol. 1: Trends in International Mathematics and Science Study (TIMSS) (pp. 43–60). Amsterdam, The Netherlands: International Association for the Evaluation of Educational Achievement.
Stigler, J. W., Smith, S., & Mao, L. W. (1985). The self-perception of competence by Chinese children. Child Development, 56, 1259–1270.
Uttal, D. H., Lummis, M., & Stevenson, H. W. (1988). Low and high mathematics achievement in Japanese, Chinese, and American elementary-school children. Developmental Psychology, 24, 335–342.
Van de Vijver, F., & Leung, K. (1997). Methods and data analysis of comparative research. In J. W. Berry, Y. H. Poortinga, & J. Pandey (Eds.), Handbook of cross-cultural psychology: Theory and method (Vol. 1, pp. 257–300). Needham Heights, MA: Allyn and Bacon.




Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4–70.
Von der Luft, G., Harman, L. B., Koenig, K. P., Nixon-Cave, K., & Gaughan, J. P. (2008). Cross validation of a self-concept tool for use with children with cerebral palsy. Journal of Developmental and Physical Disabilities, 20, 561–572. doi: 10.1007/s10882-008-9119-3
Wang, Q. (2001). Culture effects on adults' earliest childhood recollection and self description: Implications for the relation between memory and the self. Journal of Personality and Social Psychology, 81, 220–233.
Widaman, K. F., & Reise, S. P. (1997). Exploring the measurement invariance of psychological instruments: Applications in the substance use domain. In K. J. Bryant, M. Windle, & S. G. West (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 281–324). Washington, DC: American Psychological Association.
Wylie, R. C. (1989). Measures of self-concept. Lincoln, NE: University of Nebraska Press.


Date of acceptance: August 11, 2014
Published online: February 27, 2015

Kim Chau Leung
Department of Special Need and Counselling
Hong Kong Institute of Education
Hong Kong, PR China
Tel. +852 2948-6239
Fax +852 2948-7794
E-mail ckcleung@ied.edu.hk

© 2015 Hogrefe Publishing




Appendix

Self-Description Questionnaire (SDQII) Items

E1 I worry more than I need to
E2 I am a nervous person
E3 I often feel confused and mixed up
E4 I get upset easily
E5 I worry about a lot of things
G1 Overall, I have a lot to be proud of
G2 Most things I do, I do well
G3 Overall, most things I do turn out well
G4 I can do things as well as most people
G5 If I really try I can do almost anything I want to do
G6 Overall, I'm a failure
M1 Mathematics is one of my best subjects
M2 I do badly in tests of mathematics
M3 I get good marks in mathematics
M4 I have always done well in mathematics
V1 I'm hopeless in English classes
V2 Work in English classes is easy for me
V3 English is one of my best subjects
V4 I get good marks in English
V5 I learn things quickly in English classes
S1 I get bad marks in most school subjects
S2 I learn things quickly in most school subjects
S3 I do well in tests in most school subjects
S4 I'm good at most school subjects
P1 I enjoy things like sports, gym, and dance
P2 I'm good at things like sports, gym, and dance
P3 I am awkward at things like sports, gym, and dance
P4 I'm better than most of my friends at things like sports, gym, and dance
A1 I have a nice looking face
A2 I am good looking
A3 Other people think I am good looking
A4 I have a good looking body
OP1 I make friends easily with boys
OP2 I'm not very popular with members of the opposite sex
OP3 I do not get along very well with girls
OP4 I have lots of friends of the opposite sex
SM1 It is difficult to make friends with members of my own sex
SM2 I make friends easily with boys
SM3 Not many people of my own sex like me
SM4 I do not get along very well with boys
SM5 I make friends easily with members of my own sex
H1 I sometimes take things that belong to other people
H2 I am honest
H3 I sometimes tell lies to stay out of trouble
H4 I always tell the truth
H5 I sometimes cheat
H6 I often tell lies
PR1 I get along well with my parents
PR2 My parents treat me fairly
PR3 My parents understand me
PR4 I do not like my parents very much

Notes. P = Physical Ability; A = Physical Appearance; OP = Opposite-Sex Relations; SM = Same-Sex Relations; H = Honesty-Trustworthiness; PR = Parent Relations; E = Emotional Stability; G = General self-concept; M = Mathematics; V = Verbal; S = General School self-concept. Numbers beside each subscale indicate the symbol used for the respective item.
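As an illustration of how subscale scores are typically formed from items like these, the sketch below reverse-keys a negatively worded item before averaging. The scoring function, the raw scores, and the assumed 1–6 response range are all hypothetical; this is not the official SDQII scoring key.

```python
SCALE_MAX = 6  # assumed response range 1..6 (illustrative only)

def reverse(score, scale_max=SCALE_MAX):
    """Reverse-key a raw score so that higher always means a more
    positive self-concept (e.g., 2 on a 1-6 scale becomes 5)."""
    return scale_max + 1 - score

def subscale_mean(responses, reversed_items, scale_max=SCALE_MAX):
    """responses: dict of item code -> raw score;
    reversed_items: set of negatively worded item codes."""
    keyed = [reverse(v, scale_max) if k in reversed_items else v
             for k, v in responses.items()]
    return sum(keyed) / len(keyed)

# General self-concept subscale: G6 ("Overall, I'm a failure") is
# negatively worded, so it is reversed before the mean is taken.
g = {"G1": 5, "G2": 5, "G3": 4, "G4": 5, "G5": 6, "G6": 2}
mean_g = subscale_mean(g, reversed_items={"G6"})  # (5+5+4+5+6+5)/6 = 5.0
```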




Original Article

Selecting the Best Items for a Short-Form of the Experiences in Close Relationships Questionnaire

Marie-France Lafontaine,1 Audrey Brassard,2 Yvan Lussier,3 Pierre Valois,4 Philip R. Shaver,5 and Susan M. Johnson6

1Department of Psychology, University of Ottawa, Canada, 2Université de Sherbrooke, Canada, 3Université du Québec à Trois-Rivières, Canada, 4Université Laval, Canada, 5University of California, Davis, CA, USA, 6University of Ottawa, Ottawa Couple and Family Institute, Canada

Abstract. Five studies were conducted to develop a short form of the Experiences in Close Relationships (ECR) questionnaire with optimal psychometric properties. Study 1 involved Item Response Theory (IRT) analyses of the responses of 2,066 adults, resulting in a 12-item form of the ECR containing the most discriminating items. The psychometric properties of the ECR-12 were further demonstrated in two longitudinal studies of community samples of couples (Studies 2 and 3), in a sample of individuals in same-sex relationships (Study 4), and with couples seeking therapy (Study 5). The psychometric properties of the ECR-12 are as good as those of the original ECR and superior to those of an existing short form. The ECR-12 can confidently be used by researchers and mental health practitioners when a short measure of attachment anxiety and avoidance is required.

Keywords: attachment, self-report measure, Experiences in Close Relationships, psychometrics, Item Response Theory

Interest in adult attachment among researchers and clinical practitioners has increased over the past 25 years, spurring efforts to create and validate self-report measures of attachment in romantic, or couple, relationships (see Mikulincer & Shaver, 2007, for a review). One of the most frequently used attachment questionnaires is the Experiences in Close Relationships (ECR) questionnaire, developed by Brennan, Clark, and Shaver (1998). This 36-item questionnaire is a synthesis of previous measures created by many different investigators and is based on factor analyses of 323 items from such measures. Translated into many languages, the ECR is now considered a reference instrument. It assesses anxiety concerning rejection or abandonment and avoidance of intimacy and interdependence. The anxiety dimension is associated with viewing oneself as needing others but being vulnerable to rejection (Bartholomew & Horowitz, 1991; Collins & Allard, 2001). The avoidance dimension is associated with a working model of others (i.e., relationship partners) as overly needy and dependently intrusive, and with a strong desire to remain independent and self-reliant (Bartholomew & Horowitz, 1991; Collins & Allard, 2001). These two dimensions are solidly grounded in attachment theory (Bowlby,

European Journal of Psychological Assessment 2016; Vol. 32(2):140–154 DOI: 10.1027/1015-5759/a000243

1969/1982) and research, beginning with the pioneering work of Ainsworth, Blehar, Waters, and Wall (1978) on infant-mother attachment. The two-dimensional factor structure of the ECR has been observed in samples of men and women (Lafontaine & Lussier, 2003), and in homosexual and heterosexual, clinical and nonclinical adult populations (Alonso-Arbiol, Balluerka, & Shaver, 2007; Lafontaine & Lussier, 2003; Matte, Lemieux, & Lafontaine, 2009; Picardi, Bitetti, Puddu, & Pasquini, 2000). This structure is also evident in many translations of the ECR into languages other than English, including Chinese (Mallinckrodt & Wang, 2004), Dutch (Conradi, Gerlsma, van Duijn, & de Jonge, 2006), French (Lafontaine & Lussier, 2003), Hebrew (Mikulincer & Florian, 2000), Italian (Picardi et al., 2000), Japanese (Nakao & Kato, 2004), and Spanish (Alonso-Arbiol et al., 2007). The ECR has been used worldwide because of its high reliability and validity, demonstrated in many correlational and experimental studies; for example: (a) convergent validity with measures of relationship satisfaction, psychological distress, fear of intimacy, romantic dependence, accommodation strategies, and self-esteem, among other constructs; (b) high test-retest reliability (ranging from .50



M.-F. Lafontaine et al.: Psychometric Properties of the ECR-12

to .75 depending on the nature of the study and the time span being assessed); and (c) high internal consistency (for both subscales, anxiety and avoidance, near or above .90; e.g., Alonso-Arbiol et al., 2007; Brennan et al., 1998; Conradi et al., 2006; Lafontaine & Lussier, 2003; Mallinckrodt & Wang, 2004; Nakao & Kato, 2004). Because the two ECR scales are fairly long (18 items each), there is a strong desire among researchers for a shorter version of the ECR based on the most discriminating items while retaining high reliability. Using a restricted pool of items has the advantage of reducing participants' fatigue, frustration, and boredom (Robins, Hendin, & Trzesniewski, 2001). Furthermore, a brief ECR would be very useful in large-sample studies or as part of a thorough psychological assessment. The primary purpose of our five studies was to develop a short version of the ECR and to validate it in five samples, including couples from the general community, individuals in same-sex relationships, and couples seeking therapy for relationship difficulties. Given these target samples, we used relationship satisfaction and psychological distress as validity criteria. The selected pool of items has to be small enough to allow rapid assessment of the attachment dimensions but large enough to ensure accurate parameter estimates and good reliability (Marsh, Hau, Balla, & Grayson, 1998). Based on Marsh et al.'s (1998) recommendation that a factor should include a minimum of four items, our goal was to select a minimum of eight items (i.e., four items for each ECR dimension). Other studies have followed this recommendation in developing short-form questionnaires (e.g., Poitras, Guay, & Ratelle, 2012).

Previous Modifications of the ECR

To our knowledge, only one short version of the ECR has been published: a 12-item questionnaire created by Wei, Russell, Mallinckrodt, and Vogel (2007), called the ECR-S. They conducted separate exploratory factor analyses on items from the anxiety and avoidance subscales. They identified redundant items based on inter-item correlations and similar wording, and retained only items with the highest factor loadings. This logical and statistical approach resulted in a six-item avoidance subscale (including the original ECR items numbered 11, 13, 17, 27, 33, and 35) and a six-item anxiety subscale (items 6, 16, 18, 22, 26, and 32). Wei et al. (2007) also performed confirmatory factor analyses (on the ECR-S and the ECR) to test different models using college and undergraduate samples, including the sample on which the item selection was based. The fit of the models was comparable for the short version and the original version, but only after two latent variables, representing positively and negatively phrased items, were statistically controlled. Five studies assessed the validity and reliability of the ECR-S, and all of them yielded acceptable internal consistency for both short subscales. The same five studies evaluated the construct validity of the ECR-S using correlations with relevant constructs (e.g., depression, anxiety, and social desirability). Analyses confirmed that test-retest reliability for the ECR-S was acceptable over both a 3-week interval (Study 6) and a 1-month interval (Study 4). Wei and colleagues (2007) demonstrated that it is possible to reduce the number of items on the ECR while preserving its validity and reliability, but several limitations of their work justify the creation of an alternative short form. First, the generalizability of their validation study is limited by its reliance on homogeneous samples of North American undergraduate students. Second, the factor structure of the ECR-S was acceptable only after controlling for two additional latent variables accounting for response sets (which was not the case with the original ECR). Third, Wei et al. (2007) used Classical Test Theory (CTT) to select items to be removed or retained. However, many researchers have argued that CTT has a number of shortcomings (e.g., Embretson & Reise, 2000; Thissen & Steinberg, 1997). For instance, because test analysis in CTT occurs at the level of the test rather than the item, it provides little information concerning how a specific item functions independently of the other items (Stroud, McKnight, & Jensen, 2004). Another limitation of CTT is that its estimates of item parameters are dependent on the particular sample from which they were derived. Finally, CTT assumes an average standard error of measurement for all examinees. These shortcomings have motivated psychometricians to seek alternative theories and models to measure mental processes. According to Nunnally and Bernstein (1994), Item Response Theory (IRT) methods are central in modern psychometrics and are now considered a good alternative for developing and validating psychosocial scales.
Although CTT- and IRT-based scales can be highly comparable (e.g., MacDonald & Paunonen, 2002), IRT allows the selection of items providing maximum information within a specific trait range along the underlying latent dimension (e.g., anxiety or avoidance) and the creation of shorter tests that can be even more reliable than corresponding longer tests (Embretson & Reise, 2000). Fraley, Waller, and Brennan (2000) used IRT analyses of Brennan et al.'s (1998) original pool of 323 attachment items to create the ECR-R. To improve discrimination at the secure ends of the two dimensions, Fraley et al. replaced 18 items in the original ECR while retaining the other 18: 6 from the original anxiety subscale (items 4, 6, 10, 16, 22, and 26) and 12 from the avoidance subscale (items 1, 3, 7, 9, 13, 15, 19, 21, 23, 25, 27, and 29). Eighteen new items from the original 323 were chosen to replace the 18 that were eliminated, making the ECR-R another 36-item measure. Several studies have provided support for the validity and reliability of the ECR-R (e.g., Fairchild & Finney, 2006; Sibley, Fischer, & Liu, 2005). Despite this support, Mikulincer and Shaver (2007) have continued to use the original ECR, arguing that: (a) the wording of some of the new items on the ECR-R is problematic; (b) there is a slightly higher correlation between the two dimensions (anxiety and avoidance) of the ECR-R than is characteristic of the original ECR, perhaps reducing the discriminability of the two constructs; (c) the ECR-R does not yield a significant increase in validity over the ECR; and (d) the ECR and the ECR-R share 18 items, which results in correlations of approximately .95 between the two versions of the anxiety and avoidance subscales. For these reasons, we decided to construct our short scales using the original ECR items.

The Present Research

The goal of the first study reported here was to develop a short version of the ECR. IRT analyses were used to examine item performance in the original ECR as a basis for selecting items for a shorter measure (Study 1). Four independent studies were then conducted to examine whether this new short measure shows superior psychometric properties to the ECR-S across various participant samples: two community samples of French-Canadian couples (Study 2) and English-speaking couples (Study 3), a community sample of English-speaking gay, lesbian, and bisexual adults (Study 4), and a clinical sample of Canadian couples seeking couple therapy (Study 5). Gender-related differences were tested based on the results of a meta-analysis showing that women report higher anxiety and lower avoidance than men, particularly in community samples (Del Giudice, 2011).

Study 1

The objectives of Study 1 were: (1) to select ECR items for a new short form, based on two IRT criteria; and (2) to perform differential item functioning (DIF) analyses (Ramsay, 2000) on the selected items to determine whether the range of answers on a given item was biased by gender (i.e., whether men and women used a different range of responses on the item).

Method Participants and Procedure Participants were 2,066 French-Canadian college students, 1,301 women (63.3%), and 755 men (36.7%). The mean age of the sample was 20.69 years (SD = 3.18), with an average of 13 years of education. The majority of the participants (62.3%) were involved in a romantic relationship at the time of the study, and 72 participants (3.5%) had children. Research assistants recruited participants in various colleges in the province of Québec, and the participants were provided questionnaires and were instructed to complete them in class (56%) or at home (44%) and to return them in the preaddressed envelope provided. Measures The attachment dimensions – anxiety and avoidance – were assessed with a French-language version of the ECR European Journal of Psychological Assessment 2016; Vol. 32(2):140–154

(Lafontaine & Lussier, 2003). Participants indicated their level of agreement with each statement using a seven-point rating scale ranging from 1 (disagree strongly) to 7 (agree strongly), based on their general orientation in close relationships. Items on both scales were averaged (following appropriate reversals of negatively worded items) to produce two scores for each participant; higher scores indicated greater anxiety and avoidance.

Data Analysis Strategy

To determine which items to include in the ECR short form, we used two criteria based on IRT (Reeve & Fayers, 2005). Several models of IRT can be applied to rating data but, given the use of Likert scales, the Graded-Response Model (GRM; Samejima, 1969, 1997) was deemed most appropriate (Embretson & Reise, 2000). The primary objective of IRT methods is to specify the associations between item responses and the latent trait (θ) posited to be measured by the test or questionnaire. An Item Characteristic Curve (ICC) graphically represents the way in which the probability of a response varies with the level of the underlying trait. In the graded response model, an item is effective if the probability of endorsing it (or one of its options) increases as a function of, for instance, anxiety. This first criterion is called "discrimination." To compute the parameter estimates under the GRM, the EIRT program (Excel Item Response Theory assistant; Valois, Houssemand, Germain, & Abdous, 2011) was used in the current study. This program estimates the ability of items and scales to detect differences between individuals across a latent trait. Ideally, the differences between two individuals should be easily detected, regardless of whether they are in the low or high regions of the latent trait. The slope of the ICC (ai; item discriminability) indicates the extent to which a change in item score corresponds to changes in the level of the latent trait (Baker, 2001).
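For readers unfamiliar with the GRM, its core equations can be sketched as follows, using discrimination (here a_i) and threshold (b_ik) notation paralleling the ai and bij parameters of the text; the probability of responding in category k or above on item i is a logistic function of θ, and the probability of category k exactly is the difference between adjacent cumulative curves:

```latex
P^{*}_{ik}(\theta) \;=\; \frac{1}{1 + \exp\!\left[-a_i\,(\theta - b_{ik})\right]},
\qquad
P_{ik}(\theta) \;=\; P^{*}_{ik}(\theta) - P^{*}_{i,\,k+1}(\theta)
```

A steeper slope a_i means the cumulative curves rise more sharply, so the item separates respondents with different trait levels more effectively.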
We examined the results of item analyses for each of the original ECR's 18 avoidance items and its 18 anxiety items. Our goal was to select the items that best differentiated participants in terms of avoidance and anxiety scores along the entire span of scores. This was done by choosing the items that yielded the most steeply sloped item characteristic curves, with steepness indicating how well an item distinguishes participants with different trait levels. To assess the precision of these slopes, we created 1,000 nonparametric bootstrap samples of 1,033 participants (a random half of the total sample in each bootstrap sample), a procedure designed to empirically estimate the variability of a statistic (the slope of the ICC) whose theoretical distribution is unknown. We resampled the data from the original sample with replacement to generate an empirical estimate of the entire sampling distribution of the slope for each item (Mooney & Duval, 1993). The analysis was performed on each bootstrap sample, which gave 1,000 slope values and standard error estimates for every item. All of these values were then ranked, averaged, and used to construct 99% confidence intervals, allowing us to identify the most discriminating items.

© 2015 Hogrefe Publishing
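The half-sample bootstrap just described can be sketched in a few lines. This is a minimal illustration only: the sample mean stands in for the ICC slope (fitting the GRM itself is beyond the scope of the sketch), and the data are simulated, not the ECR responses.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(data, statistic, n_boot=1000, alpha=0.01):
    """Nonparametric bootstrap: draw half-samples with replacement,
    recompute the statistic on each, and build a percentile confidence
    interval from the resulting empirical sampling distribution."""
    m = len(data) // 2  # a random half of the sample, as in the study (n = 1,033)
    stats = np.array([statistic(rng.choice(data, size=m, replace=True))
                      for _ in range(n_boot)])
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return stats.mean(), (lo, hi)

# Toy data: 2,066 simulated "slope-like" values centered near 5.0
x = rng.normal(loc=5.0, scale=1.0, size=2066)
mean_est, (lo, hi) = bootstrap_ci(x, np.mean)
print(f"mean = {mean_est:.3f}, 99% CI [{lo:.3f}, {hi:.3f}]")
```

The same loop, with a GRM fit replacing `np.mean`, yields a distribution of 1,000 slope estimates per item from which the ranks and 99% intervals in Tables 1 and 2 are built.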


M.-F. Lafontaine et al.: Psychometrics Properties of the ECR-12

After analyzing the degree of discrimination, we examined the item difficulty for all items via the bij parameters and their standard errors. The bij parameters represent a given avoidance or anxiety level as a function of the maximum probability of choosing each response option. For the choice of response 2 on item 2, for example, the bij parameter b22 is the place on the latent trait (or theta) scale of anxiety where the probability of choosing option 2 is maximal. Good items have bij parameters that match their respective response levels and have minimal error (Poitras et al., 2012). Following Baker (2001), we wanted to keep items whose bij parameters had an adequate location on the largest possible portion of the scale continuum (from −3 to 3) and the lowest standard errors. Based on these two IRT criteria (item discrimination and difficulty), on the bivariate correlations between each possible short scale (from 2- to 18-item anxiety and avoidance subscales) and the original ECR subscales, and on the distribution of their standard errors, we determined the best number of items for the new short form. We then conducted differential item functioning (DIF; Ramsay, 2000) analyses on the selected items to examine the presence of gender bias.


Results of Study 1

IRT Analysis for the Avoidance Items

Following the estimation of the slope of each item's item characteristic curve, using the 1,000 bootstrap samples, we ranked the 18 avoidance items based on their slopes using the EIRT software (Valois et al., 2011). This allowed us to identify how many times an item had the steepest slope (i.e., ranked first) over the 1,000 bootstraps, how many times an item had the second steepest slope, and so on. Table 1 shows the ranks of the 18 avoidance items. For example, item 29, which possessed the highest mean slope, was ranked first on 427 occasions, second on 238 occasions, and third on 193 occasions. Table 1 suggests that there were four groups of items, based on their ranks: items 29, 27, 25, 31, 9, and 15 (group 1: item slopes found mostly in the first six ranks) had the best rankings, followed by items 23, 7, 13, and 19 (group 2: item slopes found mostly between ranks 7 and 10, inclusive); items 5, 11, 1, 21, and 17 (group 3: mostly between ranks 11 and 15, inclusive); and items 33, 3, and 35 (group 4: mostly between ranks 16 and 18, inclusive). Among the six items with the steepest slopes (i.e., group 1), none appeared among the last five ranks over the 1,000 bootstraps. Items 29, 27, and 25 were classified more than 900 times in the top five ranks, followed by item 9 (721 times), item 31 (671 times), and item 15 (482 times). In contrast, items from the three other groups were never or rarely among the top six ranks. Table 2 presents, in decreasing order, the means and confidence intervals for the slopes of the item characteristic curves (ICC) for each of the original 18 avoidance items. The averaged item ICC slopes were all quite high, although the six items from group 1 appeared to be the best for a short avoidance subscale. The estimation of those six items' slopes yielded the lowest standard errors, indicating greater precision. Item difficulty parameters for the 18 avoidance items, also shown in Table 2, revealed that these six items had adequate locations along most of the avoidance theta scale continuum (from −.96 to 2.86) and had low standard errors (< .18). It is worth mentioning that almost all avoidance items performed relatively poorly on the lower portion

Table 1. Rank of items assessing avoidance based on estimated slopes using 1,000 bootstrap samples. Columns give the number of bootstrap samples in which each item obtained each rank, from rank 18 (leftmost) to rank 1 (rightmost).

Rank:      18   17   16   15   14   13   12   11   10    9    8    7    6    5    4    3    2    1
Item 29:    0    0    0    0    0    0    0    0    0    2    1    1   11   38   89  193  238  427
Item 27:    0    0    0    0    0    0    0    1    4    3    9   19   38   69  117  223  276  241
Item 25:    0    0    0    0    0    0    0    0    1    5    4   19   29   81  158  222  267  214
Item 31:    0    0    0    0    0    5    3    4   18   21   36   67  175  196  218  141   79   37
Item 9:     0    0    0    0    0    0    0    0    2   10   13   53  201  267  216  112   82   44
Item 15:    0    0    0    0    0    1    3    8   11   23   60   95  317  238  141   68   28    7
Item 23:    0    0    0    3   18   31   63   94  100  166  195  217   61   28   15    4    4    1
Item 7:     0    0    3   25   28   47   71   86  106  116  166  162   58   48   31   19   13   21
Item 13:    0    1   10   39   60   71   93   91  138  135  127  122   60   21    8   11    6    7
Item 19:    0    0    0   22   48   88  121  104  160  171  142  119   19    5    1    0    0    0
Item 5:     0    2   26   83  107  124  152  156  137   87   59   39   11    7    2    3    5    0
Item 11:    0    2   32   94  113  130  122  140  126   89   80   41   18    2    4    4    2    1
Item 1:     0    1   17   99  126  173  142  132  109   96   68   35    2    0    0    0    0    0
Item 21:    0    1   33  197  234  158  130   99   56   53   32    7    0    0    0    0    0    0
Item 17:    0    7   82  311  230  149   85   75   28   21    8    4    0    0    0    0    0    0
Item 33:    0  320  503   90   34   22   15   10    4    2    0    0    0    0    0    0    0    0
Item 3:     7  659  294   37    2    1    0    0    0    0    0    0    0    0    0    0    0    0
Item 35:  993    7    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

Note. Items 11, 13, 17, 27, 33, and 35 are the avoidance items retained in the ECR-S (Wei et al., 2007).


Table 2. Slopes' means, standard errors, and 99% confidence intervals, and item difficulty coefficients for the avoidance subscale items. Items are listed in decreasing order of mean slope, with the slope a (SE) and its 99% confidence interval.

29. I feel comfortable depending on romantic partners. a = 5.47 (.08), 99% CI [5.25–5.66]
27. I usually discuss my problems and concerns with my partner. a = 5.44 (.10), [5.14–5.66]
25. I tell my partner just about everything. a = 5.43 (.10), [5.18–5.65]
31. I don't mind asking romantic partners for comfort, advice, or help. a = 5.34 (.11), [5.03–5.61]
9. I don't feel comfortable opening up to romantic partners. a = 5.34 (.09), [5.05–5.56]
15. I feel comfortable sharing my private thoughts and feelings with my partner. a = 5.30 (.09), [5.06–5.50]
23. I prefer not to be too close to romantic partners. a = 5.12 (.13), [4.78–5.38]
7. I get uncomfortable when a romantic partner wants to be very close. a = 5.11 (.19), [4.57–5.55]
13. I am nervous when partners want to get too close to me. a = 5.07 (.18), [4.59–5.46]
19. I find it relatively easy to get close to my partner. a = 5.05 (.12), [4.73–5.34]
5. Just when my partner starts to get close to me I find myself pulling away. a = 4.98 (.18), [4.49–5.38]
11. I want to get close to my partner, but I keep pulling back. a = 4.98 (.18), [4.50–5.41]
1. I prefer not to show a partner how I feel deep down. a = 4.96 (.13), [4.63–5.27]
21. I find it difficult to allow myself to depend on romantic partners. a = 4.90 (.13), [4.56–5.21]
17. I try to avoid getting too close to my partner. a = 4.85 (.15), [4.43–5.20]
33. It helps to turn to my romantic partner in times of need. a = 4.58 (.17), [4.17–5.00]
3. I am very comfortable being close to romantic partners. a = 4.48 (.18), [4.00–4.87]
35. I turn to my partner for many things, including comfort and reassurance. a = 3.85 (.17), [3.44–4.27]

(The original table additionally reports seven difficulty parameters, b1 through b7, with standard errors for each item.)

Note. Items 9, 15, 25, 27, 29, and 31 are the avoidance items retained for the ECR-12. Items 11, 13, 17, 27, 33, and 35 are the avoidance items retained in the ECR-S (Wei et al., 2007).



of the avoidance scale continuum. Only item 35 had an adequate location on a lower portion of the theta continuum, but it had a high standard error and the lowest ICC slope of all the avoidance items. To further validate our choice of the best six items for a brief avoidance subscale, we compared versions of the avoidance subscale of different lengths with respect to their distributions of standard errors across points on the latent avoidance continuum. The results indicate that the gain from adding more items diminishes after six items, supporting our choice of a six-item avoidance subscale (Figure 1). We then compared the bivariate correlations between the original ECR avoidance subscale and brief versions of different lengths based on their ranks in Table 2. The correlation between the original ECR avoidance subscale and the one-item version was .722. This correlation reached .803 with the two-item version, .838 with the three-item version, .864 with the four-item version, .889 with the five-item version, and .921 with the six-item version, which is very good for an abridged scale. In sum, 6 of the original 18 items from the ECR avoidance subscale displayed characteristics that made them suitable for a briefer subscale (items 9, 15, 25, 27, 29, and 31). Interestingly, only one of these items (item 27) overlaps with the items retained in Wei et al.'s (2007) short version of the avoidance subscale.
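The comparison of progressively longer short forms with the full subscale can be illustrated with simulated data. Nothing below is the ECR data: the single-latent-trait simulation and all parameter values are invented for the sketch; it only shows the mechanics of correlating a k-item short form (best-discriminating items first) with the 18-item score.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 2066, 18

# Simulated 7-point responses driven by one latent trait (illustrative only)
theta = rng.normal(size=n_persons)
loadings = rng.uniform(0.5, 1.5, size=n_items)   # stand-ins for item discrimination
raw = theta[:, None] * loadings + rng.normal(size=(n_persons, n_items))
items = np.clip(np.round(raw + 4), 1, 7)          # map onto a 1-7 Likert scale

full_scale = items.mean(axis=1)
order = np.argsort(-loadings)                     # most discriminating items first
rs = {}
for k in range(1, 7):
    short_form = items[:, order[:k]].mean(axis=1)
    rs[k] = np.corrcoef(short_form, full_scale)[0, 1]
    print(f"{k}-item short form vs. 18-item scale: r = {rs[k]:.3f}")
```

As in the study, the correlation rises with each added item, with diminishing returns as the short form lengthens.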


Figure 1. The distribution of the standard error of measurement at each point along the latent avoidance continuum (graphed in standard deviation units), depending on the number of items included in the subscale. The fifth curve from the top is for the six-item avoidance subscale of the ECR-12 (the first curve is equivalent to two items).

IRT Analysis of the Attachment Anxiety Items

We used the same procedures described above to identify the best items for a short-form anxiety subscale. Table 3 shows the rankings of the 18 anxiety items based on 1,000 estimated slopes. There were, again, four groups of items based on the IRT criteria: items 6, 8, 2, 14, 18, and 24 (group 1) had the highest rankings, followed by items 20, 4, 30, and 26 (group 2); items 22, 36, 32, and 10 (group 3); and items 28, 12, 34, and 16 (group 4). Of the six items with the steepest slopes (i.e., group 1), none was classified in the last five ranks over the 1,000 bootstraps. Although items from group 2 occasionally made the top five, items from the other groups were never or rarely classified in the top six.

Table 3. Rank of items assessing attachment anxiety based on estimated slopes using 1,000 bootstrap samples. Columns give the number of bootstrap samples in which each item obtained each rank, from rank 18 (leftmost) to rank 1 (rightmost).

Rank:      18   17   16   15   14   13   12   11   10    9    8    7    6    5    4    3    2    1
Item 6:     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0   13   41  946
Item 8:     0    0    0    0    0    0    0    0    0    0    0    0    0    1   65  363  545   26
Item 2:     0    0    0    0    0    0    0    0    0    0    0    0    0    4  117  492  261   26
Item 14:    0    0    0    0    0    0    0    0    0    0    0    0    4   78  738  127   51    2
Item 18:    0    0    0    0    0    0    0    0    0    1    1   11   84  821   75    5    2    0
Item 24:    0    0    0    0    0    0    0    2    3    9   68  188  643   82    5    0    0    0
Item 20:    0    0    0    0    0    0    1   11   59  191  316  327   94    1    0    0    0    0
Item 4:     0    0    0    0    0    0   11   22  115  235  246  241  120   10    0    0    0    0
Item 30:    0    0    0    0    0    0    2   70  224  255  227  181   39    2    0    0    0    0
Item 26:    0    0    0    0    0   30   77  214  332  206   96   36    8    1    0    0    0    0
Item 22:    0    0    0    6   28  229  201  249  143   79   42   15    8    0    0    0    0    0
Item 36:    0    0    0    1    9  118  441  314   93   19    4    1    0    0    0    0    0    0
Item 32:    0    1    6   24   99  476  243  115   31    5    0    0    0    0    0    0    0    0
Item 10:    6   25   96  231  519  103   17    3    0    0    0    0    0    0    0    0    0    0
Item 28:   22  134  244  367  196   30    7    0    0    0    0    0    0    0    0    0    0    0
Item 12:   10  332  344  218   87    9    0    0    0    0    0    0    0    0    0    0    0    0
Item 34:  110  391  286  147   61    5    0    0    0    0    0    0    0    0    0    0    0    0
Item 16:  852  117   24    6    1    0    0    0    0    0    0    0    0    0    0    0    0    0

Note. Items 6, 16, 18, 22, 26, and 32 are the anxiety items retained in the ECR-S (Wei et al., 2007).



Table 4 presents the slopes' means and confidence intervals for all 18 items of the anxiety subscale, in decreasing order. The averaged item characteristic slopes were all fairly high, although the six items from group 1 appeared to be the best ones to include in a short version of the anxiety subscale. Item difficulty parameters for the 18 anxiety items, also shown in Table 4, revealed that these six items had an adequate location on the entire anxiety scale continuum (from −2.44 to 2.77) and also had the lowest standard errors (< .16). To further validate our choice of a six-item brief anxiety subscale, we compared different-length versions of the anxiety subscale (from a two-item to a 17-item version) based on their distribution of standard errors on the latent anxiety continuum. Results indicated that the six-item version has a standard error curve comparable to the curves for versions comprising seven or more items, which also supported a six-item ECR anxiety subscale (Figure 2). We also compared the bivariate correlations between the original ECR anxiety subscale and brief versions of different lengths. The correlation between the ECR anxiety subscale and the one-item version was .624. This correlation rose to .774 with the two-item version, .813 with the three-item version, .840 with the four-item version, .873 with the five-item version, and reached .880 with the six-item version, which is adequate for an abridged version. In sum, 6 of the original 18 ECR anxiety items displayed characteristics important for inclusion in an abridged version of this subscale (items 2, 6, 8, 14, 18, and 24). Of these six items, only two (items 6 and 18) were included in Wei et al.'s (2007) short version of the ECR.

Differential Item Functioning

The software program TESTGRAF was used to perform differential item functioning (DIF; Ramsay, 2000) analyses on all 2,066 participants for the 12 selected items.
The DIF analyses were conducted to determine whether there was a gender difference in the probability of endorsing a given item on either of the two subscales. The characteristic curves for each item revealed that men and women endorsed them similarly. The composite DIFs (i.e., the summary statistics for the level of bias in each item) were all less than .10 (the criterion suggested by Ramsay, 2000). Thus, there was no gender bias in reporting anxiety or avoidance on the brief ECR-12. This was not the case, however, for 5 of the 12 items included in the ECR-S (Wei et al., 2007): items 11, 13, 16, 17, and 26 showed composite DIFs higher than .10.
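TESTGRAF estimates kernel-smoothed item characteristic curves per group; a much cruder analogue of a composite DIF index can be sketched by binning respondents on the matching (total-score) variable instead of kernel smoothing. Everything below is simulated and the binned index is only loosely analogous to Ramsay's statistic.

```python
import numpy as np

def composite_dif(item, total, group, n_bins=10):
    """Average absolute gap between the two groups' expected item scores,
    computed within bins of the matching variable and weighted by bin
    size. A rough, binned analogue of a composite DIF index."""
    edges = np.quantile(total, np.linspace(0, 1, n_bins + 1))
    bin_idx = np.clip(np.digitize(total, edges[1:-1]), 0, n_bins - 1)
    gaps, weights = [], []
    for b in range(n_bins):
        in_bin = bin_idx == b
        g0, g1 = item[in_bin & (group == 0)], item[in_bin & (group == 1)]
        if len(g0) and len(g1):
            gaps.append(abs(g0.mean() - g1.mean()))
            weights.append(in_bin.sum())
    return float(np.average(gaps, weights=weights))

rng = np.random.default_rng(2)
n = 2000
group = rng.integers(0, 2, size=n)                     # two simulated groups
theta = rng.normal(size=n)
total = theta + rng.normal(scale=0.3, size=n)          # matching variable
fair_item = np.clip(np.round(theta + 4 + rng.normal(size=n)), 1, 7)
biased_item = np.clip(fair_item + (group == 1), 1, 7)  # shifted for one group
dif_fair = composite_dif(fair_item, total, group)
dif_biased = composite_dif(biased_item, total, group)
print(f"fair item: {dif_fair:.3f}, biased item: {dif_biased:.3f}")
```

An unbiased item yields a small index (group curves nearly coincide at every matched level), whereas the artificially shifted item yields a clearly larger one.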

Brief Discussion

The objective of Study 1 was to use two IRT-based criteria to select the best possible items for a short version of the ECR. Based on the IRT analyses and 1,000 nonparametric bootstrap samples, six items from the original ECR avoidance subscale (items 29, 27, 25, 31, 9, and 15) and six items from the original anxiety subscale (items 6, 8, 2, 14, 18, and 24) were selected. These 12 items possessed the best

discrimination and difficulty levels, meaning that the resulting subscales have the greatest discriminating power along a large part of the avoidance and anxiety continuums. This was rarely the case for the 12 items included in the ECR-S (Wei et al., 2007). More specifically, the IRT analyses revealed that 9 of the 12 items proposed by Wei et al. (2007) (items 11, 13, 16, 17, 22, 26, 32, 33, and 35) did not possess optimal discriminating power. Based on a literature review, Wei et al. (2007) suggested that the attachment anxiety and avoidance subscales should each represent three theoretical components, and they used this rationale as a tool to develop the ECR-S. The items chosen here for the new ECR-12 are also representative of the three components of anxiety: fear of interpersonal rejection (items 2, 8, and 14), disproportionate need for approval from others (item 18), and distress when one's partner is unavailable (items 6 and 24). In addition, the ECR-12 includes two of the three components of avoidance, namely excessive need for self-reliance (items 29 and 31) and reluctance to self-disclose (items 9, 15, 25, and 27). No ECR-12 item directly taps the fear of interpersonal intimacy component, although items 9 and 15 refer to discomfort about opening up or sharing private thoughts and feelings with others. Overall, the ECR-12 represents the various issues involved in attachment anxiety and avoidance, although it addresses discomfort with closeness primarily through items referring to self-disclosure and openness. Results from the DIF analyses also showed that the new ECR-12, as opposed to the ECR-S, is not affected by a gender response bias (i.e., there is no difference in the way men and women endorse each item). This is important because many studies using the ECR involve assessing both members of heterosexual couples. Our promising new short version needed to be evaluated for validity and reliability in comparison with the original ECR.
In subsequent studies, different types of validity and reliability were examined, based on CTT, to further test the psychometric properties of the new short ECR in a variety of samples.

Studies 2 Through 5: Classical Test Theory

The goals of Studies 2 through 5 were to examine the psychometric properties of the ECR-12 using CTT, to demonstrate the equivalence of the short form and the original version, and to demonstrate the superiority of the ECR-12 over the ECR-S in a variety of samples. The specific objectives were (1) to confirm the factor structure of the ECR-12 using CFA and test its gender invariance; (2) to assess the correlation between the avoidance and anxiety subscales; (3) to evaluate internal consistency and test-retest reliability over a 1-year period; and (4) to assess convergent and predictive validity with measures of relationship satisfaction and psychological distress.


Table 4. Slopes' means, standard errors, and 99% confidence intervals, and item difficulty coefficients for the anxiety subscale items. Items are listed in decreasing order of mean slope, with the slope a (SE) and its 99% confidence interval.

6. I worry that romantic partners won't care about me as much as I care about them. a = 5.75 (.05), 99% CI [5.58–5.85]
8. I worry a fair amount about losing my partner. a = 5.63 (.07), [5.45–5.78]
2. I worry about being abandoned. a = 5.62 (.07), [5.40–5.78]
14. I worry about being alone. a = 5.54 (.07), [5.35–5.71]
18. I need a lot of reassurance that I am loved by my partner. a = 5.40 (.09), [5.13–5.59]
24. If I can't get my partner to show interest in me, I get upset or angry. a = 5.24 (.11), [4.92–5.51]
20. Sometimes I feel that I force my partners to show more feeling, more commitment. a = 5.11 (.11), [4.77–5.37]
4. I worry a lot about my relationships. a = 5.10 (.11), [4.82–5.36]
30. I get frustrated when my partner is not around as much as I would like. a = 5.05 (.12), [4.72–5.36]
26. I find that my partner(s) don't want to get as close as I would like. a = 4.96 (.13), [4.60–5.27]
22. I do not often worry about being abandoned. a = 4.83 (.16), [4.36–5.19]
36. I resent it when my partner spends time away from me. a = 4.80 (.13), [4.45–5.09]
32. I get frustrated if romantic partners are not available when I need them. a = 4.69 (.16), [4.29–5.08]
10. I often wish that my partner's feelings for me were as strong as my feelings for him/her. a = 4.42 (.15), [3.96–4.77]
28. When I'm not involved in a relationship, I feel somewhat anxious and insecure. a = 4.29 (.15), [3.85–4.63]
12. I often want to merge completely with romantic partners, and this sometimes scares them away. a = 4.21 (.17), [3.71–4.63]
34. When romantic partners disapprove of me, I feel really bad about myself. a = 4.16 (.17), [3.66–4.59]
16. My desire to be very close sometimes scares people away. a = 3.90 (.18), [3.42–4.31]

(The original table additionally reports seven difficulty parameters, b1 through b7, with standard errors for each item.)

Notes. Items 2, 6, 8, 14, 18, and 24 are the anxiety items retained for the ECR-12. Items 6, 16, 18, 22, 26, and 32 are the anxiety items retained in the ECR-S (Wei et al., 2007).


Figure 2. The distribution of the standard error of measurement at each point along the latent anxiety continuum (graphed in standard deviation units), depending on the number of items included in the subscale. The fifth curve from the top is for the six-item anxiety subscale of the ECR-12 (the first curve is equivalent to two items).

Method

Study 2

The second sample was composed of 316 French-Canadian heterosexual couples from the province of Quebec (married n = 202, cohabiting n = 114). A survey firm used random-digit dialing to recruit adult participants who had been married or cohabiting for at least 6 months. The mean age of the participants was 39 years, with an average of 14 years of education and an average relationship duration of 13 years. Most couples had children (73.7%). Questionnaire packets with prepaid return envelopes were mailed to 500 couples (two separate packets, to ensure confidentiality). Of the 500 couples, 316 completed and returned both partners' questionnaires (response rate = 63.2%). A year later, 154 couples agreed to complete the questionnaires again (response rate = 48.7%). All participants completed the French-language version of the original ECR at Time 1 and Time 2. They also completed the four-item French version of the Dyadic Adjustment Scale (DAS-4; Spanier, 1976; abridged by Sabourin, Valois, & Lussier, 2005) assessing couple satisfaction, and the French-language brief Psychological Symptoms Index (PSI; Ilfeld, 1976) assessing psychological distress. The DAS-4 Cronbach's alphas were .83 for women and .79 for men, whereas the PSI alpha coefficients were .90 for women and .89 for men.
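The alpha coefficients reported throughout Studies 2 through 5 follow the standard Cronbach formula; a minimal sketch (the simulated four-item data below are purely illustrative):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances /
    variance of the total score), for an n-by-k response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

rng = np.random.default_rng(3)
theta = rng.normal(size=1000)
responses = theta[:, None] + rng.normal(size=(1000, 4))  # four parallel items
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```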

Study 3

The third sample included 224 English-speaking Canadian heterosexual couples (28.8% married) recruited from the general community. The mean age of the participants was 30 years for women and 33 years for men. The majority of women (82.8%) and men (78.4%) were college educated. Couples had been living together for an average of 6 years, and 30.2% had children. Couples were recruited through local newspapers, posters, and community events in the province of Ontario. At Time 1, they took part in a larger study requiring them to participate in a 2-hr testing session. Partners completed a questionnaire package independently from one another. One year later, 116 couples (response rate = 51.8%) agreed to complete the questionnaires a second time. At Time 1, participants completed the original version of the ECR, as well as the English version of the DAS-4 (women, α = .76; men, α = .73). At Time 2, participants completed the original version of the ECR, as well as the DAS-4 (women, α = .76; men, α = .87) and the PSI (women, α = .91; men, α = .92).

Study 4

The fourth sample included 107 men and 288 women involved in same-sex relationships, recruited in the province of Ontario, Canada. They identified themselves as gay or lesbian (78%) or as bisexual (22%). To participate, individuals had to be at least 18 years old and to have been in their current same-sex relationship for at least 12 months. The mean age of the participants was 29 years, and the mean length of their current relationship was 4.18 years. Most of the participants had a college degree (72.9%), and 12.9% had children. They were recruited through flyers, emails, or posters in local organizations serving the gay and lesbian (GLBT) community. Persons interested in participating in the study were sent a link to the online survey or a paper questionnaire. They completed the original ECR and the English version of the DAS-4 (α = .81).

Study 5

The fifth sample included 524 Canadian couples (40.6% married) seeking couple therapy in a private practice in either Quebec or Ontario. The couples spoke French (83.4%) or English (16.6%). The mean age was 40 years for women and 43 years for men. The majority of the women (84.0%) and the men (88.2%) were college educated. Couples had been living together for an average of 12 years, and 83.2% had children. Partners were invited to independently complete the ECR, the DAS-4 (women, α = .74; men, α = .74), and the PSI measure of psychological distress (women, α = .91; men, α = .91) just before the first therapy session.

Results

Factor Structure

Confirmatory Factor Analyses (CFA) were conducted separately for women and men to confirm the factor structure of




Table 5. Fit indexes for the Confirmatory Factor Analyses of the ECR-12 and the ECR-S

Study   Version   χ² (all p < .001)             χ²/df   CFI   TLI   RMSEA
2 W     ECR-12    χ²(52, N = 316) = 130.81      2.52    .92   .88   .07
2 W     ECR-S     χ²(53, N = 316) = 192.40      3.63    .77   .66   .09
2 M     ECR-12    χ²(52, N = 316) = 136.97      2.63    .92   .87   .07
2 M     ECR-S     χ²(53, N = 316) = 226.53      4.27    .75   .64   .10
3 W     ECR-12    χ²(52, N = 284) = 116.37      2.23    .93   .89   .07
3 W     ECR-S     χ²(53, N = 284) = 209.89      3.96    .83   .75   .10
3 M     ECR-12    χ²(52, N = 284) = 119.74      2.30    .90   .84   .07
3 M     ECR-S     χ²(53, N = 284) = 152.98      2.89    .84   .76   .08
4 M&W   ECR-12    χ²(52, N = 392) = 132.08      2.54    .95   .92   .06
4 M&W   ECR-S     χ²(53, N = 392) = 321.69      6.07    .82   .74   .11
5 W     ECR-12    χ²(52, N = 524) = 109.66      2.11    .97   .96   .05
5 W     ECR-S     χ²(53, N = 524) = 300.95      5.68    .79   .70   .09
5 M     ECR-12    χ²(52, N = 524) = 172.58      3.32    .94   .90   .07
5 M     ECR-S     χ²(53, N = 524) = 295.27      5.57    .77   .66   .09

Note. W = Women; M = Men.

the ECR-12 and ECR-S, using AMOS (Arbuckle, 1999). Maximum likelihood estimation was employed to estimate the two-factor models, with items 29, 27, 25, 31, 9, and 15 predicted by the ECR-12 avoidance latent factor (ECR-S: items 11, 13, 17, 27, 33, 35) and items 6, 8, 2, 14, 18, and 24 predicted by the ECR-12 anxiety latent factor (ECR-S: items 6, 16, 18, 22, 26, 32). When Brennan et al. (1998) created the ECR, the two 18-item scales were hypothesized to be uncorrelated. However, based on a meta-analysis, Cameron, Finnegan, and Morry (2012) have recommended reporting the correlation between anxiety and avoidance. In the present study we estimated the covariance between the two factors. A covariance was also added between the error terms of items 29 and 31 to account for similar wording. Each CFA was evaluated using several fit indices, namely the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), and the Root Mean Square Error of Approximation (RMSEA). The CFI and TLI values typically range from 0 to 1, and values equal to or greater than .90 represent a well-fitting model. The RMSEA captures the average error of fit between the submitted model and the data, and a value equal to or lower than .08 corresponds to an adequate fit (Browne & Cudeck, 1993). The chi-square test becomes less informative as sample size grows, so it is preferable to use the χ²/df ratio: when the value of this ratio falls between 1 and 5, the fit between the proposed theoretical model and the data is satisfactory (Bollen, 1989). As shown in Table 5, the fit indexes were marginal or adequate for both women and men with regard to the ECR-12. In contrast, the CFAs yielded poorer, inadequate fit indexes for the ECR-S in all samples, suggesting a poor factorial structure for the ECR-S.
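For reference, CFI, TLI, and RMSEA can all be computed from the model and baseline (null-model) chi-squares. The baseline values in the sketch below are hypothetical, since the paper reports only the model chi-squares, but the RMSEA and χ²/df lines use the Study 2 (women, ECR-12) figures from Table 5 and reproduce the reported values.

```python
from math import sqrt

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """CFI, TLI, and RMSEA from model (m) and baseline/null (b) chi-squares.
    RMSEA here uses the common (n - 1) denominator."""
    cfi = 1 - max(chi2_m - df_m, 0) / max(chi2_b - df_b, 1e-12)
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1)
    rmsea = sqrt(max(chi2_m - df_m, 0) / (df_m * (n - 1)))
    return cfi, tli, rmsea

# chi2_b and df_b are invented for illustration; chi2_m, df_m, and n are
# the Study 2 women ECR-12 values from Table 5
cfi, tli, rmsea = fit_indices(chi2_m=130.81, df_m=52, chi2_b=1000.0, df_b=66, n=316)
print(f"chi2/df = {130.81 / 52:.2f}, RMSEA = {rmsea:.2f}")
```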

Across all samples, the correlation between the anxiety and avoidance latent factors was closer to the expected null correlation for the ECR-12 than for the ECR-S. In Study 2, this correlation was small for the ECR-12 (women: r = .07, p = .346; men: r = .16, p = .034), whereas for the ECR-S it was moderate for women (r = .43, p < .001) and surprisingly strong for men (r = .71, p < .001). In Study 3, the correlation between the ECR-12 anxiety and avoidance factors was .17 for women (p = .059) and .14 for men (p = .132), whereas for the ECR-S it reached .29 for women (p = .001) and .35 for men (p < .001). In Study 4, this correlation was small for the ECR-12 (r = .13, p = .050) and moderate for the ECR-S (r = .38, p < .001). In Study 5, the correlation between the ECR-12 anxiety and avoidance factors was close to zero for women (r = .01, p = .841) and men (r = .04, p = .485), whereas for the ECR-S it was significant for men (r = .29, p < .001) but not for women (r = .10, p = .094).

Structure Invariance

Comparisons of the equivalence of factor loadings between men and women were conducted using the multiple group comparison approach (Byrne, 2001).1 In Study 2, significant chi-square difference tests indicated a difference between men and women in factor loading estimates for the ECR-12 (χ²(10, N = 632) = 21.98, p = .015) and the ECR-S (χ²(10, N = 632) = 39.49, p < .001). Gender invariance testing of the ECR-12 revealed no significant differences between men and women in the factor loadings in Study 3 (χ²(10, N = 568) = 7.46, p = .682), Study 4 (χ²(10, N = 392) = 13.60, p = .192), and Study 5 (χ²(10,

We also tested language invariance of the ECR-12 and ECR-S separately for men and women (because of the aforementioned nonindependence of the dyadic data) using the two community samples of couples (Studies 2 and 3). Language invariance testing of the ECR-12 revealed no significant differences in the factor loadings between the French- and English-speaking women (χ²(10, N = 600) = 11.368, p = .330) and men (χ²(10, N = 600) = 8.699, p = .561). For the ECR-S, a nonsignificant chi-square difference test was found in men (χ²(10, N = 600) = 14.711, p = .143), but a significant language difference was found in women (χ²(10, N = 600) = 32.556, p < .001).
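The invariance tests reported here are chi-square difference tests: the model with loadings constrained equal across groups is compared against the freely estimated model. A minimal sketch follows; the component chi-squares are hypothetical, chosen only so that their difference matches the 7.46 on 10 df reported for Study 3.

```python
# Critical values of the chi-square distribution at alpha = .05
# (only the df values needed here; others would raise a KeyError).
CHI2_CRIT_05 = {1: 3.841, 5: 11.070, 10: 18.307}

def chi2_difference_test(chi2_constrained, df_constrained, chi2_free, df_free):
    """Return (delta chi-square, delta df, significant?). A significant
    difference means the factor loadings are NOT invariant across groups."""
    d_chi2 = chi2_constrained - chi2_free
    d_df = df_constrained - df_free
    return d_chi2, d_df, d_chi2 > CHI2_CRIT_05[d_df]

# Hypothetical component chi-squares yielding the Study 3 ECR-12 result:
d, ddf, sig = chi2_difference_test(207.46, 116, 200.0, 106)
print(d, ddf, sig)  # a difference of 7.46 on 10 df is not significant
```

SEM software reports an exact p value instead of a tabled critical value, but the decision logic is the same.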

© 2015 Hogrefe Publishing

European Journal of Psychological Assessment 2016; Vol. 32(2):140–154



M.-F. Lafontaine et al.: Psychometric Properties of the ECR-12

Table 6. Standardized coefficients (βs) from test-retest structural equation modeling

                      Women     Men
Study 2
  ECR-12 anxiety      .82***    .80***
  ECR-S anxiety       .96***    .89***
  ECR anxiety         .86***    .80***
  ECR-12 avoidance    .67***    .70***
  ECR-S avoidance     .62***    .78***
  ECR avoidance       .67***    .70***
Study 3
  ECR-12 anxiety      .67***    .81***
  ECR-S anxiety       .65***    .89***
  ECR anxiety         .69***    .81***
  ECR-12 avoidance    .53***    .65***
  ECR-S avoidance     .67***    .66***
  ECR avoidance       .64***    .64***

Note. ***p < .001.

Using AMOS, structural equation modeling (SEM) allowed us to estimate regression coefficients between Time 1 and Time 2 for the original and the two brief ECR versions. To create indicators of the latent anxiety and avoidance variables, the six items on each ECR subscale were randomly divided into three parcels and averaged (Little, Cunningham, Shahar, & Widaman, 2002). According to Sibley et al. (2005), the use of SEM to assess stability reduces the effect of measurement error and provides a better appraisal of the underlying construct. To rule out the possibility of gender differences and to acknowledge the nonindependence of the dyadic data, separate analyses were conducted for men and women. In both samples, gender-specific results revealed that Time 1 latent avoidance predicted Time 2 latent avoidance, and Time 1 latent anxiety predicted Time 2 latent anxiety (see Table 6). Similar estimates were found for the abridged and original subscales of the ECR, suggesting equivalence of the short versions.
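The parceling step described above (randomly splitting each six-item subscale into three parcels and averaging within each) can be sketched as follows. The item ratings are invented for illustration; the article does not report individual responses.

```python
import random

def make_parcels(item_scores, n_parcels=3, seed=42):
    """Randomly assign a subscale's items to parcels and average within
    each parcel (Little et al., 2002). `item_scores` maps item labels
    to one respondent's ratings."""
    items = list(item_scores)
    random.Random(seed).shuffle(items)   # fixed seed for reproducibility
    size = len(items) // n_parcels
    parcels = []
    for i in range(n_parcels):
        chunk = items[i * size:(i + 1) * size]
        parcels.append(sum(item_scores[k] for k in chunk) / len(chunk))
    return parcels

# Hypothetical ratings on the six ECR-12 anxiety items of one respondent:
ratings = {2: 4, 6: 5, 8: 3, 14: 6, 18: 2, 24: 5}
print(make_parcels(ratings))  # three parcel means, used as latent indicators
```

With equal-sized parcels, the mean of the parcel scores equals the mean of the item scores, so no information about the subscale level is lost.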

N = 1,054) = 11.64, p = .310). For the ECR-S, a nonsignificant chi-square difference test was found in Study 3 (χ²(10, N = 568) = 15.23, p = .124), but significant gender differences were found in Study 4 (χ²(10, N = 392) = 22.52, p = .013) and Study 5 (χ²(10, N = 1,058) = 22.28, p = .014).

Internal Consistency

Cronbach's alpha coefficients for the subscales of the original and brief versions of the ECR are displayed in Table 7 for all samples. Cronbach's alphas for the ECR subscales and the ECR-12 subscales were all adequate, whereas the coefficients were poorer for the ECR-S (α < .70) in Studies 2 and 5 (especially for the men's avoidance subscale).
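The alphas in Table 7 follow the standard Cronbach formula, α = k/(k−1) × (1 − Σ item variances / variance of totals). A dependency-free sketch with toy data (not taken from the article):

```python
def cronbach_alpha(rows):
    """Cronbach's alpha from a respondents x items score matrix,
    using sample variances (denominator n - 1)."""
    k = len(rows[0])
    cols = list(zip(*rows))

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(r) for r in rows]
    return k / (k - 1) * (1 - sum(var(c) for c in cols) / var(totals))

# Toy data: 4 respondents x 3 items, for illustration only.
data = [[1, 1, 2], [2, 3, 3], [4, 4, 5], [5, 5, 5]]
print(round(cronbach_alpha(data), 2))  # → 0.98
```

The confidence intervals reported alongside the alphas in Table 7 require an additional sampling model (or bootstrapping) and are not reproduced here.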

Correlations With Original ECR

The correlations between the brief and the original subscales of the ECR were also tested to ensure representativeness of the new short scale. Across all samples (Studies 2-5), the original anxiety subscale correlated highly with the short anxiety subscale from both the ECR-12 (rs from .89 to .95) and the ECR-S (rs from .86 to .94). Similar results were found for the short avoidance subscales from both the ECR-12 (rs from .84 to .92) and the ECR-S (rs from .87 to .93). Z tests revealed no significant differences between the correlations obtained for the ECR-S and the ECR-12 (all ps > .05), suggesting equivalence of the short versions.
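The z tests comparing correlations rest on Fisher's r-to-z transformation. The sketch below is the independent-samples version with hypothetical correlation values; strictly, comparing two correlations that share a sample and a variable, as done here, calls for a dependent-correlations test (e.g., Steiger's), so this is a simplified illustration of the logic only.

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_correlations(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

# Hypothetical short-form/original correlations of similar magnitude:
z = compare_correlations(0.92, 316, 0.90, 316)
print(abs(z) > 1.96)  # not significant at alpha = .05
```

Even correlations near the top of the reported range differ nonsignificantly at these sample sizes, consistent with the equivalence conclusion.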

Convergent and Predictive Validity

To examine the convergent validity of the brief versions of the ECR, regression analyses (entering both anxiety and avoidance in the same analysis, as recommended by Cameron et al., 2012) predicting relationship satisfaction and psychological distress were performed. The results revealed significant associations between the attachment subscales (both the original and the brief versions) and measures of psychological distress and relationship satisfaction (see Table 8). In the clinical sample of Study 5, however, only the avoidance subscales (negatively) predicted relationship satisfaction. Z tests revealed no significant differences between the coefficients obtained with the original ECR, the ECR-S, and the ECR-12 (all ps > .05), suggesting equivalence of the short versions.
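With two predictors entered jointly, the standardized coefficients can be derived directly from the three pairwise correlations, which also shows why a near-zero anxiety-avoidance correlation keeps the betas stable (the denominator stays close to 1). All correlation values below are illustrative, not figures from the article.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression coefficients when two predictors (here:
    anxiety and avoidance) jointly predict a criterion, computed from
    the criterion-predictor and predictor-predictor correlations."""
    denom = 1 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    return b1, b2

# Hypothetical correlations: anxiety-distress .35, avoidance-distress .30,
# anxiety-avoidance .10 (illustrative only).
print(standardized_betas(0.35, 0.30, 0.10))
```

As r_12 approaches .70, the value sometimes reached by the ECR-S subscales, the denominator shrinks toward .51 and the betas become unstable, which is the multicollinearity concern raised in the Discussion.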

Test-Retest Stability

Test-retest stability of the ECR, ECR-12, and ECR-S over a 1-year period was examined in two samples (Studies 2 and 3; see Table 6).

Table 7. Cronbach's alpha coefficients and 95% confidence intervals for ECR anxiety and avoidance subscales

Study  Group  N    ECR-12 anxiety  ECR-S anxiety   ECR anxiety     ECR-12 avoidance  ECR-S avoidance  ECR avoidance
2      W      316  .78 [.74; .82]  .60 [.53; .67]  .86 [.84; .88]  .75 [.71; .79]    .73 [.68; .78]   .88 [.86; .90]
2      M      316  .78 [.74; .82]  .64 [.57; .70]  .86 [.83; .88]  .80 [.77; .84]    .69 [.63; .74]   .87 [.85; .89]
3      W      224  .84 [.81; .88]  .77 [.71; .81]  .91 [.89; .92]  .83 [.79; .86]    .86 [.83; .89]   .93 [.92; .94]
3      M      224  .82 [.78; .86]  .75 [.69; .80]  .90 [.88; .92]  .74 [.67; .79]    .79 [.74; .83]   .89 [.87; .91]
4      M&W    392  .87 [.84; .89]  .82 [.79; .85]  .93 [.92; .94]  .79 [.75; .82]    .82 [.79; .85]   .90 [.88; .92]
5      W      524  .83 [.80; .85]  .71 [.67; .74]  .89 [.87; .90]  .82 [.79; .84]    .71 [.67; .75]   .89 [.88; .91]
5      M      524  .80 [.78; .83]  .73 [.69; .77]  .88 [.86; .89]  .80 [.78; .83]    .64 [.59; .69]   .88 [.86; .89]

Note. W = Women; M = Men.




Table 8. Standardized regression coefficients from analyses in which ECR subscales (attachment anxiety and avoidance) predicted two validity criteria, relationship satisfaction (DAS-4) and psychological distress (PSI)

Validity criteria   ECR-12 anxiety  ECR-S anxiety  ECR anxiety  ECR-12 avoidance  ECR-S avoidance  ECR avoidance
Study 2
  DAS-4 W           .21***          .23***         .20***       .49***            .50***           .54***
  DAS-4 M           .09             .04            .04          .47***            .53***           .56***
  PSI W             .23***          .25***         .26***       .34***            .30***           .33***
  PSI M             .28***          .30***         .33***       .21***            .19**            .21***
Study 3
  DAS-4 W           .13             .09            .13          .56***            .63***           .52***
  DAS-4 M           .10             .06            .09          .52***            .46***           .39**
  DAS-4 T2W         .08             .06            .06          .50***            .54***           .51***
  DAS-4 T2M         .25             .26            .19          .23               .30*             .33*
  PSI T2W           .38**           .32*           .33*         .08               .26              .26
  PSI T2M           .48***          .44**          .43**        .07               .12              .14
Study 4
  DAS-4             .22**           .20**          .18**        .34***            .42***           .41***
Study 5
  DAS-4 W           .09             .04            .06          .39**             .36***           .39***
  DAS-4 M           .05             .07            .08*         .42***            .43***           .45***
  PSI W             .30***          .27***         .31***       .09*              .11*             .10*
  PSI M             .32***          .30***         .34***       .20***            .19***           .23***

Notes. *p < .05. **p < .01. ***p < .001. W = Women; M = Men; DAS-4 = Dyadic Adjustment Scale (four-item short form); PSI = Psychological Symptoms Index; T2 = Time 2.

Discussion

The main objective of this research project was to develop an optimal brief version of the frequently used but rather long Experiences in Close Relationships (ECR) measure (Brennan et al., 1998), using IRT analyses to evaluate item performance. The resulting self-report questionnaire is called the ECR-12. A secondary objective was to determine the psychometric properties of the ECR-12 in couples from the general community (Studies 2 and 3), individuals involved in same-sex relationships (Study 4), and dyads seeking couple therapy (Study 5). The five studies show that the ECR-12 preserves the good psychometric properties of the ECR-36 and generally has better psychometric properties than the previously developed brief measure, the ECR-S (Wei et al., 2007).

Development of the ECR-12

Using two criteria from IRT, six items from the original ECR avoidance subscale (9, 15, 25, 27, 29, and 31) and six items from the original anxiety subscale (2, 6, 8, 14, 18, and 24) were selected, preserving the theoretically grounded two-dimensional structure of the original ECR (Brennan et al., 1998). These 12 items possessed the best item discrimination and difficulty indices, meaning that the resulting subscales had the greatest discriminating power along a large portion of the avoidance continuum and all along the anxiety continuum. This was not always the case for the 12 items included in the ECR-S, which also evidenced higher measurement errors.
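The selection strategy (keeping the items with the best IRT parameters) can be illustrated as follows. The discrimination values are invented, since the article's actual item parameters are not reproduced in this excerpt, and the real selection also weighed item difficulty indices; this sketch ranks by discrimination alone.

```python
# Hypothetical graded-response-model discrimination parameters `a` for a
# pool of avoidance items (values are illustrative, not from the article).
avoidance_a = {9: 1.9, 15: 1.8, 25: 2.1, 27: 2.0, 29: 2.2, 31: 1.7,
               3: 1.1, 5: 0.9, 7: 1.2, 11: 1.0, 13: 0.8, 17: 1.3}

def select_items(a_params, k=6):
    """Keep the k items with the highest discrimination, mirroring one of
    the two IRT-based selection criteria described in the text."""
    best = sorted(a_params, key=a_params.get, reverse=True)[:k]
    return sorted(best)

print(select_items(avoidance_a))  # → [9, 15, 25, 27, 29, 31]
```

With these invented parameters the six retained items happen to match the ECR-12 avoidance items; the actual selection, of course, was driven by estimated parameters from Study 1.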

Psychometric Properties of the ECR-12

The two-dimensional structure of the ECR-12 was well established with confirmatory factor analyses (CFA) in all five samples. The distinction between attachment anxiety and avoidance was clear across a variety of populations, subcultures, genders, and languages. We used several indices to assess the models' goodness of fit, one of which was the comparative fit index (CFI). CFI values greater than .90 indicate a reasonable fit to the data, and values of .95 or greater indicate a model that fits the data well (Hu & Bentler, 1999). The ECR-12 consistently fit the predicted two-dimensional data structure much better than Wei et al.'s (2007) ECR-S in all four Canadian samples (French-speaking couples, English-speaking couples, individuals in same-sex relationships, and couples seeking therapy). In comparison to the ECR-12, poor and inadequate fit indices were obtained for the ECR-S in all samples, suggesting a poorer factorial structure for the ECR-S. Wei et al. (2007) had to create two additional latent variables (accounting for response set) to improve the goodness of fit of the model to their data. The ECR-12, like the original ECR, did not require the creation of such latent variables, even with its use of reverse-scored items.

The statistical independence of the ECR-12's two subscales (avoidance and anxiety) was generally demonstrated across our diverse samples. In cases where avoidance and anxiety scores were significantly related, the correlation was small, as intended by the creators of the original ECR (Brennan et al., 1998). In this regard, the ECR-S (Wei et al., 2007) is akin to the ECR-R (Fraley et al., 2000), which the meta-analysis conducted by Cameron et al. (2012) found to generate higher correlations between anxiety and avoidance than the original ECR. Moreover, the correlations between the ECR-S anxiety and avoidance subscales within our samples sometimes reached .70, a threshold at which multicollinearity problems can arise. Our results suggest that this problem will not be encountered with the ECR-12.

Based on the internal consistency of the questionnaire items and the questionnaire's stability over a 1-year period (in Studies 2 and 3), the ECR-12 proved to be a reliable measure in our four samples (French-speaking couples, English-speaking couples, individuals in same-sex relationships, and couples seeking therapy). Specifically, Cronbach's alphas varied from .78 to .87 for the anxiety subscale and from .74 to .83 for the avoidance subscale. Because of the reduction in the number of items and the elimination of redundant items, the alphas for the ECR-12 were expected to be slightly lower than those for the original ECR subscales. However, alpha coefficients remained at .74 or higher for the ECR-12, indicating acceptable to excellent internal consistency for such short subscales. In comparison, Cronbach's alphas for the ECR-S were generally lower and in a few cases fell within the .60 range – among French-speaking couples and couples seeking therapy – suggesting higher measurement error in those populations. Findings from Studies 2 and 3 demonstrated the relatively good stability of all three versions of the ECR over a period as long as 1 year.

As expected, both the anxiety and avoidance subscales correlated with two validity criteria: relationship satisfaction and psychological distress.
Specifically, findings from Studies 2 through 5 provided further support for the convergent and predictive validity of the ECR-12, with correlations similar to those of the original ECR. These results are consistent with attachment theory and the results of several previous studies (see Mikulincer & Shaver, 2007, for a review).

Gender Considerations

As expected, no gender differences were found in the probability of endorsing each level of anxiety or avoidance on each item of the ECR-12 (Study 1). Gender differences were found, however, for 5 out of 12 items of the ECR-S. We also compared the factor structure and factor loadings for men and women in Studies 2 to 5. A gender-invariant factor structure of the ECR-12 was found in three out of four samples (Studies 3, 4, and 5, but not Study 2). In couples from the community, couples seeking therapy, and same-sex couples, the link between the items and their target factor was comparable for men and women, but there was a significant gender difference in the sample of French-Canadian couples. We found, however, gender differences in the factor loadings of Wei et al.'s ECR-S in three out of four samples (Studies 2, 4, and 5). Language invariance testing of the ECR-12, using the two community samples of couples (Studies 2 and 3), revealed no significant differences in the factor loadings between the French- and English-speaking women and men. However, we found language differences in the factor loadings of Wei et al.'s ECR-S in the sample of women. Thus, we conclude that the new ECR-12 factor structure is more likely than that of the ECR-S to be invariant with regard to gender and language across samples.

Implications

Using item response theory, we selected 12 items from the original ECR that exhibit good measurement precision at most levels of the anxiety and avoidance continua. Four studies demonstrated the reliability and the convergent and predictive validity of the ECR-12. In practical terms, we reduced the number of items from 36 to 12 without compromising the desirable psychometric properties of the original ECR. Moreover, the high correlations between the original ECR subscales and their brief versions (rs around .90) imply that the two versions of the subscales assess the same constructs. The ECR-12 maintains the brevity of the ECR-S developed by Wei et al. (2007), but was developed using an IRT method similar to that employed in the development of the ECR-R (Fraley et al., 2000). In the present studies, the good psychometric properties of the ECR-12 were demonstrated in samples of short-term and long-term relationships, heterosexual and same-sex relationships, couples from different settings (college, community, clinics), and participants who spoke different languages (French and English). The ECR-12 can confidently be used by researchers and mental health practitioners with these populations when a short measure of attachment anxiety and avoidance is needed.

Four limitations of our studies are that (a) all of the data came from self-report measures, (b) the only validity criteria were relationship satisfaction and psychological distress, (c) each six-item scale contains a mix of items referring to romantic partners in general and relationship-specific items, as in the original ECR, and (d) there was an overlap in the wording of two of the items included in the ECR-12, because item selection was based mainly on the IRT analyses.
In future studies it will be important to use other assessment methods such as behavioral observations, social-cognitive tasks, daily diaries, and physiological measurements to further establish the validity of the ECR-12. Because all of these methods have already been used with the original ECR (see the review by Mikulincer & Shaver, 2007), and given that the ECR-12 is structured and functions much like the original ECR, we are confident that the ECR-12 will prove to have high construct validity in future studies.

References

Ainsworth, M. D. S., Blehar, M. M., Waters, E., & Wall, S. (1978). Patterns of attachment: A psychological study of the strange situation. Hillsdale, NJ: Erlbaum.

Alonso-Arbiol, I., Balluerka, N., & Shaver, P. R. (2007). A Spanish version of the Experiences in Close Relationships (ECR) adult attachment questionnaire. Personal Relationships, 14, 45–63.

Arbuckle, J. (1999). AMOS 4.0 [Computer software]. Chicago, IL: Smallwaters.

Baker, F. B. (2001). The basics of item response theory. College Park, MD: ERIC Clearinghouse on Assessment and Evaluation.

Bartholomew, K., & Horowitz, L. M. (1991). Attachment styles among young adults: A test of a four-category model. Journal of Personality and Social Psychology, 61, 226–244.

Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: Wiley.

Bowlby, J. (1969/1982). Attachment and loss: Vol. 1. Attachment. New York, NY: Basic Books.

Brennan, K. A., Clark, C. L., & Shaver, P. R. (1998). Self-report measurement of adult attachment: An integrative overview. In J. A. Simpson & W. S. Rholes (Eds.), Attachment theory and close relationships (pp. 46–76). New York, NY: Guilford Press.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Beverly Hills, CA: Sage.

Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts, applications, and programming. Mahwah, NJ: Erlbaum.

Cameron, J., Finnegan, H., & Morry, M. (2012). Orthogonal dreams in an oblique world: A meta-analysis of the association between attachment anxiety and avoidance. Journal of Research in Personality, 46, 472–476.

Collins, N. L., & Allard, L. M. (2001). Cognitive representations of attachment: The content and function of working models. In G. J. O. Fletcher & M. S. Clark (Eds.), Blackwell handbook of social psychology: Vol. 2. Interpersonal processes (pp. 60–85). Oxford, UK: Blackwell.

Conradi, H. J., Gerlsma, C., van Duijn, M., & de Jonge, P. (2006). Internal and external validity of the Experiences in Close Relationships Questionnaire in an American and two Dutch samples. European Journal of Psychiatry, 20, 258–269.

Del Giudice, M. (2011). Sex differences in romantic attachment: A meta-analysis. Personality and Social Psychology Bulletin, 37, 193–214.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

Fairchild, A., & Finney, S. J. (2006). Investigating validity evidence for the Experiences in Close Relationships-Revised questionnaire. Educational and Psychological Measurement, 66, 116–135.

Fraley, R. C., Waller, N. G., & Brennan, K. A. (2000). An item response theory analysis of self-report measures of adult attachment. Journal of Personality and Social Psychology, 78, 350–365.

Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.


Ilfeld, F. W. (1976). Further validation of a psychiatric symptom index in a normal population. Psychological Reports, 39, 1215–1228.

Lafontaine, M.-F., & Lussier, Y. (2003). Bidimensional structure of attachment in love: Anxiety over abandonment and avoidance of intimacy. Canadian Journal of Behavioural Science, 35, 56–60.

Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to parcel: Exploring the question, weighing the merits. Structural Equation Modeling, 9, 151–173.

MacDonald, P. L., & Paunonen, S. V. (2002). A Monte Carlo simulation of item and person statistics based on item response theory and classical test theory. Educational and Psychological Measurement, 62, 921–943.

Mallinckrodt, B., & Wang, C. (2004). Quantitative methods for verifying semantic equivalence of translated research instruments: A Chinese version of the Experiences in Close Relationships Scale. Journal of Counseling Psychology, 51, 368–379.

Marsh, H. W., Hau, K.-T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research, 33, 181–230.

Matte, M., Lemieux, A., & Lafontaine, M.-F. (2009). Brief report: Factor structure of the Experiences in Close Relationships with gay and lesbian individuals. North American Journal of Psychology, 11, 361–368.

Mikulincer, M., & Florian, V. (2000). Exploring individual differences in reactions to mortality salience: Does attachment style regulate terror management mechanisms? Journal of Personality and Social Psychology, 79, 260–273.

Mikulincer, M., & Shaver, P. R. (2007). Attachment in adulthood: Structure, dynamics, and change. New York, NY: Guilford Press.

Mooney, C., & Duval, R. (1993). Bootstrapping: A nonparametric approach to statistical inference. Thousand Oaks, CA: Sage.

Nakao, T., & Kato, K. (2004). Constructing the Japanese version of the Adult Attachment Style Scale (ECR). Japanese Journal of Psychology, 75, 154–159.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York, NY: McGraw-Hill.

Picardi, A., Bitetti, D., Puddu, P., & Pasquini, P. (2000). Development and validation of an Italian version of the questionnaire "Experiences in Close Relationships," a new self-report measure of adult attachment. Rivista di Psichiatria, 35, 114–120.

Poitras, S.-C., Guay, F., & Ratelle, C. F. (2012). Using the Self-Directed Search in research: Selecting a representative pool of items to measure vocational interests. Journal of Career Development, 39, 186–207.

Ramsay, J. O. (2000). TestGraf: A program for the graphical analysis of multiple choice test and questionnaire data. Unpublished manuscript, McGill University, Montreal, Canada.

Reeve, B. B., & Fayers, P. (2005). Applying item response theory modelling for evaluating questionnaire item and scale properties. In P. Fayers & R. D. Hays (Eds.), Assessing quality of life in clinical trials: Methods and practice (2nd ed., pp. 55–73). New York, NY: Oxford University Press.

Robins, R. W., Hendin, H. M., & Trzesniewski, K. H. (2001). Measuring global self-esteem: Construct validation of a single-item measure and the Rosenberg Self-Esteem Scale. Personality and Social Psychology Bulletin, 27, 151–161.

Sabourin, S., Valois, P., & Lussier, Y. (2005). Development and validation of a brief version of the Dyadic Adjustment Scale with a nonparametric item analysis model. Psychological Assessment, 17, 15–27.


Samejima, F. (1969). Estimation of ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34, 1–97.

Samejima, F. (1997). Graded response model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 85–100). New York, NY: Springer-Verlag.

Sibley, C. G., Fischer, R., & Liu, J. H. (2005). Reliability and validity of the revised Experiences in Close Relationships (ECR-R) self-report measure of adult romantic attachment. Personality and Social Psychology Bulletin, 31, 1524–1536.

Spanier, G. B. (1976). Measuring dyadic adjustment: New scales for assessing the quality of marriage and similar dyads. Journal of Marriage and the Family, 38, 15–28.

Stroud, M. W., McKnight, P. E., & Jensen, M. P. (2004). Assessment of self-reported physical activity in patients with chronic pain: Development of an abbreviated Roland-Morris disability scale. The Journal of Pain, 5, 257–263.

Thissen, D., & Steinberg, L. (1997). A response model for multiple choice items. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 381–394). New York, NY: Springer.

Valois, P., Houssemand, C., Germain, S., & Abdous, B. (2011). An open source tool to verify the psychometric properties of an evaluation instrument. Procedia, Social and Behavioral Sciences, 15, 552–556.


Wei, M., Russell, D. W., Mallinckrodt, B., & Vogel, D. L. (2007). The Experiences in Close Relationships (ECR) Scale-short form: Reliability, validity, and factor structure. Journal of Personality Assessment, 88, 187–204.

Date of acceptance: August 29, 2014
Published online: February 27, 2015

Marie-France Lafontaine Department of Psychology University of Ottawa 136 Jean-Jacques Lussier Ottawa K1N 6N5 Canada E-mail mlafonta@uottawa.ca





Original Article

Measuring the Situational Eight DIAMONDS Characteristics of Situations: An Optimization of the RSQ-8 to the S8*

John F. Rauthmann(1) and Ryne A. Sherman(2)

(1) Humboldt-Universität zu Berlin, Germany; (2) Florida Atlantic University, FL, USA

Abstract. It has been suggested that people perceive psychological characteristics of situations on eight major dimensions (Rauthmann et al., 2014): the "Situational Eight" DIAMONDS (Duty, Intellect, Adversity, Mating, pOsitivity, Negativity, Deception, Sociality). These dimensions have been captured with the 32-item RSQ-8. The current work optimizes the RSQ-8 to derive more economical yet informative and precise scales, captured in the newly developed S8*. Nomological associations of the original RSQ-8 and the S8* with situation cues (extracted from written situation descriptions) were compared. Application areas of the S8* are outlined.

Keywords: situations, psychological situation characteristics, situation assessment, Situational Eight DIAMONDS, Riverside Situational Q-Sort (RSQ)

People form "psychologically active" representations of stimuli (Fleeson, 2007; Rauthmann, 2012). Such psychological situations are perceived, described, and evaluated with situation characteristics, which represent salient and important ascriptions of "meaning" and thus circumscribe the psychological attributes, features, or properties of situations. As such, similarly to how persons can be described with traits, situations can be described with characteristics (Edwards & Templeton, 2005). Because there is a cornucopia of characteristics that can be used to describe situations (Edwards & Templeton, 2005; Magnusson, 1981), it is important to reduce this vast amount to a smaller set of dimensions. Recently, Rauthmann and colleagues (2014) examined, in a large multinational data set, the factor structure of the 89-item Riverside Situational Q-Sort (RSQ; Wagerman & Funder, 2009), a measure that broadly assesses psychological situation characteristics (Sherman, Nave, & Funder, 2010, 2012, 2013). They uncovered the "Situational Eight" DIAMONDS: Duty (Does something need to be done?), Intellect (Is deep cognitive processing required?), Adversity (Is someone threatened by external forces?), Mating (Is there an opportunity to attract potential mates?), pOsitivity (Is the situation pleasant?), Negativity (Can the situation arouse negative feelings?), Deception (Can others be trusted?), and Sociality (Is social interaction possible or expected?). These DIAMONDS encompass most dimensions previously identified in existing situational taxonomies (Rauthmann et al., 2014). Moreover, their work yielded a 32-item version of the RSQ tailored to the Situational Eight: the RSQ-8.

The Current Work

The overarching aim of this work is to provide psychometrically sound yet economical scales of psychological situation characteristics. This aim is motivated by the fact that there is a staggering paucity of standardized and validated situation measures (Hogan, 2009). Attesting to this, an informal survey was conducted with eight scholars from a personality/social psychological background with expertise in situations and Person × Situation interactions. They were asked to think of a basic research design in which they would want to broadly sample characteristics of persons and of situations. Specifically, they were to name three personality measures with the constructs they assessed as well as three situation measures with their constructs. The results can be summarized as follows. First, all scholars could readily name three personality measures and their assessed constructs. Second, the only previously validated situation

European Journal of Psychological Assessment 2016; Vol. 32(2):155–164 DOI: 10.1027/1015-5759/a000246



J. F. Rauthmann & R. A. Sherman: Assessment of Situation Characteristics

measure named was the RSQ. Third, some scholars indicated that they would make use of ad hoc constructed measures. Together, there does not seem to be (knowledge of) a standardized and validated measure that would allow researchers to broadly assess characteristics of situations. This is surprising given that many designs could benefit from assessing how participants have perceived the situation (e.g., person-environment fit, experimental manipulation of situational stimuli, contextualized personality-reaction patterns, etc.). A psychometrically sound measure of situation characteristics is thus direly needed. Rauthmann and colleagues (2014) proposed the RSQ-8 to capture the DIAMONDS with 32 items (i.e., four per dimension). The RSQ-8 items were taken from the 89-item RSQ (Version 3.15). The RSQ-8 showed a clear factor structure, good internal consistency, high interrater agreement, construct validity, and even incremental predictive abilities. Nonetheless, there is room for improvement of this measure. Specifically, Rauthmann and colleagues (2014) suggested edits to the already existing items as well as incorporating new items. Moreover, they called for a more sophisticated psychometric analysis of the RSQ-8. This study thus seeks to revise, extend, and optimize the RSQ-8.

Methods

Participants

Data were gathered online from a German sample of N = 547 (407 women, 140 men) with a mean age of 28.01 years (SD = 10.47; range: 15–77 years), of whom 55.8% had their general university entrance qualification and 36% held a university degree. Due to missing values, the analyzable sample dropped to N = 507 (385 women, 122 men; age: M = 27.99, SD = 10.49, range: 15–77 years; 56.2% university entrance qualification and 36.5% university degree).

Procedure

Participants took part in an online study (without payment) advertised via multiple platforms (mailing lists, online social networks, etc.). They were first prompted to think about the situation they were in 24 hr prior and then to write open-ended responses to five questions: What was happening? Who was with you? What were you doing? Where were you? What time was it (approximately)? Subsequently, participants rated their situation on the RSQ-8 (plus additional items) as well as some other situation measures (not of relevance to this study). Lastly, participants completed a Big Five personality questionnaire (not of relevance to this study) and received immediate personalized feedback on their traits.

Measures

Several measures were administered, but for this study only the RSQ-8 plus its eight additional items, henceforth referred to as the RSQ-8*, is of relevance. Participants indicated on a scale from 1 (= not at all) to 7 (= totally) how much each item applied to the situation they had previously described. Table 1 lists the original (and slightly rephrased) 32 RSQ-8 items plus one additional item per dimension (marked with an asterisk). These extra items were newly generated based on the content each DIAMONDS dimension is supposed to capture as well as Rauthmann and colleagues' (2014) suggestions. With five items per dimension, there was now a broader basis from which to select three items for a brief measure of the Situational Eight. The end product was to be a streamlined 24-item measure (i.e., three items per dimension) with DIAMONDS scales that are relatively precise and informative.
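For readers unfamiliar with the internal consistency index used throughout, Cronbach's alpha for a k-item scale is k/(k - 1) times (1 minus the ratio of summed item variances to the variance of the total scores). A minimal sketch; the 1-7 ratings below are invented for illustration and are not data from this study:

```python
def cronbach_alpha(items):
    # items: one list of responses per item (same respondents in the same order)
    # alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    k, n = len(items), len(items[0])

    def var(xs):
        # Sample variance (denominator n - 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[j] for item in items) for j in range(n)]
    return k / (k - 1) * (1 - sum(var(item) for item in items) / var(totals))

# Invented 1-7 ratings of six respondents on a hypothetical three-item scale
items = [
    [6, 5, 2, 7, 3, 4],
    [6, 4, 1, 6, 3, 5],
    [7, 5, 2, 6, 2, 4],
]
alpha = cronbach_alpha(items)
```

Because the three invented item columns rise and fall together across respondents, alpha comes out high here; uncorrelated items would drive it toward zero.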

Data-Analytical Strategy

The item selection process was guided by (a) classical test theory (CTT), (b) item response theory (IRT), and (c) conceptual considerations. Relevant CTT findings were item-scale correlations and internal consistency reliabilities (Cronbach's alpha). Relevant IRT findings were threshold and discrimination parameters, along with Item Information Curves (IICs) and Test Information Functions (TIFs).1 First, item threshold parameters b1 to bn-1 (where n is the number of response options on an ordinal/polytomous response scale) indicate the difficulty of each response option. Positive values are better for measuring high levels of a characteristic (i.e., they are more difficult to endorse), and negative values for low levels (i.e., they are easier to endorse). Second, discrimination parameters a indicate to what extent items are helpful for describing latent dimensions; thus, the relevance of each item to a dimension can be estimated. Third, IICs indicate how much information each item provides to a scale. Items perform poorly if (a) they provide no information (= flat curve) or (b) two or more items are redundant in the information they provide (= highly similar IICs). Lastly, TIFs indicate how precise the respective total scale scores are at different levels of the latent dimensions measured. They can be used to examine the points of lowest and highest measurement precision (i.e., reliability). We performed IRT analyses for ordered polytomous data with a graded response model (Samejima, 1969) via the grm function in the ltm R package (Rizopoulos, 2006).

After forming the optimized S8*, we compared the RSQ-8* and the S8* regarding their "nomological correlations" with information contained in participants' situation descriptions. First, 52 cues were extracted with the Linguistic Inquiry and Word Count software (German LIWC; Wolf et al., 2008). Second, the 52 cues were correlated with the DIAMONDS dimensions from both the RSQ-8* and the S8*. Third, for each

1 Item Characteristics Curves (ICCs) and Item Operation Characteristics Curves (OCCs) can be found in the Electronic Supplementary Material (ESM) 1.



Table 1. Classical Test Theory and Item Response Theory findings

Item no. and wording (within each dimension, items are ordered from most to least informative):

Duty
1. A job needs to be done.
5. *I have to fulfill my duties.
4. Task-oriented thinking is required.
3. Minor details are important.
2. I am being counted on.

Intellect
1. The situation contains intellectual stimuli.
2. There is the opportunity to demonstrate intellectual capacities.
5. *Information needs to be deeply processed.
3. There is the opportunity to express unusual ideas and points of view.
4. The situation evokes values regarding life styles or politics.

Adversity
2. I am being blamed for something.
1. I am being criticized.
3. I am being threatened by someone or something.
4. I am being dominated or bossed around.
5. *Troubles thwart me.

Mating
5. *The situation is sexually charged.
2. The situation contains stimuli that could be construed sexually.
1. Potential sexual or romantic partners are present.
3. Physical attractiveness is relevant.
4. Members of the other sex are present.

pOsitivity
5. *The situation is joyous and exuberant.
1. The situation is pleasant.
3. The situation is humorous.
2. The situation is playful.
4. The situation is simple and clear-cut.

Negativity
4. The situation could entail frustration.
5. *The situation is unpleasant.
2. The situation could elicit stress.
3. The situation could elicit feelings of tension.
1. The situation elicits feelings of anxiety.

Deception
1. It is possible to deceive someone.
4. Someone in this situation could be deceptive.
5. *Not dealing with others in an honest way is possible.
2. A person or activity could be undermined.
3. The situation could elicit feelings of hostility.

Sociality
5. *Communication with other people is important or desired.
1. Social interaction is possible.
3. Others show many communicative signals.
2. Close personal relationships are important or can develop.
4. A reassuring other person is present.

[Further columns of Table 1 for each item: Descriptives (M, SD); CTT findings (rS, a*RSQ-8*); IRT findings (thresholds b1-b6, discrimination a); CFA loadings (RSQ-8*, S8*). The numeric entries are not reproduced here.]

Notes. N = 507-508. Items are ordered from most informative to least informative (see a parameter). rS = item-scale correlation. a*RSQ-8* = Cronbach's alpha (internal consistency) if item is deleted. CFA = confirmatory factor analysis. Items with an asterisk (*): newly generated extra items. Bold items: selected for the S8*. Items were translated into English. German originals are obtainable upon request. aLoadings taken from Model 1 (freely intercorrelated characteristics) in Table 3.



DIAMONDS dimension, the 52 correlations of the RSQ-8* scale and the 52 correlations of the S8* scale were r-to-z transformed and the Burt/Tucker congruence coefficient (Burt, 1948; Lorenzo-Seva & ten Berge, 2006) was computed.2 Such congruence coefficients index how similar the correlation patterns are. The S8* should tap nomological networks almost identical to those of the RSQ-8*, thus showing substantial congruence coefficients.
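The Burt/Tucker congruence coefficient is simply the cosine between two profiles of (r-to-z transformed) correlations. A minimal sketch; the two five-entry profiles below are hypothetical stand-ins, not the study's 52-entry cue profiles:

```python
import math

def r_to_z(rs):
    # Fisher r-to-z transformation of a vector of correlations
    return [math.atanh(r) for r in rs]

def congruence(x, y):
    # Burt/Tucker congruence coefficient: sum(x*y) / sqrt(sum(x^2) * sum(y^2))
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

# Hypothetical cue-correlation profiles of one dimension for both measures
rsq8_profile = [0.52, 0.22, 0.29, -0.10, 0.18]
s8_profile = [0.51, 0.24, 0.27, -0.08, 0.20]

c = congruence(r_to_z(rsq8_profile), r_to_z(s8_profile))
```

Identical profiles yield exactly 1; profiles as similar as the invented ones above yield values near 1, in the spirit of the .89-.99 range reported in the Results for the real data.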


Results

Table 2. Descriptive statistics

                    RSQ-8*                  S8*
Dimension      M      SD     α         M      SD     α
Duty          3.71   1.70   .80       3.55   2.13   .90
Intellect     3.66   1.68   .83       3.71   1.91   .86
Adversity     1.74   0.99   .77       1.43   0.93   .71
Mating        2.49   1.39   .71       2.05   1.38   .61
pOsitivity    4.17   1.52   .81       3.90   1.76   .80
Negativity    2.84   1.62   .90       3.17   1.78   .86
Deception     2.20   1.31   .80       2.24   1.54   .80
Sociality     4.19   1.97   .88       4.09   2.10   .85

Note. N = 507-508.

CTT and IRT Analyses

Descriptive statistics for RSQ-8* items are presented in Table 1 and for scales in Table 2.3 First, internal consistencies could be increased for several DIAMONDS dimensions by dropping items. Second, we computed confirmatory factor analyses (CFAs) with the cfa function in the lavaan R package (Rosseel, 2012) for the RSQ-8*, specifying four models (freely correlated dimensions; uncorrelated/orthogonal dimensions; correlations fixed between dimensions; correlations fixed to 1 between dimensions). These findings are summarized in Table 3 under "RSQ-8*." The freely correlated dimensions model fit the data best, CFI = .79, TLI = .77, RMSEA = .08, SRMR = .12. Intercorrelations of latent DIAMONDS dimensions from the RSQ-8* are presented in Table 4. Factor loadings from this model are listed in Table 1 under "CFA loadings." As can be seen, the items performing poorly on the CTT findings also did not show strong factor loadings. Third, as can be seen under "IRT findings" in Table 1, these were also the items that produced low discrimination parameters and thresholds that were at odds with the rest of the items. The IICs for each item can be found in Figure 1. Fourth, total information of each RSQ-8* scale and TIFs can be found in Table 5 and Figure 2, respectively.
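The IRT quantities behind Table 1 and Figure 1 come from a graded response model. A minimal sketch of how boundary curves, category probabilities, and Samejima-style item information can be computed for a 7-point item; the discrimination a and thresholds b below are illustrative values, not estimates from this study:

```python
import math

def boundary(theta, a, b):
    # Boundary (cumulative) probability of responding at or above a category
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def category_probs(theta, a, bs):
    # Category probabilities are differences of adjacent boundary curves
    stars = [1.0] + [boundary(theta, a, b) for b in bs] + [0.0]
    return [stars[k] - stars[k + 1] for k in range(len(stars) - 1)]

def item_information(theta, a, bs):
    # Item information: sum over categories of P'_k(theta)^2 / P_k(theta),
    # with P'_k = a * (Q_k - Q_{k+1}) and Q = P*(1 - P*) for each boundary
    stars = [1.0] + [boundary(theta, a, b) for b in bs] + [0.0]
    qs = [p * (1 - p) for p in stars]
    info = 0.0
    for k in range(len(stars) - 1):
        pk = stars[k] - stars[k + 1]
        dk = a * (qs[k] - qs[k + 1])
        if pk > 0:
            info += dk * dk / pk
    return info

# Illustrative parameters for one 7-option item (six thresholds, one discrimination)
a, bs = 2.1, [-1.5, -0.8, -0.2, 0.4, 1.0, 1.7]
probs = category_probs(0.0, a, bs)
```

Plotting item_information over a grid of theta values reproduces an IIC; summing it across a scale's items gives that scale's TIF. Information is concentrated where the thresholds lie and falls off at extreme theta.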

Deriving the S8*

Considering the CTT and IRT findings from Table 1 (e.g., high item-scale correlations, high a parameters, high CFA loadings) as well as conceptual considerations (e.g., content

Table 3. Confirmatory factor analysis findings

RSQ-8*
Model   AIC        BIC        BICadj.    χ²         df    p       CFI    TLI    RMSEA   90% CI RMSEA   SRMR   Δχ²       Δdf   p
1       74593.29   75049.97   74707.16    3238.77   712   < .001  .790   .770   .084    .081-.087      .115   -         -     -
2       75695.31   76033.59   75779.66    4396.79   740   < .001  .696   .680   .099    .096-.102      .208   1158.00   28    < .001
3a      75437.13   75779.64   75522.54    4136.61   739   < .001  .718   .702   .095    .092-.098      .184   897.84    27    < .001
3b      75484.25   75822.53   75568.60    4185.73   740   < .001  .714   .698   .096    .093-.099      .212   946.96    28    < .001

S8*
1       44330.55   44651.92   44410.69     871.27   224   < .001  .900   .877   .075    .070-.081      .070   -         -     -
2       45133.90   45336.87   45184.51    1730.62   252   < .001  .771   .750   .108    .103-.112      .192   859.35    28    < .001
3a      44962.51   45169.71   45014.18    1557.23   251   < .001  .798   .778   .101    .097-.106      .169   685.96    27    < .001
3b      57627.21   57830.39   57678.03   14224.13   252   < .001  #      #      .331    .369-.335      #      13353     28    < .001

Notes. N = 507. # = not properly estimated; AIC = Akaike Information Criterion; BIC = Bayesian Information Criterion; BICadj. = BIC adjusted for sample size; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root Mean Square Residual. Model 1: freely correlated dimensions. Model 2: uncorrelated/orthogonal dimensions. Model 3a: correlations fixed between dimensions. Model 3b: correlations fixed to 1 between dimensions.
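The nested-model comparisons in Table 3 are chi-square difference tests. A minimal sketch using the RSQ-8* values for Model 2 versus Model 1, with a pure-Python chi-square survival function that is valid only for even degrees of freedom (which suffices here):

```python
import math

def chi2_sf(x, df):
    # Chi-square survival function for even df (df = 2k):
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    assert df % 2 == 0 and df > 0
    lam, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= lam / i
        total += term
    return math.exp(-lam) * total

def lr_test(chisq_restricted, df_restricted, chisq_full, df_full):
    # Chi-square difference (likelihood-ratio) test for nested CFA models
    d_chisq = chisq_restricted - chisq_full
    d_df = df_restricted - df_full
    return d_chisq, d_df, chi2_sf(d_chisq, d_df)

# RSQ-8*: Model 2 (orthogonal) vs. Model 1 (freely correlated), values from Table 3
d_chisq, d_df, p = lr_test(4396.79, 740, 3238.77, 712)
```

The difference comes out as 1158.02 from the rounded chi-square values; Table 3 reports 1158.00, presumably computed from unrounded values. Either way, p is far below .001.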

2 "Regular" vector correlations were also computed by correlating the r-to-z transformed vectors with Pearson's correlation. The rounded parameters were almost identical to the Burt/Tucker congruence coefficients.
3 More detailed item-level analyses of the RSQ-8* items can be found in the Electronic Supplementary Material (ESM) 2.




Table 4. Latent dimension intercorrelations

[8 × 8 matrix of latent intercorrelations among Duty, Intellect, Adversity, Mating, pOsitivity, Negativity, Deception, and Sociality; individual coefficients not reproduced here.]

Notes. N = 507. Intercorrelations based on Model 1 (freely correlated dimensions) of Table 3. Below diagonal: RSQ-8*. Above diagonal: S8*.

coverage), we selected three of the five items for each DIAMONDS scale to constitute the new S8*. The selected items are marked bold in Table 1, and descriptive statistics of the S8* scales are presented in Table 2.4 It is noteworthy that in four instances we did not select an item with sufficiently high information into the S8*. For Mating, we decided against "The situation contains stimuli that could be construed sexually" because it performed similarly to (but somewhat worse than) "The situation is sexually charged," which is a more straightforward item that does not rely on jargon ("stimuli are construed"). For pOsitivity, "The situation is playful" was chosen instead of "The situation is humorous," which actually provided more information; the latter item seemed too close to the two top-performing items, so the former was chosen to diversify the pOsitivity domain. For Negativity, the well-performing "The situation is unpleasant" was not chosen because it was too unspecific and mirrored pOsitivity's "The situation is pleasant" too closely (including it would thus have artificially increased the covariance between pOsitivity and Negativity). For Sociality, the well-performing "Social interaction is possible" was redundant with "Communication with other people is important or desired," which performed similarly but better; additionally, the item is too unspecific and imprecise because social interaction could be possible in many forms under many circumstances.

Next, we subjected the S8* items to CFAs; these findings are presented in Table 3 under "S8*." The freely correlated dimensions model fit the data best, CFI = .90, TLI = .88, RMSEA = .08, SRMR = .07, and its fit parameters outperformed those of the RSQ-8*. Factor loadings from this model are listed in Table 1. Intercorrelations of latent DIAMONDS dimensions from the S8* (average absolute r = .25) are presented in Table 4 and were similar to those of the RSQ-8* (average absolute r = .27). Total information of each S8* scale and TIFs can be found in Table 5 and Figure 2, respectively. The new S8* scales captured, on average, about 74% of the information in the RSQ-8*. In general, the S8* scales behaved psychometrically very similarly to the RSQ-8* scales with minimal information loss, but can be regarded as more economical forms.
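The "about 74%" figure can be reproduced directly from the scale information totals reported in Table 5:

```python
# Total scale information from Table 5 (five-item RSQ-8* vs. three-item S8*)
rsq8_info = {"Duty": 27.71, "Intellect": 34.58, "Adversity": 34.32, "Mating": 26.67,
             "pOsitivity": 35.47, "Negativity": 43.58, "Deception": 27.97, "Sociality": 35.21}
s8_info = {"Duty": 25.29, "Intellect": 28.97, "Adversity": 24.96, "Mating": 15.86,
           "pOsitivity": 26.98, "Negativity": 27.29, "Deception": 22.44, "Sociality": 23.29}

# Percentage of RSQ-8* information retained by each three-item S8* scale
ratios = {d: 100.0 * s8_info[d] / rsq8_info[d] for d in rsq8_info}
average_ratio = sum(ratios.values()) / len(ratios)  # about 74%
```

Retention varies by dimension: Mating loses the most information (roughly 59%), while Duty retains over 90%.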

Nomological Validity

Table 6 presents the correlations of RSQ-8* and S8* scales with 52 LIWC categories extracted from participants' situation descriptions. The correlation patterns were highly similar between the RSQ-8* and the S8*, congruence coefficients = .89-.99. Additionally, the average absolute rs were virtually identical for both measures. Notably, the correlation patterns were conceptually sensible for both the RSQ-8* and the S8*. For example, Duty correlated with work-related categories (e.g., Occupation, School, Job, Achievement); Intellect with work-related (e.g., School) and less physically oriented categories (e.g., Body); Adversity with negative emotions (e.g., Anger); Mating with physically oriented categories (e.g., Sexual); pOsitivity with positive affective categories (e.g., Positive Emotions); Negativity with negative affective categories (e.g., Negative Emotions); Deception with certain cognitive mechanism categories (e.g., Discrepancy, Tentative); and Sociality with social categories (e.g., Communication, Family, Friends). These patterns are concordant with, and conceptually replicate, Rauthmann and colleagues' (in press, Studies 3 and 4) findings. Taken together, the RSQ-8* and S8* scales showed highly similar associations with nomological criteria and thus tap virtually identical nomological networks.

Discussion

We aimed at optimizing the 32-item RSQ-8 (Rauthmann et al., 2014) by selecting maximally informative and precise items. In this process, an even shorter version with three items per DIAMONDS scale was created: the S8*.

4 More detailed item-level analyses of the S8* items can be found in the Electronic Supplementary Material (ESM) 3.




Figure 1. Item Information Curves. One panel per RSQ-8* item (Items 1-5 of Duty, Intellect, Adversity, Mating, pOsitivity, Negativity, Deception, and Sociality), plotting information against latent theta. Gray-shaded = not selected for the S8*. Items can be found in Table 1 (see item numbers there).




Table 5. Information

Dimension     Total information in       Total information in     Information ratio
              five-item RSQ-8* scale     three-item S8* scale     S8*:RSQ-8* (in %)
Duty          27.71                      25.29                    91.24
Intellect     34.58                      28.97                    83.77
Adversity     34.32                      24.96                    72.73
Mating        26.67                      15.86                    59.46
pOsitivity    35.47                      26.98                    76.05
Negativity    43.58                      27.29                    62.62
Deception     27.97                      22.44                    80.21
Sociality     35.21                      23.29                    66.14
Average       33.19                      24.39                    74.03

Figure 2. Test Information Functions. For each DIAMONDS dimension, the TIF of the pre-optimized five-item RSQ-8 version is shown next to the TIF of the optimized three-item S8* (information plotted against latent theta).

The scales of the S8* are informative, precise, and nomologically valid. Thus, the S8* represents an economical yet psychometrically sound measure of the Situational Eight DIAMONDS.

It is important to consider a central limitation of the S8*. When creating short-version scales, an unavoidable conflict between item homogeneity and content coverage ensues. Usually, less homogeneous items are dropped (as was done here) in favor of more homogeneous items to increase measurement information and precision. While this approach leads to higher internal consistency, it restricts construct validity by limiting content coverage. As an example, the Mating domain of the DIAMONDS may draw on different themes such as desirable impression management, mate attraction, courtship, sexual tension, dating, relationship formation, mate guarding, romantic love, partner commitment, and many others, but in the S8* its content was restricted mostly to sexual tension. The short forms could, however, only be formed on the basis of Rauthmann and colleagues' (2014) analyses of each DIAMONDS domain and their RSQ-8; there is no elaborated theory yet on which facets to expect for each DIAMONDS domain. Thus, in the absence of a hierarchical model of situation characteristics, we chose to maximize what the RSQ-8 already measured. Nonetheless, future research should examine which content each DIAMONDS domain covers, provide short scales for facets, and perhaps also revise some domain-level items once each domain becomes more fleshed out.

Several research areas can benefit from the brief S8*. First, experimental designs employ manipulations of "objective" conditions (e.g., by displaying different stimuli to different groups of participants) but do not track whether the psychological situations of participants differed (as a function of the manipulation). Being able to briefly assess situation characteristics can be valuable for designs seeking a manipulation check. Second, experimental stimuli (e.g., situation vignettes or videos) may be optimized with respect to certain situation characteristics they convey. Thus, stimuli maximally representative of certain situation characteristics could be detected and selected with the S8*. Third, the S8* may be used for experience sampling and/or


J. F. Rauthmann & R. A. Sherman: Assessment of Situation Characteristics

163

Table 6. Nomological correlations of the RSQ-8* and S8* DIAMONDS dimensions (Duty, Intellect, Adversity, Mating, pOsitivity, Negativity, Deception, Sociality) with LIWC categories (e.g., I, We, Self, Affect, Positive emotions, Negative emotions, Cognitive mechanisms, Social, Occupation, Leisure, Physical, and others). [The individual correlation entries did not survive extraction; see ESM 4 for the full matrix. The congruence coefficients between the RSQ-8* and S8* correlation profiles ranged from .89 to .99 across dimensions.]

Notes. N = 506–507. Statistically significant correlations (rs ≥ .09, ps < .05) are marked bold and shaded gray. The full correlation matrix can be found in the Electronic Supplementary Material (ESM) 4.
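The congruence coefficients reported in Table 6 index how similar the RSQ-8* and S8* correlation profiles are; Tucker's congruence coefficient (Lorenzo-Seva & ten Berge, 2006) is the cosine between two profiles. A minimal sketch (hypothetical profile values; `tucker_phi` is illustrative, not the authors' code):

```python
import numpy as np

def tucker_phi(x, y):
    """Tucker's congruence coefficient between two profiles of
    correlations (or loadings): their cosine similarity."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

# Hypothetical LIWC-correlation profiles of one DIAMONDS dimension,
# once from the RSQ-8* and once from the S8*.
rsq8_profile = [0.14, -0.09, 0.22, 0.05, -0.12]
s8_profile = [0.13, -0.11, 0.21, 0.06, -0.12]
print(round(tucker_phi(rsq8_profile, s8_profile), 2))
```

Unlike a Pearson correlation of the two profiles, Tucker's phi does not center the vectors, so it is also sensitive to agreement in the overall level of the correlations.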

© 2015 Hogrefe Publishing

European Journal of Psychological Assessment 2016; Vol. 32(2):155–164



ambulatory assessment research where participants are asked multiple times in their daily lives about their situations. In such designs, short but validated scales are essential.

Electronic Supplementary Material

The electronic supplementary material is available with the online version of the article at http://dx.doi.org/10.1027/1015-5759/a000246

ESM 1. Text document. Item characteristic curves and item operation characteristic curves.
ESM 2. Item analyses tables (PDF). Item analyses of the RSQ-8* with the score.items function from the "psych" R package (Revelle, 2014).
ESM 3. Item analyses tables (PDF). Item analyses of the S8* with the score.items function from the "psych" R package (Revelle, 2014).
ESM 4. Table (PDF). Nomological correlations of the RSQ-8* and S8* DIAMONDS dimensions with LIWC categories.

References

Burt, C. (1948). The factorial study of temperamental traits. British Journal of Statistical Psychology, 1, 178–203.
Edwards, J. A., & Templeton, A. (2005). The structure of perceived qualities of situations. European Journal of Social Psychology, 35, 705–723.
Fleeson, W. (2007). Situation-based contingencies underlying trait-content manifestation in behavior. Journal of Personality, 75, 825–861.
Hogan, R. (2009). Much ado about nothing: The person-situation debate. Journal of Research in Personality, 43, 249.
Lorenzo-Seva, U., & ten Berge, J. M. F. (2006). Tucker's congruence coefficient as a meaningful index of factor similarity. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 2, 57–64. doi: 10.1027/1614-2241.2.2.57
Magnusson, D. (1981). Toward a psychology of situations: An interactional perspective. Hillsdale, NJ: Erlbaum.
Rauthmann, J. F. (2012). You say the party is dull, I say it is lively: A componential approach to how situations are perceived to disentangle perceiver, situation, and perceiver × situation variance. Social Psychological and Personality Science, 3, 519–528.


Rauthmann, J. F., Gallardo-Pujol, D., Guillaume, E. M., Todd, E., Nave, C. S., Sherman, R. A., . . . Funder, D. C. (2014). The Situational Eight DIAMONDS: A taxonomy of eight major dimensions of situation characteristics. Journal of Personality and Social Psychology, 107, 677–718.
Revelle, W. (2014). psych: Procedures for personality and psychological research. R package version 1.4.8. Retrieved from http://personality-project.org/r, http://personality-project.org/r/psych-manual.pdf
Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17, 1–25.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1–36.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34, 100–114.
Sherman, R. A., Nave, C. S., & Funder, D. C. (2010). Situational similarity and personality predict behavioral consistency. Journal of Personality and Social Psychology, 99, 330–343.
Sherman, R. A., Nave, C. S., & Funder, D. C. (2012). Properties of persons and situations related to overall and distinctive personality-behavior congruence. Journal of Research in Personality, 46, 87–101.
Sherman, R. A., Nave, C. S., & Funder, D. C. (2013). Situational construal is related to personality and gender. Journal of Research in Personality, 47, 1–14.
Wagerman, S. A., & Funder, D. C. (2009). Situations. In P. J. Corr & G. Matthews (Eds.), Cambridge handbook of personality psychology. Cambridge, UK: Cambridge University Press.
Wolf, M., Horn, A., Mehl, M., Haug, S., Pennebaker, J. W., & Kordy, H. (2008). Computergestützte quantitative Textanalyse: Äquivalenz und Robustheit der deutschen Version des Linguistic Inquiry and Word Count [Computer-aided quantitative text analysis: Equivalence and robustness of the German adaptation of the Linguistic Inquiry and Word Count]. Diagnostica, 2, 85–98.

Date of acceptance: September 17, 2014
Published online: February 27, 2015

John Rauthmann
Institut für Psychologie
Humboldt-Universität zu Berlin
Unter den Linden 6
10099 Berlin
Germany
Tel. +49 30 2093-1836
E-mail jfrauthmann@gmail.com



Original Article

Ultra-Brief Measures for the Situational Eight DIAMONDS Domains

John F. Rauthmann¹ and Ryne A. Sherman²

¹Humboldt-Universität zu Berlin, Germany; ²Florida Atlantic University, FL, USA

Abstract. People perceive psychological situations on the "Situational Eight" DIAMONDS characteristics (Duty, Intellect, Adversity, Mating, pOsitivity, Negativity, Deception, Sociality; Rauthmann et al., 2014). To facilitate situational assessment and measure these dimensions economically, we propose four ultra-brief one-item scales (S8-I, S8-II, S8-III-A, S8-III-P) validated against the already existing 24-item S8*. Convergent/discriminant validity of the four S8-scales was examined by analyses of the multi-characteristic multi-measure matrix, and their nomological associations with external criteria were compared. Application areas of the scales are discussed.

Keywords: situations, situation assessment, psychological situation characteristics, Situational Eight DIAMONDS, Riverside Situational Q-Sort (RSQ)

People perceive and describe situations on psychological situation characteristics (Edwards & Templeton, 2005; Rauthmann, 2012). Recently, Rauthmann et al. (2014) used the Riverside Situational Q-Sort (RSQ), a measure that broadly assesses psychological situation characteristics (Sherman, Nave, & Funder, 2010; Wagerman & Funder, 2009), to examine major dimensions of situation characterization. They uncovered the "Situational Eight" DIAMONDS: Duty (Should something be done?), Intellect (Is deep cognitive processing necessary?), Adversity (Is someone threatened?), Mating (Is there an opportunity to attract someone?), pOsitivity (Is the situation nice?), Negativity (Can negative feelings arise?), Deception (Is mistrust an issue?), and Sociality (Is social interaction possible, desired, or necessary?). The DIAMONDS encompass many dimensions previously identified in existing situational taxonomies and, as such, can provide a common language that connects previously insular findings for coordinated, cumulative knowledge building in situation research (much like the Big Five in personality psychology). Further, they are rated with substantive inter-rater agreement, are embedded in a rich nomological network (of distal cues and goal affordances), and predict behavior (Rauthmann et al., 2014). Their creation was motivated by the lack of a taxonomy and measurement tools for major dimensions of situation characteristics. Thus, the authors also provided a 32-item version of the original RSQ tailored to the Situational Eight, the RSQ-8, which has been


optimized to the 24-item S8* (Rauthmann & Sherman, 2015). Although already short(er), the S8* is still difficult to use in designs that demand maximum economy, such as experience sampling or ambulatory assessment designs in which participants are typically asked several times per day how they feel and behave. It would be valuable to include situational assessments in such designs (see Fleeson, 2007), but no feasible, ultra-short, and psychometrically validated measures exist as of yet. Thus, we seek to validate multiple ultra-brief scales of the Situational Eight DIAMONDS dimensions to be used in research that cannot afford long scales.

Methods

Participants and Procedure

Parts of the current data have already been reported in Rauthmann and Sherman (2015). German participants were recruited via mass emails and newsletters. From an initial sample of N = 547, full data from N = 507 (385 women; age: M = 27.99, SD = 10.49, range: 15–77 years; 56.2% with a university entrance qualification) could be used. Participants were told that they would report and rate a daily situation. Specifically, they were prompted to think about the situation they had been in 24 hr before and write

European Journal of Psychological Assessment 2016; Vol. 32(2):165–174 DOI: 10.1027/1015-5759/a000245



J. F. Rauthmann & R. A. Sherman: Ultra-Brief Assessment of Situation Characteristics

responses to five questions: What was happening? Who was with you? What were you doing? Where were you? What time was it (approximately)? Next, participants rated their situation on multiple scales capturing the Situational Eight DIAMONDS.

Table 1. Overview of scales

| Dimension | S8-I: What applies to your situation? | S8-II: The situation contains … | S8-III-A: I have to/must/need to … / S8-III-P: I can/could/may … |
|---|---|---|---|
| Duty | Work has to be done. | Work, tasks, duties | …complete work and fulfill duties. |
| Intellect | Deep thinking is required. | Intellectual, aesthetic, profound things | …use intellectual capacities. |
| Adversity | Somebody is being threatened, accused, or criticized. | Threat, criticism, accusation | …be criticized or accused by someone. |
| Mating | Potential romantic partners are present. | Romance, sexuality, love | …appear attractive to someone. |
| pOsitivity | The situation is pleasant. | Positive, pleasant, nice things | …enjoy something. |
| Negativity | The situation contains negative feelings (e.g., stress, anxiety, guilt, etc.). | Negative things, unpleasant things, bad feelings | …suffer something negative. |
| Deception | Somebody is being deceived. | Deceit, lie, dishonesty | …be deceived or betrayed by someone. |
| Sociality | Social interactions are possible or required. | Communication, interaction, social relationships | …talk with other people and socialize. |

Measures

Participants indicated, for all measures, on a scale from 1 (= not at all) to 7 (= totally) how much each item applied to the situation they had described. Table 1 lists the items of the S8-I, S8-II, S8-III-A, and S8-III-P. The instruments were presented in the following order, with items randomly displayed within the respective instruments: S8-I, S8-II, S8*, S8-III-A, S8-III-P. The S8-I is a one-item form of its predecessors, the RSQ-8 (Rauthmann et al., 2014) and the S8* (see Rauthmann & Sherman, 2015). Its items (short, but full sentences) have been revised to be better suited for experience sampling. The S8-II (containing nouns only) is intended for research that seeks to examine which Situational Eight topics certain stimuli (e.g., texts, videos, audio files, vignettes, etc.) cover. The S8-III-A and S8-III-P are self-referential parallel forms made up from a modified version of the S8-I. All items begin with "I …," but vary in their instructions: The A-version aims to assess perceived "affordances" ("I must/have to/need to …"), while the P-version assesses perceived "possibilities" or opportunities ("I can/could/may …"). The versions thus reflect two different kinds of situation conceptualizations: situations may (a) afford, elicit, or constrain behavior (e.g., Chemero, 2001; Stoffregen, 2000) or (b) provide opportunities for different courses of action (e.g., Argyle, Furnham, & Graham, 1981; Saucier, Bel-Bahar, & Fernandez, 2007). With these different ultra-brief measures of psychological characteristics, researchers can use the version they see fit for their research and its aims.

Data-Analytical Strategy

Construct Validity

With eight characteristics in five measures, we can examine the multi-characteristic multi-measure (MCMM) matrix for convergent/discriminant construct validity. Marsh and Grayson (1995) recommended comparing different data-analytical methods; should findings converge, then conclusions can be regarded as robust. Thus, we (a) visualized scale-dimension correlations, (b) examined the MCMM correlation matrix (see Campbell & Fiske, 1959), (c) employed confirmatory factor analyses (CFA) on the MCMM matrix, and (d) used Hayashi and Hays' (1987) MTMM program to obtain another analysis of the MCMM matrix.

Nomological Validity

We selected 54 cues, extracted with the Linguistic Inquiry and Word Count (German LIWC: Wolf et al., 2008) from participants' situation descriptions, to be correlated with the DIAMONDS dimensions from the five S8-measures. For each DIAMONDS dimension, the 54 correlations with all S8-scales were r-to-z transformed and correlated with each other. Such vector correlations index how similar correlation patterns are. The S8-scales should show substantive vector correlations, tapping highly similar nomological networks.
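The vector-correlation procedure just described can be sketched as follows (a minimal illustration with hypothetical correlation vectors; `vector_correlation` is illustrative, not part of the authors' materials):

```python
import numpy as np

def vector_correlation(r1, r2):
    """Correlate two vectors of correlations after Fisher r-to-z
    transformation (z = arctanh(r))."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    return float(np.corrcoef(z1, z2)[0, 1])

# Hypothetical: 54 LIWC-cue correlations for one DIAMONDS dimension,
# obtained with two different S8-scales tapping a similar network.
rng = np.random.default_rng(0)
profile_a = rng.uniform(-0.4, 0.4, size=54)
profile_b = profile_a + rng.normal(0.0, 0.05, size=54)
print(round(vector_correlation(profile_a, profile_b), 2))
```

The r-to-z step stretches the bounded correlation scale toward normality before the profiles are correlated, which is why it precedes the vector correlation in the procedure above.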



Table 2. Descriptive statistics

| Dimension | S8* M | S8* SD | S8-I M | S8-I SD | S8-II M | S8-II SD | S8-III-A M | S8-III-A SD | S8-III-P M | S8-III-P SD |
|---|---|---|---|---|---|---|---|---|---|---|
| Duty | 3.55 | 2.13 | 3.60 | 2.36 | 3.83 | 2.28 | 3.63 | 2.43 | 4.16 | 2.48 |
| Intellect | 3.71 | 1.91 | 3.48 | 2.07 | 3.61 | 2.03 | 4.04 | 2.21 | 4.76 | 2.24 |
| Adversity | 1.43 | 0.93 | 1.58 | 1.29 | 1.69 | 1.37 | 1.38 | 1.04 | 2.99 | 2.39 |
| Mating | 2.05 | 1.38 | 2.56 | 2.30 | 2.15 | 1.73 | 1.86 | 1.59 | 3.26 | 2.41 |
| pOsitivity | 3.90 | 1.76 | 4.91 | 1.86 | 4.82 | 1.83 | 4.11 | 2.39 | 5.12 | 2.21 |
| Negativity | 3.17 | 1.78 | 2.71 | 1.94 | 2.35 | 1.72 | 1.94 | 1.64 | 3.11 | 2.30 |
| Deception | 2.24 | 1.54 | 1.47 | 1.22 | 1.53 | 1.24 | 1.17 | 0.70 | 2.35 | 2.05 |
| Sociality | 4.09 | 2.10 | 4.85 | 2.17 | 4.55 | 2.26 | 4.44 | 2.50 | 5.26 | 2.30 |

Notes. Response scale: 1 (= not at all) to 7 (= totally). S8*: n = 507–508; S8-I: n = 542; S8-II: n = 547; S8-III-A: n = 493; S8-III-P: n = 473.

Results

Construct Validity

Visual Inspection

Descriptive statistics can be found in Table 2. Visualizations of scale-dimension correlations with the MTMM function from the psy R package (Falissard, 2012) are presented in Figure 1. Gray bars reflect the (magnitude of) correlations of, for example, all Duty scales (S8*, S8-I, S8-II, S8-III-A, S8-III-P) with the Duty dimension. The other seven white bars reflect the correlations of the other dimensions with Duty. Ideally for convergent/discriminant validity, the gray bars would stand out (i.e., higher correlations of Duty scales with the Duty dimension). This was true for all DIAMONDS dimensions, although Adversity and Deception tended to show less discriminant validity, being also strongly correlated with Negativity.

MCMM Correlations

The full MCMM correlation matrix is presented in Online Supplemental Material (OSM) A. A condensed account of (a) convergent coefficients (mono-characteristic hetero-measure), (b) highest discriminant coefficients (hetero-characteristic hetero-measure), and (c) comparisons between convergent and discriminant coefficients is presented in Table 3. As can be seen under "Convergent correlations," the average convergent correlation (ACC) was .63 across all DIAMONDS dimensions and measures. The ACC was highest for Duty (.79) and lowest for Deception (.38), and highest between the S8* and S8-I (.74) and lowest between the S8-II and S8-III-P (.44). As can be seen under "Discriminant correlations," the average maximum absolute discriminant correlation (ADC) was .32 across all DIAMONDS dimensions and measures. The ADC was highest for Adversity (.38) and lowest for Intellect (.16), highest between the S8* and S8-I as well as the S8-I and S8-II (.37), and lowest between the S8-III-A and S8-III-P (.23). As can be seen under "Convergent-discriminant comparisons," convergent correlations were usually higher than discriminant correlations, with only six exceptions, occurring for Adversity and Deception between the measures S8-I and S8-II with the S8-III-A and S8-III-P, respectively. In those cases, the two dimensions were more strongly associated with hetero-measure Negativity (see Figure 1). The values at the bottom of Table 3 indicate what percentage of a convergent correlation a discriminant correlation covers. Across all DIAMONDS dimensions and measures, discriminant correlations were roughly only half the magnitude of the convergent correlations (51%; mean difference: .31 r points). On average, the dimension with the least discriminant validity was Deception (82%) and the one with the most was Intellect (24%). Together, the MCMM matrix suggests good convergent validities and sufficient discriminant validities, albeit with the caveat that Adversity and Deception showed stronger associations with Negativity for some measures.
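The convergent and discriminant summaries in Table 3 follow the Campbell–Fiske logic. The computation can be sketched on simulated data (hypothetical scores, not the study data; for brevity, the sketch averages absolute discriminant correlations rather than taking per-pair maxima):

```python
import numpy as np

# Simulate 2 characteristics, each assessed by 3 measures.
rng = np.random.default_rng(1)
n, n_measures, n_chars = 500, 3, 2
latent = rng.normal(size=(n, n_chars))
scores = {(m, c): latent[:, c] + rng.normal(0.0, 0.7, size=n)
          for m in range(n_measures) for c in range(n_chars)}

def r(a, b):
    return np.corrcoef(scores[a], scores[b])[0, 1]

# Convergent: same characteristic, different measures
# (mono-characteristic hetero-measure correlations).
convergent = [r((m1, c), (m2, c))
              for c in range(n_chars)
              for m1 in range(n_measures) for m2 in range(m1 + 1, n_measures)]
# Discriminant: different characteristics across different measures.
discriminant = [abs(r((m1, 0), (m2, 1)))
                for m1 in range(n_measures) for m2 in range(n_measures)
                if m1 != m2]
print(f"ACC = {np.mean(convergent):.2f}, ADC = {np.mean(discriminant):.2f}")
```

With independent latent characteristics and moderate measurement noise, the convergent average clearly exceeds the discriminant average, mirroring the .63 vs. .32 pattern reported above.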

MCMM-CFA

CFAs provide a more stringent way of analyzing the full MCMM matrix (OSM A). Several models were specified with the DIAMONDS scales loading onto eight characteristics factors and five measures factors (S8*, S8-I, S8-II, S8-III-A, S8-III-P):

– Model 1 (baseline): freely correlated characteristics, freely correlated measures;
– Model 2a: intercorrelations of characteristics fixed to 1, freely correlated measures;
– Model 2b: fixed intercorrelations of characteristics, freely correlated measures;
– Model 2c: no intercorrelations of characteristics, freely correlated measures;
– Model 3a: freely correlated characteristics, intercorrelations of measures fixed to 1;
– Model 3b: freely correlated characteristics, fixed intercorrelations of measures;
– Model 3c: freely correlated characteristics, no intercorrelations of measures.

Models 2a–c and 3a–c were tested against Model 1 (in which they were nested) in successive order. If Model 1 fits better than Models 2, then different characteristics are


168

J. F. Rauthmann & R. A. Sherman: Ultra-Brief Assessment of Situation Characteristics

Correlation of DIAMONDS items with other DIAMONDS items

1 Duty

2 Intellect

3 Adversity

4 Mating

0.8

0.8

0.8

0.8

0.6

0.6

0.6

0.6

0.4

0.4

0.4

0.4

0.2

0.2

0.2

0.2

0.0

0.0

0.0

0.0

-0.2

-0.2

-0.2

-0.2

-0.4

-0.4

-0.4

-0.4

D I AMON DS

D I AMON DS

D I AMON DS

D I AMON DS

5 pOsitivity

6 Negativity

7 Deception

8 Sociality

0.8

0.8

0.8

0.8

0.6

0.6

0.6

0.6

0.4

0.4

0.4

0.4

0.2

0.2

0.2

0.2

0.0

0.0

0.0

0.0

-0.2

-0.2

-0.2

-0.2

-0.4

-0.4

-0.4

-0.4

D I AMON DS

D I AMON DS

D I AMON DS

D I AMON DS

Figure 1. Construct validity visualization

sampled, and the characteristics are not orthogonal. Next, if Model 1 fits better than Models 3, then different measures were sampled (i.e., measures are not fully exchangeable), and the measures are not orthogonal. Note that we expected (a) different and intercorrelated characteristics dimensions (see Rauthmann et al., 2014, OSM C) as well as (b) somewhat different and intercorrelated measures. Thus, Model 1 should fit best. Fit indices of these CFA models are presented in Table 4. As can be seen, Model 1 fit the data best compared to all other models (all ps of Dv2 tests < .001), CFI = .92, TLI = .91, RMSEA = .06, SRMR = .06. Thus, there were different/distinguishable and interrelated characteristics and measures, respectively. While Model 1 fit the data better than Models 3a and 3b (constraining correlations of


measure factors), the practical difference in fit parameters was relatively small. We took this as evidence that, while the S8-scales each sample the DIAMONDS dimensions in a slightly different way and are hence not fully interchangeable, participants did not respond to them overly differentially. Factor loadings of the DIAMONDS scales onto latent characteristics and measures factors from Model 1 are presented in Table 5. As can be seen, the mean convergent latent characteristic loading was .76 across all measures, with the highest for Duty (.88) and the lowest for Deception (.60). Further, the S8-III-P showed, on average, the lowest convergent loadings (.60). Loadings of the characteristics dimensions on latent measure factors were, on average, small to moderate (.20), with the highest for


Table 3. Summary of the MCMM matrix

Convergent correlations (mono-characteristic hetero-measure):

| Scale pair | Duty | Intellect | Adversity | Mating | pOsitivity | Negativity | Deception | Sociality | Average |
|---|---|---|---|---|---|---|---|---|---|
| S8* × S8-I | .86 | .76 | .65 | .73 | .77 | .77 | .45 | .82 | .74 |
| S8* × S8-II | .80 | .66 | .59 | .55 | .71 | .61 | .38 | .77 | .65 |
| S8* × S8-III-A | .89 | .80 | .67 | .61 | .67 | .62 | .38 | .84 | .72 |
| S8* × S8-III-P | .77 | .72 | .33 | .54 | .63 | .47 | .53 | .74 | .61 |
| S8-I × S8-II | .78 | .63 | .76 | .49 | .79 | .74 | .68 | .73 | .71 |
| S8-I × S8-III-A | .82 | .71 | .57 | .42 | .64 | .69 | .35 | .77 | .64 |
| S8-I × S8-III-P | .71 | .61 | .18 | .44 | .61 | .46 | .13 | .71 | .51 |
| S8-II × S8-III-A | .75 | .62 | .53 | .43 | .62 | .62 | .33 | .72 | .59 |
| S8-II × S8-III-P | .65 | .54 | .17 | .34 | .56 | .36 | .15 | .59 | .44 |
| S8-III-A × S8-III-P | .78 | .72 | .31 | .53 | .69 | .50 | .25 | .73 | .59 |
| Average | .79 | .68 | .50 | .52 | .68 | .60 | .38 | .75 | .63 |

Discriminant correlations (highest absolute hetero-characteristic hetero-measure):

| Scale pair | Duty | Intellect | Adversity | Mating | pOsitivity | Negativity | Deception | Sociality | Average |
|---|---|---|---|---|---|---|---|---|---|
| S8* × S8-I | .38 | .19 | .47 | .32 | .45 | .47 | .30 | .36 | .37 |
| S8* × S8-II | .31 | .18 | .46 | .27 | .41 | .46 | .28 | .32 | .34 |
| S8* × S8-III-A | .47 | .17 | .44 | .23 | .39 | .36 | .28 | .32 | .34 |
| S8* × S8-III-P | .36 | .14 | .31 | .27 | .36 | .31 | .39 | .36 | .31 |
| S8-I × S8-II | .26 | .22 | .50 | .21 | .43 | .53 | .50 | .28 | .37 |
| S8-I × S8-III-A | .44 | .10 | .46 | .18 | .43 | .35 | .42 | .26 | .34 |
| S8-I × S8-III-P | .33 | .13 | .23 | .19 | .23 | .34 | .20 | .36 | .25 |
| S8-II × S8-III-A | .48 | .19 | .43 | .29 | .39 | .33 | .37 | .26 | .35 |
| S8-II × S8-III-P | .33 | .21 | .25 | .26 | .22 | .24 | .21 | .26 | .25 |
| S8-III-A × S8-III-P | .36 | .07 | .19 | .24 | .22 | .28 | .16 | .32 | .23 |
| Average | .37 | .16 | .38 | .25 | .36 | .37 | .31 | .31 | .32 |

Convergent-discriminant comparison (discriminant/convergent ratio):

| Scale pair | Duty | Intellect | Adversity | Mating | pOsitivity | Negativity | Deception | Sociality | Average |
|---|---|---|---|---|---|---|---|---|---|
| S8* × S8-I | 0.44 | 0.25 | 0.72 | 0.44 | 0.58 | 0.61 | 0.67 | 0.44 | 0.50 |
| S8* × S8-II | 0.39 | 0.27 | 0.78 | 0.49 | 0.58 | 0.75 | 0.74 | 0.42 | 0.52 |
| S8* × S8-III-A | 0.53 | 0.21 | 0.66 | 0.38 | 0.58 | 0.58 | 0.74 | 0.38 | 0.47 |
| S8* × S8-III-P | 0.47 | 0.19 | 0.94 | 0.50 | 0.57 | 0.66 | 0.74 | 0.49 | 0.51 |
| S8-I × S8-II | 0.33 | 0.35 | 0.66 | 0.43 | 0.54 | 0.72 | 0.74 | 0.38 | 0.52 |
| S8-I × S8-III-A | 0.54 | 0.14 | 0.81 | 0.43 | 0.67 | 0.51 | 1.20 | 0.34 | 0.53 |
| S8-I × S8-III-P | 0.46 | 0.21 | 1.28 | 0.43 | 0.38 | 0.74 | 1.54 | 0.51 | 0.49 |
| S8-II × S8-III-A | 0.64 | 0.31 | 0.81 | 0.67 | 0.63 | 0.53 | 1.12 | 0.36 | 0.59 |
| S8-II × S8-III-P | 0.51 | 0.39 | 1.47 | 0.76 | 0.39 | 0.67 | 1.40 | 0.44 | 0.57 |
| S8-III-A × S8-III-P | 0.46 | 0.10 | 0.61 | 0.45 | 0.32 | 0.56 | 0.64 | 0.44 | 0.39 |
| Average | 0.47 | 0.24 | 0.76 | 0.48 | 0.53 | 0.62 | 0.82 | 0.41 | 0.51 |

Notes. N = 473. The full MCMM (= multi-characteristic multi-measure) matrix can be found in the Electronic Supplementary Material 1. Convergent validity: mono-characteristic hetero-measure correlation. Discriminant validity: highest absolute hetero-characteristic hetero-measure correlation. Convergent-discriminant comparison: highest absolute hetero-characteristic hetero-measure correlation divided by the mono-characteristic hetero-measure correlation. Smaller values are better; values > 1.00 indicate a lack of discriminant validity.

the S8-III-P (.47). Further, Duty showed, on average, the lowest (.10) and Deception the highest (.35) loadings on measure factors. Intercorrelations of latent characteristics and measures factors are presented in Tables 6 and 7, respectively. The intercorrelations among latent DIAMONDS factors


(Table 6) were small to moderately high in magnitude (|rs| = .05–.66), and in line with what Rauthmann and colleagues (2014) and Rauthmann and Sherman (2015) have found. The intercorrelations among latent measures factors (Table 7) were modest-to-high in magnitude (|rs| = .19–.69), indicating that the S8-II behaved differently


Table 4. Convergent/discriminant construct validity with MCMM-CFA

| Model | AIC | BIC | BICadj. | χ² | df | p | CFI | TLI | RMSEA | 90% CI RMSEA | SRMR | Δχ² | Δdf | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 63,066.07 | 63,723.21 | 63,221.75 | 1,838.15 | 662 | <.001 | 0.921 | 0.907 | 0.061 | .058–.065 | 0.056 | – | – | – |
| 2a | 63,873.24 | 64,413.92 | 64,001.32 | 2,701.31 | 690 | <.001 | 0.865 | 0.847 | 0.079 | .075–.082 | 0.300 | 863.16 | 28 | <.001 |
| 2b | 63,702.08 | 64,246.92 | 63,831.15 | 2,528.16 | 689 | <.001 | 0.876 | 0.860 | 0.075 | .072–.078 | 0.152 | 690.01 | 27 | <.001 |
| 2c | 63,714.75 | 64,255.43 | 63,842.83 | 2,542.82 | 690 | <.001 | 0.875 | 0.859 | 0.075 | .072–.078 | 0.151 | 704.67 | 28 | <.001 |
| 3a | 63,234.16 | 63,849.71 | 63,379.98 | 2,026.24 | 672 | <.001 | 0.909 | 0.894 | 0.065 | .062–.069 | 0.089 | 188.09 | 10 | <.001 |
| 3b | 63,115.18 | 63,734.89 | 63,261.99 | 1,905.26 | 671 | <.001 | 0.917 | 0.904 | 0.062 | .059–.066 | 0.057 | 67.11 | 9 | <.001 |
| 3c | 65,743.27 | 66,358.82 | 65,889.09 | 4,535.45 | 672 | <.001 | 0.740 | 0.699 | 0.110 | .107–.113 | 0.143 | 2,697.20 | 10 | <.001 |

Notes. N = 473. MCMM-CFA = multi-characteristic multi-measure confirmatory factor analysis. Models 2a–c and 3a–c are nested within Model 1 (the baseline model for comparisons). Model 1 (baseline): free intercorrelations of characteristics factors, free intercorrelations of measures factors. Models 2 vary the modeling of the characteristics factors but retain free intercorrelations of measures factors: Model 2a fixes the intercorrelations of characteristics factors to 1, Model 2b fixes them, and Model 2c omits them. Models 3 vary the modeling of the measures factors but retain free intercorrelations of characteristics factors: Model 3a fixes the intercorrelations of measures factors to 1, Model 3b fixes them, and Model 3c omits them.
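The nested-model comparisons in Table 4 rest on the chi-square difference test. For instance, the Model 2a vs. Model 1 comparison can be reproduced from the tabled values (a sketch using SciPy, not the authors' lavaan code):

```python
from scipy.stats import chi2

# Fit statistics for Model 1 (baseline) and Model 2a, taken from Table 4.
chisq_1, df_1 = 1838.15, 662
chisq_2a, df_2a = 2701.31, 690

# The difference in chi-square values of two nested models is itself
# chi-square distributed with df equal to the difference in model dfs.
delta_chisq = chisq_2a - chisq_1   # 863.16
delta_df = df_2a - df_1            # 28
p = chi2.sf(delta_chisq, delta_df)
print(f"Delta chi2({delta_df}) = {delta_chisq:.2f}, p = {p:.2g}")
```

The resulting p-value is far below .001, matching the tabled comparison and the conclusion that the constrained Model 2a fits worse than Model 1.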

Table 5. MCMM-CFA loadings of DIAMONDS scales on latent characteristics and measures factors

Loadings on characteristics factors:

| DIAMONDS scale | S8* | S8-I | S8-II | S8-III-A | S8-III-P | Average |
|---|---|---|---|---|---|---|
| Duty | .96 | .90 | .84 | .92 | .80 | .88 |
| Intellect | .90 | .83 | .70 | .88 | .78 | .82 |
| Adversity | .78 | .86 | .80 | .71 | .37 | .70 |
| Mating | .90 | .79 | .63 | .66 | .55 | .71 |
| pOsitivity | .85 | .88 | .84 | .78 | .69 | .81 |
| Negativity | .85 | .91 | .77 | .77 | .56 | .77 |
| Deception | .65 | .80 | .70 | .47 | .36 | .60 |
| Sociality | .89 | .85 | .82 | .87 | .72 | .83 |
| Average | .85 | .85 | .76 | .76 | .60 | .76 |

Loadings on measures factors:

| DIAMONDS scale | S8* | S8-I | S8-II | S8-III-A | S8-III-P | Average |
|---|---|---|---|---|---|---|
| Duty | .08 | .01 | .03 | .11 | .25 | .10 |
| Intellect | .16 | .04 | .02 | .22 | .31 | .13 |
| Adversity | .13 | .25 | .37 | .23 | .76 | .25 |
| Mating | .21 | .06 | .16 | .23 | .51 | .23 |
| pOsitivity | .30 | .25 | .19 | .08 | .29 | .15 |
| Negativity | .08 | .13 | .42 | .04 | .48 | .16 |
| Deception | .45 | .28 | .54 | .28 | .76 | .35 |
| Sociality | .30 | .27 | .11 | .36 | .40 | .24 |
| Average | .21 | .01 | .15 | .18 | .47 | .20 |

Notes. N = 473. MCMM-CFA = multi-characteristic multi-measure confirmatory factor analysis. Estimates are based on Model 1 (free intercorrelations of characteristics, free intercorrelations of measures; see Table 4). Standardized loadings are presented. [Signs of the loadings on measures factors did not survive extraction.]

Table 6. Intercorrelations of latent characteristics from MCMM-CFA

| Latent dimension | Duty | Intellect | Adversity | Mating | pOsitivity | Negativity | Deception | Sociality |
|---|---|---|---|---|---|---|---|---|
| Duty | – | | | | | | | |
| Intellect | .35 | – | | | | | | |
| Adversity | .13 | .11 | – | | | | | |
| Mating | .19 | .02 | .17 | – | | | | |
| pOsitivity | .50 | .06 | .21 | .26 | – | | | |
| Negativity | .38 | .13 | .64 | .09 | .57 | – | | |
| Deception | .12 | .13 | .66 | .27 | .18 | .55 | – | |
| Sociality | .17 | .17 | .25 | .34 | .39 | .05 | .20 | – |

Notes. N = 473. MCMM-CFA = multi-characteristic multi-measure confirmatory factor analysis. Estimates are based on Model 1 (free intercorrelations of characteristics, free intercorrelations of measures; see Table 4). [Signs of the correlations did not survive extraction; values are absolute magnitudes.]


Table 7. Intercorrelations of latent measures from MCMM-CFA

| Latent measure | S8* | S8-I | S8-II | S8-III-A | S8-III-P |
|---|---|---|---|---|---|
| S8* | – | | | | |
| S8-I | .60 | – | | | |
| S8-II | .42 | .69 | – | | |
| S8-III-A | .60 | .39 | .24 | – | |
| S8-III-P | .63 | .54 | .19 | .32 | – |

Notes. N = 473. MCMM-CFA = multi-characteristic multi-measure confirmatory factor analysis. Estimates are based on Model 1 (free intercorrelations of characteristics, free intercorrelations of measures; see Table 4). [Signs of the correlations did not survive extraction.]

from the other measures (with negative correlations), probably because it had a fundamentally different item format (Table 1).

Success Rates

Hayashi and Hays' (1987) MTMM program conducts pairwise comparisons between convergent coefficients (e.g., S8* Duty × S8-I Duty) and off-diagonal discriminant coefficients (i.e., mono-measure and hetero-measure) for each characteristic (e.g., Duty) and pair of measures to be compared (e.g., S8* and S8-I). There are mono-measure comparisons (e.g., for the measures S8* and S8-I: the S8* Duty × S8-I Duty coefficient with S8* Intellect × S8* Duty, S8* Adversity × S8* Duty, etc., and with S8-I Intellect × S8-I Duty, S8-I Adversity × S8-I Duty, etc.) and hetero-measure comparisons (e.g., for the measures S8* and S8-I: the S8* Duty × S8-I Duty coefficient with S8* Duty × S8-I Intellect, S8* Duty × S8-I Adversity, etc., and S8-I Duty × S8* Intellect, S8-I Duty × S8* Adversity, etc.). Thus, for each characteristic and each measure-pair comparison (e.g., S8* × S8-I), there are 14 mono-measure and 14 hetero-measure comparisons to be made. Thus, 140 mono- and 140 hetero-comparisons are made for each

171

characteristic when all pairs of measures are accounted for, summing up to a total of 1,120 mono- and 1,120 hetero-comparisons for the entire MCMM matrix (OSM A). "Successes" – cases in which convergent coefficients exceed off-diagonal discriminant coefficients – are counted in these comparisons. This gives a good account of convergent/discriminant validity (e.g., 140 successes = a 100% success rate because the convergent coefficient exceeded the discriminant coefficients in all 140 comparisons). Detailed findings are presented in OSM B. The total success rate of mono-comparisons was 87.50% (i.e., 980/1,120) and of hetero-comparisons 94.82% (i.e., 1,062/1,120). This corresponds to a grand total of 91.16% success across all mono- and hetero-comparisons (i.e., 2,042/2,240) – a relatively good value. Duty, Intellect, pOsitivity, and Sociality obtained 100% success in both mono- and hetero-comparisons, while Negativity obtained 100% success only in hetero-comparisons. Adversity, Mating (albeit to a small degree only), Negativity (mono only), and Deception did not show 100% success. The one pair of measures showing consistently 100% success across all DIAMONDS dimensions was the S8-I with the S8-II. Generally, from the pattern of successes, the S8-I behaved particularly well.
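The comparison counts above (14 hetero-measure comparisons per characteristic and measure pair, 1,120 in total) can be verified with a short sketch; the correlation values are hypothetical, and only the counting logic mirrors the text:

```python
import numpy as np

n_chars, n_measures = 8, 5
rng = np.random.default_rng(2)

def corr(m1, c1, m2, c2):
    """Hypothetical MCMM correlation: clearly convergent when the
    characteristic matches across measures, small otherwise."""
    if c1 == c2 and m1 != m2:
        return 0.6 + 0.1 * rng.random()   # convergent coefficient
    return 0.3 * rng.random()              # discriminant coefficient

hetero_comparisons = hetero_successes = 0
for m1 in range(n_measures):
    for m2 in range(m1 + 1, n_measures):      # 10 measure pairs
        for c in range(n_chars):              # 8 characteristics
            conv = corr(m1, c, m2, c)
            for c2 in range(n_chars):         # 7 other characteristics...
                if c2 == c:
                    continue
                # ...each contributing 2 hetero-measure comparisons.
                for disc in (corr(m1, c, m2, c2), corr(m2, c, m1, c2)):
                    hetero_comparisons += 1
                    hetero_successes += conv > abs(disc)

print(hetero_comparisons)  # 10 pairs x 8 characteristics x 14 = 1,120
```

With these idealized values every convergent coefficient exceeds every discriminant one, so the success rate is 100%; the reported 94.82% reflects the real data falling somewhat short of this ideal.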

Summary

We found evidence for strong convergent and generally sufficient discriminant validity. First, all scales tapped the latent construct they were intended to tap (i.e., convergent validity; Figure 1, OSM A, Tables 3 and 5). Second, discriminant validity was generally given, but the more negative DIAMONDS dimensions in particular – Adversity, Negativity, and Deception – showed some problems (Figure 1, OSM A, Tables 3 and 5). Importantly, though, these problems were most pronounced for comparisons with the S8-III-A and S8-III-P (Table 5). Third, the different measures, though moderately-to-highly intercorrelated (Table 7), were not interchangeable in a strict sense (Table 4), but may rather each offer a slightly different picture. The S8-II in particular stands out with its differing format (Table 1).

Table 8. Convergent nomological validity

Measure pair          Duty  Intellect  Adversity  Mating  pOsitivity  Negativity  Deception  Sociality  Average
S8* · S8-I             .98     .87        .80       .91      .95         .91         .40        .97       .91
S8* · S8-II            .95     .79        .72       .84      .92         .71         .49        .96       .85
S8* · S8-III-A         .97     .92        .57       .86      .87         .80         .23        .97       .86
S8* · S8-III-P         .93     .89        .41       .72      .87         .77         .66        .95       .83
S8-I · S8-II           .95     .68        .86       .78      .94         .84         .77        .97       .88
S8-I · S8-III-A        .98     .86        .64       .77      .84         .87         .23        .96       .85
S8-I · S8-III-P        .93     .73        .39       .62      .86         .74         .39        .96       .78
S8-II · S8-III-A       .95     .71        .63       .82      .79         .74         .21        .97       .81
S8-II · S8-III-P       .92     .76        .54       .62      .83         .60         .63        .94       .78
S8-III-A · S8-III-P    .94     .81        .64       .71      .91         .68         .05        .95       .80
Average                .95     .82        .64       .78      .89         .78         .43        .96       .84

Note. Vector correlations across 54 correlations (see Electronic Supplementary Materials 2 and 3 for full matrices).

European Journal of Psychological Assessment 2016; Vol. 32(2):165–174


J. F. Rauthmann & R. A. Sherman: Ultra-Brief Assessment of Situation Characteristics

Figure 2. Vector correlation visualization. Eight panels, one per DIAMONDS dimension (1 = Duty, 2 = Intellect, 3 = Adversity, 4 = Mating, 5 = pOsitivity, 6 = Negativity, 7 = Deception, 8 = Sociality); each panel plots, across the eight DIAMONDS dimensions (x-axis: D I A M O N D S), the correlations of the items of scale i with the items of the respective scale (y-axis: −0.5 to 1.0).

Nomological Validity

Correlations of all scales with the 54 LIWC categories can be found in OSM C, and the full vector-correlation matrix (obtained by correlating these r-to-z-transformed correlations with each other) can be found in OSM D. A condensed account of convergent nomological validity coefficients (NVCs; i.e., the extent to which one construct, as measured by two scales, taps similar nomological correlates) is presented in Table 8 and Figure 2.¹ The average NVC was .84, with the highest value across all scale pairs for Sociality (.96) and the lowest for Deception (.43). The highest average NVC across all DIAMONDS dimensions was found for the scale pair S8* · S8-I (.91) and the lowest for S8-I · S8-III-P and S8-II · S8-III-P (both .78). From the pattern of findings, it can be gleaned that the S8-III-P version regularly produced higher NVCs than the S8-III-A version for Deception, while this pattern was reversed for the other dimensions, particularly Adversity.

¹ The matrix in OSM C could, in principle, be subjected to CFAs to distinguish convergent from discriminant nomological validity. We did not follow this path because there were only 54 correlation coefficients. Instead, we focused only on the most relevant information: convergent nomological validity.
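Each NVC is computed as a vector correlation: the profile of 54 criterion correlations from one measure is Fisher r-to-z transformed and correlated with the corresponding profile from another measure. The computation can be sketched as follows; the profiles below are random stand-ins, not the actual LIWC correlations (which are in OSM C):

```python
import numpy as np

def vector_correlation(r_profile_a, r_profile_b):
    """Correlate two profiles of correlation coefficients after
    Fisher r-to-z transformation (np.arctanh), as done for the
    nomological validity coefficients (NVCs)."""
    z_a = np.arctanh(np.asarray(r_profile_a))
    z_b = np.arctanh(np.asarray(r_profile_b))
    return float(np.corrcoef(z_a, z_b)[0, 1])

# Stand-in data: two scales' correlations with 54 external criteria,
# built from a shared signal plus independent noise (illustrative only).
rng = np.random.default_rng(1)
shared = rng.uniform(-0.4, 0.4, 54)
profile_a = shared + rng.normal(0.0, 0.05, 54)
profile_b = shared + rng.normal(0.0, 0.05, 54)
nvc = vector_correlation(profile_a, profile_b)  # high: profiles share signal
```

The r-to-z step stretches large |r| values, so strong correlations are not compressed relative to weak ones when the two profiles are compared.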

Discussion

We proposed four ultra-brief measures for the Situational Eight DIAMONDS (Table 1) and investigated their construct and nomological validity with different methods. The scales showed substantial convergent validity and, in most cases, sufficient discriminant validity. However, Adversity and Deception showed a lack of discriminant validity with Negativity, which was driven primarily by the S8-III scales. Conceptually, all three scales may share a common ‘‘negativistic’’ core, although each comes with a different flavor: Adversity = overt threats, Negativity = potential for negative feelings, Deception = mistrust issues. Researchers may seek to differentiate distinct types of ‘‘negativistic’’ events rather than lumping them into one fuzzy category, which would greatly reduce content coverage and validity. Further, the S8 scales showed relatively high convergence in their nomological correlation patterns, indicating that they tapped highly similar correlations with external criteria.² From the convergent construct and nomological validity correlation patterns, we gleaned that Deception is better conceived as possibility-based, while the other dimensions are better conceived as affordance-based. Considering all findings, the S8-I and S8-II performed best. As such, we can recommend those two scales for substantive research. Specifically, the S8-I may be used for experience sampling and the S8-II for stimulus-validation studies. Moreover, both may also be used for manipulation checks in between- or within-subjects designs. While the S8-III-A and S8-III-P are interesting in their own right because they distinguish between affordance- and possibility-based situation conceptions, they did not perform as well as the S8-I and S8-II. We thus recommend caution when using these scales and suggest further revisions and psychometric analyses of them.
Together with the OSM, we have provided enough information for researchers to obtain a picture of each scale’s strengths and weaknesses so that they can decide which is best to use in their own research. Some limitations – which provide room for future research – should be noted. First, situation characteristics may be ascribed from different perspectives: Am I being affected by something/someone (target)? Am I affecting something/someone (actor)?³ Am I watching something/someone being affected by someone/something (bystander/observer)? The current scales are better equipped to tap passive aspects, that is, what happens to people (which is especially true for Adversity and Deception). Second, while one-item scales have certain desirable qualities (e.g., reduction of costs, time, and participant fatigue; easier implementation in longitudinal or repeated-measurement studies), the current one-item scales are, of course, subject to limitations similar to those of other short scales (e.g., Gosling, Rentfrow, & Swann, 2003). Most importantly, one-item scales abandon ‘‘traditional’’ psychometric properties (i.e., internal consistency reliability, content validity) for maximum economy. In cases where researchers are interested in a specific situation characteristic with all its nuances, or where reliability is at issue (e.g., in underpowered studies or when effect sizes matter), differentiated/longer scale versions should be used. However, it was our aim to provide researchers with practically useful tools to quickly and economically assess major dimensions of situation characteristics in real life and in the laboratory.
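The reliability trade-off described above can be made concrete with the classical correction for attenuation: an observed correlation is expected to shrink by the square root of the product of the two measures' reliabilities. A minimal sketch with made-up numbers (not values from this study):

```python
import math

def attenuate(r_true, rel_x, rel_y):
    """Expected observed correlation given a true-score correlation
    and the reliabilities of the two measures."""
    return r_true * math.sqrt(rel_x * rel_y)

def disattenuate(r_observed, rel_x, rel_y):
    """Classical correction for attenuation: estimate the true-score
    correlation from an observed correlation and two reliabilities."""
    return r_observed / math.sqrt(rel_x * rel_y)

# A true r of .60 measured with two short scales of reliability .70 and .75:
r_obs = attenuate(0.60, 0.70, 0.75)           # shrinks to roughly .43
r_true_hat = disattenuate(r_obs, 0.70, 0.75)  # recovers .60
```

This is why lower reliability of ultra-brief scales matters most in underpowered studies or when effect sizes are of substantive interest.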

Electronic Supplementary Material

The electronic supplementary material is available with the online version of the article at http://dx.doi.org/10.1027/1015-5759/a000245

ESM1. Excel matrix. Construct validity (convergent/discriminant): Full MCMM correlation matrix.
ESM2. Table (Word). Success rates.
ESM3. Excel matrix. Nomological correlations.
ESM4. Excel matrix. Nomological validity: Full vector-correlation matrix.

² If we could correct for unreliability of all S8-scales, the NVCs would rise further.
³ This conceptualization may be problematic because it blurs the situation with a person’s behavior. We thus eschewed actor-based items in this work.

References

Argyle, M., Furnham, A., & Graham, J. A. (1981). Social situations. Cambridge, UK: Cambridge University Press.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Chemero, A. (2001). What we perceive when we perceive affordances. Ecological Psychology, 13, 111–116.
Edwards, J. A., & Templeton, A. (2005). The structure of perceived qualities of situations. European Journal of Social Psychology, 35, 705–723.
Falissard, B. (2012). psy: Various procedures used in psychometry. R package version 1.1.
Fleeson, W. (2007). Situation-based contingencies underlying trait-content manifestation in behavior. Journal of Personality, 75, 825–861.
Gosling, S. D., Rentfrow, P. J., & Swann, W. B. Jr. (2003). A very brief measure of the Big-Five personality domains. Journal of Research in Personality, 37, 504–528.
Hayashi, T., & Hays, R. D. (1987). A microcomputer program for analyzing multitrait-multimethod matrices. Behavior Research Methods, Instruments, and Computers, 19, 345–348.
Marsh, H. W., & Grayson, D. (1995). Latent variable models of multitrait-multimethod data. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 177–198). Thousand Oaks, CA: Sage.
Rauthmann, J. F. (2012). You say the party is dull, I say it is lively: A componential approach to how situations are perceived to disentangle perceiver, situation, and perceiver × situation variance. Social Psychological and Personality Science, 3, 519–528.
Rauthmann, J. F., Gallardo-Pujol, D., Guillaume, E. M., Todd, E., Nave, C. S., Sherman, R. A., . . . Funder, D. C. (2014). The Situational Eight DIAMONDS: A taxonomy of eight major dimensions of situation characteristics. Journal of Personality and Social Psychology, 107, 677–718.
Rauthmann, J. F., & Sherman, R. A. (2015). Measuring the Situational Eight DIAMONDS characteristics of situations: An optimization of the RSQ-8 to the S8*. European Journal of Psychological Assessment. doi: 10.1027/1015-5759/a000246
Saucier, G., Bel-Bahar, T., & Fernandez, C. (2007). What modifies the expression of personality tendencies? Defining basic domains of situation variables. Journal of Personality, 75, 479–504.
Sherman, R. A., Nave, C. N., & Funder, D. C. (2010). Situational similarity and personality predict behavioral consistency. Journal of Personality and Social Psychology, 99, 330–343.
Stoffregen, T. A. (2000). Affordances and events. Ecological Psychology, 12, 1–28.
Wagerman, S. A., & Funder, D. C. (2009). Situations. In P. J. Corr & G. Matthews (Eds.), Cambridge handbook of personality psychology (pp. 27–42). Cambridge, UK: Cambridge University Press.
Wolf, M., Horn, A., Mehl, M., Haug, S., Pennebaker, J. W., & Kordy, H. (2008). Computergestützte quantitative Textanalyse: Äquivalenz und Robustheit der deutschen Version des Linguistic Inquiry and Word Count [Computer-aided quantitative text analysis: Equivalence and robustness of the German version of the Linguistic Inquiry and Word Count]. Diagnostica, 54, 85–98.

Date of acceptance: September 17, 2014
Published online: February 27, 2015

John Rauthmann
Institut für Psychologie
Humboldt-Universität zu Berlin
Unter den Linden 6
10099 Berlin
Germany
Tel. +49 30 2093-1836
E-mail jfrauthmann@gmail.com

2015 Hogrefe Publishing


EAPA

APPLICATION FORM

EAPA membership includes a free subscription to the European Journal of Psychological Assessment. To apply for membership in the EAPA, please fill out this application form and return it together with your curriculum vitae to: Itziar Alonso-Arbiol, PhD (EAPA Secretary General), Dept. de Personalidad, Evaluación y Tratamiento Psicológico, Universidad del País Vasco, Facultad de Psicología, 20018 San Sebastián, Spain, E-mail secretary@eapa-homepage.org

Family name . . . . . . . . . . . . . . . . . . . .
First name . . . . . . . . . . . . . . . . . . . .
Affiliation . . . . . . . . . . . . . . . . . . . .
Address . . . . . . . . . . . . . . . . . . . .
City . . . . . . . . . . . . . . . .   Postcode . . . . . . . . . . . . . . . .
Country . . . . . . . . . . . . . . . . . . . .
Phone . . . . . . . . . . . . . . . .   Fax . . . . . . . . . . . . . . . .
E-mail . . . . . . . . . . . . . . . . . . . .

ANNUAL FEES
◆ EURO 75.00 (US $98.00) – Ordinary EAPA members
◆ EURO 50.00 (US $65.00) – PhD students
◆ EURO 10.00 (US $13.00) – Undergraduate student members

FORM OF PAYMENT
◆ Credit card: VISA / Mastercard/Eurocard / American Express
  Number . . . . . . . . . . . . . . . .
  Expiration date . . . . / . . . .
  CVV2/CVC2/CID# . . . .
  IMPORTANT! 3-digit security code in signature field on reverse of card (VISA/Mastercard) or 4 digits on the front (AmEx)
  Card holder’s name . . . . . . . . . . . . . . . .
  Signature . . . . . . . . . . . . . .   Date . . . . . . . . . . . . .
◆ Cheque or postal order
  Send a cheque or postal order to the address given above
  Signature . . . . . . . . . . . . . .   Date . . . . . . . . . . . . .


Instructions to Authors

The main purpose of the European Journal of Psychological Assessment is to present important articles which provide seminal information on both theoretical and applied developments in this field. Articles reporting the construction of new measures or an advancement of an existing measure are given priority. The journal is directed to practitioners as well as to academicians: The conviction of its editors is that the discipline of psychological assessment should, necessarily and firmly, be attached to the roots of psychological science, while going deeply into all the consequences of its applied, practice-oriented development. Psychological assessment is experiencing a period of renewal and expansion, attracting more and more attention from both academic and applied psychology, as well as from political, corporate, and social organizations. The EJPA provides a meeting point for this movement, contributing to the scientific development of psychological assessment and to communication between professionals and researchers in Europe and worldwide. The European Journal of Psychological Assessment publishes the following types of articles: Original Articles, Brief Reports, and Multistudy Reports.

Manuscript submission: All manuscripts should in the first instance be submitted electronically at http://www.editorialmanager.com/ejpa.
Detailed instructions to authors are provided at http://www.hogrefe.com/periodicals/european-journal-of-psychological-assessment/advice-for-authors/

Copyright Agreement: By submitting an article, the author confirms and guarantees on behalf of him-/herself and any coauthors that the manuscript has not been submitted or published elsewhere, and that he or she holds all copyright in and titles to the submitted contribution, including any figures, photographs, line drawings, plans, maps, sketches, tables, and electronic supplementary material, and that the article and its contents do not infringe in any way on the rights of third parties. ESM will be published online as received from the author(s) without any conversion, testing, or reformatting, and will not be checked for typographical errors or functionality. The author indemnifies and holds harmless the publisher from any third-party claims. The author agrees, upon acceptance of the article for publication, to transfer to the publisher the exclusive right to reproduce and

distribute the article and its contents, both physically and in nonphysical, electronic, or other form, in the journal to which it has been submitted and in other independent publications, with no limitations on the number of copies or on the form or the extent of distribution. These rights are transferred for the duration of copyright as defined by international law. Furthermore, the author transfers to the publisher the following exclusive rights to the article and its contents:
1. The rights to produce advance copies, reprints, or offprints of the article, in full or in part, to undertake or allow translations into other languages, to distribute other forms or modified versions of the article, and to produce and distribute summaries or abstracts.
2. The rights to microfilm and microfiche editions or similar, to the use of the article and its contents in videotext, teletext, and similar systems, to recordings or reproduction using other media, digital or analog, including electronic, magnetic, and optical media, and in multimedia form, as well as for public broadcasting in radio, television, or other forms of broadcast.
3. The rights to store the article and its content in machine-readable or electronic form on all media (such as computer disks, compact disks, magnetic tape), to store the article and its contents in online databases belonging to the publisher or third parties for viewing or downloading by third parties, and to present or reproduce the article or its contents on visual display screens, monitors, and similar devices, either directly or via data transmission.
4. The rights to reproduce and distribute the article and its contents by all other means, including photomechanical and similar processes (such as photocopying or facsimile), and as part of so-called document delivery services.
5. The right to transfer any or all of the rights mentioned in this agreement, as well as rights retained by the relevant copyright clearing centers, including royalty rights, to third parties.

Online Rights for Journal Articles: Guidelines on authors’ rights to archive electronic versions of their manuscripts online are given in the Advice for Authors on the journal’s web page at www.hogrefe.com.

April 2016


Experimental Psychology

Free sample issue online

The journal for experimental research in psychology Editor-in-Chief Christoph Stahl, Cologne, Germany Editors Tom Beckers, Leuven, Belgium Arndt Bröder, Mannheim, Germany Adele Diederich, Bremen, Germany Chris Donkin, Kensington, NSW, Australia Gesine Dreisbach, Regensburg, Germany

ISSN-Print 1618-3169 ISSN-Online 2190-5142 ISSN-L 1618-3169 6 issues per annum (= 1 volume)

Subscription rates (2016) Libraries / Institutions US $498.00 / € 380.00 Individuals US $259.00 / € 185.00 Postage / Handling US $24.00 / € 18.00

www.hogrefe.com

About the Journal

As its name implies, Experimental Psychology publishes innovative, original, high-quality experimental research in psychology – quickly! It aims to provide a particularly fast outlet for such research, relying heavily on electronic exchange of information, which begins with the electronic submission of manuscripts and continues throughout the entire review and production process. The scope of the journal is defined by the experimental method, and so papers based on experiments from all areas of psychology are published. In addition to research articles, Experimental Psychology includes occasional theoretical and review articles. Drawing on over 50 years of experience and tradition in publishing high-quality, innovative science as the Zeitschrift für Experimentelle Psychologie, the journal has an internationally renowned team of editors and reviewers from all the relevant areas of psychology, thus ensuring that the highest international standards are maintained.

Kai Epstude, Groningen, The Netherlands Magda Osman, London, UK Manuel Perea, Valencia, Spain Klaus Rothermund, Jena, Germany Samuel Shaki, Ariel, Israel

Manuscript Submissions

All manuscripts should be submitted online at www.editorialmanager.com/exppsy, where full instructions to authors are also available.

Electronic Full Text

The full text of the journal – current and past issues (from 1999 onward) – is available online at econtent.hogrefe.com/loi/zea (included in subscription price). A free sample issue is also available there.

Abstracting Services

The journal is abstracted / indexed in Current Contents / Social and Behavioral Sciences (CC / S&BS), Social Sciences Citation Index (SSCI), Medline, PsyJOURNALS, PsycINFO, PSYNDEX, ERIH, Scopus, and EMCare.

Impact Factor (Journal Citation Reports®, Thomson Reuters): 2015 = 2.000


Psychological Assessment Science and Practice Book series Edited with the support of the European Association of Psychological Assessment (EAPA) Editors Tuulia M. Ortner, PhD, Austria Itziar Alonso-Arbiol, PhD, Spain Anastasia Efklides, PhD, Greece Willibald Ruch, PhD, Switzerland Fons J.R. van de Vijver, PhD, The Netherlands

Volume 1, 2015, vi + 234 pages US $63.00 / € 44.95 ISBN 978-0-88937-437-9

www.hogrefe.com

New series Each volume in the series Psychological Assessment – Science and Practice presents the state-of-the-art of assessment in a particular domain of psychology, with regard to theory, research, and practical applications. Editors and contributors are leading authorities in their respective fields. Each volume discusses, in a reader-friendly manner, critical issues and developments in assessment, as well as well-known and novel assessment tools. The series is an ideal educational resource for researchers, teachers, and students of assessment, as well as practitioners.

Volume 2, 2016, vi + 346 pages US $69.00 / € 49.95 ISBN 978-0-88937-452-2

Volume 3, 2016, vi + 336 pages US $69.00 / € 49.95 ISBN 978-0-88937-449-2

