Blikstad-Balas, Klette and Tengberg (Eds.)
This collection of chapters originates from the discussions at the QUINT Conference 2019: Analysing Teaching Quality: Perspectives, Potential, and Pitfalls, organized by the Nordic Centre of Excellence: Quality in Nordic Teaching 18−20 June 2019 at the University of Oslo.
This book is also available open access at Idunn. ISBN printed edition (print on demand) 978-82-15-04506-1
WAYS OF ANALYZING TEACHING QUALITY
Recent research suggests that there is a direct link between the quality of teaching and of teachers’ instruction, on the one hand, and student achievement scores, on the other. Additionally, there is a strong correlation between levels of teaching quality and differences between schools and classrooms. However, measuring teaching quality has proven to be difficult, and scholars strive to decide on the what and the how when aiming to measure teaching quality. This book discusses the many dilemmas of measuring teaching quality, be they substantive, theoretical, or methodological. In the eight chapters of this edited volume, the authors provide updated and new knowledge on the many challenges linked to defining what teaching quality is and how it can be measured.
Marte Blikstad-Balas, Kirsti Klette and Michael Tengberg (Eds.)
WAYS OF ANALYZING TEACHING QUALITY Potentials and Pitfalls
Scandinavian University Press
© Copyright 2021 Copyright of the collection and the introduction is held by Marte Blikstad-Balas, Kirsti Klette and Michael Tengberg. Copyright of the individual chapters is held by the respective authors. This book was first published in 2021 by Scandinavian University Press. The material in this publication is covered by the Norwegian Copyright Act and published open access under a Creative Commons CC BY 4.0 licence. This licence provides permission to copy or redistribute the material in any medium or format, and to remix, transform or build upon the material for any purpose, including commercially. These freedoms are granted under the following terms: you must give appropriate credit, provide a link to the licence and indicate if changes have been made to the material. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not apply legal terms or technological measures that legally restrict others from doing anything the licence permits. The licensor cannot revoke the freedoms granted by this licence as long as the licence terms are met. Note that the licence may not provide all of the permissions necessary for your intended use. For example, other rights, such as publicity, privacy or moral rights, may limit how you use the material. The full text of this licence is available at https://creativecommons.org/licenses/by/4.0/legalcode. This book is published with financial support by NordForsk, Nordic Centre of Excellence: Quality in Nordic Teaching (QUINT), project number 87663 and the Department of Teacher Education and School Research, University of Oslo. ISBN printed edition (print on demand): 978-82-15-04506-1 ISBN electronic pdf-edition: 978-82-15-04505-4 DOI: 10.18261/9788215045054-2021 Enquiries about this publication may be directed to: post@universitetsforlaget.no. www.universitetsforlaget.no Cover: Scandinavian University Press Prepress: Tekstflyt AS
Contents

Why – and How – Should We Measure Instructional Quality?
Marte Blikstad-Balas, Michael Tengberg and Kirsti Klette
    The Generally Agreed upon Importance of Teaching
    The Purposes of Measuring Instructional Quality
    Theoretical Challenges
    Methodological Challenges
    The Content of the Book
    References

1. Practice, Feedback, Argument, Measurement: A Frame for Understanding Diverse Perspectives on Teaching Assessments
Courtney Bell (corresponding author) and Robert Mislevy
    Introduction
    Short Descriptions of the Four Metaphors
    What is Teaching?
    Applying the Argument Metaphor to the Assessment of Teaching
    Applying the Measurement Metaphor to the Assessment of Teaching
    Discussion
    References

2. The Surplus of Quality: How to Study Quality in Teaching in Three QUINT Projects
Nikolaj Elf
    Introduction
    A Practice Perspective on Quality Teaching
    Research Questions and Outline
    A Meta-View on Studies of Teaching Quality
    How LISA Nordic Investigates Teaching Quality in Practice: Learning from the Danish Case
    Analyzing a Lesson Collected in LISA Nordic from The Situated Actors’ Perspective
    Characteristics of the Connected Classroom and Quality Literature Education Projects
    Comparisons of the Construction of the Study of Quality in the Three Projects
    Implications Beyond QUINT: Arguing for a Multidimensional Framework
    References

3. A Validity Framework for the Design and Analysis of Studies Using Standardized Observation Systems
Mark White
    Theoretical Foundation of Work
    Validity Framework
    Discussion
    References

4. Multi-Group Measurement Invariance and Generalizability Analyses for an Instructional Quality Observational Instrument
Armin Jentsch, Hannah Heinrichs, Lena Schlesinger, Gabriele Kaiser, Johannes König and Sigrid Blömeke
    Introduction
    Theoretical Background
    Methodology
    Statistical Analyses
    Results
    Discussion
    References

5. Instructional Quality: A Review of Conceptualizations, Measurement Approaches, and Research Findings
Bas Senden, Trude Nilsen and Sigrid Blömeke
    Introduction
    Instructional Quality: From Generic to Subject-Specific
    Conceptualizations of Instructional Quality
    Operationalization and Measurement
    Teacher Self-Reports
    Instructional Quality: Findings
    Summary and Conclusion
    References

6. Observational Scores as Predictors for Student Achievement Gains
Kirsti Klette, Astrid Roe and Marte Blikstad-Balas
    Introduction
    Theoretic Perspectives
    Research Design and Methods
    Findings
    Discussion
    Conclusion
    References
    Appendix

7. Cognitive Activation Potential of E&S Tasks at Commercial Vocational Schools in German-Speaking Switzerland
Eva Weingartner
    Introduction
    Cognitive Activation Potential of Tasks
    Methods
    Results
    Discussion
    References

8. Exploring the Potential in Using Teachers’ Intended Lesson Goals as a Context-Sensitive Lens to Understanding Observational Scores of Instructional Quality
Jennifer Maria Luoto and Alexander Jonas Viktor Selling
    Introduction
    Conceptualizing Instructional Quality
    Teachers’ Intended Lesson Goals
    Methods
    Findings
    Discussion
    References

Authors

This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Blikstad-Balas, M., Klette, K. & Tengberg, M. (Eds.) (2021). Ways of Analyzing Teaching Quality. Potentials and Pitfalls. Scandinavian University Press. DOI: https://doi.org/10.18261/9788215045054-2021-00
Why – and How – Should We Measure Instructional Quality? Marte Blikstad-Balas, Michael Tengberg and Kirsti Klette
THE GENERALLY AGREED UPON IMPORTANCE OF TEACHING

A key responsibility of governments across countries, and a major element of government spending, is the state provision of high-quality education. Education is argued to increase equity, to eradicate poverty, to drive sustainable development, and to enable peace and democracy – and it is a key factor shaping global economic and social development (OECD, 2010, 2016; UNESCO, 2017). With the emphasis on the “knowledge economy”, the question of how education is provided has become a political and politicized topic that generates debate and contestation (Menter, 2017; Wyse et al., 2017). Though everyone agrees that questions of educational quality and progress are of critical interest – as they concern the welfare of society at large – there is no shortage of opinions about how teachers should fulfil their important mandate as educators.

We know that what teachers do in their classroom has a great impact on their students’ lives. Not only do teachers help students reach specific learning goals in their subject on a short-term basis, they can also motivate students, irrespective of gender, ethnic background, and socioeconomic status, to seek knowledge across subjects, to keep studying and overcoming challenges, and to pursue an education and even an academic path later in life – even for students who are the first in their family to do so. It seems “intuitively obvious” that some teachers are more efficient than others, and there is growing evidence to support this idea (Raudenbush & Jean, 2015, p. 171).
It has repeatedly been found in empirical studies that the quality of teachers’ instruction is linked to the quality of students’ achievement, and that this one factor can be more important than factors such as class size, classroom climate, and teachers’ years of experience and formal training (Baumert et al., 2010; Bryk et al., 2010; Hattie, 2009; Konstantopoulos & Chung, 2011; Nilsen & Gustafsson, 2016; Seidel & Shavelson, 2007; Timperley & Alton-Lee, 2008). Lipowsky et al. (2009) argue that the impact of both teacher characteristics and instruction is stronger than previously assumed. Accumulated research on what affects students’ achievement shows that how resources are used is far more important than how much is used, and that differences in teacher quality are the most significant explanation for differences between schools (Hanushek, 2020).

“You can’t know where you’re going unless you know where you have been” is a common saying. We believe this also to be the case when it comes to educational matters and teaching quality. While acknowledging that teaching is extremely complex and hard to measure accurately, we also believe it is of paramount importance for educational discourses to not only focus on “where we are going” or “where we should be going”, but to connect the ambition to empirical and systemic knowledge about where “we are”.
THE PURPOSES OF MEASURING INSTRUCTIONAL QUALITY

Research on teaching quality serves different purposes. School leaders might use research on teaching quality in attempts to evaluate their teachers, teachers might be interested in research on teaching quality to improve their own practice, while researchers may be interested in the relationship between different aspects of teaching quality and their possible implications for different groups of students. Policy makers and stakeholders rely on accurate information on the quality of teaching to make informed choices when developing educational policies and curricula.

An important justification for research on teaching quality is that in order to improve the quality of instruction, the prevalent practices of teaching must be assessed against the best possible evidence of what characterizes effective teaching practices. This can be done in a number of ways and for a number of purposes at different levels. While school leaders and teachers may collaborate on a local level to assess and improve the teaching practices at a particular school, researchers may inform policy makers within a country or even several countries about the trends in instructional quality. A third route is teacher–researcher collaboration, where new knowledge of the applicability of various practices may be built through iterative cycles of application and analysis in practice contexts.

Regardless of purpose and level (local, regional, national, or international), improving teachers’ instruction requires robust and rigorous knowledge about what characterizes the instruction in the first place. Particularly, it is crucial for research (as well as for professional development purposes) to know the characteristics of classroom situations where teachers struggle to bridge the gap between “best practice” and actual practice. Working on long-term development of instructional quality requires not only the identification of teaching practices that successfully promote students’ learning, but also a clear understanding of what may prevent successful integration of practices that are known to be effective. In the words of Archer et al. (2015), “Teaching and learning will not improve if we fail to give teachers high-quality feedback based on accurate assessments of their instruction as measured against clear standards for what is known to be effective” (p. 1).

A critical concern for those who desire to measure teaching quality is to be clear about for whose sake this is being done. Who will have access to the knowledge produced? Who is asking for this knowledge? What may be considered legitimate ways of using, for example, detailed data on teachers’ variation in classroom instruction? Even with a general consensus on the fact that the quality of teachers’ instruction is essential for student learning, measuring the quality of teachers’ performance, through assessment of the quality of their instruction, is a much-debated topic. To search for evidence of teachers’ impact on their students’ learning inevitably involves questions of both researcher ethics and work ethics. In the American context, where teachers are increasingly evaluated, many questions have been raised about whether teachers’ classroom practices should be subjected to systematic attempts to assess quality:

There is no shortage of debate and opinion on the challenges and promises of teacher performance evaluation, with interests weighing in on all sides – unions seeking protection of members from undue harm; reformers advancing a good argument for the need to use metrics as levers for workforce development and improvement; teachers pressing for fair and reliable systems and meaningful feedback and support; and parents and members of the public wanting better schools for all students. (Pianta & Kerr, 2015, p. 583)

In other contexts where the use of teacher performance measures is less familiar – for example, the Nordic countries – it is essential that the import of new research methods and a new research rationale of teaching quality does not automatically import the ethical tension referred to here by Pianta and Kerr.
THEORETICAL CHALLENGES

Although scholars agree that measuring teaching quality has proven difficult, there is also a certain consensus around key factors that have proven critical when assessing teaching quality. Analyses suggest that there are strong commonalities across different frameworks and observation approaches targeting instructional quality (Bell et al., 2019; Klette & Blikstad-Balas, 2018; Klieme & Rakoczy, 2008; Praetorius & Charalambous, 2018), and most frameworks include practices like supportive climate, cognitive challenge and classroom management as key domains when capturing aspects of teaching quality. Thus, there seems to be a consensus around some key domains when measuring aspects of teaching and instruction (Klette, 2015; Kunter et al., 2007; Nilsen & Gustafsson, 2016). These domains include instructional clarity (clear goals, explicit instruction), cognitive activation (cognitive challenge, quality of the task, content coverage), discourse features (teacher–student interaction, student participation in content-related talk), and supportive climate (managing classrooms, creating an environment of respect and rapport). When looking across frameworks and analytical approaches, most of them include three, or all four, domains (Bell et al., 2019). However, although attending to the same overall domains, the frameworks vary in terms of the grouping of dimensions (items) and domains (constructs), the level of operationalization of domains into sub-dimensions (and items), and the terminology used.

Despite strong similarities across the different frameworks, they also differ in terms of theoretical underpinnings, grain size and whether they target subject-specific aspects or generic aspects of teaching. Looking across the frameworks trying to capture instructional quality, they align with specific theoretical traditions and a community’s view of learning, be it cognitive approaches to learning, socio-constructivist theories of learning, process–product approaches to learning and/or national or country-specific standards.
Praetorius & Charalambous (2018), for example, identify 11 different theoretical underpinnings and research traditions when reviewing mathematics frameworks, spanning educational effectiveness research, learning and teaching theories, subject-specific theories, motivation theories and the didactic triangle. Communities’ views will, of course, vary, emphasizing different aspects of teaching and learning. Perhaps cognitive activation will be a core aspect of high-quality teaching across communities, but the degree to which teachers facilitate classroom discourse and student participation might vary depending on the country’s cultural views of teaching and learning (Clarke et al., 2006). Communities’ views necessarily reflect cultural differences in valued practices assigned to different contexts. In the Nordic countries, for example, an observation approach might privilege a high degree of student engagement as critical for high-quality instruction, while an American approach might pay attention to explicitness of instruction.

To focus on subject-specific aspects versus generic aspects of teaching quality is yet another theoretical dimension to consider when aiming at analyzing teaching quality. Internationally, several scholars assert the need for subject specificity when analyzing the qualities of classroom teaching and learning. Hill and Grossman (2013) argue that if classroom analyses are to achieve the goal of supporting teachers in improving their teaching, these frameworks must be subject-specific and involve content expertise. This will enable teachers to provide information that is relevant for situation-specific teaching objectives, regardless of whether this is student engagement, group problem solving or algebra learning. Blömeke et al. (2015) show how a combination of generic factors and subject-specific factors (in their case, regarding mathematics) is required for producing valid knowledge on how different teaching factors contribute to student learning. Klette et al. (2017) use the PLATO framework (targeted for language subjects) to capture both subject-specific and generic aspects when analyzing the features of Norwegian instruction in languages and mathematics. The MET study (Kane et al., 2012) reports no large differences across the frameworks deployed when trying to measure teaching quality in 3000 US classrooms using five different frameworks – three subject-specific and two generic. There is probably not one right solution to the question of whether to use generic or subject-specific observation frameworks. Instead, the answer to this question will depend on the purpose of the study, be it strengthening student engagement and participation, classroom discussion or content-specific teaching.
METHODOLOGICAL CHALLENGES

Different methodological traditions contribute to the aggregated available knowledge on effective teaching and high instructional quality. While intervention studies have provided a growing body of empirical evidence of instructional strategies that are effective with regard to student learning, observation studies have often found that such practices (for example, the explicit instruction of learning strategies or the scaffolding of performance through effective feedback) are scarce in today’s classrooms (Elbers et al., 2008; Klette et al., 2017; Magnusson et al., 2019). This means that even though a significant number of studies define what high-quality instruction would look like, few studies are able to locate these practices in naturally occurring, non-intervention classroom studies.

Although research studies of teaching quality take stock of measures of student achievement over time in order to identify indicators of quality, few studies have yet investigated the extent to which such indicators might have differential effects on students with regard to achievement levels, gender, student motivation, or language proficiency. Such effects have been identified in intervention studies targeting well-defined ingredients of teaching such as questioning strategies, self-regulation, or strategic group work (Schünemann et al., 2013; Taboada et al., 2012), but rarely in relation to broader categories of teaching qualities. We therefore need studies that specifically examine how different indicators of classroom quality affect different learners, as students of various backgrounds and capabilities may benefit from different kinds of instruction and need different forms of support.

A major concern when advocating additional teaching quality research is the acute awareness of its methodological challenges. Measuring teacher performance by global observation systems on high-inference variables prompts a range of reliability issues. Indeed, the requirements of reliability and validity may depend to a certain extent on the proposed use of the research findings. But, given the far-reaching social, cultural, political, and professional implications of scientifically defining a phenomenon such as teaching quality, it is necessary that the measures we use are accurate, reliable, and relevant. This is certainly one of the reasons why instructional quality and assessments of individual teachers are so contested. No measurement tool is likely to capture the complex interaction between teachers, students, and content that teaching entails, and, as Archer et al. (2015) emphasize:

The list of things teachers do that may have a significant impact on student learning is extensive. Ensuring accuracy in the face of such complexity poses a major challenge to the design and implementation of tools for measuring effective teaching. (p. 1)

And, as pointed out by Raudenbush & Jean (2015), many indicators of classroom quality are highly correlated, making it challenging to make informed decisions about what kinds of data provide the most useful information, and how predictive of instructional quality different data can be.
Further, the cumulative nature of education makes it hard to measure accurately, as we often lack longitudinal data and must rely solely on current input measures, which often leads to “analytical errors and biased estimates of specific inputs” (Hanushek, 2020, p. 165).

A distinct kind of measurement issue, one that is certainly less thoroughly examined, is the student outcome against which teaching quality indicators are typically validated. In large-scale observations, student learning over time is commonly assessed either by standardized tests, or, if the teaching observed is restricted to a specific domain, by tests specifically designed to target those domains of knowledge (Fauth et al., 2014; OECD, 2010, 2016). Standardized tests, of course, refer to a range of different measures, including state exams, national tests, various commercial assessment products etc. However, in the context of teaching quality studies, they generally involve the measurement of year-to-year progress in either reading, mathematics, or science. Using achievement gains in reading and mathematics may be justified because of their reliability, their proven correlation with progress in other subject domains, and the profound importance of reading and mathematics to learning in many other fields of knowledge. Yet such measures still represent only a small share of all the esteemed benefits expected from education. A broad understanding of teaching quality, or of teaching effectiveness, cannot be delineated only through its contribution to learning of reading and mathematics, but must also consider to what extent the teaching promotes good judgment in ethical dilemmas, democratic ideals, ability to connect historical events and structures, capacity for analyzing scientific arguments, long-term motivation for learning and intellectual growth etc. Even if studies should prove that reading comprehension correlates well with many of these loftier educational goals, that in itself does not prove that they are attributable to the same teaching practices. Therefore, the idea of teaching quality must be maintained as a multi-layered concept that requires consideration from a number of different perspectives.

Moreover, correlating prevalent features of teachers’ instruction with measures of their students’ learning is certainly relevant, and pinpoints quality in terms of effectiveness and contribution to learning, but it only represents one of the possible ways of conceptualizing quality. In addition to effective strategies for extending curriculum content, teachers must also make wise decisions about what to teach and when, consider students’ well-being and motivation for learning, support disadvantaged students in order to improve equity in education and so on. These factors all pertain to the problem of how to measure quality in teaching.
Research on teaching quality also needs to investigate, for instance, the extent to which students’ own perception of teaching quality relates to achievement-based identification of effective instructional features (see e.g., Fauth et al., 2014). Similarly, it will be necessary, especially when moving between national educational contexts, to examine the extent to which different measures of achievement, or of student perceptions of quality, correlate with assumed contextual indicators of teaching effectiveness (see e.g., Grossman et al., 2014).
THE CONTENT OF THE BOOK
Thus, while there is broad agreement that teaching quality matters and that teachers’ instructional repertoires in classrooms are key requisites for students’ learning, measuring instructional quality has proven to be challenging. Scholars around the world strive to agree on the ‘what’, ‘how’, and even ‘why’ when measuring teaching quality. Against this backdrop, the Nordic Centre of Excellence: Quality in Nordic Teaching, QUINT, organized a conference with the theme “Analysing Teaching Quality: Perspectives, Potential, and Pitfalls”. The conference brought together scholars from around the world and made available different perspectives on and approaches to teaching quality. In this book, we have invited some of the key contributors from the conference to provide insights into the matter of instructional quality. The book consists of eight chapters that together address methodological, theoretical and substantive aspects of measuring teaching quality. While all chapters touch upon issues of theory, methodology and empirical findings, they are organized according to one of these three main themes, depending on the overall focus of the chapter.
Theoretical contributions
The first chapter, written by Courtney Bell and Robert Mislevy, underscores how teaching is critical to achieving the goal of thriving societies, and that teaching assessments play a role in helping researchers, policy makers, practitioners, and teachers themselves understand and improve teaching. Beyond the issues that readily come to mind when we think of assessment (e.g., the degree to which an instrument captures the complexity of teaching, the meaning of scores, rater reliability, and the consequences associated with assessments), stakeholders frequently do not agree on the goals of assessing teaching. Bell and Mislevy describe four metaphors for understanding assessments of teaching: assessment as a cultural practice, a feedback loop, an evidentiary argument, and a measurement. Their analysis helps us understand why researchers, policy makers, and practitioners often have incompatible views of how to gather and use teaching assessment data.
In chapter 2, Nikolaj Elf discusses how to study quality in teaching. The author’s starting point is the hypothesis that any claim about or conceptualization of quality teaching implies a conceptualization of teaching. Exploring this hypothesis from a practice theory perspective, the chapter analyzes how theoretical and methodological premises are established and applied in three empirical projects from QUINT. A key point made by Elf, after comparing analyses of the three projects, is that one should be wary of applying a one-dimensional approach to quality teaching. Rather, one should think multidimensionally and plurally of quality teaching, both in research and in practice. To grasp this plurality, Elf suggests the term ‘the surplus of quality in teaching’, inspired by Paul Ricoeur, and underlines the importance of combining different frameworks and perspectives when attempting to capture different aspects of teaching quality.
Methodological contributions
Measuring instruction using observation instruments across divergent teaching contexts, especially across national boundaries, is a complex intellectual challenge. In chapter 3, Mark White argues that, on the one hand, instruction is a very fuzzy construct that needs to be operationalized in concrete terms, which in itself is a value-laden process. On the other hand, White argues, instruction varies widely across days, students, content being taught, schools, and various other facets of the teaching context. In his chapter, he presents a validity framework for comparing instructional quality across contexts using standardized observation systems. The framework explicitly breaks down the steps in operationalizing teaching quality and sampling instruction. In doing so, White highlights various levels to which observation scores can be generalized and the processes affecting generalization. The explicitness of the framework helps to structure the validity arguments that are necessary to support study conclusions.
In chapter 4, Armin Jentsch, Hanna Heinrichs, Lena Wilms, Gabrielle Kaiser, Johannes König and Sigrid Blömeke provide another methodological contribution. Drawing on an influential framework of instructional quality, they have developed an observational protocol with which both generic and subject-specific characteristics were assessed in different teacher samples (N = 76) by trained observers. As an approach to validation, the authors have combined generalizability and measurement invariance analyses to investigate the psychometric quality of the observational protocol. An important contribution of this study is the finding that psychometric properties can differ not only between observational instruments, but sometimes even between measures of a single observational instrument. The authors argue that this finding could be relevant for the development of future observational instruments, both in educational research and in instructional practice.
Instructional quality has been identified as one of the most important predictors of student outcomes. Yet how to conceptualize and measure the concept, and what aspects of it are important, in what subject domains, in which countries, for what cohort, and for what type of outcome (cognitive and affective), remains unclear. In chapter 5, Bas Senden, Trude Nilsen and Sigrid Blömeke seek to disentangle these challenges by reviewing previous research and showcasing findings from studies made across countries, cohorts, subject domains, and outcomes. The chapter provides relevant perspectives for anyone interested in the concept of teaching quality. A key take-away from this work is the finding that the role of instructional quality for student learning might vary across contexts, hinting at the importance of differential effectiveness for instructional quality.
Empirical contributions
In chapter 6, Kirsti Klette, Astrid Roe and Marte Blikstad-Balas combine video data from a large number of language arts and mathematics lessons (grade 8), scored with the Protocol for Language Arts Teaching Observations (PLATO), with student achievement gains on the national tests in reading and mathematics. The aim of their study is to investigate correlations between PLATO scores and achievement gains for all students and for subgroups of students. They examine whether certain PLATO elements are more closely related to student achievement gains than others, and discuss what possible relations between PLATO scores and student achievement gains tell us about how various instructional practices might affect different students’ learning. Their study highlights that (a) instructional facets might support students differentially; (b) high-achieving students might profit more from the indicators of instructional quality as measured through PLATO; and (c) instructional quality may encompass different features in different subjects.
In chapter 7, Eva Weingartner reports on a video study investigating the cognitive activation potential of tasks used in the subject ‘Economy & Society’ at commercial vocational schools in Switzerland. Weingartner analyzes both the objective and the realized cognitive activation potential of the tasks, i.e., both the construction of the task and its implementation by the teacher. The results show that the objective cognitive activation potential was generally at a low to medium level and rarely changed through implementation in class. The insights of this project are useful for future teacher training and may contribute to raising teachers’ awareness of the way in which tasks and materials activate students’ thinking.
In chapter 8, Jennifer Maria Luoto and Alexander Jonas Viktor Selling address the possible tension between two needs: the need to standardize observations and always look for the same observable features of instruction, to enable reliable comparisons across multiple classrooms, and the need to consider the context of each lesson, to make sense of what observation scores really tell us about instruction. In the chapter, they use the teachers’ own learning goals as context-sensitive lenses and analyze the teachers’ instruction in relation to these goals. In addition, Luoto and Selling analyze the same instruction with a standardized observation system, the PLATO manual, and discuss the results from these two approaches. These perspectives (the teachers’ own goals and the standardized measure) may be seen as contrasting, but the authors argue that by using them as complementary lenses, we may come closer to finding ways of measuring instruction in rigorous and context-sensitive ways.