Paul Hambleton, Deputy Headteacher at Cheslyn Hay Academy
At Windsor Academy Trust our Threshold Curriculum is well-established. Our assessment framework allows leaders and teachers to measure the impact of the curriculum and teaching and determine the progress that students are making towards mastery of the Threshold Concepts.
‘A school’s assessment system has many jobs to do. It must provide useful information for students, parents, teachers and leaders about the progress being made by students’ (Wiliam, 2014, p. 6). This is a simple aim that can be difficult to achieve but we know how important assessment is to diagnose and improve learning.
The Purpose of Assessment
Formative assessment is that which provides feedback to students and helps them to learn (Newstead, 2004, p. 97), while the aim of summative assessment is to ‘achieve a summary mark which captures a student’s performance relative to that of other students’ (Newstead, 2004, p. 97).
It is important to consider the purpose of WAT’s summative assessments, called Key Assessment Tasks, because this can affect how accurately they diagnose where students are in their learning and can also have an impact on the curriculum, teaching and learning, and strategies to accelerate progress more widely. Evaluating teaching is also an important aspect of summative assessment, as it helps us identify strengths.
1. Formative assessment is not graded and is part of our approach to teaching and learning; this is assessment that takes place in lessons as part of our teaching cycle. It can take many forms, such as Key Learning Tasks, multiple-choice quizzes, short-answer quizzes or the use of mini-whiteboards. These are often used in Smart Starts, which frequently take the form of low-stakes retrieval quizzes. Marks from these are not collected or analysed centrally; they inform what the teacher does next.
2. It is important to evaluate the impact of our teaching, as doing so allows us to improve it (for example by developing the curriculum, teachers or leaders). We therefore use summative assessments, called Key Assessment Tasks, taken in week 10 of each of the three curriculum cycles each year. These generate a scaled score at Key Stage 3 and a grade at Key Stages 4 and 5, allowing school and Trust leaders to gauge the progress students are making.
The purpose of our Key Assessment Tasks is to diagnose the extent to which students have achieved understanding of the Threshold Concepts or examination content in each subject. This supports teachers, school subject leaders and Trust Subject Directors/Leaders to take actions that move students forward in their learning and ensure our curriculum is refined and improved over time.
This emphasis is a common theme in the OECD’s evaluation of school systems internationally which drew the conclusion that ‘the point of evaluation and assessment is to improve classroom practice and student learning. With this in mind, all types of evaluation and assessment should have educational value and should have practical benefits for those who participate in them, especially students and teachers’ (OECD, 2013, p. 2).
WAT Cohort Ranking
A common assessment follows logically from a shared curriculum and allows comparison between schools to determine where areas of best practice are to be found, so they can be shared and their positive effects multiplied. Other school trusts, including Ark Schools, have described the benefits of a Trust-wide assessment system (Davies, 2020, p. 75); Ark also recognises that formative and summative assessments serve different purposes, which is why formative assessments are not graded in any way.
While formative assessment needs to be meaningful to the students and teachers who use it to move learning forward, where summative assessments are concerned, ‘shared meanings are much more important’ (Wiliam and Black, 1996, p. 544). This means the information generated by our summative assessments must be interpretable not only by teachers and students but also by parents and school leaders.
At Key Stages 4 and 5, GCSE and A Level grades, along with their vocational equivalents, are a natural way of reporting the results of Key Assessment Tasks that already have shared meaning; over time we come to know what standard is represented by each grade, irrespective of the subject or stage in the educational journey. At Key Stage 3, this used to be achieved by National Curriculum Levels before their abolition; ‘they had become disconnected from their original purpose’ (Myatt, 2018, p. 57), and the Department for Education rightly identified that they ‘distorted the purpose of in-school assessment’ (DfE, 2015, p. 5) and ‘had a profoundly negative impact on teaching’ (ibid.).
At Windsor Academy Trust, scaled scores create shared meaning instead, but without the negative impact on teaching and learning. Raw marks do not create shared meaning because ‘the raw marks needed to achieve a certain standard fluctuate depending on the particular assessment and the version of the assessment being taken’ (Christodoulou, 2016, p. 193) whereas using scaled scores ‘is the method used for transforming raw marks, which are not comparable, into a scale that is comparable’ (Koretz, 2008, cited in Christodoulou, 2016, p. 193). How scaled scores are used to improve students’ learning and for school improvement activities is discussed later.
This system exploits the benefits of consistency and scale employed by other successful School Trusts, such as Ark Schools (Davies, 2020, p. 80) and is an example of where our Trust ‘operates as a single organisation’ which is ‘one of the five things that mark out great multi-academy trusts’ (Carter, 2020, p. 35).
National Benchmarking
The strength of outcomes achieved within the areas of best practice in our family of schools affords us confidence that the curriculum and assessments designed to evaluate its impact are robust. Due to the bespoke sequencing of our curriculum and the fact that there is no national test that assesses it, it is not possible to benchmark assessments against a national cohort in a useful way. Our aim is to ensure that the standard of our assessment is ambitious, sometimes through external validation and sometimes where outcomes in subsequent assessments, such as at Key Stage 4, are high and robust.
Reading ability is assessed using nationally-referenced systems to ensure that students who need additional support with reading benefit from targeted interventions. These assessments are taken periodically and monitored by leaders to measure the impact of reading intervention programmes.
National benchmarking is achieved at Key Stages 4 and 5 through the use of appropriate grade boundaries with established assessment items from previous exam series, and through the Subject Progress Index provided by SISRA Analytics. We provide time for rigorous moderation to ensure accurate grading against previous outcomes in all qualifications.
Respecting Subject Domains
Myatt (2018, p. 137) asserts that school leaders ‘need to trust the subject leaders as specialists and to have conversations with them about what is being taught and the rationale for it being included’. Collaboration between Subject Leaders and teachers across the WAT family, led by the WAT Subject Directors/Leaders, is fundamental in developing the Threshold Curriculum, and it is an equally fundamental part of assessing it; the desire to create shared meaning must not override the quality of the assessments that generate it, which means ensuring that each subject can assess in a way that respects the nature of that subject.
This is further supported by Wiliam, who asserts that ‘it is more productive to focus on decision-driven data-collection, rather than on data-driven decision-making. By focusing on the decisions that need to be made, rather than the data, we are far more likely to collect the right data, the right amount of data, in the right way, for the need at hand’ (Wiliam, 2014, p. 9). It is fundamental that our assessments collect the information that students, teachers and leaders need in order to move students forward in their learning.
Whole-Class, Evidence-Based Learning Checks
One of Rosenshine’s (2012) Principles of Instruction is to ‘ask questions’. When asking questions, it is important to ‘check the responses of all students’. Asking all students the questions is fundamental to achieve a number of aims:
• Engage all students in the learning, as opposed to just one student being asked a question
• Give the teacher vital information about the understanding of all students which they can then respond to in the form of adapting explanations and models or providing targeted support to individuals or groups who need it
A second principle is to ‘check for student understanding’ which ‘can help students learn the material with fewer errors’. This is because misconceptions can be identified early in the learning process and therefore corrected before they become embedded.
Learning checks are a vital part of the Teaching and Learning Cycle and are necessary to give teachers confidence that students are ready for the next step in their learning or provide the springboard for effective feedback to be given, where necessary. Learning checks should be:
• Whole-class, i.e. every student gives a response, not just one student
• Evidence-based, i.e. they assess what students actually know, not what they think they know. This means asking students to demonstrate their knowledge or understanding, not declare how ‘confident’ they feel
Effective whole-class, evidence-based learning checks can take many forms, such as:
• Mini-whiteboards to show the teacher a response from all students simultaneously
• A hinge question, such as a multiple-choice question whose distractors tell the teacher what the students’ misunderstandings are
• A true or false question to test component knowledge
• A short quiz on a platform such as Google Forms, where the results are analysed immediately
A high value is placed on the expert knowledge of our teaching staff and of our subject leaders at both school and Trust level. The design of our assessments can therefore vary depending on the subject and can also vary from one cycle to another in the same subject; the principle that underpins our system is that assessments must achieve the outcomes we need, not necessarily in the same way, and that they must be able to be completed by all students across the schools in the Trust.
As a result of this principle, assessments sometimes have common features but they are bespoke to each subject. For example, in English and maths, a formal assessment is taken and a raw score generated which feeds
into our Trust Cohort Ranking system. For some performance-based or practical subjects, such as Physical Education (PE), a rubric has been developed that allows students’ performance to be teacher-assessed through a range of different sports or activities.
This allows flexibility to exploit the facilities within our different schools and the expertise of our staff within those schools. Achieving consistency with teacher assessments is key and this is where standardisation and moderation are so important; time is dedicated to this every cycle.
Critically important is that all the schools in our Trust carry out common assessments within each subject area, so that the benefits of consistency and scale that are so important can be gained.
Validity and Reliability
It is important that our assessments are as valid and reliable as possible, meaning that they accurately tell us what a student knows and can do. In order to determine whether an assessment is valid, it is important to know its purpose and what inferences are to be made: ‘validity is not a static property of a test…, but is contingent on what a test is for, how it is used and how the results are interpreted. Validity is about purposes and uses as well as about what is in the test’ (Stobart, 2009, p. 162). It is for this reason that we are clear about the priority of purpose of our assessments, which is to give students and teachers the information they need to improve learning.
Reliability is about how accurate the results of our assessments are. The key ways we aim to maximise the reliability of our assessments are, firstly, by ensuring consistency in assessment conditions across the schools in the Trust and, secondly, through the high importance we place on moderation to ensure that the marks awarded are as accurate as possible.
Knowledge and Application
The Threshold Curriculum is, by design, rich in knowledge and in the application of that knowledge, and this is mirrored by the Examination Curriculum at Key Stages 4 and 5, where the qualifications our students are working towards require both knowledge and skills, and their synthesis, to access higher grades. Hirsch’s (2017, p. 191) research concluded that ‘a well-rounded, knowledge-specific curriculum can impart needed knowledge to all children and overcome inequality of opportunity.’
Our Trust Subject Directors/Leaders endorse this view and also know how important it is for our students to have the opportunity to develop their skills in applying that knowledge in different contexts. Our aim is to develop students as critical thinkers and problem-solvers in each of the subjects they study; to do this they need knowledge and they need to practise applying it in different contexts, so it is vital that our assessments incorporate both.
Components and Composites
Our discourse around knowledge centres on two categories: component knowledge and composite knowledge (see WAT Curriculum Codified Approach). This is because the more specific we are about the knowledge students need to acquire in order to be successful learners, the more effective teaching, learning and assessment can be. Ofsted takes a similar view: its inspectors ‘will also look at how the leaders (including trust leaders) responsible for the curriculum have broken down the content into components and sequenced that content in a logical progression, systematically and explicitly, for all pupils to acquire the intended knowledge and skills’ (Ofsted, 2023).
One of the principles in the EEF’s (2021) ‘Teacher Feedback to Improve Pupil Learning’ is to ‘lay the foundations for effective feedback’, which highlights the importance of designing assessments with feedback in mind: their prime focus must be to allow for feedback which moves the student forward in their learning, but they must also be as reliable and valid as possible so that leaders can monitor the impact of the curriculum.
Integral to the success of an assessment, and to its ability to move students forward in their learning, is the impact it has upon a student’s self-efficacy; assessments should aid student motivation rather than be a barrier to it. Ofsted (2021) recognised this in its Research Review Series, as exemplified in its languages review, in which it is stated:
‘in order to have a positive impact on their motivation, curriculum design should ensure that pupils:
• feel successful in their learning
• are clear about how to make progress’
Clearly assessment has a significant role to play here to ensure students are left motivated and clear that they can succeed and make progress.
Christodoulou (2015, p. 22) defines authentic assessments as ‘those which aim to represent more accurately the kinds of problems a pupil might face in the real world’ and she argues that, while they might seem fairer because they assess the skills we want students to have by the end of the learning experience, they have many technical flaws (ibid.), such as becoming so complex that it is not possible to identify where a misconception lies. This is also reflected in Ofsted’s (2021) languages Research Review, which advocates that ‘assessment should form a balance between language elements tested in isolation … and assessments of integrated language’. This can be applied to other subjects in the same way. As such, our assessments in Key Stage 3 need not and should not mirror the format of the final assessment, although in Key Stages 4 and 5 it will become more appropriate for this to be the case.
Key Learning Tasks (KLTs) can be used as ‘milestone’ assessments throughout a cycle. These are assessments that are not necessarily standardised across the Trust, or even within a school. They should be used to support teachers and students to identify the progress being made towards Threshold Concepts throughout a cycle, but should not be more frequent than one every 2-3 weeks and should not take a significant amount of curriculum time away from teaching. There is no prescribed format for these; the most appropriate format should be selected by teachers or leaders to give the information they need. For example, this could be:
• A multiple-choice quiz of component knowledge covered so far
• A small paragraph of writing to assess the extent to which students can apply an element of what they are learning
• A performance task observed by the teacher within a practical subject
Key Stage 3 Key Assessment Tasks (KATs): Purpose, Principles and Practice
Purpose
1. KATs are primarily formative in their purpose; their aim is to maximise learning for students (e.g. by enabling teachers to accurately identify misconceptions)
2. The summative function of KATs still provides information about progress, because the marks awarded allow students to be ranked across the Trust cohort
Principles
1. KATs do not need to be ‘authentic’, which is to say they do not need to mirror what the ‘ultimate’ assessment will be, e.g. a GCSE-style question
2. KATs should assess retention of a sample of the component knowledge students are expected to learn, prioritising the most powerful knowledge we want students to have
3. Around two thirds of the marks should be for MCQs and/or short-answer knowledge questions and a third for application/extended written responses
4. KATs should assess the extent to which the Threshold Concepts have been understood
5. KATs need to take account of the marking workload created by our assessment calendar, in which assessments take place concurrently; this can make it difficult for teachers to mark a high volume of assessments, especially when less marking-intensive formats can yield better information for feedback
6. KATs need to take into account reasonable adjustments for students with SEND
7. KATs should assess students on what they have been taught in a reasonable measure, e.g. if there is learning that has only been accessed by a limited number of students due to the mastery approach, this should not form a significant part of the assessment
8. KATs should include interleaved content from previous cycles
9. KATs should allow students to feel successful in their learning and ensure they are clear about how to make progress
Practice
A typical KAT might be 40 marks to be completed in a single 50-minute period, although multiple periods might be used for core subjects to give larger overall assessments:
1. 30 marks MCQ/Short-answer
a. 5-10 marks: Tier 3 Vocab
b. 20-25 marks: MCQ or short-answer questions on component knowledge and Threshold Concepts (to identify key misconceptions)
2. 10 marks (or maximum 1/3 of the marks available) on application/problem-solving
3. MCQs and short-answer questions could be carried out using a Google form
Key Stage 4/5 Mock Exams: Purpose, Principles and Practice
Purpose
1. Allow students to experience what formal exams feel like, in terms of venue, protocols and learning to manage the pressures that can arise
2. Provide an opportunity for teachers and leaders to set a summative, holistic assessment that has been well designed and marked as accurately as possible (including an appropriate degree of standardisation and moderation), in order to identify where knowledge, skills and understanding are secure and where they need further development; this informs subsequent curriculum planning and possible interventions
3. Evaluate the impact of teaching and learning
Principles
1. Aim for common assessment where there is a common curriculum
2. Where there are curriculum deviations, this can be reflected in deviations to assessment; these deviations will have been agreed
3. Even where there is a common curriculum there may be a rationale for a deviation from a common assessment; such deviations should be agreed with the WAT Subject Director/Lead’s WAT link
Practice
1. Assessment should cover only content that has been taught
2. Assessments should mimic as closely as possible what is likely to be assessed in the real examination; for some subjects this will mean using a paper from a previous series and for others it will mean constructing a paper similar to a paper from a previous series but with a theme that is more likely to appear on the real exam
3. Assessments should take place at a similar point in the curriculum. This may not be on the same day across all schools
4. Assessments should be sat under similar exam conditions across schools but schools will determine the most appropriate conditions for each series in terms of venue and timetable to account for local factors
5. Students should not be given detailed topic lists for mock exams. Broad guidance to which area of the curriculum is to be assessed will be appropriate in some subjects and all students across the trust should be given the same guidance. WAT Subject Directors/Leads to agree this with FDs, discussing with their WAT link if needed
6. There will be an agreed grade prediction methodology in each subject
7. Assessments need to take into account reasonable adjustments for students with SEND
8. Feedforward activities are carefully planned; for example, students could be provided with a Question Level Analysis for each paper. This will signpost additional revision materials and also be used by teachers to provide immediate and targeted feedback or reteaching
9. Arrangements are made for absent students to take mocks under conditions as similar to other students as possible, upon their return
to school. This may need to be planned carefully, for example to ensure that students do some assessment in all subjects even if they cannot do all assessments in all subjects
10. The mock exam will be quality assured by the WAT Link prior to completion
11. A standardisation session before marking of the mock exams provides teachers with training on how to mark the exams accurately. WAT Subject Leads moderate a sample of marking after the event to ensure mark schemes have been applied consistently
Gradesets used across the Trust
The following gradesets will be collated and analysed at Trust level. It is important that this methodology is used for comparability. Schools are at liberty to collect and analyse other data where there is a clear purpose for doing so.
• Mock Grades:
• For exam subjects, this should be the grade the student achieved in the exam, even if this is not in line with the student’s classwork grades
• For vocational subjects or those with a high coursework component, these should be ‘working at’ grades, which teachers assign based on their evaluation of the student’s coursework and portfolio work
The principle with this grade is to give a snapshot of where the student is in each subject based on a format that mirrors their ultimate assessment for the qualification
• Predicted Grades: This grade is provided by the teacher; it is where the teacher’s professional judgement is critical and allows other criteria to be taken into account. The principle here is: ‘what grade is the student most likely to get, if they continue to work as they have done so far?’. This grade should be neither conservative nor lenient; it should be the grade that is most likely. Teachers might use other data to inform this grade, such as:
• Classwork/homework data
• How students who have achieved a similar standard in the past went on to achieve in the final qualification
• Considering more holistically the ‘standard’ required for each grade and the likelihood of the student progressing from the mock to the end of the course
Teachers should be given guidance from Subject Leaders (school and/or Trust) to inform their grade prediction methodology.
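As one illustration of the kind of evidence a teacher might weigh up, the sketch below (Python) estimates the most likely final grade for students on a given mock grade from how previous students with that mock grade went on to perform. The helper name and historical data are entirely hypothetical; this is a sketch of one possible input to professional judgement, not a prescribed methodology.

from collections import Counter

def most_likely_final_grade(mock_grade, historical_pairs):
    # Hypothetical helper: estimate the most likely final grade for students who
    # achieved this mock grade, based on how similar students performed in the real
    # exam in previous years. One possible input to a teacher's judgement only.
    outcomes = Counter(final for mock, final in historical_pairs if mock == mock_grade)
    if not outcomes:
        return None  # no history for this mock grade - rely on professional judgement
    return outcomes.most_common(1)[0][0]

# Hypothetical (mock grade, final grade) pairs from previous cohorts in one subject
history = [(5, 5), (5, 6), (5, 6), (5, 7), (4, 4), (4, 5), (6, 6), (6, 7)]
print(most_likely_final_grade(5, history))  # -> 6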
Grading mock exams
The guidance herein is not applicable to exam series in 2020, 2021 or 2022 due to the changes made to mitigate for disruption caused by the Covid-19 pandemic. Using assessment items from these series may still be beneficial but it will not be possible to grade these as accurately as other exam series.
The reason that grade boundaries fluctuate is to allow exam boards to award a student a grade commensurate with their achievement, in light of the fact that it is incredibly challenging to write a set of exam papers that has the same level of difficulty from one year to the next.
As such, exam boards use data around how well students achieve in each qualification in each exam series to set boundaries that result in the same standard being applied each year.
Our general principle is to use the most recent exam series as the basis for mock exams and therefore use the most recent exam series grade boundaries to grade them. This is because the level of difficulty of the papers is matched to the grade boundaries that have been set nationally.
If a different series is used, the grade boundaries that were assigned to that series should be used.
In all cases, we apply a 3% buffer to grade boundaries. This means increasing the number of marks required for each grade by 3% (rounding up where necessary to the nearest whole number). For example, if an exam requires 100 marks for a grade 8, we would increase 100 by 3% so the grade boundary for a grade 8 would become 103. This allows us to be more conservative in our predictions which means students whose grades are less secure will benefit from support such as additional teaching sessions.
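As a minimal illustration of this calculation, the Python sketch below applies the 3% buffer to a set of illustrative published grade boundaries, rounding up to the nearest whole mark as described above; the boundary values themselves are hypothetical.

import math

def buffered_boundaries(published_boundaries, buffer=0.03):
    # Raise each published grade boundary by the buffer (3% by default),
    # rounding up to the nearest whole mark.
    return {grade: math.ceil(marks * (1 + buffer))
            for grade, marks in published_boundaries.items()}

# Hypothetical boundaries from a previous exam series: grade -> marks required
published = {9: 110, 8: 100, 7: 88, 6: 75, 5: 62, 4: 50}
print(buffered_boundaries(published))
# -> {9: 114, 8: 103, 7: 91, 6: 78, 5: 64, 4: 52}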
Where assessments have items drawn from multiple exam series, boundaries should be set in the most appropriate way.
WAT Subject Leads should work with subject leaders in school to review scripts from each exam series at key points, to ensure expertise in the standard of each grade level. This will support decisions around grading and support with standardisation processes.
Vocabulary, Multiple Choice Questions (MCQs) and Interleaving
The power of an expansive and diverse vocabulary is huge, and students’ possession of a strong vocabulary unlocks vast opportunities; Hirsch (2017, p. 48) describes vocabulary size as ‘the single most reliable correlate to reading ability’. The more confident our students are at reading, the more easily they can access learning and develop higher-level responses. Developing vocabulary is a key part of imparting knowledge, and students need certain words to become the critical thinkers and problem-solvers we are aiming to develop. As such, we identify Tier 2 and Tier 3 words in our curriculum that students should know, and teachers build opportunities to develop and embed these words into their teaching. They are therefore part of our assessments.
Where written tests form part of our assessments, Multiple Choice Questions (MCQs) are incorporated. There are several reasons for this:
• When carefully written, they can expose key misconceptions that need to be addressed. This means the ‘distractors’ (the incorrect answers students have to avoid in order to get the right answer) have to be written so that they reveal which misconception a student holds if they have not arrived at the correct answer.
• Students find them encouraging
• They are high-impact, low-workload for teachers
• They guide precise next-steps to inform planning to close gaps in knowledge
• They can be used in all subjects
Short-answer and multiple-choice questions allow students to be assessed in a beneficial way on the knowledge, skills and understanding that will allow them to then achieve well on extended writing tasks or problem-solving tasks. Christodoulou is clear that multiple-choice questions ‘are capable of testing higher-order skills’ (Christodoulou, 2014, p. 23) and suggests they could ‘improve both exam reliability and classroom assessment’ (ibid.).
Hirsch recommends a mixture of both multiple-choice questions and a writing task to assess writing (Hirsch, 1988, cited in Christodoulou, 2014, p. 23). This exploits the best of both formats: multiple-choice questions assess knowledge and understanding, and a writing task then applies this. This is supported by Buckles and Siegfried, who researched the use of multiple-choice questions in assessing the skill levels of Bloom’s taxonomy and ‘contend that multiple-choice questions can be used to test student achievement up [to] Bloom’s level four - analysis’ (Buckles and Siegfried, 2006, p. 50), suggesting that multiple-choice questions can assess analysis and application as well as knowledge and comprehension. Extended written responses or complex problems are then needed to assess synthesis and evaluation skills.
Wiliam agrees: ‘with short answer questions, it is possible to ask a lot of questions in a given amount of testing time. This means that it is unlikely that a student’s total score is strongly influenced by the actual choices of items included in the test’ (Wiliam, 2014, p. 21). As such, multiple-choice and short-answer questions can mitigate some of the issues around assessment design, such as how the content is sampled.
Within lessons, ‘hinge questions’ that are well-designed can identify misconceptions in a timely manner and increase the impact of lesson time if they are carried out before students complete work independently.
Rohrer’s research appears to show that interleaving content ‘typically improves final test scores’ (Rohrer, 2012, p. 357). With this in mind we ensure that previously-learned topics are interleaved into our curriculum and so we
also include questions which test knowledge, skills and understanding from previous cycles in our assessments.
Retrieval practice also plays a critical role in moving knowledge into the long-term memory: ‘storage strength is increased both by restudying an item or by successfully retrieving it from memory, but that retrieval has a bigger impact on storage strength than restudy’ (Wiliam, 2020, p. 187). This is known as the ‘testing effect’; it can be used as part of day-to-day lessons, but it is also important to give students opportunities to recall previous learning so that they have a greater chance of storing it in their long-term memory.
WAT Education Progression Model and Progress Tracking
We want our assessment to perform two functions: to move students forward in their learning and to measure the learning that is taking place. The former is much more important to us, and to achieve it our cycles are based around the WAT Education Progression Model. There are three of these cycles each year.
We complete summative assessments three times a year because they ‘need to be far enough apart that pupils have the chance to improve on them meaningfully’ (Christodoulou, 2016, p. 193). Grading can be useful but is not the purpose of assessment; research by Black and Wiliam (1998) found that ‘the giving of marks and the grading function are overemphasized, while the giving of useful advice and the learning function are underemphasized’.
Phase 1: Teach (10 weeks)
Assessments identify the extent to which the Threshold Concepts have been understood and the required knowledge has been acquired.
Phase 2: Assess (1 week)
Teachers diagnose misconceptions in learning to inform Phase 3.
Phase 3: Review and Response (2 weeks)
Students review, re-draft, re-do to close gaps in learning. There are opportunities to go beyond if concepts have been fully mastered.
In our system, students get feedback immediately without receiving a grade or scaled score; teachers are able to use the assessments to close gaps in learning without the distraction of grades. Sherrington believes that ‘there is usually an authentic, natural, common-sense mode of assessment that teachers choose with an outcome that fits the intrinsic characteristics of the discipline’ (2014) and our system allows teachers to use this authentic assessment in classrooms while our scaled scores allow for shared meaning from those assessments.
The Department for Education produced a report in 2018 on the effective use of data in schools, with key principles to ensure that the purpose of assessment is clear, that data is interpreted in a sensible way and that the frequency of data collection is proportionate (DfE, 2018, p. 5). With clear purpose and an appropriate frequency of three summative data collections per year, it is then important to ensure that interpretation supports teacher and leadership actions (Davies, 2020, p. 88).
Trust Subject Directors/Leaders use the assessment data to evaluate the impact of their curriculum and teaching and to inform refinements in the curriculum design and delivery and teacher effectiveness. It can also identify areas of strength across the Trust to facilitate the sharing of best practice. This is endorsed by the OECD report which cited ‘greater reliance on evaluation results for evidence-based decision making’ (OECD, 2013, p. 1) as an emerging theme from school systems internationally.
Davies (2020, p. 83) asserts that ‘the most useful summative assessment data … [is] each student’s network percentile ranking’ which is what is represented by our scaled scores at
Key Stage 3 and is achieved using grades at Key Stages 4 and 5. Tracking a student’s score in Key Stage 3 from one cycle to the next would approximate their rate of progress.
The inferences drawn from changes in scaled scores rely on the level of demand in the assessments being sufficiently high that the standard a student demonstrates must rise in order for their scaled score to reflect progress.
Progress is not linear and in a system such as ours, fluctuations are to be expected; the scaled score system allows us to account for this but still identify students who are vulnerable to underperformance as well as celebrate exceptional progress. There is a tolerance within which we accept that students are likely to be maintaining their rate of progress. Scores beyond this tolerance signify exceptional progress and students who need additional support.
Using Scaled Scores
Following the allocation of scaled scores to KATs:
• This information should be reported to parents, students and teachers.
• Students should be given an opportunity to reflect on the progress they have made and the factors that have contributed to it, with the aim of improving their effectiveness as learners.
• Senior leaders should review scaled scores to identify underperforming students and implement appropriate interventions. As a guide, a student achieving a scaled score of 10 or more above their KS2 score is likely to be making exceptionally good progress in that subject, while a score of 5 or more below their KS2 score could be a cause for concern (see the illustrative sketch after this list).
• Senior leaders and WAT Subject Leads should analyse scaled score data to identify subjects in each cohort that have performed particularly well, with the intention of sharing good practice. They should also identify any areas that may be causing concern to determine whether lower scaled scores are indicative of a development need in the subject and implement appropriate interventions.
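A minimal sketch of how this might look in practice is below (Python). The mapping of a Trust-cohort percentile rank onto a 100-centred scale, and the centre, spread, floor and ceiling values, are illustrative assumptions rather than WAT’s actual scaling method; the +10 and -5 guide thresholds are taken from the bullet points above.

from statistics import NormalDist

def scaled_score(raw_mark, cohort_marks, centre=100, spread=10, floor=80, ceiling=120):
    # Illustrative only: convert a raw KAT mark into a cohort-referenced scaled score
    # by taking the student's Trust-cohort percentile rank and mapping it onto a
    # 100-centred scale. The parameters are assumptions for illustration.
    below = sum(m < raw_mark for m in cohort_marks)
    equal = sum(m == raw_mark for m in cohort_marks)
    percentile = (below + 0.5 * equal) / len(cohort_marks)        # mid-rank percentile
    z = NormalDist().inv_cdf(min(max(percentile, 0.001), 0.999))  # clamp the extremes
    return round(min(max(centre + spread * z, floor), ceiling))

def progress_flag(kat_scaled, ks2_scaled):
    # Apply the guide thresholds above: +10 or more suggests exceptional progress,
    # 5 or more below could be a cause for concern.
    difference = kat_scaled - ks2_scaled
    if difference >= 10:
        return "exceptional progress"
    if difference <= -5:
        return "possible concern - consider intervention"
    return "within expected tolerance"

# Hypothetical data: Trust-wide raw marks for one KAT, one student's mark and KS2 baseline
cohort = [12, 18, 22, 25, 27, 28, 30, 31, 33, 36, 38]
score = scaled_score(31, cohort)
print(score, progress_flag(score, ks2_scaled=99))  # e.g. 105 within expected tolerance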
Education Progression Model
Our Threshold Curriculum is the progression model through which our students unlock their academic potential.
The curriculum is organised into 3 cycles of learning each year. Each cycle contains three distinct phases:
Phase 1: Threshold Curriculum
• Threshold Curriculum (LTP): Threshold Concepts, Component Knowledge, Tier 3 Vocabulary, Sequencing for long-term memory
• Teaching & Learning: Sunshine Model, ASPIRE Framework (character virtues and learner skills), Student Engagement
Phase 2: Assessment
• A high quality Key Assessment Task to determine the extent to which the Threshold Curriculum has been learnt and applied
• Precisely assesses how far component knowledge has been retained and understood
• Identifies where misconceptions exist
• Facilitates a highly effective response to the assessment in Phase 3
Phase 3: Review and Response
Planning for an effective response is informed by the assessment. It is not directed by the Long Term Plan, but by the information provided by students through the assessment, as they are telling us what they do and do not know securely.
1. Teacher review of KAT information (e.g. MCQ results, reading written responses)
2. Identify which questions were answered incorrectly by a significant number of students
3. Identify which component knowledge this set of questions was assessing and therefore needs to be addressed
The response may then include:
• Reteaching the identified component knowledge
• Redrafting or redoing work to practise the component knowledge that has been retaught, providing a WAGOLL where appropriate
• Reassessing within lessons (e.g. Q&A, MCQs, live marking, self-marking, peer marking, AfL apps, exit tickets)
• Reapplying knowledge to new contexts, providing a WAGOLL
• Planning to address misconceptions
• Planning to deepen understanding of fragile knowledge
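A minimal sketch of the question-level review described in steps 1-3 above is given below (Python). The data, component names and the 60% threshold are hypothetical; the sketch simply flags questions answered correctly by fewer than a chosen proportion of students and groups them by the component knowledge they assess.

def question_level_analysis(responses, question_components, threshold=0.6):
    # Sketch of the review in steps 1-3 above: find questions answered correctly by
    # fewer than `threshold` of students and group them by the component knowledge
    # they assess, to inform what is retaught in Phase 3. `responses` maps each
    # student to a list of 1/0 marks per question (a hypothetical format).
    to_reteach = {}
    for q, component in enumerate(question_components):
        correct = sum(answers[q] for answers in responses.values())
        if correct / len(responses) < threshold:
            to_reteach.setdefault(component, []).append(q + 1)  # 1-based question numbers
    return to_reteach

# Hypothetical KAT data: five questions, four students, and the component each question assesses
question_components = ["cell structure", "cell structure", "osmosis", "osmosis", "enzymes"]
responses = {
    "Student A": [1, 1, 0, 1, 1],
    "Student B": [1, 0, 0, 0, 1],
    "Student C": [1, 1, 1, 0, 1],
    "Student D": [1, 1, 0, 0, 1],
}
print(question_level_analysis(responses, question_components))
# -> {'osmosis': [3, 4]}: questions 3 and 4 flag osmosis as needing to be retaught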
Designing good MCQs
• MCQs are very precise and focus on a small aspect of a topic, though this can be a crucial concept.
• They have distractors that identify students’ misconceptions; it must not be possible to reach the right answer by easy guessing (a minimal data-structure sketch of this follows the list).
• Used in sufficient numbers, they give teachers a far better understanding of a student’s specific strengths and weaknesses.
• They are easier to analyse.
• They take less time to mark and to give feedback on.
• They take less time for students to complete.
• With a sufficient number of questions, it is unlikely that students will repeatedly ‘guess’ correct answers.
• MCQs can have more than one right answer.
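As a minimal sketch of the data structure such questions imply (Python; the class names and the example question are hypothetical, not part of any WAT system), each distractor can carry the misconception it is designed to reveal, and more than one option can be marked correct.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Option:
    text: str
    correct: bool = False
    misconception: Optional[str] = None  # what choosing this wrong answer reveals

@dataclass
class MCQ:
    stem: str
    options: List[Option] = field(default_factory=list)

    def diagnose(self, chosen_indices):
        # Return the misconceptions revealed by the incorrect options a student chose.
        return [opt.misconception
                for i, opt in enumerate(self.options)
                if i in chosen_indices and not opt.correct and opt.misconception]

# Hypothetical question written to these principles; more than one option is correct
q = MCQ(
    stem="Which of the following are prime numbers?",
    options=[
        Option("2", correct=True),
        Option("9", misconception="thinks all odd numbers are prime"),
        Option("11", correct=True),
        Option("1", misconception="believes 1 is a prime number"),
    ],
)
print(q.diagnose([1, 2]))  # -> ['thinks all odd numbers are prime']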
Collaboration, Moderation, Validation
‘Schools working together leads to better results’ (DfE, 2010, p. 57) was an assertion made by the Department for Education in its 2010 White Paper on how academy trusts can improve outcomes for students. This is also endorsed by the Confederation of School Trusts, who assert that ‘strong structures can enable strong practice to exist in all schools’ (CST, 2021, p. 9) because ‘the strategic oversight and accountability inherent in the Trust structure can drive evidence-informed school improvement’ (ibid.). WAT’s assessment system is a real example of this in action. Three key principles underpin our assessment system and ensure the quality of our assessment design and implementation: collaboration, moderation and validation.

Collaboration
We benefit from the input of a large number of subject specialists across our schools. Working collaboratively to develop our curriculum and assessments means we can draw on a wide range of skills and expertise.
Moderation
We build into our calendar time to moderate assessments across our schools, giving leaders and teachers the opportunity to discuss how grading criteria and mark schemes are applied.
This means we can be more confident that our assessments are marked accurately.
Our Trust Subject Leaders ensure that assessments are marked accurately.
Validation
Because the assessments in our Threshold Curriculum are developed across our schools, and then moderated and scored using data from across our schools, each school gains validation of the standard its students are achieving, because they are compared with students in other schools.
Feedback
The Education Endowment Foundation (EEF) produced a report in 2021, ‘Teacher Feedback to Improve Pupil Learning’, in which it is asserted that ‘done well, [teacher feedback] supports pupil progress, building learning, addressing misunderstandings, and thereby closing the gap between where a pupil is and where the teacher wants them to be’ (EEF, 2021, p. 4). The report draws together a systematic review of the evidence, the expertise of an advisory panel and research on current practice, giving six key recommendations. This review is the framework on which our feedback policy is based.
At this stage it is important to note that ‘give and receive feedback’ is one of the ASPIRE learner skills that runs through all that we do at Windsor Academy Trust. It is vital that students are able to act effectively on the feedback they receive, and equally important that the feedback has the potential to close gaps in their learning when they do.
Wiliam has contributed much to the development of thinking in this area, particularly through developing and enacting the findings of the 1998 publication ‘Inside the Black Box’, and he contributed to this report by shaping the six recommendations. In his foreword he argues that ‘the starting point for effective feedback is eliciting the right evidence’ (EEF, 2021, p. 5), which is what our summative assessments aim to do and why they are under constant development.
This report highlights the importance of effective feedback running through teaching and learning, rather than being a bolt-on considered only after an assessment. Feedback has to be authentic to the assessment, using it to move students forward in their learning: ‘the idea is that, after feedback, students will be able to do better at some point in the future on tasks they have not yet attempted’ (EEF, 2021, p. 5). This is a bold but desirable ambition.
The report also recognises that teacher feedback takes time, so there is always an opportunity cost; we seek to get the best impact from the time teachers spend on feedback. In the same way that our assessments are not one-size-fits-all in terms of subject domains and format between year groups, this report acknowledges that teaching is too complex for a set of methods that are guaranteed to succeed. What we do have, however, is a clear set of principles underpinning the teacher feedback students receive, exemplified with a range of strategies that can achieve this effectively.
What is teacher feedback?
Feedback is ‘information given by a teacher to pupil(s) about their performance that aims to improve learning’ (EEF, 2021, p. 7). Feedback can:
• Focus on different content;
• Be delivered in different methods;
• Be directed to different people; and
• Be delivered at different times (ibid.)
An infographic in the EEF’s Teacher Feedback to Improve Pupil Learning (2021) report exemplifies these dimensions.
WAT’s approach to feedback aims to maximise impact relative to input, so that feedback is a lever for improving students’ understanding and they produce better quality work in future. There is an opportunity cost associated with feedback, and it is important to balance the impact of that feedback against the reduction in time for other tasks, such as planning.
Principles and Methods
This is not intended to be an exhaustive list of how feedback can be done, rather it is a declaration of the principles that must underpin feedback and some exemplification of what these might look like in practice.
These are exemplified in greater detail in the EEF (2021) report. The principles are the result of bringing together best practice from research and from sector representatives, and they align seamlessly with the Windsor Academy Trust vision for Increasing Teacher and Learner Effectiveness. The first two principles should result from the model for Teaching and Learning, including the Learning Cycle. The third principle relates to one of the six Learner Skills within our ASPIRE framework, and it is clear that feedback can play a significant role in developing our students’ self-regulation, which is so crucial to developing successful learners.

Principle 1: Lay the foundations for effective feedback
Principle 2: Deliver appropriately timed feedback that focuses on moving learning forward
Principle 3: Plan for how pupils receive and use feedback
Principle 1: Lay the foundations for effective feedback

Exemplification
• Before providing feedback, teachers should provide high quality instruction, including the use of formative assessment strategies.
• High quality initial instruction will reduce the work that feedback needs to do; formative assessment strategies are required to set learning intentions (which feedback will aim towards) and to assess learning gaps (which feedback will address).

What this looks like at Windsor Academy Trust
• The Threshold and Examination Curricula deliver a carefully-sequenced curriculum which teaches essential concepts, knowledge and skills
• Retrieval practice built into the curriculum
• The Learning Cycle, which emphasises the importance of modelling, explaining and whole-class learning checks
• The High Challenge for All framework to support scaffolding and high expectations for all
• Evaluating ‘What a Good One Looks Like’ (WAGOLL)
Principle 2: Deliver appropriately timed feedback that focuses on moving learning forward

Exemplification
• ‘Feedback interventions delivered immediately after learning, delivered up to a week after, and delivered during learning are all associated with … positive effects on attainment’ (EEF, 2021, p. 19).
• There is not one clear answer for when feedback should be provided. Rather, teachers should judge whether more immediate or delayed feedback is required, considering the characteristics of the tasks set, the individual pupil, and the collective understanding of the class.
• Feedback should focus on moving learning forward, targeting the specific learning gaps that pupils exhibit. Specifically, high quality feedback may focus on the task, subject and self-regulation strategies.
• Feedback that focuses on a learner’s personal characteristics, or feedback that offers only general and vague remarks, is less likely to be effective.

What this looks like at Windsor Academy Trust

Task
• Some tasks do not need feedback because misconceptions can become self-evident in lessons.
• Some tasks are designed to give immediate feedback, e.g. an electronic quiz, answers being provided with methods sought from students, or multiple-choice questions where misconceptions can be addressed quickly if the questions are designed well with the right distractors.

Student
• Teachers monitor which students might need feedback live in the lesson (e.g. walking the room, learning checks). Some students might be distracted or not sufficiently challenged if feedback is given too soon: ‘varying the amount of feedback depending on the pupil to ensure that they are not given the full answer but given enough guidance to usefully progress’ (EEF, 2021, p. 20).

Class
• Teachers aim to assess learning in lessons through whole-class learning checks or ‘walking the room’. If misconceptions affect a large proportion of the class, teachers will provide whole-class feedback or re-teach the content.

For Key Assessment Tasks, feedback (in whatever form) is given at the earliest opportunity following the assessment. This is sometimes in the same lesson and sometimes up to a week later.
Principle 3: Plan for how pupils receive and use feedback

Exemplification
• Careful thought should be given to how pupils receive feedback. Pupil motivation, self-confidence, their trust in the teacher, and their capacity to receive information can impact feedback’s effectiveness. Teachers should, therefore, implement strategies that encourage learners to welcome feedback, and should monitor whether pupils are using it.
• Teachers should also provide opportunities for pupils to use feedback. Only then will the feedback loop be closed so that pupil learning can progress.

What this looks like at Windsor Academy Trust
These ideas are taken from the EEF’s 2021 report with the aim of preparing pupils to receive feedback:
• Discuss the purpose of feedback (to emphasise that it is not criticism but reflects high expectations the student can meet).
• Model the use of feedback (e.g. a whole-class discussion of a student who has improved as a result of feedback, and celebrating the effective use of feedback in the classroom).
• Provide clear, concise and focused feedback (less is more, so as not to overload).
• Ensure students understand the feedback given (considering carefully the language, handwriting etc. to allow clear comprehension).

Providing time for students to use feedback is essential to making it useful. Time is made for this in lessons, but each cycle also has a two-week ‘Review and Response’ phase following summative assessment, during which feedback can be given and students can act on it to close gaps in their learning before moving on.
Principles and general exemplification taken from EEF, 2021, p. 10.
Methods of Feedback
Written feedback can be effective but it can also be time-intensive for teachers to a degree such that the cost outweighs the benefit. The following strategies may be used to mitigate this where written feedback is used:
• Live marking, where marking is given during the lesson rather than after it. There are different ways of doing this, such as with individual pupils (see Principle 2 above) or as a whole class through the use of a visualiser. This can then be supported with verbal feedback and teachers should still be mindful that it fulfils the principles above.
• Coded marking, where teachers devise and share the ‘concept of quality’ that teachers are looking for. This can speed up the feedback by identifying the key features that the student needs to work on.
• ‘Thinking like the teacher’, where time is given for proof-reading and for students to anticipate what teachers might identify as areas for improvement. In this way it is hoped that, when feedback is given, it is more meaningful and not about areas the student could have addressed themselves.
• Written comments, which ‘may offer an invaluable opportunity to provide task, subject and self-regulation feedback. The key is to carefully consider when they are offered, ensure they include useful information, and carefully monitor the time being spent on them’ (EEF, 2021, p. 37).
‘Verbal feedback is an integral aspect of effective instruction that can be delivered in a variety of different ways’ (EEF, 2021, p. 38). It can have a high impact, but only if it is carefully planned and executed. The following strategies
may be used to increase the impact of verbal feedback:
• Target verbal feedback at the learning intentions, using the language set out in the learning intention.
• ‘Action points’, where students summarise actions or goals following detailed verbal feedback, along with the crucial opportunity to use the feedback.
• Verbal feedback using a visualiser, using previously completed or currently ongoing work to model and discuss learning intentions.
• Video or audio recordings to provide feedback, using digital tools such as the iPad.
The important point to remember is that designing opportunities for feedback around the principles is the priority.
WAT Subject Directors and Lead Practitioners support teams across the Trust to develop and implement effective feedback strategies.
Conclusion
The purpose of WAT’s summative assessments is clear: to contribute to the improvement of student learning. This is achieved directly, and also through the teacher and leadership actions that result from analysis of the data generated by assessments.
The design of our assessments is carefully considered to ensure it meets the needs of school subject leaders and Trust Subject Directors/Leaders in identifying the extent to which the Threshold Concepts have been understood and can be applied.
Our feedback principles maximise the value students get from assessments in moving their learning forward in the most effective way possible, and also contribute to the development of their learner skills under the ASPIRE framework.
The processes in place ensure that our assessment system contributes to our school improvement work by creating the shared meaning that is necessary for students, teachers, leaders and parents to understand what assessment information is telling us.
References
Black, P. and Wiliam, D. (1998) Inside the black box: raising standards through classroom assessment, London: King’s College London
Buckles, S. and Siegfried, J. (2006) Using Multiple-Choice Questions to Evaluate In-Depth Learning of Economics, The Journal of Economic Education, 37 (1), pp. 48-57
Carter, D. with McInerney, L. (2020) Leading Academy Trusts: Why Some Fail but Most Don’t, Woodbridge: John Catt Educational
Christodoulou, D. (2015) Assessment Knowledge, in Knowledge and the Curriculum (a collection of essays to accompany E. D. Hirsch’s lecture at Policy Exchange). Available at https://policyexchange.org.uk/wp-content/uploads/2016/09/knowledge-and-the-curriculum.pdf (Accessed: 15th August 2021)
Christodoulou, D. (2014) Seven Myths about Education, Oxon: Routledge
Christodoulou, D. (2016) Making good progress? The future of assessment for learning, Oxford: Oxford University Press
Confederation of School Trusts (CST) (2021) Knowledge-building - School improvement at scale. (Accessed: 6th August 2021)
Davies, R. (2020) ‘Strength in numbers: operationalising a network-wide assessment model’ in S. Donarski, ed., The ResearchEd Guide to Assessment, Woodbridge: John Catt Educational, pp. 73-90
DfE (2010) The Importance of Teaching. (Accessed: 27th May 2021)
DfE (2015) Final report of the Commission on Assessment without Levels. (Accessed: 27th May 2021)
DfE (2018) Making data work. (Accessed: 27th May 2021)
EEF (2021) Teacher Feedback to Improve Pupil Learning. (Accessed 19th July 2021)
Hirsch, E. D. (2017) Why Knowledge Matters, Cambridge: Harvard University Press
Myatt, M. (2018) The Curriculum: Gallimaufry to coherence, Woodbridge: John Catt Educational
Newstead, S. (2004) ‘The purposes of assessment’, Psychology Learning and Teaching, 3 (2), pp. 97-101
OECD (2013) Synergies for Better Learning: An International Perspective on Evaluation and Assessment. (Accessed: 10th December 2020)
Ofsted (2023) School inspection handbook. Available at https://www.gov.uk/government/publications/school-inspection-handbook-eif/school-inspection-handbook-for-september-2023 (Accessed: 8th October 2023)
Ofsted (2021) Curriculum research review series: languages. Available at https://www.gov.uk/government/publications/curriculum-research-review-series-languages/curriculum-research-review-series-languages
Rohrer, D. (2012) Interleaving Helps Students Distinguish among Similar Concepts, Educational Psychology Review, 24 (3), pp. 355-367
Sherrington, T. (2014) Authentic Assessment and Progress: Keeping It Real. (Accessed: 27th May 2021)
Stobart, G. (2009) Determining validity in national curriculum assessments, Educational Research, 51 (2), pp. 161-179
Wiliam, D. and Black, P. (1996) Meanings and Consequences: a basis for distinguishing formative and summative functions of assessment?, British Educational Research Journal, 22 (5), pp. 537-548
Wiliam, D. (2014) Principled assessment design. (Accessed: 27th May 2021)
Wiliam, D. (2020) ‘Learning and memory’ in S. Lock, ed., The ResearchEd Guide to Leadership, Woodbridge: John Catt Educational, pp. 185 - 197