Nick Pointer, Programmes Tutor, Ambition Institute, UK

The hegemony of assessment

For many years, schools have given a central role to summative-style assessments. Using terminal exam questions to drive in-class learning and employing regular internal mock exams to generate grades and measure student progress have become unquestioned practices. This is demonstrated in recent national findings that the majority of teachers are asked to report graded ‘data-drops’ every term or half-term, with fewer than five per cent of teachers reporting that they submit attainment data just twice a year (FFT, 2019).

One impact of this is on staff workload. Teachers overwhelmingly report that too much time is spent preparing, administering and marking tests, in addition to recording, analysing and monitoring the subsequent assessment data (DfE, 2019). A less tangible consequence is the opportunity cost for student learning: termly internal examination cycles can take up to nine weeks out of the year (three weeks in each of three terms), if a week is taken before each assessment week for revision and a week after to review. This represents a significant loss of curriculum time, making it doubly challenging for teachers to cover the necessary content in the time available.

However, in recent years there have been signs of a collective awakening to the idea that an excessive focus on examinations and data might not be as effective as previously believed. In revising the aims of Ofsted, HMCI Amanda Spielman has acknowledged that this historical focus on testing has been at the expense of curriculum thinking, and that using assessments as the benchmark for student progress might detract from more meaningful models for student learning over time (Spielman, 2018). Moreover, the Teacher Workload Advisory Group’s report ‘Making data work’ found no evidence that running more than two or three graded assessment points per year has any positive impact on student outcomes (Teacher Workload Advisory Group, 2018).

A 2019 FFT Education Datalab study concluded that headteachers might ‘perceive some (real or imagined) outside pressures regarding data collection’ (FFT, 2019) – notwithstanding Ofsted’s recent overtures to the contrary. An alternative hypothesis might be that there is an innate bias towards the value of conducting assessments – a belief that assessments are unique in their ability to both measure progress and add value to the learning process, and thus are an indispensable component of schooling. This article aims to address two widespread misconceptions that might be used to justify the ubiquity of mock exam use in schools: firstly, that practising exam-style questions is an effective learning approach for students, and secondly, that conducting frequent mock assessments can both allow effective tracking of students’ progress over time and accurately inform necessary adaptations to teaching. 

The limited power of assessments as learning tools

Underpinning the status quo use of assessments in classrooms is a tacit assumption that they are powerful tools for supporting students to learn and develop. It feels intuitively ‘right’ that as instructors we should look at the final ‘expert’ performance that we ultimately want our students to achieve and use this as a model for their development. This approach to developing expertise has been dubbed the ‘generic-skill approach’ (Christodoulou, 2017). It implies that marathon runners should train by running marathons, that trainee surgeons should practise full surgical procedures and that rugby union players best improve by playing full 15-a-side matches. There is a wealth of evidence to suggest that this is not how skill is best developed in these domains (e.g. Baker and Young, 2014; Ford et al., 2015), yet in the classroom, exam-style questions are regarded as an invaluable way for students to practise and improve, years before they will sit the real thing.

This approach fails to recognise the well-examined limits of the brain – new information must be processed and interpreted in working memory, and to do so requires extensive, well-connected prior knowledge or skill in a given domain (Sweller et al., 2019). As a result, novices largely fail to learn from problem-solving activities as they lack the well-organised prior knowledge with which to access and interpret novel problems (Kirschner et al., 2006). Therefore, practising complex, exam-style questions that require learners to problem-solve, think critically or draw on a wide range of content is often a counterproductive and confusing process for beginner students.

In fact, a more efficient way to develop expertise is the deliberate practice approach: identifying the component knowledge and skills that underpin the desired expert performance and practising these in isolation, with regular feedback (Ericsson et al., 1993). This allows learners to build up expertise over time whilst accounting for the finite limits of their working memories.

This approach is uncontroversial for PE teachers, who, for example, would never consider telling pupils to ‘just play volleyball, you’ll work it out’; instead, the three core moves of the sport are explained, modelled and practised in isolation, and put together only once fluency is achieved for each.

In the classroom, however, such approaches might bear uncomfortably little resemblance to the mainstays of exam-style questions: in maths, for example, Boulton (2017) has detailed the process of turning a seemingly single skill – solving simultaneous equations by elimination – into 13 separate sub-components. In practice, each of these sub-components was taught, modelled and rehearsed in isolation, and only once pupils were secure in each foundational step did the class combine them and practise the ‘joined up’ process, leading to success for 100 per cent of students. This deliberate practice approach is characterised by the fact that ‘the tasks we want students to master eventually often differ from those which build knowledge and offer practice’ (Fletcher-Wood, 2018, p. 3).

Exam-style questions are poor diagnostic tools

The other significant shortcoming of the status quo use of mock exams is the attempt to use them for both summative and formative purposes. It is important to note that there is no such thing as a ‘summative’ or ‘formative’ assessment – what matters is the use that we make of the information elicited from an assessment (Wiliam and Black, 1996).

Formative purposes – when assessments are used to diagnose student understanding, skill, misconceptions or gaps in knowledge – serve to inform adaptations to teaching practices (Wiliam, 2014). Assessments that are well designed for formative uses allow us to make very precise inferences about what students understand at a given point – for example, a well-designed multiple-choice question might allow us to conclude that a student holds the misconception that ‘more dense objects sink because they are heavier’. However, such assessments are poor at reliably measuring student progress and, crucially, using them to compare student performance is difficult, if not impossible: unless we can objectively measure the relative difficulty of questions (a non-trivial process), students’ scores in such assessments tell us very little (Christodoulou, 2017).

Summative purposes – where we aim to benchmark students’ attainment against their peers in school or, more widely, against a national cohort – are fundamentally concerned with generating a ‘shared meaning’ from students’ answers that can be reliably compared (Christodoulou, 2017, p. 66). Assessments that are well designed for summative uses are inherently poor for making formative inferences about the root cause of a student’s performance. Exam-style questions are good examples of this shortcoming – because they typically require students to draw on a wide range of content and skill, it is very challenging to use them to diagnose students’ underlying understanding. For example, whilst a score of 55 per cent or a ‘grade 4’ in a mathematics GCSE mock exam might allow us to compare a student to their peers, it tells us very little about what they do or do not know.

Despite this, it is common practice to try to make formative use of such tests by insisting that teachers break down and analyse students’ assessments in a question-level analysis, which, whilst attractive, is fundamentally misguided. Recording that a student scores 2/5 on a statistics question does not yield any detail about what they do or do not know. The question might include some unknown vocabulary or a context unfamiliar to the student. Alternatively, any of a number of potential gaps in their mathematical knowledge may have rendered them unable to succeed. These causes are impossible to pick out from a raw score and prohibitively time-consuming to check individually. Checking may not even be possible if there are misconceptions or gaps that the student’s answer does not reveal, or extant knowledge that lies undemonstrated due to vocabulary or context-related barriers. The practice of subsequently providing targeted ‘statistics’ intervention for all students in a given cohort who scored under 3/5 in this question is therefore a crude, ineffective and essentially meaningless process: given 20 such students in an intervention group, there may be 20 completely separate (and undiagnosed) reasons for failing to attain highly in this question.
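
The limitation of selecting by raw score can be illustrated with a minimal sketch in Python. It is not drawn from any real assessment data: the student labels, the marks and the listed causes are all hypothetical, invented purely to show that a single score threshold can gather students whose underlying reasons for underperforming all differ.

```python
from collections import defaultdict

# Hypothetical records: (student, marks out of 5, underlying cause).
# The causes are invented for illustration; a raw score never records them.
RESULTS = [
    ("Student A", 2, "unfamiliar vocabulary in the question"),
    ("Student B", 2, "unfamiliar context"),
    ("Student C", 2, "cannot calculate a mean"),
    ("Student D", 2, "misreads the scale on a chart"),
    ("Student E", 4, "arithmetic slip"),
]


def intervention_group(results, threshold=3):
    """Return the students who scored under the threshold."""
    return [name for name, score, _ in results if score < threshold]


def group_by_cause(results, threshold=3):
    """Group the same low-scoring students by their (normally hidden) cause."""
    groups = defaultdict(list)
    for name, score, cause in results:
        if score < threshold:
            groups[cause].append(name)
    return dict(groups)


selected = intervention_group(RESULTS)
causes = group_by_cause(RESULTS)
print(f"{len(selected)} students selected by score alone")
print(f"{len(causes)} distinct underlying causes among them")
```

In this invented example, the score-based selection produces a single group, while grouping by cause shows that no two of those students share the same need. In practice the cause column does not exist in a question-level analysis, which is precisely the problem.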

Measuring student progress is not as easy as it seems

Advocates of using summative assessments in schools might, at this stage, concede that assessments might not be effective learning or development tools, but contend that they remain a necessary component of schooling, as they both allow us to measure student learning over time and inform adaptations to teaching practices.

Hard, numerical data is often considered ‘objective’ evidence of students’ attainment or growth in schools, but there are growing concerns that internal testing might fail to generate an accurate picture of student progress, with the associated implication that ‘a lot of data currently compiled by schools is pretty meaningless’ (Allen, 2018). The combination of the inconsistency of internal assessment practices, the variability in the performance of students and the inherently slow nature of students’ development over time means that even attempting to measure student progress annually is highly challenging to do with confidence (Wiliam, 2010).

Using assessments meaningfully – a proposal for the future of assessment

Taking these limitations into account, the current disproportionate emphasis on exam-style questions is detrimental to curriculum time, and often fails in its remit to produce meaningful progress data or information that can accurately inform teaching. In terms of attempts to measure progress, internal mock exams might be best suited to yearly use, in stark contrast to many current practices. Moreover, insisting that teachers review these summative assessments or produce question-level analyses should be seen as a redundant practice that cannot meaningfully inform changes to teaching.

In class, decreasing the emphasis on mock-exam-style questions in favour of deliberate practice and targeted formative approaches is a change that would support teachers’ abilities to intentionally check for student understanding and more accurately inform future teaching. It is also less time-consuming, as more focused approaches narrow what is required from students, decreasing the complexity and ambiguity of diagnosing pupils’ levels of understanding.

However, to facilitate meaningful formative assessment, we must first look at our curricula. For formative assessments to link to the outcomes that we are aiming for in a curriculum, we must have a ‘model of progression’ – how we can plan for student learning over time to culminate in our desired end-points (Wiliam, 2011). Therefore, an explicit, well-defined curriculum is a prerequisite for meaningful assessment.

Historically, the progression model has been defined by statutory assessment systems such as the now defunct National Curriculum levels (Christodoulou, 2017). Explicitly planning for the knowledge and skills that we want students to learn and sequencing this knowledge over time is a more meaningful way to consider this model of progression (Fordham, 2017, 2020; Counsell, 2018), and this also presents a solution to the question of how to design targeted formative assessments to measure student understanding. We cannot hope to measure what a student has ‘understood’ in our lessons if we don’t know what it is that we are checking for them to understand (Fletcher-Wood, 2018), and thus detailed and intentional curriculum planning is the key to the effective design of these focused, formative assessments.

School leaders therefore have a duty to buck the trend and reduce the frequency and volume of summative assessments, using the time gained to develop and roll out focused, finely tuned, formative-style assessments. A key implication of this proposed shift in practice is that a detailed and intentional consideration of the curriculum is a prerequisite for meaningful formative assessment, and must be an integral component of any developments in assessment approaches.

References

Allen R (2018) Meaningless data is meaningless. In: Becky Allen Musings on Education Policy. Available at: https://rebeccaallen.co.uk/2018/11/05/meaningless-data-is-meaningless (accessed 15 January 2021).

Baker J and Young BW (2014) 20 years later: Deliberate practice and the development of expertise in sport. International Review of Sport & Exercise Psychology 7(1): 135–157.

Boulton K (2017) My best planning. Part 1. In: …to the real. Available at: https://tothereal.wordpress.com/2017/08/12/my-best-planning-part-1 (accessed 15 January 2021).

Christodoulou D (2017) Making Good Progress? The Future of Assessment for Learning. Oxford: OUP.

Counsell C (2018) Senior curriculum leadership 1: The indirect manifestation of knowledge: (B) final performance as deceiver and guide. In: The dignity of the thing. Available at: https://thedignityofthethingblog.wordpress.com/2018/04/12/senior-curriculum-leadership-1-the-indirect-manifestation-of-knowledge-b-final-performance-as-deceiver-and-guide (accessed 15 January 2021).

Department for Education (DfE) (2019) Teacher workload survey 2019: Technical report. Available at: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/855934/teacher_workload_survey_2019_technical_report__amended.pdf (accessed 15 January 2021).

Ericsson K, Krampe R and Tesch-Roemer C (1993) The role of deliberate practice in the acquisition of expert performance. Psychological Review 100(3): 363–406.

FFT Education Datalab (2019) How is data used in schools today? A 2019 survey of current practice. Available at: https://fft.org.uk/how-schools-use-data (accessed 15 January 2021).

Fletcher-Wood H (2018) Responsive Teaching. London: Routledge.

Ford PR, Coughlan EK, Hodges NJ et al. (2015) Deliberate practice in sport. In: Baker J and Farrow D (eds) Routledge Handbook of Sport Expertise. London: Routledge, pp. 347–362.

Fordham M (2017) The curriculum as progression model. In: Clio et cetera. Available at: https://clioetcetera.com/2017/03/04/the-curriculum-as-progression-model (accessed 15 January 2021).

Fordham M (2020) What did I mean by ‘the curriculum is the progression model’? In: Clio et cetera. Available at: https://clioetcetera.com/2020/02/08/what-did-i-mean-by-the-curriculum-is-the-progression-model (accessed 15 January 2021).

Kirschner P, Sweller J and Clark R (2006) Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist 41(2): 75–86.

Spielman A (2018) HMCI commentary: Curriculum and the new education inspection framework. Available at: https://www.gov.uk/government/speeches/hmci-commentary-curriculum-and-the-new-education-inspection-framework (accessed 15 January 2021).

Sweller J, Van Merrienboer JJG and Paas F (2019) Cognitive architecture and instructional design: 20 years later. Educational Psychology Review 31(2): 261–292.

Teacher Workload Advisory Group (2018) Making data work. Available at: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/754349/Workload_Advisory_Group-report.pdf (accessed 15 January 2021).

Wiliam D (2010) Standardized testing and school accountability. Educational Psychologist 45(2): 107–122.

Wiliam D (2011) Embedded Formative Assessment. Bloomington: Solution Tree Press.

Wiliam D (2014) Principled assessment design. In: Chambers P and Birbeck J (eds) Redesigning Schooling-8. London: SSAT (The School Network) Ltd, pp. 2–97.

Wiliam D and Black P (1996) Meanings and consequences: A basis for distinguishing formative and summative functions of assessment? British Educational Research Journal 22(5): 537–548.