Administrations worldwide increasingly see assessment as a policy tool for achieving educational aims and objectives. The outcomes of public examinations are viewed not only as important for individual learners – enabling them to access the labour market and further educational opportunities – but also as a means of pursuing a range of other aims, including assessing the quality of schools, influencing teaching and learning, and shaping curriculum content.

This article looks at some of the challenges this raises, and some of the steps that can be taken to move things forward.

Challenges in assessment

It is naive to think that qualifications and tests can or should serve only a single function, such as providing students with a ‘grade’ or other record of attainment.

Rather, assessments inevitably stand in a complex relationship with educational aims and processes. One seemingly simple reason for this is that we can seldom assess everything that has been learned. Final assessments such as GCSEs and A-levels tend to sample from the full range of what could be assessed in order to attest to someone’s attainment – we try to gather just enough information to make a dependable judgement. For essentially practical reasons, trying to assess everything would be too lengthy, too cumbersome, too costly, and largely unnecessary. Were we to attempt it, final assessment would take huge amounts of time out of learning programmes, expose young people to months of assessment, and collect information that tells us little more about the person than a ‘sample’ assessment does.

Some technical developments in assessment, such as adaptive testing, may soon help assessments home in swiftly on each learner’s level of attainment. Adaptive testing avoids wasting time on questions that are too hard or too easy for a given learner, leaving more time to cover content at an appropriate level of demand. But just how much each assessment should cover remains a persistent issue.

Beyond this perennial question for assessment designers – ‘just how much should we sample?’ – there is a further limitation that has greatly influenced the effect of assessment: which methods we can dependably and practically use to assess attainment in a given subject discipline or skill area. In the 1970s and 1980s, ‘coursework’ seemed to offer a means of securing more authentic assessment of each student’s attainment, enriching learning and broadening what assessment could cover, both in terms of syllabus and in terms of type of outcome: ‘skills’ as well as ‘knowledge’. By 2010, however, evaluation had revealed a difficult ‘balance sheet’: the assets were accompanied by problems, including undue workload and complexity for pupils and teachers. Ironically, some coursework topics narrowed students’ experience. But, most significantly, coursework created professional contradictions for teachers. In science, for example, Abrahams (2005) found that teachers regarded coursework as ‘non-science’ – just a way of chasing marks. At the same time, teachers were expected both to optimise their students’ marks and to act as the objective representatives of exam boards when marking that work.

By 2011, Cambridge Assessment judged that this professional contradiction had become so serious that it threatened not only the quality of assessment (and its impact on learning) but also societal trust in public examinations. In response, it formulated the new model of assessment for GCSE and A-level, in which practical work must be completed but does not contribute directly to the grade. Extensive early evaluation suggests that, far from causing a ‘crash’ in practical science in schools, the approach has reinvigorated practical work. This underlines the importance of continually evaluating impact and fine-tuning policy.

The impact of accountability

Because qualification results are used in school accountability, there are high expectations of both the consistency of marking and final grades and the validity – or authenticity – of the assessment. Cambridge Assessment draws on its comprehensive 150-year archive of exam papers to undertake substantial historical review of assessment, not solely out of intellectual curiosity but as a means of establishing how assessment has changed over time, and why.

This record shows a shift towards items that can be marked more consistently – a tendency that of course has specific assets, but one that also leads to more structured questions, a narrower range of what is assessed and, in turn, greater predictability. In a context of high accountability, this narrowing of the form and syllabus coverage of exam questions appears to have coincided with a narrowing of teaching programmes within subjects in schools.

It is not, of course, axiomatic that narrowing the curriculum in a specific school subject is always bad. Syllabuses and teaching programmes can become bloated over time, drifting away from essential subject elements. But when narrowing means that certain fundamentals are not taught because they do not appear in the assessment, or that learning programmes become dull and uninteresting, there is a problem. Research over the past 15 years, such as Bill Boyle’s work, has confirmed the widely held view that this is a persistent problem with national testing at Key Stage 2, showing adverse narrowing of learning programmes, particularly in the year before the tests.

Debate has also raged over the impact of the English Baccalaureate (EB) measure, which of course uses examinations as a means of determining the curriculum scope of schools and the subject choices of learners. Here, however, discussions with school leaders reveal very different reactions among schools with very similar pupil intakes: some react adversely to the restriction, while others regard the options within the EB, and the fact that it does not determine the entire school curriculum, as positive or at worst neutral with respect to school aims. The chief inspector’s recent announcement of Ofsted’s engagement with this issue is timely and welcome.

All this perhaps suggests that accountability – the use made of assessment outcomes – rather than assessment itself is the villain of the piece. But whatever its impact, accountability is unlikely to go away. Our transnational research shows that most high-performing systems around the world, including Finland’s (contrary to common accounts), use assessment outcomes as part of public accountability arrangements.

Neglecting the details of impact, however, can have serious consequences. We have argued before that it was quite wrong to leave the crude ‘five A*–C’ measure in place for so long: for more than ten years it encouraged a narrow focus on borderline C/D pupils.

The future of assessment

So if we live in a system in which accountability is a continuing reality, and there are high expectations of consistent final assessment, what should we do? Almost every time the future of assessment is raised in open discussion, the question of ‘pen and paper testing’ comes up. Firstly, it is vital to recognise that pen-and-paper tests do not assess only factual recall. They readily assess comprehension and complex analysis and, beyond this, a high final grade in a GCSE or A-level also reflects concentration, persistence and dedication to learning.

Secondly, high-stakes mass assessment is generally expected to be delivered with great robustness, and at present this is achieved by pen-and-paper tests; computer-based tests are only just beginning to approach the same level of dependability. While on-screen marking has transformed the quality assurance of marking, the assets of on-screen assessment – adaptive testing, more interactive assessment items and other features – are still around the corner and will need careful scrutiny of their impact. The promise may be high, but practical delivery needs to be assured. Where on-screen assessment has already shown a positive impact on learning is in providing a high volume of formative assessment. Systems such as Isaac Physics (isaacphysics.org) offer a stream of high-quality items, drawn from the past as well as the present, and enjoy high levels of enthusiastic use.

The results of policy grounded in realistic, penetrating evaluation of assets and problems, combined with technical innovation informed by assessment theory, suggest that we are more than capable of developing assessment with a sustained beneficial impact, even in a high-stakes setting – but much remains to be done.


References

Abrahams, IZ (2005) Between rhetoric and reality: the use and effectiveness of practical work in secondary school science. PhD thesis, University of York.