I had some concerns about the marking system used for Key Stage 3 in my school when I joined two years ago. It followed the general policy that classwork would be marked every two weeks and pieces graded from E* to 3. Approximate proportions were set for each grade, so one or two students might achieve E* or E (excellent) for each task, the remainder of the upper half of the class received a 1, most of the lower half received a 2 and those students at the bottom received a 3 – in other words, a ‘relative’ grading system.

However, at the point of transition from primary to secondary, students can experience declines in enthusiasm, motivation, attainment and confidence, in some cases related to changes in assessment approach (McGee et al., 2004). I therefore thought greater use of comment-only marking might be better than a student receiving a discouraging grade of 3, but it was the relative nature of the system I was particularly keen to explore. There is some debate around the advantages and disadvantages of relative grading in terms of impact on both motivation and performance (OECD, 2012; Watkins, 2010), and this seems particularly relevant in Key Stage 3. Indeed, when I submitted a survey to all teachers, eliciting their views on the policy, some admitted that they chose not to award a grade of 3 because of the detrimental effect it could have on the girls’ confidence and motivation.

I had the opportunity to investigate the effects of relative marking, and of moving away from this, as part of the Research Learning Communities project initiated this academic year between the Girls’ Day School Trust (GDST) and the UCL Institute of Education to encourage more education research in GDST schools. I was therefore able to share my findings with other classroom practitioners along the way and was provided with support in designing and carrying out the research and in analysing the results.

Surveys

To understand the perceptions of teachers, students and parents around the approach taken to marking in Key Stage 3, I carried out surveys of these three groups. The first survey asked all teachers to rate the marking policy from zero (very poor) to 10 (excellent) on two scales: its effectiveness in promoting student progress and in helping student confidence. Based on the 36 responses, each scale received an average of 4.4. When the responses were averaged by department, the highest departmental score on each scale was 6, while the lowest was 2.5 for progress (mathematics and computing) and 2.3 for confidence (science).

The general nature of the teachers’ comments is best summarised by the following selection:

  • ‘Pupils focus too much on the number and not enough on the comments.’
  • ‘The system doesn’t equate to anything useful in terms of future GCSE gradings.’
  • ‘The range of possible marks is rather narrow and lacks nuance.’
  • ‘Confidence is affected if the girls get anything below a 1.’

I then issued a survey to current and former Key Stage 3 students. This asked them to explain the marking system and to award a score from zero to 10 for how well it allowed them to understand their performance. Only four of the 109 students who responded mentioned anything pertaining to the relative nature of the marking system. In the section where they could give additional comments, a high number (20 of the 57 former Key Stage 3 students surveyed) mentioned the stigma attached to receiving a 2 or below, and some said the different approaches taken by teachers to using the system had caused confusion for many students. Interestingly, a detrimental effect on motivation was most explicit in Year 7 responses – 12 of the 52 students surveyed had found the marks demotivating within less than a term, echoing findings elsewhere.

The results of a parent survey, which highlighted some confusion with the system and how it was applied in different ways, supported the need for intervention.

Intervention

In January, three teachers agreed to embark on a five-month research project with treatment classes using a system of comment-only marking, with an absolute marking system for infrequent assessment (where all students could achieve 10/10, for example). These classes were Year 7 English (26 students), Year 8 French (27 students) and Year 9 geography (20 students). Each teacher had another class in the same year group, so the presence of these comparison classes – plus the lack of selection criteria other than year group and subject – afforded this project the advantages of a randomised controlled trial, commonly seen as the best approach for evaluating educational interventions. The Education Endowment Foundation has detailed resources on how to conduct such trials.

As a testament to the school’s support for this project, the director of assessment made the necessary changes in the school information management system to omit a relative grade from the report cards issued during this period to the treatment classes. Instead, a different set of numbered grades was available to each treatment class, reflecting the grade they would be likely to achieve if assessed at this exact moment on the new GCSE 9-1 scale. This scale is gradually being phased in to replace A*-G grades in both maintained and independent schools, with a 9 reserved for the highest-achieving students. The reports reflected students’ increasing attainment, so Year 7 received grades from 6-1, Year 8 received 7-2 and Year 9 received 8-3.

Each teacher adopted a different approach to their marking and we met after one month to discuss initial findings. All three teachers provided favourable feedback – all students had responded maturely and no parents had expressed concerns.

The geography teacher had chosen a ratio of two positive remarks to one area for improvement in each of her comments and ensured that these were always specific and individualised. The English teacher said the use of comment-only marking forced the students to focus more on the comments and improved the quality of dedicated improvement reflection time. She added: ‘It is a great opportunity to praise the weaker ones for what they are doing well, and that becomes more meaningful to them if there isn’t a grade attached to it. Having small, achievable targets is also less overwhelming for them than, say, seeing that they have a 3 and thinking about all the things they need to do to get a 1.’

Eventual findings

The annual examinations in May gave us interesting quantitative data. The French teacher’s treatment class was the best-performing French class in terms of the average score, achieving 79 per cent against an average of 77 per cent for the rest of the year group and allaying fears that a lack of marks might reduce achievement. Top grades were evenly distributed among the classes, but fewer students in this class achieved the lower grades. ‘I would say it had a lot more of a tangible effect on the students who would struggle more in the subject,’ said the teacher, ‘and those who are going to do well are always going to do well’.

By the end of the research project, the French teacher had been providing formative marking with individual corrections in the text and a general comment at the end, along with a target tailored specifically to that student. Before subsequent pieces of work, the students would draw upon his previous individualised comments to write a target at the top of the page but also give a ‘how’, a specific way in which they were going to meet it. The teacher would refer to the ‘how’ in his marking at the end and state whether the student was successful. There was sometimes peer review to see if the students met their targets – they could make corrections, but the teacher still checked this work at the end.

The teacher said the students who did not perform so well over the course of the research were those not properly reading targets and reflecting on the process, although even these students began to improve when they realised past comments allowed for future success. The cyclical nature of the feedback process gave weaker students a framework to use – they worked hard and had something that clearly showed them what their targets needed to be.

‘It is easier for you to categorise where weakness lies and what is the nature of the weakness,’ the teacher said. ‘It’s also a good thing because it shifts responsibility on to them to look at their work.’

The geography teacher’s treatment class fared comparably with her other class and the remainder of the year group. The English teacher’s treatment class performed just as well. In preparation for the close analysis required in the comprehension task, she typically had students rework paragraphs of writing where the target and the ‘how’ concerned the effects of language devices.

Behavioural changes

More noticeable improvements in grades may require more time with this process, but there is already clear progress in reactions to returned examination papers. ‘I returned the exam results and the non-treatment class immediately got out their calculators, trying to calculate their percentage on the paper, but they couldn’t do it because they didn’t realise everything was weighted,’ said the French teacher. ‘Not a single person got out their calculator in the treatment class, which was interesting to me. They were removed from that summative instinct. They were then asking me which pieces of paper they could write on so they could make notes that they could refer to again.’

The geography teacher supported these comments, saying her treatment class was a lot more subdued when examination papers were returned, whereas the other class was ‘trying to claw back certain marks’. This increase in reflectiveness bodes well for other subjects and, indeed, future life; whether it remains with the students is a matter for further research. Interestingly, the geography teacher added that the decision not to award grades had encouraged her to be more honest when writing undisclosed formative marks in her mark book, which therefore gave her a better idea of how students would perform. When awarding marks to the non-treatment class, she had tended to inflate them because it was a challenging class that needed encouragement before picking their option subjects.

Student response

The teachers’ feedback was encouraging, but I also wanted to hear the students’ thoughts. A survey issued to each of the treatment classes revealed a general desire for a return to grades, although there was a strong feeling among some students that the new approach made it easier to see how they could improve their work. The surveys were completed anonymously, so I gathered a selection of students from Years 8, 10 and 12 to offer comments on my findings.

They told me that any new system, such as comment-only marking, must start in Year 7 and as early as possible – the expectation of a grade is a hard thing to eliminate. There was also the issue of non-treatment classes having the ability to brag about grades that treatment classes had not been receiving, and the E*-3 system gave students a reassurance that they were not currently working towards their GCSEs, unlike the 9-1 system applied to reports for treatment classes.

Some in this group said that comment-only marking for Years 7 and 8 would be sustainable but Year 9 work should be graded in preparation for the GCSE years, while another said that effort grades rather than attainment grades should be considered for lower years.

One interviewee was part of the Year 8 treatment class. She said she had not missed the grades and, although she did not know quite so clearly how she was performing in the class, the process had obvious benefits. ‘It forces you to look at your comment from before, which I don’t think I was previously doing. I don’t think I was looking at it and thinking, “How can I improve directly from this piece of work?” and I think the fact I’ve only had comments has helped me to do that.’

Conclusion

The director of assessment responded with interest to the findings and had considered various ways to improve such marking since my project began, but he has since stated that retaining a distinction between Key Stage 3 attainment and GCSE grades may be desirable in terms of maintaining student confidence. Creating report descriptors for 9-1 grades available to each of the year groups and for each of the subjects is also too burdensome a task.

He has, however, now asked all teachers to award grades only for significant pieces of work – approximately half-termly with classes seen for at least two hours a week and termly when seen for one hour a week. He has stressed that the focus should be on good-quality feedback and teachers should not feel constrained by the proportions suggested for the awarding of grades E*-3.

This move towards infrequent and more flexible grading with an emphasis on good-quality feedback is consistent with my findings. While the experiment did not significantly raise achievement, it did not make things worse, and it seems to have engendered a more productive approach to feedback on the students’ part. The director of assessment and I also agree that there is more to investigate in terms of perfecting teachers’ comments: ‘Improving the quality of feedback is a shorthand for establishing a better dialogue between student and teacher. This allows the students to develop a more sophisticated mindset and get them away from a fixation on grades.’

References

McGee C, Ward R, Gibbons J and Harlow A (2004) Transition to Secondary School: A Literature Review. Hamilton, New Zealand: The University of Waikato.

OECD (2012) Grade Expectations: How Marks and Education Policies Shape Students’ Ambitions. Paris: OECD Publishing.

Watkins C (2010) Learning, performance and improvement. Research Matters: The Research Publication of the International Network for School Improvement (34): 1-15.