RICHARD CHURCHES AND COLIN PENFOLD
EDUCATION DEVELOPMENT TRUST, UK

Teacher development is complex and takes place over time. Although there is much evidence about the factors that contribute to teacher professional growth, little research has looked at whether abilities present before initial teacher training influence early practice in the classroom. This article reports on the first two analysis points in a four-year longitudinal study of the psychometric properties of an assessment centre used for school-centred initial teacher training recruitment. As well as assessing the approach’s construct validity and internal reliability, we have been using correlation and regression data to evaluate which assessment centre scores predict later classroom performance. Classroom simulation was a better predictor of later classroom teaching ability than other selection activities. Full statistical data and information about the measures used are online (Churches and Lawrance, 2020; Churches, 2021).

Assessment centres for recruitment and selection

The term ‘assessment centre’ describes a series of exercises used by an employer to evaluate skills that they cannot assess using traditional interviews alone. They include behavioural activities and often evaluation of interactions between candidates. Since their first use in the middle of the 20th century, recruiters and employers have seen assessment centres as effective tools for selection, promotion and the development of managerial abilities (Bray and Grant, 1966; Thornton and Rupp, 2005). In the UK, a wide range of organisations use assessment centres (see Bath Spa University, 2019). At assessment centres, the word ‘competence’ applies to an area that observers score the candidate against, along with its associated rubric (e.g. ability to balance competing objectives).

The programme and its assessment centre

The Department for Education Future Teaching Scholars programme is a six-year route into teaching. During an undergraduate degree, participants receive experiences in schools, online learning and face-to-face training. In the fourth year, they join initial teacher training as an employed unqualified teacher. Following attainment of qualified teacher status, in collaboration with a SCITT (school-centred initial teacher training centre), they receive two further years of support (Education Development Trust, 2021).

Most candidates attend the assessment centre while completing their A-levels (aged 17 to 18), more than three years before beginning SCITT initial teacher training. At the assessment centre, assessors evaluate a total of 12 competences across four activities:

  • competence-based interview
  • classroom simulation, in which candidates teach a short lesson to two serving teachers from an outstanding Teaching School
  • reflective discussion about the teaching they just completed
  • group problem-solving, in which observers score interactions with other candidates (Education Development Trust, 2016).

The classroom simulation, in which two serving teacher-observers role-play learners interrupting or finding it hard to understand, aims to measure innate ‘mental set’ (Marzano et al., 2003) prior to teacher training – specifically, a candidate’s ‘with-it-ness’ (ability to monitor/quickly identify potential problems and act) and ‘emotional objectivity’ (staying calm, not getting angry or frustrated). Together, these are known to have a large effect (d = -1.29) on reducing classroom management issues (Marzano et al., 2003). If the assessment centre were effective, we would expect this activity to be a particularly good predictor of later classroom performance. A film of the classroom simulation in progress, created using actors to help train assessors, is available online (Education Development Trust, 2016).

Validity and reliability

The terms ‘validity’ and ‘reliability’ relate to whether a measure is fit for purpose, in that it assesses what it claims to assess (valid), and whether it does so consistently (reliable). The metaphor of shooting arrows at a target can help to clarify the concept (Figure 1).

Figure 1: The concept of validity and reliability

Because assessment centres developed within applied and organisational psychology, organisations often use this provenance to give recruitment processes credibility. However, as with all forms of psychometric measurement, it is vital to evaluate the validity and reliability of such processes. This is particularly important early in the use of an assessment centre. Although well-designed assessment centres are more likely to predict future performance than other forms of selection (Anderson et al., 2008), it is not safe to assume that the design of an individual assessment centre will achieve the same levels of validity and reliability as previous centres (BPS, 2005). A systematic review of the main education databases and other search systems suggests that, despite increased use of assessment centres in education, no one has previously published a study of an education assessment centre’s validity and reliability.

In assessment centre evaluation, three forms of validity and reliability are particularly important:

  • construct validity – the extent to which the approach measures the construct(s) that it intends to measure, assessed using factor analysis
  • internal consistency reliability – how well items produce comparable results when measuring the same competence, often assessed using Cronbach’s alpha; achieving reliability requires assessor training and a clear rubric
  • predictive validity – the extent to which competences predict scores on a criterion (or performance) measure, such as later teaching ability, conducted using correlational and/or regression analyses.
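As an illustration of the internal consistency measure named above, the following is a minimal sketch of Cronbach’s alpha using the standard formula. The scores are made up for illustration and are not data from the study, and the function name and data layout are assumptions rather than the authors’ own analysis code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: one row per candidate, one column per item intended to
    measure the same competence (hypothetical layout).
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*scores))
    # Sum of the sample variances of each individual item.
    item_variance_sum = sum(variance(column) for column in columns)
    # Sample variance of each candidate's total score across items.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Illustrative, invented ratings: five candidates scored on three items.
example_scores = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
    [3, 3, 4],
]

alpha = cronbach_alpha(example_scores)
print(round(alpha, 3))
```

Alpha approaches 1 when items rise and fall together across candidates. The thresholds used to describe a value as ‘good’ vary between sources, so they are not encoded here.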

We evaluated both the Future Teaching Scholars assessment centre and the classroom observation rubric (the Teacher Practice Tool – see the next section) used to measure teachers’ classroom effectiveness once they were in a teaching post. Both assessment centre and classroom observation data reached good or better levels of internal reliability. Construct validity data also supported the model and findings. We measured predictive validity by looking at the correlations between assessment centre competence scores and classroom practice. To date, we have looked at the assessment centre scores of two cohorts (N = 89), recruited in 2016 and 2017, and compared the relationship between their assessment centre scores prior to any training and their classroom performance at the end of their first term in teaching.

Measuring classroom practice

Measuring teacher effectiveness is challenging. Although lesson observation can be useful in teacher development and performance improvement (Pianta and Hamre, 2009), it is hard to conduct effectively (Coe, 2014). Observation may not be enough to measure teacher effectiveness (Penfold and Childs, 2019), as even if individual lesson judgements are accurate, there may be an inaccurate overall impression (Kane and Staiger, 2012). Experiments using lesson videos of ‘effective’ and ‘ineffective’ teachers (Strong et al., 2011) found that despite high agreement and consistency of judgement, there was low success in identifying which group the teachers belonged to (with less than one per cent identified accurately as ‘ineffective’).

Ensuring consistency of judgements to achieve reliability can also be difficult (Ingram et al., 2018), particularly where the observation tool items require too much interpretation or when non-experts conduct observations (Coe, 2014). Expert observers, in contrast, become sensitive to, and notice, areas of practice that others may miss (Grant et al., 1998). Two other issues are worth noting. Firstly, despite the importance of paying attention to subject teaching issues (Penfold and Childs, 2019), there are few subject-specific instruments. Secondly, there are question marks over the validity of a snapshot rather than extended observation. Although there is evidence to support the effectiveness of a short observation in predicting the quality of a whole lesson (60 per cent accuracy; Ho and Kane, 2013), the risk of an overall inaccurate impression remains (Kane and Staiger, 2012).

To address such challenges, Education Development Trust has developed a classroom observation tool that can both evaluate performance and be integrated into a cycle of observation, reflection and development. As well as showing positive results in schools in England, the tool has been used in education programmes in Brunei, Rwanda and Lebanon.

The Teacher Practice Tool is a 12-item observation schedule that includes an additional subject-specific dimension. Items are grouped into five areas: creating a positive climate, structuring and organising lessons, interactive teaching that encourages dialogue, providing well-designed learning tasks and assessing learning continuously. Individual items break down into quality indicators describing practice at distinct levels of maturity (full details are in Churches and Lawrance, 2020).

To assess participants’ teaching, experienced teacher trainers associated with each teacher conducted a lesson observation at the end of the teacher’s first term employed in a school. Observers combined these data with other observations and their knowledge of the teacher’s practice, and then completed the Teacher Practice Tool on the basis of all of this knowledge. We then conducted a statistical analysis to assess the extent to which candidates’ competence scores at the assessment centre were associated with later classroom practice (i.e. whether higher scores at the assessment centre predicted higher scores for classroom teaching).
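One way to check whether higher assessment centre scores go with higher later observation scores is a rank correlation. The sketch below computes Spearman’s rho, a common choice for ordinal rubric scores. The choice of coefficient is an assumption, since this article does not state which one was used, and the scores and variable names are invented for illustration only.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranked scores."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented scores for six candidates (not data from the study).
simulation_scores = [2, 3, 3, 4, 1, 4]  # assessment centre, before training
practice_scores = [1, 3, 2, 4, 2, 3]    # later classroom observation

rho = spearman_rho(simulation_scores, practice_scores)
print(round(rho, 3))
```

A positive rho indicates that candidates ranked higher at the assessment centre also tended to rank higher in later observation. Judging whether such a value is statistically significant would need a further test that this sketch does not include.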

Findings

Classroom simulation was the best predictor of later classroom practice in relation to all the competences assessed by this activity. Candidates’ ability to explain subject-specific concepts and problem-solving abilities in the simulation predicted later teaching performance in areas such as high expectations (Figure 2), structuring and designing learning, maximising learning time, giving feedback to learners and subject-specific variety of learning tasks.

Figure 2: Ability to explain predicts later high expectations (ordinal linear regression), N = 89

Conversely, areas of competence assessed during interview were not good predictors of later effective classroom practice. Notably – and surprisingly – candidates’ espoused passion for working with young people was negatively associated with a range of later classroom practice (such as quality of continuous assessment).

These findings parallel research from applied psychology recruitment, where observations of people conducting work-related activities in a wide range of professions are usually better predictors of later performance than what people say about themselves during interview.

Lessons for schools

Our research supports the value of schools asking candidates to demonstrate their ability to explain subject content and their skills in dealing with learners, whether with a real class or with observers role-playing learners. As important as interviews are, they can only draw out a person’s motivation and theoretical understanding of practice. Uncovering someone’s ability to teach requires an observation of teaching in some form.

Our research also suggests that teachers alone can conduct such activities, including role-playing and observation, in a valid and reliable way, without always needing a live class of students. However, further research would be necessary to establish the relative effectiveness of different approaches and to compare serving teachers as observers with people who no longer practise as teachers.

References

Anderson N, Salgado J, Schinkel S et al. (2008) Personnel selection. In: Chmiel N (ed) Introduction to Work and Organizational Psychology. Oxford: Blackwell, pp.257–280.

Bath Spa University (2019) Guide to Assessment Centres. Bath: Bath Spa University Careers.

Bray DW and Grant DL (1966) The assessment centre in the measurement of potential for business management. Psychological Monographs 80: 1–2.

British Psychological Society (BPS) (2005) Design, Implementation and Evaluation of Assessment and Development Centres: Best Practice Guidelines. Leicester: British Psychological Society.

Churches R (2021) How to assess the potential to teach, new evidence from a STEM teacher assessment centre model in England, DATA UPDATE 2021. Reading: Education Development Trust.

Churches R and Lawrance J (2020) How to Assess the Potential to Teach: New Evidence from a STEM Teacher Assessment Centre Model in England. Reading: Education Development Trust.

Coe R (2014) Classroom observation: It’s harder than you think. In: CEM Blog. Available at: www.cem.org/blog/414 (accessed 22 June 2021).

Education Development Trust (2016) Future Teaching Scholars assessment centre. Department for Education. Available at: www.youtube.com/watch?v=25I9loSxLB4 (accessed 20 January 2021).

Education Development Trust (2021) Welcome to Future Teaching Scholars. Available at: www.futureteachingscholars.com (accessed 20 January 2021).

Grant TJ, Hiebert J and Wearne D (1998) Observing and teaching reform-minded lessons: What do teachers see? Journal of Mathematics Teacher Education 1: 217–236.

Ho AD and Kane TJ (2013) The reliability of classroom observations by school personnel. Research paper, MET Project, Bill and Melinda Gates Foundation. Available at: https://eric.ed.gov/?id=ED540957 (accessed 22 June 2021).

Ingram J, Sammons P and Lindorff A (2018) Observing Effective Mathematics Teaching: A Review of the Literature. Reading: Education Development Trust.

Kane TJ and Staiger DO (2012) Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Research paper, MET Project, Bill and Melinda Gates Foundation. Available at: https://eric.ed.gov/?id=ED540960 (accessed 22 June 2021).

Marzano RJ, Marzano JS and Pickering DJ (2003) Classroom Management that Works: Research-Based Strategies for Every Teacher. Virginia: Association for Supervision and Curriculum Development.

Penfold C and Childs A (2019) Evaluating teacher quality. Education Development Trust. Available at: www.educationdevelopmenttrust.com/our-research-and-insights/commentary/evaluating-teaching-quality (accessed 22 June 2021).

Pianta RC and Hamre BK (2009) Conceptualization, measurement, and improvement of classroom processes: Standardized observation can leverage capacity. Educational Researcher 38(2): 109–119.

Strong M, Gargani J and Hacifazlioglu O (2011) Do we know a successful teacher when we see one? Experiments in the identification of effective teachers. Journal of Teacher Education 62(4): 367–382.

Thornton GC III and Rupp DE (2005) Assessment Centres in Human Resource Management: Strategies for Prediction, Diagnosis, and Development. New Jersey: Lawrence Erlbaum.