One of the many challenges facing the evidence-informed teacher or school leader is knowing when to trust the experts (Willingham, 2012). Great importance is often ascribed to meta-analysis, both in the EEF’s Teaching and Learning Toolkit (for example, see Teaching and Learning Toolkit, 2018) and by influential commentators such as John Hattie (Hattie, 2008), and as such it has an impact on our understanding of what makes effective learning. When considering approaches such as metacognition, the evidence base for which draws heavily on meta-analysis, it is therefore essential for teachers and school leaders to be aware of the limitations of this approach so that they can make appropriate use of research evidence.

Understanding meta-analysis

Meta-analyses are conducted as follows:

‘Individual studies report quantitative measures of the outcomes of particular interventions; meta-analysts collect studies in a given area, convert outcome measures to a common metric and combine those to report an estimate which they claim represents the impact or influence of interventions in that area. Meta-meta-analysis then takes the results of meta-analyses, collected in broader fields, and combines those estimates to provide a rank ordering of those fields which make the most difference.’ (Simpson, 2017, p. 450)
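The 'combine' step in this description can be sketched in a few lines of Python. The quotation does not say how estimates are combined, so this minimal sketch assumes one common choice: a fixed-effect pooled estimate in which each study's effect size is weighted by the inverse of its approximate sampling variance. The function names and the three studies are hypothetical, invented purely for illustration.

```python
def variance_of_d(d, n_treatment, n_control):
    """Approximate sampling variance of a standardised mean difference."""
    n_total = n_treatment + n_control
    return n_total / (n_treatment * n_control) + d ** 2 / (2 * n_total)


def pooled_effect_size(studies):
    """Fixed-effect pooled estimate: each d is weighted by 1 / variance."""
    weights = [1 / variance_of_d(d, n_t, n_c) for d, n_t, n_c in studies]
    weighted_sum = sum(w * d for w, (d, _, _) in zip(weights, studies))
    return weighted_sum / sum(weights)


# Hypothetical studies: (effect size, treatment group size, control group size).
studies = [(0.2, 50, 50), (0.5, 20, 20), (0.8, 10, 10)]
pooled = pooled_effect_size(studies)
print(f"Pooled effect size: {pooled:.2f}")
```

The pooled figure sits closest to the largest study because larger samples carry more weight. The calculation itself is silent on whether the three studies are comparable enough for the single number to mean anything.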

There are a number of advantages of meta-analysis when conducted as part of a systematic review:

  • Large amounts of information can be assimilated quickly
  • The delay between research ‘discoveries’ and implementation of effective programmes can be reduced
  • Results of different studies can be compared to establish generalisability
  • Reasons for inconsistency across studies can be identified and new hypotheses generated (Greenhalgh, 2014, p. 118).

Despite these advantages, Adrian Simpson (2017) points out two key assumptions associated with meta-analyses that do not stand up to critical scrutiny. The first is that larger effect sizes are associated with greater educational significance. The second is that two or more different studies on the same intervention can have their effect sizes combined to give a meaningful estimate of the intervention’s educational importance.

‘“Effect size” is simply a way of quantifying the size of the difference between two groups… It is particularly valuable for quantifying the effectiveness of a particular intervention relative to some comparison.’ (Coe, 2017, p. 339)
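As a concrete illustration, the sketch below computes one widely used effect size, the standardised mean difference (often called Cohen's d): the difference between the two group means divided by their pooled standard deviation. This is an assumption about which effect size is meant, since other definitions exist, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical test scores, invented purely for illustration.
intervention_scores = [62, 70, 68, 75, 66, 71]
comparison_scores = [58, 65, 60, 69, 63, 64]
d = cohens_d(intervention_scores, comparison_scores)
print(f"Effect size (d): {d:.2f}")
```

Because the denominator is the spread of scores, anything that changes that spread changes the effect size, even when the difference in means stays the same.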

Simpson (2017) identifies three reasons why these assumptions do not hold.

1. Unequal comparator groups

Studies that have used different types of comparator groups cannot be accurately combined to report meaningfully on effect size. For example, if two studies report the effect size of text message feedback on essays, we cannot reasonably compare them or combine them if one study uses no feedback for the comparator group, and the other uses short written feedback for the comparator group (Simpson, 2017).
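The feedback example can be put into numbers. The following minimal sketch assumes hypothetical mean essay scores and a shared standard deviation for all groups. None of the figures comes from Simpson (2017).

```python
def standardised_difference(mean_treatment, mean_comparator, sd):
    """Effect size when every group shares the same standard deviation."""
    return (mean_treatment - mean_comparator) / sd


# Hypothetical mean essay scores; every figure here is invented.
SD = 10.0
TEXT_MESSAGE_FEEDBACK = 56.0
NO_FEEDBACK = 50.0             # comparator used in the first study
SHORT_WRITTEN_FEEDBACK = 54.0  # comparator used in the second study

d_vs_none = standardised_difference(TEXT_MESSAGE_FEEDBACK, NO_FEEDBACK, SD)
d_vs_written = standardised_difference(
    TEXT_MESSAGE_FEEDBACK, SHORT_WRITTEN_FEEDBACK, SD
)
naive_average = (d_vs_none + d_vs_written) / 2

print(round(d_vs_none, 2), round(d_vs_written, 2), round(naive_average, 2))
```

The intervention is identical in both studies, yet the effect size is three times larger against the weaker comparator. The average of the two answers neither 'compared with nothing' nor 'compared with written feedback'.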

2. Range restriction

Unless a meta-analysis combines studies that use the same range of students, the combined effect size is unlikely to be an accurate estimate of the ‘true’ effect size of the intervention. For example, if a study measuring attainment includes only high-attaining students, the range of scores they achieve on a test will be more restricted than in the same study conducted across the whole year group. Because the effect size is expressed relative to the spread of scores, the same gain will then produce a much larger effect size. The range of students sampled therefore influences the reported effect size, independently of the intervention’s actual impact (Simpson, 2017).
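A small simulation can make range restriction visible. The sketch below is a simplification under stated assumptions: attainment is normally distributed with mean 50 and standard deviation 10, the intervention adds the same 3 points for every student, and the restricted study admits only students scoring 60 or above. All of these figures are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def sample_group(n, min_attainment, gain, rng):
    """Draw n students at or above min_attainment, then add the gain."""
    scores = []
    while len(scores) < n:
        attainment = rng.gauss(50, 10)
        if attainment >= min_attainment:
            scores.append(attainment + gain)
    return scores


def simulate_study(n, min_attainment, gain, rng):
    """Effect size of a fixed gain for one range of students."""
    control = sample_group(n, min_attainment, 0.0, rng)
    treatment = sample_group(n, min_attainment, gain, rng)
    return cohens_d(treatment, control)


rng = random.Random(1)  # fixed seed so the run is repeatable
d_full = simulate_study(5000, float("-inf"), 3.0, rng)
d_restricted = simulate_study(5000, 60.0, 3.0, rng)
print(f"Whole year group: d = {d_full:.2f}")
print(f"High attainers only: d = {d_restricted:.2f}")
```

The gain is 3 points in both cases. Only the spread of scores in the sample differs, and that alone is enough to change the effect size substantially.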

3. Measure design

Researchers can directly influence effect size by the choices they make about how to measure the effect. If you undertake an intervention to improve algebra scores, for example, you could choose to use a measure specifically designed to ‘measure’ algebra. Or, you could use a measure of general mathematical competence that includes an element of algebra. The effect size obtained with the former will be greater than with the latter, due to the precision of the measure used. Furthermore, increasing the number of test items can influence the effect size. Simulations suggest that if the number of questions used to measure the effectiveness of an intervention is increased, this may increase the effect size by a large amount (Simpson, 2017).
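The point about test length can be explored with a toy simulation. This is not a reproduction of the simulations Simpson (2017) reports. It is a minimal sketch assuming each student answers every item correctly with a fixed probability (normally distributed around 0.5) and that the intervention raises that probability by 0.05. The item counts and sample sizes are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def score_on_test(prob_correct, n_items, rng):
    """Number of items answered correctly out of n_items."""
    return sum(rng.random() < prob_correct for _ in range(n_items))


def simulate_study(n_items, n_students, gain, rng):
    """Effect size when the same gain is measured with an n_items test."""
    control, treatment = [], []
    for _ in range(n_students):
        p = min(max(rng.gauss(0.5, 0.1), 0.01), 0.99)
        control.append(score_on_test(p, n_items, rng))
    for _ in range(n_students):
        p = min(max(rng.gauss(0.5, 0.1) + gain, 0.01), 0.99)
        treatment.append(score_on_test(p, n_items, rng))
    return cohens_d(treatment, control)


rng = random.Random(2)  # fixed seed so the run is repeatable
d_short = simulate_study(5, 5000, 0.05, rng)
d_long = simulate_study(40, 5000, 0.05, rng)
print(f"5-item test: d = {d_short:.2f}")
print(f"40-item test: d = {d_long:.2f}")
```

In this model a longer test averages out item-level chance, so the same underlying gain stands out more clearly against the spread of scores and the effect size rises.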

Wiliam (2016) identifies a number of other limitations of meta-analyses: the intensity and duration of an intervention affect the resulting effect size; much of the published research involves undergraduate students, who have little in common with school-age students, which substantially limits the generalisability of the findings; and, finally, there is ‘publication bias’ in favour of studies with ‘positive’ results.
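Publication bias can also be illustrated with a toy simulation. The sketch below assumes a small true effect of 0.1 standard deviations, 400 small studies with 20 students per group, and a deliberately crude publication rule under which only studies with an observed effect size above 0.3 appear in print. All of these numbers are hypothetical and are not taken from Wiliam (2016).

```python
import random
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def run_small_study(true_effect, n_per_group, rng):
    """Observed effect size from one small two-group study."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treatment = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    return cohens_d(treatment, control)


rng = random.Random(3)  # fixed seed so the run is repeatable
all_effects = [run_small_study(0.1, 20, rng) for _ in range(400)]

# Crude, assumed publication rule: only 'positive-looking' results appear.
published = [d for d in all_effects if d > 0.3]

mean_all = mean(all_effects)
mean_published = mean(published)
print(f"Mean of all {len(all_effects)} studies: {mean_all:.2f}")
print(f"Mean of {len(published)} published studies: {mean_published:.2f}")
```

A meta-analysis that can only draw on the published subset would report an average several times larger than the true effect built into the simulation.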


So, what are the implications for teachers and school leaders who wish to use meta-analyses? First, research findings should not be seen as uncontested, and it is sensible to be wary of simplistic interpretations of effect sizes and subsequent recommendations for action. Second, it will be valuable to read the original studies relevant to the issues you are trying to address, rather than relying on reports based on comparative effect sizes. Third, it might be helpful to use a checklist developed by the Critical Appraisal Skills Programme to assess the quality of studies: casp-uk.net/casp-tools-checklists/. Finally, the relevance and importance of meta-analyses largely depend on the nature of the problem you are seeking to address. Such studies are pieces of the evidence jigsaw that practitioners need to put together before making a decision about the use of a particular intervention or approach. (Jones, forthcoming)

For further reading, see Akobeng AK (2005) Understanding systematic reviews and meta-analysis. Archives of Disease in Childhood 90: 845–848.


Coe R (2017) Effect Size. In: Waring M, Hedges L, and Arthur J (eds) Research Methods and Methodologies in Education. 2nd ed. London: SAGE, pp. 368–376.
Greenhalgh T (2014) How to Read a Paper: The Basics of Evidence-Based Medicine. London: John Wiley & Sons.
Hattie J (2008) Visible Learning: A Synthesis of over 800 Meta-Analyses Relating to Achievement. Abingdon: Routledge.
Simpson A (2017) The Misdirection of Public Policy: Comparing and Combining Standardised Effect Sizes. Journal of Education Policy 32(4): 450–466.
Teaching and Learning Toolkit (2018) Metacognition and self-regulation | Toolkit Strand. Available at: (accessed 11 May 2018).
Wiliam D (2016) Leadership for Teacher Learning. West Palm Beach: Learning Sciences International.
Willingham D (2012) When Can You Trust the Experts? How to Tell Good Science from Bad in Education. San Francisco: John Wiley & Sons.