The research of John Hattie created great interest in 2009 when he published Visible Learning. In the book, he uses the statistical measure of “effect size” to compare the impact of a variety of influences on student achievement, such as class size, holidays, feedback, and learning strategies.
Hattie’s study was designed as a “meta-meta-study” that collects, compares, and analyzes the findings of nearly 50,000 previous studies in education, bringing the data together in a comprehensible way. In 2012, Hattie published a follow-up, Visible Learning for Teachers, which concentrated on the story behind the data and provided many concrete, hands-on classroom examples. (In many countries, Hattie’s findings have become an important part of teachers’ professional development and guide districts as they prioritize initiatives.)
But how do educators make sense of the effect sizes so often cited in education publications and books? It is easy to become confused when reading this literature.
For example, if you’re reading a research paper and a specific program had an effect size of +0.35 (or 35% of a standard deviation), some questions you might ask would be: Is this program worth pursuing? Is this effect size large or small? The answers aren’t that clear and simple in actual practice, because they depend on many factors that can affect the quality of the studies in question, such as:
- Did the researchers use quality measures and tools to assess the impact of the program?
- Was the study very brief and artificial relative to actual classroom conditions?
- Was the sample size too small to generalize to the larger population? (Slavin, 2013)
It would behoove us to first define what effect size is and what it can reveal as a metric, then discover how to interpret the values and use them effectively to impact student outcomes.
What is “Effect Size”?
Effect size is a simple way of quantifying the difference between two groups that has many advantages over the use of typical tests of statistical significance alone (e.g., t-test). It should be easy to calculate and understand, and it can be used with any outcome in education (or other disciplines).
One of the most commonly used scenarios for effect size is to determine the efficacy of an intervention or educational practice relative to a comparison group or approach. Not only does the effect size indicate if an intervention would work, but it also predicts how much impact to expect in a range of scenarios.
The goal of the effect size is to provide a measure of “the size of the effect” from the intervention rather than pure statistical significance, which tends to get confounded with effect size and sample size. Hattie mentions the term “meta-analysis,” which is one of the most useful ways of using effect size; it’s the process of synthesizing research results into a single effect size estimate. When the research has been replicated, the different effect size results from each study can be combined to give an overall best estimate of the size of the effect.
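As a sketch of that synthesis step, a common approach is a fixed-effect meta-analysis: each study’s effect size is weighted by the inverse of its variance, so more precise studies count more toward the overall estimate. The function name and the study values below are invented for illustration:

```python
def pooled_effect_size(effects, standard_errors):
    """Fixed-effect meta-analysis: combine per-study effect sizes into a
    single estimate using inverse-variance weights (more precise studies,
    i.e. smaller standard errors, count more)."""
    weights = [1 / se**2 for se in standard_errors]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)

# Hypothetical effect sizes and standard errors from three replications
effects = [0.42, 0.30, 0.55]
ses = [0.10, 0.15, 0.20]
print(round(pooled_effect_size(effects, ses), 2))  # → 0.41
```

Note that the pooled estimate (0.41) sits closest to the most precisely measured study, exactly as the weighting intends.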
The calculation of the effect size is actually quite simple: it is the standardized mean difference between the two groups. It can be expressed as an equation:

Effect size = (mean of intervention group − mean of control group) / standard deviation
This approach allows the researcher to look at various studies and essentially, average the effect sizes across these studies to derive a single metric—one that can predict how impactful an intervention or educational practice will be on specific student outcomes.
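The standardized mean difference described above is commonly computed as Cohen’s d, using the pooled standard deviation of the two groups. Here is a minimal sketch; the score lists are made-up values:

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Cohen's d: standardized mean difference between two groups,
    divided by the pooled (sample) standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    # Pooled standard deviation across the two groups
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd

# Hypothetical test scores for an intervention group and a comparison group
intervention = [78, 85, 82, 90, 88, 84]
comparison = [75, 80, 78, 82, 79, 77]
print(round(cohens_d(intervention, comparison), 2))  # → 1.72
```

Because the result is expressed in standard-deviation units, it can be compared across studies that used different tests and scoring scales.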
Effect size scores behave like “Z-scores” on a normal distribution and thus have the same possible range. They typically fall between about −2.0 and +2.0, but in principle can take any value, since the normal curve never touches the baseline: in theory, a group could score many standard deviations above or below the average. In most educational contexts, effect sizes range from about −0.5 to +1.75.
How to Interpret the Values?
Jacob Cohen described a basic method for interpreting the effect size: .20 as “small,” .50 as “medium,” and .80 as “large.” Ever since, these values have been widely cited as the standard for assessing the magnitude of the effects found in intervention research.
However, Cohen himself cautioned against such general use of these benchmarks. Many people consider effect sizes of +0.3 or less to indicate a small impact on outcomes, +0.4 to +0.6 to represent moderate treatment effects, and +0.7 or greater to indicate highly effective treatments. Certainly, we can deduce that the higher the effect size, the greater the expected magnitude of the effect on student outcomes. (For example, an effect size of 0.7 means that the score of the average student in the intervention group is 0.7 standard deviations higher than that of the average student in the “control group,” and hence exceeds the scores of about 76% of the similar group of students who did not receive the intervention.)
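That percentile figure follows from the standard normal distribution: the average treated student sits “effect size” standard deviations above the control-group mean, so the fraction of control students scoring below is the normal CDF evaluated at the effect size. A minimal sketch (the function name is illustrative):

```python
from math import erf, sqrt

def percentile_in_control_group(effect_size):
    """Percentile of the control-group distribution reached by the average
    intervention-group student, assuming normally distributed scores.
    This is the standard normal CDF evaluated at the effect size."""
    return 0.5 * (1 + erf(effect_size / sqrt(2)))

print(round(percentile_in_control_group(0.7), 2))  # → 0.76
```

An effect size of 0 maps to the 50th percentile, as expected: with no treatment effect, the average treated student is indistinguishable from the average control student.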
Tying this statistical discourse to the classroom, Hattie published his latest meta-analyses and reported the interventions and educational practices that are most effective (based on meta-analyses of 1200 studies). The following chart displays all effect sizes larger than .70 from his 2016 book:
From these results, we can determine, for example, that response-to-intervention systems produced outcomes 1.07 standard deviations higher (e.g., higher test scores) than those in districts not implementing RtI approaches.
Furthermore, Hattie has identified what he terms the “Super Factors” on student outcomes:
- Teacher estimates of achievement (d = 1.62). Unfortunately, this reflects the accuracy of a teacher’s knowledge of their students and not “teacher expectations.” Therefore, this is not a factor teachers can use to boost student achievement.
- Collective teacher efficacy (d = 1.57). This factor involves helping all teachers on the staff to understand that the way they do their work on a day-to-day basis has a significant impact on student performance. This also means that teachers should not use distal factors such as home life, socio-economic status, and motivation as reasons for poor achievement. In other words, great teachers will often try to make a difference despite these inhibitory factors.
- Self-reported grades (d = 1.33). This factor reflects the fact that students are quite aware and capable of anticipating their grades before even receiving their report cards. But this is not something teachers can truly use to boost performance.
- Piagetian levels (d = 1.28). This is another super factor over which teachers have no influence. Students who were assessed as being at a higher Piagetian level than other students perform better at school. The research does not suggest that trying to boost students’ Piagetian levels has any effect.
- Conceptual change programs (d = 1.16). This research refers to the type of textbook used by secondary science students. While some textbooks simply introduce new concepts, conceptual change textbooks simultaneously introduce concepts and discuss relevant and common misconceptions. These misconceptions can hinder deeper levels of learning. While the current research is limited to science textbooks in secondary school, it’s reasonable to predict that when teachers apply this same idea to introduce any new concept in their classroom, it could have a similar impact.
- Response to Intervention (d = 1.07). There’s plenty of commercial literature and material to help schools use RtI or Multi-Tier System of Supports (MTSS). RtI involves screening students to see who is at-risk, deciding whether supporting intervention will be given in class or out of class, using research-based teaching strategies within the chosen intervention setting, closely monitoring the progress, and adjusting the strategies being used when enough progress is not being made. While the program is designed for at-risk students, the underlying principles are the same as advocated by Hattie as being applicable for all students.
Word of Caution
Although the use of effect size has produced much conversation and innovation in education, there are some cautions to which educators must attend. According to Coe, care must be taken when interpreting effect sizes for educational programs and interventions. The word “effect” implies causality, when in many cases only a relationship has been identified; the term should be used only when a causal claim can be justified. We must also be careful when comparing or aggregating effect sizes when there are: (1) different operationalizations of the same outcome, (2) clearly different treatments, (3) measures derived from different populations, and (4) different levels of the same intervention.
Coe, R. (2002). It’s the Effect Size, Stupid: What Effect Size Is and Why It Is Important. Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England.

Cohen, J. (1990). Things I Have Learned (So Far). American Psychologist, 45, 1304–1312.

Lipsey, M., Puzio, K., Yun, C., et al. (2012). Translating the Statistical Representation of the Effects of Education Interventions Into More Readily Interpretable Forms. U.S. Department of Education.

Hattie, J. (2009). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. Routledge.

Hattie, J. (2016). Visible Learning for Literacy, Grades K-12: Implementing the Practices That Work Best to Accelerate Student Learning. Corwin Press.

Slavin, R. (2013). Effect Size Matters in Educational Research. Education Week.