Air pollution and primary school absence rates in Greater Manchester

The BBC recently published a tool that lets you enter a postcode and get an air quality score. Public-spirited data ninja Jamie Whyte promptly scraped the data for schools in the Greater Manchester area (here’s a link to a blog he wrote about how he did it, and to the data).

This opens up a bunch of interesting questions, such as: are pollution levels associated with school performance? It’s easy to imagine that high pollution levels might be associated with more sickness absence, and so worse educational outcomes. Or maybe schools in polluted areas are also likely to be in more deprived places, where kids face all sorts of extra challenges. Or maybe traffic noise makes it hard to concentrate.

Here I’m going to do a quick pilot study to see if there’s a relationship between the pollution score for primary schools in Greater Manchester and their sickness absence rate. I have used primary schools rather than secondary schools because (a) there are more of them; and (b) the secondary schools all fall into the lower two pollution categories, which means we’d be unlikely to see much of the variation in sickness absence explained by differences in pollution.

Below is a box plot showing the illness absence rate in 2015/16 (number of authorised absences for illness per child enrolled in the school), taken from the Department for Education’s website, against the pollution score (EarthSense give some details on what those scores mean here).

The first thing to point out is that no primary schools in Greater Manchester fall in the ‘3 out of 6’ category. The second thing to point out is that only four fall in the ‘5 out of 6’ category (the most polluted category in the UK). So this data should be treated with some caution. I have also excluded 11 schools because the pollution score data was missing.

It looks like the primary schools in the ‘5 out of 6’ category have a higher rate of authorised sickness absence than the other schools.

This calls for statistics!

Below is the output from an analysis of variance (ANOVA) and a linear regression model. Essentially, we don’t have enough data to be sure. The ANOVA tests whether the average sickness absence rate differs between any of the groups. This test was not statistically significant (F = 1.78, p = 0.15), so we can’t be sure that the differences we see aren’t just down to chance variation between schools. The linear regression suggests that the average sickness absence rate in the schools in the top category might be higher (by around 3.86 half-days per child per year on average, p = 0.026). But that estimate rests on a handful of schools, and the model explains less than 1% of the variation in absence, so again, we can’t be sure that this effect is real.
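For anyone who wants to see how the headline ANOVA numbers hang together, here is a minimal sketch in Python (the original analysis was run in R, so this is just an arithmetic check, not the code used for this post). It takes the sums of squares and degrees of freedom from the ANOVA table in this post and recomputes the F value, the R-squared and the residual standard error. The variable names are my own, and small differences from the printed output are down to rounding in the table.

```python
# Values copied from the ANOVA table in this post (rounded as printed).
ss_score, df_score = 15.83, 3      # between-group (pollution score) row
ss_resid, df_resid = 2363.80, 796  # residual (within-group) row

# Mean squares are sums of squares divided by their degrees of freedom.
ms_score = ss_score / df_score
ms_resid = ss_resid / df_resid

# The F value compares between-group to within-group variation.
f_value = ms_score / ms_resid

# R-squared is the share of total variation explained by the score groups.
r_squared = ss_score / (ss_score + ss_resid)

# The residual standard error is the square root of the residual mean square.
residual_se = ms_resid ** 0.5

print(f"F = {f_value:.3f}, R-squared = {r_squared:.4f}, "
      f"residual SE = {residual_se:.3f}")
```

The R-squared is the useful number here: pollution category accounts for well under 1% of the variation in sickness absence between schools.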

Even if the effect is real, we can’t be sure that the pollution causes the sickness absence. As I said above, it’s possible that schools in more polluted areas are also in more deprived areas and it’s the level of deprivation that drives the sickness absence.

But this would be worrying if true, so it seems to me that it’s worth someone getting the data for all the schools in England and doing a proper analysis. This would include adjusting for things like deprivation in the local catchment area and so on. Better scientists than me have already found associations between air pollution and school sickness absence (see this or this for examples), so it seems like something that’s worth following up. Ideally by someone better qualified than me.

## Analysis of Variance Table
##
## Response: auth_ill_per_child
##            Df  Sum Sq Mean Sq F value Pr(>F)
## Score       3   15.83  5.2780  1.7773   0.15
## Residuals 796 2363.80  2.9696
##
## Call:
## lm(formula = auth_ill_per_child ~ Score, data = primary)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -8.381 -1.151 -0.003  1.028  5.954
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)      8.43430    0.06775 124.495   <2e-16 ***
## Score2 out of 6 -0.05298    0.15659  -0.338   0.7352
## Score4 out of 6 -0.43564    0.99722  -0.437   0.6623
## Score5 out of 6  3.85774    1.72458   2.237   0.0256 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.723 on 796 degrees of freedom
## Multiple R-squared: 0.006654, Adjusted R-squared: 0.00291
## F-statistic: 1.777 on 3 and 796 DF, p-value: 0.1
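To get a feel for how uncertain the ‘5 out of 6’ estimate is, here is a rough sketch in Python that works only from the estimate and standard error printed in the coefficient table above. It recomputes the t value and then uses a normal approximation (an assumption on my part, reasonable with 796 residual degrees of freedom, but not what R does: R uses the t distribution) to get an approximate p-value and 95% confidence interval. The variable names are mine.

```python
from statistics import NormalDist

# Estimate and standard error for "Score5 out of 6", as printed above.
estimate = 3.85774
std_error = 1.72458

# The t value is simply the estimate divided by its standard error.
t_value = estimate / std_error

# Normal approximation to the t distribution (assumption: df = 796 is
# large enough that the two are nearly identical).
z = NormalDist()
p_approx = 2 * (1 - z.cdf(abs(t_value)))

# Approximate 95% confidence interval for the difference in absence,
# in half-days per child per year.
crit = z.inv_cdf(0.975)
ci_low = estimate - crit * std_error
ci_high = estimate + crit * std_error

print(f"t = {t_value:.3f}, approximate p = {p_approx:.4f}")
print(f"approximate 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

The interval is wide, running from under one to over seven half-days per child per year, which is another way of saying that we don’t have enough data to pin this effect down.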

Public health registrar. Recovering government policy wonk. Lapsed neuroscientist. Opinions strictly my own.