On the Spectator website today, Mark Gettleson wrote:

“This was a landmark week in this long election campaign. It was the first this year in which two pollsters (YouGov and Lord Ashcroft) each posted a Conservative lead outside of the margin of error. A 4 per cent lead for the blues may not sound like much – but it represents the largest Conservative lead on YouGov in more than three years.”

Is the Conservative lead in the polls truly “outside the margin of error”? Possibly, but that’s not the whole story.

The margin of error of a poll measures how wide a range around the reported figure we think the true level of public support for a particular party could lie in. For example, a poll that puts Labour on 30% with a margin of error (MoE) of 3% means that we are 95% certain that the true level of public support for Labour is somewhere between 27% and 33%. Not every value in that range is equally likely, though: we assume that the probabilities are distributed “normally”, so the curve of possible values looks a bit like the diagram on the right. The vertical lines show the standard deviations from the mean of the series, and the MoE is roughly 2 standard deviations (more precisely, 1.96) each side.

Margins of error vary depending on how large your sample is. The normal YouGov polls have around 1,750 people in them and the margin of error on a sample of that size is 2.24%. For an Ashcroft poll the sample size is around 1,000 people, and the margin of error on that is 3.02%. Simply put, Lord Ashcroft’s polls are smaller and with that reduction in size comes an increase in the uncertainty of the actual result compared to the sampled result.
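To make the link between sample size and margin of error concrete, here is a minimal sketch of the textbook formula. It assumes simple random sampling and the worst-case share of 50%, which is my assumption rather than necessarily the method behind the figures quoted above; the function name `margin_of_error` is purely illustrative. It gives values close to, though not identical to, the 3.02% and 2.24% above, which may reflect exact sample sizes or a slightly different calculation.

```python
from math import sqrt


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a share p estimated from a simple random sample of n.

    p=0.5 is the worst case (widest margin); z=1.96 is the 95% normal quantile.
    """
    return z * sqrt(p * (1 - p) / n)


for n in (1000, 1750):
    print(f"n = {n}: MoE = {margin_of_error(n):.2%}")
# n = 1000: MoE = 3.10%
# n = 1750: MoE = 2.34%
```

Because the sample size sits under a square root, you need roughly four times as many respondents to halve the margin of error.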

All clear so far? With this information we can test the idea that the Conservative lead is “outside the margin of error”. We create 10,000 simulations of two levels of party support, Labour and Conservative, each normally distributed around its mean, with a standard deviation of half the margin of error from the Ashcroft and YouGov polls (for convenience’s sake the margins are rounded to 3% and 2.25% respectively). We start with the mean set to 30 for each party (so identical poll shares), then increase the gap between the Conservatives and Labour by one point and repeat the exercise. Each time, after drawing 10,000 random pairs of Labour and Conservative poll values, we count how many times Labour is in the lead and how many times the Conservatives are.
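The procedure just described can be sketched in a few lines of Python. This is my reconstruction rather than the original code: the function name `share_ahead`, the fixed seed and the use of the standard library’s `random` module are all assumptions, and the two party shares are drawn independently with a standard deviation of half the margin of error.

```python
import random


def share_ahead(lead, moe, n_sims=10_000, base=30.0, seed=1):
    """Fraction of simulated poll pairs in which the Conservatives beat Labour.

    lead: Conservative lead in points; moe: margin of error in points.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    sd = moe / 2               # MoE treated as 2 standard deviations
    wins = 0
    for _ in range(n_sims):
        con = rng.gauss(base + lead, sd)  # simulated Conservative share
        lab = rng.gauss(base, sd)         # simulated Labour share
        if con > lab:
            wins += 1
    return wins / n_sims


for lead in range(5):
    ashcroft = share_ahead(lead, moe=3.0)
    yougov = share_ahead(lead, moe=2.25)
    print(f"lead {lead}: Ashcroft {ashcroft:.1%}, YouGov {yougov:.1%}")
```

With only 10,000 draws the figures wobble by a few tenths of a percentage point from run to run, so a different seed will not reproduce any particular set of numbers exactly.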

And the results are…

Conservative lead over Labour in Polls | % Conservative higher than Labour (Ashcroft) | % Conservative higher than Labour (YouGov) |
---|---|---|
0 – Equal Poll Rating | 50.2% | 50.2% |
1 – e.g. Con = 31%, Lab = 30% | 68.4% | 73.6% |
2 – e.g. Con = 32%, Lab = 30% | 82.7% | 89.4% |
3 – e.g. Con = 33%, Lab = 30% | 92.1% | 97.1% |
4 – e.g. Con = 34%, Lab = 30% | 97.0% | 99.5% |

With a Conservative lead of 4%, both polls suggest over 95% certainty that the Conservative lead is real. Interestingly though, with a 3% Conservative lead, the Ashcroft poll falls short of that threshold: at 92.1% we cannot be 95% certain that the Tory lead is genuine.
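These percentages do not strictly need a simulation. Under the same assumptions (independent normal errors, standard deviation equal to half the margin of error), the difference between two shares is itself normal with a standard deviation √2 times larger, so the probability can be computed directly. The sketch below is an illustrative cross-check with a made-up function name, not part of the original analysis.

```python
from math import sqrt
from statistics import NormalDist


def prob_con_ahead(lead, moe):
    """P(Conservative share > Labour share) for a given poll lead and MoE."""
    sd_share = moe / 2             # sd of one party's share
    sd_lead = sqrt(2) * sd_share   # sd of the difference of two independent shares
    return 1 - NormalDist(mu=lead, sigma=sd_lead).cdf(0)


for lead in (3, 4):
    print(f"lead {lead}: Ashcroft {prob_con_ahead(lead, 3.0):.1%}, "
          f"YouGov {prob_con_ahead(lead, 2.25):.1%}")
```

The exact values agree with the simulated ones to within a few tenths of a percentage point, which is the sort of noise 10,000 random draws would produce.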

So a 4% lead in the polls suggests over 95% certainty of a genuine Conservative advantage, but how big an advantage? For each of the two polls, we can use the 10,000 simulations we ran to see how probable each size of actual Conservative lead is.

Actual Conservative Lead | Probability (Ashcroft Poll, 4% Lead) | Probability (YouGov Poll, 4% Lead) |
---|---|---|
Less than 0 (Labour lead) | 3.0% | 0.6% |
0% to 2% | 14.5% | 9.7% |
2% to 4% | 32.6% | 39.3% |
4% to 6% | 32.6% | 40.0% |
6% to 8% | 14.4% | 9.9% |
More than 8% | 2.9% | 0.6% |

That’s really interesting. Because of its smaller sample size, the Ashcroft poll makes it more likely that a 4% poll lead actually reflects a real lead of between 0% and 2% (14.5% versus 9.7%, so a probability half as large again as YouGov’s). But for the same reason, the Ashcroft poll also gives a greater probability of a genuine lead greater than 6% (17.3% versus 10.5%). Swings and roundabouts.
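The band probabilities can be approximated without simulating, again assuming independent normal errors with a standard deviation of half the margin of error. The helper `lead_band_probs` below is a hypothetical name used only for illustration; it returns the probability of each band from “less than 0” up to “more than 8”.

```python
from math import sqrt
from statistics import NormalDist


def lead_band_probs(poll_lead, moe, edges=(0, 2, 4, 6, 8)):
    """Probability that the true lead falls in each band defined by `edges`."""
    dist = NormalDist(mu=poll_lead, sigma=sqrt(2) * moe / 2)
    cdfs = [dist.cdf(e) for e in edges]
    below = [cdfs[0]]                                    # lead < first edge
    middle = [hi - lo for lo, hi in zip(cdfs, cdfs[1:])]  # between edges
    above = [1 - cdfs[-1]]                               # lead > last edge
    return below + middle + above


labels = ["< 0", "0 to 2", "2 to 4", "4 to 6", "6 to 8", "> 8"]
for name, moe in (("Ashcroft", 3.0), ("YouGov", 2.25)):
    probs = lead_band_probs(4, moe)
    print(name, ", ".join(f"{l}: {p:.1%}" for l, p in zip(labels, probs)))
```

The exact distribution is symmetric around the 4-point poll lead, so the small asymmetries in the simulated table (14.5% against 14.4%, for instance) are simulation noise rather than a real effect.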

Mark is right that the Tory lead is statistically significant, but that’s only half the story. The two polls he references have differing sample sizes, and that affects what we can say about the Tory lead. A larger sample size gives greater certainty about the true population position; a smaller sample size means a greater chance that the real lead isn’t quite as big as the poll suggests (or might be even bigger).

There are further complications. If Lord Ashcroft had added another 750 people to his poll (bringing his sample size roughly level with YouGov’s), it doesn’t necessarily follow that he would still show a 4% lead. On top of this, the analysis above is based on the assumption that the two polling figures for the parties are independent of each other. In reality, it could be that the more the poll underestimates the Conservative share of the vote, the more it underestimates the Labour share (or the other way round – Conservative under-estimates correlate with Labour over-estimates). We need to explore this possibility and use a covariance matrix to produce our simulation.
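As a rough illustration of what a correlated simulation might look like, the sketch below draws the two polling errors from a bivariate normal distribution with correlation `rho`, using the Cholesky factor of the 2×2 covariance matrix written out by hand. The function name and the values of `rho` tried here are purely hypothetical; they are not estimated from any polling data.

```python
import random
from math import sqrt


def share_ahead_correlated(lead, moe, rho, n_sims=10_000, seed=1):
    """Fraction of simulations with Con > Lab when the two errors have correlation rho."""
    rng = random.Random(seed)
    sd = moe / 2  # same convention as before: MoE = 2 standard deviations
    wins = 0
    for _ in range(n_sims):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        con_err = sd * z1
        # Cholesky construction: lab_err has sd `sd` and correlation rho with con_err
        lab_err = sd * (rho * z1 + sqrt(1 - rho ** 2) * z2)
        if lead + con_err > lab_err:
            wins += 1
    return wins / n_sims


for rho in (-0.5, 0.0, 0.5):  # hypothetical correlations, for illustration only
    print(f"rho = {rho:+.1f}: {share_ahead_correlated(4, 3.0, rho):.1%}")
```

With a negative correlation (one party’s over-estimate tends to come with the other’s under-estimate), the lead is noisier and the certainty drops; with a positive correlation the errors partly cancel in the difference and the certainty rises.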

Is the Conservative lead outside the margin of error? Well yes, but as we’ve seen, that’s only half the story. A little bit of statistics shows us that two polls with the same headline figures can actually be telling us quite different things.