During August 2018, the Guardian, the Independent, the Observer, the Mirror and other UK national and local newspapers reported that “[m]ore than 100 Westminster constituencies that voted to leave the EU have now switched their support to Remain”.
The source for this claim was a study in which “the FocalData consumer analytics company compiled the breakdown by modelling two YouGov polls of more than 15,000 people in total” in 632 of the 650 constituencies in the UK.
The press has not published the breakdown of these 15,000 people between the constituencies, but simple arithmetic shows that the average is about 24 persons per constituency (15,000 divided by 632 is 23.7). In other words, FocalData and the press were making statements about individual constituencies on the basis of an average of 24 questionnaires.
Let us assume (a very large assumption, to which we will return) that the sampling was random. In statistical terms, this means that every voter in the constituency had an equal chance of being interviewed. We could think of the voters as numbers in the national lottery, where (so we hope) no human agency intervenes in picking the winning combination. If that were so, FocalData could claim (as they do) that each of these groups of 24 persons was statistically representative of the constituency; in plain English, that in an average constituency of 70,997 voters, the other 70,973 voters had the same attitudes as the 24 who filled in questionnaires for YouGov.
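The averages in the last two paragraphs are simple arithmetic, and a few lines of Python reproduce them. The code assumes an even spread of respondents across constituencies, because the actual breakdown has not been published.

```python
# Figures quoted in the text. The even spread of respondents across
# constituencies is an assumption, since the real breakdown is unpublished.
respondents = 15_000
constituencies = 632
avg_electorate = 70_997

avg_sample = respondents / constituencies   # about 23.7
sample = 24                                 # rounded, as in the text
unsampled = avg_electorate - sample         # voters nobody asked

print(f"Average questionnaires per constituency: {avg_sample:.1f}")
print(f"Voters not interviewed in an average constituency: {unsampled:,}")
print(f"Share of the electorate interviewed: {sample / avg_electorate:.3%}")
```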
Let us, as an exercise, apply this assumption to Surrey Heath, Mr Michael Gove’s constituency, where the Guardian reported that “[s]upport for Remain increased from 48% in 2016 to 50.2%”. If this claim were true, Mr Gove, as a committed and very visible Leaver, should now be worried about keeping his seat in the event that he has to defend it.
Let us invoke some basic laws of statistical sampling.
First, if we want to make a statistically valid statement about any group of people (what statisticians call a “universe”), we have to specify what margin of error we are comfortable with. In demographics, economics, politics and other social sciences, a margin of plus or minus 5 percentage points is typically considered acceptable.
Secondly, we have to specify how confident we are that our statement falls within the margin of error. For social scientists, a confidence level of 95% is typically considered acceptable. In other words, we assert that if we repeated our poll 20 times, in about 19 cases the result would be within the margin of error of the true value, and in about one case it would fall outside it.
The relationship between the universe, the sample, the confidence level and the margin of error is governed by laws of sampling which all statisticians accept.
By these laws, any statement about Surrey Heath based on 24 randomly selected voters has a margin of error of plus or minus 19.99 percentage points (call it 20) at a confidence level of 95%. With regard to FocalData’s statement about Surrey Heath, they are entitled (at most) to claim that, with 95% confidence, the true figure lies within 20 percentage points either side of what they say.
Therefore, if the YouGov sample is random, support for Remain in Surrey Heath is somewhere between 30.2% and 70.2%. It might have gone up since 2016, but it might also have gone down. Surrey Heath may have a majority for Leave or a majority for Remain. Both results are within the 95% confidence interval. Mr Gove does not have to worry.
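For a yes/no question, the "laws of sampling" invoked here reduce to the standard normal-approximation formula for the margin of error of a proportion: z × √(p(1 − p)/n). It is evaluated at the worst case p = 0.5 and can optionally be multiplied by the finite population correction √((N − n)/(N − 1)). The sketch below assumes that formula. The author does not say which calculator was used, so the second decimal place may differ slightly from the figures in the text. The result is in percentage points.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, confidence, p=0.5, population=None):
    """Half-width of a normal-approximation confidence interval for a
    proportion, in percentage points. p=0.5 gives the widest interval.
    If population is given, the finite population correction is applied."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    moe = z * sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= sqrt((population - n) / (population - 1))
    return 100 * moe

# 70,997 is the average constituency size from the text, used here as a
# stand-in for Surrey Heath's electorate (an assumption).
for level in (0.95, 0.90):
    moe = margin_of_error(24, level, population=70_997)
    print(f"{level:.0%} confidence, n=24: +/-{moe:.2f} points")
```

The finite population correction barely matters here. With 24 respondents out of roughly 71,000 electors, the sample size alone sets the width, which is why the same margins apply to every constituency.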
An analogous argument can be made in respect of the other 111 constituencies which FocalData claims have “switched”.
We have done this analysis.
We first enquired, given samples of 24 people, what statements we could make with 95% confidence about individual constituencies. We found that of the 112 constituencies, none (we repeat, none) could be said with this level of confidence to have a majority for Remain.
We then relaxed our criterion and asked what we could say with 90% confidence. Applying the same laws of sampling, any statement based on a random sample of 24 persons has a margin of error of plus or minus 16.78 percentage points at 90% confidence. On this basis, exactly four of the 112 constituencies (Knowsley, Liverpool Walton, Southampton Test and Liverpool West Derby) could be said to have a majority for Remain.
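The screening described in the last two paragraphs can be sketched as follows. A constituency counts as having a Remain majority at a given confidence level only if the whole interval lies above 50%, that is, if the estimate minus the margin of error is above 50%. The constituency names and estimates below are hypothetical placeholders, not FocalData's figures, and `clear_remain_majority` is an illustrative helper.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, confidence, p=0.5):
    """Normal-approximation half-width in percentage points (worst case p=0.5)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 100 * z * sqrt(p * (1 - p) / n)

def clear_remain_majority(remain_pct, n, confidence):
    """True only if the entire confidence interval lies above 50%."""
    return remain_pct - margin_of_error(n, confidence) > 50.0

# Hypothetical Remain estimates in percent; NOT FocalData's published figures.
estimates = {"Constituency A": 50.2, "Constituency B": 57.0, "Constituency C": 68.0}

for confidence in (0.95, 0.90):
    passed = [name for name, pct in estimates.items()
              if clear_remain_majority(pct, 24, confidence)]
    print(f"{confidence:.0%}: {passed}")
# prints:
# 95%: []
# 90%: ['Constituency C']
```

With 24 respondents, a constituency needs an estimated Remain share above roughly 70% to pass at 95% confidence, and above roughly 66.8% to pass at 90%.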
In short, the headlines in the press are simply indefensible. No professional statistician would support them.
It gets worse. Everything so far assumes that the samples were random.
Let’s therefore look at the website of YouGov, from whom FocalData obtained their numbers. We see that YouGov’s sampling frame is an international panel of individuals, of whom about 800,000 state that they are resident in the UK. The panel is self-selected and consists of persons who have internet access and are willing to participate regularly in polls. They are typically rewarded with 50 YouGov points per poll; that translates to about £2 per hour. These people are highly motivated (but not by money). There is also a referral process whereby panel members introduce new members. The panel is therefore, by definition, non-random.
YouGov cannot make their respondents random, but in any given poll, they apparently attempt to make the respondents “representative” of the target universe by inviting selected panel members who collectively have some demographic and social characteristics that are similar to those of the universe. YouGov do not disclose how they do this.
It is easy to create the illusion of representativity. The trick is to select some well-publicised (or at least, publicly available) indicators such as gender, age, occupation, voting record, maybe religion and so on, and select your respondents so that they collectively resemble the electorate, or the census, or whatever is the target universe.
The bait and switch is when the word “resemble” becomes the very different word “represent”.
Pollsters would have us believe that their “sample” has the right mix of males and females, old and young, millennials and pensioners, or whatever. They then invite us into the illusion whereby we say “that’s all right then, because all millennials (or whatever) think and act alike”. That leads us to buy into the pitch that what is true of the “sample” is true of the electorate, or the population (or whatever).
There is a story of the businessman who was proud of the diversity of his workforce, by which he meant that the workforce was in some way “representative” of the population. He liked to point out, for example, that a large proportion was of Indian ancestry. An Indian colleague said simply “But all of your Indian employees are of the Brahmin caste”.
In my own constituency (Bedford and Kempston), FocalData claim that the electorate is now 57% for Remain. (The underlying truth is probably that they have about 23 YouGov questionnaires, of which 13 said Remain and 10 said Leave. If so, it would take only two of the Remain respondents answering the other way for the result to become 12 for Leave and 11 for Remain.)
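The reconstruction in the parenthesis can be checked directly. The 13/10 split is the author's estimate, not published data.

```python
# Author's reconstructed split for Bedford and Kempston (an estimate, not published data).
remain, leave = 13, 10
n = remain + leave
print(f"Remain share: {remain / n:.1%}")   # 56.5%, reported as 57%

# Count how many Remain answers must change before Remain is no longer ahead.
flips = 0
while remain - flips > leave + flips:
    flips += 1
print(f"Answers that must change: {flips}")
print(f"Result: Remain {remain - flips}, Leave {leave + flips}")
```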
This prompts me to issue the following challenge to FocalData (or to YouGov, whoever wants to take it up). Give me access to your panel for Bedford and Kempston, and I will select two groups of 24 people. Each group will consist of 12 males and 12 females, 81% white, 59% Christian, with a median age of 39 (and other characteristics, on which we can agree). Each group will “resemble” the population of Bedford and Kempston. One group will have a majority for Leave, and the other group will have a majority for Remain.
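The challenge can be sketched in a few lines. Quota selection fixes the group's make-up on the chosen characteristics, while the answers remain free to be steered. This sketch uses an invented panel and only one quota characteristic (gender) for brevity. `quota_group`, `summarise` and `lean` are hypothetical names, and this is not a description of how YouGov selects respondents, which it does not disclose.

```python
# Invented panel of 400 members: alternating gender, and within each
# gender exactly half Remain and half Leave. Illustration only.
panel = [
    {"gender": "M" if i % 2 == 0 else "F",
     "vote": "Remain" if (i // 2) % 2 == 0 else "Leave"}
    for i in range(400)
]

def quota_group(panel, quotas, prefer, lean=7):
    """Fill each gender quota with `lean` members who gave the preferred
    answer and the remainder from those who did not."""
    group = []
    for gender, count in quotas.items():
        pool = [m for m in panel if m["gender"] == gender]
        favoured = [m for m in pool if m["vote"] == prefer]
        others = [m for m in pool if m["vote"] != prefer]
        group += favoured[:lean] + others[:count - lean]
    return group

def summarise(group):
    males = sum(m["gender"] == "M" for m in group)
    remain = sum(m["vote"] == "Remain" for m in group)
    return {"M": males, "F": len(group) - males,
            "Remain": remain, "Leave": len(group) - remain}

quotas = {"M": 12, "F": 12}
leave_group = quota_group(panel, quotas, prefer="Leave")
remain_group = quota_group(panel, quotas, prefer="Remain")
print("Leave-leaning group: ", summarise(leave_group))   # 12 M, 12 F, Leave 14
print("Remain-leaning group:", summarise(remain_group))  # 12 M, 12 F, Remain 14
```

Adding more quota characteristics (age, religion and so on) works in the same way, provided the panel is large enough to fill every cell. Matching the group on those characteristics does not constrain how its members answer.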
Let’s then leave aside constituencies, and take another headline which purports to refer to the UK as a whole (this from the Independent): “2.6 million Leave voters have abandoned support for Brexit since referendum, major new study finds”. This headline again refers to the FocalData study, and again is based on YouGov data. The Independent’s underlying (and unstated) assumption is that the 15,000 YouGov respondents are representative of the UK electorate (which, in 2017, numbered 46,148,000 people).
What (probably) really happened is that of YouGov’s 15,000 respondents, about 850, or 5.6 percent, told YouGov that they had switched from Leave to Remain. FocalData (and the press, quoting them) thereupon assumed that 5.6 percent of the entire UK electorate had switched from Leave to Remain. Multiplying 5.6 percent by 46,148,000 gives 2.6 million people.
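The extrapolation behind the headline is a single multiplication. The 5.6 percent figure is itself the author's reconstruction, since the press did not publish the raw counts, so treat the inputs as assumptions. Note that 5.6 percent of 15,000 is 840, consistent with the "about 850" above.

```python
respondents = 15_000
switch_share = 0.056          # reconstructed share of respondents switching Leave -> Remain
electorate_2017 = 46_148_000  # UK electorate in 2017, as quoted in the text

switchers_in_sample = switch_share * respondents
headline = switch_share * electorate_2017

print(f"Respondents who switched: about {switchers_in_sample:,.0f}")
print(f"Extrapolated to the electorate: {headline:,.0f} (about {headline / 1e6:.1f} million)")
```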
The reality is that the YouGov respondents are (at best) representative of the YouGov panel. So the only statistically valid statement that can be made on the basis of these data would be something like “we estimate that 5.6 percent of the YouGov panel, or 45,000 panellists, have switched from Leave to Remain”.
About the remaining 46,103,000 electors, nothing can be said.
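The panel-based estimate is the same multiplication applied to the UK panel rather than to the electorate. The 800,000 figure is the approximate UK panel size quoted earlier, and the 5.6 percent share is again the author's reconstruction.

```python
switch_share = 0.056          # 5.6 percent, as in the text (reconstructed)
uk_panel = 800_000            # YouGov panellists who say they live in the UK
electorate_2017 = 46_148_000

panel_switchers = switch_share * uk_panel   # about 44,800
rounded = round(panel_switchers, -3)        # the text's "45,000"
print(f"Panel switchers: about {rounded:,.0f}")
print(f"Electors the data say nothing about: {electorate_2017 - rounded:,.0f}")
```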
Office for National Statistics.