In the statistics on coronavirus in the United Kingdom, confusion reigns.
There are at present three competing sources of data on deaths in the UK of persons tested, diagnosed or suspected positive for coronavirus, as follows:
- gov.uk, assembled daily at cutoff times of 17:00 GMT (England and Wales), 09:00 (Scotland) and 09:15 (Northern Ireland), and reported the following day
- NHS England, assembled daily at a cutoff time of 17:00 GMT, and reported the following day
- Office for National Statistics (ONS), assembled daily but reported weekly (at the time of writing, up to 18 April 2020).
The figures from gov.uk are redistributed worldwide, with minor differences, by the ECDC.
On delving into the notes and explanations on the respective websites, it becomes clear that the three sources are measuring different concepts, in terms of definition, geography and timing, namely:
- gov.uk is reporting figures for the UK (England, Northern Ireland, Wales and Scotland), in hospitals only, by date of reporting, not by date of occurrence of death. They state “The figures currently shown for England are deaths in NHS-commissioned services of patients who have tested positively for COVID-19”. These figures once published are not revised.
- NHS England is reporting deaths in hospitals in England in which there was a positive lab test for coronavirus. It revises these figures (upwards) daily as it receives additional reports from hospitals throughout England. The figures generally stabilise after 3 or 4 days, after which any revision is generally under 2 percent. It specifies that “All deaths are recorded against the date of death rather than the day the deaths were announced”.
- The ONS is subject to its mandate as to what it can report as “national statistics”. Its data are for England and Wales only; they are based on individual death certificates filed with the Registrar of Births, Marriages and Deaths. The ONS counts any death in which coronavirus was mentioned anywhere on the death certificate, whether or not it was cited as the primary cause of death and whether or not there were other health conditions. It has two sets of data, one by date of death, and one by date of registration of death.
One consequence of the multiple datasets is that it becomes extremely difficult to answer one of the most pressing public questions relating to coronavirus, namely whether the epidemic has passed its peak.
While government, politicians and media continue to predict a forthcoming or imminent peak, all three sources seem to show that the peak in deaths has already passed, probably around April 10.
The differences between the three graphs are readily explicable in terms of definitions, but it seems to us that this is not the time to confuse the public.
To attempt some form of reconciliation of the multiple datasets, we assembled the cumulative data from these three sources. The advantage of working with cumulative data is that these data smooth out day to day fluctuations or inaccuracies, and enable us to see the underlying differences in timing and definition.
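To illustrate why cumulative totals are easier to compare than daily counts, here is a minimal Python sketch. The daily figures are invented for illustration and are not taken from any of the three sources; the helper name `relative_gaps` is ours.

```python
from itertools import accumulate

# Invented daily counts: the same 50 deaths, once by date of death and
# once by date of reporting (same total, lumpier timing).
daily_by_death = [10, 12, 8, 11, 9]
daily_by_report = [4, 18, 3, 16, 9]

# Running totals for each series.
cum_by_death = list(accumulate(daily_by_death))
cum_by_report = list(accumulate(daily_by_report))


def relative_gaps(a, b):
    """Gap between two series on each day, as a fraction of the first series."""
    return [abs(x - y) / x for x, y in zip(a, b)]


print("daily gaps:     ", [round(g, 2) for g in relative_gaps(daily_by_death, daily_by_report)])
print("cumulative gaps:", [round(g, 2) for g in relative_gaps(cum_by_death, cum_by_report)])
```

In this toy case the daily series disagree by roughly half on most days, while the cumulative series agree closely after the first day, because reporting delays shift deaths between days without changing the running total for long.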
Our graph of the three datasets is shown below. The graph illustrates that the ONS figures, for the period for which they are available, are substantially the highest of the three sources, as we would expect since they include cases where coronavirus was not necessarily the cause of death.
We then tested the three sets of data for statistical correlations. The best fitting equations were as follows:
- gov.uk figure = 0.908 × ONS figure (lagged 4 days) with R squared = 99.9%
- NHS England figure = 0.774 × ONS figure (lagged 0 days) with R squared = 99.9%
(For the mathematicians in our audience: we ignored the “intercepts”, which are nonzero but not statistically significant.)
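For readers who want to try this kind of fit themselves, the sketch below shows one way to do it in Python. It rests on assumptions of ours, not on a description of our actual workings: it fits a line through the origin (no intercept), tries each lag in turn and keeps the one with the highest R squared, and uses the uncentered R squared that is conventional for a no-intercept fit. The function names and the series are invented for illustration.

```python
def fit_through_origin(x, y):
    """Least-squares slope b for y = b * x (no intercept), with R squared.

    R squared is the uncentered version, 1 - SSR / sum(y^2), the usual
    convention when the regression has no intercept.
    """
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    slope = sxy / sxx
    ssr = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)
    return slope, 1.0 - ssr / syy


def fit_with_lag(ons, other, lag):
    """Regress other[t] on ons[t - lag], dropping days with no lagged value."""
    if lag == 0:
        return fit_through_origin(ons, other)
    return fit_through_origin(ons[:-lag], other[lag:])


def best_lag(ons, other, max_lag=7):
    """Return (lag, slope, r_squared) for the lag with the highest R squared."""
    fits = [(lag, *fit_with_lag(ons, other, lag)) for lag in range(max_lag + 1)]
    return max(fits, key=lambda f: f[2])


# Invented cumulative series: 'other' is 0.9 x 'ons', delayed by 2 days.
ons = [100, 150, 225, 330, 470, 640, 830, 1030, 1230, 1420]
other = [0.0, 0.0] + [0.9 * v for v in ons[:-2]]

lag, slope, r2 = best_lag(ons, other, max_lag=4)
print(f"best lag = {lag} days, slope = {slope:.3f}, R squared = {r2:.4f}")
```

On this invented data the search recovers the 2-day delay and the 0.9 ratio that were built into it. Note that cumulative series almost always produce very high R squared values, so the lag and the slope are the informative outputs.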
In plain English, what these equations are telling us is as follows:
- About 91% of the ONS figures are cases in which there was a positive test for coronavirus, and 9% are diagnosed or suspected cases without a positive test
- It takes an average of 4 days for the occurrence of a death in England or Wales to be reported to the government and to be included in the figures announced on gov.uk
- About 77% of the ONS figures refer to cases which NHS England has also recorded; about 9% (as noted above) refer to cases which were untested and therefore would not be recorded by NHS England; and the remaining 14% refer to cases outside hospitals in England, or to cases in Wales.
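The arithmetic behind this breakdown can be checked in a few lines of Python. The two ratios are the fitted coefficients quoted above; the variable names are ours, and the interpretation of each share is the one given in the bullets. Using the unrounded coefficients, the remaining share comes out at about 13.4%; the 14% quoted above results from subtracting the rounded 77% and 9% from 100%.

```python
GOV_UK_RATIO = 0.908       # gov.uk figure / ONS figure (lagged 4 days)
NHS_ENGLAND_RATIO = 0.774  # NHS England figure / ONS figure (no lag)

shares = {
    # ONS deaths with no positive test (diagnosed or suspected only)
    "untested": 1.0 - GOV_UK_RATIO,
    # ONS deaths also recorded by NHS England (hospitals in England)
    "NHS England": NHS_ENGLAND_RATIO,
    # tested deaths outside hospitals in England, or in Wales
    "remaining": GOV_UK_RATIO - NHS_ENGLAND_RATIO,
}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```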
We believe that the British government should now speak to the public with a single voice. Of the three datasets, the least useful is the one on gov.uk, which each day reports deaths that occurred on earlier dates. The ONS and NHS England both report deaths according to the dates when they occurred, which is what the public needs to know. The drawback of the data from NHS England is that they cover only deaths in hospitals. The two weaknesses of the ONS data are that they are late, and that they include cases of suspected or diagnosed coronavirus which have not been confirmed by lab tests, thus exaggerating the impact of the epidemic.
The government should now give priority to presenting the ONS figures as the official story of the epidemic, with the following essential enhancements:
- extending the coverage to Scotland and Northern Ireland
- a transparent reporting of the breakdown between tested and untested cases (something for which we submitted a Freedom of Information request, which was refused)
- and above all, a faster release of the data so that public alarm and confusion can be mitigated.