Sunday 2 February 2020

Coronavirus data sources and my amateur modelling

I found a couple of interesting and reliable sources for anyone interested in tracking the progress of the 2019-nCoV outbreak. The official daily WHO SITREPs can be found here, and I've started using the reported cases and deaths figures from these reports to calculate the daily lethality 'estimate' (around 2.5%) which the media is generally citing, and then also to calculate my hypothetical estimates - where the current death total is divided by the case numbers from previous days (1, 2, 3 or 4 days prior) rather than by the current number of reported cases.

My logic (possibly flawed) is that while deaths (of known coronavirus patients) are generally known immediately, the number of reported cases will always be less than the true current number of infected people, as there is a 1-14 day incubation period before symptoms become apparent. Reported cases may also be less than the true figure due to misdiagnosis or underreporting - but that would then also understate the deaths figures. So overall I think using current deaths/current reported cases is likely to underestimate the 'true' fatality rate.

Initially my rough calculations comparing each day's reported death total to the case numbers 4 days prior (T-4) were quite alarming, as the result was close to 20% on some days and averaged over 12% during the first 10 SITREPs (days). However, the latest daily figures show that this T-4 estimate is now dropping towards 4%-5%. So my current "guess" is that the true fatality rate for this virus may turn out to be around 4%-5%. Only time (and the control of the epidemic eliminating new cases) will provide the correct figure.
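For anyone wanting to repeat the T-1 to T-4 calculation described above, here is a minimal Python sketch. The function name `lagged_fatality_ratios` is my own invention for illustration, and the case and death totals in it are made-up numbers that simply double each day - they are NOT the WHO SITREP figures, which would need to be typed in from the reports.

```python
def lagged_fatality_ratios(cases, deaths, lag):
    """Divide each day's cumulative deaths by the cumulative cases `lag` days earlier.

    cases and deaths are equal-length lists of cumulative totals, one entry per
    daily report. Days without a matching earlier case figure are skipped.
    """
    if lag < 0:
        raise ValueError("lag must be zero or positive")
    if len(cases) != len(deaths):
        raise ValueError("cases and deaths must cover the same days")
    return [deaths[t] / cases[t - lag] for t in range(lag, len(deaths))]


# ILLUSTRATIVE, MADE-UP totals (doubling every day) - not real outbreak data.
cases = [100, 200, 400, 800, 1600]
deaths = [2, 4, 8, 16, 32]

for lag in range(5):
    ratios = lagged_fatality_ratios(cases, deaths, lag)
    print(f"T-{lag}: " + ", ".join(f"{r:.1%}" for r in ratios))
```

With these made-up numbers the same-day ratio stays at 2% while the lagged ratios are much higher, which shows how sensitive the estimate is to the chosen lag while case numbers are growing quickly. It says nothing about which lag (if any) is the right one.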

Another interesting site for monitoring the coronavirus statistics is here. This site is 'live' (updated every few hours), so it provides more current figures than the daily WHO SITREP, but it is harder to use these figures to calculate day-to-day changes and identify trends in case numbers and fatality rates. The plot looks rather impressive though - a bit like the ones shown in nearly every SF movie about global disasters - but is a little frightening when you realise that this is reality and not a disaster movie! The relative numbers of 'total deaths' vs. 'total recovered' also aren't very reassuring at present (305 deaths, 348 recovered), but hopefully 95%+ of all coronavirus patients will eventually recover.

A link to a WHO introductory 'course' about 2019-nCoV was provided in their latest SITREP, so I did a free registration for an OpenWHO account and went through the 'course' (it's only a short general video and some PowerPoint slides).

I did post a comment in the course discussion about lethality rates ("unknown at this time"), how the media is reporting the simplistic deaths/cases figure, and why that is probably too low an estimate. I'm surprised that they don't have an estimate (with error range) for this virus based on time series analysis of the daily figures. While the final 'correct' figure on lethality can't be calculated until it's all over (for example, the number of reported cases is likely to be lower than the actual number of people infected - but hopefully not too much lower, as that would mean the outbreak is harder to contain), there should be some standard statistical methodology for getting a reasonably accurate prediction based on the daily data series.

My very rough projections done on Friday had estimated that the daily case numbers for SITREPs 11 and 12 would be 10,163 and 13,212. The actual numbers turned out to be 9,826 and 11,953, so my 'model' had overestimated by 3.4% and 10.5% respectively. This was because I projected case numbers to continue increasing by 29% per day (the average from the first 10 SITREP figures), while the last two days have seen case numbers increase by 'only' 26% and 22%. The plot of SITREP figures for reported cases and deaths doesn't yet show any downward inflection point that would suggest that the spread of the disease (within China) is being fully contained.
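The arithmetic behind the comparison above can be checked with a few lines of Python. This is only a sketch of a constant-growth projection: the helper names `project` and `overestimate_pct` are hypothetical, and the only real inputs are the four case numbers quoted in this post.

```python
def project(start, daily_growth, days):
    """Project a total forward by compounding a fixed daily growth rate.

    daily_growth is a fraction, e.g. 0.29 for 29% per day.
    Returns one projected value per day, starting with the day after `start`.
    """
    values = []
    current = start
    for _ in range(days):
        current *= 1 + daily_growth
        values.append(current)
    return values


def overestimate_pct(projected, actual):
    """Percentage by which a projection exceeded the actual figure."""
    return (projected - actual) / actual * 100


# Figures quoted in this post for SITREPs 11 and 12.
projected = [10163, 13212]
actual = [9826, 11953]

for sitrep, p, a in zip((11, 12), projected, actual):
    print(f"SITREP {sitrep}: overestimated by {overestimate_pct(p, a):.1f}%")

# Actual growth between SITREP 11 and SITREP 12.
growth = (actual[1] / actual[0] - 1) * 100
print(f"Growth from SITREP 11 to 12: {growth:.0f}%")
```

A constant daily rate compounds very quickly, so even a few percentage points of error in the assumed rate produces a large gap after a week - which is why the projections in this post should be taken with a big grain of salt.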


On a more positive note, the international spread of the disease appears to be under reasonable control, provided the initial cases imported from China didn't spread into the general population before adequate quarantine etc. was put into place. Given the incubation period, the international figures over the next week or so will show whether the effects will be mostly restricted to China, or whether it will become a major international pandemic (as was SARS).

Hopefully my 'model' is totally unrealistic and I don't know what I'm doing*, as it currently projects reported cases hitting 90,000+ and deaths exceeding 1,800 in a week's time from now... we'll see what the actual figures in SITREP 22 turn out to be.

* As stated in my previous post, I have NO medical training or qualifications, my statistics and data modelling skills are very rudimentary, and I'm working from a very superficial set of data.
