How Deadly Is COVID-19? New Stanford Study Raises as Many Questions as It Answers
Among the many “known unknowns” complicating the creation of public policies to respond to the COVID-19 pandemic is estimating its lethality. We know that overall, it has been dramatic, with nearly 40,000 fatalities in the US alone in just over a month. But since we don’t know how many people have been infected, we don’t know how likely it is to be deadly for someone who contracts it.
Early estimates based on confirmed cases have ranged from 1 to 5 percent. It has always been assumed that these estimates are high since, in most countries including the US, only the sickest have been tested — at least until very recently. But we don’t have any solid data on the real number of cases, or how much the mortality rate varies by demographics. It does seem clear that COVID-19 is more dangerous to older people and those with underlying conditions, but we don’t know by how much.
In order to get real answers for the mortality rate, studies of broader populations are needed. Quite a few of those have gotten underway around the world, with several of them in the United States. One of the first to report its results, in the form of a "pre-print" (not yet peer-reviewed), is an effort led by Stanford University researchers to test 3,300 volunteers from Santa Clara County. The county includes Stanford at one end, stretches through much of Silicon Valley past San Jose at the other, and has a population of almost two million.
Estimated Infections of ’50 to 85 Times’ Confirmed Case Count
The striking conclusion of the Stanford researchers in the pre-print of their study, which has gained traction in media around the world, is their estimate that the prevalence of COVID-19 in the area is 50 to 85 times higher than the confirmed case count. It’s not surprising that the actual number is higher than the confirmed number. But previously, most estimates have been closer to 5 or 10 times the confirmed case count.
The obvious implication of their conclusion is that the mortality rate for COVID-19 is much lower than current estimates, and by a large enough margin that it is worth re-evaluating our public policy response. However, there are a number of good reasons to tread carefully in using the study’s findings. These reasons have unfortunately been overlooked by many in their rush to trumpet the headline conclusion or justify policy actions. We’ll take you through some of the most significant caveats.
A Quick Review of Antibody Testing for COVID-19
Almost all the COVID-19 testing that has been done in the US, and in most of the world, has used diagnostic tests for SARS-CoV-2, the virus that causes the disease (initially referred to as 2019-nCoV). A correct positive result means that the subject is currently infected. That's helpful for deciding on possible courses of therapy, and for compiling active case counts, but it doesn't tell you whether a person has had COVID-19 and recovered. As a result, those tests don't allow you to sample the general population to see who might have developed some immunity, or how widespread unnoticed or undiagnosed cases have been.
Antibody testing is complementary to diagnostic testing in this case. Tests can measure reactivity to SARS-CoV-2 of one or both of IgM and IgG (Immunoglobulin M and Immunoglobulin G). IgM levels rise fairly soon after the onset of COVID-19 but eventually decrease, while IgG levels represent an ongoing resistance (and, hopefully, at least partial longer-term immunity). So for completeness, antibody tests should ideally measure both.
Test Sensitivity and Specificity
If you haven’t previously dug into evaluating tests, two important terms to learn are sensitivity and specificity. Sensitivity is how likely a test is to correctly identify a positive subject with a positive test result. A low sensitivity means that many subjects who should test as positive don’t — aka a false negative. Specificity is a similar concept, except it measures how many subjects who should test negative actually do. Here, a low sensitivity means more false positives. Depending on the purpose of the test, one may be a lot more important than the other. Interpreting them is also dependent on the overall ratio of positive to negative subjects, as we’ll see when we look at Stanford’s results.
About the Antibody Test Stanford Used
At the time Stanford ran the study, there weren't any FDA-approved COVID-19 antibody tests for clinical use. But for research purposes, the team purchased tests from Premier Biotech in Minnesota. Premier has started marketing a COVID-19 antibody test, but it doesn't manufacture one itself. The test listed on the company's website, and the one it appears Stanford used, is from Hangzhou Biotest Biotech, an established Chinese lab-test vendor. It is similar in concept to a number of COVID-19 antibody tests that have been available in China since late February, and its clinical test data matches the data Stanford provides exactly, so it appears to be the test used.
The sensitivity results for the Hangzhou test, and especially its specificity results, are impressive and important. The researchers analyzed test results from the manufacturer and complemented them with additional testing on blood samples from Stanford. Overall, they rated the sensitivity of the tests at 80.3 percent and the specificity at 99.5 percent. Strikingly, though, the manufacturer's sensitivity results (on 78 known positives) were well over 90 percent, while the Stanford blood samples yielded only 67 percent (on 37 known positives). The study combined them for an overall value of 80.3 percent, but clearly, larger sample sizes would be helpful, and the divergence between the two numbers warrants further investigation. That matters because the gap between them translates into a massive difference in the final estimates of infection rate.
On specificity, the manufacturer's results were 99.5 percent for one antibody and 99.2 percent for the other, on 371 samples. The tests for both antibodies performed perfectly on Stanford's 30 negative samples. Overall, Stanford estimated the test specificity at 99.5 percent. That's important because when the sample population is dominated by negative results, as it is when testing the general public for COVID-19, even a small percentage of false positives can throw things off.
There is some additional reason to be skeptical about the particular test used. In another pre-print, researchers from hospitals and universities in Denmark rated the Hangzhou-developed test last in accuracy of the nine they evaluated. In particular, it showed only 87 percent specificity (it misidentified two of 15 negative samples as positive). That is a far cry from the 99.5 percent calculated by Stanford.
Models Have Error Bars for a Reason
The paper is quite upfront about the large potential errors introduced by the relatively small sample sizes involved. For example, the 95 percent confidence interval (CI) for specificity is given as 98.3 to 99.9 percent. If the specificity were actually 98.3 percent, the number of false positives would just about equal the number of positive results in the study. The team's own paper points out that with slightly different numbers, the infection rate among its test subjects could be less than 1 percent, which would put it fairly close to existing estimates. Errors in specificity could, of course, be canceled out by offsetting errors in sensitivity, but the point is that news headlines never seem to come with error bars.
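The scale of that effect is easy to check. Using the article's sample size of roughly 3,300 mostly-negative subjects, this quick sketch compares the expected false-positive counts at the study's point estimate and at the lower bound of its confidence interval:

```python
# Expected false positives among ~3,300 mostly-negative subjects (the
# article's sample size) at two specificity values from the study.
n_subjects = 3300
for specificity in (0.995, 0.983):  # point estimate vs. lower 95% CI bound
    false_positives = n_subjects * (1 - specificity)
    print(f"specificity {specificity:.1%}: ~{false_positives:.0f} expected false positives")
```

At 99.5 percent specificity you expect roughly 16 or 17 false positives; at 98.3 percent, over 50, enough to account for nearly all of a small positive count on their own.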
Models and studies also need to be reality-checked against known data. For example, the Stanford study estimates that the actual mortality rate for COVID-19 among the general population is 0.12 to 0.2 percent, instead of the much larger figures we're used to reading. However, New York City has already lost around 0.15 percent of its total population to COVID-19. At the study's estimated mortality rate, that would imply that essentially every resident of New York City has been infected, with enough time elapsed for the disease to have run its course.
As unlikely as that is, more people are unfortunately dying there each day, so it just isn’t plausible that the mortality rate there is as low as Stanford’s paper estimates. Here, too, they point out that there are lots of variables at play that would affect mortality rates. But those caveats are small solace if people run off with the headline numbers as if they were settled science.
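That sanity check is simple arithmetic: divide the deaths already observed per capita by an assumed infection fatality rate (IFR), and you get the share of the population that must already have been infected. A quick sketch with the article's figures:

```python
# If ~0.15% of New York City's population has already died of COVID-19,
# what share must have been infected under the study's estimated
# mortality rate of 0.12 to 0.2 percent?
deaths_per_capita = 0.0015
for ifr in (0.0012, 0.002):  # the study's estimated range
    implied_infected = deaths_per_capita / ifr
    print(f"IFR {ifr:.2%}: implies {implied_infected:.0%} of the city already infected")
```

At the low end of the study's range, more than 100 percent of the city would need to have been infected, which is impossible; even at the high end, three-quarters of all residents would have to have caught the disease already.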
The Study’s Selectivity Bias May Not be Fixable After the Fact
Volunteers for the study were recruited via Facebook ads, for reasons of expediency. The researchers have done an impressively thorough job of trying to correct for the resulting demographic skew of volunteers compared with the general population of Santa Clara County, ultimately estimating that the general public has nearly twice the infection rate of their subjects. Demographically, that might make sense, but it completely ignores how volunteers might self-select. Those who felt sick earlier in the year but thought it was the flu, those who thought they had COVID-19 but couldn't get tested, those who had traveled to China or Europe, and those who'd been in contact with someone with COVID-19 but been unable to get tested would all seem like very likely enthusiasts for a quick sign-up. After all, volunteering meant spending a chunk of a day waiting in a parking lot to have your finger pricked.
There doesn’t seem to have been any attempt to measure or control for this bias in subject selectivity. As a result, it is hard to see how the study can be interpreted as literally as it has been by so many sources.
It’s great that we’ve finally started to collect some data on the true incidence of COVID-19 here in the United States, and a much higher than expected incidence of infections certainly has implications in determining how fatal it is and the best approach for dealing with it. However, we need to look past the headline and remember that this is just one small piece of a very large puzzle. It’s going to take a lot more work to fill the rest in.