Covid Vaccines #2: Can We Trust the Data?

In my first post on covid vaccines, I made a case for why vaccination is the best way to love our neighbors and be wise stewards of our own health. However, my argument had a major weakness. I think most skeptics would agree that vaccination is wise if everything I assumed was actually true. It’s that “if” which causes all the trouble. Most of us have heard any number of arguments that the data about covid and vaccines is unreliable, or that the vaccines don’t really work, or that they are dangerous. Some of those arguments seem quite compelling at first glance, and if they are sound, the case for vaccination collapses.

In this post, I will look at one major set of arguments against the vaccines: that, for one reason or another, we shouldn’t trust the numbers on covid and vaccination. This is an important question, because our picture of the pandemic and its possible solutions is necessarily built from statistics, percentages, and probabilities. Because covid is only moderately lethal compared to viruses like smallpox or the Spanish flu, we aren’t going to see bodies littering our neighborhoods. The official number of roughly 600,000 dead is “only” about one in 500 Americans. Even though that’s about one and a half times as many Americans as died in World War II, it’s still few enough that most of us can’t get a good sense of the magnitude of the tragedy from personal experience unless we work in a hospital in a hard-hit region. For most of us, our picture of the pandemic has to come from numbers, and if we can’t trust the numbers, we can’t even begin to discuss anything else.

So let’s look at those numbers. I’ll first discuss the overall numbers which inform our understanding of the pandemic in the US, then more recent accusations that the CDC is intentionally fudging the numbers to blame most current cases on the unvaccinated. If you think I missed an important argument against the accuracy of covid numbers, please share a summary or link in the comments and I’ll update this post if needed.

It’s easy to worry that a shadowy “them” is somehow lying about the covid numbers. However, we should be cautious about arguments which blame “them” without being able to clearly sketch out who “they” actually are and how they could do whatever they are being blamed for. If we’re talking about vaccine safety data during trials, the original data was collected by individual doctors and hospitals, collated by drug companies, and evaluated by federal regulators. After the trials, ongoing data about vaccine safety is collected by doctors and hospitals, submitted to the Vaccine Adverse Event Reporting System (VAERS), and evaluated by a different set of regulators. Data about covid itself (cases, hospitalizations, etc.) is gathered by doctors and hospitals and submitted to various information-gathering systems at the county, state, and federal level.

Obviously, there is plenty of room for dishonesty or mere incompetence as information moves through these systems, but there is also a lot of opportunity for whistleblowers and for competing interests to keep one another honest. For example, drug companies have an obvious interest in coloring data favorably during trials, but federal regulators are famously risk-averse, insisting on comprehensive and verifiable data before approval. (For years, reams of angry articles have criticized the FDA for being too slow to approve potentially life-saving treatments because it judged the safety data insufficient!) Meanwhile, the actual data-gathering starts with everyday doctors whose main interest is simply in treating their patients. Any theory of how numbers might be misrepresented will need to account for the complexity and accountability built into these systems. The idea that Dr. Fauci, drug executives, or anyone else could wave a hand and swap in whatever numbers they want just isn’t grounded in reality.

Are hospitals paid more for covid patients?

But what if there were a strong financial incentive for doctors to overcount covid cases? If that were the case, the initial data inputs would be potentially untrustworthy, not because of some outlandish mass conspiracy but simply because of the shared self-interest of thousands of doctors who were incentivized to mis-code hospitalizations and deaths to get more cash.

This accusation gained prominence following a Fox News interview with doctor and Minnesota state senator Scott Jensen in April 2020. He stated that federal Medicare reimbursement was significantly higher for treating covid cases, and argued that the higher reimbursements created an incentive to count cases as covid when they really weren’t. A number of fact-checks have confirmed his assertion that hospitals have been paid more for covid patients.

However, the existence of a financial incentive isn’t the same as proof that anyone is acting on it. (My life insurance policy means my wife would get a large check in the mail if she had me murdered, but I’m not going around in a bulletproof vest.) Even apart from moral considerations which would deter many, there are practical reasons not to mislabel cases. For one thing, it isn’t hospital administrators making these diagnoses, but individual doctors who would get no direct financial benefit. Of course, administrators might exert pressure behind the scenes, or doctors might act for the perceived benefit of their institution, but everyone involved would know they were committing Medicare fraud. If they were caught, they would have to reimburse misallocated funds and face civil or criminal penalties, including potential expulsion from the Medicare program.

Our world is full of opportunities for illicit profit which most people reject because of a guilty conscience or fear of consequences. It is unreasonable to assume higher Medicare reimbursements would necessarily create rampant fraud across hundreds of hospital systems, merely because the possibility exists.

However, we don’t need to leave the question there. There is actually solid, hard-to-fake data which suggests the covid picture is at least roughly the same as what the official numbers suggest. According to aggregate data from death certificates, covid killed around 345,000 Americans in 2020. (I am using 2020 numbers because 2021 data is still incomplete.) These death certificates were filled out by individual doctors and coroners across the medical system. If covid was being systematically overdiagnosed and overcounted, that discrepancy should show up here. We would expect to see a relatively normal overall mortality record for the year, with other types of deaths (perhaps flu, pneumonia, etc.) surprisingly low, as medical personnel took real deaths and falsely attributed them to covid. Instead, the count of excess deaths, those above the number which would normally be expected in a year, was very high in 2020. This suggests the deaths attributed to covid were real cases of covid, not mislabeled instances of other, routine diseases. However, that brings up another common complaint about covid numbers…

Does CDC data show no excess deaths?

To reiterate, according to death certificates covid killed roughly 345,000 Americans in 2020. Death records cannot simply be created out of thin air, certainly not in large numbers, so if covid numbers were being misrepresented we would expect an approximately normal total number of deaths for 2020, but roughly 345,000 fewer deaths in non-covid categories corresponding to whichever types of deaths were being mislabeled as covid.

According to some, that’s exactly what CDC data showed! Perhaps the most widely circulated claim originated in a retracted article from a Johns Hopkins student-run newsletter, reporting on a November 2020 presentation by Genevieve Briand, an economics professor. Briand stated that CDC data demonstrated no increase in overall deaths in March-September 2020 and speculated that “deaths due to heart diseases, respiratory diseases, influenza and pneumonia may instead be recategorized as being due to COVID-19.” Claims like Briand’s and others that circulated widely on social media seemed compelling because they claimed to be based upon the CDC’s own data.

However, the CDC’s tracking dashboard consistently showed significant numbers of excess deaths throughout 2020, and by October 2020 multiple studies based on data collected by the CDC’s National Center for Health Statistics (NCHS) had counted around 300,000 excess deaths up to that point in 2020. Personally, when I have tried to check claims like that of Briand, I’ve never been able to figure out what CDC data they were purportedly seeing which gave alternative tallies. My best guess is that they were looking at incomplete, preliminary data for recent months and incorrectly assuming it had been finalized. Typically the claims referred vaguely to “CDC numbers” or merely linked to the CDC tracking dashboard, leaving it unclear where their different numbers were coming from.

Once the NCHS published provisional 2020 mortality numbers in early 2021, the data showed an approximately 17.7% increase in deaths for 2020, with 3,358,814 deaths in 2020 compared to 2,854,838 in 2019, an increase of 503,976. While some types of non-covid deaths increased in 2020, none can come close to accounting for that amount. (For example, drug overdose deaths exploded in 2020, but the total increase was only about 20,000 over 2019.) In fact, many experts believe the high excess death tally suggests covid deaths were significantly undercounted in the official CDC estimate of 345,000. At minimum, the hundreds of thousands of extra deaths in 2020 make it hard to argue that so-called covid deaths were simply the result of mislabeling other, ordinary causes of death. No national data-gathering program will be entirely accurate, but there seems to be no good reason to doubt the overall numbers on covid hospitalizations and deaths.
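For readers who want to check the arithmetic, here is a minimal Python sketch using only the figures quoted above. It makes one simplifying assumption: it treats the 2019 total as the "expected" baseline, whereas formal excess-death estimates use a modeled baseline built from several prior years. The variable names are my own.

```python
# Figures quoted in this post (NCHS provisional counts).
deaths_2019 = 2_854_838
deaths_2020 = 3_358_814
# Approximate death-certificate tally of covid deaths cited in this post.
covid_deaths_2020 = 345_000

# Simplifying assumption: 2019 stands in for the "expected" number of deaths.
increase = deaths_2020 - deaths_2019
pct_increase = 100 * increase / deaths_2019

# If covid deaths were merely relabeled ordinary deaths, `increase` would be
# close to zero. Instead, it is larger than the covid tally itself.
increase_beyond_covid = increase - covid_deaths_2020

print(f"Increase in deaths, 2019 to 2020: {increase:,}")
print(f"Percent increase: {pct_increase:.1f}%")
print(f"Increase beyond the official covid tally: {increase_beyond_covid:,}")
```

The point of the comparison is the sign and size of the first number. A relabeling story predicts an increase near zero, not one of roughly half a million.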

Are PCR tests unreliable?

But what about covid cases in general? For months, some have argued that the PCR tests which are commonly used to diagnose covid infections are unreliable, especially at high cycle thresholds. According to many experts, those concerns are largely invalid. In general, I’ve found the medical establishment more trustworthy than covid skeptics over the past 18 months, but neither side has covered itself with glory, so I’m not inclined to accept either claim on mere authority. However, even if we assume a degree of skepticism about PCR tests, that doesn’t greatly impact our ability to make judgments about the pandemic.

Epidemiological testing is often imprecise. For example, the CDC numbers for seasonal flu cases are obtained by multiplying actual reported case counts by approximately 25, in part to accommodate underreporting and false negative tests (a multiplier not used for covid counts, incidentally). Is that likely to produce an exactly accurate count? Of course not. But the goal is to get a roughly accurate figure, one that can be used for estimating case prevalence and recognizing trends over time.

We have already seen that hospitalization and death data seems generally trustworthy. A covid diagnosis for a hospitalized patient is also much more likely to be accurate, because it will be based on symptoms and multiple tests, and because the amount of virus in a sicker hospitalized patient should be detectable with a lower number of amplification cycles.

If PCR critics are correct and higher-cycle samples are less trustworthy, that would mean some percentage of diagnosed cases weren’t really covid. If you imagine one of those trend-line graphs we’re all tired of seeing, that would mean the “cases/infections” line should be lower. On the other hand, if we accept the CDC’s flu-reporting assumption that there will be a substantial amount of underreporting and false negatives, perhaps the line ought to actually be higher. What this uncertainty actually illustrates is that our crisp-looking line on the graph should really be more of a fuzzy swath reflecting uncertainty about the exact numbers. How the line moves (cases increasing or decreasing) is far more important than exactly where it is.
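The "fuzzy swath" idea can be sketched in a few lines of Python. Everything here is hypothetical: the weekly counts are made up, and the 20% false-positive share and 3x undercount factor are illustrative assumptions, not estimates from any source. The helper name case_band is also my own.

```python
def case_band(reported, false_positive_share, undercount_factor):
    """Return a (low, high) range around a reported case count.

    Both adjustment parameters are hypothetical, chosen only for illustration.
    """
    low = reported * (1 - false_positive_share)   # some positives weren't real
    high = reported * undercount_factor           # many real cases never tested
    return low, high

# Made-up weekly case counts, not real data.
weekly_reported = [1000, 1500, 2400]

bands = [
    case_band(r, false_positive_share=0.2, undercount_factor=3.0)
    for r in weekly_reported
]
for reported, (low, high) in zip(weekly_reported, bands):
    print(f"reported {reported}: plausibly between {low:.0f} and {high:.0f}")

# Check whether the trend points the same way at both edges of the band.
lows = [low for low, _ in bands]
highs = [high for _, high in bands]
rising_either_way = lows == sorted(lows) and highs == sorted(highs)
print("Trend is upward at both edges of the band:", rising_either_way)
```

Under these assumptions the band is wide, but as long as the error rates stay roughly constant from week to week, the direction of the trend is the same wherever the true line sits inside it.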

Until vaccines arrived, months of experience demonstrated that when the case count trend line moved upward, hospitalizations and then deaths followed that trend line a few weeks later. That is why case counts matter. Not because of the exact raw numbers, but because of their correlation with the numbers that actually matter: hospitalizations and deaths. Case counts just give us a leading indicator of how we are trending and what’s coming in a few weeks once some of the sick people end up in the hospital or morgue.

As long as hospitalization and death counts are basically accurate, and as long as covid diagnoses of hospitalized patients are generally correct, the exact precision of PCR testing isn’t very important. We should assume case counts are fuzzy—whether because of PCR test imprecision or other testing and reporting imperfections—but that just doesn’t have much bearing on most policy questions.

Are vaccinated cases counted differently?

If 2020’s big data-related question was whether CDC reporting showed excess deaths, 2021’s big question is whether data showing much lower rates of hospitalization and death for vaccinated people is reliable. The accusation that this data is rigged seems to have begun with a May 2021 post on Off-Guardian.org, which made two explosive allegations (summarized in my words):

  1. New CDC guidance states that samples from vaccinated people, and only those samples, should be tested to a lower threshold of 28 or fewer cycles. This ties back to the argument that using more cycles produces false positives. The idea is that the CDC is knowingly testing unvaccinated people at high cycle thresholds which produce many false positives, while samples from vaccinated people are tested with a more accurate method, creating the illusion that much greater numbers of unvaccinated people are being infected.
  2. For vaccinated people only, asymptomatic or mild infections will no longer be counted as covid cases. So if a vaccinated person gets mild covid, it’s omitted from case counts, while the same case of mild covid is counted if the person is unvaccinated. Again, the result is a wildly skewed data set where the vast majority of covid cases will be among the unvaccinated, simply because vaccinated cases are excluded.

This claim—which rocketed around the internet and was picked up by larger sites like Zero Hedge—is another example of a technically true “fact” which was badly misunderstood and then passed along without even the most basic verification. The Off-Guardian post quotes real guidance from the CDC… for a very specific program which has nothing whatsoever to do with overall case counts or determining the percentage of vaccinated vs. unvaccinated people who have tested positive.

The CDC guidance which was quoted is specifically for reporting vaccine breakthrough cases. Once the initial covid test has been run and the data fully recorded according to federal law, IF the test was positive and IF the sick person was vaccinated (a breakthrough infection), the CDC has a separate reporting system set up to monitor those cases to track vaccine efficacy, the prevalence of different variants, and other important questions specific to breakthrough infections. Since genomic sequencing to determine the presence of variants is an important element of their analysis, they only want labs to send them specimens with enough viral material for genomic analysis—hence the request for samples detectable at low cycle thresholds, and those from symptomatic individuals. But, again, this guidance is only for special analysis of breakthrough infections and has nothing to do with legally required reporting of all positive covid tests.

In the end, despite all the noise, I have not found any credible arguments which undermine my confidence in the basic reliability of the official covid data. Is it 100% accurate? Surely not. Is it accurate enough to draw broad conclusions? It does seem to be. In my next two posts, I will look at what the data indicates, first about vaccine efficacy, then about vaccine safety.

Is all the data hopelessly outdated? (added 8/27/21)

I forgot to mention one other argument against the integrity of the covid data. It is an astonishingly uninformed concern, but it has been passed along seriously by a number of sources who many people trust, so it is worth addressing. This August 16 article by Dr. Mercola is a good example. He writes (correctly) that a specific CDC briefing in July on covid infection rates among the vaccinated vs. the unvaccinated was based on data collected from January through June 2021. Since much of the data was collected early in the year when few had been vaccinated and the delta variant was not circulating, including it made the aggregate data referenced in the briefing less representative of our present situation and therefore less helpful.

From this, Mercola implies that all available data on vaccination rates is similarly skewed, as if the only possible source of data on vaccination were the single CDC dataset referenced in the briefing, and as if that dataset could not possibly be reassessed to include only more recent and representative data. This is, to borrow Mercola’s own indignant subhead, “Grossly Misleading.” In my next couple of posts, I’ll look at more recent and representative data on vaccination, which is plentiful and comes from many different sources, both within the US and internationally.
