It’s not politically controversial to say that most of the polls preceding the presidential election last week were way off. As a consequence, most predictions about the election got a lot of things wrong.
This isn’t news.
Nate Silver, the most famous of the election forecasters, writes on his website that the polls weren't great, but they weren't that far off, at least by historical standards.
During election season, I started to get into the weeds on the details of polling (the act of surveying public opinion) and forecasting (what Silver does: aggregating and analyzing the polls).
I started this election season thinking that polls could quantify public opinion in a way that illuminated the conversation, making us all smarter.
I came out of election season thinking that political polls and election forecasting are the political equivalent of nutritional research.
For most of us, they’re both just mental junk food.
Election forecasting and nutritional science are both fields populated by brilliant people doing high-level analytics
When you’re analyzing data to draw conclusions that people will use to make important life decisions, it’s important that your models make sense.
What's striking is that both political forecasting and nutritional science are done by incredibly smart people.
Look at the details of the FiveThirtyEight forecast methodology. It's amazingly detailed, with a model that analyzes economic data back to the 1880s, adjusts polls for the historical accuracy of each pollster, and creates an uncertainty estimate to account for systematic polling errors.
In other words, it’s a brilliant model created by geniuses.
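To make the idea concrete, here's a minimal sketch of what "adjusting polls by pollster accuracy" can look like. This is not FiveThirtyEight's actual model; the poll numbers and error figures are invented for illustration, and a real forecast layers many more adjustments on top.

```python
import statistics

# Hypothetical polls: (candidate's share in %, pollster's average historical miss in points).
# All numbers are invented for illustration.
polls = [
    (52.0, 1.5),
    (49.5, 2.0),
    (51.0, 3.5),
    (53.0, 2.5),
]

def weighted_average(polls):
    """Weight each poll by the inverse of its pollster's historical error,
    so historically accurate pollsters count for more."""
    weights = [1.0 / err for _, err in polls]
    total = sum(weights)
    return sum(share * w for (share, _), w in zip(polls, weights)) / total

estimate = weighted_average(polls)
# A crude uncertainty band: the spread of the individual polls.
spread = statistics.stdev(share for share, _ in polls)
print(f"aggregate: {estimate:.1f}% ± {spread:.1f}")
```

The math here is simple and sensible, which is exactly the point: the sophistication of the aggregation isn't where the problem lives.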
A lot of nutrition research is similar. Take a look at this study that made it into the New England Journal of Medicine: Association of Nut Consumption with Total and Cause-Specific Mortality.
The authors analyze data from tens of thousands of patients over decades, use complex statistical adjustments to account for confounding variables, and update their data on nut consumption every two to four years.
Incredibly thoughtful modeling created by some of the brightest minds in research.
Unfortunately, political polls and nutritional science both suffer from the “garbage in, garbage out” problem
I mentioned before that when you’re creating a model that will influence the way that people make important decisions in their lives, it’s vital that the model makes sense and that the math isn’t wrong.
But any model is only as good as the data that goes into it.
No statistical analysis can draw useful conclusions from useless information.
And unfortunately, with both polling and nutritional epidemiology, the inputs to these fancy models aren't accurate enough to base important decisions on.
Polls get it wrong all the time.
Nutritional epidemiology is a field in need of radical reform.
These problems aren't solved by building better models.
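A toy simulation shows why. Averaging many polls shrinks random noise, but if every poll shares the same bias (say, one side's voters are systematically undersampled), no amount of aggregation removes it. The 3-point bias and sample sizes below are made up for illustration.

```python
import random

random.seed(0)

TRUE_SUPPORT = 0.50      # the candidate's actual support
SYSTEMATIC_BIAS = 0.03   # every poll skews 3 points toward the candidate
SAMPLE_SIZE = 1000

def run_poll():
    """Simulate one poll whose respondents share the same 3-point skew."""
    p = TRUE_SUPPORT + SYSTEMATIC_BIAS
    hits = sum(random.random() < p for _ in range(SAMPLE_SIZE))
    return hits / SAMPLE_SIZE

# Averaging 200 polls nearly eliminates sampling noise,
# but the average converges on the biased value, not the truth.
average = sum(run_poll() for _ in range(200)) / 200
print(f"true support: {TRUE_SUPPORT:.3f}, poll average: {average:.3f}")
```

The aggregate lands around 53% no matter how many polls you average: garbage in, garbage out.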
Just take a look at how nutrition data is collected
In these nutrition studies, most of the data comes from food frequency questionnaires.
Have you ever taken one of those?
Seriously, take a look at it. Here’s an example of a few questions.
How accurately do you think the average person can answer these questions? How accurately do you think that you could answer these questions?
The researchers then do all kinds of fancy statistical adjustments based on individual patient information to draw conclusions like “the frequency of nut consumption was inversely associated with total and cause-specific mortality.”
Polling isn’t much better than nutritional research
You can read a lot about how the polls weren’t as bad as they’re being made out to be or how the forecasts were actually fairly accurate when thinking in probabilistic terms.
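When people say the forecasts were "accurate in probabilistic terms," they usually mean something like a Brier score: the average squared gap between the stated probability and what actually happened. Here's a minimal sketch with made-up forecast numbers (not any real forecaster's record):

```python
def brier_score(forecasts):
    """Mean squared error between forecast probabilities and outcomes (0 or 1).
    Lower is better; always guessing 0.5 scores 0.25."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical record: (probability assigned to an event, whether it happened).
forecasts = [(0.89, 1), (0.71, 1), (0.29, 0), (0.60, 0)]
print(f"Brier score: {brier_score(forecasts):.3f}")
```

A forecaster can post a respectable score this way even when the underlying polls are badly biased, which is why I think the accuracy defense misses the point.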
Just like with nutrition research, I would argue that this misses the point.
Pollsters made all kinds of changes after their polls were inaccurate in 2016. The 2020 polls incorporated these adjustments. And a similar level of inaccuracy persists.
I get that there can be disagreement about election forecasting depending on how you model for uncertainty.
But the consistency of polling errors (even if the direction of the error is unpredictable) makes me think that forecasting suffers from the same data-quality problem that nutrition science does.
The bigger, more meta problem: the constant need for content creation
Political polling and forecasting are a content factory. Look how much is written each day on FiveThirtyEight or any other election forecasting site.
Think about how many people “checked the polls” every day during the weeks leading up to the election.
So much wasted time and energy on cable news, on Twitter, and in the newspapers, pontificating on the polls, why they're changing, and what it all means.
It's not really different from the latest Today Show segment about how four cups of coffee a day are bad for men.
We’re amusing ourselves to death.
With election forecasting, we’re learning nothing new about what might happen in the election despite the constant stream of new polls. And with nutrition, we’re learning nothing new about what’s healthy despite the constant influx of new research.
Garbage in, garbage out.
Thank you for reading! If you’re enjoying my newsletter, please consider sharing with your friends and family and encouraging them to subscribe!
I always appreciate any feedback or thoughts you might have. You can reply to this email to reach me directly.