Data Is Not Enough: Simpson’s Paradox

TLDR:
A trend in the total aggregate can be the opposite of the trend in each of its subdivisions. This statistical Gestalt explains how people can justifiably live in alternate realities. Data alone is not enough.

Gestalt theory says that the whole is different from the sum of its parts. In statistics, a version of this is called Simpson’s Paradox.

So when we hear people today say “listen to the data” or “I’m data driven,” that doesn’t really mean much, because data are susceptible to paradoxes and can tell contradictory stories depending on how they are clustered. And bad actors exploit this fact, using data you accept to prove conclusions you do not, trying to persuade you that day is night.

As an example, let’s take the covid incidence rates for 3 states. Within each state, the rate shows an increasing trend. But each state shows lower overall rates than the preceding one. So each individual state shows increasing rates, but the total, taken across all three, shows a decreasing trend.
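To make the reversal concrete, here is a minimal sketch in Python. The numbers are invented for illustration and are not real covid data. It assumes each state was observed in a different window of x (say, weeks), which is what lets the pooled trend flip. The `slope` helper is a plain least-squares slope written for this example.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: (x values, incidence rates) per state.
# Each state rises within its own window, but each state sits
# lower overall than the one before it.
states = {
    "A": ([1, 2, 3], [10, 11, 12]),
    "B": ([4, 5, 6], [7, 8, 9]),
    "C": ([7, 8, 9], [4, 5, 6]),
}

# Trend inside each state.
per_state = {name: slope(xs, ys) for name, (xs, ys) in states.items()}

# Trend when all observations are pooled and the state label is ignored.
all_x = [x for xs, _ in states.values() for x in xs]
all_y = [y for _, ys in states.values() for y in ys]
pooled = slope(all_x, all_y)

for name, s in per_state.items():
    print(f"state {name}: slope = {s:+.2f}")
print(f"pooled:  slope = {pooled:+.2f}")
```

Every per-state slope comes out positive while the pooled slope comes out negative. Same numbers, opposite story, depending only on whether you keep the state label.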

This dynamic also explains how one can win the electoral college yet lose the popular vote. Or vice-versa. It’s why gerrymandering works. You can manipulate the clustering to achieve the effect you want, regardless of the total aggregate.
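The same effect shows up in a toy election. This is a sketch with made-up vote counts for two hypothetical parties, A and B, across three equally sized districts. It is not a model of any real election.

```python
# Hypothetical (votes for A, votes for B) in each district.
# A wins two districts narrowly and loses the third badly.
districts = [(51, 49), (51, 49), (20, 80)]

# Count districts carried by each party.
districts_won_a = sum(1 for a, b in districts if a > b)
districts_won_b = sum(1 for a, b in districts if b > a)

# Count total votes, ignoring district lines.
total_a = sum(a for a, _ in districts)
total_b = sum(b for _, b in districts)

print(f"districts won: A={districts_won_a}, B={districts_won_b}")
print(f"total votes:   A={total_a}, B={total_b}")
```

A carries more districts while B has more total votes. Redraw the district lines and the same voters can produce a different winner.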

IOW, you can use the same data to tell contradictory stories. So data alone are not enough.