Track 43 – Fire (The Pointer Sisters)

Exactly five years ago I obtained my PhD, which marked the start of my postdoctoral career. Back then, I could have imagined that my research activities would change over time, but likely not as much as they did. In the years before obtaining my PhD, for example, I spent more than 20 hours per week in the lab, which corresponds to around 1,000 hours per year. In the past three years, however, I probably did not spend more than 100 hours in the lab in total. Instead, I am processing and analyzing data on my computer, which sometimes almost seems to catch fire.

Maybe I should thus rebrand myself as a scientist, starting by no longer seeing myself purely as an analytical scientist. From now on, I see myself more as a data scientist, which comes with new interests, challenges, and topics for blog posts. Connecting to the latter, I will start off with an intriguing example of how data can fool us, or at least how data can fool me. I am curious whether the story below about comparing hospitals can trick you as well.

So, what if you could choose between two hospitals having the following yearly success rates for treating kidney stones:
Hospital 1: 265 successful treatments and 400 treatments in total (=66%)
Hospital 2: 280 successful treatments and 400 treatments in total (=70%)
I guess that you would feel inclined to select Hospital 2 over Hospital 1.

Well, it happens to be the case that the aforementioned statistics are the sum of two distinct conditions, namely small kidney stones and large kidney stones. If we zoom in on the separate performance metrics for both hospitals, it turns out that Hospital 1 is the preferred option for both conditions:
Small kidney stones:
Hospital 1: 85 successful treatments and 100 treatments in total (=85%)
Hospital 2: 240 successful treatments and 320 treatments in total (=75%)
Large kidney stones:
Hospital 1: 180 successful treatments and 300 treatments in total (=60%)
Hospital 2: 40 successful treatments and 80 treatments in total (=50%)

At this point, you might want to stop reading and check whether I made any typos, because intuitively it does not make any sense that Hospital 1 performs better when looking at both conditions separately but that its combined performance is worse than that of Hospital 2.
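
If you would rather verify than trust me, a few lines of Python are enough to recompute the percentages from the counts above. This is only a minimal sketch: the dictionary layout and the helper names `rate` and `overall` are my own choices for illustration.

```python
# Counts taken from the lists above: (successful treatments, total treatments).
data = {
    "Hospital 1": {"small": (85, 100), "large": (180, 300)},
    "Hospital 2": {"small": (240, 320), "large": (40, 80)},
}

def rate(successes, total):
    """Success rate as a fraction between 0 and 1."""
    return successes / total

def overall(hospital):
    """Pool both stone sizes for one hospital and return the combined rate."""
    successes = sum(s for s, _ in data[hospital].values())
    total = sum(t for _, t in data[hospital].values())
    return rate(successes, total)

for name, groups in data.items():
    for size, (s, t) in groups.items():
        print(f"{name}, {size} stones: {rate(s, t):.0%}")
    print(f"{name}, combined: {overall(name):.2%}")
```

Running this prints the rate per hospital and stone size, followed by the combined rate, so the reversal can be checked line by line.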

The boring truth, however, is that my calculations are right, and that our intuitions are being fooled by something we refer to as “Simpson’s paradox”. This statistical phenomenon teaches us that associations observed in a population can disappear or even reverse when dividing the population into subpopulations. More importantly, it teaches us to be mindful when preparing, conducting, and evaluating data analysis projects, which you should probably do together with a team rather than by yourself. And since I like working in teams, I am pretty satisfied with this starting point of solid data science projects.
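
For those who want to see where the reversal comes from: in the numbers above, Hospital 1 treats mostly large stones (300 of its 400 treatments), while Hospital 2 treats mostly small ones (320 of its 400), and large stones have the lower success rate at both hospitals. The combined rate is therefore a weighted average in which each hospital's mix of cases sets the weights. The sketch below illustrates this with the same counts; the function names `case_mix` and `weighted_rate` are hypothetical and only serve the example.

```python
# Same counts as in the lists above: (successful treatments, total treatments).
data = {
    "Hospital 1": {"small": (85, 100), "large": (180, 300)},
    "Hospital 2": {"small": (240, 320), "large": (40, 80)},
}

def case_mix(groups):
    """Share of each stone size among one hospital's treatments."""
    total = sum(t for _, t in groups.values())
    return {size: t / total for size, (_, t) in groups.items()}

def weighted_rate(groups):
    """Combined rate rebuilt as sum(share of cases * success rate per size)."""
    mix = case_mix(groups)
    return sum(mix[size] * s / t for size, (s, t) in groups.items())

for name, groups in data.items():
    mix = case_mix(groups)
    print(f"{name}: {mix['large']:.0%} large stones, "
          f"combined rate {weighted_rate(groups):.2%}")
```

The hospital with the better rate for each stone size ends up with the worse combined rate simply because it takes on far more of the difficult cases.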