The Data Analytics Blog

How To Dodge The Simpson's Paradox In Descriptive Analytics

May 30, 2018 at 8:56 AM

Simpson's Paradox is a phenomenon in statistics that illustrates how easy it is to misinterpret data: a trend that appears in aggregated data can disappear, or even reverse, once the data is split into groups. It occurs mainly in descriptive and diagnostic analytics (see our blog on the different types of analytics), where an analyst may jump to a conclusion driven by motivated reasoning rather than by an objective assessment of the evidence.

This post is part of a series on how to avoid logical fallacies and cognitive biases in data science.

Today we look at a famous example of Simpson's Paradox: a study of admission records from the University of California, Berkeley, which appeared to show that male applicants were favoured over female applicants. Once the records were broken down by department, however, there was no noticeable bias in favour of men.

[Table: Success Rates]

Let's have a look at the numbers. The table shows that male applicants had a 47% success rate compared to 36% for female applicants. A rash conclusion would be that male applicants were being favoured over female applicants. Admissions, however, were managed at the departmental level, so that is where we need to look.

[Table: Success Rates]

When one assesses the statistics at department level, a different picture emerges. Not only did women enjoy a higher success rate in four of the six departments, but the biggest differences between the genders favour women (Departments I and VI).

So what's going on here?

[Table: Success Rates]

The first thing one needs to do is to navigate away from the percentages and look specifically at the numbers. Department III had a relatively low success rate, yet a much higher proportion of the women than of the men applied to it. Conversely, Department I attracted a high proportion of the men, who had a relatively good success rate there, but very few women applied despite their very high success rate.

The overall conclusion was that women applied in larger proportions to the departments where it was difficult to get in and in smaller proportions to the departments where it was easy to get in. There was no departmental bias, it seems, just application biases.
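The mechanism can be reproduced with a small sketch. The counts below are hypothetical, invented purely for illustration, and are not the Berkeley figures; the department names "easy" and "hard" and all function names are likewise assumptions. The sketch compares success rates per department with the pooled rates, and shows where each gender's applications went.

```python
# Hypothetical counts invented for illustration only; NOT the Berkeley figures.
# Layout: department -> gender -> (applicants, admitted)
applications = {
    "easy": {"men": (800, 480), "women": (100, 70)},
    "hard": {"men": (200, 40), "women": (900, 225)},
}


def department_rates(data):
    """Success rate per department and gender."""
    return {
        dept: {gender: admitted / applied
               for gender, (applied, admitted) in by_gender.items()}
        for dept, by_gender in data.items()
    }


def overall_rates(data):
    """Success rate per gender after pooling all departments."""
    applied_total, admitted_total = {}, {}
    for by_gender in data.values():
        for gender, (applied, admitted) in by_gender.items():
            applied_total[gender] = applied_total.get(gender, 0) + applied
            admitted_total[gender] = admitted_total.get(gender, 0) + admitted
    return {g: admitted_total[g] / applied_total[g] for g in applied_total}


def application_shares(data):
    """Share of each gender's applications that went to each department."""
    totals = {}
    for by_gender in data.values():
        for gender, (applied, _) in by_gender.items():
            totals[gender] = totals.get(gender, 0) + applied
    return {
        dept: {gender: applied / totals[gender]
               for gender, (applied, _) in by_gender.items()}
        for dept, by_gender in data.items()
    }


by_dept = department_rates(applications)
overall = overall_rates(applications)
shares = application_shares(applications)

for dept in applications:
    for gender in ("men", "women"):
        print(f"{dept:>7} {gender:<5} rate={by_dept[dept][gender]:.1%} "
              f"share of applications={shares[dept][gender]:.1%}")
for gender in ("men", "women"):
    print(f"overall {gender:<5} rate={overall[gender]:.1%}")
```

In this made-up data women have the higher success rate in both departments, yet the lower pooled rate, because 90% of their applications went to the harder department. The pooled rate is a weighted average of the departmental rates, and the weights are the application shares.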

Essential tips to avoid Simpson's Paradox

  1. Try to understand the base data (the numbers), i.e. avoid relying solely on percentages.
  2. Do not be easily swayed into concluding what you (or your boss) want to see in the numbers (motivated reasoning). Instead, conduct the full analytical exercise, and blind or double-blind your analysis if you can.
  3. Read up on as many statistical paradoxes as you can. Awareness of the statistical pitfalls will better prepare you to avoid them in your analysis.
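As a rough aid for the first tip, a check along these lines could flag when a pooled comparison points the opposite way to every segment. It is a minimal sketch under stated assumptions: the function names and the sample counts are hypothetical, and it tests only the strict form of the reversal (all segments disagree with the pooled result), which is narrower than the "four of six departments" pattern described above.

```python
def success_rate(applied, succeeded):
    """Plain ratio of successes to applications."""
    return succeeded / applied


def is_strict_reversal(counts, group_a, group_b):
    """counts: {segment: {group: (applied, succeeded)}}.

    Returns True when the pooled comparison of group_a vs group_b
    points the opposite way to the comparison inside every segment.
    """
    pooled = {}
    for group in (group_a, group_b):
        applied = sum(seg[group][0] for seg in counts.values())
        succeeded = sum(seg[group][1] for seg in counts.values())
        pooled[group] = success_rate(applied, succeeded)

    pooled_diff = pooled[group_a] - pooled[group_b]
    segment_diffs = [
        success_rate(*seg[group_a]) - success_rate(*seg[group_b])
        for seg in counts.values()
    ]

    if pooled_diff > 0:
        return all(diff < 0 for diff in segment_diffs)
    if pooled_diff < 0:
        return all(diff > 0 for diff in segment_diffs)
    return False


# Hypothetical counts, invented for illustration only.
reversed_example = {
    "easy": {"men": (800, 480), "women": (100, 70)},
    "hard": {"men": (200, 40), "women": (900, 225)},
}
consistent_example = {
    "x": {"men": (100, 50), "women": (100, 40)},
    "y": {"men": (100, 30), "women": (100, 20)},
}

print(is_strict_reversal(reversed_example, "men", "women"))
print(is_strict_reversal(consistent_example, "men", "women"))
```

A check like this only works on raw counts, which is one more reason to keep the base numbers at hand rather than the percentages alone.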

Thomas Maydon
Thomas Maydon is the Head of Credit Solutions at Principa. With over 17 years of experience in the Southern African, West African and Middle Eastern retail credit markets, Tom has primarily been involved in consulting, analytics, credit bureau and predictive modelling services. He has experience in all aspects of the credit life cycle (in multiple industries) including intelligent prospecting, originations, strategy simulation, affordability analysis, behavioural modelling, pricing analysis, collections processes, and provisions (including Basel II) and profitability calculations.
