The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.

How To Dodge The Simpson's Paradox In Descriptive Analytics

May 30, 2018 at 8:56 AM

Simpson's paradox is a phenomenon in statistics that illustrates how easy it is to misinterpret data. It occurs mainly in descriptive and diagnostic analytics (see our post on the different types of analytics), where an analyst may jump to a conclusion driven by motivated reasoning rather than by an objective assessment of the evidence.

This post is part of a series on how to avoid logical fallacies and cognitive biases in data science.

Today we look at a famous example of Simpson's paradox: a study from the University of California, Berkeley, where admission records appeared to show that male applicants were favoured over female applicants. Once the records were broken down by department, however, there was no noticeable bias in favour of male applicants.

[Table: Success Rates]

Let's have a look at the numbers. The table shows that male applicants had a 47% success rate compared to 36% for female applicants. A rash conclusion would be that male applicants were being favoured over female applicants. Admissions were managed at the departmental level, though, so that is the level at which the comparison needs to be made.

[Table: Success Rates by department]

When one assesses the statistics at department level, a different picture emerges. Not only did women enjoy a higher success rate in four of the six departments, but the biggest differences between the genders favour women (Departments I and VI).
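To make the reversal concrete, here is a minimal Python sketch. The counts are hypothetical and are not the figures from the study; the two department names ("Easy" and "Hard") and the helper functions are made up purely for illustration.

```python
# Hypothetical counts, NOT the study's figures.
# Each entry is (applicants, admitted).
applications = {
    "Easy": {"men": (80, 48), "women": (20, 14)},
    "Hard": {"men": (20, 4), "women": (80, 24)},
}


def rate(applied, admitted):
    """Success rate as a fraction between 0 and 1."""
    return admitted / applied


def overall_rate(data, group):
    """Pool every department together, then compute one success rate."""
    applied = sum(dept[group][0] for dept in data.values())
    admitted = sum(dept[group][1] for dept in data.values())
    return rate(applied, admitted)


def department_rates(data, group):
    """Success rate for the group within each department separately."""
    return {name: rate(*dept[group]) for name, dept in data.items()}


for group in ("men", "women"):
    print(group, "overall:", f"{overall_rate(applications, group):.0%}")
    for name, r in department_rates(applications, group).items():
        print(f"  {name}: {r:.0%}")
```

With these made-up counts, men come out ahead overall (52% against 38%) even though women have the higher success rate in both departments (70% against 60%, and 30% against 20%).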

So what's going on here?

[Table: Success Rates with applicant numbers]

The first thing one needs to do is to move away from the percentages and look at the underlying numbers. Department III shows that a much higher proportion of women than men applied to a department with a relatively low success rate. Conversely, Department I had a high proportion of men apply with a relatively good success rate, but very few women applied there despite their very high success rate.

The overall conclusion was that women applied in larger proportions to the departments where it was difficult to get in, and in smaller proportions to the departments where it was easy to get in. There was, it seems, no departmental bias, just a difference in where men and women applied.
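One way to see why the application mix matters is to write a group's overall success rate as a weighted average of its departmental rates. The notation here is our own, introduced for illustration: for group g, n_{g,d} is the number of applicants to department d, N_g is the group's total number of applicants, and r_{g,d} is the group's success rate in department d. The worked figures use hypothetical numbers, not the study's.

```latex
R_g = \sum_{d} \frac{n_{g,d}}{N_g} \, r_{g,d}

% Hypothetical two-department illustration (an easy and a hard department):
R_{\text{women}} = 0.2 \times 0.70 + 0.8 \times 0.30 = 0.38
R_{\text{men}}   = 0.8 \times 0.60 + 0.2 \times 0.20 = 0.52
```

In this illustration women have the higher rate in both departments, but because 80% of their applications go to the hard department, the low rate there dominates their overall figure.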

Essential tips to avoid Simpson's paradox

  1. Try to understand the base data (the numbers), i.e. avoid relying solely on percentages.
  2. Do not be easily swayed into concluding what you (or your boss) want to see in the numbers (motivated reasoning). Instead, conduct the full analytical exercise, and blind or double-blind your analysis if you can.
  3. Read up on as many statistical paradoxes as you can. Awareness of the statistical pitfalls will better prepare you to avoid them in your analysis.
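As a rough way of putting the first tip into practice, the sketch below compares the aggregate ordering of two groups with the ordering inside each subgroup and lists the subgroups that disagree with the headline figure. The function name, the data layout and the counts are all hypothetical, chosen for illustration only.

```python
def success_rate(applied, admitted):
    """Success rate as a fraction between 0 and 1."""
    return admitted / applied


def contradicting_subgroups(counts, group_a, group_b):
    """Return the subgroups where the a-versus-b ordering is the
    opposite of the ordering seen in the pooled (aggregate) data."""
    # Pool applicants and admissions across all subgroups.
    total = {g: [0, 0] for g in (group_a, group_b)}
    for sub in counts.values():
        for g in (group_a, group_b):
            total[g][0] += sub[g][0]
            total[g][1] += sub[g][1]
    overall_gap = success_rate(*total[group_a]) - success_rate(*total[group_b])

    flagged = []
    for name, sub in counts.items():
        gap = success_rate(*sub[group_a]) - success_rate(*sub[group_b])
        # Opposite signs mean the subgroup contradicts the aggregate.
        if gap * overall_gap < 0:
            flagged.append(name)
    return flagged


# Hypothetical counts, NOT real admissions data: (applicants, admitted).
counts = {
    "Easy": {"men": (80, 48), "women": (20, 14)},
    "Hard": {"men": (20, 4), "women": (80, 24)},
}
print(contradicting_subgroups(counts, "men", "women"))
```

If every subgroup is flagged, as happens with these made-up counts, the aggregate comparison is probably being driven by how the groups are spread across subgroups rather than by how each subgroup treats them.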

Thomas Maydon
Thomas Maydon is the Head of Credit Solutions at Principa. With over 13 years of experience in the Southern African, West African and Middle Eastern retail credit markets, Tom has primarily been involved in consulting, analytics, credit bureau and predictive modelling services. He has experience in all aspects of the credit life cycle (in multiple industries) including intelligent prospecting, originations, strategy simulation, affordability analysis, behavioural modelling, pricing analysis, collections processes, and provisions (including Basel II) and profitability calculations.
