The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.

Eight Questions On The Gini-Coefficient

September 25, 2019 at 8:48 AM

Whether you’ve been involved in introducing models into your business or have had a passing interest in economic affairs, you may have come across the term “Gini-coefficient”. This blog hopes to demystify the concept and give you a good deal of information on the statistical measurement. We answer:

  1. What is the Gini coefficient?
  2. How is it applied in economics?
  3. How did it come about?
  4. What does the Gini mean with reference to a scorecard?
  5. Is it a good measure of a scorecard's strength?
  6. What other scorecard performance measurements are there?
  7. What is a good Gini?
  8. How is a Gini calculated?

1. What is the Gini coefficient?

The Gini coefficient is a measure of statistical dispersion. This simply means it measures the separation of two populations. In economics this might be the rich and the poor; in modelling it may be goods and bads. It is measured on a scale from 0 to 100 (strictly speaking, from -100 to 100). 0 means that there is no dispersion, or that there is “equality”. 100 means that there is perfect dispersion, or that there is complete “inequality”.

2. How is it applied in economics?

[Figures: Gini map; Gini graph]

The Gini has multiple uses, but arguably the most common is as a measure of a country’s income distribution. Here the economist is interested in what portion of the country’s income is earned by what portion of the population. The economist ranks the population from the largest to the smallest earners, then measures the portion of the total income earned by each individual. We may find that the richest 10% earn 25% of the total income of the country, the richest 20% earn 42%, and the richest 30% earn 57% (etc.). This can all be plotted on a graph, with proportion of income on one axis and proportion of population on the other. If you calculate the area between the orange and grey lines (and multiply it by 2) you get the Gini coefficient. South Africa infamously has the highest Gini coefficient of income disparity (as illustrated by the map).
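As a rough illustration of the calculation described above, the Python sketch below ranks a list of incomes from richest to poorest, builds the cumulative income curve, and doubles the area between that curve and the line of equality. The function name `income_gini` and the sample incomes are invented for illustration and are not real data.

```python
def income_gini(incomes):
    """Gini coefficient (0-100) from a list of individual incomes."""
    ranked = sorted(incomes, reverse=True)  # richest first, as in the text
    n = len(ranked)
    total = sum(ranked)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive income")

    area = 0.0      # area under the cumulative income curve
    cum_prev = 0.0  # cumulative share of income before this person
    running = 0.0   # running sum of income
    for income in ranked:
        running += income
        cum = running / total
        # Each person is a strip of width 1/n; use the trapezium rule.
        area += (cum_prev + cum) / 2 / n
        cum_prev = cum

    # The line of equality has an area of 0.5 beneath it.
    return 100 * 2 * (area - 0.5)


equal = income_gini([10, 10, 10, 10])      # everyone earns the same
one_earner = income_gini([100, 0, 0, 0])   # one person earns everything
```

With only four people, the "one earner" case gives 75 rather than 100, because the single earner is still a quarter of the population; the figure approaches 100 as the population grows.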

3. How did it come about?

You’ll see that it is sometimes written as “GINI coefficient” and sometimes as “Gini-coefficient”. Only the latter is correct: it is not an acronym but is named after Corrado Gini, the sociologist who invented it in 1912.

4. What does the Gini mean with reference to a scorecard?

The Gini represents the scorecard’s ability to differentiate between good and bad. Ideally, the goods and the bads form two separate score distributions. A large Gini means a larger separation of goods and bads, and vice versa.
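The post does not give a formula at this point. One common way to put a number on this separation, assumed here, is to compare every good with every bad and count how often the good has the higher score. The sketch below does exactly that; the function name `scorecard_gini`, the sample scores, and the convention that a higher score means lower risk are all assumptions made for illustration.

```python
def scorecard_gini(scores, is_bad):
    """Gini (-100 to 100), assuming a higher score means lower risk."""
    goods = [s for s, bad in zip(scores, is_bad) if not bad]
    bads = [s for s, bad in zip(scores, is_bad) if bad]
    if not goods or not bads:
        raise ValueError("need at least one good and one bad")

    concordant = discordant = 0
    for g in goods:
        for b in bads:
            if g > b:
                concordant += 1   # good outscored the bad: correct ordering
            elif g < b:
                discordant += 1   # bad outscored the good: wrong ordering
            # ties count towards neither side

    pairs = len(goods) * len(bads)
    return 100 * (concordant - discordant) / pairs


# Hypothetical scores: the two distributions overlap slightly.
example = scorecard_gini([700, 650, 600, 550], [False, True, False, True])
```

Two distributions that do not overlap at all give 100, complete overlap gives roughly 0, and a scorecard that ranks bads above goods gives a negative value.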


5. Is it a good measure of a scorecard’s strength?

Yes and no! The Gini represents the ability of the scorecard to separate goods from bads. But the population is often split into more than just a binary set of goods and bads. For example, when reviewing an application population, some applicants may well have been rejected, so we normally build the model utilising reject inference too. Analysts often forget, when monitoring scorecard performance, that the development data was subject to reject inference and the monitoring sample was not, so the scorecard may appear weaker in monitoring. You can get around this by calculating the Gini of the development sample from the cut-off upwards and comparing this to the monitoring sample.
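A minimal sketch of that like-for-like comparison follows. It assumes a pairwise Gini calculation and a cut-off of 600. The `(score, is_bad)` records are invented purely to show the effect, and the helper names are hypothetical.

```python
def gini(records):
    """Pairwise Gini (-100 to 100) for (score, is_bad) records.

    Assumes a higher score means lower risk.
    """
    goods = [s for s, bad in records if not bad]
    bads = [s for s, bad in records if bad]
    if not goods or not bads:
        return None  # undefined without both goods and bads
    net = sum((g > b) - (g < b) for g in goods for b in bads)
    return 100 * net / (len(goods) * len(bads))


def gini_above_cutoff(records, cutoff):
    """Gini restricted to records scoring at or above the cut-off."""
    return gini([(s, bad) for s, bad in records if s >= cutoff])


# Hypothetical development sample, including inferred rejects below 600.
development = [(500, True), (520, True), (540, True), (560, True),
               (580, False), (610, True), (630, False), (650, True),
               (670, False), (690, True), (710, False), (730, False)]
# Hypothetical monitoring sample: only accepted accounts have outcomes.
monitoring = [(605, True), (625, False), (640, True), (660, False),
              (685, True), (700, False), (720, False)]

full_dev = gini(development)                             # about 65.7
like_for_like_dev = gini_above_cutoff(development, 600)  # 50.0
monitored = gini(monitoring)                             # 50.0
```

In this made-up data the monitoring Gini looks well below the full development Gini, yet it matches the development Gini once both are measured from the cut-off upwards.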

Confounded metric

The Gini may often give a conflicting view of what is actually going on. It may be worth looking at an extreme example. If you had a perfect model where you rejected 100% of the bad population, your Gini calculation would be 0 (you would have no bads with which to calculate your Gini). However, your model is likely very good, and therefore 0 cannot be the true Gini. While this is extreme, the notion still applies to scores, particularly where a large share of the population is rejected.

A similar view can be taken with behaviour and collection scores. Aggressive action on one segment (e.g. marketing heavily to low-risk customers) may push those customers to behave more like medium-risk customers than they would have without the marketing. Similarly, aggressively collecting on high-risk collections customers may make them behave more like medium-risk customers. In both cases these behavioural scorecards may appear to have flat Ginis when in fact they may well be okay.

In the interest of not throwing the baby out with the bathwater, the Gini – if used correctly by an analyst aware of some of the points listed above – can provide a decent metric of performance.

6. What other scorecard performance measurements are there?

Remember that a Gini reflects the scorecard’s ability to separate goods from bads across all score ranges. That’s a good thing to know if you are going to use the full score range for different purposes. However, many of our clients use the score for a single score cut-off, in which case the only important metric is how strong the scorecard is at the cut-off. There are many other metrics, but two useful ones are listed here:

Kolmogorov-Smirnov (KS) statistic: this measures how strong the discrimination is at the scorecard’s strongest point. This may be at a high score or a low score.

Scorecard discrimination (information value): if you have set up risk bands, you can calculate the information value (I’ve linked to a previous blog where this is explained).

Other worthwhile metrics include “R-squared” and divergence.
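The KS statistic and information value mentioned above can be sketched from banded counts as follows. These are the usual textbook forms, not necessarily the exact versions used in the linked blog. The `bands` layout (a list of `(goods, bads)` counts ordered from lowest to highest score), the function names, and the handling of empty cells are assumptions for illustration.

```python
import math


def ks_statistic(bands):
    """KS (0-100): largest gap between cumulative bad and good shares."""
    total_good = sum(g for g, _ in bands)
    total_bad = sum(b for _, b in bands)
    if total_good == 0 or total_bad == 0:
        raise ValueError("need at least one good and one bad")
    cum_good = cum_bad = 0
    best = 0.0
    for goods, bads in bands:
        cum_good += goods
        cum_bad += bads
        gap = abs(cum_bad / total_bad - cum_good / total_good)
        best = max(best, gap)  # keep the widest separation seen so far
    return 100 * best


def information_value(bands):
    """Information value summed over the risk bands."""
    total_good = sum(g for g, _ in bands)
    total_bad = sum(b for _, b in bands)
    if total_good == 0 or total_bad == 0:
        raise ValueError("need at least one good and one bad")
    iv = 0.0
    for goods, bads in bands:
        if goods == 0 or bads == 0:
            continue  # assumption: skip empty cells rather than smooth them
        pct_good = goods / total_good
        pct_bad = bads / total_bad
        iv += (pct_good - pct_bad) * math.log(pct_good / pct_bad)
    return iv


# Hypothetical risk bands, lowest scores first: (goods, bads).
bands = [(10, 40), (20, 30), (30, 20), (40, 10)]
ks = ks_statistic(bands)        # widest gap falls after the second band
iv = information_value(bands)
```

Unlike the Gini, the KS reports only the single widest gap, which is why it suits a scorecard used at one cut-off.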

7. What is a good Gini?

[Figure: Gini cumulative graph]

We are frequently asked what a good Gini is. The answer really depends on what you are measuring, on what population, and with what data. A rough benchmark is as follows:

  1. Application scorecards (demographic data) – Gini: 25-45 (dependent on portfolio)
  2. Behavioural scorecards (internal behavioural data) – Gini: 35-60 (dependent on portfolio)
  3. Collections scorecards (internal collections data) – Gini: 60-80
  4. Bureau scorecard calibration – Gini: 30-45 (dependent on portfolio)

8. How is a Gini calculated?

The Gini is the yellow area in the graph above, multiplied by 2. To calculate it, one needs to calculate the area of strips across the Lorenz curve. The spreadsheet and graph show a rough example.
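The strip calculation can be sketched in a few lines of Python. The original spreadsheet is not reproduced here, so the layout is an assumption: each score band contributes one strip, with the cumulative share of goods on the horizontal axis and the cumulative share of bads on the vertical axis. The function name and the band counts are hypothetical.

```python
def gini_from_bands(bands):
    """Gini (-100 to 100) from (goods, bads) counts per score band.

    Bands must be ordered from lowest to highest score.
    """
    total_good = sum(g for g, _ in bands)
    total_bad = sum(b for _, b in bands)
    if total_good == 0 or total_bad == 0:
        raise ValueError("need at least one good and one bad")

    area = 0.0
    x_prev = y_prev = 0.0
    cum_good = cum_bad = 0
    for goods, bads in bands:
        cum_good += goods
        cum_bad += bads
        x = cum_good / total_good  # cumulative share of goods
        y = cum_bad / total_bad    # cumulative share of bads
        # One strip per band: width times average height (trapezium rule).
        area += (x - x_prev) * (y + y_prev) / 2
        x_prev, y_prev = x, y

    # Subtract the 0.5 under the diagonal, then multiply by 2.
    return 100 * 2 * (area - 0.5)


# Hypothetical bands, lowest scores first: (goods, bads).
bands = [(10, 40), (20, 30), (30, 20), (40, 10)]
result = gini_from_bands(bands)
```

For these made-up bands the four strips have areas 0.02, 0.11, 0.24 and 0.38, which sum to 0.75, so the Gini is 2 × (0.75 − 0.5) = 50. Using more, narrower bands gives a closer approximation to the true curve.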


Thomas Maydon
Thomas Maydon is the Head of Credit Solutions at Principa. With over 17 years of experience in the Southern African, West African and Middle Eastern retail credit markets, Tom has primarily been involved in consulting, analytics, credit bureau and predictive modelling services. He has experience in all aspects of the credit life cycle (in multiple industries) including intelligent prospecting, originations, strategy simulation, affordability analysis, behavioural modelling, pricing analysis, collections processes, and provisions (including Basel II) and profitability calculations.
