The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.


Eight Questions On The Gini-Coefficient

September 25, 2019 at 8:48 AM

Whether you’ve been involved in introducing models into your business or have had a passing interest in economic affairs, you may have come across the term “Gini-coefficient”. This blog hopes to demystify the concept and give you a good deal of information on the statistical measurement. We answer:

  1. What is the Gini coefficient?
  2. How is it applied in economics?
  3. How did it come about?
  4. What does the Gini mean with reference to a scorecard?
  5. Is it a good measure of a scorecard's strength?
  6. What other scorecard performance measurements are there?
  7. What is a good Gini?
  8. How is a Gini calculated?

1. What is the Gini coefficient?

The Gini coefficient is a measure of statistical dispersion. This simply means it measures the separation of two populations. In economics this might be the rich and the poor. In modelling this may be goods and bads. It is a measure from 0 to 100 (strictly speaking, from -100 to 100). 0 means that there is no dispersion, or that there is “equality”. 100 means that there is perfect dispersion, or that there is complete “inequality”.

2. How is it applied in economics?

The Gini has multiple uses, but arguably the most common use is as a measure of a country’s income distribution. Here the economist is interested in what portion of the income of the country is earned by what portion of the population. The economist ranks the population from the largest to the smallest earners, then measures the portion of the total income earned by each individual. We may find that the richest 10% earn 25% of the total income of the country, the richest 20% earn 42%, the richest 30% earn 57%, and so on. This can all be plotted on a graph, with proportion of income on one axis and proportion of population on the other. If you calculate the area between the orange and grey lines (and multiply it by 2) you get the Gini coefficient. South Africa infamously has the highest Gini coefficient of income disparity (as illustrated by the map).
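As a rough sketch of that calculation, the Python below ranks earners from richest to poorest, builds the cumulative share of income, and doubles the area between that curve and the line of equality. The function name `gini_from_incomes` and the income figures are our own hypothetical illustrations, not real data, and the area is approximated with simple trapezoid strips.

```python
def gini_from_incomes(incomes):
    """Gini on a 0-100 scale from individual incomes, ranked richest first."""
    ranked = sorted(incomes, reverse=True)
    total = sum(ranked)
    n = len(ranked)
    area = 0.0      # area under the cumulative income curve
    cum_prev = 0.0  # cumulative share of income so far
    for income in ranked:
        cum_next = cum_prev + income / total
        # Trapezoid strip: each person is 1/n of the population.
        area += (cum_prev + cum_next) / 2 * (1 / n)
        cum_prev = cum_next
    # Area between the curve and the diagonal (which has area 0.5), times 2.
    return 100 * 2 * (area - 0.5)


# Hypothetical incomes, for illustration only.
print(round(gini_from_incomes([120, 60, 30, 20, 10, 10]), 1))
```

With everyone earning the same amount the function returns 0. With a single person earning everything it approaches 100 as the population grows.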

3. How did it come about?

You’ll see that it is sometimes referenced as “GINI coefficient” and sometimes “Gini-coefficient”. Only the latter is correct: it is not an acronym, but is named after Corrado Gini, the sociologist who invented it in 1912.

4. What does the Gini mean with reference to a scorecard?

The Gini represents the scorecard’s ability to differentiate between good and bad. The goods and bads should form two separate score distributions. A large Gini means a larger separation of goods and bads, and vice versa.


5. Is it a good measure of a scorecard’s strength?

Yes and no! The Gini represents the ability of the scorecard to separate goods from bads. But the population is often split into more than just a binary set of goods and bads. For example, when reviewing an application population, some applicants may well have been rejected. We therefore normally build a model utilising reject inference too. Analysts monitoring scorecard performance often forget that the development data was subject to reject inference and the monitoring sample was not, so the scorecard may appear weaker in monitoring. You can get around this by calculating the Gini of the development sample from the cut-off upwards and comparing this to the monitoring sample.

Confounded metric

The Gini may often give a conflicting view of what is actually going on. It may be worth looking at an extreme example. If you had a perfect model where you rejected 100% of the bad population, your Gini calculation would be 0 (you would have no bads with which to calculate your Gini). However, your model is likely very good, and therefore 0 cannot be the true Gini. While this is extreme, the notion still applies to scores, particularly where a large share of the population is rejected.

A similar view applies to behaviour and collection scores. Aggressive treatment of low-risk customers (e.g. marketing heavily to them) may push them to behave more like medium-risk customers than they would have without the marketing. Similarly, aggressively collecting on high-risk collections customers may make them behave more like medium-risk customers. In both cases these behavioural scorecards may appear to have flat Ginis when in fact they may well be okay.

In the interest of not throwing the baby out with the bathwater: the Gini, if used correctly by an analyst aware of the points listed above, can provide a decent metric of performance.

6. What other scorecard performance measurements are there?

Remember that a Gini reflects the scorecard’s ability to separate goods from bads across all score ranges. That’s a good thing to know if you are going to use the full score range for different purposes. However, many of our clients use the score with a single cut-off, in which case the only important metric is how strong the scorecard is at that cut-off. There are many other metrics, but two useful ones are listed here:

Kolmogorov-Smirnov (KS) statistic: this measures how strong the discrimination is at the scorecard’s strongest point. This may be a high score or a low score.

Scorecard discrimination (information value): if you have set up risk bands, you can calculate the information value (I’ve linked to a previous blog where this is explained).
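To make these two metrics concrete, here is a minimal Python sketch that computes both from the same banded data. It assumes the commonly used definitions: KS as the largest gap between the cumulative percentage of bads and the cumulative percentage of goods, and information value as the sum over bands of (% goods - % bads) × ln(% goods / % bads). The function name `ks_and_iv` and the band counts are hypothetical, and the sketch assumes every band contains at least one good and one bad.

```python
import math


def ks_and_iv(bands):
    """bands: list of (goods, bads) counts per risk band, lowest score first.

    Returns (KS on a 0-100 scale, information value).
    """
    total_goods = sum(goods for goods, _ in bands)
    total_bads = sum(bads for _, bads in bands)
    cum_goods = cum_bads = 0.0
    ks = 0.0
    iv = 0.0
    for goods, bads in bands:
        pct_goods = goods / total_goods
        pct_bads = bads / total_bads
        cum_goods += pct_goods
        cum_bads += pct_bads
        # KS: widest gap between the two cumulative distributions.
        ks = max(ks, abs(cum_bads - cum_goods))
        # IV: contribution of this band (needs non-zero goods and bads).
        iv += (pct_goods - pct_bads) * math.log(pct_goods / pct_bads)
    return 100 * ks, iv


# Hypothetical risk bands: (goods, bads), lowest score band first.
ks, iv = ks_and_iv([(10, 40), (30, 30), (60, 20), (100, 10)])
print(round(ks, 1), round(iv, 3))
```

In practice, bands with no goods or no bads would need to be merged with a neighbour (or otherwise handled) before the logarithm can be taken.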

Other worthwhile metrics include “R-squared” and divergence.

7. What is a good Gini?

We frequently get asked what a good Gini is. The answer really depends on what you are measuring, on what population, and with what data. A rough benchmark is as follows:

  1. Application scorecards (demographic data) – Gini: 25-45 (dependent on portfolio)
  2. Behavioural scorecards (internal behavioural data) – Gini: 35-60 (dependent on portfolio)
  3. Collections scorecards (internal collections data) – Gini: 60-80
  4. Bureau scorecard calibration – Gini: 30-45 (dependent on portfolio)

8. How is a Gini calculated?

The Gini is the yellow area in the graph above, multiplied by 2. To calculate it, one needs to calculate the area of strips across the Lorenz curve and add them up. The spreadsheet and graph show a rough example.
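The strip calculation can also be sketched in a few lines of Python. This is an illustration under our own assumptions rather than a copy of the spreadsheet: the curve plots cumulative % of bads against cumulative % of goods with score bands ordered from lowest to highest score, each strip is treated as a trapezoid, and the function name `scorecard_gini` and the band counts are hypothetical.

```python
def scorecard_gini(bands):
    """bands: list of (goods, bads) counts per score band, lowest score first.

    Returns the Gini on a -100 to 100 scale.
    """
    total_goods = sum(goods for goods, _ in bands)
    total_bads = sum(bads for _, bads in bands)
    cum_goods = cum_bads = 0.0
    area = 0.0  # area under the curve of cumulative bads vs cumulative goods
    for goods, bads in bands:
        next_goods = cum_goods + goods / total_goods
        next_bads = cum_bads + bads / total_bads
        # One strip: width is the step in goods, height is the average
        # of the cumulative bads at the strip's two edges.
        area += (next_goods - cum_goods) * (cum_bads + next_bads) / 2
        cum_goods, cum_bads = next_goods, next_bads
    # Twice the area between the curve and the diagonal: 2 * (area - 0.5).
    return 100 * (2 * area - 1)


# Hypothetical score bands: (goods, bads), lowest score band first.
print(round(scorecard_gini([(10, 40), (30, 30), (60, 20), (100, 10)]), 1))
```

A scorecard that puts every bad below every good returns 100, one with the same bad rate in every band returns 0, and one that ranks customers the wrong way round goes negative, which is where the -100 end of the range comes from.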


For more information on Principa’s modelling and data analytics services – reach out to us here.


Thomas Maydon
Thomas Maydon is the Head of Credit Solutions at Principa. With over 17 years of experience in the Southern African, West African and Middle Eastern retail credit markets, Tom has primarily been involved in consulting, analytics, credit bureau and predictive modelling services. He has experience in all aspects of the credit life cycle (in multiple industries) including intelligent prospecting, originations, strategy simulation, affordability analysis, behavioural modelling, pricing analysis, collections processes, and provisions (including Basel II) and profitability calculations.
