The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.

Will Your Machine Learning Models Pass The William Tell Test?

August 14, 2018 at 8:34 AM

Machine learning models can accurately predict outcomes across many different use cases. These predictions can be used within the business to make better decisions or to operate more efficiently (or both) and can give you an edge over your competitors. Predictive models all follow the same recipe – i.e. train a model on historical data and then apply this model to unseen data to get predictions. If your model generalises well, you have a prediction that you can trust and use to decide "do this, not that" with some degree of accuracy.

Machine learning in financial services

In the financial services sector the most common requirement is to predict a binary outcome – i.e. a yes/no, True/False or 1/0 outcome. Some examples include answers to these typical questions:

  • Shall we grant this applicant a loan?
  • Will this customer pay back their facility?
  • Will this customer attrite and move to a competitor?
  • Will the right customer answer this call?
  • Will this customer take up this new product?

There are many techniques that can provide predictions of varying accuracy – e.g. logistic regression, support vector machines and neural nets. Principa has tried various techniques over time and we are seeing good results with the gradient boosted algorithm approach. This is a machine learning algorithm that is often the winning algorithm on open competition websites. It has numerous internal parameters that can be configured to fine-tune your model, and it is fast, especially in the Python libraries 'XGBoost' and 'LightGBM'. Their speed during the training phase allows you to run more experiments in the time you have available, giving you a better chance of finding the optimal tuning parameters.



XGBoost is happiest when the positive (e.g. responders) and negative (e.g. non-responders) classes are well balanced – i.e. you have around a 50% response rate. However, in our experience, this very seldom occurs. Take modelling fraud for example – there are generally very few positive cases (fraudsters) on which to model, and this is often referred to as rare event modelling. This is quite an extreme case, but we also struggle with imbalanced classes in response modelling or in predicting a right-party-connect (RPC), where the RPC rate is only around 1%. One can force balance in the algorithm by tuning the scale_pos_weight parameter. This will give you a good model that separates the positive and negative classes quite nicely, but the problem with this approach is that the resulting probability is going to be scaled incorrectly. So the records with RPC scores in the 1-2% range are not going to show an actual RPC rate of around 1.5%; it will be something quite different. This is fine if you only want the model to help select the top records – i.e. you want the best 1,000 records out of a possible 10,000.
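To get a feel for how far the scores can drift, here is a minimal sketch in plain Python. It assumes the idealised case where a log-loss model with positives weighted by w ends up multiplying the true odds by w; a real boosted model will only approximate this. The weight of 30 and both function names are illustrative assumptions, not taken from a real project.

```python
def reweighted_probability(p, w):
    """Score an idealised log-loss model tends towards when positives carry weight w.

    Weighting positives by w multiplies the odds p / (1 - p) by w.
    """
    return w * p / (w * p + (1.0 - p))


def observed_rate_by_band(scores, outcomes, edges):
    """For each score band, return (low, high, count, mean score, observed rate)."""
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        band = [(s, y) for s, y in zip(scores, outcomes) if low <= s < high]
        if not band:
            continue  # skip empty bands rather than divide by zero
        mean_score = sum(s for s, _ in band) / len(band)
        observed = sum(y for _, y in band) / len(band)
        rows.append((low, high, len(band), mean_score, observed))
    return rows


# Hypothetical weight of 30 applied to a few low true RPC rates.
for true_rate in (0.005, 0.01, 0.02):
    inflated = reweighted_probability(true_rate, w=30)
    print(f"true rate {true_rate:.1%} -> weighted-model score {inflated:.1%}")
```

Under these assumptions a true RPC rate of 1% comes out as a score of roughly 23%. The ranking of records is unchanged, which is why the approach still works for picking the top 1,000. The second function is the kind of check you could run on a holdout sample: if the mean score and the observed rate in a band disagree, the scores cannot be read as probabilities.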

However, if your business strategy relies on the pin-point accuracy of your model's predictions, then this approach is not going to work for you. Fortunately, XGBoost has many parameters to choose from that can be used to fine-tune the underlying algorithm. One of these, the max_delta_step parameter, can be used to great effect to give accurate point predictions when the target variable is imbalanced. We can show the impact of this in the views below, using a right party connect (RPC) use case as the target that we want to predict. The first view shows a good model built by tuning the scale_pos_weight parameter – the Gini coefficient for this model is a healthy 68.8%. But notice how poor its prediction accuracy is (the blue line does not follow the perfect or unicorn model's green line in the second graph). When we tune the max_delta_step parameter, the model still separates the two classes nicely (with a Gini coefficient of 68.5%, which is very close to the original model) AND gives good overall point prediction. We have seen real-world success following this approach on a few use cases now. If you would like skillful and reliable models that give you accurate predictions, contact us.
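As a rough sketch of the two set-ups compared above, the parameter dictionaries below differ only in the setting discussed. The values 30 and 1 echo the chart titles; everything else (eta, max_depth, the number of rounds, and the helper names) is an assumption for illustration and not the configuration behind the charts. The training helper needs the xgboost package, which is only imported when the helper is called.

```python
# Shared settings; illustrative values, not those used for the charts.
BASE_PARAMS = {
    "objective": "binary:logistic",  # model outputs a probability
    "eval_metric": "logloss",
    "eta": 0.1,
    "max_depth": 4,
}

# Option 1: force class balance by up-weighting the positive class.
balanced_params = {**BASE_PARAMS, "scale_pos_weight": 30}

# Option 2: leave the classes unweighted and cap each tree's weight update.
capped_params = {**BASE_PARAMS, "max_delta_step": 1}


def train_rpc_model(features, labels, params, num_rounds=200):
    """Train a booster with the given parameters (requires xgboost)."""
    import xgboost as xgb  # imported lazily so the dictionaries work without it

    dtrain = xgb.DMatrix(features, label=labels)
    return xgb.train(params, dtrain, num_boost_round=num_rounds)


def gini(scores, labels):
    """Gini coefficient as 2 * AUC - 1, with AUC computed by pairwise comparison."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return 2 * auc - 1
```

One way to use this is to train both variants on the same data, score a holdout sample with each, and compare the Gini coefficients alongside a check of predicted versus observed rates. The pairwise Gini here is quadratic in the sample size, so it suits a quick check on a sample rather than a full portfolio.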

Scale Pos Weight 30-1

Max Delta Step 1

Robin Davies
Robin Davies is the Head of Product Development at Principa. Robin’s team packages complex concepts into easy-to-use products that help our clients to lift their business in often unexpected ways.
