The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.


What Is R And What Have We Learned Since Working With It?

November 10, 2016 at 3:56 PM

Hands up who has not heard of R? If you are in the data analytics space and have an internet connection, then you will have heard of the open source programming language for predictive analytics and statistical computing that has taken the analytics world by storm.

Like most things, R took time to reach critical mass, and I would say that it has very much reached that point. It was first released as open source back in 1995, with the first stable version (1.0.0) released in 2000. We had heard about R in various contexts before, but there was no specific requirement to start using the tool in anger, or so we thought.

Enter Machine Learning 

All of our predictive modelling was done using proprietary tools, which were giving us good results. Unfortunately, the predictive models that we were building were offline and static in nature and took some time to develop. Enter Machine Learning.

A ‘proper’ Machine Learning system is dynamic by nature and requires the models to track recent trends in the data. I say ‘proper’ as static modelling can be considered one form of Machine Learning (like predicting the survivors of the Titanic on the data science website Kaggle). One therefore cannot hand-craft models for which regular retraining is a requirement; it is just too onerous. So, one needs to work with a tool that can build predictive models more quickly. Sure, you might lose some predictive power by not binning characteristics in the optimal way or taking extra care with missing values, but what you lose by cutting back on the TLC you gain by retraining on more recent data.

This is particularly relevant in dynamic environments like a call centre, where agents come and go at an alarming rate, diallers change, and the underlying data changes at a fundamental level. By the way, we now have some great tricks using R that dramatically narrow the gap between “quick-and-dirty” and “hand-crafted”, but more on that in another blog post.
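To make the retraining trade-off concrete, here is a minimal sketch in Python (chosen purely for illustration; the same idea applies in R). Everything in it is an assumption made up for the example: the drifting daily success rates, the five-day training window, and the deliberately crude "model", which is nothing more than the average success rate of its training window. It compares a model trained once with one that is refitted every day on the most recent data.

```python
from statistics import mean


def fit_rate(outcomes):
    """'Train' a deliberately crude model: the mean rate over the window."""
    return mean(outcomes)


def mean_abs_error(predictions, actuals):
    """Average absolute gap between predicted and actual rates."""
    return mean(abs(p - a) for p, a in zip(predictions, actuals))


# Hypothetical daily success rates for a campaign that drifts steadily
# downwards (for example as agents leave and diallers change).
daily_rate = [0.30 - 0.01 * day for day in range(20)]

WINDOW = 5  # days of history each model is trained on (an assumption)

# Static model: trained once on the first WINDOW days and never refreshed.
static_model = fit_rate(daily_rate[:WINDOW])

static_preds, retrained_preds, actuals = [], [], []
for day in range(WINDOW, len(daily_rate)):
    static_preds.append(static_model)
    # Retrained model: refitted each day on the most recent WINDOW days.
    retrained_preds.append(fit_rate(daily_rate[day - WINDOW:day]))
    actuals.append(daily_rate[day])

static_error = mean_abs_error(static_preds, actuals)
retrained_error = mean_abs_error(retrained_preds, actuals)

print(f"static model error:    {static_error:.3f}")
print(f"retrained model error: {retrained_error:.3f}")
```

Neither model is carefully built, but on this made-up data the retrained one stays close to the drifting target while the static one falls further behind each day.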

There are many Machine Learning tools out there that can do a good job. But they all cost ‘quite a bit’ and, in this fast-changing space, you just cannot be sure whether your carefully selected (and expensive) tool will be top of the pile in a year’s time. Plus, there is a requirement to up-skill with that tool, and that takes additional time. One thing is sure, though: Microsoft will be around for some time.

What does Microsoft have to do with R?

Quite a bit, actually. In April 2015, Microsoft took the most amazing leap forward and purchased Revolution Analytics. Revolution Analytics were the ones you contacted if you wanted to integrate R into your business, and they were doing a pretty good job. Let’s just say they knew R pretty well.

In purchasing Revolution Analytics, Microsoft bought the IP that would allow them to incorporate R into all their mainstream products, which they are wasting no time in doing, and we are loving them for it. Take Power BI, Microsoft’s BI solution, as an example. It’s dirt cheap (for now) and Microsoft are taking the BI world by storm by investing millions into its development and upgrading it aggressively in line with user feedback. At the time of writing, it is in the most favourable position in Gartner’s Magic Quadrant for BI tools. R scripting is available on both the back end (data load) and the front end (user interface).

On the back-end side, this means that you can manipulate data using the sqldf package, which runs SQL against your data using SQLite by default. If you know SQL, you will LOVE this. You can join tables, create new fields, and manipulate tables to your heart’s content. Very few BI tools have this capability (QlikView being the exception, and this is one feature that I love about QlikView). Basically, by and large, whatever works in native R works in Power BI. Brilliant!
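To give a flavour of the kind of SQL this opens up, here is a minimal sketch of joining two tables and creating new fields. It is written in Python against SQLite directly, purely for illustration. The `agents` and `calls` tables, their columns, and their values are all made up for the example. In R, you would pass a SELECT statement like this one to `sqldf()`, with data frames of the same names standing in for the tables.

```python
import sqlite3

# In-memory SQLite database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agents (agent_id INTEGER, team TEXT);
    CREATE TABLE calls  (agent_id INTEGER, duration_sec INTEGER, sale INTEGER);
    INSERT INTO agents VALUES (1, 'Inbound'), (2, 'Outbound');
    INSERT INTO calls  VALUES (1, 120, 1), (1, 60, 0), (2, 300, 1);
""")

# Join the tables and derive new fields: call count, talk time in minutes,
# and conversion rate per team.
query = """
    SELECT a.team,
           COUNT(*)                   AS n_calls,
           SUM(c.duration_sec) / 60.0 AS talk_minutes,
           AVG(c.sale)                AS conversion_rate
    FROM calls AS c
    JOIN agents AS a ON a.agent_id = c.agent_id
    GROUP BY a.team
    ORDER BY a.team
"""
rows = conn.execute(query).fetchall()
conn.close()

for team, n_calls, talk_minutes, conversion_rate in rows:
    print(team, n_calls, talk_minutes, conversion_rate)
```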

On the front-end side, things get interesting. Again, pretty much anything you can do in R, you can do in Power BI. This throws the door wide open in ways you may not have realised. Here is a link showing just some of the visuals you can achieve using R (note: using R and not Power BI’s built-in functions).

Check out this link for a how-to guide.

What about SQL Server 2016?

And then there is SQL Server 2016. Dear, dear SQL Server 2016, so happy you arrived. Traditionally, R has been better suited to research and small-scale use cases, because it works on data held in memory and so struggles to process and model big data efficiently.

Some clever people have developed pretty cutting-edge R libraries that compete with the big hitters like SAS, but the limitation has always been the data size. Bringing R into SQL Server 2016 addresses this issue. Retraining using any of the powerful R libraries just got a whole lot quicker. Here is a case study from the Microsoft blog that illustrates this nicely and contains a pretty convincing quote: “PROS Holdings uses SQL Server 2016’s superior performance and built-in R Service to deliver advanced analytics more than 100x faster than before, resulting in higher profits for their customers”.

Here is a great link showing why R and SQL are a match made in heaven (in particular around the 2m30s mark). 

R does not only cover descriptive, predictive and prescriptive data analytics. There are over 7,000 packages available that make this tool extremely versatile, from image manipulation to heat maps to connecting to almost any type of data source, such as Salesforce.

We started off by asking: “I wonder if there is an R package for that?” but this has become a running rhetorical question. RStudio have even created a web application framework, Shiny, that allows you to build a very attractive UI around your R code and showcase the resulting product to the outside world. Check out their gallery.

So we are pretty excited about everything that R can bring to our table, and we’d love to put our skills and passion for R to work for your business.

If you’d like us to use our R skills to develop models that can predict outcomes for your business and answer business-critical questions, just drop us a line!



Robin Davies
Robin Davies was the Head of Product Development at Principa for many years during which Robin’s team packaged complex concepts into easy-to-use products that help our clients to lift their business in often unexpected ways. Robin is currently the Head of Machine Learning at a prestigious firm in the UK.
