The Data Analytics Blog

What Is R And What Have We Learned Since Working With It?

November 10, 2016 at 3:56 PM

Hands up who has not heard of R? If you work in data analytics and have an internet connection, you will have heard of the open source programming language for predictive analytics and statistical computing that has taken the analytics world by storm.

Like most things, R took time to reach critical mass, and I would say it has very much reached that point. It was first released back in 1995, with the first stable version (1.0.0) following in 2000. We had heard about R in various contexts before, but there was no specific requirement to start using the tool in anger, or so we thought.

Enter Machine Learning 

All of our predictive modelling was done using proprietary tools, which were giving us good results. Unfortunately, the predictive models we were building were offline and static, and they took some time to develop.

A “proper” Machine Learning system is dynamic by nature and requires its models to track recent trends in the data. I say “proper” because static modelling can be considered one form of Machine Learning (like predicting the survivors of the Titanic on the data science website Kaggle). When regular retraining is a requirement, hand-crafting models is simply too onerous, so one needs a tool that can build predictive models more quickly. Sure, you might lose some predictive power by not binning characteristics in the optimal way or taking extra care with missing values, but what you lose by cutting back on the TLC you gain by retraining on more recent data.
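To make the retraining trade-off concrete, here is a minimal sketch in base R. Everything in it is an assumption made for illustration: the simulated `history` data, the `retrain` helper, and the three-month window are hypothetical and are not the models or data we use in practice. It simply refits a quick logistic regression on the most recent months and compares it with a model built once on older data.

```r
set.seed(42)

# Hypothetical data: 12 months of outcomes in which the relationship between
# the predictor x and the outcome y flips halfway through the year
n_per_month <- 200
months <- rep(1:12, each = n_per_month)
x <- rnorm(length(months))
slope <- ifelse(months <= 6, 1.5, -1.5)
y <- rbinom(length(months), size = 1, prob = plogis(slope * x))
history <- data.frame(month = months, x = x, y = y)

# Hypothetical helper: refit a plain logistic regression using only the
# most recent `window` months up to and including `current_month`
retrain <- function(data, current_month, window = 3) {
  recent <- data[data$month > current_month - window &
                   data$month <= current_month, ]
  glm(y ~ x, data = recent, family = binomial())
}

static_model  <- retrain(history, current_month = 3)   # built once, early on
current_model <- retrain(history, current_month = 12)  # refreshed on recent data

# The two fitted slopes have opposite signs, because the static model
# is still describing behaviour that no longer holds
print(coef(static_model)["x"])
print(coef(current_model)["x"])
```

The refreshed model is no more carefully built than the static one; it is just trained on data that still reflects current behaviour, which is the point of the trade-off described above.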

This is particularly relevant in dynamic environments like a call centre, where agents come and go at an alarming rate, diallers change, and the underlying data changes at a fundamental level. By the way, we now have some great tricks that dramatically narrow the gap between “quick-and-dirty” and “hand-crafted” using R, but more on that in another blog post.

There are many Machine Learning tools out there that can do a good job. But they all cost “quite a bit”, and in this fast-changing space you cannot be sure that your carefully selected (and expensive) tool will still be top of the pile in a year’s time. Plus, you need to up-skill on that tool, which takes additional time. One thing is sure though: Microsoft will be around for some time.

What does Microsoft have to do with R?

Quite a bit, actually. In April 2015, Microsoft took the most amazing leap forward and purchased Revolution Analytics. Revolution Analytics were the people you contacted if you wanted to integrate R into your business, and they were doing a pretty good job. Let’s just say they knew R pretty well.

In purchasing Revolution Analytics, Microsoft bought the IP that would allow them to incorporate R into all their mainstream products, which they are wasting no time in doing, and we are loving them for it. Take Power BI, Microsoft’s BI solution, as an example. It’s dirt cheap (for now), and Microsoft is taking the BI world by storm by investing millions in its development and upgrading it aggressively in line with user feedback. At the time of writing, it holds the most favourable position in Gartner’s Magic Quadrant for BI tools. R is available on both the back end (data load) and the front end (user interface).

On the back-end side, this means that you can manipulate data using the sqldf package, which runs SQL against R data frames using SQLite by default. If you know SQL, you will LOVE this. You can join tables, create new fields, and manipulate tables to your heart’s content. Very few BI tools have this capability (QlikView being the exception, and this is one feature that I love about QlikView). Basically, whatever works in native R works in Power BI. Brilliant!
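As a rough illustration of the kind of manipulation described above, the sketch below joins two data frames and derives new fields with plain SQL. It assumes the sqldf package is installed; the `agents` and `calls` tables and their columns are hypothetical and invented purely for this example.

```r
# Assumes the sqldf package is installed: install.packages("sqldf")
library(sqldf)

# Hypothetical tables, invented for illustration only
agents <- data.frame(
  agent_id = c(1, 2, 3),
  team     = c("A", "A", "B")
)
calls <- data.frame(
  agent_id = c(1, 1, 2, 3, 3, 3),
  duration = c(120, 300, 60, 45, 200, 90)
)

# Join the two data frames and create summary fields using ordinary SQL
summary_by_team <- sqldf("
  SELECT a.team,
         COUNT(*)        AS n_calls,
         AVG(c.duration) AS avg_duration
  FROM calls c
  JOIN agents a ON a.agent_id = c.agent_id
  GROUP BY a.team
  ORDER BY a.team
")
print(summary_by_team)
```

The result is an ordinary R data frame, so it can be passed straight on to any other R code.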

On the front-end side, things get interesting. Again, anything you can do in R, you can do in Power BI. This throws the door wide open in ways you may not have realised. Here is a link showing just some of the visuals you can achieve using R (note: using R and not Power BI’s built-in functions).

Check out this link for a how-to guide.
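For a feel of what sits behind an R visual, here is a minimal sketch. In a Power BI R visual, the fields you select are handed to your script as a data frame named `dataset`; the fallback data, the column names `team` and `duration`, and the choice of a bar chart are all hypothetical and are included only so the script also runs in plain R.

```r
# In a Power BI R visual the selected fields arrive as a data frame called
# `dataset`. The fallback below (hypothetical data) lets the same script
# run outside Power BI for testing.
if (!exists("dataset")) {
  dataset <- data.frame(
    team     = c("A", "A", "A", "B", "B", "B"),
    duration = c(120, 300, 60, 45, 200, 90)
  )
}

# Summarise with base R, then draw the chart on the default graphics device
avg_by_team <- tapply(dataset$duration, dataset$team, mean)
barplot(avg_by_team,
        main = "Average call duration by team",
        xlab = "Team", ylab = "Seconds")
```

Base graphics are used here to keep the sketch dependency-free; any plotting package installed alongside your R distribution could be swapped in.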

What about SQL2016?

And then there is SQL Server 2016. Dear, dear SQL2016, so happy you arrived. Traditionally, R has been better suited to research and small-scale use cases because it could not efficiently process and model big data.

Clever people have developed some pretty cutting-edge R libraries that compete with big hitters like SAS, but the limitation has always been data size. Bringing R into SQL2016 solves this issue. Retraining using any of the powerful R libraries just got a whole lot quicker. Here is a case study from the Microsoft blog that illustrates this nicely and contains a pretty convincing quote: “PROS Holdings uses SQL Server 2016’s superior performance and built-in R Service to deliver advanced analytics more than 100x faster than before, resulting in higher profits for their customers”.

Here is a great link showing why R and SQL are a match made in heaven (in particular around the 2m30s mark). 

R covers more than descriptive, predictive and prescriptive data analytics. There are over 7,000 packages available that make this tool extremely versatile, from image manipulation to heat maps to linking to all sorts of data sources, such as Salesforce.

We started off by asking: “I wonder if there is an R package for that?” but this has become a running rhetorical question. RStudio have even created a web service offering that allows you to build a very attractive UI around your R code and showcase the resulting product to the outside world. Check out their gallery.

So we are pretty excited about everything R brings to our table, and we’d love to put our skills and passion for R towards benefiting your business.

If you’d like us to use our R skills to develop models that predict outcomes for your business and answer business-critical questions, just drop us a line!



Robin Davies
Robin Davies was the Head of Product Development at Principa for many years, during which Robin’s team packaged complex concepts into easy-to-use products that help our clients lift their business in often unexpected ways. Robin is currently the Head of Machine Learning at a prestigious firm in the UK.
