The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.

What Is R And What Have We Learned Since Working With It?

November 10, 2016 at 3:56 PM

Hands up who has not heard of R? If you work in data analytics and have an internet connection, you will have heard of the open source programming language for predictive analytics and statistical computing that has taken the analytics world by storm.

Like most things, R took time to reach critical mass, and I would say it has very much reached that point. It was released as open source back in 1995, with the first stable version (1.0.0) following in 2000. We had heard about R in various contexts before, but there was no specific requirement to start using the tool in anger, or so we thought.

Enter Machine Learning 

All of our predictive modelling was done using other proprietary tools, which were giving us good results.  Unfortunately, the predictive models that we were building were offline and static in nature and took some time to develop. Enter Machine Learning.  

A ‘proper’ Machine Learning system is dynamic by nature: its models must track recent trends in the data. I say ‘proper’ because static modelling can also be considered a form of Machine Learning (like predicting the survivors of the Titanic on the data science website Kaggle). Where regular retraining is a requirement, one cannot hand-craft every model; it is just too onerous. So one needs a tool that can build predictive models more quickly. Sure, you might lose some predictive power by not binning characteristics in the optimal way or taking extra care with missing values, but what you lose by cutting back on the TLC you gain by retraining on more recent data.
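The trade-off described above can be illustrated with a minimal sketch. It is written in Python purely for illustration, and the data, the field names (`day`, `segment`, `outcome`) and the deliberately crude "model" (a success rate per segment) are all hypothetical; the point is only the pattern of refitting on a rolling window of recent data rather than building once and leaving the model static.

```python
from collections import defaultdict
from datetime import date, timedelta


def fit_rate_model(rows):
    """Fit a deliberately simple model: the observed success rate per segment."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [successes, attempts]
    for row in rows:
        totals[row["segment"]][0] += row["outcome"]
        totals[row["segment"]][1] += 1
    return {seg: s / n for seg, (s, n) in totals.items()}


def retrain_on_recent(history, as_of, window_days=30):
    """Refit the model using only rows from the last `window_days` days."""
    cutoff = as_of - timedelta(days=window_days)
    recent = [row for row in history if cutoff < row["day"] <= as_of]
    return fit_rate_model(recent)


# Hypothetical call-centre history: segment "A" converts well for 60 days,
# then the environment shifts (say, a dialler change) and it stops converting.
start = date(2016, 1, 1)
history = []
for i in range(120):
    day = start + timedelta(days=i)
    history.append({"day": day, "segment": "A", "outcome": 1 if i < 60 else 0})
    history.append({"day": day, "segment": "B", "outcome": i % 2})

# Static model: built once on the first 60 days (two rows per day) and never updated.
static_model = fit_rate_model(history[:120])

# Retrained model: refit on the most recent 30 days only.
fresh_model = retrain_on_recent(history, as_of=start + timedelta(days=119))

print("static:", static_model)
print("fresh: ", fresh_model)
```

The static model still believes segment A always converts, while the retrained one has picked up the shift. A hand-crafted model would likely beat this crude one on any single snapshot, but it would not survive the change in the underlying data without being rebuilt.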

This is particularly relevant in dynamic environments like a call centre, where agents come and go at an alarming rate, diallers change, and the underlying data changes at a fundamental level. By the way, we have some great tricks now that dramatically narrow the gap between “quick-and-dirty” and “hand-crafted” using R, but more on that in another blog post.

There are many Machine Learning tools out there that can do a good job. But they all cost ‘quite a bit’, and in this fast-changing space you cannot be sure that your carefully selected (and expensive) tool will still be top of the pile in a year’s time. Plus, you need to up-skill on that tool, which takes additional time. One thing is sure, though: Microsoft will be around for some time.

What does Microsoft have to do with R?

But what does Microsoft have to do with R?  Quite a bit actually.  In April 2015, Microsoft took the most amazing leap forward and purchased Revolution Analytics.  Revolution Analytics were the ones you contacted if you wanted to integrate R into your business, and they were doing a pretty good job.  Let’s just say they knew R pretty well.  

In purchasing RA, Microsoft bought the IP that would allow them to incorporate R into all their mainstream products, which they are wasting no time in doing, and we are loving them for it. Take Power BI, Microsoft’s BI solution, as an example. It’s dirt cheap (for now), and Microsoft is taking the BI world by storm by investing millions in its development and upgrading it aggressively in line with user feedback. It is currently in the most favourable position in Gartner’s Magic Quadrant for BI tools. R scripting is available on both the back end (data load) and the front end (user interface).

On the back-end side, this means that you can manipulate data using the sqldf package, which by default runs your SQL through SQLite. If you know SQL, you will LOVE this. You can join tables, create new fields, and manipulate tables to your heart’s content. Very few BI tools have this capability (QlikView being the exception, and this is one feature that I love about QlikView). Basically, whatever works in native R works in Power BI. Brilliant!
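To give a feel for the kind of SQL involved, here is a small sketch of a join plus a new derived field. It runs the query directly against an in-memory SQLite database from Python rather than through sqldf in R, simply to keep the example self-contained; the table names, column names and values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database holding two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id INTEGER, agent TEXT);
    CREATE TABLE payments (account_id INTEGER, amount REAL);
    INSERT INTO accounts VALUES (1, 'thandi'), (2, 'james');
    INSERT INTO payments VALUES (1, 100.0), (1, 50.0), (2, 20.0);
""")

# Join the two tables and create new fields, all in plain SQL.
query = """
    SELECT a.agent,
           SUM(p.amount)            AS total_paid,
           COUNT(*)                 AS n_payments,
           SUM(p.amount) / COUNT(*) AS avg_payment   -- new derived field
    FROM accounts AS a
    JOIN payments AS p ON p.account_id = a.account_id
    GROUP BY a.agent
    ORDER BY a.agent
"""
rows = conn.execute(query).fetchall()
conn.close()

for agent, total_paid, n_payments, avg_payment in rows:
    print(agent, total_paid, n_payments, avg_payment)
```

With sqldf the query text would look much the same; the difference is that it would be pointed at R data frames instead of tables created by hand.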

On the front-end side, things get interesting. Again, anything you can do in R, you can do in Power BI. This throws the door wide open in ways you may not have realised. Here is a link showing just some of the visuals you can achieve using R (note: using R and not Power BI’s built-in functions).

Check out this link for a how-to guide.

What about SQL Server 2016?

And then there is SQL Server 2016. Dear, dear SQL Server 2016, so happy you arrived. Traditionally, R has been better suited to research and small-scale cases because it could not efficiently process and model big data.

Some pretty cutting-edge R libraries have been developed by clever people, libraries that compete with big hitters like SAS, but the limitation has always been data size. Bringing R into SQL Server 2016 solves this issue. Retraining using any of the powerful R libraries just got a whole lot quicker. Here is a case study from the Microsoft blog that illustrates this nicely and contains a pretty convincing quote: “PROS Holdings uses SQL Server 2016’s superior performance and built-in R Service to deliver advanced analytics more than 100x faster than before, resulting in higher profits for their customers”.

Here is a great link showing why R and SQL are a match made in heaven (in particular around the 2m30s mark). 

R covers descriptive, predictive and prescriptive data analytics, and more besides. There are over 7,000 packages available that make this tool extremely versatile, from image manipulation to heat maps to connecting to all sorts of data sources, such as Salesforce.

We started off by asking, “I wonder if there is an R package for that?”, but this has become a running rhetorical question. RStudio have even created a web service offering that allows you to build a very attractive UI around your R code and showcase the resulting product to the outside world. Check out their gallery.

So we are pretty excited about all the things that R can bring to our table, and we’d love to put our skills and passion for R to work for your business.

If you’d like us to use our R skills to develop models that can predict outcomes for your business and answer business-critical questions, just drop us a line!


Image credit: Designed by Freepik

Robin Davies
Robin Davies was the Head of Product Development at Principa for many years during which Robin’s team packaged complex concepts into easy-to-use products that help our clients to lift their business in often unexpected ways. Robin is currently the Head of Machine Learning at a prestigious firm in the UK.

