The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.


Football Had The Octopus, Rugby Has Data Science

October 28, 2015 at 6:40 PM


Many of us remember the hoopla around the predictive ability of the late FIFA World Cup oracle, Paul the Octopus. For those who don’t recall, Paul was an octopus at a German aquarium that famously predicted, with 100% accuracy, the results of all seven of Germany’s matches and the final of the 2010 Soccer World Cup.

Goldman Sachs, defending its own 37.5% success rate at the time, countered that Paul would have been only 33% accurate had he had to predict the results of all 48 games, including draws. Be that as it may, given the choice between breaking into the Cape Town Aquarium to kidnap an octopus to help us predict the outcome of the Rugby World Cup and relying on data science and machine learning, we stuck to what we know and believe in, and it has paid off.

Listen to our friends at CliffCentral.com (a South African internet radio station) pit the predictive ability of their Sports Guy’s cat, Meowsers, against our machine-learning predictions.

To date, on the cusp of the Rugby World Cup final, our machine-learning-based predictions have been 91% accurate, placing them in the top 1% of predictions made by members of the sports prediction site SuperBru.
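The post doesn’t spell out how the 91% figure is calculated. A plausible reading is the share of matches where the predicted winner was correct. The minimal sketch below assumes that reading. It also tracks the margin error separately, because a prediction can pick the right winner and still miss the margin. The `MatchPrediction` type and the sign convention for margins are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MatchPrediction:
    """One match. Margins are the first-named team's score minus the second's (hypothetical convention)."""
    predicted_margin: int
    actual_margin: int


def outcome(margin: int) -> int:
    """1 if the first team won, -1 if it lost, 0 for a draw."""
    return (margin > 0) - (margin < 0)


def winner_accuracy(preds: list[MatchPrediction]) -> float:
    """Share of matches where the predicted outcome (win/draw/loss) was correct."""
    hits = sum(outcome(p.predicted_margin) == outcome(p.actual_margin) for p in preds)
    return hits / len(preds)


def mean_abs_margin_error(preds: list[MatchPrediction]) -> float:
    """Average gap, in points, between predicted and actual margins."""
    return sum(abs(p.predicted_margin - p.actual_margin) for p in preds) / len(preds)


if __name__ == "__main__":
    # Illustrative numbers only; these are not the blog's actual predictions.
    season = [
        MatchPrediction(predicted_margin=12, actual_margin=2),
        MatchPrediction(predicted_margin=7, actual_margin=14),
        MatchPrediction(predicted_margin=-3, actual_margin=5),
    ]
    print(f"winner accuracy: {winner_accuracy(season):.0%}")
    print(f"mean absolute margin error: {mean_abs_margin_error(season):.1f} points")
```

Keeping the two metrics apart mirrors the distinction the post draws between getting the win right and getting the margin right.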

We believe in the power of machine learning and in a simple principle: predict future behaviour by analysing past behaviour, identifying patterns and trends, and relying on the strong assumption that past behaviour carries forward into the future. One of the main reasons we launched our Man vs. Machine-Learning initiative was to demonstrate that data science is a better basis for business decisions than gut feel, or even an octopus.

In our previous blog on this subject, we covered some of the lessons learned during the initial round of matches and the challenges we faced leading up to last week’s knock-out rounds.

In this blog, I’d like to go through some additional lessons learned during the knock-out rounds as we head into the final this weekend. 

Adaptability is central to machine learning

Although our win and margin predictions for the Australia vs. Argentina match were spot-on, we overstated the margin in the All Blacks game (12 points predicted vs. 2 actual). Our predictive models were built on a selection of historical data points such as bookie odds, fantasy league value, world ranking and the number of tries scored in the last three games. We could have combined a wider range of characteristics that might have narrowed the gap for the knock-out rounds, but we had to draw the line somewhere in terms of the data we fed into our models. Additional metrics such as weather conditions, crowd size or even a referee’s track record might have given us more granular results and narrowed the gap between our predictions and the actual results.
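The post doesn’t publish its models, so the following is only a hedged illustration of "learning from past behaviour". It is an ordinary least-squares fit of match margin on the kinds of data points listed above. The feature names, the "team A minus team B" encoding and all data values are hypothetical. The authors’ actual algorithms may well differ.

```python
import numpy as np

# Hypothetical feature set mirroring the data points named above. Each value is
# expressed as "team A minus team B" so one row describes one match.
FEATURES = ["ranking_diff", "bookie_prob_diff", "fantasy_value_diff", "tries_last3_diff"]


def fit_margin_model(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares for margin = b0 + X @ b; returns [b0, b1, ..., bk]."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_margin(coef: np.ndarray, x: np.ndarray) -> float:
    """Predicted points margin (team A minus team B) for one feature row."""
    return float(coef[0] + x @ coef[1:])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "historical matches": made-up effects plus noise, for illustration only.
    X_hist = rng.normal(size=(60, len(FEATURES)))
    true_effects = np.array([3.0, 10.0, 1.5, 2.0])
    y_hist = 1.0 + X_hist @ true_effects + rng.normal(scale=4.0, size=60)

    coef = fit_margin_model(X_hist, y_hist)
    for name, c in zip(FEATURES, coef[1:]):
        print(f"{name:>20}: {c:+.2f}")

    upcoming = np.array([0.5, 0.2, 0.1, 1.0])  # hypothetical feature differences
    print(f"predicted margin: {predict_margin(coef, upcoming):+.1f} points")
```

To add the weather, crowd or referee metrics, you would append columns to `X`. Whether those columns actually narrow the error is something to check on matches held out from fitting.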

The gap between our predictions and the results points to the importance of near-real-time, dynamic data, some of which only becomes available at the 11th hour, for reaching the most accurate result. Combining historical and near-live data yields much sharper insight, which might appear almost psychic to a data novice. In essence, machine learning depends on a wide spectrum of data to learn from. It uses that data to adapt its models and more accurately estimate the likely outcomes of whatever situation it is applied to.

Your data needs to move at the speed of life

One often has grand ideas when tackling a real-world predictive modelling problem. It is important to bear in mind, though, that every change or new variable brings new dynamics that predictive models need to factor in.

Our own cognition works automatically, thanks to the human brain’s ability to process complex variables and stimuli almost instantaneously. Replicating this is the end goal of our adventures in machine learning and artificial intelligence. That raises a question: how close does current Information and Communications Technology (ICT) come to matching humans’ capacity to collect, filter, organise and analyse information and then determine the right action? On top of this, we apply context to the stimuli we receive, and that context in turn shapes our decisions.

Take the semi-finals as an example. Suppose we had given our predictive models contextual data telling them these were knock-out matches rather than first-round games. Our systems could then have factored in the teams’ likely context-based changes in strategy, which might have narrowed our predicted margins somewhat. Still, we think we have done pretty well, ranking in the top 0.42% of the SuperBru league with the data available. We wouldn’t swap the rich insights we’ve gained for the inclinations of any tentacled being anytime soon.
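One hedged way to supply that context is an indicator feature, e.g. `is_knockout = 1` for knock-out matches, so that a model can learn a separate margin shift for them. In the sketch below, the function name and all data are made up. The synthetic data builds in tighter knock-out games, and the fitted knock-out coefficient comes out negative (a narrower winning margin).

```python
import numpy as np


def fit_with_context(strength_gap: np.ndarray, is_knockout: np.ndarray,
                     margin: np.ndarray) -> np.ndarray:
    """OLS fit of winning margin on a strength feature plus a 0/1 knock-out indicator.

    Returns [intercept, effect per unit of strength gap, knock-out shift].
    """
    design = np.column_stack([np.ones_like(strength_gap), strength_gap, is_knockout])
    coef, *_ = np.linalg.lstsq(design, margin, rcond=None)
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 80
    strength_gap = rng.uniform(0, 5, size=n)
    is_knockout = (rng.random(n) < 0.25).astype(float)
    # Made-up ground truth in which knock-out games are about 6 points tighter.
    margin = 5 + 4 * strength_gap - 6 * is_knockout + rng.normal(scale=2.0, size=n)

    intercept, per_unit, ko_shift = fit_with_context(strength_gap, is_knockout, margin)
    gap = 2.0
    print(f"knock-out shift: {ko_shift:+.1f} points")
    print(f"first-round margin:    {intercept + per_unit * gap:.1f}")
    print(f"knock-out game margin: {intercept + per_unit * gap + ko_shift:.1f}")
```

The model can only learn such a shift if the history contains enough knock-out matches to estimate it. That is one reason such context is easy to leave out.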

As we refine our understanding of, and approaches to, predictive modelling, we’ll keep evolving towards the goals we’ve set for ourselves. Either way, we had fun and learned a lot about machine learning, which has been a positive outcome, for us at least. If nothing else, we’ve made our point: important business decisions should be based on data science and predictive analytics, not on gut feel or hocus pocus.

We’ll be posting our predictions for this weekend’s final matches on our Rugby World Cup Predictions and Stats page, just in time for you to make your own predictions and try to beat our "machines". Although South Africa will sadly not be playing on Saturday, as we had predicted, we will be watching intently. This time, we’ll be cheering for the final predictions of both our teams of data scientists in our Man vs. Machine-Learning initiative.

May the best rugby and data science teams win!


Julian Diaz
Julian Diaz was Head of Marketing for Principa until 2017, after which he became Head of Marketing for Honeybee CRM. American-born and raised, Julian has worked in the IT industry for over 20 years. Having begun his career at a major software company in Germany, Julian moved to South Africa in 1998 when he joined Dimension Data and later MWEB (a leading South African ISP). Since then, Julian has helped launch various South African technology brands, including Principa, into international markets.

Latest Posts

The 7 types of credit risk in SME lending

It is common knowledge in the industry that assessing the credit risk of a consumer applying for credit is far less complex than assessing that of a business. Why is this the case? Simply put, consumers are usually very similar in their requirements and risks (homogeneous), whilst businesses have far more varied risk elements (heterogeneous). In this blog we will look at the different risk elements within a business (here, SME) credit application. These are: risk of proprietors, risk of business, reason for loan, financial ratios, size of loan, risk of industry and risk of region.

Before we delve into this list, it is worth noting that all of these factors need to be deployable as assessment tools within your originations system, so it is key to ensure your system can manage them. If you are on the lookout for a loan origination system, look no further than Principa’s AppSmart. If you are looking for a decision engine to manage your scorecards, policy rules and terms of business, take a look at our DecisionSmart business rules engine. AppSmart and DecisionSmart are part of Principa’s FinSmart Universe, allowing for effective credit management across the customer life-cycle.

The different risk elements within a business credit application

1) Risk of proprietors

For smaller organisations, the risk of the business is inextricably linked to the financial well-being of the proprietors. How small is small? The rule of thumb is that companies with up to two or three proprietors should have their proprietors assessed for risk too, which fits the SME segment. What data should be looked at? Generally, in countries with mature credit bureaux, lenders look at the proprietors’ credit data, including the score (there is normally a score cut-off), and then at negative information such as judgements or defaults. These are typically used within policy rules.
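Here is a minimal sketch of how such proprietor policy rules might be expressed. It assumes a hypothetical bureau record holding a score and counts of judgements and defaults. The cut-off of 600 and the limit of two negatives are illustrative values, not recommendations. A production decision engine would hold these as configurable rules rather than constants.

```python
from dataclasses import dataclass


@dataclass
class ProprietorBureauRecord:
    """Hypothetical bureau extract for one proprietor."""
    name: str
    bureau_score: int
    judgements: int
    defaults: int


SCORE_CUT_OFF = 600  # illustrative cut-off, not a recommended value
MAX_NEGATIVES = 2    # illustrative limit on judgements + defaults


def proprietor_policy_failures(p: ProprietorBureauRecord) -> list[str]:
    """Return the policy rules this proprietor fails (an empty list means pass)."""
    reasons = []
    if p.bureau_score < SCORE_CUT_OFF:
        reasons.append("score below cut-off")
    if p.judgements + p.defaults > MAX_NEGATIVES:
        reasons.append("too many negatives")
    return reasons


def assess_proprietors(proprietors: list[ProprietorBureauRecord]) -> dict[str, list[str]]:
    """Map each failing proprietor to the rules they failed; an empty dict means all pass."""
    return {p.name: r for p in proprietors if (r := proprietor_policy_failures(p))}


if __name__ == "__main__":
    owners = [
        ProprietorBureauRecord("Owner A", bureau_score=680, judgements=0, defaults=0),
        ProprietorBureauRecord("Owner B", bureau_score=590, judgements=2, defaults=1),
    ]
    print(assess_proprietors(owners))
```

The function returns reasons rather than a bare pass/fail flag, so a decline can be explained to the applicant and audited later.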
Businesses whose proprietors have excessive numbers of “negatives” may be disqualified from the loan application. Some credit bureaux offer a score for an individual based on the performance of all the businesses with which they are associated, which can also be useful in the credit risk assessment process. Another innovation being adopted internationally is the use of psychometrics in the credit evaluation of proprietors. To find out more about adopting credit scoring, read our blog on how to adopt credit scoring.

2) Risk of business

The risk of the business should be managed through both scores and policy rules. Within a score, lenders will look at information such as the age of the company, the experience of its directors and the size of the company. Alternatively, many lenders use the business score offered by credit bureaux. These scores are typically not as strong as consumer scores, as the underlying data is limited and sometimes problematic. For example, large, successful organisations may have judgements registered against their name. Unlike for consumers, such judgements are not necessarily a direct indication of an inability to service debt.

3) Reason for loan

The reason for a loan is used more widely in business lending than in unsecured consumer lending. Venture capital, working capital, invoice discounting and bridging finance are just some of the many types of loans and facilities available. Lenders need to equip themselves to manage each of these customer types, whether in originations or in collections. Prudent lenders venturing into the SME space for the first time often focus on one or two of these loan types and expand later, as the operational implications of each type of loan are complex.

4) Financial ratios

Financial ratios are core to commercial credit risk assessment. The main challenge here is to ensure that reliable financials are available from the customer.
Small businesses may not be audited, and their financials may therefore be less trustworthy.

Financial ratios can be divided into four categories: profitability, leverage, coverage and liquidity.

Profitability can be further divided into margin ratios and return ratios. Lenders are frequently interested in the gross profit margin, which is normally explicit on the income statement. The EBITDA margin and operating profit margin are also used, as are return ratios such as return on assets, return on equity and risk-adjusted returns.

Leverage ratios are useful to lenders as they reflect the portion of the business that is financed by debt. Lower leverage ratios indicate stability. The leverage ratios assessed often include debt-to-assets, debt-to-equity and assets-to-equity.

Coverage ratios indicate the coverage that income or assets provide for servicing debt or interest expenses. The higher the coverage ratio, the better it is for the lender. Coverage ratios are worked out taking into account the loan or facility being applied for.

Finally, liquidity ratios indicate a company’s ability to convert its assets into cash. A variety of ratios are used here. The current ratio is simply the ratio of current assets to current liabilities. The quick ratio measures the business’s ability to pay off its current liabilities with readily available assets (typically current assets excluding inventory). The higher the liquidity ratios, the better.

Ratios are used both within credit scorecards and within policy rules. You can read more about these ratios here.

5) Size of loan

When assessing credit risk for a consumer, the consumer’s risk does not normally change with the loan or facility amount (subject to the consumer passing affordability criteria). With business loans, amounts can vary dramatically, and the applicant’s risk is normally tied to the amount requested. The loan or facility amount will, of course, change the ratios mentioned in the previous section, which could determine whether the outcome is positive or negative.
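The sketch below makes the ratio families and the loan-size effect concrete. It assumes a hypothetical set of financial-statement fields. The formulas are common textbook forms: the quick ratio excludes inventory, and coverage is EBITDA over annual debt service. Individual lenders’ definitions vary. The coverage function includes the requested loan’s repayment, so the same applicant looks weaker as the requested amount grows.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical annual figures from an applicant's statements (one currency)."""
    revenue: float
    gross_profit: float
    ebitda: float
    net_income: float
    total_assets: float
    total_debt: float
    equity: float
    current_assets: float
    inventory: float
    current_liabilities: float
    existing_annual_debt_service: float


def ratios(f: Financials) -> dict[str, float]:
    """Common textbook forms of the profitability, leverage and liquidity ratios."""
    return {
        # Profitability: margin ratios and return ratios
        "gross_margin": f.gross_profit / f.revenue,
        "ebitda_margin": f.ebitda / f.revenue,
        "return_on_assets": f.net_income / f.total_assets,
        "return_on_equity": f.net_income / f.equity,
        # Leverage: lower generally indicates more stability
        "debt_to_assets": f.total_debt / f.total_assets,
        "debt_to_equity": f.total_debt / f.equity,
        "assets_to_equity": f.total_assets / f.equity,
        # Liquidity: higher is better
        "current_ratio": f.current_assets / f.current_liabilities,
        "quick_ratio": (f.current_assets - f.inventory) / f.current_liabilities,
    }


def annual_payment(principal: float, annual_rate: float, years: int) -> float:
    """Level annual repayment on an amortising loan (standard annuity formula)."""
    if annual_rate == 0:
        return principal / years
    return principal * annual_rate / (1 - (1 + annual_rate) ** -years)


def debt_service_coverage(f: Financials, loan: float, rate: float, years: int) -> float:
    """Coverage ratio: EBITDA over total annual debt service including the new loan."""
    return f.ebitda / (f.existing_annual_debt_service + annual_payment(loan, rate, years))


if __name__ == "__main__":
    applicant = Financials(
        revenue=1_000_000, gross_profit=400_000, ebitda=150_000, net_income=60_000,
        total_assets=800_000, total_debt=300_000, equity=500_000,
        current_assets=250_000, inventory=100_000, current_liabilities=125_000,
        existing_annual_debt_service=40_000,
    )
    for name, value in ratios(applicant).items():
        print(f"{name:>16}: {value:.2f}")
    # Same applicant, different requested amounts (12% a year over 5 years, illustrative).
    for loan in (100_000, 300_000, 600_000):
        print(f"loan {loan:>9,}: coverage {debt_service_coverage(applicant, loan, 0.12, 5):.2f}")
```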
The outcome of the loan application is usually directly linked to the loan amount, and any marked change to this amount would change the risk profile of the application.

6) Risk of industry

The risk of the industry in which the SME operates can have a strong bearing on whether the entity is able to service its debt. Some lenders use this. Those who do not normally identify it as a missing element in their risk assessment process. Correctly identifying the industry is always important: if you are in manufacturing but your clients are mines, you are perhaps better identified as operating in mining rather than manufacturing. Most lenders who assess industry will periodically rule out certain industries and perhaps also incorporate industry within their scorecard. Others take a more scientific approach. In the graph below, the performance of an industry is tracked for two years and then projected over the next six months. This projection is then compared to the country’s GDP. As the industry appears to track above the projected GDP, the applicant is given a positive outlook, which may count in their favour in the credit application.

7) Risk of region

The last area of assessment is risk of region. Of the seven, this one is used the least. Here, businesses, either on book or on the bureau, are assessed against their geo-code. Each geo-code is clustered, and the projected outlook is given as positive, static or negative. As with industry, this can be used within the assessment process as a policy rule or within a scorecard.

Bringing the seven risk categories together in a risk assessment

These seven risk assessment categories are all important in the risk assessment process, and how you bring them together is critical. If you would like to discuss your SME evaluation challenges or find out more about what we offer in credit management software (like AppSmart and DecisionSmart), get in touch with us here.
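As a hedged sketch of the industry comparison described in section 6, the code below fits a straight-line trend to 24 months of industry growth figures. It projects that trend six months ahead and labels the outlook against a projected GDP growth figure supplied by the caller. The linear trend, the tolerance band and all data values are assumptions, not the method behind the original graph. The same positive/static/negative labelling could be applied to a geo-code cluster’s projected performance in section 7.

```python
import numpy as np


def project_trend(monthly_values: np.ndarray, months_ahead: int = 6) -> float:
    """Fit a straight line to the history and return its value months_ahead past the last point."""
    t = np.arange(len(monthly_values))
    slope, intercept = np.polyfit(t, monthly_values, deg=1)
    return float(intercept + slope * (len(monthly_values) - 1 + months_ahead))


def outlook(projected: float, benchmark: float, band: float = 0.25) -> str:
    """Label positive/static/negative relative to a benchmark; within +/- band counts as static."""
    if projected > benchmark + band:
        return "positive"
    if projected < benchmark - band:
        return "negative"
    return "static"


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    # Made-up year-on-year industry growth (%) for 24 months, trending upwards.
    industry = 1.0 + 0.05 * np.arange(24) + rng.normal(scale=0.2, size=24)
    projected_gdp_growth = 1.5  # hypothetical GDP projection supplied by the caller
    proj = project_trend(industry)
    print(f"industry projected at {proj:.2f}% vs GDP {projected_gdp_growth:.2f}%: "
          f"{outlook(proj, projected_gdp_growth)}")
```

The resulting label can then feed a policy rule (e.g. refer all "negative" outlooks) or act as a scorecard characteristic, matching the two uses the post describes.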
