The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.

Keeping Our Skills Fresh: Predicting The FIFA World Cup 2018 Results

May 22, 2018 at 10:24 AM

In 2015, we predicted the results of the Rugby World Cup with great success, out-predicting 99.68% of humans. In 2016, we predicted the results of the Oscars, correctly calling DiCaprio’s first win. This year we're trying our hand at predicting the outcomes of the 2018 FIFA World Cup, and we're cautiously optimistic about our predictions.

Why “cautiously optimistic” after our previous success?

This is a whole new ballgame – literally! While a number of our data scientists are passionate football fans, we have never applied our skills to determine the outcome of a football match. We’ll also be sourcing public domain data and testing different techniques that are rarely used in our day-to-day analytics. On top of that, we have our previous successes to live up to!

Why are we using new techniques?

Our teams are very busy working wonders for our clients, and because predicting the World Cup results is a fun activity and a training exercise rather than client work, they don’t get as much time as they would like to spend on building their models. We put our clients first! But our data scientists had so much fun the previous time, and they love a challenge, which is why they are taking this one on. Treating the exercise as training also means they hone their skills on real match statistics rather than inconsequential data, so training time is now spent building sports prediction models.

But, back to the question: why are we using different techniques? The team want to challenge themselves, and working with unfamiliar techniques lets them explore and hone new skills. So, instead of using the techniques we apply on a daily basis, we’ll be trying something fresh!

Which techniques will be used?

We've divided into three teams, each using a different technique and going head-to-head in competition. The methods that the three teams will be using are:

Bayesian Inference

This technique enhances predictions by combining what we already know (determined by looking at historic game results) with a recent sample of data to predict the likely outcome. In this way, recent performance and player statistics are used to update the predictions of models that are developed on historic data alone.
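To make the idea concrete, here is a minimal sketch of one common form of Bayesian updating, a Beta-Binomial model of a team's win probability. It is an illustration only, not the model our team built: the class name `WinBelief` and all of the match counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinBelief:
    """Beta distribution over a team's win probability (illustrative only)."""
    wins: float      # alpha: pseudo-count of wins
    non_wins: float  # beta: pseudo-count of draws and losses

    def update(self, wins: int, non_wins: int) -> "WinBelief":
        # Conjugate update: add the newly observed counts to the existing counts.
        return WinBelief(self.wins + wins, self.non_wins + non_wins)

    @property
    def mean(self) -> float:
        # Expected win probability under the Beta distribution.
        return self.wins / (self.wins + self.non_wins)


# Hypothetical numbers: 12 wins in 20 historic matches, then 1 win in the last 5.
prior = WinBelief(wins=12, non_wins=8)
posterior = prior.update(wins=1, non_wins=4)
print(f"prior {prior.mean:.2f} -> posterior {posterior.mean:.2f}")
```

With these made-up counts, the poor recent run pulls the estimated win probability down from the historic figure, which is the effect described above: history sets the starting belief and recent form adjusts it.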

Multinomial Logistic Regression

A multinomial logistic regression model is an extension of a binary logistic regression model that allows for more than two classes of the dependent variable. We will use a variable selection method to identify which variables are significant in predicting the dependent variable, and these will be the independent variables in our model. The model then gives us a probability for each class (here, the number of goals scored). If we repeat this for the opposing team, we can arrive at a predicted score by choosing, for each team, the class with the highest probability.
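The prediction step can be sketched as follows. This is a simplified illustration under stated assumptions: a single made-up input (`rating_diff`, the difference in team ratings), four goal classes, and hypothetical coefficients that were not fitted to any data. A real model would estimate the coefficients from match data after variable selection.

```python
import math

GOAL_CLASSES = ["0", "1", "2", "3+"]

# Hypothetical (intercept, weight) pairs, one per goal class; NOT fitted values.
# Class "0" is the reference class, so its coefficients are fixed at zero.
COEFFS = [(0.0, 0.0), (0.3, 0.5), (0.0, 1.0), (-0.8, 1.5)]


def softmax(scores):
    # Subtract the maximum score first for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def goal_probabilities(rating_diff):
    # One linear score per class, turned into probabilities that sum to 1.
    scores = [b0 + b1 * rating_diff for b0, b1 in COEFFS]
    return softmax(scores)


def most_likely_goals(rating_diff):
    # Pick the goal class with the highest predicted probability.
    probs = goal_probabilities(rating_diff)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GOAL_CLASSES[best]


# Run the model once per team, flipping the sign of the rating difference.
predicted_score = (most_likely_goals(1.0), most_likely_goals(-1.0))
print(predicted_score)
```

Running the model once from each team's point of view and taking the most probable class each time gives a predicted scoreline, as described above.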

Poisson Regression Model

The Poisson distribution is a probability distribution that can be used to model count data, like the number of goals scored in a football match. This gives us a method of assigning probabilities to the number of goals each team scores in a game, and from these we can find probabilities for different match results. To find the probabilities for different numbers of goals, we will use a regression model based on variables such as the strength of the team's attack and the team's ratings.
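As a rough sketch of how this fits together, the code below turns a linear predictor into an expected goal rate using a log link, then sums Poisson probabilities over a grid of scorelines to get win, draw and loss probabilities. The coefficients and feature values are hypothetical, and treating the two teams' goal counts as independent is a simplifying assumption of this sketch, not a statement about our team's final model.

```python
import math


def expected_goals(intercept, coefficients, features):
    # Log link: exponentiating the linear predictor keeps the rate positive.
    linear = intercept + sum(c * x for c, x in zip(coefficients, features))
    return math.exp(linear)


def poisson_pmf(k, rate):
    # Probability of exactly k goals when the expected number is `rate`.
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(rate_a, rate_b, max_goals=10):
    # Sum scoreline probabilities, assuming the two goal counts are independent.
    win_a = draw = win_b = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, rate_a) * poisson_pmf(b, rate_b)
            if a > b:
                win_a += p
            elif a == b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


# Hypothetical coefficients and features: [attack strength, opponent defence rating].
rate_a = expected_goals(0.1, [0.4, -0.3], [1.2, 0.8])
rate_b = expected_goals(0.1, [0.4, -0.3], [0.9, 1.1])
win_a, draw, win_b = outcome_probabilities(rate_a, rate_b)
print(f"A wins {win_a:.2f}, draw {draw:.2f}, B wins {win_b:.2f}")
```

Capping the grid at 10 goals per team drops only a negligible amount of probability for typical football scoring rates.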

Will our predictions perform better than the deaf cat?

In keeping with recent FIFA World Cup tradition, Russia has appointed an animal to predict their team’s outcomes: a deaf cat named Achilles, who will choose between two bowls of food, each marked for one of the opposing teams. You might remember Paul the Octopus, who at the 2010 World Cup predicted Germany’s matches with 100% accuracy. While we have three horses in the race with our three different techniques, we will ultimately be backing one result and measuring our success against Russia’s deaf cat. High standards, indeed!

Will data analytics or the animal succeed? Check back here, or follow our Twitter account to find out!

Francel Mitchell
Francel Mitchell is the Head of Decision Analytics at Principa. Francel’s team has a winning track record using descriptive, predictive and prescriptive analytical techniques within the financial services, marketing and loyalty sectors. Utilising available data and through the application of advanced analytical techniques, the team takes pride in their ability to predict human behaviour that can be used to assist business in making profitable decisions.
