May 22, 2018 at 10:24 AM
In 2015, we predicted the results of the Rugby World Cup with great success, outpredicting 99.68% of humans. In 2016, we predicted the results of the Oscars, accurately predicting DiCaprio’s first win. This year we'll be trying our hand at predicting the outcomes of the FIFA Football World Cup, and we're cautiously optimistic about our predictions.
Why “cautiously optimistic” after our previous success?
This is a whole new ballgame – literally! While a number of our data scientists are passionate football fans, we have never applied our skills to determine the outcome of a football match. We’ll also be sourcing public domain data and testing different techniques that are rarely used in our day-to-day analytics. On top of that, we have our previous successes to live up to!
Why are we using new techniques?
Our teams are very busy working wonders for our clients, and predicting the results of the World Cup is a fun activity as well as a training exercise, so they don’t get as much time as they would like to spend on building their models. We put our clients first! But our data scientists had so much fun the previous time, and they love a challenge, which is why they are taking this one on. By treating the exercise as training, they get to hone their skills on real match statistics rather than inconsequential data, and their training time now goes into building models for sports predictions.
But, back to the question – why are we using different techniques? The team want to challenge themselves, and using different techniques lets them do so while exploring and honing new skills. Therefore, instead of using techniques that we apply on a daily basis, we’ll be trying something fresh!
Which techniques will be used?
We've divided into three teams, each using a different technique and going head-to-head in competition. The methods that the three teams will be using are:
This technique can be used to enhance predictions by using what we already know (determined by looking at historic game results), with a recent sample of data to predict the likely outcome. In this way, recent performance and player statistics are used to enhance the predictions of models that are developed on historic data alone.
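One common way to combine what is already known with a recent sample is a Bayesian update. The post does not name the technique here, so the Beta-Binomial model below is an assumption on our part, and every number in it is made up for illustration. It is a minimal sketch of the idea, not the team's actual model: historic results set a prior on a team's win probability, and recent matches shift it.

```python
def posterior_win_probability(prior_wins, prior_non_wins, recent_wins, recent_matches):
    """Posterior mean win probability under a Beta-Binomial model.

    The prior Beta(prior_wins, prior_non_wins) summarises historic results;
    the recent sample then updates it. All inputs are hypothetical.
    """
    alpha = prior_wins + recent_wins
    beta = prior_non_wins + (recent_matches - recent_wins)
    return alpha / (alpha + beta)


# Hypothetical team: 12 wins in 30 historic matches, then 4 wins in the last 5.
historic_only = 12 / 30
updated = posterior_win_probability(12, 18, recent_wins=4, recent_matches=5)
print(f"historic estimate: {historic_only:.3f}, updated estimate: {updated:.3f}")
```

The more historic matches behind the prior, the less a short run of recent form moves the estimate.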
Multinomial Logistic Regression
A multinomial logistic regression model is an extension of a binary logistic regression model that allows for more than two classes of the dependent variable. We will use a variable selection method to choose which variables are significant in predicting the dependent variable; these will be the independent variables in our model. The model will then give us a probability for each class (here, the number of goals scored). If we repeat this for the opposing team, we can arrive at a predicted score for each team by choosing the class with the highest probability in each run of our multinomial logistic regression model.
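The prediction step described above can be sketched in a few lines. This is a hedged illustration only: the goal classes, the two features (attack strength and opponent defence strength) and all coefficients are hypothetical, standing in for whatever a fitted model and the variable selection step would actually produce.

```python
import math

# Hypothetical goal classes and fitted parameters (one row per class).
GOAL_CLASSES = ["0", "1", "2", "3+"]
COEFFICIENTS = [[-0.8, 0.6], [-0.1, 0.1], [0.4, -0.3], [0.7, -0.6]]
INTERCEPTS = [0.2, 0.5, 0.1, -0.6]


def softmax(scores):
    """Turn one linear score per class into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_goals(features):
    """Return the most likely goal class and the full class probabilities.

    `features` is [attack_strength, opponent_defence_strength], both
    hypothetical inputs chosen for this sketch.
    """
    scores = [
        intercept + sum(w * x for w, x in zip(weights, features))
        for weights, intercept in zip(COEFFICIENTS, INTERCEPTS)
    ]
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return GOAL_CLASSES[best], probabilities


# Run the model once per team, swapping attacker and defender.
team_a_goals, _ = predict_goals([1.5, 0.2])
team_b_goals, _ = predict_goals([0.2, 1.5])
print(f"predicted score: {team_a_goals} - {team_b_goals}")
```

Picking the single most likely class for each team is simple, but it ignores how close the runner-up classes are, so the full probability list is returned as well.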
Poisson Regression Model
The Poisson distribution is a probability distribution that can be used to model count data, like the number of goals scored in a football match. This gives us a method of assigning probabilities to the number of goals in a game, and from these we can find probabilities for the different match results. To find the probabilities for different numbers of goals, we use a regression model based on variables such as the strength of the attack and the ratings of the team.
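To make the step from goal probabilities to match results concrete, here is a minimal sketch. It assumes that the two teams' goal counts are independent, that the regression uses the usual log link, and that the coefficients and team strengths shown are purely hypothetical; a real model would estimate them from match data.

```python
import math

# Hypothetical regression coefficients: intercept, attack, opponent defence.
INTERCEPT, ATTACK_COEF, DEFENCE_COEF = 0.1, 0.5, -0.4


def expected_goals(attack, opponent_defence):
    """Poisson regression with a log link: rate = exp(linear predictor)."""
    return math.exp(INTERCEPT + ATTACK_COEF * attack + DEFENCE_COEF * opponent_defence)


def poisson_pmf(k, rate):
    """Probability of exactly k goals when `rate` goals are expected."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def match_outcome_probabilities(rate_a, rate_b, max_goals=10):
    """Return (team A win, draw, team B win) probabilities.

    Sums over every scoreline up to `max_goals` goals per team, treating
    the two goal counts as independent (an assumption of this sketch).
    """
    win_a = draw = win_b = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, rate_a) * poisson_pmf(b, rate_b)
            if a > b:
                win_a += p
            elif a == b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


# Hypothetical strengths for two teams.
rate_a = expected_goals(attack=1.2, opponent_defence=0.5)
rate_b = expected_goals(attack=0.6, opponent_defence=1.0)
win_a, draw, win_b = match_outcome_probabilities(rate_a, rate_b)
print(f"A wins: {win_a:.3f}, draw: {draw:.3f}, B wins: {win_b:.3f}")
```

Truncating at ten goals per team leaves out only a negligible amount of probability for realistic scoring rates.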
If you'd like to see the predictions of each model, view our algorithm show-down!
Will our predictions perform better than the deaf cat?
In keeping with recent FIFA World Cup tradition, Russia has appointed an animal to predict their team’s outcomes: a deaf cat named Achilles, who will be choosing between two bowls of food, each marked with one of the opposing teams. You might remember Paul the Octopus, who at the 2010 World Cup predicted the outcomes of Germany’s matches with 100% accuracy. While we have three horses in the race with our three different techniques, we will ultimately be backing one result, and measuring our success against Russia’s deaf cat. High standards, indeed!
Will data analytics or animal succeed? Check back here, or follow our Twitter account to find out!