“Although this World Cup is overshadowed by many sporting and ethical issues, the research team decided, out of scientific interest, to use the machine learning approach, which had already been used successfully in previous tournaments, to create probabilistic predictions,” said Achim Zeileis of the University of Innsbruck.
The research group, composed of Andreas Groll and Neele Hormann (both TU Dortmund), Gunther Schauberger (TU Munich), Christophe Ley (University of Luxembourg), Hans Van Eetvelde (Ghent University) and Achim Zeileis (University of Innsbruck), simulated the upcoming World Cup 100,000 times.
Four sources of information for the calculation
The researchers’ calculation is based on four sources of information. The first is a statistical model of each team’s playing strength based on all international matches of the last eight years (Ghent University and University of Luxembourg); the second is another statistical model of the teams’ playing strength based on betting odds from 28 international bookmakers (University of Innsbruck).
Information about the teams, such as their market value, and about their countries, such as population, has also been taken into account (TU Dortmund and TU Munich). The fourth source, or fourth “partner”, is a machine learning model that brings together the other sources and optimizes them incrementally.
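To illustrate what a model “based on betting odds” starts from, here is a minimal sketch, not the Innsbruck team’s actual method. It assumes decimal odds for winning the tournament and uses the simplest adjustment: the raw implied probabilities 1/odds, which add up to more than 1 because of the bookmaker’s margin, are rescaled to sum to 1, and several bookmakers are then averaged on the log-odds scale. The function names and all odds values are hypothetical.

```python
import math
from typing import Dict, List

Odds = Dict[str, float]  # team -> decimal odds for winning the tournament (hypothetical)


def implied_probabilities(decimal_odds: Odds) -> Dict[str, float]:
    """Convert one bookmaker's decimal odds into probabilities that sum to 1.

    The raw values 1/odds add up to more than 1 because of the bookmaker's
    margin (the "overround"); this sketch removes it by proportional
    rescaling, an assumed simplification.
    """
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {team: p / total for team, p in raw.items()}


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def consensus(bookmakers: List[Odds]) -> Dict[str, float]:
    """Average several bookmakers on the log-odds scale, then renormalize."""
    per_book = [implied_probabilities(b) for b in bookmakers]
    combined = {}
    for team in per_book[0]:
        mean_logit = sum(logit(p[team]) for p in per_book) / len(per_book)
        combined[team] = 1.0 / (1.0 + math.exp(-mean_logit))  # back to a probability
    total = sum(combined.values())
    return {team: p / total for team, p in combined.items()}


# Hypothetical odds from two bookmakers for a toy three-team market.
book_a = {"Team A": 2.0, "Team B": 3.0, "Team C": 5.0}
book_b = {"Team A": 2.2, "Team B": 2.8, "Team C": 4.5}
print(consensus([book_a, book_b]))
```

Averaging on the log-odds scale rather than averaging the probabilities directly is one common design choice for pooling forecasts; with 28 bookmakers the list passed to `consensus` would simply be longer.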
Model fed with past World Cup data
The researchers first trained the model on historical data, as Andreas Groll of TU Dortmund explains: “We fed the model with the data available at the time for the last five World Cups, i.e. between 2002 and 2018, so that its predictions could be compared with the actual results of all matches of the respective tournaments. Ideally, this makes the weighting of the individual sources of information for the current tournament very precise.”
Brazil leads, ahead of Argentina and the Netherlands
The researchers simulated the tournament match by match, following the tournament draw and all FIFA rules. This yields probabilities for every team to progress through each round of the tournament and ultimately to win the World Cup. The favorites this time are Brazil with a 15% chance of winning, followed by Argentina (11.2%), the Netherlands (9.7%), Germany (9.2%) and France (9.1%).
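The step from simulated matches to win probabilities can be sketched as a Monte Carlo loop: play the tournament many times and count how often each team ends up as champion. This is a toy illustration, not the researchers’ code. It assumes four hypothetical teams with made-up log-scale ratings, a single knockout bracket instead of the real group stage and draw, goals drawn from a Poisson distribution whose mean depends on the rating difference (a common but here assumed modelling choice), and drawn matches settled by a coin flip in place of extra time and penalties. The default of 100,000 runs mirrors the number of simulations in the article; `BASE_GOALS` and all ratings are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List

BASE_GOALS = 1.3  # hypothetical mean goals per team in an evenly matched game


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson-distributed goal count (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def play_knockout_match(rng: random.Random, team_a: str, team_b: str,
                        rating: Dict[str, float]) -> str:
    """Simulate one knockout match and return the winner."""
    diff = rating[team_a] - rating[team_b]
    goals_a = poisson(rng, BASE_GOALS * math.exp(diff / 2))
    goals_b = poisson(rng, BASE_GOALS * math.exp(-diff / 2))
    if goals_a != goals_b:
        return team_a if goals_a > goals_b else team_b
    # Draw: extra time and penalties are simplified to a coin flip.
    return rng.choice([team_a, team_b])


def simulate_tournament(rng: random.Random, bracket: List[str],
                        rating: Dict[str, float]) -> str:
    """Play a single-elimination bracket (length a power of two) to the end."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [play_knockout_match(rng, teams[i], teams[i + 1], rating)
                 for i in range(0, len(teams), 2)]
    return teams[0]


def win_probabilities(bracket: List[str], rating: Dict[str, float],
                      n_sims: int = 100_000, seed: int = 2022) -> Dict[str, float]:
    """Estimate each team's chance of winning as its share of simulated titles."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = Counter(simulate_tournament(rng, bracket, rating) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in bracket}


rating = {"A": 0.6, "B": 0.3, "C": 0.0, "D": -0.3}  # hypothetical strengths
bracket = ["A", "D", "B", "C"]  # semi-finals: A vs D and B vs C
print(win_probabilities(bracket, rating, n_sims=10_000))
```

Because every simulated tournament has exactly one champion, the estimated probabilities add up to 1; a team that wins in 15% of the runs therefore fails to win in the remaining 85%.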
That the outcome of the World Cup is far from certain, and therefore remains exciting, is shown by the relatively low probability of victory even for the leading nations. “It is in the nature of predictions that they can also be wrong, otherwise football tournaments would be very boring. We provide probabilities, not certainties, and a 15% chance of winning also means that 85% of the time the team does not win the tournament,” explains Andreas Groll.
So far, the researchers have often been right
So far, the predictions have been quite successful: Achim Zeileis’ Innsbruck model, which relies on the adjusted odds of betting providers, correctly predicted the EURO 2008 final, as well as Spain as world champion in 2010 and as European champion in 2012.
This year, it will be used for the second time, after Euro 2020 (played in 2021), as part of a larger combined model developed by the teams of Andreas Groll (TU Dortmund), Gunther Schauberger (TU Munich) and Christophe Ley (University of Luxembourg), which had already surpassed the quality of the betting providers’ predictions at the 2018 World Cup.
An unusual date puts the teams at a disadvantage
Holding the 2022 World Cup in the winter months raises very critical sporting questions in addition to the ethical issues, Zeileis says.
“In the winter months, all major football leagues in Europe and South America must now interrupt their usual match schedule to accommodate the tournament. As a result, national teams have less time to prepare and players have less time to recover before and after the World Cup. Combined with the extreme weather conditions, this also increases the risk of injury,” explained Achim Zeileis.
Having many players who compete in international club competitions, such as the Champions League, Europa League or Europa Conference League, could therefore prove to be a disadvantage for a team this year rather than an advantage, as Andreas Groll explained: “All these factors make it more difficult to predict the course of the tournament, since variables that proved to be very significant in previous World Cups may not work this time, or may work differently.”
The model can also be used for more accurate weather forecasting
Incidentally, the model trained in this way could also be used for other predictions in the future: according to the researchers, better football predictions could also lead to more accurate weather forecasts. How well the model performs when it comes to football will become apparent on the evening of December 18 at the latest, the night of the World Cup final.