For the World Cup, we asked bloggers around the world to join a common project. Each participant was to write an article on their preferred national team, and we supported them by delivering stats and charts for all players that Goalimpact would select for that country. Every blogger was free to write the story they liked. Some discussed the players Goalimpact selected, and some used the material to supplement their own view.
Daniel Altman discussed the Argentinian team and used the opportunity to raise some points about Goalimpact itself that he considered suboptimal. We welcome critical feedback, as it gives us the opportunity to improve, and we thank Daniel for it. In return, we feel obliged to elaborate on his points.
Goalimpact is predictive
Daniel seems concerned that Goalimpact is a metric that predicts future performance rather than describing past performance. He doesn't clearly state the reason for his concern, but he notes that this is a property shared neither by plus/minus statistics nor by Shapley values. Daniel is absolutely right about this. Both statistics are descriptive, not predictive. They fully comply with the ten traits of a good soccer metric that Daniel proposed, but they miss out on trait number eleven, which we added.
One point that I find particularly important is individual robustness. I would define it more narrowly than Daniel: for any score value X at time t, the expected score value at time t+1 should also be X. In other words, the metric should be an unbiased predictor of the future. The score should neither follow a trend, e.g. increase with the number of minutes played, nor should it regress to the mean.
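One simple way to test this requirement, following directly from the definition above, is to regress next-period scores on current scores: an unbiased metric should give a slope near one and an intercept near zero. The sketch below is only an illustration with made-up numbers, not part of the Goalimpact code.

```python
import numpy as np

def check_unbiasedness(score_t, score_t_plus_1):
    """Fit score_{t+1} = intercept + slope * score_t by least squares.

    For an unbiased predictor we expect slope ~ 1 and intercept ~ 0:
    a slope below 1 indicates regression to the mean, a non-zero
    intercept indicates a systematic trend.
    """
    slope, intercept = np.polyfit(np.asarray(score_t, dtype=float),
                                  np.asarray(score_t_plus_1, dtype=float), 1)
    return slope, intercept

# Hypothetical scores of the same players in two consecutive periods
slope, intercept = check_unbiasedness([95, 110, 130, 150], [97, 108, 131, 149])
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```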
From our point of view, the predictive power of Goalimpact is not a quirk but the single most important feature of the metric. After all, you don't want to line up the players that were good in the past, but the players that will be good in the upcoming games.
Goalimpact gives high value to past games
Daniel's second concern was that Goalimpact does not apply any weighting when analyzing game outcomes. A game a player lost in his youth has exactly the same impact as a game lost yesterday. This observation is again absolutely correct. A weighting has often been requested in comments on this blog, and we agree that it could further improve the metric's predictive power.
Unfortunately, it is not easy to implement, and it is also not obvious how big the improvement would actually be, because there is a trade-off with reducing the influence of luck. Imagine, for the sake of the argument, that we calculated the value using only the last game. The highest Goalimpact would then always belong to a player who was lucky enough to come on late in a game and see his team score; he would have an incredible goal difference per minute. In other words, averaging periods that are too short give luck too much influence on the result, and the score would become random and no longer predictive. Humans tend to give too much weight to recent observations, and in our opinion many experts do just this, for example when they buy a striker just because he played one brilliant season.
Goalimpact errs on the other end: it gives too little weight to the recent past. This makes it slow to adapt to changes. Assume a player changed his tactical position and plays considerably better in the new one than in his old one (e.g. Bale, Durm). The lower impact the player had in his old position will keep his Goalimpact down for a while, and it will rise to the level of the new position only gradually over time. We may address this in a later version of the algorithm.
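To make the trade-off concrete, here is a toy model with made-up numbers, not the actual Goalimpact algorithm: an exponentially weighted goal difference per minute. A short half-life reacts quickly to changes like a new position, but is easily dominated by a single lucky late substitution; a long half-life is stable but slow to adapt.

```python
import numpy as np

def weighted_goal_diff_per_minute(goal_diffs, minutes, half_life_games):
    """Exponentially weighted goal difference per minute of play.

    goal_diffs:      goal difference while the player was on the pitch, per game
    minutes:         minutes played per game
    half_life_games: number of games after which a game's weight halves;
                     small values react quickly but amplify luck,
                     large values are stable but slow to adapt.
    """
    goal_diffs = np.asarray(goal_diffs, dtype=float)
    minutes = np.asarray(minutes, dtype=float)
    ages = np.arange(len(goal_diffs) - 1, -1, -1)   # 0 = most recent game
    weights = 0.5 ** (ages / half_life_games)
    return np.sum(weights * goal_diffs) / np.sum(weights * minutes)

# Hypothetical career ending with a lucky late substitution (+2 goals in 5 minutes)
gd   = [0, -1, 1, 0, 2]
mins = [90, 90, 90, 90, 5]
print(weighted_goal_diff_per_minute(gd, mins, half_life_games=1))    # dominated by luck
print(weighted_goal_diff_per_minute(gd, mins, half_life_games=100))  # close to unweighted
```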
Some players are overvalued, some undervalued
Goalimpact is a statistical metric, and as such it is sometimes too high or too low just by random variation. This is not terribly bad as long as (a) it is unbiased and (b) the signal it contains is not dominated by the noise. We should therefore expect Goalimpact to over- or undervalue some players, and it will indeed do so.
However, identifying over- or undervaluation implies using another metric according to which there is a misvaluation. Human expectation or expert consensus is also just a kind of metric. Whenever we see a misvaluation, it is unclear which of the two metrics we compared is actually more right (both will be wrong to some extent). Daniel compares Goalimpact with the following metrics:
- Playing at a major team (for all positions)
- Goalkeeper save percentage
- Number of goals and assists (for midfielders and forwards)
We are not arguing that Daniel is wrong in his judgement, we simply don't know, but it is unclear whether these comparison metrics actually judge the players better. Playing at a major team certainly correlates well with being a good player, but it seems a stretch to argue that everybody who plays outside a major team, or even outside a major league, can't be a good player. We observe in every transfer window that good teams buy players from worse teams, so at least the good teams' managers seem to believe that some of the worse teams' players are actually good.
Goalkeeper save percentage was investigated by 11tegen11. They found that it is actually a terrible metric for judging goalkeepers and concluded: "Never judge a goal keeper by his saves". Regardless of this, Daniel is probably right that we should not have selected Franco Costanzo as first goalkeeper, because he had already retired from football, which was unknown to us. Were he to return to the game from retirement, he would have had a significant amount of time without football training behind him. This is not reflected in the score, and hence there is good reason to believe that he is currently not as good as he once was.
We don't know how well non-penalty goals and assists per 90 minutes (NPGA90) perform in judging players. The metric has some serious flaws, such as not correcting for team mates and opposition, but let's assume it is a good metric at least for forwards. For the players in question, it was excellent in the season before the current one, so we are back to the question of how to weight the past. It is not obvious whether taking only the current season, as Daniel did, or all seasons, as Goalimpact does, introduces the bigger error. Again, both are wrong to some extent; they just err on different ends of the problem.
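For readers unfamiliar with the abbreviation, NPGA90 is simply non-penalty goals plus assists, scaled to a per-90-minutes rate. A minimal sketch with made-up numbers; the scaling by 90, rather than reporting the raw per-minute rate, is exactly the distinction behind the correction in the update at the end of this post.

```python
def npga90(goals, penalty_goals, assists, minutes):
    """Non-penalty goals plus assists, scaled to a per-90-minutes rate."""
    return (goals - penalty_goals + assists) / minutes * 90

# Hypothetical season: 10 goals (2 of them penalties) and 4 assists in 1800 minutes
print(npga90(goals=10, penalty_goals=2, assists=4, minutes=1800))  # 0.6
```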
Conclusion
We agree with Daniel's statement:
I’m left with two possible conclusions. One is that Goalimpact is consistently telling us something that the market has missed entirely, both in the professional leagues and the national team. Another is that Goalimpact is missing something important.
We would just like to add that both possibilities are true to some extent. Both Goalimpact and the humans that form the market are imperfect and always will be. Take the experts' opinion on Mauro Icardi. We're sure he is a good player, though we're less sure that his high NPGA90 of 0.7 in the last thirteen games is a good measure of that. After all, the metric is very unstable: his NPGA90 in the other nine games of this season was 0.45. But here the experts seem to weight the most recent past even higher than the merely recent past. We expect him to be quite good in a few years, but he is only 21 and still has to improve before he can play at Champions League level consistently. Still, we can't rule out that the experts are right, that he is already at that top level, and that his NPGA90 next season will be closer to 0.7 than to 0.45.
Update: A prior version stated that Icardi's NPGA90 was 0.005. That was incorrect, as it was the value per minute rather than per 90 minutes. Sorry for the mistake.