One of the most important roles that advanced stats can play in sports conversation is helping us understand what’s real and what isn’t. Luck and randomness play such a large part in any given outcome (especially in the two sports I write about most, college football and soccer), and over short periods of time, both your eyes and the scoreboard can lie to you.
With that in mind, I wanted to see which statistics are particularly predictive from one sample to the next, and which stats from one season translate well to the following one. Soccer is blurry compared with even a tricky sport like American football; at least football has stoppages and specialized 11-man units for offense, defense and special teams. But looking at what actually appears predictive and sustainable allows us to break things into compartments, key factors of sorts.
In my soccer writing over the past few months, I’ve periodically mentioned that certain stats are predictive of future success, either because they are stable measures of quality or because excelling in them leaves you vulnerable to regression to the mean. As the 2020-21 season gets underway, I wanted to explore further what these stats can tell us about what’s ahead. Here’s how I’ve found myself grouping them:
Efficiency and finishing factors. The basics (goal differentials, expected goals and whatnot) along with some situational extras. These are the main measures I’ve found that correlate well to the next season’s performance.
Turnover and field factors. Not the greatest title in the world, but go with it. Two of the most important factors in American football are turnovers and field position. Essentially, where you start your possessions and how frequently you’re giving the ball away at inopportune times. Those ideas work pretty well in soccer, too. Better, even.
Ball control factors. These are not as directly impactful as the factors above, but they remain predictive. In short, teams that control the ball better than their opponent tend to find more success. Yes, there are exceptions and yes, a possession-heavy style can leave you vulnerable to counterattacks and other maladies. But it’s still step one toward solid point totals. If you’re trying to win without the ball, you have to do a lot of other things well.
Regression-to-the-mean factors. Simply put, if you are a little too good in these categories, both you and those categories are likely to regress soon. Similarly, if you’re poor here, you’re likely to improve.
Now let’s get specific and put these stats into action. What can they tell us about the (extraordinarily long) season that just unfolded? What can they tell us about what’s to come? We’ll use the Premier League as our guinea pig before picking a champion in each of Europe’s big five leagues.
Below are last year’s Premier League teams, plus the three teams promoted from the Championship. (We don’t have second-tier data from every major country, but I’ll share it where applicable.) For teams that underwent noteworthy personnel changes within a given year, I noted both their full-season stats and their stats after said change.
Some immediate takeaways from the table above:
– While Manchester United undoubtedly improved after adding Bruno Fernandes at the end of the January transfer window, XG differential paints a conflicting picture. How much of that improvement was real and will carry through over a long period of time?
– Chelsea had a bit of a defensive problem last year, but you didn’t need fancy math to know that.
– Leicester’s overall stat profile backed up their top-five status even if it’s impossible to separate their late-year loss of form from the big picture.
– After Jose Mourinho took over, Tottenham Hotspur improved to what amounts to a third-place pace — 1.73 points per match, almost identical to what United and Chelsea produced while qualifying for the Champions League. Better yet, when Mourinho had a healthy Harry Kane at his disposal, Spurs averaged 1.94. XGD again throws this improvement into question a bit, but if or when Kane is 100%, I expect Spurs to play at a Champions League level again moving forward.
– Arsenal’s form improved a bit under Mikel Arteta, but he certainly didn’t have the pieces he needed/wanted to apply any sort of pressure. (And as with United and Spurs, the Gunners’ XG differential raises questions about just how much their form actually improved.)
– Southampton were the third-best team in the league in terms of turnover/field effects. #HasenhuttlEffect
– Carlo Ancelotti backed off the pressure when he took over at Everton, and while it seemed like their defense improved a bit in the process, once more XGD raises questions about how much better the Toffees actually were.
– Whew, Newcastle’s base stats were bad. Like, “ahead of only Norwich City” bad. And of the league’s Nos. 14 through 18 teams last year, West Ham seemed to have the most to offer overall.
On to ball control!
(Note: this is possibly overthinking, but I’m using a different color scheme here as a way of noting that these factors aren’t quite as directly important as those on the previous table. It’s good to be good, i.e. orange, at these things, but it isn’t quite as vital.)
Pep Guardiola’s Manchester City have mastered the possession game. Of that, there is no doubt. They hold the ball the longest, average far more passes per possession than their opponents, attempt fewer long balls than anyone else and deploy long balls better than anyone else. They were particularly vulnerable to counterattacks compared to previous years — and it cost them dearly in both the Premier League and Champions League — but this style will put you in position to win far more often than not. After all, this was seen as a dreadfully disappointing season even though they still finished second and went to the Champions League quarterfinals.
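The ball control measures on this table boil down to simple ratios. Here’s a minimal sketch of how you might compute passes per possession and long-ball usage from event data, assuming a hypothetical data format in which each possession is a list of passes with a length attached. The 30-meter long-ball cutoff is an assumption as well, not an official definition.

```python
from dataclasses import dataclass


@dataclass
class Pass:
    length_m: float  # distance covered by the pass, in meters
    completed: bool


LONG_BALL_M = 30.0  # hypothetical cutoff; data providers define long balls differently


def ball_control_profile(possessions: list[list[Pass]]) -> dict[str, float]:
    """Passes per possession, share of passes that are long balls, and long-ball completion rate."""
    passes = [p for possession in possessions for p in possession]
    long_balls = [p for p in passes if p.length_m >= LONG_BALL_M]
    return {
        "passes_per_possession": len(passes) / len(possessions) if possessions else 0.0,
        "long_ball_share": len(long_balls) / len(passes) if passes else 0.0,
        "long_ball_completion": sum(p.completed for p in long_balls) / len(long_balls) if long_balls else 0.0,
    }


# Three hypothetical possessions
sample = [
    [Pass(10, True), Pass(12, True), Pass(35, False)],
    [Pass(8, True)],
    [Pass(40, True), Pass(15, True)],
]
profile = ball_control_profile(sample)
print(profile)
```

Running the same function on opponents’ possessions gives you a margin, which is how City’s passing edge is framed above.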
Any improvement Arsenal made under Arteta was not derived from any sort of Guardiola-esque effects: their possession rate and per-possession passing numbers regressed, and their long ball usage went up slightly. (He also didn’t have much of a chance to bring in his own personnel and was simply working with what he had.)
Newcastle were the ultimate anti-possession team, only without good vertical passing numbers. (Or good offensive or defensive numbers overall.) Based solely on last year’s stats, and not taking into account who they did or didn’t add in the offseason, the Magpies start out this year at the bottom of the heap.
Next up: who’s likely to regress, or progress, toward the mean?
The color coding here is similar to the last table; darker blue = regression candidate, while deeper orange = progression candidate.
From an XG-only perspective, this should have been another down-to-the-wire Man City vs. Liverpool title race. But City suffered quite a few losses that were, from an XG standpoint, rather fluky. Those aren’t likely to play out the same way in a new season. (Meanwhile, down in the English Championship, Leeds were even less lucky in their losses, which is the main reason why the team didn’t pull away for the title until late in the season.)
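One common way to get at “fluky from an XG standpoint” is to replay a match many times from its shots: treat each shot’s XG as the probability that it goes in, simulate, and count how often each side wins. The sketch below does that in Python. The shot values are made up, and this is a generic technique, not necessarily how the numbers above were produced.

```python
import random


def simulate_match(team_xg: list[float], opp_xg: list[float], n_sims: int = 20_000, seed: int = 0):
    """Replay a match from shot-level XG, treating each shot as an independent scoring chance.

    Returns (win probability, draw probability, loss probability, expected points).
    """
    rng = random.Random(seed)
    wins = draws = 0
    for _ in range(n_sims):
        goals_for = sum(rng.random() < xg for xg in team_xg)
        goals_against = sum(rng.random() < xg for xg in opp_xg)
        if goals_for > goals_against:
            wins += 1
        elif goals_for == goals_against:
            draws += 1
    p_win, p_draw = wins / n_sims, draws / n_sims
    p_loss = 1 - p_win - p_draw
    return p_win, p_draw, p_loss, 3 * p_win + p_draw


# Hypothetical "fluky loss": 1.18 XG created, 0.39 XG allowed, yet the match was lost
p_win, p_draw, p_loss, xpts = simulate_match([0.45, 0.30, 0.20, 0.10, 0.08, 0.05], [0.35, 0.04])
print(f"win {p_win:.0%}, draw {p_draw:.0%}, loss {p_loss:.0%}, expected points {xpts:.2f}")
```

Add up expected points across a season, compare them with actual points, and the losses that were mostly bad luck stand out.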
We heard a lot about how Jurgen Klopp had adjusted his tactics to make Liverpool better at closing out wins and navigating tight games, but there’s almost no way they can average 2.5 points per game in close matches again. As I wrote in July, this was almost entirely the source of the Reds’ advantage over Manchester City, and they’re going to fall back to earth a bit. How much? (And how much will City improve in this category?) We’ll see.
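Close-match performance is easy to measure yourself. This sketch assumes a “close match” is one decided by one goal or fewer, draws included; that cutoff is an assumption for illustration, and the results list is hypothetical.

```python
def close_match_ppg(results: list[tuple[int, int]], max_margin: int = 1) -> float:
    """Points per game in matches decided by max_margin goals or fewer, draws included.

    results holds (goals_for, goals_against) pairs from one team's perspective.
    """
    points = []
    for goals_for, goals_against in results:
        if abs(goals_for - goals_against) > max_margin:
            continue  # comfortable win or heavy loss: not a close match
        if goals_for > goals_against:
            points.append(3)
        elif goals_for == goals_against:
            points.append(1)
        else:
            points.append(0)
    return sum(points) / len(points) if points else 0.0


# Hypothetical results; the 3-0 win is excluded, leaving five close matches
results = [(2, 1), (1, 0), (3, 0), (1, 1), (0, 1), (2, 1)]
print(close_match_ppg(results))
```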
Quite a few teams are investing heavily in set-piece prowess, so the deep blue teams in that column — Liverpool, Manchester City, Burnley, post-Fernandes Manchester United — aren’t necessarily going to lose their advantage altogether. But again, we’re talking about extremes. It’s hard to derive that much of an advantage two years in a row.
Aston Villa’s opponents picked up a second yellow or a straight red card in seven matches, and while Villa somehow managed to pick up only seven points from those matches, that raised their PPG rate just enough to stay up. You probably can’t count on that saving you twice.
I’ve been playing with data that compares teams’ save percentage margin to both their and their opponent’s XG-per-shot rates — the higher the XG/shot, the lower the save percentage. In the future, I should be able to do a better job of separating skill from randomness here, but at first glance let’s just say that Chelsea’s save percentage margin is almost guaranteed to go up this year, and Arsenal’s is almost guaranteed to go down.
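Here’s a rough sketch of the idea, with loud assumptions: save percentage is saves divided by shots on target, and as a crude baseline the code treats the XG of on-target shots as the goals an average keeper would concede. The gap between a team’s actual save percentage margin and that expected margin is the part most likely to regress. All function names and inputs below are hypothetical.

```python
def save_pct(shots_on_target: int, goals: int) -> float:
    """Share of on-target shots that were saved."""
    return (shots_on_target - goals) / shots_on_target


def expected_save_pct(shots_on_target: int, xg_on_target: float) -> float:
    # Assumption: the XG of on-target shots approximates what an average keeper concedes.
    return 1 - xg_on_target / shots_on_target


def save_margin_vs_expected(sot_faced: int, goals_allowed: int, xg_faced_on_target: float,
                            sot_taken: int, goals_scored: int, xg_taken_on_target: float) -> float:
    """Actual save percentage margin minus the margin that shot quality alone would predict.

    Strongly negative: likely to improve. Strongly positive: likely to regress.
    """
    actual = save_pct(sot_faced, goals_allowed) - save_pct(sot_taken, goals_scored)
    expected = (expected_save_pct(sot_faced, xg_faced_on_target)
                - expected_save_pct(sot_taken, xg_taken_on_target))
    return actual - expected


# Hypothetical season totals: faced 120 on target (45 goals), took 150 on target (40 goals)
residual = save_margin_vs_expected(120, 45, 36.0, 150, 40, 42.0)
print(f"{residual:+.3f}")
```

Some of that gap is genuine goalkeeping skill, so treat it as a flag rather than a verdict.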
Looking at the hierarchy these stats seem to create, and taking into account offseason acquisitions — most notably, Chelsea’s rampant spending binge — my top five for this season looks like this:
1. Manchester City
4. Manchester United
5. Tottenham Hotspur
I’m excited to see what Spurs look like with a full season (hopefully) of Kane and Mourinho. I think Leicester have absolute top-five potential despite losing another star to a big sale (Ben Chilwell to Chelsea). And now that Arteta has had an offseason to further flesh out his system and make some personnel moves, I think Arsenal are a bit of a wild card.
Dark horse: Leeds. They would have probably been mid-table or close to it with last year’s team, and I’m curious how well they can play Marcelo Bielsa’s possession game against steadier high-end talent.
Let’s quickly move through other Big Five leagues to see what stood out there.
OK, what do we have here…
– Quique Setien deserved better. Barcelona’s XGD under him was easily the best in the league, but their actual goal differential went down with him in charge. Barca had brilliant possession numbers, as you’ll see, but they didn’t do a ton with the ball, kept matches closer than they needed to be, and paid the price just enough for a smoking hot Real Madrid (overachieving a bit, it must be said) to pass them after the restart.
– Almost no top European team presses less than Zinedine Zidane’s Real Madrid. That was doubly true after the coronavirus stoppage. Los Blancos simply stayed organized, conserved energy and relied upon their raw talent advantages. It worked, even if their XGD suggests it shouldn’t have worked quite as well as it did.
– Even when they were mired in a tighter-than-expected race for third place, Atletico Madrid easily had the third-best stats in the league. Eventually, they pulled ahead.
– Leganes might have been the most unlucky relegated team in Europe’s big five. They pressured extremely well, and their XGD was decidedly mid-table. But either they had the worst finishing skill in Europe, or Lady Luck just really wasn’t a fan.
– Atletico: the best team in Europe with a sub-50% possession rate. But you probably assumed that already.
– Villarreal’s turnover stats and corner margin tell you the ball was in dangerous spots far too often. That they succeeded anyway is remarkable, but they were playing with serious fire.
– Granada made it to the Europa League qualifying stages with an aggressively anti-possession approach. They were even better at it than Burnley.
– During the post-stoppage stretch run, Real Madrid almost locked down maximum points while rarely dominating. That probably won’t continue, at least not to that level.
– Your save percentage margin tends to regress toward the mean … unless you have Jan Oblak. Atletico will probably post similar numbers this season.
– Alaves were inferior to Leganes in just about every possible way, except the one that mattered most: the table.
If Barca weren’t in the middle of what we’ll politely call a turbulent offseason, I’d be tempted to predict them No. 1 simply because of the bad fortune down the stretch. Alas. Instead, we’ll go with this:
1. Real Madrid
3. Atletico Madrid
5. Real Sociedad
I was tempted to bump Atletico to No. 2, but smarter money is likely on Barca finding a way to produce during what is almost surely Lionel Messi’s Last Dance season.
– Before Hansi Flick took over for Niko Kovac as manager, Bayern Munich were a slightly unlucky team likely to turn things around as they had the year before. When Flick took the reins, Bayern almost immediately became the best team in Europe. Those post-Flick turnover numbers are absolutely staggering.
– From an XGD perspective, Borussia Dortmund were lucky to finish second. As you can see, though, their numbers improved across the board after the midyear addition of Erling Haaland, and not just in attack: they improved on defense, too, creating more structure in their attack and leaving themselves less vulnerable to counters.
– Defensive lapses cost Eintracht Frankfurt significantly, but the baseline stats suggest they should have been a few spots higher up the table. They should contend for a European spot this year.
– A horrible XGD should have stuck Freiburg near the relegation zone. Instead, they were able to ride out an early hot streak (22 points in their first 12 games) to a comfortable mid-table finish. Probably can’t count on that happening twice.
– Cologne’s season: not even slightly boring. Only Bayern and BVB matches averaged more combined goals, and the Billy Goats sandwiched two dreadful cold streaks around a torrid hot streak. Stability: who needs it?
– One of the things I want to explore moving forward is what kind of promoted teams tend to fare better. Stuttgart played a proper possession game in earning promotion from the 2. Bundesliga last year, applying solid pressure and hitting 63% possession overall. Arminia Bielefeld, meanwhile, were pretty defense-heavy and chunked a few more long balls downfield at the expense of possession numbers. The possession game wins, but can Stuttgart establish anything close to these numbers in the top division? Is it better to have a defensive approach as you move up?
– Augsburg deployed a staggeringly vertical, anti-possession, counterattacking style, alternating between winning close matches and getting thumped.
– Bayer Leverkusen were, along with Real Betis, the poster children for “possession isn’t everything.” Die Werkself hogged the ball as well as almost anyone, but didn’t create a ton of high-quality opportunities — and that was with Chelsea-bound attacker Kai Havertz — and didn’t pressure opponents all that well.
– Hoffenheim and RB Leipzig succeeding with set pieces isn’t surprising given their innovative backgrounds. They should maintain those gains.
– Leipzig should also bounce back significantly from a close-matches perspective. Their title attempt was derailed by an incredible 12 draws.
– Bielefeld’s defensive numbers were solid, but maybe too solid. They probably can’t count on such a dramatic save percentage advantage in the first division.
1. Bayern Munich
2. RB Leipzig
3. Borussia Dortmund
4. Eintracht Frankfurt
5. Borussia Monchengladbach
Frankfurt knocked on the door a lot last year without much to show for it. Adi Hutter’s Eagles should bounce back. The race for second place between Leipzig and Dortmund could also be a thriller. (First place: less thrilling.)
– The theme of everything I wrote about Serie A this summer was basically the same: Juventus’s title streak really should have ended last season. Juve were fourth in goal differential and third in XGD, but an early cold spell from Atalanta and head-to-head faltering by Internazionale helped Juve keep the streak alive.
– I am fascinated by Inter heading into this season. They were probably the best team in the league last year, and they have an exciting, soon-to-peak core if they can keep it away from poachers.
– If Inter weren’t the best team in the league, Atalanta probably were, even if their aggressive style leaves them vulnerable to counterattack (and doesn’t help them to preserve a late lead in the Champions League quarterfinals).
– AC Milan were potentially a top-three team over the second half of the season, faring well enough down the stretch that they elected to hold on to manager Stefano Pioli instead of bringing in Ralf Rangnick as rumored. Now let’s see them keep it up for a full season.
– Napoli did almost everything right last year except finish. They were the best pure ball control team in the league, but they were just eighth in scoring, far worse than their XG stats suggested they should be. That made them a prime bounce-back candidate even before they shelled out big bucks for Lille’s Victor Osimhen.
– Sassuolo were a poor man’s Napoli — possession numbers were nearly as good, but while finishing wasn’t a problem, defensive breakdowns very much were.
– Juve weren’t Liverpool from a close-matches perspective, but wringing points from tight matches is how they kept themselves ahead of the field. They have a reputation for this, and maybe they can keep it up, but I don’t expect the challengers to regress much. Juve will have to improve overall to keep the title streak going. (And they very well might do just that.)
– OK, Lazio might regress. The Biancocelesti were a bit lucky to be title challengers as long as they were, although they were also bitten pretty hard by the injury bug late in the year.
5. AC Milan
This was maybe the hardest league to forecast, as you could make a case for any of last year’s top seven teams to be top three or so this year. Theoretically that could make for a thrilling late-year push. And Juve will probably come out on top again just to spite us.
The Ligue 1 season was never resumed after the coronavirus stoppage, and the table above hints pretty clearly at an incomplete season. While Paris Saint-Germain were predictably the best overall team by far, you could make the case that the next two best teams were Lille and Lyon, neither of which finished in the top three or qualified for the Champions League. Meanwhile, you could make the case that second-place Marseille was the sixth- or seventh-best team overall and was due some regression had the season continued.
– Lyon were 10 points out of the top three with 10 matches to go — a dreadful start to the season (10 points in their first 10 match days) doomed them, and that probably wasn’t going to change if the season had continued — but Lille finished a single point outside of the top three. Considering how much money Champions League play has to offer, that was an awfully costly point.
– What Ligue 1 lacks in competitive title races, it makes up for in diversity. There are lots of different styles among the top teams. Stade Rennes and Stade Reims were defense-first, Nice and Lyon leaned on the possession game, and Marseille and AS Monaco tried to split the difference with various levels of success.
– Again, Marseille were sort of getting by with smoke and mirrors, while rival Lyon were the opposite. Those two teams will likely see their fortunes flip this year, especially considering Marseille now indeed have to navigate Champions League play while Lyon, which didn’t even qualify for the Europa League, stay fresher than they intended to be.
People usually think of stats as a way of finding answers to questions, and while they’re definitely that, I like to view them as content generation machines, too. These tables make that point for me pretty well. You look at some of these stats, you ask yourself “I wonder why that is…” and away you go. I came up with about 100 questions to ask and potential stories to write from this exercise, and hopefully you did too.