How much should you believe in the standings?
The first thing you learn about following sports is that there's nothing more important than the standings. Wins and losses are everything. They control how you feel about a team, and they control how you feel about the individual players. They determine whether a team will be in contention for a championship, and as far as most people believe when they first get into this, it's all about titles. It's not, but that realization comes down the road.
Something you learn later on is that, yeah, the standings are important, but they might not be predictive. It's fun when your team has a bunch of wins, but that doesn't guarantee a bunch of future wins to follow. Sports fans are always doing what they can to try to tell the future. No one can ever do it well, but you can do it better or worse.
Future-telling is the goal of projection systems. You could argue, maybe, the goal is to estimate a player or team's true talent at a moment in time, but true talent is virtually indistinguishable from expected future performance. Projections are everywhere in baseball analysis. Even when the word itself isn't used, observers are always making educated guesses about what's going to happen, which is a form of projecting. Official projection systems formalize it.
People have some trust issues with projections. Humans always want to believe they're smarter than human-designed machines or systems, and they especially distrust projections when they show something different from what's already happened. When projections are at odds with evidence, projections are given funny looks. It's perfectly natural, yet it leaves projections always needing to be validated. In this article, let's consider projections. And let's consider wins and losses. And let's consider what matters more, if we're trying to look ahead.
A little over a month ago, I tried a project. I had, in my possession, 10 years of preseason team projections. The season was also about two months old, so I reviewed the previous 10 years of baseball, around the two-month mark. I was curious which was the better predictor of the next four months: team performance over the first two months, or preseason team projection. As it turned out, the projections fared quite a bit better. After two months, you're better off keeping the same opinion of a team you had in March.
Here, I want to do something similar. But there are two twists. One, it's been more than a month since. And two, FanGraphs has published in-season projections for a few years. These projections account for changes to the team depth charts, so they're updated from preseason team projections. I was able to access them using the Wayback Machine. The goal here: which is more predictive, three months of performance or updated team projections? I have just two years of data, meaning a sample of 60 team-seasons, but we can at least see what's there.
It makes sense that, early on, the projections are more predictive than performance. Early performance can be totally random. There's a belief, though, that things switch as the season gets older. That, at some point, the standings mean more than the projections. Where we are now, the average baseball team has played 85 games. So for both 2013 and 2014, I split each season at the date on which the average team had played 85 games. That puts us at the same mark in the season. Let's examine some predictiveness.
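For anyone who wants to reproduce the split, here is a minimal sketch of how that date could be found. It assumes a schedule stored as `(date, home_team, away_team)` tuples. The function name `split_date` and the input layout are made up for illustration and aren't taken from any particular data source.

```python
from collections import Counter
from datetime import date


def split_date(games, target=85.0, n_teams=30):
    """Return the first date on which the average team has played
    at least `target` games, or None if that mark is never reached.

    `games` is an iterable of (date, home_team, away_team) tuples
    (a hypothetical layout chosen for this sketch).
    """
    team_games_per_day = Counter()
    for day, _home, _away in games:
        # Every game adds one game played for each of its two teams.
        team_games_per_day[day] += 2

    total_team_games = 0
    for day in sorted(team_games_per_day):
        total_team_games += team_games_per_day[day]
        if total_team_games / n_teams >= target:
            return day
    return None
```

Everything through the returned date counts as "through this point," and everything after it counts as the rest of the season.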
First, we have winning percentage through this point, and winning percentage after this point:
I'm not going to call that relationship strong, or weak, or anything. It just is. There's not yet anything to compare it to. There is evidence of a relationship, which is of course what you'd expect -- teams with good or bad records through this point tend to subsequently post good or bad records, respectively. There's variation around that, but then, we're dealing with a sample of just 60 teams, splitting seasons roughly in half, so it's inevitable there's going to be noise. This is mostly experimental.
Relatedly, let's look at the same graph, only this time, instead of using actual winning percentage, let's look at expected winning percentage based on run differential. You might know this as Pythagorean record.
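As a quick refresher, Pythagorean record turns runs scored and runs allowed into an expected winning percentage. Below is a minimal sketch of the classic form. The exponent of 2 is the textbook default and an assumption here, since some versions use a slightly smaller exponent and this article doesn't depend on the choice.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from run totals:
    RS^x / (RS^x + RA^x).

    exponent=2.0 is the classic default (an assumption for this sketch).
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        # No information at all: fall back to a coin flip.
        return 0.5
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)
```

A team that has scored exactly as many runs as it has allowed comes out at .500, whatever its actual record says.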
No real difference. Some of the dots have shifted around, but the relationship is basically identical to the first one. This has given us no additional clarity. So we move on to the final graph: this time, we compare projected winning percentage over the rest of the season against actual winning percentage over the rest of the season.
Ever so slightly, there's a tighter relationship. And there's a considerably steeper slope. The simplest interpretation: even at this point, you should put more stock in the projections than in what the standings say. A better interpretation would be that the jury's out for now, since we don't have enough of a sample built up, and there's lots of potential error in here. But we can at least say the projections aren't worse than the standings, in terms of seeing the future. We aren't at the point where the projections can be dismissed. That point probably never comes. Even over 80 or 90 games, a team's record can deceive.
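The three comparisons come down to the same two numbers for each predictor: how tight the relationship is (the correlation) and how steep the best-fit line is (the slope). Here is a minimal sketch of that calculation, assuming each predictor is a list with one value per team-season, lined up with the rest-of-season winning percentages. The function names are invented for illustration, and this is not necessarily how the graphs above were built.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit of y on x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def compare_predictors(predictors, rest_of_season):
    """Fit each named predictor against rest-of-season winning percentage.

    `predictors` maps a label (e.g. "record", "pythagorean", "projection")
    to a list of values, one per team-season.
    """
    return {name: fit_line(values, rest_of_season)
            for name, values in predictors.items()}
```

A slope near 1 means a point of difference in the predictor shows up as about a point of difference the rest of the way. A slope well below 1 means the predictor's extremes get pulled back toward .500.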
The argument in favor of believing the standings is that projection systems tend to be slow to pick up on sudden improvements or declines. That's true, but sudden improvements or declines are also rare. And there's a bias here, in favor of the standings being more predictive: a team with a good record is more likely to add around the trade deadline. A team with a bad record is more likely to sell. If a team's record is different from its projected record, it's going to act on the former. So a team with a good record and a bad projection will quite possibly upgrade, and that's something a projection can't see. It's a small factor, but it's a factor.
Without question, the standings are important. They contain information, and not all of it is historical. Around this point, one has to start taking over- or under-achievers seriously. But you also can't just throw projections away, because they don't predict any worse than the standings themselves, despite a bias that works against them. Here's a link showing current rest-of-season projections. Some teams have been good, and are projected to be good. With other teams, there are differences. The Royals. The Astros. The Twins and the Mariners and the Indians. Over- or under-achievers, all of them. The projections tell a different future. That shouldn't be ignored, no matter how a season has felt for three months.
There's no arguing about wins and losses that are already in the bank. It's far less clear, though, what those wins and losses mean.