Upon further review: Instant replay seems to be getting it right
Remember when people were worried about instant replay and how it would mangle the game of baseball beyond recognition? Well, baseball has survived the month of April and instant replay has become just part of the game. Take these numbers from the first month of the season.
Given that there were roughly 400 games played in April, it means that there was a replay in every other game, and that it added roughly two minutes to a game when it was used. Not only that, but managers who challenged actually got the call overturned about 45 percent of the time. Not bad. Fears that Dan Brooks and I had about challenges being thrown around in an annoying (but mathematically correct) manner were unfounded. Manager ejections are down too. Now, instead of managers peeling out of the dugout looking like they might commit ump-icide, they calmly walk out and make small talk until the video guy gives them a thumbs up.
Like everything else in life, the thought of change was much scarier than the actual change that it wrought. Broadcasters have found a way to integrate the drama of the replay into the language of a baseball game, and umpires are getting calls correct that they would have previously gotten wrong.
But is instant replay worth all the fuss? Let’s do some math and find out!
Warning! Gory mathematical details ahead!
The website Baseball Savant keeps a log of all instances in which replay has been used this season, often with links to the video of the play. I focused specifically on the calls that had been overturned, because that’s the real yield of the system. There certainly have been plays that have been called wrong this year that were not challenged, and those that were reviewed but were not reversed due to lack of conclusive evidence. But in ye olde days – like 2013 -- those would have still been blown calls.
I watched all instances of plays in which a call was overturned on instant replay from March 31 to April 30. I discarded the couple of instant replays which involved the now much-maligned “transfer rule” which MLB has since mercifully re-re-clarified. After watching the rest, I was surprised at how neat and orderly replay had actually worked out. One fear that people had going in was that situations would happen where it wouldn’t be clear where runners should go if a call was overturned. Only in one case (Wil Myers fake catching a fly ball that had really hit the wall first) was there a question of what should happen next if the umpires overturned the call. In most other cases, like a bang-bang play at first, it was simply a question of whether the runner should be allowed to stay on first or go back to the dugout with an extra out added to the board. It was pretty clear where any other runners might go.
I wanted to find out how many runs had been affected by instant replay. For those who are initiated in this sort of thing, I used a run expectancy matrix (I used the one from 2013 for these analyses), but for any first-timers, a quick explanation.
Suppose that there’s a bang-bang play at first base leading off an inning. If the runner is called out, there’s one out and no one on base. If he’s safe, then there’s a runner on first with no one out. Now, no run scored, so how can we figure how many runs were affected? Take a look at this table. In 2013, a team that had no runners on and one out scored, on average, 0.2489 runs over the rest of the inning. A team that had a runner on first and no outs scored, on average, 0.8262 runs. That’s called the run expectancy. There’s no guarantee that if the runner is safe that he will score, or that a team won’t score anyway even if he’s out, but those are the averages. We also assume that a blown call is just as likely when it’s 15-2 and the potential run means nothing as when it’s tied in the late innings and the run means everything. Of course, the game situation makes a big difference in real life, but to be able to put some value on it, we take an average.
To try and put some value on the bang-bang play, we’re going to take the difference between the two run expectancy values (0.8262 and 0.2489), which is .5773 runs. That’s how much is at stake in the replay. For each situation where a call was overturned in April, I looked at the difference between the run expectancy if the call was upheld and if the call was reversed. In some cases, a call would have ended an inning but reversing it would have kept the inning alive and allowed a run (or two!) to score. In those cases, there’s a lot of run expectancy riding on the decision.
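For anyone who wants to check that subtraction, here is a minimal Python sketch. It uses only the two 2013 run expectancy values quoted above. The dictionary keys and the function name `runs_at_stake` are made up for illustration, and a real matrix would cover all 24 base-out states rather than just these two.

```python
# Run expectancy values quoted in the text (2013 season).
# Keys are (base state, outs); only the two states from the example are included.
RUN_EXPECTANCY_2013 = {
    ("bases empty", 1): 0.2489,
    ("runner on first", 0): 0.8262,
}


def runs_at_stake(state_if_safe, state_if_out, table=RUN_EXPECTANCY_2013):
    """Difference in expected runs between the two possible calls (hypothetical helper)."""
    return table[state_if_safe] - table[state_if_out]


# Bang-bang play at first base leading off an inning.
stake = runs_at_stake(("runner on first", 0), ("bases empty", 1))
print(f"Runs riding on the call: {stake:.4f}")
```

The same lookup-and-subtract step would apply to any other overturned call, given the two base-out states on either side of the decision.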
In fact, the average reversal was worth .687 runs on the run expectancy chart. Over the course of 85 reversals, that’s about 58 “runs” that in some sense would have previously been given to the wrong team. If we assume that the rates at which replays happen and their overall value will stay consistent over the next five months, that’s 348 runs that would have been mis-allocated last year, but because of replay are now going to their rightful owners. Last year, there were a total of 20,225 runs scored during the regular season, so 348 runs would represent 1.7 percent of all runs. That’s certainly not trivial. If we also assume that just as many bad calls were made last year that could have been reversed if instant replay had been in effect, we could make the case that between 1.5 and 2 percent of runs were “given” to the wrong team last year. And the year before that too.
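The extrapolation above is simple enough to redo in a few lines of Python. This is only a sketch of the arithmetic in the paragraph, and it assumes (as the 348 figure implies) that April is rounded to 58 runs before being multiplied out over a six-month season. The variable names are mine.

```python
# Figures taken from the text.
reversals_in_april = 85
avg_runs_per_reversal = 0.687
months_in_season = 6          # April plus "the next five months"
total_runs_2013 = 20225       # league-wide runs, as quoted in the text

# Run expectancy that changed hands in April.
april_runs = reversals_in_april * avg_runs_per_reversal

# Assumption: round April to a whole number of runs, then scale to a season.
season_runs = round(april_runs) * months_in_season

# Share of all runs scored in a season.
share = season_runs / total_runs_2013

print(f"April: {april_runs:.1f} runs")
print(f"Season: {season_runs} runs")
print(f"Share of all runs: {share:.1%}")
```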
Now, from an individual team’s point of view, there are some cases where they see a bad call that went their way and times when it went against them. If we assume that umpires are unbiased in which teams are victimized by bad calls, then we might assume that in the end, a team ends up re-claiming their property as often as they have to give back their ill-gotten gains, and it all evens out in the end. But, that’s not always the case. If you flip a coin 10 times, the most likely outcome is five heads and five tails, but there will be times when there are 6-4 or 7-3 outcomes. It’s possible -- though not very likely -- to have a 10-0 outcome. Just ask Tom Stoppard. There’s a formula for determining exactly how likely those outcomes are, called the binomial distribution.
In April, there were 85 reversals league-wide (let’s just round that to 90), and each reversal affects two teams. Assuming that rate continues to hold over a six-month season, a team might expect to have 36 cases where they are involved in a reversal. The most likely outcome is an 18-18 split of times when the reversal went their way and times that it didn’t. Then again, it isn’t always like that. In fact, according to the binomial distribution, we would expect an exact 18-18 split only 13 percent of the time. The other 87 percent of the time, there’s an imbalance. A lot of them will be 19-17 or 20-16, but what are the chances that a team might have a split of 24-12 or better, just by pure chance? The answer is a little more than 3 percent, and the chances of a split that bad or worse are the same, meaning that it is likely that one of the 30 teams will have such luck (or bad luck). Because each reversal is worth .687 runs, a 24-12 split works out to a swing of 8.2 runs – most of a win. In fact, a split of 22-14 would work out to a net gain or loss of about five runs, roughly half a win, and the binomial distribution tells us that we might expect, again just by chance, that about a quarter of teams would have a swing that big (or bigger), in one direction or the other, in any given season. Some might be better off for it, some worse, but that’s randomness for you.
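Here is a short Python sketch of those binomial calculations, for readers who want to reproduce them. It assumes each of a team’s 36 reversals is an independent 50/50 coin flip, which is the assumption in the text. The function names `binomial_pmf` and `upper_tail` are my own, and only the standard library is used.

```python
from math import comb

# 90 reversals a month, six months, two teams per reversal, 30 teams.
reversals_per_team = 90 * 6 * 2 // 30
runs_per_reversal = 0.687


def binomial_pmf(k, n, p=0.5):
    """Chance of exactly k favorable reversals out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p=0.5):
    """Chance of k or more favorable reversals out of n."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


n = reversals_per_team
p_even_split = binomial_pmf(18, n)            # exact 18-18 split, about 0.13
p_24_or_better = upper_tail(24, n)            # one direction only, about 0.033
p_22_either_way = 2 * upper_tail(22, n)       # 22-14 or worse, good or bad, about 0.24

# Net reversals times runs per reversal gives the swing in runs.
swing_24_12 = (24 - 12) * runs_per_reversal
swing_22_14 = (22 - 14) * runs_per_reversal

print(f"Reversals per team: {n}")
print(f"Exact 18-18 split: {p_even_split:.3f}")
print(f"24-12 or better: {p_24_or_better:.3f}")
print(f"22-14 or more lopsided, either way: {p_22_either_way:.3f}")
print(f"Swings: {swing_24_12:.1f} and {swing_22_14:.1f} runs")
```

Under this coin-flip assumption, the "little more than 3 percent" figure lines up with a split of 24-12 or better in one direction, while the "about a quarter" figure lines up with a 22-14 or more lopsided split counted in both directions.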
Because of instant replay, value is now being returned to its rightful owner. That means that in the days of yore – back in a curious time in the distant past when a song almost entirely in Korean was No. 1 in the United States -- there were probably similar numbers of blown calls that could have been overturned, but weren’t because we lacked instant replay. Some team probably got screwed out of (or was the undeserving beneficiary of) most of a win just from those bad calls. Seven others probably suffered from the theft of or enjoyed the fortune of an extra half of a win. It’s probably been like that for a while. Instant replay has taken this source of luck out of the game.
Do you still love the human element?
I know, part of the charm of baseball is that the umpires are human and that they make mistakes – at least that’s what everyone says until Don Denkinger torpedoes your team’s World Series championship chances. If there’s something that instant replay is doing qualitatively, it’s taking away that gnawing feeling of dread that used to accompany a game that a team won only because of a call that, on further examination, was clearly wrong. You can walk away from a game knowing that the game was more likely to be settled by the players on the field than by an umpire who had a bad angle and made his best guess. I’ll trade two extra minutes for that any day.
But if the aesthetic argument doesn’t work for you, here’s one with cold, hard numbers attached. The instant replay system that we now have has fixed a problem that was affecting something around 1.5 to 2 percent of all runs scored in baseball. In addition, it has made a team’s record less dependent on luck, to the tune of a half a win or more for a sizable handful of teams. No, that’s not a complete game-changer, but that’s not trivial either, and I don’t know that people really realized how much of a difference bad calls could make in the game.
What drives that fact is that bad calls were, despite being very emotionally salient, actually pretty rare even in the pre-instant replay world. ESPN estimated in 2010 that there were only 1.3 calls per game that were close enough to need review, and only 20 percent of those were conclusively called wrong on the field. Most of the time, it’s not that hard a call, and even when it is a hard call, umpires still have a pretty good record. So, it’s going to be very few plays that are going to actually be affected by instant replay, but the cost of a blown call, even something like a bang-bang play, is high. Consider that in a world where teams score four runs or so per game, a swing of .5773 runs is about 1/7th of a team’s average production for a game. Low frequency, high value events are ripe for high variance outcomes – which means that they can have a profound and incorrect effect on a game and on a season and that effect comes down to the luck of the draw. Thankfully, baseball put in a system which deletes a good chunk of that problem.
So yes, maybe I am welcoming the new robot overlords. And I would encourage you to as well. They are making the game of baseball better.