Building the strike zone
It has become abundantly clear in recent years that the strike zone, as defined in the rulebook, is not the zone that is called. The true strike zone, the one the umpires and batters see, is affected by myriad factors, including everything from the count to the catcher. The actual zone moves, shrinks, stretches, and distorts over the course of a baseball game depending on any number of details.
Using Pitchf/x, we can explore this hidden strike zone, the one that is actually called but not on the books. To do so, I built a model to predict whether each pitch taken in 2014 would be called a ball or a strike. I start with the standard elements of the zone as characterized by the rulebook, namely the horizontal and vertical coordinates of a pitch, and the height of each batter (a proxy for their stance). Then, I layer on additional components, each of which is known or suspected to affect the zone’s boundaries.
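The article doesn't say what kind of model sits underneath. The sketch below only illustrates the layering idea: a fixed set of rulebook-style inputs, with one categorical element appended at a time as indicator columns. The field names (`px`, `pz`, `height_in`, `count`, `catcher`, `umpire`, `stand`, `pitcher`) are hypothetical. These features would then feed some classifier, which is also an assumption. It would need to be flexible enough to represent a bounded zone, since a plain linear model on raw coordinates cannot.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical field names for the rulebook-style inputs: horizontal location,
# vertical location, and batter height (the height stands in for the stance).
BASE_FIELDS = ["px", "pz", "height_in"]

# Hypothetical keys for the categorical layers, in the order this article adds them.
LAYERS = ["count", "catcher", "umpire", "stand", "pitcher"]


def one_hot(value: str, levels: Sequence[str]) -> List[float]:
    """Encode a categorical value as 0/1 indicators over a fixed list of levels."""
    return [1.0 if value == level else 0.0 for level in levels]


def build_features(pitch: Dict[str, Any], n_layers: int,
                   levels: Dict[str, Sequence[str]]) -> List[float]:
    """Base numeric features plus indicator columns for the first n_layers layers."""
    features = [float(pitch[name]) for name in BASE_FIELDS]
    for layer in LAYERS[:n_layers]:
        features.extend(one_hot(str(pitch[layer]), levels[layer]))
    return features


# Usage with made-up values: each extra layer appends its indicator columns.
levels = {
    "count": ["0-0", "3-0", "0-2"],
    "catcher": ["C1", "C2"],
    "umpire": ["U1", "U2"],
    "stand": ["L", "R"],
    "pitcher": ["P1", "P2"],
}
pitch = {"px": -0.4, "pz": 2.6, "height_in": 74, "count": "3-0",
         "catcher": "C2", "umpire": "U1", "stand": "L", "pitcher": "P1"}
base_only = build_features(pitch, 0, levels)    # [-0.4, 2.6, 74.0]
with_count = build_features(pitch, 1, levels)   # appends [0.0, 1.0, 0.0]
full_model = build_features(pitch, len(LAYERS), levels)
```

Keeping the layer order in a single list makes it easy to fit one model per prefix of `LAYERS` and compare the cumulative models step by step.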
For each element we add, we can calculate the accuracy of the cumulative model built up to that point. If all goes well, each additional ingredient will bring us closer to a truthful estimate of whether a pitch will be called a ball or strike. To measure how much the zone changes when we add a new factor, we can also take the average absolute difference in called strike probability between the model with that factor and the model without it. This statistic shows the extent to which each element changes the way balls and strikes are called.
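The two statistics reported after each section can be sketched as follows. One assumption here is not stated in the article: a pitch counts as a predicted strike when its modeled probability is at least 0.5. The function names and sample values are made up for illustration.

```python
from typing import Sequence


def accuracy(probs: Sequence[float], called_strike: Sequence[int],
             threshold: float = 0.5) -> float:
    """Share of taken pitches whose actual call matches the model's prediction.

    A pitch is predicted to be a strike when its probability is >= threshold
    (the 0.5 cutoff is an assumption, not taken from the article).
    """
    hits = sum(int((p >= threshold) == bool(y))
               for p, y in zip(probs, called_strike))
    return hits / len(probs)


def mean_abs_change(probs_before: Sequence[float],
                    probs_after: Sequence[float]) -> float:
    """Average absolute difference in called-strike probability between two models."""
    diffs = [abs(a - b) for a, b in zip(probs_after, probs_before)]
    return sum(diffs) / len(diffs)


# Made-up predictions for four taken pitches, before and after adding a factor.
calls = [1, 1, 0, 0]                  # 1 = called strike, 0 = called ball
before = [0.90, 0.40, 0.10, 0.60]
after = [0.95, 0.55, 0.05, 0.60]
acc_before = accuracy(before, calls)  # 0.5
acc_after = accuracy(after, calls)    # 0.75
change = mean_abs_change(before, after)  # about 0.0625
```

Multiplying either value by 100 gives a percentage in the same form as the figures quoted in this article.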
The impact of the count on the dimensions of the zone doesn’t get nearly as much attention as it deserves. The count modifies called strike probability more than any other element I tested, and yet, it’s not difficult to find a baseball fan unaware of the count’s influence on the zone. That influence is most easily observed at the extremes, for example in comparing a 3-0 count (blue) to a 0-2 count (red):
These are the horizontal and vertical coordinates of pitches with roughly a 70% chance of being called strikes, in each of the two counts. As you can see, the red dots of the pitcher’s count are nearly encircled by blue dots of the same called strike probability from the hitter’s count. The zone therefore shrinks substantially as the count moves from 3-0 to 0-2.
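One way to assemble a comparison like the one described above is to keep only pitches whose modeled strike probability falls in a narrow band around 70%, grouped by count. This is a sketch under assumptions. The ±0.02 tolerance, the field names, and the sample pitches are all hypothetical, and the probabilities are presumed to come from a fitted model.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def iso_probability_points(pitches: Iterable[Mapping[str, Any]],
                           target: float = 0.70,
                           tol: float = 0.02) -> Dict[str, List[Tuple[float, float]]]:
    """Group the (px, pz) locations of pitches whose modeled strike probability
    lies within tol of target, keyed by ball-strike count."""
    groups: Dict[str, List[Tuple[float, float]]] = {}
    for p in pitches:
        if abs(p["prob"] - target) <= tol:
            groups.setdefault(p["count"], []).append((p["px"], p["pz"]))
    return groups


# Made-up pitches; "prob" is a model's called-strike probability.
sample = [
    {"count": "3-0", "px": 0.95, "pz": 2.5, "prob": 0.71},
    {"count": "0-2", "px": 0.80, "pz": 2.4, "prob": 0.69},
    {"count": "0-2", "px": 0.20, "pz": 2.5, "prob": 0.97},  # outside the band
]
points = iso_probability_points(sample)
```

Plotting each count's points in a different color gives the kind of picture described above. If the 0-2 points sit inside the 3-0 points, the zone is smaller in the pitcher's count.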
In sum, the boundary of the zone moves inward by more than 1.5 inches, about half the diameter of a baseball, from the most hitter-friendly to the most pitcher-friendly count. That the zone contracts and expands in this way could be regarded as a feature, not a bug. After all, it offers the disadvantaged batter or pitcher an easier path to battle back into the at-bat, making the outcome a little less certain (and maybe a little more interesting).
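The comparison to a baseball can be checked with quick arithmetic. The rulebook allows a circumference of 9 to 9¼ inches. This sketch takes the midpoint of that range, which is an assumption, and converts it to a diameter.

```python
import math

# Rulebook circumference is 9 to 9 1/4 inches; use the midpoint.
circumference_in = (9.0 + 9.25) / 2
diameter_in = circumference_in / math.pi   # roughly 2.9 inches
zone_shift_in = 1.5                         # boundary shift quoted above

fraction = zone_shift_in / diameter_in
print(f"diameter {diameter_in:.2f} in, shift = {fraction:.0%} of it")
# prints: diameter 2.90 in, shift = 52% of it
```

So a 1.5-inch shift is indeed about half the width of the ball.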
Accuracy of called strike model: 91.80%
Average absolute change in strike probability due to the count: 5.29%
Catcher framing is a well-known phenomenon at this point. Framing is the ability of the catcher to influence the probability of a strike call. This objective is accomplished by the catcher receiving the ball with a minimum of excess movement, allowing the umpire to track the path of the pitch more easily, and hopefully make a more accurate call.
Catcher framing has been vilified in some quarters as a form of cheating or deception, as a means of "stealing strikes" from the batter. I don’t buy into this, and according to the descriptions of great backstops and umpires, that’s neither the objective nor the outcome. Instead, framing is best thought of as enhancing the umpire’s ability to make correct calls by decreasing the interference of the catcher on the most marginal pitches.
To reinforce this notion, look at the calls that are most affected by catcher framing (in purple), as opposed to all called pitches (blue).
This ghostly, purple imprint is the site of a catcher’s greatest impact. Framing isn’t stealing strikes four feet off the plate or six feet high; it’s primarily making very tiny adjustments to the strike probabilities of pitches painting the black. That matters because a catcher receives every taken pitch in every game, so minuscule, seemingly insignificant differences quickly add up.
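The "quickly add up" point is just linearity of expectation: the expected number of extra called strikes is the sum of the per-pitch changes in strike probability. The inputs below are purely hypothetical, chosen to show the scaling rather than taken from the model: a 0.02 shift on 20 borderline taken pitches per game over 120 games caught.

```python
from typing import Iterable


def expected_extra_strikes(prob_shifts: Iterable[float]) -> float:
    """Expected additional called strikes: by linearity of expectation,
    the sum of the per-pitch changes in called-strike probability."""
    return sum(prob_shifts)


# Hypothetical season: 20 borderline pitches per game, 120 games, +0.02 each.
season_shifts = [0.02] * (20 * 120)
season_total = expected_extra_strikes(season_shifts)  # about 48 strikes
```

A change far too small to notice on any single pitch still amounts to dozens of calls over a season.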
Accuracy of called strike model: 92.08%
Average absolute change in strike probability due to the catcher: 2.62%
Unsurprisingly, each umpire has their own small set of biases. Some call a slightly larger strike zone, some smaller; some high, some low. Provided that the umpires are consistent in their deviations from the rulebook, the impact is unlikely to favor one team or the other. There’s also evidence to suggest that players are well aware of the ways umpires vary, and take that into account in deciding when to swing. Still, as a fan, it can be jarring to switch between games with subtly different strike zone boundaries. Perhaps these discrepancies are simply understood and accepted by the players, but to the uninitiated they may appear abrupt and incongruous.
Accuracy of called strike model: 92.26%
Average absolute change in strike probability due to the umpire: 2.2%
The Batter’s Handedness
It’s a fact that left-handed and right-handed batters don’t see the same calls. Lefties have long been forced to defend an area just off the plate that right-handers don’t have to worry about. Colloquially, this extra rectangle of left-handed disadvantage has been termed "the lefty strike". Interestingly, in recent years, much research has shown that the area of the lefty strike has diminished. One hypothesis ties its disappearance to the newfound ability to grade umpires objectively on their calls using Pitchf/x.
Given that I am analyzing data from 2014, I should expect to find a smaller lefty strike. And indeed, factoring in the handedness of the batter affects the strike zone to a small but still significant extent. Although some impacts of Pitchf/x upon the strike zone have been pernicious and distressing, to the gradual diminution of the lefty strike I say good riddance.
Accuracy of called strike model: 92.52%
Average absolute change in strike probability due to the batter’s handedness: 2.35%
The flipside of catcher framing is the potential effect of the person on the mound. Pitchers could conceivably add or subtract strike probability by their reputations (if they are known as whining malcontents, perhaps the umpire fails to give them the benefit of the doubt). Alternatively, they might be able to manufacture strikes by precisely hitting the targets set by their catchers, making the task of framing less challenging.
But in fact, I don’t see much evidence in my model that pitchers influence the probability of a called strike. The model in which I account for the identity of the pitcher is only barely more accurate than the one in which I leave it out. Thus the impact of the pitcher, if it is present, must be relatively small, and perhaps perceptible only in even larger samples. Here, again, we might be observing a positive, recent effect of Pitchf/x: Umpires might be hesitant to rule in favor of a pitcher they like (or against one they hate) for fear of being chastised later for having missed a call.
Accuracy of called strike model: 92.54%
Average absolute change in strike probability due to the pitcher: 2.0%
When all is said and done, having added these five elements, we arrive at a model that can tell whether a pitch will be called a ball or a strike with an accuracy of 92.5%. That’s about as high as most models of the strike zone go: it appears that no combination of factors will perfectly describe the way an umpire calls a pitch. Other factors, ranging from temperature to inning, might make marginal contributions, but I suspect they will not push the accuracy much beyond 93%. Maybe we are limited by the margin of error in the Pitchf/x system, or maybe there’s some residual noise in how umpires call pitches that cannot be explained away no matter what.
It’s worth noting, too, that the basic model using only the coordinates and the batter’s height achieves a predictive accuracy of about 90%. In other words, although most of the above-mentioned factors influence the strike zone, they do so marginally, influencing a handful of pitches per game. They would be almost undetectable if not for the great magnifying power of the technology at our disposal.
Let’s not undersell the job the umpires do. Tracking the three-dimensional motion of a 3-inch sphere hurtling toward you at upwards of 90 mph (while breaking several inches in one direction or another) is, after all, not far off from what the hitter is doing, and hitting has been called the hardest job in sports. If anything, my studies of the strike zone have impressed upon me that the performance of the umpires is, by and large, nearly superhuman. Amid screaming fans and clever athletes doing everything possible to deceive, intimidate, and otherwise coerce them, the men in masks quietly and precisely manage to call balls and strikes with remarkable fidelity. That they are occasionally swayed on one pitch in a hundred by this or that factor does little to diminish their core competency.