Evidence that 'clutch' actually might exist

Updated Mar. 4, 2020 9:33 p.m. ET

I know, I'm not supposed to, but I believe in clutch hitting.

By clutch hitting, I mean that certain players have some sort of ability to perform better in higher-leverage situations. Leverage, for the uninitiated, is a concept formalized by sabermetrician Tom Tango. We know that some situations in a game are more important than others. When it's 15-1, no one cares what happens in a plate appearance. When it's the bottom of the ninth with runners on second and third, two outs, and the home team is down by one, pretty much the entire game rides on this at-bat. Leverage index is a mathematical model of how much more important that late-game situation is.

Leverage is based on the idea of win probability. We can look at each game situation (let's say, bottom of the third, one out, runner on first, and the home team down by two) and figure out over some past time frame how often the home and visiting team won. More to the point, we can figure out how much that win probability can change based on whatever is about to happen next. In the 15-1 situation, whatever the batter does is going to move the needle very little. In the bottom of the ninth, the win probability could go from roughly 50-50 to 100-0 in a hurry. When a batter does something positive that increases his team's chances of winning, we give him credit for adding win probability (even if giving him all the credit is silly). In a high-leverage situation, a batter can accumulate a lot of win probability in a single at-bat.
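To make the idea concrete, here is a rough sketch of how a leverage-style number could be built from win probabilities: take the average size of the win-probability swing a situation can produce and divide it by the swing of a typical situation. Every number below (the outcome probabilities, the resulting win probabilities, and `AVERAGE_SWING`) is hypothetical and chosen only for illustration; this is a simplification of the concept, not Tango's actual tables.

```python
def expected_swing(current_wp, outcomes):
    """Average absolute change in win probability across possible outcomes.

    outcomes: list of (probability_of_outcome, win_probability_afterward).
    """
    total = sum(prob for prob, _ in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(prob * abs(wp_after - current_wp) for prob, wp_after in outcomes)


def leverage_index(current_wp, outcomes, average_swing):
    """Swing in this situation relative to the swing in a typical one."""
    return expected_swing(current_wp, outcomes) / average_swing


# Hypothetical swing in a typical plate appearance (an assumption, not real data).
AVERAGE_SWING = 0.035

# A blowout: whatever happens, the win probability barely moves.
blowout = [(0.5, 0.992), (0.5, 0.988)]
# A game-deciding at-bat: a hit wins it, an out loses it.
ninth_inning = [(0.30, 1.0), (0.70, 0.0)]

blowout_li = leverage_index(0.990, blowout, AVERAGE_SWING)
ninth_li = leverage_index(0.30, ninth_inning, AVERAGE_SWING)
print(f"blowout LI: {blowout_li:.2f}, ninth-inning LI: {ninth_li:.2f}")
```

By construction, a situation with a leverage index of 1 is exactly as swingy as the average one.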

The standard test for whether there is such a thing as clutch hitting has been to look at the win probability that a player records over the course of a season and compare it to what his win probability would have been in situations where the leverage index was 1. (This is the basis of how our friends at FanGraphs calculate clutch.) From season to season, players show very little correlation on this measure of clutch. In general, the interpretation has been "clutch doesn't exist" rather than "we had a poor measure of clutch to begin with." Indeed, I have found that this measure of clutch eventually does become reliable. It just takes a while. Maybe there is signal in all that noise; maybe we need a better antenna.
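The comparison described above can be sketched in a few lines. This follows the verbal description only (actual win probability added versus the same plays rescaled to a leverage index of 1) and should not be read as any site's exact published formula; the `plays` data is made up.

```python
def clutch_gap(plays):
    """Actual win probability added minus its leverage-neutral counterpart.

    plays: list of (win_probability_added, leverage_index) pairs for one
    player-season. Dividing each play's contribution by its leverage index
    approximates what the play would have been worth at a leverage of 1.
    """
    if any(li <= 0 for _, li in plays):
        raise ValueError("leverage index must be positive")
    actual = sum(wpa for wpa, _ in plays)
    neutral = sum(wpa / li for wpa, li in plays)
    return actual - neutral


# Hypothetical season: one big hit in a tight spot, one failure in a quiet spot.
plays = [(0.10, 2.0), (-0.02, 0.5)]
print(f"clutch gap: {clutch_gap(plays):+.3f}")
```

A positive gap means the player's good moments were concentrated in more important situations than his bad ones.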


Warning! Gory Mathematical Details Ahead!

In the 2014 Baseball Prospectus Annual, I introduced the idea of looking a little more closely at individual players to see how they reacted to pressure situations. I examined how, for each player, the leverage of a situation affected his tendencies to swing at the first pitch. There's a separate regression equation for Daniel Murphy, David Murphy and Donnie Murphy. Since every plate appearance has a first pitch and the count is always 0-0 when it happens, I'm able to hold a few things constant. But my program runs a logistic regression looking only at Daniel's at-bats and what he did in them, creates an equation describing his behavior, and then does it again for David, and again for Donnie.
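A minimal sketch of the one-regression-per-player idea, assuming leverage index is the only predictor and the outcome is simply whether the batter swung at the first pitch (1) or not (0). The fitting routine is a plain Newton-Raphson logistic regression written with the standard library; the player names and data are invented, and the original analysis may have used different software or additional terms.

```python
import math
from collections import defaultdict


def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:     # no variation in x; cannot estimate a slope
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def fit_by_player(first_pitches):
    """first_pitches: iterable of (player, leverage_index, swung) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for player, leverage, swung in first_pitches:
        grouped[player][0].append(leverage)
        grouped[player][1].append(swung)
    return {player: fit_logistic(xs, ys) for player, (xs, ys) in grouped.items()}


# Invented data: Hitter A swings more as leverage rises, Hitter B does not.
sample = (
    [("Hitter A", 1.0, 1)] * 3 + [("Hitter A", 1.0, 0)] * 7
    + [("Hitter A", 2.0, 1)] * 5 + [("Hitter A", 2.0, 0)] * 5
    + [("Hitter B", 1.0, 1)] * 4 + [("Hitter B", 1.0, 0)] * 6
    + [("Hitter B", 2.0, 1)] * 4 + [("Hitter B", 2.0, 0)] * 6
)
models = fit_by_player(sample)
for player, (intercept, slope) in models.items():
    print(f"{player}: intercept={intercept:.3f}, leverage slope={slope:.3f}")
```

Each player ends up with his own intercept and leverage slope, which is what allows the comparison between hitters.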

I then took each equation and calculated the chances that each player would swing at a first pitch when the leverage index was 1 (average) and 2 (a situation twice as important as the average situation). Then, I subtracted the two and got a rough indicator of how high leverage began to affect a player (at least on this one behavior). I used a minimum of 250 plate appearances in a season and looked at players from 2009 to 2013. In the past, I'd found that clutch, as described above, had a year-to-year correlation of .074. (I used a method known as auto-regressive intra-class correlation.) For this group, across the five years, the ICC was .30. That's not huge, but we call home runs a true outcome for pitchers with year-to-year correlations in the same neighborhood. I termed this difference between predicted first-pitch swing rates "swing difference." Some players swing a lot more when the leverage goes up. Some barely notice. A few start to freeze.
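Given one player's fitted equation, the subtraction described above is short. The sketch below assumes the equation has just an intercept and a leverage slope; the coefficient values are hypothetical and picked so the arithmetic is easy to follow.

```python
import math


def swing_probability(intercept, slope, leverage):
    """Predicted chance of a first-pitch swing at a given leverage index."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * leverage)))


def swing_difference(intercept, slope, low=1.0, high=2.0):
    """Predicted swing rate at leverage `high` minus the rate at leverage `low`."""
    return (swing_probability(intercept, slope, high)
            - swing_probability(intercept, slope, low))


# Hypothetical coefficients: a 30% swing rate at leverage 1, 50% at leverage 2.
slope = -math.log(0.3 / 0.7)
intercept = -2.0 * slope
print(f"swing difference: {swing_difference(intercept, slope):.2f}")
```

A positive value marks a hitter who gets more aggressive on the first pitch as the stakes rise; a negative value marks one who starts to freeze.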

Next, I wanted to see if swing difference predicted changes in outcomes. For the years 2009 to 2013, I used the log-odds ratio method (which I have used multiple times before) to create a predicted percentage that each plate appearance would end in a strikeout based on the batter’s and pitcher’s usual rates in that area. I did the same for walks and singles and home runs and the rest of it. Next, I looked at all plate appearances in which a batter with 250 plate appearances in that season faced a pitcher with 250 batters faced in that season. I created a binary logit regression in which I had my predicted percentage of a strikeout (for the initiated, expressed in a log of the odds ratio), and then entered in the leverage index for each plate appearance, the swing difference stat for the batter and the multiplicative interaction of swing difference and leverage.
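For readers who have not seen the odds ratio method, here is a sketch of one common formulation: combine the batter's and pitcher's rates on the log-odds scale and subtract the league rate so it is not counted twice. The paragraph above does not spell out the exact formula, so the use of a `league_rate` baseline and the sample rates are assumptions.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_rate(batter_rate, pitcher_rate, league_rate):
    """Expected rate of an event for one batter-pitcher matchup."""
    return inv_logit(logit(batter_rate) + logit(pitcher_rate) - logit(league_rate))


# Hypothetical strikeout rates: both players strike out (or strike batters out)
# more than the league average, so the matchup rate lands above either one.
matchup = expected_rate(batter_rate=0.25, pitcher_rate=0.25, league_rate=0.20)
print(f"expected strikeout rate: {matchup:.3f}")
```

The log-odds of this expected rate is the kind of value that would be entered as a predictor in the regression.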

This type of analysis, called a moderator analysis, is well-suited to answering the "clutch question." If certain players have some sort of clutch factor (and here, we're using swing difference as a rough measure of clutch) then as leverage increases, we would expect to see those who are higher on this clutch factor to show greater increases (or sharper decreases). That's what the interaction term between swing difference and leverage does. If it's significant, it means that as leverage goes up (or down), the effect it has will depend, at least in part, on that clutch factor.
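The structure of such a model can be sketched as a linear predictor on the log-odds scale with a multiplicative interaction term. The class, its field names, and every coefficient value below are hypothetical; they are not the fitted values from this analysis and only show how the interaction makes the leverage effect depend on swing difference.

```python
import math
from dataclasses import dataclass


@dataclass
class ModeratorModel:
    intercept: float
    b_expected: float      # weight on the matchup's expected log-odds
    b_leverage: float      # main effect of leverage index
    b_swing_diff: float    # main effect of swing difference
    b_interaction: float   # leverage x swing difference

    def log_odds(self, expected_logit, leverage, swing_diff):
        return (self.intercept
                + self.b_expected * expected_logit
                + self.b_leverage * leverage
                + self.b_swing_diff * swing_diff
                + self.b_interaction * leverage * swing_diff)

    def probability(self, expected_logit, leverage, swing_diff):
        z = self.log_odds(expected_logit, leverage, swing_diff)
        return 1.0 / (1.0 + math.exp(-z))

    def leverage_effect(self, swing_diff):
        """Change in log-odds per one-unit rise in leverage for this batter."""
        return self.b_leverage + self.b_interaction * swing_diff


# Hypothetical coefficients, for illustration only.
model = ModeratorModel(intercept=0.0, b_expected=1.0, b_leverage=-0.05,
                       b_swing_diff=0.4, b_interaction=-0.4)
for sd in (0.0, 0.10):
    print(f"swing difference {sd:.2f}: leverage effect {model.leverage_effect(sd):+.3f}")
```

If `b_interaction` were zero, every batter would respond to leverage in the same way; a nonzero value is what lets the response differ by swing difference.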

What I found is that hitters who show more of an effect on swing difference (leverage makes them swing at the first pitch more) were less likely than expected to walk and less likely to strike out as leverage went up. Instead, they showed higher rates of both extra-base hits and outs in play. To show some sense of how much of an effect this could have, here are the numbers for strikeout rate.

Let's say that our pitcher-batter matchup stats alone would suggest that the chances of a strikeout are 20 percent. Now, let's take a look at what would happen in situations with leverage values of 1 and 2, and compare a batter who has a swing difference of .10 (he swings at first pitches 10 percentage points more often in higher-leverage situations than he does in medium-leverage situations) and a batter who has a swing difference of 0 (he swings equally in both situations). The values are the likelihood of a strikeout happening.

Swing Difference

               High Swing Difference   No Swing Difference
Leverage = 1   19.3%                   19.3%
Leverage = 2   17.7%                   18.3%

In an average-leverage situation, the two hitters are about the same (they differ at the fourth decimal place), but once the leverage is turned up a bit, they get different results. Not by a lot, but it's there. You get the same basic effect sizes for the other outcomes.
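To put the size of that gap in the same units the regression works in, the table values can be converted to log-odds. The sketch below uses only the four rounded percentages from the table above, so the results are approximate.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


# Strikeout probabilities copied from the table above (rounded as printed).
table = {
    1: {"high": 0.193, "none": 0.193},
    2: {"high": 0.177, "none": 0.183},
}


def gap(leverage):
    """Return the (percentage-point gap, log-odds gap) between the two batters."""
    row = table[leverage]
    point_gap = row["high"] - row["none"]
    log_odds_gap = logit(row["high"]) - logit(row["none"])
    return point_gap, log_odds_gap


for leverage in (1, 2):
    points, log_odds = gap(leverage)
    print(f"leverage {leverage}: {points * 100:+.1f} points, {log_odds:+.3f} log-odds")
```

The gap is zero at a leverage of 1 and opens up only at a leverage of 2, which is the signature of an interaction rather than a flat difference between the two hitters.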

Before we go further, the careful observer will note that there's a certain tautology that goes along with these analyses. I think it doubles as both a feature and a bug. A batter who is more likely to swing at the first pitch in high-leverage situations is probably just more likely to swing in high-leverage situations. It's no wonder he sees a drop in his expected walk rate (and in some sense his expected strikeout rate). And if we're saying that his swing rate changes because of leverage (or at least in accordance with leverage), then it's not surprising that the effect appears. We'll talk about this more in a bit.

Clutch. Heart. Grit. Myocardial Infarction.

Let's clear a few things up. Clutch is not a result of having superior moral character, notwithstanding the plot of every sports movie. It is also not a guarantee that a hitter will always come through. My contention is a much more reserved one. Clutch is likely some combination of the ability to deal with pressure and some particular change in approach, whether conscious or unconscious, that results in slight variations from what we might otherwise expect. For some, that change makes a hitter better; for others, it makes him worse.

These analyses may not completely prove that clutch ability exists, but they do lay what I hope is a foundation for how we might continue the search. "Clutch" is a way of saying that the situation matters because players are human. What we have here is an indicator that has reasonable (if not great) consistency across years, and it explains differences between players in how leverage affects them. More searching might find something with more consistency. Even then, year-to-year consistency is not the only way to establish that a measure is reflective of a player's true talent level. Using a more tracking-based approach might help. Players can and do change, even within a season. There's no reason clutch needs to be an enduring trait, rather than a state we can detect with some reliability. The rest is simply showing that the factor, whatever it is, can explain some of the differences between players' performances in different leverage situations.

As to these specific analyses, it might very well be that what's driving things is that some players are looking at the sorts of relievers they face in high-leverage situations and saying, "Well, he usually comes right at me, so no point in messing around. I might as well swing when I see something interesting." It might not be a mystical force at work, but a very reasonable reaction to the circumstances. In that case, clutch isn't even something psychological, but a mental skill. Still, there could be problems with multicollinearity. What this might be showing is that some players swing more in high-leverage situations, and so we would expect them to take fewer walks, somewhat by definition. Then again, even knowing that information could have strategic value. Maybe when we have other data sets to work with, we might be able to look at measures of how leverage affects a player that aren't based on game results.

The other piece of this, and it's one that I tried to drive home in the piece in the Annual that started everything, is that knowing that a player swings more (or less) often in high-leverage situations might be good within the context of one skill set and bad within another. These analyses fall into the large-N trap that assumes that more swinging is better (or seems to be) for everyone. But if nothing else, I'd present these analyses as a way of re-opening what had been assumed to be a closed debate. Clutch hitting might just exist.