What the search for the next “Moneyball” really looks like

Outside of major religious texts, has any book been interpreted in more ways (and in more ways that serve the speaker’s interests) than Moneyball? The now-iconic book about the 2002(!!!) Oakland A’s and their still-incumbent General Manager Billy Beane has become both a movie and a slang term for using advanced math to evaluate… anything really, whether or not baseball, or athletics in general, is even involved.

It’s not often that baseball books transcend the sport itself. While classic baseball books like Summer of ’49 or Ball Four are popular and worth the read well after their publication, they appeal mostly to people who are already baseball fans. Moneyball was a couple of orders of magnitude larger. Moneyball has become a cultural touchstone, both within the game of baseball and outside of it. There are few books in the American canon that such a large number of people have an opinion on, regardless of whether they’ve actually read them. Moneyball has become one of those books.

In some sense, the transcendence of Moneyball rests in its origins. There’s a case to be made that Moneyball, penned by veteran Wall Street writer Michael Lewis, was never intended as a baseball book, but as a business book that just happened to be looking at a baseball team as its case study. Business thinkers could read it at the beach. Its message that even a company with a structural competitive disadvantage could prosper by thinking creatively is something of a genre staple. It’s hardly novel advice, nor is it very helpful to someone who is on some other structurally disadvantaged team or startup or company that isn’t the Oakland A’s. “Be smarter and come up with better ideas” is good advice, but how does one actually… y’know… do that?

The beauty of Moneyball was that there was sort of an answer to that question in the book. Teams (or companies) could harness the then-nascent power of what has become known as “big data” to figure things out. They could take a cold, rational approach to evaluating their situation and let the numbers guide their actions. Part of the problem with conducting systematic research over the years is that collecting systematic data was often hard and/or cost-prohibitive to do. Baseball was no exception. There were encyclopedias of “classic” player statistics and a few on-line portals where one could go to find information, but at the turn of the century, it was a huge undertaking to even run the most rudimentary study, the kind that a 17-year-old kid with a $500 laptop and some programming skills could now crank through in a few hours.

Teams slowly but surely jumped on the bandwagon. There’s a bit of a misperception about when teams actually began using data-driven decision-making. The A’s certainly did not start using numbers only in 2001, and before then there were other teams that had official and unofficial “stat people,” some going back a couple of decades. But still, in 2001, the question was whether a team had a statistical wizard on staff. Now, the question is “What role does that wizard play?” But if the statistically enlightened have been in front offices for a number of years, why did it take until the A’s before someone leveraged all that work into actual baseball success? Or at least until someone wrote a book about leveraging all that work into actual baseball success?

And herein lies the dirty little secret of big data. Just assembling a big data set does not give you much in the way of answers. The point of Moneyball was not that the A’s had a big database. The point was that the A’s asked different questions of their database than anyone else was asking. As my father is fond of saying, the problem isn’t in the engine. It’s the nut behind the wheel. The much-discussed use of on-base percentage (OBP) as an evaluation tool for players, a doctrine that is now passé among most front-office types, came from an insight into the nature of the most commonly used stat to assess players at the time, batting average. On-base percentage was well-known in 2002. It was just that nobody paid it much attention.

Consider for a moment the line “Smith was 1 for 3 with a walk.” We recognize that when Smith’s batting average is computed, today’s contribution will be one hit, three at bats, and that walk never happened. It’s an appendix to the game which for some reason is being surgically removed from the record. The reason is that a long time ago, the keepers of such stats decided that a batter deserved no moral credit for a walk. The pitcher was the one who missed with the four balls. Therefore, the batter should get no credit for being passive. That bit of moralizing gave us batting average as we knew it, and because very few people had the capacity to formally challenge that thinking, it stuck. To this day, television broadcasts list a player’s batting average, rather than his on-base percentage, when he comes to the plate, because… that’s what they’ve always listed.

But let’s go back to that phantom walk. Stepping aside from whether Smith deserves any credit for that walk in any moral sense (he did have the good judgment to not swing at those pitches), we know that there are players who are skilled in getting on base by the walk, and if nothing else, a walk is not an out. And given the underlying question that batting average seems to be asking (How often does Smith make an out?), OBP makes more sense as an answer. It’s not that OBP is a perfect stat (it counts a single the same as a home run), nor should everyone strive to walk all the time. The victory that the A’s scored in Moneyball was asking the question “Why are we pretending that these walks don’t count for anything?” The only thing that their database did was to allow them to test their theory against the available data.
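To make Smith’s line concrete, here is a minimal sketch in Python of how the two stats treat the same day at the plate. The article does not spell out either formula, so this assumes the standard definitions: batting average is hits divided by at bats, and on-base percentage is (hits + walks + hit by pitch) divided by (at bats + walks + hit by pitch + sacrifice flies). The `BattingLine` class and the function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """One batter's counting stats (hypothetical container for this example)."""
    at_bats: int
    hits: int
    walks: int = 0
    hit_by_pitch: int = 0
    sacrifice_flies: int = 0


def batting_average(line: BattingLine) -> float:
    """Hits per at bat. Walks appear nowhere in this calculation."""
    if line.at_bats == 0:
        return 0.0
    return line.hits / line.at_bats


def on_base_percentage(line: BattingLine) -> float:
    """Times on base per plate appearance (assumed standard OBP formula)."""
    reached = line.hits + line.walks + line.hit_by_pitch
    chances = (line.at_bats + line.walks
               + line.hit_by_pitch + line.sacrifice_flies)
    if chances == 0:
        return 0.0
    return reached / chances


# "Smith was 1 for 3 with a walk."
smith = BattingLine(at_bats=3, hits=1, walks=1)

print(f"AVG: {batting_average(smith):.3f}")     # AVG: 0.333
print(f"OBP: {on_base_percentage(smith):.3f}")  # OBP: 0.500
```

Batting average sees three trips to the plate and one success. On-base percentage sees four trips and two times that Smith did not make an out. Same game, same player, different question.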

It’s the same basic story for two of the biggest statistically fueled trends in baseball now: the emphasis on catcher “framing” (the ability of some catchers to “frame” certain borderline pitches so that the umpire will call a strike, rather than a ball) and everyone’s favorite bogeyman, the infield shift. The infield shift, like on-base percentage, wasn’t really a new thing. Teams have had spray charts on opposing hitters for years and fielders would often shift a step or two in either direction, depending on what the scouting report told them. But for too long, people were stuck on the idea that there needed to be two infielders to the left of second and two to the right. Once teams broke out of that limitation, the data simply gave them the confidence that for some hitters, overshifting is a good idea in the long run and that they would convert more balls into outs, which is the whole point of the exercise. With catcher framing, it’s long been known that there are edges to the strike zone and that catchers who are “quieter” in their mechanics seem to get more calls. It makes sense. The umpire is only human, and if the catcher’s glove is moving, it will influence where he thinks the ball crossed the plate. The data only provided teams with the ability to show that this is true and figure out who was good at it. It turns out that the ability to frame pitches is one of the most powerful forces in the game. A player like Yadier Molina, already well-respected as a catcher (and dearly missed right now by Cardinals fans), is actually worth far more than we ever might have imagined just because of his ability to frame pitches. We didn’t use to know that because the data wasn’t available, but we already had an inkling that something might be going on.

That’s what the search for the “next Moneyball” really looks like. Yes, it involves men and women hunched over keyboards looking at spreadsheets. But what I think people misunderstand is that the math isn’t the point. Despite what the Brad Pitt movie version of Moneyball might have said, there is no big divide between the “real” baseball people (the “scouts”) and the statistical analysts. A database is just a tool to explore an idea, so while I do spend a good amount of time running numbers, I’m powerless unless I have a good idea backing me up. The more time I spend talking to people who live and breathe the game of baseball, the more fertile the soil is for ideas to grow (or alternately, the more times someone says to me, “You know what I think…” and I respond with “That’s kinda interesting…”). It’s not that all of those ideas will be true; it’s that one of them might be, and it might be a game changer. Or just a tiny little edge. Every little bit helps. Right now (well, if you’re reading this at 4:00 am, maybe not right now), teams are all searching for that magic idea that gives them a leg up. Whether that comes from an insight from a scout watching a game or a coach noticing something about a player, if it’s a good idea, it’s welcome. Statistical analysis is just another frontier in that search.

Often, I’ll hear people put down the idea of statistical analysis in baseball. Often they have reasonable points, as I’ll also hear analysts oversell what they can do. Numbers can’t fully measure everything about a player or the game (nor should any responsible analyst make such a claim), but they certainly can uncover useful information and some of that useful information is on display at your local ballpark. Even if numbers can’t measure things precisely, they can at least give us some of the outline of the answer. But ultimately, using statistical analysis in the game of baseball is just an extension of the desire to look for another new idea or a competitive edge. It’s people sitting around trying to crack the code. We just do it using a spreadsheet.