PECOTA says Boston will win the AL East. Marcel says Albert Pujols will hit 34 homers. Bill James says Cole Hamels will bounce back. ZiPS says Stephen Strasburg will notch a 4.18 ERA. CHONE says Joe Mauer will bat .332 ...
Huh? What are these things, and why are they speaking so authoritatively on the upcoming MLB season?
They are projection systems (well, Bill James is actually a very smart human being, but he has his own projection system), and they're using their various inputs and methodologies to predict what will happen in 2010.
Do they yield perfect forecasts? Of course not. Are they almost always more accurate than what you or I could come up with? Yep. After all, there's far more to it than saying "A-Rod should be good for 30 spanks and 100 RBI this season."
These systems use things like platoon splits, ballpark data, groundball percentages, line-drive rates, strikeouts, unintentional walks, aging patterns, league environment, pitch location and pitch speed data, and so on and so on. It's a science, if occasionally an inexact one.
So whether you're a roto player or just a fan who's fondly dreaming of Opening Day, projection systems are worth your while. (And the best part is that almost all of them are available for free.) What follows, then, is a brief introduction to some of the notable systems that are out there.
Dan Szymborski (ZiPS stands for "sZymborski Projection System") of the wonderfully addictive Baseball Think Factory concocted this particular system. Here's how Szymborski described ZiPS in a recent interview with the Web site Mets360.com: "ZiPS uses the recent past for a player and tries to find similar players at roughly the same age." ZiPS also forecasts defensive ratings. For instance, Omar Vizquel, who'll turn 43 in late April, still rates as "excellent" at shortstop. ZiPS projections for the 2010 season can be found, free of charge,
Sample prediction: According to ZiPS, Prince Fielder, who drove in a career-high 141 runs last year, will drive in 142 this season.
PECOTA (Player Empirical Comparison and Optimization Test Algorithm) is Baseball Prospectus' proprietary engine (subscription required), and the man behind the curtain is Nate Silver of Fivethirtyeight.com, the Web site that analyzes polling and political data. Like most other systems, PECOTA forecasts production and playing time by using a player's performance history and by examining players deemed to be similar. PECOTA also uses what are called "phenotypic attributes," a category that includes things like a player's height and weight. It then rolls all these data and the resulting player forecasts up into predictions at the team level.
Sample prediction: PECOTA says the playoff teams in 2010 will be the
Rangers and Yankees. Why just seven teams? PECOTA foresees a three-way tie for first in the AL Central ...
Tom Tango's Marcel system is as simple as these things get. It's so named (after the monkey on "Friends") because it requires nothing more than a monkey's grasp of the system (not quite true, but the point is made). Tango describes this system thusly: "It uses three years of MLB data, with the most recent data weighted heavier. It regresses towards the mean. And it has an age factor." You can download the Marcel spreadsheet for 2010
Sample prediction: Marcel tabs David Wright for 19 home runs in 2010.
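Tango's three ingredients (weighted recent seasons, regression toward the mean, and an age factor) can be sketched in a few lines of Python. This is only a rough illustration, not Tango's actual spreadsheet: the 5/4/3 season weights, the 1,200 plate appearances of league-average performance used for regression, the peak age of 29, and the per-year age step are all assumptions chosen for the example, and the names `Season` and `project_hr_rate` are made up here.

```python
from dataclasses import dataclass


@dataclass
class Season:
    pa: int  # plate appearances
    hr: int  # home runs


def project_hr_rate(seasons, league_hr_rate, age,
                    weights=(5, 4, 3),    # assumed weights, most recent season first
                    regression_pa=1200,   # assumed amount of league-average "padding"
                    peak_age=29,          # assumed peak age
                    age_step=0.003):      # assumed change per year away from the peak
    """Project home runs per plate appearance from up to three seasons.

    `seasons` must be ordered most recent first.
    """
    # 1. Weight the recent past, with the latest season counting the most.
    weighted_hr = sum(w * s.hr for w, s in zip(weights, seasons))
    weighted_pa = sum(w * s.pa for w, s in zip(weights, seasons))

    # 2. Regress toward the mean by blending in league-average plate appearances.
    rate = (weighted_hr + regression_pa * league_hr_rate) / (weighted_pa + regression_pa)

    # 3. Apply a simple age factor: a bump below the peak age, a discount above it.
    age_factor = 1.0 + (peak_age - age) * age_step
    return rate * age_factor


# A hypothetical hitter (made-up numbers): 30 homers in 600 PA three years running.
history = [Season(pa=600, hr=30), Season(pa=600, hr=30), Season(pa=600, hr=30)]
rate = project_hr_rate(history, league_hr_rate=0.03, age=29)
projected_hr = rate * 600  # expected homers over another 600 PA
```

The regression step explains why a system like this rarely projects anyone to repeat a career year: the less playing time a hitter has logged, the harder his numbers get pulled toward the league average.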
Bill James, the master of the statistical movement, is still cranking out projections. Actually, he's doing hitter projections (based on past performance, age, home park, and expected playing time), and Baseball Info Solutions (BIS) is doing pitcher projections on James' behalf. To arrive at their numbers, BIS uses eight years of data (when possible, of course) with a particular focus on the last three. They also project innings based on how the pitcher was being deployed late in the most recent season. You can find the Bill James projections at Fangraphs.com. Just type in a player's name and click on the "Projections" link near the top of the page.
Sample prediction: According to James (BIS), Tim Lincecum will strike out 261 batters for the second year in a row.
To churn out his CHONE projections, Sean Smith (an Angels fan who named his system after former Halo Chone Figgins) uses four years of weighted numbers, runs a regression, adjusts for age, and then makes some common-sense tweaks to the playing time forecasts. You can access his numbers here, or, once again, on the Fangraphs player pages.
Sample prediction: According to CHONE, the Rockies and Dodgers will tie for the NL West crown.
Ah, the wisdom of crowds ... Fangraphs.com busts out the democracy by letting you have your say. If you're a registered user, then you can fill out a projections ballot. All the fan projections are then aggregated
Ryan Howard will lead the majors with 46 home runs, sayeth the Fans.
These systems generally agree on most players. That's why eyeballing what each system has to say will give you a good grasp of how players should fare. Occasionally, though, the systems diverge. For instance, Bill James (BIS) says Adam Wainwright will have a 3.64 ERA this season, while the Fans think his ERA will be a much stronger 3.18.
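If you're lining up several systems before a draft, a quick way to spot these disagreements is to measure the gap between the highest and lowest projection for each player. Here's a minimal Python sketch using the two Wainwright ERA projections quoted above. The function names and the 0.30-run threshold for calling something a real disagreement are assumptions picked for illustration, not anything the systems themselves define.

```python
def projection_spread(projections):
    """Return the gap between the highest and lowest projected value."""
    values = list(projections.values())
    return max(values) - min(values)


def diverges(projections, threshold):
    """True when the systems disagree by more than `threshold`."""
    return projection_spread(projections) > threshold


# ERA projections for Adam Wainwright, as quoted in this article.
wainwright_era = {"Bill James (BIS)": 3.64, "Fans": 3.18}

spread = round(projection_spread(wainwright_era), 2)  # 0.46 runs of ERA
flagged = diverges(wainwright_era, threshold=0.30)    # threshold is an assumed cutoff
```

A player who gets flagged this way is worth a closer look, since a big spread usually means the systems are weighing his recent seasons or his expected playing time differently.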
So poke around, dominate your draft, or otherwise just whet your appetite for warmer weather and the crack of bats.