Looking at baseball projection systems

PECOTA says Boston will win the AL East. Marcel says Albert Pujols
will hit 34 homers. Bill James says Cole Hamels will bounce back.
ZiPS says Stephen Strasburg will notch a 4.18 ERA. CHONE says Joe
Mauer will bat .332 …

Huh? What are these things, and why are they speaking so
authoritatively on the upcoming MLB season?

They are projection systems (well, Bill James is actually a
very smart human being, but he has his own projection system), and
they’re using their various inputs and methodologies to predict
what will happen in 2010.

Do they yield perfect forecasts? Of course not. Are they
almost always more accurate than what you or I could come up with?
Yep. After all, there’s far more to it than saying “A-Rod should be
good for 30 spanks and 100 RBI this season.”

These systems use things like platoon splits, ballpark data,
groundball percentages, line-drive rates, strikeouts, unintentional
walks, aging patterns, league environment, pitch location and pitch
speed data, and so on and so on. It’s a science, if occasionally an
inexact one.

So whether you’re a roto player or just a fan who’s fondly
dreaming of Opening Day, projection systems are worth your while.
(And the best part is that almost all of them are available for
free.) What follows, then, is a brief introduction to some of the
notable systems that are out there.


ZiPS

Dan Szymborski (ZiPS stands for “sZymborski Projection
System”) of the wonderfully addictive Baseball Think Factory
concocted this particular system. Here’s how Szymborski
described ZiPS in a recent interview with the Web site Mets360.com:
“ZiPS uses the recent past for a player and tries to find similar
players at roughly the same age.” ZiPS also forecasts defensive
ratings. For instance, Omar Vizquel, who’ll turn 43 in late April,
still rates as “excellent” at shortstop. ZiPS projections for the
2010 season are available free of charge.

Sample prediction: According to ZiPS, Prince Fielder, who
drove in a career-high 141 runs last year, will drive in 142 this
season.


PECOTA

This is Baseball Prospectus’
proprietary engine (subscription required), and
the man behind the curtain is Nate Silver of Fivethirtyeight.com,
the Web site that analyzes polling and political data. PECOTA
(Player Empirical Comparison and Optimization Test Algorithm), like
most other systems, forecasts production and playing time by using
a player’s performance history and by examining players deemed to
be similar. Also, PECOTA uses what’s called “phenotypic
attributes,” a category that includes things like a player’s height
and weight. PECOTA takes all these data and the resulting player
forecasts and also makes predictions on the team level.

Sample prediction: PECOTA says the playoff teams in 2010 will
be the Phillies, Cardinals, Rockies, Braves, Red Sox, Rangers and
Yankees. Why just seven teams? PECOTA foresees a three-way tie for
first in the AL Central …
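
To make the "similar players" idea concrete, here is a rough sketch of a comparables-based forecast. It is not PECOTA's actual algorithm, which is proprietary. The player pool, the attribute names, the scaling values, and the simple nearest-neighbor averaging are all hypothetical choices made for illustration.

```python
import math

def most_similar(target, pool, scales, k=3):
    """Return the k players in `pool` that are closest to `target`.

    Distance is Euclidean over the keys of `scales`; each difference is
    divided by its scale so that no single attribute dominates.
    """
    def distance(a, b):
        return math.sqrt(sum(((a[key] - b[key]) / scale) ** 2
                             for key, scale in scales.items()))
    return sorted(pool, key=lambda player: distance(target, player))[:k]

def comp_based_forecast(target, pool, scales, k=3):
    """Average what the k most similar players did the following season."""
    comps = most_similar(target, pool, scales, k)
    return sum(player["next_hr_rate"] for player in comps) / len(comps)

# Hypothetical data: none of these numbers describe real players.
target = {"age": 27, "height_in": 74, "weight_lb": 210, "hr_rate": 0.045}
pool = [
    {"name": "Comp A", "age": 27, "height_in": 74, "weight_lb": 212,
     "hr_rate": 0.044, "next_hr_rate": 0.046},
    {"name": "Comp B", "age": 28, "height_in": 73, "weight_lb": 205,
     "hr_rate": 0.047, "next_hr_rate": 0.044},
    {"name": "Comp C", "age": 35, "height_in": 69, "weight_lb": 175,
     "hr_rate": 0.015, "next_hr_rate": 0.012},
    {"name": "Comp D", "age": 26, "height_in": 75, "weight_lb": 215,
     "hr_rate": 0.041, "next_hr_rate": 0.042},
]
# Assumed scales: roughly how big a "meaningful" difference is per attribute.
scales = {"age": 2.0, "height_in": 2.0, "weight_lb": 15.0, "hr_rate": 0.01}

forecast = comp_based_forecast(target, pool, scales, k=3)
print(f"Forecast home runs per plate appearance: {forecast:.3f}")
```

The older, smaller, lower-power Comp C sits far from the target on every attribute, so it drops out of the top three and has no say in the forecast.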


Marcel

Tom Tango’s Marcel system is as simple as these things get.
It’s so named (after the monkey on “Friends”) because it requires
nothing more than a monkey’s grasp of the system (not quite true,
but the point is made). Tango describes this system thusly: “It
uses three years of MLB data, with the most recent data weighted
heavier. It regresses towards the mean. And it has an age factor.”
The Marcel spreadsheet for 2010 is available for download.

Sample prediction: Marcel tabs David Wright for 19 home runs
in 2010.
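
Tango's three-step description (weight recent seasons, regress toward the mean, apply an age factor) can be sketched in a few lines. This is a minimal illustration rather than Tango's actual spreadsheet. The 5/4/3 weights, the 1,200 plate appearances of league-average ballast, the age curve, and the sample hitter are all assumptions picked for the example.

```python
def marcel_style_rate(seasons, league_rate, age,
                      weights=(5, 4, 3), ballast_pa=1200, peak_age=29):
    """Project a per-plate-appearance rate, such as home runs per PA.

    seasons: list of (events, plate_appearances), most recent season first.
    Every default parameter here is an illustrative assumption.
    """
    # Step 1: weight the most recent seasons more heavily.
    weighted_events = sum(w * ev for w, (ev, pa) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (ev, pa) in zip(weights, seasons))

    # Step 2: regress toward the mean by mixing in league-average "ballast".
    regressed = ((weighted_events + league_rate * ballast_pa)
                 / (weighted_pa + ballast_pa))

    # Step 3: age factor, a small boost before the assumed peak age
    # and a small penalty after it.
    if age < peak_age:
        age_factor = 1 + 0.006 * (peak_age - age)
    else:
        age_factor = 1 - 0.003 * (age - peak_age)
    return regressed * age_factor

# Hypothetical hitter: 30, 25 and 20 homers over the last three seasons.
seasons = [(30, 600), (25, 550), (20, 500)]
rate = marcel_style_rate(seasons, league_rate=0.03, age=29)
print(f"Projected homers over 600 plate appearances: {rate * 600:.1f}")
```

Because of the ballast, the projected rate always lands between the hitter's own weighted rate and the league rate. The fewer plate appearances a player has, the harder he gets pulled toward average.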

Bill James

The master of the statistical movement is still cranking out
projections. Actually, he’s doing hitter projections (based on past
performance, age, home park, and expected playing time), and
Baseball Info Solutions (BIS) is doing pitcher projections on
James’ behalf. To arrive at their numbers, BIS uses eight years of
data (when possible, of course) with a particular focus on the last
three. As well, they project innings based on how the pitcher was
being deployed late in the most recent season. You can find the
Bill James projections at Fangraphs.com. Just type in a player’s
name and click on the “Projections” link near the top of the page.

Sample prediction: According to James (BIS), Tim Lincecum
will strike out 261 batters for the second year in a row.


CHONE

To churn out his CHONE projections, Sean Smith (an Angels
fan who named his system after former Halo Chone Figgins) uses
four years of weighted numbers, runs a regression, adjusts for age,
and then makes some common-sense tweaks to the playing time
forecasts. His numbers are available online and, once again,
on the Fangraphs player pages.

Sample prediction: According to CHONE, the Rockies and
Dodgers will tie for the NL West crown.

The fans

Ah, the wisdom of crowds … Fangraphs.com busts out the
democracy by letting you have your say. If you’re a registered
user, then you can fill out a projections ballot. All the fan
projections are then aggregated.

Sample prediction: Ryan Howard will lead the majors with 46
home runs, sayeth the Fans.

Usually, these systems are in general agreement on most
players. That’s why eyeballing what each system has to say will
give you a good grasp of how players should fare. Occasionally,
though, the systems diverge. For instance, Bill James (BIS) says
Adam Wainwright will have a 3.64 ERA this season, while the Fans
think his ERA will be a much stronger 3.18.
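
If you want to do that eyeballing a bit more systematically, one simple approach is to line up each system's number for a player and look at the average and the spread. The sketch below uses only the two Wainwright ERA projections quoted above. The `summarize` helper and the 0.30 cutoff for calling a split notable are hypothetical choices, not something any of these systems publishes.

```python
def summarize(projections, notable_spread=0.30):
    """Return (consensus, spread, diverges) for one player's projections.

    projections: dict mapping system name to projected value.
    notable_spread: assumed cutoff for flagging a disagreement.
    """
    values = list(projections.values())
    consensus = sum(values) / len(values)   # simple unweighted average
    spread = max(values) - min(values)      # gap between the extremes
    return consensus, spread, spread > notable_spread

wainwright_era = {"Bill James (BIS)": 3.64, "Fans": 3.18}
consensus, spread, diverges = summarize(wainwright_era)
print(f"consensus {consensus:.2f}, spread {spread:.2f}, diverges: {diverges}")
```

A wide spread is a hint that the systems are reading something about the player differently, which is a good cue to dig into the individual projections.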

So poke around, dominate your draft, or otherwise just whet
your appetite for warmer weather and the crack of bats.