Looking at baseball projection systems

PECOTA says Boston will win the AL East. Marcel says Albert Pujols

will hit 34 homers. Bill James says Cole Hamels will bounce back.

ZiPS says Stephen Strasburg will notch a 4.18 ERA. CHONE says Joe

Mauer will bat .332 …

Huh? What are these things, and why are they speaking so

authoritatively on the upcoming MLB season?

They are projection systems (well, Bill James is actually a

very smart human being, but he has his own projection system), and

they’re using their various inputs and methodologies to predict

what will happen in 2010.

Do they yield perfect forecasts? Of course not. Are they

almost always more accurate than what you or I could come up with?

Yep. After all, there’s far more to it than saying “A-Rod should be

good for 30 spanks and 100 RBI this season.”

These systems use things like platoon splits, ballpark data,

groundball percentages, line-drive rates, strikeouts, unintentional

walks, aging patterns, league environment, pitch location and pitch

speed data, and so on and so on. It’s a science, if occasionally an

inexact one.

So whether you’re a roto player or just a fan who’s fondly

dreaming of Opening Day, projection systems are worth your while.

(And the best part is that almost all of them are available for

free.) What follows, then, is a brief introduction to some of the

notable systems that are out there.


ZiPS

Dan Szymborski (ZiPS stands for “sZymborski Projection

System”) of the wonderfully addictive

Baseball Think Factory (http://www.baseballthinkfactory.org/)

concocted this particular system. Here’s how Szymborski

described ZiPS in a recent interview with the Web site Mets360.com:

“ZiPS uses the recent past for a player and tries to find similar

players at roughly the same age.” ZiPS also forecasts defensive

ratings. For instance, Omar Vizquel, who’ll turn 43 in late April,

still rates as “excellent” at shortstop. ZiPS projections for the

2010 season are available free of charge.



Sample prediction: According to ZiPS, Prince Fielder, who

drove in a career-high 141 runs last year, will drive in 142 this season.



PECOTA

This is Baseball Prospectus’

proprietary engine (subscription required), and

the man behind the curtain is Nate Silver of Fivethirtyeight.com,

the Web site that analyzes polling and political data. PECOTA

(Player Empirical Comparison and Optimization Test Algorithm), like

most other systems, forecasts production and playing time by using

a player’s performance history and by examining players deemed to

be similar. Also, PECOTA uses what’s called “phenotypic

attributes,” a category that includes things like a player’s height

and weight. PECOTA then combines all these data and the resulting player

forecasts to make predictions on the team level.
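The "find similar players" idea that PECOTA and ZiPS both lean on can be roughed out in a few lines. What follows is only a toy sketch, not how either system actually works (PECOTA is proprietary, after all). The feature list, the scaling constants, the one-year age window, the choice of two comparables and every number in the sample data are assumptions invented for illustration.

```python
import math

# Hypothetical feature set and scaling constants, chosen only for illustration.
FEATURES = ("hr_rate", "bb_rate", "height_in", "weight_lb")
SCALE = {"hr_rate": 0.01, "bb_rate": 0.03, "height_in": 3.0, "weight_lb": 20.0}


def distance(a, b):
    """Scaled Euclidean distance between two player-seasons."""
    return math.sqrt(sum(((a[f] - b[f]) / SCALE[f]) ** 2 for f in FEATURES))


def most_similar(target, history, k=2, max_age_gap=1):
    """Return the k historical player-seasons closest to the target,
    looking only at players within max_age_gap years of the target's age."""
    same_age = [p for p in history if abs(p["age"] - target["age"]) <= max_age_gap]
    return sorted(same_age, key=lambda p: distance(target, p))[:k]


def project_from_comps(target, history, k=2):
    """Average what the comparable players did in their following season."""
    comps = most_similar(target, history, k)
    if not comps:
        raise ValueError("no comparable players in the age window")
    return sum(p["next_hr_rate"] for p in comps) / len(comps)


# Made-up player-seasons (not real players, not real numbers).
history = [
    {"name": "Comp A", "age": 27, "hr_rate": 0.050, "bb_rate": 0.10,
     "height_in": 74, "weight_lb": 220, "next_hr_rate": 0.048},
    {"name": "Comp B", "age": 27, "hr_rate": 0.020, "bb_rate": 0.06,
     "height_in": 70, "weight_lb": 180, "next_hr_rate": 0.018},
    {"name": "Comp C", "age": 28, "hr_rate": 0.048, "bb_rate": 0.11,
     "height_in": 75, "weight_lb": 225, "next_hr_rate": 0.044},
    {"name": "Comp D", "age": 35, "hr_rate": 0.050, "bb_rate": 0.10,
     "height_in": 74, "weight_lb": 220, "next_hr_rate": 0.030},
]
target = {"name": "Target", "age": 27, "hr_rate": 0.049, "bb_rate": 0.10,
          "height_in": 74, "weight_lb": 222}

projected_hr_rate = project_from_comps(target, history)
print(round(projected_hr_rate, 3))
```

Comp D has the same stat line as Comp A but is eight years older, so the age window throws him out before the distance is ever measured. That is the "at roughly the same age" part of the idea.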

Sample prediction: PECOTA says the playoff teams in 2010 will

be the Phillies, Cardinals, Rockies, Braves, Red Sox, Rangers and

Yankees. Why just seven teams? PECOTA foresees a three-way tie for

first in the AL Central …


Marcel

Tom Tango’s Marcel system is as simple as these things get.

It’s so named (after the monkey on “Friends”) because it requires

nothing more than a monkey’s grasp of the system (not quite true,

but the point is made). Tango describes this system thusly: “It

uses three years of MLB data, with the most recent data weighted

heavier. It regresses towards the mean. And it has an age factor.”

You can download the Marcel spreadsheet for 2010.


Sample prediction: Marcel tabs David Wright for 19 home runs

in 2010.
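Tango's three-sentence description of Marcel (three weighted seasons, regression toward the mean, an age factor) translates almost directly into code. Here is a minimal sketch for a hitter's home run rate. The text above gives no actual numbers, so the 5/4/3 season weights, the 1,200 plate appearances of league-average ballast, the peak age of 29 and the per-year age step are all assumptions picked for illustration, as are the `Season` class and the function name.

```python
from dataclasses import dataclass


@dataclass
class Season:
    plate_appearances: int
    home_runs: int


def project_hr_rate(seasons, league_hr_rate, age,
                    weights=(5, 4, 3), regression_pa=1200,
                    peak_age=29, age_step=0.006):
    """Project home runs per plate appearance.

    seasons: up to three Season objects, most recent first.
    All default parameter values are illustrative assumptions.
    """
    # 1. Weight the last three seasons, most recent heaviest.
    weighted_hr = sum(w * s.home_runs for w, s in zip(weights, seasons))
    weighted_pa = sum(w * s.plate_appearances for w, s in zip(weights, seasons))

    # 2. Regress toward the mean by mixing in league-average plate appearances.
    rate = (weighted_hr + regression_pa * league_hr_rate) / (weighted_pa + regression_pa)

    # 3. Apply a simple age factor: a small boost below peak_age, a small cut above it.
    age_factor = 1 + (peak_age - age) * age_step
    return rate * age_factor


# A made-up hitter: 30, 24 and 18 homers over the last three years, 600 PA each.
recent = [Season(600, 30), Season(600, 24), Season(600, 18)]
rate = project_hr_rate(recent, league_hr_rate=0.03, age=29)
print(round(rate * 600, 1))  # projected homers over 600 plate appearances
```

A player with no big-league history at all comes out at exactly the league-average rate (before the age factor), which is regression toward the mean doing its job.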

Bill James

The master of the statistical movement is still cranking out

projections. Actually, he’s doing hitter projections (based on past

performance, age, home park, and expected playing time), and

Baseball Info Solutions (BIS) is doing pitcher projections on

James’ behalf. To arrive at their numbers, BIS uses eight years of

data (when possible, of course) with a particular focus on the last

three. As well, they project innings based on how the pitcher was

being deployed late in the most recent season. You can find the

Bill James projections at Fangraphs.com. Just type in a player’s

name and click on the “Projections” link near the top of the page.

Sample prediction: According to James (BIS), Tim Lincecum

will strike out 261 batters for the second year in a row.


CHONE

To churn out his CHONE projections, Sean Smith (an Angels

fan who named his system after former Halo Chone Figgins) uses

four years of weighted numbers, runs a regression, adjusts for age,

and then makes some common-sense tweaks to the playing time

forecasts. You can access his numbers,

once again, on the Fangraphs player pages.

Sample prediction: According to CHONE, the Rockies and

Dodgers will tie for the NL West crown.

The fans

Ah, the wisdom of crowds … Fangraphs.com busts out the

democracy by letting you have your say. If you’re a registered

user, then you can fill out a projections ballot. All the fan

projections are then aggregated.



Sample prediction: Ryan Howard will lead the majors with 46

home runs, sayeth the Fans.

These systems generally agree on most

players. That’s why eyeballing what each system has to say will

give you a good grasp of how players should fare. Occasionally,

though, the systems diverge. For instance, Bill James (BIS) says

Adam Wainwright will have a 3.64 ERA this season, while the Fans

think his ERA will be a much stronger 3.18.
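If you would rather not eyeball it, the same comparison can be done in a few lines: average the systems to get a consensus and flag any player where they sit far apart. This is just a sketch using the two Wainwright numbers quoted above. The function name and the 0.30-run cutoff for "diverges" are assumptions, not part of any of these systems.

```python
from statistics import mean


def compare_systems(projections, threshold=0.30):
    """Summarize one player's projected ERA across systems.

    projections: dict mapping system name -> projected ERA.
    threshold: gap (in runs) beyond which the systems are flagged
               as diverging; an arbitrary illustrative cutoff.
    """
    values = list(projections.values())
    spread = max(values) - min(values)
    return {
        "consensus": mean(values),
        "spread": spread,
        "diverges": spread > threshold,
    }


# The two Adam Wainwright ERA projections cited in the article.
wainwright_era = {"Bill James (BIS)": 3.64, "Fans": 3.18}
summary = compare_systems(wainwright_era)
print(round(summary["consensus"], 2), round(summary["spread"], 2), summary["diverges"])
```

With more systems in the dictionary, the consensus becomes a fuller average and the spread shows how much you should trust it.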

So poke around, dominate your draft, or otherwise just whet

your appetite for warmer weather and the crack of bats.