Disclaimer: there are as many ways to evaluate projections as there are to create them. This is a SQuiD (Semi Quick-n-Dirty) method that involves looking at some basic descriptive statistics.

I was able to get access to eight projection systems, each either publicly available or covered by a subscription I hold. These were PECOTA, Sean Smith’s CHONE, Dan Szymborski’s ZiPS, Tango’s Marcel, and the projections from The Hardball Times (THT), ESPN Fantasy, RotoWire, and RotoTimes.

The metric of choice is OPS for all hitters who had at least 250 plate appearances and received a projection in at least six of the eight forecasting systems. If the player was missing from three or more projection systems, he was thrown out. If a player was missing from one or two projection systems, he was assigned a .750 OPS in the systems where he was absent. Generally there were only a couple of these cases per system.
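The sample rules above are easy to state in code. This is only a rough sketch of them, not the spreadsheet actually used: the function name `prepare_sample` and the dictionary layout (a `pa` count plus a `projections` map with `None` for a missing system) are assumptions made for illustration.

```python
FILL_OPS = 0.750   # stand-in OPS for a system that skipped the player
MIN_PA = 250       # minimum plate appearances to qualify
MAX_MISSING = 2    # missing from three or more systems means the player is dropped


def prepare_sample(players):
    """Apply the sample rules to a list of player records.

    Each record is a dict with 'name', 'pa', and 'projections', where
    'projections' maps system name to projected OPS (None if absent).
    This layout is hypothetical.
    """
    kept = []
    for player in players:
        if player["pa"] < MIN_PA:
            continue
        projections = player["projections"]
        missing = [name for name, ops in projections.items() if ops is None]
        if len(missing) > MAX_MISSING:
            continue
        # Fill the one or two gaps with the .750 placeholder.
        filled = {
            name: FILL_OPS if ops is None else ops
            for name, ops in projections.items()
        }
        kept.append({**player, "projections": filled})
    return kept
```

Note that the .750 fill is a convenience, so a system with many gaps would be judged partly on a number it never published. With only a couple of cases per system, that effect should be small.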

All of the projection systems missed high on league batting norms, although THT and CHONE missed by only a trivial amount. This is probably to be expected since offensive levels declined a bit from 2006. On the other hand, maybe THT and CHONE saw something coming that the other systems didn’t; I don’t know.

The projection systems break down into basically three tiers in terms of Standard Deviation (StDEV). Marcel and CHONE were conservative, with a lot of regression to the mean built in. The three systems from the roto-oriented entities (RotoWire, RotoTimes, ESPN) were more aggressive, and had less regression to the mean. Meanwhile, PECOTA, ZiPS, and THT were middle-of-the-road.
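To see why a smaller standard deviation goes hand in hand with more regression to the mean, here is a small sketch. The OPS values are made up, and `regress_to_mean` is a hypothetical helper, not how any of these systems actually work: it just pulls every projection a fixed fraction of the way toward the group average.

```python
from statistics import mean, pstdev


def regress_to_mean(values, weight):
    """Pull each value toward the sample mean.

    weight = 0 leaves the values alone; weight = 1 collapses them all
    onto the mean.
    """
    center = mean(values)
    return [center + (1 - weight) * (v - center) for v in values]


# Made-up projected OPS values for five hitters.
raw = [0.650, 0.720, 0.780, 0.850, 0.950]
conservative = regress_to_mean(raw, 0.4)

spread_raw = pstdev(raw)
spread_conservative = pstdev(conservative)

print(f"raw spread:          {spread_raw:.4f}")
print(f"conservative spread: {spread_conservative:.4f}")
```

Regressing by 40% shrinks the spread to 60% of its original size while leaving the average untouched, which is the pattern that separates a conservative system from an aggressive one.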

“Corr/Avg” is the correlation with the average projection from all eight systems. This tells us how unique a projection system was — how much it was guessing differently (for better or for worse) than the other systems. ESPN was the most unique, followed by PECOTA and Marcel. ZiPS and RotoTimes were the least unique.
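For the curious, Corr/Avg can be sketched as follows. I am assuming a plain Pearson correlation and a consensus that is the simple mean of every system (including the one being tested); the function names and the dictionary layout are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def corr_with_average(projections):
    """Correlate each system with the all-system average projection.

    projections maps system name to a list of projected OPS values,
    with every list in the same player order (a hypothetical layout).
    """
    systems = list(projections)
    n_players = len(projections[systems[0]])
    consensus = [
        sum(projections[s][i] for s in systems) / len(systems)
        for i in range(n_players)
    ]
    return {s: pearson(projections[s], consensus) for s in systems}
```

A lower value means the system strays further from the pack. It says nothing on its own about whether the straying helped.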

PECOTA is out in front, but by such a trivial margin that it should basically be considered to be in a tie with ZiPS. CHONE and Marcel are next in line.

THT and CHONE gain some ground here because they came closer to predicting league-average offensive levels, which shows up in a metric based on average error but not in a correlation coefficient. Overall, we have a 3-way tie between PECOTA, CHONE, and ZiPS, with Marcel and THT quite close.
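The reason a league-wide miss hurts on average error but not on correlation is that correlation ignores a constant offset. A quick sketch with made-up numbers, assuming mean absolute error as the error metric (the exact formula used for the results here may differ):

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_abs_error(projected, actual):
    """Average absolute gap between projected and actual values."""
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(actual)


# Made-up actual and projected OPS for four hitters.
actual = [0.700, 0.760, 0.810, 0.880]
on_target = [0.710, 0.750, 0.820, 0.870]
# Same projections, but 20 points too optimistic about the league.
inflated = [p + 0.020 for p in on_target]

corr_on_target = pearson(on_target, actual)
corr_inflated = pearson(inflated, actual)
mae_on_target = mean_abs_error(on_target, actual)
mae_inflated = mean_abs_error(inflated, actual)
```

The two correlations come out the same, while the inflated set has roughly double the average error. A system that read the league run environment correctly gets credit only on the second measure.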

Pretty much the same order, although PECOTA falls a trivial amount behind CHONE and ZiPS.

Finally, my favorite metric, which is based on determining which systems give us the best information. Specifically, what I’m doing here is throwing all the forecasts into a regression analysis and determining which ones contribute the most to the forecast bundle. This is basically a combination of how accurate a forecasting system is and how unique it is.
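Here is a bare-bones sketch of that kind of regression, assuming ordinary least squares of actual OPS on each system's projection plus an intercept. The exact specification is not spelled out above, so treat the setup, the names `fit_blend` and `solve_linear`, and the data layout as assumptions. It uses only the standard library and does not compute significance tests.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_blend(projections, actual):
    """Least-squares fit of actual OPS on all systems at once.

    projections maps system name to a list of projected OPS values in
    the same player order as `actual`. Returns the intercept and a
    dict of per-system coefficients.
    """
    systems = list(projections)
    rows = [
        [1.0] + [projections[s][i] for s in systems]
        for i in range(len(actual))
    ]
    k = len(systems) + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, actual)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return beta[0], dict(zip(systems, beta[1:]))
```

A system earns a large coefficient only if it is both accurate and saying something the others are not. Two systems that track each other closely end up splitting the credit, and with eight highly correlated forecasts the individual coefficients can be unstable, so in practice a statistics package that reports standard errors is the better tool.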

The three systems that give you the most positive information are PECOTA, ZiPS, and (somewhat surprisingly) ESPN, in that order. In other words, if you had our projections and some of the other projections, the ideal blend would be 5 parts PECOTA, 4 parts ZiPS, and 3 parts ESPN. You could also add in 2 parts of Marcel without hurting yourself. The other projection systems don’t really tell you anything: they might be perfectly fine systems, but they don’t give you any unique information. (Actually, you could almost do better by adding in a NEGATIVE weight from the RotoTimes projections, but that result is not statistically significant.)
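To make the blend concrete, here is a small sketch that treats the "parts" as relative weights and normalizes them to sum to one. That reading is my assumption, and the player's projections below are made up.

```python
def blend(projections, parts):
    """Weighted average of one player's projections.

    projections maps system name to projected OPS; parts maps system
    name to its relative weight. Systems absent from `parts` are ignored.
    """
    total = sum(parts.values())
    return sum(projections[name] * weight for name, weight in parts.items()) / total


PARTS = {"PECOTA": 5, "ZiPS": 4, "ESPN": 3}
PARTS_WITH_MARCEL = {**PARTS, "Marcel": 2}

# Made-up projections for a single hitter.
player = {"PECOTA": 0.820, "ZiPS": 0.800, "ESPN": 0.850, "Marcel": 0.790}

blended = blend(player, PARTS)
blended_with_marcel = blend(player, PARTS_WITH_MARCEL)

print(f"5/4/3 blend:   {blended:.3f}")
print(f"5/4/3/2 blend: {blended_with_marcel:.3f}")
```

For this hypothetical hitter the two blends land within a few points of each other, which fits the idea that Marcel can be added without hurting anything.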

So, another good year from PECOTA, certainly a good year from ZiPS — Dan does excellent work. I think we can call those two co-champs, but several of the other systems weren’t far behind. We’ll repeat this exercise for pitchers at some point within the next week or two.