27 April 2012

Sabermetrics and Baseball Being Late to Modernism

In a case of being late to the party, FanGraphs (a leading baseball statistics company) published "Power Rankings" this week in which they listed the Kansas City Royals as the 4th best team in Major League Baseball.

The Royals are currently 5 / 14 on the season, and I don't see them getting a hell of a lot better any time soon.  Do I know how the Royals will perform moving forward?  Of course not.
The FanGraphs analysis was based upon the WAR (wins above replacement) statistic and nothing else.  Basically, the case can be made that if you add up how many wins each player on the roster adds (or loses) compared to whoever would replace that player if he got hurt, retired, etc., the Royals come up with the 4th highest number.

This analysis completely misses the point that baseball is essentially a sequence of independent statistical events, and the sample size (even over an entire season) is often too small for results to regress to the mean.  On top of this, there is enough random noise to distort even a large sample.
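To put a rough number on the sample-size point, here is a minimal sketch that treats every game as an independent coin flip, which is a deliberate oversimplification. The function names, the 0.5 win probability, and the 10,000 simulated seasons are all assumptions chosen for illustration; none of this comes from FanGraphs.

```python
import random
import statistics


def simulate_win_totals(true_win_prob, games, seasons, seed=0):
    """Simulate win totals, treating each game as an independent trial.

    true_win_prob is a hypothetical "true talent" level, not a real estimate.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(seasons):
        wins = sum(1 for _ in range(games) if rng.random() < true_win_prob)
        totals.append(wins)
    return totals


def summarize(totals, games):
    """Return the mean win total and the spread in winning percentage."""
    return {
        "mean_wins": statistics.mean(totals),
        "stdev_wins": statistics.pstdev(totals),
        "stdev_win_pct": statistics.pstdev(totals) / games,
    }


if __name__ == "__main__":
    # Compare a 19-game stretch with a full 162-game season for a
    # perfectly average (0.5) hypothetical team.
    for games in (19, 162):
        totals = simulate_win_totals(0.5, games, seasons=10_000)
        print(games, summarize(totals, games))
```

Under these assumptions the binomial standard deviation over 162 games is sqrt(162 x 0.5 x 0.5), or a bit over six wins, so a truly average team can land well above or below 81 wins on luck alone, and the spread in winning percentage over a few weeks is several times wider still.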

I won't make the case for returning to just "watching the game".  Statistical analysis has a place in modern baseball.  The thing is that many "sciences", including the dismal one that I study for a living, have to some degree accepted the limits of statistical analysis and moved on to dialectical modeling and epistemology.  Baseball is 30+ years late to this party (which is no great surprise).

In that vein, the overdetermination of any specific independent event is something that can never be captured.  Everything from relationships with domestic partners, to fan noise, to seasonal allergies, to the direction of the wind, impacts individual player performance in any given event.  Then there is the dialectical relationship between prior events in the same game, season, or career and present and future events.  Causality is complex and omnidirectional, in baseball and otherwise.  Previous statistical performance is just one, arguably small, part of the story of any given event.

Essentially, statistics can provide a jumping-off point for baseball analysis that gives teams a more "scientific" way to spend money and manage players, but statistics can never capture the complexity of the game and all its variation.  A reason to keep watching, I suppose?  But the Royals are NOT the 4th best team in baseball, and any claim to the contrary is based upon analysis so flawed that it does nothing useful other than motivate people such as myself to write.

1 comment:

Anonymous said...

Hi, I don't follow baseball stats that closely - Australian football is my code - but in the data as you describe them, I'm not so sure that it follows that the Royals are 4th best. Surely it means 4th best relative to their budget - for the benchmark is not a given player relative to the best player, but relative to an affordable (i.e. same-priced) player. That's a very different calculation. It says the Royals are performing above what you'd expect, not that they are performing well in an absolute sense.

As Marxists, I think we should love these sorts of data - or way of thinking, at least. It gives us a glimpse of what Marx meant by 'capital'.