Wednesday, June 18, 2014

Finding a New Magic Number

In the formula for Pythagorean Expectation, a magic number exists.

OK, it's not really magic. It started as an assumption made by a very smart man ("2 would be a good number") and ended with rigorous testing by even more smart people ("1.83 is actually a better value for baseball").
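For anyone who hasn't seen the formula, here is a minimal sketch of it in Python. The function name and the sample run totals are mine, made up purely for illustration; the "magic number" is the exponent argument.

```python
def pythagorean_win_pct(scored, allowed, exponent):
    """Expected winning percentage from points (or runs) scored and allowed."""
    s = scored ** exponent
    a = allowed ** exponent
    return s / (s + a)


# Hypothetical baseball team: 800 runs scored, 700 allowed, original exponent of 2.
print(pythagorean_win_pct(800, 700, 2))  # about 0.566
```

Multiply the result by games played to get an expected win total.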

When I started this project, I knew that different sports use different exponents, that more scoring means a higher exponent, and that the CFL has more scoring than the NFL. Unfortunately, as I was just beginning to collect data, I had no way of determining what the best exponent for the CFL would be. In the end, after looking at the values used for various sports (MLB = 1.83, EPL = 1.30, NHL = 2.15, NFL = 2.37, NBA = 13.91), I decided the gap between the CFL and NFL was probably small enough that the known exponent for the NFL was likely good enough to be useful for my calculations.

Now, however, I have data going back to 1990, and after some prompting from a gentleman from Hamilton, I realized it would be prudent to go back and do the math.

Based on some research, I settled on the method outlined here (external link), which calculates a value for lacrosse.  The calculation itself is fairly simple:

1) Find the expected win total using the Py Expectation formula, and subtract it from the actual win total.
2) Square that value.
3) Calculate this value for every team in every year that I have data for.
4) Add up all the values.
5) Find the square root.

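This leaves me with the root-mean-square error (RMSE) for the data using whichever exponent I used in step 1. (Strictly speaking, RMSE also divides the sum from step 4 by the number of team-seasons before taking the square root; that count is the same for every exponent, so it doesn't change which exponent comes out on top.) All that's left at this point is to run the calculation with a range of exponents to determine which results in the lowest RMSE.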
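The five steps above can be sketched in a few lines of Python. This is an illustration rather than the spreadsheet I actually used: the function names, the tuple layout, and the three sample team-seasons are all made up, and it ignores ties for simplicity. It divides by the number of team-seasons before taking the square root, which is the "mean" part of RMSE.

```python
import math


def expected_wins(points_for, points_against, games, exponent):
    """Step 1 (first half): Pythagorean expected win total for one team-season."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return games * (pf / (pf + pa))


def rmse_for_exponent(team_seasons, exponent):
    """Steps 1-5 for one exponent, over (wins, games, points_for, points_against) rows."""
    squared_errors = [
        (wins - expected_wins(pf, pa, games, exponent)) ** 2  # steps 1 and 2
        for wins, games, pf, pa in team_seasons               # step 3
    ]
    mean_squared_error = sum(squared_errors) / len(squared_errors)  # step 4, plus the mean
    return math.sqrt(mean_squared_error)                            # step 5


# Made-up team-seasons, not real CFL data.
sample = [(12, 18, 500, 400), (6, 18, 400, 500), (9, 18, 450, 450)]
print(rmse_for_exponent(sample, 2.37))
```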

Thanks to Bill Barnwell and the others who have already done these calculations for the NFL, I had a reasonable clue as to where the exponent would fall, so I calculated the RMSE for 2.00 through 5.00, increasing by 0.01 each time.
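The sweep itself is just a loop over candidate exponents. Below is a rough sketch under a few assumptions: `sweep_exponents` is a hypothetical helper name, `error_fn` stands for whatever function returns the RMSE for a given exponent, and the toy error curve at the bottom is an arbitrary stand-in with its minimum placed at 3.25, not a real result. The candidates are built from integer steps so that repeated addition of 0.01 doesn't drift.

```python
def sweep_exponents(error_fn, start=2.00, stop=5.00, step=0.01):
    """Return the candidate exponent in [start, stop] with the lowest error."""
    n_steps = round((stop - start) / step)
    # Build 2.00, 2.01, ..., 5.00 from integer counts to avoid float drift.
    candidates = [round(start + i * step, 2) for i in range(n_steps + 1)]
    return min(candidates, key=error_fn)


# Toy stand-in for the real RMSE curve; 3.25 is arbitrary.
def toy_error(exponent):
    return (exponent - 3.25) ** 2


print(sweep_exponents(toy_error))
```

In practice, `error_fn` would wrap the RMSE calculation over every team-season in the data set.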

As expected, the value came out higher than the NFL, but not by much:


The most accurate value of the bunch is 2.74 (raw RMSE data), with an RMSE approximately 3% lower than that of the original 2.37 exponent.

So what does this all mean?

Good question. For starters, it means that I will be using 2.74 for future calculations. At some point, I will also go back and revise some of the posts discussing historical data to improve their accuracy. I will not go back and alter the posts for 2013, as they were simply meant to provide a week-by-week rundown, and there would be limited value in correcting them at this point.

So there you have it: 2.74, my new favorite number.
