Advanced CFL Stats: pythagorean wins


Friday, September 13, 2013

Pythagorean Stats since 1990

Pythagorean (py) wins (what are py wins?) and Big Wins are interesting stats, but they don't tell us much on their own. To put them in perspective, it's necessary to look at historical data and see whether they show a useful connection, or any connection at all, across past seasons.

Py Wins As a Prediction Model

The idea behind the pythagorean expectation formula is that points for and against provide a better indication of team quality than actual wins and losses, and that over time, teams which significantly over- or underperform their expectation tend to regress or improve back to expectations. NFL and MLB statisticians use historical data to provide perspective on what kind of regression or improvement a team is likely to show in the next season, or even half season. I now have data dating back to the 1990 season, which I can use to run the same kind of analysis (in the future I will look at pre-1990 seasons, but I expect that as you go back in time, the changes to the game will start to hurt the accuracy of our current data):

Over the past 182 seasons (that's 1 season per team since 1990, including Ottawa twice and the failed American teams), you can see how many teams finished above or below expectation, and how they did in the following season. Seasons where the team was not in the league the following year have been removed from the table and chart. The 2012 and 2013 seasons are also not yet included, as they have no follow up season to analyse.

As you can see, the majority of seasons fall into a range quite near to expectation: 41 of 182 fall between -0.5 and 0.5, and 100 between -1 and 1. That's pretty good; 55% of teams finish within 1 win of expectation, and only 28 times in 31 years has a team missed expectation by more than 2 wins in either direction.

In the ranges where we have more data, the chart follows the line you would expect: teams which miss expectations tend to turn it around the following year, while teams which surpass them end up with a few fewer wins the next year.
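For anyone who wants to try this kind of comparison themselves, here is a rough sketch of the grouping step. It is not the query I actually run against my database; the function name, the tuple layout and the 1-win bucket width are all assumptions made for illustration. The only two rows in the usage example are the two seasons quoted in this post (the 1997 Alouettes and the 2010 Blue Bombers).

```python
import math
from collections import defaultdict


def bucket_next_season_change(seasons, width=1.0):
    """Group seasons by wins above/below expectation and average the
    change in actual wins the following season.

    seasons: iterable of (diff, wins, next_wins) tuples, where diff is
             actual wins minus Pythagorean wins (hypothetical layout).
    width:   bucket width in wins (assumed; each bucket covers
             [lower, lower + width)).
    """
    buckets = defaultdict(list)
    for diff, wins, next_wins in seasons:
        lower = math.floor(diff / width) * width
        buckets[lower].append(next_wins - wins)
    # Average change in wins per bucket, ordered from unluckiest to luckiest
    return {
        lower: sum(changes) / len(changes)
        for lower, changes in sorted(buckets.items())
    }


# The two seasons quoted in this post:
seasons = [
    (4.5, 13, 12),   # 1997 Montreal: 4.5 wins above expectation, 13 wins, then 12
    (-4.5, 4, 10),   # 2010 Winnipeg: 4.5 wins below expectation, 4 wins, then 10
]
print(bucket_next_season_change(seasons))
```

With the full set of seasons, a negative average in the positive buckets and a positive average in the negative buckets is the pattern the chart is describing.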

There are of course some outliers in the data at the outer edges where we have poor sample sizes. In 1997, Montreal finished a full 4.5 wins above expectation, winning 13 games despite a -23 point differential. Defying the expectations, they won another 12 games in 1998, finishing another 2.7 wins up on expectations. On the other end of the spectrum, we have the 2010 Blue Bombers, finishing 4.5 wins below expectation. They had extraordinarily bad luck that year, winning only 4 games despite a point differential better (-21) than those '97 Als. The next year, Winnipeg won 10 games and made it to the Grey Cup.

Neither one of these examples gives us a good idea what to expect when a team is so far above or below expectation, simply because it's so uncommon. Were the Bombers lucky to turn it around? Were the Als lucky to avoid regression? I think the latter is likely the case based on the ranges where we do have more data, but no one can say for sure.

All in all, I'm comfortable saying now that, as with other sports, Pythagorean Expectation is a good way to predict future performance in the CFL.

Monday, September 9, 2013

Advanced CFL Stats - Week 11

The week is over, so it's time for more stats.

This week the Riders got a little less lucky, the Bombers got a win in the new stadium, and the Eskimos just can't buy a break.

I streamlined the chart a bit this week and it's presented in a slightly different format, as my stats are now in a database instead of a spreadsheet, so I can store more and do cooler things, like:

Big Win Percentage.

Big Win Percentage is a simple stat, created by Jim Glass. It's based on the premise that football by nature is a game that can be heavily influenced by luck. A bad call, a fumble recovery, a gust of wind; these are all things which can turn a close game into a win or a loss. According to Brian Burke (the guru of NFL stats), the outcome of more than 40% of NFL games is determined by random chance. This makes judging a team by its record a difficult proposition (especially in the NFL, where teams don't play every team in the league).

What Mr. Glass's formula does is try to account for that luck by giving teams credit for "Big Wins", defined as wins by 9 or more points. 9 points makes a good cutoff because it is the border between 1 and 2 possession games.

The formula is simple: games won by 9+ points count as a "Big Win", games lost by 9+ points are considered a "Big Loss", and all the rest are considered ties. If you read the article linked above, you'll see that he's found that teams with a high number of "Big Wins" in a season tend to fare much better in the playoffs. We'll see if that holds true for the CFL (I'm compiling data back to 1990 for a post later this week), but in the meantime, I'm going to include it on the chart for this week.
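As a rough sketch of how that could be computed from a list of game margins: the function name is made up for illustration, and counting each "tie" as half a win is my assumption here (it's the usual way ties go into a winning percentage), not something spelled out above. The margins in the example are invented, not real results.

```python
def big_win_percentage(margins, threshold=9):
    """Big Win Percentage from a team's game margins.

    margins:   points for minus points against, one value per game.
    threshold: margin that separates 1- and 2-possession games (9 points).

    Games decided by fewer than `threshold` points are treated as ties,
    and each tie is counted as half a win (an assumption).
    """
    if not margins:
        raise ValueError("need at least one game")
    big_wins = sum(1 for m in margins if m >= threshold)
    big_losses = sum(1 for m in margins if m <= -threshold)
    ties = len(margins) - big_wins - big_losses
    return (big_wins + 0.5 * ties) / len(margins)


# Invented margins for a six-game stretch: 2 Big Wins, 1 Big Loss, 3 "ties"
example = [10, -3, 9, -12, 1, 0]
print(round(big_win_percentage(example), 3))
```

A team that wins a lot of close games will look much more ordinary by this measure than by its record, which is the whole point.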


Py W = Pythagorean Wins, Projected = Py Wins over 18 games

The Riders remain the best team in the league based on Py Expectation, but they are no longer considered the luckiest team in the league; that honour now goes to Calgary. Edmonton remains the unluckiest team so far, nearly 3 wins below expectation. Winnipeg, despite a win over the Riders this week, still sits at the bottom, though they are still considered unlucky by the formula.

Coming soon...

As noted above, I've been collecting data, back to 1990 so far. I plan to do a post to highlight some of the interesting points once I have a bit more information gathered.

- Mike

Friday, September 6, 2013

CFL Pythagorean Wins

I'm a big believer in statistics and analysis when it comes to sports. As noted by some on /r/cfl previously, there is a significant lack of advanced stats for the CFL. I'm not a statistician, nor do I have charting stats for each and every game like the NFL stats sites, so there are definite limits on what I can provide, but one stat I can calculate easily is Pythagorean Wins.

Bill James created the formula for baseball years ago, and it's been modified to better suit the NFL since then. Obviously the CFL is not the NFL, but the season is of similar length and scoring numbers are also in the same ballpark, so I believe the stat should apply fairly well to our league. Down the line I will look at some past seasons and see if I can determine how well (or poorly) it actually does work.

The formula itself is based on the idea that not all wins are created equal, and that point differential is actually a better indicator of future winning percentage than actual wins and losses. When applied to NFL games, the stat is a good indicator of future performance, both for future seasons, and second halves of the same season.

For a more detailed explanation from someone much smarter than I, check out Bill Barnwell's explanation on grantland.com.

With all of that said, we are at the half way point of the CFL season, so this is a perfect time to run the numbers on the first half and see what they might tell us.


Legend P-W%: Pythagorean Winning Percentage, P-W: Pythagorean Wins, P W-L: Pythagorean Win-Loss, Diff: Difference between Py Wins and Actual wins, P-W-T: Pythagorean Win Total (projected over 18 games)
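For reference, the columns in the legend can be sketched roughly like this. The general form of the formula is points for raised to an exponent, divided by the sum of points for and points against each raised to that exponent. The 2.37 exponent is the one commonly quoted for the NFL version, and using it for the CFL is an assumption on my part in this sketch. The function names, the sign of Diff (actual wins minus Py wins, so positive means lucky) and the projection as P-W% times 18 are also assumptions, and the team in the example is made up.

```python
def pythagorean_pct(points_for, points_against, exponent=2.37):
    """Pythagorean winning percentage: PF^x / (PF^x + PA^x).

    exponent=2.37 is the commonly cited NFL value (assumed here for the CFL).
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def pythagorean_row(wins, games_played, points_for, points_against,
                    season_games=18, exponent=2.37):
    """Build the legend columns for one team (hypothetical helper)."""
    pct = pythagorean_pct(points_for, points_against, exponent)
    py_wins = pct * games_played
    return {
        "P-W%": pct,                   # Pythagorean winning percentage
        "P-W": py_wins,                # Pythagorean wins so far
        "Diff": wins - py_wins,        # positive = lucky (assumed sign)
        "P-W-T": pct * season_games,   # projected over a full season
    }


# Made-up team: 3 wins in 9 games with an even point differential
row = pythagorean_row(wins=3, games_played=9, points_for=250, points_against=250)
print(row)
```

An even point differential always gives a P-W% of .500, so this made-up 3-6 team would show up as 1.5 wins below expectation with a projected 9 wins over 18 games.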

By the numbers, Saskatchewan and BC are the luckiest teams of the first half, while Edmonton and Winnipeg are the unluckiest. Even though the Riders have been the luckiest team, the formula still believes that they are the best team in the league, while Edmonton has been particularly unlucky, performing almost 2.5 wins below expectation.

Teams which over- or underperform the formula by a wide margin tend to fall back or climb closer to their expected win total as the season progresses, so according to Pythagoras, both Edmonton and Winnipeg fans should have some hope that their team will rebound slightly in the second half. That said, there aren't many surprises here, other than some shuffling in the middle. The formula believes that Toronto is slightly better than BC (but clearly isn't aware that Ricky Ray is injured), and that Montreal is slightly worse than Hamilton.