Advanced CFL Stats: 2014

Thursday, October 9, 2014

Rushing Success Rates - Week 15

[If you're not familiar with success rates, please see the original post here.]

Running Backs

Minimum 30 carries.

Quarterbacks

Minimum 10 attempts.

Friday, September 19, 2014

Playoff Outlook - Week 13

It's week 13, every team has played every other team, and for some teams, there are only 6 games left on the schedule. Time to take a look at the playoff picture.

The Wild West

BC Lions

Current Record: 7-4

Games Remaining: 7

Opponent Win Percentage: 0.539 (5th hardest, easiest in West)

Road Games: 4

Division Games: 4

The bad news for the Lions is they drew one of the short straws this year, and have to face the Stampeders 2 more times this season. The good news is that the remaining 5 games come against teams with a combined record of 21-33, which would rank as the 3rd easiest opponent schedule. They currently sit in 4th position, but a strong finish and a road win in Edmonton could get them the 2nd seed.

Key game(s): Week 19 @ EDM

Projected finish: 10-8 (3-4), 3rd in West
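The opponent win percentages in these entries, and the 21-33 figure for BC's non-Calgary opponents, are combined win-loss records turned into a percentage. A projected finish is the current record plus a projected record over the remaining games. Below is a minimal Python sketch of that arithmetic. It ignores ties, since none of the records quoted here include any, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Record = Tuple[int, int]  # (wins, losses); ties are ignored in this sketch


def win_pct(wins: int, losses: int) -> float:
    """Winning percentage from a win-loss record."""
    games = wins + losses
    if games == 0:
        raise ValueError("no games played")
    return wins / games


def combined_pct(records: Iterable[Record]) -> float:
    """Pool several opponents' records into one combined winning percentage."""
    wins = losses = 0
    for w, l in records:
        wins += w
        losses += l
    return win_pct(wins, losses)


def projected_finish(current: Record, remaining: Record) -> Record:
    """Current record plus the projected record over the remaining games."""
    return current[0] + remaining[0], current[1] + remaining[1]


# BC's remaining non-Calgary opponents: combined 21-33 (from the post).
print(round(win_pct(21, 33), 3))          # 0.389
# BC: 7-4 now, projected 3-4 the rest of the way.
print(projected_finish((7, 4), (3, 4)))   # (10, 8)
```

The same `projected_finish` call reproduces every other "Projected finish" line in this post. For example, Calgary's 10-1 plus 5-2 gives 15-3.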

Calgary Stampeders

Current Record: 10-1

Games Remaining: 7

Opponent Win Percentage: 0.570 (3rd hardest in league, 3rd easiest in West)

Road Games: 4

Division Games: 6

Can anyone beat the Stamps? With Bo Levi Mitchell done for at least the immediate future, the task looks a little less daunting, but with Drew Tate under center the Calgary quarterbacking situation isn't exactly hurting. The Stampeders have more West Division opponents on the schedule than any other team, but they are in the driver's seat with 7 games to go and no obvious losses on the horizon. Even if they go 0-3 on the road against division rivals, a playoff bye as the #1 seed looks all but wrapped up.

Key game(s): Week 14 vs BC, Week 20 @ BC.

Projected finish: 15-3 (5-2), 1st in the West.

Edmonton Eskimos

Current Record: 8-3

Games Remaining: 7

Opponent Win Percentage: 0.558 (4th hardest in league, 2nd easiest in West)

Road Games: 4

Division Games: 5

The Eskimos are undefeated this year, if you don't count games against those pesky Stampeders. Fortunately for Edmonton, they won't face Calgary again unless it's in the playoffs. With 3 games against the quarterback-less Riders on the schedule and a win over BC already in hand, the Eskimos have the second seed in their sights, but they will probably need another win against BC and at least 2 of 3 against the Riders.

Key game(s): Week 14 vs SSK, Week 17 @ SSK, Week 19 vs BC

Projected finish: 13-5 (5-2), 2nd in West

Saskatchewan Roughriders

Current Record: 8-3

Games Remaining: 7

Opponent Win Percentage: 0.632 (hardest in league)

Road Games: 3

Division Games: 5

The Riders sit in 3rd place currently (Edmonton holds the tie-breaker on points at the moment), but they need to find a quarterback in a hurry or they'll quickly find themselves on the outside looking in. Three games against a strong Edmonton team loom on the horizon, but two are at home, and a strong showing can propel them to a home playoff game in the West semis. The good news is that a season sweep of the Bombers gives them the 4th place tie-break if necessary, in a year where a possible cross-over looks appealing.

Key game(s): Week 14 @ EDM, Week 17 vs EDM, Week 20 vs EDM

Projected finish: 9-9 (1-6), 4th in West (cross-over)

Winnipeg Blue Bombers

Current Record: 6-6

Games Remaining: 6

Opponent Win Percentage: 0.609 (2nd hardest in West)

Road Games: 3

Division Games: 4

Three losses to the arch-rival Roughriders have crushed what looked like a very promising season for a surprising Blue Bomber team. With the second hardest schedule in the West and the fewest games remaining, the Bombers need some help from the teams they are chasing. While they can't win a tie-break with the Riders, wins against BC and Edmonton would still give them a chance.

Key game(s): Week 16 @ EDM, Week 18 vs BC

Projected finish: 9-9 (3-3), 5th in West

The Erratic East

Hamilton Tiger-Cats

Current Record: 3-7

Games Remaining: 8

Opponent Win Percentage: 0.368 (easiest in league)

Road Games: 4

Division Games: 5

Finally, the Eastern teams get to play each other, and this ugly win-loss discrepancy with the West will start to even out. The Tiger-Cats currently hold the 1st seed in the East and are looking at the easiest schedule in the league down the stretch. The Argos are just below them, though, and face the 2nd easiest, so their head-to-head matchups will likely determine the outcome of the East.

Key game(s): Week 16 @ TOR, Week 18 @ TOR

Projected finish: 6-12 (3-5), 2nd in East

Toronto Argonauts

Current Record: 3-8

Games Remaining: 7

Opponent Win Percentage: 0.378 (2nd easiest in league, 2nd easiest in East)

Road Games: 2

Division Games: 5

Toronto has the second easiest schedule down the stretch, and the fewest road games of any team. If Owens and the rest of the receiving corps can stay healthy for Ricky Ray, Toronto looks to be the team to beat in the East.

Key game(s): Week 16 vs HAM, Week 17 vs MTL, Week 18 vs HAM

Projected finish: 8-10 (5-2), 1st in East

Ottawa RedBlacks

Current Record: 1-9

Games Remaining: 8

Opponent Win Percentage: 0.414 (4th easiest in league, hardest in East)

Road Games: 3

Division Games: 5

The Expansion Blues are no joke, as Ottawa fans are finding out, and with the most difficult schedule in the East ahead, it doesn't look good for the Lumberjacks. Wins against their Quebecois neighbours would give them a real shot at 3rd place, but in a down year for the East, that doesn't look like it will be enough for a playoff spot. The RedBlacks need at least 5 wins to have a shot, and to win a tie-breaker with Hamilton they'll need to win both matchups. It's must-win from here on out.

Key game(s): Week 14 vs MTL, Week 15 @ HAM, Week 18 vs MTL, Week 19 vs HAM.

Projected finish: 3-15 (2-6), last in East

Montreal Alouettes

Current Record: 3-8

Games Remaining: 7

Opponent Win Percentage: 0.392 (3rd easiest in league, 2nd hardest in East)

Road Games: 4

Division Games: 5

With 2 wins in their last 3, including one against Hamilton, the Alouettes have a good chance of keeping their playoff streak alive, but only if they can keep up their success against division rivals. Four road games will make that a tall order, but with a win over Hamilton already in hand, a week 20 matchup on the road may decide the 2nd seed in the East and what appears to be the last spot in the playoffs.

Key game(s): Week 17 @ TOR, Week 19 vs TOR, Week 20 @ HAM

Projected finish: 5-13 (2-5), 3rd in East (eliminated due to West cross-over)

Filling in the blanks for TSN's Field position article

On September 18th, Paul LaPolice wrote this great article for TSN, which breaks down some of the scoring data across the CFL this year. It's a good read; if you haven't checked it out yet, take a moment and do so. I'll wait here.

The one downside to this article was that Mr. LaPolice opted to trim his tables down to highlight only a subset of teams in each table.

Using the Drive Search feature on CFLStats.ca, I'll fill in the rest for anyone who's interested. Please note that because I don't have direct access to the stats TSN uses, my numbers are slightly different. I can't fully explain these discrepancies, though I'm confident in the accuracy of my data. It's possible that our criteria differ in some cases, for example in counting the total number of possessions (TSN cites 1497 possessions, while my data includes 1485). Whatever the reason, the discrepancies make up a very small portion of the data and do not change the overall trends of the results.

For each of these datasets, if you click the link to the underlying search, you'll be able to view the stats from the defensive perspective (which team allows the highest TD percentage, etc), as well as each individual drive that matched the results.

Touchdown Percentage

Tm	G	Possessions	TD	Pct
CGY	11	163	34	21%
WPG	12	178	25	14%
EDM	11	171	23	13%
TOR	11	173	23	13%
SSK	11	161	19	12%
BC	11	169	19	11%
HAM	10	156	17	11%
MTL	11	169	13	8%
ORB	10	145	11	8%
Total	98	1485	184	12%
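The Pct column is simply touchdowns divided by possessions. The Total row sums each column before dividing. The short Python sketch below reproduces the table from its own rows; the constant and function names are hypothetical.

```python
# Rows from the touchdown percentage table: team -> (games, possessions, TDs)
TD_TABLE = {
    "CGY": (11, 163, 34),
    "WPG": (12, 178, 25),
    "EDM": (11, 171, 23),
    "TOR": (11, 173, 23),
    "SSK": (11, 161, 19),
    "BC": (11, 169, 19),
    "HAM": (10, 156, 17),
    "MTL": (11, 169, 13),
    "ORB": (10, 145, 11),
}


def td_rate(possessions: int, tds: int) -> float:
    """Touchdowns per possession, as a percentage."""
    return 100.0 * tds / possessions


def rank_by_td_rate(table):
    """Return (team, rate) pairs sorted from highest to lowest TD rate."""
    rates = [(team, td_rate(poss, tds)) for team, (_, poss, tds) in table.items()]
    return sorted(rates, key=lambda row: row[1], reverse=True)


def league_total(table):
    """Sum games, possessions and TDs, then compute the league-wide rate."""
    games = sum(g for g, _, _ in table.values())
    poss = sum(p for _, p, _ in table.values())
    tds = sum(t for _, _, t in table.values())
    return games, poss, tds, td_rate(poss, tds)


if __name__ == "__main__":
    for team, rate in rank_by_td_rate(TD_TABLE):
        print(f"{team}\t{rate:.0f}%")
    games, poss, tds, rate = league_total(TD_TABLE)
    print(f"Total\t{games}\t{poss}\t{tds}\t{rate:.0f}%")
```

The league rate is computed from the summed columns, not by averaging the nine team percentages. This is why the Total row reads 184 / 1485 rather than a mean of the Pct column.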

Drives starting inside your own 20 yard line

There is a slight semantic difference in the search here: "inside the 20" on the drive finder includes the 20 yard line, while the TSN data does not. The search results presented here from CFLStats.ca are for "Drives inside own 19". That said, the TSN article may be mixing the two data sets. It claims a 21% TD rate for the Stamps, which is correct if I include the 20 yard line, but a 0% rate for the Riders, which is only correct if I omit the 20 yard line. The issue may be that TSN's source tracks the starting line of scrimmage independently of the CFL's scoring, which is what CFLStats.ca's stats are based on.

Tm	G	Possessions	TD	Pct
CGY	10	32	6	19%
MTL	10	21	3	14%
WPG	11	30	4	13%
BC	10	21	2	10%
ORB	9	20	2	10%
TOR	11	33	3	9%
EDM	10	24	2	8%
HAM	9	29	1	3%
SSK	10	25	0	0%
Total	90	235	23	10%

Drives starting from own 21-49

I don't have a good search for this one, so I had to search for drives within own 49 and remove the data from drives within own 20. Even so, my numbers are quite different from TSN's here.
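Deriving one search from two others only works when the narrower search is a strict subset of the wider one. That is exactly why the 20 yard line semantics from the previous section matter. Here is a small Python sketch of the subtraction. The team codes and counts are made up purely for illustration and are not real CFL data; the function names are hypothetical.

```python
from typing import Dict, Tuple

# team -> (drives, touchdowns) as returned by one drive search
DriveCounts = Dict[str, Tuple[int, int]]


def subtract_searches(wider: DriveCounts, narrower: DriveCounts) -> DriveCounts:
    """Drives matched by `wider` but not by `narrower`.

    Only valid if every drive in `narrower` is also in `wider`
    (e.g. "inside own 20" is a subset of "within own 49").
    """
    result: DriveCounts = {}
    for team, (drives, tds) in wider.items():
        n_drives, n_tds = narrower.get(team, (0, 0))
        if n_drives > drives or n_tds > tds:
            raise ValueError(f"{team}: narrower search is not a subset of the wider one")
        result[team] = (drives - n_drives, tds - n_tds)
    return result


def td_pct(drives: int, tds: int) -> float:
    """Touchdown percentage, 0 if the team had no matching drives."""
    return 100.0 * tds / drives if drives else 0.0


# Made-up counts, for illustration only (not real CFL data).
within_49 = {"TEAM_A": (140, 20), "TEAM_B": (120, 9)}
inside_20 = {"TEAM_A": (30, 4), "TEAM_B": (25, 1)}

for team, (drives, tds) in subtract_searches(within_49, inside_20).items():
    print(team, drives, tds, f"{td_pct(drives, tds):.0f}%")
```

If the two searches disagree on whether the 20 yard line is "inside", drives starting exactly on the 20 end up in neither bucket or in both. The subset check guards against only the most obvious version of that mismatch.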

Tm	G	Possessions	TD	Pct
BC	11	118	11	9%
CGY	11	86	16	19%
EDM	11	112	11	10%
HAM	10	95	12	13%
MTL	11	124	6	5%
ORB	10	105	6	6%
SSK	11	104	6	6%
TOR	11	110	10	9%
WPG	12	110	16	15%
Total	98	964	94	10%

Starting from Opponent's End

I'd very much like to see TSN's source data on this one to compare, because either they or I have a big error (or Mr. LaPolice made a mistake in this section). There are minor discrepancies, such as the article crediting the Riders with 9 touchdowns on 21 possessions versus my data showing 8 touchdowns on 21 possessions. The big one, though, is Toronto, which isn't included in the article's data. The article claims that the Riders lead the league with a 43% TD rate, but according to my data, Toronto is well ahead of the pack at 53%.

Tm	G	Possessions	TD	Pct
BC	11	21	5	24%
CGY	9	30	12	40%
EDM	9	26	9	35%
HAM	9	20	3	15%
MTL	8	15	2	13%
ORB	7	10	2	20%
SSK	8	21	8	38%
TOR	9	17	9	53%
WPG	11	25	5	20%
Total	81	185	55	30%

Monday, July 28, 2014

Who are the CFL's most successful runners?

Anyone who's watched a football game can tell you that not all yards are created equal. QBs pile up yardage in failed comeback attempts, and running backs rack up carries while teams protect a lead. Those yards might look the same on the score sheet at the end of the day, but there is a significant difference between an 8 yard gain on 2nd and 5 and an 8 yard gain on 2nd and 15.

Success Rate

Success rate is a simple metric that attempts to put a number on the difference between those plays - one of those plays is a successful one (it gained a first down), the other is not.

Your definition of success may differ from mine, but I've opted to define a successful running play as follows:

1) On first down, it gained at least 50% of the needed yards.

2) On second or third down, it gained 100% of the needed yards.

3) The runner did not fumble on the play.

In other words, a 5 yard run on 1st and 10 is successful, but 5 yards on 2nd and 10 is not. Neither is a 15 yard run on 1st and 10 where the runner fumbled after gaining the yardage. Possession after the fumble is irrelevant: any fumble, whether recovered by the offense or not, counts as an unsuccessful play. (Ask any coach and I think you'd find they agree.)

Success rate is shown as a percentage (successes / total attempts). A rusher with a high yardage total and low success rate probably tends to have long runs mixed with frequent stops for short or no yardage. A rusher with a low yardage total and high success rate is getting just enough to be successful, and not much more (perhaps indicative of a goal line QB or full back).
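The three rules above translate directly into code. Below is a minimal Python sketch. It assumes each run is recorded with its down, distance, yards gained, and whether any fumble occurred; the `Run` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Run:
    down: int        # 1, 2 or 3
    distance: float  # yards needed for a first down
    gain: float      # yards gained on the play
    fumbled: bool    # any fumble, regardless of who recovered it


def is_successful(run: Run) -> bool:
    """Apply the three success rules to a single running play."""
    if run.down not in (1, 2, 3):
        raise ValueError(f"invalid down: {run.down}")
    if run.fumbled:
        return False                              # rule 3: any fumble fails
    if run.down == 1:
        return run.gain >= 0.5 * run.distance     # rule 1: half the needed yards
    return run.gain >= run.distance               # rule 2: all the needed yards


def success_rate(runs: Iterable[Run]) -> float:
    """Successes divided by total attempts."""
    runs = list(runs)
    if not runs:
        raise ValueError("no attempts")
    return sum(is_successful(r) for r in runs) / len(runs)


examples = [
    Run(1, 10, 5, False),   # 5 yards on 1st and 10: success
    Run(2, 10, 5, False),   # 5 yards on 2nd and 10: not a success
    Run(1, 10, 15, True),   # 15 yards on 1st and 10 with a fumble: not a success
]
print(f"{success_rate(examples):.0%}")
```

The three example plays are the ones described in the text, so one success in three attempts gives a 33% success rate.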

2014 Success Rates through Week 5

Through week 5, I've limited this list to running backs with at least 15 carries and quarterbacks with at least 5 carries. I will increase these minimums as the season goes on.

Success Rate - Running Backs (min 15 attempts)

Success Rate - Quarterbacks (min 5 attempts)

By itself, Success Rate doesn't tell the whole story about a runner (would anyone rather have Pat White's 100% success rate and 1.7 YPC than Tanner Marsh's 88% and 6.4 YPC?), but it does provide an interesting metric to add to the conversation.

2013 Success Rates

I intend to put up a page on cflstats.ca to display success rates for all seasons, but in the meantime, here are the values for last year.

Success Rate - Running Backs (minimum 50 attempts)

Success Rate - Quarterbacks (minimum 15 attempts)

Friday, July 25, 2014

Missing data status

At launch, data from 2009, 2010 and 2013 was available for the majority of games. All game boxscores for these years are in the system, but a small number of games are missing play data. Data for games in 2011 and 2012 is available but has not yet been processed. 2014 data is being entered into the system on a weekly basis and will be kept up to date throughout the season. Data for 2011 is currently being processed as time allows, and 2012 data will follow. As games are completed, they will become immediately available on the website. There is no time frame for completion, but I very much hope to have them ready before the start of the 2015 season. No work is currently being done on games with missing play data. These games will remain flagged in the system (a note appears at the top of each one), and their individual play data will remain unavailable until the backlog of 2011 and 2012 games has been completed. This page will be updated as the status changes.

A deeper look at the Edmonton fake punt (July 24, 2014)

Some background: Expected Points and Expected Points Added

Expected Points (EP), and more specifically Expected Points Added (EPA), are metrics I use on http://www.cflstats.ca; they have been used by NFL analyst Brian Burke for quite a few years now.

Using historical scoring data, we can assign a point value to every down/distance/line of scrimmage combination in a game. We look at every play and the next score by either team, then group the plays by down/distance/LOS to get an average expectation for each situation. EP can be positive or negative, indicating whether we expect the offense (positive) or defense (negative) to score next. Some game situations are filtered out of the calculation to keep the values more accurate. Plays that occur in the last 4 minutes of a half, or when the score margin is 14 points or more, are excluded in order to decrease the effect of garbage time or 2-minute-drill possessions.
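Here is a minimal Python sketch of how an EP table like this could be built. It assumes plays are already labelled with the points of the next score: positive if the offense scores it, negative if the defense does, and 0 if nobody scores before the half ends. That last convention is my assumption. The `Play` fields and function names are hypothetical, and a real table would likely bucket distance and field position rather than key on exact yard values. The `epa` helper also assumes both EP values are expressed from the original offense's point of view, so a change of possession shows up as a negative EP after the play.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Situation = Tuple[int, int, int]  # (down, distance, line of scrimmage)


@dataclass
class Play:
    down: int
    distance: int
    line_of_scrimmage: int     # hypothetical encoding: yards from the offense's own goal line
    seconds_left_in_half: int
    score_margin: int          # offense score minus defense score before the play
    next_score: int            # + if the offense scores next, - if the defense does, 0 if no score


def expected_points(plays: Iterable[Play],
                    garbage_seconds: int = 4 * 60,
                    max_margin: int = 14) -> Dict[Situation, float]:
    """Average next-score value for each down/distance/LOS combination."""
    totals: Dict[Situation, float] = defaultdict(float)
    counts: Dict[Situation, int] = defaultdict(int)
    for p in plays:
        if p.seconds_left_in_half < garbage_seconds:
            continue  # skip the last 4 minutes of a half
        if abs(p.score_margin) >= max_margin:
            continue  # skip plays with a margin of 14+ points
        key = (p.down, p.distance, p.line_of_scrimmage)
        totals[key] += p.next_score
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def epa(ep_before: float, ep_after: float) -> float:
    """Expected Points Added: EP after the play minus EP before it."""
    return ep_after - ep_before
```

Averaging the raw next-score values per situation is what makes EP negative in spots, like deep in your own end, where the defense usually scores next.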

Once we have a value for each game situation, we can then calculate EPA, which is simply the difference between EP on the next play and EP on the current play (EP After - EP Before). Positive EPA means the play moved the offence into a better position, negative means they are worse off than before.

Looking at EPA and comparing some possible outcomes can give us clues to whether in-game decisions were good or bad, or if risks were worth it.

The Play

On 3rd and 10 from their own 6 yard line with 26 seconds to go, Edmonton opted for a fake punt and gained just shy of the 10 yards necessary for the first down. Calgary scored a touchdown on the next play, and Edmonton was left looking like they'd made a bad decision.

But was it really a bad decision?

Outcomes and potential EPA

3rd and 10 from your own 6 yard line is a bad place to be, and the EP value reflects that. EP in this position is -1.3 points for the offense, meaning most of the time, the offense will give up the ball (or a safety) and the defense will be the next team to score.

Going into the play, they had three options:

1) Punt - Edmonton averaged 38.7 net punt yards on the night, so punting from the 6 yard line would be expected to give Calgary the ball back somewhere around the 44 yard line. 1st and 10 from their own 44 yard line carries an EP of -2.4 for the Edmonton defense. Over the past 5 years, kickers have made 81% of field goal attempts from this range, which was the most likely scenario given the time remaining in the quarter. EPA for this outcome would be -2.4 - (-1.3) = -1.1

2) Go for it (and succeed) - Let's assume they got those extra needed inches and kept the ball on their own 16 yard line. That gives Edmonton 1st and 10, which carries an EP of 0.3. In this situation, though, Edmonton would certainly have opted to kneel out the quarter, so in actuality their EP for this case would be a flat 0. EPA for this outcome would be 0 - (-1.3) = 1.3

3) Go for it (and fail) - In other words, exactly what happened. On average, teams on 3rd and 10 that go for it are successful 23% of the time, for an average gain of 6.3 yards. Plugging in the average yardage gives Calgary the ball back on the Edmonton 12 yard line, for an EP of -4.0. EPA for this outcome would be -4.0 - (-1.3) = -2.7

The success rates for outcomes 2 and 3 are linked, however, so the true value of going for it must be calculated as a weighted combination of both. Historically, teams have converted on 3rd and 10 just under 23% of the time, which includes fake punts. That means the average EPA is a combination of the two:

EPA_Success * SuccessRate + EPA_Failure * FailureRate

For this situation, we get an average EPA of -1.78

To recap, that leaves us two outcomes - punt for an EPA of -1.1, or go for it, for an EPA of -1.78. Those are surprisingly close.
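The three outcomes and the weighted average can be checked with a few lines of Python. The EP values and the 23% conversion rate are the ones quoted in this post. The constant and function names are hypothetical.

```python
# Expected points quoted in the post, from Edmonton's perspective.
EP_BEFORE = -1.3        # 3rd and 10 on own 6
EP_PUNT = -2.4          # Calgary 1st and 10 after the punt (around the 44 yard line)
EP_CONVERT = 0.0        # 1st down on own 16, then kneel out the quarter
EP_FAIL = -4.0          # Calgary ball on the Edmonton 12
CONVERSION_RATE = 0.23  # historical 3rd-and-10 conversion rate


def epa(ep_before: float, ep_after: float) -> float:
    """Expected Points Added: EP after minus EP before."""
    return ep_after - ep_before


def expected_epa(success_rate: float, epa_success: float, epa_failure: float) -> float:
    """Weight the success and failure outcomes by their probabilities."""
    return epa_success * success_rate + epa_failure * (1.0 - success_rate)


epa_punt = epa(EP_BEFORE, EP_PUNT)        # about -1.1
epa_success = epa(EP_BEFORE, EP_CONVERT)  # about 1.3
epa_failure = epa(EP_BEFORE, EP_FAIL)     # about -2.7
epa_go = expected_epa(CONVERSION_RATE, epa_success, epa_failure)

print(f"punt: {epa_punt:.2f}, go for it: {epa_go:.2f}")  # punt: -1.10, go for it: -1.78
```

The weighted value is 0.23 × 1.3 + 0.77 × (-2.7) = 0.299 - 2.079 = -1.78, which is about 0.68 points worse than punting.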

Conclusion

In a vacuum, or as a standard third down gamble, going for it here is the wrong decision. Both options are bad, as both indicate that Calgary is more likely to score next, but going for it is a little more than half a point worse overall.

But this didn't happen in a vacuum; time was a major factor here. Ordinarily, gaining your team a first down on your own 16 yard line isn't worth all that much, especially when failing leaves the opponent in a -4.0 EP position. But in this case, gaining the yardage would have allowed Edmonton to kneel out the quarter, giving up zero points, as opposed to giving Calgary the ball back in position to kick a field goal from a spot on the field where kickers are fairly successful (81%).

And it wasn't a gamble, it was a fake punt. Fake punts are a bit harder to quantify the success rate on, as they rely heavily on the element of surprise, and whether or not the team has found a weakness they hope to exploit. I don't know the league average for success on fake punts, but I would wager that they are slightly more likely to succeed than a 3rd and long gamble, especially if the punting team has spotted something they think they can exploit. Edmonton only needed to convert at a 40% rate in order to break even vs the punt.
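The 40% break-even figure comes from setting the weighted EPA of going for it equal to the punt's EPA and solving for the success rate. Solving p × S + (1 - p) × F = K for p gives p = (K - F) / (S - F). Below is a small Python sketch of that calculation; the function name is hypothetical and the EPA inputs are the values from this post.

```python
def break_even_rate(epa_safe: float, epa_success: float, epa_failure: float) -> float:
    """Success rate p at which p*success + (1-p)*failure equals the safe option."""
    if epa_success <= epa_failure:
        raise ValueError("success must be worth more than failure")
    return (epa_safe - epa_failure) / (epa_success - epa_failure)


# EPA values from the post: punt -1.1, convert +1.3, fail -2.7
p = break_even_rate(-1.1, 1.3, -2.7)
print(f"break-even conversion rate: {p:.0%}")  # break-even conversion rate: 40%
```

With these numbers, p = (-1.1 + 2.7) / (1.3 + 2.7) = 1.6 / 4.0 = 0.40. Any belief that the fake punt converts more often than that favours going for it.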

It's a very close call on this one. In a tie game, giving up the ball and a likely field goal could have been the difference in the game. Conversely, giving up the ball inside your own 20 is a huge risk, but with less than 20 seconds left, there's a very good chance that Calgary walks away with only a field goal anyway.

So should Edmonton have punted? I actually like the decision here: if the Edmonton coaches felt that the Calgary defense was unprepared or likely to cheat back to block, they may have felt their chances of succeeding were much higher than 40%, and after a good defensive half, they had to feel that they could hold Calgary to a short field goal in the case of a turnover. Unfortunately, Edmonton's defense didn't hold, and the gamble resulted in the worst possible outcome, but I give credit to Chris Jones and his staff for making the aggressive choice at a reasonable time.

Thursday, July 24, 2014

July 24th Update

CFLStats.ca has been updated with a few minor enhancements:

1) Player and team pages now include "pass targets" (times a player made a catch or was targeted with a pass). This data was always tracked, but for some reason not available on any pages.

2) Added a list of recent games to the front page for quicker access.

In addition, I've greatly improved the way that game day rosters are handled. Going forward, the official game day roster will be used to determine which players were actually in the game, so the per game stats and player game logs should be much more accurate. (Previously I used the transaction list to try and guess which players were available, but the transaction list is incomplete and results in some players appearing to still be on a team, when in fact they were inactive or even no longer on the roster.) It will take some time to implement this into the games in the archive, but it will be done eventually.

Monday, July 7, 2014

/r/CFL Mathematical Rankings explained (And the Week 2 Math Rankings)

Many of the readers here were originally introduced to this blog via http://www.reddit.com/r/cfl, a great CFL-based community of which I am a regular participant.

One of the ways I participate is as one of the 10 voters (technically 9 currently, as we have no Edmonton voter) for the official /r/CFL Power Rankings.

The /r/CFL Power Rankings work in the usual way: we have a group of voters, and each voter ranks the teams at the end of each week. The average of the votes determines a team's position on the list. Most of the rankers vote from a team standpoint; each is designated as the official voter for their team and contributes a short note about that team for the rankings.

My contribution is different, however. The folks organizing the rankings decided that they would like to include my voice as well, and I'm happy to have the opportunity to contribute. Since I'm not designated as a team ranker, my rankings are intended to be an unbiased vote, so, true to my nature, my votes are math-based.

For my vote, I've opted to use the Simple Ranking System, a new system I haven't talked about on this blog before (you can see each team's SRS rank on the standings page of www.cflstats.ca).

Simple Ranking System, or SRS, follows the same concept as Pythagorean Expectation and is based on the theory that points for and against are a better indicator of team strength than a team's actual record. However, what Pythagorean Expectation and points for/against lack are adjustments based on matchup. The best team in the league beating the worst team in a close game is less impressive for the best team and more impressive for the worst team. SRS attempts to adjust results based on opponent ratings.

In basic terms, the formula for SRS is a team's average point margin plus the average of their opponents' ratings.

I'll quote www.pro-football-reference/blog?p=37 for this part:

So every team's rating is their average point margin, adjusted up or down depending on the strength of their opponents. Thus an average team would have a rating of zero. Suppose a team plays a schedule that is, overall, exactly average. Then the sum of the terms in parentheses would be zero and the team's rating would be its average point margin. If a team played a tougher-than-average schedule, the sum of the terms in parentheses would be positive and so a team's rating would be bigger than its average point margin.

You can figure out any team's rating if you know their opponent ratings. Which sounds easy, except you can't know an opponent rating until you've figured out their own opponent rating. Which brings you back to the first opponent, and leaves you in an infinite loop.

Fortunately, the loop will stabilize after a number of iterations. On cflstats.ca, SRS is calculated first with an opponent adjustment of 0, then again once we've calculated them all once. And then again and again, until the ratings stop changing. Once the ratings stop changing, you have your ratings for the week.
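The iteration can be sketched in Python. This is a minimal illustration under a few assumptions, not the actual cflstats.ca code. Games are (team, opponent, points for, points against) tuples. Ratings are re-centred to average zero after each pass, so that an average team has a rating of zero. A `max_iter` cap guards against schedules where plain iteration never settles, such as two teams that only ever play each other.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Game = Tuple[str, str, int, int]  # (team_a, team_b, points_a, points_b)


def simple_ranking_system(games: List[Game], tol: float = 1e-9,
                          max_iter: int = 10_000) -> Dict[str, float]:
    """Iteratively solve: rating = average margin + average opponent rating."""
    margins: Dict[str, List[int]] = defaultdict(list)
    opponents: Dict[str, List[str]] = defaultdict(list)
    for a, b, pa, pb in games:
        margins[a].append(pa - pb)
        margins[b].append(pb - pa)
        opponents[a].append(b)
        opponents[b].append(a)

    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(avg_margin)  # first pass: opponent adjustment of 0

    for _ in range(max_iter):
        new = {t: avg_margin[t] + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
               for t in ratings}
        # Re-centre so the average team sits at 0 (keeps ratings from drifting).
        mean = sum(new.values()) / len(new)
        new = {t: r - mean for t, r in new.items()}
        if max(abs(new[t] - ratings[t]) for t in new) < tol:
            return new  # ratings stopped changing
        ratings = new
    return ratings  # did not fully settle within max_iter


if __name__ == "__main__":
    # Made-up three-team round robin, for illustration only.
    example = [("A", "B", 20, 10), ("B", "C", 17, 10), ("A", "C", 24, 10)]
    ranked = sorted(simple_ranking_system(example).items(), key=lambda kv: -kv[1])
    for team, rating in ranked:
        print(team, round(rating, 2))
```

In the made-up example, A's average margin is +12, but both of its opponents are below average, so its rating settles at about 8, with B near -1 and C near -7.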

How does this apply to the Power Rankings?

Simple. My votes are simply the order of the teams ranked by SRS on www.cflstats.ca. It's bias-free because I have no direct input on the process, and it provides a good way to contextualize a team's performance, especially early in the season when there aren't many common opponents.

There's one caveat, though: with a small sample size, the usefulness of a stat like this is reduced, as a single game makes up a significant portion of the rating and may be an outlier in the team's actual season. Over time, those outliers even out, but early on, they count too heavily. So I have introduced an element of human intervention for the early part of the season. It's not based on any tested math; it's just a means to limit wild swings.

Going into the season, I ranked the teams based on their expected win total change (from historical Pythagorean Expectation data). After 1 game, a team's movement was capped at +/- 3 spots on the list (ie: the 9th place team on the list was limited to no higher than 6th position). After 2 games, the cap was raised to +/- 6 positions. After 3 games, the limit will be removed and SRS will be used directly.

The Week 2 Math Rankings

1) Winnipeg (SRS rank 1)

2) Toronto (SRS rank 3)

3) Calgary (SRS rank 4)

4) Saskatchewan (SRS rank 5)

5) Montreal (SRS rank 6)

6) Ottawa (SRS rank 2)

7) Edmonton (SRS rank 7)

8) Hamilton (SRS rank 8)

9) Montreal (SRS rank 9)

With the movement cap up to 6 for most teams this week (Calgary and Ottawa were restricted to +/- 3), the cap was mostly a non-issue, and only Ottawa was affected. They started the season in 9th and weren't moved in the bye week. So despite a strong performance against Winnipeg, the team SRS considers the best in the league, they were held to 6th by the cap.

Some might find the rankings of 1-1 Toronto and 2-0 Edmonton to be rather odd, but they can be explained by opponent adjustments. Toronto dominated a Saskatchewan team which had a very strong week 1 ranking, and that makes Winnipeg look that much better in week 1, and subsequently Toronto's loss to a strong Winnipeg team no longer hurts as much. Likewise, while Edmonton is sitting pretty at 2-0, their two wins have come against the 8th and 9th ranked teams, both 0-2 with an average point differential of -23.5 between them.

Tuesday, June 24, 2014

2014 Season Preview

This is the post where I attempt to use math and logic to predict the outcome of a game which is based on randomness and luck. By the time November rolls around, this post will probably make me look silly.

Nonetheless, this is what you do this time of year, so let's go.

West Division

BC Lions

2013 Record: 11-7

Pythagorean Wins: 10.1 (over-performed by 0.9 wins, 3rd luckiest)

Record in Close Games (decided by 7 points or fewer): 4-1

Simple Rating: 3.0 (3rd)

Turnover Differential: +2 (4th)

The math suggests that the Lions were a bit lucky in 2013; they had the best record in the league in close games, and they over-performed their point differential by a small margin. The math also suggests that despite these factors, the Lions were the 3rd best team in the league last year. Unfortunately for BC fans, they were also the 3rd best team in their own conference. Teams that over-perform in the 0.5-1 game range tend to regress by around a game and a half the next year, but in this case, with Calgary and Saskatchewan looking vulnerable and an extra game against Winnipeg on the schedule, I wouldn't expect that to happen.

Prediction: 11-7, 2nd in the West.

Calgary Stampeders

2013 Record: 14-4

Pythagorean Wins: 12.3 (over-performed by 1.7 wins, luckiest in the league)

Record in Close Games: 3-2 (4th)

Simple Rating: 7.2 (1st)

Turnover Differential: +19 (T-1st)

Just because the numbers say a team is the luckiest in the league, doesn't mean they aren't also a very good team. The Stamps were a very good team in 2013, one of only 12 teams since 1990 to finish with at least 14 wins. They were good in close games, but not overly so, good at taking care of the ball, and solid on defense against the pass. The loss of Kevin Glenn may hurt them if Drew Tate is unable to stay healthy, but Mitchell has shown flashes of brilliance in his time under center, so the QB play should remain solid. Teams that over-perform in the 1.5+ range tend to fall back to the pack a bit though, and I expect the Stampeders to follow suit this year.

Prediction: 12-6, 1st in the West.

Edmonton Eskimos

2013 Record: 4-14

Pythagorean Wins: 6.5 (under-performed by 2.5 wins, unluckiest in the league)

Record in Close Games: 1-6 (last)

Simple Rating: -3.8 (7th)

Turnover Differential: -15 (7th)

Let's get this out of the way early: the Eskimos were tremendously unlucky last year. Only 5 teams since 1990 have fallen short of their expected win total by more than Edmonton did in 2013. The good news for Eskimo fans? Every one of those teams that played another season (the 1995 Shreveport Pirates folded after their season) finished with at least 9 wins the following year. Of course, there are also the 1997 Bombers, who come out just ahead of the Eskimos at -2.4 wins and finished with just 3 the next year. That said, Edmonton's record in close games is bound to improve, so it's a good bet that they'll see some improvement.

Prediction: 8-10, 3rd in the West.

Saskatchewan Roughriders

2013 Record: 11-7

Pythagorean Wins: 12.1 (under-performed by 1.1 wins, 2nd unluckiest)

Record in Close Games: 3-5 (6th)

Simple Rating: 6.2 (2nd)

Turnover Differential: +19 (T-1st)

The 2013 Riders were a very good team by the numbers; nearly the equal of the Stampeders, despite a 3 game difference in the standings. Of course, we all know how it turned out in the end. Normally, a team that falls short of expectations by 1+ wins would be expected to improve the next year, in the range of another 1-1.5 wins. Sadly, this Rider fan doesn't see that happening here. These math functions can be a great way to judge performance beyond the standings, but they lack situational awareness, and what they don't know is that the 2014 Riders do not look all that much like the 2013 Riders. Hits to the receiving corps and the loss of Kory Sheets are bound to hurt the offense. The upside for Rider Nation? The 2013 team was the best defense in the league, allowing a league-low 398 points and only 20 passing touchdowns.

Prediction: 6-12, 4th in the West

Winnipeg Blue Bombers

2013 Record: 3-15

Pythagorean Wins: 3.8 (under-performed by 0.8 wins, 3rd unluckiest)

Record in Close Games: 1-4 (7th)

Simple Rating: -11.4 (last)

Turnover Differential: -27 (last)

The Bombers were really bad in 2013. The single game they finished behind the Eskimos in the basement of the CFL doesn't tell the full story. By the Simple Ranking System (which ranks teams by average point margin adjusted for opponents), they were a full touchdown per game worse than the Eskimos. By this metric, the Eskimos were closer to the 3rd place team (BC) than they were to the Bombers. There are faint signs of hope, however: the turnover margin is likely to improve in 2014, and they were unlucky in close games last year, a stat which is likely to be closer to 50-50 over time. Still, they have an unproven starter under center and they play in the difficult West; 2014 may be a long season as well.

Prediction: 5-13, 5th in the West

East Division

Hamilton Tiger-Cats

2013 Record: 10-8

Pythagorean Wins: 8.6 (over-performed by 1.4 wins, second luckiest)

Record in Close Games: 5-3 (3rd)

Simple Rating: -1.9 (6th)

Turnover Differential: -13 (6th)

Hamilton had a nice run to the Grey Cup final in 2013, but a strong record in close games and the second luckiest W/L percentage in the league both point toward regression. However, this is a team with a strong coach and a pair of young QBs who have shown they can play. Couple those with a schedule against the weak East Division, and you may have a team able to make another run in 2014.

Prediction: 9-9, 2nd in the East

Montreal Alouettes

2013 Record: 8-10

Pythagorean Wins: 8.7 (under-performed by 0.7 wins, 4th unluckiest)

Record in Close Games: 5-5 (5th)

Simple Rating: -1.1 (5th)

Turnover Differential: -2 (1st)

Montreal finished right about where they should have last year, ending up closer to their expected win total than any other team. They were .500 in close games, nearly even in turnover differential, and near zero (average) in simple rating. The defense was still good, first in the league in takeaways and yards allowed, but the offense will need to take much better care of the ball in 2014 for things to improve. I think they will, but not by much.

Prediction: 9-9, 2nd in the East

Ottawa RedBlacks

2013 Record: n/a

Pythagorean Wins: n/a

Record in Close Games: n/a

Simple Rating: n/a

Turnover Differential: n/a

How do you make a stats-based prediction for a team that's never played a down of football? You really can't, but we do have some historical data to work with. Since 1990, seven CFL teams have started from scratch (technically 9, but the 1996 Texans and Alouettes were both relocations). Their combined record was 49-77 (.389). A veteran QB and a promising backup will provide some optimism, but history is not on Ottawa's side.
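As a rough check, the historical expansion record can be scaled to an 18-game schedule. This is only a back-of-the-envelope sketch: it assumes the 49-77 record is representative and ignores everything else (quarterbacks, division, schedule).

```python
def expected_wins(wins, losses, games=18):
    """Scale a historical win-loss record to an expected win total over `games`."""
    win_pct = wins / (wins + losses)
    return win_pct, win_pct * games

pct, wins = expected_wins(49, 77)
print(f"{pct:.3f} -> {wins:.1f} wins in 18 games")
```

A .389 winning percentage over 18 games works out to 7.0 wins, which lines up with the 7-11 prediction.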

Prediction: 7-11, 4th in the East

Toronto Argonauts

2013 Record: 11-7

Pythagorean Wins: 10.2 (over-performed by 0.8 wins, 4th luckiest)

Record in Close Games: 6-2 (2nd)

Simple Rating: 1.7 (4th)

Turnover Differential: -13 (6th)

Mathematically, the best team in the East would have been a mere 4th in the strong West Division, but this was a well-balanced team in 2013. They were 3rd in points for, 3rd in points allowed, and 3rd in turnover differential. A late-season loss of Ricky Ray in the midst of an all-time great season hurt, and they fell short of expectations in the playoffs. The record in close games is likely to take a hit, but the turnover differential should balance out as well, and a full season with Ray at the helm should keep them at the top of the East again.

Prediction: 11-7, 1st in the East

Monday, June 23, 2014

Introducing cflstats.ca

My name is Mike, and I'm a stat-aholic.

It should be obvious by now that I'm mildly obsessed with sports stats. I started this blog midway through last season with the intention of bringing some of the so-called "advanced stats" up north to the CFL. I don't claim to be a math genius, but I can read a formula, and as a programmer, I'm fairly adept at collecting stats. This made Pythagorean Expectation a perfect place to start, as the required data (points for and against) was easily available, and the formula was fairly simple.

But it was obvious from the start that we CFL fans suffer from a lack of good data. Towards the end of the season, I set out to improve that.

After some long nights in the off season, I'm proud to unveil CFLStats.ca.

What is CFLStats.ca?

For the NFL statheads out there, the resemblance to pro-football-reference.com will be immediately apparent. It was my inspiration and guide throughout the process. While the code behind cflstats.ca has no ties to PFR, the development would not have been possible without it.

What CFLStats.ca provides is a very large searchable database that includes (almost*) every action of every play, for every game processed so far. Due to time constraints, this means the 2009, 2010 and 2013 seasons. I hope to finish processing 2011 and 2012 in the near future.

What's missing?

A few things right now, but primarily the data from 2011 and 2012. It's available and will be there eventually, but importing games is a time-consuming process, and those seasons simply weren't ready for launch. Rest assured, you'll see that data.

A major limitation of the data, however, is "Games Played" stats and, by association, the "per game" averages. Unfortunately, it's impossible to determine from the play-by-play whether a player actually suited up for a game or not. Part of the process was to run through player transactions to determine trades as well as active/injury status, but there are some players (primarily backup quarterbacks) who remain active but never appear in the game. The DB will therefore show them as having "played" in 18 games for the season, while in reality they may have appeared in only a handful, or even none at all. As a result, the per game stats should be considered an estimate.

What can I search for?

A lot of stuff.

Oh you want more? Oh alright.

You can search for team games that match your criteria (how many games were there in 2013 where a team passed for 400 yards?). (Answer: 4. Toronto did it 3 times.)

You can search for player games that match your criteria (which players had a game where they rushed for 150 yards?). (Answer: Kory Sheets (3 times), Chris Garrett, Jon Cornish (3 times), Chad Kackert and Brandon Whitaker)

You can search for drives matching your conditions (show me the drives where a team got the ball after an interception or fumble). (Answer: it happened 246 times, and 82 touchdowns were scored)

And you can search for plays matching your conditions (show me the result of plays run from the opponent's goal line). (Answer: 73 plays and 51 touchdowns)

Within these searches are a lot of options for filtering, sorting and grouping. I'm sure there are things which currently can't be searched for, but I think you'll find there are a ton of things you can.

Errors

What errors? Everything is perfect, or I wouldn't be releasing it, obviously.

Yeah, that's a lie. There are going to be errors in the data; it's a fact of life with a database this large. The import process is designed to catch as many as can be identified, and I fix those by hand as I find them, but I'm certain some have slipped through. At the bottom of every page, you'll see a "report error" link. If you find something you think is wrong, click that link and send in the details of the error. The more detail, the better. You don't have to provide an email address to send a report, but if you do, I'll update you with the resolution.

* In rare cases, the play by play data was not available for processing or contained errors too numerous to be utilized. These games are clearly marked when you access them, and will have high level stats available based on game box scores, but no searchable plays.

Wednesday, June 18, 2014

Finding a New Magic Number

In the formula for Pythagorean Expectation, a magic number exists.

OK, it's not really magic. Rather, it started from an assumption made by a very smart man ("2 would be a good number") and ended with rigorous scientific testing by even more very smart people ("1.83 is actually a better value for baseball").
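For reference, here is the Pythagorean Expectation formula in its usual form, where PF and PA are points for and against, G is games played, and x is the "magic number" exponent (2 in Bill James's original baseball version):

```latex
\text{Win\%} = \frac{PF^{x}}{PF^{x} + PA^{x}}, \qquad
W_{\text{expected}} = G \times \frac{PF^{x}}{PF^{x} + PA^{x}}
```

Over- or under-performance is then simply actual wins minus W_expected; for example, Hamilton's 10 wins against 8.6 expected wins in 2013 gives +1.4.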

When I started this project, I knew that different sports use different exponents, that more scoring means a higher exponent, and that the CFL has more scoring than the NFL. Unfortunately, as I was just beginning to collect data, I had no way of determining the best exponent for the CFL. In the end, after looking at the values used for various sports (MLB = 1.83, EPL = 1.30, NHL = 2.15, NFL = 2.37, NBA = 13.91), I decided the gap between the CFL and NFL was probably small enough that the NFL's known exponent would be good enough for my calculations.

Now, however, I have data going back to 1990, and after some prompting from a gentleman from Hamilton, I realized it would be prudent to go back and do the math.

Based on some research, I settled on the method outlined here (external link), which was used to find an exponent for lacrosse. The calculation itself is fairly simple:

1) Find the expected win total using the Pythagorean Expectation formula, and subtract it from the actual win total.
2) Square that value.
3) Calculate this value for every team in every year that I have data for.
4) Add up all the values.
5) Divide by the number of team-seasons.
6) Find the square root.
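The steps above can be sketched in Python. This is a minimal illustration, not the code actually used for this post; it assumes each team-season is a hypothetical (points_for, points_against, actual_wins, games) tuple, and it divides by the number of team-seasons before taking the square root, which is what makes the result a root-mean-square error.

```python
import math

def pythagorean_wins(points_for, points_against, games, exponent):
    """Expected wins from the Pythagorean Expectation formula."""
    pf, pa = points_for ** exponent, points_against ** exponent
    return games * pf / (pf + pa)

def rmse(seasons, exponent):
    """Root-mean-square error of Pythagorean wins vs. actual wins.

    seasons: iterable of (points_for, points_against, actual_wins, games) tuples.
    """
    squared_errors = [
        (wins - pythagorean_wins(pf, pa, games, exponent)) ** 2  # steps 1-2
        for pf, pa, wins, games in seasons                       # step 3
    ]
    # Sum, divide by the number of team-seasons, then take the square root
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Calling `rmse(seasons, 2.37)` on a list of team-seasons gives the error for the original NFL exponent.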

This leaves me with the root-mean-square error (RMSE) for the data using whichever exponent I used in step 1. All that's left at this point is to run the calculation with a range of exponents to determine which results in the lowest RMSE.

Thanks to Bill Barnwell and the others who have already done these calculations for the NFL, I had a reasonable idea of where the exponent would fall, so I calculated the RMSE for every exponent from 2.00 through 5.00, in increments of 0.01.
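The exponent sweep can be sketched as a simple grid search that keeps whichever exponent produces the lowest RMSE. The sketch is self-contained (it repeats a compact RMSE helper); `seasons` is a hypothetical list of (points_for, points_against, wins, games) tuples, and any data you feed it here would be illustrative, not real CFL results.

```python
import math

def rmse(seasons, exponent):
    """RMSE between actual wins and Pythagorean expected wins."""
    errors = []
    for pf, pa, wins, games in seasons:
        expected = games * pf ** exponent / (pf ** exponent + pa ** exponent)
        errors.append((wins - expected) ** 2)
    return math.sqrt(sum(errors) / len(errors))

def best_exponent(seasons, start=2.00, stop=5.00, step=0.01):
    """Grid search: return (exponent, rmse) for the exponent with the lowest RMSE."""
    n_steps = round((stop - start) / step)
    # Build the grid from integer counts to avoid floating-point drift;
    # rounding to 2 decimals matches the 0.01 step size.
    candidates = [round(start + i * step, 2) for i in range(n_steps + 1)]
    return min(((x, rmse(seasons, x)) for x in candidates), key=lambda pair: pair[1])
```

A finer step or a numerical optimizer would also work, but a 0.01 grid over a known range is easy to check and fast enough for a few hundred team-seasons.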

As expected, the value came out higher than the NFL's, but not by much.

The most accurate value of the bunch is 2.74 (raw RMSE data), with an error rate approximately 3% lower than the original 2.37 exponent.

So what does this all mean?

Good question. For starters, it means I will be using 2.74 for all future calculations. At some point, I will also go back and revise some of the posts discussing historical data to improve their accuracy. I won't go back and alter the 2013 posts, since they were simply meant to provide a week-by-week rundown, and there would be limited value in correcting them at this point.

So there you have it: 2.74, my new favorite number.