Monday, June 23, 2014

Introducing cflstats.ca

My name is Mike, and I'm a stat-aholic.

It should be obvious by now that I'm mildly obsessed with sports stats. I started this blog mid-season last year with the intention of bringing some of the so-called "advanced stats" up north to the CFL. I don't claim to be a math genius, but I can read a formula, and as a programmer, I'm fairly adept at collecting stats. That made Pythagorean Expectation a perfect place to start, as the required data (points for and against) was easily available and the formula was fairly simple.
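For anyone who hasn't seen it, here's a rough sketch of what Pythagorean Expectation looks like in code. The function names are made up for illustration, and the exponent of 2.37 is an assumption: it's the value commonly quoted for the NFL, not necessarily the one that fits the CFL best.

```python
def pythagorean_expectation(points_for, points_against, exponent=2.37):
    """Estimate a team's winning percentage from points scored and allowed.

    The default exponent is an assumption borrowed from NFL analysis;
    treat it as a tunable parameter rather than a CFL-specific constant.
    """
    if points_for < 0 or points_against < 0:
        raise ValueError("points must be non-negative")
    pf = points_for ** exponent
    pa = points_against ** exponent
    if pf + pa == 0:
        # No points either way: nothing to separate the teams.
        return 0.5
    return pf / (pf + pa)


def expected_wins(points_for, points_against, games=18, exponent=2.37):
    """Scale the expected winning percentage to an 18-game CFL season."""
    return games * pythagorean_expectation(points_for, points_against, exponent)
```

A team that scores exactly as many points as it allows comes out at .500, and the estimate climbs as the points differential grows.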

But it was obvious from the start that we CFL fans suffer from a lack of good data.  Towards the end of the season, I set out to improve that.

After some long nights in the off-season, I'm proud to unveil CFLStats.ca.



What is CFLStats.ca?

For the NFL statheads out there, the resemblance to pro-football-reference.com will be immediately apparent.  It was my inspiration and guide throughout the process.  While the code behind cflstats.ca has no ties to PFR, the development would not have been possible without it as a model.

What CFLStats.ca provides is a very large searchable database that includes (almost*) every action of every play, for every game processed so far.  Due to time constraints, this means the 2009, 2010 and 2013 seasons.  I hope to finish processing 2011 and 2012 in the near future.

What's missing?

A few things right now, but primarily the data from 2011 and 2012.  It's available, but importing games is a time-consuming process, and those seasons simply weren't ready for launch.  Rest assured you'll see that data in time.

A major limitation of the data, however, is the "Games Played" stat and, by association, the "per game" averages.  Unfortunately, it's impossible to determine from the play by play whether a player actually suited up for a game.  Part of the process was to run through player transactions to determine trades as well as active/injury status, but there are some players (primarily backup quarterbacks) who remain active but never appear in a game.  The DB will therefore show them as having "played" in 18 games for the season, while in reality they may have appeared in only a handful, or even none at all.  As a result, the per game stats should be considered an estimate.
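To see why that matters, here's a quick sketch with made-up numbers (this is not the site's actual code, and `per_game` is just an illustrative name). If a backup is credited with all 18 games but only appeared in 4, his per game average gets dragged way down.

```python
def per_game(total, games_played):
    """Simple per game average: season total divided by games credited."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return total / games_played


# Hypothetical backup quarterback: 360 passing yards on the season.
season_yards = 360

listed = per_game(season_yards, 18)  # credited with every game he was active for
actual = per_game(season_yards, 4)   # games he really appeared in

print(f"Listed: {listed:.1f} yards per game")
print(f"Actual: {actual:.1f} yards per game")
```

Same season total, very different averages, which is why the per game numbers are best read as a floor for players who were active but rarely saw the field.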

What can I search for?

A lot of stuff.

Oh you want more?  Oh alright.  

You can search for team games that match your criteria (how many games were there in 2013 where a team passed for 400 yards?).  (Answer: 4. Toronto did it 3 times.)

You can search for player games that match your criteria (which players had a game where they rushed for 150 yards?).  (Answer: Kory Sheets (3 times), Chris Garrett, Jon Cornish (3 times), Chad Kackert and Brandon Whitaker)

You can search for drives matching your conditions (show me the drives where a team got the ball after an interception or fumble). (Answer: it happened 246 times, and 82 touchdowns were scored)

And you can search for plays matching your conditions (show me the results of plays from the opponent's goal line). (Answer: 73 plays and 51 touchdowns)

Within these searches are a lot of options for filtering, sorting and grouping.  I'm sure there are things which currently can't be searched for, but I think you'll find there are a ton of things you can.


Errors

What errors?  Everything is perfect, or I wouldn't be releasing it, obviously.

Yeah, that's a lie.  There are going to be errors in the data; it's a fact of life with a database this large.  The import process is designed to catch as many as it can identify, and I fix those by hand as I find them, but I'm certain some have slipped through.  At the bottom of every page, you'll see a "report error" link.  If you find something you think is wrong, click that link and send in the details of the error.  The more detail the better.  You don't have to provide an email to send a report, but if you do, I'll update you with the resolution.

* In rare cases, the play by play data was not available for processing or contained too many errors to be usable.  These games are clearly marked when you access them, and will have high-level stats available based on game box scores, but no searchable plays.
