NHL game outcomes using R and Hockey Reference

I’m always impressed with the content and accessibility of the Baseball with R website (here), which features a great cast of statisticians writing about everything from Hall of Fame entry to umpire bias.

In a similar vein, I highly recommend Sam and AC’s nhlscrapr package in R. I’ve used it extensively to analyze play-by-play data from past seasons (for example, this post on momentum in hockey).

However, I have a soft spot for overtime outcomes in the NHL, and while the nhlscrapr package has game-by-game results, there isn’t a straightforward mechanism for identifying whether or not a given game went to overtime. Further, data in the nhlscrapr package only goes back about a decade.

Thankfully, Hockey Reference has easily accessible (and scrapable) tables for us to use. Given that I am doing some updated analyses of NHL overtime rates, and that I wanted an easier method than copying and pasting .csv files from nhl.com, I figured I would post the code that I used to scrape NHL game outcomes. The code that follows extracts each game’s outcome for the last five years; if you are interested in other years, it’s easy enough to change the URLs.
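For other seasons, the URL vector can be generated rather than typed out. Here is a minimal sketch, assuming the pages keep the NHL_<year>_games.html pattern used in the code below; the season range 2011 through 2015 is my assumption for illustration, so swap in whichever years you need.

```r
# Hypothetical season range; adjust to the years you want.
seasons <- 2011:2015

# Build one games-page URL per season, following the
# NHL_<year>_games.html pattern used on Hockey Reference.
urls <- paste0("http://www.hockey-reference.com/leagues/NHL_",
               seasons, "_games.html")

print(urls)
```

Because paste0 is vectorized, changing the range in seasons is the only edit needed to scrape a different set of years.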

Feel free to use it, and I hope you enjoy!


urls<- c("http://www.hockey-reference.com/leagues/NHL_2011_games.html",

for (i in 1:length(urls)){
 tables <- readHTMLTable(urls[i])
 n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))

# Drop rows whose OTCat column holds "Get Tickets" instead of a game result
nhl <- nhl[nhl$OTCat != "Get Tickets", ]

Voilà! It’s that easy. We are in business with a few lines of code.

Here’s the output:

[Screenshot of the output, 2014-11-21]
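With the games in a data frame, overtime rates are a short step away. The sketch below uses a small made-up data frame in place of the scraped one: the team labels and rows are hypothetical, and it assumes OTCat is blank for regulation games, holds "OT" or "SO" for games that went longer, and shows "Get Tickets" for games not yet played.

```r
# Toy stand-in for the scraped data (hypothetical rows, not real results).
nhl <- data.frame(
  Visitor = c("Team A", "Team B", "Team C", "Team D", "Team E"),
  Home    = c("Team F", "Team G", "Team H", "Team I", "Team J"),
  OTCat   = c("", "OT", "SO", "", "Get Tickets"),
  stringsAsFactors = FALSE
)

# Drop unplayed games, mirroring the filter in the scraping code.
nhl <- nhl[nhl$OTCat != "Get Tickets", ]

# Flag games that went past regulation (assumed coding: "OT" or "SO").
nhl$overtime <- nhl$OTCat %in% c("OT", "SO")

# Share of played games decided after regulation.
ot.rate <- mean(nhl$overtime)

print(table(nhl$OTCat))
print(ot.rate)
```

On the real data, the same two lines (the %in% flag and the mean) could be applied per season to compare overtime rates across years.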



