I’m always impressed with the content and accessibility of the Baseball with R website, which features a great cast of statisticians writing about everything from Hall of Fame entry to umpire bias.
In a similar vein, I highly recommend Sam and AC’s nhlscrapr package in R. I’ve used it extensively to analyze play-by-play data from past seasons (for example, this post on momentum in hockey).
However, I have a soft spot for overtime outcomes in the NHL. While the nhlscrapr package has game-by-game results, it offers no straightforward way to identify whether a given game went to overtime. Further, the nhlscrapr data only go back about a decade.
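If you would rather stay within play-by-play data, one possible workaround is to flag a game as having gone past regulation whenever any of its events occurs after the third period. The sketch below assumes a data frame with one row per event, a game identifier and a period number. The function name `went_to_ot` and its arguments are hypothetical and do not reflect nhlscrapr's actual column names.

```r
# Hypothetical helper: flag games that went past regulation, using
# play-by-play events. Assumes one row per event, with a game identifier
# and the period in which the event occurred (illustrative inputs, not
# nhlscrapr's actual schema).
went_to_ot <- function(game_id, period) {
  # Regulation is three periods, so any event recorded in period 4 or
  # later means the game needed extra time (overtime and/or shootout).
  tapply(period, game_id, function(p) max(p) > 3)
}

# Toy example: game "A" ends in regulation, game "B" reaches period 4
went_to_ot(game_id = c("A", "A", "B", "B"), period = c(1, 3, 3, 4))
# returns A = FALSE, B = TRUE
```

This only works if the play-by-play feed actually records events in the extra periods, so it is worth spot-checking a few known overtime games first.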
Thankfully, Hockey Reference has easily accessible (and scrapable) tables for us to use. Since I am doing some updated analyses of NHL overtime rates, and I wanted an easier method than copying and pasting .csv files from nhl.com, I figured I would post the code that I used to scrape NHL game outcomes. The code that follows extracts each game’s outcome for the last five seasons (2011 through 2015); if you are interested in other years, it’s easy enough to change the URLs.
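Rather than editing each URL by hand, you could generate the list from a vector of years. The sketch below assumes every season page follows the same `NHL_<year>_games.html` pattern as the five URLs in the scraping code; `season_urls` is a hypothetical helper name.

```r
# Hypothetical helper: build Hockey Reference season-results URLs for any
# range of years, assuming the NHL_<year>_games.html pattern holds.
# Hockey Reference labels a season by the year it ends
# (e.g. NHL_2015 is the 2014-15 season).
season_urls <- function(years) {
  sprintf("http://www.hockey-reference.com/leagues/NHL_%d_games.html", years)
}

urls <- season_urls(2011:2015)
```

Passing a different range, such as `season_urls(2008:2015)`, would then be the only change needed before running the loop.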
Feel free to use, and hope you enjoy!
```r
library(XML)
library(stringr)

nhl <- NULL
urls <- c("http://www.hockey-reference.com/leagues/NHL_2011_games.html",
          "http://www.hockey-reference.com/leagues/NHL_2012_games.html",
          "http://www.hockey-reference.com/leagues/NHL_2013_games.html",
          "http://www.hockey-reference.com/leagues/NHL_2014_games.html",
          "http://www.hockey-reference.com/leagues/NHL_2015_games.html")

for (i in seq_along(urls)) {
  # Read every HTML table on the season page
  tables <- readHTMLTable(urls[i])
  # The game-results table is the one with the most rows
  n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
  temp <- tables[[which.max(n.rows)]]
  # Append this season's games to the running data frame
  nhl <- rbind(nhl, temp)
}

names(nhl) <- c("Date", "Visitor", "VisGoals", "Home", "HomeGoals", "OTCat", "Notes")
table(nhl$OTCat)

# Drop unplayed games, which show a "Get Tickets" link instead of a result
nhl <- nhl[nhl$OTCat != "Get Tickets", ]
# Flag games decided in overtime or a shootout
nhl$OT <- nhl$OTCat == "OT" | nhl$OTCat == "SO"
```
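Since the point of the exercise is overtime rates, a natural next step is to summarise the `OTCat` column. The sketch below assumes, as the scraping code does, that games decided after regulation are labelled "OT" or "SO" and that anything else is a regulation finish; `overtime_rates` is a hypothetical helper name.

```r
# Hypothetical helper: share of games that needed extra time, split by
# how they ended. Assumes "OT" = decided in overtime and "SO" = decided in
# a shootout, as in the scraping code; anything else (including NA) is
# treated as a regulation finish.
overtime_rates <- function(ot_cat) {
  ot_cat <- as.character(ot_cat)  # readHTMLTable may return factors
  c(any_extra_time = mean(ot_cat %in% c("OT", "SO")),
    overtime_only  = mean(ot_cat %in% "OT"),
    shootout       = mean(ot_cat %in% "SO"))
}

# Toy input, not real results: two regulation games, one OT, one SO
overtime_rates(c("", "OT", "SO", ""))
# any_extra_time = 0.50, overtime_only = 0.25, shootout = 0.25
```

On the scraped data this would be called as `overtime_rates(nhl$OTCat)`. Using `%in%` rather than `==` means a missing `OTCat` value counts as a regulation game instead of propagating `NA`.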
Voilà! It’s that easy. We are in business with just a few lines of code.
Here’s the output: