NBA win totals, 2014-15 edition

We’re back again to judge preseason predictions, this time looking at NBA prognosticators. Our outcome of interest is the number of wins for each NBA team, and we’ll compare predictions to the totals set by sportsbooks in late October.

Despite my best efforts on social media, I could only find three competitors: Team Rankings, the Basketball Distribution, and Seth Burn. I also merged the predictions from those three sites, in what I’ll call the statheads Aggregate.

Our first criterion is Mean Absolute Error (MAE), which represents the average deviation between the predicted win total from each site and the observed total. In addition to the prediction sites, I also calculated the MAE using last year’s win totals, as well as 41’s, which represents a prediction of every team finishing 41-41.

For the 2014-15 season, statheads reigned supreme.


The Basketball Distribution and the aggregated picks from the three sites were both noticeably better than the totals set by sportsbooks. This matched results from last year for the Basketball Distribution. (Note: Nathan from the Basketball Distribution sent along a table with similar results to these).  

Of course, it’s easy to assume that these results could be due to chance. To check this, I simulated 1000 seasons by using the sportsbook totals as the true mean team totals (slightly scaled, to account for the fact that the sportsbook totals average 41.5 wins per team), while also assuming that the 2014-15 season standard deviation of 9.3 marks the actual standard deviation of win totals. Here’s a plot of the simulated MAE for both Basketball Distribution and the Aggregate picks.


In only about 1% of simulations was the MAE for Basketball Distribution lower than its observed MAE. That’s a pretty solid effort. The aggregated picks from the three sites finished in about the 4th percentile, another solid effort.
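For readers who want to try a check like this themselves, here is a rough Python sketch of the idea. It is not the code behind the plot: it assumes each team's simulated win total is an independent normal draw around its scaled sportsbook total with a standard deviation of 9.3, and it does not force league-wide wins to add up correctly. The function names and arguments are hypothetical.

```python
import random

def mean_absolute_error(predicted, observed):
    """Average absolute gap between predicted and observed win totals."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

def share_of_sims_below(book_totals, picks, observed_mae, sd=9.3, n_sims=1000, seed=0):
    """Fraction of simulated seasons in which the picks' MAE beats observed_mae.

    Assumption: each team's wins are drawn independently from a normal
    distribution centered on its rescaled sportsbook total.
    """
    rng = random.Random(seed)
    # Rescale so the sportsbook totals average 41 wins per team.
    scale = 41.0 / (sum(book_totals) / len(book_totals))
    true_means = [total * scale for total in book_totals]
    below = 0
    for _ in range(n_sims):
        season = [rng.gauss(mean, sd) for mean in true_means]
        if mean_absolute_error(picks, season) < observed_mae:
            below += 1
    return below / n_sims
```

Calling `share_of_sims_below(book_totals, picks, observed_mae)` with a site's 30 picks and its actual MAE returns the share of simulated seasons in which those picks did even better than they did in reality; a small share suggests the real result would be hard to get by luck alone.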

And while it was a good year for the statheads, it wasn’t quite that for the Knicks. At 17 wins, New York finished more than 20 wins below its expected total of 40.5.

Here’s a team-by-team plot of observed and predicted totals, arranged in order of projected total (the V). The calculator image refers to the stathead projection.


Hear the story about the Michigan State bettor who could win 1 million dollars? Read on.

If you’ve clicked on a sports website in the past week, you’ve probably read an article about a bettor in line to win 1 million dollars if Michigan State were to win the NCAA title this weekend.

Indeed, Derek Stevens, a supposed regular at the Golden Nugget casino, placed a $20,000 bet on December 5th on the Spartans to win the title. At 50:1 odds, Stevens’ bet will pay big if the Spartans can pull off upsets of both Duke and either Kentucky or Wisconsin this weekend.

Awesome, right?

There’s just one problem.

Stevens’ bet wasn’t that smart.

Back on December 5th, the Spartans were 5-3 and looked headed for the relatively unspectacular regular season that they were expected to have. Preseason polls had Michigan State as the third best team in their own conference and ranked No. 18 in the nation, and by week 5 of this season, the Spartans were unranked. Getting Michigan State at 50:1 wasn’t a good deal then and I’m still not sure it looks like a good deal now (most prediction sites only give MSU about a 5% chance to win it all, even with two games left).

Instead, like many futures bettors, Stevens would have been much better off investing his $20k in the Spartans before the tournament began. To show this, I went through MSU’s tournament path and calculated what a bettor would have earned off a $20k investment in the Spartans to win each game, assuming that 100% of the return was simply re-invested in Michigan State to win its next game. For example, the Spartans’ money line was -250 against Georgia; a bet of $20k would’ve yielded an $8k profit, and together, that $28k sum would then be invested on the Spartans to beat Virginia.
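The rollover arithmetic is easy to reproduce. Here is a small Python sketch (the function names are mine, and the only line taken from this post is the -250 against Georgia; the lines for the later rounds would need to be filled in from a sportsbook):

```python
def moneyline_profit(stake, line):
    """Profit on a winning bet at American odds (e.g. -250 or +350)."""
    if line < 0:
        return stake * 100.0 / abs(line)
    return stake * line / 100.0

def rollover_bankroll(stake, lines):
    """Bankroll after each win when everything is re-invested in the next game."""
    bankroll = stake
    history = []
    for line in lines:
        bankroll += moneyline_profit(bankroll, line)
        history.append(bankroll)
    return history

# Round one only: $20k at -250 returns the stake plus an $8k profit.
print(rollover_bankroll(20000, [-250]))  # [28000.0]
```

Passing the full list of money lines for each round gives the running totals behind a chart like the one described here.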

Here’s a graph of the results.


Two things stand out. First, a bettor would stand to make just under $1 million on the Spartans in this scenario…before they even played the championship game. Indeed, this strategy would yield a profit of just over $900,000 if Michigan State beats Duke on Saturday.

Second, assuming a line of +350 for Michigan State versus either Kentucky or Wisconsin (and that’s generous towards MSU), such a strategy would yield more than $4 million if MSU were to win the title.

Of course, there are several caveats here. Futures bets are not always bad ideas, and you cannot use the described strategy with all teams. For example, Kentucky was +650 to win the 2015 title a year ago; before the tournament began, the Wildcats were close to even money. Indeed, the Michigan State bet would’ve looked like a pretty good wager if the Spartans had started the tournament at, say, 10:1 or 12:1 (they didn’t, opening at anywhere from 30:1 to 40:1).

So, while the headline you are reading continues to say “Bettor can win $1 million dollars on MSU,” perhaps the more applicable headline is “Bettor cost himself $3 million in profit by taking a futures bet in December.”

Measuring responses to incentives in the National Hockey League using in-game behavioral modifications

When the NHL changed its point system (PS) after each of the 1999 and 2004 seasons, initially adding a point for overtime losses and then the shootout at the end of tied overtime games, economists jumped at the chance to measure how teams varied their behavior in response to the incentives encouraged by league policies.

We can be assured of one thing, which is that overtime rates went up when the newest point system was implemented. Teams reaching OT now share three points, with two going to the winner and one to the loser. Because only two points are awarded in regulation, the larger share of points for OT participants encourages teams to play for regulation ties, so as to guarantee themselves a point. Empirical evidence (see Abrevaya) and theoretical justification (see Longley & Sankaran) have shown that in most tied-game situations, NHL teams should be playing for overtime.

However, while game outcomes have been contrasted extensively, one aspect that has been lacking in research is empirical evidence showing how the behavior of NHL teams varies. In an article today at FiveThirtyEight, Noah Davis and I try to do just that, helping to show how, where, and when NHL teams appear to vary their on-ice behavior in accordance with the league’s current and past PS’s.

Using expected goal rates, we first show that in the final 5 minutes of tied regulation games, NHL teams play passively, as judged by lower expected goal rates. This difference is exaggerated when comparing to behavior at the end of the first and second periods, and to the teams’ behavior when the game is not tied.

Next, comparing two of the league’s PS’s, we show how the rates of overtime in the current PS go up at the end of the season, when the postseason push becomes of primary importance, and teams look to guarantee themselves a point.


Of course if you are interested in this stuff, keep reading :)


One figure left out of the article on FiveThirtyEight compares the behavior of teams in conference and non-conference games. Given that overtime incentives vary based on a team’s opponent (see the quote in our article from the coach of the Washington Capitals), we wanted to compare the behavior of teams by game type. We chose conference and nonconference games, because for most of the past decade, postseason eligibility was judged within a team’s conference.

Here’s our graph. The red band depicts the 95% confidence interval for expected goal rate during nonconference games, while the black band represents conference games. Nonconference games, by and large, feature lower EG totals during moments in which the game is tied, relative to conference ones, from late in the first period through the remainder of the game.

NHL Conference

The widths of the confidence bands are important; while nonconference behavior tends to be more passive throughout the entirety of the second period, at the end of the third period, the EG rates drop to significantly lower levels among non-conference games. Note: Because the distribution of EG rate is unknown (or it would require making some strong assumptions), the calculation of confidence limits for the smoothed lines took additional care. I used bootstrap resampling of the observed data, fitting a smoothed line to each resampled data set. The limits shown reflect the 2.5th and 97.5th percentiles of the smoothed lines at each game minute. Also, given that non-conference games are generally clustered in earlier portions of the season, this figure only looks at games played during the first half of each season, and we excluded the lockout-shortened 2013 season that featured conference games only.
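For anyone curious how percentile bands like these can be built, here is a minimal Python sketch of the bootstrap idea. It makes two assumptions that may not match the actual analysis: games are the unit being resampled, and a simple centered moving average stands in for the smoother used in the figure. All names are hypothetical.

```python
import random

def smooth(rates, window=3):
    """Centered moving average; a simple stand-in for the figure's smoother."""
    half = window // 2
    smoothed = []
    for i in range(len(rates)):
        chunk = rates[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def percentile(sorted_vals, q):
    """Nearest-index percentile of a pre-sorted list, with q between 0 and 1."""
    idx = int(round(q * (len(sorted_vals) - 1)))
    return sorted_vals[idx]

def bootstrap_band(games, n_boot=500, seed=0):
    """games: one list per game holding an expected goal rate for each minute.

    Returns the 2.5th and 97.5th percentile of the smoothed curve per minute.
    """
    rng = random.Random(seed)
    n_minutes = len(games[0])
    curves = []
    for _ in range(n_boot):
        # Assumption: resample whole games with replacement.
        sample = [rng.choice(games) for _ in games]
        by_minute = [sum(g[m] for g in sample) / len(sample) for m in range(n_minutes)]
        curves.append(smooth(by_minute))
    lower, upper = [], []
    for m in range(n_minutes):
        vals = sorted(curve[m] for curve in curves)
        lower.append(percentile(vals, 0.025))
        upper.append(percentile(vals, 0.975))
    return lower, upper
```

Running this separately on conference and nonconference games gives two bands that can be compared minute by minute, which is the comparison the figure is making.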

In nonconference games, it appears teams have taken a more passive approach.


I am greatly indebted to war-on-ice for its data, and to Carnegie Mellon’s Sam Ventura for his work with expected goals (Sam specifically shared his code with me, but credit also goes to folks who started with work on expected goals).

In that spirit, I wanted to share the code I used to replicate the Figures for FiveThirtyEight. These can be found on my new GitHub page. (Yes, it is a bit embarrassing that I am new to GitHub. For several years, I have just posted useful code at the end of blog posts. However, I figured it was time the code moved to a more traditional resource).

Figure 1 will require you to have downloaded all of the play-by-play data from the nhlscrapr package in R. This data can take overnight to download, and again overnight to run the for loop to extract game behavior based on each game minute and situation. So, perhaps for the intermediate or veteran coders only.

Figure 2 is straightforward and should run in less than 2 minutes for anyone with an internet connection.

Cheers, and thanks for reading!


NCAA hoops & circadian rhythms

One of my favorite NFL studies (Smith et al, 1996, with several follow-ups) looks at the results of night contests in which a West Coast team has played an East Coast team. Researchers have suggested that, during these contests, West Coast teams perform at a much higher level than the East Coast teams, even after accounting for the point spread set by sportsbooks.

Why might these results occur in night games? Writes this article from Deadspin,

Without knowing it, athletes on teams from the East Coast are playing at a disadvantage. Because of the circadian rhythm, which they can't control, their bodies are past their natural performance peaks before the first quarter ends. By the fourth quarter, the team from the East Coast will be competing close to its equivalent of midnight. Their bodies will be subtly preparing for sleep by taking steps such as lowering the body temperature, slowing the reaction time, and increasing the amount of melatonin in their bloodstream. Athletes on the team from the West Coast, meanwhile, are still competing in the prime time of their circadian cycle.

Unfortunately, in the NFL, night games between West Coast and East Coast franchises are few and far between. In 2014, for example, I only recall two such games, Seattle beating Washington and New England ousting San Diego. Relatedly, Danny Tuccitto found similar issues with the schedule for Football Outsiders a few years back (bonus: Danny also shows a spreadsheet with the results of past NFL games).

All of this brings me to tonight’s Division 1 NCAA men’s hoop contest between Xavier (EST) and Arizona (PST), which tipped off at 10:40 EST. Given the late start, Xavier is currently playing well past its supposed prime for athletic performance, while Arizona, according to the circadian cycle, should be in much better position. This raises the obvious question – is it worth looking at circadian rhythms in the NCAA basketball tournament?

Going back to 2002, I extracted any NCAA men’s D1 tournament game that was played at 9:00 EST or later. I found 24 of them (note: I did this manually, and my identification of East Coast and West Coast time zones may be a bit off).

Here’s a screenshot of games: I counted West Coast teams at 11-13 ATS.


At least relative to the game’s spread, there does not seem to be an advantage for West Coast teams playing night games against East Coast opponents.

Of course, there are several caveats here. First, the advantage of playing at night could already be built into the line. Second, we are dealing with a really small sample size – only a few tournament games per year meet this standard – making it difficult to learn much. At any rate, if anyone is interested in studying this further, please send along your results. Other postseason games, and perhaps even regular season contests, would be interesting to look at.
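To put the small-sample caveat in concrete terms, here is a quick Python sketch of an exact binomial test on the 11-13 record. It assumes each game is an independent coin flip against the spread, which is my simplification; the function name is hypothetical.

```python
from math import comb

def binomial_two_sided_p(wins, games, p=0.5):
    """Exact two-sided p-value for a win count under a fair-coin null.

    Sums the probability of every outcome that is no more likely
    than the observed one.
    """
    probs = [comb(games, k) * p**k * (1 - p) ** (games - k) for k in range(games + 1)]
    observed = probs[wins]
    total = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, total)

# West Coast teams at 11-13 against the spread in 24 late games.
print(binomial_two_sided_p(11, 24))
```

The p-value comes out far above any conventional cutoff, which is just a formal way of saying that 11-13 looks exactly like what a coin flip would produce over 24 games.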

For now, given that these athletes are often competing during night games all season, it certainly seems plausible that the effect of circadian rhythms is limited and/or negligible in postseason college hoops.

March Madness bracket advice, adjusted for 2015

It’s that time of year again, and while you can go to just about any media outlet for March Madness advice, I’m fairly confident you won’t get most of the stuff that I’m going to write about here. I think that’s a good thing?

As preliminary thoughts, feel free to check out my two posts from last year:

Value and March Madness

What are the actual odds of someone picking a perfect bracket?

Okay, here are some thoughts and general strategies.

1- Your first round choices depend on your scoring system. And maybe even your second round picks, too.

Most pools can generally be separated into one of two categories – those with upset points or those without upset points.

Strangely enough, the vast majority of people entering picks in pools with upset points pick the same way as they would in pools without upset points. This is silly. In upset pools, for example, correctly backing a No. 13 seed to beat a No. 4 seed could be worth 5-10 times as much as the alternative. This makes taking Mercer over Duke, or Georgia State over Baylor, turn into reasonable choices.

If your pool does not have upset points, however, and is scored in standard 1-2-4-8-16-32 form, it’s usually not worth picking many upsets. In these formats, the only thing that really matters is picking the champion and finalist. So, it is not worth trying to be hip and taking Georgia State. And it’s not worth worrying about picking the right No. 12 seed to beat a No. 5 seed. Just take the teams that are favored to win by sportsbooks (lines here) and don’t fall behind.
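The logic behind the two scoring systems comes down to expected points for a single game. Here is a small Python sketch; the 20% upset chance and the 7-point upset bonus are made-up numbers for illustration (the bonus simply sits inside the 5-10x range mentioned above), and the sketch ignores what a pick does to your later rounds.

```python
def expected_points(win_prob, points_if_correct):
    """Expected points from one pick."""
    return win_prob * points_if_correct

def better_pick(p_upset, favorite_points, upset_points):
    """Compare backing the favorite with backing the underdog in one game."""
    ev_favorite = expected_points(1 - p_upset, favorite_points)
    ev_underdog = expected_points(p_upset, upset_points)
    if ev_underdog > ev_favorite:
        return "underdog", ev_underdog
    return "favorite", ev_favorite

# Hypothetical first-round game with a 20% chance of an upset.
print(better_pick(0.20, favorite_points=1, upset_points=7)[0])  # underdog
print(better_pick(0.20, favorite_points=1, upset_points=1)[0])  # favorite
```

With an upset bonus, the long shot can be the higher-value pick even though it usually loses; without one, the favorite wins the comparison every time the favorite is more likely to win.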

2- Your champion depends on the size of your pool

Just like most events, there are usually no prizes in March Madness for finishing in 10th place. Your only goal is to finish first. So how does that impact your choice of teams?

Quite simply, it is a better idea to take riskier options as the size of your pool increases.

Over on Grantland, Ed Feng provides a useful example with the 2010 tournament, showing how Duke was undervalued, Kansas was overvalued, and the backers of Duke had much better chances of winning their pools as a result. Here’s his graph:


The probability of winning in 2010 after backing Duke was roughly five times higher than after backing Kansas, despite the two starting the tournament with similar probabilities.

So, estimate the number of people in your pool, and vary your aggressiveness as a result.

3- You are not trying to pick games correctly, but trying to get more points than your competitors. 

An important March Madness strategy lies in identifying the teams you think your opponent is going to pick, and then picking the opposite teams.

I stole this from Jordan Sperber’s blog, which has some great insights on under and overvalued teams. Here’s a Table with 2015 Final 4 odds, using the expected values from Ken Pomeroy’s website and those chosen by the public.


So, here are some Final 4 teams with decent 2015 probabilities that your opponents are not picking: Villanova, Gonzaga, Utah, Arizona

And here are some Final 4 teams with decent 2015 probabilities that your opponents are picking way too often: Duke, Louisville, Wisconsin, North Carolina

So, what does this all mean?

It looks like about 1 in 4 sheets has Kentucky and Duke in the final game. My top advice is to not be one of these people in 2015.

Of course, this is not because I don’t think UK and Duke can reach the finals; instead, it is because even if they do reach the finals, you still probably won’t win your pool. There’s just too much competition in picking those two teams.

In larger pools, Arizona, Villanova, and Virginia are all being backed by small fractions of the public (no more than 6%, according to ESPN), but each has between a 9% and a 13% chance of winning the title, according to many of the sites that run through bracket simulations. One of those teams appears much more likely to give you a chance at capturing first place.
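Here is a crude Python sketch of why pool size and public pick rates matter together. It rests on a big simplification of my own: the pool is won by someone who picked the correct champion, and everyone who picked that champion is equally likely to finish first. The 10% title chance and 6% pick rate fall inside the ranges quoted above; the 35% and 45% figures for a popular favorite are made up purely for illustration.

```python
def pool_win_chance(title_prob, public_share, pool_size):
    """Rough chance of winning a pool by backing a given champion.

    Assumption: each other entrant independently picks this team with
    probability public_share, and first place is split evenly among
    everyone who has the right champion.
    """
    if public_share <= 0:
        return title_prob
    # Expected value of 1 / (1 + number of rivals with the same champion).
    expected_split = (1 - (1 - public_share) ** pool_size) / (pool_size * public_share)
    return title_prob * expected_split

for size in (5, 100):
    contrarian = pool_win_chance(0.10, 0.06, size)  # lightly backed contender
    chalk = pool_win_chance(0.35, 0.45, size)       # hypothetical popular favorite
    print(size, round(contrarian, 4), round(chalk, 4))
```

Under these assumptions the popular favorite is the better choice in a five-person pool, while the lightly backed contender comes out ahead in a 100-person pool, which is the same pattern described in points 2 and 3.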

Memo to the NHL: Shootouts are still a problem

Last offseason, officials from the National Hockey League met to discuss possible ways to limit the number of contests decided by a shootout. Shootouts, while entertaining, can have a disproportionately large effect on league standings, given that, by and large, these outcomes are random.

One of the minor ‘tweaks’ to the former rules was that for the 2014-2015 season, overtime sessions would mimic the second period in that teams would be forced to make longer line changes during overtime. The longer line changes, in principle, would create more scoring opportunities during the overtime session, which would thus limit the number of games that eventually ended in a shootout.

Some solid math supported this idea, too. Rink Stats’ Stephen Pettigrew, for example, used differences in scoring rates from the first and second periods to estimate that around 35% fewer games would reach the shootout, comparing overtime rules with the longer line-change to the shorter line-change. That would be a massive reduction, and one that most hockey fans would be happy to see.

Alas, not much has changed during the 2014-15 season, despite the rule change. Overall, roughly 14% of NHL games have been decided by a shootout this year, which is similar to past years.

Here are the overall percentages of games that have reached a shootout, by season:


Next, among OT games, here are the percentages of games that reached a shootout.


Again, pretty similar rates to previous years. Just more than half (56%) of games reaching OT have subsequently reached a shootout in 2015, which is just a slight tick below the average rate from 2006 through 2014 (57%).

If the league wants to limit the effect of the shootout on league standings, the change is simple – prevent teams from wanting to get to overtime in the first place. Create an incentive for teams to win in regulation – like, say, a three point rule – and they’ll stop playing overtime so often. More on this to follow, but feel free to read more in this article here.

Sunday is a day for relaxing. Just ask the NHL

A few weeks back, folks in the Harvard Sports Analysis Collective (HSAC) looked at the shooting rates of NBA players based on whether or not the game fell on a Sunday (link). While the evidence was mostly inconclusive in the HSAC study, I thought it was a good idea to check whether similar results exist during Sunday NHL games.

Piggybacking on recent work from Carnegie Mellon’s and War-on-Ice’s Sam Ventura, I calculated the expected goals in each NHL contest between the start of the 2005 season and February 1, 2015, using a logistic regression model based on the type of shot, shot location, and shot distance. I then broke these down by the game’s minute, the score, and the day of the week on which the game was played. Here, note that I’m using expected goals, which Sam and other folks have shown to be as predictive of future goals as, if not more predictive than, traditional statistics like goals or shots.
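For readers new to the metric, here is a toy Python sketch of how an expected goals total comes out of a logistic model: every shot gets a goal probability, and the probabilities are added up. This is not Sam's model. The coefficients below are invented placeholders, the shot angle is my stand-in for shot location, and all names are hypothetical.

```python
import math

# Placeholder coefficients, invented for illustration only (not fitted values).
EXAMPLE_COEFS = {
    "intercept": -1.0,
    "distance": -0.05,   # per foot from the net
    "angle": -0.01,      # per degree off center, standing in for location
    "shot_type": {"wrist": 0.0, "slap": -0.2, "backhand": -0.1},
}

def shot_goal_probability(distance_ft, angle_deg, shot_type, coefs=EXAMPLE_COEFS):
    """Logistic-model probability that a single shot becomes a goal."""
    z = (coefs["intercept"]
         + coefs["distance"] * distance_ft
         + coefs["angle"] * abs(angle_deg)
         + coefs["shot_type"].get(shot_type, 0.0))
    return 1.0 / (1.0 + math.exp(-z))

def expected_goals(shots, coefs=EXAMPLE_COEFS):
    """Sum of goal probabilities over a list of (distance, angle, type) shots."""
    return sum(shot_goal_probability(d, a, t, coefs) for d, a, t in shots)

# Two hypothetical shots from one game minute.
minute_shots = [(15, 10, "wrist"), (45, 30, "slap")]
print(expected_goals(minute_shots))
```

Grouping shots by game minute, score, and day of week, then dividing by the number of games, gives per-minute rates of the kind used in the comparisons here.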

Looking at the first period, I plotted the expected goal rate per each tied-game minute (Minute 1 through 20), using a different color for games that occurred on a Sunday and those occurring on all other days. I also included 95% confidence bands for the loess smoother, although I should note that these are the default bands in the ggplot2 package, and might not fully account for the variability in each rate.


At any rate, the difference between Sundays and all other days of the week was much larger than I anticipated. It does seem plausible that Sunday NHL games take a less aggressive tone, at least in terms of expected offensive output.

And here’s a plot of Sunday compared to every other weekday:

NHL daily

Of course, it’s impossible to tell exactly what drives potential drops in expected goals on Sundays. For example, it could simply be that Sunday games tend to be the second of back-to-backs for one or both participating teams, which might cause tired legs. It could also be that Sunday games tend to be matinee games, although the same could be said for Saturday (Note: reader Justen Fox notes that nearly all Sunday games are played before nighttime, compared to only about a quarter of Saturday games).

In any case, I thought that this was an interesting result worth sharing.

Also, I’m working on a more extensive and related project that should finish up within the month, and at that point, I’ll happily share code.