Thoughts on the Sloan research paper contest

Folks who have submitted abstracts over the past two years to the Sloan Sports Analytics Conference research paper contest were recently surveyed as to their thoughts on the contest.

Here are my (expanded) answers to the open-ended question “Do you have any other suggestions or comments that will help us improve the research papers competition?”

1- Maintain a strong prize pool, but eliminate the crazy discrepancy between 1st and, say, 5th place. 

In the current set-up, first place is $20k, second place $10k, and third place onwards is nothing. This structure incentivizes researchers to oversell their findings, because admitting that your work is simply building on the research of others is not nearly as sexy as claiming to be the first in your field to find something.

What’s a more equitable system? One that encourages good content and appropriate citation of sources, and that makes it clear why each paper is relevant to advancing sports analytics.

From a prize perspective, this makes it less of a crapshoot. Financially, each finalist gets $2k and a free ticket. The winner gets $10k. Boom, done.

2- Reward participants whose submission is reproducible.

I cannot remember a single finalist paper that has either included (i) its data or (ii) its source code. This is not a good look (note: I’m also guilty; I didn’t submit code or data two years ago). Given that the majority of findings in professional research are not reproducible, it is difficult – perhaps impossible – to know if each paper truly got things right. Rewarding papers that include data and source code would be a major step in promoting reproducible research.

Of course, work is only reproducible if the data set is public. A more aggressive but related idea would be to use separate tracks for proprietary and public data. This was suggested a year ago by analyst Christopher Long (and perhaps by others). Such a distinction would level the playing field for researchers who have good work to share but are working with standard public data, where it is becoming more and more difficult to make novel discoveries each year.

3- Implement a conference proceedings section.

For many people in academia, there is less incentive to submit to Sloan given that, unless your paper finishes as one of the finalists, all of your work is for naught. Having conference proceedings would likely encourage more submissions. If you are worried about the cost, publish online only and charge anyone who wants a hard copy. This would be very cheap.

4- Also allow submissions in TeX.

For years, the conference has used the same Microsoft Word template for participants. But many analytics researchers use TeX and only TeX for their work, as the formatting, particularly for mathematical notation, is substantially easier and more readable in TeX than in Word. TeX is also more visually appealing than Word.

Allowing submissions in both TeX and Word seems like an easy compromise.

**********

Happy to hear other takes as well. I appreciate that SSAC has upgraded the rewards for poster presenters over the past few years. Further, the fact that SSAC has implemented a survey in the first place is hopefully a promising sign of changes to come.

Two reasons the future two-point conversion rate might be higher than current estimates

The NFL recently updated its extra point rules, moving the yard line for extra points from the 2 to the 15.

In the wake of the change, one topic of conversation is whether offenses will choose to go for two more often. The thought is that if extra points become more difficult, perhaps it is worth the risk of going for two.

Critical to the conversation is the idea of expected points, which weighs the points and probabilities of conversions against those of extra points.* However, a basic expected points analysis requires, among other assumptions, both that we have reliable data and that all two-point conversions are created equal.

That may not be the case. Here are two related reasons that the conversion rate might be higher than it is currently being estimated to be (in most places, around 48%).

1 – Data issues

Using data from Armchair Analysis (AA), I was able to confirm that teams have converted 48% of their two-point attempts since the 2000 season, a number that has been reported in several outlets. But I also found that no primary rusher or passer was listed on 40 of these plays, and that a punter or kicker rushed or passed the ball on another 26 plays. Here are some of the players listed with conversion attempts: C. Kluwe, B. Moorman, K. Walter, S. Koch, T. Sauerbrun. We can’t expect that crew to be leading real conversion attempts from the two-yard line in 2015 and beyond.

Overall, offenses converted just 6% of these attempts (4 of 26 on plays handled by the kicker or punter, 0 of 40 on plays with no listed rusher or passer). More likely than not, these 66 plays were fumbled snaps, fake extra points by design, or otherwise muffed attempts.

If you remove the unknowns and kickers/punters from AA’s conversion data, things look a bit different, with teams converting at 51% since 2000.
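Here’s a minimal sketch of that cleanup, assuming a hypothetical data frame conv of two-point attempts with columns player_pos (position of the primary rusher or passer, NA when none is listed) and success; those are illustrative names, not AA’s actual fields.

library(dplyr)

conv_clean <- conv %>%
  filter(!is.na(player_pos),             # drop the plays with no listed rusher/passer
         !player_pos %in% c("K", "P"))   # drop the plays run by kickers/punters

mean(conv_clean$success)                 # conversion rate on the remaining plays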

2 – Teams that have gone for two have generally been playing from behind

Here’s a chart of the scoring margin at the time a team attempted a conversion (I focused on point differentials between -20 and 20, which ignores a few attempts outside that range).

[Figure: dot plot of the score margin at the time of each two-point conversion attempt]

Treating all conversions as identical misses the fact that, under the previous system, most teams going for two had to do so given the score differential. In fact, 60% of the teams that went for two were trailing at the time of the attempt. And if those teams were trailing, you could argue that they were likely worse than their opponents in terms of overall talent.

Comparing the success rates of these teams yields a small difference: teams leading converted 53% of the time, compared to 49% among trailing teams.
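That split comes from a simple grouped summary; here’s a sketch using the same hypothetical conv_clean data frame plus a score_diff column (the offense’s margin at the time of the attempt).

library(dplyr)

conv_clean %>%
  mutate(game_state = case_when(score_diff > 0 ~ "leading",
                                score_diff < 0 ~ "trailing",
                                TRUE           ~ "tied")) %>%
  group_by(game_state) %>%
  summarize(attempts = n(), conv_rate = mean(success))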

It seems reasonable to think that even the 51% figure slightly underestimates what the success rate would be if attempts were distributed more evenly across team talent. Further, there’s also some association between conversion rates and a team’s offensive proficiency: see the notes below for a chart.

*******

Other notes:

-There are likely more issues remaining with the data. For example, a botched snap featuring a pass from TE Jay Riemersma shows up in our data (the play is listed here; many thanks to a loyal reader for finding this stuff). But there are also purposeful conversion attempts from non-quarterbacks, including Antwaan Randle El, who had three of them. Further work is needed.

-Teams passed on 71% of their conversion attempts. Strange, given that passing attempts were only 48% successful, compared to 59% for rushes.

Update: A few smart folks pointed out that many of the rushes might be QB scrambles. Here are the success percentages and counts by play type:

Rushes: RBs 109 of 190 (57%), QBs 47 of 72 (65%), WRs 4 of 6 (66%)

Passes: QBs 313 of 659 (47%), WRs 5 of 7 (71%), RB/DB/TE 1 of 4 (25%)

-There did not seem to be any differences in conversion rates by weather or surface.

-Here’s a plot of success rate and conversion attempts for each team since 2000. Jets and Cardinals doing their thing.

[Figure: two-point conversion success rate vs. attempts by team since 2000]

********

*Here’s a primer on expected points:

Teams have successfully converted 48% of their two-point conversions over the past 15 years – it’s 49% over the last three years – making the expected number of points on a two-point conversion attempt approximately 0.49 x 2 = 0.98. Meanwhile, given that teams make between 90 and 95% of their field goals from near the 32-yard line, and that the extra point remains worth a single point, we can assume that the expected value of a longer extra point is somewhere between 0.90 and 0.95. Game, score, and coaching conditions aside, on average, there’s now a slight advantage to going for two. Benjamin Morris makes the excellent point that we should expect expected points on XPs to rise, too, given how awesome kickers have become.
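In code, that back-of-the-envelope comparison is simply:

ep_two <- 0.49 * 2        # expected points on a two-point try, ~0.98
ep_xp  <- c(0.90, 0.95)   # rough range for the longer extra point
ep_two - ep_xp            # an edge of roughly 0.03 to 0.08 points per attempt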

85% is a unicorn – on predictions in the National Hockey League postseason


On February 20, the National Hockey League announced a partnership with software company SAP. The alliance’s primary purpose was to bring a new enhanced stats section to NHL.com, built in the shadow of popular analytics sites like war-on-ice and the now-dormant extra-skater.

It was, it seemed, a partial admission from the league that its best metrics were hosted elsewhere.

“The stats landscape in the NHL is kind of all over the place,” suggested Chris Foster, Director of the NHL’s Digital Business Development, at the time. “One of the goals is to make sure that all of the tools that fans need are on NHL.com.”

One tool presented in February was SAP’s Matchup Analysis, designed to predict the league’s postseason play. The tool claimed 85% accuracy, which Yahoo’s Puck Daddy boasted was good enough to make “TV executives nervous and sports [bettors] rather happy.”

There’s just one problem.

85% is way too high in the NHL.

Specifically, at the series level, 85% accuracy is a crazy good number for the short term, and likely impossible to achieve long-term. And at the game level, 85% is reachable only in a world with unicorns and tooth fairies. More reasonable upper bounds for game and series predictions, in fact, lie around 60% and 70%, respectively.

So what in the name of Lord Stanley is going on?

To start, the model began with 240 variables, eventually settling on the 37 determined to have the best predictive power. Two sources (one, two) indicate the tool used 15 seasons of playoff outcomes, although an SAP representative is also quoted as saying that the model in fact used four or five years’ worth of data. This is a big difference, as using 240 variables is a risky idea with 15 seasons (225 playoff series), much less five.

But it’s also unclear if the model was predicting playoff games or playoff series. Puck Daddy, like most others, indicated that it was meant for predicting playoff series, but in its own press release, SAP indicated that the 85% actually applied to game-level data.

So, with the details of the algorithm remaining spotty, here are two guesses at what happened.

1- SAP’s 85% is an in-sample prediction, not an out-of-sample one.

Let’s come up with a silly strategy, which is to always pick the Kings, Bruins, and Blackhawks to win. In all other series, or in ones where two of those teams faced one another, we’ll pick the team whose city is listed second alphabetically.

This algorithm – using just four variables – wins at a 68% rate over the 2010-2014 postseasons. But note that the 68 percent is measured in-sample, on the very data I used to design it. Predictions are useful not in how they perform in-sample, but in how they do out-of-sample – that is, when they are applied to a data set other than the one in which they were generated.

The NHL’s 85 percent seems feasible as an in-sample prediction only, and it’s overzealous to use in-sample accuracy to reflect out-of-sample performance. Our toy approach with four variables, for example, hits at a not-so-impressive 47% clip between 2006 and 2009.
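For concreteness, here’s a minimal sketch of the toy rule, assuming a hypothetical data frame series with columns team1, team2 (full team names), and winner; the column names are illustrative.

favorites <- c("Los Angeles Kings", "Boston Bruins", "Chicago Blackhawks")

toy_pick <- function(team1, team2) {
  teams  <- c(team1, team2)
  in_fav <- teams %in% favorites
  if (sum(in_fav) == 1) teams[in_fav]   # exactly one of the three: take it
  else sort(teams)[2]                   # otherwise, second alphabetically by city
}

series$pick <- mapply(toy_pick, series$team1, series$team2)
mean(series$pick == series$winner)      # in-sample hit rate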

2- SAP’s model likely suffered from overfitting and multicollinearity, a dangerous combination.

It’s easy to assume that using 240 variables is a good thing. It can be, but including so many variables with small sample sizes runs the risk of overfitting, where a statistical model includes too many predictors and ends up describing random noise instead of the underlying relationship.

And with so many possible predictors, it shouldn’t be surprising that at least one will be surprisingly accurate in a sample of games. For example, shorthanded goal ratio, a somewhat superfluous metric, predicted more playoff series winners between 1984 and 1990 than both goals for and goals against.
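A quick simulation (noise, not SAP’s actual data) makes the point: generate 240 pure-noise variables for 225 coin-flip series and check how well the best one “predicts” in-sample.

set.seed(42)
n_series <- 225
n_vars   <- 240
noise    <- matrix(rnorm(n_series * n_vars), nrow = n_series)
winners  <- rbinom(n_series, 1, 0.5)    # coin-flip series outcomes

# In-sample accuracy of each noise variable, used alone as a simple cutoff rule
acc <- apply(noise, 2, function(x) {
  guess <- as.numeric(x > median(x))
  max(mean(guess == winners), mean((1 - guess) == winners))
})
max(acc)   # the best noise variable typically hits around 60% in-sample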

Further, while it is tempting to simply combine several such predictors, many of these variables are likely correlated. Including highly correlated variables in the same model introduces multicollinearity, which can make predictions sensitive to small changes in the data, including when they are applied to out-of-sample data.

Fortunately for skeptics of SAP’s model, the 2015 postseason provides our first example of out-of-sample data with which to judge SAP’s predictions.

Through rounds one and two, the Matchup Analysis tool looks not much different than a balanced coin, correctly pegging seven of the twelve series winners. But a tool with 85% accuracy would pick 7 of 12 winners or worse only about 1 in 50 times (2%). In other words, unless the 2015 tournament is a 1-in-50 type of outlier, we can be confident that the model’s true accuracy lies below the 85% threshold. Finally, keep in mind these are rounds 1 and 2, which should be the easiest rounds to predict, given that they tend to feature the largest gaps in team strength.
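That figure is a straightforward binomial calculation:

# Probability that a true 85% predictor goes 7-for-12 or worse
pbinom(7, size = 12, prob = 0.85)   # about 0.02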

The Matchup Analysis tool might be awesome, and perhaps it is more accurate than using any of the NHL’s enhanced stats alone. However, it appears likely that the algorithm will fail to meet its own high standards; even if it accurately predicts each of the final three NHL series, the tool won’t crack 70%.

It has been said that sports organizations have a cold start problem with analytics. Writes Trey Causey, “How does an organization with no analytics talent successfully evaluate candidates for an analytics position?”

In such situations, it is easy to fall prey to sexy numbers like 85%. But like unicorns and tooth fairies, such predictive capabilities in the NHL are likely too good to be true.

Penalties and the NHL

Noah and I wrote an article about penalty patterns in the NHL. It’s over on FiveThirtyEight – many thanks to the folks over there for their efforts in helping us put it together.

I wanted to share two plots that I thought were interesting, and related to our study.

First, here’s a comparison of the probability of a home-team penalty by game type (regular season, postseason).

[Figure: probability of a home-team penalty, regular season vs. postseason]

Home teams are unlikely to have penalties called on them when they are owed penalties, and they are even less likely to get those calls in postseason play.

Also, Sam and Micah suggested that it would be worth looking at the type of penalty called in each scenario. While some of the penalties are of unknown types (strange data entry), here’s a mosaic plot of the known types, using the first four letters of the penalty type as the column labels, plotted against the penalty differential.

[Figure: mosaic plot of penalty type by prior penalty differential]

I might be missing something, but there don’t seem to be huge variations in the frequency of each penalty call given the previous penalty differential. This would support an argument that penalties are not substantially driven by retaliation.

A generalized linear mixed model approach to estimating fumble frequencies in the National Football League

I told myself I was done with Deflategate – and really, I was – that is, until I read this.

“Now I actually have some validation in the field,” Sharp said. “‘Hey, this guy was right all along.’”

Wait, what?

Forget the data twisting and statistical errors of the original analysis. The author claims to be vindicated by the fact that the Wells report found Patriots quarterback Tom Brady to be ‘more likely than not’ to have been involved with the deflation of footballs.

Okay then.*

*******

But despite my skepticism regarding Sharp’s analysis, two of the brightest minds in football analytics have also taken the time to look at Patriots fumble rates, eventually concluding that the Patriots were indeed outliers.

First, after comparing Sharp’s critics to Nabisco running a study on snack cookies**, Brian Burke used multiple linear regression to model the number of fumbles in each NFL game since 2000, finding that the Pats posted much lower rates than the rest of the league in the years following 2007. Next, Benjamin Morris argued that the likelihood of team fumbling rates being at the Patriots’ level or lower was about 1 in 10,000. Linking low fumble rates and Deflategate findings, Morris writes that it “makes it more likely that the relationship between inflation levels and fumbling is real.”

One thing that Morris argues for – which I agree with – is that “there’s definitely more to be done on the Patriots fumbling to isolate for the fact that they were the most consistently winning team, the types of plays they ran.”

What Morris indicates, and what Burke hints at, is that modeling fumble rates is not straightforward, nor close to it. Because NFL teams aren’t randomized to run the same plays with the same time on the clock and from the same spot on the field, any finding to this point has been evidence in the aggregate, averaged over games, plays, or perhaps a few in-game variables.

A play-by-play analysis, however, is missing.

And while it doesn’t ‘vindicate’ any particular finding, nor leave the Patriots free from suspicion, I found the task of looking at NFL play-by-play data to determine fumble rates quite interesting.

*******

I took the last 15 years of play-by-play data from Armchair Analysis (AA). All the code is linked here: the data costs $35, so I can’t provide that, unfortunately. However, if you have AA’s data, feel free to play around. Also, I’m going to focus on data from 2007 onwards. If you are interested in examining whether or not Patriots fumble rates changed substantially at any point over the last 15 years, I’d recommend a change-point analysis.

Let’s start with some descriptive statistics.

Point 1: Teams are less likely to fumble on QB kneel downs.

It’s an easy one to begin with.

In fact, you are probably laughing right now, and you should be. There have been 5,284 NFL kneel-downs since 2000, and not a single one resulted in a fumble in AA’s data. So who cares?

Here’s a plot of the teams who have taken the most kneel downs since 2007.

[Figure: QB kneel downs by team since 2007]

More than 25 snaps ahead of the second-place team, the Patriots have the most kneel downs.

Mentioning kneel downs seems silly, but this matters. Including kneel downs in an analysis of fumbles per play inflates the denominator (number of total plays) among teams more likely to be taking a knee, as the Patriots apparently were. In fact, the correlation between fumbles per play and kneel downs is -0.6. Here’s the relationship between the two variables. Teams with lower fumble rates tend to take more kneel downs (for one of several reasons).

[Figure: fumbles per play vs. kneel downs by team]

After making the graph above, I deleted these plays. I also deleted QB spikes (Patriots had more of these than the average team) and any pass that was intercepted (Patriots had fewer than average). It’s hard for the offensive team to fumble on these plays. It’s even harder to fumble on kneel downs.
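Here’s a hedged sketch of that cleanup, assuming hypothetical indicator columns kneel, spike, and intercepted on the play-by-play data frame pbp (AA’s real field names may differ).

library(dplyr)

pbp <- pbp %>%
  filter(!kneel,          # no fumbles on 5,000+ kneel downs
         !spike,          # spikes carry essentially no fumble risk
         !intercepted)    # an interception ends the offense's chance to fumble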

Point 2: Teams are less likely to fumble when they have the lead.

This was a bit surprising to me. For the regression model, I characterized each play based on the possession lead (3+, 2+, 1+, 0, 1-, 2-, or 3-) of the team with the ball. For example, an offensive team leading by more than 16 points would be up by three or more possessions.
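In code, one way to build those possession-lead categories, assuming a hypothetical score_diff column holding the offense’s point margin at the snap:

library(dplyr)

pbp <- pbp %>%
  mutate(Score = cut(score_diff,
                     breaks = c(-Inf, -16.5, -8.5, -0.5, 0.5, 8.5, 16.5, Inf),
                     labels = c("down 3+", "down 2", "down 1", "tied",
                                "up 1", "up 2", "up 3+")))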

Like kneel downs, scoring differential matters. Teams with the ball up by three possessions or more fumble more than 20% less often than other teams with the ball. So let’s see which teams have run the most offensive plays while up by three or more possessions.

[Figure: offensive plays run with a three-possession lead, by team since 2007]

Again, the Patriots show up, with nearly three times the median number of plays when holding a three-possession lead. Again, this matters. To naively contrast New England’s fumble rates with Cleveland’s, when the Patriots have run more than 11x as many plays with a 3+ possession lead as the Browns, is silly. Teams fumble less when they are leading on the scoreboard.

Point 3: Yard line matters

Given the tighter window with which to run a successful play, it stands to reason that teams would fumble less on plays close to their opponent’s end zone. So, similar to points 1 and 2, any aggregated analysis of fumble rates could unfairly penalize teams that run a disproportionate number of plays in this area. Here’s the number of goal-to-go plays for each team since 2007.

[Figure: goal-to-go plays by team since 2007]

The Patriots have run nearly 200 more goal-to-go plays than any other NFL team since 2007.

*******

Hopefully we can agree that not all plays are created equal. So how can we account for all of these factors?

Using hierarchical generalized linear mixed models (GLMMs) of binary data via the lme4 package in R, I modeled the log-odds of a fumble occurring (Fumble = Yes/No) as a function of several play- and game-specific factors that are conceivably associated with fumble likelihood.

A hierarchical mixed model is advantageous for a few reasons. First, we can account for game conditions (such as the weather), play conditions (like down, distance, and yard line), and play characteristics (run left or pass deep right, for example) that may dictate fumble rates. Next, instead of a model with several dozen fixed effects for each team’s offense and defense, we’ll use random intercepts for both the offensive and defensive units. Of particular interest will be the random intercept for New England; if this intercept is extremely low, it would provide evidence that, after accounting for all the game- and play-specific variables, the Patriots’ fumble rate remains mysteriously lower than other teams’. We can also test the significance of the random intercepts – if their variance term is significantly different from 0, it would provide evidence that there remains substantial variation in fumble rates driven by the team with the ball or the team on defense.

Please note that some of these results mirror a live-tweet version of the model that I ran in late January, but please check out the R code for how I decided to categorize things like down & distance, etc. These decisions were not easy, but they were made with the intent of identifying what characteristics of each play might determine fumble outcomes. Here are the fixed effects included in the GLMM:

Score, Play direction, Final Minutes (Y/N), Playoffs (Y/N), Weather/Surface, Goal to Go (Y/N), Home team on Offense (Y/N), Down/Distance, No huddle (Y/N), Shotgun (Y/N), over/under, and spread.

And here are the random intercepts***:

Offensive Unit, Defensive Unit

And here’s the code. Model results are here:


# Fit separate GLMMs for rushing and passing plays, with random intercepts
# for the offensive (off) and defensive (def) units
library(lme4)    # glmer()
library(dplyr)   # filter()

fit.rush <- glmer(Fumble10 ~ Score + playcall + FinalMins + Playoffs + Weather +
                    GoaltoGo + OffHome + DownDistance + sg + nh + ou2 + spread +
                    (1 | off) + (1 | def),
                  data = filter(pbp, type == "RUSH"),
                  control = glmerControl(optCtrl = list(maxfun = 300)),
                  verbose = TRUE, family = binomial())
summary(fit.rush)

fit.pass <- glmer(Fumble10 ~ Score + playcall + FinalMins + Playoffs + Weather +
                    GoaltoGo + OffHome + DownDistance + sg + nh + ou2 + spread +
                    (1 | off) + (1 | def),
                  data = filter(pbp, type == "PASS"),
                  control = glmerControl(optCtrl = list(maxfun = 300)),
                  verbose = TRUE, family = binomial())
summary(fit.pass)

*******

The first thing we’ll look at is a plot of the random effects for each of the GLMM fits. On the left are passing plays; on the right, running plays.
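These plots are built from the fitted random intercepts, which lme4 makes easy to extract; a minimal version:

library(lattice)   # dotplot() method for lme4 random effects

dotplot(ranef(fit.pass, condVar = TRUE))   # passing plays
dotplot(ranef(fit.rush, condVar = TRUE))   # rushing plays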

[Figures: estimated random intercepts by team, passing plays (left) and rushing plays (right)]

Once you account for play and game characteristics, it is really difficult to distinguish between the fumble rates of NFL teams.

In looking at passing plays, the random intercept terms for each offensive team are not significant predictors of fumble rates. The Patriots ranked third among the teams least likely to fumble, given our model’s parameters. No team’s intercept is noticeably different from 0.

There’s slightly more descriptive ability in using random intercepts with rushing plays. The Patriots’ intercept lies the furthest from 0, but it is not noticeably different from teams like Indy, Jacksonville, and Atlanta, which also boast significantly lower rates of fumbling on running plays.

Interestingly, Washington has the highest intercept on both rushes and passes.****

*******

If you are still reading, it is greatly appreciated. Mixed models have been used in awesome ways to answer really good questions in sports (see catcher framing and deserved run average for recent examples).***** This is not one such awesome application.

However, we learn in Introduction to Statistics that two variables are often associated for reasons beyond a causal mechanism. Given the results here, it seems safe to say that part of the link between the Patriots and low fumble rates was driven by game- and play-specific conditions that those two variables were also associated with. Further, it’s easy to forget about funny data quirks in nearly all applied work, as we noticed with kneel downs and spikes in the football play-by-play data.

*******

Footnotes:

*There are a few other issues to consider. First, the Wells report also proposed that the Patriots started purposely deflating footballs in mid-2014. So, any lower fumble rates prior to this would have been, in relative terms, within league rules. Further, there’s also the issue of whether or not the Patriots’ ‘deflater’ travelled with the team, which cuts against the author’s decision to include all games at once. I can’t believe I just wrote the word ‘deflater.’

**This comparison seems ironic looking back, given that the NFL hired Exponent for its Wells report. Exponent was once paid to argue that secondhand smoke did not cause cancer, among other suspicious claims.

***You may be asking yourself if we should be including effects (intercepts) for each running back. This is a fair question; if we include running backs as intercepts in the model, all team intercepts go to essentially 0. Given that RBs are not randomized to carries, any team that purposely avoids playing running backs with high fumble rates would be penalized in our current fitting strategy.

****As a final step, I looked at the significance of the random intercepts, given that, from a model-building standpoint, it’s generally preferred to use a model that is as parsimonious as possible. Including the random intercepts for both the offensive and defensive units significantly improves the model of fumbles on running plays, as judged by comparing the BIC of models with each random intercept to those without. On passing plays, the intercepts should be dropped from the model; there’s no evidence that, after accounting for game- and play-specific covariates, teams’ fumbling rates differ from one another on passes.
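A sketch of that comparison, refitting the rushing model without each random intercept and comparing BIC:

fit.rush.nooff <- update(fit.rush, . ~ . - (1 | off))
fit.rush.nodef <- update(fit.rush, . ~ . - (1 | def))
BIC(fit.rush, fit.rush.nooff, fit.rush.nodef)   # lower BIC indicates the preferred model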

*****A Bayesian strategy is also easy to implement. My guess is that a prior on team by team intercepts would only work to drag each team closer to 0.

*******

UPDATE: Scott from Football Outsiders requested that I include year effects as opposed to aggregating data across every season. I nested intercepts for each year within each team; this would account for seasonal trends for each unit, incorporated within some larger team effect. The yearly effects were indeed significant, both for offensive and defensive units. 
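In lme4’s syntax, that nesting amounts to replacing the team intercepts with team/season terms; a sketch for the rushing model, assuming a hypothetical season column:

fit.rush.yr <- glmer(Fumble10 ~ Score + playcall + FinalMins + Playoffs + Weather +
                       GoaltoGo + OffHome + DownDistance + sg + nh + ou2 + spread +
                       (1 | off/season) + (1 | def/season),   # season intercepts nested within teams
                     data = filter(pbp, type == "RUSH"),
                     control = glmerControl(optCtrl = list(maxfun = 300)),
                     family = binomial())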

Here’s the plot of the random intercepts for each team, after accounting for seasonal trends.

[Figures: random intercepts by team after accounting for seasonal trends, passing and rushing plays]

NBA win totals, 2014-15 edition

We’re back again to judge preseason predictions, this time looking at NBA prognosticators. Our outcome of interest is the number of wins for each NBA team, and we’ll compare predictions to the totals set by sportsbooks in late October.

Despite my best efforts on social media, I could only find three competitors: Team Rankings, the Basketball Distribution, and Seth Burn. I also merged the predictions from those three sites, in what I’ll call the stathead Aggregate.

Our first criterion is mean absolute error (MAE), which represents the average absolute deviation between each site’s predicted win total and the observed total. In addition to the prediction sites, I also calculated the MAE using last year’s win totals, as well as a constant 41, which represents a prediction of every team finishing 41-41.
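In other words, with predicted and observed win totals in hand (observed_wins and last_year_wins here are hypothetical vectors of the 30 team totals):

mae <- function(predicted, actual) mean(abs(predicted - actual))

mae(rep(41, 30), observed_wins)      # the everyone-finishes-41-41 baseline
mae(last_year_wins, observed_wins)   # last year's totals as the prediction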

For the 2014-15 season, statheads reigned supreme.

[Figure: MAE by prediction source, 2014-15 NBA win totals]

The Basketball Distribution and the aggregated picks from the three sites were both noticeably better than the totals set by sportsbooks. This matched results from last year for the Basketball Distribution. (Note: Nathan from the Basketball Distribution sent along a table with similar results to these).  

Of course, it’s easy to wonder whether these results are due to chance. To check this, I simulated 1000 seasons by using the sportsbook totals as the true mean team totals (slightly scaled, to account for the fact that the sportsbook totals average 41.5 wins per team), while also assuming that the 2014-15 season standard deviation of 9.3 marks the actual standard deviation of win totals. Here’s a plot of the simulated MAE for both the Basketball Distribution and the Aggregate picks.

[Figure: simulated MAE distributions for the Basketball Distribution and the Aggregate picks]

In only about 1% of simulations was the MAE for the Basketball Distribution lower than the observed MAE. That’s a pretty solid effort. The aggregated picks from the three sites finished in about the 4th percentile, another solid effort.
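Here’s a hedged sketch of that simulation; book_totals, site_preds, and observed_mae are hypothetical stand-ins for the 30 sportsbook totals, one site’s predictions, and that site’s observed MAE, and mae() is the helper above.

set.seed(1)
book_means <- book_totals * 41 / mean(book_totals)    # rescale so the league averages 41 wins

sim_mae <- replicate(1000, {
  sim_wins <- rnorm(30, mean = book_means, sd = 9.3)  # one simulated season
  mae(site_preds, sim_wins)
})
mean(sim_mae < observed_mae)   # share of simulated seasons with an MAE below the observed one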

And while it was a good year for the statheads, it wasn’t quite that for the Knicks. At 17 wins, New York finished more than 20 wins below its expected total of 40.5.

Here’s a team-by-team plot of observed and predicted totals, arranged in order of projected total (the V). The calculator image refers to the stathead projection.

[Figure: observed vs. predicted win totals by team, 2014-15]

Heard the story about the Michigan State bettor who could win $1 million? Read on.

If you’ve clicked on a sports website in the past week, you’ve probably read an article about a bettor in line to win 1 million dollars if Michigan State were to win the NCAA title this weekend.

Indeed, Derek Stevens, a supposed regular at the Golden Nugget casino, placed a $20,000 bet on December 5th on the Spartans to win the title. At 50:1 odds, Stevens’ bet will pay big if the Spartans can pull off upsets of both Duke and either Kentucky or Wisconsin this weekend.

Awesome, right?

There’s just one problem.

Stevens’ bet wasn’t that smart.

Back on December 5th, the Spartans were 5-3 and looked headed for the relatively unspectacular regular season that they were expected to have. Preseason polls had Michigan State as the third-best team in its own conference and ranked No. 18 in the nation, and by week 5 of this season, the Spartans were unranked. Getting Michigan State at 50:1 wasn’t a good deal then, and I’m still not sure it looks like a good deal now (most prediction sites only give MSU about a 5% chance to win it all, even with two games left).

Instead, like many futures bettors, Stevens would have been much better off investing his $20k in the Spartans before the tournament began. For example, I went through MSU’s tournament path and calculated what a bettor would have earned off a $20k investment on the Spartans to win each game, assuming that 100% of the return was simply re-invested on Michigan State to win its next game. For example, the Spartans’ money line was -250 against Georgia; a bet of $20k would’ve yielded an $8k profit, and together, that $28k sum would then be invested on the Spartans to beat Virginia.
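A sketch of the reinvestment arithmetic; only the -250 line against Georgia is quoted above, so the remaining game lines are left as placeholders.

ml_profit <- function(line) ifelse(line < 0, 100 / abs(line), line / 100)   # profit per $1 staked

lines <- c(-250)   # Georgia; the later (hypothetical here) game lines would extend this vector
stake <- 20000
for (line in lines) stake <- stake * (1 + ml_profit(line))
stake              # 20000 * 1.4 = 28000 after the Georgia win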

Here’s a graph of the results.

[Figure: cumulative returns from reinvesting a $20k stake on Michigan State game by game]

Two things stand out. First, a bettor would stand to make just under $1 million on the Spartans in this scenario…before they even played the championship game. Indeed, this strategy would yield a profit of just over $900,000 if Michigan State beats Duke on Saturday.

Second, assuming a line of +350 for Michigan State versus either Kentucky or Wisconsin (and that’s generous towards MSU), such a strategy would yield more than $4 million if MSU were to win the title.

Of course, there are several caveats here. Futures bets are not always bad ideas, and you cannot use the described strategy with all teams. For example, Kentucky was +650 to win the 2015 title a year ago; before the tournament began, the Wildcats were close to even money. Indeed, the Michigan State bet would’ve looked like a pretty good wager if the Spartans had started the tournament at, say, 10:1 or 12:1 (they didn’t, opening at anywhere from 30:1 to 40:1).

So, while the headline you are reading continues to say “Bettor can win $1 million on MSU,” perhaps the more applicable headline is “Bettor cost himself $3 million in profit by taking a futures bet in December.”