After six years of graduate school – two at UMass-Amherst (MS, statistics), and four more at Brown University (PhD, biostatistics) – I am finally done (I think). At this point, I have a few weeks left until my next challenge awaits, when I start a position at Skidmore College as an assistant professor in statistics this fall. While my memories are fresh, I figured it might be a useful task to share some advice that I have picked up over the last several years. Thus, here’s a multi-part series on the lessons, trials, and tribulations of statistics graduate programs, from an n = 1 (or 2) perspective.
Part IV: What I wish I had learned in my graduate program in statistics (with Greg Matthews) The point of this series is to be as helpful as possible to students considering statistics graduate programs now or at some point later in their lives. As a result, if you have any comments, please feel free to share below. Also, I encourage anyone interested in this series to read two related pieces:
It’s no secret that American football lags behind several other sports in its perceived ability to use statistical tools. But in my opinion, one of the most obvious aspects of the game that demands more attention lies in predicting the play-calling of an opponent. Knowing whether an opponent is likely to run or pass can make an important difference both in terms of play-calling – should you blitz? should you expect a screen? – and in terms of how aggressively certain defenders should play.
In that regard, I figured it was worth a quick investigation. In this post, I’ll suggest that the link between one play call and the next, at least early in a game, is a bit stronger than I thought it would be.
The importance of a run-pass balance is a common football narrative. And because coaches want to appear balanced between the run and the pass by the end of a game, they may also feel the need to appear balanced between the run and the pass in small samples of plays. If a coach calls three run plays in a row, he may fear looking too committed to the run-game, or, even worse, too predictable for the defense.
Of course, it’s not just football. If it exists, an evening up of play types would reflect more general human misconceptions rooted in probability. It’s why when we play rock-paper-scissors, we rarely use the same throw three times in a row. If you aren’t gonna throw rock after throwing rock-rock in rock-paper-scissors, you probably aren’t gonna run after calling run-run during a football game. A similar bias also impacts sports officials. In the NHL, for example, referees calling violations on one team are more likely to call the next penalty on that team’s opponent, no matter the game’s score. Just like coaches want to appear balanced, so too do referees.
While a large-scale predictive model of opponent play calls would be one of the first things I would do as an NFL team analyst (see this example or this one), it may not be the most straightforward way to look at whether or not coaches even up play calls. In particular, decisions made as the game progresses are particularly tied to the score. And from my perspective, although the approaches shown in the links above include a term to test for an autocorrelation of play calls, the exact effect remains unknown.
To reduce the impact of other play and game characteristics, I’ll start as simple as possible, by only looking at a team’s first few offensive plays in a game.
Per usual, I’ll use the play-by-play data provided by Armchair Analysis, which includes each play from 2000-2015. To limit the effect of field position, I only included drives that started between the 10-yard lines, and I dropped penalties to focus on the remaining runs and passes.
Here’s a chart of run percentages on each team’s second play, varied by the play-type of the first play. The error bars account for our uncertainty in each probability estimate.
Teams run more often after they pass, and they do so significantly more often – an absolute difference of about 12%. On a relative scale, teams are about 25% more likely to run when their first play was a pass.
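Error bars like the ones in that chart can be reproduced with a simple normal-approximation interval for a proportion. A minimal sketch follows; the play counts are hypothetical stand-ins, not the actual Armchair Analysis totals.

```python
import math

def run_rate_ci(runs, plays, z=1.96):
    """Normal-approximation 95% CI for a run-rate proportion."""
    p = runs / plays
    se = math.sqrt(p * (1 - p) / plays)
    return p, p - z * se, p + z * se

# Hypothetical counts for illustration only
p_pass, lo_pass, hi_pass = run_rate_ci(600, 1000)  # run rate after a first-play pass
p_run, lo_run, hi_run = run_rate_ci(480, 1000)     # run rate after a first-play run
print(round(p_pass - p_run, 2))  # absolute difference, ~0.12 in this toy example
```

If the two intervals don’t overlap, as in the chart above, the difference is unlikely to be noise.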
That said, savvy readers may have picked up on the fact that if rushes and passes were to result in different types of second plays (e.g., different yards to go), such a comparison wouldn’t make sense.
But we can look closer. Here’s the same chart above, faceted by the down and distance of the second play (2nd & short: 3 yards or less, 2nd & medium: 4-6 yards, 2nd & long: 7 yards or more).
For 2nd & shorts (bottom right), there’s no obvious difference in the likelihood of running based on the initial play call. Teams tend to run the ball here.
Among other play types – in particular, 2nd & medium and 2nd & long – there remains a significant difference in how an offense calls its plays given what it just called. On 2nd & long, for example, teams rushed 44% more often (an absolute difference of 19%) after passing on first down. That’s an enormous effect.
Of course, there may be other things at play. Perhaps teams failing at one play type (rush or pass) feel the need to try another play type (pass or rush) on the second play. But if you’re feeling the need to vary your play calls based on the first play of the game (literally, that’s the only play on the x-axis), that’s a whole other issue to write about.
But we can also look beyond just the game’s first two plays. Here’s a histogram of the number of rush attempts across the first four offensive plays for each team in each game. The red bars reflect what we’d expect if teams were to pick four play types (runs and passes) out of a hat (using a run probability of 49%); the black bars reflect what we see in the data.
The higher black bar in the middle highlights that in the first four plays of the game, coaches make more of an effort to call exactly two runs and two passes (about 46% of the time) than what we’d expect due to chance (37% of the time). Along similar lines, while we’d expect about 13 in 100 sequences of four plays to include all rushes or all passes, that only happened about 7 in 100 times in the data. Altogether, this matches our conclusion from above; coaches are a bit more balanced than we’d expect them to be if they were randomly dialing up plays.
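The red “out of a hat” bars follow a binomial distribution, so the expected shares quoted above are easy to check directly:

```python
from math import comb

p_run = 0.49  # the run rate used for the "out of a hat" bars

def prob_k_runs(k, n=4, p=p_run):
    """Binomial probability of exactly k runs in n independent play calls."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(prob_k_runs(2), 2))                   # two runs, two passes: ~0.37
print(round(prob_k_runs(0) + prob_k_runs(4), 2))  # all passes or all runs: ~0.13
```

Those match the 37-in-100 and 13-in-100 chance figures above, against 46-in-100 and 7-in-100 in the actual data.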
Offensive play-callers are probably better at designing plays than we give them credit for. Schemes are enormously complex, and the amount of detail that goes into a gameplan can be awe-inspiring.
But during a game, when faced with split-second (well, 40-second) decisions, it’s natural for those same play-callers to revert to predictable tendencies. In the case of the evidence above, it appears that, all else being equal, runs are more likely to follow early-game passes and passes are more likely to follow early-game runs.
During two of the last three Sunday Night Football contests, one of the participating teams scored in the final few minutes of the fourth quarter to take a 7-point lead. At those points in the game, both Denver (last night) and Seattle (two weeks ago) were faced with the decision to kick the extra point, thereby likely securing the 8-point advantage, or to attempt a two-point conversion, thus taking a (roughly) 50-50 chance at a two-possession lead.
What’s the optimal strategy? It’s a tough question, so I posed this on Twitter.
In 2 of last 3 Sunday night NFL games, a team scored to take a 7-pt lead late in the 4th quarter. Should they go for two?
Roughly 50% of my respondents (overall, a more analytic-friendly crowd) answered that, yes, teams should go for two, with the remaining voters equally split between “No” and “It depends.”
In this post, I’ll suggest that, at least empirically, it hasn’t made a ton of difference one way or the other.
In considering the optimal two-point strategy with a seven-point lead, we can start by looking at how often teams have come back when trailing by seven, eight, or nine points. While there are hundreds of games where teams have scored and kicked an extra-point to build exactly a seven-point lead late in the game, it’s a bit dicier to find examples of teams scoring and taking a seven-point lead before kicking the extra point. Using Armchair Analysis’ data, for example, there were just 88 such examples between 2000 and 2015.
So instead of looking at those 88 games, I expanded the analysis to include any game where a team took possession in the final eight minutes of the fourth quarter between 10 and 40 yards from their own goal when down either 7, 8, or 9 points. In essence, this adds about 1300 contests (so 1400 total) that should be equivalent to a team trailing late in the game having just given up a touchdown.
Here’s how the games eventually played out. The chart below shows the fraction of times that the winning team held on, depending on the size of their lead. The size of each dot is proportional to the number of games with teams in those situations. I also used two colors to vary when the offensive team started its possession.
Teams ahead by seven points have won about 86% of games in which the trailing team started a possession with between 4 and 8 minutes left, a number that jumps to 89% when up eight and 94% when up nine points. This makes sense: the larger the lead, the more likely the win.
And there’s a similar increase for teams getting the ball in the final four minutes of a contest (shown in red). In fact, in the 94 games in which a team has started a defensive possession with fewer than 4 minutes left while ahead by exactly nine points, they’ve won all 94 times. That isn’t to say that teams can’t lose when ahead by this margin – they’ve lost when up by 10, for example – but it’s quite unlikely. A two-possession lead late in the game is really hard to overcome.
We can use the probabilities above to outline a strategy of whether or not to attempt the two-point conversion.
For teams scoring with between 4 and 8 minutes left, we are left with the following calculation:
Go for two (assuming a 50% chance of a successful conversion):
50% chance to get a 94% chance of a win + 50% chance to get an 86% chance of a win = Win 90% of the time
Kick the extra point: Win 89% of the time.
Using these numbers, there’s a *slight* advantage to going for the two-possession lead by attempting the two-point conversion. Given the associated errors that come with these probabilities (the margins of error in the graph, for example, are about 4%), this difference is not statistically meaningful.
For teams scoring with between 0 and 4 minutes left, we use the following calculation:
Go for two:
50% chance at a 99% chance of a win (best guess) + 50% chance at an 89% chance of a win = Win 94% of the time
Kick the extra point: Win 93% of the time.
Again, very little difference, and not a statistically meaningful one.
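The arithmetic in both scenarios is just a weighted average, which can be sketched in a few lines using the empirical win rates from the charts above (the 50% conversion rate is the rough league figure):

```python
def expected_win(p_convert, win_if_up_nine, win_if_up_seven):
    """Win probability of going for two while up seven, as a weighted average."""
    return p_convert * win_if_up_nine + (1 - p_convert) * win_if_up_seven

# 4-8 minutes left: empirical win rates of 94% (up nine) and 86% (up seven)
go_early = expected_win(0.5, 0.94, 0.86)  # ~0.90, vs. 0.89 for kicking
# Final 4 minutes: the 99% figure is the best guess when up nine
go_late = expected_win(0.5, 0.99, 0.89)   # ~0.94, vs. 0.93 for kicking
print(go_early, go_late)
```

In both windows the gap between strategies is about one percentage point, well within the margins of error quoted above.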
Altogether, there’s little empirical evidence to suggest that teams should attempt the two-point conversion late in the game when up seven. While there may be a slight advantage to the more aggressive strategy, it does not appear to be an overwhelming one. Relative to more common scenarios that coaches often screw up – like punting on 4th and 1 near midfield – the decision to attempt a late-game conversion appears to be a minor one.
-Some readers may have identified that the recent increase in extra point distance should be part of the discussion. That may be true. However, while it’s now more likely than before that the leading team misses an extra point that would give it an eight-point lead, it’s also more likely that the trailing team misses a game-tying chance if it were to score when down seven.
-I’ve seen frequent suggestions that teams should vary their decisions based on the caliber of their defense. As one example:
I would rather rely on my defense to stop a two-pointer than ask my offense to score one in this game. https://t.co/b2UnLBVFz6
This is fair, but two things to keep in mind. First, when a strong defensive team like Denver goes for two, the benefit of the two-possession lead looms even larger! No way the Chiefs score on two drives last night.
Second, team strength probably doesn’t matter as much as you think. As part of work I did last year for SI.com, I looked at both the game’s point spread and team offensive and defensive efficiency metrics from Football Outsiders as they related to two-point success. While the game’s point spread was a significant predictor (favored teams converted more often), neither the offensive team’s strength alone, nor the defensive team’s strength alone, factored into two-point success. Team-specific probabilities of successful conversions were almost always between 40 and 60 percent, with most of those differences accounted for by the game’s point spread.
-I split game minute into two categories above: 0-4 minutes left and 4-8 minutes left. I tried similar splits and they told a similar story.
-It’s worth noting that simply splitting games by deficit alone would be troublesome if there were differences in the team strength among those leading by 7, 8, or 9 points (e.g., if the Patriots and Seahawks always led by 9 points). Judging by the game’s point spread, however, this didn’t seem to be the case. The teams leading late by 7, 8, and 9 points were relatively similar in terms of team strength.
I spent last Saturday at the first Boston Hockey Analytics conference, a gathering of analytically inclined hockey faithful. For those unable to attend, here are a few highlights. Note that, to the best of my knowledge, there was no audio or visual recording of this conference.
-Michael Schuckers gave a talk summarizing the state of goalie research, including material that he’s working on for an upcoming book chapter. For the unfamiliar, the most common metric in evaluating NHL goaltenders is save percentage, which is limited in part because different goaltenders face different distributions of shots over the course of a season. Indeed, you could even have a Simpson’s Paradox scenario, where Goalie A is better at saving each type of shot than Goalie B, but Goalie B still ends up with a better save percentage overall. This upcoming book chapter will be a must-read.
-Schuckers also pushed for those in the audience to do whatever necessary to get the NHL to share its tracking data. IMO, this is a no-brainer. The world of basketball is better for the brief look into this rich information that the NBA shared during the 2014-15 season and parts of the 2015-16 one. See, among other examples, this excellent tutorial on how to scrape and analyze player movements. The NHL’s lagging behind, and given the well known flaws that the league has with scorer/rink biases, the potential is there for public analysts to answer some excellent questions and help grow the game.
-Rob Vollman gave a talk on roster construction, providing a glimpse into how rules of the CBA dictate who and what players are reasonable values. You can buy Rob’s book here, which presents this and other analytically driven research. The takeaway linking Rob’s and Schuckers’ talks: don’t give goalies massive contracts, as there’s too good of a chance they won’t be worth it.
-I gave a talk on how to use R for reproducible hockey research. Slides and code here (note: download the pdf of the slides if you are looking for links). There are very few hockey researchers who share both their code and data. It’d be better for everyone involved if we can change this.
-Cole Anderson presented work on an Elo-based player comparison tool, in which the hockey public can rank players. This makes sense, particularly given that traditional player rankings (say, a scale of 1-10) can lead to ambiguous numbers (like 7.7). Cole’s work appears similar to the surveys that 538 has used to, for example, rank James Bond villains or pick summer Olympic sports. Cole’s code is also in R, and available for your perusal here. Hope to see more out of this project.
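For readers unfamiliar with how Elo-style pairwise voting works, here’s a minimal sketch of a single ratings update; the K-factor of 32 and the 1500 starting ratings are my assumptions, not necessarily the settings in Cole’s tool.

```python
# One head-to-head vote: the winner gains what the loser drops, scaled by how
# surprising the result was given the current ratings.
def elo_update(r_winner, r_loser, k=32):
    expected = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))  # winner's expected score
    delta = k * (1 - expected)
    return r_winner + delta, r_loser - delta

a, b = elo_update(1500, 1500)
print(a, b)  # a vote between equals moves both ratings by k/2
```

Run over many votes, the ratings converge toward a ranking, which is why tools like this can turn noisy 1-vs-1 public votes into a stable ordering.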
-Eric Cantor gave a talk on roster construction, examining the Tampa Bay Lightning’s experience with seven active defensemen in place of the usual six. Eric’s evidence suggests that Tampa performed slightly better with the non-traditional construction.
-Ryan Davenport and Edwin Niederberger looked at shot locations from world tournaments, including the recent World Cup and the Olympic seasons. Relative to the rest of the world, the USA’s backline looks particularly ineffective offensively.
-Brian Carothers and Joseph Nelson gave two talks. First, the pair looked at quantifying defensemen given their hit and blocked shot totals, the slides of which are found here. Second, the pair led a Python workshop, the overview of which is linked here and the code of which is found here. Python and R are both free and powerful. You should learn (at least) one of them.
-Rob was asked how many teams are doing appropriate due diligence with respect to analytics. He guessed six, with most teams “nowhere close.”
-Billy Jaffe (New England Sports Network), Neil Abbott (player agent), and Ron Rolston (coach) led a panel moderated by Babson’s Rick Cleary which summarized their perspectives on how analytics have changed the game. Perhaps unsurprisingly, their biggest take-home is that these practitioners don’t care about what model you may have chosen or what (statistical) tools you needed to employ; they just want immediate, actionable, and simplified recommendations. This raises a question that wasn’t asked: what happens when those suggestions don’t match their prior viewpoints?
-Although each panelist likely loses more hockey knowledge in their sleep than I’ll ever learn, there was a bit too much selecting on the dependent variable for my taste. In other words, because Team X and Team Y have recently won the Stanley Cup, this is how all teams need to win the Stanley Cup. Hockey’s way too random for that.
-Perhaps given that the conference was held in Boston, the Bruins’ 2011 Cup-winning team was held in particularly high regard. Two of the panelists, for example, praised the Bruins’ winning culture and development of a high-character locker room as drivers of their success. Of course, if that Boston team was so good at hockey, why did it need seven games – and an overtime – just to get out of the first round? If Montreal had won that first-round series instead of Boston, would Boston still have had a winning culture and a high-character locker room?
-I missed a few other talks, but if those researchers or anyone else wants to share materials, please send them along! And many thanks to Luke, Rick, George, Michael, Rob, and the rest of the organizers for their great work.
Researchers have long and loudly banged the drum that NFL teams should be more aggressive on fourth downs. Among other sources, see recent work at the 4th down bot and Brian Burke’s primer here, the latter of which links to a handful of excellent academic papers.
However, there’s no evidence that any of this work has made a difference as far as team behavior. For example, in close games (two possessions or less) prior to the fourth quarter, teams went for it 6.4% of the time in 2015, nearly identical to the 6.5% in the year 2000. In fact, that number was as low as 4.8% in the year 2011.
But perhaps just as many of us analytically inclined fans were ready to give up, there seemed to be a few more aggressive plays in week 1 of the 2016 season, highlighted by Antonio Brown’s fourth-down touchdown grab on Monday night.
League-wide, was this a meaningful uptick in aggressiveness?
For now, the answer is a slightly unsatisfying maybe.
Using Armchair Analysis’ excellent data, I grabbed every fourth down play since the 2000 season. We’re interested in whether or not teams playing close games (two possessions or less) went for it on fourth down, defined as either attempting a rush or a pass. I filtered out fourth quarter plays, as decisions later in the game are too often dictated by game situation.
In week 1 of 2016, teams went for it on 4th down on 18 of a possible 160 attempts (11.2%). That’s the highest such percentage for week 1 games since 2000, and it’s not particularly close.
The following chart shows the weekly 4th down attempt percentage in these situations. I included a separate point for each season to give a sense of the season-to-season and the week-to-week variability. The smooth blue line reflects the trend, and the grey area reflects our uncertainty in the average fourth down attempt rate.
The aggressiveness observed in week 1 of 2016 (the red point in the top left) has been exceeded by only 7 individual weeks since 2000. Interestingly, the chart also points to the possibility that coaches are slightly more aggressive later in the year, as shown by the increasing trend across weeks of the season.
It’s safe to say that, given what is traditionally less-aggressive behavior during the early parts of the season, week 1 of 2016 stands out as unusual. That said, there are several caveats. This analysis doesn’t take into account other factors that undoubtedly are linked to fourth-down calls, including field position and opponent characteristics. For example, it’s certainly feasible that there just happened to be more 4th-down attempts in 4th-down friendly spots during week 1.
In any case, it’s certainly worth monitoring as the season progresses.
During Sunday’s NCAA contest between Notre Dame and Texas, the Longhorns faced a second-and-goal from the Irish one yard line midway through the second quarter.
“I wouldn’t be surprised to see a run,” said ABC announcer Todd Blackledge, opining on what type of play the Longhorns should try. “However, I will say that second down is the down to throw if you want to throw.”
Blackledge’s comment is classic football-think, behavior that’s been suggested for decades with no known empirical basis. However, thanks to the supple data of Armchair Analysis, it’s the type of behavior that’s easy to check and quite possibly validate (at least using NFL data). Thus, the two questions I’ll attempt to answer are:
First, how do coaches call plays in goal-to-go situations? Second, how should they?
Armchair’s database contains each NFL play since 2000, which I filtered into goal-to-go plays that occurred on first through third downs. I also cut out fourth quarter plays, so as to worry less about the effects of varying late-game behavior.
How do coaches call plays? Here’s a bar chart showing the percentage of plays which are passes, separated by down and distance (Note: called passes include sacks).
With one yard to go, roughly one in four plays are passes, a rate that holds across first, second, and third downs. Across other distances, coaches are fairly consistent in their desire to run on first down and throw on third down, with second down decisions roughly a 50-50 split.
Interestingly, there is no noticeable spike in teams calling pass plays on second down, at least not relative to their behavior at other downs and distances. So, while football coaches love to talk about throwing the ball on second and goal, they aren’t necessarily acting that way.
Perhaps the more interesting question is how coaches should call their plays in goal-to-go situations. The answer is not straightforward. For example, passing plays are more likely to yield touchdowns from longer distances, but they’re also more likely to result in negative plays and plays of no gain.
One possibility is to consider the drive’s eventual point total as the outcome and to work from there. Using the same set of plays, I used the drive’s result (categorized as a touchdown, field goal, or neither) to estimate the average point total given each play call at the various down and distances.
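The grouped-average calculation described above can be sketched on a toy play-by-play table; the rows and the 7/3/0 point values below are illustrative assumptions, not the Armchair Analysis schema.

```python
from collections import defaultdict

# Toy goal-to-go plays: (down, yards_to_go, call, drive_result)
plays = [
    (1, 5, "run", "TD"), (1, 5, "run", "none"), (1, 5, "pass", "FG"),
    (2, 5, "run", "FG"), (2, 5, "pass", "TD"), (3, 5, "pass", "none"),
]
points = {"TD": 7, "FG": 3, "none": 0}  # simple values per drive result

totals = defaultdict(lambda: [0, 0])  # (down, distance, call) -> [points, count]
for down, dist, call, result in plays:
    totals[(down, dist, call)][0] += points[result]
    totals[(down, dist, call)][1] += 1

expected_points = {key: pts / n for key, (pts, n) in totals.items()}
print(expected_points[(1, 5, "run")])  # average drive points after a 1st & 5 run
```

With real data, each (down, distance, call) cell holds hundreds of plays, so the cell averages become the points plotted in the chart below.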
Here’s a chart of expected points, separated by runs and passes.
Each point in the chart reflects the expected point total given a run or a pass at a certain down (1st, 2nd, or 3rd) and distance (x-axis). The size of the circle is proportional to the number of plays called at each point.
Interestingly, across most downs and distances, running plays offer slightly more of a return than passing plays. The differences aren’t overwhelming, but they are consistent, generally in the neighborhood of one or two tenths of an expected point. Notably, there’s no evidence to back up any claim that teams should pass the ball on second down – if anything, it’s the opposite.
These results are somewhat surprising given Kovash and Levitt’s seminal paper on NFL team behavior, which implies that teams should pass more than they currently do. One possibility is that the shorter field lessens teams’ ability to throw the ball. Anecdotally, and as an example, teams seem to call way too many fade patterns.
Altogether, what did we learn?
First, there is no obvious truth to the theory that teams pass more in second-down, goal-to-go situations than we would expect. Second, there’s no evidence for the theory that they should be passing more often than they already are.
If anything, there’s a drop in efficiency on passing plays in goal-to-go situations, which may be showing up in the form of fewer expected points. However, I’m cautious of reading too much into this conclusion, given the nature of this analysis (it’s on aggregate, and doesn’t account for game and play specific factors) and the inherent difficulty in categorizing a team’s decision to run or pass.
The 2016 Joint Statistical Meetings start Sunday in Chicago. Karl Broman put out his calendar – I figured I’d put mine together as well. Here’s a combination of sports, causal inference, and education talks that I noticed. If there’s something related that I should also be at, please let me know!
Also, kudos to Karl for pointing me to the JSM snack page. This is my fourth JSM and somehow this is the first I’ve heard about this. And also to Greg for helping out with JSM’s art show. This is the first of its kind, and hopefully something that grows in the future.
I’ve long had my eye out for intriguing papers that cover my two favorite areas of research, causal inference and sports statistics. For unfamiliar readers, causal inference tools allow for the estimation of causal effects (e.g., does smoking cause cancer?) in non-experimental settings. Given that almost all sports data is inherently observational, there would seem to be opportunities for applied causal papers to answer questions in sports (here’s one).
It was with this vigor in mind that I read a paper, the Midweek Effect on Performance: evidence from the Bundesliga, recently posted and linked here. The authors, Alex Krumer & Michael Lechner – the latter of whom has done substantial causal inference work – use propensity score matching to estimate the effect of midweek matches on home scoring.
The authors conclude that:
Playing midweek leads to an effect of about half a point in total, resulting from the home team losing about 0.2 points, while the away team gains about 0.3 points (the asymmetry results from the ‘3 points rule’) … it becomes clear that the home team loses all its home advantages in midweek games.
Interestingly, although matching is used as the primary method, selection effects (i.e., how much the weekend and weekday games differ) are weak. Primarily, conclusions are drawn as a result of the varying point totals described above.
As the authors discuss, several factors could be at play here, most notably referee bias and attendance. The authors also (gulp) suggest that testosterone levels could be linked to the poorer home team performance. In conclusion, Krumer & Lechner recommend that Bundesliga officials work to balance midweek game assignments.
Altogether, these findings would have a substantial place in sports literature with respect to the drivers of home advantage in sports. The results were so cool, in fact, that my first thought was: let’s replicate them.
Thanks to James Curley’s awesome engsoccerdata package, results from several professional soccer leagues are right at an R user’s fingertips. I started by trying to replicate the findings of Krumer and Lechner using the Bundesliga. Matching aside, our outcome of interest is a difference in differences: the average home point difference between weekend and weekday games, minus the average away point difference between weekend and weekday games.
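The difference-in-differences outcome described above is straightforward to compute. Here’s a minimal sketch on a toy set of results under the 3/1/0 points rule; every number below is made up for illustration (the author’s actual analysis is in R).

```python
from statistics import mean

# Toy results: (venue, slot, points earned by the team of interest)
games = [
    ("home", "weekend", 3), ("home", "weekend", 1),
    ("home", "weekday", 1), ("home", "weekday", 0),
    ("away", "weekend", 0), ("away", "weekend", 1),
    ("away", "weekday", 1), ("away", "weekday", 3),
]

def avg_points(venue, slot):
    return mean(p for v, s, p in games if v == venue and s == slot)

home_gap = avg_points("home", "weekend") - avg_points("home", "weekday")
away_gap = avg_points("away", "weekend") - avg_points("away", "weekday")
print(home_gap - away_gap)  # the difference in differences
```

A positive value means home teams gain more from weekend play than away teams do, which is exactly the half-point gap the paper reports for the Bundesliga.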
In the paper linked above, the authors found a gap of about 0.5 (in favor of weekend games) using the last 8 years of data.
As good news, so did I.
Using all years since 1960, I estimated the yearly average home weekend advantage. Sure enough, while there didn’t appear to be much of a difference prior to the year 2000, the last 15 years have seen a notable spike; teams are winning more points at home during weekend games. The blue line reflects the trend over time, which has roughly stabilized at about half a point over the last decade. The red dotted line reflects no difference.
Teams performed better in home weekend games in all but one of the last 10 years (2011). Meanwhile, in three of those years the difference in point totals was greater than 1. This difference is both statistically and practically significant (although it is important to note that only about 10% of a team’s games are weekday ones). Indeed, the authors’ conclusions seem reasonable.
But replication on the data used by the authors is one thing; validation on another data set (e.g., another league) would go a long way towards confirming a weekday effect in professional soccer.
Fortunately, Curley’s R package contains more than the Bundesliga. I chose the English Premier League (EPL), Spain’s La Liga, and Italy’s Serie A to try and apply the paper’s approach elsewhere. If you want to try another league, feel free to expand upon it using my code posted here.
Long story short, I couldn’t replicate the initial findings in any of the three leagues. There wasn’t a single time span in the EPL, La Liga, or Serie A in which teams saw an additional benefit from playing home games on weekends.
Here are the three graphs, made similar to the one above. Note that there are varying x-axes: each league has had different numbers of seasons with weekday games. I went as far back as I could within each league (while also trying to assure continuity).
First, the home weekend advantage in Italy. By and large, there have been no differences.
Second, the home weekend advantage in England. Arguably, the weekend advantage was actually a disadvantage for a while.
Finally, the home weekend advantage in Spain. Again, if anything here, there has been a disadvantage.
To conclude, Krumer and Lechner find evidence of a difference in the home versus away point totals when comparing weekend and weekday games. Over the last decade, the magnitude of these differences has been fairly large – half a point, on average.
That said, while it was encouraging to replicate their findings, it is disconcerting that the replication failed in three other top European leagues. There are obviously differences between the Bundesliga and each of the EPL, Serie A, and La Liga, including the types of weekdays on which each league plays its games (Serie A appears most similar to the Bundesliga in this respect, in not playing Monday games). However, there doesn’t appear to be anything else unique about the Bundesliga which would lend that league, and that league only, to a weekday effect.
That said, returning to the authors’ original approach, this isn’t to say a midweek effect can be entirely discounted. If other leagues have assigned specific teams to midweek games based on past performances, our approach (a simple difference-in-differences one) would be inappropriate. However, in the absence of such information, it seems more than plausible that the observed midweek effect in the Bundesliga could be due to chance.
Over the years, there have been several ‘draft curves’ put together in each of the four major North American sports. These charts provide intuitive visualizations of the relative value of each pick, while allowing us to better understand prospect potential and evaluate trades.
Despite the growing popularity of drafts in each sport, I was disappointed to find that there are apparently
No open-source guidelines for how to make a draft curve and/or value chart
No attempt at comparing each of the sports’ draft curves simultaneously.
Those will be my goals here.
To start, I’ll explore how to estimate a draft value curve within a single sport. Then, I’ll compare curves between the NFL, NHL, NBA, and MLB using a pair of figures.
How to make a draft curve
The association between draft pick (x-axis) and player performance (y-axis) is generally non-linear, featuring a steep drop-off between the first few picks and a more steady decline thereafter. This is because the gap in talent between players chosen at picks 1 and 10 is, in expectation, larger than the gap between picks 50 and 60.
Thus, draft curves are most appropriately estimated using a non-linear fit. As examples, here are curves constructed using an exponential decay model, a logarithmic decay curve, locally weighted scatterplot smoothing (loess), and monotonic regression. However, like any fitting process, there is no right answer as far as which curve is most appropriate. While it’s beyond the scope of this report, an interesting project would compare different draft curves on some out-of-sample drafts to identify which type most accurately predicts average player performance. That’s a doable and important task.
The technique adopted here is loess smoothing, which fits low-degree polynomial functions to small subsets of the data, across all of the data. Loess is attractive for estimating draft curves for a few reasons. First, we don't have to specify any particular functional form between pick number and player output. This is a big benefit: instead of guessing at the association between our variables, we let the data tell us. Relatedly, a loess method is simpler and more flexible than a deterministic approach. As for downsides, the most obvious one is that we won't be left with a simple equation with which to estimate player performance given a draft position. However, the estimated value of each pick, as well as a measure of uncertainty, can easily be extracted using software.
In fitting a loess smoother, the input most often controlled manually is the smoothing parameter, generally expressed as alpha, which sets the fraction of nearby points used in fitting the curve. Non-technically, alpha governs the wiggliness of each curve; values near 0 allow for a jagged trend, while values near 1 yield more smoothness. I settled on an alpha of 0.4, which allows for the identification of some rougher edges, while hopefully rounding off others that are mostly due to random fluctuations.
Show me some charts
Once you have the data, estimating draft curves using a loess smoother within a single sport is doable in a few lines of code.
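As an illustration of those few lines, here's a sketch in Python using statsmodels' lowess in place of R's loess (the `frac` argument plays the role of alpha). The data below are simulated stand-ins; the real inputs would be (pick, career value) pairs.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Simulated stand-in for real draft data: value drops off steeply
# over the first 100 picks, plus noise.
rng = np.random.default_rng(0)
picks = np.arange(1, 101).astype(float)
value = 850 * np.exp(-0.03 * picks) + rng.normal(0, 40, picks.size)

# frac is the smoothing parameter (the 'alpha' discussed above)
curve = lowess(value, picks, frac=0.4)
# curve[:, 0] holds the picks, curve[:, 1] the smoothed estimates
```

Plotting `curve` over the raw scatter reproduces the basic shape of the charts below.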
I started by using the pro-reference sites to scrape draft position and player performance measures in each of the four major sports. While not perfect, I settled on a player’s career wins above replacement (MLB), win shares (NBA), approximate value (NFL), and games played (NHL) as my outcome measures. In the case of the NHL, it may seem strange to use games played as a metric of player success. However, there’s a precedent for using this outcome – Michael Schuckers suggests doing so here. In any case, if you want to try your own outcomes, or change anything you see below, the code for the entire analysis is posted on my Github page.
Here’s a draft curve for NHL drafts between 1990 and 2005, using the first 100 picks and a loess smoother. The area in grey reflects our uncertainty above and below the curve at each pick number, and each dot represents a single player.
In terms of games played, the average top pick lands somewhere around 850, which is about three times the value of players picked late in the first round, and about six times the value of players chosen around pick 60. By and large, these numbers and this curve pass the smell test. So it's a good start.
In addition, the NHL curve shows a nice feature of a loess fit which less flexible approaches would not have picked up on: right around pick 30, there's a significant drop-off in games played. This dip could mean a few things. First, teams could be more willing to play their Rd. 1 picks because of the sunk costs already invested in those players. Second, other teams could more frequently sign Rd. 1 free agents a few years down the road because of their prior label. Finally, and as a less provocative claim, because Rd. 1 picks are generally the top player chosen by their team, they'll usually have an easier path to making their initial team's roster. While the player chosen 31st (Rd. 2) may have to fight to make a roster, that's usually not the case for the player chosen 30th (Rd. 1).
Comparing across sports
While sport-level curves are interesting, I was also curious how the leagues compared to one another.
There’s no easy way to answer to this question, however. In addition to disparities in the distribution and units of our outcome measures, there are also differences in the number of rounds in each sport’s draft (the NBA currently has 2, the NHL and NFL each have 7, and the MLB has 40).
One simple mechanism for making cross-sport comparisons is to look only at the top 60 picks, as this reflects the number of selections made in most NBA drafts. That handles our x-axis. To better align the y-axis, I averaged the outcomes of players chosen between the 55th and 60th picks, using this number as a baseline. In the example above, we expect the top pick in the NHL to be worth about 6 times that of the 60th pick.
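The baseline step amounts to dividing the whole curve by the average estimated value of picks 55-60. A quick sketch in Python, with a hypothetical curve stored as a pick-to-value dictionary:

```python
def relative_curve(curve):
    """Divide each pick's estimated value by the mean value of picks 55-60."""
    baseline = sum(curve[p] for p in range(55, 61)) / 6
    return {p: v / baseline for p, v in curve.items()}

# toy curve where pick values decline linearly from 60 down to 1
toy = {p: float(61 - p) for p in range(1, 61)}
rel = relative_curve(toy)
```

After this rescaling, the value at picks 55-60 averages 1 in every sport, so the curves can share a y-axis.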
Here’s a chart comparing the relative curves in each of the four sports, when divided by the average value of picks 55-60.
The top pick in the NBA draft is worth about 20x that of late second round picks, at least based on average win shares. Meanwhile, curves for MLB and the NHL are relatively similar. Finally, the most consistent pick-to-pick value appears in the NFL, where top picks are only worth roughly twice that of late round 2 picks, on average.
While the results of the NBA mostly matched expectations, the lack of any strong shape in the NFL curve, relative to the other sports, stands out. For example, it’s surprising that the MLB, which is evaluating high school players who are, by and large, a few years away from playing professionally, has a more significant drop-off in player talent at the top of the draft than is found in the NFL.
But don’t drafts have different numbers of rounds?
To account for the differing draft lengths (in rounds), we can tweak our curves so that the x-axis reflects the percentage of picks that were made up through each selection in each year, as opposed to a specific pick number. For example, the 50th percentile reflects the end of the NBA’s round 1, and roughly the middle of the 4th round in each of the NHL and the NFL. The MLB is excluded – at 40 rounds and with multiple minor league feeder teams, it is unclear that Rd. 40 of an MLB draft should be compared to, for example, Rd. 7 of an NFL or NHL draft.
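The rescaling itself is just a division: each pick maps to the fraction of that year's draft completed through that selection. A one-line Python helper, using the draft lengths cited above:

```python
def pick_percentile(pick, total_picks):
    """Fraction of the draft used up through this selection."""
    return pick / total_picks

# pick 30 of a 60-pick NBA draft and pick 112 of a 224-pick NFL draft
# both land at the 50th percentile
```

With each league's picks on this common 0-to-1 scale, the curves can be overlaid directly.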
In any case, here are draft curves across all rounds in the NBA, NFL, and NHL.
As before, the NBA features the sharpest drop-off, while the NHL follows close behind. There's a steeper decline in the NFL when looking across all rounds, with players chosen first overall, on average, worth about 10 times that of players chosen at the end of the draft.
Final assorted comments:
-It’s fair to use these curves to gauge the tanking incentives provided by each league. In the current system, it makes lots of sense to tank in the NBA, slightly less so in the NHL, and not as much sense in the NFL, as judged by the drop-offs in player talent.
-Pretty impressive job done by MLB scouting departments to accurately peg athletes who are three to four years away from playing professionally, and who stem from both top-level college programs and faraway high schools. Moreover, note that the MLB curve would appear even steeper if we could account for past issues with respect to league-level financial disparities. For a long period of time, more talented players were passed over by teams who could not afford them.
-Some technical notes: our formal cutoff in the NHL was 210 picks – at one point it was higher, but I wanted this number to be consistent over time. Our NFL cutoff was 224 picks – at one point that number was higher, too. The NBA has used 2 rounds since 1989, so the same cutoff applies throughout.
-One surprising factor to account for was a subtlety embedded in MLB draft history – players can be drafted more than once. This required setting initial player-level outcomes to 0 if that player was eventually drafted again.
-I don’t love my outcome measures, but they were the easiest ones available. As one positive sign, Saurabh at the Nylon Calculus found similar ratios to the ones above while using more advanced outcome measures in basketball.
-Finally, one could argue that a more preferred outcome would look at a player’s peak performance, instead of his career worth. That could certainly be the case. You could also make curves with “Probability of drafting an all-pro/all-star” as your outcome to answer a slightly different question.
No one wants to read about Patriots fumble rates, and I don’t want to write about Patriots fumble rates.
But I can’t not write about this.
The football person behind the initial commotion regarding low fumble rates was interviewed recently for a podcast. In response to a question about the 2015 season, in which the Patriots once again held onto the ball better than the rest of the league, the football person’s response was as follows:
One thing I noticed is that the weather and the climate up there during New England games was abnormally warm, which is one of the reasons that I found it phenomenal and crazy that they were having so few fumbles because as you know, and as I’ve studied and analyzed, it’s much more difficult to hold onto the football when you are playing out in the cold. So it was crazy how well they were able to hold onto the ball. But last year it was pretty warm, they didn’t have many cold weather games, and their fumble rate was pretty good as well.
Two suggestions were made clear:
1 – The weather during Patriots games was abnormally warm.
2 – It’s much more difficult to hold onto the football during cold weather
Let’s check these claims. Data from Armchair Analysis.
1 – The weather during 2015 Patriots games was abnormally warm.
A side-by-side boxplot should do the trick.
Here’s the temperature during Patriots games across the last 16 years.
The median, first quartile, and minimum game-time temperatures during Patriots games were not obviously different last year, and the temperature distribution in 2015 matches most of the prior years. It certainly does not appear to have been an “abnormally warm” year.
Writer’s claims: 0-for-1.
2 – It’s much more difficult to hold onto the football during cold weather
This is also straightforward to check out.
Using every game since 2000, I linked the game’s temperature to the fumble rate of the participating teams, defined as the total number of offensive team fumbles divided by the number of offensive team plays. So, 2 fumbles in 130 plays would give a fumble rate around 1.5%, or 0.015. If fumbles were associated with low temperatures, we would expect to see a decline in game-level fumble rates with increasing temperature.
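The rate defined above is a simple ratio; a small Python helper makes the arithmetic explicit:

```python
def fumble_rate(fumbles, plays):
    """Game-level fumble rate: offensive fumbles per offensive play."""
    return fumbles / plays

# the example from the text: 2 fumbles in 130 plays is roughly 1.5%
rate = fumble_rate(2, 130)
```

Computing this rate for each game and plotting it against game-time temperature produces the scatter plot below.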
Here’s a scatter plot.
While there’s a slight dip around 45 degrees, it’s neither statistically nor practically significant. The smoothed line moving through fumble rate and temperature is nearly perfectly horizontal. On aggregate, the cumulative rate of fumbles is relatively consistent across temperature (Note that a better analysis would probably use temperature in a model with play-level information).
Writer’s claims: 0-for-2.
So what does this mean?
The Patriots again led the league in 2015 with one of their lowest-ever fumble rates, which I have little doubt links to their style of play and their success (red-zone plays, playing with the lead, kneel downs, etc.).
Despite the new evidence, our podcast guest appears to still be holding onto the idea that something funny is going on. Moreover, he’s doubling down on false claims, ones which at first glance appear reasonable. Like the initial analysis from a year ago, however, it’s mostly a bunch of hot air.
As part of their final assignment in my statistics and sports class, students were tasked with looking at the home advantage in the English Premier League (EPL). In some recent and related work, James Curley and Oliver Roeder found that, by 2014, an EPL home advantage had reached an all time low.
Interestingly, that low reached new depths in 2016.
Home teams have won 40.8% of games this past year, pending this weekend’s final contests. If that mark stands, it would be the lowest in EPL/English Division 1 history, one which dates back to 1888.
Here’s a chart, similar to the one that James and Oliver produced. Overall home team win percentage in each year is shown in black, draw percentage in red, and away win percentage in green. The grey region reflects our uncertainty in the trend curve.
As expected, it's a fairly big drop in win percentage, from roughly 60% to 45% across about 120 seasons. Using this rate of decline, we can expect home teams to win 0% of games by around the year 2400 (I kid).
While win percentage is a useful metric, it’s not perfect, as it doesn’t account for differences in team schedules. If the better teams generally got to host the worse teams, or if winners from the previous year were forced to play one another more often (hi, NFL), overall home win percentage would fluctuate as a result. (Note that I am aware that EPL teams currently face each opponent exactly once at home and once away, which is nice and balanced. However, I’m not sure if that’s always been the case.)
As a result, a paired-comparison method that can account for each team's strength, and accordingly estimate a home advantage, could be worth looking at. Using the BradleyTerry2 package in R, and with a hat-tip to James’ engsoccerdata package, I ran a Bradley-Terry model (BTM) with a home-team advantage coefficient within each season. When exponentiated, the coefficient from a BTM reflects a league-wide estimate of the home advantage, taken on an odds scale. As an example, if a team would have a 50% chance of a win at a neutral site (Odds = p/(1-p) = 1), they’d have a 60% win probability with a 50% increase in odds (Odds = 0.6/(1-0.6) = 1.5). As additional examples, 100% and 200% increases in odds would bump that 50% win probability team to 66% and 75%, respectively.
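The odds arithmetic in that example can be wrapped in a small helper. A sketch in Python (the model itself was fit with BradleyTerry2 in R); the function name is mine.

```python
def home_win_prob(neutral_p, odds_multiplier):
    """Convert a neutral-site win probability and a multiplicative
    home-odds boost (e.g. 1.5 for a 50% increase) into a home win prob."""
    odds = neutral_p / (1 - neutral_p) * odds_multiplier
    return odds / (1 + odds)

# a 50% neutral-site team with a 50% odds increase wins ~60% at home;
# 100% and 200% increases give roughly 67% and 75%
```

The exponentiated BTM coefficient plays the role of `odds_multiplier` here.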
Here are the season-level increased odds of a home win across the last several decades. Conditional on team strength, in 2015, the odds of a team winning at home were about 25% higher than that team winning at a neutral site. While still greater than 0, it’s again the lowest mark in league history.
It’s worth pointing out that the BTM coefficients are estimates, and do come with standard errors attached (generally about 0.12 on the log-odds scale). Even so, it’s interesting that the last 5 seasons all rank among the lowest 10 seasons as far as an estimated EPL home advantage.
James and Oliver posit several reasons as to what caused the drop in home advantage, including ease of travel and referee awareness. Their work also shows that the primary impetus behind fewer home wins is fewer home goals. Interestingly, while home advantage has also seemingly dropped in the NBA, it’s stayed relatively consistent in the NFL, MLB, and NHL.
It’s straight-forward to link EPL results with those in other professional soccer leagues. James’ data also includes La Liga (Spain), Serie A (Italy), and the Bundesliga (Germany), so I’ll use those.
Here’s a plot of win percentage across time in each of the four leagues. It’s sort of a cluster, but other leagues seem to match the EPL in terms of a home advantage in recent seasons.
Finally, we can use the BTM in each league in each season to get the odds of a home win relative to the same game being played on a neutral field. Here’s that graph.
-It’d be interesting to go back through each league’s schedule to identify where and why the two graphs above yield different results. For example, there’s a noticeable win percentage gap between the EPL and Serie A in 1975 that’s not apparent when looking at the BTM coefficients.
-The Bundesliga’s home advantage shape is quite strange in the years between 1962 and 1982, showing an inverse quadratic trend over time. I have no logical explanation for such an association. (In fairness, I wasn’t born yet.)
-England switched to a 3-point rule in 1981, while Germany, Italy, and Spain waited until the mid-1990’s. Behavioral economists would do well to look at the impact of rule changes using graphs like these. Generally, most related work only uses a few seasons of play (token plug for my NHL article).
-Nate Silver’s World Cup prediction model was taken to task after its overwhelming optimism for the host Brazilians in 2014. Given the results above, it’s possible that a decline in home advantage across soccer played a role. Nate’s model surely inflated Brazil’s chances because of what was anticipated to be a noticeable benefit of playing at home. But if much of the home advantage that used to exist in soccer was no longer a part of the game, it could explain why a prediction model would do so poorly for the hosts yet do so well when other teams played.