After six years of graduate school – two at UMass-Amherst (MS, statistics), and four more at Brown University (PhD, biostatistics) – I am finally done (I think). At this point, I have a few weeks left until my next challenge awaits, when I start a position at Skidmore College as an assistant professor in statistics this fall. While my memories are fresh, I figured it might be a useful task to share some advice that I have picked up over the last several years. Thus, here’s a multi-part series on the lessons, trials, and tribulations of statistics graduate programs, from an n = 1 (or 2) perspective.
Part IV: What I wish I had learned in my graduate program in statistics (with Greg Matthews)
The point of this series is to be as helpful as possible to students considering statistics graduate programs now or at some point later in their lives. As a result, if you have any comments, please feel free to share below. Also, I encourage anyone interested in this series to read two related pieces:
The 2016 Joint Statistical Meetings start Sunday in Chicago. Karl Broman put out his calendar – I figured I’d put mine together as well. Here’s a combination of sports, causal inference, and education talks that I noticed. If there’s something related that I should also be at, please let me know!
Also, kudos to Karl for pointing me to the JSM snack page. This is my fourth JSM and somehow this is the first I’ve heard about this. And also to Greg for helping out with JSM’s art show. This is the first of its kind, and hopefully something that grows in the future.
I’ve long had my eye out for intriguing papers that cover my two favorite areas of research, causal inference and sports statistics. For unfamiliar readers, causal inference tools allow for the estimation of causal effects (e.g., does smoking cause cancer?) in non-experimental settings. Given that almost all sports data is inherently observational, there would seem to be opportunities for applied causal papers to answer questions in sports (here’s one).
It was with this vigor in mind that I read a paper, the Midweek Effect on Performance: evidence from the Bundesliga, recently posted and linked here. The authors, Alex Krumer & Michael Lechner – the latter of whom has done substantial causal inference work – use propensity score matching to estimate the effect of midweek matches on home scoring.
The authors conclude that:
Playing midweek leads to an effect of about half a point in total, resulting from the home team losing about 0.2 points, while the away team gains about 0.3 points (the asymmetry results from the ‘3 points rule’) ... it becomes clear that the home team loses all its home advantages in midweek games.
Interestingly, although matching is used as the primary method, selection effects (i.e., how much the weekend and weekday games differ) are weak. Primarily, conclusions are drawn as a result of the varying point totals described above.
As the authors discuss, several factors could be at play here, most notably referee bias and attendance. The authors also (gulp) suggest that testosterone levels could be linked to the poorer home team performance. In conclusion, Krumer & Lechner recommend that Bundesliga officials work to balance midweek game assignments.
All together, these findings would have a substantial place in sports literature with respect to the drivers of home advantage in sports. The results were so cool, in fact, that my first thought was: let’s replicate them.
Thanks to James Curley’s awesome engsoccerdata package, results from several professional soccer leagues are right at the R user’s fingertips. I started by trying to replicate the findings of Krumer and Lechner using the Bundesliga. Matching aside, our outcome of interest is a difference in difference: the average home point difference between weekend and weekday, minus the average away point difference between weekend and weekday.
In the paper linked above, the authors found a gap of about 0.5 (in favor of weekend games) using the last 8 years of data.
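For readers who want to see the arithmetic, here is a minimal sketch of that difference in difference in Python. It is not the R code behind this post: the function names and the toy results are made up for illustration, and points are assigned under the 3-points rule.

```python
from statistics import mean


def points(goals_for, goals_against):
    """League points under the 3-points rule: 3 for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


def weekend_home_advantage(matches):
    """Difference in difference of average points.

    matches: iterable of (is_weekend, home_goals, away_goals) tuples.
    Returns (home weekend - home weekday) - (away weekend - away weekday).
    """
    home = {True: [], False: []}
    away = {True: [], False: []}
    for is_weekend, home_goals, away_goals in matches:
        home[is_weekend].append(points(home_goals, away_goals))
        away[is_weekend].append(points(away_goals, home_goals))
    home_diff = mean(home[True]) - mean(home[False])
    away_diff = mean(away[True]) - mean(away[False])
    return home_diff - away_diff


# Hypothetical toy results (NOT real Bundesliga data): (is_weekend, home goals, away goals)
toy_matches = [
    (True, 2, 0), (True, 1, 1), (True, 3, 1), (True, 0, 1),
    (False, 1, 1), (False, 0, 2), (False, 2, 1), (False, 1, 2),
]
toy_estimate = weekend_home_advantage(toy_matches)
```

A positive value means home teams gain more, relative to away teams, on weekends than on weekdays. To get a yearly series like the one plotted below, the same function would be applied season by season.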
As good news, so did I.
Using all years since 1960, I estimated the yearly average home weekend advantage. Sure enough, while there didn’t appear to be much of a difference prior to the year 2000, the last 15 years have seen a notable spike; teams are winning more points at home during weekend games. The blue line reflects the trend over time, which has roughly stabilized at about half a point over the last decade. The red dotted line reflects no difference.
Teams performed better in home weekend games in all but one of the last 10 years (2011). Meanwhile, in three of those years, the difference in point totals was greater than 1. This difference is both statistically and practically significant (although it is important to note that only about 10% of a team’s games are weekday ones). Indeed, the authors’ conclusions seem reasonable.
But replication on the data used by the authors is one thing; validation on another data set (e.g., another league) would go a long way towards confirming a weekday effect in professional soccer.
Fortunately, Curley’s R package contains more than the Bundesliga. I chose the English Premier League (EPL), Spain’s La Liga, and Italy’s Serie A to try and apply the paper’s approach elsewhere. If you want to try another league, feel free to expand upon it using my code posted here.
Long story short, I couldn’t replicate our initial findings in any of the three leagues. There wasn’t a single time-span in any of the EPL, La Liga, or Serie A where there was an additional benefit to teams playing home games on weekends.
Here are the three graphs, made similar to the one above. Note that there are varying x-axes: each league has had different numbers of seasons with weekday games. I went as far back as I could within each league (while also trying to ensure continuity).
First, the home weekend advantage in Italy. By and large, there have been no differences.
Second, the home weekend advantage in England. Arguably, the weekend advantage was actually a disadvantage for a while.
Finally, the home weekend advantage in Spain. Again, if anything here, there has been a disadvantage.
To conclude, Krumer and Lechner find evidence of a difference in the home versus away point totals when comparing weekend and weekday games. Over the last decade, the magnitude of these differences has been fairly large – half a point, on average.
That said, while it was encouraging to replicate their findings, it is disconcerting that the replications failed in three other top European leagues. There are obviously differences between the Bundesliga and each of the EPL, Serie A, and La Liga, including the types of weekdays on which each league plays its games (Serie A appears most similar to the Bundesliga in this respect, in not playing Monday games). However, there doesn’t appear to be anything else unique about the Bundesliga which would lend that league, and that league only, to a weekday effect.
Still, returning to the authors’ original approach, this isn’t to say a midweek effect can be entirely discounted. If other leagues have assigned specific teams to midweek games based on past performances, it would mean our approach (a simple difference in difference one) was inappropriate. However, in the absence of this other information, it seems more than plausible that the observed midweek effect in the Bundesliga could be due to chance.
Over the years, there have been several ‘draft curves’ put together in each of the four major North American sports. These charts provide intuitive visualizations of the relative value of each pick, while allowing us to better understand prospect potential and evaluate trades.
Despite the growing popularity of drafts in each sport, I was disappointed to find that there is apparently:
-No open-source guideline for how to make a draft curve and/or value chart.
-No attempt at comparing each of the sports’ draft curves simultaneously.
Those will be my goals here.
To start, I’ll explore how to estimate a draft value curve within a single sport. Then, I’ll compare curves between the NFL, NHL, NBA, and MLB using a pair of figures.
How to make a draft curve
The association between draft pick (x-axis) and player performance (y-axis) is generally non-linear, featuring a steep drop-off between the first few picks and a more steady decline thereafter. This is because the gap in talent between players chosen at picks 1 and 10 is, in expectation, larger than the gap between picks 50 and 60.
Thus, draft curves are most appropriately estimated using a non-linear fit. As examples, here are curves constructed using an exponential decay model, a logarithmic decay curve, locally weighted scatterplot smoothing (loess), and monotonic regression. However, like any fitting process, there is no right answer as far as which curve is most appropriate. While it’s beyond the scope of this report, an interesting project would compare different draft curves on some out-of-sample drafts to identify which type most accurately predicts average player performance. That’s a doable and important task.
The technique adopted here is loess smoothing, which fits low-degree polynomial functions to small subsets of the data, across all of the data. Loess is attractive for estimating draft curves for a few reasons. First, we don’t have to specify any functional form between pick number and player output. This is a big benefit, as instead of guessing what the association between our variables is, we’ll let the data tell us. Relatedly, a loess method is simpler and more flexible than a deterministic approach. As for downsides, the most obvious one is that we won’t be left with a simple equation with which to estimate player performance given a draft position. However, the estimated value of each pick, as well as a measure of uncertainty, can easily be extracted using software.
In fitting a loess smoother, the input most often controlled manually is the smoothing parameter, generally expressed as alpha, which accounts for the fraction of nearby points used in fitting the curve. Non-technically, alpha refers to the jigginess of each curve; values near 0 allow for a jagged trend, while values near 1 reflect more smoothness. I settled on an alpha of 0.4, which allows for the identification of some rougher edges, while hopefully rounding off others that are mostly due to random fluctuations.
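To make the role of alpha concrete, here is a bare-bones sketch of a loess-style smoother in plain Python. It is an illustration under stated assumptions rather than the fit used for the charts below: it uses local linear fits with tricube weights, whereas R's loess defaults to local quadratic fits, and the draft data here is invented.

```python
def loess(xs, ys, alpha=0.4):
    """Local linear smoother: at each x, fit a weighted line through the
    alpha fraction of nearest points (tricube weights) and return the fitted values."""
    n = len(xs)
    k = max(2, int(round(alpha * n)))  # number of neighbours in each local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        # Tricube weights: close to 1 near x0, falling to 0 at distance h and beyond
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical draft data: a steep early drop-off plus some deterministic "noise"
picks = list(range(1, 61))
games = [900 / pick ** 0.5 + ((pick * 37) % 11 - 5) * 10 for pick in picks]
curve = loess(picks, games, alpha=0.4)
```

Lowering alpha toward 0 shrinks each local neighbourhood, so the curve chases individual points; raising it toward 1 pools more picks into each fit and flattens the bumps.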
Show me some charts
Once you have the data, estimating draft curves using a loess smoother within a single sport is doable in a few lines of code.
I started by using the pro-reference sites to scrape draft position and player performance measures in each of the four major sports. While not perfect, I settled on a player’s career wins above replacement (MLB), win shares (NBA), approximate value (NFL), and games played (NHL) as my outcome measures. In the case of the NHL, it may seem strange to use games played as a metric of player success. However, there’s a precedent for using this outcome – Michael Schuckers suggests doing so here. In any case, if you want to try your own outcomes, or change anything you see below, the code for the entire analysis is posted on my Github page.
Here’s a draft curve for NHL drafts between 1990 and 2005, using the first 100 picks and a loess smoother. The area in grey reflects our uncertainty above and below the curve at each pick number, and each dot represents a single player.
As far as games played, the average top pick is somewhere around 850, which is about three times the value of players picked late in the first round, and about six times the value of players chosen around pick 60. By and large, these numbers and this curve pass the smell test. So it’s a good start.
In addition, the NHL curve shows a nice feature of a loess fit which less flexible approaches would not have picked up on: right around pick 30, there’s a significant drop-off in games played. This dip could mean a few things. First, teams could be more willing to play their Rd. 1 picks because of the sunk costs already invested in those players. Second, other teams could more frequently sign Rd. 1 free agents a few years down the road because of their prior label. Finally, and as a less provocative claim, because Rd. 1 picks are generally the top player chosen by their team, they’ll usually have an easier path to making their initial team’s rosters. While the player chosen 31st (Rd. 2) may have to outperform the player chosen 1st to make a roster, that’s usually not the case for the player chosen 30th.
Comparing across sports
While sport-level curves are interesting, I was also curious how the leagues compare to one another.
There’s no easy way to answer this question, however. In addition to disparities in the distribution and units of our outcome measures, there are also differences in the number of rounds in each sport’s draft (the NBA currently has 2, the NHL and NFL each have 7, and the MLB has 40).
One simple mechanism for making cross-sport comparisons is to only look at the top-60 picks, as this reflects the number of selections made in most NBA drafts. That handles our x-axis. To better understand the y-axis, I averaged the outcomes of players chosen between the 55th and 60th picks, using this number as a baseline. In the example above, we expect the top-pick in the NHL to be worth about 6 times that of the 60th pick.
Here’s a chart comparing the relative curves in each of the four sports, when divided by the average value of picks 55-60.
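The rescaling itself is just a division by a baseline. Here is a small sketch in Python; the function name is hypothetical and the values are illustrative numbers chosen to resemble the NHL games-played figures quoted earlier, not the actual fitted curve.

```python
from statistics import mean


def relative_curve(values_by_pick, lo=55, hi=60):
    """Divide each pick's value by the average value of picks lo..hi (inclusive)."""
    baseline = mean(values_by_pick[p] for p in range(lo, hi + 1))
    return {pick: value / baseline for pick, value in values_by_pick.items()}


# Illustrative (made-up) smoothed games played at a handful of pick numbers
nhl_like = {1: 850.0, 30: 280.0, 55: 150.0, 56: 145.0, 57: 140.0,
            58: 140.0, 59: 135.0, 60: 130.0}
relative = relative_curve(nhl_like)
top_pick_multiple = relative[1]  # how many "late second round picks" the top pick is worth
```

Because every sport is divided by its own baseline, the y-axis becomes unitless, which is what lets win shares, approximate value, WAR, and games played share one chart.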
The top pick in the NBA draft is worth about 20x that of late second round picks, at least based on average win shares. Meanwhile, curves for MLB and the NHL are relatively similar. Finally, the most consistent pick-to-pick value appears in the NFL, where top picks are only worth roughly twice that of late round 2 picks, on average.
While the results of the NBA mostly matched expectations, the lack of any strong shape in the NFL curve, relative to the other sports, stands out. For example, it’s surprising that the MLB, which is evaluating high school players who are, by and large, a few years away from playing professionally, has a more significant drop-off in player talent at the top of the draft than is found in the NFL.
But don’t drafts have different numbers of rounds?
To account for the differing draft lengths (in rounds), we can tweak our curves so that the x-axis reflects the percentage of picks that were made up through each selection in each year, as opposed to a specific pick number. For example, the 50th percentile reflects the end of the NBA’s round 1, and roughly the middle of the 4th round in each of the NHL and the NFL. The MLB is excluded – at 40 rounds and with multiple minor league feeder teams, it is unclear that Rd. 40 of an MLB draft should be compared to, for example, Rd. 7 of an NFL or NHL draft.
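As a quick check on that 50th percentile example, here is a sketch of the conversion in Python. The total pick counts (60, 224, and 210) are the cutoffs used in this post; the picks-per-round figures (30 for the NBA and NHL, 32 for the NFL) are simplifying assumptions that ignore expansion and compensatory picks, and the function names are hypothetical.

```python
def pick_at_percentile(pct, total_picks):
    """Pick number sitting at a given percentage of the way through a draft."""
    return round(pct / 100.0 * total_picks)


def round_of_pick(pick, picks_per_round):
    """Round in which a pick falls, assuming a fixed number of picks per round."""
    return (pick - 1) // picks_per_round + 1


# (total picks considered, assumed picks per round)
leagues = {"NBA": (60, 30), "NFL": (224, 32), "NHL": (210, 30)}

halfway = {}
for league, (total, per_round) in leagues.items():
    pick = pick_at_percentile(50, total)
    halfway[league] = (pick, round_of_pick(pick, per_round))
```

Under these assumptions the halfway point lands on pick 30 in the NBA (the last pick of round 1), pick 112 in the NFL, and pick 105 in the NHL, with the latter two both falling in round 4.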
In any case, here are draft curves across all rounds in the NBA, NFL, and NHL.
As before, the NBA features the sharpest drop-off, while the NHL follows close behind. There’s a steeper decline in the NFL when looking across all rounds, with players chosen first overall, on average, worth about 10 times that of players chosen at the end of the draft.
Final assorted comments:
-It’s fair to use these curves to extrapolate to tanking incentives provided by each league. In the current system, it makes lots of sense to tank in the NBA, slightly less so in the NHL, and not as much sense in the NFL, as judged by the drop-offs in player talent.
-Pretty impressive job done by MLB scouting departments to accurately peg athletes who are three to four years away from playing professionally, and who stem from both top-level college programs and faraway high schools. Moreover, note that the MLB curve would appear even steeper if we could account for past issues with respect to league-level financial disparities. For a long period of time, more talented players were passed over by teams who could not afford them.
-Some technical notes: Our formal cutoff in the NHL was 210 picks – at one point it was higher, but I wanted this number to be consistent over time. Our NFL cutoff was 224 picks – at one point that number was higher, too. The NBA has used 2 rounds since 1989, so same cutoff throughout.
-One surprising factor to account for was a subtlety embedded in MLB draft history – players can be drafted more than once. This required setting initial player-level outcomes to 0 if that player was eventually drafted again.
-I don’t love my outcome measures, but they were the easiest ones available. As one positive sign, Saurabh at the Nylon Calculus found similar ratios to the ones above while using more advanced outcome measures in basketball.
-Finally, one could argue that a more preferred outcome would look at a player’s peak performance, instead of his career worth. That could certainly be the case. You could also make curves with “Probability of drafting an all-pro/all-star” as your outcome to answer a slightly different question.
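On the MLB note above about players drafted more than once, here is one way that cleaning step could look in Python. This is a sketch under assumptions, not the code from the analysis: the field names are hypothetical, the rows are invented, and it assumes a player can be identified by a single key and is not drafted twice in the same year.

```python
def zero_out_earlier_selections(draft_rows):
    """Set career value to 0 for every selection of a player except his last one.

    draft_rows: list of dicts with keys 'player', 'year', 'pick', 'career_value'.
    Returns a new list; the input rows are left untouched.
    """
    last_year = {}
    for row in draft_rows:
        player = row["player"]
        last_year[player] = max(row["year"], last_year.get(player, row["year"]))

    cleaned = []
    for row in draft_rows:
        new_row = dict(row)
        if row["year"] < last_year[row["player"]]:
            new_row["career_value"] = 0.0  # drafted again later, so credit only the final pick
        cleaned.append(new_row)
    return cleaned


# Invented example: Player A is drafted out of high school, does not sign, and is drafted again
rows = [
    {"player": "Player A", "year": 1998, "pick": 310, "career_value": 25.0},
    {"player": "Player A", "year": 2001, "pick": 12, "career_value": 25.0},
    {"player": "Player B", "year": 1998, "pick": 40, "career_value": 3.0},
]
cleaned_rows = zero_out_earlier_selections(rows)
```

Without this step, a player's career value would be counted once for each time he was selected.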
No one wants to read about Patriots fumble rates, and I don’t want to write about Patriots fumble rates.
But I can’t not write about this.
The football person behind the initial commotion regarding low fumble rates was interviewed recently for a podcast. In response to a question about the 2015 season, in which the Patriots once again held onto the ball better than the rest of the league, the football person’s response was as follows:
One thing I noticed is that the weather and the climate up there during New England games was abnormally warm, which is one of the reasons that I found it phenomenal and crazy that they were having so few fumbles because as you know, and as I’ve studied and analyzed, it’s much more difficult to hold onto the football when you are playing out in the cold. So it was crazy how well they were able to hold onto the ball. But last year it was pretty warm, they didn’t have many cold weather games, and their fumble rate was pretty good as well.
Two suggestions were made clear:
1 – The weather during Patriots games was abnormally warm.
2 – It’s much more difficult to hold onto the football during cold weather
Let’s check these claims. Data from Armchair Analysis.
1 – The weather during 2015 Patriots games was abnormally warm.
A side-by-side boxplot should do the trick.
Here’s the temperature during Patriots games across the last 16 years.
The median, first quartile, and minimum game-time temperatures during Patriots games were not obviously different last year, and the temperature distribution in 2015 matches most of the prior years. It certainly does not appear to have been an “abnormally warm” year.
Writer’s claims: 0-for-1.
2 – It’s much more difficult to hold onto the football during cold weather
This is also straightforward to check out.
Using every game since 2000, I linked the game’s temperature to the fumble rate of the participating teams, defined as the total number of offensive team fumbles divided by the number of offensive team plays. So, 2 fumbles in 130 plays would give a fumble rate around 1.5%, or 0.015. If fumbles were associated with low temperatures, we would expect to see a decline in game-level fumble rates with increasing temperature.
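Here is a minimal Python sketch of that calculation. The game rows are invented (they are not Armchair Analysis data), and the straight-line slope is only a crude stand-in for the smoothed trend in the plot.

```python
from statistics import mean


def fumble_rate(fumbles, plays):
    """Offensive fumbles divided by offensive plays."""
    return fumbles / plays


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


example_rate = fumble_rate(2, 130)  # the worked example from the text

# Hypothetical games: (temperature in degrees F, total fumbles, total plays)
toy_games = [(25, 2, 130), (45, 1, 125), (60, 2, 128), (80, 2, 132)]
temps = [t for t, _, _ in toy_games]
rates = [fumble_rate(f, p) for _, f, p in toy_games]
slope_per_degree = ols_slope(temps, rates)
```

If cold weather really made the ball harder to hold, the slope would be clearly negative; a slope near zero matches a flat trend.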
Here’s a scatter plot.
While there’s a slight dip around 45 degrees, it’s neither statistically nor practically significant. The smoothed line moving through fumble rate and temperature is nearly perfectly horizontal. In aggregate, the cumulative rate of fumbles is relatively consistent across temperature. (Note that a better analysis would probably use temperature in a model with play-level information.)
Writer’s claims: 0-for-2.
So what does this mean?
The Patriots led the league with one of their lowest ever fumble rates again in 2015, which I have little doubt links to their style of play and their success (such as red-zone plays, playing with the lead, kneel downs, etc).
Despite the new evidence, our podcast guest appears to still be holding onto the idea that something funny is going on. Moreover, he’s doubling down on false claims, ones which at first glance appear reasonable. Like the initial analysis from a year ago, however, it’s mostly a bunch of hot air.
As part of their final assignment in my statistics and sports class, students were tasked with looking at the home advantage in the English Premier League (EPL). In some recent and related work, James Curley and Oliver Roeder found that, by 2014, an EPL home advantage had reached an all time low.
Interestingly, that low reached new depths in 2016.
Home teams have won 40.8% of games this past year, pending this weekend’s final contests. If that mark stands, it would be the lowest in EPL/English Division 1 history, one which dates back to 1888.
Here’s a chart, similar to the one that James and Oliver produced. Overall home team win percentage in each year is shown in black, draw percentage in red, and away win percentage in green. The grey region reflects our uncertainty in the trend curve.
As we knew there’d be, it’s a fairly big drop in win percentage, from roughly 60% to 45% across about 120 seasons. Using this rate of decline, we can expect home teams to win 0% of games by around the year 2400 (I kid).
While win percentage is a useful metric, it’s not perfect, as it doesn’t account for differences in team schedules. If the better teams generally got to host the worse teams, or if winners from the previous year were forced to play one another more often (hi, NFL), overall home win percentage would fluctuate as a result. (Note that I am aware that EPL teams currently face each opponent exactly one time at both home and away stadia, which is nice and balanced. However, I’m not sure if that’s always been the case.)
As a result, a paired-comparison method that can account for the strength of each team, and accordingly estimate a home advantage, could be worth looking at. Using the BradleyTerry2 package in R, and with a hat-tip to James’ engsoccerdata package, I ran a Bradley-Terry model (BTM) with a home-team advantage coefficient within each season. When exponentiated, the coefficient from a BTM reflects a league-wide estimate of the home advantage, taken on an odds scale. As an example, if a team would have a 50% chance of a win at a neutral site (Odds = p/(1-p) = 1), they’d have a 60% win probability with a 50% increase in odds (Odds = 0.6/(1-0.6) = 1.5). As additional examples, 100% and 200% increases in odds would bump that 50% win probability team to 67% and 75%, respectively.
Here are the season-level increased odds of a home win across the last several decades. Conditional on team strength, in 2015, the odds of a team winning at home were about 25% higher than that team winning at a neutral site. While still greater than 0, it’s again the lowest mark in league history.
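The odds arithmetic in that paragraph can be checked with a few lines of Python. This only illustrates the conversion between odds and probabilities; it does not fit a Bradley-Terry model, and the coefficient value used at the end is a made-up example.

```python
import math


def prob_to_odds(p):
    """Odds = p / (1 - p)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)


def home_win_prob(neutral_prob, odds_multiplier):
    """Win probability at home, given the neutral-site probability and the
    multiplicative home effect on the odds (the exponentiated coefficient)."""
    return odds_to_prob(prob_to_odds(neutral_prob) * odds_multiplier)


# 50%, 100%, and 200% increases in odds for a team that is 50/50 at a neutral site
examples = {mult: home_win_prob(0.5, mult) for mult in (1.5, 2.0, 3.0)}

# Hypothetical coefficient on the log-odds scale, exponentiated to an odds multiplier
beta = math.log(1.25)
prob_with_beta = home_win_prob(0.5, math.exp(beta))
```

The same multiplier moves a favorite and an underdog by different amounts on the probability scale, which is why the home advantage is reported on the odds scale.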
It’s worth pointing out that the BTM coefficients are estimates, and do come with standard errors attached (generally about 0.12 on the log-odds scale). Even so, it’s interesting that the last 5 seasons all rank among the lowest 10 seasons as far as an estimated EPL home advantage.
James and Oliver posit several reasons as to what caused the drop in home advantage, including ease of travel and referee awareness. Their work also shows that the primary impetus behind fewer home wins is fewer home goals. Interestingly, while home advantage has also seemingly dropped in the NBA, it’s stayed relatively consistent in the NFL, MLB, and NHL.
It’s straightforward to link EPL results with those in other professional soccer leagues. James’ data also includes La Liga (Spain), Serie A (Italy), and the Bundesliga (Germany), so I’ll use those.
Here’s a plot of win percentage across time in each of the four leagues. It’s sort of a cluster, but other leagues seem to match the EPL in terms of a home advantage in recent seasons.
Finally, we can use the BTM in each league in each season to get the odds of a home win relative to that game being played on a neutral field. Here’s that graph.
-It’d be interesting to go back through each league’s schedule to identify where and why the two graphs above yield different results. For example, there’s a noticeable win percentage gap between the EPL and Serie A in 1975 that’s not apparent when looking at the BTM coefficients.
-The Bundesliga’s home advantage shape is quite strange in the years between 1962 and 1982, showing an inverse quadratic trend over time. I have no logical explanation for such an association. (In fairness, I wasn’t born yet.)
-England switched to a 3-point rule in 1981, while Germany, Italy, and Spain waited until the mid-1990’s. Behavioral economists would do well to look at the impact of rule changes using graphs like these. Generally, most related work only uses a few seasons of play (token plug for my NHL article).
-Nate Silver’s World Cup prediction model was taken to task after its overwhelming optimism for the host Brazilians in 2014. Given the results above, it’s possible that a decline in home advantage across soccer played a role. Nate’s model surely inflated Brazil’s chances because of what was anticipated to be a noticeable benefit of playing at home. But if much of the home advantage that used to exist in soccer was no longer a part of the game, it could explain why a prediction model would do so poorly for the hosts yet do so well when other teams played.
NHL Game 7’s are awesome, and in the next two days, we get two such contests – Dallas vs. St. Louis and San Jose vs. Nashville.
Here’s a primer on what to expect with respect to team behavior in these games. All of my findings used data from the nhlscrapr package in R for the 10 years of postseason action between 2006 and 2015.
Fewer penalties, slightly fewer goals in Game 7’s.
Nate has a nice chart covering the tendency for teams to accumulate fewer penalties in game 7’s, relative to other games in the series. I found roughly the same thing; about 11 total non-matching penalties per game during the first six contests of a series, compared to just seven in game 7’s. That’s about eight more minutes of even strength to expect in a Game 7.
Perhaps the fewer power plays awarded in game 7’s are driving a small but noticeable difference in the total number of goals; while games 1-6 average 5.3 goals per game, game 7’s average 4.9.
Penalties are less likely to be called at the beginning of Game 7’s.
Here’s a chart of the per-minute penalty rate comparing game 7’s to games 1-6. Each line reflects a smoothed curve, and the grey area reflects our uncertainty in each curve’s trend. Rates are adjusted to reflect the number of penalties we would expect if the rate of whistles for that minute of play were extended for an entire game.
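The rate adjustment amounts to counting penalties in each minute, averaging over games, and multiplying by 60. Here is a small Python sketch; the function name and the penalty minutes are invented, and it assumes 60 minutes of regulation with minutes indexed 0 to 59.

```python
def per_game_pace(penalty_minutes, n_games, game_length=60):
    """For each minute of play, the number of penalties per game we would see
    if that minute's whistle rate held for the whole game.

    penalty_minutes: minute index (0..game_length-1) of every penalty, pooled over games.
    """
    counts = [0] * game_length
    for minute in penalty_minutes:
        counts[minute] += 1
    return [game_length * count / n_games for count in counts]


# Hypothetical pooled data: 4 penalties observed across 10 games
toy_minutes = [0, 0, 5, 59]
pace = per_game_pace(toy_minutes, n_games=10)
```

In this toy data, two penalties in the opening minute across 10 games works out to a pace of 12 per game for that minute. These per-minute paces are the raw values that the smoothed curves then run through.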
The biggest difference in penalty rates between game 7’s and other games in a series looks to be the first period, where penalties are consistently called less often. This could be a combination of factors – players adopting a safer style of play, for example, or a referee’s hesitancy to call possible violations. Interestingly, these results mirror those from the NFL, where many judgement calls are rarely whistled to start a game.
There are also smaller differences in periods 2 and 3, and marked differences in the game’s final minute. However, note that rate differences at the end of the game are perhaps not too surprising given the frequent scrums in earlier games where teams do silly things like trying to “send a message.” Sidenote: I wonder how style of play would change if penalties at the end of a game carried over to a team’s next contest.
Here’s a similar plot looking at goals (empty net goals were excluded).
Note that the standard error bars for each curve were fairly large and overlapped throughout a game, and so differences between the two curves should be taken with a grain of salt. There is slightly less scoring throughout most of game 7’s, particularly at the end of the first period and at the beginning of the third period. Teams operate at about a two goals-per-game pace to begin the third period of game 7’s, for example.
More pressure, more call reversals?
In work a few years back (ignore the Excel chart! I was learning R at the time), Kevin and I found slightly higher rates of make-up calls in Game 7’s, relative to other games in a series. This came on top of the higher frequency of make-up calls in the postseason, relative to regular season action. Here, I’m defining a make-up call as one that works to even out the total number of penalties each team has.
A few years later, that trend still seems to hold. When a home team has exactly one more penalty than its opponent, it has received the next power play 58% of the time during the regular season. That number jumps to 62% in postseason games 1-6 and 68% in game 7’s. When owed two or more penalties, the home team has been awarded the next power play 61% of the time in the regular season, 65% of the time in games 1-6, and a remarkable 78% of the time (18 of 23 sequences) during game 7’s. So, if the home team gets behind on penalty differential, expect that to even up by game’s end.
Differences in postseason call reversals are not as evident when looking at whether away teams are owed power plays. Across each game number, away teams that are owed penalties are given the next power play about 57% of the time.
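For anyone curious how a make-up call rate like this can be tallied, here is a simplified Python sketch. It is an assumption-laden illustration, not the original analysis: each game is reduced to an ordered string of penalized teams ('H' or 'A'), matching penalties are assumed to be already removed, and the sequences are invented.

```python
def next_power_play_share(games, deficit=1):
    """Share of the time the home team gets the next power play when it has
    exactly `deficit` more penalties than the away team.

    games: list of strings, one per game, listing the penalized team in order
           ('H' = penalty on home team, 'A' = penalty on away team).
    """
    owed = 0      # situations where the home team was "owed" a call
    awarded = 0   # of those, how often the next penalty went against the away team
    for sequence in games:
        diff = 0  # home penalties minus away penalties so far
        for team in sequence:
            if diff == deficit:
                owed += 1
                if team == "A":
                    awarded += 1
            diff += 1 if team == "H" else -1
    return awarded / owed if owed else float("nan")


# Hypothetical penalty sequences from three games
toy_games = ["HAHA", "HHA", "AHHH"]
share = next_power_play_share(toy_games, deficit=1)

game7_share = 18 / 23  # the Game 7 figure quoted above
```

Swapping the `diff == deficit` check for `diff >= 2` would give the "owed two or more" version.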
A few days back, I summarized recent research on the NFL draft.
One interesting anecdote was the reliance of nearly all NFL draft research, both academic and non-academic, on Pro-Football-Reference’s approximate value (abbreviated AV), and its player-level career summary (cAV), a metric designed by Doug Drinen and described by Neil Paine in a series of articles here.
So, part of this post is designed to look at AV.
Before I start that, however, here’s a plug for those looking to do work in statistics in sports. We need more and better player-level outcomes in the NFL. Hockey, baseball, and basketball each have a half-dozen metrics which have or are being used and tested in the public sphere. As examples, we know which catchers are good at framing pitches, whether or not hockey teams should dump and chase or carry-in, and, for goodness sake, we have Nylon Calculus dominating the NBA scene on a daily basis. Meanwhile, football is mostly stuck on fourth downs and counting statistics. It will be difficult, but it is time for football fans to step up their games.
On today’s agenda:
What can we learn about Approximate Value?
Why does it matter?
Coming in the future:
How else can we analyze the NFL draft?
Pro-Football-Reference describes AV as “putting a single numerical value on any player’s season, at any position, from any year.”
Points are awarded to offenses on the basis of team points per drive, relative to the league average. These points are then divided up among members of each offensive unit. A similar strategy is also used for defensive players based on points against. In addition, bonus points are given for being named an All-Pro, and points are also awarded for making the Pro Bowl at certain positions. Career AV numbers are weighted by season-level ones to account for varying numbers of seasons played. A more complete description is found here.
Here’s a rough summary of cAV using quarterbacks, one which passes the smell test: JaMarcus Russell is a 6, Tim Tebow a 12, Christian Ponder a 22, Ryan Fitzpatrick a 61, Alex Smith a 72, Matt Ryan a 96, Big Ben a 96, and Tom Brady a 160.
In understanding AV, it is useful to start by looking at the distributions by position among players drafted before 2010. Here’s a plot of career AV (cAV), split by era. I left out players drafted after 2010 as they have not had much of a chance to accumulate a representative career number.
A few things stand out.
-The distribution of cAV by position appears consistent between eras. That is, the shapes of each boxplot are roughly the same within each position, moving from era to era. This seems interesting, and arguably debatable, given changes to the game since 1967.
-At all positions, more than 50% of players accumulate a cAV of less than 10 (recall: Tim Tebow’s was 12). This makes sense, given the number of players drafted that barely play a game. The right skew at each position also seems reasonable.
-Tight ends appear to be the lowest-valued position, with offensive linemen and linebackers the highest, as judged by median career AV and density in the right tails.
-Relative to what I perceive of the last several years of the NFL, as well as what teams are paying top players, there seems to be a bit of a disconnect. For example, quarterbacks have long been paid substantially more than players at other positions, implying that the best ones are worth more to their teams than the best players at other positions. Other than a few quarterbacks drafted in the 1990s, this doesn’t show up in cAV.
It’s straightforward to link draft position to cAV.
Let’s start by fitting cAV to draft position across all seasons and positions. Each point below is a player, and the blue curve is a non-parametric smoother, which accounts for the likely non-linear association between draft position and performance. If this curve looks familiar, it looks like the one used in several of the draft-trade charts in circulation.
One question that I am really curious about is whether or not teams have gotten better at drafting since the late 1960’s. An obvious thing to do would be to fit an identical curve to the one shown above across different eras. If curves in recent years are higher and/or steeper, it would suggest that teams in today’s NFL are better at identifying talent and are drafting those players earlier.
Unfortunately, there’s a problem with that strategy. Turns out, cAV has grown over time. Here’s the average player cAV among the draft’s first 260 picks (this is the number of picks in the last few years; the trend is identical when using different cutoffs). The grey area accounts for our uncertainty in the trend curve.
The average cAV of players drafted in 2006 is about 3-4 units higher when compared to players drafted four decades ago. It’s a significant increase as well as a practical one. All together, on a yearly basis, about 1000 more units of cAV are being handed out now than in the early stages of the league. [Edits: This matches conclusions found by Danny at Football Outsiders. Sean proposes that expansion is responsible for part of this increase, given that AV is tied to playing time.]
Looking into the formula for AV, it is unclear what is causing these increases. One possibility is that more players have been drafted in recent years in the positions that AV values more than others. Another possibility, albeit a stretch, would blame recent increases in the number of Pro Bowl players, as AV rewards this.
Why does this matter?
Well, as far as looking at draft success over time, increases in cAV are vital to identify. Without adjusting for when a player was drafted, standard analyses could confuse improved league-wide drafting ability with what appears to be era-specific changes in our outcome measure.
To sum, we know that drafting players in any sport is hard given the extrapolation required to project the performance of young athletes several years down the line. In addition, analyzing the NFL draft is made even more difficult given what is currently a poor set of outcome measures.
Much of today’s draft research uses cAV, a promising metric which is attractive in that it’s a single number accounting for many team and individual level performance metrics. However, we showed above a few potential weaknesses in cAV, at least with respect to its use in looking at the NFL draft. Overall:
-Approximate value mostly passes the smell test as far as players ranked high and low.
-Arguably, position-level AV measures don’t seem to perfectly match league-wide trends.
-Valuations are increasing over time.
Under these constraints, one reasonable recommendation for using cAV is to make it year and position specific. In future work, I’ll propose one way of doing this.
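To make the idea of a year and position specific cAV concrete, here is a minimal Python sketch of one possible adjustment: standardizing each player's cAV against other players drafted in the same year at the same position. This is only an illustration and not necessarily the approach proposed in the follow-up work. The field names (`year`, `pos`, `cav`) and the toy records are made up for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_within_groups(players):
    """Add a z-score of cAV computed within each (draft year, position) group.

    `players` is a list of dicts with hypothetical keys "year", "pos", "cav".
    """
    groups = defaultdict(list)
    for p in players:
        groups[(p["year"], p["pos"])].append(p["cav"])

    adjusted = []
    for p in players:
        values = groups[(p["year"], p["pos"])]
        mu, sd = mean(values), pstdev(values)
        # Guard against groups with no spread (e.g. a single player).
        z = (p["cav"] - mu) / sd if sd > 0 else 0.0
        adjusted.append({**p, "cav_z": z})
    return adjusted

# Made-up toy data: the later class has uniformly higher raw cAV.
toy = [
    {"year": 1970, "pos": "QB", "cav": 10},
    {"year": 1970, "pos": "QB", "cav": 20},
    {"year": 1970, "pos": "QB", "cav": 30},
    {"year": 2005, "pos": "QB", "cav": 14},
    {"year": 2005, "pos": "QB", "cav": 24},
    {"year": 2005, "pos": "QB", "cav": 34},
]
adjusted = standardize_within_groups(toy)
```

In the toy data the best player in each class ends up with the same adjusted score, even though the raw cAV is 4 units higher in the later class. Because cAV is right-skewed, a within-group percentile rank may be a more sensible choice than a z-score in practice.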
Another NFL draft has come and gone, and with it have come the predictable displays of unyielding optimism, stale and arguably race-based generalizations of player skill, and, as a relative newcomer in 2016, lazy misuse of the term analytics.
In following along this spring, it became clear that what is mainstream knowledge among researchers is far from it in the national media. This despite a decent amount of both academic and non-academic research into the topic.
For those new to the scene, or even for a few veterans who may have missed an article or two along the way, I decided to write a quick review of what’s out there. Note that many of the following points are related to one another.
1. Top draft picks are overvalued.
In an efficient market, the value of picks traded between teams would be equivalent. That is, if Pick X was traded for Picks Y and Z, the value added by picking in spot X should equal that of picking in spots Y and Z. Interestingly, in a pair of well-regarded papers from researchers Cade Massey and Richard Thaler, the authors identified marked inefficiencies in how NFL franchises value different draft picks.
Turns out, the preferable solution is almost always to acquire more picks. In our example, better to have Y and Z than X alone. It’s why the common suggestion among many who have studied the draft is that, because teams overvalue earlier picks, it is generally a good idea to trade down. The edge goes to those who take advantage of the overconfidence shown by others.
In addition to being overvalued by league officials, players picked at the top of the draft have also tended to offer less of a return on their investment than those picked at the end of round one. To be specific, the NFL rookie pay scale is nearly fixed, and players picked near the top of the draft get paid substantially more. As a result, for several years, it was optimal to pick late in the first round or early in the second, where the expectation was to obtain a decent player with a substantially more palatable contract. See Figure III in Massey & Thaler’s paper here, as one example. The authors aptly call this a loser’s curse; bad NFL teams are stuck picking at the top of the first round, where the expected return is actually lower.
Note that given the new CBA, which updated rookie contracts in 2011, it is less clear if such a surplus value still exists in today’s NFL.
3. Teams are, by and large, making guesses
One of the most consistent findings in current literature is that a team’s prowess in picking good players doesn’t translate from one season to the next. Chase reached such a conclusion a few years ago, and I really liked Neil’s scatter plot below, which shows the lack of a year-to-year consistency in team-level returns over expectation.
To sum, despite all that you’ll hear to the contrary, there’s no evidence to date that an organizational-level ability to identify talent in the draft is repeatable.
And if teams are unable to consistently identify good picks in the draft, it’s safe to say that draft pundits can’t, either. In fact, when I compared Mel Kiper’s draft grades to his re-grade of that same draft after a few seasons of play, the link was weak. That is, Kiper’s post-hoc analysis of a draft barely drew any of the same conclusions as his first one.
So, whenever you hear team officials boasting about an incoming player – or even when looking at the so-called draft ‘grades’ put out everywhere – it’s not much different than listening to someone saying they like heads in a coin flip or red on the roulette table.
4. Sunk costs make analysis difficult.
Let’s take an interlude to talk fantasy football. Imagine that you drafted Peyton Manning early in 2016, making a substantial investment in a quarterback who you anticipated would lead your squad to greener pastures. Turns out, Manning stunk from week 1 onwards. But there he was in your week 5 line-up, then again in your week 10 line-up, and so on. You couldn’t move on!
Turns out, NFL organizations also suffer from the same problem. Such was the conclusion of Quinn Keefer, who found that players chosen at the end of Rd. 1 received substantially more playing time than those who were picked at the beginning of Rd. 2. In other words, the premium paid for first round picks was responsible for eventual increases in games started. Keefer blames sunk costs, in which prior investments have led to teams making irrational decisions down the road. This is also what ruined any fantasy player who stuck with Manning.
This result matters when it comes to analyzing player success. If organizations are afraid to cut loose players that they drafted early, it would muddle any link between draft round and player success, and it could artificially inflate the link between early draft picks and long term success.
5. What else is out there?
I explored a few major findings above, but here is a brief synopsis of other interesting material.
-There may be position-level variability as far as identifying talent. Of course, this becomes complicated at some positions, such as linebacker. Write Zach & Rob, “It’s also possible that players with good pedigrees and name recognition are being given preferential treatment when awards are granted, in the absence of meaningful performance data (like what exists for offensive skill players) to challenge our perceptions.”
-Neil and Alison look at position-level value here.
-In a more recent article, a pair of authors conclude that picks made by teams that trade up tend to outperform picks made when teams stand pat, as judged by games started and approximate value in a player’s first three years. However, my initial reaction to this work is one of skepticism, in large part because of what we went over earlier about sunk costs. If teams trade up for a player, that investment could lend itself towards a team aggressively showcasing him. Additionally, the authors do not mention the often prohibitive cost of trading up, which could offset any gains in surplus drafting.
-In the next few days, I’ll have some posts that look at the history of the draft and team success. More to come.
So what did we learn?
It is extremely difficult to consistently identify talent in the NFL draft. However, there are inefficiencies, which generally arise when teams overvalue players by trading up to acquire them. Such displays of confidence belie a process which is, by and large, no different than picking heads or tails before a coin toss.
In probability class, one of the most frequently cited examples is the birthday problem: Given a class of N students, what is the probability that 2 or more share the same birthday?
While the formula for calculating the answer is straightforward, the true lesson from the birthday problem – that there’s a difference between “any 2 students” and “2 particular students” – hits much quicker when simplifying the question by using real data.
In other words, instead of coming up with formulas to calculate exact probabilities (which, even in the birthday problem, require making some unjustifiable assumptions*), it’s also worth looking at data sets of different sizes N and estimating the probabilities using those results.
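For readers who want to see both routes side by side, here is a minimal Python sketch. It computes the textbook answer under the usual assumption that all 365 birthdays are equally likely and independent, and then estimates the same probability by repeatedly drawing simulated classes. The function names are my own, and the simulation is only a stand-in for the real class rosters one would ideally use.

```python
import random

def birthday_exact(n, days=365):
    """P(at least two of n people share a birthday), assuming birthdays
    are independent and uniformly spread over `days` days."""
    if n > days:
        return 1.0
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

def birthday_simulated(n, trials=20000, days=365, seed=1):
    """Estimate the same probability by simulating `trials` classes of size n."""
    rng = random.Random(seed)
    shared = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) < n:  # at least one repeated birthday
            shared += 1
    return shared / trials

exact_any_pair = birthday_exact(23)          # "any 2 students" in a class of 23
estimated_any_pair = birthday_simulated(23)  # simulation-based estimate
one_particular_pair = 1 / 365                # "2 particular students"
```

With 23 students, the chance that any pair matches is a little over one half, while the chance that two particular students match stays at 1 in 365. That gap is the lesson the birthday problem is meant to teach.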
It’s with such a mindset that I inquisitively read the following article on Yahoo, summarizing a South Carolina town in which 5 residents have recently made the National Football League. Here’s the headline:
The odds that a town of 1,000 would produce five NFL players in 25 years are long. Really long. Like 1 in 10 million billion long.
Those are really long odds – you’re more likely to, among other things, record a perfect March Madness bracket (1 in 128 billion) or win the Powerball (1 in 238 million).
So where’d 1 in 10 million billion come from? To his credit, the article’s author, Eric Adelson**, appears to have done his due diligence, having asked for assistance in estimating these probabilities.
How rare is it? Yahoo Sports asked a handful of experts and mathematicians around the country. One couldn’t come up with an answer. Jeffrey Forrester, associate professor of math at Dickinson College (Pa.), put the chances at approximately 0.0000000000797. Yes, that’s 10 zeros. (Forrester notes the odds of being dealt a royal flush are 0.00000154, or about 20,000 times more likely.) Dominic Yeo, who is studying math at Oxford, approximated the probability as 1 in “ten million billion.”
It’s not in my place to question the calculations of other mathematicians – particularly without seeing the details – but as a statistician, why estimate unknown probabilities when we can use real data?
So that’s what I did.
My question is whether or not the town in question (Lamar, SC) is truly an outlier, or if any of the other 43,000 United States municipalities can boast a similar claim. If it’s the latter, we can be pretty confident that the 1 in 10 million billion claim is overzealous.
To collect the data, I scraped*** birth cities of NFL players from Pro-Football-Reference and merged those with 2014 population information provided by the Census Bureau. Looking at players-per-1000 people (Ex: Lamar has 5.12), and only using towns with at least 2 NFL players, here are the top 20 places in terms of NFL production rates. The table is sorted by Ratio_1000, which is the ratio of players per 1000 residents.
By this measurement, Lamar ranks 19th in the country as far as a ratio of NFL players-per-capita. While many of the towns in front of Lamar only have 2 NFL players (Lamar has 5), and a few stand out as towns that may be drawing from larger populations than given in the census, it’s difficult to conceive of Lamar as one in ten-million-billion when it’s not even ranked near the top of the United States.
As examples, Town Creek, Alabama (6 players, population 1076) and Gloster, Mississippi (6 players, population 916) stand out more than Lamar does.
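The ranking itself is simple arithmetic. As a small Python sketch, here is the players-per-1000 calculation applied to the two towns just mentioned, using the player counts and populations quoted above. The function and variable names are mine, and the two-player minimum mirrors the filter described for the table.

```python
def players_per_1000(players, population):
    """NFL players per 1000 residents (the table's Ratio_1000 column)."""
    return 1000 * players / population

# (town, NFL players, population), taken from the text above.
towns = [
    ("Town Creek, Alabama", 6, 1076),
    ("Gloster, Mississippi", 6, 916),
]

MIN_PLAYERS = 2  # only towns with at least 2 NFL players are ranked

ranked = sorted(
    (
        (name, players_per_1000(players, population))
        for name, players, population in towns
        if players >= MIN_PLAYERS
    ),
    key=lambda row: row[1],
    reverse=True,
)

for name, ratio in ranked:
    print(f"{name}: {ratio:.2f} players per 1000 residents")
```

Both towns come out above Lamar's 5.12, which is what puts them ahead of it in the table.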
We can also plot our data.
Focusing on all US towns with between 800 and 1200 residents, here’s a histogram of the number of NFL players from each. Note that I dropped the roughly 3,000 towns in this category without NFL players, as it made looking at the towns that have produced NFL players nearly impossible.
There are several other towns of Lamar’s size with similar numbers of NFL players, and 10 towns have at least 3 players.
All of which brings us back to the birthday-style probability questions:
What is the probability that Lamar itself happens to be one of the small towns with an extreme number of players (4 or more)? Probably about 1 in 700, which is many orders of magnitude more likely than the estimate given by Yahoo.
And what is the probability that any town of Lamar’s size could have produced that many players? Given what we see in the data, this estimate is seemingly much, much, higher than 1 in 700.
So why are our estimates so different? Primarily, NFL-talent production rates are not independent across United States towns. Not every town has football, and towns that produce more NFL talent are, for several reasons, more likely to continue to do so. There’s also a spatial structure to our production rates; note that in our table above, nearly every town is in the Midwest or South. So when there’s independence neither within nor between towns in the likelihood of a resident making the NFL, standard methods of calculating probabilities, which rely on independence to use multiplication, are inappropriate.
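A small Python sketch can show how much the independence assumption matters. It compares the chance of a town producing 4 or more players under two models with the same average rate: one where every town shares a single rate, and one where a small share of "football towns" has a much higher rate. Every number below is made up purely to illustrate the mechanism; none is estimated from the scraped data.

```python
import math

def poisson_tail(lam, k, terms=60):
    """P(X >= k) for X ~ Poisson(lam), summing the upper tail directly
    to avoid the rounding error of computing 1 - P(X < k)."""
    return sum(
        math.exp(-lam) * lam**i / math.factorial(i)
        for i in range(k, k + terms)
    )

# Hypothetical inputs (assumptions, not estimates):
mean_rate = 0.05       # average NFL players per small town
hot_share = 0.02       # share of towns that are "football towns"
hot_rate = 2.0         # expected players in a football town
# Rate for the remaining towns, chosen so the overall mean is unchanged.
cold_rate = (mean_rate - hot_share * hot_rate) / (1 - hot_share)

# Model 1: every town has the same rate.
p_homogeneous = poisson_tail(mean_rate, 4)

# Model 2: a mixture of football towns and everyone else.
p_mixture = (
    hot_share * poisson_tail(hot_rate, 4)
    + (1 - hot_share) * poisson_tail(cold_rate, 4)
)

print(f"same rate everywhere: {p_homogeneous:.2e}")
print(f"mixture of town types: {p_mixture:.2e}")
```

Even with an identical average, letting towns differ makes an extreme count thousands of times more likely in this toy setup. That is the same direction as the gap between the multiplication-based estimates and what the data show.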
Fortunately, in this and many other cases, data can come to our rescue.
*In this case, the incorrect assumption is that birthdays are evenly distributed throughout the year
Given a questionable man advantage, Chicago capitalized, as Duncan Keith scored for a 2-1 lead.
When reviewing the play during the second period intermission, NBC analyst Keith Jones made an interesting observation. While I don’t have his comments word-for-word, here is roughly how Jones reacted to seeing the questionable call and the resulting Keith goal.
“So if you’re Chicago now, you have to play extra careful because the officials are going to be looking to give St. Louis a chance next.”
The insinuation from Jones is that with Keith scoring, the refs were going to be extra sure that the next penalty would be called on Chicago. Of course, that’s what happened next, as Andrew Ladd was whistled for interference a few minutes later. Vladimir Tarasenko scored for St. Louis, and the Blues eventually won 4-3.
Jones’ comments are right in my wheelhouse. It is well established that NHL referees (and perhaps those in other sports, too) are prone to make calls that even up each team’s total number of penalties. This can be termed a biased impartiality – in an effort to appear impartial, refs no longer make impartial decisions.
But if refs vary the frequency of even-up calls based on whether or not the team with the initial power play scored, it would add an extra dimension to understanding officials’ decision making.
So, I decided to look into whether Jones was onto something.
A complete analysis linking penalty likelihood to previous power play performance would take some advanced modeling techniques, given the difficulty in separating possible associations to other game factors, including score effects and the impact of overall penalty differential.
But if power play performance is impacting future referee decisions, an easy place to start would be to look at the game’s second penalty, while considering the following two questions:
(i) Is it an even-up call? (In other words, is it called on the team that did not receive the first penalty?)
(ii) Did the team on the first power play score?
By looking at penalties early in the game, there’s less of a chance that score-effects and complex penalty-differential effects are impacting referee decisions. Additionally, most NHL games have at least two penalties, so we are roughly getting one observation from each game.
It will also be easy to separate any possible effects by whether or not the home team was owed a penalty, as well as season type (regular, postseason).
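The two questions boil down to a pair of yes/no flags per game, which can then be tabulated. Here is a minimal Python sketch of that bookkeeping. The input layout (a list of penalized teams in order, plus a flag for whether the first power play scored) is an assumption made for illustration and is not how nhlscrapr stores its play-by-play events. The toy games are invented, with the first one patterned after the St. Louis and Chicago sequence described above.

```python
from collections import defaultdict

def classify_game(penalized_teams, first_pp_scored):
    """Answer questions (i) and (ii) for one game.

    penalized_teams: team labels in the order penalties were called
                     (each label is the team that was penalized).
    first_pp_scored: True if the first power play produced a goal.
    Returns (first_pp_scored, even_up), or None with fewer than two penalties.
    """
    if len(penalized_teams) < 2:
        return None
    # Even-up call: the second penalty goes against the other team.
    even_up = penalized_teams[1] != penalized_teams[0]
    return (first_pp_scored, even_up)

def even_up_rates(games):
    """Share of second penalties that are even-up calls, split by
    whether the first power play scored."""
    counts = defaultdict(lambda: [0, 0])  # scored -> [even-up calls, games]
    for penalized_teams, scored in games:
        result = classify_game(penalized_teams, scored)
        if result is None:
            continue
        scored_flag, even_up = result
        counts[scored_flag][0] += int(even_up)
        counts[scored_flag][1] += 1
    return {flag: hits / total for flag, (hits, total) in counts.items()}

# Invented toy games: (penalized teams in order, did the first power play score?)
toy_games = [
    (["STL", "CHI"], True),          # penalty on STL, CHI scores, next call on CHI
    (["CHI", "CHI"], False),         # no goal, second call on the same team
    (["STL", "CHI", "STL"], False),  # no goal, second call evens things up
    (["CHI"], False),                # only one penalty, so the game is skipped
]
rates = even_up_rates(toy_games)
```

Splitting by home or away team, or by regular season versus postseason, would only require adding those labels to the grouping key.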
As usual, I’ll use the nhlscrapr package in R. This includes play by play events from all regular and postseason contests since the 2002-03 season (roughly 27,000 games). I decided to drop penalties that were assessed simultaneously, as many of these are matching minors that did not yield a man advantage. However, results were similar when not doing this.
Overall, teams with the first power play of the game converted 19% of the time. When they scored, the ensuing power play was awarded to the opposite team 60.8% of the time. When they didn’t score, the ensuing power play was awarded to the opposite team 57.9% of the time. At a tick under 3 percentage points, the difference is statistically significant, albeit not an overwhelming one in terms of practical significance.
Interestingly, the results were much stronger in the postseason. Teams scoring on the game’s first power play were called for the next penalty 69.3% of the time, compared to 62.2% of the time when they didn’t score, for an absolute difference of 7.1 percentage points. Across all games, the difference in make-up call likelihood was slightly stronger for the home team (3.2%) than the away team (2.6%), conditional on the first power-play chance being converted.
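One standard way to check whether a gap like 60.8% versus 57.9% could be due to chance is a two-proportion z-test, sketched below in Python. The test used in the original analysis is not stated here, so treat this as one reasonable option and not a reproduction. The game counts passed in at the bottom are hypothetical round numbers chosen only to match the two overall percentages, since the underlying counts are not reported.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1 of n1 successes in group 1, x2 of n2 in group 2.
    Returns (z statistic, two-sided p-value) using the pooled standard error.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal tail probability: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts (assumptions): 3040 of 5000 is 60.8%, 11580 of 20000 is 57.9%.
z, p_value = two_proportion_z_test(3040, 5000, 11580, 20000)
print(f"z = {z:.2f}, p = {p_value:.5f}")
```

With sample sizes in the thousands, even a gap of about 3 percentage points sits well outside what chance alone would produce, which fits the description of a result that is statistically significant but modest in practical terms.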
It’s far from exhaustive, but there does appear to be a possible link between power play success and the evening up of penalties, as initially suggested by Jones, the NBC studio host.
Of course, there are other factors at play, even in our simplified analysis. If teams score on the first power play of the game, they are more likely to be playing with the lead. And because teams with the lead tend to possess the puck less frequently, they are potentially more likely to pick up penalties. In this sense, looking early in the game, when penalties and possession are less likely linked to overwhelming score effects, is preferable. Interestingly, the effect of previous power play performance was actually stronger when the game was tied at the time the second penalty was called.
In any case, NHL refs wouldn’t be the first ones accused of basing calls on what should be independent factors. It’s not uncommon for NBA officials, for example, to be accused of a late whistle – that is, deciding whether or not to call a foul based on whether or not the offensive player’s shot went in. As in the NBA, where the refs may feel bad for the offensive player who missed a shot after a borderline foul, NHL refs could seemingly feel bad for the defensive team after giving up a power play goal after a borderline penalty.
Anyways, here’s a plot using the overall rates. As referenced earlier, not a huge difference, but not nothing, either.
*Here’s the code. The code took a bit of care, as the nhlscrapr package does not contain power play success outcomes in the set of play-by-play events. I considered a power play a success if there was a goal within 120 seconds of the penalty scored by the team that received the power play. Feel free to play around if you’d like – I’m using the grand.data file from nhlscrapr.
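For readers not working in R, the success rule described above can be sketched in a few lines of Python. This is not the linked code and does not use nhlscrapr's actual column names. The inputs, a penalty time in seconds, the penalized team, and a list of (time, scoring team) goals, are a simplified layout assumed for illustration.

```python
def power_play_scored(penalty_time, penalized_team, goals, window=120):
    """Return True if the team on the power play scored within `window`
    seconds of the penalty.

    penalty_time:   game clock in elapsed seconds when the penalty was called
    penalized_team: label of the team that took the penalty
    goals:          iterable of (elapsed_seconds, scoring_team) tuples
    """
    return any(
        penalty_time < goal_time <= penalty_time + window
        and scoring_team != penalized_team
        for goal_time, scoring_team in goals
    )

# Toy example: penalty on STL at 600 seconds, CHI scores 45 seconds later.
toy_goals = [(645, "CHI"), (1500, "STL")]
converted = power_play_scored(600, "STL", toy_goals)
```

The sketch ignores complications such as major penalties, overlapping penalties, and periods ending mid power play, all of which a full analysis would need to handle.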