After six years of graduate school – two at UMass-Amherst (MS, statistics), and four more at Brown University (PhD, biostatistics) – I am finally done (I think). At this point, I have a few weeks left until my next challenge awaits, when I start a position at Skidmore College as an assistant professor in statistics this fall. While my memories are fresh, I figured it might be a useful task to share some advice that I have picked up over the last several years. Thus, here’s a multi-part series on the lessons, trials, and tribulations of statistics graduate programs, from an n = 1 (or 2) perspective.
Part IV: What I wish I had learned in my graduate program in statistics (with Greg Matthews) The point of this series is to be as helpful as possible to students considering statistics graduate programs now or at some point later in their lives. As a result, if you have any comments, please feel free to share below. Also, I encourage anyone interested in this series to read two related pieces:
Researchers have long and loudly banged the drum that NFL teams should be more aggressive on fourth downs. Among other sources, see recent work at the 4th down bot and Brian Burke’s primer here, the latter of which links to a handful of excellent academic papers.
However, there’s no evidence that any of this work has made a difference as far as team behavior. For example, in close games (two possessions or less) prior to the fourth quarter, teams went for it 6.4% of the time in 2015, nearly identical to the 6.5% in the year 2000. In fact, that number was as low as 4.8% in the year 2011.
But perhaps just as many of us analytically inclined fans were ready to give up, there seemed to be a few more aggressive plays in week 1 of the 2016 season, highlighted by Antonio Brown’s fourth-down touchdown grab on Monday night.
League-wide, was this a meaningful uptick in aggressiveness?
For now, the answer is a slightly unsatisfying maybe.
Using Armchair Analysis’ excellent data, I grabbed every fourth down play since the 2000 season. We’re interested in whether or not teams playing close games (two possessions or less) went for it on fourth down, defined as either attempting a rush or a pass. I filtered out fourth quarter plays, as decisions later in the game are too often dictated by game situation.
In week 1 of 2016, teams went for it on 4th down on 18 of a possible 160 attempts (11.2%). That’s the highest such percentage for week 1 games since 2000, and it’s not particularly close.
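For readers who want to reproduce the filter, here is a minimal Python sketch of the calculation. The field names (`quarter`, `score_diff`, `play_type`) are hypothetical and not Armchair Analysis' actual schema, and treating "two possessions or less" as a margin of at most 16 points is my assumption.

```python
# Sketch only: field names and the 16-point cutoff are assumptions, not the source's schema.
GO_FOR_IT = {"RUSH", "PASS"}  # a rush or a pass counts as going for it


def attempt_rate(fourth_downs, max_margin=16):
    """Share of eligible fourth downs on which the offense ran or passed."""
    eligible = [
        p for p in fourth_downs
        if p["quarter"] < 4 and abs(p["score_diff"]) <= max_margin
    ]
    if not eligible:
        return None
    return sum(p["play_type"] in GO_FOR_IT for p in eligible) / len(eligible)


# Toy plays, invented purely to exercise the function.
toy_plays = [
    {"quarter": 1, "score_diff": 3, "play_type": "PUNT"},
    {"quarter": 2, "score_diff": -7, "play_type": "PASS"},
    {"quarter": 4, "score_diff": 0, "play_type": "RUSH"},   # dropped: fourth quarter
    {"quarter": 3, "score_diff": 21, "play_type": "RUSH"},  # dropped: not a close game
    {"quarter": 1, "score_diff": 0, "play_type": "FG"},
    {"quarter": 2, "score_diff": -10, "play_type": "RUSH"},
]
print(attempt_rate(toy_plays))
```

Grouping the same calculation by season and week would give the points plotted in the chart described next.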
The following chart shows the weekly 4th down attempt percentage in these situations. I included a separate point for each season to give a sense of the season-to-season and the week-to-week variability. The smooth blue line reflects the trend, and the grey area reflects our uncertainty in the average fourth down attempt rate.
The aggressiveness observed in week 1 of 2016 (the red point in the top left) is only exceeded on a week-level basis by 7 weeks total since 2000. Interestingly, the chart also points to the possibility that coaches are slightly more aggressive later in the year, as shown by the increasing trend by week of the season.
It’s safe to say that, given what is traditionally less-aggressive behavior during the early parts of the season, week 1 of 2016 stands out as unusual. That said, there are several caveats. This analysis doesn’t take into account other factors that undoubtedly are linked to fourth-down calls, including field position and opponent characteristics. For example, it’s certainly feasible that there just happened to be more 4th-down attempts in 4th-down friendly spots during week 1.
In any case, it’s certainly worth monitoring as the season progresses.
During Sunday’s NCAA contest between Notre Dame and Texas, the Longhorns faced a second-and-goal from the Irish one yard line midway through the second quarter.
“I wouldn’t be surprised to see a run,” said ABC announcer Todd Blackledge, opining on what type of play the Longhorns should try. “However, I will say that second down is the down to throw if you want to throw.”
Blackledge’s comment is classic football-think, behavior that’s been suggested for decades with no known empirical basis. However, thanks to the supple data of Armchair Analysis, it’s the type of behavior that’s easy to check and quite possibly validate (at least using NFL data). Thus, the two questions I’ll attempt to answer are:
First, how do coaches call plays in goal-to-go situations? Second, how should they call plays in goal-to-go situations?
Armchair’s database contains each NFL play since 2000, which I filtered into goal-to-go plays that occurred on first through third downs. I also cut out fourth quarter plays, so as to worry less about the effects of varying late-game behavior.
How do coaches call plays? Here’s a bar chart showing the percentage of plays that are passes, separated by down and distance (Note: called passes include sacks).
With one yard to go, roughly one in four plays are passes, which is roughly the same on first, second, and third downs. Across other distances, coaches are fairly consistent in their desire to run on first down and throw on third down, with second down decisions roughly a 50-50 split.
Interestingly, there is no noticeable spike in teams calling pass plays on second down, at least not relative to their behavior at other downs and distances. So, while football coaches love to talk about throwing the ball on second and goal, they aren’t necessarily acting that way.
Perhaps the more interesting question is how should coaches call their plays in goal to go situations. The answer is not straightforward. For example, passing plays are more likely to yield touchdowns on longer plays, but they’re also more likely to yield negative plays and plays of no gain.
One possibility is to consider the drive’s eventual point total as the outcome and to work from there. Using the same set of plays, I used the drive’s result (categorized as a touchdown, field goal, or neither) to estimate the average point total given each play call at the various down and distances.
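As a rough illustration of that averaging step, here is a Python sketch. The point values (7 for a touchdown, 3 for a field goal, 0 otherwise) are my assumption, since the post does not state them, and the field names are hypothetical.

```python
from collections import defaultdict

# Assumed point values per drive result; not stated in the post.
POINTS = {"touchdown": 7, "field_goal": 3, "neither": 0}


def expected_points(plays):
    """Average drive points for each (down, yards to goal, play call) cell.

    Returns {cell: (mean_points, number_of_plays)}.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [sum of points, play count]
    for p in plays:
        cell = (p["down"], p["yards_to_goal"], p["call"])
        totals[cell][0] += POINTS[p["drive_result"]]
        totals[cell][1] += 1
    return {cell: (s / n, n) for cell, (s, n) in totals.items()}


# Toy plays, invented for illustration only.
toy_plays = [
    {"down": 2, "yards_to_goal": 1, "call": "run", "drive_result": "touchdown"},
    {"down": 2, "yards_to_goal": 1, "call": "run", "drive_result": "touchdown"},
    {"down": 2, "yards_to_goal": 1, "call": "run", "drive_result": "field_goal"},
    {"down": 2, "yards_to_goal": 1, "call": "pass", "drive_result": "touchdown"},
    {"down": 2, "yards_to_goal": 1, "call": "pass", "drive_result": "neither"},
]
for cell, (avg, n) in sorted(expected_points(toy_plays).items()):
    print(cell, round(avg, 2), n)
```

The play count returned with each average is what would set the circle size in a chart like the one that follows.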
Here’s a chart of expected points, separated by runs and passes.
Each point in the chart reflects the expected point total given a run or a pass at a certain down (1st, 2nd, or 3rd) and distance (x-axis). The size of the circle is proportional to the number of plays called at each point.
Interestingly, across most downs and distances, running plays offer slightly more of a return than passing plays. The differences aren’t overwhelming, but they are consistent, generally in the neighborhood of one or two tenths of an expected point. Notably, there’s no evidence to back up any claim that teams should pass the ball on second down – if anything, it’s the opposite.
These results are somewhat surprising given Kovash and Levitt’s seminal paper on NFL team behavior, which implies that teams should pass more than they currently do. One possibility is that the shorter field lessens the abilities of teams to throw the ball. Anecdotally, and as an example, teams seem to call way too many fade patterns.
All together, what did we learn?
First, there is no obvious truth to the theory that teams are passing more on second down and goal to go situations than we would expect. Second, there’s no evidence for the theory that they should be passing more often than they already are.
If anything, there’s a drop in efficiency on passing plays in goal-to-go situations, which may be showing up in the form of fewer expected points. However, I’m cautious of reading too much into this conclusion, given the nature of this analysis (it’s on aggregate, and doesn’t account for game and play specific factors) and the inherent difficulty in categorizing a team’s decision to run or pass.
The 2016 Joint Statistical Meetings start Sunday in Chicago. Karl Broman put out his calendar – I figured I’d put mine together as well. Here’s a combination of sports, causal inference, and education talks that I noticed. If there’s something related that I should also be at, please let me know!
Also, kudos to Karl for pointing me to the JSM snack page. This is my fourth JSM and somehow this is the first I’ve heard about this. And also to Greg for helping out with JSM’s art show. This is the first of its kind, and hopefully something that grows in the future.
I’ve long had my eye out for intriguing papers that cover my two favorite areas of research, causal inference and sports statistics. For unfamiliar readers, causal inference tools allow for the estimation of causal effects (i.e., does smoking cause cancer) in non-experimental settings. Given that almost all sports data is inherently observational, there would seem to be opportunities for applied causal papers to answer questions in sports (here’s one).
It was with this vigor in mind that I read a paper, the Midweek Effect on Performance: evidence from the Bundesliga, recently posted and linked here. The authors, Alex Krumer & Michael Lechner – the latter of whom has done substantial causal inference work – use propensity score matching to estimate the effect of midweek matches on home scoring.
The authors conclude that:
Playing midweek leads to an effect of about half a point in total, resulting from the home team losing about 0.2 points, while the away team gains about 0.3 points (the asymmetry results from the ‘3 points rule’)..it becomes clear that the home team loses all its home advantages in midweek games.
Interestingly, although matching is used as the primary method, selection effects (i.e., how much the weekend and weekday games differ) are weak. Primarily, conclusions are drawn as a result of the varying point totals described above.
As the authors discuss, several factors could be at play here, most notably referee bias and attendance. The authors also (gulp) suggest that testosterone levels could be linked to the poorer home team performance. In conclusion, Krumer & Lechner recommend that Bundesliga officials work to balance midweek game assignments.
All together, these findings would have a substantial place in sports literature with respect to the drivers of home advantage in sports. The results were so cool, in fact, that my first thought was: let’s replicate them.
Thanks to James Curley’s awesome engsoccerdata package, results from several professional soccer leagues are right at the R user’s fingertips. I started by trying to replicate the findings of Krumer and Lechner using the Bundesliga. Matching aside, our outcome of interest is a difference in difference: the average home point difference between weekend and weekday, minus the average away point difference between weekend and weekday.
In the paper linked above, the authors found a gap of about 0.5 (in favor of weekend games) using the last 8 years of data.
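To make that outcome concrete, here is a minimal Python sketch of the difference in difference. It is not the code used for this post (that was written in R); the field names are hypothetical, and it assumes 3-1-0 scoring for a win, draw, and loss.

```python
from statistics import mean


def points(goals_for, goals_against):
    """League points for one side of a match, assuming 3-1-0 scoring."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0


def weekend_home_edge(matches):
    """(home weekend - home weekday) minus (away weekend - away weekday)."""
    home = {True: [], False: []}  # keyed by whether the match was on a weekend
    away = {True: [], False: []}
    for m in matches:
        home[m["weekend"]].append(points(m["home_goals"], m["away_goals"]))
        away[m["weekend"]].append(points(m["away_goals"], m["home_goals"]))
    home_gap = mean(home[True]) - mean(home[False])
    away_gap = mean(away[True]) - mean(away[False])
    return home_gap - away_gap


# Toy matches, invented for illustration only.
toy_matches = [
    {"weekend": True, "home_goals": 2, "away_goals": 0},
    {"weekend": True, "home_goals": 1, "away_goals": 1},
    {"weekend": False, "home_goals": 0, "away_goals": 1},
    {"weekend": False, "home_goals": 2, "away_goals": 2},
]
print(weekend_home_edge(toy_matches))
```

A positive value means home teams gain more from weekend scheduling than away teams do. Running the function season by season would give a yearly series to plot.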
As good news, so did I.
Using all years since 1960, I estimated the yearly average home weekend advantage. Sure enough, while there didn’t appear to be much of a difference prior to the year 2000, the last 15 years have seen a notable spike; teams are winning more points at home during weekend games. The blue line reflects the trend over time, which has roughly stabilized at half a point over the last decade. The red dotted line reflects no difference.
Teams performed better in home weekend games in all but one of the last 10 years (2011). Meanwhile, in three of those years the difference in point totals was greater than 1. This difference is both statistically and practically significant (although it is important to note that only about 10% of a team’s games are weekday ones). Indeed, the authors’ conclusions seem reasonable.
But replication on the data used by the authors is one thing; validation on another data set (e.g., another league) would go a long way towards confirming a weekday effect in professional soccer.
Fortunately, Curley’s R package contains more than the Bundesliga. I chose the English Premier League (EPL), Spain’s La Liga, and Italy’s Serie A to try and apply the paper’s approach elsewhere. If you want to try another league, feel free to expand upon it using my code posted here.
Long story short, I couldn’t replicate our initial findings in any of the three leagues. There wasn’t a single time-span in any of the EPL, La Liga, or Serie A where there was an additional benefit to teams playing home games on weekends.
Here are the three graphs, made similar to the one above. Note that there are varying x-axes: each league has had different numbers of seasons with weekday games. I went as far back as I could within each league (while also trying to assure continuity).
First, the home weekend advantage in Italy. By and large, there have been no differences.
Second, the home weekend advantage in England. Arguably, the weekend advantage was actually a disadvantage for a while.
Finally, the home weekend advantage in Spain. Again, if anything here, there has been a disadvantage.
To conclude, Krumer and Lechner find evidence of a difference in the home versus away point totals when comparing weekend and weekday games. Over the last decade, the magnitude of these differences has been fairly large – half a point, on average.
That said, while it was encouraging to replicate their findings, it is disconcerting that the replication failed in three other top European leagues. There are obviously differences between the Bundesliga and each of the EPL, Serie A, and La Liga, including the types of weekdays on which each league plays its games (Serie A appears most similar to the Bundesliga in this respect, in not playing Monday games). However, there doesn’t appear to be anything else unique about the Bundesliga which would lend that league, and that league only, to a weekday effect.
That said, returning to the authors’ original approach, this isn’t to say a midweek effect can be entirely discounted. If other leagues have assigned specific teams to midweek games based on past performances, it would mean our approach (a simple difference in difference one) was inappropriate. However, in the absence of this other information, it seems more than plausible that the observed midweek effect in the Bundesliga could be due to chance.
Over the years, there have been several ‘draft curves’ put together in each of the four major North American sports. These charts provide intuitive visualizations of the relative value of each pick, while allowing us to better understand prospect potential and evaluate trades.
Despite the growing popularity of drafts in each sport, I was disappointed to find that there is apparently:
No open-source guideline for how to make a draft curve and/or value chart
No attempt at comparing each of the sports’ draft curves simultaneously.
Those will be my goals here.
To start, I’ll explore how to estimate a draft value curve within a single sport. Then, I’ll compare curves between the NFL, NHL, NBA, and MLB using a pair of figures.
How to make a draft curve
The association between draft pick (x-axis) and player performance (y-axis) is generally non-linear, featuring a steep drop-off between the first few picks and a more steady decline thereafter. This is because the gap in talent between players chosen at picks 1 and 10 is, in expectation, larger than the gap between picks 50 and 60.
Thus, draft curves are most appropriately estimated using a non-linear fit. As examples, here are curves constructed using an exponential decay model, a logarithmic decay curve, locally weighted scatterplot smoothing (loess), and monotonic regression. However, like any fitting process, there is no right answer as far as which curve is most appropriate. While it’s beyond the scope of this report, an interesting project would compare different draft curves on some out-of-sample drafts to identify which type most accurately predicts average player performance. That’s a doable and important task.
The technique adopted here is loess smoothing, which fits low-degree polynomial functions to small subsets of the data, across all of the data. Loess is attractive for estimating draft curves for a few reasons. First, we don’t have to specify any functional form between pick number and player output. This is a big benefit, as instead of guessing what the association between our variables is, we’ll let the data tell us. Relatedly, a loess method is simpler and more flexible than a deterministic approach. As for downsides, the most obvious one is that we won’t be left with a simple equation with which to estimate player performance given a draft position. However, the estimated value of each pick, as well as a measure of uncertainty, can easily be extracted using software.
In fitting a loess smoother, the input most often controlled manually is the smoothing parameter, generally expressed as alpha, which sets the fraction of nearby points used in fitting the curve. Non-technically, alpha controls the jaggedness of each curve; values near 0 allow for a jagged trend, while values near 1 reflect more smoothness. I settled on an alpha of 0.4, which allows for the identification of some rougher edges, while hopefully rounding off others that are mostly due to random fluctuations.
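To show what alpha does mechanically, here is a bare-bones Python sketch of a loess-style smoother. It is a simplified stand-in and not the routine used for the charts in this post: it fits a local straight line (degree 1) with tricube weights at each observed pick, and it produces no uncertainty band.

```python
def loess(xs, ys, alpha=0.4):
    """Local linear smoother: fit a weighted line around each x in xs.

    alpha is the fraction of points that define each local window.
    """
    n = len(xs)
    k = min(n, max(2, round(alpha * n)))  # number of points per window
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # window half-width (guard against zero)
        # Tricube weights: near points count most, points at distance >= h get 0.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Toy check: a smoother built from local lines should reproduce a straight line.
picks = list(range(1, 21))
values = [2 * p + 1 for p in picks]
print(loess(picks, values, alpha=0.4)[:3])
```

With a smaller alpha each window shrinks and the fitted curve follows individual points more closely; with a larger alpha the windows widen and the curve flattens out.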
Show me some charts
Once you have the data, estimating draft curves using a loess smoother within a single sport is doable in a few lines of code.
I started by using the pro-reference sites to scrape draft position and player performance measures in each of the four major sports. While not perfect, I settled on a player’s career wins above replacement (MLB), win shares (NBA), approximate value (NFL), and games played (NHL) as my outcome measures. In the case of the NHL, it may seem strange to use games played as a metric of player success. However, there’s a precedent for using this outcome – Michael Schuckers suggests doing so here. In any case, if you want to try your own outcomes, or change anything you see below, the code for the entire analysis is posted on my Github page.
Here’s a draft curve for NHL drafts between 1990 and 2005, using the first 100 picks and a loess smoother. The area in grey reflects our uncertainty above and below the curve at each pick number, and each dot represents a single player.
As far as games played, the average top pick is somewhere around 850, which is about three times the value of players picked late in the first round, and about six times the value of players chosen around pick 60. By and large, these numbers and this curve pass the smell test. So it’s a good start.
In addition, the NHL curve shows a nice feature of a loess fit which less flexible approaches would not have picked up on: right around pick 30, there’s a significant drop-off in games played. This dip could mean a few things. First, teams could be more willing to play their Rd. 1 picks because of the sunk costs already invested in those players. Second, other teams could more frequently sign Rd. 1 free agents a few years down the road because of their prior label. Finally, and as a less provocative claim, because Rd. 1 picks are generally the top player chosen by their team, they’ll usually have an easier path to making their initial team’s rosters. While the player chosen 31st (Rd. 2) may have to outperform the player chosen 1st to make a roster, that’s usually not the case for the player chosen 30th.
Comparing across sports
While sport-level curves are interesting, I was also curious how each league has compared to one another.
There’s no easy way to answer this question, however. In addition to disparities in the distribution and units of our outcome measures, there are also differences in the number of rounds in each sport’s draft (the NBA currently has 2, the NHL and NFL each have 7, and the MLB has 40).
One simple mechanism for making cross-sport comparisons is to only look at the top-60 picks, as this reflects the number of selections made in most NBA drafts. That handles our x-axis. To better understand the y-axis, I averaged the outcomes of players chosen between the 55th and 60th picks, using this number as a baseline. In the example above, we expect the top-pick in the NHL to be worth about 6 times that of the 60th pick.
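The rescaling itself is a one-liner. Here is a small Python sketch, assuming the smoothed curve is stored as a dictionary from pick number to estimated value, and reading "between the 55th and 60th picks" as picks 55 through 60 inclusive. The toy numbers are invented, chosen only so that the top pick comes out at six times the baseline.

```python
from statistics import mean


def relative_values(curve, baseline_picks=range(55, 61)):
    """Divide every pick's value by the average value of the baseline picks."""
    baseline = mean(curve[p] for p in baseline_picks)
    return {pick: value / baseline for pick, value in curve.items()}


# Toy curve (invented values): top pick worth 840, picks 55-60 worth 140 each.
toy_curve = {1: 840.0, 30: 280.0}
toy_curve.update({p: 140.0 for p in range(55, 61)})

rel = relative_values(toy_curve)
print(rel[1], rel[30], rel[60])
```

Because every sport is divided by its own baseline, the y-axis becomes unit-free, which is what allows win shares, approximate value, WAR, and games played to share one chart.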
Here’s a chart comparing the relative curves in each of the four sports, when divided by the average value of picks 55-60.
The top pick in the NBA draft is worth about 20x that of late second round picks, at least based on average win shares. Meanwhile, curves for MLB and the NHL are relatively similar. Finally, the most consistent pick-to-pick value appears in the NFL, where top picks are only worth roughly twice that of late round 2 picks, on average.
While the results of the NBA mostly matched expectations, the lack of any strong shape in the NFL curve, relative to the other sports, stands out. For example, it’s surprising that the MLB, which is evaluating high school players who are, by and large, a few years away from playing professionally, has a more significant drop-off in player talent at the top of the draft than is found in the NFL.
But don’t drafts have different numbers of rounds?
To account for the differing draft lengths (in rounds), we can tweak our curves so that the x-axis reflects the percentage of picks that were made up through each selection in each year, as opposed to a specific pick number. For example, the 50th percentile reflects the end of the NBA’s round 1, and roughly the middle of the 4th round in each of the NHL and the NFL. The MLB is excluded – at 40 rounds and with multiple minor league feeder teams, it is unclear that Rd. 40 of an MLB draft should be compared to, for example, Rd. 7 of an NFL or NHL draft.
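Here is a short Python sketch of that rescaling. The draft lengths of 60 (NBA), 224 (NFL), and 210 (NHL) picks come from the cutoffs used in this post; treating every round as the same size is a simplifying assumption of mine.

```python
def pick_percentile(pick, total_picks):
    """Percentage of the draft completed once this pick has been made."""
    return 100 * pick / total_picks


def round_of(pick, picks_per_round):
    """Round containing a pick, assuming equal-sized rounds."""
    return (pick - 1) // picks_per_round + 1


# Halfway through each draft:
print(pick_percentile(30, 60))    # NBA: last pick of round 1
print(pick_percentile(112, 224))  # NFL: 32 picks per round under this assumption
print(pick_percentile(105, 210))  # NHL: 30 picks per round under this assumption
print(round_of(112, 32), round_of(105, 30))
```

Under these assumptions the halfway point lands in round 4 of both the NFL and NHL drafts, which matches the description above.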
In any case, here are draft curves across all rounds in the NBA, NFL, and NHL.
As earlier, the NBA features the sharpest drop-off, while the NHL follows close behind. There’s a steeper decline in the NFL when looking across all rounds, with players chosen first overall, on average, worth about 10 times that of players chosen at the end of the draft.
Final assorted comments:
-It’s fair to use these curves to extrapolate to tanking incentives provided by each league. In the current system, it makes lots of sense to tank in the NBA, slightly less so in the NHL, and not as much sense in the NFL, as judged by the drop-offs in player talent.
-Pretty impressive job done by MLB scouting departments to accurately peg athletes who are three to four years away from playing professionally, and who stem from both top-level college programs and faraway high schools. Moreover, note that the MLB curve would appear even steeper if we could account for past issues with respect to league-level financial disparities. For a long period of time, more talented players were passed over by teams who could not afford them.
-Some technical notes: Our formal cutoff in the NHL was 210 picks – at one point it was higher, but I wanted this number to be consistent over time. Our NFL cutoff was 224 picks – at one point that number was higher, too. The NBA has used 2 rounds since 1989, so same cutoff throughout.
-One surprising factor to account for was a subtlety embedded in MLB draft history – players can be drafted more than once. This required setting initial player-level outcomes to 0 if that player was eventually drafted again.
-I don’t love my outcome measures, but they were the easiest ones available. As one positive sign, Saurabh at the Nylon Calculus found similar ratios to the ones above while using more advanced outcome measures in basketball.
-Finally, one could argue that a more preferred outcome would look at a player’s peak performance, instead of his career worth. That could certainly be the case. You could also make curves with “Probability of drafting an all-pro/all-star” as your outcome to answer a slightly different question.
No one wants to read about Patriots fumble rates, and I don’t want to write about Patriots fumble rates.
But I can’t not write about this.
The football person behind the initial commotion regarding low fumble rates was interviewed recently for a podcast. In response to a question about the 2015 season, in which the Patriots once again held onto the ball better than the rest of the league, the football person’s response was as follows:
One thing I noticed is that the weather and the climate up there during New England games was abnormally warm, which is one of the reasons that I found it phenomenal and crazy that they were having so few fumbles because as you know, and as I’ve studied and analyzed, it’s much more difficult to hold onto the football when you are playing out in the cold. So it was crazy how well they were able to hold onto the ball. But last year it was pretty warm, they didn’t have many cold weather games, and their fumble rate was pretty good as well.
Two suggestions were made clear:
1 – The weather during Patriots games was abnormally warm.
2 – It’s much more difficult to hold onto the football during cold weather
Let’s check these claims. Data from Armchair Analysis.
1 – The weather during 2015 Patriots games was abnormally warm.
A side-by-side boxplot should do the trick.
Here’s the temperature during Patriots games across the last 16 years.
The median, first quartile, and minimum game-time temperatures during Patriots games were not obviously different last year, and the temperature distribution in 2015 matches most of the prior years. It certainly does not appear to have been an “abnormally warm” year.
Writer’s claims: 0-for-1.
2 – It’s much more difficult to hold onto the football during cold weather
This is also straightforward to check out.
Using every game since 2000, I linked the game’s temperature to the fumble rate of the participating teams, defined as the total number of offensive team fumbles divided by the number of offensive team plays. So, 2 fumbles in 130 plays would give a fumble rate around 1.5%, or 0.015. If fumbles were associated with low temperatures, we would expect to see a decline in game-level fumble rates with increasing temperature.
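As a quick sketch of the two calculations involved, here is some Python. The straight-line slope is a crude stand-in for the smoothed line in the scatter plot, not the method behind it, and the toy temperatures and rates are invented for illustration.

```python
def fumble_rate(fumbles, plays):
    """Offensive fumbles divided by offensive plays."""
    return fumbles / plays


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (change in rate per degree)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


print(round(fumble_rate(2, 130), 3))  # the example from the text

# Toy game-level data (invented): no relationship between temperature and rate.
temps = [20, 40, 60, 80]
rates = [0.010, 0.020, 0.020, 0.010]
print(ols_slope(temps, rates))
```

If cold weather really made the ball harder to hold, the slope on the real game-level data would be clearly negative.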
Here’s a scatter plot.
While there’s a slight dip around 45 degrees, it’s neither statistically nor practically significant. The smoothed line moving through fumble rate and temperature is nearly perfectly horizontal. On aggregate, the cumulative rate of fumbles is relatively consistent across temperature (Note that a better analysis would probably use temperature in a model with play-level information).
Writer’s claims: 0-for-2.
So what does this mean?
The Patriots led the league with one of their lowest ever fumble rates again in 2015, which I have little doubt links to their style of play and their success (such as red-zone plays, playing with the lead, kneel downs, etc).
Despite the new evidence, our podcast guest appears to still be holding onto the idea that something funny is going on. Moreover, he’s doubling down on false claims, ones which at first glance appear reasonable. Like the initial analysis from a year ago, however, it’s mostly a bunch of hot air.
As part of their final assignment in my statistics and sports class, students were tasked with looking at the home advantage in the English Premier League (EPL). In some recent and related work, James Curley and Oliver Roeder found that, by 2014, an EPL home advantage had reached an all time low.
Interestingly, that low reached new depths in 2016.
Home teams have won 40.8% of games this past year, pending this weekend’s final contests. If that mark stands, it would be the lowest in EPL/English Division 1 history, one which dates back to 1888.
Here’s a chart, similar to the one that James and Oliver produced. Overall home team win percentage in each year is shown in black, draw percentage in red, and away win percentage in green. The grey region reflects our uncertainty in the trend curve.
As we knew there’d be, it’s a fairly big drop in win percentage, from roughly 60% to 45% across about 120 seasons. Using this rate of decline, we can expect home teams to win 0% of games by around the year 2400 (I kid).
While win percentage is a useful metric, it’s not perfect, as it doesn’t account for differences in team schedules. If the better teams generally got to host the worse teams, or if winners from previous year were forced to play one another more often (hi, NFL), overall home win percentage would fluctuate as a result. (Note that I am aware that EPL teams currently face each opponent exactly one time at both home and away stadia, which is nice and balanced. However, I’m not sure if that’s always been the case.)
As a result, a paired-comparison method that accounts for the strength of each team, and estimates a home advantage accordingly, could be worth looking at. Using the BradleyTerry2 package in R, and with a hat-tip to James’ engsoccerdata package, I ran a Bradley-Terry model (BTM) with a home-team advantage coefficient within each season. When exponentiated, the coefficient from a BTM reflects a league-wide estimate of the home advantage, taken on an odds scale. As an example, if a team would have a 50% chance of a win at a neutral site (Odds = p/(1-p) = 1), they’d have a 60% win probability with a 50% increase in odds (Odds = 0.6/(1-0.6) = 1.5). As additional examples, 100% and 200% increases in odds would bump that 50% win probability team to about 67% and 75%, respectively.
Here are the season-level increased odds of a home win across the last several decades. Conditional on team strength, in 2015, the odds of a team winning at home were about 25% higher than that team winning at a neutral site. While still greater than 0, it’s again the lowest mark in league history.
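The odds arithmetic is easy to check with a few lines of Python. The helper names are mine. The last function uses the usual logistic form of a Bradley-Terry model with a home term, which I am assuming is how the fitted coefficient should be read; it is not taken from the BradleyTerry2 documentation.

```python
import math


def prob_to_odds(p):
    return p / (1 - p)


def odds_to_prob(odds):
    return odds / (1 + odds)


def bump(p, pct_increase):
    """Win probability after raising the odds by pct_increase percent."""
    return odds_to_prob(prob_to_odds(p) * (1 + pct_increase / 100))


def home_win_prob(strength_home, strength_away, home_coef):
    """Logistic form: log-odds = strength difference + home coefficient."""
    log_odds = strength_home - strength_away + home_coef
    return 1 / (1 + math.exp(-log_odds))


for pct in (50, 100, 200):
    print(pct, round(bump(0.5, pct), 3))

# Two equally strong teams with a home coefficient of log(1.5):
print(round(home_win_prob(0.0, 0.0, math.log(1.5)), 3))
```

The same 50% increase in odds moves a 50% team by ten percentage points, but it would move a 90% team by far less, which is why the home advantage is reported on the odds scale and not as a flat bump in win probability.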
It’s worth pointing out that the BTM coefficients are estimates, and do come with standard errors attached (generally about 0.12 on the log-odds scale). Even so, it’s interesting that the last 5 seasons all rank among the lowest 10 seasons as far as an estimated EPL home advantage.
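To get a feel for what a standard error of that size implies, here is a rough Python sketch. It takes the numbers in the text at face value (about 25% higher odds in 2015, a standard error of about 0.12 on the log-odds scale) and assumes a normal approximation on the log-odds scale, which is my assumption and not something computed for this post.

```python
import math


def odds_ratio_interval(pct_increase, se, z=1.96):
    """Approximate interval for the home odds multiplier (normal approximation)."""
    log_or = math.log(1 + pct_increase / 100)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


low, high = odds_ratio_interval(25, 0.12)
print(round(low, 2), round(high, 2))
```

Under these assumptions the interval for a single season runs from roughly 1.0 to roughly 1.6 on the odds scale, so any one season's estimate is noisy and the multi-season pattern is the more informative signal.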
James and Oliver posit several reasons as to what caused the drop in home advantage, including ease of travel and referee awareness. Their work also shows that the primary impetus behind fewer home wins is fewer home goals. Interestingly, while home advantage has also seemingly dropped in the NBA, it’s stayed relatively consistent in the NFL, MLB, and NHL.
It’s straightforward to link EPL results with those in other professional soccer leagues. James’ data also includes La Liga (Spain), Serie A (Italy), and the Bundesliga (Germany), so I’ll use those.
Here’s a plot of win percentage across time in each of the four leagues. It’s sort of a cluster, but other leagues seem to match the EPL in terms of a home advantage in recent seasons.
Finally, we can use the BTM in each league in each season to get the odds of a home win relative to that game being played on a neutral field. Here’s that graph.
-It’d be interesting to go back to each league’s schedule to identify where and why the two graphs above yield different results. For example, there’s a noticeable win percentage gap between the EPL and Serie A in 1975 that’s not apparent when looking at the BTM coefficients.
-The Bundesliga’s home advantage shape is quite strange in the years between 1962 and 1982, showing an inverse quadratic trend over time. I have no logical explanation for such an association. (In fairness, I wasn’t born yet.)
-England switched to a 3-point rule in 1981, while Germany, Italy, and Spain waited until the mid-1990’s. Behavioral economists would do well to look at the impact of rule changes using graphs like these. Generally, most related work only uses a few seasons of play (token plug for my NHL article).
-Nate Silver’s World Cup prediction model was taken to task after its overwhelming optimism for the host Brazilians in 2014. Given the results above, it’s possible that a decline in home advantage across soccer played a role. Nate’s model surely inflated Brazil’s chances because of what was anticipated to be a noticeable benefit of playing at home. But if much of the home advantage that used to exist in soccer was no longer a part of the game, it could explain why a prediction model would do so poorly for the hosts yet do so well when other teams played.
NHL Game 7’s are awesome, and in the next two days, we get two such contests – Dallas vs. St. Louis and San Jose vs. Nashville.
Here’s a primer on what to expect with respect to team behavior in these games. All of my findings used data from the nhlscrapr package in R for the 10 years of postseason action between 2006 and 2015.
Fewer penalties, slightly fewer goals in Game 7’s.
Nate has a nice chart covering the tendency for teams to accumulate fewer penalties in Game 7’s, relative to other games in the series. I found roughly the same thing: about 11 total non-matching penalties per game during the first six contests of a series, compared to just seven in Game 7’s. That’s about eight more minutes of even strength to expect in a Game 7.
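The eight-minute figure is simple arithmetic: four fewer penalties at two minutes apiece. Here’s a quick sketch, assuming every penalty is a two-minute minor that is served in full and never overlaps with another (assumptions made for illustration, not something checked in the data).

```python
MINOR_PENALTY_MINUTES = 2  # assumption: every penalty is a two-minute minor

def extra_even_strength_minutes(penalties_other_games, penalties_game7,
                                minutes_per_penalty=MINOR_PENALTY_MINUTES):
    """Minutes of power play time removed when fewer penalties are called."""
    return (penalties_other_games - penalties_game7) * minutes_per_penalty

# About 11 penalties per game in games 1-6 vs. about 7 in Game 7's
extra = extra_even_strength_minutes(11, 7)
```

Power play goals that end a penalty early would shrink the number a bit, so it’s best read as an upper bound.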
Perhaps the fewer power plays awarded in Game 7’s are driving a small but noticeable difference in the total number of goals; while games 1-6 average 5.3 goals per game, Game 7’s average 4.9.
Penalties are less likely to be called at the beginning of Game 7’s.
Here’s a chart of the per-minute penalty rate comparing Game 7’s to games 1-6. Each line reflects a smoothed curve, and the grey area reflects our uncertainty in each curve’s trend. Rates are adjusted to reflect the number of penalties we would expect if the rate of whistles for that minute of play were extended for an entire game.
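The rate adjustment can be sketched in a couple of lines. This assumes a 60-minute regulation game and uses made-up counts; the function name is hypothetical and the actual chart also involves smoothing, which is left out here.

```python
REGULATION_MINUTES = 60  # assumption: regulation time only, no overtime

def full_game_rate(penalties_in_minute, games):
    """Penalties per game we'd expect if one minute's rate held all game."""
    per_game_minute = penalties_in_minute / games
    return per_game_minute * REGULATION_MINUTES

# Made-up example: 12 penalties called in a given minute across 80 games
rate = full_game_rate(12, 80)
```

Putting every minute on a per-game scale makes the curves easy to read against the familiar penalties-per-game totals.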
The biggest difference in penalty rates between game 7’s and other games in a series looks to be the first period, where penalties are consistently called less often. This could be a combination of factors – players adopting a safer style of play, for example, or a referee’s hesitancy to call possible violations. Interestingly, these results mirror those from the NFL, where many judgement calls are rarely whistled to start a game.
There are also smaller differences in periods 2 and 3, and marked differences in the game’s final minute. However, note that rate differences at the end of the game are perhaps not too surprising given the frequent scrums in earlier games where teams do silly things like trying to “send a message.” Sidenote: I wonder how style of play would change if penalties at the end of a game carried over to a team’s next contest.
Here’s a similar plot looking at goals (empty net goals were excluded).
Note that the standard error bars for each curve were fairly large and overlapped throughout a game, and so differences between the two curves should be taken with a grain of salt. There is slightly less scoring throughout most of game 7’s, particularly at the end of the first period and at the beginning of the third period. Teams operate at about a two goals-per-game pace to begin the third period of game 7’s, for example.
More pressure, more call reversals?
In work a few years back (ignore the Excel chart! I was learning R at the time), Kevin and I found slightly higher rates of make-up calls in Game 7’s, relative to other games in a series. This came on top of the higher frequency of make-up calls in the postseason, relative to regular season action. Here, I’m defining a make-up call as one that works to even out the total number of penalties each team has.
A few years later, that trend still seems to hold. When a home team has exactly one more penalty than its opponent, it has received the next power play 58% of the time during the regular season. That number jumps to 62% in postseason games 1-6 and 68% in Game 7’s. When owed two or more penalties, the home team has been awarded the next power play 61% of the time in the regular season, 65% of the time in games 1-6, and a remarkable 78% of the time (18 of 23 sequences) during Game 7’s. So, if the home team gets behind on penalty differential, expect that to even up by game’s end.
Differences in postseason call reversals are not as evident when looking at if away teams are owed power plays. Across each game number, away teams that are owed penalties are given the next power play about 57% of the time.
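The Game 7 number for home teams owed two or more penalties rests on just 23 sequences, so it’s worth seeing how much uncertainty comes with it. Here’s a minimal sketch using a Wilson score interval, which is chosen here purely for illustration and was not part of the analysis above.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 18 of 23 sequences ended with the home team getting the next power play
share = 18 / 23
low, high = wilson_interval(18, 23)
```

With a sample this small the interval is wide, so the 78% is better treated as suggestive than as a precise estimate.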
A few days back, I summarized recent research on the NFL draft.
One interesting anecdote was the reliance of nearly all NFL draft research, both academic and non-academic, on Pro-Football-Reference’s approximate value (abbreviated AV), and its player-level career summary (cAV), a metric designed by Doug Drinen and described by Neil Paine in a series of articles here.
So, part of this post is designed to look at AV.
Before I start that, however, here’s a plug for those looking to do work in statistics in sports. We need more and better player-level outcomes in the NFL. Hockey, baseball, and basketball each have a half-dozen metrics which have been or are being used and tested in the public sphere. As examples, we know which catchers are good at framing pitches, whether or not hockey teams should dump and chase or carry-in, and, for goodness’ sake, we have Nylon Calculus dominating the NBA scene on a daily basis. Meanwhile, football is mostly stuck on fourth downs and counting statistics. It will be difficult, but it is time for football fans to step up their games.
On today’s agenda:
What can we learn about Approximate Value?
Why does it matter?
Coming in the future:
How else can we analyze the NFL draft?
Pro-Football-Reference describes AV as “putting a single numerical value on any player’s season, at any position, from any year.”
Points are awarded to offenses on the basis of team points per drive, relative to the league average. These points are then divided up among members of each offensive unit. A similar strategy is also used for defensive players based on points against. In addition, bonus points are given for being named an All-Pro, and points are also awarded for making the Pro Bowl at certain positions. Career AV is a weighted combination of season-level values, which accounts for varying numbers of seasons played. A more complete description is found here.
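To make the career weighting concrete, here’s a rough sketch. It assumes a declining-weight scheme in which a player’s best season counts 100%, his second best 95%, his third best 90%, and so on. That scheme is an assumption on my part, so check the linked description for the exact weights. The season values are hypothetical.

```python
def career_av(season_avs, step=0.05):
    """Weighted career AV: best season at 100%, then 95%, 90%, ...

    The declining-weight scheme is an assumption for illustration.
    """
    ranked = sorted(season_avs, reverse=True)
    # Weights never go below zero, however long the career
    return sum(av * max(0.0, 1.0 - step * i) for i, av in enumerate(ranked))

# Hypothetical three-season career: 16*1.00 + 12*0.95 + 10*0.90
example = career_av([10, 16, 12])
```

A scheme like this rewards peak seasons while still giving credit for longevity, which is why two players with the same raw season totals can end up with different cAV.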
Here’s a rough summary of cAV using quarterbacks, one which passes the smell test: JaMarcus Russell is a 6, Tim Tebow a 12, Christian Ponder a 22, Ryan Fitzpatrick a 61, Alex Smith a 72, Matt Ryan a 96, Big Ben a 96, and Tom Brady a 160.
In understanding AV, it is useful to start by looking at the distributions by position among players drafted before 2010. Here’s a plot of career AV (cAV), split by era. I left out players drafted after 2010 as they have not had much of a chance to accumulate a representative career number.
A few things stand out.
-The distribution of cAV by position appears consistent between eras. That is, the shape of each boxplot is roughly the same within each position, moving from era to era. This seems interesting, and arguably debatable, given changes to the game since 1967.
-At all positions, more than 50% of players accumulate a cAV of less than 10 (recall: Tim Tebow’s was 12). This makes sense, given the number of players drafted that barely play a game. The right skew at each position also seems reasonable.
-Tight ends appear to be the lowest-valued position, with offensive linemen and linebackers the highest, as judged by median career AV and density in the right tails.
-Relative to what I perceive of the last several years of the NFL, as well as what teams are paying top players, there seems to be a bit of a disconnect. For example, quarterbacks have long been paid substantially more than players at other positions, implying that the best ones are worth more to their teams than the best players at other positions. Other than a few quarterbacks drafted in the 1990’s, this doesn’t show up in cAV.
It’s straightforward to link draft position to cAV.
Let’s start by fitting cAV to draft position across all seasons and positions. Each point below is a player, and the blue curve is a non-parametric smoother, which accounts for the likely non-linear association between draft position and performance. If this curve looks familiar, it looks like the one used in several of the draft-trade charts in circulation.
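For a feel of what a non-parametric smoother does, here’s a small sketch. It uses a Gaussian kernel smoother as a stand-in, which is not necessarily the smoother behind the blue curve, and the (pick, cAV) pairs are made up for illustration.

```python
import math

def kernel_smooth(xs, ys, x0, bandwidth=20.0):
    """Nadaraya-Watson estimate at x0 using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Made-up (pick, cAV) pairs, for illustration only
picks = [1, 5, 12, 30, 45, 64, 90, 130, 180, 240]
cavs = [70, 55, 48, 30, 33, 20, 14, 10, 6, 4]

# Evaluate the smoothed curve at every 20th pick
curve = [(p, kernel_smooth(picks, cavs, p)) for p in range(1, 261, 20)]
```

Each point on the curve is a weighted average of nearby players, so no particular functional form between draft position and cAV has to be assumed. The bandwidth controls how wiggly the curve is.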
One question that I am really curious about is whether or not teams have gotten better at drafting since the late 1960’s. An obvious thing to do would be to fit an identical curve to the one shown above across different eras. If curves in recent years are higher and/or steeper, it would suggest that teams in today’s NFL are better at identifying talent and are drafting those players earlier.
Unfortunately, there’s a problem with that strategy. Turns out, cAV has grown over time. Here’s the average player cAV among the draft’s first 260 picks (this is the number of picks in the last few years; the trend is identical when using different cutoffs). The grey area accounts for our uncertainty in the trend curve.
The average cAV of players drafted in 2006 is about 3-4 units higher when compared to players drafted four decades ago. The increase is statistically significant as well as practically meaningful. Altogether, on a yearly basis, about 1000 more units of cAV are being handed out now than in the early stages of the league. [Edits: This matches conclusions found by Danny at Football Outsiders. Sean proposes that expansion is responsible for part of this increase, given that AV is tied to playing time.]
Looking into the formula for AV, it is unclear what is causing these increases. One possibility is that more players have been drafted in recent years in the positions that AV values more than others. Another possibility, albeit a stretch, would blame recent increases in the number of Pro Bowl players, as AV rewards this.
Why does this matter?
Well, as far as looking at draft success over time, increases in cAV are vital to identify. Without adjusting for when a player was drafted, standard analyses could confuse improved league-wide drafting ability with what appears to be era-specific changes in our outcome measure.
To sum, we know that drafting players in any sport is hard given the extrapolation required to project the performance of young athletes several years down the line. In addition, analyzing the NFL draft is made even more difficult given what is currently a poor set of outcome measures.
Much of today’s draft research uses cAV, a promising metric which is attractive in that it’s a single number accounting for many team and individual level performance metrics. However, we showed above a few potential weaknesses in cAV, at least with respect to its use in looking at the NFL draft. Overall:
-Approximate value mostly passes the smell test as far as players ranked high and low.
-Arguably, position-level AV measures don’t seem to perfectly match league-wide trends.
-Valuations are increasing over time.
Under these constraints, one reasonable recommendation for using cAV is to make it year and position specific. In future work, I’ll propose one way of doing this.
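As a placeholder for what a year- and position-specific version might look like, here’s one simple possibility: standardizing cAV within each draft year and position. This is only a sketch under that assumption, not necessarily the approach that will be proposed, and the players and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_cav(players):
    """Return {name: z-score} of cAV within each (draft year, position) group."""
    groups = defaultdict(list)
    for p in players:
        groups[(p["year"], p["pos"])].append(p["cav"])
    scores = {}
    for p in players:
        values = groups[(p["year"], p["pos"])]
        spread = pstdev(values)
        # A group with no spread carries no relative information
        scores[p["name"]] = (p["cav"] - mean(values)) / spread if spread > 0 else 0.0
    return scores

# Hypothetical players: the later era has higher raw cAV across the board
players = [
    {"name": "A", "year": 1975, "pos": "QB", "cav": 10},
    {"name": "B", "year": 1975, "pos": "QB", "cav": 30},
    {"name": "C", "year": 2005, "pos": "QB", "cav": 14},
    {"name": "D", "year": 2005, "pos": "QB", "cav": 34},
]
z = standardize_cav(players)
```

In this toy example, players B and D land on the same standardized score even though D’s raw cAV is higher, which is the kind of era adjustment the recommendation is after.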
Another NFL draft has come and gone, and with it have come the predictable displays of unyielding optimism, stale and arguably race-based generalizations of player skill, and, as a relative newcomer in 2016, lazy misuse of the term analytics.
In following along this spring, it became clear that what is mainstream knowledge among researchers is far from it in the national media. This despite a decent amount of both academic and non-academic research into the topic.
For those new to the scene, or even for a few veterans who may have missed an article or two along the way, I decided to write a quick review of what’s out there. Note that many of the following points are related to one another.
1. Top draft picks are overvalued.
In an efficient market, the value of picks traded between teams would be equivalent. That is, if Pick X was traded for Picks Y and Z, the value added by picking in spot X should equal that of picking in spots Y and Z. Interestingly, in a pair of well-regarded papers from researchers Cade Massey and Richard Thaler, the authors identified marked inefficiencies in how NFL franchises value different draft picks.
Turns out, the preferable solution is almost always to acquire more picks. In our example, better to have Y and Z than X alone. It’s why the common suggestion among many who have studied the draft is that because teams overvalue earlier picks, it is generally a good idea to trade down. The edge goes to those who take advantage of the overconfidence shown by others.
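The efficient-market condition is easy to write down. Here’s a minimal sketch in which the value curve is entirely hypothetical (an exponential decay chosen for illustration, not Massey and Thaler’s estimates), as are the pick numbers in the example trade.

```python
import math

def trade_surplus(value_of, picks_given, picks_received):
    """Value received minus value given; zero in an efficient market."""
    received = sum(value_of(p) for p in picks_received)
    given = sum(value_of(p) for p in picks_given)
    return received - given

def hypothetical_value(pick, top=100.0, decay=0.02):
    """Made-up on-field value of a pick, declining with draft position."""
    return top * math.exp(-decay * (pick - 1))

# Trading down: give up pick 5, receive picks 20 and 52
surplus = trade_surplus(hypothetical_value, picks_given=[5], picks_received=[20, 52])
```

If the true value curve is flatter than the one teams trade on, a swap like this comes out positive for the team moving down, which is the inefficiency described above.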
In addition to being overvalued by league officials, players picked at the top of the draft have also tended to offer less of a return on their investment than those picked at the end of round one. To be specific, the NFL rookie pay scale is nearly fixed, and players picked near the top of the draft get paid substantially more. As a result, for several years, it was optimal to pick late in the first or early in the second rounds, where the expectation was to obtain a decent player with a substantially more palatable contract. See Figure III in Massey & Thaler’s paper here, as one example. The authors aptly call this a loser’s curse; bad NFL teams are stuck picking at the top of the first round, where the expected return is actually lower.
Note that given the new CBA, which updated rookie contracts in 2011, it is less clear whether such a surplus value still exists in today’s NFL.
3. Teams are, by and large, making guesses
One of the most consistent findings in current literature is that a team’s prowess in picking good players doesn’t translate from one season to the next. Chase reached such a conclusion a few years ago, and I really liked Neil’s scatter plot below, which shows the lack of year-to-year consistency in team-level returns over expectation.
To sum, despite all that you’ll hear to the contrary, there’s no evidence to date that an organizational-level ability to identify talent in the draft is repeatable.
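One way to check repeatability is to correlate each team’s return over expectation in one draft with its return in the next. Here’s a minimal sketch of that calculation. The team-level numbers are made up and only stand in for the kind of data behind the scatter plot.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up team-level returns over expectation in back-to-back drafts
year_one = [4.0, -2.5, 1.0, -3.0, 0.5, 2.0]
year_two = [-1.0, 0.5, 2.5, -0.5, -2.0, 0.5]
r = pearson_r(year_one, year_two)
```

If drafting were a repeatable skill, teams above expectation one year would tend to be above it the next, and the correlation would sit well above zero. A value near zero is what luck looks like.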
And if teams are unable to consistently identify good picks in the draft, it’s safe to say that draft pundits can’t, either. In fact, when I compared Mel Kiper’s draft grades to his re-grade of that same draft after a few seasons of play, the link was weak. That is, Kiper’s post-hoc analysis of a draft barely drew any of the same conclusions as his first one.
So, whenever you hear team officials boasting about an incoming player – or even when looking at the so-called draft ‘grades’ put out everywhere – it’s not much different than listening to someone saying they like heads in a coin flip or red on the roulette table.
4. Sunk costs make analysis difficult.
Let’s take an interlude to talk fantasy football. Imagine that you drafted Peyton Manning early in 2015, making a substantial investment in a quarterback who you anticipated would lead your squad to greener pastures. Turns out, Manning stunk from week 1 onwards. But there he was in your week 5 line-up, then again in your week 10 line-up, and so on. You couldn’t move on!
Turns out, NFL organizations also suffer from the same problem. Such was the conclusion of Quinn Keefer, who found that players chosen at the end of Rd. 1 received substantially more playing time than those who were picked at the beginning of Rd. 2. In other words, the premium paid on first round picks was responsible for eventual increases in games started. Keefer blames sunk costs, in which prior investments have led to teams making irrational decisions down the road. This is also what ruined any fantasy player who stuck with Manning.
This result matters when it comes to analyzing player success. If organizations are afraid to cut loose players that they drafted early, it would muddle any link between draft round and player success, and it could artificially inflate the link between early draft picks and long term success.
5. What else is out there?
I explored a few major findings above, but here is a brief synopsis of other interesting material.
-There may be position-level variability as far as identifying talent. Of course, this becomes complicated at some positions, such as linebacker. Write Zach & Rob, “It’s also possible that players with good pedigrees and name recognition are being given preferential treatment when awards are granted, in the absence of meaningful performance data (like what exists for offensive skill players) to challenge our perceptions.”
-Neil and Alison look at position-level value here.
-In a more recent article, a pair of authors conclude that picks made by teams that trade up tend to outperform picks made when teams stand pat, as judged by games started and approximate value in a player’s first three years. However, my initial reaction to this work is one of skepticism, in large part because of what we went over earlier about sunk costs. If teams trade up for a player, that investment could lend itself towards a team aggressively showcasing him. Additionally, the authors do not mention the often prohibitive cost of trading up, which could offset any gains in surplus drafting.
-In the next few days, I’ll have some posts that look at the history of the draft and team success. More to come.
So what did we learn?
It is extremely difficult to consistently identify talent in the NFL draft. However, there are inefficiencies, which generally arise when teams overvalue players by trading up to acquire them. Such displays of confidence belie a process which is, by and large, no different than picking heads or tails before a coin toss.