After six years of graduate school – two at UMass-Amherst (MS, statistics), and four more at Brown University (PhD, biostatistics) – I am finally done (I think). At this point, I have a few weeks left until my next challenge awaits, when I start a position at Skidmore College as an assistant professor in statistics this fall. While my memories are fresh, I figured it might be a useful task to share some advice that I have picked up over the last several years. Thus, here’s a multi-part series on the lessons, trials, and tribulations of statistics graduate programs, from an n = 1 (or 2) perspective.
Part IV: What I wish I had learned in my graduate program in statistics (with Greg Matthews) The point of this series is to be as helpful as possible to students considering statistics graduate programs now or at some point later in their lives. As a result, if you have any comments, please feel free to share below. Also, I encourage anyone interested in this series to read two related pieces:
I spent last Saturday at the first Boston Hockey Analytics conference, a gathering of analytically inclined hockey faithful. For those unable to attend, here are a few highlights. Note that, to the best of my knowledge there was no audio or visual recording of this conference.
–Michael Schuckers gave a talk summarizing the state of goalie research, including material that he’s working on for an upcoming book chapter. For the unfamiliar, the most common metric in evaluating NHL goaltenders is save percentage, which is limited in part because different goaltenders face different distributions of shots over the course of a season. Indeed, you could even have a Simpson’s Paradox scenario, where Goalie A is better at saving each type of shot than Goalie B, but that Goalie B still ends up with a better save percentage overall. This upcoming book chapter will be a must-read.
-Schuckers also pushed for those in the audience to do whatever necessary to get the NHL to share its tracking data. IMO, this is a no-brainer. The world of basketball is better for the brief look into this rich information that the NBA shared during the 2014-15 season and parts of the 2015-16 one. See, among other examples, this excellent tutorial on how to scrape and analyze player movements. The NHL’s lagging behind, and given the well known flaws that the league has with scorer/rink biases, the potential is there for public analysts to answer some excellent questions and help grow the game.
-Rob Vollman gave a talk on roster construction, providing a glimpse into how rules of the CBA dictate who and what players are reasonable values. You can buy Rob’s book here, which presents this and other analytically driven research. The takeaway linking Rob’s and Schuckers’ talks: don’t give goalies massive contracts, as there’s too good of a chance they won’t be worth it.
-I gave a talk on how to use R for reproducible hockey research. Slides and code here (note: download the pdf of the slides if you are looking for links). There are very few hockey researchers who share both their code and data. It’d be better for everyone involved if we can change this.
-Cole Anderson presented work on an ELO-based player comparison tool, in which the hockey public can rank players. This makes sense, particularly given that traditional player rankings (say, a scale of 1-10) can lead to ambiguous numbers (like 7.7). Cole’s work appears similar to the surveys that 538 has used to, for example, rank James Bond villains or pick summer Olympic sports. Cole’s code is also in R, and available for your perusal here. Hope to see more out of this project.
-Eric Cantor gave a talk looking at roster construction, looking at the Tampa Bay Lightning’s experience with seven active defensemen in place of the usual six. Eric’s evidence suggests that Tampa performed slightly better with the non-traditional construction.
-Ryan Davenport and Edwin Niederberger looked at shot locations from world tournaments, including the recent World Cup and the Olympic seasons. Relative to the rest of the world, the USA’s backline looks particularly ineffective offensively.
-Brian Carothers and Joseph Nelson gave two talks. First, the pair looked at quantifying defensemen given their hit and blocked shot totals, the slides of which are found here. Second, the pair led a Python workshop, the overview of which is linked here and the code of which is found here. Python and R are both free and powerful. You should learn (at least) one of them.
-Rob was asked how many teams are doing appropriate due diligence with respect to analytics? Rob guessed six, with most teams “nowhere close.”
-Billy Jaffe (New England Sports Network), Neil Abbott (player agent), and Ron Rolston (coach) led a panel moderated by Babson’s Rick Cleary which summarized their perspectives on how analytics have changed the game. Perhaps unsurprisingly, their biggest take-home is that these practitioners don’t care about what model you may have chosen or what (statistical) tools you needed to employ, they just want immediate, actionable, and simplified recommendations. This begs the question – that wasn’t asked – what happens when those suggestions don’t match their prior viewpoints?
-Although each panelist likely loses more hockey knowledge in their sleep than I’ll ever learn, there was a bit too much selecting on the dependent variable for my taste. In other words, because Team X and Team Y have recently won the Stanley Cup, this is how all teams need to win the Stanley Cup. Hockey’s way to random for that.
-Perhaps given that the conference was held in Boston, the Bruins’ 2011 Cup winning team was held in particularly high regard. Two of the panelists, for example, praised the Bruins’ winning culture and development of a high character locker room as driver’s behind their success. Of course, if that Boston team was so good at hockey, why did it need seven games – and an overtime – just to get out of the first round? If Montreal had won that round one series instead of Boston, did Boston still have a winning culture and a high character locker room?
-I missed a few other talks, but if those researchers or anyone else wants to share materials, please send them along! And many thanks to Luke, Rick, George, Michael, Rob, and the rest of the organizers for their great work.
Researchers have long and loudly banged the drum that NFL teams should be more aggressive on fourth downs. Among other sources, see recent work at the 4th down bot and Brian Burke’s primer here, the latter of which links to a handful of excellent academic papers.
However, there’s no evidence that any of this work has made a difference as far as team behavior. For example, in close games (two possessions or less) prior to the fourth quarter, teams went for it 6.4% of the time in 2015, nearly identical to the 6.5% in the year 2000. In fact, that number was as low as 4.8% in the year 2011.
But perhaps just as many of us analytically inclined fans were ready to give up, there seemed to be a few more aggressive plays in week 1 of the 2016 season, highlighted by Antonio Brown’s fourth-down touchdown grab on Monday night.
League-wide, was this a meaningful uptick in aggressiveness?
For now, the answer is a slightly unsatisfying maybe.
Using Armchair Analysis’ excellent data, I grabbed every fourth down play since the 2000 season. We’re interested in whether or not teams playing close games (two possessions or less) went for it on fourth down, defined as either attempting a rush or a pass. I filtered out fourth quarter plays, as decisions later in the game are too often dictated by game situation.
In week 1 of 2016, teams went for it on 4th down on 18 of a possible 160 attempts (11.2%). That’s the highest such percentage for week 1 games since 2000, and it’s not particularly close.
The following chart shows the weekly 4th down attempt percentage in these situations. I included a separate point for each season to give a sense of the season-to-season and the week-to-week variability. The smooth blue line reflects the trend, and the grey area reflects our uncertainty in the average fourth down attempt rate.
The aggressiveness observed in week 1 of 2016 (the red point in the top left) is only exceeded on a week-level basis by 7 weeks total since 2000. Interestingly, the chart also points to the possibility that coaches are slightly more aggressive later in the year, as shown by the increasing trend by week of the season.
It’s safe to say that, given what is traditionally less-aggressive behavior during the early parts of the season, week 1 of 2016 stands out as unusual. That said, there are several caveats. This analysis doesn’t take into account other factors that undoubtably are linked to fourth-down calls, including field position and opponent characteristics. For example, it’s certainly feasible that there just happened to be more 4th-down attempts in 4th-down friendly spots during week 1.
In any case, it’s certainly worth monitoring as the season progresses.
During Sunday’s NCAA contest between Notre Dame and Texas, the Longhorns faced a second-and-goal from the Irish one yard line midway through the second quarter.
“I wouldn’t be surprised to see a run,” said ABC announcer Todd Blackledge, opining on what type of play the Longhorns should try. “However, I will say that second down is the down to throw if you want to throw.”
Blackledge’s comment is classic football-think, behavior that’s been suggested for decades with no known empirical basis. However, thanks to the supple data of Armchair Analysis, it’s the type of behavior that easy to check and quite possibly validate (at least using NFL data). Thus, the two questions I’ll attempt to answer are:
How do coaches call plays in goal to go situations? Second, how should they call plays in goal to go situations?
Armchair’s database contains each NFL play since 2000, which I filtered into goal-to-go plays that occurred on first through third downs. I also cut out fourth quarter plays, as to worry less about the effects of varying late-game behavior.
How do coaches call plays? Here’s a barchart showing the percentage of plays which are passes, separated by down and distance (Note: called passes include sacks).
With one yard to go, roughly one in four plays are passes, which is roughly the same on first, second, and third downs. Across other distances, coaches are fairly consistent in their desire to run on first down and throw on third down, with second down decisions roughly a 50-50 split.
Interestingly, there is no noticeable spike in teams calling pass plays on second down, at least not relative to their behavior at other downs and distances. So, while football coaches love to talk about throwing the ball on second and goal, they aren’t necessarily acting that way.
Perhaps the more interesting question is how should coaches call their plays in goal to go situations. The answer is not straightforward. For example, passing plays are more likely to yield touchdowns on longer plays, but they’re also more likely to yield negative plays and plays of no gain.
One possibility is to consider the drive’s eventual point total as the outcome and to work from there. Using the same set of plays, I used the drive’s result (categorized as a touchdown, field goal, or neither) to estimate the average point total given each play call at the various down and distances.
Here’s a chart of expected points, separated by runs and passes.
Each point in the chart reflects the expected point total given a run or a pass at a certain down (1st, 2nd, or 3rd) and distance (x-axis). The size of the circle is proportional to the number of plays called at each point.
Interestingly, across most downs and distances, running plays offer slightly more of a return than passing plays. The differences aren’t overwhelming, but they are consistent, generally in the neighborhood of one or two tenths of an expected point. Notably, there’s no evidence to back up any claim that teams should pass the ball on second down – if anything, it’s the opposite.
These results are somewhat surprising given Kovash and Levitt’s seminal paper on NFL team behavior, which implies that teams should pass more than they currently do. One possibility is that the shorter field lessens the abilities of teams to throw the ball. Anecdotally, and as an example, teams seem to call way too many fade patterns.
All together, what did we learn?
First, there is no obvious truth to the theory that teams are passing more on second down and goal to go situations than we would expect. Second, there’s no evidence to the theory that they should be passing more often than they already are.
If anything, there’s a drop in efficiency on passing plays in goal-to-go situations, which may be showing up in the form of fewer expected points. However, I’m cautious of reading too much into this conclusion, given the nature of this analysis (it’s on aggregate, and doesn’t account for game and play specific factors) and the inherent difficulty in categorizing a team’s decision to run or a pass.
The 2016 Joint Statistical Meetings start Sunday in Chicago. Karl Broman put out his calendar – I figured I’d put mine together as well. Here’s a combination of sports, causal inference, and education talks that I noticed. If there’s something related that I should also be at, please let me know!
Also, kudos to Karl for pointing me to the JSM snack page. This is my fourth JSM and somehow this is the first I’ve heard about this. And also to Greg for helping out with JSM’s art show. This is the first of its kind, and hopefully something that grows in the future.
I’ve long had my eye out for intriguing papers that cover my two favorite areas of research, causal inference and sports statistics. For unfamiliar readers, causal inference tools allow for the estimation of causal effects (i.e., does smoking cause cancer) in non-experimental settings. Given that almost all sports data is inherently observational, there would seem to be opportunities for applied causal papers to answer questions in sports (here’s one).
It was with this vigor in mind that I read a paper, the Midweek Effect on Performance: evidence from the Bundesliga, recently posted and linked here. The authors, Alex Krumer & Michael Lechner – the latter of which has done substantial causal inference work – use propensity score matching to estimate the effect of midweek matches on home scoring.
The authors conclude that:
Playing midweek leads to an effect of about half a point in total, resulting from the home team losing about 0.2 points, while the away team gains about 0.3 points (the asymmetry results from the ‘3 points rule’)..it becomes clear that the home team loses all its home advantages in midweek games.
Interestingly, although matching is used as the primary method, selection effects (i.e., how much the weekend and weekday games differ) are weak. Primarily, conclusions are drawn as a result of the varying point totals described above.
As the authors discuss, several factors could be at play here, most notably referee bias and attendance. The authors also (gulp) suggest that testosterone levels could be linked to the poorer home team performance. In conclusion, Krumer & Lechner recommend that Bundesliga officials work to balance midweek game assignments.
All together, these findings would have a substantial place in sports literature with respect to the drivers of home advantage in sports. The results were so cool, in fact, that my first thought was: let’s replicate them.
Thanks to James Curley’s awesome engsoccerdatapackage, results from several professional soccer leagues are right at the R users fingertips. I started by trying to replicate the findings of Krumer and Lechner using the Bundesliga. Matching aside, our outcome of interest is a difference in difference: the average home point difference between weekend and weekday, minus the average away point difference between weekend and weekday.
In the paper linked above, the author’s found a gap of about 0.5 (in favor of weekend games) using the last 8 years of data.
As good news, so did I.
Using all years since 1960, I estimated the yearly average home weekend advantage. Sure enough, while there didn’t appear to be much of a difference prior to the year 2000, the last 15 years have seen a notable spike; teams are winning more points at home during weekend games. The blue line reflects the trend over time, which has roughly stabilized at at half a point over the last decade. The red dotted line reflects no difference.
Teams performed better on home weekday games in all but one of the last 10 years (2011). Meanwhile, in three of those years was the difference in point totals greater than 1. This difference is both statistically and practically significant (although it is important that only about 10% of a team’s games are weekday ones). Indeed, the author’s conclusions seem reasonable.
But replication on the data used by the authors is one thing; validation on another data set (e.g., another league) would go a long way towards confirming a weekday effect in professional soccer.
Fortunately, Curley’s R package contains more than the Bundesliga. I chose the English Premier League (EPL), Spain’s La Liga, and Italy’s Serie A to try and apply the paper’s approach elsewhere. If you want to try another league, feel free to expand upon it using my code posted here.
Long story short, I couldn’t replicate our initial findings in any of the three leagues. There wasn’t a single time-span in any of the EPL, La Liga, or Serie A where there was an additional benefit to teams playing home games on weekends.
Here are the three graphs, made similar to the one above. Note that there are varying x-axes: each league has had different numbers of seasons with weekday games. I went as far back as I could within each league (while also trying to assure continuity).
First, the home weekend advantage in Italy. By and large, there have been no differences.
Second, the home weekend advantage in England. Arguable, the weekday advantage was actually a disadvantage for a while.
Finally, the home weekend advantage in Spain. Again, if anything here, there has been a disadvantage.
To conclude, Krumer and Lechner find evidence of a difference in the home versus away point totals when comparing weekend and weekday games. Over the last decade, this magnitude of these differences has been fairly large – half a point, on average.
That said, while it was encouraging to replicate their findings, it is disconcerting the replications failed in three other top European leagues. There are obviously differences between the Bundesliga and each of the EPL, Serie A, and La Liga, including the types of weekdays on which each league plays its games (Serie A appears most similar to the Bundesliga in this respect, in not playing Monday games). However, there doesn’t appear to be anything else unique about the Bundesliga which would lend that league, and that league only, to a weekday effect.
That said, returning to the author’s original approach, this isn’t to say a midweek effect can be entirely discounted. If other leagues have assigned specific teams to midweek games based on past performances, it would mean our approach (a simple difference in difference one) was inappropriate. However, in absence of this other information, it seems more than plausible that the observed midweek effect in the Bundesliga could be accounted for due to chance.
Over the years, there have been several ‘draft curves’ put together in each of the four major North American sports. These charts provide intuitive visualizations of the relative value of each pick, while allowing us to better understand prospect potential and evaluate trades.
Despite the growing popularity of drafts in each sport, I was disappointed to find that there are apparently
No open-source guideline for how to make a draft curve and/or value chart
No attempt at comparing each of the sports’ draft curves simultaneously.
Those will be my goals here.
To start, I’ll explore how to estimate a draft value curve within a single sport. Then, I’ll compare curves between the NFL, NHL, NBA, and MLB using a pair of figures.
How to make a draft curve
The association between draft pick (x-axis) and player performance (y-axis) is generally non-linear, featuring a steep drop-off between the first few picks and a more steady decline thereafter. This is because the gap in talent between players chosen at picks 1 and 10 is, in expectation, larger than the gap between picks 50 and 60.
Thus, draft curves are most appropriately estimated using a non-linear fit. As examples, here are curves constructed using an exponential decay model, a logarithmic decay curve, locally weighted scatterplot smoothing (loess), and monotonic regression. However, like any fitting process, there is no right answer as far as which curve is most appropriate. While it’s beyond the scope of this report, an interesting project would compare different draft curves on some out-of-sample drafts to identify which type most accurately predicts average player performance. That’s a doable and important task.
The technique adopted here uses that of loess smoothing, which fits low degree polynomial functions between small subsets of the data, across all of the data. Loess is attractive as far as estimating draft curves for a few reasons. First, we don’t have to specify any specific functional form between pick number and player output. This is a big benefit, as instead of guessing what the association between our variables is, we’ll let the data tell us. Related, a loess method is more simple and flexible than a deterministic approach. As for downsides, the most obvious one is that we won’t be left with a simple equation with which to estimate player performance given a draft position. However, the estimated value of each pick, as well as a measure of uncertainty, can easily be extracted using software.
In fitting a loess smoother, the input most often controlled manually is the smoothing parameter, generally expressed as alpha, which accounts for the fraction of nearby points used in fitting the curve. Non-technically, alpha refers to the jigginess of each curve; values near 0 allow for a jagged trend, while values near 1 reflect more smoothness. I settled on an alpha of 0.4, which allows for the identification of some rougher edges, while hopefully rounding off others that are mostly due to random fluctuations.
Show me some charts
Once you have the data, estimating draft curves using a loess smoother within a single sport is doable in a few lines of code.
I started by using the pro-reference sites to scrape draft position and player performance measures in each of the four major sports. While not perfect, I settled on a player’s career wins above replacement (MLB), win shares (NBA), approximate value (NFL), and games played (NHL) as my outcome measures. In the case of the NHL, it may seem strange to use games played as a metric of player success. However, there’s a precedent for using this outcome – Michael Schuckers suggests doing so here. In any case, if you want to try your own outcomes, or change anything you see below, the code for the entire analysis is posted on my Github page.
Here’s a draft curve for NHL drafts between 1990 and 2005, using the first 100 picks and a loess smoother. The area in grey reflects our uncertainty above and below the curve at each pick number, and each dot represents a single player.
As far as games played, the average top pick is somewhere around 850, which is about three times the value of players picked late in the first round, and about six times the value of players chosen around pick 60. By and large, these numbers and this curve pass the smell test. So it’s a good start.
In addition, the NHL curve shows a nice feature of a loess fit which less flexible approaches would not have picked up on: right around pick 30, there’s a significant drop-off in games played. This dip could mean a few things. First, teams could be more willing to play their Rd. 1 picks on behalf of the sunk costs already invested in those players. Second, other teams could more frequently sign Rd. 1 free agents a few years down the road because of their prior label. Finally, and as a less provocative claim, because Rd. 1 picks are generally the top player chosen by their team, they’ll usually have an easier path to making their initial team’s rosters. While the player chosen 31st (Rd. 2) may have to outperform the player chosen 1st to make a roster, that’s usually not the case for the player chosen 30th.
Comparing across sports
While sport-level curves are interesting, I was also curious how each league has compared to one another.
There’s no easy way to answer to this question, however. In addition to disparities in the distribution and units of our outcome measures, there are also differences in the number of rounds in each sport’s draft (the NBA currently has 2, the NHL and NFL each have 7, and the MLB has 40).
One simple mechanism for making cross-sport comparisons is to only look at the top-60 picks, as this reflects the number of selections made in most NBA drafts. That handles our x-axis. To better understand the y-axis, I averaged the outcomes of players chosen between the 55th and 60th picks, using this number as a baseline. In the example above, we expect the top-pick in the NHL to be worth about 6 times that of the 60th pick.
Here’s a chart comparing the relative curves in each of the four sports, when divided by the average value of picks 55-60.
The top pick in the NBA draft is worth about 20x that of late second round picks, at least based on average win shares. Meanwhile, curves for MLB and the NHL are relatively similar. Finally, the most consistent pick-to-pick value appears in the NFL, where top picks are only worth roughly twice that of late round 2 picks, on average.
While the results of the NBA mostly matched expectations, the lack of any strong shape in the NFL curve, relative to the other sports, stands out. For example, it’s surprising that the MLB, which is evaluating high school players who are, by and large, a few years away from playing professionally, has a more significant drop-off in player talent at the top of the draft than is found in the NFL.
But don’t drafts have different numbers of rounds?
To account for the differing draft lengths (in rounds), we can tweak our curves so that the x-axis reflects the percentage of picks that were made up through each selection in each year, as opposed to a specific pick number. For example, the 50th percentile reflects the end of the NBA’s round 1, and roughly the middle of the 4th round in each of the NHL and the NFL. The MLB is excluded – at 40 rounds and with multiple minor league feeder teams, it is unclear that Rd. 40 of an MLB draft should be compared to, for example, Rd. 7 of an NFL or NHL draft.
In any case, here are draft curves across all rounds in the NBA, NFL, and NHL.
As in earlier, the NBA features the sharpest drop-off, while the NHL follows close behind. There’s a steeper decline in the NFL when looking across all rounds, with players chosen first overall, on average, worth about 10 times that of players chosen at the end of the draft.
Final assorted comments:
-It’s fair to use these curves to extrapolate to tanking incentives provided by each league. In the current system, it makes lots of sense to tank in the NBA, slightly less so in the NHL, and not as much sense in the NFL, as judged by the drop-offs in player talent.
-Pretty impressive job done by MLB scouting departments to accurately peg athletes who are three to four years away from playing professionally, and who stem from both top-level college programs and faraway high schools. Moreover, note that the MLB curve would appear even steeper if we could account for past issues with respect to league-level financial disparities. For a long period of time, more talented players were passed over by teams who could not afford them.
-Some technical notes: Our formal cutoff in the NHL was 210 picks – at one point it was higher, but I wanted this number to be consistent over time. Our NFL cutoff was 224 picks – at one point that number was higher, too. The NBA has used 2 rounds since 1989, so same cutoff throughout.
-One surprising factor to account for was a subtlety embedded in MLB draft history – players can be drafted more than once. This required setting initial player-level outcomes to 0 if that player was eventually drafted again.
-I don’t love my outcome measures, but they were the easiest ones available. As one positive sign, Saurabh at the Nylon Calculus found similar ratios to the ones above while using more advanced outcome measures in basketball.
-Finally, one could argue that a more preferred outcome would look at a player’s peak performance, instead of his career worth. That could certainly be the case. You could also make curves with “Probability of drafting an all-pro/all-star” as your outcome to answer a slightly different question.
No one wants to read about Patriots fumble rates, and I don’t want to write about Patriots fumble rates.
But I can’t not write about this.
The football person behind the initial commotion regarding low fumble rates was interviewed recently for a podcast. In response to a question about the 2015 season, in which the Patriots once again held onto the ball better than the rest of the league, the football person’s response was as follows:
One thing I noticed is that the weather and the climate up there during New England games was abnormally warm, which is one of the reasons that I found it phenomenal and crazy that they were having so few fumbles because as you know, and as I’ve studied and analyzed, it’s much more difficult to hold onto the football when you are playing out in the cold. So it was crazy how well they were able to hold onto the ball. But last year it was pretty warm, they didn’t have many cold weather games, and their fumble rate was pretty good as well.
Two suggestions were made clear:
1 – The weather during Patriots games was abnormally warm.
2 – It’s much more difficult to hold onto the football during cold weather
Let’s check these claims. Data from Armchair Analysis.
1 – The weather during 2015 Patriots games was abnormally warm.
A side-by-side boxplot should do the trick.
Here’s the temperature during Patriots games across the last 16 years.
The median, first quartile, and minimum game-time temperatures during Patriots games were not obviously different last year, and the temperature distribution in 2015 matches most of the prior years. It certainly does not appear to have been an “abnormally warm” year.
Writer’s claims: 0-for-1.
2 – It’s much more difficult to hold onto the football during cold weather
This is also straightforward to check out.
Using every game since 2000, I linked the game’s temperature to the fumble rate of the participating teams, defined as the total number of offensive team fumbles divided by the number of offensive team plays. So, 2 fumbles in 130 plays would give a fumble rate around 1.5%, or 0.015. If fumbles were associated with low temperatures, we would expect to see a decline in game-level fumble rates with increasing temperature.
Here’s a scatter plot.
While there’s a slight dip around 45 degrees, it’s neither statistically nor practically significant. The smoothed line moving through fumble rate and temperature is nearly perfectly horizontal. On aggregate, the cumulative rate of fumbles is relatively consistent across temperature (Note that a better analysis would probably use temperature in a model with play-level information).
Writer’s claims: 0-for-2.
So what does this mean?
The Patriots led the league with one of their lowest ever fumble rates again in 2015, which I have little double links to their style of play and their success (such as red-zone plays, playing with the lead, kneel downs, etc).
Despite the new evidence, our podcast guest appears to still be holding onto the idea that something funny is going on. Moreover, he’s doubling down on false claims, ones which at first glance appear reasonable. Like the initial analysis from a year ago, however, it’s mostly a bunch of hot air.
As part of their final assignment in my statistics and sports class, students were tasked with looking at the home advantage in the English Premier League (EPL). In some recent and related work, James Curley and Oliver Roeder found that, by 2014, an EPL home advantage had reached an all time low.
Interestingly, that low reached new depths in 2016.
Home teams have won 40.8% of games this past year, pending this weekend’s final contests. If that mark stands, it would be the lowest in EPL/English Division 1 history, one which dates back to 1888.
Here’s a chart, similar to the one that James and Oliver produced. Overall home team win percentage in each year is shown in black, draw percentage in red, and away win percentage in green. The grey region reflects our uncertainty in the trend curve.
As we knew there’d be, it’s a fairly big drop in win percentage, from roughly 60% to 45% across about 120 seasons. Using this rate of decline, we can expect home teams to win 0% of games by around the year 2400 (I kid).
While win percentage is a useful metric, it’s not perfect, as it doesn’t account for differences in team schedules. If the better teams generally got to host the worse teams, or if winners from previous year were forced to play one another more often (hi, NFL), overall home win percentage would fluctuate as a result. (Note that I am aware that EPL teams currently face each opponent exactly one time at both home and away stadia, which is nice and balanced. However, I’m not sure if that’s always been the case.)
As a result, a paired-comparison method that can account for the team strength of each team, and accordingly estimate a home advantage as a result, could be worth looking at. Using the BradleyTerry2 package in R, and with a hat-tip to James’ engsoccerdata package, I ran a Bradley-Terry model (BTM) with a home-team advantage coefficient within each season. When exponentiated, the coefficient from a BTM reflects a league-wide estimate of the home advantage, taken on an odds scale. As an example, if a team would have a 50% chance of a win at a neutral site (Odds = p/(1-p) = 1), they’d have a 60% win probability with a 50% increase in odds (Odds = .6/(1-0.6) = 1.5). As additional examples, 100% and 200% increases in odds would bump that 50% win probability team to 66% and 75%, respectively.
Here are the season-level increased odds of a home win across the last several decades. Conditional on team strength, in 2015, the odds of a team winning at home were about 25% higher than that team winning a neutral site. While still greater than 0, it’s again the lowest mark in league history.
It’s worth pointing out that the BTM coefficients are estimates, and do come with standard errors attached (generally about 0.12 on the log-odds scale). Even so, it’s interesting that the last 5 seasons all rank among the lowest 10 seasons as far as an estimated EPL home advantage.
James and Oliver posit several reasons as to what caused the drop in home advantage, including ease of travel and referee awareness. Their work also shows that the primary impetus behind fewer home wins is fewer home goals. Interestingly, while home advantage has also seemingly dropped in the NBA, it’s stayed relatively consistent in the NFL, MLB, and NHL.
It’s straight-forward to link EPL results with those in other professional soccer leagues. James’ data also includes La Liga (Spain), Serie A (Italy), and the Bundesliga (Germany), so I’ll use those.
Here’s a plot of win percentage across time in each of the four leagues. It’s sort of a cluster, but other leagues seem to match the EPL in terms of a home advantage in recent seasons.
Finally, we can use the BTM in each league in each season to get relative odds of a home win relative to that game being played on a neutral field. Here’s that graph.
-It’d be interesting to go back at each league’s schedule to identify where and why the two graphs above yield different results. For example, there’s a noticeable win percentage gap between the EPL and Serie A in 1975 that’s not apparent when looking at the BTM coefficients.
-The Bundesliga’s home advantage shape is quite strange in the years between 1962 and 1982, showing an inverse quadratic trend over time. I have no logical explanation for such an association. (In fairness, I wasn’t born yet.)
-England switched to a 3-point rule in 1981, while Germany, Italy, and Spain waited until the mid-1990’s. Behavioral economists would do well to look at the impact of rule changes using graphs like these. Generally, most related work only uses a few seasons of play (token plug for my NHL article).
-Nate Silver’s World Cup prediction model was taken to task after its overwhelming optimism for the host Brazilians in 2014. Given the results above, it’s possible that a decline in home advantage across soccer played a role. Nate’s model surely inflated Brazil’s chances because of what was anticipated to be a noticeable benefit of playing at home. But if much of the home advantage that used to exist in soccer was no longer a part of the game, it could explain why a prediction model would do so poorly for the hosts yet do so well when other teams played.
NHL Game 7’s are awesome, and in the next two days, we get two such contests – Dallas vs. St. Louis and San Jose vs. Nashville.
Here’s a primer on what to expect with respect to team behavior in these games. All of my findings used data from the nhlscrapr package in R for the 10 years of postseason action between 2006 and 2015.
Fewer penalties, slightly fewer goals in Game 7’s.
Nate has a nice chart covering the tendency for teams to accumulate fewer penalties in game 7’s, relative to other games in the series. I found roughly the same thing; about 11 total non-matching penalties per game during the first six contests of a series, compared to just seven in game 7’s. That’s about eight more minutes of even strength to expect in a Game 7.
Perhaps the fewer power plays awarded in game 7’s are driving a small but noticeable difference on the total number of goals; while games 1-6 average 5.3 goals per game, game 7’s average 4.9.
Penalties are less likely to be called at the beginning of Game 7’s.
Here’s a chart of the per-minute penalty rate comparing game 7’s to game’s 1-6. Each line reflects a smoothed curve, and the grey area reflects our uncertainty in each curve’s trend. Rates are adjusted to reflect the number of penalties we would expect if the rate of whistles for that minute of play were extended for an entire game.
The biggest difference in penalty rates between game 7’s and other games in a series looks to be the first period, where penalties are consistently called less often. This could be a combination of factors – players adopting a safer style of play, for example, or a referee’s hesitancy to call possible violations. Interestingly, these results mirror those from the NFL, where many judgement calls are rarely whistled to start a game.
There are also smaller differences in periods 2 and 3, and marked differences in the game’s final minute. However, note that rate differences at the end of the game are perhaps not too surprising given the frequent scrums in earlier games where teams do silly things like trying to “send a message.” Sidenote: I wonder how style of play would change if penalties at the end of a game carried over to a team’s next contest.
Here’s a similar plot looking at goals (empty net goals were excluded).
Note that the standard error bars for each curve were fairly large and overlapped throughout a game, and so differences between the two curves should be taken with a grain of salt. There is slightly less scoring throughout most of game 7’s, particularly at the end of the first period and at the beginning of the third period. Teams operate at about a two goals-per-game pace to begin the third period of game 7’s, for example.
More pressure, more call reversals?
In work a few years back (ignore the Excel chart! I was learning R at the time), Kevin and I found slightly higher rates of make-up calls in Game 7’s, relative to other games in a series. This came on top of the higher frequency of make-up calls in the postseason, relative to regular season action. Here, I’m defining a make-up call as one that works to even out the total number of penalties each team has.
A few years later, that trend still seems to hold. When a home team has exactly one more penalty that its opponent, it has received the next power play 58% of the time during the regular season. That number jumps to 62% in postseason game’s 1-6 and 68% in game 7’s. When owed two or more penalties, the home team has been awarded the next power play 61% of the time in the regular season, 65% of the time in game’s 1-6, and a remarkable 78% of the time (18 of 23 sequences) during game 7’s. So, if the home team gets behind on penalty differential, expect that to even up by game’s end.
Differences in postseason call reversals are not as evident when looking at if away teams are owed power plays. Across each game number, away teams that are owed penalties are given the next power play about 57% of the time.
A few days back, I summarized recent research on the NFL draft.
One interesting anecdote was the reliance of nearly all NFL draft research, both academic and non-academic, on Pro-Football-Reference’s approximate value (abbreviated AV), and it’s player-level career summary (cAV), a metric designed by Doug Drinen and described by Neil Paine in a series of articles here.
So, part of this post is designed to look at AV.
Before I start that, however, here’s a plug for those looking to do work in statistics in sports. We need more and better player-level outcomes in the NFL. Hockey, baseball, and basketball each have a half-dozen metrics which have or are being used and tested in the public sphere. As examples, we know which catchers are good at framing pitches, whether or not hockey teams should dump and chase or carry-in, and, for goodness sake, we have Nylon Calculus dominating the NBA scene on a daily basis. Meanwhile, football is mostly stuck on fourth downs and counting statistics. It will be difficult, but it is time for football fans to step up their games.
On today’s agenda:
What can we learn about Approximate Value?
Why does it matter?
Coming in the future:
How else can we analyze the NFL draft?
Pro-Football-References describes AV as “putting a single numerical value on any player’s season, at any position, from any year.”
Points are awarded to offense’s on the basis of team points per drive, relative to the league average. These points are then divided up among member’s of each offensive unit. A similar strategy is also used for defensive players based on points against. In addition, bonus points are given for being named an All-Pro, and points are also awarded for making the Pro Bowl at certain positions. Career AV numbers are weighted by season-level ones to account for varying numbers of seasons played. A more complete description is found here.
Here’s a rough summary of cAV using quarterbacks, one which passes the smell test: JaMarcus Russell is a 6, Tim Tebow a 12, Christian Ponder a 22, Ryan Fitzpatrick a 61, Alex Smith a 72, Matt Ryan a 96, Big Ben a 96, and Tom Brady a 160.
In understanding AV, it is useful to start by looking at the distributions by position among players drafted before 2010. Here’s a plot of career AV (cAV), split by era. I left out players drafted after 2010 as they have not had much of a chance to accumulate a representable career number.
A few things stand out.
-The distribution of cAV by position appears consistent between eras. That is, the shapes of each boxplot are the roughly same within each position, moving from era to era. This seems interesting, and arguably debatable, given changes to the game since 1967.
-At all positions, more than 50% of players accumulate a cAV of less than 10 (recall: Tim Tebow’s was 12). This makes sense, given the number of players drafted that barely play a game. The right skew at each position also seems reasonable.
-Tight ends appear to be the lowest-valued position, with offensive linemen and linebackers the highest, as judged by median career AV and density in the right tails.
-Relative to what I perceive of the last several years of the NFL, as well as what teams are paying top players, there seems to be a bit of disconnect. For example, quarterbacks have long been paid substantially more than players at other positions, implying that the best ones are worth more to their teams than the best players at other positions. Other than a few quarterbacks drafted in the 1990’s, this doesn’t show up in cAV.
It’s straightforward to link draft position to cAV.
Let’s start by fitting cAV to draft position across all seasons and positions. Each point below is a player, and the blue curve is a non-parametric smoother, which accounts for the likely non-linear association between draft position and performance. If this curve looks familiar, it looks like the one used in several of the draft-trade charts in circulation.
One question that I am really curious about is whether or not teams have gotten better at drafting since the late 1960’s. An obvious thing to do would be to fit an identical curve to the one shown above across different eras. If curves in recent years are higher and/or steeper, it would suggest that teams in today’s NFL are better at identifying talent and are drafting those players earlier.
Unfortunately, there’s a problem with that strategy. Turns out, cAV has grown over time. Here’s the average player cAV among the draft’s first 260 picks (this is the number of picks in the last few years. The trend is identical when using different cutoffs). The grey area accounts for our uncertainty in the trend curve.
The average cAV of players drafted in 2006 is about 3-4 units higher when compared to players drafted four decades ago. It’s a significant increase as well as a practical one. All together, on a yearly basis, about 1000 more units of cAV are being handed out now than in the early stages of the league. [Edits: This matches conclusions found by Danny at Football Outsiders. Sean proposes that expansion is responsible for part of this increase, given that AV is tied to playing time.]
Looking into the formula for AV, it is unclear what is causing these increases. One possibility is that more players have been drafted in recent years in the positions that AV values more than others. Another possibility, albeit a stretch, would blame recent increases in the number of Pro Bowl players, as AV rewards this.
Why does this matter?
Well, as far as looking at draft success over time, increases in cAV are vital to identify. Without adjusting for when a player was drafted, standard analyses could confuse improved league-wide drafting ability with what appears to be era-specific changes in our outcome measure.
To sum, we know that drafting players in any sport is hard given the extrapolation required to project the performance of young athletes several years down the line. In addition, analyzing the NFL draft is made even more difficult given that what is currently a poor set of outcome measures.
Much of today’s draft research uses cAV, a promising metric which is attractive in that it’s a single number accounting for many team and individual level performance metrics. However, we showed above a few potential weaknesses in cAV, at least with respect for its use in looking at the NFL draft. Overall:
-Approximate value mostly the smell test as far as players ranked high and low.
-Arguably, position-level AV measures don’t perfectly seem to match league-wide trends.
-Valuations are increasing in time.
Under these constraints, one reasonable recommendation for using cAV is to make it year and position specific. In future work, I’ll propose one way of doing this.