Talk on NHL shootouts

Schuckers and I have done some work on NHL shootouts, presented at the New England Symposium on Statistics in Sports.

The slides for our presentation are linked here.

A github page with R code is found here. The R code links to the data, which is publicly hosted for anyone to analyze.

Finally, we made a pair of interactive plots using Plotly. The shooter interactive plot is linked here, while the goalie interactive plot is linked here.

tl;dr version:

On their own, shootouts aren’t a crapshoot. First, there’s a decent amount of bias with respect to both when shooters are allocated to take attempts and which rounds those attempts are in. Second, shooters, and to a lesser extent goalies, vary more than we would expect them to if every player were equivalent.

All else equal, and since 2005, the best shootout shooter would have been worth about $700k to his team on shootout performance alone, and the best goalie worth about $1,000,000, on a per-year basis. Given the coming reduction in the frequency of games ending in a shootout, however, these values are likely to be smaller going forward.

Finally, conditional on what we know about team behavior, shootouts remain much closer to a crapshoot than a sure thing. And they still aren’t a great way to end a hockey game.

So you want a graduate degree in statistics?

After six years of graduate school – two at UMass-Amherst (MS, statistics), and four more at Brown University (PhD, biostatistics) – I am finally done (I think). At this point, I have a few weeks left until my next challenge awaits, when I start a position at Skidmore College as an assistant professor in statistics this fall. While my memories are fresh, I figured it might be a useful task to share some advice that I have picked up over the last several years. Thus, here’s a multi-part series on the lessons, trials, and tribulations of statistics graduate programs, from an n = 1 (or 2) perspective.

Part I: Deciding on a graduate program in statistics

Part II: Thriving in a graduate program in statistics

Part III: What I wish I had known before I started a graduate program in statistics  (with Greg Matthews)

Part IV: What I wish I had learned in my graduate program in statistics (with Greg Matthews)

The point of this series is to be as helpful as possible to students considering statistics graduate programs now or at some point later in their lives. As a result, if you have any comments, please feel free to share below. Also, I encourage anyone interested in this series to read two related pieces:

Cheers, and thanks for reading.

Discretionary penalties in the NFL

As a former college offensive lineman, I’m well aware of the reputation that holding penalties have – ‘you could call one on every play’ goes the old adage.

Kevin and I wrote a paper, recently appearing in JQAS, in which we looked at the rates of NFL penalties. Specifically, we wanted to address how rates fluctuate over the course of the game.

Quick summary: the rates of discretionary penalties in NFL games are hugely correlated with time.

Here’s my favorite plot, where, letting OHR be the holding rate on run plays, OHP the holding rate on pass plays, and DPI the defensive pass interference rate, we plot each rate against game minute (1 through 60). These rates are adjusted for play and game characteristics, and given per 1000 plays along with 95% confidence limits.

Model estimated penalty rates by game minute. DPI: defensive pass interference. OHP: offensive holding on pass plays. OHR: offensive holding on running plays

The association between game minute and penalty likelihood is strongest for holding on running plays – with rates 4-5 times higher during parts of the second and third quarters, when compared to the beginning and end of a contest. Holding penalties, and to a lesser extent the other judgment calls that we looked at, are exceedingly rare in the game’s first and last minutes.

If you are interested in the paper, feel free to drop me an email and I can send you a copy.


If The Patriots Stole Playsheets, It Appears Not To Have Been A Big Advantage

On Tuesday morning, Outside the Lines released a damning investigation into the link between Spygate and Deflategate, two scandals that have consumed the New England Patriots in the past decade. There were a lot of revelations in there, but one of the more damaging ones had nothing to do with taped signals or the Ideal Gas Law. From Don Van Natta Jr. and Seth Wickersham’s report:

Several [former New England coaches and employees] acknowledge that during pregame warm-ups, a low-level Patriots employee would sneak into the visiting locker room and steal the play sheet, listing the first 20 or so scripted calls for the opposing team’s offense. (The practice became so notorious that some coaches put out fake play sheets for the Patriots to swipe.)

In response, Yahoo’s Charles Robinson wrote, “Knowing the first 15 or 20 offensive plays scripted by an NFL team is knowing the future.” If the Patriots indeed stole scripted play sheets, we’d expect to see their defensive performance peak early in the game, only to wilt later on. Indeed, the OTL report gives evidence of games where New England’s defense was staunch early, but later gave up big yardage.

But that’s anecdotal evidence. There’s a more empirical way to judge whether or not the alleged play stealing had any effect.

The chart below compares New England’s defensive performance for the first 5 plays of the game (when the Patriots would have had knowledge of what was coming) with the second 5 plays of the game (when they wouldn’t). (I also contrasted the first 20 plays and the second 20 plays. The results were similar.) If New England was stealing play sheets, it would, in theory, yield fewer yards a play on the first five plays of the game than on the second five plays of the game. Armchair Analysis provided the data.

Interestingly, the shapes (histograms) are very similar, both within each era and in terms of the difference between eras. Further, in pretty much every year (not shown), there’s a consistent overlap between the yardage New England allowed on plays likely to have been scripted and those less likely to be scripted. Finally, in terms of average yards per play, New England’s patterns tend to match numbers from the rest of the league.  

What’s the take home?  It could be a couple of things. One explanation is that New England stole a bunch of play sheets, but just did a poor job of using them. Alternatively, as suggested in the ESPN piece, rogue opponents could have subbed in fake play sheets, fooling the Patriots and potentially our numbers. Of course, one or two stolen sheets probably wouldn’t show up in our data, and because we have no idea exactly which (if any) sheets were stolen, it’s somewhat hard to define what we are looking for.

Interestingly, the more noticeable change is that the Pats defense gave up about half a yard more per play overall post-2007 than pre-2007. Of course, there are several explanations for such a drop in performance, including differences in team personnel and/or league-wide changes in offensive strategy. Conspiracy theorists, however, could argue that with a unit no longer able to accurately predict play calls, the Pats’ defense suffered. But this seems like a bit of a stretch. League-wide, there was no difference post-2007.

Finally, although it’s tempting to jump to aggressive conclusions, it’s a good reminder that an absence of evidence is not evidence of an absence. Instead, all we know is what the data tells us, which in this case, is only a small part of the story; on a yards-per-play basis, the Pats defense was pretty consistent before and after an opponent would script its plays.

Two point art

Greg’s artwork inspired me to do something.

I’m not sure if this is art, because if that’s the case, then that may make me an artist.

But here it is.

Can you guess what the dots are for? Hint – it’s from the NFL, and I think it’s going to change.

Extra-point decisions since 2000

The x-axis in the chart above is an index, from 1 to 19,255. This represents the number of touchdowns since 2000. The y-axis is the point differential for the team on offense (from down 12 to up 12). The light dotted red line in the middle intercepts the y-axis at 0. Finally, the dot’s color depicts whether the team went for two (blue is yes, light grey is no).

It’s pretty evident that teams almost always go for two in only a few situations. This isn’t a surprise. However, I’m curious how it will look after the 2015 season, given the recent rule changes. My guess is that we’ll see some changes in a few spots, with teams occasionally being more aggressive.

I also think it’s interesting how consistent the strategy has been over time.

For those of you who like axes, here’s the same plot.


And we expand the y-axis here:


Regression or Reversion? It’s likely the latter

With interest in statistical applications to sports creeping from the blogosphere to the mainstream, more writers than ever are interested in metrics that can more accurately summarize or predict player and team skill.

This is, by and large, a good thing. Smarter writing is better writing. A downside, however, is that writers without formal training in statistics are forced to discuss concepts that can take more than a semester’s worth of undergraduate or graduate training to flesh out. That’s difficult, if not impossible, and unfair.

One such topic that comes up across sports is the concept of regression toward the mean. Here are a few examples of headlines:

Regression to the mean can be a bitch! (soccer)

Clutch NFL teams regress to the mean (football)

Beware the regression to the mean (basketball)

30 MLB players due for regression to the mean (baseball)

Avalanche trying to stave off regression and history (hockey)

In each case, the regression (i) sounds scary, (ii) applies to over-performance, not under-performance, and (iii) is striving really hard to reach an exact target, in these examples a vaguely specified ‘mean.’

From a statistical perspective, however, regression toward the mean requires strict assumptions and precision, the context of which are almost never discussed in practice.  As a result, examples that refer to a regression to the mean may be ill-informed, and are often best described by a similar sounding but more relaxed alternative.

Using the notation and descriptions in Myra Samuels’ 1991 paper in the American Statistician, “Statistical Reversion toward the mean: More universal than regression toward the mean,” here’s a quick primer through the context of sports.

What is regression towards the mean?

Let X and Y be a pair of random variables with the same marginal distribution and common mean µ. Most often in sports, X and Y are simply team/individual outcomes that we are interested in measuring and describing. For example, X could be the batting average of a baseball player through July 1, and Y his average from July 2 through the end of the season. In this example, µ represents that player’s probability of getting a hit.

The definition of regression toward the mean is based on the regression function, E[Y|X = x]. That is, conditional on knowing one variable (X = x), what can we say about the other? Formally, regression toward the mean exists if, for all x > µ,

µ < E[Y|X = x] < x, 

with the reverse holding when x < µ.

This is a fairly strict requirement. For an outcome above a player or team’s true talent, we can expect that the ensuing outcome, on average, will lie in between µ and the original outcome. Linking to linear regression, for any initial observation x, the point prediction of y is regressed towards an overall mean representative of that subject. However, y will still exhibit some natural variation above and below the regression line; some points will fall closer to the mean, and others further away.
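To see the inequality in action, here is a minimal simulation sketch. It assumes X and Y are bivariate normal with a common mean, a common standard deviation, and correlation rho; that distributional choice, and the values mu = 0.270, sd = 0.030, and rho = 0.5, are illustrative assumptions rather than anything estimated in this post. Under that assumption the regression function is E[Y|X = x] = µ + ρ(x − µ), which sits strictly between µ and x whenever 0 < ρ < 1 and x > µ.

```python
import math
import random


def conditional_mean_y(x0, half_width, mu=0.270, sd=0.030, rho=0.5,
                       n=200_000, seed=1):
    """Estimate E[Y | X near x0] by simulation.

    X and Y are drawn as bivariate normal with the same mean (mu), the same
    standard deviation (sd), and correlation rho. All defaults are made up
    for illustration.
    """
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mu + sd * z1
        # Mixing z1 and z2 this way gives corr(X, Y) = rho.
        y = mu + sd * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        # Keep only the pairs whose first outcome landed close to x0.
        if abs(x - x0) <= half_width:
            ys.append(y)
    return sum(ys) / len(ys)


mu, x0 = 0.270, 0.300  # x0 sits one standard deviation above mu
est = conditional_mean_y(x0, half_width=0.003)
print(f"mu = {mu:.3f}  E[Y | X near {x0:.3f}] = {est:.3f}  x = {x0:.3f}")
```

The estimate should land near 0.285, between µ and x. Individual y values still scatter above and below that conditional mean, which is the natural variation described above.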

There are easy pitfalls when it comes to applying regression toward the mean in practice. The most common one is assuming that what goes up must come down. For example, assuming that players or teams become more and more average over time is not regression toward the mean. A second misinterpretation is linking regression toward the mean with the gambler’s fallacy, which entails assuming that a team or player that was initially lucky is then going to get less lucky. This is also not true. The probability of a fair coin landing heads, given that it landed heads five, ten, or even fifteen times in a row, remains at 0.5.  Such misinterpretations are frequent in sports, particularly when describing team performance with respect to point spreads or performance in close games.

While it’s easy to confuse regression toward the mean with such scenarios, there’s good news, in the form of some easy to understand alternatives.

What’s the better alternative?

To start, replacing ‘regression’ with ‘reversion’ relaxes the assumptions presented above while still implying that extreme observations are more likely to be followed by less extreme ones. More often than not, when writers speak of regression to the mean, using reversion is sufficient and accurate. Furthermore, Samuels proves mathematically that ‘regression toward the mean implies reversion toward the mean, but not vice versa.’ Namely, reversion is a more relaxed alternative; the conditional mean of the upper or lower portion of a distribution shifts, or reverts, toward an unconditional mean µ.
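As a rough illustration of reversion, the sketch below simulates a league of made-up hitters and follows the group that started hot. One way to write the upper-tail version of the sentence above is µ ≤ E[Y | X > c] < E[X | X > c] for a cutoff c; treat that as a paraphrase rather than a quotation from the paper. The talent distribution, the 250 at-bats per half, and the 0.300 cutoff are all hypothetical.

```python
import random


def simulate_halves(n_players=2000, at_bats=250, seed=7):
    """Return (first_half, second_half) batting averages for made-up hitters."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_players):
        # Hypothetical true talent: hit probability centered near 0.260.
        talent = min(max(rng.gauss(0.260, 0.020), 0.150), 0.400)
        first.append(sum(rng.random() < talent for _ in range(at_bats)) / at_bats)
        second.append(sum(rng.random() < talent for _ in range(at_bats)) / at_bats)
    return first, second


def upper_group_means(first, second, cutoff):
    """Mean of X and of Y among players whose first-half average exceeds cutoff."""
    group = [(x, y) for x, y in zip(first, second) if x > cutoff]
    mean_x = sum(x for x, _ in group) / len(group)
    mean_y = sum(y for _, y in group) / len(group)
    return mean_x, mean_y


first, second = simulate_halves()
overall = sum(second) / len(second)
mean_x, mean_y = upper_group_means(first, second, cutoff=0.300)
print(f"overall mean {overall:.3f}; hot starters: "
      f"first half {mean_x:.3f}, second half {mean_y:.3f}")
```

The hot starters' second-half average falls back toward the overall mean without reaching it, and no regression function had to be specified to see that.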

For example, in the headlines listed above, good soccer teams, MLB players hitting for high numbers, and the Colorado Avalanche were all more likely to revert to a more standard form that was indicative of their true talent. No regression equation is necessary.

In addition to generally being a more appropriate term, use of the word reversion has a side benefit, in that it is more interpretable when applied to outcomes that initially fell short of expectations. It is recognizable, for example, to expect an MLB batter hitting 0.150 to revert to form; meanwhile, it doesn’t make sense to claim that the same MLB batter will regress, given the negative connotations of the latter.

And is it regression/reversion ‘to’ or ‘toward’ the mean?

Well, it depends. While increased use of the word reversion is part of the solution, more precise writing should also consider both the outcome of interest and that outcome’s expected value. For example, here are two examples of over-performance:

Mike Trout hits 0.500 in his first ten games of the season.

Mike Trout tosses a coin 10 times, landing nine heads.

And here’s the same sentence to describe our future expectations – can you tell which one is accurate?

Mike Trout’s batting average will revert to the mean.

Mike Trout’s ability to land heads will revert to the mean.

In the first example, the outcome of interest is Mike Trout’s probability of getting a hit. Because we can comfortably say that Mike Trout is better than the league average hitter, while his batting average is going to come down, it is reverting towards an overall average, but not to the overall average.

Meanwhile, unless Trout can outduel the Law of Large Numbers, I can comfortably say that in the long term, his observed ability to land heads will revert to a probability of 0.5. In this silly example, the second statement is the more precise one.

Anything else worth discussing?

Well, maybe. In searching for some of the examples used above, I found it strange how little was written of the one word that tends to encompass much of a player’s performance above or below his or her true talent.


The obvious aspect linking, say, the Colorado Avalanche winning games while being outshot and Mike Trout tossing coins and landing heads, is that each was on the receiving end of some lucky breaks. So while we expect some type of reversion to or towards a more traditional performance, that’s to no fault of the Avalanche or Trout. With outcomes that are mostly (or entirely) random, variability above or below the league average is simply luck. As a result, there’s nothing for the Avalanche, Trout, or even us to be scared/beware of. We wouldn’t tell Trout to fear a balanced coin, nor should we tell Avalanche fans to beware of reversion towards a more reasonable performance given their team’s shot distribution.

The issue here lies not in a distinction between regression and reversion, but in a deeper and more serious problem: humans have a poor grasp of probability. In sports (and likely in other areas of life), lucky outcomes are all too often touted as clutch, while unlucky players or teams are given the label of chokers. It’s standard practice to use terms like savvy to describe the Patriots’ win over Seattle, for example. A more skilled writer, however, would perhaps recognize that the Patriots were on the better end of a 50-50 coin toss, from more or less the start of the game all the way until the end (in more ways than one; the game closed as a near pick-em at sports books. Even bettors couldn’t nail down a winner).

Writes Leonard Mlodinow in The Drunkard’s Walk,

the human mind is built to identify for each event a definite cause and can therefore have a hard time accepting the influence of unrelated or random factors.

It’s difficult and counterintuitive to describe an outcome in sports as lucky. However, that’s what many of them are.

So while it may sound trendy to toss around terms like ‘regress to the mean,’ it is often more accurate, and certainly more simple, to propose that some luck was involved in the initial outcome. As a result, a decline from overperformance is nothing more than a player or a team, much like a coin tosser no longer landing heads five times in a row, not getting as lucky as they initially had been.

JSM 2015

There’s a fun session at JSM 2015 on referee decision making in sports, held Wednesday at 10:30.

I’m presenting some new work on sideline pressure in the NFL that appears to impact referee behavior. For defensive judgement penalties, including pass interference and aggressive calls like unsportsmanlike conduct and personal fouls, we find statistically and practically significant differences in the call rates based on which sideline the play occurred in front of. There are also significant differences in the rate of holding calls on outside run plays.

Here are my slides, and here’s a more technical paper. I encourage feedback!

One soccer ref makes every judgement decision. Is that absurd?

In last night’s Gold Cup semi-final between Mexico and Panama, Mexico escaped with a 2-1 extra-time victory. Like many recent CONCACAF games, a few judgment calls more or less decided the outcome. This game included an early red card to a Panama player, and a late penalty kick awarded to Mexico. See Deadspin for highlights here.

Much of the commentary after the game ripped the game’s referee, American Mark Geiger. However, I’m not quite sure Geiger’s to blame. Specifically, while I’m not smart enough to get into the technicalities of any soccer call, I did notice that Geiger was forced to make a red card decision from at least 40 yards away. This seems absurd.

Using the dimensions from each of the fields/rinks in the NFL, NBA, NHL, and FIFA, as well as each organization’s respective number of officials, I estimated the amount of square footage that each ref is responsible for. For example, three NBA referees cover 4,700 square feet, or about 1,600 per ref.
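The arithmetic is just playing area divided by the number of officials. Here is a small sketch using the NBA figures quoted above; the function names are hypothetical, and the areas and official counts for the other leagues are left for the reader to plug in, since the exact inputs behind the chart are not listed here.

```python
def area_per_official(area_sq_ft, n_officials):
    """Square feet of playing surface per official."""
    if n_officials <= 0:
        raise ValueError("need at least one official")
    return area_sq_ft / n_officials


def times_as_much(area_a, refs_a, area_b, refs_b):
    """How many times more ground each official in league A covers than in league B."""
    return area_per_official(area_a, refs_a) / area_per_official(area_b, refs_b)


# NBA figures quoted in the post: 4,700 square feet shared by three referees.
nba = area_per_official(4_700, 3)
print(round(nba))  # 1567, i.e. "about 1,600 per ref"
```

Calling `times_as_much` with two leagues' areas and official counts gives the kind of ratio quoted for the chart.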

Here’s a chart with the estimated square-footage (in thousands) covered by each ref.


It’s no contest. A soccer referee covers about 7 times as much ground in a game as NFL and NHL ones, and about 50 times as much ground as NBA refs.

There are obviously several caveats with such a simple chart. NBA officials have to call a much larger assortment of violations than FIFA ones, and NFL plays stop and start from one spot on the field, making it easier for the group of referees to reset. Further, it’s silly to think that putting a half-dozen more refs out there would make soccer games more equitable. Finally, I’m obviously aware that soccer has assistant referees; from my perspective, like NHL linesmen, the role of assistant referees is secondary on the game’s most important decisions.

However, it’s patently absurd to blame referees for wrong calls when they are making the decisions from half the field away. Would we expect an NBA ref to assess a flagrant foul from the opposite end of the court? Or an NFL official to whistle pass interference from the opposite sideline?

Of course not. That’d be crazy.  But it seems just as crazy to blame soccer refs for failing a test that they never had a chance to pass.

From a relative outsider’s perspective, an extra ref would yield more accurate calls and could help curtail flopping. There’s probably a good reason why soccer has only one referee, but a quick Google search didn’t help. What am I missing?

Finally, here’s Noah’s take:

I think part of it is that play is more wide open than most sports, so it’s a bit easier to spot fouls. Which always seemed like sort of a dumb argument to me (the same thing with kickoffs in the NFL) but it does make some sense. And I think they’ve tried to empower the assistants to make more calls, but there’s always a strange balance of power issue because the assistants are just assistants. so yeah, having a second ref would make sense. If anything, it’s probably a man-power issue. There are so many terrible refs already that i can’t imagine having to double that number worldwide. 

MLB win percentage versus salary – a follow up

Noah and I had heard and read a bunch of discussion about the rise of small budget teams in baseball. When we set out to prove it, we actually found the opposite to be true. Here’s our article for 538, titled “Don’t be fooled by baseball’s small budget success stories.”

There were several interesting follow-up questions, as well as some anecdotes that didn’t quite fit in the article. I encourage you to read Tango’s blog for interesting comments on my article, as well as general thoughts on salary and winning in baseball.

Anyways, I’ll answer a few of the questions here (comments in bold).

1 – Can you rank teams over 30 years by area between team regression line and mlb regression line (via @beerback)?


Given that some franchises (Montreal, Tampa, Washington) have only played in a portion of the seasons we covered, I looked at the average yearly residual between each franchise’s win percentage and its expected win percentage, given its relative payroll. Here’s a barplot.
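For readers who want to try this themselves, here is a minimal sketch of the calculation. It assumes the expected win percentage comes from a straight-line fit of win percentage on standardized payroll within each season; that modeling choice, the function names, and the three-team data are all made up for illustration and are not the actual inputs behind the barplot.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def average_residual_by_team(rows):
    """rows: iterable of (season, team, standardized_payroll, win_pct)."""
    by_season = defaultdict(list)
    for season, team, pay, win in rows:
        by_season[season].append((team, pay, win))
    residuals = defaultdict(list)
    for recs in by_season.values():
        intercept, slope = fit_line([r[1] for r in recs], [r[2] for r in recs])
        for team, pay, win in recs:
            # Positive residual: the team won more than its payroll predicted.
            residuals[team].append(win - (intercept + slope * pay))
    # Averaging over only the seasons a team played handles partial histories.
    return {team: mean(vals) for team, vals in residuals.items()}


# Made-up toy data: three teams, one season.
toy = [(2000, "A", -1.0, 0.40), (2000, "B", 0.0, 0.56), (2000, "C", 1.0, 0.60)]
for team, resid in sorted(average_residual_by_team(toy).items()):
    print(team, round(resid, 3))
```

Multiplying an average residual by 162 converts it to wins over a full season, which is the scale used in the barplot.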

Average annual wins above expectation, 1985-2014

No surprises here. Relative to their payroll, the Cubs have been about 5 annual wins worse than expectation, with Oakland about 6 wins better. Montreal, St. Louis, and Atlanta all stand out as teams that have spent wisely over the last 30 years, on average. By and large, these results match our intuition.

Also, it’s worth pointing out that Montreal’s run in the 90’s nearly matches Oakland’s in the 2000’s as far as small-budget teams spending wisely. In three of four seasons between 1993 and 1996, the Expos finished with a win percentage above 0.540 while spending less than $20 million. In relative salaries, that’d be equivalent to spending $42 million in 2015…which is about a third less than the Astros’ current payroll.

2- I don’t like the idea of creating a best-fit curve, if a best-fit line will do.  And we can see for the overall 30-year league average, it IS practically a straight-line.  That it doesn’t look like a straight line at the team level simply means “small sample size” (@tangotiger).

In our article, I used smoothed lines to express the relationship between winning and spending for each team. However, by and large, the plot for all teams together is nearly linear. Are the funky team-specific curves just due to chance?

As one way of considering this question, I calculated a residual for each team in each season, which represents the distance above or below the line of best fit for that year’s winning percentage. As an example, positive residuals represent teams that outperformed expectations.

Next, I used bootstrap resampling, taking each fitted value from the line-of-best-fit and adding random noise, where the error was sampled (with replacement) from the observed set of residuals. This gives a set of imputed winning percentages, representing a sample of seasons that could have occurred if there was simply noise above or below a straight line.
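The resampling step can be sketched in a few lines. This is an illustration under stated assumptions, not the code used for the plots: it fits a straight line to a single made-up season, and it resamples residuals from that season only, whereas the residuals could just as well be pooled across seasons. The function names and the six-team data are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def bootstrap_season(payroll, win_pct, rng):
    """Impute one season of win percentages: straight line plus resampled noise."""
    intercept, slope = fit_line(payroll, win_pct)
    fitted = [intercept + slope * x for x in payroll]
    residuals = [y - f for y, f in zip(win_pct, fitted)]
    # Draw one residual per team, with replacement, and add it to the fitted value.
    imputed = [f + rng.choice(residuals) for f in fitted]
    return fitted, residuals, imputed


# Made-up standardized payrolls and win percentages for six teams in one season.
payroll = [-1.5, -0.8, -0.2, 0.3, 0.9, 1.3]
win_pct = [0.430, 0.512, 0.470, 0.505, 0.540, 0.543]
fitted, residuals, imputed = bootstrap_season(payroll, win_pct, random.Random(2015))
print([round(w, 3) for w in imputed])
```

Repeating this for every season and then smoothing each team's imputed points gives the simulated curves to set beside the observed ones.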

We can compare the imputed curves to the observed ones to answer a few questions. First, are there as many curved lines when we bootstrap? If so, the curved relationships that we observed are likely explained by chance. Another question  – are there as many teams that are consistently above or below their payroll expectation?

Here’s the first simulation. Click for a second iteration, if you are interested.


As a reminder, here’s what we are comparing to – the observed curves. And as an example, this is the set of NL East curves. The x-axis is standardized salary, and the y-axis is win percentage.

NL East

Few, if any, of the smoothed curves that were simulated using an underlying linear association were able to match either the (i) impressive performance (relative to salary) of the Braves or (ii) the Mets’ bizarre u-shape.

This exercise tends to support a few conclusions.

First, results like those of the Braves and the A’s, which, on average, outperformed their expectations, were likely not due to chance. None of the simulated curves were consistently above or below the line the way Atlanta’s and Oakland’s curves were.

Second, while most teams can be fit using a straight line, the relationship may not have been linear for all teams. No franchise in the simulated iteration seems to match the Mets’ u-shape (or a few similar ones from other teams).

3- How strong did you find the correlation to be? It seemed like most points were clustered along the wins (y) axis and not necessarily following the average curve.

The yearly correlation between winning percentage and standardized salary has been between 0.30 and 0.65 in each season between 1993 and 2014. In all but four seasons, the correlation is significantly different from 0.

Also, it’s worth pointing out that Tango used a similar strategy and aggregated salary and win percentage across a decade’s worth of seasons. He found the correlation between winning and a salary index to be about 0.70, using the seasons 2002-2011.

Thoughts on the Sloan research paper contest

Folks who have submitted abstracts over the past two years to the Sloan Sports Analytics Conference research paper contest were recently surveyed as to their thoughts on the contest.

Here are my (expanded) answers to the open ended question “Do you have any other suggestions or comments that will help us improve the research papers competition?”

1- Maintain a strong prize pool, but eliminate the crazy discrepancy between 1st and, say, 5th place. 

In the current set-up, first place is $20k, second place $10k, and third place onwards is nothing. This structure incentivizes researchers to oversell their findings, because admitting that your work is simply building on the research of others is not nearly as sexy as claiming to be the first in your field to find something.

What’s a more equitable system? One that encourages good content, appropriate citation of sources, and makes it clear why each paper is relevant to advancing sports analytics.

From a prize perspective, this makes it less of a crapshoot. Financially, each finalist gets $2k and a free ticket. The winner gets $10k. Boom, done.

2- Reward participants whose submission is reproducible.

I cannot remember a single finalist paper that has either included (i) its data or (ii) its source code. This is not good (note: I’m also guilty. I didn’t submit code or data two years ago). Given that the majority of findings in professional research are not reproducible, it is difficult – perhaps impossible – to know if each paper truly got things right. Rewarding papers that include data and source code would be a major step in promoting reproducible research.

Of course, work is only reproducible if the data set is public.  A more aggressive but related idea would be to use separate tracks for both proprietary and public data. This was suggested a year ago by analyst Christopher Long (and perhaps by others).  Such a distinction levels the playing field among researchers who have good work to share but are working with standard data, where it is becoming more and more difficult to make novel discoveries each year.

3- Implement a conference proceedings section.

For many people in academics, there is a lesser incentive for submitting to Sloan given that, unless your paper finishes as one of the finalists, all of your work is for naught. Having a conference proceedings would likely encourage more submissions in this regard. If you are worried about the cost, publish online only and charge anyone who wants a hard copy. This would be very cheap.

4- Also allow submissions in TeX.

For years, the conference has used the same Microsoft Word template for participants. But many analytics researchers use TeX and only TeX for their work, as the formatting, particularly for mathematical notation, is substantially easier and more readable in TeX than in Word. TeX is also more visually appealing than Word.

Allowing submissions in both TeX and Word seems like an easy compromise.


Happy to hear other takes as well. I am appreciative of the fact that SSAC has upgraded the rewards for poster recipients over the past few years. Further, the fact that SSAC has implemented a survey in the first place is hopefully a promising sign of changes to come.

Two reasons the future two-point conversion rate might be higher than current estimations

The NFL recently updated its extra point rules, moving the yard line for extra points from the 2 to the 15.

In the wake of the change, one topic of conversation is whether or not offensive teams will choose to go for two more often. The thought is that if extra points become more difficult, perhaps it is worth the risk of going for two points.

Critical to the conversation is the idea of expected points, which weigh the points and probabilities of conversions against those of extra points.* However, a basic expected points analysis requires, among other assumptions, both that we have reliable data and that all two-point conversions are created equally.

That may not be the case. Here are two related reasons that the conversion rate might be higher than it’s currently being estimated (in most places, around 48%).

1 – Data issues

Using data from Armchair Analysis (AA), I was able to confirm that teams converted 48% of their two-point conversions since the 2000 season, a number that has been reported in several outlets. But I also found that the primary rusher or passer was missing on 40 of these plays, and that a punter/kicker rushed or passed the ball on another 26 plays. Here are some of the players listed with conversion attempts: C. Kluwe, B. Moorman, K. Walter, S. Koch, T. Sauerbrun. We can’t expect that crew to be leading conversion attempts on real conversions from the two-yard line in 2015 and beyond.

Overall, offensive teams converted just 6% of such attempts into two points (4 of 26 on plays handled by the kicker or punter, 0 of 40 otherwise). More likely than not, these 66 plays were fumbled snaps, fake extra points by design, or muffed somethings.

If you remove the unknowns and kickers/punters from AA’s conversion data, things look a bit different, with teams converting at 51% since 2000.
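As a quick check on how the 48% and 51% figures fit together, the sketch below adds up the counts quoted in this post. It assumes the by-position counts reported further down (in the update on rushes and passes) make up the cleaned data; they sum to 938 attempts, which together with the 66 excluded plays gives 1,004. The variable names are hypothetical.

```python
# (successes, attempts) by ball handler, as reported in the post's update.
rushes = {"RB": (109, 190), "QB": (47, 72), "WR": (4, 6)}
passes = {"QB": (313, 659), "WR": (5, 7), "RB/DB/TE": (1, 4)}
# Plays excluded from the cleaned data.
excluded = {"kicker/punter": (4, 26), "unknown": (0, 40)}


def totals(*groups):
    """Sum successes and attempts across one or more count dictionaries."""
    made = sum(m for g in groups for m, _ in g.values())
    tried = sum(a for g in groups for _, a in g.values())
    return made, tried


clean_made, clean_tried = totals(rushes, passes)
all_made, all_tried = totals(rushes, passes, excluded)

print(f"cleaned: {clean_made} of {clean_tried} = {clean_made / clean_tried:.1%}")  # 479 of 938 = 51.1%
print(f"all:     {all_made} of {all_tried} = {all_made / all_tried:.1%}")          # 483 of 1004 = 48.1%
```

Dropping 66 plays that succeeded only 4 times is enough to move the rate by about three percentage points.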

2 – Teams that have gone for two have been generally playing from behind

Here’s a chart of the scoring margin at the time in which a team is attempting a conversion (I focused on point differentials between -20 and 20, which ignores a few games on the outside).


Treating all conversions as identical misses the fact that under the previous system, most teams going for it had to do so given the score differential. Moreover, 60% of the teams that went for two were trailing at the time of the attempt. And if those teams were trailing, you could make the argument that they were likely worse than their opponent in terms of overall talent.

Comparing the success rates of these teams yields a small difference: teams leading converted 53% of the time, compared to 49% among trailing teams.

It seems reasonable to think that even the 51% is a slight underestimation of what the success rate of teams would be if there were more evenly distributed attempts by team talent. Further, there’s also somewhat of an association between conversion rates and a team’s offensive proficiency: see the next section for a chart.


Other notes:

-There are likely more issues remaining with the data. For example, a botched snap featuring a pass from TE Jay Riemersma shows up in our data (the play is listed here, many thanks to a loyal reader for finding this stuff). But there are also purposeful conversions from non-kickers, including Antwaan Randle El, who had three of them. Further work is needed.

-Teams passed on 71% of their conversion attempts. Strange, given that passing attempts were only 48% successful, compared to 59% of rushes.

Update: A few smart folks pointed out that many of the rushes might be QB scrambles. Here are the success percentages and counts by play type:

Rushes: RB’s 109 of 190 (57%), QB’s 47 of 72 (65%), WR’s 4 of 6 (67%)

Passes: QB’s 313 of 659 (47%), WR’s 5 of 7 (71%), RB/DB/TE 1 of 4 (25%)

-There did not seem to be any differences in conversion rates by weather or surface.

-Here’s a plot of success rate and conversion attempts for each team since 2000. Jets and Cardinals doing their thing.



*Here’s a primer on expected points:

Teams have successfully converted 48% of their two-point conversions over the past 15 years – it’s at 49% over the last three years – making the expected number of points on a two-point conversion attempt approximately 0.49 x 2 = 0.98. Alternatively, given that teams make between 90 and 95% of their field goals from near the 32-yard line, and that the extra point remains worth a single point, we can assume that the expected value of a longer extra point is somewhere between 0.90 and 0.95. Game, score, and coaching conditions aside, on average, it is evident that there’s now a slight advantage to going for two.  Benjamin Morris makes the excellent point that we should also expect the number of expected points on XP’s to rise, too, given how awesome kickers have become.
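The comparison above is simple enough to sketch in code. The probabilities are the ones quoted in this post (49% on conversions, 90 to 95% on the longer kick); the function names are hypothetical, and the break-even helper is just the algebra 2p > q rearranged to p > q / 2.

```python
def expected_points(p_success, points):
    """Expected points from one try with the given success probability."""
    return p_success * points


def break_even_two_point_rate(p_kick):
    """Two-point success rate at which going for two matches kicking."""
    return p_kick / 2


two_point = expected_points(0.49, 2)
kick_low = expected_points(0.90, 1)
kick_high = expected_points(0.95, 1)

print(f"go for two: {two_point:.2f}")                  # 0.98
print(f"kick:       {kick_low:.2f} to {kick_high:.2f}")  # 0.90 to 0.95
print(f"break-even conversion rate: "
      f"{break_even_two_point_rate(0.90):.3f} to {break_even_two_point_rate(0.95):.3f}")
```

With the cleaned 51% conversion rate from the data section, `expected_points(0.51, 2)` comes to 1.02, which widens the gap a little further.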