So you want a graduate degree in statistics?

After six years of graduate school – two at UMass-Amherst (MS, statistics), and four more at Brown University (PhD, biostatistics) – I am finally done (I think). I have a few weeks left before my next challenge: this fall, I start a position at Skidmore College as an assistant professor in statistics. While my memories are fresh, I figured it might be useful to share some advice that I have picked up over the last several years. Thus, here’s a multi-part series on the lessons, trials, and tribulations of statistics graduate programs, from an n = 1 (or 2) perspective.

Part I: Deciding on a graduate program in statistics

Part II: Thriving in a graduate program in statistics

Part III: What I wish I had known before I started a graduate program in statistics  (with Greg Matthews)

Part IV: What I wish I had learned in my graduate program in statistics (with Greg Matthews)

The point of this series is to be as helpful as possible to students considering statistics graduate programs now or at some point later in their lives. As a result, if you have any comments, please feel free to share below. Also, I encourage anyone interested in this series to read two related pieces:

Cheers, and thanks for reading.

Maybe Doug Pederson wasn’t asinine after all

Former Kansas City offensive coordinator Doug Pederson is receiving a ton of flak in NFL circles for his passive approach to the end of Saturday’s Divisional Round contest at New England.

Down a pair of touchdowns, Kansas City, with Pederson calling plays, methodically moved the ball down the field midway through the fourth quarter, eventually scoring with less than two minutes remaining. Throughout the drive, the Chiefs took their time.

“It took us time because No. 1, we did not want to give [Patriots QB] Tom Brady the ball back,” said Pederson.

The Chiefs’ preferred strategy, it appears, was to score a touchdown, recover an onside kick, and score the equalizer. This was preferred over an earlier touchdown and a traditional kickoff.

In the aftermath of Kansas City’s loss, Pederson’s comment has been ridiculed as the worst answer ever, as well as senseless. That was my instinct, too, until an old friend asked me to look up some data.


Using Armchair Analysis’ play-by-play data, I looked for every game in which the Patriots possessed the ball with between 2:00 and 4:00 left while holding a 6 to 8 point lead. This roughly reflects Pederson’s alternative – kicking the ball back to the Patriots, while still needing a touchdown to extend the game.

Of the 31 games since 2000 that meet this criterion, New England has won 29 (94%). The only two losses came against Seattle and Indianapolis, both road games for New England, and in both contests the lead was 6 points. Indeed, in the 25 games when New England held the ball with a 7 or 8 point lead and 2:00 to 4:00 left, the Patriots haven’t lost a single one.
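With only 31 games, it helps to see how much uncertainty sits behind that 94%. A minimal sketch using base R's exact binomial interval on the counts quoted above (the choice of an exact interval is an assumption, not something taken from the original analysis):

```r
# Counts quoted in the text: 29 wins in 31 qualifying games
wins  <- 29
games <- 31

# Exact (Clopper-Pearson) 95% confidence interval for the win rate
ci <- binom.test(wins, games)$conf.int

win_rate <- wins / games
round(c(rate = win_rate, lower = ci[1], upper = ci[2]), 3)
```

The interval is wide for a sample this small, which is worth keeping in mind when comparing it with the onside kick recovery rate.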

Given these numbers, perhaps Pederson’s explanation makes some sense. Onside kicks boast about a 20% recovery rate  when the opponent is expecting it, according to Brian Burke’s research. Relative to how rare it is for the Patriots to surrender a lead, that 1 in 5 shot may have been Kansas City’s best chance after all.


Postscript 1: Obviously the Chiefs could’ve attempted an onside kick with an earlier score. But even if they did (and that’s a big if), their win probability with an onside kick a minute earlier is probably not all that different from their win probability with an onside kick with a minute and twenty seconds left.

Postscript 2: Kansas City should have gone for two after the first score. See more from Ben here.


It sucks to kick in the cold

There’s been some excellent work done on field goal kickers, but I figured that in preparation for Sunday’s game between Minnesota and Seattle, it’d be worth a look at the link between kicking success and game temperature. For related work, see Brian’s post here.

Using awesome data from Armchair Analysis, I grabbed all field goal attempts from 2000-2014. Via the mgcv package in R, I fit a generalized additive model of field goal success (Yes/No) as a function of three unknown smoothed functions – one apiece for distance, wind speed, and temperature – and a single fixed effect, for the season.  See the footnotes for more details and the code.
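For readers who have not seen this kind of model, here is a rough sketch of the structure described above, fit to simulated data. The column names (`dist`, `wind`, `temp`, `seas`, `good`) and the simulated relationship are made up for illustration and are not taken from the Armchair Analysis files or from the linked code.

```r
library(mgcv)  # ships with standard R installations as a recommended package

set.seed(1)
n <- 2000

# Simulated stand-in for the field goal data (hypothetical column names)
kicks <- data.frame(
  dist = runif(n, 18, 60),                     # kick distance in yards
  wind = runif(n, 0, 25),                      # wind speed in mph
  temp = runif(n, 0, 90),                      # temperature in degrees F
  seas = sample(2000:2014, n, replace = TRUE)  # season
)

# Made-up "truth" used only to generate outcomes for the sketch
eta <- 6 - 0.11 * kicks$dist - 0.02 * kicks$wind + 0.01 * kicks$temp
kicks$good <- rbinom(n, 1, plogis(eta))

# One smooth term apiece for distance, wind, and temperature,
# plus a linear (fixed) effect for season
fit <- gam(good ~ s(dist) + s(wind) + s(temp) + seas,
           family = binomial, data = kicks)

# Predicted success for a 33-yard kick in 15 mph wind at 0 and 70 degrees
new_kick <- data.frame(dist = 33, wind = 15, temp = c(0, 70), seas = 2014)
p_hat <- as.numeric(predict(fit, newdata = new_kick, type = "response"))
p_hat
```

Because the data are simulated, the predicted values here say nothing about real kickers; the point is only the model formula and the call to `predict`.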

Here’s a plot of estimated success rate, by temperature, for a fixed field goal distance of 33 yards (this distance is the current extra point) and assuming a wind of 15 mph. In Sunday’s game, the temperature is expected to be 0 degrees.


The plot suggests that even with normal weather, the anticipated wind speed in Minnesota is enough to drop extra point rates from the usual success rate of 95% to about 90%. Add in the 0-degree temperature, and my model’s best guess at the success rate of an extra point lies between 80% and 85%.

Does this mean anything?

Maybe. At about 83%, it’s roughly as likely as not that if the game has four touchdowns, kickers will be perfect. But it’s something to look out for if the game comes down to even the easiest of kicks.
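The arithmetic behind that statement is short. Assuming each extra point is an independent attempt with the same 83% success rate (a simplifying assumption), the chance that all four are good is:

```r
p_single <- 0.83  # estimated success rate for one extra point
n_kicks  <- 4     # one attempt after each of four touchdowns

# Probability that every attempt is good, assuming independence
p_all_good <- p_single^n_kicks
p_all_good  # about 0.47
```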


Next, to compare across kicks of different distances, I fixed wind speed and looked at the predicted probabilities of successful 30, 40, and 50-yard field goal attempts.


The plot makes it evident that cold temperatures have the biggest effect on kicks of longer distances. The absolute drop from 70% to 50% with 50-yard FGs is gigantic, relative to a drop from 94% to 89% for 30-yard FGs.

In fact, one could argue that the change in field position is severe enough on long-distance field goals that they should be avoided altogether in Minnesota on Sunday.


Finally, here is a pair of contour plots. To account for wind and temperature, I created a new variable, windchill.

First, a plot with lines showing the probability by distance and windchill.


According to the plot above, a 40-yard field goal with a 25-below windchill (as is anticipated in Minnesota) has roughly the same success probability as a 50-yarder in warmer weather. By and large, a windchill of -25 degrees appears to cost kickers 10 yards.

Second, here’s a heat map. The density of red reflects the distances and the temperature in which it is difficult to kick field goals.


Same idea. Kickers are near automatic in the white area, but things are a bit spottier in the red. And there’s a whole bunch more red when it’s cold out.
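The post does not spell out how the windchill variable was built. One common choice, shown here purely as an assumption, is the National Weather Service wind chill formula, which takes temperature in degrees Fahrenheit and wind speed in mph (and is intended for temperatures at or below 50 degrees with wind above 3 mph):

```r
# NWS wind chill formula (an assumed definition, not confirmed by the post)
wind_chill <- function(temp_f, wind_mph) {
  v <- wind_mph^0.16
  35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v
}

# Example: 0 degrees F with a 15 mph wind
wc <- wind_chill(0, 15)
wc  # roughly -19
```

Under this definition, reaching a 25-below wind chill at 0 degrees would take a stronger wind than 15 mph, so the values in the plots may rest on a different forecast or formula.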


Notes on the method:

Generalized additive models are attractive with these (and many other) types of continuous explanatory variables because they don’t make assumptions about the functional form between, say, temperature and field goal success. Thus, I don’t have to assume what the relationship is and then check it with other model fits. Further, because the smoothness is chosen by cross-validation, I worry less about over-fitting. Note that the fixed effect for season ties into Ben Morris’ awesome work showing that kickers have gotten better.

Here’s the code on Github.


Finally, fewer shootouts.

As part of an effort to limit the number of games decided by a shootout, the NHL changed from a 4 v 4 overtime session to a 3 v 3. This update came after a (mostly) failed change before the prior season, in which the league instituted longer line changes in overtime in the hope of sparking more overtime goals.

Said commissioner Gary Bettman to the Boston Globe’s Amalie Benjamin this past July, “I think to the extent some people wanted to see fewer shootouts, this will get us there, and that’s fine.”

But while the 2015 change seemed promising, 3 v 3 play is relatively rare, and so most work on projecting the fraction of games going to a shootout required some extrapolation.

In any case, about a quarter of the way through the 2015-16 season, the league has seen some promising results. Shootouts are down, and significantly so.

Here’s a chart showing the fraction of overtime games decided by a shootout, by season, along with 95% confidence intervals. While nearly every season since the shootout’s implementation boasted rates around 60%, only about half that share of this year’s OT games (33%) has reached the skills contest.
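As a sketch of how rates and intervals like those in the chart can be computed, here is a base R example. The game counts below are invented round numbers chosen to give rates near 60% and 33%; they are not the league's actual totals, and the exact binomial interval is an assumption about the method.

```r
# Hypothetical counts of overtime games and shootouts (NOT real NHL totals)
ot <- data.frame(
  season    = c("earlier season", "2015-16 so far"),
  ot_games  = c(300, 75),
  shootouts = c(180, 25)
)

# Exact binomial 95% interval for each season's shootout rate
ci <- t(sapply(seq_len(nrow(ot)), function(i) {
  binom.test(ot$shootouts[i], ot$ot_games[i])$conf.int
}))

ot$rate  <- ot$shootouts / ot$ot_games
ot$lower <- ci[, 1]
ot$upper <- ci[, 2]
ot
```

With only a quarter of a season played, the interval for the current year is much wider than for a full season, which is why the intervals matter when reading the chart.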


If the current numbers hold, it would mean that instead of roughly 12 shootouts a year, each team will play closer to 7 or 8. This seems like a good thing, as it means a lesser chance that a team’s playoff seed comes down to performance in the shootout.

Relatedly, there’s also a potential that the rule change will hurt a few specific teams while helping others. In my talk at NESSIS in September, I discussed why Pittsburgh, Chicago, and both of the New York teams ranked as the league’s best at shootouts over the past decade. But if those teams aren’t reaching the shootout – and this year, for example, Chicago hasn’t played in one yet – perhaps it’ll cost them a point or two in the standings over what they would’ve earned in the past. Along those lines, there are six teams that are yet to play in a single shootout thus far in 2015-16.

Final thought – we know shootouts are (mostly) a coin flip. But how often do better teams win in a sudden-death overtime? Is scoring during 4 v 4 or 3 v 3 play also a coin flip? Or is there a true, measurable, and repeatable talent? Something to think about, which we should have more time to do without as many shootouts.

On the spotting of the ball in the NFL

There’s a fun reddit thread going around that shows the work of Joey Faulkner, who identified the varying ways in which officials spot the ball in the NFL. The graph shows pretty powerful evidence that implies refs make a subconscious decision to spot the ball near round numbers. In football, those round numbers are multiples of five.

The initial plot is pretty easy to replicate. Here it is, thanks to data from Armchair Analysis.


Pretty cool to see all of the peaks near round numbers in the graph.

One possible issue with the graph above, however, is that many NFL drives initiate at the 20-yard line (after touchbacks), and, to a lesser extent, the 40-yard line. As a result, given the frequency of 5 and 10 yard penalties, we could expect to see fewer peaks on round numbers when dropping these drives.

So I dropped any drive that began at a team’s 20 or 40 yard line. Here’s the same plot.


By and large, we see the same results. There are peaks at all of the five-yard marks, which mostly reflect the original figure. As a result, it doesn’t appear that touchbacks and penalties are driving the findings.

However, I’d like to throw one more theory out there, which deals with player behavior. Take a closer look at yard lines between 50 and 90 yards from the offensive team’s own goal.


While there are obvious peaks at multiples of five, those peaks appear to be coming at the expense of plays just short of those yard lines. Meanwhile, just after multiples of five, there are still several spikes. For example, compare yard lines like 56 and 54, 61 and 59, or 76 and 74. In each scenario, there are far fewer plays just short of the round number than there are afterwards.

It’s as if we have a series of skewed right histograms, beginning every five yards. Why is that? If there were a referee bias towards round numbers, wouldn’t it come evenly at the expense of plays just short and just after the five-yard thresholds?

Well, one alternative is that the players themselves are causing part of the funny shape. It is well known that athletes shoot for arbitrary thresholds, like the triple double in basketball, hitting 0.300, or running a sub-4:00 marathon (see this fun image about runners, for example). Perhaps when given the choice of whether or not to extend the play, football players are shooting for round numbers, too.
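One way to put a number on that asymmetry is to tabulate plays by where they sit relative to the last five-yard line. The sketch below uses a simulated `yfog` vector (yards from own goal) as a stand-in, so it shows the bookkeeping only; with the real play-by-play data, the counts at offsets 1 and 4 would be the ones to compare.

```r
set.seed(2)

# Simulated stand-in for the yfog column (yards from own goal, 1 to 99)
yfog <- sample(1:99, 5000, replace = TRUE)

# Offset past the most recent multiple of five:
# 0 = on a five-yard line, 1 = just after, 4 = just short of the next one
offset <- yfog %% 5
counts <- table(factor(offset, levels = 0:4))
counts
```

If referee rounding alone were at work, offsets 1 and 4 should lose plays at about the same rate; a shortage at 4 but not at 1 would match the pattern described above.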

Here’s code if you want to replicate the plots yourselves.


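library(readr)    # read_csv
library(ggplot2)

# Five38Thm is a custom ggplot2 theme defined elsewhere; it is not included here.
A <- read_csv("PLAY.csv")

# NOTE: the steps that build A1 and A3 from A are missing from this post.
# A1 holds the plays used in the first plot; A3 holds the plays that remain
# after dropping drives that began at a team's 20 or 40 yard line.

# Plot 1: plays at each yard line
ggplot(A1, aes(x=yfog)) +
geom_histogram(alpha=0.4, position="identity",binwidth = 1)+
scale_x_continuous("Yards from own goal")+Five38Thm+scale_y_continuous(limits=c(0,21000))+
theme(legend.text=element_text(size=16))+ggtitle("Number of plays at each yard line")

# Plot 2: same plot, without drives starting at the 20 or 40
ggplot(A3, aes(x=yfog)) +
geom_histogram(alpha=0.4, position="identity",binwidth = 1)+
scale_x_continuous("Yards from own goal")+Five38Thm+scale_y_continuous(limits=c(0,21000))+
theme(legend.text=element_text(size=16))+ggtitle("Number of plays at each yard line")

# Plot 3: zoom in on yard lines 50 through 90
ggplot(A3, aes(x=yfog)) +
geom_histogram(alpha=0.4, position="identity",binwidth = 1)+
scale_x_continuous("Yards from own goal",limits=c(50,90))+Five38Thm+
theme(legend.text=element_text(size=16))+ggtitle("Number of plays at each yard line")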

On 3rd and one, why do teams go shotgun?

In yesterday’s game against New England, Miami faced a third-and-one from midfield with four minutes to go in the third period.  This was an important play. While the Dolphins trailed 22-7, Miami’s offense in the second half seemed to have awoken, having picked up 111 yards on 10 plays, including a Lamar Miller touchdown run on its first drive after the break. And although the visitors would eventually get blown out, it’s reasonable to argue that at this point, there was still a chance the Dolphins could make it a game.

But without even running the play, Miami did one of my least favorite things a team can do on a third down and short – it lined up in shotgun formation.

With Tannehill five yards behind center, Miller took a handoff and never had a chance, as the Dolphins running back was swallowed by Sealver Siliga and a host of Patriots for a two-yard loss. Miami punted the ball instead of attempting a 4th-and-three, and would go on to lose 36-7.

Of course, my intuition that shotgun can be a bad idea on third and short was more of a guess than anything else. As it turns out, data mostly backs me up.

Using data from Armchair Analysis, I looked at the conversion rates of the 7,094 third or fourth and one plays from 2000 to 2014. League-wide, there’s an absolute improvement of about 4.5% (67.5% success rate versus 63%) comparing plays under center versus from shotgun. The difference is statistically significant (and if you want the p-value, it’s about 2 in 10000).
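The post does not say which test produced that p-value. A two-sample test of proportions is one natural candidate, sketched here with base R's `prop.test`. The counts are back-of-the-envelope values built from the rounded percentages and an assumed 80/20 split of the 7,094 plays, so the resulting p-value will not match the one quoted above.

```r
# Illustrative counts only (derived from rounded figures, not the real data)
n    <- c(center = 5675, gun = 1419)  # plays under center / in shotgun
succ <- c(center = 3830, gun = 895)   # conversions in each group

# Two-sample test for equality of proportions
res <- prop.test(succ, n)

rates <- succ / n
round(rates, 3)
res$p.value
```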

Of course, offensive-specific data is probably more useful given that certain teams have ended up in short-yardage situations more often than others. When lined up in shotgun, 24 of the league’s 32 teams converted less often than when under center. Here’s a graph, with the red dots showing success rates when lined up in shotgun, and black dots the rates under center. The size of each dot is proportional to the number of plays each team ran, relative to other offensive units in these situations (overall, teams have been under center about 80% of the time).


In addition to most of the black dots falling above the red dots, it is interesting to see how different teams use different strategies. Carolina lines up under center less often than any other team in the data on third or fourth and short, and with good reason, as the Panthers’ conversion rates using the shotgun have been greater than 80%. For a team like Baltimore (49% in shotgun, 68% under center) or Green Bay (57%, 68%), perhaps more plays under center would have been warranted.

These results are more descriptive than anything else, and further work is warranted to account for other factors that may be involved in a team choosing to go shotgun. Notably, the league-wide percentages were pretty similar when I looked only at plays run in the first half. This makes me a bit more confident that the results aren’t skewed by the choices of trailing teams. Additionally, I am defining success rate as simply getting the first down – shotgun plays may be more likely to yield longer plays. A comparison of running versus passing may also be useful.

Finally, it is worth noting that teams are now using the shotgun more often in these situations. While only about 4% of short-yardage plays used the shotgun between 2000 and 2007, this number ballooned to 30% for the 2013 and 2014 seasons. In other words, teams are going shotgun more than ever, and, at least in short-yardage situations, it may be to their detriment.

In any case, the R code is super easy for this; check it out below, although you’ll need Armchair’s data. At under $50, I think it’s worth it.

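library(ggplot2); library(dplyr); library(readr)
require(extrafont);loadfonts()

A <- read_csv("PLAY.csv")
B <- read_csv("GAME.csv")
C <- read_csv("PASS.csv")
D <- read_csv("RUSH.csv")
# Stack the success indicator for pass plays and rush plays
E <- rbind(C[,c("pid","succ")],D[,c("pid","succ")])

# NOTE: parts of this code were lost when the post was published, including the
# step that builds the data frame of third/fourth-and-one plays and several
# names below. short.yardage, team.rates, n.center, success.center, and
# prop.center are placeholder names, chosen to parallel the surviving n.gun,
# success.gun, and prop.gun. The size scaling for the under-center points is
# likewise a guess that mirrors the shotgun layer.
team.rates <- short.yardage %>%
 group_by(off) %>%
 summarise(n.center = sum(sg==0),n.gun=sum(sg), success.center=sum(sg==0 & succ==1),success.gun=sum(sg==1 & succ==1)) %>%
 mutate(prop.center = success.center/n.center, prop.gun = success.gun/n.gun)

# Black dots: under center. Red dots: shotgun. Five38Thm is a custom theme.
ggplot(team.rates, aes(x=off, y=prop.center)) +
 geom_point(shape=16,aes(size=(n.center-20)/10)) + xlab("Offensive team") +
 geom_point(shape=16, aes(x=off, y=prop.gun,size=(n.gun-20)/10),col="red")+
 Five38Thm+ggtitle("Short yardage conversion rates")+
 annotate("text", x = 27, y = .8, label = "Under center",size=6)+
 annotate("text", x = 26, y = .5, label = "Shotgun",size=6,col="red")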

Talk on NHL shootouts

Schuckers and I have done some work on NHL shootouts, presented at the New England Symposium on Statistics in Sports.

The slides for our presentation are linked here.

A github page with R code is found here. The R code links to the data, which is publicly hosted for anyone to analyze.

Finally, we made a pair of interactive plots using Plotly. The shooter interactive plot is linked here, while the goalie interactive plot is linked here.

tl;dr version:

On their own, shootouts aren’t a crapshoot. First, there’s a decent amount of bias with respect to both when shooters are allocated to take attempts and which rounds those attempts are in. Second, shooters, and to a lesser extent goalies, vary more than we would expect them to if every player were equivalent.

All else equal, and since 2005, the best shootout shooter would have been worth about $700k to his team on shootout performance alone, and the best goalie worth about $1,000,000, on a per-year basis. Given the expected reduction in the frequency of games ending in a shootout, however, these values are likely to be smaller going forward.

Finally, conditional on what we know about team behavior, shootouts remain much closer to a crapshoot than a sure thing. And they still aren’t a great way to end a hockey game.

Discretionary penalties in the NFL

As a former college offensive lineman, I’m well aware of the reputation that holding penalties have – ‘you could call one on every play’ goes the old adage.

Kevin and I wrote a paper, recently appearing in JQAS, in which we looked at the rates of NFL penalties. Specifically, we wanted to address how rates fluctuate over the course of the game.

Quick summary: the rates of discretionary penalties in NFL games are hugely correlated with time.

Here’s my favorite plot, where, letting OHR be the holding rate on run plays, OHP the holding rate on pass plays, and DPI the defensive pass interference rate, we compare versus game minute (1 through 60). These rates are adjusted for play and game characteristics, and given per 1000 plays along with 95% confidence limits.

Model estimated penalty rates by game minute. DPI: defensive pass interference. OHP: offensive holding on pass plays. OHR: offensive holding on running plays

The association between game minute and penalty likelihood is strongest for holding on running plays – with rates 4-5 times higher during parts of the second and third quarters than at the beginning and end of a contest. Holding penalties, and to a lesser extent the other judgment calls that we looked at, are exceedingly rare in the game’s first and last minutes.

If you are interested in the paper, feel free to drop me an email and I can send you a copy.


If The Patriots Stole Playsheets, It Appears Not To Have Been A Big Advantage

On Tuesday morning, Outside the Lines released a damning investigation into the link between Spygate and Deflategate, two scandals that have consumed the New England Patriots in the past decade. There were a lot of revelations in there, but one of the more damaging ones had nothing to do with taped signals or the Ideal Gas Law. From Don Van Natta Jr. and Seth Wickersham’s report:

Several [former New England coaches and employees] acknowledge that during pregame warm-ups, a low-level Patriots employee would sneak into the visiting locker room and steal the play sheet, listing the first 20 or so scripted calls for the opposing team’s offense. (The practice became so notorious that some coaches put out fake play sheets for the Patriots to swipe.)

In response, Yahoo’s Charles Robinson wrote, “Knowing the first 15 or 20 offensive plays scripted by an NFL team is knowing the future.” If the Patriots indeed stole scripted play sheets, we’d expect to see their defensive performance peak early in the game, only to wilt later on. Indeed, the OTL report gives evidence of games where New England’s defense was staunch early, but later gave up big yardage.

But that’s anecdotal evidence. There’s a more empirical way to judge whether or not the alleged play stealing had any effect.

The chart below compares New England’s defensive performance for the first 5 plays of the game (when the Patriots would have had knowledge of what was coming) with the second 5 plays of the game (when they wouldn’t). I also contrasted the first 20 plays and the second 20 plays; the results were similar. If New England was stealing play sheets, it would, in theory, yield fewer yards a play on the first five plays of the game than on the second five plays of the game. Armchair Analysis provided the data.
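The comparison boils down to labelling each defensive play by its position in the game and summarising yards allowed within each window. Here is a minimal sketch on simulated data; the data frame and its columns (`gid`, `play_no`, `yds`) are hypothetical stand-ins rather than the actual Armchair Analysis fields.

```r
set.seed(3)

# Simulated stand-in: the first 10 defensive plays from each of 50 games
plays <- data.frame(
  gid     = rep(1:50, each = 10),              # game id
  play_no = rep(1:10, times = 50),             # order of the play in the game
  yds     = round(rnorm(500, mean = 5, sd = 7)) # yards allowed on the play
)

# Plays 1-5 are the window most likely to be scripted; 6-10 are the comparison
plays$window <- ifelse(plays$play_no <= 5, "first 5", "second 5")

# Average yards per play allowed in each window
ypp <- tapply(plays$yds, plays$window, mean)
ypp
```

Swapping the cutoff from 5 to 20 gives the first-20 versus second-20 comparison, and overlaying histograms of `yds` by `window` gives the kind of shape comparison discussed next.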

Interestingly, the shapes (histograms) are very similar, both within each era and in terms of the difference between eras. Further, in pretty much every year (not shown), there’s a consistent overlap between the yardage New England allowed on plays likely to have been scripted and those less likely to be scripted. Finally, in terms of average yards per play, New England’s patterns tend to match numbers from the rest of the league.  

What’s the take home?  It could be a couple of things. One explanation is that New England stole a bunch of play sheets, but just did a poor job of using them. Alternatively, as suggested in the ESPN piece, rogue opponents could have subbed in fake play sheets, fooling the Patriots and potentially our numbers. Of course, one or two stolen sheets probably wouldn’t show up in our data, and because we have no idea exactly which (if any) sheets were stolen, it’s somewhat hard to define what we are looking for.

Interestingly, the more noticeable change is that the Pats defense gave up about half a yard more per play overall post-2007 than pre-2007. Of course, there are several explanations for such a drop in performance, including differences in team personnel and/or league-wide changes in offensive strategy. Conspiracy theorists, however, could argue that with a unit no longer able to accurately predict play calls, the Pats’ defense suffered. But this seems like a bit of a stretch. League-wide, there was no difference post 2007.

Finally, although it’s tempting to jump to aggressive conclusions, it’s a good reminder that an absence of evidence is not evidence of an absence. Instead, all we know is what the data tells us, which in this case, is only a small part of the story; on a yards-per-play basis, the Pats defense was pretty consistent before and after an opponent would script its plays.

Two point art

Greg’s artwork inspired me to do something.

I’m not sure if this is art, because if that’s the case, then that may make me an artist.

But here it is.

Can you guess what the dots are for? Hint – it’s from the NFL, and I think it’s going to change.

Extra-point decisions since 2000

The x-axis in the chart above is an index, from 1 to 19,255. This represents the number of touchdowns since 2000. The y-axis represents the point differential for the team on offense (from down 12 to up 12). The light dotted red line in the middle intercepts the y-axis at 0. Finally, the dot depicts whether the team went for two (blue is yes, light grey is no).

It’s pretty evident that teams almost always go for two in only a few situations. This isn’t a surprise. However, I’m curious how it will look after the 2015 season, given the recent rule changes. My guess is that we’ll see some changes in a few spots, with teams occasionally being more aggressive.

I also think it’s interesting how consistent the strategy has been over time.

For those of you who like axes, here’s the same plot.


And we expand the y-axis here:


Regression or Reversion? It’s likely the latter

With interest in statistical applications to sports creeping from the blogosphere to the mainstream, more writers than ever are interested in metrics that can more accurately summarize or predict player and team skill.

This is, by and large, a good thing. Smarter writing is better writing. A downside, however, is that writers without formal training in statistics are forced to discuss concepts that can take more than a semester’s worth of undergraduate or graduate training to flesh out. That’s difficult, if not impossible, and unfair.

One such topic that comes up across sports is the concept of regression toward the mean. Here are a few examples of headlines:

Regression to the mean can be a bitch! (soccer)

Clutch NFL teams regress to the mean (football)

Beware the regression to the mean (basketball)

30 MLB players due for regression to the mean (baseball)

Avalanche trying to stave off regression and history (hockey)

In each case, the regression (i) sounds scary, (ii) applies to over-performance, not under-performance, and (iii) is striving really hard to reach an exact target, in these examples a vaguely specified ‘mean.’

From a statistical perspective, however, regression toward the mean requires strict assumptions and precision, the context of which are almost never discussed in practice.  As a result, examples that refer to a regression to the mean may be ill-informed, and are often best described by a similar sounding but more relaxed alternative.

Using the notation and descriptions in Myra Samuels’ 1991 paper in the American Statistician, “Statistical Reversion toward the mean: More universal than regression toward the mean,” here’s a quick primer through the context of sports.

What is regression towards the mean?

Let X and Y be a pair of random variables with the same marginal distribution and common mean µ. Most often in sports, X and Y are simply team/individual outcomes that we are interested in measuring and describing. For example, X could be the batting average of a baseball player through July 1, and Y his average from July 2 through the end of the season. In this example, µ represents that player’s probability of getting a hit.

The definition of regression toward the mean is based on the regression function, E[Y|X = x]. That is, conditional on knowing one variable (X = x), what can we say about the other? Formally, regression toward the mean exists if, for all x > µ,

µ < E[Y|X = x] < x, 

with the reverse holding when x < µ.

This is a fairly strict requirement. For an outcome above a player or team’s true talent, we can expect that the ensuing outcome, on average, will lie in between µ and the original outcome. Linking to linear regression, for any initial observation x, the point prediction of y is regressed towards an overall mean representative of that subject. However, y will still exhibit some natural variation above and below the regression line; some points will fall closer to the mean, and others further away.
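A small simulation can make the inequality concrete. The sketch below assumes X and Y are standard bivariate normal with common mean `mu` and correlation `rho = 0.5`; under that assumption the regression function is E[Y|X = x] = mu + rho(x − mu), which sits between mu and x whenever 0 < rho < 1. The distribution and the value of `rho` are choices made for illustration and do not come from Samuels’ paper.

```r
set.seed(4)
n   <- 100000
mu  <- 0
rho <- 0.5

# Build X and Y with unit variance and correlation rho via a shared component
shared <- rnorm(n)
x <- mu + sqrt(rho) * shared + sqrt(1 - rho) * rnorm(n)
y <- mu + sqrt(rho) * shared + sqrt(1 - rho) * rnorm(n)

# Slope of the fitted regression line, which should be close to rho
slope <- unname(coef(lm(y ~ x))[2])

# Average follow-up outcome for observations whose first outcome was near 1.5
x0 <- 1.5
cond_mean <- mean(y[abs(x - x0) < 0.1])

c(slope = slope, cond_mean = cond_mean)
```

The conditional mean lands between `mu` and 1.5, as the definition requires, even though plenty of individual `y` values in that band fall above 1.5 or below `mu`.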

There are easy pitfalls when it comes to applying regression toward the mean in practice. The most common one is assuming that what goes up must come down. For example, assuming that players or teams become more and more average over time is not regression toward the mean. A second misinterpretation is linking regression toward the mean with the gambler’s fallacy, which entails assuming that a team or player that was initially lucky is then going to get less lucky. This is also not true. The probability of a fair coin landing heads, given that it landed heads five, ten, or even fifteen times in a row, remains at 0.5.  Such misinterpretations are frequent in sports, particularly when describing team performance with respect to point spreads or performance in close games.
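The coin claim is easy to check by simulation: flip a fair coin many times, find every spot that follows five straight heads, and look at the next flip. This is a sketch with an arbitrary seed and streak length, not a proof.

```r
set.seed(5)
flips <- rbinom(200000, 1, 0.5)  # 1 = heads, 0 = tails
k <- 5                           # streak length to condition on

# cs0[i] holds the number of heads among the first i - 1 flips
cs0 <- c(0, cumsum(flips))
idx <- (k + 1):length(flips)

# TRUE where the k flips immediately before position i were all heads
after_streak <- (cs0[idx] - cs0[idx - k]) == k

# Share of heads on the flip that follows a streak of k heads
p_next <- mean(flips[idx][after_streak])
p_next  # close to 0.5
```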

While it’s easy to confuse regression toward the mean with such scenarios, there’s good news, in the form of some easy to understand alternatives.

What’s the better alternative?

To start, replacing ‘regression’ with ‘reversion’ relaxes the assumptions presented above while still implying that extreme observations are more likely to be followed by less extreme ones. More often than not, when writers speak of regression to the mean, using reversion is sufficient and accurate. Furthermore, Samuels proves mathematically that ‘regression toward the mean implies reversion toward the mean, but not vice versa.’ Namely, reversion is a more relaxed alternative; the conditional mean of the upper or lower portion of a distribution shifts, or reverts, toward an unconditional mean µ.
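Reversion is about groups rather than the full regression function, and a sports-flavored simulation shows the idea. The sketch below invents 5,000 hitters whose true talent varies around 0.270, gives each 250 at-bats in each half of a season, and follows the top 10% of first-half averages. Every number in it is made up for illustration.

```r
set.seed(6)
n_players <- 5000
ab <- 250  # at-bats per half season

# Hypothetical true talent: Beta(81, 219) has mean 0.270
talent <- rbeta(n_players, 81, 219)

x <- rbinom(n_players, ab, talent) / ab  # first-half batting average
y <- rbinom(n_players, ab, talent) / ab  # second-half batting average
mu <- mean(c(x, y))                      # overall mean

# Players in the upper portion of the first-half distribution
top <- x > quantile(x, 0.9)

c(first_half = mean(x[top]), second_half = mean(y[top]), overall = mu)
```

The top group’s second-half average falls back toward the overall mean but stays above it, since part of their first-half edge was talent and part was luck.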

For example, in the headlines listed above, good soccer teams, MLB players hitting for high numbers, and the Colorado Avalanche were all more likely to revert to a more standard form that was indicative of their true talent. No regression equation is necessary.

In addition to generally being a more appropriate term, use of the word reversion has a side benefit, in that it is more interpretable when applied to outcomes that initially fell short of expectations. It is recognizable, for example, to expect an MLB batter hitting 0.150 to revert to form; meanwhile, it doesn’t make sense to claim that the same MLB batter will regress, given the negative connotations of the latter.

And is it regression/reversion ‘to’ or ‘toward’ the mean?

Well, it depends. While increased use of the word reversion is part of the solution, more precise writing should also consider both the outcome of interest and that outcome’s expected value. For example, here are two examples of over-performance:

Mike Trout hits 0.500 in his first ten games of the season.

Mike Trout tosses a coin 10 times, landing nine heads.

And here’s the same sentence to describe our future expectations – can you tell which one is accurate?

Mike Trout’s batting average will revert to the mean.

Mike Trout’s ability to land heads will revert to the mean.

In the first example, the outcome of interest is Mike Trout’s probability of getting a hit. Because we can comfortably say that Mike Trout is better than the league average hitter, while his batting average is going to come down, it is reverting towards an overall average, but not to the overall average.

Meanwhile, unless Trout can outduel the Law of Large Numbers, I can comfortably say that in the long term, his observed ability to land heads will revert to a probability of 0.5. In this silly example, the second statement is the more precise one.
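To see the coin example play out, the sketch below starts from nine heads in ten tosses and then keeps flipping a fair coin, tracking the running share of heads. The number of extra tosses and the seed are arbitrary.

```r
set.seed(7)

# First ten tosses: nine heads and one tail, then 9,990 fair tosses
flips <- c(rep(1, 9), 0, rbinom(9990, 1, 0.5))

# Running proportion of heads after each toss
running <- cumsum(flips) / seq_along(flips)

running[c(10, 100, 1000, 10000)]
```

The early 0.9 is not cancelled out by a run of tails; it simply gets swamped as more 50-50 tosses pile up, which is what reverting to (not just toward) 0.5 looks like.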

Anything else worth discussing?

Well, maybe. In searching for some of the examples used above, I found it strange how little was written of the one word that tends to encompass much of a player’s performance above or below his or her true talent.


The obvious aspect linking, say, the Colorado Avalanche winning games while being outshot and Mike Trout tossing coins and landing heads, is that each was on the receiving end of some lucky breaks. So while we expect some type of reversion to or towards a more traditional performance, that’s to no fault of the Avalanche or Trout. With outcomes that are mostly (or entirely) random, variability above or below the league average is simply luck. As a result, there’s nothing for the Avalanche, Trout, or even us to be scared/beware of. We wouldn’t tell Trout to fear a balanced coin, nor should we tell Avalanche fans to beware of reversion towards a more reasonable performance given their team’s shot distribution.

The issue here lies not in a distinction between regression and reversion, but a deeper and more serious problem; humans have a poor grasp of probability. In sports (and likely in other areas of life), lucky outcomes are all too often touted as clutch, while unlucky players or teams are given the label of chokers. It’s standard practice to use terms like savvy to describe the Patriots’ win over Seattle, for example. A more skilled writer, however, would perhaps recognize that the Patriots were on the better end of a 50-50 coin toss, from more or less the start of the game all the way until the end (in more ways than one; the game closed as a near pick-em at sports books. Even bettors couldn’t nail down a winner).

Writes Leonard Mlodinow in The Drunkard’s Walk,

the human mind is built to identify for each event a definite cause and can therefore have a hard time accepting the influence of unrelated or random factors.

It’s difficult and counterintuitive to describe an outcome in sports as lucky. However, that’s what many of them are.

So while it may sound trendy to toss around terms like ‘regress to the mean,’ it is often more accurate, and certainly more simple, to propose that some luck was involved in the initial outcome. As a result, a decline from overperformance is nothing more than a player or a team, much like a coin tosser no longer landing heads five times in a row, not getting as lucky as they initially had been.