Featured post

So you want a graduate degree in statistics?


After six years of graduate school – two at UMass-Amherst (MS, statistics), and four more at Brown University (PhD, biostatistics) – I am finally done (I think).

At this point, I have a few weeks left until my next challenge awaits, when I start a position at Skidmore College as an assistant professor in statistics this fall.

While my memories are fresh, I figured it might be a useful task to share some advice that I have picked up over the last several years. Thus, here’s a multi-part series on the lessons, trials, and tribulations of statistics graduate programs, from an n = 1 (or 2) perspective.

Part I: Deciding on a graduate program in statistics

Part II: Thriving in a graduate program in statistics

Part III: What I wish I had known before I started a graduate program in statistics  (with Greg Matthews)

Part IV: What I wish I had learned in my graduate program in statistics (with Greg Matthews)

The point of this series is to be as helpful as possible to students considering statistics graduate programs now or at some point later in their lives. As a result, if you have any comments, please feel free to share below.

Also, I encourage anyone interested in this series to read two related pieces:

Cheers, and thanks for reading.

To Excel or not to Excel?

On the Statistics Education Section’s email listserv, there is an interesting debate going on regarding the software with which statistics education should be taught.

The first email stated:

My dean is a member of a CUPM committee writing a section on using technology in teaching.  She has asked me for recommendations  for technology in teaching Statistics.  I know all the standard ones (TI calculators, Minitab, R, SPSS, SAS, JMP, etc).  Are there other forms of technology that should be mentioned here?

Several aspects of the conversation were interesting; in particular, it quickly turned toward the usefulness of Microsoft Excel for teaching statistics.

At this point, St. Lawrence’s Robin Lock took over, responding:

Always fun to see the Excel for teaching statistics debate every few years - especially for those who may have missed the earlier versions.  I have to admit that I rarely use Excel for teaching or doing statistics, so I took a quick look at the latest version of base Excel to see what it now does.  

To keep things simple, in intro stat we usually worry about two basic kinds of variables, categorical and quantitative, so how would Excel do for displaying the distribution of a single sample of either of these variables.  Not too bad for categorical, I see bar charts, pie charts and some other possible options.  How about a single quantitative variable? Now I'm stuck.  Maybe, with some effort,  I can coerce a column chart to look vaguely like a histogram, but how about a dotplot, boxplot or real histogram with a good numeric horizontal scale.   There may be ways, but they aren't obvious to me and probably not convenient for students.  I'm sure someone will suggest some possibilities!

Could I recommend using software for teaching statistics that has no easy way to create a reasonable graphical display of one of the two common types of variables (and the type that most often needs a graphical display)?  No.  I'm sure there are add-ins that accomplish such tasks which might be reasonable to use in teaching, but without such capabilities I wouldn't want to try.  Maybe some of the 80% can show me how, but I would not be willing to leave an idea as fundamental as looking graphically at the distribution of a single quantitative variable out of a course.

I wholeheartedly agree.

Note: This was published with Lock’s permission

Analyzing the SuperContest

This fall, the Westgate Las Vegas Hotel Casino is hosting the prestigious SuperContest, in which entrants pony up $1500 to pick 5 games a week against the spread. At season’s end, standings are judged by which entrants have the highest number of correctly picked games.

Last year’s winner was David Frohardt-Lane, who took home about $550k for first place. Perhaps not surprisingly, Frohardt-Lane is a statistician!

One of the neat aspects of the SuperContest is that the picks of all 1,403 of the 2014 entrants are posted immediately after the games begin (here). Even better, the data is fairly clean and easy to play with (at least so far).

Meanwhile, there are some very intriguing and potentially difficult questions to answer from a statistical perspective: How do we account for the fact that entrants only pick 5 games per week? What about bye weeks? Do entrants appear to pick each week by chance, or are there trends over time? Are certain teams picked more or less often? And how about success – do entrants perform better or worse than we would expect by chance? Lastly, does success beget more success?

I’ll hope to answer a few of these questions over the course of the season, and I encourage any interested readers to do so, too! Here’s a link to the google doc where I’ll post the picks (csv file). From what I can tell, the hotel only posts picks for the previous week, so I’ll do my best to update the file weekly.

For a taste of what one could do, I decided to plot Week 1 and Week 2 picks by team. The size of each circle in the graph below is proportional to the number of entrants who chose that team, and the colors (for Week 2 only) represent whether or not that team covered the spread in Week 1 (green is yes, red is no).


For reference, 38% of participants took the Patriots in Week 2; only 6% of participants had Carolina in Week 1. 

My hypothesis was that teams which covered in Week 1 would be perceived as “hotter” and would be picked more often in Week 2. It’s hard to tell if that’s the case, although if you remove the Patriots as an outlier, you could argue that the green circles are larger than the red ones. It’s worth pointing out that lines are fixed on the Wednesday prior to games beginning, so news of Adrian Peterson’s transgressions perhaps moved entrants to pick the Patriots at a higher rate.

Also, it’s amazing how many entrants lost with Tampa Bay in Week 1 and didn’t pick the Bucs in Week 2.

Thanks for reading, and if you have any ideas for graphs or analyses, please feel free to share below or send me an email!

538 is six months old…where does it stand?

Wednesday marks the six-month anniversary of Nate Silver’s FiveThirtyEight launch with ESPN.

As the site mixes statistics, sports, data visualization, and academic research, it’s been a must-read on nearly a daily basis for me.

Here’s my unsolicited and slightly ambiguous view of where things stand:

It’s really, really hard to do what FiveThirtyEight is trying to do and to do it well.

Here’s why:

FiveThirtyEight’s business model is primarily based on advertising, and advertisers generally don’t flock to sites that only spit out content once a week. As a result, Silver and colleagues are forced to put out articles at a frenetic pace. As one example, I estimated that FiveThirtyEight wrote 2.5 articles per day covering the 2014 World Cup, posts which were generally written by only one or two full time writers. That’s an incredible pace.

But here’s the catch: while the judging of a data journalism website should be based on content alone, and not the frequency of posts, the precision that is required and expected at this level of research can take an inordinate amount of time. For instance, on his old website, current 538 sports data journalist Benjamin Morris wrote an outstanding set of articles on Dennis Rodman. You should read or skim these if you haven’t already, as it’s some really fascinating and convincing stuff. It’s no surprise that Morris could put together this type of series, given both his background and, importantly, the time and effort he was able to devote to it. Based on post dates, the whole series appears to have taken Morris at least 9 months to put together.

The problem now is that 538 can’t afford to wait a full year for a series of Rodman posts or something similar – data journalism has to meet deadlines, even when the data analysis can’t.

Let’s now move to Morris’ recent piece on Week 2 of the NFL season, linked here. Midway through, he describes some research into the eventual success of rookie QBs. He writes:

There are a lot of ways to run regressions on rookie [QB] stats to see how likely these players are to have good careers, but here’s a basic version with quick and dirty t-values for each stat:


To see which stats are predictive, we’re looking to see which have t-values higher than 2 or lower than -2...

So what does that mean for this year’s crop of rookies? Every week that Teddy Bridgewater and Johnny Manziel sit is more bad news for their career prospects. From a purely predictive perspective, it’s better for them to play and have a bad game than to sit.

It’s better news for Oakland quarterback Derek Carr, who started in Sunday’s performance against the Jets (good), had two touchdowns (good) and no interceptions (doesn’t matter) in a losing effort (doesn’t matter), but only threw for 151 yards (bad). With Carr continuing to start while others continue to sit, his stock continues to rise.

There’s nary a statistician in the country who would let this through an editorial process.

There are many reasons that rookie quarterbacks who start more games turn out to be better quarterbacks. Most obviously, if you start more games as a rookie QB, you are probably a better QB to begin with! Results like this are an obvious case of selection bias, in which QBs that receive one treatment (starting role) differ from those that receive another (bench role) with respect to other traits (talent) that are also associated with the outcome (career success).

To suggest that games started helps to predict future success ignores the fact that the starting rookie QBs have more talent to begin with.
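The selection-bias mechanism is easy to demonstrate with a toy simulation (everything here is hypothetical, for illustration only): give each rookie a latent talent, let coaches hand starting jobs to the rookies they judge more talented, and let career success depend on talent alone. The starters still end up with noticeably better careers:

```python
import random
from statistics import fmean

random.seed(1)  # arbitrary seed; all quantities here are made up
n = 10_000
talent = [random.gauss(0, 1) for _ in range(n)]
# Coaches start the rookies they believe are better (a noisy read on talent)
started = [t + random.gauss(0, 1) > 0 for t in talent]
# Career success depends on talent alone -- starting has no causal effect in this world
success = [t + random.gauss(0, 1) for t in talent]

starters = [s for s, st in zip(success, started) if st]
benched = [s for s, st in zip(success, started) if not st]
print(round(fmean(starters) - fmean(benched), 2))  # clearly positive, despite no effect of starting
```

Starting "predicts" success here purely because of who was selected to start, which is exactly the trap in treating games started as a career signal.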

Let’s imagine, for example, that I had written the following:

There’s a significant association between being drafted early and how well a QB performs.

Well, yes, that statement is obviously true, and it’s an association with the same issue as the “games started” comparison. But what if I followed it up with this:

For predictive purposes, that means if Tim Tebow had been drafted higher, he would play better.

You’d call me insane if I wrote that.

Morris isn’t insane; far from it. But it is misleading at best to argue that Manziel, Bridgewater, and Carr’s career prospects depend on games started, because over the last several decades, NFL coaches have usually been smart enough to play good rookie QBs and bench bad ones.

Morris’ NFL Week 2 article was posted on 538 on Friday at 1:00 pm. I won’t pretend to know the ins and outs of a webpage’s business model, but I’m smart enough to know that there’s a big difference in pageviews between Friday at 1:00 pm and, say, Friday at 7 pm or Saturday at 7 am, particularly for a post relating to the NFL’s upcoming weekend.

Additional time to write, a more exhaustive editorial process, or the collaboration of a few writers on a piece like this may have yielded additional care in the word choice of the rookie QBs section.

But taking care, of course, might not make a deadline.


Other related comments that didn’t fit above:

1- In almost all cases, the graphics on 538 are well designed. It’s clear that a lot of effort goes into them, and it does not go unnoticed.

2- The site has really picked up its work on areas outside of sports over the last few months. Ask Mona keeps me on my toes, the eventual articles on Ferguson were well done, and I’ve enjoyed several of the pieces in the science and economics sections.

3- I’d love to know how much money was wasted in 538’s burrito contest.

4- I encourage readers to check out Alberto Cairo’s related comments here.

5- From a reproducibility aspect, I respect the fact that 538 links to and shares some of its data. Great for aspiring and interested researchers.

6- If nothing else, 538’s doing a much better job with statistics than Grantland (Ex 1, Ex 2)

7- I’d love to see 538 put together some long-form pieces (like Morris did with the Rodman series). Its writers are creative and talented enough to change the way fans, coaches, and players think about sports, and longer, more in-depth pieces could do that.

8- Final idea: The “Sports Analytics Blog” runs a great weekly round-up, which summarizes important sports analytics articles written during the previous week, using content from both major websites and smaller blogs. This is a great resource for weeks in which I don’t have the time to access content daily.

538 should copy/poach/adopt this model, particularly with an eye toward academic articles. For example, the most recent issue of JQAS just came out, and it’d be easy to link to and summarize the key articles (especially because most people can’t get around the paywall). It’d be an easy post to write, if nothing else.

Postscript 1: Morris responds: “Fair enough, but I don’t think the rookie thing means to imply all that you suggest it’s failing to prove… i.e., my analysis of rookie QBs has always been acausal.”

Postscript 2: Not sure how I forgot this, but someone needs to remind the boss the difference between probabilities and odds.

Postscript 3: Eric writes the following:

An interesting older study that might have something to say about your Tim Tebow comment:
Staw, Barry M., and Ha Hoang. “Sunk costs in the NBA: Why draft order affects playing time and survival in professional basketball.” Administrative Science Quarterly (1995): 474-494.

If Tebow had been drafted higher, he wouldn’t necessarily be better, but the owner/manager would have stuck with him longer. Too much of a sunk cost.

Postscript 4: Another reader writes:

One thing someone pointed out to me and I can’t get past now: 538 follows the web tradition of just using hyperlinks for references. That format makes sense for generic blogs, but for a page that has a footnote format, why not use footnotes to give the reader a sentence on what the linked piece actually did so they know whether they should click it?

The B1G had a bad Saturday. How bad was it?

The conference formally known as the Big Ten seemingly had one of the worst weekends it could imagine, with most of its football teams either losing on the national stage or struggling in contests against perceived lower-level opponents.

So how bad was the B1G’s day?

To start, the 13 teams playing (Indiana was off) finished 2-11 against the Las Vegas point spread, with several teams falling well short of the game’s closing number. 

Here’s a dot-chart of how each team did, relative to game point spreads. For example, Nebraska, which was favored by 35.5 points over McNeese State but only won by 7, had the conference’s worst day relative to the point spread expectation (-28.5). 


On the whole, a conference finishing 2-11 ATS is bad; due to chance, and assuming each game’s ATS result is a coin flip, a sample of 13 games would only produce 2 wins or fewer about 1 in 100 times.
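Under that coin-flip model, this is just a binomial tail probability. A quick sketch (in Python here, though any stats package gives the same number):

```python
from math import comb

# P(2 or fewer covers in 13 games), treating each game's ATS result as a fair coin flip
p = sum(comb(13, k) for k in range(3)) / 2**13
print(round(p, 4))  # 0.0112, i.e., about 1 in 100
```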

What made the conference’s bad day even worse is that so many of its results did not seem like coin flips; Nebraska, Michigan, Ohio State, Rutgers, and Purdue all finished more than 20 points worse than Las Vegas expected them to. Overall, the conference was about two touchdowns worse per game than expected (172.5 points total, or 13.2 points per game).

Fortunately, it’s relatively straightforward to quantify this ineptitude. Football game margins, relative to the point spread, approximately follow a Normal distribution with mean 0 and standard deviation σ. Here, σ represents the typical distance between a game’s eventual margin of victory and the margin of victory predicted by the point spread.

For NFL games, it’s been suggested that σ ≈ 13. Given that there’s likely more variability in NCAA game outcomes, let’s allow for σ ≈ 15, which, given results shown here, seems fair. One option is to compute the probability of 13 games finishing, on average, 13.2 points worse than 0 using properties of the Normal distribution.
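For the closed-form option: the sum of 13 independent Normal(0, σ²) margins is itself Normal with mean 0 and standard deviation σ√13, so the probability is a single CDF evaluation. A sketch with σ = 15:

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    # Normal CDF via the error function (no external packages needed)
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# P(total ATS margin of -172.5 or worse across 13 games), each game ~ Normal(0, 15^2)
p = norm_cdf(-172.5, mu=0.0, sigma=15 * sqrt(13))
print(p)  # about 0.0007, or roughly 1 in 1,400
```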

Instead, for reasons that we’ll see later, I chose to simulate game outcomes. 

Using σ = 15 points, 13 teams would finish with results like the B1G’s (172.5 total points worse than the spread) about 1 in 1500 times.

Here’s a plot of the total point spread margin over 100,000 simulations of 13 games (such a large number of simulations is needed with rare outcomes).


So, it’s pretty clear that the B1G had a historically bad weekend.

That’s not all, however. The above graph looks only at the total margin against the spread, and doesn’t account for the fact that the B1G covered (i.e., finished with a positive margin ATS) in just 2 of its 13 games. Those two results are highly correlated, but I figured it was worth checking.

Running the same simulations, I looked for scenarios where 13 games produced just 2 positive numbers ATS and finished with a total margin of 172.5 points or worse. This happened on 241 of 500,000 simulations, or about 1 in every 2000. 
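Here’s a rough sketch of that joint check (the seed and the smaller number of simulations are my choices, not the post’s, and the estimated rate will wobble a bit from run to run):

```python
import random

random.seed(2014)  # arbitrary seed, for reproducibility
n_sim = 200_000    # smaller than the post's 500,000, but plenty for a rough rate
hits = 0
for _ in range(n_sim):
    games = [random.gauss(0, 15) for _ in range(13)]  # one simulated weekend of 13 ATS margins
    covers = sum(g > 0 for g in games)                # games finishing with a positive margin ATS
    if covers <= 2 and sum(games) <= -172.5:          # the B1G's joint outcome, or worse
        hits += 1
print(hits / n_sim)  # on the order of 1 in 2,000
```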

Overall, I estimate the probability of the B1G’s weekend as follows:

Event A = Covering 2 of 13 games (or fewer): about 1 in 100

Event B = Being outscored by point-spread expectations by 172.5 points or more in 13 games: about 1 in 1,500

A ∩ B = Covering 2 games or fewer, and being outscored by expectations by 172.5 points or more in 13 games: 1 in 2,000

Given that there are about 12 weekends a year with the conference playing this many games, we’d expect the B1G to only have a weekend this bad every 165 seasons or so. 

So, yes, it was a bad weekend for B1G football.

Like once a century or two bad. 


1) Considering that there are 10 conferences in the FBS, we’d expect some conference to have a weekend this bad only about once every 15 seasons.

2) The correlation between the standard deviation of the ATS margin and the game’s total (the expected total number of points in the game) is higher than you might expect, and definitely non-zero (0.35). Given the low totals in B1G games, using σ = 15 likely overestimates the true σ for the conference’s games this past weekend. Using a σ lower than 15 would actually decrease the likelihood of observing this past weekend’s results.

In this sense, these results are likely being slightly generous to the conference. 

3) Here’s my R code. Thanks for reading!

Teams <- c("Illinois", "Nebraska", "Penn State", "Purdue", "Rutgers", "Wisconsin", "Northwestern",
           "Minnesota", "Iowa", "Maryland", "Michigan State", "Michigan", "Ohio State")
# ats <- c(...)  # each team's margin against the spread (values truncated in the original post)
par(mar = c(4, 5, 1, 1))
dotchart(ats, labels = Teams, xlim = c(-30, 30), xlab = "Performance against spread")

# Simulate 500,000 weekends of 13 games, each ATS margin drawn from Normal(0, sd = 15)
b <- numeric(500000)
for (i in 1:500000) {
  b[i] <- sum(rnorm(13, mean = 0, sd = 15))
}

hist(b, xlab = "", main = "Total points, relative to point spread")
legend(-200, 13000, "Big 1G", col = "red", cex = 1.3, pch = 16)


Player tracking and snake oil: sports sessions at JSM 2014

Two of the most interesting sessions – at least as judged by twitter interest – at the 2014 Joint Statistical Meetings in Boston were the player tracking session (abstracts here) and the hockey analytics panel (here). Here’s a brief summary, with some relevant tweets.

A) First, some links

Here’s the .mp3 of our hockey analytics talk – we apologize about the sound quality!

Here are my slides on hockey’s point system

Here are Luke Bornn’s slides from the player tracking session.

Here are Sam Ventura’s slides on quantifying defensive ability in hockey.

Lastly, click here for Andrew’s post, which links to talks from Michael Schuckers and Kevin Mongeon.

B) Eye in the Sky: The Player Tracking Revolution in Sports Analytics

Hearing the story of how Goldsberry’s NBA player tracking data came to be was really interesting. Dan Cervone is a graduate student in the statistics department at Harvard, and joined Goldsberry in a group interested in the NBA’s player tracking data. While he didn’t get a chance to speak at JSM, I have a lot of respect for Dan, and it was nice of Goldsberry to provide a walk-through of the student contributions to one idea for analyzing NBA player tracking data.


Given Goldsberry’s familiarity with player tracking data, I thought this was a really interesting observation.


The above image represents Prozone’s NHL player tracking software. Neat stuff. One unsolved issue that was discussed at JSM regarding tracking data in hockey is the uncertainty of which team possesses the puck. Unlike basketball, baseball, or football, possession in hockey is often unclear, and is perhaps best described in terms of a probability.


Intriguing comment from Kirk.


Related follow-up question: Is an academic career in sports research a career killer? (I hope not)


C) Statistics on Ice: Advances in Methods for the Analysis of Ice Hockey

Really interesting forthcoming work from Kevin, although this tweet wasn’t all that clear.

Kevin’s research has found that teams with several players from the same European country were more successful than those with players from several different European countries. However, when looking at individual shifts, it did not matter whether players from the same European country were playing on the same lines or on the ice at the same time. Thus, if there is a benefit to such a roster, it would be due to a more welcoming locker room, and not a more comfortable style of play.


Because counting stats in hockey are usually recorded by a set of officials sitting far above the ice, there is the chance of rates varying by arena. Ken Pomeroy has found similar results in college basketball. In any case, this was fun work from Schuckers, who channeled Edward Tufte with his graphics. I look forward to this work being formalized.


There are two aspects of player tracking data that most statisticians aren’t ready for. First is the sheer size – standard laptops are nowhere near powerful enough – and second is the spatial structure. It will be interesting to see if, how, and when a familiarity with handling this type of data pays off for a franchise. My guess is that we are still a few years away.

Postscript to the 2014 Joint Statistical Meetings

As in 2013, I had a great experience at the 2014 Joint Statistical Meetings (JSM), held August 3-7 in Boston.  I made it to about 20 sessions, and while that sounds like plenty, it still means I missed about 640 other ones!

Here are some of my take home points from the last few days – I’ll post on the sports analytics panels I attended some time next week.

     1. The teaching of statistics is changing, and changing rapidly.

Among the most popular sessions I attended were a pair (one, two) sponsored by the Section on Statistics Education. While these talks were really interesting, I was more amazed at how many statistics teachers appeared willing to enter unfamiliar territory by shunning the curricula they’ve taught for the past several decades. For example, I heard several comments like these, from what appeared to be current statistics faculty members.

“I think I’m finally going to have to learn R”

“My students will love a course in data science. I just hope I can learn it myself?”

“It doesn’t really make sense to use that book anymore”

Along similar lines, the number of attendees rapidly taking notes at each of these talks was astounding. I say this with an admission of my own – I took my notes on Twitter!

This enthusiasm to improve the teaching of statistics is great, and it’s sorely needed. I was lucky – when I decided that I wanted to be a statistics professor, I had no choice but to learn R to do my research. Others weren’t so lucky, and now they have to relearn computing on their own time. I’m glad this is happening, and I hope the transformation continues.

These discussions reminded me of a few lines from George Cobb, professor emeritus at Mt. Holyoke College, who was likely a few years ahead of his colleagues back in 2007, when he advocated for teaching statistics using randomization methods and a sound base in computing.

For two thousand years, as far back as Archimedes, and continuing until just a few decades ago, the mathematical sciences have had but two strategies for implementing solutions to applied problems: brute force and analytical circumventions… before computers, there was no alternative. Now, there is no excuse.


     2. A Thursday morning session is like losing the lottery.

I was really looking forward to Thursday morning’s panel on Big Data innovations: Sherri Rose and Corwin Zigler of Harvard are both young stars in causal inference, which is my research area, and Etsy’s Hillary Parker and Facebook’s Sean Taylor are two statisticians who have helped expand the popularity of data science. Given the popularity of Big Data talks earlier in the week, I expected a big crowd. Here’s the observed, Thursday morning crowd:


If you were wondering, the six people pictured were the six people involved in the session. There were about 15 additional people behind my camera.

While we’ll never know for sure – the fundamental problem of causal inference is that we only got to observe the attendance for this talk in this one time slot, and not at any other – I suspect that if this same talk had been held Monday afternoon, the room would have been filled.

        3. There were some really, really good talks

This year, I went outside the box and attended some talks on applications of statistics in areas that I hadn’t heard of before, and found them to be both informative and invigorating. Lots of fun statistics memes and visualizations, for example. And the coolest part is that several speakers posted their talks online.

Here are a few people that posted really interesting slides – please email me if you have others.

Randy Pruim on R/RStudio/Mosaic using this cheat sheet (here)

Nick Horton on Thinking with Big Data (here)

Mine Cetinkaya-Rundel on Data Fest (here)

Yihui Xie on Reproducible Research (here)

Chris Fonnesbeck on MCMC using Python (here)

     4. Monday’s keynote speaker registered as a missing value

The biggest talk at JSM is reserved for Monday afternoon’s presidential invited lecture. Last year, for example, Nate Silver gave his 11 principles for statistics during this time to a huge crowd in Montreal.

The 2014 speaker for this session was Stephen Stigler, whose talk was titled “The Seven Pillars of Statistical Wisdom.” It’s embarrassing to say this, but I actually left early. Why? There was no room. The ballroom for Stigler’s talk fit only about two-thirds of the interested audience, and with no auxiliary room in which to watch, several people were either turned away at the door or left early due to the crowded ballroom.

As a statistician, it was disappointing to see the attendance estimate be so far off!

If you want to hear more about Stigler’s pillars, see Rick’s summary here.

5. The ASA’s Talent Show was really, really fun.

Credit to these four acts for clever lyrics and great performances.

  • Almost Shirley
  • The Imposteriors
  • Fifth Moment Band
  • Jami Jackson

Glad there was a four-way tie for the top prize, as they each earned it.

In any case, thanks for reading, and I look forward to Seattle in 2015 and beyond (see the schedule below).