Featured post

So you want a graduate degree in statistics?

http://apstatsmonkey.com/StatsMonkey/Descriptive_Statistics_files/internet-statistics_1.jpg

After six years of graduate school – two at UMass-Amherst (MS, statistics), and four more at Brown University (PhD, biostatistics) – I am finally done (I think).

At this point, I have a few weeks left until my next challenge begins: this fall, I start a position at Skidmore College as an assistant professor of statistics.

While my memories are fresh, I figured it might be a useful task to share some advice that I have picked up over the last several years. Thus, here’s a multi-part series on the lessons, trials, and tribulations of statistics graduate programs, from an n = 1 (or 2) perspective.

Part I: Deciding on a graduate program in statistics

Part II: Thriving in a graduate program in statistics

Part III: What I wish I had known before I started a graduate program in statistics  (with Greg Matthews)

Part IV: What I wish I had learned in my graduate program in statistics (with Greg Matthews)

The point of this series is to be as helpful as possible to students considering statistics graduate programs, now or at some point later in their lives. With that in mind, if you have any comments, please feel free to share below.

Also, I encourage anyone interested in this series to read two related pieces:

Cheers, and thanks for reading.

On Thursday Night Football outcomes

The competitiveness of Thursday Night NFL football (TNF) contests has been discussed practically everywhere this past week. The Bleacher Report, Pro Football Talk, and the USA Today, for example, all got in on the action, deriding the NFL for the number of blowouts in its Thursday contests.

Over on Forbes, Jim Pagels offers an interesting and more quantitative take, showing that, on the whole, Thursday night outcomes are not much different from the average outcomes from weekend contests. Pagels provides this table:

[Table from Pagels' Forbes piece]

These results mirror the work of Grantland columnist Bill Barnwell, whose research found no evidence that TNF games were any sloppier than the rest of the league’s games, as judged by, among other metrics, the number of turnovers and dropped passes.

One aspect missing from much of the discourse on TNF contests is whether or not those games were supposed to be close to begin with. This is important because the NFL, and its television networks, get to pick who plays on Thursday night!

For example, if TNF contests were played between more evenly matched teams, we would expect those games to be closer. Under that scenario, an identical average margin of victory across Thursday night and weekend contests would actually imply that TNF games were yielding more blowouts than we expected.

Fortunately, there is an easy way to judge the “closeness” that we expect in an NFL game: the game’s point spread! Using Sunshine’s data system (with a major h/t to the “weekdays” function in R), here’s the average absolute spread for each NFL game, by year and weekday, since 1978.
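As a rough sketch of that calculation, here is the same idea in Python rather than R, with pandas’ `day_name()` standing in for R’s `weekdays()`. The data frame and column names below are hypothetical, not the actual schema of Sunshine’s data:

```python
import pandas as pd

# Hypothetical NFL games; column names are illustrative only.
games = pd.DataFrame({
    "date": pd.to_datetime(["2013-09-05", "2013-09-08", "2013-09-12"]),
    "spread": [-7.5, 3.0, -1.0],   # closing point spread (home team)
})

# The equivalent of R's weekdays(): label each game by day of week.
games["weekday"] = games["date"].dt.day_name()
games["year"] = games["date"].dt.year

# Average absolute spread by year and weekday.
avg_spread = games.groupby(["year", "weekday"])["spread"].apply(
    lambda s: s.abs().mean()
)
print(avg_spread)
```

With real schedule data in place of the three toy rows, the resulting series is exactly what the plot below summarizes.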

[Plot: average absolute point spread by year and weekday, 1978-2013]

From the plot above, it’s fairly clear that the average point spread of TNF contests (in black) is not much different from that of other NFL games (in red or grey). If anything, over the past few years, TNF contests have had larger absolute point spreads than games played on weekends (the data run through 2013).

Of course, it’s also straightforward to compare each game’s point spread to its final margin of victory. To do so, I used the root mean square error (RMSE), as FiveThirtyEight does here. The RMSE is the square root of the average squared difference between the point spread and the margin of victory.
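In code, the RMSE is only a few lines. The spreads and margins below are made-up numbers for illustration, not the actual 1978-2013 data:

```python
import numpy as np

def rmse(spread, margin):
    """Root mean square error between the point spread and the margin of victory."""
    spread = np.asarray(spread, dtype=float)
    margin = np.asarray(margin, dtype=float)
    return np.sqrt(np.mean((spread - margin) ** 2))

# Toy example: favorites laid 3, 7, and 10 points; actual margins were 0, 14, and 3.
print(rmse([3, 7, 10], [0, 14, 3]))
```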

Here’s a similar plot; this one shows the RMSE by year and game day, for all NFL games since 1978.

[Plot: RMSE between point spread and margin of victory, by year and game day, 1978-2013]

On the whole, the variability around the point spread does not appear to differ by the day of the week on which the game was played. If anything, Thursday night contests have actually been closer than the point spread anticipated over the last decade or so (2014, of course, aside). It’s also interesting how consistent the RMSE for all NFL games has been over time.

Overall, these results give us more confidence that the findings of Pagels, Barnwell, and many others are not driven by TNF contests having been expected to be closer in the first place. In turn, it seems more reasonable to conclude that the blowouts of the past few weeks are not explained by the day on which the games were played, and are instead simply due to chance.

It’s that time of year again! Play along with #PlayforOT

The National Hockey League is beginning its 98th season, and the 2014-2015 campaign marks the 10th consecutive season in which the league will use the point system it implemented after the 2004-05 lockout.

To review, here’s the current point system:

Win (regulation, overtime, shootout): 2 points
Loss (overtime, shootout): 1 point
Loss (regulation): 0 points

And here’s the expected point total for each team:

Overtime game: 1.5 points/team
Regulation: 1 point/team

For any of this blog’s newer readers, the increased incentives for overtime games have had three primary effects on game outcomes. They are as follows:

1) Teams play more overtime games than they used to.

Exactly 1 in 4 games went to overtime last year, compared to about 1 in 5 under past point systems. While that doesn’t sound like a big difference, it amounts to between 60 and 70 additional overtime games per season under the new point system.

Or, 60 to 70 extra points floating around the league’s standings.

2) Teams play more overtime games towards the end of the regular season. 

As the playoff chase heats up, so too do the rates of overtime games. About 30% of March and April games have reached overtime over the past few years, up from the 20% of games that reached OT between October and December. The implication is that teams play for overtime when the pressure to improve in the standings is higher, because overtime guarantees each participating team at least one point.

Last April was a great example: 6 of the league’s final 11 games went to overtime!

As a point of reference, in April of past point systems (for example, 1997-1999, when there was no point for overtime losers), only about 15% of games went to overtime.

3) Teams play overtime games more frequently against nonconference opponents. 

In my opinion, this is the most damning issue. If you are the Bruins, conceding a point to Vancouver is much preferred over conceding a point to Montreal, because the Canucks, unlike the Canadiens, are not a threat when it comes to postseason qualification. As a result, we would expect teams to be more apt to play OT against nonconference opponents. Sure enough, in previous research (see link here), I estimated about a 15-20% increase in the odds of overtime against nonconference opponents, relative to conference ones.

Further, certain teams appeared to have identified this inefficiency more than others. From last year’s Sloan Conference, here’s my poster with team-specific odds of overtime, comparing nonconference to conference games under different point systems.

You can also read more about this issue in this article that I wrote for The Hockey News. 

What happened last year?

Beginning last winter, I started to monitor teams playing for overtime using the twitter hashtag #playforOT. While many of those tweets have been lost in the archives, here are a few examples of boxscores that I uncovered last year.

Ex. 1: Columbus and Calgary posted just two shots on goal in the final 4.5 minutes, none from inside 60 feet, to play for OT.

Ex. 2: In a tilt between Chicago and Florida in October, neither team recorded a shot within 36 feet in the game’s final eight minutes.

Ex. 3: In Chicago’s contest with Carolina, there were no shots in the game’s final 120 seconds (counting missed, blocked, and on-goal shots). In hockey, that’s pretty hard to do.

Ex. 4: It must have been fun to watch the last three minutes of this one between Anaheim and Tampa Bay!

[Box score: Anaheim at Tampa Bay]

Overall, given the league’s updated realignment (teams are now judged primarily against their divisional opponents), we also observed a higher rate of overtime in non-divisional, within-conference games (26%) than ever before.

So what about the 2014-2015 season?

If the league’s point system inefficiencies (or its loser point, or the 19 columns that the league standings require) bother you too, then play along.

Tweet (using #playforOT, or to @StatsbyLopez) box scores or anecdotes from games where teams stop trying to score in the waning minutes of a tie game. While many OT games are likely due to chance, the league’s incentives for teams to play for OT are as strong as ever, and I feel strongly that we will continue to see several games in which teams stop trying to score in order to reach OT.

Thanks for reading, and I’ll see you in the extra session.

This might be the best thing I did in graduate school

I was looking for a few old Intro Stat activities today, and came across this gem.

It’s a mathematical statistics cheat-sheet!

Specifically, it’s a .pdf file containing the 15 most common distributions in statistics, their density or mass functions, support, first and second moments, and additional notes (For example, did you know that the uniform distribution is a special case of the beta distribution?).

If you want the editable Word file, feel free to email me and I’ll send it over. I used this when studying for inference exams and, eventually, qualifying exams.

Here’s a screenshot:

[Screenshot: mathematical statistics cheat sheet]

Stat pundit rankings: MLB 2014 over/under win totals

The 2014 Major League Baseball regular season is over, making it the perfect time to look back at how stat pundits performed at predicting each team’s total wins.

Last year, Trading Bases bested competitors while also outperforming totals set in Las Vegas.

Before looking at this year’s results, let’s take a look at our competitors, along with each one’s abbreviation.

O/U: The over/under set for each team in sportsbooks. I used ones set by sportsbetting.ag in early March (Note: if you are scoring at home, all bets I extracted were between +115 and -135. For simplicity, I treat each the same way)

BP: Baseball Prospectus (PECOTA). Picks were taken from the site before the season, although there no longer appears to be an active link for these projections.

PM: Prediction Machine

FG: Fan Graphs. There does not appear to be an active link for these projections, although the AL West’s are here

TB: Trading Bases, the 2013 winner and the website of Joe Peta

Cairo: Cairo

DP: Clay Davenport

LS: LiteSabers

Ensemble: An average of the first four sites above.

Here are my metrics:

MSE: Average squared error between the prediction and the win totals (lower is better)

MAE: Average absolute error between the prediction and the win totals (lower is better)

Percent: Fraction of successful over or under bets (higher is better)
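Here is a sketch of the three metrics in Python. The over/under scoring convention below, treating a prediction above the sportsbook total as an “over” bet and excluding pushes, is my assumption about how such records are usually tallied:

```python
import numpy as np

def mse(pred, actual):
    """Average squared error between predictions and actual win totals."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return np.mean((pred - actual) ** 2)

def mae(pred, actual):
    """Average absolute error between predictions and actual win totals."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return np.mean(np.abs(pred - actual))

def pct_correct(pred, vegas_total, actual):
    """Fraction of over/under 'bets' the predictions would have won.

    A prediction above the sportsbook total counts as an 'over' bet,
    which wins if the actual total also lands above that number.
    Pushes (actual equal to the total) are excluded.
    """
    pred, vegas_total, actual = (np.asarray(x, float)
                                 for x in (pred, vegas_total, actual))
    no_push = actual != vegas_total
    correct = (pred > vegas_total) == (actual > vegas_total)
    return correct[no_push].mean()
```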

Method      MSE    MAE    Percent
O/U         69.2   6.38   NA
BP          73.1   6.67   47
FG          68.1   6.1    43
PM          81.8   6.94   37
TB          73.5   7.37   43
Cairo       73.9   6.5    50
DP          64.1   6.2    43
LS         164.6  10.2    27
Ensemble    70.5   6.41   50

While Fan Graphs and Davenport appeared to outperform their competitors, boasting slightly lower average errors (MSE, MAE) than the sportsbooks, betting each team according to Fan Graphs’ and Davenport’s predicted totals would have finished just 13-17. In general, the numbers set by sportsbooks finished closer to the eventual win totals.

In summary, it’s pretty amazing that not a single one of the nine prediction methods finished with an overall record above .500.

Lastly, I plotted the predicted and observed win totals, using both sportsbook predictions (abbreviated as “Las Vegas Prediction”) and the ensemble method (the average of the first four sabermetrics/statistics sites above, listed as “Steadheads Prediction.”)

[Plot: predicted vs. observed 2014 MLB win totals]

Win totals were relatively accurate for most teams. The Rockies, Diamondbacks, Rangers, and Red Sox all fell short of expectations, while the Marlins, Orioles, and Angels all outperformed them.

Interestingly, betting the “over” on the 15 teams with the lowest predictions, and the “under” on the 15 teams with the highest prediction would have finished 19-11.

Thanks for reading, and if there are any other prediction websites you’d like me to include, feel free to send them along.

To Excel or not to Excel?

On the Statistics Education Section’s email listserv, there is an interesting debate going on regarding the software with which statistics education should be taught.

The first email stated:

My dean is a member of a CUPM committee writing a section on using technology in teaching.  She has asked me for recommendations  for technology in teaching Statistics.  I know all the standard ones (TI calculators, Minitab, R, SPSS, SAS, JMP, etc).  Are there other forms of technology that should be mentioned here?

Many aspects of the conversation are interesting; in particular, it turned toward the usefulness of Microsoft Excel for teaching statistics.

At this point, St. Lawrence’s Robin Lock took over, responding:

Always fun to see the Excel for teaching statistics debate every few years - especially for those who may have missed the earlier versions.  I have to admit that I rarely use Excel for teaching or doing statistics, so I took a quick look at the latest version of base Excel to see what it now does.  

To keep things simple, in intro stat we usually worry about two basic kinds of variables, categorical and quantitative, so how would Excel do for displaying the distribution of a single sample of either of these variables.  Not too bad for categorical, I see bar charts, pie charts and some other possible options.  How about a single quantitative variable? Now I'm stuck.  Maybe, with some effort,  I can coerce a column chart to look vaguely like a histogram, but how about a dotplot, boxplot or real histogram with a good numeric horizontal scale.   There may be ways, but they aren't obvious to me and probably not convenient for students.  I'm sure someone will suggest some possibilities!

Could I recommend using software for teaching statistics that has no easy way to create a reasonable graphical display of one of the two common types of variables (and the type that most often needs a graphical display)?  No.  I'm sure there are add-ins that accomplish such tasks which might be reasonable to use in teaching, but without such capabilities I wouldn't want to try.  Maybe some of the 80% can show me how, but I would not be willing leave an idea as fundamental as looking graphically at the distribution of a single quantitative variable out of a course.

I wholeheartedly agree.

Note: This was published with Lock’s permission

Analyzing the SuperContest

This fall, the Westgate Las Vegas Hotel Casino is hosting the prestigious SuperContest, in which entrants pony up $1500 to pick 5 games a week against the spread. At season’s end, standings are judged by which entrants have the highest number of correctly picked games.

Last year’s winner was David Frohardt-Lane, who took home about $550k for first place. Perhaps not surprisingly, Frohardt-Lane is a statistician!

One of the neat aspects about the SuperContest is that the picks of all 1,403 of the 2014 entrants are posted immediately after the games begin (here). Even better, the data is fairly clean and easy to play with (at least so far).

That said, there are some very intriguing and potentially difficult questions to answer from a statistical perspective: How to account for the fact that people only pick 5 games per week? What about bye weeks? Do entrants pick each week seemingly by chance or are there trends over time? Are there teams that are picked more or less often? And how about success – do entrants perform better or worse than we would expect by chance? Lastly, does success beget more success?

I’ll hope to answer a few of these questions over the course of the season, and I encourage any interested readers to do so, too! Here’s a link to the google doc where I’ll post the picks (csv file). From what I can tell, the hotel only posts picks for the previous week, so I’ll do my best to update the file weekly.

For a taste of what one could do, I decided to plot Week 1 and Week 2 picks by team. The size of each circle in the graph below is proportional to the number of entrants who chose that team, and the colors (for Week 2 only) represent whether or not that team covered the spread in Week 1 (green is yes, red is no).

[Plot: Week 1 and Week 2 SuperContest picks by team]

For reference, 38% of participants took the Patriots in Week 2; only 6% of participants had Carolina in Week 1. 

My hypothesis was that teams which covered in Week 1 would be perceived as “hotter” and would be picked more often in Week 2. It’s hard to tell if that’s the case, although if you remove the Patriots as an outlier, you could perhaps argue that the green circles are larger than the red ones. It’s worth pointing out that lines are fixed on the Wednesday prior to games beginning, so news of Adrian Peterson’s transgressions perhaps moved entrants to pick the Patriots at a higher rate.

Also, it’s amazing how many entrants lost with Tampa Bay in Week 1 and didn’t pick the Bucs in Week 2.

Thanks for reading, and if you have any ideas for graphs or analyses, please feel free to share below or send me an email!

538 is six months old…where does it stand?

Wednesday marks the six-month anniversary of Nate Silver’s FiveThirtyEight launch with ESPN.

As the site mixes statistics, sports, data visualization, and academic research, it’s been a must-read on nearly a daily basis for me.

Here’s my unsolicited and slightly ambiguous view of where things stand:

It’s really, really hard to do what FiveThirtyEight is trying to do and to do it well.

Here’s why:

FiveThirtyEight’s business model is primarily based on advertising, and advertisers generally don’t flock to sites that only spit out content once a week. As a result, Silver and colleagues are forced to put out articles at a frenetic pace. As one example, I estimated that FiveThirtyEight wrote 2.5 articles per day covering the 2014 World Cup, posts which were generally written by only one or two full time writers. That’s an incredible pace.

But here’s the catch: while the judging of a data journalism website should be based on content alone, and not the frequency of posts, the precision that is required and expected for this level of research can often take an inordinate amount of time. For instance, on his old website, current 538 sports data journalist Benjamin Morris wrote an outstanding set of articles on Dennis Rodman. You should read or skim these if you haven’t already, as it’s some really fascinating and convincing stuff. It’s no surprise that Morris could put together this type of series, given both his background and, importantly, the time and effort he was able to devote to it. Based on post dates, the whole series appears to have taken Morris at least nine months to put together.

The problem now is that 538 can’t afford to wait a full year for a series of Rodman posts or something similar – data journalism has to meet deadlines, even when the data analysis can’t.

Let’s now move to Morris’ recent piece on Week 2 of the NFL season, linked here. Midway through, he describes some research that looks into the eventual success of rookie QB’s. He writes:

There are a lot of ways to run regressions on rookie [QB] stats to see how likely these players are to have good careers, but here’s a basic version with quick and dirty t-values4 for each stat:

[Table: regression t-values for rookie QB stats, via FiveThirtyEight]

To see which stats are predictive, we’re looking to see which have t-values higher than 2 or lower than -2.....

So what does that mean for this year’s crop of rookies? Every week that Teddy Bridgewater and Johnny Manziel sit is more bad news for their career prospects. From a purely predictive perspective, it’s better for them to play and have a bad game than to sit.

It’s better news for Oakland quarterback Derek Carr, who started in Sunday’s performance against the Jets (good), had two touchdowns (good) and no interceptions (doesn’t matter) in a losing effort (doesn’t matter), but only threw for 151 yards (bad). With Carr continuing to start while others continue to sit, his stock continues to rise.

There’s nary a statistician in the country who would let this through an editorial process.

There are many reasons that rookie quarterbacks who start more games turn out to be better quarterbacks. Most obviously, if you start more games as a rookie QB, you are probably a better QB to begin with! Results like this are an obvious case of selection bias, in which QBs that receive one treatment (starting role) differ from those that receive another (bench role) with respect to other traits (talent) that are also associated with the outcome (career success).

To suggest that games started helps to predict future success ignores the fact that starting rookie QBs have more talent to begin with.
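A minimal simulation makes the point: if coaches start talented rookies more often, games started will “predict” career success even when starting has no causal effect at all. Every number below is made up purely for illustration:

```python
import random

random.seed(0)

# Simulate rookie QBs whose career success depends only on talent.
# Coaches start talented rookies more often, so games started correlates
# with career success even though starting has no causal effect here.
n = 10_000
talent = [random.gauss(0, 1) for _ in range(n)]
starts = [max(0, round(8 + 4 * t + random.gauss(0, 2))) for t in talent]
success = [t + random.gauss(0, 1) for t in talent]  # starts play no role

def corr(x, y):
    """Pearson correlation, computed by hand to keep the sketch dependency-free."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Starts "predict" success, but only because both are driven by talent.
print(round(corr(starts, success), 2))
```

A regression of success on starts in this simulated world would happily report a significant coefficient, which is exactly the selection-bias trap described above.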

Let’s imagine, for example, that I had written the following:

There’s a significant association between being drafted early and how well a QB performs.

Well, yes, that statement is obviously true, and it’s an association with the same issue as the “games started” comparison. But what if I followed it up with this:

For predictive purposes, that means if Tim Tebow had been drafted higher, he would play better.

You’d call me insane if I wrote that.

Morris isn’t insane; far from it. But it is misleading at best to argue that Manziel’s, Bridgewater’s, and Carr’s career prospects depend on games started, because over the last several decades, NFL coaches have usually been smart enough to play good rookie QBs and not bad ones.

Morris’ NFL Week 2 article was posted on 538 on Friday at 1:00 pm. I won’t pretend to know the ins and outs of a webpage’s business model, but I’m smart enough to know that there’s a big difference in pageviews between Friday at 1:00 pm and, say, Friday at 7 pm or Saturday at 7 am, particularly for a post relating to the NFL’s upcoming weekend.

Additional time to write, an exhaustive editorial process, or the collaboration of a few writers on a piece like this may have yielded additional care in the word choice for the Rookie QB’s section.

But taking care, of course, might not make a deadline.

——————————————————————————

Other related comments that didn’t fit above:

1- In almost all cases, the graphics on 538 are well designed. It’s clear that lots of effort goes into them, and it does not go unnoticed.

2- The site has really picked up its work on areas outside of sports over the last few months. Ask Mona keeps me on my toes, the eventual articles on Ferguson were well done, and I’ve enjoyed several of the pieces in the science and economics sections.

3- I’d love to know how much money was wasted in 538’s burrito contest.

4- I encourage readers to check out Alberto Cairo’s related comments here.

5- From a reproducibility aspect, I respect the fact that 538 links to and shares some of its data. Great for aspiring and interested researchers.

6- If nothing else, 538’s doing a much better job with statistics than Grantland (Ex 1, Ex 2)

7- I’d love to see 538 put together some long-form pieces (like Morris did with the Rodman series). Its writers are creative and talented enough to change the way fans, coaches, and players think about sports, and longer, more in-depth pieces could do that.

8- Final idea: The “Sports Analytics Blog” runs a great weekly round-up, which summarizes important sports analytics articles written during the previous week, using content from both major websites and smaller blogs. This is a great resource for weeks in which I don’t have the time to access content daily.

538 should copy/poach/adopt this model, particularly with an eye towards academic articles. For example, the most recent issue of JQAS just came out, and it’d be easy to link to and summarize the key articles (especially because most people can’t get around the paywall). It’d be an easy post to write, if nothing else.

Postscript 1: Morris responds: “Fair enough, but I don’t think the rookie thing means to imply all that you suggest it’s failing to prove… i.e., my analysis of rookie QBs has always been acausal.”

Postscript 2: Not sure how I forgot this, but someone needs to remind the boss of the difference between probabilities and odds.
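For reference, the distinction fits in two one-line functions: a probability p corresponds to odds of p/(1 − p), so a 75% chance is 3-to-1 odds, not odds of 0.75.

```python
def odds(p):
    """Convert a probability into odds in favor (p against 1 - p)."""
    return p / (1 - p)

def prob(o):
    """Convert odds in favor back into a probability."""
    return o / (1 + o)

print(odds(0.75))  # 3.0
print(prob(3.0))   # 0.75
```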

Postscript 3: Eric writes the following:

An interesting older study that might have something to say about your Tim Tebow comment:
Staw, Barry M., and Ha Hoang. “Sunk costs in the NBA: Why draft order affects playing time and survival in professional basketball.” Administrative Science Quarterly (1995): 474-494.

If Tebow had been drafted higher, he wouldn’t necessarily be better, but the owner/manager would have stuck with him longer. Too much of a sunk cost.

Postscript 4: Another reader writes:

One thing someone pointed out to me and I can’t get past now: 538 follows the web tradition of just using hyperlinks for references. That format makes sense for generic blogs, but for a page that has a footnote format, why not use footnotes to give the reader a sentence on what the linked piece actually did so they know whether they should click it?