NCAA hoops & circadian rhythms

One of my favorite NFL studies (Smith et al., 1996, with several follow-ups) looks at the results of night contests in which a West Coast team has played an East Coast team. Researchers have suggested that, during these contests, West Coast teams perform at a much higher level than East Coast teams, even after accounting for the point spread set by sportsbooks.

Why might these results occur in night games? As this Deadspin article puts it:

Without knowing it, athletes on teams from the East Coast are playing at a disadvantage. Because of the circadian rhythm, which they can't control, their bodies are past their natural performance peaks before the first quarter ends. By the fourth quarter, the team from the East Coast will be competing close to its equivalent of midnight. Their bodies will be subtly preparing for sleep by taking steps such as lowering the body temperature, slowing the reaction time, and increasing the amount of melatonin in their bloodstream. Athletes on the team from the West Coast, meanwhile, are still competing in the prime time of their circadian cycle.

Unfortunately, in the NFL, night games between West Coast and East Coast franchises are few and far between. In 2014, for example, I recall only two such games: Seattle beating Washington and New England ousting San Diego. Relatedly, Danny Tuccitto found a similar issue with the schedule for Football Outsiders a few years back (bonus: Danny also shows a spreadsheet with the results of past NFL games).

All of this brings me to tonight's Division 1 NCAA men's hoops contest between Xavier (EST) and Arizona (PST), which tipped off at 10:40 EST. Given the late start, Xavier is currently playing well past its supposed prime for athletic performance, while Arizona, according to the circadian cycle, should be in a much better position. This raises the obvious question: is it worth looking at circadian rhythms in the NCAA basketball tournament?

Going back to 2002, I extracted any NCAA men’s D1 tournament game that was played at 9:00 EST or later. I found 24 of them (note: I did this manually, and my identification of East Coast and West Coast time zones may be a bit off).

Here's a screenshot of the games; I counted West Coast teams at 11-13 against the spread (ATS).

[Screenshot: table of West Coast vs. East Coast night-game results, 2002-2015]

At least relative to the game's spread, there does not seem to be an advantage for West Coast teams playing night games against East Coast opponents.

Of course, there are several caveats here. First, the advantage of playing at night could already be built into the line. Second, we are dealing with a really small sample size – only a few tournament games per year meet this standard – making it difficult to learn much. At any rate, if anyone is interested in studying this further, please send along your results. Other postseason games, and perhaps even regular season contests, would be interesting to look at.

For now, given that these athletes are often competing during night games all season, it certainly seems plausible that the effect of circadian rhythms is limited and/or negligible in postseason college hoops.

March Madness bracket advice, adjusted for 2015

It’s that time of year again, and while you can go to just about any media outlet for March Madness advice, I’m fairly confident you won’t get most of the stuff that I’m going to write about here. I think that’s a good thing?

As preliminary thoughts, feel free to check out my two posts from last year:

Value and March Madness

What are the actual odds of someone picking a perfect bracket?

Okay, here are some thoughts and general strategies.

1- Your first round choices depend on your scoring system. And maybe even your second round picks, too.

Most pools can generally be separated into one of two categories – those with upset points or those without upset points.

Strangely enough, the vast majority of people entering picks in pools with upset points pick the same way as they would in pools without upset points. This is silly. In upset pools, for example, correctly backing a No. 13 seed to beat a No. 4 seed could be worth 5-10 times as much as the alternative. That can make taking Mercer over Duke, or Georgia State over Baylor, a reasonable choice.

If your pool does not have upset points, however, and is scored in standard 1-2-4-8-16-32 form, it's usually not worth picking many upsets. In these formats, the only thing that really matters is picking the champion and finalists. So it is not worth trying to be hip and taking Georgia State, and it's not worth worrying about picking the right No. 12 seed to beat a No. 5 seed. Just take the teams that are favored to win by sportsbooks (lines here) and don't fall behind.
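To see why the scoring system matters, here's a minimal R sketch with made-up numbers: suppose a standard pool awards 1 point for any correct first-round pick, while a hypothetical upset pool pays the underdog's seed in points, and suppose the No. 13 seed has a 20% chance of winning.

# All numbers here are invented for illustration
p_upset <- 0.20                        # assumed chance the No. 13 seed wins

# Standard scoring: 1 point for any correct first-round pick
ev_fav_standard <- (1 - p_upset) * 1   # expected points from the No. 4 seed
ev_dog_standard <- p_upset * 1         # expected points from the No. 13 seed

# Upset scoring: a correct underdog pick pays the underdog's seed
ev_fav_upset <- (1 - p_upset) * 1
ev_dog_upset <- p_upset * 13

c(standard = ev_dog_standard / ev_fav_standard,   # 0.25: a bad bet
  upset = ev_dog_upset / ev_fav_upset)            # 3.25: a good one

Under standard scoring, the underdog returns a quarter of the favorite's expected points; under this upset-scoring variant, it returns more than three times as many.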

2- Your champion depends on the size of your pool

Just like in most contests, there are usually no prizes in March Madness for finishing in 10th place. Your only goal is to finish first. So how does that impact your choice of teams?

Quite simply, it is a better idea to take riskier options as the size of your pool increases.

Over on Grantland, Ed Feng provides a useful example with the 2010 tournament, showing how Duke was undervalued, Kansas was overvalued, and the backers of Duke had much better chances of winning their pools as a result. Here’s his graph:

[Graph: probability of winning a pool for Duke vs. Kansas backers, 2010 (via Ed Feng)]

The probability of winning in 2010 after backing Duke was roughly five times higher than after backing Kansas, despite the two starting the tournament with similar probabilities.

So, estimate the number of people in your pool, and vary your aggressiveness as a result.
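To make that concrete, here's a toy R model (all inputs invented): assume whoever correctly picks the champion wins the pool, with ties broken at random among the entrants who made the same pick.

# P(win pool) is roughly P(team wins title) divided by the expected
# number of entrants picking that team, including you
pool_win_prob <- function(p_title, pick_rate, n_pool) {
  p_title / (1 + pick_rate * (n_pool - 1))
}

# A chalk pick (likely champ, heavily backed) vs. a contrarian pick
for (n in c(5, 50, 500)) {
  cat("Pool size", n, ": chalk =", round(pool_win_prob(0.30, 0.25, n), 4),
      ", contrarian =", round(pool_win_prob(0.10, 0.03, n), 4), "\n")
}

In a five-person pool, the chalk pick dominates; somewhere between 5 and 50 entrants, the lightly backed team passes it, and its edge keeps growing from there.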

3- You are not trying to pick games correctly, but trying to get more points than your competitors. 

An important March Madness strategy lies in identifying the teams you think your opponents are going to pick, and then picking against those teams.

I stole this from Jordan Sperber's blog, which has some great insights on under- and overvalued teams. Here's a table with 2015 Final 4 odds, comparing the expected values from Ken Pomeroy's website with the rates at which the public is picking each team.

[Table: 2015 Final 4 odds, Ken Pomeroy estimates vs. public pick rates]

So, here are some Final 4 teams with decent 2015 probabilities that your opponents are not picking: Villanova, Gonzaga, Utah, Arizona

And here are some Final 4 teams with decent 2015 probabilities that your opponents are picking way too often: Duke, Louisville, Wisconsin, North Carolina

So, what does this all mean?

It looks like about 1 in 4 sheets on ESPN.com has Kentucky and Duke in the final game. My top advice for 2015: don't be one of those people.

Of course, this is not because I don’t think UK and Duke can reach the finals; instead, it is because even if they do reach the finals, you still probably won’t win your pool. There’s just too much competition in picking those two teams.

In larger pools, Arizona, Villanova, and Virginia are all being backed by small fractions of the public (no more than 6%, according to ESPN), but each has between a 9% and 13% chance of winning the title, according to many of the sites that run bracket simulations. One of those teams appears much more likely to give you a chance at capturing first place.
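If you want to put a number on "undervalued," one simple approach is to divide a team's simulation-based probability by the fraction of the public backing it. A quick R sketch, with invented team names and numbers:

# Invented example: model probability vs. public pick rate
teams <- c("Chalk U", "Contrarian St", "Sleeper Tech")
model_p <- c(0.28, 0.12, 0.09)    # e.g., simulation-based title odds
public_p <- c(0.40, 0.05, 0.03)   # share of entrants picking each team

# Leverage above 1 means the public is underselling the team
data.frame(teams, model_p, public_p, leverage = round(model_p / public_p, 2))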

Memo to the NHL: Shootouts are still a problem

Last offseason, officials from the National Hockey League met to discuss possible ways to limit the number of contests decided by a shootout. Shootouts, while entertaining, can have a disproportionately large effect on league standings, given that, by and large, these outcomes are random.

One of the minor 'tweaks' to the former rules was that, for the 2014-15 season, overtime sessions would mimic the second period in that teams would be forced to make longer line changes. The longer line changes, in principle, would create more scoring opportunities during the overtime session, which would in turn limit the number of games that eventually ended in a shootout.

Some solid math supported this idea, too. Rink Stats' Stephen Pettigrew, for example, used differences in scoring rates between the first and second periods to estimate that overtime with the longer line change would send around 35% fewer games to the shootout than overtime with the shorter one. That would be a massive reduction, and one that most hockey fans would be happy to see.

Alas, not much has changed during the 2014-15 season, despite the rule change. Overall, roughly 14% of NHL games have been decided by a shootout this year, which is similar to past years.

Here are the overall percentages of games that have reached a shootout, by season:

[Chart: percentage of all games reaching a shootout, by season]

Next, among OT games, here are the percentages of games that reached a shootout.

[Chart: percentage of overtime games reaching a shootout, by season]

Again, these are pretty similar rates to previous years. Just more than half (56%) of games reaching OT have subsequently reached a shootout in 2014-15, a slight tick below the average rate from 2006 through 2014 (57%).

If the league wants to limit the effect of the shootout on league standings, the change is simple – remove teams' incentive to reach overtime in the first place. Create an incentive for teams to win in regulation – like, say, a three-point rule – and they'll stop playing for overtime so often. More on this to follow, but feel free to read more in this article here.
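A quick back-of-the-envelope in R shows the incentive problem. Under the current system (2 points for any win, 1 for an overtime or shootout loss), a game that reaches overtime hands out 3 total points instead of 2, so two evenly matched teams both expect more points by getting there; under a 3-2-1-0 system, every game awards 3 points and that bonus disappears.

# Expected points for one of two evenly matched teams (50/50 either way)
# Current NHL: win = 2, OT/SO loss = 1, regulation loss = 0
ev_regulation_now <- 0.5 * 2 + 0.5 * 0   # settle it in regulation: 1.0
ev_overtime_now <- 0.5 * 2 + 0.5 * 1     # get to overtime: 1.5

# A 3-2-1-0 system: regulation win = 3, OT/SO win = 2, OT/SO loss = 1
ev_regulation_3pt <- 0.5 * 3 + 0.5 * 0   # 1.5
ev_overtime_3pt <- 0.5 * 2 + 0.5 * 1     # 1.5

c(now = ev_overtime_now - ev_regulation_now,        # +0.5: OT pays
  threept = ev_overtime_3pt - ev_regulation_3pt)    # 0: no OT bonus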

Sunday is a day for relaxing. Just ask the NHL

A few weeks back, folks in the Harvard Sports Analysis Collective (HSAC) looked at the shooting rates of NBA players based on whether or not the game fell on a Sunday (link). While the evidence in the HSAC study was mostly inconclusive, I thought it was a good idea to check whether similar results exist for Sunday NHL games.

Piggybacking on recent work from Carnegie Mellon's and War-on-Ice's Sam Ventura, I calculated the expected goals in each NHL contest between the start of the 2005 season and February 1, 2015, using a logistic regression model based on the type of shot, shot location, and shot distance. I then broke these rates down by the game's minute, the score, and the day of the week on which the game was played. Note that I'm using expected goals because Sam and others have shown them to be as predictive of future goals as traditional statistics like goals or shots, if not more so.
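For readers curious what such a model looks like, here is a minimal sketch in R. This is not Sam's actual code; the data are simulated stand-ins, and all column names are hypothetical.

set.seed(1)
# Simulated stand-in for shot-level data: one row per shot attempt
shots <- data.frame(
  game_id = rep(1:50, each = 40),
  shot_type = sample(c("wrist", "slap", "snap"), 2000, replace = TRUE),
  shot_distance = runif(2000, 5, 60)
)
# Make goals rarer as shot distance increases
shots$goal <- rbinom(2000, 1, plogis(-1 - 0.05 * shots$shot_distance))

# Logistic regression of goal probability on shot characteristics
xg_model <- glm(goal ~ shot_type + shot_distance,
                data = shots, family = binomial)

# Expected goals for a game: the sum of predicted goal probabilities
shots$xg <- predict(xg_model, type = "response")
head(aggregate(xg ~ game_id, data = shots, FUN = sum))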

Looking at the first period, I plotted the expected goal rate for each tied-game minute (Minutes 1 through 20), using one color for games that occurred on a Sunday and another for games on all other days. I also included 95% confidence bands for the loess smoother, although I should note that these are the default bands in the ggplot2 package and might not fully account for the variability in each rate.
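For anyone wanting to recreate this type of plot, here's a minimal ggplot2 sketch on simulated data (the real analysis used actual minute-by-minute expected goal rates; everything below is a stand-in):

library(ggplot2)

set.seed(2)
# Hypothetical minute-level rates: Sundays vs. all other days
df <- data.frame(minute = rep(1:20, 2),
                 day = rep(c("Sunday", "Other"), each = 20))
df$xg_rate <- 0.05 + 0.001 * df$minute +
  ifelse(df$day == "Sunday", -0.005, 0) + rnorm(40, sd = 0.003)

# geom_smooth's default bands are the 95% intervals mentioned above
ggplot(df, aes(x = minute, y = xg_rate, color = day)) +
  geom_point() +
  geom_smooth(method = "loess", se = TRUE, level = 0.95) +
  labs(x = "Minute of first period (game tied)",
       y = "Expected goals per minute")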

[Plot: expected goal rate by first-period minute in tied games, Sundays vs. all other days]

At any rate, the difference between Sundays and all other days of the week was much larger than I anticipated. It does seem plausible that Sunday NHL games take a less aggressive tone, at least in terms of expected offensive output.

And here’s a plot of Sunday compared to every other weekday:

[Plot: expected goal rate by day of the week]

Of course, it's impossible to tell exactly what drives the potential drop in expected goals on Sundays. For example, it could simply be that Sunday games tend to be the second game of back-to-backs for one or both participating teams, which might mean tired legs. It could also be that Sunday games tend to be matinees, although the same could be said for Saturday. (Note: reader Justen Fox points out that nearly all Sunday games are played before nighttime, compared to only about a quarter of Saturday games.)

In any case, I thought that this was an interesting result worth sharing.

Also, I’m working on a more extensive and related project that should finish up within the month, and at that point, I’ll happily share code.

Here’s what I did on the first day of statistics class

Plenty has been written about what teachers should do on the first day of class. Should they do an ice-breaker? Dive right into notes? Review a few example questions to motivate the course?

I don't really have a control group to use as a comparison, but I think these two activities were helpful and engaging, and I figured they were worth passing along.

Introduction to Statistics (Intro level, undergrad)

I stole this one from Gelman and Glickman's "Demonstrations for Introductory Probability and Statistics."

When the students came in, I split the course (approximately 25 students) into eight groups. Each group was given a sheet of paper with a picture on it, and the groups were tasked with identifying the age of the subject in question. I had some fun coming up with the pictures: I went back to the '90s with T-Boz from TLC and Javy Lopez of the Atlanta Braves, added a Sheryl Crow who is, impossibly, 52 years of age, and, my personal favorite, Flo from the Progressive commercials.

[Photo: Flo from the Progressive commercials]

How old do you think Flo is?

Anyways, taking approximately one minute with each picture, the groups, without knowing it, started talking confidence intervals ("No way she's not between 40 and 50") and point estimates ("My best guess is 45"). That was good. At the end, I collected the pictures and revealed the ages.

Next, using one picture as an example, we made a table of some of the class guesses, and went through and calculated the estimated errors for each group. This led to the obvious discussion of what metric would be useful for comparing group accuracy. For example, taking the average error would be problematic because the negatives and positives would cancel out. The class settled on mean absolute error, but we also discussed mean squared error and some form of relative error, which would account for the fact that there might be more error with older subjects.
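For anyone who wants to recreate the arithmetic, here's a small R example with made-up guesses for a subject whose true age is 44:

# Made-up class guesses for one subject; the true age is 44
guesses <- c(40, 52, 45, 38, 47, 50, 41, 46)
true_age <- 44

errors <- guesses - true_age
mean(errors)                  # mean error: positives and negatives cancel
mean(abs(errors))             # mean absolute error: the class's choice
mean(errors^2)                # mean squared error: penalizes big misses
mean(abs(errors) / true_age)  # relative error: scales by subject age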

If you were wondering, most groups had an average absolute error of about 5 or 6 years (the winning group was less than 3), and Flo is 44 years old.

Probability and Statistics (upper level)

Rocks-paper-scissors (RPS) is one of my favorite games, and so I split the class (approximately 25 students) up into pairs to play a best-of-20 RPS series. After each throw, each student was responsible for writing down their choice during that turn (R, P, or S). At the end of the series, we had a set of sequences, with each element drawn from the letters 'R', 'P', and 'S'.

Next, we did some quick analysis of these sequences. While there were several metrics of possible interest, I focused on the longest run of consecutive identical throws. For example, if the sequence went:

R, R, P, P, P, S, P, S

the longest run would have been three (3 consecutive papers).

Next, we compared the distribution of the class's longest runs to what would have occurred had the throws been randomly chosen. This was easy to do in R/RStudio.


set.seed(100000)
# The sample space: the three possible throws
omega <- c("Rocks", "Paper", "Scissors")

# One simulated series of 20 random throws
x <- sample(omega, 20, replace = TRUE)
x
rle(x)                 # run-length encoding: the length of each run
max(rle(x)$lengths)    # longest run of consecutive identical throws

# Repeat 10,000 times to get the null distribution of the longest run
maxSeq <- rep(NA, 10000)
for (i in 1:10000) {
  maxSeq[i] <- max(rle(sample(omega, 20, replace = TRUE))$lengths)
}
hist(maxSeq, col = "blue", main = "Histogram of maximum consecutive throws")
table(maxSeq) / 10000  # proportion of simulations with each maximum

And here’s the resulting histogram, which represents the frequency of maximum consecutive throws in 10,000 randomly drawn sequences of 20 RPS throws.

[Histogram: maximum consecutive throws in 10,000 simulated 20-throw sequences]

The mode of the histogram is 3, and the class was quick to pick up on the fact that, if throws were randomly drawn, we would expect more maximums of 4 than of 2. Of course, in our class of about 25 students, there were many more 2s than 4s. Such evidence is not surprising, however, given that people tend to underestimate the streakiness of truly random sequences (Wiki has a few examples). This activity helps to confirm that tendency, and it also gives students the chance to meet one another, learn about Monte Carlo techniques, and gain a quick introduction to R/RStudio, all while playing Rocks-Paper-Scissors. I also ended by showing the class this New York Times RPS game, in which you can play against the computer on either 'novice' or 'expert' mode.

Obviously, more went into the courses after these two activities, but I think I’ll go back to them in the future. It was certainly more fun than starting with the syllabus.

Everyone asked for Tebow. So, here he is

My recent article for FiveThirtyEight on QBR segmentation using density curves apparently had one big missing component, and his name is Tim Tebow.

After receiving multiple tweets, Facebook comments, and emails, each of which asked where Tebow was, I figured I should give the people what they want.

Tebow has 15 games to his name, and while that's not a great sample size to learn much from, here are the QBs with the most similar distributional curves to Tebow: Dan Orlovsky and Seneca Wallace.

And here’s the set of density curves for that threesome. Lots of mediocre games (QBR around 50-60) from this group, and while each QB only had a few terrible games, there were very few, if any, outstanding ones.

[Density curves: Tebow, Orlovsky, and Wallace]

Going beyond the mean to analyze QB performance

A few months ago, my friend & writer Noah Davis asked me a question that was bothering him. I’ll paraphrase, but this was roughly what he said:

Does consistency matter for quarterbacks? Like would you rather have an average QB who is never really great, or a good QB who occasionally sucks?

Well, fortunately there are ways to measure performance consistency, and one of them is the standard deviation. QBs with high standard deviations in their game-by-game metrics are the less consistent ones, and vice versa.

But perhaps an even better idea than just measuring each QB's standard deviation of a certain metric is to compare the overall distribution of performances. This can be done with many tools, and we chose density curves, which are just rough approximations of the smoothed lines one would fit over a histogram.
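In R, drawing a density curve takes one line; here's a sketch with simulated QBR-like values (not the real data) that contrasts a steady QB with a boom-or-bust one:

set.seed(3)
# Simulated game-by-game QBR for two hypothetical quarterbacks
steady <- rnorm(60, mean = 55, sd = 8)
boombust <- c(rnorm(30, mean = 25, sd = 8), rnorm(30, mean = 80, sd = 8))

# The boom-or-bust QB's curve comes out bimodal; the steady QB's does not
plot(density(steady), ylim = c(0, 0.06), lwd = 2,
     main = "Density curves of game-by-game QBR", xlab = "Total QBR")
lines(density(boombust), lwd = 2, lty = 2)
legend("topright", c("steady", "boom-or-bust"), lwd = 2, lty = 1:2)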

The culmination of our project on QB density curves is summarized here on FiveThirtyEight. In addition, I created this Shiny app using the R statistical software, which allows users to (i) graph the density curves of their quarterbacks, (ii) contrast any given QB's home and away performances, and (iii) identify, for any given QB, the three other players with the closest curves. We chose ESPN's Total QBR as our metric of interest.

**********

There are a few finer points to the analysis, however, and I figured it was worth describing them in case any readers were interested or had ideas for future work.

First, I considered a few options for grouping the players, including model-based clustering (see this recent post by Brian Mills on pitcher groupings). But the problem I kept running into with a model-based approach is that it assumes the underlying distribution behind the data is Normal. Given the strange shapes in QB performance (including bimodal curves and curves that were strongly skewed right or left), I wasn't comfortable with this approach.

We settled on K-means clustering (KMC) with k = 10, which I think did a decent job of grouping players with similar curves. We tried everything from k = 2 to k = 15, and then checked some of the within- and between-group metrics for each k. We found the best performance with k between 8 and 10, as judged by the elbow method, and the curves looked much easier to interpret with k = 10. Beyond 10 clusters, there was too good a chance that a cluster ended up with only one quarterback in it, which did not seem ideal.
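For the curious, the elbow check looks something like this in R. This is a sketch on simulated data, not our actual pipeline; qb_curves stands in for a matrix with one row per QB (e.g., density values evaluated on a common grid).

set.seed(4)
qb_curves <- matrix(rnorm(100 * 25), nrow = 100, ncol = 25)

# Total within-cluster sum of squares for k = 2 through 15
wss <- sapply(2:15, function(k) {
  kmeans(qb_curves, centers = k, nstart = 25)$tot.withinss
})
plot(2:15, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster SS")
# Look for the 'elbow' where extra clusters stop buying much improvement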

There are a few issues with KMC, however, one of which is that players can jump back and forth between clusters depending on the algorithm and the inputs. Worse, it's difficult to measure error. For example, Tom Brady ended up in a cluster with Aaron Rodgers in nearly every one of our iterations, if not all of them. However, Brady was also sometimes matched with Drew Brees, who, when not matched with the Brady, Manning, and Rodgers group (the 'Elites'), was always with Matt Ryan. As a result, cluster membership isn't fixed. Once we had settled on k = 10, we ran several iterations of the clustering and chose the one with the highest within-cluster similarity.

That said, part of the reason for creating the app was to allow people to compare anyone they wanted, without having to rely on the clustering. For comparing one quarterback to all of his peers, distributional similarity can be judged in a few ways. I used pairwise Kolmogorov-Smirnov tests of distributional equality, which are preferable to, for example, two-sample t-tests or Mann-Whitney tests because they are sensitive to both the center and the shape of a distribution. This is a good thing for us, because quarterbacks with bimodal shapes (Brett Favre), which signify sets of performances that are both really good and really bad, are matched to others with bimodal shapes (e.g., Michael Vick).
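In R, each pairwise comparison reduces to a call to ks.test; here's a sketch on simulated values (not the real QBR data):

set.seed(5)
qb_a <- c(rnorm(30, 25, 8), rnorm(30, 80, 8))   # bimodal: boom-or-bust
qb_b <- rnorm(60, 52.5, 10)                     # unimodal, similar mean

# Two-sample Kolmogorov-Smirnov test: the statistic D is the maximum
# gap between the two empirical CDFs, so it reacts to center and shape
ks.test(qb_a, qb_b)

For a given QB, ranking all other QBs by their D statistics (smallest first) is one way to produce a "closest curves" list like the one the app reports.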