A few months ago, my friend & writer Noah Davis asked me a question that was bothering him. I’ll paraphrase, but this was roughly what he said:

*Does consistency matter for quarterbacks? Like would you rather have an average QB who is never really great, or a good QB who occasionally sucks?*

Well, fortunately there are ways to measure performance consistency, and one of them is standard deviation. QBs with high standard deviations in their game-by-game metrics are the less consistent ones, and vice versa.
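To make that concrete, here is a rough sketch (in Python rather than the R used for the actual analysis, and with made-up game scores, not real QBR data) of what "consistency as standard deviation" looks like:

```python
import statistics

# Hypothetical game-by-game Total QBR lines for two made-up quarterbacks:
# one steady, one boom-or-bust, with similar averages
qbr = {
    "steady": [55, 58, 52, 57, 54, 56],
    "boom_bust": [85, 30, 90, 25, 80, 20],
}

# Standard deviation of each QB's game scores: higher = less consistent
sds = {name: statistics.stdev(games) for name, games in qbr.items()}
print(sds)
```

Both players average in the mid-50s, but the boom-or-bust QB's standard deviation is an order of magnitude larger, which is exactly the distinction Noah's question is after.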

But perhaps an even better idea than just measuring each QB’s standard deviation on a certain metric is to compare the overall distributions of performance. This can be done with many tools, and we chose density curves, which are rough approximations of the smoothed line one would fit over a histogram.
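The usual way to get such a curve is a kernel density estimate: center a small Gaussian "bump" at each observed game score and average the bumps. A minimal Python sketch (the original work used R's built-in density tools; the scores below are hypothetical):

```python
import math

def density_curve(samples, grid, bandwidth=5.0):
    """Gaussian kernel density estimate: the average of normal bumps
    centered at each observation, evaluated at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

games = [72, 68, 80, 15, 22, 75, 70, 18]   # hypothetical boom-or-bust QBR games
grid = [x * 0.5 for x in range(201)]       # evaluate on 0, 0.5, ..., 100
dens = density_curve(games, grid)
```

Plotting `dens` against `grid` for this player would show two humps, one near 20 and one near 70, the kind of bimodal shape a single standard deviation number can't convey.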

The culmination of our project looking at QB density curves is summarized here on FiveThirtyEight. In addition, I created this Shiny app using the R statistical software, which allows users to (i) graph the density curves of their quarterbacks, (ii) contrast any given QB’s home and away performances, and (iii) identify, for any given QB, the three other players with the closest curves. We chose ESPN’s Total QBR as our metric of interest.

**********

There are a few finer points to the analysis, however, and I figured it was worth describing them in case any readers were interested or had ideas for future work.

First, I considered a few options for grouping the players, including model-based clustering (see this recent post by Brian Mills on pitcher groupings). But the problem I kept running into with a model-based approach is that it assumes that the underlying distribution behind the data is Normal. Given the strange shapes in QB performance (including bimodal curves, and curves that were strongly skewed right or left), I wasn’t comfortable with that approach.

We settled on K-means clustering (KMC) with *k* = 10, which I think did a decent job of grouping players with similar curves. We tried everything from *k* = 2 to *k* = 15, and then checked some of the within- and between-group metrics for each *k*. We found the best performance between *k* = 8 and *k* = 10, as judged by the elbow method, and the curves were much easier to interpret with *k* = 10. Beyond 10 clusters, there was too high a chance that a cluster would end up with only one quarterback in it, which did not seem ideal.
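For readers unfamiliar with the elbow method: you run the clustering for a range of *k*, record the within-cluster sum of squares (WSS) each time, and look for the *k* where the curve of WSS bends and further clusters stop buying much. A self-contained sketch, in Python with a toy K-means and fabricated per-QB features (mean and standard deviation of QBR) rather than the actual R pipeline and density curves:

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=25, seed=0):
    """Plain k-means; returns (centers, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest current center
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: sq_dist(p, centers[j]))].append(p)
        # recompute each center as its cluster mean (keep old center if empty)
        centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    wss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wss

# Fabricated (mean QBR, sd of QBR) features for three loose groups of QBs
random.seed(1)
qbs = [(random.gauss(m, 3), random.gauss(s, 2))
       for m, s in [(70, 10)] * 8 + [(50, 15)] * 8 + [(40, 25)] * 8]

# Elbow method: run several random restarts per k, keep the tightest fit,
# and look for where the WSS curve bends
for k in range(2, 7):
    wss = min(kmeans(qbs, k, seed=s)[1] for s in range(5))
    print(k, round(wss, 1))
```

The restart loop mirrors the same trick used in the actual analysis: because K-means depends on its random starting centers, you run it several times and keep the run with the tightest clusters.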

There are a few issues with KMC, however, one of which is that players can jump back and forth between clusters depending on the algorithm and the inputs. Worse, it’s difficult to measure error. For example, Tom Brady ended up in a cluster with Aaron Rodgers in nearly every one, if not all, of our iterations. However, Brady was also sometimes matched with Drew Brees, who, when not in the Brady, Manning, and Rodgers group (the ‘Elites’), was always with Matt Ryan. As a result, cluster membership isn’t fixed. Once we had settled on a *k* of 10, we ran several iterations of the clustering and chose the one with the highest within-cluster similarity.

That said, part of the reason for creating the app was to allow people to compare anyone they wanted, without having to rely on the clustering. For comparing one quarterback to all of his peers, distributional similarity can be judged in a few ways. I used Kolmogorov-Smirnov pairwise tests of distributional equality, which are preferred over, for example, two-sample *t*-tests or Mann-Whitney tests, because they are sensitive to both a distribution’s center *and* its shape. This is a good thing for us, because quarterbacks with bimodal shapes (Brett Favre), which signify sets of performances that are both really good and really bad, are matched to other QBs with bimodal shapes (e.g., Michael Vick).
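The two-sample KS statistic is just the largest vertical gap between the two empirical CDFs, which is why it notices shape and not just center. A Python sketch with invented scores (the actual analysis would have used R's `ks.test`): two bimodal, boom-or-bust samples are close to each other, while a steady sample with nearly the *same mean* is far from both — exactly the case where a *t*-test would see nothing.

```python
import bisect

def ks_statistic(a, b):
    """Two-sample KS statistic: the largest vertical gap between the
    empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    def ecdf(s, x):
        return bisect.bisect_right(s, x) / len(s)  # share of s that is <= x
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

# Hypothetical QBR samples: two boom-or-bust QBs and one steady QB
# whose average is nearly identical to theirs
bimodal_a = [10, 15, 20, 12, 80, 85, 78, 82]
bimodal_b = [14, 18, 11, 16, 79, 84, 81, 77]
steady = [45, 48, 50, 47, 52, 46, 49, 51]

print(ks_statistic(bimodal_a, bimodal_b))  # small: similar shapes
print(ks_statistic(bimodal_a, steady))     # large: same center, different shape
```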
