One of the reasons I love the application of statistics to sports is the unique ways in which sports can help us better understand human behavior.
I was thus excited to read** a working paper by LSU’s Ozkan Eren and Naci Mocan that looked at how the sentencing of Louisiana judges in juvenile court varied given the performance of the state’s favorite football team, the LSU Tigers. The paper can be read here; alternatively, check out SB Nation’s summary here.
Using regression-based approaches on court decisions between 1996 and 2012, the authors write:
We show that upset losses of the LSU football team increase disposition (sentence) length imposed by judges, and that this effect persists throughout the work week following a Saturday game. On the other hand, losses of games that were expected to be close contests ex-ante, as well as upset wins have no impact. We also find that judges’ reaction, triggered by an upset loss, is more pronounced after more important games (when LSU was ranked in top-10).
If true, such findings would and should have implications for our judicial systems. It would suggest that entities that are tasked with impartial behavior (judges) let their emotions get the best of them, even when based on a college football game.
To the best of my knowledge, this paper has not formally passed peer review, so I’ll give the authors the benefit of the doubt as far as working out any final kinks. With that in mind, a few things stood out when reading that don’t pass the smell test.
1. Arbitrary cutpoints
The authors take continuous data (point spread, categorized as -4 or less, -3.5 to 3.5, or 4 or more) and turn it into categorical data (game type: expected win, close, expected loss) in order to run their statistical models. This process requires unjustifiable assumptions from a regression model standpoint, and can yield both positive and negative associations, among other issues. As an example of what can go wrong in sports when categorizing continuous data, read one here.
To the authors’ credit, they write “we also experimented with different cutoff values (e.g., -3 and 3) to describe unexpected college football game outcomes. The results remained intact.”
In my opinion, this is insufficient. There’s no intuitive football reason to group games based on point spread, at any point spread. A 3.5-point favorite and a 4-point favorite are nearly identical, particularly with respect to how judges may view their teams’ performance. Worse, LSU has historically been an excellent team. This entails that in the bin of “expected wins” lie games where LSU has been anywhere from a 60% favorite to a 99% favorite. Treating LSU hosting Georgia the same way you treat LSU hosting Chattanooga makes no sense from a football perspective — so why would you do the same in your regression model?
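To make the width of that “expected win” bin concrete, here is a rough Python sketch. It rests on two assumptions of mine that are not in the paper: that a negative spread means LSU is favored, and that the final margin is roughly normal around the spread with a standard deviation of 16 points (an illustrative value for college football). The helper names `win_probability` and `game_type` are hypothetical.

```python
from statistics import NormalDist

# ASSUMPTION (not from the paper): the final margin is roughly normal
# around the point spread, with a standard deviation of 16 points.
MARGIN_SD = 16.0


def win_probability(spread: float, sd: float = MARGIN_SD) -> float:
    """Approximate LSU win probability; a negative spread means LSU is favored."""
    return NormalDist(mu=0.0, sigma=sd).cdf(-spread)


def game_type(spread: float) -> str:
    """The three bins described above: -4 or less, -3.5 to 3.5, 4 or more."""
    if spread <= -4:
        return "expected win"
    if spread >= 4:
        return "expected loss"
    return "close"


# Compare how the bins group games with how different the games really are.
for spread in (-35, -21, -10, -4, -3.5, 0, 3.5, 4):
    print(f"spread {spread:>6}: {game_type(spread):<13} "
          f"win prob ~ {win_probability(spread):.2f}")
```

Under these assumptions, a 4-point favorite and a 35-point favorite share a label despite very different win probabilities, while a 3.5-point favorite and a 4-point favorite get different labels despite nearly identical ones.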
There are more appropriate models to handle the relationships the authors are trying to uncover. Fortunately, the authors give us a taste of one such approach. Unfortunately, they then use
2. 90% confidence intervals
The LSU judging paper is filled with tables of regression coefficients, nearly all of which stem from the above categorization of point spread.
In the final section, the authors nicely provide a model that does not categorize point spread, and instead uses a third-order (cubic) polynomial term for point spread. Although not perfect (e.g., how do we know the association is cubic and not something else?), at least this model can allow us to explore how point spread is linked to sentencing across a wider range of point spreads. Here’s the corresponding chart.
This figure is a giant red flag.
First, the authors write, “The effect of a loss on disposition length set by the judges is decreasing in the spread.” Sure, the line is decreasing in spread, but that decrease is not significant. If it were, you’d see much tighter error bounds and/or a steeper slope.
Second, across nearly all point spreads shown, there is not significant evidence that there is an increase in disposition length when LSU loses. This is shown by the lower red line overlapping with 0. Statistically, the win versus loss comparison is indistinguishable from noise for most of the chart (the authors admit as much).
Third, in showing this chart, the authors unknowingly call into question their categorization of point spread. Above, there are no obvious changes in sentencing length around -4 or 4, which, while not surprising, does highlight that perhaps such grouping was not sound to begin with.
Finally, and perhaps most disappointingly, the authors use 90% intervals. Had they used the standard (though admittedly not perfect) significance cutoff of 5%, the corresponding 95% intervals would be about 20% wider. With that extra width, the confidence intervals would overlap with 0 throughout the entire figure, and the entire basis of their findings — that sentences are longer after upset losses — would no longer hold.
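The “about 20% wider” figure can be checked quickly. This sketch assumes the intervals are built the usual way, as estimate plus or minus a normal critical value times the standard error; that construction is my assumption, and `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


z90 = z_critical(0.90)  # multiplier behind a 90% interval
z95 = z_critical(0.95)  # multiplier behind a 95% interval

# With the same standard error, interval width scales with the multiplier.
width_ratio = z95 / z90
print(f"z90 = {z90:.3f}, z95 = {z95:.3f}, width ratio = {width_ratio:.3f}")
```

The ratio comes out to roughly 1.19, so a 95% interval is a little under 20% wider than the 90% interval drawn from the same estimate and standard error.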
It’s certainly possible that losses lead to harsher sentences, and I applaud the authors for an intuitive idea, but, for now, evidence appears limited at best.
**Note: I first read the article and started this blog post about a year ago.