Follow-up post, Sloan Sports Analytics Conference paper contest

Several people expressed interest in Friday’s post about the Sloan Sports Analytics Conference. (click here to read Part I)

Thanks for reading. Here are a few follow-up notes that I found interesting.

1) Over on his website The Spread, Trey Causey wrote a great piece on reproducibility, parts of which I’ll share below. I encourage you to read the whole piece, as it makes my point better than I could.

When research is not reproducible, it is difficult to verify its veracity. How many models did the authors estimate before arriving at the one in the paper? Are the results robust to different model specifications? What happens when you include or exclude various variables? These are unanswerable questions.
Sports organizations want advanced analytics capacity to make better decisions and get an edge over their competition. The way to get an edge is not to have someone write a proprietary paper and then say ‘trust me on the findings.’ That’s how bad decisions are made.

2a) While initially no representatives from Sloan or MIT reached out (not that I expected them to), poster presenters received an email on Tuesday afternoon informing us that we would be allowed to attend, with one free ticket per poster. Great news!

2b) Also, I spoke with both Kirk Goldsberry (phone) and John Ezekowitz (email, twitter). I appreciated both of them taking the time to share their views.

Looking back, my intent was to share my story about not getting to present a poster in person, which, for the most part, had nothing to do with Kirk or John or their research. As a result, it would have been prudent to contact both of them before writing about their roles, and I think their input would have been valuable to my post. Both had a right to hear me out and defend themselves where appropriate, and I regret not giving them that opportunity. Lesson learned.

3) A few points still stand:

-The Sloan RP process was, at best, unprofessional

-Valuing reproducible research would greatly improve the RP contest and help the conference’s overall academic growth (but does the conference care?)

-RP submissions should be blinded*, and field experts should be used where appropriate

*That being said, I don’t believe that the ‘connections,’ as I called it in my blog, influenced Kirk’s successful paper in 2014, or his follow-up post on Grantland. This type of praise reinforced such a belief:


4) Over e-mail, John and his co-authors apologized for their literature review “not being as complete as it could’ve been.” The group also touched base with previous hot-hand authors regarding the oversight. John and his co-authors agreed to cite such papers in future disseminations, and indicated that they were already citing at least one of them in a current, longer draft.

5) Lastly, admitting to a large amount of selection bias here – the people who contacted me may differ from those who didn’t with respect to their opinions on my post – here’s a summary of what others had to say.

From current college professors:

Members of the media:



  1. Far more concerning than someone’s reluctance to share proprietary data is a paper with a totally unwarranted conclusion:

    This paper shouldn’t be anywhere near this conference. You essentially examined a tiny dataset and hunted for anomalies at the team level. When one looks for such anomalies, one can almost always find them. Any remotely seasoned data scientist, or even a decent sports bettor, knows how dangerous it is to infer causality/intent from such data-mining. Yet you insist that it is “easy to identify” certain teams’ efforts of “manipulation”. (Side note: The “Suspect” teams were determined to be suspect after looking at your data? If you only have one dataset for both model building and model testing, how could they not be?).

    In short, to take data-mined anomalies and assume their existence is 100% intentional is absurd, at best. In your example, the Devils clearly “manipulated” the scoring system because a whopping 3-4 more games went to OT than you expected? WTF.
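The anomaly-hunting concern is easy to make concrete with a quick simulation (all numbers below are hypothetical placeholders, not drawn from the paper): even when every team's OT rate is identical, the most extreme of 30 teams will usually sit several games away from expectation.

```python
import random

random.seed(0)  # reproducible run

def max_deviation(n_teams=30, games=82, p=0.25):
    """Simulate n_teams whose games go to OT purely at random (rate p),
    and return the largest deviation, in games, from the expected count."""
    expected = games * p
    counts = [sum(random.random() < p for _ in range(games))
              for _ in range(n_teams)]
    return max(abs(c - expected) for c in counts)

# Scanning 30 noise-only teams for the most extreme one all but
# guarantees a sizable "anomaly" with no cause behind it.
print(max_deviation())
```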

    1. Hi Dustin,

      Thanks for reading, although I’m not sure I agree with some of your comments. See below.

      “You essentially examined a tiny dataset”.

      I examined every NHL game since 1997 to start, and the models for the follow-up analysis consisted of at least 750 games. Team-specific models included more than 1,000 games per team, too. Those are tiny datasets?

      “When one looks for such anomalies, one can almost always find them.”

      I looked for one anomaly, and I found it. Then I looked for that anomaly among each of the 30 teams. That’s unwarranted? If I were looking at all hospitals in Massachusetts and found something fishy, wouldn’t the next obvious step be to look at each hospital?

      Of course, I admit this creates a multiple testing problem: if you look at 30 teams, you are bound to find one or two with significant deviations in behavior. With the NHL, however, the years prior to 2005 provide the perfect comparison group, and no team in that sample showed the same effects. Also, you can adjust for multiple testing (not shown, as we had only 6 pages), and the top few teams remain significant.
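For what it's worth, the kind of multiple-testing adjustment described above can be sketched in a few lines; the p-values below are illustrative placeholders, not numbers from the paper.

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm-Bonferroni step-down correction: return which of the
    hypotheses are still rejected after adjusting for multiple tests."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value to alpha / (m - rank)
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Hypothetical p-values for 30 teams: two strong signals, the rest noise.
pvals = [0.0004, 0.001] + [0.04, 0.2, 0.5] * 9 + [0.9]
print(sum(holm_bonferroni(pvals)))  # only the strongest effects survive
```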

      “Yet you insist that it is ‘easy to identify’ certain teams’ efforts of ‘manipulation.’”

      Yes, I’d argue that the figure in the back makes it obvious which teams are going to OT more often against non-conference opponents. Maybe I did a poor job of making this graph; in any case, I updated it for the poster at Sloan, and hopefully the behaviors are even more evident. Also, I’d call picking and choosing when to play for OT manipulating the system.

      “Side note: The “Suspect” teams were determined to be suspect after looking at your data? If you only have one dataset for both model building and model testing, how could they not be?”

      These were two different steps, two different data sets, and two different hypotheses.

      “In short, to take data-mined anomalies and assume their existence is 100% intentional is absurd, at best. In your example, the Devils clearly “manipulated” the scoring system because a whopping 3-4 more games went to OT than you expected? WTF.”

      You are just trolling now. Read the conclusion, where I use phrases like “we find evidence” and “appears to be,” not ones like “100% intentional.” Also, the Devils went to OT in 6-7 more games than would have occurred by chance, not 3 or 4. And that’s out of 18 games.
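As a rough illustration of the chance argument, an exact binomial tail probability shows how unlikely a 6-7 game excess in 18 games would be; the 25% baseline OT rate used here is a made-up placeholder, not a figure from the paper.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least
    k of n games go to OT if each does so independently with rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical baseline: a 25% OT rate predicts ~4-5 OT games in 18,
# so observing 11 (6-7 more than expected) is a very large deviation.
print(binom_tail(18, 11, 0.25))
```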


  2. I’m trolling? How about we cut to the chase and take a look at complete sentences you wrote:

    “We find that certain NHL teams have varied their behavior based on league policy.”
    “Looking back, however, it is easy to identify how certain teams have benefited by taking advantage of the point system. For example, under first-year coach Peter DeBoer, fresh off a stint leading the Florida Panthers, the 2011-12 New Jersey Devils earned the Eastern Conference’s sixth seed in the postseason, which they eventually parlayed into a spot in the NHL finals. Hidden behind the Devils’ postseason run? The team might not have even made the playoffs if not for a bit of point system manipulation.” You are outright accusing an individual of something when you have no idea whether or not it occurred. [And yes, the dataset you used for concluding that the Devils were master manipulators was, in fact, “tiny,” by any rational measure. 18 eff’ing games decided by 1 or 2 binary events. Unreal.]

    You use the word “manipulate”/”manipulation” throughout the paper — most notably, in the sensationalistic title — yet provide no evidence that purposeful manipulation has occurred. This practice — taking mildly interesting data and twisting it into some unsupported, dramatic conclusion — is what most ails academic studies these days. Ironic indeed for a guy who so publicly complained about bad science getting publicity in this conference.

    1. You are taking anecdotal evidence, used in the discussion section of the paper, and pretending it was my main point. That’s lame. I didn’t base the article on the Devils example, for obvious reasons.

      Is this a coincidence? Perhaps. Scientific? Of course not. But it’s the discussion section. I simply thought it was interesting that the coach of the team with the highest non-conference OT odds switched to the Devils, and the Devils suddenly went to OT in non-conference games, too. Take it out and the paper’s arguments still stand.

      Further, the definition of manipulate is to “handle or control, in a skillful manner.” My use of this word isn’t sensationalistic in this regard, as I feel that’s exactly what NHL teams did. Evidence suggested that while some teams recognized the benefits of going to OT in certain types of games, others didn’t.

      While skillful, I’d also argue it wasn’t what the NHL had in mind when it designed its policy.

      Also, what more evidence do you need than knowing that some teams stop scoring in tie games?

      I actually interviewed players and an announcer about this, and they agreed.

  3. I cannot claim to know what your “main point” was in writing the paper. As opposed to, say, your “minor points,” your “irrelevant points,” or your “points from which you will back away when challenged.” You didn’t differentiate between valid points and random speculation, so the reader is left to assume you stand behind each and every thing you wrote. What I do know is that you willfully and aggressively titled your paper, “How certain NHL teams are manipulating the league’s point system”. If your title truly has nothing to do with your “main point”, it’s an incredibly bad title, an incredibly bad paper, or perhaps both. Are you somehow backing away from that claim now? If not, where is the actual evidence of team-level manipulation?

    To be clear, it wasn’t just in the Discussion section. Inexplicably, you wrote in the Introduction, “Our results provide evidence of a purposeful manipulation of the league’s point system by specific teams”. The key words here are: “results,” “evidence,” “purposeful,” “manipulation,” and “specific teams.” Observing anomalous data does not at all prove what caused it to occur, which you claim to know in both the paper title and its conclusion. I don’t know of any other way to state this.

    Finally, assuming your “main point” is simply that the teams play for ties in non-conference games, why would you publish it? Clearly, according to your own interviews, the industry was already fully aware of the effect. To me, this paper would have been more accurately titled, “A Confirmation of the NHL Point System’s Effect on Close-Game Strategy.” Interesting. Accurate. Not dramatic. Try it on for size next time.

  4. DMZ, there’s no point in continuing this here. Please feel free to email me if you’d like to continue the conversation. I stand behind everything I wrote, and I never suggested otherwise. Just because I said “Take it out and the argument still stands” in my last post doesn’t mean I am backing away from what I wrote.

    The paper provides evidence that only a handful of teams have been savvy enough to work the NHL’s system to gain a competitive advantage. Could something else have caused these results? Of course. The results could be due to chance (as you note, the word ‘evidence’ is used). However, if you have another theory besides my point-system theory for why 150% more Florida Panthers non-conference games went to OT, I’d love to hear it.

    In any case, thanks for reading, and I encourage you to come to the poster at Sloan.
