Worst statistics article ever?
On what’s currently an extremely sensitive topic – the causes of gun violence – the Washington Post’s Max Fisher pens what is quite possibly the most disappointing and naive article I’ve ever read on statistics. The link is here:
I’ll share the article’s main issues below, but the damage has already been done. This story made it to the headline page of yahoo.com, where 2300 people have commented – not just read, but commented on – this issue.
Issue 1: The headline: The headline states that this “Comparison Suggests That There’s no Link.” Classic introductory statistics error. Statistics can only provide evidence of a link; failing to find one is not evidence that none exists. Thus, even if the rest of this “Comparison” was well done, all we could claim based on this study is that there wasn’t enough evidence that an association between video games and violence exists. We cannot, however, suggest no link. Big difference.
Issue 2: Sample Size: Any statistics student can tell you after week two that a sample size of 10 is probably too small to make any strong claims.
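To put a rough number on how little 10 data points can rule out, here is a minimal sketch using the standard Fisher z approximation for a confidence interval around a correlation. The observed correlation of 0.0 is a hypothetical value chosen for illustration, not a figure from the article, and the approximation assumes roughly bivariate-normal data. The function name `fisher_ci` is my own.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation,
    using the Fisher z transformation (assumes roughly normal data)."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical case: a perfectly flat sample correlation with 10 countries.
lo, hi = fisher_ci(r=0.0, n=10)
print(f"n=10:  true correlation plausibly between {lo:.2f} and {hi:.2f}")

# Same flat sample correlation, but with 200 observations.
lo_big, hi_big = fisher_ci(r=0.0, n=200)
print(f"n=200: true correlation plausibly between {lo_big:.2f} and {hi_big:.2f}")
```

With 10 points, the interval around a flat result still runs from about -0.63 to about +0.63, which is why "no visible trend" in a sample this size cannot be read as "no link."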
Issue 3: Sample Choice: The countries chosen for this comparison are the ones with the largest video game consumption. This seems quite arbitrary. There are about 15 other ways to choose a sample, each of which would yield a scatter plot with a different trend line and different story.
Issue 4: Outliers: Again, Intro Stats 101. China, South Korea and the Netherlands are all possible outliers with regard to video game spending. As a result, regression lines – and correlation coefficients – are significantly changed due to the effect of these points. Remove South Korea and the Netherlands – and their different laws with regard to gun control – and suddenly you might have an association!
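The outlier point can be illustrated with a small sketch. The numbers below are made up purely for demonstration and are not the article's data: eight points with a clear upward trend, plus two high-x points that sit far below it. The helper `pearson_r` is my own plain-Python implementation of the usual Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic data (NOT from the article): the last two points are outliers in x.
spending = [1, 2, 3, 4, 5, 6, 7, 8, 20, 22]
violence = [2, 1, 4, 3, 6, 5, 8, 7, 1, 0]

r_all = pearson_r(spending, violence)              # all 10 points
r_trimmed = pearson_r(spending[:8], violence[:8])  # two outliers removed

print(f"with outliers:    r = {r_all:.2f}")
print(f"without outliers: r = {r_trimmed:.2f}")
```

In this toy example the two extreme points are enough to turn a strong positive correlation into a negative one. Whether the same thing happens with the real data would need the real numbers, which is exactly the check the article never made.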
Issue 5: The graphs: Holy cow. This graph is particularly nauseating. First, a linear trend line is not the only way to identify an association. For example, a U-shape or another curved relationship is plausible in many scatter plots. Second, even if the trend were linear, for the association to be significant it would certainly not have to take the form of the graph in question. Any slight increase over the x-axis creates the possibility of a positive correlation, not necessarily one of the magnitude used in this example.
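As a quick illustration of the curved-relationship point, here is a sketch with synthetic data (again, not the article's) in which y is completely determined by x, yet the linear correlation is exactly zero. The helper `pearson_r` is a plain-Python version of the usual formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic U-shaped relationship: y depends perfectly on x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_linear = pearson_r(xs, ys)                        # straight-line fit sees nothing
r_curved = pearson_r([x ** 2 for x in xs], ys)      # correlate y with x squared instead

print(f"linear correlation:          r = {r_linear:.2f}")
print(f"correlation with x squared:  r = {r_curved:.2f}")
```

A flat trend line, in other words, only says that there is no straight-line relationship. It says nothing about whether some other shape fits.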
Issue 6: Variable Choice: Obviously, the intent is to show the association – or lack thereof – between violent video games and gun violence. Unfortunately, the video game variable is (1) not limited to violent video games and (2) not measured per capita. The effect of not measuring violent video game consumption is particularly important. For example, if I wanted to know the effect of smoking on lung cancer, it’s more important for me to measure this association via cigarette usage as opposed to all forms of smoking.
Max, you probably won’t read this, but if you do, please retract. Thousands of readers are now thinking to themselves “well, video games don’t cause violence,” and worse yet, there’s probably a parent or two who will let their sons or daughters continue to play violent video games. Better research has been done here, here, and here, the general conclusions of which suggest that video games are associated with increased aggressiveness and physiological desensitization to violence.
I totally agree! On the bright side, you’ve identified a shining example of what not to do, which would fit in well in an Intro Stats class.
Issue 1: Whilst it may not be strictly by-the-book to phrase it that way, you cannot attack a journalist for making their headlines accessible to their readers. Also, this does provide evidence that there is no correlation between the two, as conclusions must be drawn from the study, and if they cannot provide evidence of a correlation, then they must conclude there isn’t one. If studies could only prove what wasn’t already accepted or be completely invalid, we’d get nowhere.
Issue 2: Whilst 10 is on the surface a small sample, it contains some of the largest and most populous countries in the world. This sample encompasses over a billion people, one seventh of the earth’s population, which is clearly large enough.
Issue 3: Here I believe you start to deliberately reference things you know the average reader has no knowledge of and therefore cannot refute, in absence of a point, hence the “15 other ways to choose a sample” line, clearly referencing some methods you have learned in statistics class, which, if they are to work irrespective of the data they are being used to find, have to be independent of it. The countries with the highest video game spending were chosen because their populations play enough games that, if there was a correlation, it would be expressed there. Many poorer countries don’t have the policing to prevent gun crime, and have populaces unable to afford games, and so are worthless as samples. The countries chosen have efficient policing systems, and a game-playing population.
Issue 4: It is interesting you go from complaining about the way the samples are chosen to outliers. Again here you use terms like “regression lines” and “correlation coefficient” when there are more accessible ways to phrase that, without the terminology, which you use as a placeholder for a solid argument. Also your assertion that there “might” be a correlation if you removed the outliers you named is false, as you seemed unwilling to admit.
Issue 5: You talk about curved shapes being correlations in a general way here, merely repeating facts you know as opposed to anything related to the actual graph. You mention curves being potential trends on scatter plots, when you clearly know very well that only a linear trend would provide evidence for the correlation being tested. The “On many scatter plots” line further confirms my belief that you know this, as you don’t directly link it to the data being talked about. Also, your point about the correlation not having to be “of that magnitude” is invalid, as it would take many new points, all supporting it, to provide any correlation whatsoever, and your line “over the x-axis” is again being deliberately obscure.
Issue 6: Firstly, the majority of successful video games do feature violence in some way, and the author does acknowledge that the study may not cater for different cultures’ consumption of different games, so you cannot attack him there. Secondly, I’m confused as to your “per capita” statement. The measurement is per capita, and you are again sounding like you’re using big words to try and confuse your audience. Thirdly, your statement about the smoking makes no sense whatsoever. If you were studying the effects of smoking, you wouldn’t just look at cigarettes, you’d look at as many forms as possible.
Lopez, you probably won’t read this, and if you do, I highly doubt you (or anyone else) got this far, but if you do, please retract. I understand you are afraid that this article being spread is putting people at risk and destroying lives, but that does not mean you should take to the internet and try to trick as many people onto your side as possible. You should instead present your views, and let them decide for themselves, of their own volition. Otherwise we risk ending up in a world of each other’s propaganda, not viewing people as people but as a means to an end, and in an effort to protect others we strip them of their right to freedom of thought.
Impressive dedication reading a blog post that’s five years old (and one of my first ever!), and thanks for the lengthy set of comments.
I’ll certainly admit that there are a few arguments I could have described better (and/or dropped). That said, the initial Washington Post article combines poor statistical approaches, interpretations, and visualizations. Not sure that is defensible.
Thanks for taking the time to read my comment and write a reply. I’ve realized I jumped straight to a negative conclusion with only the most basic of evidence, and I’m sorry for any disservice that does you and your work. I’m afraid I rather let my cynicism get the better of me – I couldn’t think up any reason for statements such as the U-shaped correlation, and I let the assumption I made take over.
And, thinking about it, although it cites a “10 year comparison” it seems they just looked at game revenue and crime records over 10 years, as opposed to any kind of study being done. That would explain why it counted all games, not just violent ones (he alludes to this without actually saying it). The sources are cited as “The United Nations Office of Drugs and Crime” and “Others”. Also the “how it actually looks” line is just a travesty, so much so that it kind of looks like he’s biased.