On Brains and Football
There are many candidates for the best visual display of quantitative information. But how about a prize for the worst display of information? Call it the anti-Tufte. There has been some competition of late. The graph can’t be merely misleading or distracting. That’s too darn easy! A really bad display has two characteristics: (1) it has to overstate the certainty of the underlying data; and (2) by using pictures, it must reinforce our biases. A recent example is the Obama Cabinet/Private Experience graphic.
Here’s another example I’ve been thinking about lately: the claim that offensive linemen are smarter than other players on the field. Think about it. Doesn’t it just feel true? And here’s the graph that popularized the claim:
Ben Fry, a smart fella by all accounts, created the graph. The size of each circle represents the mean score by position on the Wonderlic, a 12-minute, 50-question intelligence test that players take at the combine before the NFL draft. This graphic is often deployed to support the cliché that players closer to the ball have to be smarter. But closer examination has led me to believe that both the claim and the graph are bunk. And bunk of a particular sort: misleading empiricism that reinforces racial stereotypes.
The figure derives the mean scores by position from Wikipedia, which cites a 1985 (!!) book. We have no idea how many players stand behind each mean. More recent positional breakdowns provide different statistics. For example, this page purports to release the Wonderlic scores for some entrants in the 2009 draft. Centers scored highest (above QBs), but kickers and punters (who aren’t particularly close to the ball) were next, while offensive guards, fullbacks, outside linebackers and long-snappers scored about the same. So far as I can see, there’s no way to know what a significantly different score looks like without more information about the size of each cell. That is, I don’t think we can know that a score of 24 is meaningfully different from a score of 23 in a population this small. And since the only scores we have are means, the size of outlier effects is hidden.
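To make the cell-size point concrete, here is a rough Python sketch of how much the number of players behind each mean matters. The standard deviation (7 points) and the cell sizes are invented for illustration, not drawn from any real combine data, and the calculation uses a simple normal approximation rather than a proper t-test:

```python
import math


def two_sided_p(mean_a, mean_b, sd, n_a, n_b):
    """Approximate two-sided p-value for a difference between two means.

    Uses a normal (z) approximation and a single assumed standard deviation
    shared by both groups. For small cells a t-test would give an even
    larger p-value, so this understates the uncertainty.
    """
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(sd**2 / n_a + sd**2 / n_b)
    z = (mean_a - mean_b) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical inputs: positional means of 24 vs. 23 and an assumed SD of 7.
p_small = two_sided_p(24, 23, sd=7, n_a=10, n_b=10)
p_large = two_sided_p(24, 23, sd=7, n_a=400, n_b=400)
print(f"10 players per position:  p ~ {p_small:.2f}")
print(f"400 players per position: p ~ {p_large:.3f}")
```

Under these made-up assumptions, the same one-point gap comes out at roughly p ≈ 0.75 with ten players per position and roughly p ≈ 0.04 with four hundred. Whether 24 versus 23 means anything depends entirely on numbers the graph never shows.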
A second criticism relates to selection. What possible mechanism would make the players “closer to the ball” “smarter” than those farther from it? The baseline hypothesis is that linemen need superior decision-making skills: quick judgments about blitzes, better memory of the intricacies of the plays and blocking schemes, etc. But this seems hard to swallow: doesn’t the running back need those exact skills? And why would the punter, whose job seems pretty one-off, score so well? The mechanism this idea implies is also weird, however popular the idea might be. It seems to assume that there’s an undifferentiated mass of football players in Pop Warner leagues. Some are smarter than others. The smarter ones get pushed to the o-line and the QB position; the less smart ones are pushed to become little wide receivers. Then what happens? In a feat of unprecedented Lamarckian adaptation, the little o-linemen become huge o-linemen; the little wide receivers become lithe, tall, or very, very fast.
Or maybe the selection operates over time in a different way: dumb o-linemen, notwithstanding their physical characteristics, are selected out of the football tournament, while wide receivers are encouraged to be stupid. You might have thought football was a game about bashing the other guy, being a freakish physical specimen, and being willing to sacrifice your body and brain for the team. On this hypothesis, it isn’t: it’s a selection process for decision-making skills. Look, I guess this is possible, but it seems quite unlikely.
So I don’t understand the selection story. And that makes me doubt whether the Wonderlic test measures anything of relevance to football, as opposed to a particular kind of test taking skill which might result in a persistent bias in scores based on position. Of course, I’m not the first person to make either of these observations.
Many teams no longer rely on the Wonderlic because it appears to do a bad job of predicting on-field performance. The teams that continue to rely on the test appear, to me, to be seeking empirical justification for some kind of gut decision-making about character.
With respect to bias, start with the observation that offensive linemen are disproportionately white in a league that is mostly (70% or so) African-American. The Wonderlic, unfortunately, produces scores that are racially skewed. As law professor Michael McCann argues in Using Social Psychology to Evaluate Race and Law in Sports:
“Research by Stanford University social psychologist Claude Steele on “stereotype threat” may help explain the discrepancy in performance [between African-American and White players]. Broadly speaking, “stereotype threat” refers to a self-fulfilling prophecy where “anything one does or any of one’s features that conform to [the stereotype] make the stereotype more plausible as a self-characterization in the eyes of others, and perhaps even in one’s own eyes.” … When the stereotype concerns intelligence, the perceived threat of confirming the negative stereotype about one’s own group may be so strong that it diminishes performance. Applying stereotype threat to the test-taking scenario, the general idea is that before a person takes an examination, he or she has certain preconceived notions about likely success on the test. If this person believes that he or she will do poorly on the test, perhaps because this person is a member of group which tends to do poorly on that test, then, according to Steele and others, he or she will likely feel more anxious while taking that test and, unsurprisingly, be less likely to do well.
Steele and his co-authors have found that stereotype threat negatively impacts African-Americans taking standardized tests. Stereotype threat affects other groups as well. For instance, it has been repeatedly observed when women take math and science tests. In one such study, a math test was administered to one group of women and men after they were informed that the test had produced gender-differentiated results in the past; meanwhile, the control group was told that the same test had produced no gender-based differences. The results were unsurprising: women in the informed group “significantly underperformed in relation to equally qualified men” whereas uninformed women performed statistically equally with the men. White men are also known to suffer from stereotype threat. Joshua Aronson, a psychologist at New York University, and his co-authors conducted a similar study of math and science performance, this time among Caucasian and Asian men. Aronson found that when the white male test-takers were reminded of the stereotype that Asian-Americans tend to outperform Caucasians on math and science tests, the Caucasian males subsequently performed worse than their uninformed counterparts in the next room.”
Similarly, Jason Chung argues:
“A study by David Chan et al. noted that African-Americans adults in general have a lower regard in general for aptitude tests than their Caucasian counterparts which caused them to score lower on the tests. After motivation was given to black test-takers their scores improved until there was no discernible difference between black test scores and white test scores.”
Thus, even if Wonderlic scores are measurably different by position, which I doubt, those differences may reflect racially based differences in test-taking performance, not intelligence or decision-making. This explanation, of course, raises the question of why there are racially based differences between positions. (Many commentators suggest that so-called “racial stacking” may explain these differences, though that explanation assumes that selection operates in a way I continue to find implausible.)
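The composition argument can be illustrated with a toy calculation in Python. Every number here is hypothetical and the groups are abstract labels: all players have identical underlying ability, and the only difference is an assumed test-taking penalty (standing in for something like stereotype threat) that applies to one group.

```python
def expected_position_mean(base_score, share_group_b, penalty_b):
    """Expected mean test score for a position under a hypothetical model.

    Assumes every player has the same underlying ability (base_score), but
    members of group B lose `penalty_b` points to a test-taking effect.
    The positional mean is then a weighted average of the two groups.
    """
    share_group_a = 1.0 - share_group_b
    return share_group_a * base_score + share_group_b * (base_score - penalty_b)


# Invented inputs: identical ability (20 points), a 4-point test-taking
# penalty, and two positions that differ only in group composition.
position_x_mean = expected_position_mean(20.0, share_group_b=0.2, penalty_b=4.0)
position_y_mean = expected_position_mean(20.0, share_group_b=0.8, penalty_b=4.0)
print(f"position X (20% group B): {position_x_mean:.1f}")
print(f"position Y (80% group B): {position_y_mean:.1f}")
```

With these invented inputs the two positions average 19.2 and 16.8 points even though no individual player differs in ability; the gap comes entirely from who happens to line up at each position.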
In any event, I want to end this post with two observations. The first is that graphics can be both misleading and sticky, and you should be careful when you see a figure that confirms your priors. Second, whatever the correlation between positions and intelligence, it’s clear that being an offensive lineman is a job that irrevocably destroys your brain. As Malcolm Gladwell recently wrote:
Much of the attention in the football world, in the past few years, has been on concussions—on diagnosing, managing, and preventing them—and on figuring out how many concussions a player can have before he should call it quits. But a football player’s real issue isn’t simply with repetitive concussive trauma. It is, as the concussion specialist Robert Cantu argues, with repetitive subconcussive trauma. It’s not just the handful of big hits that matter. It’s lots of little hits, too.
That’s why, Cantu says, so many of the ex-players who have been given a diagnosis of C.T.E. [chronic traumatic encephalopathy] were linemen: line play lends itself to lots of little hits. The HITS data suggest that, in an average football season, a lineman could get struck in the head a thousand times, which means that a ten-year N.F.L. veteran, when you bring in his college and high-school playing days, could well have been hit in the head eighteen thousand times: that’s thousands of jarring blows that shake the brain from front to back and side to side, stretching and weakening and tearing the connections among nerve cells, and making the brain increasingly vulnerable to long-term damage. People with C.T.E., Cantu says, “aren’t necessarily people with a high, recognized concussion history. But they are individuals who collided heads on every play—repetitively doing this, year after year, under levels that were tolerable for them to continue to play.”
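The eighteen-thousand figure is straightforward multiplication. The quote does not say how many pre-NFL seasons it counts; the breakdown below assumes four high-school and four college seasons at the quoted thousand hits apiece:

```latex
\underbrace{4}_{\text{high school}} + \underbrace{4}_{\text{college}} + \underbrace{10}_{\text{NFL}} = 18 \text{ seasons},
\qquad 18 \times 1{,}000 = 18{,}000 \text{ hits to the head}
```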
In sum: offensive linemen are not (necessarily) smarter than anyone else on the field (though they may be better test-takers), but they are getting less smart faster than everyone else.
(H/T: Readers CDP and Ringo)