On Brains and Football

There are many candidates for the best visual display of quantitative information.  But how about a prize for worst display of information?  Call it the anti-Tufte. There has been some competition of late.  The graph can’t be merely misleading, or distracting. That’s too darn easy! A really bad display has several characteristics: (1) it has to overstate the certainty of the underlying data; and (2) by using pictures, it must reinforce our biases.  A recent example is the Obama Cabinet/Private Experience graphic.

Here’s another example I’ve been thinking about lately: the claim that offensive linemen are smarter than other players on the field.  Think about it.  Doesn’t it just feel true?  And here’s the graph that popularized the claim:


Ben Fry, a smart fella by all accounts, created the graph.  The size of each circle represents the mean score by position on the Wonderlic, a 12-minute, 50-question intelligence test that players take during the combine before the NFL draft.  This graphic is often deployed to support the cliché that players closer to the ball have to be smarter. But closer examination has led me to believe that the claim – and the graph – are bunk.  And bunk of a particular sort: misleading empiricism of the sort that reinforces racial stereotypes.

The figure derives the mean scores by position from Wikipedia, which cites a 1985 (!!) book. We have no sense of the population sizes for the means.  More recent positional breakdowns provide different statistics.  For example, this page purports to release the Wonderlic scores for some entrants in the 2009 draft.  Centers scored highest (above QBs), but kickers and punters (who aren’t particularly close to the ball) were next, while offensive guards, fullbacks, outside linebackers and long-snappers scored about the same.  There’s no way to know, so far as I can see, what a significantly different score looks like without more information about the size of each cell.  That is, I don’t think we can know that a score of 24 is meaningfully different from a score of 23 in a population this small.  And since the only scores we have are means, the size of outlier effects is hidden.
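To make the cell-size point concrete, here is a rough sketch of how much the answer depends on numbers we don't have. Everything in it beyond the 24-versus-23 comparison is an assumption of mine: the standard deviation of 7 points, the group sizes, and the helper name `diff_z_score` are hypothetical, chosen only to show how the standard error of a difference in means shrinks as the groups grow.

```python
import math


def diff_z_score(mean_a, mean_b, sd, n_a, n_b):
    """Return (difference, standard error, z) for two group means.

    Assumes both groups share the same standard deviation `sd`,
    which is a simplification made for illustration.
    """
    se = sd * math.sqrt(1.0 / n_a + 1.0 / n_b)
    diff = mean_a - mean_b
    return diff, se, diff / se


# Hypothetical spread of individual scores; the source reports no variance.
ASSUMED_SD = 7.0

# Hypothetical group sizes; the source reports none.
for n in (10, 50, 400):
    diff, se, z = diff_z_score(24.0, 23.0, ASSUMED_SD, n, n)
    # 1.96 is the usual two-sided 5% cutoff for a normal approximation.
    verdict = "distinguishable" if abs(z) > 1.96 else "not distinguishable"
    print(f"n={n:>3} per position: SE={se:.2f}, z={z:.2f} -> {verdict}")
```

Under these made-up inputs, a one-point gap only clears the conventional threshold when each position has hundreds of players, which is the kind of information the graphic never gives us.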

A second criticism relates to selection.  What possible mechanism would make the players “closer to the ball” “smarter” than those farther from it?  The baseline hypothesis is that linemen need superior decision-making skills: quick judgments about blitzes, better memory of the intricacies of the plays and blocking schemes, etc. But this seems hard to swallow: doesn’t the running back need those exact skills? And why does the punter, whose job seems pretty one-off, score so well?  And the operation of this idea is weird, however popular it might be: the idea seems to be that there’s an undifferentiated mass of football players in Pop Warner leagues. Some are smarter than others.  The smarter ones get pushed to the o-line and the QB position; the less smart ones are pushed to become little wide receivers. Then, what happens?  In a feat of unprecedented Lamarckian adaptation, the little o-linemen become huge o-linemen; the little wide receivers become lithe, tall, or very, very fast.

Big Mike

Or maybe the selection operates over time in a different way: dumb o-linemen, notwithstanding their physical characteristics, are selected out of the football tournament; wide receivers are encouraged to be stupid.  You might have thought football was a game about bashing the other guy, being a freakish physical specimen, and being willing to sacrifice your body and brain for the team. On this hypothesis, it isn’t: it’s a selection process for decision-making skills.  Look, I guess this is possible, but it seems quite unlikely.

So I don’t understand the selection story. And that makes me doubt whether the Wonderlic test measures anything of relevance to football, as opposed to a particular kind of test-taking skill which might result in a persistent bias in scores based on position.  Of course, I’m not the first person to make either of these observations.

Many teams no longer rely on the Wonderlic since it appears to do a bad job of measuring performance.  The teams that continue to rely on the test appear, to me, to be seeking empirical justification for some kind of gut decision making about character.

With respect to bias, start with the observation that offensive linemen are disproportionately white in a league that is mostly (70% or so) composed of African-American players.  The Wonderlic, unfortunately, produces scores that are racially skewed. As law professor Michael McCann argues in Using Social Psychology to Evaluate Race and Law in Sports:

“Research by Stanford University social psychologist Claude Steele on “stereotype threat” may help explain the discrepancy in performance [between African-American and White players]. Broadly speaking, “stereotype threat” refers to a self-fulfilling prophecy where “anything one does or any of one’s features that conform to [the stereotype] make the stereotype more plausible as a self-characterization in the eyes of others, and perhaps even in one’s own eyes.”  … When the stereotype concerns intelligence, the perceived threat of confirming the negative stereotype about one’s own group may be so strong that it diminishes performance.  Applying stereotype threat to the test-taking scenario, the general idea is that before a person takes an examination, he or she has certain preconceived notions about likely success on the test. If this person believes that he or she will do poorly on the test, perhaps because this person is a member of group which tends to do poorly on that test, then, according to Steele and others, he or she will likely feel more anxious while taking that test and, unsurprisingly, be less likely to do well.

Steele and his co-authors have found that stereotype threat negatively impacts African-Americans taking standardized tests. Stereotype threat affects other groups as well. For instance, it has been repeatedly observed when women take math and science tests. In one such study, a math test was administered to one group of women and men after they were informed that the test had produced gender-differentiated results in the past; meanwhile, the control group was told that the same test had produced no gender-based differences. The results were unsurprising: women in the informed group “significantly underperformed in relation to equally qualified men” whereas uninformed women performed statistically equally with the men. White men are also known to suffer from stereotype threat. Joshua Aronson, a psychologist at New York University, and his co-authors conducted a similar study of math and science performance, this time among Caucasian and Asian men. Aronson found that when the white male test-takers were reminded of the stereotype that Asian-Americans tend to outperform Caucasians on math and science tests, the Caucasian males subsequently performed worse than their uninformed counterparts in the next room.”

Similarly, Jason Chung argues:

“A study by David Chan et al. noted that African-Americans adults in general have a lower regard in general for aptitude tests than their Caucasian counterparts which caused them to score lower on the tests. After motivation was given to black test-takers their scores improved until there was no discernible difference between black test scores and white test scores.”

Thus, even if Wonderlic scores are measurably different by position, which I doubt, those differences may represent racially-based differences in test-taking performance, not intelligence or decision-making.  This explanation, of course, raises the question of why there are racially-based differences between positions.  (Many commentators suggest that so-called “racial stacking” may explain these differences, though that explanation suggests that selection operates in a way I continue to find implausible.)

In any event, I wanted to end this post with a few observations.  The first is that graphics can be both misleading and sticky, and you should be careful when you see a figure that confirms your priors.  Second, whatever the correlation between positions and intelligence, it’s clear that being an offensive lineman is a job that irrevocably destroys your brain.  As Malcolm Gladwell recently wrote:

Much of the attention in the football world, in the past few years, has been on concussions—on diagnosing, managing, and preventing them—and on figuring out how many concussions a player can have before he should call it quits. But a football player’s real issue isn’t simply with repetitive concussive trauma. It is, as the concussion specialist Robert Cantu argues, with repetitive subconcussive trauma. It’s not just the handful of big hits that matter. It’s lots of little hits, too.

That’s why, Cantu says, so many of the ex-players who have been given a diagnosis of C.T.E. [chronic traumatic encephalopathy] were linemen: line play lends itself to lots of little hits. The HITS data suggest that, in an average football season, a lineman could get struck in the head a thousand times, which means that a ten-year N.F.L. veteran, when you bring in his college and high-school playing days, could well have been hit in the head eighteen thousand times: that’s thousands of jarring blows that shake the brain from front to back and side to side, stretching and weakening and tearing the connections among nerve cells, and making the brain increasingly vulnerable to long-term damage. People with C.T.E., Cantu says, “aren’t necessarily people with a high, recognized concussion history. But they are individuals who collided heads on every play—repetitively doing this, year after year, under levels that were tolerable for them to continue to play.”

In sum: offensive linemen are not (necessarily) smarter than anyone else on the field (though they may be better test-takers) but they are getting less smart faster than everyone else.

(H/T: Readers CDP and Ringo)

8 Responses

  1. Sarah Lawsky says:

    Your criteria seem to imply that there could never be a good visual representation of a bad underlying study, and your post criticizes the study, not the graph. As you analyze it, this graphic is misleading only because it is based on a misleading study. In fact, based on your criteria, any “good” (eye-catching, simple, accurate, whatever) graphic of a bad study will constitute a bad graphic, because it will make clear and communicate well the underlying data.

    I believe Tufte’s suggestions and criticism are more along the lines of what Andrew Gelman regularly does on his blog ( http://www.stat.columbia.edu/~gelman/blog/ — see the posts in the category “Statistical Graphics”), in which the analysis of the graphic relies in no way on the quality of the study, but only on how the graphic communicates the study.

  2. A.J. Sutter says:

    Sarah read my mind before I had even finished reading your piece (or begun it, for that matter). I second her comment.

  3. Dave Hoffman says:

    Sarah and AJ,
    I think you’ve misread me – or I’ve been unclear.

    The graph is bad because the data are bad.

    But it’s also bad because it doesn’t convey the uncertainty that we ought to have about the data (i.e., it makes us think that we can be confident that O-Linemen have a Wonderlic of X, but we don’t know what the variance within the population is, or whether differences in the sizes of the bubbles are statistically significant). In that way, the graph makes the data (arguably) worse.

    You could have a good graph of bad data – it would make clear for the reader how uncertain and contingent the data are. And you can have a bad graph of good data – the traditional Tufte problem.

  4. Managing Board says:

    I may be misremembering Zimmerman’s argument from when I read that book about 15 years ago, but I think he says offensive linemen have to be smart because they play every single offensive down and thus have to know the team’s entire repertoire of blocking schemes. These can be quite complex and you have to know what the other linemen are doing. The backs and ends just have to know their particular plays, which are some variation on “run this way and do your thing.” So even if the offensive players are basically running scripts–and most of the time they are–the linemen have to know ten times as many scripts as the backs and ends.

    Teams could rotate offensive linemen in and out like they do backs and ends. But teams don’t do this, apparently because it’s useful to have the same five guys play as many downs together as possible. The other offensive player who plays every possible down is the quarterback, who also has a high score.

    So it’s not so much a “close to the ball” theory as one based on expected playing time and the sheer number of plays that have to be memorized and executed.

  5. Dave Hoffman says:

    Yes, I’ve heard that argument. But the Wonderlic doesn’t really measure memory; it measures (if anything) how fast you can make relatively easy decisions.

  6. Adam Benforado says:

    As Dave suggested, I think it’s important to distinguish what the Wonderlic measures and what is needed to be a successful football player. It makes intuitive sense that there are certain “thinking” positions in a game like football and that people who are better “thinkers” (measured by how well they do on timed written problem-solving tests) will do better in these positions. Unfortunately, that intuition does not stand up to scrutiny.

    Jonah Lehrer has a nice summary of the latest research in the mind sciences about game performance in his book How We Decide as well as on his blog.

    Here is an excerpt from one of Lehrer’s blog posts on the topic (http://scienceblogs.com/cortex/2008/12/in_the_latest_new_yorker.php):

    “Why is the Wonderlic so useless at predicting which quarterbacks will succeed? The reason is that finding the open man during an NFL game involves a very different set of decision-making skills than solving a problem on an IQ test. While quarterbacks need to grapple with complexity – the typical offensive playbook is several inches thick – they don’t make sense of the football field the way they make sense of questions on a multiple-choice exam. The Wonderlic measures a specific kind of thought process, but the best quarterbacks don’t think in the pocket. There isn’t time.”

    “So how do quarterbacks do it? How do they make a decision? It’s like asking a baseball player why he decided to swing the bat: the velocity of the game makes thought impossible. What recent research in neuroscience suggests is that quarterbacks choose where to throw the ball by relying on their unconscious brain. Just as a baseball player will decide to swing at a pitch for reasons he can’t explain (he is acting on subliminal cues from the hand of the pitcher), an experienced quarterback picks up defensive details he’s not even aware of. Although he doesn’t consciously perceive the lurking cornerback, or the blitzing linebacker, the quarterback’s unconscious is still able to monitor the movement of these players. And then, when he glances at his receivers, his brain automatically converts these details into a set of fast emotional signals, so that a receiver in tight coverage gets associated with a twinge of fear, while an open man triggers a burst of positive feeling. It’s these inarticulate emotions, and not some elaborate set of rational calculations, that tell the best quarterbacks when to let the ball fly. The pocket, it turns out, is too dangerous a place to think.”

  7. Dave Hoffman says:


    It strikes me that the lack of correlation between the Wonderlic and QB success doesn’t necessarily mean that QBs don’t need decision-making skills (or the bigger claim that their decisions are subconscious – a “situation sense” in the pocket). That’s a plausible interpretation, but it’s not required by the data. Rather, variance in outcomes just means that it is quite hard to predict QB success based on individual, as opposed to team and systemic, characteristics.

  8. Adam Benforado says:

    Hey Dave, I agree that the lack of a correlation alone doesn’t tell you much; I think it’s other research on the prefrontal cortex and subconscious processes that helps make Lehrer’s assertions more robust. On a somewhat different note, it’s interesting to think about whether the Wonderlic might be put to better use with respect to coaches. As a Redskins fan, I’ve wondered all season about coach Jim Zorn’s decision-making skills at critical moments . . . Could the Wonderlic have saved the team from last night’s beating at the hands of the Giants?