When Academics Attack: CELS and the Death Penalty

I’m writing from Ithaca, NY, where I’m attending the third annual Conference on Empirical Legal Studies. Much talk of heteroscedastic error, t-statistics, sampling bias, Gary King’s most recent programming innovations, and the need for peer review. And also one of the more vigorously contested academic panels I’ve ever seen.

The session title was Law and Criminal Procedure. The paper, by Hashem Dezhbakhsh and Paul H. Rubin, was From the ‘Econometrics of Capital Punishment’ to the ‘Capital Punishment’ of Econometrics: On the Use and Abuse of Sensitivity Analysis. The commentator was Justin Wolfers. If you know anything about the debate, you can imagine what followed. Dezhbakhsh and Rubin complained about (purported) errors in a paper Wolfers wrote with John Donohue, which in turn had criticized earlier D-R work finding that the death penalty deterred. Wolfers responded vigorously, though, to his credit, with much more poise than I would have, had I been so personally attacked.

The bones of contention are many, but I think they boiled down to the following key points.

1. W-D critiqued D-R and others because their models were fragile, i.e., if you removed outlying data (like executions in Texas), or changed other seemingly crucial assumptions, the significance and even direction of the predicted effects would flip. D-R’s response was, basically, so what? We know that OLS is highly sensitive: the right response is not to drop observations (like Texas) but rather to find less radical ways to deal with outlying data. Moreover, D-R pointed out that only four states’ characteristics changed the model’s effect, two of which W-D relied on, and suggested that W-D data mined to find particularly bad examples for the model. To which W-D responded that if you look at the distribution of error terms, it was D-R, not W-D, who were guilty of mining.

Verdict: very hard to know without reading the D-R paper, but it struck me as significant that D-R were willing to admit that their model was fragile to manipulation and that Texas represented such a dominant cluster of data. This isn’t the right way to think about it, but what if it were true that the death penalty deterred, but only in cultures that looked like Texas? This openness to the fragility of their claim casts significant doubt on what I saw as the very aggressive rhetorical posture advanced by D-R in their earlier work, not to mention the ways such work has been enlisted politically. But maybe others had a different view of this concession, to the extent one was made.
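To make the fragility point concrete, here is a minimal sketch with entirely invented numbers (this is not D-R’s model or data, just a toy illustration of the mechanism): one dominant, Texas-like cluster of high-execution observations determines the sign of a pooled OLS slope, which flips when the cluster is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical state-year data. One dominant "Texas-like" cluster has many
# executions and low homicide rates; the other 49 states have few executions
# and a weak relationship running the other way. All numbers are invented.
tx_exec = rng.uniform(15, 40, size=25)                  # high execution counts
tx_hom = 7 - 0.08 * tx_exec + rng.normal(0, 0.5, 25)    # deterrence-shaped slope

other_exec = rng.uniform(0, 3, size=25 * 49)            # near-zero executions
other_hom = 6 + 0.3 * other_exec + rng.normal(0, 1.5, other_exec.size)

def ols_slope(x, y):
    """Slope on x from a one-regressor OLS with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

x_all = np.concatenate([tx_exec, other_exec])
y_all = np.concatenate([tx_hom, other_hom])

print(f"slope, all states:    {ols_slope(x_all, y_all):+.3f}")           # negative
print(f"slope, Texas dropped: {ols_slope(other_exec, other_hom):+.3f}")  # positive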

2. D-R asserted that W-D had been able to replicate their findings, to which Wolfers conceded that, under a very cramped definition of replication (using D-R’s data and D-R’s script on Wolfers’ computer), he had indeed replicated the findings. But he had been unable to do so more broadly.

Verdict: I’m with Wolfers. Replication shouldn’t mean just re-running a .do file. On the other hand, later in the day Lee Epstein offered a nice speech about Exxon’s infamous footnote 17, in which she suggested that replication might mean nothing more than the ability to re-create work using the original author’s precise methods, and so perhaps D-R’s view dominates outside of the legal academy.

3. Wolfers asserted that D-R had used the same instrumental variables in multiple studies. Thus, for example, in one paper they assumed that Republican vote share influenced homicide rates only through its effect on gun-carry laws; in another, they assumed that vote share influenced homicide only through its effect on capital execution rates. (I think I have these relationships right; if I don’t, forgive me, it’s been a long day.)

Verdict: I don’t think that D-R had a complete response to this critique, apart from saying that it was common practice. At this point, if not earlier, the discussion became notably personal. D-R accused Wolfers of concealing W-D’s findings, of not submitting W-D’s work to peer review, of data mining, of manipulating findings, and a host of other sins. Wolfers rejoined that it was D-R who had not made data available (and produced some emails to that effect), and, much more significantly, that D-R had offered no response on the merits to the central critiques of the Stanford Law Review piece. In Q&A afterward, a criminologist said something like “this kind of dispute about methods makes me think that economists are full of nonsense, since it replicates a pattern we see often: strong claims, followed by methodological sniping, followed by animus and a retreat to theory.” D-R responded by arguing for more education of consumers of empirical work, so that they could tell good work from bad; Wolfers said that consensus did exist, if you asked a wide sample of econometricians.
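Wolfers’ instrument objection can also be put in toy form. In the sketch below (again, wholly invented coefficients, not anyone’s actual model or data), vote share truly affects homicide through both gun laws and executions; a study that instruments either channel alone with vote share then recovers a biased effect, because the omitted channel violates the exclusion restriction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical data-generating process (all coefficients invented): vote
# share z drives BOTH gun-carry laws and execution rates, and each channel
# has its own true effect on the homicide rate.
z = rng.uniform(0, 1, n)                       # instrument: vote share
guns = 0.8 * z + rng.normal(0, 0.1, n)         # endogenous channel 1
execs = 0.5 * z + rng.normal(0, 0.1, n)        # endogenous channel 2
hom = 10 - 1.0 * guns - 2.0 * execs + rng.normal(0, 0.5, n)

def iv_2sls(y, x, z):
    """Two-stage least squares: one regressor, one instrument, intercepts."""
    Z = np.column_stack([np.ones(len(z)), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first stage: fit x on z
    X = np.column_stack([np.ones(len(z)), x_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]     # second stage: y on fitted x

# "Study A" assumes z moves homicide only through gun laws: biased, because
# z also works through the omitted execution channel.
print(f"IV effect of gun laws:   {iv_2sls(hom, guns, z):+.2f}  (true -1.0)")
# "Study B" assumes z moves homicide only through executions: same problem.
print(f"IV effect of executions: {iv_2sls(hom, execs, z):+.2f}  (true -2.0)")
```

In this setup both hypothetical “studies” overstate their favored effect, since each inherits the channel the other assumed away.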

I know that the above sounds pretty technical and dry, but it wasn’t in person. It was like watching a very elegant car wreck, or your parents fighting over the taxes. Technical jargon, buried normative moves, and emotion, all knotted together.

5 Responses

  1. Matt says:

    On “replication”: my understanding is that in the physical sciences people rarely try to “replicate” results in the way that D-R seem to be using the term above, unless they think there is something clearly funny about the results (either clear book-cooking or a clear mistake). Rather, they perform similar experiments while changing some aspects to see if the results are robust. This is much closer to Wolfers’s approach, from what’s described here. (I haven’t read the papers and am just going on the description.) Michael Weisberg, a philosopher of science at Penn, has a very good paper on robustness for those who are interested: “Robustness Analysis,” in _Philosophy of Science_, vol. 73, pp. 730-742.

  2. Alex says:


    A general question: could you suggest any good introductory books on empirical methods (in particular, empirical methods used in the social sciences, such as those employed by the studies you discuss in this post)?


  3. Hadar says:

    Being there actually gave me a flashback to the days when I was following the debates in Science and Nature about the sexual orientation of fruit flies. Biological geneticists were quibbling and attacking each other over methodological issues, and I kept thinking that, while the debate seemed genuinely about science, it was also perceived as something that would have bearing on a much more significant ideological debate (the innateness of sexual orientation in humans). Elegant car wreck, indeed.

  4. Don Braman says:

    Of course, this is precisely what cultural cognition theory predicts will happen when advocates retreat from explicitly partisan normative justifications to ostensibly neutral empirical arguments. It’s far easier to agree to disagree on foundational worldviews (“we value different things”); when you make a factual claim, the implication is that saying otherwise is a challenge to the truth (“I’m honest; you’re lying!”). And, because worldviews shape factual evaluations, what was once an argument about diverse value preferences is replicated in more heated terms in the empirical debate. See, e.g., empirical debates over guns, abortion, the death penalty, global warming, nuclear power, gay parenting, etc.

  5. David Kaye says:

    Alex asked about an introductory text on empirical methods. He could take a look at Prove It with Figures: Empirical Methods in Law and Litigation, by Hans Zeisel and me. I am biased of course, but this book uses a minimum of math and has received generally good reviews.

    In some fields spurious findings are commonplace because of the sheer number of mindless studies or variables studied. Replication therefore has become de rigueur in genome-wide association studies.

    A broader notion of replication is “triangulation,” described by Zeisel. It is particularly useful in trying to discern causation from observational studies.