Fair Criticism: A Response to Rebecca Sandefur, Andrew Martin, Michael Heise, and Ted Eisenberg

We very much appreciate the time Rebecca Sandefur, Andrew Martin, Michael Heise, and Ted Eisenberg have taken to comment on our paper.  We are particularly excited by comments from these authors because for each, we have read and admired his/her work in the past.

We believe that much of the criticism expressed in these comments is well-taken, and we will react accordingly.

First, both Andrew and Ted highlight the importance of power calculations.  We agree that these are important.  We have been doing some of these prior to this post, and our efforts on this front are continuing.  A couple of points:  First, for the Bayesian double-regression results, the concept of power calculations is somewhat challenging, at least for us.  Because the Bayesian posterior predictive distribution provides direct probabilities, quantiles, etc. on the treatment effect, the concepts of rejection region, Type II error, and the like are not ones we have encountered before in the Bayesian context, and we are not sure they fit coherently within a Bayesian framework.  As we say in the paper, we are willing to stake the claims we make on the Bayesian results.  We are doing some research on this issue.  If these authors or any other readers of Concurring Opinions have references or suggestions for us on this point, we would be delighted to receive them.

Second, with respect to the Fisher test and interval-based estimates we provide in the paper, power is a concern.  Part of the problem here is how to do a power calculation consistent with our weighting system and the high control group win rate we unexpectedly observed in our dataset.  We have some initial simulations completed and are running more.  We have started with the idea of taking the control group’s win rate of .72 as a baseline and varying the treated group’s win rate to such values as X = .92, .87, .82, and .77.  With that in mind, we can ask two questions:  (i) If X is equal to some specified value, how likely is it that we would get a Fisher p-value of .26 (what we observed) or higher?  (ii) If X is equal to some specified value, how likely is it that we would get a p-value of .05 or lower?  We think both questions are interesting.  The second is a more traditional power calculation.  The first uses additional information from the data, specifically the fact that the p-value from the Fisher test, .26, was not close to .05.  We would very much appreciate comment on these approaches.
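To make questions (i) and (ii) concrete, here is a minimal sketch in Python of how such a calculation might be set up.  It is not the code behind our runs, and it rests on several assumptions:  it ignores our weighting system entirely, the group sizes (70 per arm) are hypothetical placeholders rather than the figures from our dataset, and it replaces simulation with exact enumeration over every possible pair of win counts.  The function names are ours for illustration only.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of each table sharing the observed margins.
    probs = {k: comb(row1, k) * comb(row2, col1 - k) / denom
             for k in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum over tables no more likely than the observed one (small tolerance for ties).
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n independent cases with win rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def fisher_outcome_probs(n_treated, n_control, p_treated, p_control,
                         alpha=0.05, observed_p=0.26):
    """Return (P[p-value <= alpha], P[p-value >= observed_p]).

    Exact enumeration over all win counts; unweighted (an assumption).
    """
    control_pmf = [binom_pmf(w, n_control, p_control)
                   for w in range(n_control + 1)]
    power = 0.0         # question (ii): the traditional power calculation
    at_least_obs = 0.0  # question (i): a p-value as large as the one observed
    for wt in range(n_treated + 1):
        pt = binom_pmf(wt, n_treated, p_treated)
        for wc, pc in enumerate(control_pmf):
            pval = fisher_two_sided(wt, n_treated - wt, wc, n_control - wc)
            if pval <= alpha:
                power += pt * pc
            if pval >= observed_p:
                at_least_obs += pt * pc
    return power, at_least_obs


if __name__ == "__main__":
    N_PER_ARM = 70  # hypothetical group size, NOT taken from the paper
    for x in (0.92, 0.87, 0.82, 0.77):
        power, at_least_obs = fisher_outcome_probs(N_PER_ARM, N_PER_ARM, x, 0.72)
        print(f"X={x:.2f}  P(p<=.05)={power:.3f}  P(p>=.26)={at_least_obs:.3f}")
```

Enumeration gives exact answers for the unweighted case, which makes it a useful check on simulation output; a calculation that respects the weights would presumably still have to be done by simulation.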

Our initial runs (we’re still checking on these) suggest that there is a reasonable amount of power to detect increases in the win probability of .15 or higher at the alpha = .05 level, but that power degrades when the increase drops below .10.  Thus, our preliminary simulations are currently showing more power than is indicated in Ted’s post, although the general thrust is that power degrades below a .10 increase. We are continuing analysis on this subject as well as checking our current runs.

Meanwhile, the same simulation runs are unsurprisingly suggesting that it is less likely that we would have gotten a Fisher p-value of .26 or higher had the HLAB offer caused a .1 or greater increase in the probability of victory.

With respect to Michael’s point about our statement in the draft paper that the effect of actual use is the “less interesting question,” we agree.  This is an overstatement, and we have removed it in a draft awaiting SSRN approval.  We continue to believe that the effect of offers is interesting, but the fact is that the effects of both offers and actual use are interesting.  Our explanation of why offers are interesting no doubt smacks of “The lady doth protest too much” for persons who consume empirical scholarship.  Our hope here was to introduce this concept (the intention-to-treat effect versus an as-treated analysis) to persons who had not encountered it before, meaning persons outside the empirical legal community, and we felt (and continue to feel) that this required a more extensive explanation.

Which brings us to Becky’s point.  She is absolutely correct about both our departure from the traditional lit-review-first model of writing and the reason for that departure (i.e., to make the paper accessible to a wide audience).  We do not entirely agree that we address a different question from the literature we criticize.  Both the literature and our paper attempt to address the causal effect of actual use of representation.  The difference is that we admit that we cannot say much useful about the effect of actual use with our data on win/loss and explain why, while the literature we criticize (meaning the publications other than Seron et al. and Stapleton & Teitelbaum) does not acknowledge the limits in the corresponding data.  Note that we were able to find an actual use effect in our data:  a delay probably larger than the delay due to the HLAB offer.

By way of further explanation on this point, we have read and admired Becky’s attempt to tease out some useful information from this past observational-study literature based on bounds.  We wonder, however, whether bounds can respond to the problem we explored in Section II.A.1, the failure of these studies to specify when a legal intervention would be assigned.  This issue goes not to modeling but to whether the quantitative analyst has the right dataset for the question being studied, and it is not clear to us how bounding can address the possibility of having the wrong dataset for the question of interest.  Again, we would welcome any further thoughts on this point.

Our great thanks to Ted, Becky, Andrew, and Michael for taking the time to write.

Jim and Cassandra