What Difference Representation? Randomization, Power, and Replication
I’d like to thank Dave and Jaya for inviting me to participate in this symposium, and I’d also like to thank Jim and Cassandra (hereafter “the authors”) for their terrific paper.
This paper exhibits all the features of good empirical work. It’s motivated by an important substantive question with policy implications. The authors use a precise research design to answer the question: to what extent does an offer of representation affect outcomes? The statistical analysis is careful and concise, and the conclusions drawn from the study are appropriately caveated. Indeed, this law review article might just be the one with the most caveats ever published! I’m interested to hear from the critics, and to join the dialogue about the explanation of the findings and the implications for legal services work. In my initial comments about the paper, I’ll make three observations about the study.
First, randomization is key to successful program evaluation. Randomization guards against all sorts of confounders, including those that are impossible to anticipate ex ante or control for ex post. This is the real strength of this study. A corollary is that observational program evaluation studies can rarely be trusted. Indeed, even with very fancy statistics, estimating causal effects with observational data is really difficult. It’s important to note that different research questions will require different randomizations.
Second, the core empirical result with regard to litigation success is that there is not a statistically significant difference between those offered representation by HLAB and those who were not. The authors write: “[a]t a minimum, any effect due to the HLAB offer is likely to be small” (p. 29). I’d like to know how small. Here’s why. It’s always hard to know what to make of null findings. Anytime an effect is “statistically insignificant,” one of two things is true: there really isn’t a difference between the treatment and control groups, or the difference is too small to be detected with the statistical model employed. Given the sample size and win rates around 70%, how small a difference would the Fisher exact test be able to detect? We might not all agree on what makes a “big” or “small” difference, but some additional power analysis would tell us a lot about what these tools could possibly detect.
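To illustrate the kind of power analysis I have in mind, here is a minimal sketch. The sample size of 100 per arm is a hypothetical placeholder (not the study’s actual numbers), and it uses the standard two-proportion normal approximation as a rough stand-in for exact Fisher-test power:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_prop(n1, n2, p1, p2, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions
    (normal approximation; a rough stand-in for Fisher's exact test)."""
    z_alpha = 1.959964  # two-sided 5% critical value
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = abs(p1 - p2) / se
    return norm_cdf(z - z_alpha)

# Hypothetical example: ~100 cases per arm, control win rate of 70%.
# Sweep candidate differences to see what is plausibly detectable.
for delta in (0.05, 0.10, 0.15, 0.20):
    p = power_two_prop(100, 100, 0.70, 0.70 + delta)
    print(f"difference {delta:.2f}: approximate power {p:.2f}")
```

Under these illustrative assumptions, only differences on the order of 15–20 percentage points are detectable with conventional 80% power; smaller true effects could easily produce a null result.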
Finally, if we truly care about legal services and the efficacy of legal representation, this study needs to be replicated in other courts, other areas of law, and with different legal aid organizations. Only rigorous program evaluation of this type can allow us to answer the core research question. Of course, the core research question isn’t the only thing of interest. The authors spend a lot of time discussing different explanations for the findings. Determining which of these explanations is correct will go a long way in guiding the practical take-away from the study. Sorting out those explanations will require additional, and different, studies. I spend a lot of time writing on judicial decisionmaking. My money is on the idea that ALJs behave differently with pro se parties in front of them. But this study doesn’t allow us to determine which explanation for the core findings is correct. That doesn’t diminish the importance or quality of the work; it’s a known (and disclosed) limitation that points us to the next set of studies to undertake.