Category: Empirical Analysis of Law


Law’s Arbitrary Endpoints

For many purposes, a season is an arbitrary endpoint to measure a baseball player’s success.  To extract utility from performance data over time, you need to pick endpoints that make sense in light of what you are measuring.  Thus, if we want to know how much to discount a batter’s achievements by luck, it might not make sense to look seasonally, because there’s no good reason to expect that luck is packaged in April-to-October chunks.  Nonetheless, sabermetricians commonly do talk about BABIP seasonally: thus, Aaron Rowand had an “unusually lucky 2007,” and has since regressed himself off of a major league payroll.  Jayson Werth, similarly, is feeling the bite of lady luck this “season.”  (For pitchers, the analysis makes more sense, since the point of BABIP is that pitchers can’t control outcomes once the ball hits a bat.  Thus, the Phillies’ fifth starter is supposedly not nearly as good as his haircut suggests he ought to be.)

This bias toward artificial endpoints affects legal studies, though less obviously.  There aren’t legal seasons.  (It’s always a time to weep, to bill, to work, to reap.)  But we still organize our analyses around units which might not exactly track the underlying item of interest.  We want to study disputes, but we look at the records of filing and verdicts (which are a smaller unit in time than the object of study).  We wish to examine ideological voting patterns on the Court, but we organize our study by Term.  We want clear signals of young lawyer quality, but we look at grades in law school for (mostly) the first three semesters.  We want to know how law schools influence hiring practices, but we look at deadline-generated 9-month hiring reports.  Different slices at these numbers may produce quite different results.  Heck, one of the reasons that USNews obtains variable rankings is that they keep on moving the endpoints of the analysis in ways that are perfectly unclear.

There’s no complete solution to the endpoint problem – at least, not one that’s easily compatible with the project of data-driven legal analysis.  It’s important, therefore, to be especially careful when reading studies that take advantage of convenient legal periods.  A prime example is the Supreme Court’s “Term.”  I have no good reason to expect that the Justices’ behavior changes meaningfully from one Term to another — absent an intervening change in personnel.  So, Term analysis is convenient, but I bet it misleads.  Comparing the performance of a Circuit from one Term to another is similarly odd — whatever the value of the “reversal rate” inquiry, it surely doesn’t turn on Terms!

This set of cautions might be extended to a more general one, directed at folks who are interested in doing empirical work but haven’t yet begun to collect data. If your outcome of interest is measured monthly, seasonally or yearly, consider whether that unit of measurement reflects something true about the data, or is merely a convenience.  If it’s the latter, proceed with caution.  Obviously, this isn’t at all a novel caution, but the persistence of the error suggests it can’t be made often enough.
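To make the caution concrete, here is a small sketch in Python. The monthly counts are invented purely for illustration (they come from no real docket or dataset), and `window_totals` is a hypothetical helper, not anyone's published method. It sums the same series in twelve-month windows that begin at two different points.

```python
def window_totals(series, width, offset=0):
    """Sum consecutive, complete windows of `width` observations starting at `offset`."""
    return [
        sum(series[start:start + width])
        for start in range(offset, len(series) - width + 1, width)
    ]


# Invented monthly counts: a steady 2 per month, with a six-month burst of 8
# that happens to straddle the boundary between the first two calendar years.
monthly = [2] * 9 + [8] * 6 + [2] * 21

calendar_bins = window_totals(monthly, width=12, offset=0)  # January-to-December style
term_bins = window_totals(monthly, width=12, offset=9)      # October-to-September style

print("calendar-year totals:", calendar_bins)
print("term-style totals:   ", term_bins)
```

With calendar-year bins the burst is split evenly across two periods and looks like two unremarkable years followed by a decline. With bins that start in the tenth month, the same burst lands in one period and looks like a spike. Nothing about the underlying events changed, only the endpoints.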

Assessing Medicaid Managed Care

The Washington Post has featured two interesting pieces recently on Medicaid managed care. Christopher Weaver reported on a battle between providers and insurers in Texas. Noting that “federal health law calls for a huge expansion of the Medicaid program in 2014,” Weaver shows how eager insurers are to enroll poor individuals in their plans. Each enrollee would “yield on average $7 a month profit,” according to recent calculations. Cost-cutting legislators see potential fiscal gains, too, once the market starts working its magic.

There’s only one problem with those projections: it turns out that “moving Medicaid recipients into managed care ‘did not lead to lower Medicaid spending during the 1991 to 2003 period,'” according to a report published by the National Bureau of Economic Research this month. Sarah Kliff is surprised to find that this is “the first national look at whether Medicaid managed care has actually done a key thing that states want it to do.”


The Future of Empirical Legal Studies

Kenesaw Mountain Landis would have hated both sabermetrics and ELS.

Reading these two articles on the problems of complexity for sabermetrics, I wondered if the empirical legal studies community is coming soon to a similar point of crisis. The basic concern is that sabermetricians are devoting oodles of time to ever-more-complex formulae which add only a small amount of predictive power, but which make the discipline more remote from lay understanding, and thus less practically useful.  Basically: the jargonification of a field.  Substituting “law” into Graham MacAree’s article on the failings of sabermetrics, we get the following dire warning:

“Proper [empirical legal analysis] is something that has to come from the top down ([law]-driven) rather than the bottom up (mathematics/data driven), and to lose sight of that causes a whole host of issues that are plaguing the field at present. Every single formula must be explainable without recourse to using ridiculous numbers. Every analyst must be open to thinking about the [law] in new ways. Every number, every graph in a [ELS] piece must tell a [legal] story*, because otherwise we’re no longer writing about the [legal system] but indulging in blind number-crunching for its own sake. …

Surveying the field, I no longer believe that those essential precepts hold sway over the [ELS] community. Data analysis methods are being misapplied and sold to readers as the next big thing. Articles are being written for the sake of sharing irrelevant changes in irrelevant metrics. Certain personalities are so revered that their word is taken as gospel when fighting dogma was what brought them the respect they’re now given in the first place. [ELS] is in a sorry state.

How do we fix it? Well, the answer seems simple. [ELS] shouldn’t be so incomprehensible so as not to call up the smell of [a courtroom, or the careful drafting of the definition clauses in a contract, or the delicate tradeoffs involved in family court practice, or the importance of situation sense]. Statistics shouldn’t be sterile and clean and shiny and soulless. They shouldn’t just be about [Law]; they should invoke it. Otherwise, they run the risk of losing the language which makes them so special.”

Note: this is an entirely different point than Leiter’s odd 2010 critique that ELS work was largely mediocre.  The problem, rather, is that the trend is toward a focus on more complex and “accurate” models, often without the input of people with legal training, and with insufficient attention to how such models will be explained to lawyers, judges and legal policymakers.  (See also all of Lee Epstein’s work.)


Assessing Twiqbal

Several months ago, the FJC put out a well-publicized study assessing the effects of Twombly and Iqbal on motions practice.  It concluded that there was little reason, overall, for concern that the Supreme Court’s new pleadings jurisprudence had worked a revolutionary change down below.  Lonny Hoffman (Houston) has just released an important new paper which questions the methods and conclusions of the FJC’s work.  He pulls no punches:

“This paper provides the first comprehensive assessment of the Federal Judicial Center’s long-anticipated study of motions to dismiss for failure to state a claim after Iqbal v. Ashcroft. Three primary assessments are made of the FJC’s study. First, there are reasons to be concerned that the study may be providing an incomplete picture of actual Rule 12(b)(6) activity. Even if the failure to capture all relevant motion activity was a non-biased error, the inclusiveness problem is consequential. Because the study was designed to compare over time the filing and grant rate of Rule 12(b)(6) motions, the size of the effect of the Court’s cases turns on the amount of activity found. Second, even if concerns are set aside that the collected data may be incomplete, it misreads the FJC’s findings to conclude that the Court’s decisions are having no effect on dismissal practice. The FJC found that after Iqbal, a plaintiff is twice as likely to face a motion to dismiss. This sizeable increase in rate of Rule 12(b)(6) motion activity represents a marked departure from the steady filing rate observed over the last several decades and means, among other consequences, added costs for plaintiffs who have to defend more frequently against these motions. The data regarding orders resolving dismissal motions even more dramatically shows the consequential impacts of the Court’s cases. There were more orders granting dismissal with and without leave to amend, and for every case category examined. Moreover, the data show that after Iqbal it was much more likely that a motion to dismiss would be granted with leave to amend (as compared to being denied) both overall and in the three largest case categories examined (Civil Rights, Financial Instruments and Other). Employment Discrimination, Contract and Torts all show a trend of increasing grant rates. In sum, in every case type studied there was a higher likelihood after Iqbal that a motion to dismiss would be granted.
Third, because of inherent limitations in doing empirical work of this nature, the cases may be having effects that the FJC researchers were unable to detect. Comparing how many motions were filed and granted pre-Twombly to post-Iqbal cannot tell us whether the Court’s cases are deterring some claims from being brought, whether they have increased dismissals of complaints on factual sufficiency grounds, or how many meritorious cases have been dismissed as a result of the Court’s stricter pleading filter. Ultimately, perhaps the most important lesson to take away from this last assessment of the FJC’s report is that empirical study cannot resolve all of the policy questions that Twombly and Iqbal raise.”

I should disclose that I provided Lonny comments on an earlier draft, and overall I think he’s done an incredible (and generally very fair) job.  One thing to think about, as always when evaluating litigation data, is the degree to which we would expect to see any results at all given case selection effects.  That Lonny does observe such substantively significant changes notwithstanding selection tells us something about how dramatic the Twiqbal decisions really were.


Beneath the Lamp Post

Though many bemoan the expense and terrible functionality of PACER, the federal government’s electronic docketing system, it is vastly superior to existing state alternatives.  While some states have decent, and searchable, e-dockets, others do not, and it’s often quite hard to figure out the scope of the state databases.  The result is that a researcher (or a lawyer) who wants to study live dockets at the state level is faced with a host of known unknowns, making aggregate statistical inference basically impossible.  Even descriptive statistics about state courts are hard to verify.  It’s a black hole. (With some illumination provided by the BJS and other bodies.)

This frustrates me, and if I could wave a magic wand (or controlled Google) I would create a national e-docketing system for all state filings, permitting full-text searches across states for comprehensive data – including searches of motions and orders – in both civil and criminal litigation.  The current state of the world, by contrast, directs much of the new empirical legal research to focus on federal cases and federal outcomes, because PACER provides access to the kinds of data that researchers need.  The problem, of course, is that PACER collects only federal dockets, which aren’t representative of the kind or scope of litigation nationwide. Though of course studying dockets is vastly superior to studying opinions – if you want to know what judges are doing – we’re left still peering through a dark piece of glass.  Worse, I think, is that researchers end up focusing their energies on topics for which federal litigation is the dominant way of resolving legal claims.  Thus, there’s much more, and much better, docket-centered empirical work about securities law and federal civil rights statutes than there is about common law adjudication.

Our sadly patchwork court records system  doesn’t just hurt academics looking to illuminate doctrinal puzzles.  (The horror! Tenured professors can’t write more papers!)  It also means that lawyers and corporate officers may be forced to rely on anecdote and salience when deciding how to engage with the litigation system — a calculation that may lead such repeat players to develop a long-term strategy to exit the litigation system altogether.  If the state courts want to preserve their business, they need to innovate.  One way to do so would be to join forces in data collection, archival, and search.



Randomization Uber Alles?

Jim and Cassandra write:

“To Dave, we say that our enthusiasm for randomized studies is high, but perhaps not high enough to consider a duty to randomize among law school clinics or among legal services providers.  We provided an example in the paper of practice in which randomization was inappropriate because collecting outcomes might have exposed study subjects to deportation proceedings.  We also highlighted in the paper that in the case of a practice (including possibly a law school clinic) that focuses principally on systemic change, randomization of that practice is not constructive.  Instead, what should be done is a series of randomized studies of an alternative service provider’s practice in that same adjudicatory system; these alternative provider studies can help to assess whether the first provider’s efforts at systemic change have been successful.”

I meant to cabin my argument to law school clinics.  And I do understand that there may be very rare cases where collecting outcomes will hurt clients (such as deportation).  But what about a clinic that focuses on “systemic change”? Let’s assume that subsidizing such a clinic would be a good thing for a law school to do (or, to put it another way, we think it is a good idea for current law students to incur more debt so that society gets the benefit of the clinics’ social agitation).  Obviously, randomization of client outcomes would be a terrible fit for measuring the success of such a clinic.  It would be precisely the kind of lamppost/data problem that Brian Leiter thinks characterizes much empirical work.

But that doesn’t mean that randomization couldn’t be useful in measuring other kinds of clinic outcomes.  What about randomization in the allocation of law student “employees” to the clinic as a way to measure student satisfaction with the “learning outcomes”? Or randomization of intake and the use of different client contact techniques as a way of measuring client satisfaction with their representation (or feelings about the legitimacy of the system)?  One thing that the commentators in this symposium have tried to emphasize is that winning & losing aren’t the only outputs of the market for indigent legal services.  Controlled study of the actors in the system needn’t be constrained in the way that Jim and Cassandra’s reply to my modest proposal to mandate randomization suggests.


Randomization, Intake Systems, and Triage

Thanks to Jim and Cassandra for their carefully constructed study of the impact of an offer from the Harvard Legal Aid Bureau for representation before the Massachusetts Division of Unemployment Assistance, and to all of the participants in the symposium for their thoughtful contributions.  What Difference Representation? continues to provoke much thought, and as others have noted, will have a great impact on the access to justice debate.  I’d like to focus on the last question posed in the paper — where do we go from here? — and tie this in with questions about triage raised by Richard Zorza and questions about intake processes raised by Margaret Monsell.   The discussion below is informed by my experience as a legal service provider in the asylum system, a legal arena that the authors note is  strikingly different from the unemployment benefits appeals process described in the article.

My first point is that intake processes vary significantly between different service providers offering representation in similar and different areas of the law.  In my experience selecting cases for the asylum clinics at Georgetown and Yale, for example, we declined only cases that were frivolous, and at least some intake folks (yours truly included) preferred to select the more difficult cases, believing that high-quality student representation could make the most difference in these cases.  Surely other legal services providers select for the cases that are most likely to win, under different theories about the most effective use of resources.  WDR does not discuss which approach HLAB takes in normal practice (that is, outside the randomization study).  On page twenty, the study states that information on financial eligibility and “certain additional facts regarding the caller and the case”  are put to the vote of HLAB’s intake committee.  On what grounds does this committee vote to accept or reject a case?  In other words, does HLAB normally seek the hard cases, the more straightforward cases, some combination, or does it not take the merits into account at all?



How Much Enthusiasm for Randomized Trials? A Response to Kevin Quinn and David Hoffman

We thank Kevin Quinn and David Hoffman for taking the time to comment on our paper.  Again, these are two authors whose work we have read and admired in the past.

Both Dave and Kevin offer thoughts about the level of enthusiasm legal empiricists, legal services providers, and clinicians should have for randomized studies.  We find ourselves in much but not total agreement with both.  To Kevin, we suggest that there is more at stake than just finding out whether legal assistance helps potential clients.  In an era of scarce legal resources, providers and funders have to make allocation decisions across legal practice areas (i.e., should we fund representation for SSI/SSDI appeals or for unemployment appeals or for summary eviction defense).  That requires more precise knowledge about how large representation (offer or actual use) effects are, how much bang for the buck.  Perhaps even more importantly, scarcity requires that we learn how to triage well; see Richard Zorza’s posts here and the numerous entries in his own blog on this subject.  That means studying the effects of limited interventions.  Randomized trials provide critical information on these questions, even if one agrees (as we do) that in some settings, asking whether representation (offer or actual use) helps clients is like asking whether parachutes are useful.

Thus, perhaps the parachute analogy is inapt, or better, it requires clarification:  we are in a world in which not all who could benefit from full-service parachutes can receive them.  Some will have to be provided with rickety parachutes, and some with little more than large blankets.  We all should try to change this situation as much as possible (thus the fervent hope we expressed in the paper that funding for legal services be increased).  But the oversubscription problem is simply enormous.  When there isn’t enough to go around, we need to know what we need to know to allocate well.  Meanwhile, randomized studies can also provide critical information on the pro se accessibility of an adjudicatory system, which can lay the groundwork for reform.

To Dave, we say that our enthusiasm for randomized studies is high, but perhaps not high enough to consider a duty to randomize among law school clinics or among legal services providers.  We provided an example in the paper of practice in which randomization was inappropriate because collecting outcomes might have exposed study subjects to deportation proceedings.  We also highlighted in the paper that in the case of a practice (including possibly a law school clinic) that focuses principally on systemic change, randomization of that practice is not constructive.  Instead, what should be done is a series of randomized studies of an alternative service provider’s practice in that same adjudicatory system; these alternative provider studies can help to assess whether the first provider’s efforts at systemic change have been successful.

Our great thanks to both Kevin and Dave for writing, and (obviously) to Dave (and Jaya) for organizing this symposium.


What Difference Representation: Case Selection and Professional Responsibility

Thanks for the invitation to participate in this interesting and provocative symposium.

I’m a legal services attorney in Boston. My employer, Massachusetts Law Reform Institute (MLRI), has as one of its primary tasks to connect the state’s field programs, where individual client representation occurs, with larger political bodies, including legislatures and administrative agencies, where the systemic changes affecting our clients most often take place. (The legal services programs in many states include organizations comparable to MLRI; we are sometimes known by the somewhat infelicitous name “backup centers.”) Among the programs with which MLRI is in communication is the Harvard Legal Aid Bureau, and I would take this moment to acknowledge the high regard in which my colleagues and I hold their work.

The substantive area of my work is employment law. It is no surprise that during the past three years of our country’s Great Recession, the importance of the unemployment insurance system for our clients has increased enormously and, consequently, it has occupied a greater portion of my time than might otherwise have been the case.

I’m not a statistician nor do I work in a field program representing individual clients, so my comments will not address in any detail the validity of the HLAB study or the conclusions that may properly be drawn from it. As one member of the community of Massachusetts legal services attorneys, however, I have an obvious interest in the way the study portrays us: we are variously described as self-protective, emotional, distrustful of being evaluated, and reluctant to the point of perverseness in participating in randomized studies of the kind the authors wish to conduct. Our resistance in this regard has itself already been the subject of comment here. Happily, it is not often that one looks into what seems to be a mirror and sees the personage looking back wearing a black hat and a snarl. But when it does happen, it’s hard to look away without some effort at clarification. So I will devote my contribution to the symposium to the topic of the perceived reluctance of the legal services community to cooperate in randomized trials. It goes without saying, but the following thoughts are those of only one member of a larger community.

My understanding is that in the HLAB study, no significant case evaluation occurred prior to randomization. Many of us in legal services view with trepidation the idea of ceding control over case selection to the randomization process. Others have more sanguine views, either because they assume that randomization is already taking place or that it ought to be. For example, in his comments from a few months ago, Dave Hoffman was working under the assumption that to randomize client selection would not change an agency’s representation practices at all, and on that basis, he criticized resistance to randomized control trials as “trying to prevent research from happening.”

The authors of the study are enthusiastic about randomization not only because of its scientific value in statistical research but also because it can help to solve one of the thorniest problems facing legal services programs – the scarcity of resources as compared to the demand. As long as the demand for legal assistance outstrips the supply, Professor Greiner has said, randomization – a roll of the dice or the flip of a coin — is an easy and appropriate way to decide who gets representation and who does not.

I believe it’s erroneous to assume that randomization would not change representation practices, at least in the area of legal services in which I work. I also acknowledge that it is possible, at least theoretically, for all the cases in a randomized control trial to have met the provider’s standards for representation. This would provide some measure of reassurance. However, in one area of law, immigration asylum cases, the authors have concluded that time constraints make such an effort unworkable.



What Difference Representation: Inconclusive Evidence

Congratulations to the authors on an excellent study that promotes and explores the importance of random assignment.

My comment supports the article’s emphasis on caution and not overgeneralizing. My focus is on the article’s Question 2: Did an offer of HLAB representation increase the probability that the claimant would prevail? My analysis of the simple frequencies (I have not delved into the regressions and ignore weights) suggests that HLAB attorneys should view the results as modest, but inconclusive, evidence that an offer of representation improves outcomes.

Based on Table 1, page 24, there are 129 No offer observations and 78 Offer observations. Ignoring weights, which I think are said not to make a huge difference, page 26 reports that .76 of claimants who received an offer prevailed in their first level appeals, and that .72 of claimants who did not receive an offer prevailed in their first level appeal.

So, those who were offered representation fared better; by one measure, they did .04/.72 x 100, or 5.6%, better. Given the high background (no offer condition) rate of prevailing, the maximum possible improvement (to a 1.00 success rate) is .28/.72 x 100, or 38.9%.  Another measure could be the proportionate reduction in defeat. The no offer group was “defeated” 28% of the time. The offer group was defeated 24% of the time.  The reduction in defeat is .04/.28 x 100, or 14.3%. This measure has the sometimes attractive feature that it can range from 0% to 100%. So by this measure the offeree group did 14% better than the non-offeree group, a modest improvement for the offer condition.

A concern expressed in the paper is that the result is not statistically significant. This raises the question: given the sample size, how likely was it that a statistically significant effect would be detected? Assessing this requires hypothesizing what size effect of an offer would be of societal interest.  Suppose we say that lawyers should do about 10% better and move the win rate from .72 for non offerees to .80 for offerees.  This is an 11.1% improvement by the first measure and a 28.6% improvement by the second measure.  Both strike me as socially meaningful but others might specify different numbers.
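For readers who want to check the arithmetic, the two measures can be written out as a short Python sketch. The function names are mine, chosen for illustration; the rates are the unweighted ones quoted above (.72 and .76) plus the hypothetical .80 target.

```python
def relative_improvement(base_rate: float, new_rate: float) -> float:
    """Percent gain in the win rate, relative to the baseline win rate."""
    return (new_rate - base_rate) / base_rate * 100


def reduction_in_defeat(base_rate: float, new_rate: float) -> float:
    """Percent of baseline losses eliminated (0 to 100 when new_rate >= base_rate)."""
    return (new_rate - base_rate) / (1 - base_rate) * 100


scenarios = {
    "observed (.72 -> .76)": (0.72, 0.76),      # unweighted rates reported in the study
    "hypothetical (.72 -> .80)": (0.72, 0.80),  # the "about 10% better" scenario
}

for label, (base, new) in scenarios.items():
    first = round(relative_improvement(base, new), 1)
    second = round(reduction_in_defeat(base, new), 1)
    print(f"{label}: first measure {first}%, second measure {second}%")
```

The first measure divides the gain by the baseline win rate and the second divides it by the baseline loss rate, which is why the second is always larger when the baseline win rate exceeds one half.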

We can now pose the question: given the sample size and the effect of specified size, what is the probability of observing a statistically significant effect if one exists?  I use the following Stata command to explore the statistical power of the study:

    sampsi .72 .80, n1(129) n2(78)

which yields the following output:

    Estimated power for two-sample comparison of proportions

    Test Ho: p1 = p2, where p1 is the proportion in population 1
                        and p2 is the proportion in population 2

             alpha =   0.0500  (two-sided)
                p1 =   0.7200
                p2 =   0.8000
    sample size n1 =      129
                n2 =       78
             n2/n1 =     0.60

    Estimated power:

             power =   0.1936

A power of 0.19 is too low to conclude that the study was large enough to detect an effect of the specified size at a statistically significant level. If one concluded that an offer of representation did not make a significant difference from this study, there is a good chance the conclusion would be incorrect. To achieve power of about 0.70, one would need a sample four times as large as that in the study. If one thought that smaller effects were meaningful, the sample would be even more undersized.
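For those without Stata, the same kind of calculation can be roughed out in Python using only the standard library. This is a sketch under a stated assumption: it uses the plain normal approximation for a two-sided test of two proportions, with no continuity correction. I believe Stata's figure above includes a continuity correction by default, so this version should come out somewhat higher than 0.1936 rather than match it. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test of H0: p1 == p2.

    Normal approximation, no continuity correction (an assumption of this sketch).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error under the null uses the pooled proportion.
    p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    # Standard error under the alternative uses each group's own proportion.
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Probability of rejecting on the side of the true effect; the negligible
    # probability of rejecting in the opposite tail is ignored.
    return z.cdf((abs(p2 - p1) - z_crit * se_null) / se_alt)


study = power_two_proportions(0.72, 0.80, n1=129, n2=78)
quadrupled = power_two_proportions(0.72, 0.80, n1=4 * 129, n2=4 * 78)
print(f"study-sized sample:    {study:.2f}")
print(f"four times the sample: {quadrupled:.2f}")
```

On my reading, the uncorrected approximation puts the study's power at roughly 0.24, and quadrupling both groups moves it into the neighborhood of 0.7. Either way the qualitative conclusion is the same: the study as run had well under a one-in-three chance of detecting an effect of this size.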

I think my analysis so far underestimates the benefit of an offer by HLAB attorneys.  Perhaps we can take .72 as a reasonable lower bound on success. Even folks without an offer succeeded at that rate.  The realistic upper bound on success is likely not 1.00.  Some cases simply cannot be won, even by the best lawyer in the world. Perhaps not more than 90% of cases are ever winnable, with the real winnable rate likely somewhere between .8 and .9.  If the winnable rate was .8, then the offer got clients halfway there, from .72 to .76. If the real rate was higher, the offer was less effective but not trivial in size.  At .9, the offer got the clients 22% closer to the ideal. The study just was not large enough to detect much of an effect at a statistically significant level.
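The "how far toward the ceiling" reasoning above is easy to tabulate. In this Python sketch the function name and the list of candidate ceilings are mine. The .72 floor and .76 observed rate come from the study, and the ceilings are the speculative "winnable" rates discussed in the paragraph, not estimates of anything.

```python
def share_of_gap_closed(floor: float, observed: float, ceiling: float) -> float:
    """Percent of the distance from the floor to the ceiling covered by the observed rate."""
    return (observed - floor) / (ceiling - floor) * 100


NO_OFFER_RATE = 0.72  # treated as the lower bound on success
OFFER_RATE = 0.76     # unweighted win rate for those offered representation

# Speculative upper bounds on the share of cases that are winnable at all.
for ceiling in (0.80, 0.90, 1.00):
    share = round(share_of_gap_closed(NO_OFFER_RATE, OFFER_RATE, ceiling), 1)
    print(f"winnable rate {ceiling:.2f}: offer closed {share}% of the gap")
```

A ceiling of 1.00 reproduces the proportionate reduction in defeat from earlier, so that measure can be read as the most conservative member of this family.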

So while I agree that the study provides no significant evidence that an offer increases success, my analysis (obviously incomplete) suggests that the study provides no persuasive evidence that an offer does not increase success. The study is inconclusive on this issue because of sample size.

HLAB lawyers should not feel that they have to explain away these results; the results modestly, but inconclusively, support the positive effect of an offer because they are in the right direction in a small study.