Category: Empirical Analysis of Law


Reversal Rates, Reconsidered

What is the meaning of an appellate court’s “reversal rate”?  Opinions vary.  (My view, expressed succinctly, is “basically nothing.”)  However it is conceived, we ought at least to measure reversal correctly.  But two lawyers at Hangley Aronchick, a Philadelphia law firm, think that scholars (and journalists) have conceptualized reversal in entirely the wrong way.

According to John Summers and Michael Newman, we’ve forgotten that in every case the Supreme Court takes, it implicitly also considers shadow cases from other circuits ruling on the same issue.  That is, the Supreme Court doesn’t just “reverse” the circuit on direct appeal; it also affirms (or reverses) coordinate circuits while resolving a split.  Thus, both our numerator and our denominator have been wrong.  They’ve written up the results of this pretty interesting approach to reversal in a paper you can find blurbed here.  Among the highlights: (1) reversal is less common than is commonly supposed; (2) the Court doesn’t predictably follow the majority of circuits; (3) there are patterns of concordance between circuits in analyzing issues; and (4) even under the new approach, the Ninth Circuit is still the least loyal agent of the Supreme Court.
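To make the numerator-and-denominator point concrete, here is a minimal sketch of the arithmetic. It is not Summers and Newman's actual method or data: the decisions, the circuit labels, and the `reversal_rates` helper are all invented for illustration, and the only assumption is that each Supreme Court decision can be coded as one directly reviewed ruling plus zero or more "shadow" rulings from other circuits on the same issue.

```python
from collections import Counter

# Hypothetical toy data (invented for illustration, not from the paper).
# "direct" is the circuit ruling actually under review; "shadow" lists other
# circuits that had ruled on the same issue and were implicitly affirmed or
# reversed when the Court resolved the split.
decisions = [
    {"direct": ("9th", "reversed"), "shadow": [("2nd", "affirmed"), ("5th", "affirmed")]},
    {"direct": ("9th", "affirmed"), "shadow": [("2nd", "reversed")]},
    {"direct": ("2nd", "reversed"), "shadow": [("9th", "affirmed"), ("5th", "reversed")]},
]


def reversal_rates(decisions, include_shadow):
    """Return {circuit: share of its counted rulings that were reversed}."""
    reversed_count = Counter()
    total = Counter()
    for decision in decisions:
        # The conventional count looks only at the ruling on direct review.
        rulings = [decision["direct"]]
        if include_shadow:
            rulings = rulings + decision["shadow"]
        for circuit, outcome in rulings:
            total[circuit] += 1
            if outcome == "reversed":
                reversed_count[circuit] += 1
    return {circuit: reversed_count[circuit] / total[circuit] for circuit in total}


conventional = reversal_rates(decisions, include_shadow=False)
with_shadow = reversal_rates(decisions, include_shadow=True)
```

In this toy set the conventional count gives the 9th Circuit a reversal rate of 1/2, while counting shadow rulings gives 1/3; the 5th Circuit, which never appears on direct review, only shows up once shadow cases are counted. The direction of the change depends entirely on the invented data, so nothing here says anything about real circuits.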

I think that this method has real promise, and I bet that folks who are interested in judicial behavior will want to check it out.


In Praise of Complexity

Earlier this month, right here on this very blog, Dave Hoffman pontificated about two of my favorite subjects: empirical legal studies and baseball. Primarily, Dave wondered whether empirical legal research might face the same problem as sabermetric baseball analysis: inaccessible complexity. I won’t rehash his argument because he did a very good job of explaining it in the original post. Although I completely agree with his conclusion that empirical legal studies should seek to be more accessible (which I always note at the end of the introduction to my empirical work), I disagree with his contention that empirical legal studies might face widespread incomprehensibility due to growing complexity. Because I think it is a helpful analogy, I’ll borrow Dave’s example of advanced statistics in baseball.


Law’s Arbitrary Endpoints

For many purposes, a season is an arbitrary endpoint to measure a baseball player’s success.  To extract utility from performance data over time, you need to pick endpoints that make sense in light of what you are measuring.  Thus, if we want to know how much to discount a batter’s achievements by luck, it might not make sense to look seasonally, because there’s no good reason to expect that luck is packaged in April-to-October chunks.  Nonetheless, sabermetricians commonly do talk about BABIP seasonally: thus, Aaron Rowand had an “unusually lucky 2007,” and has since regressed himself off of a major league payroll.  Jayson Werth, similarly, is feeling the bite of lady luck this “season.”  (For pitchers, the analysis makes more sense, since the point of BABIP is that pitchers can’t control outcomes once the ball hits a bat.  Thus, the Phillies’ fifth starter is supposedly not nearly as good as his haircut suggests he ought to be.)

This bias toward artificial endpoints affects legal studies, though less obviously.  There aren’t legal seasons.  (It’s always a time to weep, to bill, to work, to reap.)  But we still organize our analyses around units which might not exactly track the underlying item of interest.  We want to study disputes, but we look at the records of filing and verdicts (which are a smaller unit in time than the object of study).  We wish to examine ideological voting patterns on the Court, but we organize our study by Term.  We want clear signals of young lawyer quality, but we look at grades in law school, for (mostly) the first three semesters.  We want to know how law schools influence hiring practices, but we look at deadline-generated 9-month hiring reports.  Different slices at these numbers may produce quite different results; heck, one of the reasons that USNews obtains variable rankings is that they keep on moving the endpoints of the analysis in ways that are perfectly unclear.

There’s no complete solution to the endpoint problem – at least, not one that’s easily compatible with the project of data-driven legal analysis.  It’s important, therefore, to be especially careful when reading studies that take advantage of convenient legal periods.  A prime example is the Supreme Court’s “Term.”  I have no good reason to expect that the Justices’ behavior changes meaningfully from one Term to another — absent an intervening change in personnel.  So, Term analysis is convenient, but I bet it misleads.  Comparing the performance of a Circuit from one Term to another is similarly odd — whatever the value of the “reversal rate” inquiry, it surely doesn’t turn on Terms!

This set of cautions might be extended to a more general one, directed at folks who are interested in doing empirical work but haven’t yet begun to collect data. If your outcome of interest is measured monthly, seasonally or yearly, consider whether that unit of measurement reflects something true about the data, or is merely a convenience.  If it’s the latter, proceed with caution.  Obviously, this isn’t at all a novel caution, but the persistence of the error suggests it can’t be made often enough.
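A small numerical sketch shows how much the choice of endpoints can matter. Everything in it is hypothetical: the monthly tallies are invented, and `window_rates` is an illustrative helper rather than anything drawn from an actual study. The same sequence of monthly outcomes is grouped into two-month periods twice, once starting at month 0 and once starting at month 1.

```python
# Hypothetical monthly tallies: (events of interest, opportunities).
# Think hits and at-bats, or reversals and decisions; the numbers are made up.
monthly = [(3, 10), (6, 10), (7, 10), (2, 10), (3, 10), (6, 10), (7, 10), (2, 10)]


def window_rates(monthly, length, offset):
    """Aggregate rate for consecutive, non-overlapping windows.

    Each window covers `length` months; the first window starts at `offset`.
    Trailing months that do not fill a whole window are dropped.
    """
    rates = []
    for start in range(offset, len(monthly) - length + 1, length):
        chunk = monthly[start:start + length]
        events = sum(e for e, _ in chunk)
        opportunities = sum(n for _, n in chunk)
        rates.append(events / opportunities)
    return rates


# Same underlying data, two different sets of period boundaries.
aligned = window_rates(monthly, length=2, offset=0)
shifted = window_rates(monthly, length=2, offset=1)
```

With the boundaries at month 0, every two-month period shows a rate of 0.45 and the series looks perfectly stable. Shift the boundaries by one month and the periods alternate between 0.65 and 0.25, which looks like dramatic swings in performance. Nothing about the underlying behavior changed; only the endpoints did.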

Assessing Medicaid Managed Care

The Washington Post has featured two interesting pieces recently on Medicaid managed care. Christopher Weaver reported on a battle between providers and insurers in Texas. Noting that “federal health law calls for a huge expansion of the Medicaid program in 2014,” Weaver shows how eager insurers are to enroll poor individuals in their plans. Each enrollee would “yield on average $7 a month profit,” according to recent calculations. Cost-cutting legislators see potential fiscal gains, too, once the market starts working its magic.

There’s only one problem with those projections: it turns out that “moving Medicaid recipients into managed care ‘did not lead to lower Medicaid spending during the 1991 to 2003 period,’” according to a report published by the National Bureau of Economic Research this month. Sarah Kliff is surprised to find that this is “the first national look at whether Medicaid managed care has actually done a key thing that states want it to do.”


The Future of Empirical Legal Studies

Kenesaw Mountain Landis would have hated both sabermetrics and ELS.

Reading these two articles on the problems of complexity for sabermetrics, I wondered if the empirical legal studies community is coming soon to a similar point of crisis. The basic concern is that sabermetricians are devoting oodles of time to ever-more-complex formulae which add only a small amount of predictive power, but which make the discipline more remote from lay understanding, and thus less practically useful.  Basically: the jargonification of a field.  Substituting “law” into Graham MacAree’s article on the failings of sabermetrics, we get the following dire warning:

“Proper [empirical legal analysis] is something that has to come from the top down ([law]-driven) rather than the bottom up (mathematics/data driven), and to lose sight of that causes a whole host of issues that are plaguing the field at present. Every single formula must be explainable without recourse to using ridiculous numbers. Every analyst must be open to thinking about the [law] in new ways. Every number, every graph in a [ELS] piece must tell a [legal] story, because otherwise we’re no longer writing about the [legal system] but indulging in blind number-crunching for its own sake. …

Surveying the field, I no longer believe that those essential precepts hold sway over the [ELS] community. Data analysis methods are being misapplied and sold to readers as the next big thing. Articles are being written for the sake of sharing irrelevant changes in irrelevant metrics. Certain personalities are so revered that their word is taken as gospel when fighting dogma was what brought them the respect they’re now given in the first place. [ELS] is in a sorry state.

How do we fix it? Well, the answer seems simple. [ELS] shouldn’t be so incomprehensible so as not to call up the smell of [a courtroom, or the careful drafting of the definition clauses in a contract, or the delicate tradeoffs involved in family court practice, or the importance of situation sense]. Statistics shouldn’t be sterile and clean and shiny and soulless. They shouldn’t just be about [Law]; they should invoke it. Otherwise, they run the risk of losing the language which makes them so special.”

Note: this is an entirely different point than Leiter’s odd 2010 critique that ELS work was largely mediocre.  The problem, rather, is that the trend is toward a focus on more complex and “accurate” models, often without the input of people with legal training, and with insufficient attention to how such models will be explained to lawyers, judges and legal policymakers.  (See also all of Lee Epstein’s work.)


Assessing Twiqbal

Several months ago, the FJC put out a well-publicized study assessing the results of Twombly and Iqbal on motions practice.  It concluded that there was little reason, overall, for concern that the Supreme Court’s new pleadings jurisprudence had worked a revolutionary change down below.  Lonny Hoffman (Houston) has just released an important new paper which questions the methods and conclusions of the FJC’s work.  He pulls no punches:

“This paper provides the first comprehensive assessment of the Federal Judicial Center’s long-anticipated study of motions to dismiss for failure to state a claim after Iqbal v. Ashcroft. Three primary assessments are made of the FJC’s study. First, there are reasons to be concerned that the study may be providing an incomplete picture of actual Rule 12(b)(6) activity. Even if the failure to capture all relevant motion activity was a non-biased error, the inclusiveness problem is consequential. Because the study was designed to compare over time the filing and grant rate of Rule 12(b)(6) motions, the size of the effect of the Court’s cases turns on the amount of activity found. Second, even if concerns are set aside that the collected data may be incomplete, it misreads the FJC’s findings to conclude that the Court’s decisions are having no effect on dismissal practice. The FJC found that after Iqbal, a plaintiff is twice as likely to face a motion to dismiss. This sizeable increase in rate of Rule 12(b)(6) motion activity represents a marked departure from the steady filing rate observed over the last several decades and means, among other consequences, added costs for plaintiffs who have to defend more frequently against these motions. The data regarding orders resolving dismissal motions even more dramatically shows the consequential impacts of the Court’s cases. There were more orders granting dismissal with and without leave to amend, and for every case category examined. Moreover, the data show that after Iqbal it was much more likely that a motion to dismiss would be granted with leave to amend (as compared to being denied) both overall and in the three largest case categories examined (Civil Rights, Financial Instruments and Other). Employment Discrimination, Contract and Torts all show a trend of increasing grant rates. In sum, in every case type studied there was a higher likelihood after Iqbal that a motion to dismiss would be granted.
Third, because of inherent limitations in doing empirical work of this nature, the cases may be having effects that the FJC researchers were unable to detect. Comparing how many motions were filed and granted pre-Twombly to post-Iqbal cannot tell us whether the Court’s cases are deterring some claims from being brought, whether they have increased dismissals of complaints on factual sufficiency grounds, or how many meritorious cases have been dismissed as a result of the Court’s stricter pleading filter. Ultimately, perhaps the most important lesson to take away from this last assessment of the FJC’s report is that empirical study cannot resolve all of the policy questions that Twombly and Iqbal raise.”

I should disclose that I provided Lonny comments on an earlier draft, and overall I think he’s done an incredible (and generally very fair) job.  One thing to think about, as always when evaluating litigation data, is the degree to which we would expect to see any results at all given case selection effects.  That Lonny does observe such substantively significant changes notwithstanding selection tells us something about how dramatic the Twiqbal decisions really were.


Beneath the Lamp Post

Though many bemoan the expense and terrible functionality of PACER, the federal government’s electronic docketing system, it is vastly superior to existing state alternatives.  While some states have decent, and searchable, e-dockets, others do not, and it’s often quite hard to figure out the scope of the state databases.  The result is that a researcher (or a lawyer) who wants to study live dockets at the state level is faced with a host of known unknowns, making aggregate statistical inference basically impossible.  Even descriptive statistics about state courts are hard to verify.  It’s a black hole. (With some illumination provided by the BJS and other bodies.)

This frustrates me, and if I could wave a magic wand (or controlled Google) I would create a national e-docketing system for all state filings, permitting full-text searches across states for comprehensive data – including searches of motions and orders – in both civil and criminal litigation.  The current state of the world, by contrast, directs much of the new empirical legal research to focus on federal cases and federal outcomes, because PACER provides access to the kinds of data that researchers need.  The problem, of course, is that PACER collects only federal dockets, which aren’t representative of the kind or scope of litigation nationwide. Though of course studying dockets is vastly superior to studying opinions – if you want to know what judges are doing – we’re left still peering through a dark piece of glass.  Worse, I think, is that researchers end up focusing their energies on topics for which federal litigation is the dominant way of resolving legal claims.  Thus, there’s much more, and much better, docket-centered empirical work about securities law and federal civil rights statutes than there is about common law adjudication.

Our sadly patchwork court records system  doesn’t just hurt academics looking to illuminate doctrinal puzzles.  (The horror! Tenured professors can’t write more papers!)  It also means that lawyers and corporate officers may be forced to rely on anecdote and salience when deciding how to engage with the litigation system — a calculation that may lead such repeat players to develop a long-term strategy to exit the litigation system altogether.  If the state courts want to preserve their business, they need to innovate.  One way to do so would be to join forces in data collection, archival, and search.



Randomization Uber Alles?

Jim and Cassandra write:

“To Dave, we say that our enthusiasm for randomized studies is high, but perhaps not high enough to consider a duty to randomize among law school clinics or among legal services providers.  We provided an example in the paper of practice in which randomization was inappropriate because collecting outcomes might have exposed study subjects to deportation proceedings.  We also highlighted in the paper that in the case of a practice (including possibly a law school clinic) that focuses principally on systemic change, randomization of that practice is not constructive.  Instead, what should be done is a series of randomized studies of an alternative service provider’s practice in that same adjudicatory system; these alternative provider studies can help to assess whether the first provider’s efforts at systemic change have been successful.”

I meant to cabin my argument to law school clinics.  And I do understand that there may be very rare cases where collecting outcomes will hurt clients (such as deportation).  But what about a clinic that focuses on “systemic change”?  Let’s assume that subsidizing such a clinic would be a good thing for a law school to do (or, to put it another way, we think it is a good idea for current law students to incur more debt so that society gets the benefit of the clinic’s social agitation).  Obviously, randomization of client outcomes would be a terrible fit for measuring the success of such a clinic.  It would be precisely the kind of lamppost/data problem that Brian Leiter thinks characterizes much empirical work.

But that doesn’t mean that randomization couldn’t be useful in measuring other kinds of clinic outcomes.  What about randomization in the allocation of law student “employees” to the clinic as a way to measure student satisfaction with the “learning outcomes”?  Or randomizing intake and using different client-contact techniques as a way of measuring client satisfaction with their representation (or feelings about the legitimacy of the system)?  One thing that the commentators in this symposium have tried to emphasize is that winning & losing aren’t the only outputs of the market for indigent legal services.  Controlled study of the actors in the system needn’t be constrained in the way that Jim and Cassandra’s reply to my modest proposal to mandate randomization suggests.


Randomization, Intake Systems, and Triage

Thanks to Jim and Cassandra for their carefully constructed study of the impact of an offer from the Harvard Legal Aid Bureau for representation before the Massachusetts Division of Unemployment Assistance, and to all of the participants in the symposium for their thoughtful contributions.  What Difference Representation? continues to provoke much thought, and as others have noted, will have a great impact on the access to justice debate.  I’d like to focus on the last question posed in the paper — where do we go from here? — and tie this in with questions about triage raised by Richard Zorza and questions about intake processes raised by Margaret Monsell.   The discussion below is informed by my experience as a legal service provider in the asylum system, a legal arena that the authors note is  strikingly different from the unemployment benefits appeals process described in the article.

My first point is that intake processes vary significantly between different service providers offering representation in similar and different areas of the law.  In my experience selecting cases for the asylum clinics at Georgetown and Yale, for example, we declined only cases that were frivolous, and at least some intake folks (yours truly included) preferred to select the more difficult cases, believing that high-quality student representation could make the most difference in these cases.  Surely other legal services providers select for the cases that are most likely to win, under different theories about the most effective use of resources.  WDR does not discuss which approach HLAB takes in normal practice (that is, outside the randomization study).  On page twenty, the study states that information on financial eligibility and “certain additional facts regarding the caller and the case”  are put to the vote of HLAB’s intake committee.  On what grounds does this committee vote to accept or reject a case?  In other words, does HLAB normally seek the hard cases, the more straightforward cases, some combination, or does it not take the merits into account at all?



How Much Enthusiasm for Randomized Trials? A Response to Kevin Quinn and David Hoffman

We thank Kevin Quinn and David Hoffman for taking the time to comment on our paper.  Again, these are two authors whose work we have read and admired in the past.

Both Dave and Kevin offer thoughts about the level of enthusiasm legal empiricists, legal services providers, and clinicians should have for randomized studies.  We find ourselves in much but not total agreement with both.  To Kevin, we suggest that there is more at stake than just finding out whether legal assistance helps potential clients.  In an era of scarce legal resources, providers and funders have to make allocation decisions across legal practice areas (i.e., should we fund representation for SSI/SSDI appeals or for unemployment appeals or for summary eviction defense).  That requires more precise knowledge about how large representation (offer or actual use) effects are, how much bang for the buck.  Perhaps even more importantly, scarcity requires that we learn how to triage well; see Richard Zorza’s posts here and the numerous entries in his own blog on this subject.  That means studying the effects of limited interventions.  Randomized trials provide critical information on these questions, even if one agrees (as we do) that in some settings, asking whether representation (offer or actual use) helps clients is like asking whether parachutes are useful.

Thus, perhaps the parachute analogy is inapt, or better, it requires clarification:  we are in a world in which not all who could benefit from full-service parachutes can receive them.  Some will have to be provided with rickety parachutes, and some with little more than large blankets.  We all should try to change this situation as much as possible (thus the fervent hope we expressed in the paper that funding for legal services be increased).  But the oversubscription problem is simply enormous.  When there isn’t enough to go around, we need to know what we need to know to allocate well.  Meanwhile, randomized studies can also provide critical information on the pro se accessibility of an adjudicatory system, which can lay the groundwork for reform.

To Dave, we say that our enthusiasm for randomized studies is high, but perhaps not high enough to consider a duty to randomize among law school clinics or among legal services providers.  We provided an example in the paper of practice in which randomization was inappropriate because collecting outcomes might have exposed study subjects to deportation proceedings.  We also highlighted in the paper that in the case of a practice (including possibly a law school clinic) that focuses principally on systemic change, randomization of that practice is not constructive.  Instead, what should be done is a series of randomized studies of an alternative service provider’s practice in that same adjudicatory system; these alternative provider studies can help to assess whether the first provider’s efforts at systemic change have been successful.

Our great thanks to both Kevin and Dave for writing, and (obviously) to Dave (and Jaya) for organizing this symposium.