A Taxonomy of Federal Litigation

For the last two years, Christy Boyd and I, along with some friends, have been working on a paper on how attorneys construct complaints. The project began when we were working to code some other detritus of federal litigation and decided to collect the causes of action in complaints, so that we could understand the legal issues in our cases better than Nature of Suit (NOS) codes alone permitted. Soon enough, we got to thinking that our causes of action were pled in distinctively patterned ways. Obviously, this isn’t an earth-shaking insight: most first-year students have thought, at one time or another, that each of their classes’ exam fact patterns could easily substitute for any other. That is, causes of action are alternative, mutually complementary theories that channel a limited number of fact patterns into claims for legal relief. Everyone knows that contract and tort claims are pled together, and that constitutional claims come accompanied by state law torts. But we thought it’d be worthwhile to nail down this insight using an analysis very similar to the one that lets Amazon tell you which books you might like: if you plead a particular cause of action, what other causes of action are you likely to bring in the same case?
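The Amazon-style question can be framed as a simple conditional frequency: among the cases that plead cause of action A, what share also plead B? Here is a minimal sketch, assuming each case is represented as a collection of cause-of-action labels. The function name `also_pleaded` and the toy data are hypothetical illustrations, not the paper's method (which relies on spectral clustering).

```python
from collections import Counter


def also_pleaded(cases, given):
    """Return {other_cause: share of cases pleading `given` that also plead it}.

    Each case is converted to a set, so a case counts once per cause of
    action no matter how many separate counts it pleads under that label.
    """
    with_given = [set(c) for c in cases if given in c]
    if not with_given:
        return {}
    counts = Counter(other for c in with_given for other in c if other != given)
    # most_common() orders the result from most to least frequent companion.
    return {cause: n / len(with_given) for cause, n in counts.most_common()}


# Hypothetical toy data, not drawn from the paper's sample.
cases = [
    {"contract", "fraud", "tort"},
    {"contract", "tort"},
    {"constitutional", "tort"},
    {"tax"},
]
print(also_pleaded(cases, "contract"))  # {'tort': 1.0, 'fraud': 0.5}
```

In this toy data, every contract case also pleads a tort and half also plead fraud, while a tax case has no companions at all.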

We gathered a set of 2,500 complaints (from a much larger sample of federal complaints derived through RECAP). The complaints were sampled to be fairly representative of all federal litigation, excluding pro se, social security, and prisoner petition cases. The sample contained 11,500 individual causes of action, or around 4.6 causes of action per case. Guided by co-authors at Temple’s Center for Data Analytics, we used spectral clustering to examine the relationships between causes of action. Two years later and presto, our (draft) paper is up on SSRN! The ungainly title is Building a Taxonomy of Litigation: Clusters of Causes of Action in Federal Complaints. I welcome your comments, and your suggestions for a better title. Follow me after the jump for an exploration of our findings.

The figure below lays out a basic descriptive picture of the types of causes of action in our data.  As you can see, almost one in three causes of action in federal court sounds in tort.  Contract claims are the second most common legal theory advanced.  (For more on the details of coding, including a discussion of the troublesome “bare claims for relief,” you’ll have to read the paper.)

Another cool descriptive question we can ask concerns how causes of action pair with one another. The Figure below depicts the most common pairs of causes of action within the data. For example, if a case had three causes of action, A, B, and C, we identified three pairs: (1) A with B; (2) A with C; and (3) B with C. The Figure depicts those pairings for all the causes of action in the data (gray bars), as well as the pairings for cases with 10 or fewer causes of action.
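The pair-counting rule described above can be sketched in a few lines, assuming each case is a list of cause-of-action categories in which a label may repeat (that is how a tort claim can pair with another tort claim). The function name `count_pairs`, the optional `max_causes` cutoff, and the toy data are hypothetical; this illustrates the counting rule, not the paper's actual code.

```python
from collections import Counter
from itertools import combinations


def count_pairs(cases, max_causes=None):
    """Count unordered pairs of causes of action within each case.

    A case pleading A, B, C yields (A, B), (A, C), (B, C). Labels are sorted
    inside each pair so (contract, tort) and (tort, contract) are one pair.
    If max_causes is set, cases with more causes than that are skipped.
    """
    pairs = Counter()
    for causes in cases:
        if max_causes is not None and len(causes) > max_causes:
            continue
        for a, b in combinations(causes, 2):
            pairs[tuple(sorted((a, b)))] += 1
    return pairs


# Hypothetical toy data: each case is a list of cause-of-action categories.
cases = [
    ["tort", "tort", "contract"],
    ["contract", "tort"],
    ["tort"] * 12,  # a long complaint, dropped when max_causes=10
]
all_pairs = count_pairs(cases)
short_pairs = count_pairs(cases, max_causes=10)
print(short_pairs.most_common())  # [(('contract', 'tort'), 3), (('tort', 'tort'), 1)]
```

A case with n causes of action yields n(n-1)/2 pairs, so the single 12-count complaint above contributes 66 tort-tort pairs on its own. Long complaints can therefore dominate the all-cases counts, and a series limited to cases with 10 or fewer causes of action dampens that effect.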

The Figure shows that the most common pairing is a tort claim with another tort claim. Nearly as common is a tort claim paired with a contract claim.

That Figure then helps to set the stage for the following figure, which is the last for this post.  In it, the nodes (red dots) represent the spatial distribution of causes of action, with the node’s relative size indicating the frequency of the cause of action in the data. The edges (gray lines) depict the relationship between causes of action, with stronger co-occurrences represented with thicker lines.
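The network figure's encoding (node size for frequency, edge thickness for co-occurrence) can be sketched as follows. This is a hypothetical illustration, not the code behind the figure. It assumes that same-category pairs (tort with tort) are not drawn as edges, and that edge counts are rescaled linearly to an arbitrary drawing range set by the hypothetical parameters `min_width` and `max_width`.

```python
from collections import Counter
from itertools import combinations


def network_weights(cases, min_width=0.5, max_width=5.0):
    """Node frequencies and scaled edge widths for a co-occurrence graph.

    Node weight: how many times each cause of action appears in the data.
    Edge weight: how many within-case pairs link two distinct causes,
    rescaled linearly to [min_width, max_width] for drawing.
    """
    nodes = Counter(cause for causes in cases for cause in causes)
    edges = Counter()
    for causes in cases:
        for a, b in combinations(causes, 2):
            if a != b:  # assumption: same-category pairs are not drawn as edges
                edges[tuple(sorted((a, b)))] += 1
    if not edges:
        return nodes, {}
    lo, hi = min(edges.values()), max(edges.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all edges are equal
    widths = {
        pair: min_width + (n - lo) / span * (max_width - min_width)
        for pair, n in edges.items()
    }
    return nodes, widths


# Hypothetical toy data, not drawn from the paper's sample.
nodes, widths = network_weights([
    ["contract", "tort", "fraud"],
    ["contract", "tort"],
    ["tax"],
])
print(nodes["tax"], widths)
```

In the toy data, contract and tort get the thickest edge, while tax appears as a node with no edges at all, which is how an isolated cause of action shows up in such a graph.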

The Figure illustrates the close relationship between certain kinds of commercial causes of action (contract, tort, fraud) and the relative isolation of others, like tax, which stands alone. But to really understand how causes of action relate to one another, and what those relationships can tell us about attorney strategy, we need to dig into clustering analysis. I’ll save that for another post. In the meantime, enjoy the paper!


4 Responses

  1. Sasha says:


    Would it be useful at all to understand how the relationships change over time? e.g. produce the last figure using data from individual years, then animate it to show changes over time?

  2. Dave Hoffman says:

    Useful? Yes! Fantastically fun to try? Yes! But we might not have enough data over multiple years in this set. Negotiating to get more – and then need to find funding to have it coded. Boo.

  3. Kyle says:

    Dave: This is fascinating stuff. As a follow-up study – not to suggest that you haven’t done enough already – it might be interesting to chart the mortality / disposition of various causes of action, when situated in different pairings / couplings. In this vein, one might identify whether, in particular contexts, certain causes of action are essentially makeweight tag-alongs, which never provide the grounds for relief for the plaintiff, while others are virtually always in the case to the end, and provide the backbone for the suit. (It’s been my longtime supposition, for example, that relatively few cases that append an IIED claim to other causes of action ultimately lead to a jury finding of IIED, and an award of damages on that basis.)

  4. Dave Hoffman says:

    Yup, we’re on it! (See the third promised post of this series, when I’ve written it, for details.)