Alessandro Acquisti, Sasha Romanosky, and I have a new draft up on SSRN, Empirical Analysis of Data Breach Litigation.  Sasha, who’s really led the charge on this paper, has presented it at many venues, but this draft is much improved (and is the first public version).  From the abstract:

In recent years, a large number of data breaches have resulted in lawsuits in which individuals seek redress for alleged harm resulting from an organization losing or compromising their personal information. Currently, however, very little is known about those lawsuits. Which types of breaches are litigated, which are not? Which lawsuits settle, or are dismissed? Using a unique database of manually-collected lawsuits from PACER, we analyze the court dockets of over 230 federal data breach lawsuits from 2000 to 2010. We use binary outcome regressions to investigate two research questions: Which data breaches are being litigated in federal court? Which data breach lawsuits are settling? Our results suggest that the odds of a firm being sued in federal court are 3.5 times greater when individuals suffer financial harm, but over 6 times lower when the firm provides free credit monitoring following the breach. We also find that defendants settle 30% more often when plaintiffs allege financial loss from a data breach, or when faced with a certified class action suit. While the compromise of financial information appears to lead to more federal litigation, it does not seem to increase a plaintiff’s chance of a settlement. Instead, compromise of medical information is more strongly correlated with settlement.
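A note on reading the abstract's headline numbers: they describe odds, not probabilities. The short Python sketch below assumes a logit-style binary outcome model, in which a coefficient β corresponds to an odds ratio of exp(β). It converts the reported odds ratios (3.5 for financial harm, and roughly 1/6 for credit monitoring) into coefficients, then applies them to a baseline chance of being sued. The 5% baseline and the helper function names are hypothetical illustrations, not figures or code from the paper.

```python
import math


def to_odds(p):
    """Convert a probability p (0 < p < 1) to odds p / (1 - p)."""
    return p / (1.0 - p)


def to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def apply_odds_ratio(p_baseline, odds_ratio):
    """Probability implied by multiplying the baseline odds by odds_ratio."""
    return to_prob(to_odds(p_baseline) * odds_ratio)


# In a logit model, coefficient beta and odds ratio OR satisfy OR = exp(beta).
beta_financial_harm = math.log(3.5)       # OR of 3.5, as stated in the abstract
beta_credit_monitoring = math.log(1 / 6)  # "over 6 times lower", taken here as an OR of about 1/6

p0 = 0.05  # hypothetical baseline chance of a federal suit; NOT a figure from the paper
p_harm = apply_odds_ratio(p0, math.exp(beta_financial_harm))
p_monitor = apply_odds_ratio(p0, math.exp(beta_credit_monitoring))

print(f"beta(harm) = {beta_financial_harm:.3f}, beta(monitoring) = {beta_credit_monitoring:.3f}")
print(f"P(suit): baseline {p0:.3f}, with harm {p_harm:.3f}, with monitoring {p_monitor:.4f}")
```

With this made-up baseline, the implied probabilities are about 0.156 and 0.0087, so the change in probability (roughly 3.1 times up and 5.7 times down) is smaller than the odds ratios themselves. The gap widens as the baseline probability rises, which is worth keeping in mind when reading "3.5 times" and "6 times" as statements about odds rather than raw likelihood.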

1.  The paper is the most comprehensive look at privacy litigation to date, but it clearly isn’t complete.  The biggest missing piece of data, for me, is unobserved state court lawsuits.  The paper justifies this gap (1) by noting that we’re attempting to evaluate federal policy solutions, and (2) by suggesting that it’s quite unlikely that data-loss class action suits will remain in state court after CAFA (the Class Action Fairness Act).  If you know of recent counter-examples to that second point, please let us know.  We also apply several kinds of statistical controls for potential missing data, and believe our results are fairly robust. However, we’re eager to hear critiques.

2.  An innovation in the paper is our use of the DataLossDB clearinghouse to provide a comparison set: the denominator of breaches against which the numerator (the number and type of suits) is compared.  We also collect information about each breach from outside that database; Sasha and several research assistants funded by Temple and CMU were indefatigable in tracking this stuff down.  In my view, one cool finding is the powerful lawsuit-depressant effect generated when firms provide free credit monitoring after a breach. This seems to have analogies in the apology literature, though obviously credit monitoring isn’t nearly as cheap for firms to provide as an apology.

Figure 2: Reported Breaches from DataLossDB versus Known Lawsuits


3.  I was also surprised by the secular downward trend in reported breaches.  Consider Figure 2 from the paper.  As it illustrates, the number of reported breaches has declined over the last three years (from a high in 2008).  What explains this trend, which is in tension with the conventional wisdom that data and identity loss pose an increasing threat to consumers?  The paper suggests that perhaps “US data breach disclosure laws have, indeed, forced firms to internalize more of the cost of a breach, inducing them to invest more to protect personal information, and reducing the number of actual data breaches. This claim is partially substantiated by Verizon showing a reduction in data breaches observed on their computing networks.” (Cites omitted.)  If you have other ideas, or would like to fight the hypothetical, be our guest!

4.  As with most of my recent work, we focused heavily here on which factors predicted settlement, since that disposition dominated the resolution of our lawsuit set.  Indeed, over half of our non-pending private suits terminated by settlement, which we might think of as a win for the plaintiffs.  The major finding here is not counterintuitive: cases that survive to class certification settle at significantly higher rates.  We have some information about the content of these settlements, and are eager to learn more about them (so if you are an attorney in a privacy class action, we want to talk with you).
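The numerator/denominator design described in point 2 can be illustrated with a minimal Python sketch: treat every breach in a clearinghouse export as the denominator, count how many of those breaches drew at least one lawsuit, and group the resulting rate by a breach attribute such as whether free credit monitoring was offered. The `Breach` record, its fields, the `litigation_rate_by` helper, and the toy records are all hypothetical. They are not the paper's data, DataLossDB's schema, or the authors' code, and the paper's estimates come from regressions with controls rather than raw rates like these.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Breach:
    # Hypothetical record shape; not DataLossDB's actual schema.
    breach_id: str
    credit_monitoring: bool  # firm offered free credit monitoring afterward


def litigation_rate_by(breaches, sued_ids, key):
    """Share of breaches that drew at least one suit, grouped by key(breach).

    Denominator: every breach in the group.
    Numerator: breaches in the group whose id appears in sued_ids.
    """
    totals = defaultdict(int)
    sued = defaultdict(int)
    for b in breaches:
        group = key(b)
        totals[group] += 1
        if b.breach_id in sued_ids:
            sued[group] += 1
    return {group: sued[group] / totals[group] for group in totals}


# Toy records for illustration only; not data from the paper.
breaches = [
    Breach("b1", credit_monitoring=False),
    Breach("b2", credit_monitoring=False),
    Breach("b3", credit_monitoring=False),
    Breach("b4", credit_monitoring=True),
    Breach("b5", credit_monitoring=True),
]
sued_ids = {"b1", "b2", "b4"}

rates = litigation_rate_by(breaches, sued_ids, key=lambda b: b.credit_monitoring)
print(rates)
```

Starting from the full list of breaches, rather than only the litigated ones, is what lets the comparison ask which breaches are not sued at all.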

Thoughts? Comments? Downloads?

