On Reverse Engineering Privacy Law

Michael Birnhack, a professor at Tel Aviv University Faculty of Law, is one of the leading thinkers about privacy and data protection today (for some of his previous work see here, here and here; he has also written a deep, thoughtful, innovative book in Hebrew about the theory of privacy, see here). In a new article, Reverse Engineering Informational Privacy Law, which is about to be published in the Yale Journal of Law & Technology, Birnhack sets out to unearth the technological underpinnings of the EU Data Protection Directive (DPD). The DPD, enacted in 1995 and currently undergoing a thorough review, is surely the most influential legal instrument on data privacy anywhere in the world. It has been heralded by proponents as “technology neutral” – a recipe for longevity in a world marked by rapid technological change. Alas, Birnhack unveils the highly technology-specific fundamentals of the DPD, thereby casting doubt on its continued relevance.


The first part of Birnhack’s article analyzes what technological neutrality of a legal framework means and why it is a sought-after trait. He posits that the idea behind it is simple: “the law should not name, specify or describe a particular technology, but rather speak in broader terms that can encompass more than one technology and hopefully, would cover future technologies that are not yet known at the time of legislation.” One big advantage is flexibility (the law can apply to a broad, continuously shifting set of technologies); consider the continued viability of the tech-neutral Fourth Amendment versus the obviously archaic nature of the tech-specific ECPA. Another advantage is the promotion of innovation; tech-specific legislation can lock in a specific technology, thereby stifling innovation.


Birnhack continues by creating a typology of tech-related legislation. He examines factors such as whether the law regulates technology as a means or as an end; whether it actively promotes, passively permits or directly restricts technology; at which level of abstraction it relates to technology; and who is put in charge of regulation. Throughout the discussion, Birnhack’s broad, rich expertise in everything law and technology is evident; his examples range from copyright and patent law to nuclear non-proliferation.

The article in its entirety is well worth reading, but I’ll focus here on just its last part, where Birnhack sets out to “reverse engineer” the DPD, revealing its hidden technological agenda. Whether the DPD is tech-neutral or not is a very practical concern these days, as the EU Parliament and Council contemplate a comprehensive reform proposal submitted in January by the European Commission. The reform proposal, for the most part, continues to rely on the same fundamental concepts as the DPD. This is based on the assumption that the DPD has withstood the test of time. Consider the opinion of the group of EU regulators administering the DPD, stating: “Directive 95/46/EC has stood well the influx of these technological developments because it holds principles and uses concepts that are not only sound but also technologically neutral. Such principles and concepts remain equally relevant, valid and applicable in today’s networked world.”


To test this, Birnhack analyzes the key constructs of the DPD. First, he looks at the most basic building block of all: the definition of “personal data” (aka PII in the US). For many years, the European concept of “personal data”, which is content-neutral and based on the identifiability of individual “data subjects”, seemed like a success. The definition, “any information relating to an identified or identifiable individual”, proved adaptable to a digital reality in which innocuous facts could be aggregated into a privacy-invasive profile. Unlike the US sector-based approach, which protected only certain categories of information, such as health (HIPAA), financials (GLBA), credit history (FCRA), video rentals (VPPA) and children (COPPA), the European model triggered privacy protection whenever any type of data concerning an “identified or identifiable individual” was implicated.


Of course, if identifiability subjects data to the privacy framework, then lack of identifiability extricates data from the same obligations. Anonymization, or de-identification, was thus perceived as a silver bullet, allowing organizations to “have the cake and eat it too”, that is, to retain information, repurpose and analyze it while at the same time preserving individuals’ privacy. Yet over the past decade it has become clear that, in a world of big data collection, storage and analysis, de-identification is increasingly strained by shrewd re-identification techniques applied by clever adversaries. Today, examples of re-identification of apparently de-identified data abound. It was Paul Ohm who drew on the computer science literature to finally “blow the whistle” on de-identification.
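To make the re-identification risk concrete, here is a minimal sketch of one commonly discussed family of techniques, a linkage attack on quasi-identifiers. It is my own illustration rather than anything taken from Birnhack’s article: every record, field name and function below is invented, and real attacks are considerably more sophisticated.

```python
# Toy linkage attack. All records, names and fields are hypothetical.

# "De-identified" release: names removed, quasi-identifiers kept.
deidentified = [
    {"zip": "02138", "birth_year": 1954, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1987, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1987, "sex": "M", "diagnosis": "flu"},
]

# Auxiliary data the adversary already holds (for example, a public register).
public_register = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1954, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1987, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Project a record onto its quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(released, auxiliary):
    """Map each name to a released record when the match is unique."""
    by_key = {}
    for record in released:
        by_key.setdefault(key(record), []).append(record)

    matches = {}
    for person in auxiliary:
        candidates = by_key.get(key(person), [])
        if len(candidates) == 1:  # a unique match re-identifies the person
            matches[person["name"]] = candidates[0]
    return matches


reidentified = reidentify(deidentified, public_register)
```

In this toy data, “Alice Example” is the only person matching her combination of zip code, birth year and sex, so her diagnosis is exposed even though the release contains no names. “Bob Example” matches two records and stays ambiguous, which is the intuition behind grouping-based safeguards.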


This means that the scope of the DPD becomes either overbroad, potentially encompassing every bit and byte of information, even information ostensibly not about individuals; or overly narrow, excluding de-identified information that could be re-identified with relative ease. Indeed, it now appears that the DPD concept of personal data is outmoded and ill-suited to big data realities. As Birnhack writes: “the definition of personal data is rooted within a digital technological paradigm, for good or for bad. The good part is that it is more advanced than the previous, analogue, content-based definition; the bad part is that the concept of non-identification is about to collapse, if it has not already collapsed.”


The definition of personal data has become deficient not only in its perception of de-identification but also in its view of personal data as a static concept, referring to “an individual”. This notion of data, sometimes referred to as “microdata”, fails to account for the fact that data that are apparently not about “an individual”, such as the social grid or stylometry (analysis of writing style), may have a profound privacy impact. Clearly, new thinking is needed with respect to the definition of personal data. Unfortunately, more advanced notions that have gained credence in the scientific community, such as differential privacy and privacy-enhancing technologies (PETs), have largely been left out of the debate.
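For readers unfamiliar with differential privacy, the following is a rough sketch of the textbook Laplace mechanism applied to a counting query. It is an illustration under my own assumptions, not a method discussed in Birnhack’s article: the function names, the sample data and the choice of epsilon are all hypothetical, and a production system would need far more care.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy predicate.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is the
    standard calibration for epsilon-differential privacy.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of people in a small database.
ages = [23, 35, 41, 44, 52, 58, 61, 67]
rng = random.Random(0)  # seeded only so the sketch is repeatable
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
```

The released figure is the true count plus random noise, so whether any single person is in the database changes the distribution of the output only slightly. A smaller epsilon means more noise and stronger protection, at the cost of accuracy.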


An additional fundamental concept of the DPD is that of data “processing”, defined to mean “any operation or set of operations which is performed upon personal data … such as collection, recording, organization, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, blocking, erasure or destruction”. Here, Birnhack points out the clear linearity of the concept, viewing data as being collected by an organization and then flowing through its systems (storage, retrieval, use…) until finally being put to rest (erasure or destruction). As Birnhack explains, while apparently tech-neutral, “the linear sequence assumes a particular technological environment.”


The DPD’s vision of data “processing” is – how shall I put it – very “1970s”. In those days, an active “data controller” would collect data from a passive individual and then store, use or transfer them until their ultimate deletion. Today, with the explosion of peer-produced content on social networking services as well as the introduction of layer upon layer of service providers into the data value chain, this linear model has, in many contexts, become obsolete. Privacy risks are now posed by an indefinite number of geographically dispersed actors, not least individuals themselves, who voluntarily share their own information and that of their friends and relatives. In addition, in many contexts, such as mobile applications, it is not necessarily the controller, but rather an intermediary or platform provider, that wields control over information.


An additional fundamental concept of the DPD, not directly addressed by Birnhack’s paper, is that of location. The DPD views data transfers as discrete point-to-point transactions occurring between two data controllers. This view of data as “residing” in a jurisdiction no longer fits the ephemeral, geographically indeterminate nature of cloud storage and transfers. For many years, transborder data flow regulation has caused much consternation to businesses on both sides of the Atlantic, while generating formidable legal fees. Unfortunately, this does not seem about to change.


1 Response

  1. A.J. Sutter says:

    Thanks for this interesting post. There’s a passage in the Dwork paper that might help non-specialists like me understand the problem:

    A 1977 paper of Dalenius articulated a desideratum that foreshadows for databases the notion of semantic security defined five years later by Goldwasser and Micali for cryptosystems: access to a statistical database should not enable one to learn anything about an individual that could not be learned without access. We show this type of privacy cannot be achieved. The obstacle is in auxiliary information, that is, information available to the adversary other than from access to the statistical database, and the intuition behind the proof of impossibility is captured by the following example. Suppose one’s exact height were considered a highly sensitive piece of information, and that revealing the exact height of an individual were a privacy breach. Assume that the database yields the average heights of women of different nationalities. An adversary who has access to the statistical database and the auxiliary information “Terry Gross is two inches shorter than the average Lithuanian woman” learns Terry Gross’ height, while anyone learning only the auxiliary information, without access to the average heights, learns relatively little.

    … [This impossibility result] applies regardless of whether or not Terry Gross is in the database and … leads naturally to a new approach to formulating privacy goals: the risk to one’s privacy, or in general, any type of risk, such as the risk of being denied automobile insurance, should not substantially increase as a result of participating in a statistical database. This is captured by differential privacy. [Emphasis in original; cites and footnotes omitted]

    I think there are two, well, risks with this approach. One is in the word “risk”: how is it estimated? And the other is in the word “differential” (cf. “marginal” utility): by focusing on reducing the differential/marginal risk, we may in some cases ignore the problem that the underlying (i.e., total) risk is large.