Big Data for All

Much has been written over the past couple of years about “big data.” In a new article, Big Data for All: Privacy and User Control in the Age of Analytics, which will be published in the Northwestern Journal of Technology and Intellectual Property, Jules Polonetsky and I try to reconcile the inherent tension between big data business models and individual privacy rights. We argue that, going forward, organizations should provide individuals with practical, easy-to-use access to their information, so they can become active participants in the data economy. In addition, organizations should be required to be transparent about the decisional criteria underlying their data processing activities.

The term “big data” refers to advances in data mining and the massive increase in computing power and data storage capacity, which have expanded by orders of magnitude the scope of information available for organizations. Data are now available for analysis in raw form, escaping the confines of structured databases and enhancing researchers’ abilities to identify correlations and conceive of new, unanticipated uses for existing information. In addition, the increasing number of people, devices, and sensors that are now connected by digital networks has revolutionized the ability to generate, communicate, share, and access data.

Data creates enormous value for the world economy, driving innovation, productivity, efficiency and growth. In the article, we flesh out some compelling use cases for big data analysis. Consider, for example, a group of medical researchers who were able to parse out a harmful side effect of a combination of medications, which were used daily by millions of Americans, by analyzing massive amounts of online search queries. Or scientists who analyze mobile phone communications to better understand the needs of people who live in settlements or slums in developing countries.

At the same time, the “data deluge” presents formidable privacy concerns. Protecting privacy becomes harder as information is multiplied and shared ever more widely among multiple parties around the world. As more information regarding individuals’ health, finances, location, electricity use, and online activity percolates, concerns arise about profiling, tracking, discrimination, exclusion, government surveillance, and loss of control. From a more technical legal angle, big data challenges some of the most fundamental concepts of privacy law, including the definition of “personally identifiable information,” the role of individual control, and the principles of data minimization and purpose limitation.

In our article, we make the case for providing individuals with usable access to their data. The call for transparency is not new, of course. Rather, the emphasis is on access to data in a usable format, which can work to create value for individuals. Transparency and access alone have not emerged as potent tools because individuals do not care for, and cannot afford to indulge in, transparency and access for their own sake. The enabler of transparency and access is the ability to use the information and benefit from it in a tangible way. This will be achieved through “featurization” or “app-ification” of privacy. Organizations should build as many dials and levers as needed for individuals to engage with their data.

We expect that “featurization” of big data, harnessing its immense force for not only organizational but also individual benefit, will unleash a wave of innovation and create a market for personal data applications. The technological groundwork has already been completed with mash-ups and real-time APIs making it easier for organizations to combine information from different sources and services into a single user experience. Regardless of lingering questions concerning who – if anyone – “owns” the information, we think that fairness dictates that individuals enjoy beneficial use of the data about them.

Our second proposal would require organizations to disclose the decisional criteria underpinning their data analytics machinery. In a big data world, it is often not the data but rather the inferences drawn from them that give cause for concern. Inaccurate, manipulative, or discriminatory conclusions may be drawn from perfectly innocuous, accurate data. Much like in quantum physics, the observer in big data analysis can affect the results of her research by defining the data set, proposing a hypothesis, or writing an algorithm. At the end of the day, big data analysis is an interpretative process, in which one’s identity and perspective inform one’s results. Like any interpretative process, it is subject to error, inaccuracy, and bias. Louis Brandeis, who together with Samuel Warren “invented” the legal right to privacy in 1890, also wrote that “[s]unlight is said to be the best of disinfectants.” We trust that if the existence and uses of databases were visible to the public, organizations would be more likely to avoid unethical or socially unacceptable uses of data.

 


3 Responses

  1. Ben Isaacson says:

    As a privacy practitioner at a big data company, I have to disagree with a ‘one size fits all’ approach to ‘practical access’. In the vast majority of cases, access is neither practical nor relevant to the vast majority of users. There have been tools and apps to access and provide insight with online data for years (e.g., BlueKai), and the recent DAA AdChoices initiative (which Jules co-founded) has made it clear that very few people really want to know what goes on behind the scenes with their online interest data, but simply need some comfort where they have the option to turn off something irrelevant.

    I’m all for access in the right context (like credit reporting), but think your focus should not be on ‘practical’ access for all but rather on ‘contextual’ access for some. Finally, as we’ve seen in the online ad space, the real issue (at least in the short term) should not be about access but rather meaningful and practical choices. If the context and relevance of the algorithm is skewed, the user needs a simple mechanism to turn it off, and to help inform the analytics engine so that others are spared the same mistake. User choices will drive changes to the algorithm, not simply ‘practical access’.

  2. A.J. Sutter says:

    @ “Data creates enormous value for the world economy, driving innovation, productivity, efficiency and growth.”: How can you be sure that this “value” is not erased by the impact on privacy, the plague of online advertising, “filter bubble” effects, etc.? Or is “value” a euphemism for money?

    @ “Regardless of lingering questions concerning who – if anyone – ‘owns’ the information, we think that fairness dictates that individuals enjoy beneficial use of the data about them.” Unlike beneficial ownership, which has exclusionary rights, “beneficial use” doesn’t, at least judging from Black’s. Why is this, and not beneficial ownership, adequate?

    @ “We trust if the existence and uses of databases were visible to the public, organizations would be more likely to avoid unethical or socially unacceptable uses of data.” This is a page out of Milton Friedman’s corporate-funded Capitalism and Freedom, that disclosure will set in motion an invisible hand to restrain bad actors. Why not use more direct methods and prohibitions?

  3. Frank says:

    I applaud the proposal to require the disclosure of “the decisional criteria underpinning their data analytics machinery.” Your point that “the observer in big data analysis can affect the results of her research by defining the data set, proposing a hypothesis or writing an algorithm” is also very important. On a related note, this interview demonstrates that even academics have unfortunate incentives:

    http://www.econtalk.org/archives/2012/09/nosek_on_truth.html